Re: [ovs-discuss] ovsdb relay server active connection probe interval do not work
> On 18 Oct 2023, at 18:59, Ilya Maximets wrote:
>
> On 10/18/23 17:14, Vladislav Odintsov wrote:
>> Hi Ilya, Terry,
>>
>> [...]
>>
>> I want to step back to this thread.
>> The mentioned patch is archived with "Changes Requested" state, but there
>> are no review comments on it.
>> If there is no ongoing work on it, I can take it over to finalise.
>> For now it needs a small rebase, so I can do that and resend, but first I
>> want to hear your thoughts on this.
>>
>> Internally we have used this patch to work with the Local_Config DB for
>> almost 6 months and it works fine.
>> On each OVS update we have to re-apply it and sometimes resolve conflicts,
>> so it would be nice to have this patch upstream.
>
> Hi.  I'm currently in the middle of re-working the ovsdb-server
> configuration for a different approach that will replace command-line and
> appctl configs with a config file (cmdline and appctls will be preserved
> for backward compatibility, but there will be a new way of setting things
> up).  This should be much more flexible and user-friendly than working
> with a local-config database.  That should also address most of the
> concerns raised by Terry regarding usability of local-config (mainly
> having way too many ways of configuring the same thing, and the
> requirement to use special tools to modify the configuration).  I'm
> planning to post the first version of the change relatively soon.  I can
> Cc you on the patches.

Okay, got it.  It would be nice if you could Cc me so that I don't miss the
patches, thanks!

> Best regards, Ilya Maximets.
>
> [...]
Re: [ovs-discuss] ovsdb relay server active connection probe interval do not work
On 10/18/23 17:14, Vladislav Odintsov wrote:
> Hi Ilya, Terry,
>
> [...]
>
> I want to step back to this thread.
> The mentioned patch is archived with "Changes Requested" state, but there
> are no review comments on it.
> If there is no ongoing work on it, I can take it over to finalise.
> For now it needs a small rebase, so I can do that and resend, but first I
> want to hear your thoughts on this.
>
> Internally we have used this patch to work with the Local_Config DB for
> almost 6 months and it works fine.
> On each OVS update we have to re-apply it and sometimes resolve conflicts,
> so it would be nice to have this patch upstream.

Hi.  I'm currently in the middle of re-working the ovsdb-server configuration
for a different approach that will replace command-line and appctl configs
with a config file (cmdline and appctls will be preserved for backward
compatibility, but there will be a new way of setting things up).  This
should be much more flexible and user-friendly than working with a
local-config database.  That should also address most of the concerns raised
by Terry regarding usability of local-config (mainly having way too many ways
of configuring the same thing, and the requirement to use special tools to
modify the configuration).  I'm planning to post the first version of the
change relatively soon.  I can Cc you on the patches.

Best regards, Ilya Maximets.

> [...]
Re: [ovs-discuss] ovsdb relay server active connection probe interval do not work
Hi Ilya, Terry,

> On 7 Mar 2023, at 14:03, Ilya Maximets wrote:
>
> On 3/7/23 00:15, Vladislav Odintsov wrote:
>> Hi Ilya,
>>
>> I’m wondering whether there are configuration parameters for the
>> ovsdb relay -> main ovsdb server inactivity probe timer.
>> My cluster is experiencing issues where the relay disconnects from the
>> main cluster due to the 5 sec. inactivity probe timeout.
>> The main cluster has a quite big database and a bunch of daemons connected
>> to it, which makes it difficult to maintain connections in time.
>>
>> For the ovsdb relay remote I use in-db configuration (to provide the
>> inactivity probe and RBAC configuration for ovn-controllers).
>> For the ovsdb-server which serves the SB, I just set --remote=pssl:.
>>
>> I’d like to configure the remote for the ovsdb cluster via DB to set the
>> inactivity probe, but I’m not sure about the correct way to do that.
>>
>> For now I see only two options:
>> 1. Set up a custom database schema with a connection table, serve it in
>> the same SB cluster and specify this connection when starting the ovsdb
>> sb server.
>
> There is an ovsdb/local-config.ovsschema shipped with OVS that can be used
> for that purpose.  But you'll need to craft transactions for it manually
> with ovsdb-client.
>
> There is a control tool prepared by Terry:
>
> https://patchwork.ozlabs.org/project/openvswitch/patch/20220713030250.2634491-1-twil...@redhat.com/
>
> But it's not in the repo yet (I need to get back to reviews on that topic
> at some point).  The tool itself should be fine, but maybe the name will
> change.

I want to step back to this thread.
The mentioned patch is archived with "Changes Requested" state, but there are
no review comments on it.
If there is no ongoing work on it, I can take it over to finalise.
For now it needs a small rebase, so I can do that and resend, but first I
want to hear your thoughts on this.

Internally we have used this patch to work with the Local_Config DB for
almost 6 months and it works fine.
On each OVS update we have to re-apply it and sometimes resolve conflicts, so
it would be nice to have this patch upstream.

>> 2. Set up a second connection in the ovn sb database to be used for the
>> ovsdb cluster and deploy the cluster separately from the ovsdb relay,
>> because they both open the same connections and conflict on ports.  (I
>> don’t use docker here, so I need a separate server for that.)
>
> That's an easy option available right now, true.  If they are deployed on
> different nodes, you may even use the same connection record.
>
>> Anyway, if I configure the ovsdb remote for the ovsdb cluster with a
>> specified inactivity probe (say, 60k), I guess it’s still not enough to
>> have ovsdb pings every 60 seconds.  The inactivity probe must be the same
>> from both ends - right?  From the ovsdb relay process.
>
> Inactivity probes don't need to be the same.  They are separate for each
> side of a connection and so configured separately.
>
> You can set up the inactivity probe for the server side of the connection
> via the database.  So, the server will probe the relay every 60 seconds,
> but today it's not possible to set the inactivity probe for the
> relay-to-server direction.  So, the relay will probe the server every 5
> seconds.
>
> The way out of this situation is to allow configuration of relays via the
> database as well, e.g. relay:db:Local_Config,Config,relays.  This will
> require the addition of a new table to the Local_Config database and
> allowing the relay config to be parsed from the database in the code.
> That hasn't been implemented yet.
>
>> I saw your talk at the last ovscon about this topic, and the solution was
>> in progress there.  But maybe there have been some changes since then?
>> I’m ready to test it if so.  Or, maybe there’s a workaround?
>
> Sorry, we didn't move forward much on that topic since the presentation.
> There are a few unanswered questions around the local config database,
> mainly regarding upgrades from cmdline/main-db -based configuration to a
> local-config -based one.  But I hope we can figure that out in the current
> release time frame, i.e. before the 3.2 release.
>
> There is also this workaround:
>
> https://patchwork.ozlabs.org/project/openvswitch/patch/an2a4qcpihpcfukyt1uomqre.1.1641782536691.hmail.wentao@easystack.cn/
>
> It simply takes the server->relay inactivity probe value and applies it to
> the relay->server connection.  But it's not a correct solution, because it
> relies on certain database names.
>
> Out of curiosity, what kind of poll intervals do you see on your main
> server setup that trigger the inactivity probe failures?  Could an upgrade
> to OVS 3.1 solve some of these issues?  3.1 should be noticeably faster
> than 2.17, and the parallel compaction introduced in 3.0 also removes one
> of the big reasons for large poll intervals.  An OVN upgrade to 22.09+ or
> even 23.03 should also help with database sizes.
>
> Best regards, Ilya Maximets.

Regards,
Vladislav Odintsov
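[Editor's note] The server-side probe interval discussed above can be set in
the database today; only the relay-to-server direction cannot.  A minimal
sketch for the OVN Southbound case (the 60000 ms value, socket path, and the
single-record "." shorthand are assumptions about the deployment, not
defaults):

```shell
# Raise the server-side inactivity probe on the SB Connection record(s)
# to 60 seconds.  "." works when there is exactly one Connection record;
# otherwise address the records individually.
ovn-sbctl set connection . inactivity_probe=60000

# The same update expressed as a raw OVSDB transaction with ovsdb-client,
# which is the approach needed for databases such as Local_Config that
# have no dedicated ctl tool yet.  The socket path is an example.
ovsdb-client transact unix:/var/run/ovn/ovnsb_db.sock '
  ["OVN_Southbound",
   {"op": "update",
    "table": "Connection",
    "where": [],
    "row": {"inactivity_probe": 60000}}]'
```

Both commands contact a running ovsdb-server, so they are deployment
fragments rather than standalone programs.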
Re: [ovs-discuss] ovsdb relay server active connection probe interval do not work
On 3/31/23 01:14, Vladislav Odintsov wrote:
> Thanks Ilya for such a detailed description of inactivity probes and
> keepalives.
>
> regards,
> Vladislav Odintsov
>
>> On 31 Mar 2023, at 00:37, Ilya Maximets via discuss wrote:
>>
>> [...]
>>
>> If the connection between the relay and the main cluster is lost, the
>> relay may not notice this and just think that there are no new updates.
>> All the clients connected to that relay will have stale data as a result.
>> The inactivity probe interval is essentially a value for how long you
>> think you can afford that condition to last.
>
> Do I understand you correctly that by “connection is lost” you mean an
> accidental termination of the tcp session?  Like an iptables drop, or a
> cluster member getting killed by SIGKILL?

Right.  Also sudden power loss, someone tripping over a cable, a machine
force-reset and other causes like this.

> In my understanding, if a cluster member is just gracefully stopped, it’ll
> gracefully shut down the connection and the relay will reconnect to
> another cluster member?

That's correct.  A graceful shutdown triggers a correct termination of the
TCP session, so the other end will know and re-connect.

> Just out of curiosity: in the case of accidental termination, where some
> “outdated” ovn-controller is connected to a relay which in turn thinks it
> is connected to the cluster but is not - if in such a condition the
> ovn-controller tries to claim a vif, will the relay detect the connection
> failure and reconnect to another “upstream”?

It depends, but it may not detect an issue.  The relay basically forwards a
transaction.  So, it will send the transaction received from ovn-controller
to the socket "connected" to the main cluster, and it will wait for a reply.
And the reply will never arrive.  If the transaction doesn't have a timeout
specified (and ovn-controller transactions do not), both the controller and
the relay may wait for the transaction reply indefinitely.

>>> Also, the ovsdb relay has active bidirectional probing to
>>> ovn-controllers.  If a tcp session gets dropped, the ovsdb relay won't
>>> notice this without probing?
>>
>> TCP timeouts can be very high or may not exist at all.  If the network
>> connectivity suddenly disappears (a firewall in between, or one of the
>> nodes crashed), both the client and the server may not notice that for a
>> very long time.  I've seen in practice OVN clusters where nodes suddenly
>> disappeared (crashed) and other nodes didn't notice that for many hours
>> (caused by non-working inactivity probes).
>>
>> Another interesting side effect to consider: if a controller disappears
>> and the relay keeps sending updates to it, that may cause a significant
>> memory usage increase on the relay, because it will keep the backlog of
>> data that the underlying socket didn't accept.  It may end up being
>> killed by the OOM killer, if that continues long enough.
>
> By disappearing you mean the death of ovn-controller without proper
> connection termination?

Yes.

> So if I understand correctly, relay-to-controllers probing is a
> “must have”.  That’s interesting, thanks!

More or less, yes.  As I said in some other email, it's less important than
the opposite direction, so the actual probe interval can likely be set
higher, but we should have something to close dead connections eventually.

>> If you don't want to deal with inactivity probes, you may partially
>> replace them with TCP keepalive.  Disable probes and start the daemons
>> with a keepalive library preloaded, i.e. LD_PRELOAD=libkeepalive.so, with
>> the configuration you think is suitable (the default keepalive time is
>> 2 hours on many systems, so the defaults are likely not a good choice).
>> You will lose the ability to detect infinite loops, deadlocks and the
>> like, but at least you'll be protected from pure network failures.
>> See some examples at the end of this page:
>> https://tldp.org/HOWTO/html_single/TCP-Keepalive-HOWTO/
>>
>> Running a cluster without any bidirectional probes is not advisable.
>>
>> Best regards, Ilya Maximets.
>
> Thank you for your help and Terry for his patch!
>
> [...]
Re: [ovs-discuss] ovsdb relay server active connection probe interval do not work
Thanks Ilya for such a detailed description of inactivity probes and
keepalives.

regards,
Vladislav Odintsov

> On 31 Mar 2023, at 00:37, Ilya Maximets via discuss wrote:
>
> [...]
>
> If the connection between the relay and the main cluster is lost, the
> relay may not notice this and just think that there are no new updates.
> All the clients connected to that relay will have stale data as a result.
> The inactivity probe interval is essentially a value for how long you
> think you can afford that condition to last.

Do I understand you correctly that by “connection is lost” you mean an
accidental termination of the tcp session?  Like an iptables drop, or a
cluster member getting killed by SIGKILL?

In my understanding, if a cluster member is just gracefully stopped, it’ll
gracefully shut down the connection and the relay will reconnect to another
cluster member?

Just out of curiosity: in the case of accidental termination, where some
“outdated” ovn-controller is connected to a relay which in turn thinks it is
connected to the cluster but is not - if in such a condition the
ovn-controller tries to claim a vif, will the relay detect the connection
failure and reconnect to another “upstream”?

> [...]
>
> Another interesting side effect to consider: if a controller disappears
> and the relay keeps sending updates to it, that may cause a significant
> memory usage increase on the relay, because it will keep the backlog of
> data that the underlying socket didn't accept.  It may end up being killed
> by the OOM killer, if that continues long enough.

By disappearing you mean the death of ovn-controller without proper
connection termination?

So if I understand correctly, relay-to-controllers probing is a “must have”.
That’s interesting, thanks!

> [...]
>
> Running a cluster without any bidirectional probes is not advisable.
>
> Best regards, Ilya Maximets.
Re: [ovs-discuss] ovsdb relay server active connection probe interval do not work
On 3/30/23 22:51, Vladislav Odintsov via discuss wrote:
> Hi Ilya,
>
> following your recommendation I’ve built OVS 3.1.0 plus Terry’s patch [1].
> It’s a bit outdated, but with some changes related to the "last_command"
> logic in ovs*ctl it built successfully.
>
> Also, I grabbed your idea to just hardcode the inactivity interval for the
> ovsdb relay, because it solves my issue.
>
> So, after testing it seems to work fine.  I’ve managed to run the
> ovn-sb-db cluster with custom connections from a local db and the ovsdb
> relay with connections from the sb db.

Good to know.

> I’ve got a question here:
> Do we actually need probing from the relay to the sb cluster if we have
> configured probing from the other side, in the other direction (db cluster
> to relay)?  Maybe we can even just set inactivity probes to 0 in
> ovsdb/relay.c?

If the connection between the relay and the main cluster is lost, the relay
may not notice this and just think that there are no new updates.  All the
clients connected to that relay will have stale data as a result.
The inactivity probe interval is essentially a value for how long you think
you can afford that condition to last.

> Also, the ovsdb relay has active bidirectional probing to ovn-controllers.
> If a tcp session gets dropped, the ovsdb relay won't notice this without
> probing?

TCP timeouts can be very high or may not exist at all.  If the network
connectivity suddenly disappears (a firewall in between, or one of the nodes
crashed), both the client and the server may not notice that for a very long
time.  I've seen in practice OVN clusters where nodes suddenly disappeared
(crashed) and other nodes didn't notice that for many hours (caused by
non-working inactivity probes).

Another interesting side effect to consider: if a controller disappears and
the relay keeps sending updates to it, that may cause a significant memory
usage increase on the relay, because it will keep the backlog of data that
the underlying socket didn't accept.  It may end up being killed by the OOM
killer, if that continues long enough.

If you don't want to deal with inactivity probes, you may partially replace
them with TCP keepalive.  Disable probes and start the daemons with a
keepalive library preloaded, i.e. LD_PRELOAD=libkeepalive.so, with the
configuration you think is suitable (the default keepalive time is 2 hours
on many systems, so the defaults are likely not a good choice).  You will
lose the ability to detect infinite loops, deadlocks and the like, but at
least you'll be protected from pure network failures.
See some examples at the end of this page:
https://tldp.org/HOWTO/html_single/TCP-Keepalive-HOWTO/

Running a cluster without any bidirectional probes is not advisable.

Best regards, Ilya Maximets.

> Thank you for your help and Terry for his patch!
>
> 1: https://patchwork.ozlabs.org/project/openvswitch/patch/20220713030250.2634491-1-twil...@redhat.com/
>
> [...]
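[Editor's note] The LD_PRELOAD approach described above can be sketched as
follows, using the libkeepalive library from the linked HOWTO.  The library
path, remote addresses, and timer values are examples, not defaults, and the
relay remote syntax assumes an OVN Southbound relay:

```shell
# Run an OVSDB relay with kernel TCP keepalives enabled on its sockets
# via the libkeepalive preload library (see the TCP-Keepalive-HOWTO).
export LD_PRELOAD=/usr/lib/libkeepalive.so
export KEEPIDLE=30    # seconds of idle time before the first keepalive probe
export KEEPINTVL=5    # seconds between unanswered probes
export KEEPCNT=3      # failed probes before the connection is declared dead

ovsdb-server --remote=pssl:6642 \
    relay:OVN_Southbound:ssl:sb1:6644,ssl:sb2:6644,ssl:sb3:6644
```

As the thread notes, this only covers network failures; a hung or deadlocked
peer still answers keepalives at the kernel level, which is why inactivity
probes remain preferable where they can be configured.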
Re: [ovs-discuss] ovsdb relay server active connection probe interval do not work
Hi Ilya, following your recomendation I’ve built OVS 3.1.0 plus Terry’s patch [1]. It’s a bit outdated, but with some changes related to "last_command" logic in ovs*ctl it successfully built. Also, I grabbed your idea gust to hardcode inactivity interval for ovsdb relay because it just solves my issue. So, after testing it seems to work fine. I’ve managed to run ovn-sb-db cluster with custom connections from local db and ovsdb relay with connections from sb db. I’ve got a question here: Do we actually need probing from relay to sb cluster if we have configured probing from the other side in other direction (db cluster to relay)? Maybe we even can just set to 0 inactivity probes in ovsdb/relay.c? Also, ovsdb relay has active bidirectional probing to ovn-controllers. If tcp session got dropped, ovsdb relay wont notice this without probing? Thank you for your help and Terry for his patch! 1: https://patchwork.ozlabs.org/project/openvswitch/patch/20220713030250.2634491-1-twil...@redhat.com/ > On 7 Mar 2023, at 19:43, Ilya Maximets via discuss > wrote: > > On 3/7/23 16:58, Vladislav Odintsov wrote: >> I’ve sent last mail from wrong account and indentation was lost. >> Resending... >> >>> On 7 Mar 2023, at 18:01, Vladislav Odintsov via discuss >>> wrote: >>> >>> Thanks Ilya for the quick and detailed response! >>> On 7 Mar 2023, at 14:03, Ilya Maximets via discuss wrote: On 3/7/23 00:15, Vladislav Odintsov wrote: > Hi Ilya, > > I’m wondering whether there are possible configuration parameters for > ovsdb relay -> main ovsdb server inactivity probe timer. > My cluster experiencing issues where relay disconnects from main cluster > due to 5 sec. inactivity probe timeout. > Main cluster has quite big database and a bunch of daemons, which > connects to it and it makes difficult to maintain connections in time. > > For ovsdb relay as a remote I use in-db configuration (to provide > inactivity probe and rbac configuration for ovn-controllers). 
> For ovsdb-server, which serves SB, I just set --remote=pssl:. > > I’d like to configure remote for ovsdb cluster via DB to set inactivity > probe setting, but I’m not sure about the correct way for that. > > For now I see only two options: > 1. Setup custom database scheme with connection table, serve it in same > SB cluster and specify this connection when start ovsdb sb server. There is a ovsdb/local-config.ovsschema shipped with OVS that can be used for that purpose. But you'll need to craft transactions for it manually with ovsdb-client. There is a control tool prepared by Terry: https://patchwork.ozlabs.org/project/openvswitch/patch/20220713030250.2634491-1-twil...@redhat.com/ >>> >>> Thanks for pointing on a patch, I guess, I’ll test it out. >>> But it's not in the repo yet (I need to get back to reviews on that topic at some point). The tool itself should be fine, but maybe name will change. >>> >>> Am I right that in-DB remote configuration must be a hosted by this >>> ovsdb-server database? > > Yes. > >>> What is the best way to configure additional DB on ovsdb-server so that >>> this configuration to be permanent? > > You may specify multiple database files on the command-line for ovsdb-server > process. It will open and serve each of them. They all can be in different > modes, e.g. you have multiple clustered, standalone and relay databases in > the same ovsdb-server process. > > There is also ovsdb-server/add-db appctl to add a new database to a running > process, but it will not survive the restart. > >>> Also, am I understand correctly that there is no necessity for this DB to >>> be clustered? > > It's kind of a point of the Local_Config database to not be clustered. > The original use case was to allow each cluster member to listen on a > different IP. i.e. if you don't want to listen on 0.0.0.0 and your > cluster members are on different nodes, so have different listening IPs. > >>> > 2. 
Setup second connection in ovn sb database to be used for ovsdb > cluster and deploy cluster separately from ovsdb relay, because they both > start same connections and conflict on ports. (I don’t use docker here, > so I need a separate server for that). That's an easy option available right now, true. If they are deployed on different nodes, you may even use the same connection record. > > Anyway, if I configure ovsdb remote for ovsdb cluster with specified > inactivity probe (say, to 60k), I guess it’s still not enough to have > ovsdb pings every 60 seconds. Inactivity probe must be the same from both > ends - right? From the ovsdb relay process. Inactivity probes don't need to be the same. They are separate for each side of a connection and so configured separately.
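The server-side half of what is discussed above (setting the inactivity probe on the DB cluster's listener via a Connection record) can be sketched roughly like this, assuming the SB connection records are managed with ovn-sbctl. The port and the 60000 ms value are illustrative, not taken from this thread:

```shell
# Sketch: create/point the SB listener at a Connection record, then raise
# the server-side inactivity probe to 60 s (values illustrative).
# "." selects the record when there is exactly one row in the table.
ovn-sbctl set-connection pssl:6642
ovn-sbctl set connection . inactivity_probe=60000
```

Note this only changes how often the server probes its clients (e.g. the relay); as Ilya explains, the relay->server direction is configured separately.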
Re: [ovs-discuss] ovsdb relay server active connection probe interval do not work
On 3/21/23 07:18, Jake Yip wrote: > > > On 20/3/2023 10:51 pm, Ilya Maximets wrote: >> On 3/16/23 23:06, Jake Yip wrote: >>> Hi all, >>> >>> Apologies for jumping into this thread. We are seeing the same and it's >>> nice to find someone with similar issues :) >>> >>> On 8/3/2023 3:43 am, Ilya Maximets via discuss wrote: >> >> We see failures on the OVSDB Relay side: >> >> 2023-03-06T22:19:32.966Z|00099|reconnect|ERR|ssl:xxx:16642: no response >> to inactivity probe after 5 seconds, disconnecting >> 2023-03-06T22:19:32.966Z|00100|reconnect|INFO|ssl:xxx:16642: connection >> dropped >> 2023-03-06T22:19:40.989Z|00101|reconnect|INFO|ssl:xxx:16642: connected >> 2023-03-06T22:19:50.997Z|00102|reconnect|ERR|ssl:xxx:16642: no response >> to inactivity probe after 5 seconds, disconnecting >> 2023-03-06T22:19:50.997Z|00103|reconnect|INFO|ssl:xxx:16642: connection >> dropped >> 2023-03-06T22:19:59.022Z|00104|reconnect|INFO|ssl:xxx:16642: connected >> 2023-03-06T22:20:09.026Z|00105|reconnect|ERR|ssl:xxx:16642: no response >> to inactivity probe after 5 seconds, disconnecting >> 2023-03-06T22:20:09.026Z|00106|reconnect|INFO|ssl:xxx:16642: connection >> dropped >> 2023-03-06T22:20:17.052Z|00107|reconnect|INFO|ssl:xxx:16642: connected >> 2023-03-06T22:20:27.056Z|00108|reconnect|ERR|ssl:xxx:16642: no response >> to inactivity probe after 5 seconds, disconnecting >> 2023-03-06T22:20:27.056Z|00109|reconnect|INFO|ssl:xxx:16642: connection >> dropped >> 2023-03-06T22:20:35.111Z|00110|reconnect|INFO|ssl:xxx:16642: connected >> >> On the DB cluster this looks like: >> >> 2023-03-06T22:19:04.208Z|00451|stream_ssl|WARN|SSL_read: unexpected SSL >> connection close >> 2023-03-06T22:19:04.211Z|00452|reconnect|WARN|ssl:xxx:52590: connection >> dropped (Protocol error) OK. These are symptoms. The cause must be something like 'Unreasonably long MANY ms poll interval' on the DB cluster side. i.e. the reason why the main DB cluster didn't reply to the probes sent from the relay. 
Because as soon as server receives the probe, it replies right back. If it didn't reply, it was doing something else for an extended period of time. "MANY" is more than 5 seconds. >>> >>> We are seeing the same issue here after moving to OVN relay. >>> >>> - On the relay "no response to inactivity probe after 5 seconds" >>> - On the OVSDB cluster >>> - "Unreasonably long 1726ms poll interval" >>> - "connection dropped (Input/output error)" >>> - "SSL_write: system error (Broken pipe)" >>> - 100% CPU on northd process >>> >>> Is there anything we could look for on the OVSDB side to narrow down what >>> may be causing the load on the cluster side? >>> >>> A brief history - We are migrating an OpenStack cloud from MidoNet to OVN. >>> This cloud has roughly >>> >>> - 400 neutron networks / ovn logical switches >>> - 300 neutron routers >>> - 14000 neutron ports / ovn logical switchports >>> - 28000 neutron security groups / ovn port group >>> - 8 neutron secgroup rules / acl >>> >>> We populated the OVN DB by using OpenStack/Neutron ovn sync script. >>> >>> We have attempted the migration twice previously (2021, 2022) but failed >>> due to load issues. We've reported issues and have seen lots of performance >>> improvements over the last two years. Here is a BIG thank you to the dev >>> teams! >>> >>> We are now on the following versions >>> >>> - OVS 2.17 >>> - OVN 22.03 >>> >>> We are exploring upgrade as an option, but I am concerned if there's >>> something fundamentally wrong with the data / config we have that is >>> causing the high load, and would like to rule that out first. Please let me >>> know if you need more information, will be happy to start a new thread too. >> >> Hi, Jake. Your scale numbers are fairly high, i.e. this number of >> objects in the setup may indeed create a noticeable load. >> >> The fact that relay is disconnecting with only 1726ms poll intervals >> on the main cluster side is a bit strange. Not sure why this happened. 
>> Normally it should be 5+ seconds. > > There are multiple errors. I just grabbed the first I found; indeed there are > poll intervals >5secs like > > ovs|05000|timeval|WARN|Unreasonably long 13209ms poll interval (12942ms user, > 264ms system) Yeah, this one is pretty high. Is it, by any chance, database compaction related? i.e. are there database compaction related logs in the close proximity to this one? In case all the huge poll intervals are compaction-related, upgrade to OVS 3.0+ may completely solve the issue, since most of the compaction work is moved into a separate thread there. > >> >> The versions you're using have an upgrade path with potentially >> significant performance improvements, e.g OVS 3.1 + OVN 23.03. >> Both ovsdb-server and core OVN components became much faster in >> the
Re: [ovs-discuss] ovsdb relay server active connection probe interval do not work
On 20/3/2023 10:51 pm, Ilya Maximets wrote: On 3/16/23 23:06, Jake Yip wrote: Hi all, Apologies for jumping into this thread. We are seeing the same and it's nice to find someone with similar issues :) On 8/3/2023 3:43 am, Ilya Maximets via discuss wrote: We see failures on the OVSDB Relay side: 2023-03-06T22:19:32.966Z|00099|reconnect|ERR|ssl:xxx:16642: no response to inactivity probe after 5 seconds, disconnecting 2023-03-06T22:19:32.966Z|00100|reconnect|INFO|ssl:xxx:16642: connection dropped 2023-03-06T22:19:40.989Z|00101|reconnect|INFO|ssl:xxx:16642: connected 2023-03-06T22:19:50.997Z|00102|reconnect|ERR|ssl:xxx:16642: no response to inactivity probe after 5 seconds, disconnecting 2023-03-06T22:19:50.997Z|00103|reconnect|INFO|ssl:xxx:16642: connection dropped 2023-03-06T22:19:59.022Z|00104|reconnect|INFO|ssl:xxx:16642: connected 2023-03-06T22:20:09.026Z|00105|reconnect|ERR|ssl:xxx:16642: no response to inactivity probe after 5 seconds, disconnecting 2023-03-06T22:20:09.026Z|00106|reconnect|INFO|ssl:xxx:16642: connection dropped 2023-03-06T22:20:17.052Z|00107|reconnect|INFO|ssl:xxx:16642: connected 2023-03-06T22:20:27.056Z|00108|reconnect|ERR|ssl:xxx:16642: no response to inactivity probe after 5 seconds, disconnecting 2023-03-06T22:20:27.056Z|00109|reconnect|INFO|ssl:xxx:16642: connection dropped 2023-03-06T22:20:35.111Z|00110|reconnect|INFO|ssl:xxx:16642: connected On the DB cluster this looks like: 2023-03-06T22:19:04.208Z|00451|stream_ssl|WARN|SSL_read: unexpected SSL connection close 2023-03-06T22:19:04.211Z|00452|reconnect|WARN|ssl:xxx:52590: connection dropped (Protocol error) OK. These are symptoms. The cause must be something like 'Unreasonably long MANY ms poll interval' on the DB cluster side. i.e. the reason why the main DB cluster didn't reply to the probes sent from the relay. Because as soon as server receives the probe, it replies right back. If it didn't reply, it was doing something else for an extended period of time. 
"MANY" is more than 5 seconds. We are seeing the same issue here after moving to OVN relay. - On the relay "no response to inactivity probe after 5 seconds" - On the OVSDB cluster - "Unreasonably long 1726ms poll interval" - "connection dropped (Input/output error)" - "SSL_write: system error (Broken pipe)" - 100% CPU on northd process Is there anything we could look for on the OVSDB side to narrow down what may be causing the load on the cluster side? A brief history - We are migrating an OpenStack cloud from MidoNet to OVN. This cloud has roughly - 400 neutron networks / ovn logical switches - 300 neutron routers - 14000 neutron ports / ovn logical switchports - 28000 neutron security groups / ovn port group - 8 neutron secgroup rules / acl We populated the OVN DB by using OpenStack/Neutron ovn sync script. We have attempted the migration twice previously (2021, 2022) but failed due to load issues. We've reported issues and have seen lots of performance improvements over the last two years. Here is a BIG thank you to the dev teams! We are now on the following versions - OVS 2.17 - OVN 22.03 We are exploring upgrade as an option, but I am concerned if there's something fundamentally wrong with the data / config we have that is causing the high load, and would like to rule that out first. Please let me know if you need more information, will be happy to start a new thread too. Hi, Jake. Your scale numbers are fairly high, i.e. this number of objects in the setup may indeed create a noticeable load. The fact that relay is disconnecting with only 1726ms poll intervals on the main cluster side is a bit strange. Not sure why this happened. Normally it should be 5+ seconds. There are multiple errors. 
I just grabbed the first I found; indeed there are poll intervals >5 secs, like

ovs|05000|timeval|WARN|Unreasonably long 13209ms poll interval (12942ms user, 264ms system)

The versions you're using have an upgrade path with potentially significant performance improvements, e.g. OVS 3.1 + OVN 23.03. Both ovsdb-server and core OVN components became much faster in the previous year. Thanks for the work you've put into OVN. I've seen your conference presentations and believe that is a valid way forward. One thing holding us back is that there are no Ubuntu packages for us, so we may need to build them. We may also explore containers, but we are still not sure how containerised openvswitch works. Another issue is whether the integration will work - we are using Neutron Yoga. I believe OVN being upgradable from one LTS to another means Neutron Yoga should work with OVS 3.1 + OVN 23.03? I'm not sure if there is anything fundamentally wrong with your setup, other than the total amount of resources. If you have the freedom of building your own packages and the relay disconnection is the main problem in your setup, you may try something like this: diff --git a/ovsdb/relay.c b/ovsdb/relay.c
Re: [ovs-discuss] ovsdb relay server active connection probe interval do not work
On 3/16/23 23:06, Jake Yip wrote: > Hi all, > > Apologies for jumping into this thread. We are seeing the same and it's nice > to find someone with similar issues :) > > On 8/3/2023 3:43 am, Ilya Maximets via discuss wrote: We see failures on the OVSDB Relay side: 2023-03-06T22:19:32.966Z|00099|reconnect|ERR|ssl:xxx:16642: no response to inactivity probe after 5 seconds, disconnecting 2023-03-06T22:19:32.966Z|00100|reconnect|INFO|ssl:xxx:16642: connection dropped 2023-03-06T22:19:40.989Z|00101|reconnect|INFO|ssl:xxx:16642: connected 2023-03-06T22:19:50.997Z|00102|reconnect|ERR|ssl:xxx:16642: no response to inactivity probe after 5 seconds, disconnecting 2023-03-06T22:19:50.997Z|00103|reconnect|INFO|ssl:xxx:16642: connection dropped 2023-03-06T22:19:59.022Z|00104|reconnect|INFO|ssl:xxx:16642: connected 2023-03-06T22:20:09.026Z|00105|reconnect|ERR|ssl:xxx:16642: no response to inactivity probe after 5 seconds, disconnecting 2023-03-06T22:20:09.026Z|00106|reconnect|INFO|ssl:xxx:16642: connection dropped 2023-03-06T22:20:17.052Z|00107|reconnect|INFO|ssl:xxx:16642: connected 2023-03-06T22:20:27.056Z|00108|reconnect|ERR|ssl:xxx:16642: no response to inactivity probe after 5 seconds, disconnecting 2023-03-06T22:20:27.056Z|00109|reconnect|INFO|ssl:xxx:16642: connection dropped 2023-03-06T22:20:35.111Z|00110|reconnect|INFO|ssl:xxx:16642: connected On the DB cluster this looks like: 2023-03-06T22:19:04.208Z|00451|stream_ssl|WARN|SSL_read: unexpected SSL connection close 2023-03-06T22:19:04.211Z|00452|reconnect|WARN|ssl:xxx:52590: connection dropped (Protocol error) >> >> OK. These are symptoms. The cause must be something like >> 'Unreasonably long MANY ms poll interval' on the DB cluster side. >> i.e. the reason why the main DB cluster didn't reply to the >> probes sent from the relay. Because as soon as server receives >> the probe, it replies right back. If it didn't reply, it was >> doing something else for an extended period of time. "MANY" is >> more than 5 seconds. 
>> > > We are seeing the same issue here after moving to OVN relay. > > - On the relay "no response to inactivity probe after 5 seconds" > - On the OVSDB cluster > - "Unreasonably long 1726ms poll interval" > - "connection dropped (Input/output error)" > - "SSL_write: system error (Broken pipe)" > - 100% CPU on northd process > > Is there anything we could look for on the OVSDB side to narrow down what may > be causing the load on the cluster side? > > A brief history - We are migrating an OpenStack cloud from MidoNet to OVN. > This cloud has roughly > > - 400 neutron networks / ovn logical switches > - 300 neutron routers > - 14000 neutron ports / ovn logical switchports > - 28000 neutron security groups / ovn port group > - 8 neutron secgroup rules / acl > > We populated the OVN DB by using OpenStack/Neutron ovn sync script. > > We have attempted the migration twice previously (2021, 2022) but failed due > to load issues. We've reported issues and have seen lots of performance > improvements over the last two years. Here is a BIG thank you to the dev > teams! > > We are now on the following versions > > - OVS 2.17 > - OVN 22.03 > > We are exploring upgrade as an option, but I am concerned if there's > something fundamentally wrong with the data / config we have that is causing > the high load, and would like to rule that out first. Please let me know if > you need more information, will be happy to start a new thread too. Hi, Jake. Your scale numbers are fairly high, i.e. this number of objects in the setup may indeed create a noticeable load. The fact that relay is disconnecting with only 1726ms poll intervals on the main cluster side is a bit strange. Not sure why this happened. Normally it should be 5+ seconds. The versions you're using have an upgrade path with potentially significant performance improvements, e.g OVS 3.1 + OVN 23.03. Both ovsdb-server and core OVN components became much faster in the previous year. 
I'm not sure if there is anything fundamentally wrong with your setup, other than the total amount of resources. If you have the freedom of building your own packages and the relay disconnection is the main problem in your setup, you may try something like this:

diff --git a/ovsdb/relay.c b/ovsdb/relay.c
index 9ff6ed8f3..5c5937c27 100644
--- a/ovsdb/relay.c
+++ b/ovsdb/relay.c
@@ -152,6 +152,7 @@ ovsdb_relay_add_db(struct ovsdb *db, const char *remote,
     shash_add(&relay_dbs, db->name, ctx);
     ovsdb_cs_set_leader_only(ctx->cs, false);
     ovsdb_cs_set_remote(ctx->cs, remote, true);
+    ovsdb_cs_set_probe_interval(ctx->cs, 16000);
     VLOG_DBG("added database: %s, %s", db->name, remote);
 }
---

This change will set 16 seconds as the inactivity probe interval for the relay-to-server connection by default.

Best regards, Ilya
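Why 16 seconds helps can be illustrated with a tiny model of the probe logic (a simplification for illustration only, not OVS code): after `interval` ms of silence the client sends an echo probe, and if nothing arrives for roughly another `interval` ms it disconnects.

```shell
# Simplified model of inactivity-probe behavior (illustrative, not OVS code).
# probe_action IDLE_MS INTERVAL_MS -> idle | probe_sent | disconnect
probe_action() {
    idle_ms=$1
    interval_ms=$2
    if [ "$idle_ms" -lt "$interval_ms" ]; then
        echo idle            # peer recently heard from: connection healthy
    elif [ "$idle_ms" -lt $((2 * interval_ms)) ]; then
        echo probe_sent      # echo probe sent, still waiting for a reply
    else
        echo disconnect      # no response to inactivity probe
    fi
}

# A 13209 ms server stall (as in the log above) exceeds the default
# 5000 ms interval, but stays below the patched 16000 ms one:
probe_action 13209 5000    # disconnect
probe_action 13209 16000   # idle
```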
Re: [ovs-discuss] ovsdb relay server active connection probe interval do not work
Hi all, Apologies for jumping into this thread. We are seeing the same and it's nice to find someone with similar issues :) On 8/3/2023 3:43 am, Ilya Maximets via discuss wrote: We see failures on the OVSDB Relay side: 2023-03-06T22:19:32.966Z|00099|reconnect|ERR|ssl:xxx:16642: no response to inactivity probe after 5 seconds, disconnecting 2023-03-06T22:19:32.966Z|00100|reconnect|INFO|ssl:xxx:16642: connection dropped 2023-03-06T22:19:40.989Z|00101|reconnect|INFO|ssl:xxx:16642: connected 2023-03-06T22:19:50.997Z|00102|reconnect|ERR|ssl:xxx:16642: no response to inactivity probe after 5 seconds, disconnecting 2023-03-06T22:19:50.997Z|00103|reconnect|INFO|ssl:xxx:16642: connection dropped 2023-03-06T22:19:59.022Z|00104|reconnect|INFO|ssl:xxx:16642: connected 2023-03-06T22:20:09.026Z|00105|reconnect|ERR|ssl:xxx:16642: no response to inactivity probe after 5 seconds, disconnecting 2023-03-06T22:20:09.026Z|00106|reconnect|INFO|ssl:xxx:16642: connection dropped 2023-03-06T22:20:17.052Z|00107|reconnect|INFO|ssl:xxx:16642: connected 2023-03-06T22:20:27.056Z|00108|reconnect|ERR|ssl:xxx:16642: no response to inactivity probe after 5 seconds, disconnecting 2023-03-06T22:20:27.056Z|00109|reconnect|INFO|ssl:xxx:16642: connection dropped 2023-03-06T22:20:35.111Z|00110|reconnect|INFO|ssl:xxx:16642: connected On the DB cluster this looks like: 2023-03-06T22:19:04.208Z|00451|stream_ssl|WARN|SSL_read: unexpected SSL connection close 2023-03-06T22:19:04.211Z|00452|reconnect|WARN|ssl:xxx:52590: connection dropped (Protocol error) OK. These are symptoms. The cause must be something like 'Unreasonably long MANY ms poll interval' on the DB cluster side. i.e. the reason why the main DB cluster didn't reply to the probes sent from the relay. Because as soon as server receives the probe, it replies right back. If it didn't reply, it was doing something else for an extended period of time. "MANY" is more than 5 seconds. We are seeing the same issue here after moving to OVN relay. 
- On the relay "no response to inactivity probe after 5 seconds" - On the OVSDB cluster - "Unreasonably long 1726ms poll interval" - "connection dropped (Input/output error)" - "SSL_write: system error (Broken pipe)" - 100% CPU on northd process Is there anything we could look for on the OVSDB side to narrow down what may be causing the load on the cluster side? A brief history - We are migrating an OpenStack cloud from MidoNet to OVN. This cloud has roughly - 400 neutron networks / ovn logical switches - 300 neutron routers - 14000 neutron ports / ovn logical switchports - 28000 neutron security groups / ovn port group - 8 neutron secgroup rules / acl We populated the OVN DB by using OpenStack/Neutron ovn sync script. We have attempted the migration twice previously (2021, 2022) but failed due to load issues. We've reported issues and have seen lots of performance improvements over the last two years. Here is a BIG thank you to the dev teams! We are now on the following versions - OVS 2.17 - OVN 22.03 We are exploring upgrade as an option, but I am concerned if there's something fundamentally wrong with the data / config we have that is causing the high load, and would like to rule that out first. Please let me know if you need more information, will be happy to start a new thread too. Regards, Jake ___ discuss mailing list disc...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
Re: [ovs-discuss] ovsdb relay server active connection probe interval do not work
On 3/8/23 08:57, Frode Nordahl wrote: Does it state that configuring inactivity probe on the DB cluster side will not help and configuration on the relay side must be done? >> >> Yes. You likely need a configuration on the relay side. > > Sorry for butting into an ongoing discussion, but this part resonated > with one of my past ventures. While investigating a different problem > we kind of hit a similar problem [0]. Aligning client, relay and > backend server configuration has potential to become complicated. > Would an alternative be for the real server and relay server to > exchange this information in-line as part of their communication, for > example exposing it in the special _Server built-in database [1]? > > 0: > https://bugs.launchpad.net/ubuntu/lunar/+source/openvswitch/+bug/1998781/comments/3 > 1: https://github.com/openvswitch/ovs/blob/master/ovsdb/_server.ovsschema > Hi, Frode. Do you mean synchronizing probes for passive connections, i.e. having ptcp/pssl remotes with the same inactivity probes on the main DB and relay? Or making the active (relay -> server) connection have the same probe interval as the passive (server -> relay) connection? The main problem with the former I see is that it is currently configurable for each side individually. And it would be confusing if ovsdb-server overrode the user-specified value. Hence, the config knob, i.e. configuration by the user, will be needed anyway. The latter is basically some form of what Wentao Jia proposed [2]. We could do something like that, since there is no way to configure the probe interval in the relay -> server direction at the moment. But that will be different from any other connection we have in the OVS world, so it may complicate the understanding of the matter even more. I hope that we can forget about client -> server inactivity probes at some point and just use the default. We do control the server and we can make it faster / operate better. 
In fact, we do not see any large poll intervals in large-scale ovn-heater tests on either the Sb or Nb OVSDB servers, not even 1 second long, with recent OVS versions. Remaining cases I'm aware of are associated with database conversion and potential mass re-connections, which are both solvable and being worked on. The opposite, server -> client, direction is a bit more problematic, because we do not control the client application, so we don't know how long it may not reply. E.g. a full recompute on ovn-controller may still take a lot of time. But perhaps we could just bump the default probe interval for this direction to something like 60 seconds or even more. Checking if the client is alive isn't really that important for a server, we just need to disconnect dead clients eventually. [2] https://patchwork.ozlabs.org/project/openvswitch/patch/an2a4qcpihpcfukyt1uomqre.1.1641782536691.hmail.wentao@easystack.cn/ Best regards, Ilya Maximets.
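For the client -> server direction that Ilya mentions, the probe interval is already tunable today on the ovn-controller side via an external_ids key on the local Open vSwitch database (a sketch; the 60000 ms value is illustrative and the command is run on each chassis):

```shell
# Sketch: raise ovn-controller's inactivity probe toward the SB DB/relay
# to 60 s (value illustrative; run on the chassis).
ovs-vsctl set open_vswitch . external_ids:ovn-remote-probe-interval=60000
```

This only covers the controller-to-server probes; the relay-to-server interval discussed in this thread still lacks an equivalent knob.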
Re: [ovs-discuss] ovsdb relay server active connection probe interval do not work
On Tue, Mar 7, 2023 at 5:43 PM Ilya Maximets via discuss wrote: > > On 3/7/23 16:58, Vladislav Odintsov wrote: > > I’ve sent last mail from wrong account and indentation was lost. > > Resending... > > > >> On 7 Mar 2023, at 18:01, Vladislav Odintsov via discuss > >> wrote: > >> > >> Thanks Ilya for the quick and detailed response! > >> > >>> On 7 Mar 2023, at 14:03, Ilya Maximets via discuss > >>> wrote: > >>> > >>> On 3/7/23 00:15, Vladislav Odintsov wrote: > Hi Ilya, > > I’m wondering whether there are possible configuration parameters for > ovsdb relay -> main ovsdb server inactivity probe timer. > My cluster experiencing issues where relay disconnects from main cluster > due to 5 sec. inactivity probe timeout. > Main cluster has quite big database and a bunch of daemons, which > connects to it and it makes difficult to maintain connections in time. > > For ovsdb relay as a remote I use in-db configuration (to provide > inactivity probe and rbac configuration for ovn-controllers). > For ovsdb-server, which serves SB, I just set --remote=pssl:. > > I’d like to configure remote for ovsdb cluster via DB to set inactivity > probe setting, but I’m not sure about the correct way for that. > > For now I see only two options: > 1. Setup custom database scheme with connection table, serve it in same > SB cluster and specify this connection when start ovsdb sb server. > >>> > >>> There is a ovsdb/local-config.ovsschema shipped with OVS that can be > >>> used for that purpose. But you'll need to craft transactions for it > >>> manually with ovsdb-client. > >>> > >>> There is a control tool prepared by Terry: > >>> > >>> https://patchwork.ozlabs.org/project/openvswitch/patch/20220713030250.2634491-1-twil...@redhat.com/ > >> > >> Thanks for pointing on a patch, I guess, I’ll test it out. > >> > >>> > >>> But it's not in the repo yet (I need to get back to reviews on that > >>> topic at some point). The tool itself should be fine, but maybe name > >>> will change. 
> >> > >> Am I right that in-DB remote configuration must be a hosted by this > >> ovsdb-server database? > > Yes. > > >> What is the best way to configure additional DB on ovsdb-server so that > >> this configuration to be permanent? > > You may specify multiple database files on the command-line for ovsdb-server > process. It will open and serve each of them. They all can be in different > modes, e.g. you have multiple clustered, standalone and relay databases in > the same ovsdb-server process. > > There is also ovsdb-server/add-db appctl to add a new database to a running > process, but it will not survive the restart. > > >> Also, am I understand correctly that there is no necessity for this DB to > >> be clustered? > > It's kind of a point of the Local_Config database to not be clustered. > The original use case was to allow each cluster member to listen on a > different IP. i.e. if you don't want to listen on 0.0.0.0 and your > cluster members are on different nodes, so have different listening IPs. > > >> > >>> > 2. Setup second connection in ovn sb database to be used for ovsdb > cluster and deploy cluster separately from ovsdb relay, because they > both start same connections and conflict on ports. (I don’t use docker > here, so I need a separate server for that). > >>> > >>> That's an easy option available right now, true. If they are deployed > >>> on different nodes, you may even use the same connection record. > >>> > > Anyway, if I configure ovsdb remote for ovsdb cluster with specified > inactivity probe (say, to 60k), I guess it’s still not enough to have > ovsdb pings every 60 seconds. Inactivity probe must be the same from > both ends - right? From the ovsdb relay process. > >>> > >>> Inactivity probes don't need to be the same. They are separate for each > >>> side of a connection and so configured separately. > >>> > >>> You can set up inactivity probe for the server side of the connection via > >>> database. 
So, server will probe the relay every 60 seconds, but today > >>> it's not possible to set inactivity probe for the relay-to-server > >>> direction. > >>> So, relay will probe the server every 5 seconds. > >>> > >>> The way out from this situation is to allow configuration of relays via > >>> database as well, e.g. relay:db:Local_Config,Config,relays. This will > >>> require addition of a new table to the Local_Config database and allowing > >>> relay config to be parsed from the database in the code. That wasn't > >>> implemented yet. > >>> > I saw your talk on last ovscon about this topic, and the solution was in > progress there. But maybe there were some changes from that time? I’m > ready to test it if any. Or, maybe there’s any workaround? > >>> > >>> Sorry, we didn't move forward much on that topic since the presentation. > >>> There are
Re: [ovs-discuss] ovsdb relay server active connection probe interval do not work
On 3/7/23 16:58, Vladislav Odintsov wrote: > I’ve sent last mail from wrong account and indentation was lost. > Resending... > >> On 7 Mar 2023, at 18:01, Vladislav Odintsov via discuss >> wrote: >> >> Thanks Ilya for the quick and detailed response! >> >>> On 7 Mar 2023, at 14:03, Ilya Maximets via discuss >>> wrote: >>> >>> On 3/7/23 00:15, Vladislav Odintsov wrote: Hi Ilya, I’m wondering whether there are possible configuration parameters for ovsdb relay -> main ovsdb server inactivity probe timer. My cluster experiencing issues where relay disconnects from main cluster due to 5 sec. inactivity probe timeout. Main cluster has quite big database and a bunch of daemons, which connects to it and it makes difficult to maintain connections in time. For ovsdb relay as a remote I use in-db configuration (to provide inactivity probe and rbac configuration for ovn-controllers). For ovsdb-server, which serves SB, I just set --remote=pssl:. I’d like to configure remote for ovsdb cluster via DB to set inactivity probe setting, but I’m not sure about the correct way for that. For now I see only two options: 1. Setup custom database scheme with connection table, serve it in same SB cluster and specify this connection when start ovsdb sb server. >>> >>> There is a ovsdb/local-config.ovsschema shipped with OVS that can be >>> used for that purpose. But you'll need to craft transactions for it >>> manually with ovsdb-client. >>> >>> There is a control tool prepared by Terry: >>> >>> https://patchwork.ozlabs.org/project/openvswitch/patch/20220713030250.2634491-1-twil...@redhat.com/ >> >> Thanks for pointing on a patch, I guess, I’ll test it out. >> >>> >>> But it's not in the repo yet (I need to get back to reviews on that >>> topic at some point). The tool itself should be fine, but maybe name >>> will change. >> >> Am I right that in-DB remote configuration must be a hosted by this >> ovsdb-server database? Yes. 
>> What is the best way to configure additional DB on ovsdb-server so that this >> configuration to be permanent? You may specify multiple database files on the command-line for ovsdb-server process. It will open and serve each of them. They all can be in different modes, e.g. you have multiple clustered, standalone and relay databases in the same ovsdb-server process. There is also ovsdb-server/add-db appctl to add a new database to a running process, but it will not survive the restart. >> Also, am I understand correctly that there is no necessity for this DB to be >> clustered? It's kind of a point of the Local_Config database to not be clustered. The original use case was to allow each cluster member to listen on a different IP. i.e. if you don't want to listen on 0.0.0.0 and your cluster members are on different nodes, so have different listening IPs. >> >>> 2. Setup second connection in ovn sb database to be used for ovsdb cluster and deploy cluster separately from ovsdb relay, because they both start same connections and conflict on ports. (I don’t use docker here, so I need a separate server for that). >>> >>> That's an easy option available right now, true. If they are deployed >>> on different nodes, you may even use the same connection record. >>> Anyway, if I configure ovsdb remote for ovsdb cluster with specified inactivity probe (say, to 60k), I guess it’s still not enough to have ovsdb pings every 60 seconds. Inactivity probe must be the same from both ends - right? From the ovsdb relay process. >>> >>> Inactivity probes don't need to be the same. They are separate for each >>> side of a connection and so configured separately. >>> >>> You can set up inactivity probe for the server side of the connection via >>> database. So, server will probe the relay every 60 seconds, but today >>> it's not possible to set inactivity probe for the relay-to-server direction. >>> So, relay will probe the server every 5 seconds. 
>>> >>> The way out from this situation is to allow configuration of relays via >>> database as well, e.g. relay:db:Local_Config,Config,relays. This will >>> require addition of a new table to the Local_Config database and allowing >>> relay config to be parsed from the database in the code. That wasn't >>> implemented yet. >>> I saw your talk on last ovscon about this topic, and the solution was in progress there. But maybe there were some changes from that time? I’m ready to test it if any. Or, maybe there’s any workaround? >>> >>> Sorry, we didn't move forward much on that topic since the presentation. >>> There are few unanswered questions around local config database. Mainly >>> regarding upgrades from cmdline/main db -based configuration to a local >>> config -based. But I hope we can figure that out in the current release >>> time frame, i.e. before 3.2 release. > > Regarding
Re: [ovs-discuss] ovsdb relay server active connection probe interval do not work
I’ve sent the last mail from the wrong account and indentation was lost. Resending...

> On 7 Mar 2023, at 18:01, Vladislav Odintsov via discuss wrote:
>
> Thanks Ilya for the quick and detailed response!
>
>> On 7 Mar 2023, at 14:03, Ilya Maximets via discuss wrote:
>>
>> On 3/7/23 00:15, Vladislav Odintsov wrote:
>>> Hi Ilya,
>>>
>>> I’m wondering whether there are possible configuration parameters for the
>>> ovsdb relay -> main ovsdb server inactivity probe timer.
>>> My cluster is experiencing issues where the relay disconnects from the main
>>> cluster due to the 5 sec. inactivity probe timeout.
>>> The main cluster has a quite big database and a bunch of daemons which
>>> connect to it, and that makes it difficult to maintain connections in time.
>>>
>>> For the ovsdb relay as a remote I use in-db configuration (to provide
>>> inactivity probe and rbac configuration for ovn-controllers).
>>> For the ovsdb-server which serves the SB, I just set --remote=pssl:.
>>>
>>> I’d like to configure the remote for the ovsdb cluster via the DB to set
>>> the inactivity probe setting, but I’m not sure about the correct way to do
>>> that.
>>>
>>> For now I see only two options:
>>> 1. Set up a custom database schema with a connection table, serve it in the
>>> same SB cluster and specify this connection when starting the ovsdb sb
>>> server.
>>
>> There is an ovsdb/local-config.ovsschema shipped with OVS that can be
>> used for that purpose. But you'll need to craft transactions for it
>> manually with ovsdb-client.
>>
>> There is a control tool prepared by Terry:
>>
>> https://patchwork.ozlabs.org/project/openvswitch/patch/20220713030250.2634491-1-twil...@redhat.com/
>
> Thanks for pointing at the patch, I guess I’ll test it out.
>
>>
>> But it's not in the repo yet (I need to get back to reviews on that
>> topic at some point). The tool itself should be fine, but maybe the name
>> will change.
>
> Am I right that the in-DB remote configuration must be hosted by a database
> served by this ovsdb-server?
> What is the best way to configure an additional DB on ovsdb-server so that
> this configuration is permanent?
> Also, do I understand correctly that there is no necessity for this DB to be
> clustered?
>
>>
>>> 2. Set up a second connection in the OVN SB database to be used for the
>>> ovsdb cluster and deploy the cluster separately from the ovsdb relay,
>>> because they both start the same connections and conflict on ports. (I
>>> don’t use docker here, so I need a separate server for that.)
>>
>> That's an easy option available right now, true. If they are deployed
>> on different nodes, you may even use the same connection record.
>>
>>>
>>> Anyway, if I configure the ovsdb remote for the ovsdb cluster with a
>>> specified inactivity probe (say, 60k), I guess it’s still not enough to
>>> have ovsdb pings every 60 seconds. The inactivity probe must be the same
>>> from both ends, right? I.e. from the ovsdb relay process too.
>>
>> Inactivity probes don't need to be the same. They are separate for each
>> side of a connection and so configured separately.
>>
>> You can set up the inactivity probe for the server side of the connection
>> via the database. So, the server will probe the relay every 60 seconds, but
>> today it's not possible to set the inactivity probe for the relay-to-server
>> direction. So, the relay will probe the server every 5 seconds.
>>
>> The way out of this situation is to allow configuration of relays via the
>> database as well, e.g. relay:db:Local_Config,Config,relays. This will
>> require addition of a new table to the Local_Config database and allowing
>> the relay config to be parsed from the database in the code. That wasn't
>> implemented yet.
>>
>>> I saw your talk at the last ovscon about this topic, and the solution was
>>> in progress there. But maybe there were some changes since that time? I’m
>>> ready to test it if any. Or, maybe there’s a workaround?
>>
>> Sorry, we didn't move forward much on that topic since the presentation.
>> There are a few unanswered questions around the local config database.
>> Mainly regarding upgrades from cmdline/main db -based configuration to a
>> local config -based one. But I hope we can figure that out in the current
>> release time frame, i.e. before the 3.2 release.

Regarding the configuration method, just an idea (I haven’t seen this variant
mentioned as one of the possible options): remote add/remove is already
possible via the ovsdb-server ctl socket. Could introducing a new command
"ovsdb-server/set-remote-param PARAM=VALUE" be a solution here?

>>
>> There is also this workaround:
>>
>> https://patchwork.ozlabs.org/project/openvswitch/patch/an2a4qcpihpcfukyt1uomqre.1.1641782536691.hmail.wentao@easystack.cn/
>>
>> It simply takes the server->relay inactivity probe value and applies it
>> to the relay->server connection. But it's not a correct solution, because
>> it relies on certain database names.
>>
>> Out of curiosity, what kind of poll intervals do you see on your main
>> server setup that triggers inactivity probe failures? Can an upgrade to
>> OVS 3.1 solve some of these issues? 3.1 should be noticeably faster than
>> 2.17, and also parallel compaction introduced in 3.0 removes one of the
>> big reasons for large poll intervals. An OVN upgrade to 22.09+ or even
>> 23.03 should also help with database sizes.

We see failures on the OVSDB Relay side:

2023-03-06T22:19:32.966Z|00099|reconnect|ERR|ssl:xxx:16642: no response to inactivity probe after 5 seconds, disconnecting
2023-03-06T22:19:32.966Z|00100|reconnect|INFO|ssl:xxx:16642: connection dropped
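The in-database approach discussed above requires crafting Local_Config transactions manually with ovsdb-client. As a sketch of what that involves, the following Python builds the JSON-RPC "transact" payload for inserting a Connection row with a 60000 ms inactivity probe. The operation format follows RFC 7047; the `Config`/`Connection` table and column names are my assumptions based on the ovsdb/local-config.ovsschema mentioned in the thread, so verify them against the actual schema before use:

```python
import json

def make_connection_transact(db, target, inactivity_probe_ms):
    """Build OVSDB JSON-RPC 'transact' params: insert a Connection row
    and reference it from the root table's 'connections' column.
    Table/column names are assumptions based on local-config.ovsschema."""
    ops = [
        {
            "op": "insert",
            "table": "Connection",
            "row": {
                "target": target,
                "inactivity_probe": inactivity_probe_ms,
            },
            "uuid-name": "new_conn",
        },
        {
            "op": "mutate",
            "table": "Config",
            "where": [],  # empty condition matches every row
            "mutations": [
                ["connections", "insert",
                 ["set", [["named-uuid", "new_conn"]]]],
            ],
        },
    ]
    # Per RFC 7047, 'transact' params are the db name followed by operations.
    return [db] + ops

params = make_connection_transact("Local_Config", "pssl:6641", 60000)
print(json.dumps(params, indent=2))
```

The resulting JSON could then be handed to `ovsdb-client transact`; inactivity_probe is in milliseconds, so 60000 corresponds to the 60-second probe discussed above.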
Re: [ovs-discuss] ovsdb relay server active connection probe interval do not work
Hi Ilya Maximets, thanks for your reply.

I am running an OVN large-scale test with 1000 sandboxes (that is, 1000
ovn-controllers), 3 clustered NB servers, 3 NB relays, 3 clustered SB servers,
and 20 SB relays.

Configuration flow: neutron-server <> nb-relay <> nb <> northd <> sb <> sb-relay <> ovn-controller

The default 5-second probe interval will cause connection flapping: large
transaction handling, DB log compaction, ...

An ovsdb relay server has two kinds of connections: active and passive. An
active connection, acting as an ovsdb client, connects to the clustered ovsdb
server, and a passive connection listens for other clients to connect to the
relay itself. I configured these two kinds of connections in the NB:

active connection: "tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641"
passive connection: "ptcp:6641:0.0.0.0"

Can the relay server not share the same connection configurations with the
clustered ovsdb-server? It is not a good way to have another small database
with a relay configuration. An example: ovn-northd has no database, and its
probe interval is read from the NB; the northd probe interval is configured
like this: ovn-nbctl set NB_Global . options:northd_probe_interval=6. Can the
relay server read its probe interval from the NB or SB?

If the probe interval of the relay server cannot be read from the NB or SB, an
appctl command could be considered, because it can reconfigure without a
restart.

Best regards,
Wentao Jia

The following is the configuration of my test.

Clustered ovsdb server:

ovsdb-server -vconsole:info -vsyslog:off -vfile:off --log-file=/var/log/ovn/ovsdb-server-nb.log --remote=punix:/var/run/ovn/ovnnb_db.sock --pidfile=/var/run/ovn/ovnnb_db.pid --unixctl=/var/run/ovn/ovnnb_db.ctl --remote=db:OVN_Northbound,NB_Global,connections --private-key=db:OVN_Northbound,SSL,private_key --certificate=db:OVN_Northbound,SSL,certificate --ca-cert=db:OVN_Northbound,SSL,ca_cert --ssl-protocols=db:OVN_Northbound,SSL,ssl_protocols --ssl-ciphers=db:OVN_Northbound,SSL,ssl_ciphers /etc/ovn/ovnnb_db.db

OVSDB relay server:

ovsdb-server --remote=db:OVN_Northbound,NB_Global,connections -vconsole:info -vsyslog:off -vfile:off --log-file=/var/log/ovn/ovsdb-server-nb.log relay:OVN_Northbound:tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641

Connection configuration, one active connection and one passive connection:

()[root@ovn-busybox-0 /]# ovn-nbctl list connection
_uuid            : 5ddab5a4-a267-42b4-9dd4-76d55855a109
external_ids     : {}
inactivity_probe : 12
is_connected     : true
max_backoff      : []
other_config     : {}
status           : {sec_since_connect="143208", state=ACTIVE}
target           : "tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641"

_uuid            : 351b99bb-dd6a-4ba3-9c30-c0b4cff183e7
external_ids     : {}
inactivity_probe : 0
is_connected     : true
max_backoff      : []
other_config     : {}
status           : {bound_port="6641", sec_since_connect="0", sec_since_disconnect="0"}
target           : "ptcp:6641:0.0.0.0"

From: Ilya Maximets
Date: 2021-08-26 02:38:58
To: ovs-discuss@openvswitch.org, "贾文涛" (Wentao Jia)
Cc: i.maxim...@ovn.org
Subject: Re: [ovs-discuss] ovsdb relay server active connection probe interval do not work

>> hi, all
>>
>> the default inactivity probe interval of the ovsdb relay server to the
>> nb/sb ovsdb server is 5000ms.
>> I set an active connection as follows, setting the inactivity probe
>> interval to 12ms:
>> _uuid            : 5ddab5a4-a267-42b4-9dd4-76d55855a109
>> external_ids     : {}
>> inactivity_probe : 12
>> is_connected     : true
>> max_backoff      : []
>> other_config     : {}
>> status           : {sec_since_connect="0", state=ACTIVE}
>> target           : "tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641"
>
> Hmm. How exactly did you configure that?
>
>> ovn-ovsdb-nb.openstack.svc.cluster.local is a vip
>> but the inactivity probe is still 5000
>> 2021-08-24T12:34:17.313Z|04924|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 120225 ms, sending inactivity probe
>> 2021-08-24T12:36:17.759Z|05854|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 120446 ms, sending inactivity probe
>> 2021-08-24T12:37:06.326Z|06145|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 6853 ms, sending inactivity probe
>> 2021-08-24T12:37:11.330Z|06155|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 5004 ms, sending inactivity probe
>
> This looks like you have 2 different connections. One with 5000 and
> one with 12 inactivity probe interval.
>
> I suspect that the relay server is started something like this:
>
> ovsdb-server ... --remote=db:OVN_Northbound,NB_Global,connections \
>     relay:OVN_Northbound:tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641
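As an aside, the relay target used throughout this thread, `relay:OVN_Northbound:tcp:...`, packs the relayed database name and the upstream remote into a single string. A small illustrative parser (not OVS code; it handles only the `relay:<db>:<remote>` form, not the `relay:db:...` in-database variant discussed elsewhere in the thread):

```python
def parse_relay_spec(spec):
    """Split a 'relay:<db name>:<remote>' spec into its parts.
    The remote itself may contain ':' (e.g. tcp:host:6641), so split
    only on the first two separators."""
    prefix, db, remote = spec.split(":", 2)
    if prefix != "relay":
        raise ValueError("not a relay spec: %s" % spec)
    return db, remote

db, remote = parse_relay_spec(
    "relay:OVN_Northbound:tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641")
print(db, remote)
```

This makes the distinction in the reply above concrete: the `relay:` part names the upstream that the relay probes with its built-in default interval, while `--remote=db:...` rows only configure probes for clients connecting *to* that server.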
Re: [ovs-discuss] ovsdb relay server active connection probe interval do not work
> hi, all
>
> the default inactivity probe interval of the ovsdb relay server to the
> nb/sb ovsdb server is 5000ms.
> I set an active connection as follows, setting the inactivity probe
> interval to 12ms:
> _uuid            : 5ddab5a4-a267-42b4-9dd4-76d55855a109
> external_ids     : {}
> inactivity_probe : 12
> is_connected     : true
> max_backoff      : []
> other_config     : {}
> status           : {sec_since_connect="0", state=ACTIVE}
> target           : "tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641"

Hmm. How exactly did you configure that?

> ovn-ovsdb-nb.openstack.svc.cluster.local is a vip
> but the inactivity probe is still 5000
> 2021-08-24T12:34:17.313Z|04924|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 120225 ms, sending inactivity probe
> 2021-08-24T12:36:17.759Z|05854|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 120446 ms, sending inactivity probe
> 2021-08-24T12:37:06.326Z|06145|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 6853 ms, sending inactivity probe
> 2021-08-24T12:37:11.330Z|06155|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 5004 ms, sending inactivity probe

This looks like you have 2 different connections. One with 5000 and one with
12 inactivity probe interval.

I suspect that the relay server is started something like this:

ovsdb-server ... --remote=db:OVN_Northbound,NB_Global,connections \
    relay:OVN_Northbound:tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641

And the connection shown above is configured in this 'connections' row, right?

Connections configured with '--remote' are not the same as 'relay'
connections. So, in this case ovsdb-server will create a relay with the remote
specified in the 'relay:' part and with a default inactivity probe interval.
And it will open a connection to what is specified in the database row pointed
to by '--remote' with the configured values for that connection. It will
expect a client on the other side of the connection. So, this connection will
connect the main server with the relay, but they both will just wait for
database queries from each other. Configuring things this way you will also,
probably, have a self-connection from the main server to itself, right?

In general, currently, there is no way to configure the inactivity probe
interval for the "relay" --> "main server" connection; you can only configure
it in the opposite direction. Does the default inactivity interval cause
problems for your setup?

I have a plan to implement that though. There are several options how to do
that:

1. Add a simple cmdline argument like '--relay-inactivity-probe=N' that will
   affect all the relay databases on this ovsdb-server process.
   Pros: Simple.
   Cons: Affects all relay databases of this process, change requires restart,
   configuration applied to a single process.

2. An appctl command that can be executed against the relay server, e.g.
   ovs-appctl ovsdb-server/relay-set-inactivity-probe OVN_Northbound 12
   Pros: Simple.
   Cons: Doesn't survive restart, configuration applied to a single process.

3. Add more configuration options to the 'relay:' syntax, e.g.:
   relay:inactivity-probe=12:OVN_Northbound:tcp:127.0.0.1:6641
   Pros: Simple.
   Cons: Doesn't look like a good API.

4. Have a separate small database with a relay configuration, e.g.
   ovsdb-server ... relay:db:OVSDB_Relay,Relay,relays relay.db
   And a small tool to interact with this local database:
   ovs-relayctl add-relay OVN_Northbound tcp:127.0.0.1:6641 inactivity-probe=12
   This will add a new relay configuration to the OVSDB_Relay database and
   ovsdb-server will start relaying it.
   Pros: Lots of things can be configured, including inactivity probes and
   backoff. Can be extended with relay-specific configs in the future.
   Survives restart. relay.db can be relayed from a separate ovsdb-server, if
   needed, so there is no need to configure each relay separately.
   Cons: A bit more complex implementation.

Example of a complex setup would be:

# start a main database server
a. ovsdb-server --remote=db:OVN_Northbound,NB_Global,connections ovnnb.db

# start a small database server that only holds relay.db
b. ovsdb-server --remote=pssl:6647:server relay.db
   ovs-relayctl add-relay OVN_Northbound tcp:your-server:6641 inactivity-probe=12

# start a relay server that relays OVSDB_Relay and relays everything
# that is configured in this db. If the OVSDB_Relay db has OVN_Northbound
# configured, start accepting connections on the remotes configured there.
c. ovsdb-server --remote=db:OVN_Northbound,NB_Global,connections \
       relay:db:OVSDB_Relay,Relay,relays \
       relay:OVSDB_Relay:ssl:server:6647

Once this is started, server 'c' will connect to server 'b' and get the OVSDB_Relay
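The probing behavior behind this whole thread (after an interval of silence send a probe; if the peer stays silent for another interval, disconnect, as in the "no response to inactivity probe after 5 seconds, disconnecting" log) can be modeled as a tiny state machine. This is an illustrative sketch, not the actual OVS lib/reconnect.c code:

```python
IDLE, PROBE_SENT = "idle", "probe_sent"

class InactivityProbe:
    """Toy model of an inactivity probe timer: after `interval_ms` of
    silence a probe is sent; another silent `interval_ms` drops the
    connection. Any received traffic resets the timer."""
    def __init__(self, interval_ms):
        self.interval_ms = interval_ms
        self.state = IDLE
        self.last_activity = 0

    def activity(self, now_ms):
        """Call whenever any message (including a probe reply) arrives."""
        self.last_activity = now_ms
        self.state = IDLE

    def run(self, now_ms):
        """Returns 'send_probe', 'disconnect', or None."""
        idle = now_ms - self.last_activity
        if idle < self.interval_ms:
            return None
        if self.state == IDLE:
            self.state = PROBE_SENT
            self.last_activity = now_ms
            return "send_probe"
        return "disconnect"

p = InactivityProbe(5000)   # the relay's 5000 ms default from the thread
p.activity(0)
print(p.run(5004))   # idle >= 5000 ms with no traffic: probe is sent
print(p.run(10010))  # probe unanswered for another interval: disconnect
```

This also shows why raising only one side's probe interval is not enough: each end runs its own instance of this timer, which is the point made above about the two directions being configured separately.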
[ovs-discuss] ovsdb relay server active connection probe interval do not work
hi, all

The default inactivity probe interval of the ovsdb relay server to the nb/sb
ovsdb server is 5000ms. I set an active connection as follows, setting the
inactivity probe interval to 12ms:

_uuid            : 5ddab5a4-a267-42b4-9dd4-76d55855a109
external_ids     : {}
inactivity_probe : 12
is_connected     : true
max_backoff      : []
other_config     : {}
status           : {sec_since_connect="0", state=ACTIVE}
target           : "tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641"

ovn-ovsdb-nb.openstack.svc.cluster.local is a vip, but the inactivity probe is
still 5000. The following is the log of the ovsdb relay server:

2021-08-24T12:34:17.313Z|04924|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 120225 ms, sending inactivity probe
2021-08-24T12:36:17.759Z|05854|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 120446 ms, sending inactivity probe
2021-08-24T12:37:06.326Z|06145|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 6853 ms, sending inactivity probe
2021-08-24T12:37:11.330Z|06155|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 5004 ms, sending inactivity probe
2021-08-24T12:37:16.334Z|06165|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 5003 ms, sending inactivity probe
2021-08-24T12:37:21.339Z|06175|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 5005 ms, sending inactivity probe
2021-08-24T12:37:33.850Z|06226|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 6681 ms, sending inactivity probe
2021-08-24T12:37:38.855Z|06236|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 5003 ms, sending inactivity probe
2021-08-24T12:37:43.859Z|06246|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 5004 ms, sending inactivity probe
2021-08-24T12:37:48.864Z|06256|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 5004 ms, sending inactivity probe
2021-08-24T12:37:53.870Z|06266|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 5006 ms, sending inactivity probe
2021-08-24T12:37:58.876Z|06276|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 5006 ms, sending inactivity probe
2021-08-24T12:38:08.882Z|06293|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 6299 ms, sending inactivity probe
2021-08-24T12:38:13.887Z|06303|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 5003 ms, sending inactivity probe
2021-08-24T12:38:18.890Z|06313|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 121131 ms, sending inactivity probe
2021-08-24T12:38:18.891Z|06316|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 5004 ms, sending inactivity probe
2021-08-24T12:38:23.895Z|06330|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 5004 ms, sending inactivity probe
2021-08-24T12:38:28.901Z|06340|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 5005 ms, sending inactivity probe
2021-08-24T12:38:33.905Z|06350|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 5004 ms, sending inactivity probe
2021-08-24T12:38:38.909Z|06360|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 5003 ms, sending inactivity probe
2021-08-24T12:38:43.913Z|06370|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 5003 ms, sending inactivity probe
2021-08-24T12:38:48.922Z|06380|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 5009 ms, sending inactivity probe
2021-08-24T12:38:53.926Z|06390|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 5003 ms, sending inactivity probe
2021-08-24T12:38:58.930Z|06400|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 5003 ms, sending inactivity probe
2021-08-24T12:39:03.934Z|06410|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 5003 ms, sending inactivity probe
2021-08-24T12:39:08.938Z|06420|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 5004 ms, sending inactivity probe
2021-08-24T12:39:13.941Z|06430|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 5002 ms, sending inactivity probe
2021-08-24T12:39:18.946Z|06440|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 5004 ms, sending inactivity probe
2021-08-24T12:39:23.951Z|06452|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 5005 ms, sending inactivity probe
2021-08-24T12:39:28.956Z|06462|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 5004 ms, sending inactivity probe
2021-08-24T12:39:33.962Z|06472|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 5006 ms, sending inactivity probe

best regards,
Wentao Jia

___ discuss mailing list disc...@openvswitch.org
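The "2 different connections" diagnosis given in the reply above is visible directly in the idle times of this log: entries around 5000 ms come from the relay's built-in default probe, while entries around 120000 ms come from the DB-configured connection. A quick sketch that buckets the idle values from a few abridged log lines (hostnames shortened to "nb" here for brevity):

```python
import re

# A few abridged entries from the relay log above; real lines use the
# full ovn-ovsdb-nb.openstack.svc.cluster.local hostname.
log = """\
2021-08-24T12:34:17.313Z|04924|reconnect|DBG|tcp:nb:6641: idle 120225 ms, sending inactivity probe
2021-08-24T12:37:11.330Z|06155|reconnect|DBG|tcp:nb:6641: idle 5004 ms, sending inactivity probe
2021-08-24T12:38:18.890Z|06313|reconnect|DBG|tcp:nb:6641: idle 121131 ms, sending inactivity probe
2021-08-24T12:38:18.891Z|06316|reconnect|DBG|tcp:nb:6641: idle 5004 ms, sending inactivity probe
"""

idles = [int(m.group(1)) for m in re.finditer(r"idle (\d+) ms", log)]
# Bucket by rough magnitude: ~5 s entries belong to the relay's active
# connection (hard-coded default probe), ~120 s entries to the connection
# whose inactivity_probe was configured in the database.
fast = [i for i in idles if i < 10000]
slow = [i for i in idles if i >= 10000]
print(len(fast), len(slow))
```

Two interleaved populations of probe intervals on the same target is the signature to look for when checking whether a relay is probing its upstream with the default interval.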