On Fri, Oct 13, 2017 at 6:05 AM, Andy Zhou <[email protected]> wrote:
> Hi, Numan,
>
> I am curious why the default 5-second inactivity probe does not work. Do
> you have more details?
>
> Does the glitch usually happen around the HA switchover? If this
> happens during normal operation, then this is not an HA-specific issue,
> but an indication of some connectivity issues.
>

Hi Andy,

This happens in an OpenStack deployment when neutron-server is busy handling
lots of API requests. Normally the deployment has 3 controller nodes, with
neutron-server running on each node. On each controller node, neutron-server
starts around 10 - 12 neutron workers (which are separate processes). The
number of API workers is a configuration option; if it is not configured, it
defaults to the number of cores.

I have tested this in both a physical deployment and a virtual deployment (3
controllers running as VMs on one node). All the neutron workers together open
around 40 connections to the OVN northbound ovsdb-server in the physical
deployment and around 15 in the virtual deployment.

When neutron-server is loaded with many API requests, I have noticed that
ovsdb-server drops a connection when it does not get a reply to its echo
request within 5 seconds. This leads to a lot of reconnections to the
ovsdb-server, and neutron-server becomes very slow to respond. With this patch
it seems to work fine.

The issue is not caused by the network. It is caused by the large number of
connections from the neutron-server workers to the ovsdb-server, and by the
IDL clients failing to reply to the echo request within 5 seconds when
neutron-server is loaded.

I can rework the patch to provide a configuration option to override the
inactivity probe value, so that it doesn't affect others who use the OVN OCF
pacemaker script. Let me know your comments.

Thanks
Numan

>
> On Thu, Oct 12, 2017 at 11:08 AM, Andy Zhou <[email protected]> wrote:
> > Sure, I will take a look.
> >
> > On Thu, Oct 12, 2017 at 10:49 AM, Ben Pfaff <[email protected]> wrote:
> >> Hi Andy. In the IRC meeting today, Numan suggested that you might be an
> >> appropriate reviewer for this patch, so if you agree and you have a
> >> chance to look at this then it would be appreciated.
> >>
> >> Thanks,
> >>
> >> Ben.
> >>
> >> On Wed, Oct 11, 2017 at 02:22:33PM +0530, [email protected] wrote:
> >>> From: Numan Siddique <[email protected]>
> >>>
> >>> In the case of OVN HA deployments with OpenStack, it has been noticed
> >>> that the 5-second inactivity probe interval is not enough and the
> >>> connections to the ovsdb-servers time out.
> >>> This patch
> >>> - provides an option to configure this value.
> >>> - creates a connection row in the NB/SB dbs and sets the target and
> >>>   inactivity_probe values when the node is promoted to master.
> >>>
> >>> CC: Andy Zhou <[email protected]>
> >>> Signed-off-by: Numan Siddique <[email protected]>
> >>> ---
> >>> ovn/utilities/ovndb-servers.ocf | 27 +++++++++++++++++++++++++++
> >>> 1 file changed, 27 insertions(+)
> >>>
> >>> diff --git a/ovn/utilities/ovndb-servers.ocf b/ovn/utilities/ovndb-servers.ocf
> >>> index fe1207c22..92620af6a 100755
> >>> --- a/ovn/utilities/ovndb-servers.ocf
> >>> +++ b/ovn/utilities/ovndb-servers.ocf
> >>> @@ -8,6 +8,8 @@
> >>> : ${SB_MASTER_PORT_DEFAULT="6642"}
> >>> : ${SB_MASTER_PROTO_DEFAULT="tcp"}
> >>> : ${MANAGE_NORTHD_DEFAULT="no"}
> >>> +: ${INACTIVE_PROBE_DEFAULT="60000"}
> >>> +
> >>> CRM_MASTER="${HA_SBIN_DIR}/crm_master -l reboot"
> >>> CRM_ATTR_REPL_INFO="${HA_SBIN_DIR}/crm_attribute --type crm_config --name OVN_REPL_INFO -s ovn_ovsdb_master_server"
> >>> OVN_CTL=${OCF_RESKEY_ovn_ctl:-${OVN_CTL_DEFAULT}}
> >>> @@ -17,6 +19,7 @@ NB_MASTER_PROTO=${OCF_RESKEY_nb_master_protocol:-${NB_MASTER_PROTO_DEFAULT}}
> >>> SB_MASTER_PORT=${OCF_RESKEY_sb_master_port:-${SB_MASTER_PORT_DEFAULT}}
> >>> SB_MASTER_PROTO=${OCF_RESKEY_sb_master_protocol:-${SB_MASTER_PROTO_DEFAULT}}
> >>> MANAGE_NORTHD=${OCF_RESKEY_manage_northd:-${MANAGE_NORTHD_DEFAULT}}
> >>> +INACTIVE_PROBE=${OCF_RESKEY_inactive_probe_interval:-${INACTIVE_PROBE_DEFAULT}}
> >>>
> >>> # Invalid IP address is an address that can never exist in the network, as
> >>> # mentioned in rfc-5737. The ovsdb servers connects to this IP address till
> >>> @@ -101,6 +104,14 @@ ovsdb_server_metadata() {
> >>> <content type="string" />
> >>> </parameter>
> >>>
> >>> + <parameter name="inactive_probe_interval" unique="1">
> >>> + <longdesc lang="en">
> >>> + Inactive probe interval to set for ovsdb-server.
> >>> + </longdesc>
> >>> + <shortdesc lang="en">Set inactive probe interval</shortdesc>
> >>> + <content type="string" />
> >>> + </parameter>
> >>> +
> >>> </parameters>
> >>>
> >>> <actions>
> >>> @@ -138,6 +149,22 @@ ovsdb_server_notify() {
> >>> ${OVN_CTL} --ovn-manage-ovsdb=no start_northd
> >>> fi
> >>>
> >>> + conn=`ovn-nbctl get NB_global . connections`
> >>> + if [ "$conn" == "[]" ]
> >>> + then
> >>> + ovn-nbctl -- --id=@conn_uuid create Connection \
> >>> +target="p${NB_MASTER_PROTO}\:${NB_MASTER_PORT}\:${MASTER_IP}" \
> >>> +inactivity_probe=$INACTIVE_PROBE -- set NB_Global . connections=@conn_uuid
> >>> + fi
> >>> +
> >>> + conn=`ovn-sbctl get SB_global . connections`
> >>> + if [ "$conn" == "[]" ]
> >>> + then
> >>> + ovn-sbctl -- --id=@conn_uuid create Connection \
> >>> +target="p${SB_MASTER_PROTO}\:${SB_MASTER_PORT}\:${MASTER_IP}" \
> >>> +inactivity_probe=$INACTIVE_PROBE -- set SB_Global . connections=@conn_uuid
> >>> + fi
> >>> +
> >>> else
> >>> if [ "$MANAGE_NORTHD" = "yes" ]; then
> >>> # Stop ovn-northd service. Set --ovn-manage-ovsdb=no so that
> >>> --
> >>> 2.13.5
> >>>
> >>> _______________________________________________
> >>> dev mailing list
> >>> [email protected]
> >>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
