Re: [ovs-discuss] Question to OVN DB pacemaker script

2018-05-11 Thread Han Zhou
Thanks for the output. It appears to be more complex than I thought.
It is good that the new slave doesn't listen on 6641, although I am not
sure how that is achieved. I guess a stop and start was triggered instead
of a simple demote, but I need to spend some time on the pacemaker state
machine. And please ignore my comment about calling ovsdb_server_start() in
demote - it would cause a recursive call, since ovsdb_server_start() calls
demote(), too.

Regarding the change:
 if [ "x${present_master}" = x ]; then
+set $@ --db-nb-create-insecure-remote=yes
+set $@ --db-sb-create-insecure-remote=yes
 # No master detected, or the previous master is not among the
 # set starting.

This "if" branch is taken when there is no master present, but in fact we
want the option set when the current node is master. So this change doesn't
affect anything. It is the change below that made the test work (so that on
the slave node the TCP port is not opened):
 elif [ ${present_master} != ${host_name} ]; then
+set $@ --db-nb-create-insecure-remote=no
+set $@ --db-sb-create-insecure-remote=no
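
The intended effect of the two branches can be sketched as a small shell function. This is only an illustration, not code from ovndb-servers.ocf: `remote_flags` is a hypothetical helper, and treating the "no master yet" case the same as the slave case is an assumption that the thread does not settle.

```shell
# Hypothetical helper, not part of ovndb-servers.ocf.
# Prints the insecure-remote flags for this node, given the name of the
# current master (may be empty) and the local host name.
remote_flags() {
    present_master=$1
    host_name=$2
    if [ -n "${present_master}" ] && [ "${present_master}" = "${host_name}" ]; then
        # This node is the master: open the TCP ports.
        create=yes
    else
        # Another node is master, or (assumption) no master is known yet:
        # keep the TCP ports closed.
        create=no
    fi
    echo "--db-nb-create-insecure-remote=${create} --db-sb-create-insecure-remote=${create}"
}

remote_flags test-pace1-2365293 test-pace1-2365293
remote_flags test-pace1-2365293 test-pace2-2365308
```

The first call prints the flags with `yes` (local node is master) and the second prints them with `no` (another node is master).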

The error log of ovsdb should not be skipped. We should never bind the LB
VIP on the ovsdb socket because it is not on the host. I think it is
related to the code in ovsdb_server_notify():
ovn-nbctl -- --id=@conn_uuid create Connection \
target="p${NB_MASTER_PROTO}\:${NB_MASTER_PORT}\:${MASTER_IP}" \
inactivity_probe=$INACTIVE_PROBE -- set NB_Global . connections=@conn_uuid

When using LB, we should set 0.0.0.0 here.
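
A minimal sketch of that idea, under stated assumptions: `connection_target` and its `use_lb` argument are hypothetical and do not exist in the OCF script. The function only builds the target string; the backslash escaping of colons needed on the ovn-nbctl command line above is left out.

```shell
# Hypothetical helper: choose the address for the passive Connection target.
# With a load balancer the VIP is not local, so listen on 0.0.0.0 instead.
connection_target() {
    proto=$1      # e.g. tcp
    port=$2       # e.g. 6641
    master_ip=$3  # VIP or master address
    use_lb=$4     # "yes" when the VIP lives on an external LB (assumed flag)
    if [ "${use_lb}" = yes ]; then
        addr=0.0.0.0
    else
        addr=${master_ip}
    fi
    echo "p${proto}:${port}:${addr}"
}

connection_target tcp 6641 10.149.4.252 yes
connection_target tcp 6641 10.149.4.252 no
```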

Also, the failed action is a concern. We may need to dig further into the
root cause. Thanks for finding these issues.

Thanks,
Han


Re: [ovs-discuss] Using --ovs with ovn-sbctl lflow-list

2018-05-11 Thread Ben Pfaff
Oh, you're looking for an OpenFlow server.

Use "ovs-vsctl set-controller".

(Also, "tcp://IP" is not correct syntax for anything in OVS.  The
documentation explains the syntax.)
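
To illustrate the syntax remark: OVS active connection targets use forms such as tcp:HOST[:PORT], ssl:HOST[:PORT] or unix:FILE, not URL-style strings. The check below is a rough sketch; `looks_like_ovs_target` is a hypothetical helper, not an OVS tool, and it does not validate the host or port.

```shell
# Hypothetical helper: rough check that a string looks like an OVS active
# connection target (tcp:HOST[:PORT], ssl:HOST[:PORT] or unix:FILE).
looks_like_ovs_target() {
    case $1 in
        *://*) return 1 ;;                     # URL-style syntax is rejected
        tcp:?*|ssl:?*|unix:?*) return 0 ;;     # known active target prefixes
        *) return 1 ;;
    esac
}

for target in tcp:10.15.17.211:6653 tcp://SERVER_IP; do
    if looks_like_ovs_target "${target}"; then
        echo "${target}: looks plausible"
    else
        echo "${target}: not OVS connection syntax"
    fi
done
```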

___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] Using --ovs with ovn-sbctl lflow-list

2018-05-11 Thread Ankur Sharma
Hi Ben,

Thanks for pointers.

ovsdb-server understands openflow?
We tried starting a server on ovsdb-server and now connection resets.

i.e we executed following command:
 ovn-sbctl --ovs=tcp://SERVER_IP lflow_list

And this time connection resets, packet capture show that as soon as 
sbctl sends OFPT_HELLO, server sends a FIN back.

So far, we could not figure out how to enable OpenFlow on ovsdb-server (if it
is possible at all).

Appreciate your help.

Regards,
Ankur


-----Original Message-----
From: Ben Pfaff [mailto:b...@ovn.org] 
Sent: Thursday, May 10, 2018 10:47 AM
To: Ankur Sharma 
Cc: ovs-discuss@openvswitch.org; Mary Manohar 
Subject: Re: [ovs-discuss] Using --ovs with ovn-sbctl lflow-list

ovsdb-server(1)

On Thu, May 10, 2018 at 05:22:36PM +, Ankur Sharma wrote:
> Hi Ben,
> 
> Got it, thanks a lot for pointing out.
> Do we have some documentation indicating how to start tcp server  that switch 
> would listen to?
> 
> Appreciate your help
> 
> Regards,
> Ankur
> 
> 
> -----Original Message-----
> From: Ben Pfaff [mailto:b...@ovn.org]
> Sent: Wednesday, May 9, 2018 8:31 AM
> To: Ankur Sharma 
> Cc: ovs-discuss@openvswitch.org
> Subject: Re: [ovs-discuss] Using --ovs with ovn-sbctl lflow-list
> 
> On Tue, May 08, 2018 at 11:44:44PM +, Ankur Sharma wrote:
> > Hi,
> > 
> > We are trying to use "--ovs" option with ovn-sbctl lflow-list command
> > (https://github.com/openvswitch/ovs/commit/c2f4c39be4e288e7a08974aea53b18627a1ef9ef).
> > Running it on remote controller always fails with connection refused error.
> > 
> > Do we have to start something on vswitch side?
> > Please feel free to point out if we are missing something basic.
> > 
> > Example:
> > ovn-sbctl --ovs=tcp:HYPERVISOR_IP dump-flows
> > 
> > 2018-05-08T23:43:16Z|1|stream|WARN|The default OpenFlow port number has changed from 6633 to 6653
> > 2018-05-08T23:43:16Z|2|sbctl|WARN|tcp:10.15.17.211: connection failed (Connection refused)
> 
> Is the switch listening for tcp connections?  It doesn't by default.


Re: [ovs-discuss] Question to OVN DB pacemaker script

2018-05-11 Thread aginwala
Sure:

*VIP_ip* = 10.149.4.252
*LB IP* = 10.149.0.40
*Slave netstat where it syncs from master LB VIP IP:*
# netstat -an | grep 6641
tcp        0      0 10.169.129.34:47426     10.149.4.252:6641       ESTABLISHED
tcp        0      0 10.169.129.34:47444     10.149.4.252:6641       ESTABLISHED

*Slave OVS:*
# ps aux |grep ovsdb-server
root  7388  0.0  0.0  18048   376 ?Ss   14:08   0:00
ovsdb-server: monitoring pid 7389 (healthy)
root  7389  0.0  0.0  18464  4556 ?S14:08   0:00
ovsdb-server -vconsole:off -vfile:info
--log-file=/var/log/openvswitch/ovsdb-server-nb.log
--remote=punix:/var/run/openvswitch/ovnnb_db.sock
--pidfile=/var/run/openvswitch/ovnnb_db.pid --unixctl=ovnnb_db.ctl --detach
--monitor --remote=db:OVN_Northbound,NB_Global,connections
--private-key=db:OVN_Northbound,SSL,private_key
--certificate=db:OVN_Northbound,SSL,certificate
--ca-cert=db:OVN_Northbound,SSL,ca_cert
--ssl-protocols=db:OVN_Northbound,SSL,ssl_protocols
--ssl-ciphers=db:OVN_Northbound,SSL,ssl_ciphers
--sync-from=tcp:10.149.4.252:6641 /etc/openvswitch/ovnnb_db.db
root  7397  0.0  0.0  18048   372 ?Ss   14:08   0:00
ovsdb-server: monitoring pid 7398 (healthy)
root  7398  0.0  0.0  18868  5280 ?S14:08   0:01
ovsdb-server -vconsole:off -vfile:info
--log-file=/var/log/openvswitch/ovsdb-server-sb.log
--remote=punix:/var/run/openvswitch/ovnsb_db.sock
--pidfile=/var/run/openvswitch/ovnsb_db.pid --unixctl=ovnsb_db.ctl --detach
--monitor --remote=db:OVN_Southbound,SB_Global,connections
--private-key=db:OVN_Southbound,SSL,private_key
--certificate=db:OVN_Southbound,SSL,certificate
--ca-cert=db:OVN_Southbound,SSL,ca_cert
--ssl-protocols=db:OVN_Southbound,SSL,ssl_protocols
--ssl-ciphers=db:OVN_Southbound,SSL,ssl_ciphers
--sync-from=tcp:10.149.4.252:6642 /etc/openvswitch/ovnsb_db.db

*Master netstat where connections are established with the LB:*
# netstat -an | grep 6641
tcp        0      0 0.0.0.0:6641            0.0.0.0:*               LISTEN
tcp        0      0 10.169.129.33:6641      10.149.0.40:47426       ESTABLISHED
tcp        0      0 10.169.129.33:6641      10.149.0.40:47444       ESTABLISHED

*Master OVS:*
# ps aux | grep ovsdb-server
root  3318  0.0  0.0  12940  1012 pts/0S+   15:23   0:00 grep
--color=auto ovsdb-server
root 11648  0.0  0.0  18048   372 ?Ss   14:08   0:00
ovsdb-server: monitoring pid 11649 (healthy)
root 11649  0.0  0.0  18312  4208 ?S14:08   0:01
ovsdb-server -vconsole:off -vfile:info
--log-file=/var/log/openvswitch/ovsdb-server-nb.log
--remote=punix:/var/run/openvswitch/ovnnb_db.sock
--pidfile=/var/run/openvswitch/ovnnb_db.pid --unixctl=ovnnb_db.ctl --detach
--monitor --remote=db:OVN_Northbound,NB_Global,connections
--private-key=db:OVN_Northbound,SSL,private_key
--certificate=db:OVN_Northbound,SSL,certificate
--ca-cert=db:OVN_Northbound,SSL,ca_cert
--ssl-protocols=db:OVN_Northbound,SSL,ssl_protocols
--ssl-ciphers=db:OVN_Northbound,SSL,ssl_ciphers --remote=ptcp:6641:0.0.0.0
--sync-from=tcp:192.0.2.254:6641 /etc/openvswitch/ovnnb_db.db
root 11657  0.0  0.0  18048   376 ?Ss   14:08   0:00
ovsdb-server: monitoring pid 11658 (healthy)
root 11658  0.0  0.0  19340  5552 ?S14:08   0:01
ovsdb-server -vconsole:off -vfile:info
--log-file=/var/log/openvswitch/ovsdb-server-sb.log
--remote=punix:/var/run/openvswitch/ovnsb_db.sock
--pidfile=/var/run/openvswitch/ovnsb_db.pid --unixctl=ovnsb_db.ctl --detach
--monitor --remote=db:OVN_Southbound,SB_Global,connections
--private-key=db:OVN_Southbound,SSL,private_key
--certificate=db:OVN_Southbound,SSL,certificate
--ca-cert=db:OVN_Southbound,SSL,ca_cert
--ssl-protocols=db:OVN_Southbound,SSL,ssl_protocols
--ssl-ciphers=db:OVN_Southbound,SSL,ssl_ciphers --remote=ptcp:6642:0.0.0.0
--sync-from=tcp:192.0.2.254:6642 /etc/openvswitch/ovnsb_db.db



The same applies to port 6642 for the SB DB. Hope that's clear. Sorry I did
not post this in the previous message; I thought you had already got the
point :) .



Regards,
Aliasgar



Re: [ovs-discuss] Question to OVN DB pacemaker script

2018-05-11 Thread Han Zhou
Ali, could you share output of "ps | grep ovsdb" and "netstat -lpn | grep
6641" on the new slave node after you do "crm resource move"?


Re: [ovs-discuss] Question to OVN DB pacemaker script

2018-05-11 Thread aginwala
Thanks Han for more suggestions:


I tested failover both by gracefully stopping pacemaker+corosync on the
master node and with crm move, and it works as expected in both cases: crm
move triggers promotion of the new master, so the new master gets elected
and the slave is demoted, as expected, to listen on the sync-from node.
Hence, the code change I posted earlier is fine.

# crm stat
Stack: corosync
Current DC: test-pace1-2365293 (version 1.1.14-70404b0) - partition with quorum
2 nodes and 2 resources configured

Online: [ test-pace1-2365293 test-pace2-2365308 ]

Full list of resources:

 Master/Slave Set: ovndb_servers-master [ovndb_servers]
 Masters: [ test-pace2-2365308 ]
 Slaves: [ test-pace1-2365293 ]

#crm --debug resource move ovndb_servers test-pace1-2365293
DEBUG: pacemaker version: [err: ][out: CRM Version: 1.1.14 (70404b0)]
DEBUG: found pacemaker version: 1.1.14
DEBUG: invoke: crm_resource --quiet --move -r 'ovndb_servers' --node='test-pace1-2365293'
# crm stat

Stack: corosync
Current DC: test-pace1-2365293 (version 1.1.14-70404b0) - partition with quorum
2 nodes and 2 resources configured

Online: [ test-pace1-2365293 test-pace2-2365308 ]

Full list of resources:

 Master/Slave Set: ovndb_servers-master [ovndb_servers]
 Masters: [ test-pace1-2365293 ]
 Slaves: [ test-pace2-2365308 ]

Failed Actions:
* ovndb_servers_monitor_1 on test-pace2-2365308 'master' (8): call=46,
status=complete, exitreason='none',
last-rc-change='Fri May 11 14:08:35 2018', queued=0ms, exec=83ms

Note: the Failed Actions warning appears only with the crm move command,
not with reboot/kill/service pacemaker/corosync stop/start.

I cleaned up the warning using the command below:
# crm_resource -P
Waiting for 1 replies from the CRMd. OK

I also want to call out another finding: per the pacemaker logs,
ocf_attribute_target is not getting called. The code says it will not work
for older pacemaker versions, but I am not sure which versions exactly; I am
on version 1.1.14.
# pacemaker logs
 notice: operation_finished: ovndb_servers_monitor_1:7561:stderr [
/usr/lib/ocf/resource.d/ovn/ovndb-servers: line 31: ocf_attribute_target:
command not found ]
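
One way to tolerate the missing helper is to test for it before calling it. The snippet below is a sketch under assumptions: `resolve_host_name` is a hypothetical wrapper, and falling back to `uname -n` is chosen only for illustration; it is not taken from the OCF script, and the node name pacemaker uses may differ from the system host name.

```shell
# Sketch only: resolve the local node name even when the OCF helper
# ocf_attribute_target is unavailable (as in the log line above).
resolve_host_name() {
    if command -v ocf_attribute_target >/dev/null 2>&1; then
        # Helper exists (newer resource-agents): use it.
        ocf_attribute_target
    else
        # Assumption for this example: fall back to the system host name.
        uname -n
    fi
}

host_name=$(resolve_host_name)
echo "local node: ${host_name}"
```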


# Also, the NB DB logs are showing socket_util errors, which I think need a
code change too, to skip logging them, as functionality is still working as
expected (maybe in a separate commit since it is an ovsdb change)
2018-05-11T21:14:25.958Z|00560|socket_util|ERR|6641:10.149.4.252: bind: Cannot assign requested address
2018-05-11T21:14:25.958Z|00561|socket_util|ERR|6641:10.149.4.252: bind: Cannot assign requested address
2018-05-11T21:14:27.859Z|00562|socket_util|ERR|6641:10.149.4.252: bind: Cannot assign requested address



Let me know if you have any further suggestions.


Regards,
Aliasgar


On Thu, May 10, 2018 at 3:49 PM, Han Zhou  wrote:

> Good progress!
>
> I think at least one more change is needed to ensure when demote happens,
> the TCP port is shut down. Otherwise, the LB will be confused again and
> can't figure out which one is active. This is the graceful failover
> scenario which can be tested by crm resource move instead of reboot/killing
> process.
>
> This may be done by the same approach you did for promote, i.e. stop ovsdb
> and then call ovsdb_server_start() so the parameters are reset correctly
> before starting. Alternatively we can add a command in ovsdb-server, in
> addition to the commands that switch to/from active/backup modes, to
> open/close the TCP ports, to avoid restarting during failover, but I am not
> sure if this is valuable. It depends on whether restarting ovsdb-server
> during failover is sufficient. Could you add the restart logic for
> demote and try more? Thanks!
>
> Thanks,
> Han
>
> On Thu, May 10, 2018 at 1:54 PM, aginwala  wrote:
>
>> Hi :
>>
>> Just to further update, I am able to re-open tcp port for failover
>> scenario when new master is getting promoted with additional code changes
>> as below which do require stop of ovs service on the new selected master to
>> reset the tcp settings:
>>
>>
>> diff --git a/ovn/utilities/ovndb-servers.ocf b/ovn/utilities/ovndb-servers.ocf
>> index 164b6bc..8cb4c25 100755
>> --- a/ovn/utilities/ovndb-servers.ocf
>> +++ b/ovn/utilities/ovndb-servers.ocf
>> @@ -295,8 +295,8 @@ ovsdb_server_start() {
>>
>>  set ${OVN_CTL}
>>
>> -set $@ --db-nb-addr=${MASTER_IP} --db-nb-port=${NB_MASTER_PORT}
>> -set $@ --db-sb-addr=${MASTER_IP} --db-sb-port=${SB_MASTER_PORT}
>> +set $@ --db-nb-port=${NB_MASTER_PORT}
>> +set $@ --db-sb-port=${SB_MASTER_PORT}
>>
>>  if [ "x${NB_MASTER_PROTO}" = xtcp ]; then
>>  set $@ --db-nb-create-insecure-remote=yes
>> @@ -307,6 +307,8 @@ ovsdb_server_start() {
>>  fi
>>
>>  if [ "x${present_master}" = x ]; then
>> +set $@ --db-nb-create-insecure-remote=yes
>> +set $@ --db-sb-create-insecure-remote=yes
>>  # No master detected, or the previous master is not among the
>>  # set starting.
>>   

Re: [ovs-discuss] ovs-ofctl mod-port doesn't work with ports in network namespaces

2018-05-11 Thread Justin Pettit

> On May 11, 2018, at 3:00 AM, Jakub Libosvar  wrote:
> 
> Hi all,
> it seems I hit a bug when trying to implement a fix in OpenStack Neutron
> that uses heavily network namespaces with OVS internal ports in them.
> 
> Ports attached to OVS bridge that are placed in network namespace can't
> have their status changed using 'ovs-ofctl mod-port' command.
> 
> Steps to reproduce:
> 
> ovs-vsctl add-br test-br
> ovs-vsctl add-port test-br test-port -- set Interface test-port type=internal
> ip net a test-ns
> ip l s test-port netns test-ns
> ovs-ofctl mod-port test-br test-port up
> 
> -- check that port is still down
> ip net e test-ns ip l sh test-port
>test-port:  <--- port is still DOWN

I think this is expected behavior.  You're moving the port into a different 
namespace from ovs-vswitchd, so the port doesn't actually exist in its view 
anymore.  (You probably don't get interface counters either and it won't show 
up in utilities like ifconfig.)  However, ovs-vswitchd still has a handle to 
the interface so that it can send and receive traffic.  The advantage is that 
you get much better performance than something like veths, but you do get some 
weirdness like this, since it's sort of breaking the namespace model.

--Justin
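
The reply above does not propose a workaround. One possibility, stated here as an assumption and not confirmed anywhere in the thread, is to change the link state with ip from inside the namespace that now owns the port. The sketch only prints the command, because running it requires root and an existing namespace; `netns_link_cmd` is a hypothetical helper.

```shell
# Hypothetical helper: print (not run) the command that sets the link state
# of a port from inside the network namespace it was moved into.
netns_link_cmd() {
    ns=$1      # namespace name, e.g. test-ns
    port=$2    # port name, e.g. test-port
    state=$3   # up or down
    echo "ip netns exec ${ns} ip link set ${port} ${state}"
}

netns_link_cmd test-ns test-port up
```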




Re: [ovs-discuss] Incremental perf results

2018-05-11 Thread Han Zhou

Re: [ovs-discuss] Geneve and IPv6

2018-05-11 Thread Gregory Rose

On 5/10/2018 8:08 PM, Ben Pfaff wrote:

On Thu, May 10, 2018 at 05:54:30PM -0700, Gregory Rose wrote:

On 5/1/2018 6:53 PM, Ben Pfaff wrote:

Hi Greg.

We've had multiple reports now that Geneve kernel-based tunnels do not
work in Open vSwitch if IPv6 is not enabled.  Do you have an idea
whether we should consider this a bug in OVS or a bug in the
documentation?  That is, should we plan to fix it (when we can) or
should we plan to document that Geneve requires IPv6?  (Or is something
else going on?)

No rush.

Thanks,

Ben.

Sorry for the late response.  It is just how the Linux kernel is
configured.  There was an internal bug filed a few months ago where I
found that geneve depends on a certain kernel configuration that cannot
be overridden and disabling ipv6 causes the check for the configuration
to always fail.

The upshot is that there is no fix without making changes to the upstream
Linux kernel.  It is a Linux kernel limitation and should probably be
documented as you say, i.e. "Geneve requires ipv6".

Thanks for investigating.

Even if this isn't fixable out-of-tree, is it something for which we
should submit a fix so that future upstream releases don't have the same
limitation?


It's worth investigating.  The way the kernel forces the ipv6 option
with geneve gives me reason to believe that there are some dependencies
that would need to be undone so it's hard to say how much work it would
require.
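
Until the limitation is documented or fixed, a host can at least be checked for IPv6 availability before Geneve tunnels are configured. The following is a rough sketch under assumptions: `ipv6_available` is a hypothetical helper, and the thread does not say which way of disabling IPv6 triggers the Geneve failure, so the check covers both a missing /proc/sys/net/ipv6 tree and the disable_ipv6 sysctl.

```shell
# Sketch: report whether IPv6 looks available on this host.
# The proc root is a parameter only so the check can be tried on a fake tree.
ipv6_available() {
    proc_root=${1:-/proc}
    # The directory is typically absent when the kernel runs without IPv6.
    [ -d "${proc_root}/sys/net/ipv6" ] || return 1
    # disable_ipv6=1 means IPv6 is administratively disabled on all interfaces.
    flag=$(cat "${proc_root}/sys/net/ipv6/conf/all/disable_ipv6" 2>/dev/null)
    [ "${flag}" != 1 ]
}

if ipv6_available; then
    echo "IPv6 appears to be enabled"
else
    echo "IPv6 appears to be disabled; kernel Geneve tunnels may not work"
fi
```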

Thanks,

- Greg



Thanks,

Ben.




Re: [ovs-discuss] Incremental perf results

2018-05-11 Thread Mark Michelson

On 05/10/2018 08:33 PM, Han Zhou wrote:

On Wed, May 9, 2018 at 12:21 PM, Mark Michelson wrote:

On 05/07/2018 08:29 AM, Mark Michelson wrote:

On 05/04/2018 04:15 PM, Han Zhou wrote:

On Thu, May 3, 2018 at 10:49 AM, Mark Michelson wrote:

     On 05/03/2018 12:58 PM, Mark Michelson wrote:

     Hi Han,

     (cc'ed ovs-discuss)

     I have some test results here for your incremental branch[0].

     Browbeat was the test orchestrator for the test, and it uses
     ovn-scale-test[1] to configure the test parameters and run the test.

     The test uses one central node on which ovn-northd runs. There
     are three farm nodes on which sandboxes are run for
     ovn-controller. Each farm node runs 24 sandboxes, for a total of
     72 sandboxes (and thus 72 ovn-controller processes).

     The test uses the datapath context[2] to set up 72 logical
     switches and one logical router in advance. Then during each
     test iteration, a logical switch port is added to one of the
     logical switches and bound on one of the sandboxes. The next
     iteration does not start until the previous iteration is 100%
     complete (i.e. we see that the logical switch port is "up" in
     the northbound db). The total number of logical switch ports
     added during the tests is 3312.

     During the test, I ran `perf record` on one of the
     ovn-controller processes and then created a flame graph[3] from
     the results. I have attached the flame graph to this e-mail. I
     think this can give us a good jumping off point for determining
     more optimizations to make to ovn-controller.

     [0] https://github.com/hzhou8/ovs/tree/ip7
     [1] https://github.com/openvswitch/ovn-scale-test
     [2] https://github.com/openvswitch/ovn-scale-test/pull/165
     [3] http://www.brendangregg.com/FlameGraphs/cpuflamegraphs

     From the IRC meeting, it was requested to see a flame graph of
     performance on OVS master. I am attaching that on this e-mail.

     One difference in this test run is that the number of switch ports
     was fewer (I'm not 100% sure of the exact number), so the number of
     samples in perf record is less than in the flame graph I previously
     sent.

     The vast majority of the time is spent in lflow_run(). Based on this
     flame graph, our initial take on the matter was that we could get
     improved performance by reducing the number of logical flows to
     process. The incremental branch seemed like a good testing target to
     that end.

Thanks Mark for sharing the results!
It seems you have sent the wrong attachment perf-master.svg
in your second email, which is still the same one as in the
first email. Would you mind sending the right one? Also,
please share the total CPU cost for incremental vs. master
when you have the data.

From your text description, it is improved as expected,
since the bottleneck moved from lflow_run() to ofctrl_put().
For the new bottleneck ofctrl_put(), it's a good finding,
and I t

[ovs-discuss] ovs-ofctl mod-port doesn't work with ports in network namespaces

2018-05-11 Thread Jakub Libosvar
Hi all,
it seems I hit a bug when trying to implement a fix in OpenStack Neutron
that uses heavily network namespaces with OVS internal ports in them.

Ports attached to OVS bridge that are placed in network namespace can't
have their status changed using 'ovs-ofctl mod-port' command.

Steps to reproduce:

 ovs-vsctl add-br test-br
 ovs-vsctl add-port test-br test-port -- set Interface test-port type=internal
 ip net a test-ns
 ip l s test-port netns test-ns
 ovs-ofctl mod-port test-br test-port up

-- check that port is still down
 ip net e test-ns ip l sh test-port
test-port:  <--- port is still DOWN