On Wed, Mar 21, 2018 at 9:49 AM, aginwala <[email protected]> wrote:
> Thanks Numan:
>
> Yup, agree with the locking part. For now, yes, I am running northd on one
> node. I might write a script to monitor northd in the cluster so that if the
> node where it's running goes down, the script can spin up northd on another
> active node as a dirty hack.
>
The "dirty hack" is pacemaker :)
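
For reference, a rough sketch of what the pacemaker approach could look like, assuming ovn-northd is installed with a systemd unit named ovn-northd; the resource name, node names, and timings below are illustrative only, not something tested in this thread:

# Manage ovn-northd as a single active instance across the cluster:
pcs resource create ovn-northd systemd:ovn-northd op monitor interval=30s
# Optionally prefer the nodes that also host the OVN databases:
pcs constraint location ovn-northd prefers node1=100 node2=50 node3=50

Pacemaker then restarts ovn-northd on another node if the one running it goes down, which is the managed version of the monitoring script described above.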
> Sure, will wait for inputs from Ben too on this and see how complex it
> would be to roll out this feature.
>
> Regards,
>
> On Wed, Mar 21, 2018 at 5:43 AM, Numan Siddique <[email protected]> wrote:
>
>> Hi Aliasgar,
>>
>> ovsdb-server maintains locks per connection and not across the db. A
>> workaround for you now would be to configure all the ovn-northd instances
>> to connect to one ovsdb-server if you want to have active/standby.
>>
>> Probably Ben can answer if there is a plan to support ovsdb locks across
>> the db. We also need this support in networking-ovn as it also uses ovsdb
>> locks.
>>
>> Thanks
>> Numan
>>
>> On Wed, Mar 21, 2018 at 1:40 PM, aginwala <[email protected]> wrote:
>>
>>> Hi Numan:
>>>
>>> As I continued to test further, I just figured out that ovn-northd is
>>> running as active on all 3 nodes instead of only one active instance,
>>> which results in db errors as per the logs.
>>>
>>> # On node 3, I run ovn-nbctl ls-add ls2; it produces the below logs in
>>> ovn-northd:
>>> 2018-03-21T06:01:59.442Z|00007|ovsdb_idl|WARN|transaction error:
>>> {"details":"Transaction causes multiple rows in \"Datapath_Binding\" table
>>> to have identical values (1) for index on column \"tunnel_key\". First
>>> row, with UUID 8c5d9342-2b90-4229-8ea1-001a733a915c, was inserted by
>>> this transaction. Second row, with UUID 8e06f919-4cc7-4ffc-9a79-20ce6663b683,
>>> existed in the database before this transaction and was not modified by the
>>> transaction.","error":"constraint violation"}
>>>
>>> In the southbound datapath list, 2 duplicate records get created for the
>>> same switch.
>>>
>>> # ovn-sbctl list Datapath
>>> _uuid               : b270ae30-3458-445f-95d2-b14e8ebddd01
>>> external_ids        : {logical-switch="4d6674e3-ff9f-4f38-b050-0fa9bec9e34d", name="ls2"}
>>> tunnel_key          : 2
>>>
>>> _uuid               : 8e06f919-4cc7-4ffc-9a79-20ce6663b683
>>> external_ids        : {logical-switch="4d6674e3-ff9f-4f38-b050-0fa9bec9e34d", name="ls2"}
>>> tunnel_key          : 1
>>>
>>> # On nodes 1 and 2, where northd is running, it gives the below error:
>>> 2018-03-21T06:01:59.437Z|00008|ovsdb_idl|WARN|transaction error:
>>> {"details":"cannot delete Datapath_Binding row
>>> 8e06f919-4cc7-4ffc-9a79-20ce6663b683 because of 17 remaining
>>> reference(s)","error":"referential integrity violation"}
>>>
>>> As per the commit message, for northd I re-tried setting
>>> --ovnnb-db="tcp:10.169.125.152:6641,tcp:10.169.125.131:6641,tcp:10.148.181.162:6641"
>>> and --ovnsb-db="tcp:10.169.125.152:6642,tcp:10.169.125.131:6642,tcp:10.148.181.162:6642"
>>> and it did not help either.
>>>
>>> There is no issue if I keep running only one instance of northd on any
>>> of these 3 nodes. Hence, I wanted to know: is there something else missing
>>> here to make only one northd instance active and the rest standby?
>>>
>>> Regards,
>>>
>>> On Thu, Mar 15, 2018 at 3:09 AM, Numan Siddique <[email protected]> wrote:
>>>
>>>> That's great
>>>>
>>>> Numan
>>>>
>>>> On Thu, Mar 15, 2018 at 2:57 AM, aginwala <[email protected]> wrote:
>>>>
>>>>> Hi Numan:
>>>>>
>>>>> I tried on new nodes (kernel: 4.4.0-104-generic, Ubuntu 16.04) with a
>>>>> fresh installation and it worked super fine for both sb and nb dbs. Seems
>>>>> like some kernel issue on the previous nodes when I re-installed the raft
>>>>> patch, as I was running a different ovs version on those nodes before.
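
For reference, a minimal sketch of the workaround Numan describes above: until ovsdb locks work across servers, every ovn-northd instance can be pointed at a single ovsdb-server (here the node at 10.169.125.152 from this thread) so that the per-connection lock actually arbitrates active/standby; untested here, and it reintroduces that node as a single point of failure for northd's DB connection:

ovn-northd --ovnnb-db=tcp:10.169.125.152:6641 --ovnsb-db=tcp:10.169.125.152:6642 --pidfile --detach --log-file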
>>>>> For 2 HVs, I now set
>>>>> ovn-remote="tcp:10.169.125.152:6642,tcp:10.169.125.131:6642,tcp:10.148.181.162:6642",
>>>>> started the controller, and it works super fine.
>>>>>
>>>>> Did some failover testing by rebooting/killing the leader
>>>>> (10.169.125.152) and bringing it back up, and it works as expected.
>>>>> Nothing weird noted so far.
>>>>>
>>>>> # check-cluster gives the below data on one of the nodes (10.148.181.162)
>>>>> post leader failure:
>>>>>
>>>>> ovsdb-tool check-cluster /etc/openvswitch/ovnsb_db.db
>>>>> ovsdb-tool: leader /etc/openvswitch/ovnsb_db.db for term 2 has log
>>>>> entries only up to index 18446744073709551615, but index 9 was committed
>>>>> in a previous term (e.g. by /etc/openvswitch/ovnsb_db.db)
>>>>>
>>>>> For check-cluster, are we planning to add more output showing which
>>>>> node is active (leader), etc. in upcoming versions?
>>>>>
>>>>> Thanks a ton for helping sort this out. I think the patch looks good
>>>>> to be merged once Justin's comments are addressed, along with the man
>>>>> page details for ovsdb-tool.
>>>>>
>>>>> I will do some more crash testing for the cluster along with the scale
>>>>> test and keep you posted if something unexpected is noted.
>>>>>
>>>>> Regards,
>>>>>
>>>>> On Tue, Mar 13, 2018 at 11:07 PM, Numan Siddique <[email protected]> wrote:
>>>>>
>>>>>> On Wed, Mar 14, 2018 at 7:51 AM, aginwala <[email protected]> wrote:
>>>>>>
>>>>>>> Sure.
>>>>>>>
>>>>>>> To add on, I also ran the nb db too using a different port, and
>>>>>>> Node 2 crashes with the same error:
>>>>>>> # Node 2
>>>>>>> /usr/share/openvswitch/scripts/ovn-ctl --db-nb-addr=10.99.152.138
>>>>>>> --db-nb-port=6641 --db-nb-cluster-remote-addr="tcp:10.99.152.148:6645"
>>>>>>> --db-nb-cluster-local-addr="tcp:10.99.152.138:6645" start_nb_ovsdb
>>>>>>> ovsdb-server: ovsdb error: /etc/openvswitch/ovnnb_db.db: cannot
>>>>>>> identify file type
>>>>>>>
>>>>>> Hi Aliasgar,
>>>>>>
>>>>>> It worked for me. Can you delete the old db files in
>>>>>> /etc/openvswitch/ and try running the commands again?
>>>>>>
>>>>>> Below are the commands I ran in my setup.
>>>>>>
>>>>>> Node 1
>>>>>> -------
>>>>>> sudo /usr/share/openvswitch/scripts/ovn-ctl
>>>>>> --db-sb-addr=192.168.121.91 --db-sb-port=6642
>>>>>> --db-sb-create-insecure-remote=yes
>>>>>> --db-sb-cluster-local-addr=tcp:192.168.121.91:6644 start_sb_ovsdb
>>>>>>
>>>>>> Node 2
>>>>>> ---------
>>>>>> sudo /usr/share/openvswitch/scripts/ovn-ctl
>>>>>> --db-sb-addr=192.168.121.87 --db-sb-port=6642
>>>>>> --db-sb-create-insecure-remote=yes
>>>>>> --db-sb-cluster-local-addr="tcp:192.168.121.87:6644"
>>>>>> --db-sb-cluster-remote-addr="tcp:192.168.121.91:6644" start_sb_ovsdb
>>>>>>
>>>>>> Node 3
>>>>>> ---------
>>>>>> sudo /usr/share/openvswitch/scripts/ovn-ctl
>>>>>> --db-sb-addr=192.168.121.78 --db-sb-port=6642
>>>>>> --db-sb-create-insecure-remote=yes
>>>>>> --db-sb-cluster-local-addr="tcp:192.168.121.78:6644"
>>>>>> --db-sb-cluster-remote-addr="tcp:192.168.121.91:6644" start_sb_ovsdb
>>>>>>
>>>>>> Thanks
>>>>>> Numan
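
On the earlier question about seeing which node is currently the leader: besides the offline check-cluster, a running clustered ovsdb-server can report live status through its unixctl interface. A hedged example, assuming this raft series already includes the cluster/status appctl command and the ovnsb_db.ctl socket name used above:

ovs-appctl -t /var/run/openvswitch/ovnsb_db.ctl cluster/status OVN_Southbound

If the command is present in the version under test, it should report the server's role (leader/follower/candidate), the current term, and the cluster membership.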
>>>>>>> On Tue, Mar 13, 2018 at 9:40 AM, Numan Siddique <[email protected]> wrote:
>>>>>>>
>>>>>>>> On Tue, Mar 13, 2018 at 9:46 PM, aginwala <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Thanks Numan for the response.
>>>>>>>>>
>>>>>>>>> There is no command start_cluster_sb_ovsdb in the source code either.
>>>>>>>>> Is that in a separate commit somewhere? Hence, I used start_sb_ovsdb,
>>>>>>>>> which I think would not be the right choice?
>>>>>>>>>
>>>>>>>> Sorry, I meant start_sb_ovsdb. Strange that it didn't work for you.
>>>>>>>> Let me try it out again and update this thread.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Numan
>>>>>>>>
>>>>>>>>> # Node 1 came up as expected.
>>>>>>>>> ovn-ctl --db-sb-addr=10.99.152.148 --db-sb-port=6642
>>>>>>>>> --db-sb-create-insecure-remote=yes
>>>>>>>>> --db-sb-cluster-local-addr="tcp:10.99.152.148:6644" start_sb_ovsdb
>>>>>>>>>
>>>>>>>>> # Verifying it is a clustered db with:
>>>>>>>>> ovsdb-tool db-local-address /etc/openvswitch/ovnsb_db.db
>>>>>>>>> tcp:10.99.152.148:6644
>>>>>>>>> # ovn-sbctl show works fine and chassis are being populated correctly.
>>>>>>>>>
>>>>>>>>> # Node 2 fails with error:
>>>>>>>>> /usr/share/openvswitch/scripts/ovn-ctl --db-sb-addr=10.99.152.138
>>>>>>>>> --db-sb-port=6642 --db-sb-create-insecure-remote=yes
>>>>>>>>> --db-sb-cluster-remote-addr="tcp:10.99.152.148:6644"
>>>>>>>>> --db-sb-cluster-local-addr="tcp:10.99.152.138:6644" start_sb_ovsdb
>>>>>>>>> ovsdb-server: ovsdb error: /etc/openvswitch/ovnsb_db.db: cannot
>>>>>>>>> identify file type
>>>>>>>>>
>>>>>>>>> # So I started the sb db the usual way using start_ovsdb just to
>>>>>>>>> get the db file created, killed the sb pid, and re-ran the command,
>>>>>>>>> which gave the actual error where it complains about the join-cluster
>>>>>>>>> command that is being called internally:
>>>>>>>>> /usr/share/openvswitch/scripts/ovn-ctl --db-sb-addr=10.99.152.138
>>>>>>>>> --db-sb-port=6642 --db-sb-create-insecure-remote=yes
>>>>>>>>> --db-sb-cluster-remote-addr="tcp:10.99.152.148:6644"
>>>>>>>>> --db-sb-cluster-local-addr="tcp:10.99.152.138:6644" start_sb_ovsdb
>>>>>>>>> ovsdb-tool: /etc/openvswitch/ovnsb_db.db: not a clustered database
>>>>>>>>> * Backing up database to /etc/openvswitch/ovnsb_db.db.backup1.15.0-70426956
>>>>>>>>> ovsdb-tool: 'join-cluster' command requires at least 4 arguments
>>>>>>>>> * Creating cluster database /etc/openvswitch/ovnsb_db.db from
>>>>>>>>> existing one
>>>>>>>>>
>>>>>>>>> # Based on the above error, I killed the sb db pid again and tried
>>>>>>>>> to create a local cluster on the node, then re-ran the join operation
>>>>>>>>> as per the source code function:
>>>>>>>>> ovsdb-tool join-cluster /etc/openvswitch/ovnsb_db.db
>>>>>>>>> OVN_Southbound tcp:10.99.152.138:6644 tcp:10.99.152.148:6644
>>>>>>>>> which still complains:
>>>>>>>>> ovsdb-tool: I/O error: /etc/openvswitch/ovnsb_db.db: create failed
>>>>>>>>> (File exists)
>>>>>>>>>
>>>>>>>>> # Node 3: I did not try, as I am assuming the same failure as node 2.
>>>>>>>>>
>>>>>>>>> Let me know.
>>>>>>>>>
>>>>>>>>> On Tue, Mar 13, 2018 at 3:08 AM, Numan Siddique <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Aliasgar,
>>>>>>>>>>
>>>>>>>>>> On Tue, Mar 13, 2018 at 7:11 AM, aginwala <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Ben/Numan:
>>>>>>>>>>>
>>>>>>>>>>> I am trying to set up a 3 node southbound db cluster using raft10
>>>>>>>>>>> <https://patchwork.ozlabs.org/patch/854298/> in review.
>>>>>>>>>>>
>>>>>>>>>>> # Node 1 create-cluster
>>>>>>>>>>> ovsdb-tool create-cluster /etc/openvswitch/ovnsb_db.db
>>>>>>>>>>> /root/ovs-reviews/ovn/ovn-sb.ovsschema tcp:10.99.152.148:6642
>>>>>>>>>>>
>>>>>>>>>> A different port is used for RAFT.
>>>>>>>>>> So you have to choose another port like 6644 for example.
>>>>>>>>>>
>>>>>>>>>>> # Node 2
>>>>>>>>>>> ovsdb-tool join-cluster /etc/openvswitch/ovnsb_db.db
>>>>>>>>>>> OVN_Southbound tcp:10.99.152.138:6642 tcp:10.99.152.148:6642 --cid
>>>>>>>>>>> 5dfcb678-bb1d-4377-b02d-a380edec2982
>>>>>>>>>>>
>>>>>>>>>>> # Node 3
>>>>>>>>>>> ovsdb-tool join-cluster /etc/openvswitch/ovnsb_db.db
>>>>>>>>>>> OVN_Southbound tcp:10.99.152.101:6642 tcp:10.99.152.138:6642
>>>>>>>>>>> tcp:10.99.152.148:6642 --cid 5dfcb678-bb1d-4377-b02d-a380edec2982
>>>>>>>>>>>
>>>>>>>>>>> # ovn remote is set to all 3 nodes
>>>>>>>>>>> external_ids:ovn-remote="tcp:10.99.152.148:6642,tcp:10.99.152.138:6642,tcp:10.99.152.101:6642"
>>>>>>>>>>>
>>>>>>>>>>> # Starting the sb db on node 1 using the below command:
>>>>>>>>>>> ovsdb-server --detach --monitor -vconsole:off -vraft -vjsonrpc
>>>>>>>>>>> --log-file=/var/log/openvswitch/ovsdb-server-sb.log
>>>>>>>>>>> --pidfile=/var/run/openvswitch/ovnsb_db.pid
>>>>>>>>>>> --remote=db:OVN_Southbound,SB_Global,connections
>>>>>>>>>>> --unixctl=ovnsb_db.ctl
>>>>>>>>>>> --private-key=db:OVN_Southbound,SSL,private_key
>>>>>>>>>>> --certificate=db:OVN_Southbound,SSL,certificate
>>>>>>>>>>> --ca-cert=db:OVN_Southbound,SSL,ca_cert
>>>>>>>>>>> --ssl-protocols=db:OVN_Southbound,SSL,ssl_protocols
>>>>>>>>>>> --ssl-ciphers=db:OVN_Southbound,SSL,ssl_ciphers
>>>>>>>>>>> --remote=punix:/var/run/openvswitch/ovnsb_db.sock
>>>>>>>>>>> /etc/openvswitch/ovnsb_db.db
>>>>>>>>>>>
>>>>>>>>>>> # check-cluster is returning nothing
>>>>>>>>>>> ovsdb-tool check-cluster /etc/openvswitch/ovnsb_db.db
>>>>>>>>>>>
>>>>>>>>>>> # ovsdb-server-sb.log below shows the leader is elected with only
>>>>>>>>>>> one server, and there are rbac-related debug logs with rpc replies
>>>>>>>>>>> and empty params, with no errors:
>>>>>>>>>>>
>>>>>>>>>>> 2018-03-13T01:12:02Z|00002|raft|DBG|server 63d1 added to configuration
>>>>>>>>>>> 2018-03-13T01:12:02Z|00003|raft|INFO|term 6: starting election
>>>>>>>>>>> 2018-03-13T01:12:02Z|00004|raft|INFO|term 6: elected leader by 1+ of 1 servers
>>>>>>>>>>>
>>>>>>>>>>> Now starting the ovsdb-server on the other cluster nodes fails saying:
>>>>>>>>>>> ovsdb-server: ovsdb error: /etc/openvswitch/ovnsb_db.db: cannot
>>>>>>>>>>> identify file type
>>>>>>>>>>>
>>>>>>>>>>> Also noticed that man ovsdb-tool is missing cluster details.
>>>>>>>>>>> Might want to address it in the same patch or a different one.
>>>>>>>>>>>
>>>>>>>>>>> Please advise on what is missing here for running ovn-sbctl show,
>>>>>>>>>>> as this command hangs.
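
For reference, a rough sketch of the manual bootstrap sequence with a dedicated RAFT port, following Numan's point above about not reusing the client port (addresses and schema path are the ones from this thread; untested here):

# Node 1 creates the cluster, with RAFT traffic on 6644:
ovsdb-tool create-cluster /etc/openvswitch/ovnsb_db.db /root/ovs-reviews/ovn/ovn-sb.ovsschema tcp:10.99.152.148:6644
# Nodes 2 and 3 join it, each listing its own local RAFT address first and then an existing member:
ovsdb-tool join-cluster /etc/openvswitch/ovnsb_db.db OVN_Southbound tcp:10.99.152.138:6644 tcp:10.99.152.148:6644
ovsdb-tool join-cluster /etc/openvswitch/ovnsb_db.db OVN_Southbound tcp:10.99.152.101:6644 tcp:10.99.152.148:6644
# ovsdb-server is then started against the resulting db on each node, keeping 6642 as the normal client remote.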
>>>>>>>>>> I think you can use the ovn-ctl command "start_cluster_sb_ovsdb"
>>>>>>>>>> for your testing (at least for now).
>>>>>>>>>>
>>>>>>>>>> For your setup, I think you can start the cluster as:
>>>>>>>>>>
>>>>>>>>>> # Node 1
>>>>>>>>>> ovn-ctl --db-sb-addr=10.99.152.148 --db-sb-port=6642
>>>>>>>>>> --db-sb-create-insecure-remote=yes
>>>>>>>>>> --db-sb-cluster-local-addr="tcp:10.99.152.148:6644" start_cluster_sb_ovsdb
>>>>>>>>>>
>>>>>>>>>> # Node 2
>>>>>>>>>> ovn-ctl --db-sb-addr=10.99.152.138 --db-sb-port=6642
>>>>>>>>>> --db-sb-create-insecure-remote=yes
>>>>>>>>>> --db-sb-cluster-local-addr="tcp:10.99.152.138:6644"
>>>>>>>>>> --db-sb-cluster-remote-addr="tcp:10.99.152.148:6644" start_cluster_sb_ovsdb
>>>>>>>>>>
>>>>>>>>>> # Node 3
>>>>>>>>>> ovn-ctl --db-sb-addr=10.99.152.101 --db-sb-port=6642
>>>>>>>>>> --db-sb-create-insecure-remote=yes
>>>>>>>>>> --db-sb-cluster-local-addr="tcp:10.99.152.101:6644"
>>>>>>>>>> --db-sb-cluster-remote-addr="tcp:10.99.152.148:6644" start_cluster_sb_ovsdb
>>>>>>>>>>
>>>>>>>>>> Let me know how it goes.
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> Numan
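
Once all three start_cluster_sb_ovsdb instances above are up, a quick sanity check from any node could look like the following (commands already used elsewhere in this thread; exact output depends on the patch version):

ovsdb-tool db-local-address /etc/openvswitch/ovnsb_db.db
ovsdb-tool check-cluster /etc/openvswitch/ovnsb_db.db
ovn-sbctl --db="tcp:10.99.152.148:6642,tcp:10.99.152.138:6642,tcp:10.99.152.101:6642" show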
_______________________________________________
discuss mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
