Re: [ovs-discuss] raft ovsdb clustering

2018-04-04 Thread aginwala
Cool! Yup, makes sense for the sandbox northd to also point to the clustered
nb/sb dbs.

On Wed, Apr 4, 2018 at 4:01 PM, Ben Pfaff  wrote:

> Oh, I see, from reading further in the thread, that this was indeed a
> misunderstanding.  Well, in any case that new option to ovs-sandbox can
> be useful.
>
> On Wed, Apr 04, 2018 at 04:00:20PM -0700, Ben Pfaff wrote:
> > I would like to support cluster-wide locks.  They require extra work and
> > they require new OVSDB JSON-RPC protocol design (because locks are
> > currently per-server, not per-database).  I do not currently have a
> > schedule for designing and implementing them.
> >
> > However, I am surprised that this is an issue for northd.  For a
> > clustered database, ovn-northd always connects to the cluster leader.
> > There is at most one leader in the cluster at a given time, so as long
> > as ovn-northd obtains a lock on the leader, this should ensure that only
> > one ovn-northd is active at a time.  There could be brief races, in
> > which two ovn-northds believe that they have the lock, but they should
> > not persist.
> >
> > You see different behavior, so there is a bug or a misunderstanding.
> > I don't see the same misbehavior, though, when I do a similar test in
> > the sandbox.  If you apply the patches I just posted:
> > https://patchwork.ozlabs.org/patch/895184/
> > https://patchwork.ozlabs.org/patch/895185/
> > then you can try it out with:
> > make sandbox SANDBOXFLAGS='--ovn --sbdb-model=clustered
> --n-northds=3'
> >
> > On Wed, Mar 21, 2018 at 01:12:48PM -0700, aginwala wrote:
> > > :) The only thing is while using pacemaker, if the node that pacemaker is
> > > pointing to is down, all the active/standby northd nodes have to be updated
> > > to point to a new node from the cluster. But will dig in more to see what
> > > else I can find.
> > >
> > > @Ben: Any suggestions further?
> > >
> > >
> > > Regards,
> > >
> > > On Wed, Mar 21, 2018 at 10:22 AM, Han Zhou  wrote:
> > >
> > > >
> > > >
> > > > On Wed, Mar 21, 2018 at 9:49 AM, aginwala  wrote:
> > > >
> > > >> Thanks Numan:
> > > >>
> > > >> Yup, agree with the locking part. For now, yes, I am running northd on
> > > >> one node. I might write a script to monitor northd in the cluster so
> > > >> that if the node where it's running goes down, the script can spin up
> > > >> northd on one of the other active nodes as a dirty hack.
> > > >>
> > > >> The "dirty hack" is pacemaker :)
> > > >
> > > >
> > > >> Sure, will await input from Ben too on this and see how complex it
> > > >> would be to roll out this feature.
> > > >>
> > > >>
> > > >> Regards,
> > > >>
> > > >>
> > > >> On Wed, Mar 21, 2018 at 5:43 AM, Numan Siddique <
> nusid...@redhat.com>
> > > >> wrote:
> > > >>
> > > >>> Hi Aliasgar,
> > > >>>
> > > >>> ovsdb-server maintains locks per connection and not across the db.
> > > >>> A workaround for you now would be to configure all the ovn-northd
> > > >>> instances to connect to one ovsdb-server if you want to have
> > > >>> active/standby.
> > > >>>
> > > >>> Probably Ben can answer if there is a plan to support ovsdb locks
> > > >>> across the db. We also need this support in networking-ovn as it also
> > > >>> uses ovsdb locks.
> > > >>>
> > > >>> Thanks
> > > >>> Numan
> > > >>>
> > > >>>
> > > >>> On Wed, Mar 21, 2018 at 1:40 PM, aginwala 
> wrote:
> > > >>>
> > >  Hi Numan:
> > > 
> > >  Just figured out that ovn-northd is running as active on all 3 nodes
> > >  instead of one active instance as I continued to test further which
> > >  results in db errors as per logs.
> > > 
> > > 
> > >  # on node 3, I run ovn-nbctl ls-add ls2; it produces the below logs in
> > >  ovn-northd
> > >  2018-03-21T06:01:59.442Z|7|ovsdb_idl|WARN|transaction error:
> > >  {"details":"Transaction causes multiple rows in \"Datapath_Binding\" table
> > >  to have identical values (1) for index on column \"tunnel_key\".  First
> > >  row, with UUID 8c5d9342-2b90-4229-8ea1-001a733a915c, was inserted by
> > >  this transaction.  Second row, with UUID 8e06f919-4cc7-4ffc-9a79-20ce6663b683,
> > >  existed in the database before this transaction and was not modified by the
> > >  transaction.","error":"constraint violation"}
> > > 
> > >  In the southbound datapath list, 2 duplicate records get created for the
> > >  same switch.
> > > 
> > >  # ovn-sbctl list Datapath
> > >  _uuid   : b270ae30-3458-445f-95d2-b14e8ebddd01
> > >  external_ids: {logical-switch="4d6674e3-ff9f-4f38-b050-0fa9bec9e34d",
> > >  name="ls2"}
> > >  tunnel_key  : 2
> > > 
> > >  _uuid   : 8e06f919-4cc7-4ffc-9a79-20ce6663b683
> > >  external_ids: {logical-switch="4d6674e3-ff9f-4f38-b050-0fa9bec9e34d",
> > >  name="ls2"}
> > >  tunnel_key  : 1
> > > 
> 

Re: [ovs-discuss] raft ovsdb clustering

2018-04-04 Thread Ben Pfaff
Oh, I see, from reading further in the thread, that this was indeed a
misunderstanding.  Well, in any case that new option to ovs-sandbox can
be useful.

On Wed, Apr 04, 2018 at 04:00:20PM -0700, Ben Pfaff wrote:
> I would like to support cluster-wide locks.  They require extra work and
> they require new OVSDB JSON-RPC protocol design (because locks are
> currently per-server, not per-database).  I do not currently have a
> schedule for designing and implementing them.
> 
> However, I am surprised that this is an issue for northd.  For a
> clustered database, ovn-northd always connects to the cluster leader.
> There is at most one leader in the cluster at a given time, so as long
> as ovn-northd obtains a lock on the leader, this should ensure that only
> one ovn-northd is active at a time.  There could be brief races, in
> which two ovn-northds believe that they have the lock, but they should
> not persist.
> 
> You see different behavior, so there is a bug or a misunderstanding.
> I don't see the same misbehavior, though, when I do a similar test in
> the sandbox.  If you apply the patches I just posted:
> https://patchwork.ozlabs.org/patch/895184/
> https://patchwork.ozlabs.org/patch/895185/
> then you can try it out with:
> make sandbox SANDBOXFLAGS='--ovn --sbdb-model=clustered --n-northds=3'
> 
> On Wed, Mar 21, 2018 at 01:12:48PM -0700, aginwala wrote:
> > :) The only thing is while using pacemaker, if the node that pacemaker is
> > pointing to is down, all the active/standby northd nodes have to be updated
> > to point to a new node from the cluster. But will dig in more to see what else I can
> > find.
> > 
> > @Ben: Any suggestions further?
> > 
> > 
> > Regards,
> > 
> > On Wed, Mar 21, 2018 at 10:22 AM, Han Zhou  wrote:
> > 
> > >
> > >
> > > On Wed, Mar 21, 2018 at 9:49 AM, aginwala  wrote:
> > >
> > >> Thanks Numan:
> > >>
> > >> Yup, agree with the locking part. For now, yes, I am running northd on one
> > >> node. I might write a script to monitor northd in the cluster so that if the
> > >> node where it's running goes down, the script can spin up northd on one of
> > >> the other active nodes as a dirty hack.
> > >>
> > >> The "dirty hack" is pacemaker :)
> > >
> > >
> > >> Sure, will await input from Ben too on this and see how complex it
> > >> would be to roll out this feature.
> > >>
> > >>
> > >> Regards,
> > >>
> > >>
> > >> On Wed, Mar 21, 2018 at 5:43 AM, Numan Siddique 
> > >> wrote:
> > >>
> > >>> Hi Aliasgar,
> > >>>
> > >>> ovsdb-server maintains locks per connection and not across the db.
> > >>> A workaround for you now would be to configure all the ovn-northd
> > >>> instances to connect to one ovsdb-server if you want to have active/standby.
> > >>>
> > >>> Probably Ben can answer if there is a plan to support ovsdb locks across
> > >>> the db. We also need this support in networking-ovn as it also uses 
> > >>> ovsdb
> > >>> locks.
> > >>>
> > >>> Thanks
> > >>> Numan
> > >>>
> > >>>
> > >>> On Wed, Mar 21, 2018 at 1:40 PM, aginwala  wrote:
> > >>>
> >  Hi Numan:
> > 
> >  Just figured out that ovn-northd is running as active on all 3 nodes
> >  instead of one active instance as I continued to test further which 
> >  results
> >  in db errors as per logs.
> > 
> > 
> >  # on node 3, I run ovn-nbctl ls-add ls2; it produces the below logs in
> >  ovn-northd
> >  2018-03-21T06:01:59.442Z|7|ovsdb_idl|WARN|transaction error:
> >  {"details":"Transaction causes multiple rows in \"Datapath_Binding\" 
> >  table
> >  to have identical values (1) for index on column \"tunnel_key\".  First
> >  row, with UUID 8c5d9342-2b90-4229-8ea1-001a733a915c, was inserted by
> >  this transaction.  Second row, with UUID 
> >  8e06f919-4cc7-4ffc-9a79-20ce6663b683,
> >  existed in the database before this transaction and was not modified 
> >  by the
> >  transaction.","error":"constraint violation"}
> > 
> >  In the southbound datapath list, 2 duplicate records get created for the
> >  same switch.
> > 
> >  # ovn-sbctl list Datapath
> >  _uuid   : b270ae30-3458-445f-95d2-b14e8ebddd01
> >  external_ids: 
> >  {logical-switch="4d6674e3-ff9f-4f38-b050-0fa9bec9e34d",
> >  name="ls2"}
> >  tunnel_key  : 2
> > 
> >  _uuid   : 8e06f919-4cc7-4ffc-9a79-20ce6663b683
> >  external_ids: 
> >  {logical-switch="4d6674e3-ff9f-4f38-b050-0fa9bec9e34d",
> >  name="ls2"}
> >  tunnel_key  : 1
> > 
> > 
> > 
> >  # on nodes 1 and 2 where northd is running, it gives below error:
> >  2018-03-21T06:01:59.437Z|8|ovsdb_idl|WARN|transaction error:
> >  {"details":"cannot delete Datapath_Binding row
> >  8e06f919-4cc7-4ffc-9a79-20ce6663b683 because of 17 remaining
> >  

Re: [ovs-discuss] raft ovsdb clustering

2018-04-04 Thread Ben Pfaff
I would like to support cluster-wide locks.  They require extra work and
they require new OVSDB JSON-RPC protocol design (because locks are
currently per-server, not per-database).  I do not currently have a
schedule for designing and implementing them.

However, I am surprised that this is an issue for northd.  For a
clustered database, ovn-northd always connects to the cluster leader.
There is at most one leader in the cluster at a given time, so as long
as ovn-northd obtains a lock on the leader, this should ensure that only
one ovn-northd is active at a time.  There could be brief races, in
which two ovn-northds believe that they have the lock, but they should
not persist.
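
(As a quick way to confirm which server is the leader that ovn-northd takes the
lock on, any cluster member can be asked for its status.  A minimal sketch,
assuming the default ovn-ctl socket path for the clustered southbound db:

ovs-appctl -t /var/run/openvswitch/ovnsb_db.ctl cluster/status OVN_Southbound

The output includes the server's role and the current leader and term.)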

You see different behavior, so there is a bug or a misunderstanding.
I don't see the same misbehavior, though, when I do a similar test in
the sandbox.  If you apply the patches I just posted:
https://patchwork.ozlabs.org/patch/895184/
https://patchwork.ozlabs.org/patch/895185/
then you can try it out with:
make sandbox SANDBOXFLAGS='--ovn --sbdb-model=clustered --n-northds=3'

On Wed, Mar 21, 2018 at 01:12:48PM -0700, aginwala wrote:
> :) The only thing is while using pacemaker, if the node that pacemaker is
> pointing to is down, all the active/standby northd nodes have to be updated
> to point to a new node from the cluster. But will dig in more to see what else I can
> find.
> 
> @Ben: Any suggestions further?
> 
> 
> Regards,
> 
> On Wed, Mar 21, 2018 at 10:22 AM, Han Zhou  wrote:
> 
> >
> >
> > On Wed, Mar 21, 2018 at 9:49 AM, aginwala  wrote:
> >
> >> Thanks Numan:
> >>
> >> Yup, agree with the locking part. For now, yes, I am running northd on one
> >> node. I might write a script to monitor northd in the cluster so that if the
> >> node where it's running goes down, the script can spin up northd on one of
> >> the other active nodes as a dirty hack.
> >>
> >> The "dirty hack" is pacemaker :)
> >
> >
> >> Sure, will await input from Ben too on this and see how complex it
> >> would be to roll out this feature.
> >>
> >>
> >> Regards,
> >>
> >>
> >> On Wed, Mar 21, 2018 at 5:43 AM, Numan Siddique 
> >> wrote:
> >>
> >>> Hi Aliasgar,
> >>>
> >>> ovsdb-server maintains locks per connection and not across the db.
> >>> A workaround for you now would be to configure all the ovn-northd
> >>> instances to connect to one ovsdb-server if you want to have active/standby.
> >>>
> >>> Probably Ben can answer if there is a plan to support ovsdb locks across
> >>> the db. We also need this support in networking-ovn as it also uses ovsdb
> >>> locks.
> >>>
> >>> Thanks
> >>> Numan
> >>>
> >>>
> >>> On Wed, Mar 21, 2018 at 1:40 PM, aginwala  wrote:
> >>>
>  Hi Numan:
> 
>  Just figured out that ovn-northd is running as active on all 3 nodes
>  instead of one active instance as I continued to test further which 
>  results
>  in db errors as per logs.
> 
> 
>  # on node 3, I run ovn-nbctl ls-add ls2; it produces the below logs in
>  ovn-northd
>  2018-03-21T06:01:59.442Z|7|ovsdb_idl|WARN|transaction error:
>  {"details":"Transaction causes multiple rows in \"Datapath_Binding\" 
>  table
>  to have identical values (1) for index on column \"tunnel_key\".  First
>  row, with UUID 8c5d9342-2b90-4229-8ea1-001a733a915c, was inserted by
>  this transaction.  Second row, with UUID 
>  8e06f919-4cc7-4ffc-9a79-20ce6663b683,
>  existed in the database before this transaction and was not modified by 
>  the
>  transaction.","error":"constraint violation"}
> 
>  In the southbound datapath list, 2 duplicate records get created for the
>  same switch.
> 
>  # ovn-sbctl list Datapath
>  _uuid   : b270ae30-3458-445f-95d2-b14e8ebddd01
>  external_ids: 
>  {logical-switch="4d6674e3-ff9f-4f38-b050-0fa9bec9e34d",
>  name="ls2"}
>  tunnel_key  : 2
> 
>  _uuid   : 8e06f919-4cc7-4ffc-9a79-20ce6663b683
>  external_ids: 
>  {logical-switch="4d6674e3-ff9f-4f38-b050-0fa9bec9e34d",
>  name="ls2"}
>  tunnel_key  : 1
> 
> 
> 
>  # on nodes 1 and 2 where northd is running, it gives below error:
>  2018-03-21T06:01:59.437Z|8|ovsdb_idl|WARN|transaction error:
>  {"details":"cannot delete Datapath_Binding row
>  8e06f919-4cc7-4ffc-9a79-20ce6663b683 because of 17 remaining
>  reference(s)","error":"referential integrity violation"}
> 
>  As per commit message, for northd I re-tried setting --ovnnb-db="tcp:
>  10.169.125.152:6641,tcp:10.169.125.131:6641,tcp:10.148.181.162:6641"
>  and --ovnsb-db="tcp:10.169.125.152:6642,tcp:10.169.125.131:6642,tcp:
>  10.148.181.162:6642" and it did not help either.
> 
>  There is no issue if I keep running only one instance of northd on any
>  of these 3 nodes. Hence, wanted to know is there 

Re: [ovs-discuss] raft ovsdb clustering

2018-03-27 Thread aginwala
Sure:


#Node1

/usr/share/openvswitch/scripts/ovn-ctl  --db-nb-addr=192.168.220.101
--db-nb-port=6641 --db-nb-cluster-local-addr=tcp:192.168.220.101:6645
--db-nb-create-insecure-remote=yes start_nb_ovsdb

/usr/share/openvswitch/scripts/ovn-ctl  --db-sb-addr=192.168.220.101
--db-sb-port=6642 --db-sb-create-insecure-remote=yes
--db-sb-cluster-local-addr=tcp:192.168.10.220:6644 start_sb_ovsdb

ovn-northd -vconsole:emer -vsyslog:err -vfile:info \
    --ovnnb-db="tcp:192.168.220.101:6641,tcp:192.168.220.102:6641,tcp:192.168.220.103:6641" \
    --ovnsb-db="tcp:192.168.220.101:6642,tcp:192.168.220.102:6642,tcp:192.168.220.103:6642" \
    --no-chdir --log-file=/var/log/openvswitch/ovn-northd.log \
    --pidfile=/var/run/openvswitch/ovn-northd.pid --detach --monitor

#Node2

/usr/share/openvswitch/scripts/ovn-ctl  --db-nb-addr=192.168.220.102
--db-nb-port=6641 --db-nb-cluster-local-addr=tcp:192.168.220.102:6645
--db-nb-cluster-remote-addr="tcp:192.168.220.101:6645"
--db-nb-create-insecure-remote=yes start_nb_ovsdb

/usr/share/openvswitch/scripts/ovn-ctl  --db-sb-addr=192.168.220.102
--db-sb-port=6642 --db-sb-create-insecure-remote=yes
--db-sb-cluster-local-addr="tcp:192.168.220.102:6644"
--db-sb-cluster-remote-addr="tcp:192.168.220.101:6644"  start_sb_ovsdb

ovn-northd -vconsole:emer -vsyslog:err -vfile:info \
    --ovnnb-db="tcp:192.168.220.101:6641,tcp:192.168.220.102:6641,tcp:192.168.220.103:6641" \
    --ovnsb-db="tcp:192.168.220.101:6642,tcp:192.168.220.102:6642,tcp:192.168.220.103:6642" \
    --no-chdir --log-file=/var/log/openvswitch/ovn-northd.log \
    --pidfile=/var/run/openvswitch/ovn-northd.pid --detach --monitor


#Node3

/usr/share/openvswitch/scripts/ovn-ctl  --db-nb-addr=192.168.220.103
--db-nb-port=6641 --db-nb-cluster-local-addr=tcp:192.168.220.103:6645
--db-nb-cluster-remote-addr="tcp:192.168.220.101:6645"
--db-nb-create-insecure-remote=yes start_nb_ovsdb

/usr/share/openvswitch/scripts/ovn-ctl  --db-sb-addr=192.168.220.103
--db-sb-port=6642 --db-sb-create-insecure-remote=yes
--db-sb-cluster-local-addr="tcp:192.168.220.103:6644"
--db-sb-cluster-remote-addr="tcp:192.168.220.101:6644"  start_sb_ovsdb

ovn-northd -vconsole:emer -vsyslog:err -vfile:info \
    --ovnnb-db="tcp:192.168.220.101:6641,tcp:192.168.220.102:6641,tcp:192.168.220.103:6641" \
    --ovnsb-db="tcp:192.168.220.101:6642,tcp:192.168.220.102:6642,tcp:192.168.220.103:6642" \
    --no-chdir --log-file=/var/log/openvswitch/ovn-northd.log \
    --pidfile=/var/run/openvswitch/ovn-northd.pid --detach --monitor

# export remote="tcp:192.168.220.103:6641,tcp:192.168.220.102:6641,tcp:192.168.220.101:6641"

# ovn-nbctl show can be done using the command below:

ovn-nbctl --db=$remote show

# ovn-sbctl commands can be run as below:

ovn-sbctl --db=$remote show
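
# (Sketch only, assuming the default ovn-ctl socket paths: each member's view of
# the cluster, including its role and the current leader, can be checked with)

ovs-appctl -t /var/run/openvswitch/ovnnb_db.ctl cluster/status OVN_Northbound
ovs-appctl -t /var/run/openvswitch/ovnsb_db.ctl cluster/status OVN_Southbound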


Regards,



On Tue, Mar 27, 2018 at 12:08 PM, Numan Siddique 
wrote:

> Thanks Aliasgar,
>
> I am still facing the same issue.
>
> Can you also share the (ovn-ctl) commands you used to start/join the
> ovsdb-server clusters in your nodes ?
>
> Thanks
> Numan
>
>
> On Tue, Mar 27, 2018 at 11:04 PM, aginwala  wrote:
>
>> Hi Numan:
>>
>> You need to use --db since you are now running the db as a cluster; you can
>> access data from any of the three dbs.
>>
>> So if the leader crashes, a new leader is elected from the other two. Below
>> are example commands:
>>
>> # export remote="tcp:192.168.220.103:6641,tcp:192.168.220.102:6641,tcp:
>> 192.168.220.101:6641"
>> # kill -9 3985
>> # ovn-nbctl --db=$remote show
>> switch 1d86ab4e-c8bf-4747-a716-8832a285d58c (ls1)
>> # ovn-nbctl --db=$remote ls-del ls1
>>
>>
>>
>>
>>
>>
>>
>> Hope it helps!
>>
>> Regards,
>>
>>
>> On Tue, Mar 27, 2018 at 10:01 AM, Numan Siddique 
>> wrote:
>>
>>> Hi Aliasgar,
>>>
>>> In your setup, if you kill the leader what is the behaviour ?  Are you
>>> still able to create or delete any resources ? Is a new leader elected ?
>>>
>>> In my setup, the command "ovn-nbctl ls-add" for example blocks until I
>>> restart the ovsdb-server in node 1. And I don't see any other ovsdb-server
>>> becoming leader. May be I have configured wrongly.
>>> Could you please test this scenario if not yet please and let me know
>>> your observations if possible.
>>>
>>> Thanks
>>> Numan
>>>
>>>
>>> On Thu, Mar 22, 2018 at 12:28 PM, Han Zhou  wrote:
>>>
 Sounds good.

 Just checked the patch, by default the C IDL has "leader_only" as true,
 which ensures that connection is to leader only. This is the case for
 northd. So the lock works for northd hot active-standby purpose if all the
 ovsdb endpoints of a cluster are specified to northd, since all northds are
 connecting to the same DB, the leader.

 For neutron networking-ovn, this may not work yet, since I didn't see
 such logic in the python IDL in current patch series. It would be good if
 we add similar logic for python IDL. (@ben/numan, correct me if I am wrong)


 On Wed, Mar 21, 2018 at 6:49 PM, aginwala  wrote:

> 

Re: [ovs-discuss] raft ovsdb clustering

2018-03-27 Thread Numan Siddique
Thanks Aliasgar,

I am still facing the same issue.

Can you also share the (ovn-ctl) commands you used to start/join the
ovsdb-server clusters on your nodes?

Thanks
Numan


On Tue, Mar 27, 2018 at 11:04 PM, aginwala  wrote:

> Hi Numan:
>
> You need to use --db since you are now running the db as a cluster; you can
> access data from any of the three dbs.
>
> So if the leader crashes, a new leader is elected from the other two. Below
> are example commands:
>
> # export remote="tcp:192.168.220.103:6641,tcp:192.168.220.102:6641,tcp:
> 192.168.220.101:6641"
> # kill -9 3985
> # ovn-nbctl --db=$remote show
> switch 1d86ab4e-c8bf-4747-a716-8832a285d58c (ls1)
> # ovn-nbctl --db=$remote ls-del ls1
>
>
>
>
>
>
>
> Hope it helps!
>
> Regards,
>
>
> On Tue, Mar 27, 2018 at 10:01 AM, Numan Siddique 
> wrote:
>
>> Hi Aliasgar,
>>
>> In your setup, if you kill the leader what is the behaviour ?  Are you
>> still able to create or delete any resources ? Is a new leader elected ?
>>
>> In my setup, the command "ovn-nbctl ls-add" for example blocks until I
>> restart the ovsdb-server in node 1. And I don't see any other ovsdb-server
>> becoming leader. May be I have configured wrongly.
>> Could you please test this scenario if not yet please and let me know
>> your observations if possible.
>>
>> Thanks
>> Numan
>>
>>
>> On Thu, Mar 22, 2018 at 12:28 PM, Han Zhou  wrote:
>>
>>> Sounds good.
>>>
>>> Just checked the patch, by default the C IDL has "leader_only" as true,
>>> which ensures that connection is to leader only. This is the case for
>>> northd. So the lock works for northd hot active-standby purpose if all the
>>> ovsdb endpoints of a cluster are specified to northd, since all northds are
>>> connecting to the same DB, the leader.
>>>
>>> For neutron networking-ovn, this may not work yet, since I didn't see
>>> such logic in the python IDL in current patch series. It would be good if
>>> we add similar logic for python IDL. (@ben/numan, correct me if I am wrong)
>>>
>>>
>>> On Wed, Mar 21, 2018 at 6:49 PM, aginwala  wrote:
>>>
 Hi :

 Just sorted out the correct settings and northd also works in ha in
 raft.

 There were 2 issues in the setup:
 1. I had started nb db without --db-nb-create-insecure-remote
 2. I also started northd locally on all 3 without remote which is like
 all three northd trying to lock the ovsdb locally.

 Hence, the duplicate logs were populated in the southbound datapath due
 to multiple northd trying to write the local copy.

 So, I now start nb db with --db-nb-create-insecure-remote and northd on
 all 3 nodes using below command:

 ovn-northd -vconsole:emer -vsyslog:err -vfile:info --ovnnb-db="tcp:
 10.169.125.152:6641,tcp:10.169.125.131:6641,tcp:10.148.181.162:6641"
 --ovnsb-db="tcp:10.169.125.152:6642,tcp:10.169.125.131:6642,tcp:
 10.148.181.162:6642" --no-chdir 
 --log-file=/var/log/openvswitch/ovn-northd.log
 --pidfile=/var/run/openvswitch/ovn-northd.pid --detach --monitor


 #At start, northd went active on the leader node and standby on other
 two nodes.

 #After old leader crashed and new leader got elected, northd goes
 active on any of the remaining 2 nodes as per sample logs below from
 non-leader node:
 2018-03-22T00:20:30.732Z|00023|ovn_northd|INFO|ovn-northd lock lost.
 This ovn-northd instance is now on standby.
 2018-03-22T00:20:30.743Z|00024|ovn_northd|INFO|ovn-northd lock
 acquired. This ovn-northd instance is now active.

 # Also ovn-controller works similar way if leader goes down and
 connects to any of the remaining 2 nodes:
 2018-03-22T01:21:56.250Z|00029|ovsdb_idl|INFO|tcp:10.148.181.162:6642:
 clustered database server is disconnected from cluster; trying another
 server
 2018-03-22T01:21:56.250Z|00030|reconnect|INFO|tcp:10.148.181.162:6642:
 connection attempt timed out
 2018-03-22T01:21:56.250Z|00031|reconnect|INFO|tcp:10.148.181.162:6642:
 waiting 4 seconds before reconnect
 2018-03-22T01:23:52.417Z|00043|reconnect|INFO|tcp:10.148.181.162:6642:
 connected



 Above settings will also work if we put all the nodes behind the vip
 and updates the ovn configs to use vips. So we don't need pacemaker
 explicitly for northd HA :).

 Since the setup is complete now, I will populate the same in scale test
 env and see how it behaves.

 @Numan: We can try the same with networking-ovn integration and see if
 we find anything weird there too. Not sure if you have any exclusive
 findings for this case.

 Let me know if something else is missed here.




 Regards,

 On Wed, Mar 21, 2018 at 2:50 PM, Han Zhou  wrote:

> Ali, sorry if I misunderstand what you are saying, but pacemaker here
> is for northd HA. pacemaker 

Re: [ovs-discuss] raft ovsdb clustering

2018-03-27 Thread aginwala
Hi Numan:

You need to use --db since you are now running the db as a cluster; you can
access data from any of the three dbs.

So if the leader crashes, a new leader is elected from the other two. Below are
example commands:

# export remote="tcp:192.168.220.103:6641,tcp:192.168.220.102:6641,tcp:192.168.220.101:6641"
# kill -9 3985
# ovn-nbctl --db=$remote show
switch 1d86ab4e-c8bf-4747-a716-8832a285d58c (ls1)
# ovn-nbctl --db=$remote ls-del ls1
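
# (Sketch only: to confirm a new leader was actually elected after the kill,
# ask a surviving member for its status; assumes the default ovn-ctl socket path.)
ovs-appctl -t /var/run/openvswitch/ovnnb_db.ctl cluster/status OVN_Northbound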







Hope it helps!

Regards,


On Tue, Mar 27, 2018 at 10:01 AM, Numan Siddique 
wrote:

> Hi Aliasgar,
>
> In your setup, if you kill the leader what is the behaviour ?  Are you
> still able to create or delete any resources ? Is a new leader elected ?
>
> In my setup, the command "ovn-nbctl ls-add" for example blocks until I
> restart the ovsdb-server in node 1. And I don't see any other ovsdb-server
> becoming leader. May be I have configured wrongly.
> Could you please test this scenario if not yet please and let me know your
> observations if possible.
>
> Thanks
> Numan
>
>
> On Thu, Mar 22, 2018 at 12:28 PM, Han Zhou  wrote:
>
>> Sounds good.
>>
>> Just checked the patch, by default the C IDL has "leader_only" as true,
>> which ensures that connection is to leader only. This is the case for
>> northd. So the lock works for northd hot active-standby purpose if all the
>> ovsdb endpoints of a cluster are specified to northd, since all northds are
>> connecting to the same DB, the leader.
>>
>> For neutron networking-ovn, this may not work yet, since I didn't see
>> such logic in the python IDL in current patch series. It would be good if
>> we add similar logic for python IDL. (@ben/numan, correct me if I am wrong)
>>
>>
>> On Wed, Mar 21, 2018 at 6:49 PM, aginwala  wrote:
>>
>>> Hi :
>>>
> >>> Just sorted out the correct settings and northd also works in HA with raft.
>>>
>>> There were 2 issues in the setup:
>>> 1. I had started nb db without --db-nb-create-insecure-remote
>>> 2. I also started northd locally on all 3 without remote which is like
>>> all three northd trying to lock the ovsdb locally.
>>>
>>> Hence, the duplicate logs were populated in the southbound datapath due
>>> to multiple northd trying to write the local copy.
>>>
>>> So, I now start nb db with --db-nb-create-insecure-remote and northd on
>>> all 3 nodes using below command:
>>>
>>> ovn-northd -vconsole:emer -vsyslog:err -vfile:info --ovnnb-db="tcp:
>>> 10.169.125.152:6641,tcp:10.169.125.131:6641,tcp:10.148.181.162:6641"
>>> --ovnsb-db="tcp:10.169.125.152:6642,tcp:10.169.125.131:6642,tcp:
>>> 10.148.181.162:6642" --no-chdir 
>>> --log-file=/var/log/openvswitch/ovn-northd.log
>>> --pidfile=/var/run/openvswitch/ovn-northd.pid --detach --monitor
>>>
>>>
>>> #At start, northd went active on the leader node and standby on other
>>> two nodes.
>>>
>>> #After old leader crashed and new leader got elected, northd goes active
>>> on any of the remaining 2 nodes as per sample logs below from non-leader
>>> node:
>>> 2018-03-22T00:20:30.732Z|00023|ovn_northd|INFO|ovn-northd lock lost.
>>> This ovn-northd instance is now on standby.
>>> 2018-03-22T00:20:30.743Z|00024|ovn_northd|INFO|ovn-northd lock
>>> acquired. This ovn-northd instance is now active.
>>>
>>> # Also ovn-controller works similar way if leader goes down and connects
>>> to any of the remaining 2 nodes:
>>> 2018-03-22T01:21:56.250Z|00029|ovsdb_idl|INFO|tcp:10.148.181.162:6642:
>>> clustered database server is disconnected from cluster; trying another
>>> server
>>> 2018-03-22T01:21:56.250Z|00030|reconnect|INFO|tcp:10.148.181.162:6642:
>>> connection attempt timed out
>>> 2018-03-22T01:21:56.250Z|00031|reconnect|INFO|tcp:10.148.181.162:6642:
>>> waiting 4 seconds before reconnect
>>> 2018-03-22T01:23:52.417Z|00043|reconnect|INFO|tcp:10.148.181.162:6642:
>>> connected
>>>
>>>
>>>
>>> Above settings will also work if we put all the nodes behind the vip and
>>> updates the ovn configs to use vips. So we don't need pacemaker explicitly
>>> for northd HA :).
>>>
>>> Since the setup is complete now, I will populate the same in scale test
>>> env and see how it behaves.
>>>
>>> @Numan: We can try the same with networking-ovn integration and see if
>>> we find anything weird there too. Not sure if you have any exclusive
>>> findings for this case.
>>>
>>> Let me know if something else is missed here.
>>>
>>>
>>>
>>>
>>> Regards,
>>>
>>> On Wed, Mar 21, 2018 at 2:50 PM, Han Zhou  wrote:
>>>
 Ali, sorry if I misunderstand what you are saying, but pacemaker here
 is for northd HA. pacemaker itself won't point to any ovsdb cluster node.
 All northds can point to a LB VIP for the ovsdb cluster, so if a member of
 ovsdb cluster is down it won't have impact to northd.

 Without clustering support of the ovsdb lock, I think this is what we
 have now for northd HA. Please suggest if anyone has any other idea. Thanks
 :)

 On Wed, Mar 21, 2018 at 1:12 PM, 

Re: [ovs-discuss] raft ovsdb clustering

2018-03-27 Thread Numan Siddique
Hi Aliasgar,

In your setup, if you kill the leader, what is the behaviour? Are you
still able to create or delete any resources? Is a new leader elected?

In my setup, the command "ovn-nbctl ls-add", for example, blocks until I
restart the ovsdb-server on node 1. And I don't see any other ovsdb-server
becoming leader. Maybe I have configured it wrongly.
Could you please test this scenario, if you haven't yet, and let me know your
observations if possible.

Thanks
Numan


On Thu, Mar 22, 2018 at 12:28 PM, Han Zhou  wrote:

> Sounds good.
>
> Just checked the patch, by default the C IDL has "leader_only" as true,
> which ensures that connection is to leader only. This is the case for
> northd. So the lock works for northd hot active-standby purpose if all the
> ovsdb endpoints of a cluster are specified to northd, since all northds are
> connecting to the same DB, the leader.
>
> For neutron networking-ovn, this may not work yet, since I didn't see such
> logic in the python IDL in current patch series. It would be good if we add
> similar logic for python IDL. (@ben/numan, correct me if I am wrong)
>
>
> On Wed, Mar 21, 2018 at 6:49 PM, aginwala  wrote:
>
>> Hi :
>>
>> Just sorted out the correct settings and northd also works in HA with raft.
>>
>> There were 2 issues in the setup:
>> 1. I had started nb db without --db-nb-create-insecure-remote
>> 2. I also started northd locally on all 3 without remote which is like
>> all three northd trying to lock the ovsdb locally.
>>
>> Hence, the duplicate logs were populated in the southbound datapath due
>> to multiple northd trying to write the local copy.
>>
>> So, I now start nb db with --db-nb-create-insecure-remote and northd on
>> all 3 nodes using below command:
>>
>> ovn-northd -vconsole:emer -vsyslog:err -vfile:info --ovnnb-db="tcp:
>> 10.169.125.152:6641,tcp:10.169.125.131:6641,tcp:10.148.181.162:6641"
>> --ovnsb-db="tcp:10.169.125.152:6642,tcp:10.169.125.131:6642,tcp:
>> 10.148.181.162:6642" --no-chdir 
>> --log-file=/var/log/openvswitch/ovn-northd.log
>> --pidfile=/var/run/openvswitch/ovn-northd.pid --detach --monitor
>>
>>
>> #At start, northd went active on the leader node and standby on other two
>> nodes.
>>
>> #After old leader crashed and new leader got elected, northd goes active
>> on any of the remaining 2 nodes as per sample logs below from non-leader
>> node:
>> 2018-03-22T00:20:30.732Z|00023|ovn_northd|INFO|ovn-northd lock lost.
>> This ovn-northd instance is now on standby.
>> 2018-03-22T00:20:30.743Z|00024|ovn_northd|INFO|ovn-northd lock acquired.
>> This ovn-northd instance is now active.
>>
>> # Also ovn-controller works similar way if leader goes down and connects
>> to any of the remaining 2 nodes:
>> 2018-03-22T01:21:56.250Z|00029|ovsdb_idl|INFO|tcp:10.148.181.162:6642:
>> clustered database server is disconnected from cluster; trying another
>> server
>> 2018-03-22T01:21:56.250Z|00030|reconnect|INFO|tcp:10.148.181.162:6642:
>> connection attempt timed out
>> 2018-03-22T01:21:56.250Z|00031|reconnect|INFO|tcp:10.148.181.162:6642:
>> waiting 4 seconds before reconnect
>> 2018-03-22T01:23:52.417Z|00043|reconnect|INFO|tcp:10.148.181.162:6642:
>> connected
>>
>>
>>
>> Above settings will also work if we put all the nodes behind the vip and
>> updates the ovn configs to use vips. So we don't need pacemaker explicitly
>> for northd HA :).
>>
>> Since the setup is complete now, I will populate the same in scale test
>> env and see how it behaves.
>>
>> @Numan: We can try the same with networking-ovn integration and see if we
>> find anything weird there too. Not sure if you have any exclusive findings
>> for this case.
>>
>> Let me know if something else is missed here.
>>
>>
>>
>>
>> Regards,
>>
>> On Wed, Mar 21, 2018 at 2:50 PM, Han Zhou  wrote:
>>
>>> Ali, sorry if I misunderstand what you are saying, but pacemaker here is
>>> for northd HA. pacemaker itself won't point to any ovsdb cluster node. All
>>> northds can point to a LB VIP for the ovsdb cluster, so if a member of
>>> ovsdb cluster is down it won't have impact to northd.
>>>
>>> Without clustering support of the ovsdb lock, I think this is what we
>>> have now for northd HA. Please suggest if anyone has any other idea. Thanks
>>> :)
>>>
>>> On Wed, Mar 21, 2018 at 1:12 PM, aginwala  wrote:
>>>
 :) The only thing is while using pacemaker, if the node that pacemaker
 is pointing to is down, all the active/standby northd nodes have to be
 updated to point to a new node from the cluster. But will dig in more to see what else
 I can find.

 @Ben: Any suggestions further?


 Regards,

 On Wed, Mar 21, 2018 at 10:22 AM, Han Zhou  wrote:

>
>
> On Wed, Mar 21, 2018 at 9:49 AM, aginwala  wrote:
>
>> Thanks Numan:
>>
>> Yup, agree with the locking part. For now, yes, I am running northd on
>> one node. I might write 

Re: [ovs-discuss] raft ovsdb clustering

2018-03-22 Thread Han Zhou
Sounds good.

Just checked the patch: by default the C IDL has "leader_only" set to true,
which ensures that the connection goes to the leader only. This is the case for
northd. So the lock works for northd's active/standby (hot-standby) purpose if
all the ovsdb endpoints of a cluster are specified to northd, since all northds
then connect to the same DB server, the leader.

For neutron networking-ovn, this may not work yet, since I didn't see such
logic in the python IDL in the current patch series. It would be good if we add
similar logic to the python IDL. (@ben/numan, correct me if I am wrong)
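
(As a minimal sketch of what that looks like from a client, assuming the
--leader-only/--no-leader-only options from this patch series are present in
the build and using placeholder addresses:

ovn-nbctl --db="tcp:192.168.220.101:6641,tcp:192.168.220.102:6641,tcp:192.168.220.103:6641" --leader-only show
ovn-nbctl --db="tcp:192.168.220.101:6641,tcp:192.168.220.102:6641,tcp:192.168.220.103:6641" --no-leader-only show

The first insists on talking to the leader, as the C IDL does for northd by
default; the second also allows a follower connection.)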

On Wed, Mar 21, 2018 at 6:49 PM, aginwala  wrote:

> Hi :
>
> Just sorted out the correct settings and northd also works in HA with raft.
>
> There were 2 issues in the setup:
> 1. I had started nb db without --db-nb-create-insecure-remote
> 2. I also started northd locally on all 3 without remote which is like all
> three northd trying to lock the ovsdb locally.
>
> Hence, the duplicate logs were populated in the southbound datapath due to
> multiple northd trying to write the local copy.
>
> So, I now start nb db with --db-nb-create-insecure-remote and northd on
> all 3 nodes using below command:
>
> ovn-northd -vconsole:emer -vsyslog:err -vfile:info --ovnnb-db="tcp:
> 10.169.125.152:6641,tcp:10.169.125.131:6641,tcp:10.148.181.162:6641"
> --ovnsb-db="tcp:10.169.125.152:6642,tcp:10.169.125.131:6642,tcp:
> 10.148.181.162:6642" --no-chdir --log-file=/var/log/openvswitch/ovn-northd.log
> --pidfile=/var/run/openvswitch/ovn-northd.pid --detach --monitor
>
>
> #At start, northd went active on the leader node and standby on other two
> nodes.
>
> #After old leader crashed and new leader got elected, northd goes active
> on any of the remaining 2 nodes as per sample logs below from non-leader
> node:
> 2018-03-22T00:20:30.732Z|00023|ovn_northd|INFO|ovn-northd lock lost. This
> ovn-northd instance is now on standby.
> 2018-03-22T00:20:30.743Z|00024|ovn_northd|INFO|ovn-northd lock acquired.
> This ovn-northd instance is now active.
>
> # Also ovn-controller works similar way if leader goes down and connects
> to any of the remaining 2 nodes:
> 2018-03-22T01:21:56.250Z|00029|ovsdb_idl|INFO|tcp:10.148.181.162:6642:
> clustered database server is disconnected from cluster; trying another
> server
> 2018-03-22T01:21:56.250Z|00030|reconnect|INFO|tcp:10.148.181.162:6642:
> connection attempt timed out
> 2018-03-22T01:21:56.250Z|00031|reconnect|INFO|tcp:10.148.181.162:6642:
> waiting 4 seconds before reconnect
> 2018-03-22T01:23:52.417Z|00043|reconnect|INFO|tcp:10.148.181.162:6642:
> connected
>
>
>
> Above settings will also work if we put all the nodes behind the vip and
> updates the ovn configs to use vips. So we don't need pacemaker explicitly
> for northd HA :).
>
> Since the setup is complete now, I will populate the same in scale test
> env and see how it behaves.
>
> @Numan: We can try the same with networking-ovn integration and see if we
> find anything weird there too. Not sure if you have any exclusive findings
> for this case.
>
> Let me know if something else is missed here.
>
>
>
>
> Regards,
>
> On Wed, Mar 21, 2018 at 2:50 PM, Han Zhou  wrote:
>
>> Ali, sorry if I misunderstand what you are saying, but pacemaker here is
>> for northd HA. pacemaker itself won't point to any ovsdb cluster node. All
>> northds can point to a LB VIP for the ovsdb cluster, so if a member of
>> ovsdb cluster is down it won't have impact to northd.
>>
>> Without clustering support of the ovsdb lock, I think this is what we
>> have now for northd HA. Please suggest if anyone has any other idea. Thanks
>> :)
>>
>> On Wed, Mar 21, 2018 at 1:12 PM, aginwala  wrote:
>>
>>> :) The only thing is while using pacemaker, if the node that pacemaker
>>> is pointing to is down, all the active/standby northd nodes have to be
>>> updated to point to a new node from the cluster. But will dig in more to see what else
>>> I can find.
>>>
>>> @Ben: Any suggestions further?
>>>
>>>
>>> Regards,
>>>
>>> On Wed, Mar 21, 2018 at 10:22 AM, Han Zhou  wrote:
>>>


 On Wed, Mar 21, 2018 at 9:49 AM, aginwala  wrote:

> Thanks Numan:
>
> Yup, agree with the locking part. For now, yes, I am running northd on
> one node. I might write a script to monitor northd in the cluster so that if
> the node where it's running goes down, the script can spin up northd on one
> of the other active nodes as a dirty hack.
>
> The "dirty hack" is pacemaker :)


> Sure, will await input from Ben too on this and see how
> complex it would be to roll out this feature.
>
>
> Regards,
>
>
> On Wed, Mar 21, 2018 at 5:43 AM, Numan Siddique 
> wrote:
>
>> Hi Aliasgar,
>>
>> ovsdb-server maintains locks per connection and not across the
>> db. A workaround for you now would be to configure all the ovn-northd
>> 

Re: [ovs-discuss] raft ovsdb clustering

2018-03-21 Thread aginwala
Hi :

Just sorted out the correct settings and northd also works in HA with raft.

There were 2 issues in the setup:
1. I had started the nb db without --db-nb-create-insecure-remote
2. I also started northd locally on all 3 without a remote, which means all
three northds were trying to take the ovsdb lock locally.

Hence, the duplicate records were populated in the southbound datapath due to
multiple northds writing to their local copies.

So, I now start the nb db with --db-nb-create-insecure-remote and northd on all
3 nodes using the below command:

ovn-northd -vconsole:emer -vsyslog:err -vfile:info \
    --ovnnb-db="tcp:10.169.125.152:6641,tcp:10.169.125.131:6641,tcp:10.148.181.162:6641" \
    --ovnsb-db="tcp:10.169.125.152:6642,tcp:10.169.125.131:6642,tcp:10.148.181.162:6642" \
    --no-chdir --log-file=/var/log/openvswitch/ovn-northd.log \
    --pidfile=/var/run/openvswitch/ovn-northd.pid --detach --monitor
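
# (For completeness, a sketch of how the clustered nb db itself was brought up
# with the remote enabled, along the lines of the ovn-ctl invocations shared
# elsewhere in this thread; the cluster-local-addr port below is an assumption.)
/usr/share/openvswitch/scripts/ovn-ctl --db-nb-addr=10.169.125.152 \
    --db-nb-port=6641 --db-nb-cluster-local-addr=tcp:10.169.125.152:6645 \
    --db-nb-create-insecure-remote=yes start_nb_ovsdb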


# At start, northd went active on the leader node and standby on the other two
nodes.

# After the old leader crashed and a new leader got elected, northd goes active
on one of the remaining 2 nodes, as per the sample logs below from a non-leader node:
2018-03-22T00:20:30.732Z|00023|ovn_northd|INFO|ovn-northd lock lost. This
ovn-northd instance is now on standby.
2018-03-22T00:20:30.743Z|00024|ovn_northd|INFO|ovn-northd lock acquired.
This ovn-northd instance is now active.

# Also, ovn-controller works in a similar way if the leader goes down, and
connects to one of the remaining 2 nodes:
2018-03-22T01:21:56.250Z|00029|ovsdb_idl|INFO|tcp:10.148.181.162:6642:
clustered database server is disconnected from cluster; trying another
server
2018-03-22T01:21:56.250Z|00030|reconnect|INFO|tcp:10.148.181.162:6642:
connection attempt timed out
2018-03-22T01:21:56.250Z|00031|reconnect|INFO|tcp:10.148.181.162:6642:
waiting 4 seconds before reconnect
2018-03-22T01:23:52.417Z|00043|reconnect|INFO|tcp:10.148.181.162:6642:
connected
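
# (For reference, the chassis side was pointed at all three SB members via
# ovn-remote, as mentioned elsewhere in this thread; a sketch of that setting:)
ovs-vsctl set open_vswitch . external-ids:ovn-remote="tcp:10.169.125.152:6642,tcp:10.169.125.131:6642,tcp:10.148.181.162:6642"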



The above settings will also work if we put all the nodes behind a VIP and
update the OVN configs to use the VIP. So we don't need pacemaker explicitly
for northd HA :).
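
(Purely an illustrative sketch with a hypothetical VIP, say 10.169.125.200,
fronting the three servers; not something configured in this setup:

ovs-vsctl set open_vswitch . external-ids:ovn-remote="tcp:10.169.125.200:6642"

Each client then only needs the single VIP address rather than the member list.)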

Since the setup is complete now, I will roll out the same in the scale test env
and see how it behaves.

@Numan: We can try the same with networking-ovn integration and see if we
find anything weird there too. Not sure if you have any exclusive findings
for this case.

Let me know if something else is missed here.




Regards,

On Wed, Mar 21, 2018 at 2:50 PM, Han Zhou  wrote:

> Ali, sorry if I misunderstand what you are saying, but pacemaker here is
> for northd HA. pacemaker itself won't point to any ovsdb cluster node. All
> northds can point to a LB VIP for the ovsdb cluster, so if a member of
> ovsdb cluster is down it won't have impact to northd.
>
> Without clustering support of the ovsdb lock, I think this is what we have
> now for northd HA. Please suggest if anyone has any other idea. Thanks :)
>
> On Wed, Mar 21, 2018 at 1:12 PM, aginwala  wrote:
>
>> :) The only thing is while using pacemaker, if the node that pacemaker is
>> pointing to is down, all the active/standby northd nodes have to be updated
>> to point to a new node from the cluster. But will dig in more to see what else I can
>> find.
>>
>> @Ben: Any suggestions further?
>>
>>
>> Regards,
>>
>> On Wed, Mar 21, 2018 at 10:22 AM, Han Zhou  wrote:
>>
>>>
>>>
>>> On Wed, Mar 21, 2018 at 9:49 AM, aginwala  wrote:
>>>
 Thanks Numan:

 Yup, agree with the locking part. For now, yes, I am running northd on
 one node. I might write a script to monitor northd in the cluster so that if
 the node where it's running goes down, the script can spin up northd on one
 of the other active nodes as a dirty hack.

 The "dirty hack" is pacemaker :)
>>>
>>>
 Sure, will await for the inputs from Ben too on this and see how
 complex would it be to roll out this feature.


 Regards,


 On Wed, Mar 21, 2018 at 5:43 AM, Numan Siddique 
 wrote:

> Hi Aliasgar,
>
> ovsdb-server maintains locks per connection and not across the
> db. A workaround for you now would be to configure all the ovn-northd
> instances to connect to one ovsdb-server if you want to have 
> active/standby.
>
> Probably Ben can answer if there is a plan to support ovsdb locks
> across the db. We also need this support in networking-ovn as it also uses
> ovsdb locks.
>
> Thanks
> Numan
>
>
> On Wed, Mar 21, 2018 at 1:40 PM, aginwala  wrote:
>
>> Hi Numan:
>>
>> Just figured out that ovn-northd is running as active on all 3 nodes
>> instead of one active instance as I continued to test further which 
>> results
>> in db errors as per logs.
>>
>>
>> # on node 3, I run ovn-nbctl ls-add ls2; it produces the below logs in
>> ovn-northd
>> 2018-03-21T06:01:59.442Z|7|ovsdb_idl|WARN|transaction error:
>> 

Re: [ovs-discuss] raft ovsdb clustering

2018-03-21 Thread Han Zhou
Ali, sorry if I misunderstand what you are saying, but pacemaker here is
for northd HA. pacemaker itself won't point to any ovsdb cluster node. All
northds can point to an LB VIP for the ovsdb cluster, so if a member of the
ovsdb cluster is down it won't have an impact on northd.

Without clustering support for the ovsdb lock, I think this is what we have
now for northd HA. Please suggest if anyone has any other idea. Thanks :)
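
(A minimal sketch of that, with a hypothetical LB VIP of 192.168.220.200 in
front of the ovsdb cluster; the VIP and addresses are assumptions:

ovn-northd -vconsole:emer -vsyslog:err -vfile:info \
    --ovnnb-db="tcp:192.168.220.200:6641" --ovnsb-db="tcp:192.168.220.200:6642" \
    --no-chdir --log-file=/var/log/openvswitch/ovn-northd.log \
    --pidfile=/var/run/openvswitch/ovn-northd.pid --detach --monitor

If a cluster member fails, the LB steers the connection to a surviving member
and northd itself needs no reconfiguration.)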

On Wed, Mar 21, 2018 at 1:12 PM, aginwala  wrote:

> :) The only thing is while using pacemaker, if the node that pacemaker is
> pointing to is down, all the active/standby northd nodes have to be updated
> to point to a new node from the cluster. But will dig in more to see what else I can
> find.
>
> @Ben: Any suggestions further?
>
>
> Regards,
>
> On Wed, Mar 21, 2018 at 10:22 AM, Han Zhou  wrote:
>
>>
>>
>> On Wed, Mar 21, 2018 at 9:49 AM, aginwala  wrote:
>>
>>> Thanks Numan:
>>>
>>> Yup, agree with the locking part. For now, yes, I am running northd on one
>>> node. I might write a script to monitor northd in the cluster so that if the
>>> node where it's running goes down, the script can spin up northd on one of
>>> the other active nodes as a dirty hack.
>>>
>>> The "dirty hack" is pacemaker :)
>>
>>
>>> Sure, will await input from Ben too on this and see how complex it
>>> would be to roll out this feature.
>>>
>>>
>>> Regards,
>>>
>>>
>>> On Wed, Mar 21, 2018 at 5:43 AM, Numan Siddique 
>>> wrote:
>>>
 Hi Aliasgar,

 ovsdb-server maintains locks per connection and not across the db.
 A workaround for you now would be to configure all the ovn-northd instances
 to connect to one ovsdb-server if you want to have active/standby.

 Probably Ben can answer if there is a plan to support ovsdb locks
 across the db. We also need this support in networking-ovn as it also uses
 ovsdb locks.

 Thanks
 Numan


 On Wed, Mar 21, 2018 at 1:40 PM, aginwala  wrote:

> Hi Numan:
>
> Just figured out that ovn-northd is running as active on all 3 nodes
> instead of one active instance as I continued to test further which 
> results
> in db errors as per logs.
>
>
> # on node 3, I run ovn-nbctl ls-add ls2; it produces the below logs in
> ovn-northd
> 2018-03-21T06:01:59.442Z|7|ovsdb_idl|WARN|transaction error:
> {"details":"Transaction causes multiple rows in \"Datapath_Binding\" table
> to have identical values (1) for index on column \"tunnel_key\".  First
> row, with UUID 8c5d9342-2b90-4229-8ea1-001a733a915c, was inserted by
> this transaction.  Second row, with UUID 
> 8e06f919-4cc7-4ffc-9a79-20ce6663b683,
> existed in the database before this transaction and was not modified by 
> the
> transaction.","error":"constraint violation"}
>
> In the southbound datapath list, 2 duplicate records get created for the
> same switch.
>
> # ovn-sbctl list Datapath
> _uuid   : b270ae30-3458-445f-95d2-b14e8ebddd01
> external_ids: 
> {logical-switch="4d6674e3-ff9f-4f38-b050-0fa9bec9e34d",
> name="ls2"}
> tunnel_key  : 2
>
> _uuid   : 8e06f919-4cc7-4ffc-9a79-20ce6663b683
> external_ids: 
> {logical-switch="4d6674e3-ff9f-4f38-b050-0fa9bec9e34d",
> name="ls2"}
> tunnel_key  : 1
>
>
>
> # on nodes 1 and 2 where northd is running, it gives below error:
> 2018-03-21T06:01:59.437Z|8|ovsdb_idl|WARN|transaction error:
> {"details":"cannot delete Datapath_Binding row
> 8e06f919-4cc7-4ffc-9a79-20ce6663b683 because of 17 remaining
> reference(s)","error":"referential integrity violation"}
>
> As per commit message, for northd I re-tried setting --ovnnb-db="tcp:
> 10.169.125.152:6641,tcp:10.169.125.131:6641,tcp:10.148.181.162:6641"
> and --ovnsb-db="tcp:10.169.125.152:6642,tcp:10.169.125.131:6642,tcp:
> 10.148.181.162:6642" and it did not help either.
>
> There is no issue if I keep running only one instance of northd on any
> of these 3 nodes. Hence, wanted to know whether there is something else
> missing here to make only one northd instance active and the rest
> standby?
>
>
> Regards,
>
> On Thu, Mar 15, 2018 at 3:09 AM, Numan Siddique 
> wrote:
>
>> That's great
>>
>> Numan
>>
>>
>> On Thu, Mar 15, 2018 at 2:57 AM, aginwala  wrote:
>>
>>> Hi Numan:
>>>
>>> I tried on new nodes (kernel : 4.4.0-104-generic , Ubuntu 16.04)with
>>> fresh installation and it worked super fine for both sb and nb dbs. 
>>> Seems
>>> like some kernel issue on the previous nodes when I re-installed raft 
>>> patch
>>> as I was running different ovs version on those nodes before.
>>>
>>>
>>> For 2 HVs, I now set 

Re: [ovs-discuss] raft ovsdb clustering

2018-03-21 Thread aginwala
:) The only thing is while using pacemaker, if the node that pacemaker is
pointing to is down, all the active/standby northd nodes have to be updated
to point to a new node from the cluster. But will dig in more to see what else I can
find.

@Ben: Any suggestions further?


Regards,

On Wed, Mar 21, 2018 at 10:22 AM, Han Zhou  wrote:

>
>
> On Wed, Mar 21, 2018 at 9:49 AM, aginwala  wrote:
>
>> Thanks Numan:
>>
>> Yup, agree with the locking part. For now, yes, I am running northd on one
>> node. I might write a script to monitor northd in the cluster so that if the
>> node where it's running goes down, the script can spin up northd on one of
>> the other active nodes as a dirty hack.
>>
>> The "dirty hack" is pacemaker :)
>
>
>> Sure, will await for the inputs from Ben too on this and see how complex
>> would it be to roll out this feature.
>>
>>
>> Regards,
>>
>>
>> On Wed, Mar 21, 2018 at 5:43 AM, Numan Siddique 
>> wrote:
>>
>>> Hi Aliasgar,
>>>
>>> ovsdb-server maintains locks per connection and not across the db.
>>> A workaround for you now would be to configure all the ovn-northd instances
>>> to connect to one ovsdb-server if you want to have active/standby.
>>>
>>> Probably Ben can answer if there is a plan to support ovsdb locks across
>>> the db. We also need this support in networking-ovn as it also uses ovsdb
>>> locks.
>>>
>>> Thanks
>>> Numan
>>>
>>>
>>> On Wed, Mar 21, 2018 at 1:40 PM, aginwala  wrote:
>>>
 Hi Numan:

 Just figured out that ovn-northd is running as active on all 3 nodes
 instead of one active instance as I continued to test further which results
 in db errors as per logs.


 # on node 3, I run ovn-nbctl ls-add ls2 ; it populates below logs in
 ovn-north
 2018-03-21T06:01:59.442Z|7|ovsdb_idl|WARN|transaction error:
 {"details":"Transaction causes multiple rows in \"Datapath_Binding\" table
 to have identical values (1) for index on column \"tunnel_key\".  First
 row, with UUID 8c5d9342-2b90-4229-8ea1-001a733a915c, was inserted by
 this transaction.  Second row, with UUID 
 8e06f919-4cc7-4ffc-9a79-20ce6663b683,
 existed in the database before this transaction and was not modified by the
 transaction.","error":"constraint violation"}

 In southbound datapath list, 2 duplicate records gets created for same
 switch.

 # ovn-sbctl list Datapath
 _uuid   : b270ae30-3458-445f-95d2-b14e8ebddd01
 external_ids: 
 {logical-switch="4d6674e3-ff9f-4f38-b050-0fa9bec9e34d",
 name="ls2"}
 tunnel_key  : 2

 _uuid   : 8e06f919-4cc7-4ffc-9a79-20ce6663b683
 external_ids: 
 {logical-switch="4d6674e3-ff9f-4f38-b050-0fa9bec9e34d",
 name="ls2"}
 tunnel_key  : 1



 # on nodes 1 and 2 where northd is running, it gives below error:
 2018-03-21T06:01:59.437Z|8|ovsdb_idl|WARN|transaction error:
 {"details":"cannot delete Datapath_Binding row
 8e06f919-4cc7-4ffc-9a79-20ce6663b683 because of 17 remaining
 reference(s)","error":"referential integrity violation"}

 As per commit message, for northd I re-tried setting --ovnnb-db="tcp:
 10.169.125.152:6641,tcp:10.169.125.131:6641,tcp:10.148.181.162:6641"
 and --ovnsb-db="tcp:10.169.125.152:6642,tcp:10.169.125.131:6642,tcp:
 10.148.181.162:6642" and it did not help either.

 There is no issue if I keep running only one instance of northd on any
 of these 3 nodes. Hence, wanted to know is there something else
 missing here to make only one northd instance as active and rest as
 standby?


 Regards,

 On Thu, Mar 15, 2018 at 3:09 AM, Numan Siddique 
 wrote:

> That's great
>
> Numan
>
>
> On Thu, Mar 15, 2018 at 2:57 AM, aginwala  wrote:
>
>> Hi Numan:
>>
>> I tried on new nodes (kernel : 4.4.0-104-generic , Ubuntu 16.04)with
>> fresh installation and it worked super fine for both sb and nb dbs. Seems
>> like some kernel issue on the previous nodes when I re-installed raft 
>> patch
>> as I was running different ovs version on those nodes before.
>>
>>
>> For 2 HVs, I now set ovn-remote="tcp:10.169.125.152:6642, tcp:
>> 10.169.125.131:6642, tcp:10.148.181.162:6642"  and started
>> controller and it works super fine.
>>
>>
>> Did some failover testing by rebooting/killing the leader (
>> 10.169.125.152) and bringing it back up and it works as expected.
>> Nothing weird noted so far.
>>
>> # check-cluster gives the below data on one of the nodes (10.148.181.162) post
>> leader failure
>>
>> ovsdb-tool check-cluster /etc/openvswitch/ovnsb_db.db
>> ovsdb-tool: leader /etc/openvswitch/ovnsb_db.db for term 2 has log
>> entries only up to index 18446744073709551615, but 

Re: [ovs-discuss] raft ovsdb clustering

2018-03-21 Thread aginwala
Thanks Numan:

Yup, agree with the locking part. For now, yes, I am running northd on one
node. I might write a script to monitor northd in the cluster so that if the
node where it's running goes down, the script can spin up northd on one of the
other active nodes as a dirty hack.
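
(Purely an illustrative sketch of such a watchdog, not an endorsed approach;
the standby address and ssh-based failover below are assumptions.)

#!/bin/sh
# Naive northd watchdog sketch: if the local ovn-northd dies, start one on a
# standby node.  Assumes passwordless ssh to $STANDBY; all names are placeholders.
STANDBY=192.168.220.102
while true; do
    if ! pidof ovn-northd >/dev/null; then
        ssh "$STANDBY" "/usr/share/openvswitch/scripts/ovn-ctl start_northd"
        break
    fi
    sleep 5
done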

Sure, will await input from Ben too on this and see how complex it
would be to roll out this feature.


Regards,


On Wed, Mar 21, 2018 at 5:43 AM, Numan Siddique  wrote:

> Hi Aliasgar,
>
> ovsdb-server maintains locks per connection and not across the db. A
> workaround for you now would be to configure all the ovn-northd instances
> to connect to one ovsdb-server if you want to have active/standby.
>
> Probably Ben can answer if there is a plan to support ovsdb locks across
> the db. We also need this support in networking-ovn as it also uses ovsdb
> locks.
>
> Thanks
> Numan
>
>
> On Wed, Mar 21, 2018 at 1:40 PM, aginwala  wrote:
>
>> Hi Numan:
>>
>> Just figured out that ovn-northd is running as active on all 3 nodes
>> instead of one active instance as I continued to test further which results
>> in db errors as per logs.
>>
>>
>> # on node 3, I run ovn-nbctl ls-add ls2; it produces the below logs in
>> ovn-northd
>> 2018-03-21T06:01:59.442Z|7|ovsdb_idl|WARN|transaction error:
>> {"details":"Transaction causes multiple rows in \"Datapath_Binding\" table
>> to have identical values (1) for index on column \"tunnel_key\".  First
>> row, with UUID 8c5d9342-2b90-4229-8ea1-001a733a915c, was inserted by
>> this transaction.  Second row, with UUID 
>> 8e06f919-4cc7-4ffc-9a79-20ce6663b683,
>> existed in the database before this transaction and was not modified by the
>> transaction.","error":"constraint violation"}
>>
>> In the southbound datapath list, 2 duplicate records get created for the
>> same switch.
>>
>> # ovn-sbctl list Datapath
>> _uuid   : b270ae30-3458-445f-95d2-b14e8ebddd01
>> external_ids: {logical-switch="4d6674e3-ff9f-4f38-b050-0fa9bec9e34d",
>> name="ls2"}
>> tunnel_key  : 2
>>
>> _uuid   : 8e06f919-4cc7-4ffc-9a79-20ce6663b683
>> external_ids: {logical-switch="4d6674e3-ff9f-4f38-b050-0fa9bec9e34d",
>> name="ls2"}
>> tunnel_key  : 1
>>
>>
>>
>> # on nodes 1 and 2 where northd is running, it gives below error:
>> 2018-03-21T06:01:59.437Z|8|ovsdb_idl|WARN|transaction error:
>> {"details":"cannot delete Datapath_Binding row
>> 8e06f919-4cc7-4ffc-9a79-20ce6663b683 because of 17 remaining
>> reference(s)","error":"referential integrity violation"}
>>
>> As per commit message, for northd I re-tried setting --ovnnb-db="tcp:
>> 10.169.125.152:6641,tcp:10.169.125.131:6641,tcp:10.148.181.162:6641"
>> and --ovnsb-db="tcp:10.169.125.152:6642,tcp:10.169.125.131:6642,tcp:
>> 10.148.181.162:6642" and it did not help either.
>>
>> There is no issue if I keep running only one instance of northd on any of
>> these 3 nodes. Hence, wanted to know whether there is something else missing
>> here to make only one northd instance active and the rest standby?
>>
>>
>> Regards,
>>
>> On Thu, Mar 15, 2018 at 3:09 AM, Numan Siddique 
>> wrote:
>>
>>> That's great
>>>
>>> Numan
>>>
>>>
>>> On Thu, Mar 15, 2018 at 2:57 AM, aginwala  wrote:
>>>
 Hi Numan:

 I tried on new nodes (kernel : 4.4.0-104-generic , Ubuntu 16.04)with
 fresh installation and it worked super fine for both sb and nb dbs. Seems
 like some kernel issue on the previous nodes when I re-installed raft patch
 as I was running different ovs version on those nodes before.


 For 2 HVs, I now set ovn-remote="tcp:10.169.125.152:6642, tcp:
 10.169.125.131:6642, tcp:10.148.181.162:6642"  and started controller
 and it works super fine.


 Did some failover testing by rebooting/killing the leader (
 10.169.125.152) and bringing it back up and it works as expected.
 Nothing weird noted so far.

 # check-cluster gives below data one of the node(10.148.181.162) post
 leader failure

 ovsdb-tool check-cluster /etc/openvswitch/ovnsb_db.db
 ovsdb-tool: leader /etc/openvswitch/ovnsb_db.db for term 2 has log
 entries only up to index 18446744073709551615, but index 9 was committed in
 a previous term (e.g. by /etc/openvswitch/ovnsb_db.db)


 For check-cluster, are we planning to add more output showing which
 node is active (leader), etc. in upcoming versions?


 Thanks a ton for helping sort this out.  I think the patch looks good
 to be merged once the comments by Justin are addressed, along with the man
 page details for ovsdb-tool.


 I will do some more crash testing for the cluster along with the scale
 test and keep you posted if something unexpected is noted.



 Regards,



 On Tue, Mar 13, 2018 at 11:07 PM, Numan Siddique 
 wrote:

>
>
> On Wed, 

Re: [ovs-discuss] raft ovsdb clustering

2018-03-21 Thread Numan Siddique
Hi Aliasgar,

ovsdb-server maintains locks per connection and not across the db. A
workaround for you now would be to configure all the ovn-northd instances
to connect to one ovsdb-server if you want to have active/standby.
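
A rough sketch of that workaround, assuming the 10.169.125.152 member
mentioned elsewhere in this thread is the one every instance points at, would
be to start each northd against that single endpoint instead of the full
cluster list:

ovn-northd --detach --pidfile \
    --ovnnb-db=tcp:10.169.125.152:6641 \
    --ovnsb-db=tcp:10.169.125.152:6642

The per-connection lock then lives on that one server, so only one northd
should go active at a time.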

Probably Ben can answer if there is a plan to support ovsdb locks across
the db. We also need this support in networking-ovn as it also uses ovsdb
locks.

Thanks
Numan



Re: [ovs-discuss] raft ovsdb clustering

2018-03-21 Thread aginwala
Hi Numan:

Just figured out that ovn-northd is running as active on all 3 nodes
instead of one active instance as I continued to test further, which results
in db errors as per the logs.


# on node 3, I run ovn-nbctl ls-add ls2; it produces the logs below in
ovn-northd:
2018-03-21T06:01:59.442Z|7|ovsdb_idl|WARN|transaction error:
{"details":"Transaction causes multiple rows in \"Datapath_Binding\" table
to have identical values (1) for index on column \"tunnel_key\".  First
row, with UUID 8c5d9342-2b90-4229-8ea1-001a733a915c, was inserted by this
transaction.  Second row, with UUID 8e06f919-4cc7-4ffc-9a79-20ce6663b683,
existed in the database before this transaction and was not modified by the
transaction.","error":"constraint violation"}

In the southbound datapath list, 2 duplicate records get created for the same
switch.

# ovn-sbctl list Datapath
_uuid   : b270ae30-3458-445f-95d2-b14e8ebddd01
external_ids:
{logical-switch="4d6674e3-ff9f-4f38-b050-0fa9bec9e34d", name="ls2"}
tunnel_key  : 2

_uuid   : 8e06f919-4cc7-4ffc-9a79-20ce6663b683
external_ids:
{logical-switch="4d6674e3-ff9f-4f38-b050-0fa9bec9e34d", name="ls2"}
tunnel_key  : 1
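
As a quick way to spot such duplicates, something like the sketch below should
work (the external_ids:name match and --columns selection are just an
illustration, not commands run in this setup):

ovn-sbctl --columns=_uuid,tunnel_key,external_ids find Datapath_Binding \
    external_ids:name=ls2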



# on nodes 1 and 2 where northd is running, it gives below error:
2018-03-21T06:01:59.437Z|8|ovsdb_idl|WARN|transaction error:
{"details":"cannot delete Datapath_Binding row
8e06f919-4cc7-4ffc-9a79-20ce6663b683 because of 17 remaining
reference(s)","error":"referential integrity violation"}

As per the commit message, for northd I re-tried setting --ovnnb-db="tcp:
10.169.125.152:6641,tcp:10.169.125.131:6641,tcp:10.148.181.162:6641" and
--ovnsb-db="tcp:10.169.125.152:6642,tcp:10.169.125.131:6642,tcp:
10.148.181.162:6642", and it did not help either.
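
For reference, a full northd invocation against the clustered remotes might
look roughly like the sketch below; the daemon flags and log-file path are
assumptions, only the --ovnnb-db/--ovnsb-db values come from this setup:

ovn-northd --detach --pidfile \
    --log-file=/var/log/openvswitch/ovn-northd.log \
    --ovnnb-db="tcp:10.169.125.152:6641,tcp:10.169.125.131:6641,tcp:10.148.181.162:6641" \
    --ovnsb-db="tcp:10.169.125.152:6642,tcp:10.169.125.131:6642,tcp:10.148.181.162:6642"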

There is no issue if I keep running only one instance of northd on any of
these 3 nodes. Hence, I wanted to know: is there something else missing here
to make only one northd instance active and the rest standby?


Regards,


Re: [ovs-discuss] raft ovsdb clustering

2018-03-15 Thread Numan Siddique
That's great

Numan



Re: [ovs-discuss] raft ovsdb clustering

2018-03-14 Thread aginwala
Hi Numan:

I tried on new nodes (kernel: 4.4.0-104-generic, Ubuntu 16.04) with a fresh
installation and it worked super fine for both sb and nb dbs. Seems like
some kernel issue on the previous nodes when I re-installed the raft patch, as
I was running a different ovs version on those nodes before.


For 2 HVs, I now set ovn-remote="tcp:10.169.125.152:6642, tcp:
10.169.125.131:6642, tcp:10.148.181.162:6642" and started the controller, and
it works super fine.
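
For the record, setting that remote on each HV looks something like this
(assuming the usual external_ids key on the local Open_vSwitch record):

ovs-vsctl set Open_vSwitch . \
    external_ids:ovn-remote="tcp:10.169.125.152:6642,tcp:10.169.125.131:6642,tcp:10.148.181.162:6642"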


Did some failover testing by rebooting/killing the leader (10.169.125.152)
and bringing it back up and it works as expected. Nothing weird noted so
far.

# check-cluster gives the data below on one of the nodes (10.148.181.162) post
leader failure

ovsdb-tool check-cluster /etc/openvswitch/ovnsb_db.db
ovsdb-tool: leader /etc/openvswitch/ovnsb_db.db for term 2 has log entries
only up to index 18446744073709551615, but index 9 was committed in a
previous term (e.g. by /etc/openvswitch/ovnsb_db.db)


For check-cluster, are we planning to add more output showing which node is
the active one (leader), etc., in upcoming versions?
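
As a side note, a running member should also answer the cluster/* unixctl
commands that this series adds to ovsdb-server; assuming the ovnsb_db.ctl
control socket used elsewhere in this thread, something like the following
ought to report the current role and leadership:

ovs-appctl -t /var/run/openvswitch/ovnsb_db.ctl cluster/status OVN_Southbound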


Thanks a ton for helping sort this out.  I think the patch looks good to be
merged once Justin's comments are addressed, along with the man page details
for ovsdb-tool.


I will do some more crash testing for the cluster along with the scale test
and keep you posted if something unexpected is noted.



Regards,



On Tue, Mar 13, 2018 at 11:07 PM, Numan Siddique 
wrote:

>
>
> On Wed, Mar 14, 2018 at 7:51 AM, aginwala  wrote:
>
>> Sure.
>>
>> To add on , I also ran for nb db too using different port  and Node2
>> crashes with same error :
>> # Node 2
>> /usr/share/openvswitch/scripts/ovn-ctl --db-nb-addr=10.99.152.138
>> --db-nb-port=6641 --db-nb-cluster-remote-addr="tcp:10.99.152.148:6645"
>> --db-nb-cluster-local-addr="tcp:10.99.152.138:6645" start_nb_ovsdb
>> ovsdb-server: ovsdb error: /etc/openvswitch/ovnnb_db.db: cannot identify
>> file type
>>
>>
>>
> Hi Aliasgar,
>
> It worked for me. Can you delete the old db files in /etc/openvswitch/ and
> try running the commands again ?
>
> Below are the commands I ran in my setup.
>
> Node 1
> ---
> sudo /usr/share/openvswitch/scripts/ovn-ctl  --db-sb-addr=192.168.121.91
> --db-sb-port=6642 --db-sb-create-insecure-remote=yes
> --db-sb-cluster-local-addr=tcp:192.168.121.91:6644 start_sb_ovsdb
>
> Node 2
> -
> sudo /usr/share/openvswitch/scripts/ovn-ctl  --db-sb-addr=192.168.121.87
> --db-sb-port=6642 --db-sb-create-insecure-remote=yes
> --db-sb-cluster-local-addr="tcp:192.168.121.87:6644"
> --db-sb-cluster-remote-addr="tcp:192.168.121.91:6644"  start_sb_ovsdb
>
> Node 3
> -
> sudo /usr/share/openvswitch/scripts/ovn-ctl  --db-sb-addr=192.168.121.78
> --db-sb-port=6642 --db-sb-create-insecure-remote=yes
> --db-sb-cluster-local-addr="tcp:192.168.121.78:6644"
> --db-sb-cluster-remote-addr="tcp:192.168.121.91:6644"  start_sb_ovsdb
>
>
>
> Thanks
> Numan

Re: [ovs-discuss] raft ovsdb clustering

2018-03-13 Thread aginwala
Sure.

To add on, I also ran it for the nb db too using a different port, and Node 2
crashes with the same error:
# Node 2
/usr/share/openvswitch/scripts/ovn-ctl --db-nb-addr=10.99.152.138
--db-nb-port=6641 --db-nb-cluster-remote-addr="tcp:10.99.152.148:6645"
--db-nb-cluster-local-addr="tcp:10.99.152.138:6645" start_nb_ovsdb
ovsdb-server: ovsdb error: /etc/openvswitch/ovnnb_db.db: cannot identify
file type




Re: [ovs-discuss] raft ovsdb clustering

2018-03-13 Thread Numan Siddique
On Tue, Mar 13, 2018 at 9:46 PM, aginwala  wrote:

> Thanks Numan for the response.
>
> There is no start_cluster_sb_ovsdb command in the source code either. Is
> that in a separate commit somewhere? Hence, I used start_sb_ovsdb, which I
> think would not be the right choice?
>

Sorry, I meant start_sb_ovsdb. Strange that it didn't work for you. Let me
try it out again and update this thread.

Thanks
Numan



Re: [ovs-discuss] raft ovsdb clustering

2018-03-13 Thread aginwala
Thanks Numan for the response.

There is no start_cluster_sb_ovsdb command in the source code either. Is that
in a separate commit somewhere? Hence, I used start_sb_ovsdb, which I think
would not be the right choice?

# Node1  came up as expected.
ovn-ctl --db-sb-addr=10.99.152.148 --db-sb-port=6642
--db-sb-create-insecure-remote=yes --db-sb-cluster-local-addr="tcp:
10.99.152.148:6644" start_sb_ovsdb.

# verifying it's a clustered db with ovsdb-tool db-local-address
/etc/openvswitch/ovnsb_db.db
tcp:10.99.152.148:6644
# ovn-sbctl show works fine and chassis are being populated correctly.

#Node 2 fails with error:
/usr/share/openvswitch/scripts/ovn-ctl --db-sb-addr=10.99.152.138
--db-sb-port=6642 --db-sb-create-insecure-remote=yes
--db-sb-cluster-remote-addr="tcp:10.99.152.148:6644"
--db-sb-cluster-local-addr="tcp:10.99.152.138:6644" start_sb_ovsdb
ovsdb-server: ovsdb error: /etc/openvswitch/ovnsb_db.db: cannot identify
file type

# So I started the sb db the usual way using start_ovsdb just to get the
db file created, killed the sb pid, and re-ran the command, which gave the
actual error, where it complains about the join-cluster command that is
called internally
/usr/share/openvswitch/scripts/ovn-ctl --db-sb-addr=10.99.152.138
--db-sb-port=6642 --db-sb-create-insecure-remote=yes
--db-sb-cluster-remote-addr="tcp:10.99.152.148:6644"
--db-sb-cluster-local-addr="tcp:10.99.152.138:6644" start_sb_ovsdb
ovsdb-tool: /etc/openvswitch/ovnsb_db.db: not a clustered database
 * Backing up database to /etc/openvswitch/ovnsb_db.db.backup1.15.0-70426956
ovsdb-tool: 'join-cluster' command requires at least 4 arguments
 * Creating cluster database /etc/openvswitch/ovnsb_db.db from existing one


# Based on the above error I killed the sb db pid again, tried to create a
local cluster on the node, then re-ran the join operation as per the source
code function.
ovsdb-tool join-cluster /etc/openvswitch/ovnsb_db.db OVN_Southbound tcp:
10.99.152.138:6644 tcp:10.99.152.148:6644 which still complains
ovsdb-tool: I/O error: /etc/openvswitch/ovnsb_db.db: create failed (File
exists)
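
Since join-cluster creates a brand-new database file, the stale standalone db
has to be removed first; a sketch, assuming the old file can simply be
discarded (the join-cluster arguments are the same ones used above):

rm /etc/openvswitch/ovnsb_db.db
ovsdb-tool join-cluster /etc/openvswitch/ovnsb_db.db OVN_Southbound \
    tcp:10.99.152.138:6644 tcp:10.99.152.148:6644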


# Node 3: I did not try as I am assuming the same failure as node 2


Let me know.


Re: [ovs-discuss] raft ovsdb clustering

2018-03-13 Thread Numan Siddique
Hi Aliasgar,

On Tue, Mar 13, 2018 at 7:11 AM, aginwala  wrote:

> Hi Ben/Noman:
>
> I am trying to set up a 3 node southbound db cluster using the raft10 patch
> series in review.
>
> # Node 1 create-cluster
> ovsdb-tool create-cluster /etc/openvswitch/ovnsb_db.db
> /root/ovs-reviews/ovn/ovn-sb.ovsschema tcp:10.99.152.148:6642
>

A different port is used for RAFT. So you have to choose another port like
6644 for example.
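
In other words, a sketch based on the create-cluster command quoted above,
keeping 6642 for the client connections and using 6644 only for the RAFT
traffic:

ovsdb-tool create-cluster /etc/openvswitch/ovnsb_db.db \
    /root/ovs-reviews/ovn/ovn-sb.ovsschema tcp:10.99.152.148:6644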

>
>
> # Node 2
> ovsdb-tool join-cluster /etc/openvswitch/ovnsb_db.db OVN_Southbound tcp:
> 10.99.152.138:6642 tcp:10.99.152.148:6642 --cid 5dfcb678-bb1d-4377-b02d-
> a380edec2982
>
> #Node 3
> ovsdb-tool join-cluster /etc/openvswitch/ovnsb_db.db OVN_Southbound tcp:
> 10.99.152.101:6642 tcp:10.99.152.138:6642 tcp:10.99.152.148:6642 --cid
> 5dfcb678-bb1d-4377-b02d-a380edec2982
>
> # ovn remote is set to all 3 nodes
> external_ids:ovn-remote="tcp:10.99.152.148:6642, tcp:10.99.152.138:6642,
> tcp:10.99.152.101:6642"
>

> # Starting sb db on node 1 using below command on node 1:
>
> ovsdb-server --detach --monitor -vconsole:off -vraft -vjsonrpc
> --log-file=/var/log/openvswitch/ovsdb-server-sb.log 
> --pidfile=/var/run/openvswitch/ovnsb_db.pid
> --remote=db:OVN_Southbound,SB_Global,connections --unixctl=ovnsb_db.ctl
> --private-key=db:OVN_Southbound,SSL,private_key 
> --certificate=db:OVN_Southbound,SSL,certificate
> --ca-cert=db:OVN_Southbound,SSL,ca_cert 
> --ssl-protocols=db:OVN_Southbound,SSL,ssl_protocols
> --ssl-ciphers=db:OVN_Southbound,SSL,ssl_ciphers 
> --remote=punix:/var/run/openvswitch/ovnsb_db.sock
> /etc/openvswitch/ovnsb_db.db
>
> # check-cluster is returning nothing
> ovsdb-tool check-cluster /etc/openvswitch/ovnsb_db.db
>
> # ovsdb-server-sb.log below shows the leader is elected with only one
> server and there are rbac related debug logs with rpc replies and empty
> params with no errors
>
> 2018-03-13T01:12:02Z|2|raft|DBG|server 63d1 added to configuration
> 2018-03-13T01:12:02Z|3|raft|INFO|term 6: starting election
> 2018-03-13T01:12:02Z|4|raft|INFO|term 6: elected leader by 1+ of 1
> servers
>
>
> Now starting ovsdb-server on the other cluster nodes fails saying
> ovsdb-server: ovsdb error: /etc/openvswitch/ovnsb_db.db: cannot identify
> file type
>
>
> Also noticed that man ovsdb-tool is missing cluster details. Might want to
> address it in the same patch or a different one.
>
>
> Please advise on what is missing here for running ovn-sbctl show, as this
> command hangs.
>
>
>

I think you can use the ovn-ctl command "start_cluster_sb_ovsdb" for your
testing (at least for now).

For your setup, I think you can start the cluster as

# Node 1
ovn-ctl --db-sb-addr=10.99.152.148 --db-sb-port=6642
--db-sb-create-insecure-remote=yes --db-sb-cluster-local-addr="tcp:
10.99.152.148:6644" start_cluster_sb_ovsdb

# Node 2
ovn-ctl --db-sb-addr=10.99.152.138 --db-sb-port=6642
--db-sb-create-insecure-remote=yes
--db-sb-cluster-local-addr="tcp:10.99.152.138:6644"
--db-sb-cluster-remote-addr="tcp:10.99.152.148:6644" start_cluster_sb_ovsdb

# Node 3
ovn-ctl --db-sb-addr=10.99.152.101 --db-sb-port=6642
--db-sb-create-insecure-remote=yes
--db-sb-cluster-local-addr="tcp:10.99.152.101:6644"
--db-sb-cluster-remote-addr="tcp:10.99.152.148:6644" start_cluster_sb_ovsdb


Let me know how it goes.

Thanks
Numan



___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss