Re: [ovs-discuss] [branch-2.16] ovn distributed gateway chassisredirect strip_vlan not taking effect with stt

2024-04-24 Thread aginwala via discuss
Update:

It seems that in the upstream 5.4 kernel, skb_vlan_pop() only clears the vlan_present
flag and leaves skb->vlan_tci untouched, unlike the old 4.15 kernel where vlan_tci
itself was zeroed:
https://github.com/torvalds/linux/blob/v5.4/net/core/skbuff.c#L5408
int skb_vlan_pop(struct sk_buff *skb)
{
        u16 vlan_tci;
        __be16 vlan_proto;
        int err;

        if (likely(skb_vlan_tag_present(skb))) {
                __vlan_hwaccel_clear_tag(skb);
        } else {
        ...

static inline void __vlan_hwaccel_clear_tag(struct sk_buff *skb)
{
        skb->vlan_present = 0;  /* only clears the 'present' flag */
}


Hence, we patched stt in the ovs branch-2.16 datapath as below:
## update __push_stt_header on ovs 2.16
diff --git a/datapath/linux/compat/stt.c b/datapath/linux/compat/stt.c
index 39a294764..ad1f0aa39 100644
--- a/datapath/linux/compat/stt.c
+++ b/datapath/linux/compat/stt.c
@@ -622,7 +622,9 @@ static int __push_stt_header(struct sk_buff *skb, __be64 tun_id,
                 stth->flags |= STT_CSUM_VERIFIED;
         }
 
-        stth->vlan_tci = htons(skb->vlan_tci);
+        if (skb_vlan_tag_present(skb)) {
+                stth->vlan_tci = htons(skb->vlan_tci);
+        }
         skb->vlan_tci = 0;
         put_unaligned(tun_id, &stth->key);
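
For reference, a rough sketch of how we rebuild and reload the patched out-of-tree
datapath (including stt) on the 5.4 gateway; paths, service/module names and the final
check are illustrative and depend on the local build setup:

## build the out-of-tree datapath against the running kernel
cd ovs
./boot.sh
./configure --with-linux=/lib/modules/$(uname -r)/build
make -j"$(nproc)" && make modules_install

## swap in the rebuilt modules (briefly interrupts the data plane)
systemctl stop openvswitch-switch
modprobe -r vport-stt openvswitch
modprobe openvswitch && modprobe vport-stt
systemctl start openvswitch-switch

## verify decapped traffic no longer matches on a vlan in the datapath
ovs-dpctl dump-flows | grep vlan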

It looks like when this change went into Linux, the stt side was either not called out
or was missed. Please let us know if any further amendments to the above change are
needed; the issue is mitigated with this patch and the earlier workaround is no longer
required. We will run some more tests and report any other failures.


Regards,
Aliasgar

On Tue, Apr 23, 2024 at 10:35 AM aginwala  wrote:

> Hi:
>
> The data plane recovers after cleaning up the flows with ovs-dpctl del-flows, and
> eventually all the flows catch up since the flows added by ovn are intact. However,
> we are not sure which flow caused this; the issue shows up on ovs-vswitchd restarts
> and has to be worked around with dpctl del-flows. We are also not sure whether it is
> due to version compatibility between ovn 2.11 and ovs 2.16, or a particular patch in
> ovs/ovn that already has this fix. We will keep looking in parallel, as the workaround
> unblocks this for now; any additional pointers beyond this workaround would be
> appreciated.
>
> Regards,
> Aliasgar
>
>
> On Fri, Apr 19, 2024 at 4:24 PM aginwala  wrote:
>
>> Hi All:
>>
>> As part of upgrading the OVN north-south gateway to the new 5.4 kernel, VM
>> connectivity is lost when setting the chassis for the provider network lrp to this
>> new gateway. For interconnection gateways and hypervisors it is not an issue.
>> lrp
>> _uuid   : 387a735d-fc11-4e90-8655-07785aa024af
>> chassis : b80a285b-586a-42d9-b189-69d641f143b1
>> datapath: d9219b69-5961-4f24-8414-1d4054b23169
>> external_ids: {}
>> gateway_chassis : [728adc6d-3236-4637-86e3-0f6745cf1b50,
>> 7a372e68-c228-400b-9a4b-439cf234ed40, 82295a9c-02aa-416b-bac3-83755c687caf,
>> d1b42374-c475-4745-abdb-36e72140c5b5]
>> logical_port: "cr-lrp-9a1d2341-efb0-4e7e-839d-99b01944ba2e"
>> mac : ["74:db:d1:80:d3:af 10.169.247.140/24"]
>> nat_addresses   : []
>> options :
>> {distributed-port="lrp-9a1d2341-efb0-4e7e-839d-99b01944ba2e"}
>> parent_port : []
>> tag : []
>> tunnel_key  : 2
>> type: chassisredirect
>>
>> provider network
>> port provnet-f239a6e8-73a5-4f95-8410-f7b3e0befe90
>> type: localnet
>> tag: 20
>> addresses: ["unknown"]
>> ## encap ip for ovn is on eth0
>>
>> ## gw interfaces brens2f0 hosts uplink provider network
>> ovs-vsctl list-br
>> br-int
>> brens2f0
>> ovs-vsctl list-ports brens2f0
>> ens2f0
>> patch-provnet-f239a6e8-73a5-4f95-8410-f7b3e0befe90-to-br-int
>> ## fail mode secure
>> ovs-vsctl get-fail-mode br-int
>> secure
>> ## set chassis
>> ovn-nbctl lrp-set-gateway-chassis
>> lrp-9a1d2341-efb0-4e7e-839d-99b01944ba2e
>> cee81be9-f782-4c82-800e-c5c5327531e4 101
>>
>> ovn-controller is running as a container on the new gateway
>> ovn-controller --version
>> ovn-controller (Open vSwitch) 2.11.1-13
>> OpenFlow versions 0x4:0x4
>>
>> ## ovs on the host 5.4 kernel
>> ovs-vsctl --version
>> ovs-vsctl (Open vSwitch) 2.16.0
>> DB Schema 8.3.0
>>
>> ovs-ofctl --version
>> ovs-ofctl (Open vSwitch) 2.16.0
>> OpenFlow versions 0x1:0x6
>>
>>
>> Digging further with tcpdump on the destination vm interface shows vlan
>> being present causing connectivity failure and no reply packet
>> 20:26:06.371540 74:db:d1:80:09:01 > 74:db:d1:80:0a:15, ethertype 802.1Q
>> (0x8100), length 102: vlan 20, p 0, ethertype IPv4, (tos 0x0, ttl 56, id
>> 53702, offset 0, flags [none], proto ICMP (1), length 84) 10.228.4.180 >
>&g

Re: [ovs-discuss] [branch-2.16] ovn distributed gateway chassisredirect strip_vlan not taking effect with stt

2024-04-23 Thread aginwala via discuss
Hi:

The data plane recovers after cleaning up the flows with ovs-dpctl del-flows, and
eventually all the flows catch up since the flows added by ovn are intact. However, we
are not sure which flow caused this; the issue shows up on ovs-vswitchd restarts and has
to be worked around with dpctl del-flows. We are also not sure whether it is due to
version compatibility between ovn 2.11 and ovs 2.16, or a particular patch in ovs/ovn
that already has this fix. We will keep looking in parallel, as the workaround unblocks
this for now; any additional pointers beyond this workaround would be appreciated.
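
For reference, the workaround after an ovs-vswitchd restart is simply to flush the
kernel datapath flows on the gateway (ovn/ovs then repopulate them); either of the
below should be equivalent:

ovs-dpctl del-flows
## or, via ovs-vswitchd
ovs-appctl dpctl/del-flows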

Regards,
Aliasgar


On Fri, Apr 19, 2024 at 4:24 PM aginwala  wrote:

> Hi All:
>
> As part of upgrading the OVN north-south gateway to the new 5.4 kernel, VM
> connectivity is lost when setting the chassis for the provider network lrp to this
> new gateway. For interconnection gateways and hypervisors it is not an issue.
> lrp
> _uuid   : 387a735d-fc11-4e90-8655-07785aa024af
> chassis : b80a285b-586a-42d9-b189-69d641f143b1
> datapath: d9219b69-5961-4f24-8414-1d4054b23169
> external_ids: {}
> gateway_chassis : [728adc6d-3236-4637-86e3-0f6745cf1b50,
> 7a372e68-c228-400b-9a4b-439cf234ed40, 82295a9c-02aa-416b-bac3-83755c687caf,
> d1b42374-c475-4745-abdb-36e72140c5b5]
> logical_port: "cr-lrp-9a1d2341-efb0-4e7e-839d-99b01944ba2e"
> mac : ["74:db:d1:80:d3:af 10.169.247.140/24"]
> nat_addresses   : []
> options :
> {distributed-port="lrp-9a1d2341-efb0-4e7e-839d-99b01944ba2e"}
> parent_port : []
> tag : []
> tunnel_key  : 2
> type: chassisredirect
>
> provider network
> port provnet-f239a6e8-73a5-4f95-8410-f7b3e0befe90
> type: localnet
> tag: 20
> addresses: ["unknown"]
> ## encap ip for ovn is on eth0
>
> ## gw interfaces brens2f0 hosts uplink provider network
> ovs-vsctl list-br
> br-int
> brens2f0
> ovs-vsctl list-ports brens2f0
> ens2f0
> patch-provnet-f239a6e8-73a5-4f95-8410-f7b3e0befe90-to-br-int
> ## fail mode secure
> ovs-vsctl get-fail-mode br-int
> secure
> ## set chassis
> ovn-nbctl lrp-set-gateway-chassis lrp-9a1d2341-efb0-4e7e-839d-99b01944ba2e
> cee81be9-f782-4c82-800e-c5c5327531e4 101
>
> ovn-controller is running as a container on the new gateway
> ovn-controller --version
> ovn-controller (Open vSwitch) 2.11.1-13
> OpenFlow versions 0x4:0x4
>
> ## ovs on the host 5.4 kernel
> ovs-vsctl --version
> ovs-vsctl (Open vSwitch) 2.16.0
> DB Schema 8.3.0
>
> ovs-ofctl --version
> ovs-ofctl (Open vSwitch) 2.16.0
> OpenFlow versions 0x1:0x6
>
>
> Digging further with tcpdump on the destination vm interface shows vlan
> being present causing connectivity failure and no reply packet
> 20:26:06.371540 74:db:d1:80:09:01 > 74:db:d1:80:0a:15, ethertype 802.1Q
> (0x8100), length 102: vlan 20, p 0, ethertype IPv4, (tos 0x0, ttl 56, id
> 53702, offset 0, flags [none], proto ICMP (1), length 84) 10.228.4.180 >
> 10.78.8.42: ICMP echo request, id 7765, seq 791, length 64
> 20:26:07.375960 74:db:d1:80:09:01 > 74:db:d1:80:0a:15, ethertype 802.1Q
> (0x8100), length 102: vlan 20, p 0, ethertype IPv4, (tos 0x0, ttl 56, id
> 36269, offset 0, flags [none], proto ICMP (1), length 84) 10.228.4.180 >
> 10.78.8.42: ICMP echo request, id 7765, seq 792, length 64
>
> The openflow rules to strip vlan 20 that ovn programs on the new/old gw are correct:
> ovs-ofctl dump-flows br-int | grep strip_vlan | grep 20
> cookie=0x0, duration=27.894s, table=65, n_packets=136, n_bytes=19198,
> idle_age=0, priority=100,reg15=0x1,metadata=0x1
> actions=mod_vlan_vid:20,output:161,strip_vlan
> cookie=0x0, duration=30.055s, table=0, n_packets=1592, n_bytes=130783,
> idle_age=0, priority=150,in_port=161,dl_vlan=20
> actions=strip_vlan,load:0xe1->NXM_NX_REG13[],load:0x36->NXM_NX_REG11[],load:0xd7->NXM_NX_REG12[],load:0x1->OXM_OF_METADATA[],load:0x1->NXM_NX_REG14[],resubmit(,8)
>
>
> Checking ovs datapath flow shows vlan being present
> ovs-dpctl dump-flows  | grep vlan
> recirc_id(0x422),tunnel(tun_id=0x1006605,src=10.172.66.144,dst=10.173.84.83,flags(-df+csum+key)),in_port(1),ct_state(+new-est-rel-rpl-inv+trk),ct_label(0/0x1),eth(dst=74:db:d1:80:0a:15),eth_type(0x8100),vlan(vid=20/0x14),encap(eth_type(0x0800),ipv4(frag=no)),
> packets:1713, bytes:174726, used:0.145s, actions:5
>
> Couldn't find much drift with ofproto/trace
> ovs-appctl ofproto/trace br-int in_port=2321,dl_vlan=20
> running on old/new gw (replace with in_port)
>
>
> Tried stripping on the hypervisor/compute and data plane is ok but thats
> not the right approach
> ovs-ofctl add-flow br-int "priority=65535,dl_vlan=20
> actions=strip_vlan,outp

[ovs-discuss] [branch-2.16] ovn distributed gateway chassisredirect strip_vlan not taking effect with stt

2024-04-19 Thread aginwala via discuss
Hi All:

As part of upgrading the OVN north-south gateway to the new 5.4 kernel, VM
connectivity is lost when setting the chassis for the provider network lrp to this
new gateway. For interconnection gateways and hypervisors it is not an issue.
lrp
_uuid   : 387a735d-fc11-4e90-8655-07785aa024af
chassis : b80a285b-586a-42d9-b189-69d641f143b1
datapath: d9219b69-5961-4f24-8414-1d4054b23169
external_ids: {}
gateway_chassis : [728adc6d-3236-4637-86e3-0f6745cf1b50,
7a372e68-c228-400b-9a4b-439cf234ed40, 82295a9c-02aa-416b-bac3-83755c687caf,
d1b42374-c475-4745-abdb-36e72140c5b5]
logical_port: "cr-lrp-9a1d2341-efb0-4e7e-839d-99b01944ba2e"
mac : ["74:db:d1:80:d3:af 10.169.247.140/24"]
nat_addresses   : []
options :
{distributed-port="lrp-9a1d2341-efb0-4e7e-839d-99b01944ba2e"}
parent_port : []
tag : []
tunnel_key  : 2
type: chassisredirect

provider network
port provnet-f239a6e8-73a5-4f95-8410-f7b3e0befe90
type: localnet
tag: 20
addresses: ["unknown"]
## encap ip for ovn is on eth0

## gw interfaces brens2f0 hosts uplink provider network
ovs-vsctl list-br
br-int
brens2f0
ovs-vsctl list-ports brens2f0
ens2f0
patch-provnet-f239a6e8-73a5-4f95-8410-f7b3e0befe90-to-br-int
## fail mode secure
ovs-vsctl get-fail-mode br-int
secure
## set chassis
ovn-nbctl lrp-set-gateway-chassis lrp-9a1d2341-efb0-4e7e-839d-99b01944ba2e
cee81be9-f782-4c82-800e-c5c5327531e4 101

ovn-controller is running as a container on the new gateway
ovn-controller --version
ovn-controller (Open vSwitch) 2.11.1-13
OpenFlow versions 0x4:0x4

## ovs on the host 5.4 kernel
ovs-vsctl --version
ovs-vsctl (Open vSwitch) 2.16.0
DB Schema 8.3.0

ovs-ofctl --version
ovs-ofctl (Open vSwitch) 2.16.0
OpenFlow versions 0x1:0x6


Digging further, tcpdump on the destination vm interface shows the vlan tag still
present, which causes the connectivity failure (no reply packet):
20:26:06.371540 74:db:d1:80:09:01 > 74:db:d1:80:0a:15, ethertype 802.1Q
(0x8100), length 102: vlan 20, p 0, ethertype IPv4, (tos 0x0, ttl 56, id
53702, offset 0, flags [none], proto ICMP (1), length 84) 10.228.4.180 >
10.78.8.42: ICMP echo request, id 7765, seq 791, length 64 20:26:07.375960
74:db:d1:80:09:01 > 74:db:d1:80:0a:15, ethertype 802.1Q (0x8100), length
102: vlan 20, p 0, ethertype IPv4, (tos 0x0, ttl 56, id 36269, offset 0,
flags [none], proto ICMP (1), length 84) 10.228.4.180 > 10.78.8.42: ICMP
echo request, id 7765, seq 792, length 64

The openflow rules to strip vlan 20 that ovn programs on the new/old gw are correct:
ovs-ofctl dump-flows br-int | grep strip_vlan | grep 20
cookie=0x0, duration=27.894s, table=65, n_packets=136, n_bytes=19198,
idle_age=0, priority=100,reg15=0x1,metadata=0x1
actions=mod_vlan_vid:20,output:161,strip_vlan
cookie=0x0, duration=30.055s, table=0, n_packets=1592, n_bytes=130783,
idle_age=0, priority=150,in_port=161,dl_vlan=20
actions=strip_vlan,load:0xe1->NXM_NX_REG13[],load:0x36->NXM_NX_REG11[],load:0xd7->NXM_NX_REG12[],load:0x1->OXM_OF_METADATA[],load:0x1->NXM_NX_REG14[],resubmit(,8)


Checking the ovs datapath flows shows the vlan still present:
ovs-dpctl dump-flows  | grep vlan
recirc_id(0x422),tunnel(tun_id=0x1006605,src=10.172.66.144,dst=10.173.84.83,flags(-df+csum+key)),in_port(1),ct_state(+new-est-rel-rpl-inv+trk),ct_label(0/0x1),eth(dst=74:db:d1:80:0a:15),eth_type(0x8100),vlan(vid=20/0x14),encap(eth_type(0x0800),ipv4(frag=no)),
packets:1713, bytes:174726, used:0.145s, actions:5

We couldn't find much drift with ofproto/trace, e.g.
ovs-appctl ofproto/trace br-int in_port=2321,dl_vlan=20
run on the old/new gw (replacing in_port with the relevant port).


We tried stripping the vlan on the hypervisor/compute and the data plane is then OK,
but that's not the right approach:
ovs-ofctl add-flow br-int "priority=65535,dl_vlan=20
actions=strip_vlan,output:4597"

Downgrading the kernel to 4.15 and pinning to ovs 2.11 restores the data plane, with no
vlan/802.1Q tag seen in the tcpdump on the destination workload tap interface.
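
For reference, a tcpdump check along these lines on the destination workload tap shows
whether the 802.1Q header is still present (interface name illustrative):

tcpdump -e -nn -c 5 -i tap4597 icmp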


Is this a bug or a known issue with ovs versions later than 2.11 when a tagged vlan is
used for the provider network?

We also tried pinning the OpenFlow version to 1.4, but it didn't help much since the
strip_vlan flows look good. Any further pointers would be great as we continue to debug.


Regards,
Aliasgar
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


[ovs-discuss] 2.16.0 deb: compilation/build error

2021-08-27 Thread aginwala
Hi All:

We ran into a debian build issue for the latest ovs v2.16.0 against 5.4.0-80-generic on
ubuntu 20:
dh binary --with autoreconf,python3 --parallel
dh: error: unable to load addon python3: Can't locate
Debian/Debhelper/Sequence/python3.pm in @INC (you may need to install the
Debian::Debhelper::Sequence::python3 module) (@INC contains: /etc/perl
/usr/local/lib/x86_64-linux-gnu/perl/5.30.0 /usr/local/share/perl/5.30.0
/usr/lib/x86_64-linux-gnu/perl5/5.30 /usr/share/perl5
/usr/lib/x86_64-linux-gnu/perl/5.30 /usr/share/perl/5.30
/usr/local/lib/site_perl /usr/lib/x86_64-linux-gnu/perl-base) at (eval 13)
line 1.
BEGIN failed--compilation aborted at (eval 13) line 1.
BEGIN failed--compilation aborted at (eval 13) line 1.

make: *** [debian/rules:25: binary] Error 255


We were able to fix it by installing dh-python explicitly as a build dependency.

Should we include this in debian/control dependencies?
diff --git a/debian/control b/debian/control
index 6420b9d3e2..53a6b61f14 100644
--- a/debian/control
+++ b/debian/control
@@ -18,7 +18,8 @@ Build-Depends: graphviz,
python3-twisted,
python3-zope.interface,
libunbound-dev,
-   libunwind-dev
+   libunwind-dev,
+   dh-python
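
For anyone hitting the same error before such a change lands, a quick manual workaround
is to install the package and re-run the build (build invocation as per the debian
install docs; adjust options as needed):

apt-get install -y dh-python
DEB_BUILD_OPTIONS='parallel=8 nocheck' fakeroot debian/rules binary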



Ali
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] HA OVN "Central" as a kubernetes service

2020-07-06 Thread aginwala
Hi:

Adding the ML too. Folks from k8s can comment on whether the ovn-k8s repo needs a
documentation update for you to get the setup working when using their specs as-is,
without any code changes, in addition to using your own custom ovn images, etc. I am
getting a mail failure when adding the ovn-k8s google group, as I think I don't have
permission to post there. The yaml specs and raft scripts also have good comments which
should give you a clear idea.

Also cc'd Girish who can comment further.


Things like persistent volumes (PV) for dedicated ovn central nodes, monitoring, backing
up the ovn db, etc. also need to be considered, so that cluster settings are retained
when a pod is restarted or the ovn version is upgraded, and cluster health stats are
taken into account.
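
For example, a minimal health check along these lines can be wired into the monitoring
(the pod name and ctl socket path are illustrative and depend on the specs and ovn
version in use):

kubectl exec -it ovnkube-db-0 -- \
    ovs-appctl -t /var/run/openvswitch/ovnnb_db.ctl cluster/status OVN_Northbound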


I got the design aspect of this sorted out a week ago and had an internal review as
well (cc Han), since we also do not use ovn as the CNI; it includes some pending
containerization items for the ovn global dbs and the ovn-interconnect controller used
for ovn interconnect. However, testing it in k8s with all the specs/tweaks is still
pending due to other priorities. Since the approach taken by ovn-k8s is succinct and
already tested, it shouldn't be a bottleneck.

I agree that the overall documentation needs to be consolidated, on both the ovn-k8s
side and in the ovn repo.

On Mon, Jul 6, 2020 at 9:49 AM Brendan Doyle 
wrote:

> Hi,
>
> I've been trying to follow the instructions at
> https://github.com/ovn-org/ovn-kubernetes
> to set up an OVN "Central/Master" high availability (HA).  I want to
> deploy and manage that
> cluster as a Kubernetes service .
>
> I can find lots of stuff on "ovn-kube" but this seems to be using OVN as
> a  kubernetes CNI instead of
> Flannel etc.  But this is not what I want to do, I have a kubernetes
> cluster using Flannel as the CNI,
> now  I want to deploy a HA OVN "Central" as a kubernetes service. Kind
> of like how you can deploy
> a MySQL cluster in kubernetes using a SatefulSet deployment.
>
> I have found this:
> https://github.com/ovn-org/ovn-kubernetes#readme
>
> But it is not clear to me if this is how to setup OVN as a kubernetes
> CNI or it's how to setup a HA OVN central as kubernetes service.
>
> I did try he steps in the READMe above, but they did not seem to work, then
> I have just seen that there is a ovnkube-db-raft.yaml file, this seems more
> promising as it does use a StatefulSet, but I can find no documentation
> on this
> file.
>
> Thanks
>
> Brendan
>
>
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] HA OVN "Central" as a kubernetes service

2020-07-06 Thread aginwala
On Mon, Jul 6, 2020 at 4:33 AM Brendan Doyle 
wrote:

> Hi,
>
> So I'm really confused by what you have pointed me to here. As stated I do
> NOT
> want to use OVN as a CNI. I have a k8s cluster that use flannel as the
> CNI. I simply
> want to create an OVN "central" cluster as a Stateful set in my *existing*
> K8
> config.
>
> This repo:
>
> https://github.com/ovn-org/ovn-kubernetes/commit/a07b1a01af7e37b15c2e5f179ffad2b9f25a083d
>
> Seems to be for setting up a K8s cluster to use OVN as the CNI??
> Have you tried this?
> What IP do the ovn-controllers use to reach the OVN "central cluster?
> It seems to use an OVN docker image from docker.io, I want to use my own
> OVN src
> Do I use/modify the dist/images/Dockerfile in this repo? that has loads of
> references to CNI
> like I said I don't want to use OVN as the CNI??
>
A prerequisite for running ovn central as a k8s app is to containerize the ovn central
components; hence you need to start your own containers using docker. Either follow the
approach in the ovn-k8s repo for building ovn images, or refer to the docker
instructions in the ovn repo. Since this app (ovn central) will run behind a k8s
service, ovn-controller should point to the service IP of the ovn central k8s app. The
k8s folks can comment on how the image referenced in the k8s pod specs is built, e.g.
http://docker.io/ovnkube/ovn-daemonset:latest
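
For example, once that k8s service exists, each hypervisor's ovn-controller just needs
ovn-remote pointed at the service IP (IP and port illustrative):

ovs-vsctl set Open_vSwitch . external_ids:ovn-remote="tcp:10.96.0.50:6642"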

>
> The instructions here
> https://github.com/ovn-org/ovn/blob/d6b56b1629d5984ef91864510f918e232efb89de/Documentation/intro/install/general.rst
> seem more promising, if not a little confusing:
>
> IN the section "Starting OVN Central services in containers"
>
> Export following variables in .env and place it under project root:
>
> $ OVN_BRANCH=
> $ OVN_VERSION=
> $ DISTRO=
> $ KERNEL_VERSION=
> $ GITHUB_SRC=
> $ DOCKER_REPO=
>
>
> Does it mean create a file called ".env" and place it in the toplevel dir
> of the cloned ovn repo?
> Or does it mean just add these to you shell environment (i.e put them in
> .bashrc)?
>
You can just export OVN_BRANCH=xx (and likewise the other variables) in your shell and
build your containers with the desired distro/version using make build.
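
For example (branch/values purely illustrative):

export OVN_BRANCH=master
export OVN_VERSION=2.12
export DISTRO=debian
export KERNEL_VERSION=$(uname -r)
export GITHUB_SRC=https://github.com/ovn-org/ovn.git
export DOCKER_REPO=openvswitch/ovn
make build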
>
> Then we have:
>
> 1)
>
> Start OVN containers using below command:
>
> $ docker run -itd --net=host --name=ovn-nb \
>   : ovn-nb-tcp
>
> $ docker run -itd --net=host --name=ovn-sb \
>   : ovn-sb-tcp
>
> $ docker run -itd --net=host --name=ovn-northd \
>   : ovn-northd-tcp
>
> followed by
>
> 2)
>
> $ docker run -e "host_ip=" -e "nb_db_port=" -itd \
>   --name=ovn-nb-raft --net=host --privileged : \
>   ovn-nb-cluster-create
>
> $ docker run -e "host_ip=" -e "sb_db_port=" -itd \
>   --name=ovn-sb-raft --net=host --privileged : \
>   ovn-sb-cluster-create
>
> $ docker run -e "OVN_NB_DB=tcp::6641,tcp::6641,\
>   tcp::6641" -e "OVN_SB_DB=tcp::6642,tcp::6642,\
>   tcp::6642" -itd --name=ovn-northd-raft : \
>   ovn-northd-cluster
>
> Does it mean do 1), then 2) or does it mean do 1) for non HA OVN central
> *OR* 2)
> for HA/clustered OVN Central?
>
The doc says "Start OVN containers in cluster mode using below command on node2 and
node3 to make them join the peer using below command:". Hence, you can even play with
just docker on 3 nodes, where you run the cluster-create commands on node1 and the
join-cluster commands on the other two nodes; that should give you a clear idea before
moving to pods in k8s, as sketched below. Not sure if you need more details added to
the doc; we can always improve it. Upstream ovn-k8s does the same for pods, where e.g.
the ovn-kube0 pod creates the cluster and the other two pods join.
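
As a concrete illustration of the cluster-mode path, node1 (say 192.5.0.8) would run
roughly the below, and node2/node3 then run the corresponding join commands from the
doc with their own host_ip; the image tag here is just the one mentioned earlier in
this thread and is illustrative:

docker run -e "host_ip=192.5.0.8" -e "nb_db_port=6641" -itd \
    --name=ovn-nb-raft --net=host --privileged \
    openvswitch/ovn:2.12_e60f2f2_debian_master ovn-nb-cluster-create

docker run -e "host_ip=192.5.0.8" -e "sb_db_port=6642" -itd \
    --name=ovn-sb-raft --net=host --privileged \
    openvswitch/ovn:2.12_e60f2f2_debian_master ovn-sb-cluster-create

docker run -e "OVN_NB_DB=tcp:192.5.0.8:6641,tcp:192.5.0.9:6641,tcp:192.5.0.10:6641" \
    -e "OVN_SB_DB=tcp:192.5.0.8:6642,tcp:192.5.0.9:6642,tcp:192.5.0.10:6642" \
    -itd --name=ovn-northd-raft \
    openvswitch/ovn:2.12_e60f2f2_debian_master ovn-northd-cluster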

> It's not clear
>
> Thanks
>
>
>
>
>
>
> On 25/06/2020 17:36, aginwala wrote:
>
> Hi:
>
> There are a couple of options as I have been exploring this too:
>
> 1. Upstream ovn-k8s patches (
> https://github.com/ovn-org/ovn-kubernetes/commit/a07b1a01af7e37b15c2e5f179ffad2b9f25a083d)
> uses statefulset and headless service for starting ovn central raft cluster
> with 3 replicas. Cluster startup code and pod specs are pretty neat that
> addresses most of the doubts.
>
> OVN components have been containerized too to start them in pods. You can
> also refer to
> https://github.com/ovn-org/ovn/blob/d6b56b1629d5984ef91864510f918e232efb89de/Documentation/intro/install/general.rst
>  for the same and use them to make it work in pod specs too.
>
>
> 2. Write a new ovn operator similar to etcd operator
> https://github.com/coreos/etcd-operator which just takes the count of
> raft replicas and does the job in the background.
>
> I also added ovn-k8s group so they can comment on any other ideas too.
> Hope it helps.
>
>
>
> On Thu,

Re: [ovs-discuss] HA OVN "Central" as a kubernetes service

2020-06-25 Thread aginwala
Hi:

There are a couple of options as I have been exploring this too:

1. Upstream ovn-k8s patches (
https://github.com/ovn-org/ovn-kubernetes/commit/a07b1a01af7e37b15c2e5f179ffad2b9f25a083d)
uses a statefulset and headless service for starting the ovn central raft cluster
with 3 replicas. The cluster startup code and pod specs are pretty neat and address
most of the doubts.

OVN components have been containerized too to start them in pods. You can
also refer to
https://github.com/ovn-org/ovn/blob/d6b56b1629d5984ef91864510f918e232efb89de/Documentation/intro/install/general.rst
 for the same and use them to make it work in pod specs too.


2. Write a new ovn operator similar to etcd operator
https://github.com/coreos/etcd-operator which just takes the count of raft
replicas and does the job in the background.

I also added ovn-k8s group so they can comment on any other ideas too. Hope
it helps.



On Thu, Jun 25, 2020 at 7:15 AM Brendan Doyle 
wrote:

> Hi,
>
> So I'm trying to find information on setting up an OVN "Central/Master"
> high availability (HA)
> Not as Active-Backup with Pacemaker, but as a cluster. But I want to
> deploy and manage that
> cluster as a Kubernetes service .
>
> I can find lots of stuff on "ovn-kube" but this seems to be using OVN as
> a  kubernetes CNI instead of
> Flannel etc.  But this is not what I want to do, I have a kubernetes
> cluster using Flannel as the CNI,
> now  I want to deploy a HA OVN "Central" as a kubernetes service. Kind
> of like how you can deploy
> a MySQL cluster in kubernetes using a SatefulSet deployment.
>
> I have found this:
>   https://github.com/ovn-org/ovn-kubernetes#readme
>
> But it is not clear to me if this is how to setup OVN as a kubernetes
> CNI or it's how to setup a HA
> OVN central as kubernetes service.
>
> Can anybody comment, has anyone done this?
>
>
> I guess I could run an OVN central as standalone and use a kubernetes
> deployment with 3
>   replica sets and "export" as a NodePort service. And have a
> floating/VIP on my kubernetes
> nodes. And direct ovn-controllers to the VIP. So only the pod that holds
> the VIP would service
> requests. This would work and give HA, but you don't get the performance
> of an OVN
> clustered Database Model, where each OVN central could service requests.
>
>
>
>
> Thanks
>
>
> Rdgs
> Brendan
>
> ___
> discuss mailing list
> disc...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] [OVN] How to set inactivity_probe between members in RAFT cluster

2020-02-08 Thread aginwala
On Fri, Feb 7, 2020 at 6:26 PM taoyunupt  wrote:

> Hi,Aliasgar,
>Maybe I need to tell you the way how I deployed the
> RAFT cluster, to make you understand my situation. I have 3 servers ,IPs
> are  192.5.0.8, 192.5.0.9, 192.5.0.10.
>After  reading my steps,you may know why  my RAFT
> cluste do not have output of   "ovn-sbctl get-connection",but It also
> works most of time.
> If the way I used to deployed cluster is not good ,
> please point it out. Thanks very much.
>
Hi Yun:

Your approach to starting the cluster seems correct. The reason you don't see a
connection entry after creating the cluster is that the connection entry is needed for
clients to connect to the cluster, not to form the cluster itself. Hence, you just need
an additional step to create one nb and one sb connection entry, for which you can set
the connection to ptcp:6641/6642 so that clients like northd, ovn-controller, etc. can
connect to the cluster. Please also refer to the clustered database section in
https://github.com/openvswitch/ovs/blob/master/Documentation/ref/ovsdb.7.rst for more
details.
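
For example, a minimal way to add those entries and then bump the probe (ports and
values illustrative; inactivity_probe is in milliseconds):

ovn-nbctl set-connection ptcp:6641
ovn-sbctl set-connection ptcp:6642
ovn-nbctl set connection . inactivity_probe=180000
ovn-sbctl set connection . inactivity_probe=180000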

>
>   1. First step: create the cluster with ovsdb-tool commands
>
> Create a cluster on the first node (IP 192.5.0.8):
> # ovsdb-tool create-cluster /etc/openvswitch/ovnsb_db.db /usr/share/openvswitch/ovn-sb.ovsschema tcp:192.5.0.8:6644
> # ovsdb-tool create-cluster /etc/openvswitch/ovnnb_db.db /usr/share/openvswitch/ovn-nb.ovsschema tcp:192.5.0.8:6643
>
> Join the cluster on the second node (IP 192.5.0.9):
> # ovsdb-tool join-cluster /etc/openvswitch/ovnsb_db.db OVN_Southbound tcp:192.5.0.9:6644 tcp:192.5.0.8:6644 tcp:192.5.0.10:6644
> # ovsdb-tool join-cluster /etc/openvswitch/ovnnb_db.db OVN_Northbound tcp:192.5.0.9:6643 tcp:192.5.0.8:6643 tcp:192.5.0.10:6643
>
> Join the cluster on the third node (IP 192.5.0.10):
> # ovsdb-tool join-cluster /etc/openvswitch/ovnsb_db.db OVN_Southbound tcp:192.5.0.10:6644 tcp:192.5.0.8:6644 tcp:192.5.0.9:6644
> # ovsdb-tool join-cluster /etc/openvswitch/ovnnb_db.db OVN_Northbound tcp:192.5.0.10:6643 tcp:192.5.0.8:6643 tcp:192.5.0.9:6643
>
>   2. Second step: configure the cluster
>
> Edit the /etc/sysconfig/ovn-northd file on each node and add the OVN_NORTHD_OPTS
> option. The IP of the first node is 192.5.0.8, so the added content is as below
> (other nodes are similar):
>
> OVN_NORTHD_OPTS="--db-nb-addr=192.5.0.8 --db-nb-create-insecure-remote=yes \
> --db-sb-addr=192.5.0.8 --db-sb-create-insecure-remote=yes \
> --db-nb-cluster-local-addr=192.5.0.8 --db-sb-cluster-local-addr=192.5.0.8 \
> --ovn-northd-nb-db=tcp:192.5.0.8:6641,tcp:192.5.0.9:6641,tcp:192.5.0.10:6641 \
> --ovn-northd-sb-db=tcp:192.5.0.8:6642,tcp:192.5.0.9:6642,tcp:192.5.0.10:6642"
>
>3.Third step to start cluster
>
>  Execute the following command to start the cluster
>
> #systemctl restart openvswitch ovn-northd
>
> Regards,
> Yun
>
>
>
>
> On 2020-02-07 22:45:36, "taoyunupt" wrote:
>
> Hi,Aliasgar,
>
>Thanks for your reply.  I have tried your suggestion. But I
> found that  it just could create one NB connection or one SB connection.
> In RAFT, we need at least two.
>That means  the output  of 'ovn-nbctl get-connection' has
> two lines. What do you think if I want to fix this problem?
>May be you don't need to consider how to have two
> connections for NB. Actually, I want to know how to solve the
> "inactivity_probe"  problem.
>
>
>
> Regards,
> Yun
>
> At 2020-02-07 03:05:37, "aginwala"  wrote:
>
> Hi Yun:
>
> For changing inactivity probe which is 5 sec de

Re: [ovs-discuss] [OVN] How to set inactivity_probe between members in RAFT cluster

2020-02-06 Thread aginwala
Hi Yun:

For changing the inactivity probe, which defaults to 5 seconds, you need to create a
connection entry for both the sb and nb db, e.g. for the nb db:
ovn-nbctl -- --id=@conn_uuid create Connection \
    target="ptcp\:<port>\:<ip>" \
    inactivity_probe=<value_in_ms> -- set NB_Global . connections=@conn_uuid

ovn-nbctl set connection . inactivity_probe=<value_in_ms> will then work!

To tune the raft election timer on, say, the nb db, you can use the command below:
ovs-appctl -t /var/run/openvswitch/ovnnb_db.ctl \
    cluster/change-election-timer OVN_Northbound <value_in_ms>
You can run the same commands against the sb db to tune its values.
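
For completeness, a sketch of the equivalent entry on the sb side (bind port/IP and
probe value illustrative; the probe is in milliseconds):

ovn-sbctl -- --id=@conn_uuid create Connection \
    target="ptcp\:6642\:0.0.0.0" \
    inactivity_probe=180000 -- set SB_Global . connections=@conn_uuid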

On Wed, Feb 5, 2020 at 4:00 AM taoyunupt  wrote:

> Hi,Numan,
> I happend the problem that there are frequently elections
> in RAFT cluster members . I think it was cause by the not good connection
> between members of RARF cluster. As the log shows.
> Becase  the output of  "ovn-sbctl get-connection"  is none
> in RAFT cluster member,  So the command "ovn-sbctl set connection .
> inactivity_probe=18"  not works.
> Do you know how to set "inactivity_probe"  when we use
> RAFT cluster?   It will be appreciateed  if you have more suggestions.
>
>
> 2020-02-05T01:37:29.178Z|03424|reconnect|ERR|tcp:10.254.8.210:52048: no
> response to inactivity probe after 5 seconds, disconnecting
> 2020-02-05T01:37:30.519Z|03425|raft|INFO|tcp:10.xxx.8.210:59300: learned
> server ID cdec
> 2020-02-05T01:37:30.519Z|03426|raft|INFO|tcp:10.xxx.8.210:59300: learned
> remote address tcp:10.254.8.210:6643
> 2020-02-05T03:52:02.791Z|03427|raft|INFO|received leadership transfer from
> 3e2e in term 64
> 2020-02-05T03:52:02.791Z|03428|raft|INFO|term 65: starting election
> 2020-02-05T03:52:02.792Z|03429|reconnect|INFO|tcp:10.xxx.8.208:6643:
> connection closed by peer
> 2020-02-05T03:52:02.869Z|03430|raft|INFO|term 65: elected leader by 2+ of
> 3 servers
> 2020-02-05T03:52:03.210Z|03431|raft|INFO|tcp:10.xxx.8.208:46140: learned
> server ID 3e2e
> 2020-02-05T03:52:03.210Z|03432|raft|INFO|tcp:10.xxx.8.208:46140: learned
> remote address tcp:10.xxx.8.208:6643
> 2020-02-05T03:52:03.793Z|03433|reconnect|INFO|tcp:10.254.8.208:6643:
> connecting...
> 2020-02-05T03:52:03.793Z|03434|reconnect|INFO|tcp:10.254.8.208:6643:
> connected
>
>
> Thanks,
> Yun
> ___
> discuss mailing list
> disc...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] OVS/OVN docker image for each stable release

2019-11-13 Thread aginwala
On Tue, Nov 12, 2019 at 10:57 PM Numan Siddique  wrote:

> On Wed, Nov 13, 2019 at 12:02 AM Shivaram Mysore
>  wrote:
> >
> > No need to indicate "built on Ubuntu" for docker image tags.
> > Alpine tag is specifically used as it used different libraries and image
> size is small.  Ideally, for Docker images, we should use Alpine Linux.  If
> OVS for Alpine is latest, then image size will be further reduced.
> >
> > Note thAt at the end of the day, container is just a delivery or
> packaging vehicle.
> >
> > /Shivaram
> > ::Sent from my mobile device::
> >
> > On Nov 12, 2019, at 9:49 AM, aginwala  wrote:
> >
> > 
> > Thanks Shivaram:
> >
> > On Tue, Nov 12, 2019 at 9:28 AM Shivaram Mysore <
> shivaram.mys...@gmail.com> wrote:
> >>
> >> I am not sure why "*_debian" is used.  The image should work across
> OS.  I have not seen use of  "*_linux"  as most docker images use some form
> of shell scripts.
> >>
> > Because the container image published is ubuntu and hence we tagged it
> with _debian. It doesn't indicate it will not work on rhel. If we all agree
> we can remove the tags and update the readme.md on docker.io that each
> container image is using ubuntu as base image. I am fine with any approach.
> >>
> >> Also, in my opinion, the docker image should not build OVS.  If it can
> add appropriate OVS packages like
> https://github.com/servicefractal/ovs/blob/master/Dockerfile is better as
> they are already tested.   Building OVS as a part of this will cause more
> testing impacts and is unnecessary.  The objective is to run OVS in a
> container image.  I would keep it simple.
>
> I think the idea was to have an OVS container image with the latest
> master code right Aliasgar ?

Yes. E.g. the ovs docker image for release 2.12.0 with debian/rhel will check out the
v2.12.0 code from git and build it. That way the source code will exist in the docker
image, and ovs 2.12.0 is installed in the container from it.

>
>
> Getting OVS packages is good, but then the debian/ubuntu/fedora
> packages  should be updated as soon as OVS does a release.

You mean to say, e.g., install the latest pushed version with dpkg -i using the
*2.12.0.deb packages for debian and skip building from source code?

So I think the open question now is: do we want the source code in the version-specific
container image with ovs installed from source, or do we just need the version-specific
ovs installed without the source in the container?
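
If we go the package route, the version-specific image build would roughly reduce to
installing the released packages instead of compiling, e.g. something like the below
inside a debian-based image (stock debian package names; the repo carrying the desired
2.12.0 build is still to be decided):

apt-get update
apt-get install -y openvswitch-common openvswitch-switch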

Thanks
> Numan
>
> >>
> > I think the objective is to have an image per upstream stable ovs
> release and hence building it in container. Hope everyone is ok here.
> >>
> >> On Tue, Nov 12, 2019 at 12:51 AM aginwala  wrote:
> >>>
> >>> Thanks Guru.
> >>>
> >>> On Mon, Nov 11, 2019 at 1:03 PM Guru Shetty  wrote:
> >>>>
> >>>>
> >>>>
> >>>> On Mon, 11 Nov 2019 at 10:08, aginwala  wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Mon, Nov 11, 2019 at 9:00 AM Guru Shetty  wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Fri, 8 Nov 2019 at 14:41, aginwala  wrote:
> >>>>>>>
> >>>>>>> openvswitch.ko ships default with newer kernel but if we want to
> use say stt, we need to build it with respective kernel for host on which
> we will run. Hence, to skip host level installation , we pack the modules
> in container.
> >>>>>>
> >>>>>>
> >>>>>> It is not clear to me. Is DKMS enabled here? Or is it that
> openvswitch/ovs:2.12.0_debian_4.15.0-66-generic will only work on kernel
> 4.15.0-66-generic?
> >>>>>>
> >>>>>
> >>>>> No. Dkms is not enabled because idea is to release a new docker
> image for every new kernel upgrade on compute (Not sure if dkms will help
> much in container case as we are not installing on host). Do you have any
> specific use case which? Yes on host with 4.15.0-66-generic.
> >>>>
> >>>>
> >>>> It will probably be very hard to release each OVS version to so many
> available kernels. How do you decide which kernel that you want to release
> a image for? What is the plan here? I think it makes sense to release one
> image without a kernel module packed with it.
> >>>>
> >>> Agree, we can't publish too many images based on different kernel
> versions. Hence, I am ok with the approach you proposed by publishing
> singl

Re: [ovs-discuss] OVS/OVN docker image for each stable release

2019-11-12 Thread aginwala
Thanks Shivaram:

On Tue, Nov 12, 2019 at 9:28 AM Shivaram Mysore 
wrote:

> I am not sure why "*_debian" is used.  The image should work across OS.
> I have not seen use of  "*_linux"  as most docker images use some form of
> shell scripts.
>
Because the published container image is based on ubuntu, we tagged it with _debian.
That doesn't mean it will not work on rhel. If we all agree, we can remove the tags and
update the readme.md on docker.io to say each container image uses ubuntu as its base
image. I am fine with either approach.

> Also, in my opinion, the docker image should not build OVS.  If it can add
> appropriate OVS packages like
> https://github.com/servicefractal/ovs/blob/master/Dockerfile is better as
> they are already tested.   Building OVS as a part of this will cause more
> testing impacts and is unnecessary.  The objective is to run OVS in a
> container image.  I would keep it simple.
>
I think the objective is to have an image per upstream stable ovs release, and hence to
build it in the container. Hope everyone is OK with that.

> On Tue, Nov 12, 2019 at 12:51 AM aginwala  wrote:
>
>> Thanks Guru.
>>
>> On Mon, Nov 11, 2019 at 1:03 PM Guru Shetty  wrote:
>>
>>>
>>>
>>> On Mon, 11 Nov 2019 at 10:08, aginwala  wrote:
>>>
>>>>
>>>>
>>>> On Mon, Nov 11, 2019 at 9:00 AM Guru Shetty  wrote:
>>>>
>>>>>
>>>>>
>>>>> On Fri, 8 Nov 2019 at 14:41, aginwala  wrote:
>>>>>
>>>>>> openvswitch.ko ships default with newer kernel but if we want to use
>>>>>> say stt, we need to build it with respective kernel for host on which we
>>>>>> will run. Hence, to skip host level installation , we pack the modules in
>>>>>> container.
>>>>>>
>>>>>
>>>>> It is not clear to me. Is DKMS enabled here? Or is it that
>>>>> openvswitch/ovs:2.12.0_debian_4.15.0-66-generic will only work on
>>>>> kernel 4.15.0-66-generic?
>>>>>
>>>>>
>>>> No. Dkms is not enabled because idea is to release a new docker image
>>>> for every new kernel upgrade on compute (Not sure if dkms will help much in
>>>> container case as we are not installing on host). Do you have any specific
>>>> use case which? Yes on host with 4.15.0-66-generic.
>>>>
>>>
>>> It will probably be very hard to release each OVS version to so many
>>> available kernels. How do you decide which kernel that you want to release
>>> a image for? What is the plan here? I think it makes sense to release one
>>> image without a kernel module packed with it.
>>>
>>> Agree, we can't publish too many images based on different kernel
>> versions. Hence, I am ok with the approach you proposed by publishing
>> single image for each stable release leveraging host kernel modules. I have
>> pushed  2 debian images for each stable releases 2.11.2_debian and
>>  2.12.0_debian under openvswitch/ovs accordingly. I also sent the
>> corresponding patch https://patchwork.ozlabs.org/patch/1193372/ to
>> refactor the docker builds to support an option to skip kernel modules for
>> ovs repo so that user can choose to build/run with/without kernel modules.
>> Let me know further.
>>
>>
>>>
>>>
>>>>
>>>>>> On Fri, Nov 8, 2019 at 2:37 PM Guru Shetty  wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, 8 Nov 2019 at 14:18, aginwala  wrote:
>>>>>>>
>>>>>>>> Hi all:
>>>>>>>>
>>>>>>>>
>>>>>>>> I have pushed two images to public openvswitch org on docker.io
>>>>>>>> for ovs and ovn;
>>>>>>>> OVS for ubuntu with 4.15 kernel:
>>>>>>>> *openvswitch/ovs:2.12.0_debian_4.15.0-66-generic*
>>>>>>>>
>>>>>>>
>>>>>>> Why is the kernel important here? Is the OVS kernel module being
>>>>>>> packed?
>>>>>>>
>>>>>>>
>>>>>>>>      run as : docker run -itd --net=host
>>>>>>>> --name=ovsdb-server openvswitch/ovs:2.12.0_debian_4.15.0-66-generic
>>>>>>>> ovsdb-server
>>>>>>>> docker run -itd --net=host
>>>>>>>> --name=ovs-vswitchd  --volumes-from=ovsdb-serve

Re: [ovs-discuss] OVS/OVN docker image for each stable release

2019-11-12 Thread aginwala
Thanks Guru.

On Mon, Nov 11, 2019 at 1:03 PM Guru Shetty  wrote:

>
>
> On Mon, 11 Nov 2019 at 10:08, aginwala  wrote:
>
>>
>>
>> On Mon, Nov 11, 2019 at 9:00 AM Guru Shetty  wrote:
>>
>>>
>>>
>>> On Fri, 8 Nov 2019 at 14:41, aginwala  wrote:
>>>
>>>> openvswitch.ko ships default with newer kernel but if we want to use
>>>> say stt, we need to build it with respective kernel for host on which we
>>>> will run. Hence, to skip host level installation , we pack the modules in
>>>> container.
>>>>
>>>
>>> It is not clear to me. Is DKMS enabled here? Or is it that
>>> openvswitch/ovs:2.12.0_debian_4.15.0-66-generic will only work on
>>> kernel 4.15.0-66-generic?
>>>
>>>
>> No. Dkms is not enabled because idea is to release a new docker image for
>> every new kernel upgrade on compute (Not sure if dkms will help much in
>> container case as we are not installing on host). Do you have any specific
>> use case which? Yes on host with 4.15.0-66-generic.
>>
>
> It will probably be very hard to release each OVS version to so many
> available kernels. How do you decide which kernel that you want to release
> a image for? What is the plan here? I think it makes sense to release one
> image without a kernel module packed with it.
>
Agree, we can't publish too many images based on different kernel versions. Hence, I am
OK with the approach you proposed of publishing a single image for each stable release
that leverages the host kernel modules. I have pushed 2 debian images for the stable
releases, 2.11.2_debian and 2.12.0_debian, under openvswitch/ovs accordingly. I also
sent the corresponding patch https://patchwork.ozlabs.org/patch/1193372/ to refactor
the docker builds in the ovs repo to support an option to skip kernel modules, so that
users can choose to build/run with or without them. Let me know further.


>
>
>>
>>>> On Fri, Nov 8, 2019 at 2:37 PM Guru Shetty  wrote:
>>>>
>>>>>
>>>>>
>>>>> On Fri, 8 Nov 2019 at 14:18, aginwala  wrote:
>>>>>
>>>>>> Hi all:
>>>>>>
>>>>>>
>>>>>> I have pushed two images to public openvswitch org on docker.io for
>>>>>> ovs and ovn;
>>>>>> OVS for ubuntu with 4.15 kernel:
>>>>>> *openvswitch/ovs:2.12.0_debian_4.15.0-66-generic*
>>>>>>
>>>>>
>>>>> Why is the kernel important here? Is the OVS kernel module being
>>>>> packed?
>>>>>
>>>>>
>>>>>>  run as : docker run -itd --net=host --name=ovsdb-server
>>>>>> openvswitch/ovs:2.12.0_debian_4.15.0-66-generic ovsdb-server
>>>>>> docker run -itd --net=host
>>>>>> --name=ovs-vswitchd  --volumes-from=ovsdb-server --privileged
>>>>>> openvswitch/ovs:2.12.0_debian_4.15.0-66-generic ovs-vswitchd
>>>>>>
>>>>>> OVN debian docker image:
>>>>>> *openvswitch/ovn:2.12_e60f2f2_debian_master* as we don't have a
>>>>>> branch cut out for ovn yet. (Hence, tagged it with last commit on master)
>>>>>> Follow steps as per:
>>>>>> https://github.com/ovn-org/ovn/blob/master/Documentation/intro/install/general.rst
>>>>>>
>>>>>>
>>>>>> Thanks Guru for sorting out the access/cleanups for openvswitch org
>>>>>> on docker.io.
>>>>>>
>>>>>> We can plan to align this docker push for each stable release ahead.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Nov 8, 2019 at 10:17 AM aginwala  wrote:
>>>>>>
>>>>>>> Thanks Guru:
>>>>>>>
>>>>>>> Sounds good. Can you please grant user aginwala as admin? I can
>>>>>>> create two repos ovs and ovn under openvswitch org and can push new 
>>>>>>> stable
>>>>>>> release versions there.
>>>>>>>
>>>>>>> On Fri, Nov 8, 2019 at 10:04 AM Guru Shetty  wrote:
>>>>>>>
>>>>>>>> On Fri, 8 Nov 2019 at 09:53, Guru Shetty  wrote:
>>>>>>>>
>>>>>>>>> I had created a openvswitch repo in docker as a placeholder. Happy
>>>>>>>>> to provide it to whoever the admin is.
>>>>>>>>&g

Re: [ovs-discuss] OVS/OVN docker image for each stable release

2019-11-11 Thread aginwala
On Mon, Nov 11, 2019 at 9:00 AM Guru Shetty  wrote:

>
>
> On Fri, 8 Nov 2019 at 14:41, aginwala  wrote:
>
>> openvswitch.ko ships default with newer kernel but if we want to use say
>> stt, we need to build it with respective kernel for host on which we will
>> run. Hence, to skip host level installation , we pack the modules in
>> container.
>>
>
> It is not clear to me. Is DKMS enabled here? Or is it that
> openvswitch/ovs:2.12.0_debian_4.15.0-66-generic will only work on
> kernel 4.15.0-66-generic?
>
>
No, DKMS is not enabled, because the idea is to release a new docker image for every
new kernel upgrade on the computes (not sure DKMS would help much in the container
case, as we are not installing on the host). Do you have a specific use case in mind?
And yes, this image only works on hosts with 4.15.0-66-generic.

>
>> On Fri, Nov 8, 2019 at 2:37 PM Guru Shetty  wrote:
>>
>>>
>>>
>>> On Fri, 8 Nov 2019 at 14:18, aginwala  wrote:
>>>
>>>> Hi all:
>>>>
>>>>
>>>> I have pushed two images to public openvswitch org on docker.io for
>>>> ovs and ovn;
>>>> OVS for ubuntu with 4.15 kernel:
>>>> *openvswitch/ovs:2.12.0_debian_4.15.0-66-generic*
>>>>
>>>
>>> Why is the kernel important here? Is the OVS kernel module being packed?
>>>
>>>
>>>>  run as : docker run -itd --net=host --name=ovsdb-server
>>>> openvswitch/ovs:2.12.0_debian_4.15.0-66-generic ovsdb-server
>>>> docker run -itd --net=host --name=ovs-vswitchd
>>>>  --volumes-from=ovsdb-server --privileged
>>>> openvswitch/ovs:2.12.0_debian_4.15.0-66-generic ovs-vswitchd
>>>>
>>>> OVN debian docker image:  *openvswitch/ovn:2.12_e60f2f2_debian_master*
>>>> as we don't have a branch cut out for ovn yet. (Hence, tagged it with last
>>>> commit on master)
>>>> Follow steps as per:
>>>> https://github.com/ovn-org/ovn/blob/master/Documentation/intro/install/general.rst
>>>>
>>>>
>>>> Thanks Guru for sorting out the access/cleanups for openvswitch org on
>>>> docker.io.
>>>>
>>>> We can plan to align this docker push for each stable release ahead.
>>>>
>>>>
>>>>
>>>> On Fri, Nov 8, 2019 at 10:17 AM aginwala  wrote:
>>>>
>>>>> Thanks Guru:
>>>>>
>>>>> Sounds good. Can you please grant user aginwala as admin? I can create
>>>>> two repos ovs and ovn under openvswitch org and can push new stable 
>>>>> release
>>>>> versions there.
>>>>>
>>>>> On Fri, Nov 8, 2019 at 10:04 AM Guru Shetty  wrote:
>>>>>
>>>>>> On Fri, 8 Nov 2019 at 09:53, Guru Shetty  wrote:
>>>>>>
>>>>>>> I had created a openvswitch repo in docker as a placeholder. Happy
>>>>>>> to provide it to whoever the admin is.
>>>>>>>
>>>>>>
>>>>>> i.e. You can use the keyword "openvswitch". For e.g., right now, it
>>>>>> has one stale image.
>>>>>>
>>>>>> docker run -d --net=none openvswitch/ipam:v2.4.90 /bin/sh -c "while
>>>>>> true; do echo hello world; sleep 1; done"
>>>>>>
>>>>>> So if we want the name "openvswitch", this is one option. If we
>>>>>> prefer ovs/ovn or other keywords, then the admin can create a new one.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> On Thu, 7 Nov 2019 at 13:15, aginwala  wrote:
>>>>>>>
>>>>>>>> Hi All:
>>>>>>>>
>>>>>>>> As discussed in the meeting today, we all agreed that it will be a
>>>>>>>> good idea to push docker images for each new ovs/ovn stable release. 
>>>>>>>> Hence,
>>>>>>>> need help from maintainers Ben/Mark/Justin/Han to address some open 
>>>>>>>> action
>>>>>>>> items as it is more of org/ownership/rights related:
>>>>>>>>
>>>>>>>>1. Get new repo created under docker.io with name either
>>>>>>>>ovs/ovn and declare it public repo
>>>>>>>>2. How about copy-rights for running images for open source
>>>>>>>>projects
>>>>>>>>3. Storage: unlimited or some limited GBs
>>>>>>>>4. Naming conventions for docker images ;e.g
>>>>>>>>openswitch/ovn:2.13.1_debian or openswitch/ovn:2.13.1_rhel.
>>>>>>>>Similar for ovs.
>>>>>>>>
>>>>>>>>
>>>>>>>> Once this is done, we can bundle docker image changes in the same
>>>>>>>> release process
>>>>>>>>
>>>>>>>> Please feel free to add any missing piece.
>>>>>>>>
>>>>>>>> ___
>>>>>>>> discuss mailing list
>>>>>>>> disc...@openvswitch.org
>>>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>>>>>>>>
>>>>>>>
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] OVS/OVN docker image for each stable release

2019-11-08 Thread aginwala
Sounds good, looking forward to it. Just want to reiterate that this discussion is more
about maintaining a docker image for each stable ovn/ovs upstream release and getting
input from the maintainers. We can start a separate thread for performance and other
issues around running ovs/ovn in containers after your talk.




On Fri, Nov 8, 2019 at 4:53 PM Shivaram Mysore 
wrote:

> I am giving a talk about the same at OVS conference.  Most of the info is
> documented in the github repo.
>
> If that does not help, please post questions and I will help document the
> same.
>
> /Shivaram
> ::Sent from my mobile device::
>
> On Nov 8, 2019, at 6:49 PM, aginwala  wrote:
>
> 
> Hi Shivaram:
>
> Thanks for comments. Can you explain what is the bottleneck? Also for
> addressing performance related issues that you suggested, I would say if
> you can submit PR in ovs repo mentioning to use additional docker options
> for startup for better performance, it would be helpful. I did not get a
> chance to try out additional options apart from the base ones as it just
> does its job at-least while running ovs/ovn in pre-prod/testing env. Didn't
> get as chance to scale test it.
>
> On Fri, Nov 8, 2019 at 3:35 PM Shivaram Mysore 
> wrote:
>
>> The point about kernel module is correct- no need to include it in docket
>> image.  It will not work.
>>
>> /Shivaram
>> ::Sent from my mobile device::
>>
>> On Nov 8, 2019, at 5:42 PM, aginwala  wrote:
>>
>> 
>> openvswitch.ko ships default with newer kernel but if we want to use say
>> stt, we need to build it with respective kernel for host on which we will
>> run. Hence, to skip host level installation , we pack the modules in
>> container.
>>
>> On Fri, Nov 8, 2019 at 2:37 PM Guru Shetty  wrote:
>>
>>>
>>>
>>> On Fri, 8 Nov 2019 at 14:18, aginwala  wrote:
>>>
>>>> Hi all:
>>>>
>>>>
>>>> I have pushed two images to public openvswitch org on docker.io for
>>>> ovs and ovn;
>>>> OVS for ubuntu with 4.15 kernel:
>>>> *openvswitch/ovs:2.12.0_debian_4.15.0-66-generic*
>>>>
>>>
>>> Why is the kernel important here? Is the OVS kernel module being packed?
>>>
>>>
>>>>  run as : docker run -itd --net=host --name=ovsdb-server
>>>> openvswitch/ovs:2.12.0_debian_4.15.0-66-generic ovsdb-server
>>>> docker run -itd --net=host --name=ovs-vswitchd
>>>>  --volumes-from=ovsdb-server --privileged
>>>> openvswitch/ovs:2.12.0_debian_4.15.0-66-generic ovs-vswitchd
>>>>
>>>> OVN debian docker image:  *openvswitch/ovn:2.12_e60f2f2_debian_master*
>>>> as we don't have a branch cut out for ovn yet. (Hence, tagged it with last
>>>> commit on master)
>>>> Follow steps as per:
>>>> https://github.com/ovn-org/ovn/blob/master/Documentation/intro/install/general.rst
>>>>
>>>>
>>>> Thanks Guru for sorting out the access/cleanups for openvswitch org on
>>>> docker.io.
>>>>
>>>> We can plan to align this docker push for each stable release ahead.
>>>>
>>>>
>>>>
>>>> On Fri, Nov 8, 2019 at 10:17 AM aginwala  wrote:
>>>>
>>>>> Thanks Guru:
>>>>>
>>>>> Sounds good. Can you please grant user aginwala as admin? I can create
>>>>> two repos ovs and ovn under openvswitch org and can push new stable 
>>>>> release
>>>>> versions there.
>>>>>
>>>>> On Fri, Nov 8, 2019 at 10:04 AM Guru Shetty  wrote:
>>>>>
>>>>>> On Fri, 8 Nov 2019 at 09:53, Guru Shetty  wrote:
>>>>>>
>>>>>>> I had created a openvswitch repo in docker as a placeholder. Happy
>>>>>>> to provide it to whoever the admin is.
>>>>>>>
>>>>>>
>>>>>> i.e. You can use the keyword "openvswitch". For e.g., right now, it
>>>>>> has one stale image.
>>>>>>
>>>>>> docker run -d --net=none openvswitch/ipam:v2.4.90 /bin/sh -c "while
>>>>>> true; do echo hello world; sleep 1; done"
>>>>>>
>>>>>> So if we want the name "openvswitch", this is one option. If we
>>>>>> prefer ovs/ovn or other keywords, then the admin can create a new one.
>>>>

Re: [ovs-discuss] OVS/OVN docker image for each stable release

2019-11-08 Thread aginwala
Hi Shivaram:

Thanks for the comments. Can you explain what the bottleneck is? Also, to address the
performance-related issues you mentioned, it would be helpful if you could submit a PR
to the ovs repo noting the additional docker options to use at startup for better
performance. I did not get a chance to try options beyond the base ones, as they do the
job at least while running ovs/ovn in a pre-prod/testing env; I didn't get a chance to
scale test it.

On Fri, Nov 8, 2019 at 3:35 PM Shivaram Mysore 
wrote:

> The point about kernel module is correct- no need to include it in docket
> image.  It will not work.
>
> /Shivaram
> ::Sent from my mobile device::
>
> On Nov 8, 2019, at 5:42 PM, aginwala  wrote:
>
> 
> openvswitch.ko ships default with newer kernel but if we want to use say
> stt, we need to build it with respective kernel for host on which we will
> run. Hence, to skip host level installation , we pack the modules in
> container.
>
> On Fri, Nov 8, 2019 at 2:37 PM Guru Shetty  wrote:
>
>>
>>
>> On Fri, 8 Nov 2019 at 14:18, aginwala  wrote:
>>
>>> Hi all:
>>>
>>>
>>> I have pushed two images to public openvswitch org on docker.io for ovs
>>> and ovn;
>>> OVS for ubuntu with 4.15 kernel:
>>> *openvswitch/ovs:2.12.0_debian_4.15.0-66-generic*
>>>
>>
>> Why is the kernel important here? Is the OVS kernel module being packed?
>>
>>
>>>  run as : docker run -itd --net=host --name=ovsdb-server
>>> openvswitch/ovs:2.12.0_debian_4.15.0-66-generic ovsdb-server
>>> docker run -itd --net=host --name=ovs-vswitchd
>>>  --volumes-from=ovsdb-server --privileged
>>> openvswitch/ovs:2.12.0_debian_4.15.0-66-generic ovs-vswitchd
>>>
>>> OVN debian docker image:  *openvswitch/ovn:2.12_e60f2f2_debian_master*
>>> as we don't have a branch cut out for ovn yet. (Hence, tagged it with last
>>> commit on master)
>>> Follow steps as per:
>>> https://github.com/ovn-org/ovn/blob/master/Documentation/intro/install/general.rst
>>>
>>>
>>> Thanks Guru for sorting out the access/cleanups for openvswitch org on
>>> docker.io.
>>>
>>> We can plan to align this docker push for each stable release ahead.
>>>
>>>
>>>
>>> On Fri, Nov 8, 2019 at 10:17 AM aginwala  wrote:
>>>
>>>> Thanks Guru:
>>>>
>>>> Sounds good. Can you please grant user aginwala as admin? I can create
>>>> two repos ovs and ovn under openvswitch org and can push new stable release
>>>> versions there.
>>>>
>>>> On Fri, Nov 8, 2019 at 10:04 AM Guru Shetty  wrote:
>>>>
>>>>> On Fri, 8 Nov 2019 at 09:53, Guru Shetty  wrote:
>>>>>
>>>>>> I had created a openvswitch repo in docker as a placeholder. Happy to
>>>>>> provide it to whoever the admin is.
>>>>>>
>>>>>
>>>>> i.e. You can use the keyword "openvswitch". For e.g., right now, it
>>>>> has one stale image.
>>>>>
>>>>> docker run -d --net=none openvswitch/ipam:v2.4.90 /bin/sh -c "while
>>>>> true; do echo hello world; sleep 1; done"
>>>>>
>>>>> So if we want the name "openvswitch", this is one option. If we prefer
>>>>> ovs/ovn or other keywords, then the admin can create a new one.
>>>>>
>>>>>
>>>>>>
>>>>>> On Thu, 7 Nov 2019 at 13:15, aginwala  wrote:
>>>>>>
>>>>>>> Hi All:
>>>>>>>
>>>>>>> As discussed in the meeting today, we all agreed that it will be a
>>>>>>> good idea to push docker images for each new ovs/ovn stable release. 
>>>>>>> Hence,
>>>>>>> need help from maintainers Ben/Mark/Justin/Han to address some open 
>>>>>>> action
>>>>>>> items as it is more of org/ownership/rights related:
>>>>>>>
>>>>>>>1. Get new repo created under docker.io with name either ovs/ovn
>>>>>>>and declare it public repo
>>>>>>>2. How about copy-rights for running images for open source
>>>>>>>projects
>>>>>>>3. Storage: unlimited or some limited GBs
>>>>>>>4. Naming conventions for docker images ;e.g
>>>>>>>openswitch/ovn:2.13.1_debian or openswitch/ovn:2.13.1_rhel.
>>>>>>>Similar for ovs.
>>>>>>>
>>>>>>>
>>>>>>> Once this is done, we can bundle docker image changes in the same
>>>>>>> release process
>>>>>>>
>>>>>>> Please feel free to add any missing piece.
>>>>>>>
>>>>>>> ___
>>>>>>> discuss mailing list
>>>>>>> disc...@openvswitch.org
>>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>>>>>>>
>>>>>> ___
> discuss mailing list
> disc...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>
>
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] OVS/OVN docker image for each stable release

2019-11-08 Thread aginwala
openvswitch.ko ships by default with newer kernels, but if we want to use, say, stt, we
need to build it against the specific kernel of the host it will run on. Hence, to skip
host-level installation, we pack the modules in the container.

On Fri, Nov 8, 2019 at 2:37 PM Guru Shetty  wrote:

>
>
> On Fri, 8 Nov 2019 at 14:18, aginwala  wrote:
>
>> Hi all:
>>
>>
>> I have pushed two images to public openvswitch org on docker.io for ovs
>> and ovn;
>> OVS for ubuntu with 4.15 kernel:
>> *openvswitch/ovs:2.12.0_debian_4.15.0-66-generic*
>>
>
> Why is the kernel important here? Is the OVS kernel module being packed?
>
>
>>  run as : docker run -itd --net=host --name=ovsdb-server
>> openvswitch/ovs:2.12.0_debian_4.15.0-66-generic ovsdb-server
>> docker run -itd --net=host --name=ovs-vswitchd
>>  --volumes-from=ovsdb-server --privileged
>> openvswitch/ovs:2.12.0_debian_4.15.0-66-generic ovs-vswitchd
>>
>> OVN debian docker image:  *openvswitch/ovn:2.12_e60f2f2_debian_master*
>> as we don't have a branch cut out for ovn yet. (Hence, tagged it with last
>> commit on master)
>> Follow steps as per:
>> https://github.com/ovn-org/ovn/blob/master/Documentation/intro/install/general.rst
>>
>>
>> Thanks Guru for sorting out the access/cleanups for openvswitch org on
>> docker.io.
>>
>> We can plan to align this docker push for each stable release ahead.
>>
>>
>>
>> On Fri, Nov 8, 2019 at 10:17 AM aginwala  wrote:
>>
>>> Thanks Guru:
>>>
>>> Sounds good. Can you please grant user aginwala as admin? I can create
>>> two repos ovs and ovn under openvswitch org and can push new stable release
>>> versions there.
>>>
>>> On Fri, Nov 8, 2019 at 10:04 AM Guru Shetty  wrote:
>>>
>>>> On Fri, 8 Nov 2019 at 09:53, Guru Shetty  wrote:
>>>>
>>>>> I had created a openvswitch repo in docker as a placeholder. Happy to
>>>>> provide it to whoever the admin is.
>>>>>
>>>>
>>>> i.e. You can use the keyword "openvswitch". For e.g., right now, it has
>>>> one stale image.
>>>>
>>>> docker run -d --net=none openvswitch/ipam:v2.4.90 /bin/sh -c "while
>>>> true; do echo hello world; sleep 1; done"
>>>>
>>>> So if we want the name "openvswitch", this is one option. If we prefer
>>>> ovs/ovn or other keywords, then the admin can create a new one.
>>>>
>>>>
>>>>>
>>>>> On Thu, 7 Nov 2019 at 13:15, aginwala  wrote:
>>>>>
>>>>>> Hi All:
>>>>>>
>>>>>> As discussed in the meeting today, we all agreed that it will be a
>>>>>> good idea to push docker images for each new ovs/ovn stable release. 
>>>>>> Hence,
>>>>>> need help from maintainers Ben/Mark/Justin/Han to address some open 
>>>>>> action
>>>>>> items as it is more of org/ownership/rights related:
>>>>>>
>>>>>>1. Get new repo created under docker.io with name either ovs/ovn
>>>>>>and declare it public repo
>>>>>>2. How about copy-rights for running images for open source
>>>>>>projects
>>>>>>3. Storage: unlimited or some limited GBs
>>>>>>4. Naming conventions for docker images ;e.g
>>>>>>openswitch/ovn:2.13.1_debian or openswitch/ovn:2.13.1_rhel.
>>>>>>Similar for ovs.
>>>>>>
>>>>>>
>>>>>> Once this is done, we can bundle docker image changes in the same
>>>>>> release process
>>>>>>
>>>>>> Please feel free to add any missing piece.
>>>>>>
>>>>>> ___
>>>>>> discuss mailing list
>>>>>> disc...@openvswitch.org
>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>>>>>>
>>>>>
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] OVS/OVN docker image for each stable release

2019-11-08 Thread aginwala
Hi all:


I have pushed two images to the public openvswitch org on docker.io, for OVS
and OVN:
OVS for Ubuntu with the 4.15 kernel:
*openvswitch/ovs:2.12.0_debian_4.15.0-66-generic*
Run as:
docker run -itd --net=host --name=ovsdb-server \
    openvswitch/ovs:2.12.0_debian_4.15.0-66-generic ovsdb-server
docker run -itd --net=host --name=ovs-vswitchd --volumes-from=ovsdb-server \
    --privileged openvswitch/ovs:2.12.0_debian_4.15.0-66-generic ovs-vswitchd

OVN Debian docker image: *openvswitch/ovn:2.12_e60f2f2_debian_master*, since we
don't have a branch cut for OVN yet (hence it is tagged with the last commit on
master).
Follow the steps as per:
https://github.com/ovn-org/ovn/blob/master/Documentation/intro/install/general.rst


Thanks, Guru, for sorting out the access/cleanups for the openvswitch org on
docker.io.

We can plan to align this docker push with each stable release going forward.



On Fri, Nov 8, 2019 at 10:17 AM aginwala  wrote:

> Thanks Guru:
>
> Sounds good. Can you please grant user aginwala as admin? I can create two
> repos ovs and ovn under openvswitch org and can push new stable release
> versions there.
>
> On Fri, Nov 8, 2019 at 10:04 AM Guru Shetty  wrote:
>
>> On Fri, 8 Nov 2019 at 09:53, Guru Shetty  wrote:
>>
>>> I had created a openvswitch repo in docker as a placeholder. Happy to
>>> provide it to whoever the admin is.
>>>
>>
>> i.e. You can use the keyword "openvswitch". For e.g., right now, it has
>> one stale image.
>>
>> docker run -d --net=none openvswitch/ipam:v2.4.90 /bin/sh -c "while true;
>> do echo hello world; sleep 1; done"
>>
>> So if we want the name "openvswitch", this is one option. If we prefer
>> ovs/ovn or other keywords, then the admin can create a new one.
>>
>>
>>>
>>> On Thu, 7 Nov 2019 at 13:15, aginwala  wrote:
>>>
>>>> Hi All:
>>>>
>>>> As discussed in the meeting today, we all agreed that it will be a good
>>>> idea to push docker images for each new ovs/ovn stable release. Hence, need
>>>> help from maintainers Ben/Mark/Justin/Han to address some open action items
>>>> as it is more of org/ownership/rights related:
>>>>
>>>>1. Get new repo created under docker.io with name either ovs/ovn
>>>>and declare it public repo
>>>>2. How about copy-rights for running images for open source projects
>>>>3. Storage: unlimited or some limited GBs
>>>>4. Naming conventions for docker images ;e.g
>>>>openswitch/ovn:2.13.1_debian or openswitch/ovn:2.13.1_rhel. Similar
>>>>for ovs.
>>>>
>>>>
>>>> Once this is done, we can bundle docker image changes in the same
>>>> release process
>>>>
>>>> Please feel free to add any missing piece.
>>>>
>>>> ___
>>>> discuss mailing list
>>>> disc...@openvswitch.org
>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>>>>
>>>
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] OVS/OVN docker image for each stable release

2019-11-08 Thread aginwala
Thanks Guru:

Sounds good. Can you please grant the user aginwala admin rights? I can then
create two repos, ovs and ovn, under the openvswitch org and push new stable
release versions there.

On Fri, Nov 8, 2019 at 10:04 AM Guru Shetty  wrote:

> On Fri, 8 Nov 2019 at 09:53, Guru Shetty  wrote:
>
>> I had created a openvswitch repo in docker as a placeholder. Happy to
>> provide it to whoever the admin is.
>>
>
> i.e. You can use the keyword "openvswitch". For e.g., right now, it has
> one stale image.
>
> docker run -d --net=none openvswitch/ipam:v2.4.90 /bin/sh -c "while true;
> do echo hello world; sleep 1; done"
>
> So if we want the name "openvswitch", this is one option. If we prefer
> ovs/ovn or other keywords, then the admin can create a new one.
>
>
>>
>> On Thu, 7 Nov 2019 at 13:15, aginwala  wrote:
>>
>>> Hi All:
>>>
>>> As discussed in the meeting today, we all agreed that it will be a good
>>> idea to push docker images for each new ovs/ovn stable release. Hence, need
>>> help from maintainers Ben/Mark/Justin/Han to address some open action items
>>> as it is more of org/ownership/rights related:
>>>
>>>1. Get new repo created under docker.io with name either ovs/ovn and
>>>declare it public repo
>>>2. How about copy-rights for running images for open source projects
>>>3. Storage: unlimited or some limited GBs
>>>4. Naming conventions for docker images ;e.g
>>>openswitch/ovn:2.13.1_debian or openswitch/ovn:2.13.1_rhel. Similar
>>>for ovs.
>>>
>>>
>>> Once this is done, we can bundle docker image changes in the same
>>> release process
>>>
>>> Please feel free to add any missing piece.
>>>
>>> ___
>>> discuss mailing list
>>> disc...@openvswitch.org
>>> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>>>
>>
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] OVS/OVN docker image for each stable release

2019-11-08 Thread aginwala
Thanks Shivaram.

I will wait for the maintainers to comment, as it would be nice to start by
hosting a docker image of at least one stable release under the ovs/ovn org,
which could be either 2.11/2.12 or the upcoming 2.13. What do you think?




On Thu, Nov 7, 2019 at 3:56 PM Shivaram Mysore 
wrote:

> Hi
> If it is useful, we can start this with:
> GitHub.com/ServiceFractal/ovs
>
> /Shivaram
> ::Sent from my mobile device::
>
> On Nov 7, 2019, at 4:15 PM, aginwala  wrote:
>
> 
> Hi All:
>
> As discussed in the meeting today, we all agreed that it will be a good
> idea to push docker images for each new ovs/ovn stable release. Hence, need
> help from maintainers Ben/Mark/Justin/Han to address some open action items
> as it is more of org/ownership/rights related:
>
>1. Get new repo created under docker.io with name either ovs/ovn and
>declare it public repo
>2. How about copy-rights for running images for open source projects
>3. Storage: unlimited or some limited GBs
>4. Naming conventions for docker images ;e.g
>openswitch/ovn:2.13.1_debian or openswitch/ovn:2.13.1_rhel. Similar
>for ovs.
>
>
> Once this is done, we can bundle docker image changes in the same release
> process
>
> Please feel free to add any missing piece.
>
> ___
> discuss mailing list
> disc...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>
>
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] OVN RBAC role for ovn-northd?

2019-11-07 Thread aginwala
Thanks, Frode, for covering that. I added minor comments to your PR, and you
can send a formal patch.







On Thu, Nov 7, 2019 at 2:00 PM Frode Nordahl 
wrote:

> fwiw; I proposed this small note earlier this evening:
> https://github.com/ovn-org/ovn/pull/25
>
> tor. 7. nov. 2019, 21:47 skrev Ben Pfaff :
>
>> Sure, anything helps.
>>
>> On Thu, Nov 07, 2019 at 12:27:44PM -0800, aginwala wrote:
>> > Hi Ben:
>> >
>> > It seems RBAC doc
>> >
>> http://docs.openvswitch.org/en/stable/tutorials/ovn-rbac/#configuring-rbac
>> > only talks
>> > about chassis and not mentioning about northd. I can submit a patch to
>> > update that as a todo for northd and mention the workaround until we add
>> > formal support. Is that ok?
>> >
>> >
>> >
>> >
>> > On Thu, Nov 7, 2019 at 12:14 PM Ben Pfaff  wrote:
>> >
>> > > Have we documented this?  Should we?
>> > >
>> > > On Thu, Nov 07, 2019 at 10:20:22AM -0800, aginwala wrote:
>> > > > Hi:
>> > > >
>> > > > It is a known fact and have-been discussed before. We use the same
>> > > > workaround as you mentioned. Alternatively, you can also set
>> role="" and
>> > > it
>> > > > will work for both northd and ovn-controller instead of separate
>> > > listeners
>> > > > which is also a security loop-hole. In short, some work is needed
>> here
>> > > > to handle rbac for northd.
>> > > >
>> > > > On Thu, Nov 7, 2019 at 9:47 AM Frode Nordahl <
>> > > frode.nord...@canonical.com>
>> > > > wrote:
>> > > >
>> > > > > Hello all,
>> > > > >
>> > > > > TL;DR; When enabling the `ovn-controller` role on the SB DB
>> > > `ovsdb-server`
>> > > > > listener, `ovn-northd` no longer has the necessary access to do
>> its job
>> > > > > when you are unable to use the local unix socket for its
>> connection to
>> > > the
>> > > > > database.
>> > > > >
>> > > > > AFAICT there is no northd-specifc or admin type role available,
>> have I
>> > > > > missed something?
>> > > > >
>> > > > > I have worked around the issue by enabling a separate listener on
>> a
>> > > > > different port on the Southbound ovsdb-servers so that
>> `ovn-northd` can
>> > > > > connect to that.
>> > > > >
>> > > > >
>> > > > > I have a OVN deployment with central components spread across
>> three
>> > > > > machines, there is an instance of the Northbound and Southbound
>> > > > > `ovsdb-server` on each of them which are clustered, and there is
>> also
>> > > an
>> > > > > instance of `ovn-northd` on each of them.
>> > > > >
>> > > > > The deployment is TLS-enabled and I have enabled RBAC.
>> > > > >
>> > > > > Since the DBs are clustered I have no control of which machine
>> will be
>> > > the
>> > > > > leader, and it may be that one machine has the leader for the
>> > > Northbound DB
>> > > > > and a different machine has the leader of the Southbound DB.
>> > > > >
>> > > > > Because of this ovn-northd is unable to talk to the databases
>> through a
>> > > > > local unix socket and must use a TLS-enabled connection to the
>> DBs, and
>> > > > > herein lies the problem.
>> > > > >
>> > > > >
>> > > > > I peeked at the RBAC implementation, and it appears to me that the
>> > > > > permission system is tied to having specific columns in each
>> table that
>> > > > > maps to the name of the client that wants permission.  On the
>> surface
>> > > this
>> > > > > appears to not fit with `ovn-northd`'s needs as I would think it
>> would
>> > > need
>> > > > > full access to all tables perhaps based on a centrally managed
>> set of
>> > > > > hostnames.
>> > > > >
>> > > > > --
>> > > > > Frode Nordahl
>> > > > >
>> > > > > ___
>> > > > > discuss mailing list
>> > > > > disc...@openvswitch.org
>> > > > > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>> > > > >
>> > >
>> > > > ___
>> > > > discuss mailing list
>> > > > disc...@openvswitch.org
>> > > > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>> > >
>> > >
>>
>
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


[ovs-discuss] OVS/OVN docker image for each stable release

2019-11-07 Thread aginwala
Hi All:

As discussed in the meeting today, we all agreed that it would be a good idea
to push docker images for each new OVS/OVN stable release. Hence, we need help
from the maintainers Ben/Mark/Justin/Han to address some open action items, as
they are more org/ownership/rights related:

   1. Get a new repo created under docker.io named either ovs or ovn and
   declare it a public repo
   2. How to handle copyrights for running images for open-source projects
   3. Storage: unlimited or some limited number of GBs
   4. Naming conventions for docker images; e.g.
   openvswitch/ovn:2.13.1_debian or openvswitch/ovn:2.13.1_rhel, and similarly
   for ovs (see the illustrative tag/push commands right after this list).
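
For illustration only, once the org and naming convention are agreed, the
publish step per release could look like this (the image name and tags here are
made-up examples, not agreed names):

docker build -t openvswitch/ovn:2.13.1_debian .
docker push openvswitch/ovn:2.13.1_debian
docker tag openvswitch/ovn:2.13.1_debian openvswitch/ovn:latest
docker push openvswitch/ovn:latest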


Once this is done, we can bundle the docker image changes into the same release
process.

Please feel free to add any missing piece.
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] OVN RBAC role for ovn-northd?

2019-11-07 Thread aginwala
Hi Ben:

It seems the RBAC doc
http://docs.openvswitch.org/en/stable/tutorials/ovn-rbac/#configuring-rbac
only talks about the chassis role and does not mention northd. I can submit a
patch to record that as a TODO for northd and to mention the workaround until
we add formal support. Is that OK?




On Thu, Nov 7, 2019 at 12:14 PM Ben Pfaff  wrote:

> Have we documented this?  Should we?
>
> On Thu, Nov 07, 2019 at 10:20:22AM -0800, aginwala wrote:
> > Hi:
> >
> > It is a known fact and have-been discussed before. We use the same
> > workaround as you mentioned. Alternatively, you can also set role="" and
> it
> > will work for both northd and ovn-controller instead of separate
> listeners
> > which is also a security loop-hole. In short, some work is needed here
> > to handle rbac for northd.
> >
> > On Thu, Nov 7, 2019 at 9:47 AM Frode Nordahl <
> frode.nord...@canonical.com>
> > wrote:
> >
> > > Hello all,
> > >
> > > TL;DR; When enabling the `ovn-controller` role on the SB DB
> `ovsdb-server`
> > > listener, `ovn-northd` no longer has the necessary access to do its job
> > > when you are unable to use the local unix socket for its connection to
> the
> > > database.
> > >
> > > AFAICT there is no northd-specifc or admin type role available, have I
> > > missed something?
> > >
> > > I have worked around the issue by enabling a separate listener on a
> > > different port on the Southbound ovsdb-servers so that `ovn-northd` can
> > > connect to that.
> > >
> > >
> > > I have a OVN deployment with central components spread across three
> > > machines, there is an instance of the Northbound and Southbound
> > > `ovsdb-server` on each of them which are clustered, and there is also
> an
> > > instance of `ovn-northd` on each of them.
> > >
> > > The deployment is TLS-enabled and I have enabled RBAC.
> > >
> > > Since the DBs are clustered I have no control of which machine will be
> the
> > > leader, and it may be that one machine has the leader for the
> Northbound DB
> > > and a different machine has the leader of the Southbound DB.
> > >
> > > Because of this ovn-northd is unable to talk to the databases through a
> > > local unix socket and must use a TLS-enabled connection to the DBs, and
> > > herein lies the problem.
> > >
> > >
> > > I peeked at the RBAC implementation, and it appears to me that the
> > > permission system is tied to having specific columns in each table that
> > > maps to the name of the client that wants permission.  On the surface
> this
> > > appears to not fit with `ovn-northd`'s needs as I would think it would
> need
> > > full access to all tables perhaps based on a centrally managed set of
> > > hostnames.
> > >
> > > --
> > > Frode Nordahl
> > >
> > > ___
> > > discuss mailing list
> > > disc...@openvswitch.org
> > > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
> > >
>
> > ___
> > discuss mailing list
> > disc...@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>
>
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] OVN RBAC role for ovn-northd?

2019-11-07 Thread aginwala
Hi:

It is a known limitation and has been discussed before. We use the same
workaround as you mentioned. Alternatively, you can also set role="" and a
single listener will work for both northd and ovn-controller instead of
separate listeners, although that is also a security loophole. In short, some
work is needed here to handle RBAC for northd.
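
For reference, a sketch of the separate-listener workaround expressed directly
against the SB database (the extra port 16642 is an arbitrary choice, and this
is only the general shape, not a tested recipe):

ovn-sbctl -- --id=@hv create Connection target="pssl\:6642" role=ovn-controller \
          -- --id=@northd create Connection target="pssl\:16642" \
          -- set SB_Global . connections=@hv,@northd
# then point ovn-northd at the unrestricted listener, e.g.:
ovn-northd --ovnnb-db=ssl:<nb-host>:6641 --ovnsb-db=ssl:<sb-host>:16642 ...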

On Thu, Nov 7, 2019 at 9:47 AM Frode Nordahl 
wrote:

> Hello all,
>
> TL;DR; When enabling the `ovn-controller` role on the SB DB `ovsdb-server`
> listener, `ovn-northd` no longer has the necessary access to do its job
> when you are unable to use the local unix socket for its connection to the
> database.
>
> AFAICT there is no northd-specifc or admin type role available, have I
> missed something?
>
> I have worked around the issue by enabling a separate listener on a
> different port on the Southbound ovsdb-servers so that `ovn-northd` can
> connect to that.
>
>
> I have a OVN deployment with central components spread across three
> machines, there is an instance of the Northbound and Southbound
> `ovsdb-server` on each of them which are clustered, and there is also an
> instance of `ovn-northd` on each of them.
>
> The deployment is TLS-enabled and I have enabled RBAC.
>
> Since the DBs are clustered I have no control of which machine will be the
> leader, and it may be that one machine has the leader for the Northbound DB
> and a different machine has the leader of the Southbound DB.
>
> Because of this ovn-northd is unable to talk to the databases through a
> local unix socket and must use a TLS-enabled connection to the DBs, and
> herein lies the problem.
>
>
> I peeked at the RBAC implementation, and it appears to me that the
> permission system is tied to having specific columns in each table that
> maps to the name of the client that wants permission.  On the surface this
> appears to not fit with `ovn-northd`'s needs as I would think it would need
> full access to all tables perhaps based on a centrally managed set of
> hostnames.
>
> --
> Frode Nordahl
>
> ___
> discuss mailing list
> disc...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] Running OVS on a Container

2019-10-09 Thread aginwala
Hi:

Also wanted to point out that the steps for building/running OVS as a container
are also covered in the OVS installation doc:
https://raw.githubusercontent.com/openvswitch/ovs/1ca0323e7c29dc7ef5a615c265df0460208f92de/Documentation/intro/install/general.rst.
The OVS docker scripts are in
https://github.com/openvswitch/ovs/tree/master/utilities/docker

On Wed, Oct 9, 2019 at 12:24 PM Shivaram Mysore 
wrote:

> Thanks for the information.  I did not know about this.  I had a chance to
> quickly review the links provided and [1].  I could not get a good
> understanding of how this would work *without* a Openstack environment.
> In my work, I focussed on how we could make OVS on containers work without
> depending on other software - ex. running on a basic Container OS like
> CoreOS.  I like the fact that you have DPDK support and am curious to
> better understand the same.
>
> [1]
> https://docs.openstack.org/ocata/networking-guide/deploy-ovs-selfservice.html
>
>
> /Shivaram
>
> On Wed, Oct 9, 2019 at 11:56 AM MEHAN, MUNISH  wrote:
>
>> You can run even OVS-DPDK as container. Here are the build and install
>> details.
>> https://review.opendev.org/#/q/topic:ovsdpdk++status:merged
>>
>>
>> On 10/9/19, 2:42 PM, "ovs-discuss-boun...@openvswitch.org on behalf of
>> Ben Pfaff" 
>> wrote:
>>
>> On Tue, Oct 08, 2019 at 07:35:09PM -0700, Shivaram Mysore wrote:
>> > If you want to run OVS on a container, you can now:
>> >
>> > $ docker pull shivarammysore/ovs
>> >
>> > Source:
>> https://github.com/servicefractal/ovs
>> >
>> > Don't forget to check out the docs directory in the repo where I
>> have a few
>> > more details.
>>
>> Someone said on Twitter, when I posted about this, that the OpenStack
>> Kolla project also runs OVS in a container:
>>
>> https://twitter.com/MichalNasiadka/status/1181807956125474817?s=20
>> This was the first I've heard of that and I wonder whether you've had
>> a
>> chance to look at their implementation?
>> ___
>> discuss mailing list
>> disc...@openvswitch.org
>>
>> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>>
>>
>> ___
> discuss mailing list
> disc...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] [ovs-dev] Hypervisor down during upgrade OVS 2.10.x to 2.10.y

2019-09-06 Thread aginwala
Hi:

Adding the correct ovs-discuss ML. I did get a chance to take a brief look at
it. I think this is the bug in the 4.4.0-104-generic kernel on Ubuntu 16.04
discussed at
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407, where it can be
hit all of a sudden, matching the kernel logs shared: "unregister_netdevice:
waiting for br0 to become free. Usage count = 1". Folks on that bug are
proposing to upgrade to a newer kernel to get rid of the issue. Upstream Linux
proposed relevant fixes at
https://github.com/torvalds/linux/commit/ee60ad219f5c7c4fb2f047f88037770063ef785f
to address related issues. I guess the kernel folks can comment on this more.
Not sure if I missed anything else.

Maybe we can make some improvements in force-reload-kmod: right now
stop_forwarding stops ovs-vswitchd, but the system stalls because br0 (eth0 is
added to br0) is busy, causing network connectivity loss. In the current case
the host recovers only after a restart. Not sure if we need to handle this
corner case in OVS?
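
For what it is worth, a quick pre-check sketch before attempting
force-reload-kmod on a host suspected of hitting this (purely illustrative; the
kernel version and bridge/uplink names are taken from this report):

uname -r                                   # the affected report was on 4.4.0-104-generic
dmesg | grep -i 'unregister_netdevice: waiting for br0'
ovs-vsctl list-ports br0                   # confirm whether the uplink (eth0) sits on br0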

On Wed, Aug 28, 2019 at 2:21 PM Jin, Liang via dev 
wrote:

>
> Hi,
> We upgrade the OVS recently from one version 2.10 to another version
> 2.10.  on some HV upgrade, the HV is down when running force reload kernel.
> In the ovs-ctl log, kill ovs-vswitch is failed, but the script is still
> going to reload the modules.
> ```
> ovsdb-server is running with pid 2431
> ovs-vswitchd is running with pid 2507
> Thu Aug 22 23:13:49 UTC 2019:stop
> 2019-08-22T23:13:50Z|1|fatal_signal|WARN|terminating with signal 14
> (Alarm clock)
> Alarm clock
> 2019-08-22T23:13:51Z|1|fatal_signal|WARN|terminating with signal 14
> (Alarm clock)
> Alarm clock
> * Exiting ovs-vswitchd (2507)
> * Killing ovs-vswitchd (2507)
> * Killing ovs-vswitchd (2507) with SIGKILL
> * Killing ovs-vswitchd (2507) failed
> * Exiting ovsdb-server (2431)
> Thu Aug 22 23:14:58 UTC 2019:load-kmod
> Thu Aug 22 23:14:58 UTC 2019:start --system-id=random --no-full-hostname
> /usr/share/openvswitch/scripts/ovs-ctl: unknown option
> "--no-full-hostname" (use --help for help)
> * Starting ovsdb-server
> * Configuring Open vSwitch system IDs
> * ovs-vswitchd is already running
> * Enabling remote OVSDB managers
> ovsdb-server is running with pid 3860447
> ovs-vswitchd is running with pid 2507
> ovsdb-server is running with pid 3860447
> ovs-vswitchd is running with pid 2507
> Thu Aug 22 23:15:09 UTC 2019:load-kmod
> Thu Aug 22 23:15:09 UTC 2019:force-reload-kmod --system-id=random
> --no-full-hostname
> /usr/share/openvswitch/scripts/ovs-ctl: unknown option
> "--no-full-hostname" (use --help for help)
> * Detected internal interfaces: br-int
> Thu Aug 22 23:37:08 UTC 2019:stop
> 2019-08-22T23:37:09Z|1|fatal_signal|WARN|terminating with signal 14
> (Alarm clock)
> Alarm clock
> 2019-08-22T23:37:10Z|1|fatal_signal|WARN|terminating with signal 14
> (Alarm clock)
> Alarm clock
> * Exiting ovs-vswitchd (2507)
> * Killing ovs-vswitchd (2507)
> * Killing ovs-vswitchd (2507) with SIGKILL
> * Killing ovs-vswitchd (2507) failed
> * Exiting ovsdb-server (3860447)
> Thu Aug 22 23:40:42 UTC 2019:load-kmod
> * Inserting openvswitch module
> Thu Aug 22 23:40:42 UTC 2019:start --system-id=random --no-full-hostname
> /usr/share/openvswitch/scripts/ovs-ctl: unknown option
> "--no-full-hostname" (use --help for help)
> * Starting ovsdb-server
> * Configuring Open vSwitch system IDs
> * Starting ovs-vswitchd
> * Enabling remote OVSDB managers
> ovsdb-server is running with pid 2399
> ovs-vswitchd is running with pid 2440
> ovsdb-server is running with pid 2399
> ovs-vswitchd is running with pid 2440
> Thu Aug 22 23:46:18 UTC 2019:load-kmod
> Thu Aug 22 23:46:18 UTC 2019:force-reload-kmod --system-id=random
> --no-full-hostname
> /usr/share/openvswitch/scripts/ovs-ctl: unknown option
> "--no-full-hostname" (use --help for help)
> * Detected internal interfaces: br-int br0
> * Saving flows
> * Exiting ovsdb-server (2399)
> * Starting ovsdb-server
> * Configuring Open vSwitch system IDs
> * Flush old conntrack entries
> * Exiting ovs-vswitchd (2440)
> * Saving interface configuration
> * Removing datapath: system@ovs-system
> * Removing openvswitch module
> rmmod: ERROR: Module vxlan is in use by: i40e
> * Forcing removal of vxlan module
> * Inserting openvswitch module
> * Starting ovs-vswitchd
> * Restoring saved flows
> * Enabling remote OVSDB managers
> * Restoring interface configuration
> ```
>
> But in kern.log, we see the log as below, the process could not exit
> because waiting br0 release,  and then, the ovs-ctl try to `kill term` and
> `kill -9` the process, it does not work, because kernel is in infinity
> loop.  Then, ovs-ctl try to save the flows, when save flow, core dump
> happened in kernel. Then HV is down until restart it.
> ```
> Aug 22 16:13:45 slx11c-9gjm kernel: [21177057.998961] device br0 left
> promiscuous mode
> Aug 22 16:13:55 slx11c-9gjm kernel: [21177068.044859]
> unregister_netdevice: waiting for br0 to become 

Re: [ovs-discuss] OpenVswitch

2019-08-21 Thread aginwala
Not sure what steps you used to compile and install 2.11. Try `export
OVS_RUNDIR="/var/run/openvswitch"` and then re-run the ovs-vsctl commands.
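
For example, a quick sanity check after a source install (a sketch; it assumes
the daemons were started with the default run directory, and br-test is just an
arbitrary bridge name):

export OVS_RUNDIR="/var/run/openvswitch"
ls $OVS_RUNDIR/db.sock     # if the socket is missing, ovsdb-server is not running
ovs-vsctl show
ovs-vsctl add-br br-test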



On Wed, Aug 21, 2019 at 2:43 AM V Sai Surya Laxman Rao Bellala <
laxmanraobell...@gmail.com> wrote:

> Hello all,
>
> Can anyone help me in solving this Bug?
> I installed OVS-2.11 latest version and when i am adding the bridge to the
> openvswitch.I am getting the below error.
>
> *ovs-vsctl: unix://var/run/openvswitch/db.sock: database connection failed
> (No such file or directory*
>
> Please help me in solving this problem
>
> Regards
> Laxman
> ___
> discuss mailing list
> disc...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] [OVN] ovn-controller Incremental Processing scale testing

2019-06-24 Thread aginwala
Hi:
As per the IRC meeting discussion, some nice findings were already discussed by
Numan (thanks for sharing the details). Changing external_ids for a claimed
port, e.g. ovn-nbctl set logical_switch_port sw0-port1 external_ids:foo=bar,
triggers re-computation on the local compute node; I see the same behavior.
Numan is proposing a patch to skip computation for the external_ids column of
an already claimed port in the port_binding table, because of "runtime_data,
can't handle change for input SB_port_binding, fall back to recompute" (
https://github.com/openvswitch/ovs/blob/master/ovn/lib/inc-proc-eng.h#L77).
However, I don't see external_ids in the port_binding table being set
explicitly for the port when the Interface table is set in the test code that
Daniel posted [1], which could otherwise trigger extra re-computation in the
current test scenario.

Also, ovs-vsctl add-br test likewise triggers re-computation on the local
compute node, and yes, I can see the same. Since we don't have any handlers for
the Ports and Interfaces tables similar to port_binding and the other handlers
at
https://github.com/openvswitch/ovs/blob/master/ovn/controller/ovn-controller.c#L1769,
adding a new bridge also causes re-computation on the local compute node. I am
not sure that is required immediately, because as per the patch shared by
Daniel [1] I don't see any new test bridges getting created apart from br-int,
so there won't be much impact. Or maybe I missed whether they also create test
bridges during testing. Of course, any new ovs-vsctl command attaching or
detaching a VIF will surely trigger a recompute on br-int as and when the VIF
(VM) gets added/deleted, to program the flows on the local compute node.
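
As a side note, a rough way to spot a full recompute while issuing such
commands is to watch the coverage counters on the compute node (a sketch only;
counter names can differ across versions, so treat lflow_run as an assumption):

ovs-appctl -t ovn-controller coverage/show | grep lflow_run
ovs-vsctl add-br test      # then re-check whether the counter moved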

I didn't get a chance to verify that when a chassisredirect port is claimed on
a gateway chassis it triggers computation on all computes registered with the
SB, as per the code at
https://github.com/openvswitch/ovs/blob/master/ovn/controller/binding.c#L722,
which also motivates the further chassisredirect-flow optimization that Numan
is suggesting.

1.
https://github.com/danalsan/browbeat/commit/0ff72da52ddf17aa9f7269f191eebd890899bdad

On Fri, Jun 21, 2019 at 12:32 AM Han Zhou  wrote:

>
>
> On Thu, Jun 20, 2019 at 11:42 PM Numan Siddique 
> wrote:
> >
> >
> >
> > On Fri, Jun 21, 2019, 11:47 AM Han Zhou  wrote:
> >>
> >>
> >>
> >> On Tue, Jun 11, 2019 at 9:16 AM Daniel Alvarez Sanchez <
> dalva...@redhat.com> wrote:
> >> >
> >> > Thanks a lot Han for the answer!
> >> >
> >> > On Tue, Jun 11, 2019 at 5:57 PM Han Zhou  wrote:
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > On Tue, Jun 11, 2019 at 5:12 AM Dumitru Ceara 
> wrote:
> >> > > >
> >> > > > On Tue, Jun 11, 2019 at 10:40 AM Daniel Alvarez Sanchez
> >> > > >  wrote:
> >> > > > >
> >> > > > > Hi Han, all,
> >> > > > >
> >> > > > > Lucas, Numan and I have been doing some 'scale' testing of
> OpenStack
> >> > > > > using OVN and wanted to present some results and issues that
> we've
> >> > > > > found with the Incremental Processing feature in
> ovn-controller. Below
> >> > > > > is the scenario that we executed:
> >> > > > >
> >> > > > > * 7 baremetal nodes setup: 3 controllers (running
> >> > > > > ovn-northd/ovsdb-servers in A/P with pacemaker) + 4 compute
> nodes. OVS
> >> > > > > 2.10.
> >> > > > > * The test consists on:
> >> > > > >   - Create openstack network (OVN LS), subnet and router
> >> > > > >   - Attach subnet to the router and set gw to the external
> network
> >> > > > >   - Create an OpenStack port and apply a Security Group (ACLs
> to allow
> >> > > > > UDP, SSH and ICMP).
> >> > > > >   - Bind the port to one of the 4 compute nodes (randomly) by
> >> > > > > attaching it to a network namespace.
> >> > > > >   - Wait for the port to be ACTIVE in Neutron ('up == True' in
> NB)
> >> > > > >   - Wait until the test can ping the port
> >> > > > > * Running browbeat/rally with 16 simultaneous process to
> execute the
> >> > > > > test above 150 times.
> >> > > > > * When all the 150 'fake VMs' are created, browbeat will delete
> all
> >> > > > > the OpenStack/OVN resources.
> >> > > > >
> >> > > > > We first tried with OVS/OVN 2.10 and pulled some results which
> showed
> >> > > > > 100% success but ovn-controller is quite loaded (as expected)
> in all
> >> > > > > the nodes especially during the deletion phase:
> >> > > > >
> >> > > > > - Compute node: https://imgur.com/a/tzxfrIR
> >> > > > > - Controller node (ovn-northd and ovsdb-servers):
> https://imgur.com/a/8ffKKYF
> >> > > > >
> >> > > > > After conducting the tests above, we replaced ovn-controller in
> all 7
> >> > > > > nodes by the one with the current master branch (actually from
> last
> >> > > > > week). We also replaced ovn-northd and ovsdb-servers but the
> >> > > > > ovs-vswitchd has been left untouched (still on 2.10). The
> expected
> >> > > > > results were to get less ovn-controller CPU usage and also
> better
> >> > > > > times due to the Incremental Processing feature introduced
> recently.
> >> > > > > However, the results don't look very good:
> >> > > > >
> >> > > > > 

Re: [ovs-discuss] Raft issues while removing a node

2018-12-13 Thread aginwala
Hi Ben:
I cannot see the patch series on patchwork. Is it due to a mail server sync
issue or something else? I am not sure whether it is appropriate to try out
https://github.com/blp/ovs-reviews/commits/raft-fixes instead, since it has the
patches under review in addition to some other patches?

Regards,

On Thu, Nov 15, 2018 at 9:18 AM Ben Pfaff  wrote:

> On Thu, Nov 08, 2018 at 04:17:03PM -0800, ramteja tadishetti wrote:
> > I am facing trouble in graceful removal of node in a 3 Node RAFT setup.
>
> Thanks for the report.  I followed up on it and found a number of bugs
> in the implementation of the "kick" request.  There is a patch series
> out that fixes all of the bugs that I identified:
>
> https://patchwork.ozlabs.org/project/openvswitch/list/?series=76115
> ___
> discuss mailing list
> disc...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] Possible data loss of OVSDB active-backup mode

2018-09-10 Thread aginwala
Cool! Thanks a lot.

On Mon, Sep 10, 2018 at 12:57 AM Numan Siddique  wrote:

>
>
> On Sun, Sep 9, 2018 at 8:38 AM aginwala  wrote:
>
>> Hi:
>>
>> As consented with approach 1, I tested it. DB data is retained even for
>> the continuous fail-over scenario where all 3 nodes are started/stopped at
>> the same time multiple times in a loop. Also, works as expected in the
>> normal failover scenarios.
>>
>> Since you also asked to test failing  process_notification, I did
>> introduce 10 sec sleep after line
>> https://github.com/openvswitch/ovs/blob/master/ovsdb/replication.c#L604
>> which actually resulted in pacemaker failure with unknown error for 2 slave
>> nodes but the function did not report any error messages that I was
>> logging. DB data was still intact since it always promoted the 3rd node as
>> master.
>>
>>
>> Output for above failure test:
>> Online: [ test-pace1-2365293 test-pace2-2365308 test-pace3-2598581 ]
>>
>> Full list of resources:
>>
>>  Master/Slave Set: ovndb_servers-master [ovndb_servers]
>>  ovndb_servers (ocf::ovn:ovndb-servers): FAILED test-pace3-2598581
>> (unmanaged)
>>  ovndb_servers (ocf::ovn:ovndb-servers): FAILED test-pace2-2365308
>> (unmanaged)
>>  Masters: [ test-pace1-2365293 ]
>>
>> Failed Actions:
>> * ovndb_servers_stop_0 on test-pace3-2598581 'unknown error' (1):
>> call=12, status=Timed Out, exitreason='none',
>> last-rc-change='Sat Sep  8 19:22:20 2018', queued=0ms, exec=20003ms
>> * ovndb_servers_stop_0 on test-pace2-2365308 'unknown error' (1):
>> call=12, status=Timed Out, exitreason='none',
>> last-rc-change='Sat Sep  8 19:22:20 2018', queued=0ms, exec=20002ms
>>
>>
>> Another way I tried to intentionally set error to some non-null string
>> that skipped calling process_notification which does wipes out whole db
>> when that node is promoted because of no notification updates. Was this the
>> approach you wanted to test or some other way (correct me if I am wrong)?
>>
>> Also wanted to say, if you can add a info log statement in the formal
>> patch during reset_database function as I used the same in my env which
>> makes clear from log too about the failover behavior.
>>
>
> Thanks for testing it out. I sent a formal patch here adding the log
> message as suggested by you - https://patchwork.ozlabs.org/patch/967888/
>
> Regards
> Numan
>
>
>>
>> As you guys mentioned, not sure what other corner case might have been
>> missed but this patch LGTM overall (safer than the current code that wipes
>> out the db :))
>>
>> Regards,
>>
>> On Wed, Sep 5, 2018 at 1:24 PM Han Zhou  wrote:
>>
>>>
>>>
>>> On Wed, Sep 5, 2018 at 10:44 AM aginwala  wrote:
>>> >
>>> > Thanks Numan:
>>> >
>>> > I will give it shot and update the findings.
>>> >
>>> >
>>> > On Wed, Sep 5, 2018 at 5:35 AM Numan Siddique 
>>> wrote:
>>> >>
>>> >>
>>> >>
>>> >> On Wed, Sep 5, 2018 at 12:42 AM Han Zhou  wrote:
>>> >>>
>>> >>>
>>> >>>
>>> >>> On Sun, Sep 2, 2018 at 11:01 PM Numan Siddique 
>>> wrote:
>>> >>> >
>>> >>> >
>>> >>> >
>>> >>> > On Fri, Aug 10, 2018 at 3:59 AM Ben Pfaff  wrote:
>>> >>> >>
>>> >>> >> On Thu, Aug 09, 2018 at 09:32:21AM -0700, Han Zhou wrote:
>>> >>> >> > On Thu, Aug 9, 2018 at 1:57 AM, aginwala 
>>> wrote:
>>> >>> >> > >
>>> >>> >> > >
>>> >>> >> > > To add on , we are using LB VIP IP and no constraint with 3
>>> nodes as Han
>>> >>> >> > mentioned earlier where active node  have syncs from invalid IP
>>> and rest
>>> >>> >> > two nodes sync from LB VIP IP. Also, I was able to get some
>>> logs from one
>>> >>> >> > node  that triggered:
>>> >>> >> >
>>> https://github.com/openvswitch/ovs/blob/master/ovsdb/ovsdb-server.c#L460
>>> >>> >> > >
>>> >>> >> > > 2018-08-04T01:43:39.914Z|03230|reconnect|DBG|tcp:
>>> 10.189.208.16:50686:
>>> >>> >> > entering RECONNECT
>>> >>> >> > > 2018-08-04T01:43:39.914Z|03231|ovsdb_jsonrpc_

Re: [ovs-discuss] Possible data loss of OVSDB active-backup mode

2018-09-08 Thread aginwala
Hi:

As agreed, I tested approach 1. DB data is retained even in the continuous
failover scenario where all 3 nodes are started/stopped at the same time,
multiple times in a loop. It also works as expected in the normal failover
scenarios.

Since you also asked to test a failing process_notification, I introduced a
10-second sleep after the line at
https://github.com/openvswitch/ovs/blob/master/ovsdb/replication.c#L604
which resulted in a pacemaker failure with an unknown error on the 2 slave
nodes, but the function did not report any of the error messages I was logging.
DB data was still intact since it always promoted the 3rd node as master.


Output for the above failure test:
Online: [ test-pace1-2365293 test-pace2-2365308 test-pace3-2598581 ]

Full list of resources:

 Master/Slave Set: ovndb_servers-master [ovndb_servers]
 ovndb_servers (ocf::ovn:ovndb-servers): FAILED test-pace3-2598581
(unmanaged)
 ovndb_servers (ocf::ovn:ovndb-servers): FAILED test-pace2-2365308
(unmanaged)
 Masters: [ test-pace1-2365293 ]

Failed Actions:
* ovndb_servers_stop_0 on test-pace3-2598581 'unknown error' (1): call=12,
status=Timed Out, exitreason='none',
last-rc-change='Sat Sep  8 19:22:20 2018', queued=0ms, exec=20003ms
* ovndb_servers_stop_0 on test-pace2-2365308 'unknown error' (1): call=12,
status=Timed Out, exitreason='none',
last-rc-change='Sat Sep  8 19:22:20 2018', queued=0ms, exec=20002ms


Another way: I tried intentionally setting the error to some non-null string,
which skips calling process_notification and wipes out the whole DB when that
node is promoted, because no notification updates are received. Was this the
approach you wanted to test, or some other way (correct me if I am wrong)?

Also, it would help if you could add an info log statement in the
reset_database function in the formal patch; I used the same in my environment,
and it makes the failover behavior clear from the logs as well.

As you guys mentioned, I am not sure what other corner cases might have been
missed, but this patch LGTM overall (safer than the current code that wipes out
the DB :)).

Regards,

On Wed, Sep 5, 2018 at 1:24 PM Han Zhou  wrote:

>
>
> On Wed, Sep 5, 2018 at 10:44 AM aginwala  wrote:
> >
> > Thanks Numan:
> >
> > I will give it shot and update the findings.
> >
> >
> > On Wed, Sep 5, 2018 at 5:35 AM Numan Siddique 
> wrote:
> >>
> >>
> >>
> >> On Wed, Sep 5, 2018 at 12:42 AM Han Zhou  wrote:
> >>>
> >>>
> >>>
> >>> On Sun, Sep 2, 2018 at 11:01 PM Numan Siddique 
> wrote:
> >>> >
> >>> >
> >>> >
> >>> > On Fri, Aug 10, 2018 at 3:59 AM Ben Pfaff  wrote:
> >>> >>
> >>> >> On Thu, Aug 09, 2018 at 09:32:21AM -0700, Han Zhou wrote:
> >>> >> > On Thu, Aug 9, 2018 at 1:57 AM, aginwala 
> wrote:
> >>> >> > >
> >>> >> > >
> >>> >> > > To add on , we are using LB VIP IP and no constraint with 3
> nodes as Han
> >>> >> > mentioned earlier where active node  have syncs from invalid IP
> and rest
> >>> >> > two nodes sync from LB VIP IP. Also, I was able to get some logs
> from one
> >>> >> > node  that triggered:
> >>> >> >
> https://github.com/openvswitch/ovs/blob/master/ovsdb/ovsdb-server.c#L460
> >>> >> > >
> >>> >> > > 2018-08-04T01:43:39.914Z|03230|reconnect|DBG|tcp:
> 10.189.208.16:50686:
> >>> >> > entering RECONNECT
> >>> >> > > 2018-08-04T01:43:39.914Z|03231|ovsdb_jsonrpc_server|INFO|tcp:
> >>> >> > 10.189.208.16:50686: disconnecting (removing OVN_Northbound
> database due to
> >>> >> > server termination)
> >>> >> > > 2018-08-04T01:43:39.932Z|03232|ovsdb_jsonrpc_server|INFO|tcp:
> >>> >> > 10.189.208.21:56160: disconnecting (removing _Server database
> due to server
> >>> >> > termination)
> >>> >> > > 20
> >>> >> > >
> >>> >> > > I am not sure if sync_from on active node too via some invalid
> ip is
> >>> >> > causing some flaw when all are down during the race condition in
> this
> >>> >> > corner case.
> >>> >> > >
> >>> >> > >
> >>> >> > >
> >>> >> > >
> >>> >> > >
> >>> >> > > On Thu, Aug 9, 2018 at 1:35 AM Numan Siddique <
> nusid...@redhat.com> wrote:
> >>> >> > >>
> >>> >> > >>
>

Re: [ovs-discuss] Possible data loss of OVSDB active-backup mode

2018-09-05 Thread aginwala
Thanks Numan:

I will give it a shot and update with the findings.


On Wed, Sep 5, 2018 at 5:35 AM Numan Siddique  wrote:

>
>
> On Wed, Sep 5, 2018 at 12:42 AM Han Zhou  wrote:
>
>>
>>
>> On Sun, Sep 2, 2018 at 11:01 PM Numan Siddique 
>> wrote:
>> >
>> >
>> >
>> > On Fri, Aug 10, 2018 at 3:59 AM Ben Pfaff  wrote:
>> >>
>> >> On Thu, Aug 09, 2018 at 09:32:21AM -0700, Han Zhou wrote:
>> >> > On Thu, Aug 9, 2018 at 1:57 AM, aginwala  wrote:
>> >> > >
>> >> > >
>> >> > > To add on , we are using LB VIP IP and no constraint with 3 nodes
>> as Han
>> >> > mentioned earlier where active node  have syncs from invalid IP and
>> rest
>> >> > two nodes sync from LB VIP IP. Also, I was able to get some logs
>> from one
>> >> > node  that triggered:
>> >> >
>> https://github.com/openvswitch/ovs/blob/master/ovsdb/ovsdb-server.c#L460
>> >> > >
>> >> > > 2018-08-04T01:43:39.914Z|03230|reconnect|DBG|tcp:
>> 10.189.208.16:50686:
>> >> > entering RECONNECT
>> >> > > 2018-08-04T01:43:39.914Z|03231|ovsdb_jsonrpc_server|INFO|tcp:
>> >> > 10.189.208.16:50686: disconnecting (removing OVN_Northbound
>> database due to
>> >> > server termination)
>> >> > > 2018-08-04T01:43:39.932Z|03232|ovsdb_jsonrpc_server|INFO|tcp:
>> >> > 10.189.208.21:56160: disconnecting (removing _Server database due
>> to server
>> >> > termination)
>> >> > > 20
>> >> > >
>> >> > > I am not sure if sync_from on active node too via some invalid ip
>> is
>> >> > causing some flaw when all are down during the race condition in this
>> >> > corner case.
>> >> > >
>> >> > >
>> >> > >
>> >> > >
>> >> > >
>> >> > > On Thu, Aug 9, 2018 at 1:35 AM Numan Siddique 
>> wrote:
>> >> > >>
>> >> > >>
>> >> > >>
>> >> > >> On Thu, Aug 9, 2018 at 1:07 AM Ben Pfaff  wrote:
>> >> > >>>
>> >> > >>> On Wed, Aug 08, 2018 at 12:18:10PM -0700, Han Zhou wrote:
>> >> > >>> > On Wed, Aug 8, 2018 at 11:24 AM, Ben Pfaff 
>> wrote:
>> >> > >>> > >
>> >> > >>> > > On Wed, Aug 08, 2018 at 12:37:04AM -0700, Han Zhou wrote:
>> >> > >>> > > > Hi,
>> >> > >>> > > >
>> >> > >>> > > > We found an issue in our testing (thanks aginwala) with
>> >> > active-backup
>> >> > >>> > mode
>> >> > >>> > > > in OVN setup.
>> >> > >>> > > > In the 3 node setup with pacemaker, after stopping
>> pacemaker on
>> >> > all
>> >> > >>> > three
>> >> > >>> > > > nodes (simulate a complete shutdown), and then if starting
>> all of
>> >> > them
>> >> > >>> > > > simultaneously, there is a good chance that the whole DB
>> content
>> >> > gets
>> >> > >>> > lost.
>> >> > >>> > > >
>> >> > >>> > > > After studying the replication code, it seems there is a
>> phase
>> >> > that the
>> >> > >>> > > > backup node deletes all its data and wait for data to be
>> synced
>> >> > from the
>> >> > >>> > > > active node:
>> >> > >>> > > >
>> >> >
>> https://github.com/openvswitch/ovs/blob/master/ovsdb/replication.c#L306
>> >> > >>> > > >
>> >> > >>> > > > At this state, if the node was set to active, then all
>> data is
>> >> > gone for
>> >> > >>> > the
>> >> > >>> > > > whole cluster. This can happen in different situations. In
>> the
>> >> > test
>> >> > >>> > > > scenario mentioned above it is very likely to happen, since
>> >> > pacemaker
>> >> > >>> > just
>> >> > >>> > > > randomly select one as master, not knowing 

Re: [ovs-discuss] Possible data loss of OVSDB active-backup mode

2018-08-09 Thread aginwala
To add on, we are using the LB VIP IP and no constraint with 3 nodes, as Han
mentioned earlier, where the active node syncs from an invalid IP and the other
two nodes sync from the LB VIP IP. Also, I was able to get some logs from one
node that triggered:
https://github.com/openvswitch/ovs/blob/master/ovsdb/ovsdb-server.c#L460

2018-08-04T01:43:39.914Z|03230|reconnect|DBG|tcp:10.189.208.16:50686:
entering RECONNECT
2018-08-04T01:43:39.914Z|03231|ovsdb_jsonrpc_server|INFO|tcp:
10.189.208.16:50686: disconnecting (removing OVN_Northbound database due to
server termination)
2018-08-04T01:43:39.932Z|03232|ovsdb_jsonrpc_server|INFO|tcp:
10.189.208.21:56160: disconnecting (removing _Server database due to server
termination)
20

I am not sure whether sync_from on the active node pointing at an invalid IP
is also causing some flaw when all nodes are down during the race condition in
this corner case.





On Thu, Aug 9, 2018 at 1:35 AM Numan Siddique  wrote:

>
>
> On Thu, Aug 9, 2018 at 1:07 AM Ben Pfaff  wrote:
>
>> On Wed, Aug 08, 2018 at 12:18:10PM -0700, Han Zhou wrote:
>> > On Wed, Aug 8, 2018 at 11:24 AM, Ben Pfaff  wrote:
>> > >
>> > > On Wed, Aug 08, 2018 at 12:37:04AM -0700, Han Zhou wrote:
>> > > > Hi,
>> > > >
>> > > > We found an issue in our testing (thanks aginwala) with
>> active-backup
>> > mode
>> > > > in OVN setup.
>> > > > In the 3 node setup with pacemaker, after stopping pacemaker on all
>> > three
>> > > > nodes (simulate a complete shutdown), and then if starting all of
>> them
>> > > > simultaneously, there is a good chance that the whole DB content
>> gets
>> > lost.
>> > > >
>> > > > After studying the replication code, it seems there is a phase that
>> the
>> > > > backup node deletes all its data and wait for data to be synced
>> from the
>> > > > active node:
>> > > >
>> https://github.com/openvswitch/ovs/blob/master/ovsdb/replication.c#L306
>> > > >
>> > > > At this state, if the node was set to active, then all data is gone
>> for
>> > the
>> > > > whole cluster. This can happen in different situations. In the test
>> > > > scenario mentioned above it is very likely to happen, since
>> pacemaker
>> > just
>> > > > randomly select one as master, not knowing the internal sync state
>> of
>> > each
>> > > > node. It could also happen when failover happens right after a new
>> > backup
>> > > > is started, although less likely in real environment, so starting up
>> > node
>> > > > one by one may largely reduce the probability.
>> > > >
>> > > > Does this analysis make sense? We will do more tests to verify the
>> > > > conclusion, but would like to share with community for discussions
>> and
>> > > > suggestions. Once this happens it is very critical - even more
>> serious
>> > than
>> > > > just no HA. Without HA it is just control plane outage, but this
>> would
>> > be
>> > > > data plane outage because OVS flows will be removed accordingly
>> since
>> > the
>> > > > data is considered as deleted from ovn-controller point of view.
>> > > >
>> > > > We understand that active-standby is not the ideal HA mechanism and
>> > > > clustering is the future, and we are also testing the clustering
>> with
>> > the
>> > > > latest patch. But it would be good if this problem can be addressed
>> with
>> > > > some quick fix, such as keep a copy of the old data somewhere until
>> the
>> > > > first sync finishes?
>> > >
>> > > This does seem like a plausible bug, and at first glance I believe
>> that
>> > > you're correct about the race here.  I guess that the correct behavior
>> > > must be to keep the original data until a new copy of the data has
>> been
>> > > received, and only then atomically replace the original by the new.
>> > >
>> > > Is this something you have time and ability to fix?
>> >
>> > Thanks Ben for quick response. I guess I will not have time until I send
>> > out next series for incremental processing :)
>> > It would be good if someone can help and then please reply this email if
>> > he/she starts working on it so that we will not end up with overlapping
>> > work.
>>
>
> I will give a shot at fixing this i

Re: [ovs-discuss] ovs-appctl to monitor HVs sb connection status

2018-07-09 Thread aginwala
Thanks Ben and Han for the suggestions and clarification.

So we will stick with ovs-appctl -t ovn-controller rconn/show for individual
HVs, considering the current scope.

For checking all HVs' connection status from the central node, we can pick that
up as a new feature going forward.


On Mon, Jul 9, 2018 at 8:30 PM Ben Pfaff  wrote:

> On Mon, Jul 09, 2018 at 06:12:11PM -0700, Han Zhou wrote:
> > On Mon, Jul 9, 2018 at 3:37 PM, Ben Pfaff  wrote:
> > >
> > > On Sun, Jul 08, 2018 at 01:09:12PM -0700, aginwala wrote:
> > > > As per discussions in past OVN meetings regarding ovn monitoring
> stand
> > > > point, need some clarity from design perspective. I am thinking of
> below
> > > > approaches:
> > > >
> > > > 1. Can we implement something like ovs-appctl -t 
> > chassis-conn/list
> > > > that will show all HVs stats (connected/non-connected)?
> > >
> > > You're interested particularly in which chassis are connected to the
> ovn
> > > southbound database?  The db server only knows who is connected to it
> if
> > > they provide SSL certificates.  It might not be too hard to get it to
> > > report the common name (CN) of the SSL certificates for the clients
> > > connected to it.  Would that suffice?
> > >
> > > > 2. or  on individual HVs using ovs-appctl -t ovn-controller
> > > > chassis-conn/list ?
> > >
> > > The HVs definitely don't know who is connected to the sbdb server.
> > > ___
> > > discuss mailing list
> > > disc...@openvswitch.org
> > > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
> >
> > Discussion in the meeting was about showing connection status to SB-DB
> on a
> > given HV, and for that HV only.
> > So I think a new command like "ovs-appctl -t ovn-controller rconn/show"
> > should be enough.
>
> I forgot the context.  If that's enough, it will work.
>
> We don't currently have a good way to do general-purpose monitoring or
> configuration of ovn-controller.  What little we do now, we do through
> the ovs-vswitchd database.  If we want something more extensive, maybe
> it should have its own hypervisor-local database.
>
> > The suggestion from Ben is a good one if we are trying to check all HV
> > connections status from central node point of view.
>
> I don't think it should be very hard (but sometimes OpenSSL makes easy
> things difficult).
>
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


[ovs-discuss] ovs-appctl to monitor HVs sb connection status

2018-07-08 Thread aginwala
Hi:

As per discussions in past OVN meetings about OVN monitoring, I need some
clarity from a design perspective. I am thinking of the approaches below:

1. Can we implement something like ovs-appctl -t  chassis-conn/list
that will show all HVs stats (connected/non-connected)?

2. or  on individual HVs using ovs-appctl -t ovn-controller
chassis-conn/list ?

For now, we can manually verify in a couple of ways on an individual HV using
netstat, controller logs, etc. However, it makes sense to have a feature to
quickly get a glimpse of what is connected and what is not in a large
environment with many HVs, instead of checking each chassis individually.
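
For reference, the kind of manual check we do today on a single HV (a sketch;
the SB port and log path assume a default install):

netstat -tn | grep 6642            # is the connection to the SB DB ESTABLISHED?
grep -i connect /var/log/openvswitch/ovn-controller.log | tail -n 20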

Hence, I need some agreement/suggestions (and any other alternatives you have
in mind) before starting the implementation.



Regards,
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] Question to OVN DB pacemaker script

2018-05-12 Thread aginwala
On Fri, May 11, 2018 at 5:21 PM, Han Zhou <zhou...@gmail.com> wrote:

> Thanks for the output. It appears to be more complex than I thought
> before. It is good that the new slave doesn't listen on 6641, although I am
> not sure how is it achieved. I guess a stop has been triggered
> instead of simply demote, but I need to spend some time on the pacemaker
> state machine. And please ignore my comment about calling
> ovsdb_server_start() in demote - it would cause recursive call since
> ovsdb_server_start() calls demote(), too.
>
> Regarding the change:
>  if [ "x${present_master}" = x ]; then
> +set $@ --db-nb-create-insecure-remote=yes
> +set $@ --db-sb-create-insecure-remote=yes
>  # No master detected, or the previous master is not among the
>  # set starting.
>
> >>> Sure. Makes sense. Thanks for review.


> This "if" branch is when there is no master present, but in fact we want
> it to be set when current node is master. So this change doesn't affect
> anything. It is the below change that made the test work (so that on slave
> node the tcp port is not opened):
>  elif [ ${present_master} != ${host_name} ]; then
> +set $@ --db-nb-create-insecure-remote=no
> +set $@ --db-sb-create-insecure-remote=no
>
> The error log of ovsdb should not be skipped. We should never bind the LB
> VIP on the ovsdb socket because it is not on the host. I think it is
> related to the code in ovsdb_server_notify():
> ovn-nbctl -- --id=@conn_uuid create Connection \
> target="p${NB_MASTER_PROTO}\:${NB_MASTER_PORT}\:${MASTER_IP}" \
> inactivity_probe=$INACTIVE_PROBE -- set NB_Global . connections=@conn_uuid
>
> >>>>
Thanks for the pointer.
I am able to fix the socket util error by skipping target for both the NB and
SB DB in the LB use case. Also, it gets stamped if we use the virtual IPaddr2
heartbeat resource of OCF with the existing feature using an L2 VIP IP in the
same subnet. Hence, it needs to be skipped in both cases anyway. Should we
handle that in the same commit or in a different one?
+if [ "x${LISTEN_ON_MASTER_IP_ONLY}" = xyes ]; then
+ovn-nbctl -- --id=@conn_uuid create Connection \
+inactivity_probe=$INACTIVE_PROBE -- set NB_Global . connections=@conn_uuid
+else


> When using LB, we should set 0.0.0.0 here.
>
> Also, the failed action is a concern. We may dig more on the root cause.
> Thanks for finding these issues.
>
>>>> For crm move, I am now able to see the actual error behind the failed
action, where the move triggers self-replication for a bit:

*2018-05-12T20:08:21.687Z|00021|ovsdb_error|ERR|unexpected ovsdb error:
Server ID check failed: Self replicating is not allowed*

However, functionality is intact. Maybe it is due to some race condition in
the pacemaker state machine, as you pointed out, where the crm resource move
use case needs to be handled explicitly? Node reboot, pacemaker/corosync
service restart, etc. do not result in self-replication issues while promoting
the new node. I will also try to see if I can find something more.


>
> Thanks,
> Han
>
>
> On Fri, May 11, 2018 at 3:29 PM, aginwala <aginw...@asu.edu> wrote:
>
>> Sure:
>>
>> *VIP_ip* = 10.149.4.252
>> *LB IP* = 10.149.0.40
>> *slave netstat where it syncs from master LB VIP IP *
>> #netstat -an | grep 6641
>> tcp    0    0 10.169.129.34:47426  10.149.4.252:6641   ESTABLISHED
>> tcp    0    0 10.169.129.34:47444  10.149.4.252:6641   ESTABLISHED
>>
>> *Slave OVS:, *
>> # ps aux |grep ovsdb-server
>> root  7388  0.0  0.0  18048   376 ?Ss   14:08   0:00
>> ovsdb-server: monitoring pid 7389 (healthy)
>> root  7389  0.0  0.0  18464  4556 ?S14:08   0:00
>> ovsdb-server -vconsole:off -vfile:info 
>> --log-file=/var/log/openvswitch/ovsdb-server-nb.log
>> --remote=punix:/var/run/openvswitch/ovnnb_db.sock
>> --pidfile=/var/run/openvswitch/ovnnb_db.pid --unixctl=ovnnb_db.ctl
>> --detach --monitor --remote=db:OVN_Northbound,NB_Global,connections
>> --private-key=db:OVN_Northbound,SSL,private_key
>> --certificate=db:OVN_Northbound,SSL,certificate
>> --ca-cert=db:OVN_Northbound,SSL,ca_cert 
>> --ssl-protocols=db:OVN_Northbound,SSL,ssl_protocols
>> --ssl-ciphers=db:OVN_Northbound,SSL,ssl_ciphers --sync-from=tcp:
>> 10.149.4.252:6641 /etc/openvswitch/ovnnb_db.db
>> root  7397  0.0  0.0  18048   372 ?Ss   14:08   0:00
>> ovsdb-server: monitoring pid 7398 (healthy)
>> root  7398  0.0  0.0  18868  5280 ?S14:08   0:01
>> ovsdb-server -vconsole:off -vfile:info 
>> -

Re: [ovs-discuss] Question to OVN DB pacemaker script

2018-05-11 Thread aginwala
Sure:

*VIP_ip* = 10.149.4.252
*LB IP* = 10.149.0.40
*slave netstat where it syncs from master LB VIP IP *
#netstat -an | grep 6641
tcp    0    0 10.169.129.34:47426  10.149.4.252:6641   ESTABLISHED
tcp    0    0 10.169.129.34:47444  10.149.4.252:6641   ESTABLISHED

*Slave OVS:, *
# ps aux |grep ovsdb-server
root  7388  0.0  0.0  18048   376 ?Ss   14:08   0:00
ovsdb-server: monitoring pid 7389 (healthy)
root  7389  0.0  0.0  18464  4556 ?S14:08   0:00
ovsdb-server -vconsole:off -vfile:info
--log-file=/var/log/openvswitch/ovsdb-server-nb.log
--remote=punix:/var/run/openvswitch/ovnnb_db.sock
--pidfile=/var/run/openvswitch/ovnnb_db.pid --unixctl=ovnnb_db.ctl --detach
--monitor --remote=db:OVN_Northbound,NB_Global,connections
--private-key=db:OVN_Northbound,SSL,private_key
--certificate=db:OVN_Northbound,SSL,certificate
--ca-cert=db:OVN_Northbound,SSL,ca_cert
--ssl-protocols=db:OVN_Northbound,SSL,ssl_protocols
--ssl-ciphers=db:OVN_Northbound,SSL,ssl_ciphers --sync-from=tcp:
10.149.4.252:6641 /etc/openvswitch/ovnnb_db.db
root  7397  0.0  0.0  18048   372 ?Ss   14:08   0:00
ovsdb-server: monitoring pid 7398 (healthy)
root  7398  0.0  0.0  18868  5280 ?S14:08   0:01
ovsdb-server -vconsole:off -vfile:info
--log-file=/var/log/openvswitch/ovsdb-server-sb.log
--remote=punix:/var/run/openvswitch/ovnsb_db.sock
--pidfile=/var/run/openvswitch/ovnsb_db.pid --unixctl=ovnsb_db.ctl --detach
--monitor --remote=db:OVN_Southbound,SB_Global,connections
--private-key=db:OVN_Southbound,SSL,private_key
--certificate=db:OVN_Southbound,SSL,certificate
--ca-cert=db:OVN_Southbound,SSL,ca_cert
--ssl-protocols=db:OVN_Southbound,SSL,ssl_protocols
--ssl-ciphers=db:OVN_Southbound,SSL,ssl_ciphers --sync-from=tcp:
10.149.4.252:6642 /etc/openvswitch/ovnsb_db.db

*Master netstat where connections is established with LB :*
netstat -an | grep 6641
tcp    0    0 0.0.0.0:6641        0.0.0.0:*           LISTEN
tcp    0    0 10.169.129.33:6641  10.149.0.40:47426   ESTABLISHED
tcp    0    0 10.169.129.33:6641  10.149.0.40:47444   ESTABLISHED

*Master OVS:*
# ps aux | grep ovsdb-server
root  3318  0.0  0.0  12940  1012 pts/0S+   15:23   0:00 grep
--color=auto ovsdb-server
root 11648  0.0  0.0  18048   372 ?Ss   14:08   0:00
ovsdb-server: monitoring pid 11649 (healthy)
root 11649  0.0  0.0  18312  4208 ?S14:08   0:01
ovsdb-server -vconsole:off -vfile:info
--log-file=/var/log/openvswitch/ovsdb-server-nb.log
--remote=punix:/var/run/openvswitch/ovnnb_db.sock
--pidfile=/var/run/openvswitch/ovnnb_db.pid --unixctl=ovnnb_db.ctl --detach
--monitor --remote=db:OVN_Northbound,NB_Global,connections
--private-key=db:OVN_Northbound,SSL,private_key
--certificate=db:OVN_Northbound,SSL,certificate
--ca-cert=db:OVN_Northbound,SSL,ca_cert
--ssl-protocols=db:OVN_Northbound,SSL,ssl_protocols
--ssl-ciphers=db:OVN_Northbound,SSL,ssl_ciphers --remote=ptcp:6641:0.0.0.0
--sync-from=tcp:192.0.2.254:6641 /etc/openvswitch/ovnnb_db.db
root 11657  0.0  0.0  18048   376 ?Ss   14:08   0:00
ovsdb-server: monitoring pid 11658 (healthy)
root 11658  0.0  0.0  19340  5552 ?S14:08   0:01
ovsdb-server -vconsole:off -vfile:info
--log-file=/var/log/openvswitch/ovsdb-server-sb.log
--remote=punix:/var/run/openvswitch/ovnsb_db.sock
--pidfile=/var/run/openvswitch/ovnsb_db.pid --unixctl=ovnsb_db.ctl --detach
--monitor --remote=db:OVN_Southbound,SB_Global,connections
--private-key=db:OVN_Southbound,SSL,private_key
--certificate=db:OVN_Southbound,SSL,certificate
--ca-cert=db:OVN_Southbound,SSL,ca_cert
--ssl-protocols=db:OVN_Southbound,SSL,ssl_protocols
--ssl-ciphers=db:OVN_Southbound,SSL,ssl_ciphers --remote=ptcp:6642:0.0.0.0
--sync-from=tcp:192.0.2.254:6642 /etc/openvswitch/ovnsb_db.db



The same applies to 6642 for the sb db. Hope it's clear. Sorry I did not post
this in the previous message as I thought you already got the point :).



Regards,
Aliasgar


On Fri, May 11, 2018 at 3:16 PM, Han Zhou <zhou...@gmail.com> wrote:

> Ali, could you share output of "ps | grep ovsdb" and "netstat -lpn | grep
> 6641" on the new slave node after you do "crm resource move"?
>
> On Fri, May 11, 2018 at 2:25 PM, aginwala <aginw...@asu.edu> wrote:
>
>> Thanks Han for more suggestions:
>>
>>
>> I did test failover by gracefully stopping pacemaker+corosync on master
>> node along with crm move and it works as expected too as crm move is
>> triggering promote of new master and hence the new master gets elected
>> along with slave getting demoted as expected to listen on sync-from node.
>> Hence, whatever code change I posted earlier is well and good.
>>
>> # crm stat
>> Stack: corosync
>> Current DC: test-pace1-2365293 (version 1.1.14-70404b0) - partition with
>> quorum
>> 2 nodes and 2 resources configured
>&

Re: [ovs-discuss] Question to OVN DB pacemaker script

2018-05-11 Thread aginwala
Thanks Han for more suggestions:


I did test failover by gracefully stopping pacemaker+corosync on the master
node, as well as with crm move, and it works as expected: crm move triggers the
promotion of a new master, so the new master gets elected and the old one is
demoted to a slave that listens on the sync-from node. Hence, the code change
I posted earlier holds up.

# crm stat
Stack: corosync
Current DC: test-pace1-2365293 (version 1.1.14-70404b0) - partition with
quorum
2 nodes and 2 resources configured

Online: [ test-pace1-2365293 test-pace2-2365308 ]

Full list of resources:

 Master/Slave Set: ovndb_servers-master [ovndb_servers]
 Masters: [ test-pace2-2365308 ]
 Slaves: [ test-pace1-2365293 ]

#crm --debug resource move ovndb_servers test-pace1-2365293
DEBUG: pacemaker version: [err: ][out: CRM Version: 1.1.14 (70404b0)]
DEBUG: found pacemaker version: 1.1.14
DEBUG: invoke: crm_resource --quiet --move -r 'ovndb_servers'
--node='test-pace1-2365293'
# crm stat

Stack: corosync
Current DC: test-pace1-2365293 (version 1.1.14-70404b0) - partition with
quorum
2 nodes and 2 resources configured

Online: [ test-pace1-2365293 test-pace2-2365308 ]

Full list of resources:

 Master/Slave Set: ovndb_servers-master [ovndb_servers]
 Masters: [ test-pace1-2365293 ]
 Slaves: [ test-pace2-2365308 ]

Failed Actions:
* ovndb_servers_monitor_1 on test-pace2-2365308 'master' (8): call=46,
status=complete, exitreason='none',
last-rc-change='Fri May 11 14:08:35 2018', queued=0ms, exec=83ms

Note: the Failed Actions warning only appears for the crm move command, not
when using reboot/kill/service pacemaker/corosync stop/start.

I cleaned up the warning using the below command:
#crm_resource -P
Waiting for 1 replies from the CRMd. OK

Also wanted to call out, from the above findings, that ocf_attribute_target is
not getting called as per the pacemaker logs; the code says it will not work
for older pacemaker versions, though I am not sure which versions exactly, and
I am on version 1.1.14.
# pacemaker logs
 notice: operation_finished: ovndb_servers_monitor_1:7561:stderr [
/usr/lib/ocf/resource.d/ovn/ovndb-servers: line 31: ocf_attribute_target:
command not found ]


# Also, the nb db logs are showing socket_util errors, which I think need a
code change too to skip stamping the VIP as the bind target, since
functionality is otherwise working as expected (maybe in a separate commit
since it's an ovsdb change)
2018-05-11T21:14:25.958Z|00560|socket_util|ERR|6641:10.149.4.252: bind:
Cannot assign requested address
2018-05-11T21:14:25.958Z|00561|socket_util|ERR|6641:10.149.4.252: bind:
Cannot assign requested address
2018-05-11T21:14:27.859Z|00562|socket_util|ERR|6641:10.149.4.252: bind:
Cannot assign requested address



Let me know for any suggestions further.


Regards,
Aliasgar


On Thu, May 10, 2018 at 3:49 PM, Han Zhou <zhou...@gmail.com> wrote:

> Good progress!
>
> I think at least one more change is needed to ensure when demote happens,
> the TCP port is shut down. Otherwise, the LB will be confused again and
> can't figure out which one is active. This is the graceful failover
> scenario which can be tested by crm resource move instead of reboot/killing
> process.
>
> This may be done by the same approach you did for promote, i.e. stop ovsdb
> and then call ovsdb_server_start() so the parameters are reset correctly
> before starting. Alternatively we can add a command in ovsdb-server, in
> addition to the commands that switches to/from active/backup modes, to
> open/close the TCP ports, to avoid restarting during failover, but I am not
> sure if this is valuable. It depends on whether restarting ovsdb-server
> during failover is sufficient enough. Could you add the restart logic for
> demote and try more? Thanks!
>
> Thanks,
> Han
>
> On Thu, May 10, 2018 at 1:54 PM, aginwala <aginw...@asu.edu> wrote:
>
>> Hi :
>>
>> Just to further update, I am able to re-open tcp port for failover
>> scenario when new master is getting promoted with additional code changes
>> as below which do require stop of ovs service on the new selected master to
>> reset the tcp settings:
>>
>>
>> diff --git a/ovn/utilities/ovndb-servers.ocf
>> b/ovn/utilities/ovndb-servers.ocf
>> index 164b6bc..8cb4c25 100755
>> --- a/ovn/utilities/ovndb-servers.ocf
>> +++ b/ovn/utilities/ovndb-servers.ocf
>> @@ -295,8 +295,8 @@ ovsdb_server_start() {
>>
>>  set ${OVN_CTL}
>>
>> -set $@ --db-nb-addr=${MASTER_IP} --db-nb-port=${NB_MASTER_PORT}
>> -set $@ --db-sb-addr=${MASTER_IP} --db-sb-port=${SB_MASTER_PORT}
>> +set $@ --db-nb-port=${NB_MASTER_PORT}
>> +set $@ --db-sb-port=${SB_MASTER_PORT}
>>
>>  if [ "x${NB_MASTER_PROTO}" = xtcp ]; then
>>  set $@ --db-nb-create-insecure-remote=yes
>> @@ -307,6 +307,8 @@ ovsdb_server_start() {

Re: [ovs-discuss] Question to OVN DB pacemaker script

2018-05-10 Thread aginwala
Hi :

Just to further update, I am able to re-open tcp port for failover scenario
when new master is getting promoted with additional code changes as below
which do require stop of ovs service on the new selected master to reset
the tcp settings:


diff --git a/ovn/utilities/ovndb-servers.ocf b/ovn/utilities/ovndb-servers.ocf
index 164b6bc..8cb4c25 100755
--- a/ovn/utilities/ovndb-servers.ocf
+++ b/ovn/utilities/ovndb-servers.ocf
@@ -295,8 +295,8 @@ ovsdb_server_start() {

 set ${OVN_CTL}

-set $@ --db-nb-addr=${MASTER_IP} --db-nb-port=${NB_MASTER_PORT}
-set $@ --db-sb-addr=${MASTER_IP} --db-sb-port=${SB_MASTER_PORT}
+set $@ --db-nb-port=${NB_MASTER_PORT}
+set $@ --db-sb-port=${SB_MASTER_PORT}

 if [ "x${NB_MASTER_PROTO}" = xtcp ]; then
 set $@ --db-nb-create-insecure-remote=yes
@@ -307,6 +307,8 @@ ovsdb_server_start() {
 fi

 if [ "x${present_master}" = x ]; then
+set $@ --db-nb-create-insecure-remote=yes
+set $@ --db-sb-create-insecure-remote=yes
 # No master detected, or the previous master is not among the
 # set starting.
 #
@@ -316,6 +318,8 @@ ovsdb_server_start() {
 set $@ --db-nb-sync-from-addr=${INVALID_IP_ADDRESS}
--db-sb-sync-from-addr=${INVALID_IP_ADDRESS}

 elif [ ${present_master} != ${host_name} ]; then
+set $@ --db-nb-create-insecure-remote=no
+set $@ --db-sb-create-insecure-remote=no
 # An existing master is active, connect to it
 set $@ --db-nb-sync-from-addr=${MASTER_IP}
--db-sb-sync-from-addr=${MASTER_IP}
 set $@ --db-nb-sync-from-port=${NB_MASTER_PORT}
@@ -416,6 +420,8 @@ ovsdb_server_promote() {
 ;;
 esac

+${OVN_CTL} stop_ovsdb
+ovsdb_server_start
 ${OVN_CTL} promote_ovnnb
 ${OVN_CTL} promote_ovnsb
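
To sanity-check the graceful failover path after this change, I am using
roughly the following (a rough sketch; node names are the ones from my test
cluster and the port check assumes net-tools is installed):

# Move the master role and watch the TCP ports follow it.
crm resource move ovndb_servers test-pace1-2365293
crm status
# On the new master 6641/6642 should now be listening; on the demoted node they should not.
netstat -lnt | grep -E ':(6641|6642)'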



Below are the scenarios tested (Master / Slave / Scenario / Result):

- Master reboot/failure: New master gets promoted with tcp ports enabled to
  start taking LB traffic.
- Slave reboot/failure: No change; the current master continues taking traffic
  and the slave continues to sync from the master.
- Both nodes reboot/failure: New master gets promoted with tcp ports enabled
  to start taking LB traffic.

Also sync on slaves from master works as expected:
# On master
ovn-nbctl --db=tcp:10.169.129.33:6641 ls-add  556
# on slave port is shutdown as expected
ovn-nbctl --db=tcp:10.169.129.34:6641 show
ovn-nbctl: tcp:10.169.129.34:6641: database connection failed (Connection
refused)
# on slave local unix socket, above lswitch 556 gets replicated too as
--sync-from=tcp:10.149.4.252:6641
ovn-nbctl show
switch 2bd07b67-fd6b-401d-9612-da75e8f9ffc8 (556)

# Same testing for sb db too
# Slave port 6642 is shut down too; this command hangs:
ovn-sbctl --db=tcp:10.169.129.34:6642 show
# Using the master ip works:
 ovn-sbctl --db=tcp:10.169.129.33:6642 show
Chassis "21f12bd6-e9e8-4ee2-afeb-28b331df6715"
hostname: "test-pace2-2365308.lvs02.dev.ebayc3.com"
Encap geneve
ip: "10.169.129.34"
options: {csum="true"}



# Accessing via LB vip works fine too as only one member is active:
for i in `seq 1 500`; do ovn-sbctl --db=tcp:10.149.4.252:6642 show; done
switch 2bd07b67-fd6b-401d-9612-da75e8f9ffc8 (556)
switch 2bd07b67-fd6b-401d-9612-da75e8f9ffc8 (556)
switch 2bd07b67-fd6b-401d-9612-da75e8f9ffc8 (556)
switch 2bd07b67-fd6b-401d-9612-da75e8f9ffc8 (556)
switch 2bd07b67-fd6b-401d-9612-da75e8f9ffc8 (556)


Everything works as expected. Let me know of any corner case I missed.
I will submit a formal patch that gates this behind LISTEN_ON_MASTER_IP_ONLY
for the LB-with-tcp use case, to avoid breaking existing functionality.



Regards,
Aliasgar



On Thu, May 10, 2018 at 9:55 AM, aginwala <aginw...@asu.edu> wrote:

> Thanks folks for suggestions:
>
> For LB vip configurations, I did  the testing further and yes it does
> tries to hit the slave db as per the logs below and fails as slave do not
> have write permission of which LB is not aware of:
> for i in `seq 1 500`; do ovn-nbctl --db=tcp:10.149.4.252:6641 ls-add
> $i590;done
> ovn-nbctl: transaction error: {"details":"insert operation not allowed
> when database server is in read only mode","error":"not allowed"}
> ovn-nbctl: transaction error: {"details":"insert operation not allowed
> when database server is in read only mode","error":"not allowed"}
> ovn-nbctl: transaction error: {"details":"insert operation not allowed
> when database server is in read only mode","error":"not allowed"}
>
> Hence, with little more code changes(in the same patch without the flag
> variable suggestion), I am able to shutdown the tcp port on the slave and
> it works fine as below:
> #Master Node
> # ovn-nbctl --db=tcp:10.169.129.33:6641 ls-add test444
> #Slave Node
> # ovn-nbctl --db=t

Re: [ovs-discuss] Question to OVN DB pacemaker script

2018-05-10 Thread aginwala
Thanks folks for suggestions:

For the LB VIP configuration, I did further testing and yes, it does try to
hit the slave db, as per the logs below, and fails because the slave does not
have write permission, which the LB is not aware of:
for i in `seq 1 500`; do ovn-nbctl --db=tcp:10.149.4.252:6641 ls-add
$i590;done
ovn-nbctl: transaction error: {"details":"insert operation not allowed when
database server is in read only mode","error":"not allowed"}
ovn-nbctl: transaction error: {"details":"insert operation not allowed when
database server is in read only mode","error":"not allowed"}
ovn-nbctl: transaction error: {"details":"insert operation not allowed when
database server is in read only mode","error":"not allowed"}
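
To confirm which member the LB handed a given transaction to, the backup state
can also be checked directly on each node (a rough sketch; the sync-status
appctl command should be available on the db ctl sockets for this ovsdb-server,
and lstest is just a throwaway switch name):

# On each node: active vs. backup state of the NB ovsdb-server.
ovs-appctl -t /var/run/openvswitch/ovnnb_db.ctl ovsdb-server/sync-status
# A backup member also shows up as the read-only error on a direct write:
ovn-nbctl --db=tcp:10.169.129.34:6641 ls-add lstest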

Hence, with a little more code change (in the same patch, without the flag
variable suggestion), I am able to shut down the tcp port on the slave and it
works fine as below:
#Master Node
# ovn-nbctl --db=tcp:10.169.129.33:6641 ls-add test444
#Slave Node
# ovn-nbctl --db=tcp:10.169.129.34:6641 ls-add test444
ovn-nbctl: tcp:10.169.129.34:6641: database connection failed (Connection
refused)

Code to shut down the tcp port on the slave db so that only the master listens
on the tcp ports:
diff --git a/ovn/utilities/ovndb-servers.ocf b/ovn/utilities/ovndb-servers.ocf
index 164b6bc..b265df6 100755
--- a/ovn/utilities/ovndb-servers.ocf
+++ b/ovn/utilities/ovndb-servers.ocf
@@ -295,8 +295,8 @@ ovsdb_server_start() {

 set ${OVN_CTL}

-set $@ --db-nb-addr=${MASTER_IP} --db-nb-port=${NB_MASTER_PORT}
-set $@ --db-sb-addr=${MASTER_IP} --db-sb-port=${SB_MASTER_PORT}
+set $@ --db-nb-port=${NB_MASTER_PORT}
+set $@ --db-sb-port=${SB_MASTER_PORT}

 if [ "x${NB_MASTER_PROTO}" = xtcp ]; then
 set $@ --db-nb-create-insecure-remote=yes
@@ -307,6 +307,8 @@ ovsdb_server_start() {
 fi

 if [ "x${present_master}" = x ]; then
+set $@ --db-nb-create-insecure-remote=yes
+set $@ --db-sb-create-insecure-remote=yes
 # No master detected, or the previous master is not among the
 # set starting.
 #
@@ -316,6 +318,8 @@ ovsdb_server_start() {
 set $@ --db-nb-sync-from-addr=${INVALID_IP_ADDRESS} --db-sb-sync-from-addr=${INVALID_IP_ADDRESS}

 elif [ ${present_master} != ${host_name} ]; then
+set $@ --db-nb-create-insecure-remote=no
+set $@ --db-sb-create-insecure-remote=no


But I noticed that when the slave becomes active post failover, after the
active node reboots/fails, pacemaker shows it as online but I am not able to
access the dbs.

# crm status
Online: [ test-pace2-2365308 ]
OFFLINE: [ test-pace1-2365293 ]

Full list of resources:

 Master/Slave Set: ovndb_servers-master [ovndb_servers]
 Masters: [ test-pace2-2365308 ]
 Stopped: [ test-pace1-2365293 ]


# ovn-nbctl --db=tcp:10.169.129.33:6641 ls-add test444
ovn-nbctl: tcp:10.169.129.33:6641: database connection failed (Connection
refused)
# ovn-nbctl --db=tcp:10.169.129.34:6641 ls-add test444
ovn-nbctl: tcp:10.169.129.34:6641: database connection failed (Connection
refused)

Hence, when failover happens, the slave is already running with
--sync-from=lbVIP:6641/6642 for the nb and sb db respectively. Thus, the tcp
ports for the nb and sb db are not re-opened automatically on the slave that
is getting promoted to master.
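
For reference, this is how I am checking whether the promoted node actually
re-opened the ports (a rough sketch; assumes net-tools is installed):

# On the node pacemaker just promoted:
crm status
netstat -lnt | grep -E ':(6641|6642)'        # empty output means the ports were never re-opened
ps aux | grep [o]vsdb-server | grep -o 'ptcp:664[12]:[0-9.]*'   # shows whether ptcp remotes were passed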

Let me know if there is a valid way/approach I am missing to handle this in
the slave promote logic; I will make further code changes accordingly.

Note: the current code changes for use with an LB will need to be handled for
ssl too. I will handle that separately; I want to get tcp working first, and
we can add ssl support later.


Regards,
Aliasgar

On Wed, May 9, 2018 at 12:19 PM, Numan Siddique <nusid...@redhat.com> wrote:

>
>
> On Thu, May 10, 2018 at 12:44 AM, Han Zhou <zhou...@gmail.com> wrote:
>
>>
>>
>> On Wed, May 9, 2018 at 11:51 AM, Numan Siddique <nusid...@redhat.com>
>> wrote:
>>
>>>
>>>
>>> On Thu, May 10, 2018 at 12:15 AM, Han Zhou <zhou...@gmail.com> wrote:
>>>
>>>> Thanks Ali for the quick patch. Please see my comments inline.
>>>>
>>>> On Wed, May 9, 2018 at 9:30 AM, aginwala <aginw...@asu.edu> wrote:
>>>> >
>>>> > Thanks Han and Numan for the clarity to help sort it out.
>>>> >
>>>> > For making vip work with using LB in my two node setup, I had changed
>>>> below code to skip setting master IP  when creating pcs resource for ovndbs
>>>> and listen on 0.0.0.0 instead. Hence, the discussion seems inline with the
>>>> code change which is small for sure as below:
>>>> >
>>>> >
>>>> > diff --git a/ovn/utilities/ovndb-servers.ocf
>>&g

Re: [ovs-discuss] Question to OVN DB pacemaker script

2018-05-09 Thread aginwala
Thanks Han and Numan for the clarity to help sort it out.

To make the VIP work with an LB in my two-node setup, I had changed the code
below to skip setting the master IP when creating the pcs resource for the OVN
DBs and to listen on 0.0.0.0 instead. Hence, the discussion seems in line with
the code change, which is small, as below:


diff --git a/ovn/utilities/ovndb-servers.ocf b/ovn/utilities/ovndb-servers.ocf
index 164b6bc..d4c9ad7 100755
--- a/ovn/utilities/ovndb-servers.ocf
+++ b/ovn/utilities/ovndb-servers.ocf
@@ -295,8 +295,8 @@ ovsdb_server_start() {

 set ${OVN_CTL}

-set $@ --db-nb-addr=${MASTER_IP} --db-nb-port=${NB_MASTER_PORT}
-set $@ --db-sb-addr=${MASTER_IP} --db-sb-port=${SB_MASTER_PORT}
+set $@ --db-nb-port=${NB_MASTER_PORT}
+set $@ --db-sb-port=${SB_MASTER_PORT}

 if [ "x${NB_MASTER_PROTO}" = xtcp ]; then
 set $@ --db-nb-create-insecure-remote=yes


Results:
# accessing via LB VIP
ovn-nbctl --db=tcp:10.149.7.56:6641 show
switch bb130c99-a00d-43cf-b40a-9c6fb1df5ed7 (ls666)
ovn-nbctl --db=tcp:10.149.7.56:6641 ls-add ls55
# accessing via active node pool member
root@test-pace2-2365308:~# ovn-nbctl --db=tcp:10.169.129.33:6641 show
switch bb130c99-a00d-43cf-b40a-9c6fb1df5ed7 (ls666)
switch 41922d23-3430-436d-b67a-00422367a653 (ls55)
# accessing using standby node pool member
root@test-pace2-2365308:~# ovn-nbctl --db=tcp:10.169.129.33:6641 ls-add lss
ovn-nbctl: transaction error: {"details":"insert operation not allowed when
database server is in read only mode","error":"not allowed"}
# using connect string and skip using VIP resource just for reading db and
not for writing.
ovn-nbctl --db=tcp:10.169.129.34:6641,tcp:10.169.129.33:6641 show

I am pointing northd and ovn-controller to the db VIP, which works as expected
too.

For northd, we can use the local unix socket as well; I have tested both ways
by keeping it running on both nodes. I think it's just a personal preference
whether to use the VIP or the unix socket, as both are valid for northd. I
think we might need to update the documentation with the above details too.
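
For completeness, this is roughly how I am pointing them at the VIP (a sketch;
the IPs are the ones from my test setup above, and the ovs-vsctl form matches
what Numan described for tripleo below):

# ovn-controller on each chassis:
ovs-vsctl set open . external_ids:ovn-remote="tcp:10.149.7.56:6642"
# ovn-northd pointed at the VIP (the local unix socket works equally well):
ovn-northd --ovnnb-db=tcp:10.149.7.56:6641 --ovnsb-db=tcp:10.149.7.56:6642 \
    --no-chdir --log-file=/var/log/openvswitch/ovn-northd.log \
    --pidfile=/var/run/openvswitch/ovn-northd.pid --detach --monitor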

I will send a formal patch along with documentation update. Let me know if
there are other suggestions too in case anything is missed.


Regards,
Aliasgar


On Wed, May 9, 2018 at 9:18 AM, Han Zhou  wrote:

>
>
> On Wed, May 9, 2018 at 9:02 AM, Numan Siddique 
> wrote:
>
>>
>>
>> On Wed, May 9, 2018 at 9:02 PM, Han Zhou  wrote:
>>
>>> Hi Numan,
>>>
>>> Thanks you so much for the detailed answer! Please see my comments
>>> inline.
>>>
>>> On Wed, May 9, 2018 at 7:41 AM, Numan Siddique 
>>> wrote:
>>>
 Hi Han,

 Please see below for inline comments

 On Wed, May 9, 2018 at 5:17 AM, Han Zhou  wrote:

> Hi Babu/Numan,
>
> I have a question regarding OVN pacemaker OCF script.
> I see in the script MASTER_IP is used to start the active DB and
> standby DBs will use that IP to sync from.
>
> In the Documentation/topics/integration.rst it is also mentioned:
>
> `master_ip` is the IP address on which the active database server is
> expected to be listening, the slave node uses it to connect to the master
> node.
>
> However, since active node will change after failover, I wonder if we
> should provide all the IPs of each nodes, and let pacemaker to decide 
> which
> IP is the master IP to be used, dynamically.
>



> I see in the documentation it is mentioned about using the IPAddr2
> resource for virtual IP. Does it indicate that we should use the virtual 
> IP
> as the master IP?
>

 That is true. If the master ip is not virtual ip, then we will not be
 able to figure out which is the master node. We need to configure
 networking-ovn and ovn-controller to point to the right master node so that
 they can do write transactions on the DB.

 Below is how we have configured pacemaker OVN HA dbs in tripleo
 openstack deployment

  - Tripleo deployment creates many virtual IPs (using IPAddr2) and
 these IP addresses are frontend IPs for keystone and all other openstack
 API services and haproxy is used to load balance the traffic (the
 deployment will mostly have 3 controllers and all the openstack API
 services will be running on each node).

  - We choose one of the IPaddr2 virtual ip and we set a colocation
 constraint when creating the OVN pacemaker HA db resource i.e we ask
 pacemaker to promote the ovsdb-servers running in the node configured with
 the virtual ip (i.e master_ip).  Pacemaker will call the promote action [1]
 on the node where master ip is configured.

 - tripleo configures "ovn_nb_connection=tcp:VIP:6641" and "
 ovn_sb_connection=tcp:VIP:6642" in neutron.conf and runs "ovs-vsctl
 set open . external_ids:ovn-remote=tcp:VIP:6642" on all the nodes
 where ovn-controller service is 

Re: [ovs-discuss] raft ovsdb clustering

2018-04-04 Thread aginwala
Cool! Yup, it makes sense for the sandbox northd also to point to the
clustered nb/sb dbs.

On Wed, Apr 4, 2018 at 4:01 PM, Ben Pfaff <b...@ovn.org> wrote:

> Oh, I see, from reading further in the thread, that this was indeed a
> misunderstanding.  Well, in any case that new option to ovs-sandbox can
> be useful.
>
> On Wed, Apr 04, 2018 at 04:00:20PM -0700, Ben Pfaff wrote:
> > I would like to support cluster-wide locks.  They require extra work and
> > they require new OVSDB JSON-RPC protocol design (because locks are
> > currently per-server, not per-database).  I do not currently have a
> > schedule for designing and implementing them.
> >
> > However, I am surprised that this is an issue for northd.  For a
> > clustered database, ovn-northd always connects to the cluster leader.
> > There is at most one leader in the cluster at a given time, so as long
> > as ovn-northd obtains a lock on the leader, this should ensure that only
> > one ovn-northd is active at a time.  There could be brief races, in
> > which two ovn-northds believe that they have the lock, but they should
> > not persist.
> >
> > You see different behavior, so there is a bug or a misunderstanding.
> > I don't see the same misbehavior, though, when I do a similar test in
> > the sandbox.  If you apply the patches I just posted:
> > https://patchwork.ozlabs.org/patch/895184/
> > https://patchwork.ozlabs.org/patch/895185/
> > then you can try it out with:
> > make sandbox SANDBOXFLAGS='--ovn --sbdb-model=clustered
> --n-northds=3'
> >
> > On Wed, Mar 21, 2018 at 01:12:48PM -0700, aginwala wrote:
> > > :) The only thing is while using pacemaker, if the node that pacemaker
> if
> > > pointing to is down, all the active/standby northd nodes have to be
> updated
> > > to new node from the cluster. But will dig in more to see what else I
> can
> > > find.
> > >
> > > @Ben: Any suggestions further?
> > >
> > >
> > > Regards,
> > >
> > > On Wed, Mar 21, 2018 at 10:22 AM, Han Zhou <zhou...@gmail.com> wrote:
> > >
> > > >
> > > >
> > > > On Wed, Mar 21, 2018 at 9:49 AM, aginwala <aginw...@asu.edu> wrote:
> > > >
> > > >> Thanks Numan:
> > > >>
> > > >> Yup agree with the locking part. For now; yes I am running northd
> on one
> > > >> node. I might right a script to monitor northd  in cluster so that
> if the
> > > >> node where it's running goes down, script can spin up northd on one
> other
> > > >> active nodes as a dirty hack.
> > > >>
> > > >> The "dirty hack" is pacemaker :)
> > > >
> > > >
> > > >> Sure, will await for the inputs from Ben too on this and see how
> complex
> > > >> would it be to roll out this feature.
> > > >>
> > > >>
> > > >> Regards,
> > > >>
> > > >>
> > > >> On Wed, Mar 21, 2018 at 5:43 AM, Numan Siddique <
> nusid...@redhat.com>
> > > >> wrote:
> > > >>
> > > >>> Hi Aliasgar,
> > > >>>
> > > >>> ovsdb-server maintains locks per each connection and not across
> the db.
> > > >>> A workaround for you now would be to configure all the ovn-northd
> instances
> > > >>> to connect to one ovsdb-server if you want to have active/standy.
> > > >>>
> > > >>> Probably Ben can answer if there is a plan to support ovsdb locks
> across
> > > >>> the db. We also need this support in networking-ovn as it also
> uses ovsdb
> > > >>> locks.
> > > >>>
> > > >>> Thanks
> > > >>> Numan
> > > >>>
> > > >>>
> > > >>> On Wed, Mar 21, 2018 at 1:40 PM, aginwala <aginw...@asu.edu>
> wrote:
> > > >>>
> > > >>>> Hi Numan:
> > > >>>>
> > > >>>> Just figured out that ovn-northd is running as active on all 3
> nodes
> > > >>>> instead of one active instance as I continued to test further
> which results
> > > >>>> in db errors as per logs.
> > > >>>>
> > > >>>>
> > > >>>> # on node 3, I run ovn-nbctl ls-add ls2 ; it populates below logs
> in
> > > >>>> ovn-north
> > > &

Re: [ovs-discuss] raft ovsdb clustering

2018-03-27 Thread aginwala
Sure:


#Node1

/usr/share/openvswitch/scripts/ovn-ctl --db-nb-addr=192.168.220.101 \
    --db-nb-port=6641 --db-nb-cluster-local-addr=tcp:192.168.220.101:6645 \
    --db-nb-create-insecure-remote=yes start_nb_ovsdb

/usr/share/openvswitch/scripts/ovn-ctl --db-sb-addr=192.168.220.101 \
    --db-sb-port=6642 --db-sb-create-insecure-remote=yes \
    --db-sb-cluster-local-addr=tcp:192.168.220.101:6644 start_sb_ovsdb

ovn-northd -vconsole:emer -vsyslog:err -vfile:info \
    --ovnnb-db="tcp:192.168.220.101:6641,tcp:192.168.220.102:6641,tcp:192.168.220.103:6641" \
    --ovnsb-db="tcp:192.168.220.101:6642,tcp:192.168.220.102:6642,tcp:192.168.220.103:6642" \
    --no-chdir --log-file=/var/log/openvswitch/ovn-northd.log \
    --pidfile=/var/run/openvswitch/ovn-northd.pid --detach --monitor

#Node2

/usr/share/openvswitch/scripts/ovn-ctl --db-nb-addr=192.168.220.102 \
    --db-nb-port=6641 --db-nb-cluster-local-addr=tcp:192.168.220.102:6645 \
    --db-nb-cluster-remote-addr="tcp:192.168.220.101:6645" \
    --db-nb-create-insecure-remote=yes start_nb_ovsdb

/usr/share/openvswitch/scripts/ovn-ctl --db-sb-addr=192.168.220.102 \
    --db-sb-port=6642 --db-sb-create-insecure-remote=yes \
    --db-sb-cluster-local-addr="tcp:192.168.220.102:6644" \
    --db-sb-cluster-remote-addr="tcp:192.168.220.101:6644" start_sb_ovsdb

ovn-northd -vconsole:emer -vsyslog:err -vfile:info \
    --ovnnb-db="tcp:192.168.220.101:6641,tcp:192.168.220.102:6641,tcp:192.168.220.103:6641" \
    --ovnsb-db="tcp:192.168.220.101:6642,tcp:192.168.220.102:6642,tcp:192.168.220.103:6642" \
    --no-chdir --log-file=/var/log/openvswitch/ovn-northd.log \
    --pidfile=/var/run/openvswitch/ovn-northd.pid --detach --monitor


#Node3

/usr/share/openvswitch/scripts/ovn-ctl --db-nb-addr=192.168.220.103 \
    --db-nb-port=6641 --db-nb-cluster-local-addr=tcp:192.168.220.103:6645 \
    --db-nb-cluster-remote-addr="tcp:192.168.220.101:6645" \
    --db-nb-create-insecure-remote=yes start_nb_ovsdb

/usr/share/openvswitch/scripts/ovn-ctl --db-sb-addr=192.168.220.103 \
    --db-sb-port=6642 --db-sb-create-insecure-remote=yes \
    --db-sb-cluster-local-addr="tcp:192.168.220.103:6644" \
    --db-sb-cluster-remote-addr="tcp:192.168.220.101:6644" start_sb_ovsdb

ovn-northd -vconsole:emer -vsyslog:err -vfile:info \
    --ovnnb-db="tcp:192.168.220.101:6641,tcp:192.168.220.102:6641,tcp:192.168.220.103:6641" \
    --ovnsb-db="tcp:192.168.220.101:6642,tcp:192.168.220.102:6642,tcp:192.168.220.103:6642" \
    --no-chdir --log-file=/var/log/openvswitch/ovn-northd.log \
    --pidfile=/var/run/openvswitch/ovn-northd.pid --detach --monitor

# export remote="tcp:192.168.220.103:6641,tcp:192.168.220.102:6641,tcp:192.168.220.101:6641"

# ovn-nbctl commands can be run using the db list below:

ovn-nbctl --db=$remote show

# ovn-sbctl commands can be run the same way, with the 6642 endpoints:

ovn-sbctl --db="tcp:192.168.220.103:6642,tcp:192.168.220.102:6642,tcp:192.168.220.101:6642" show
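
To check cluster health from any member, something like the following can be
used (paths assume the default run/db dirs; cluster/status is the appctl
command the raft patches provide, as far as I can tell):

ovs-appctl -t /var/run/openvswitch/ovnnb_db.ctl cluster/status OVN_Northbound
ovs-appctl -t /var/run/openvswitch/ovnsb_db.ctl cluster/status OVN_Southbound
ovsdb-tool check-cluster /etc/openvswitch/ovnsb_db.db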


Regards,



On Tue, Mar 27, 2018 at 12:08 PM, Numan Siddique <nusid...@redhat.com>
wrote:

> Thanks Aliasgar,
>
> I am still facing the same issue.
>
> Can you also share the (ovn-ctl) commands you used to start/join the
> ovsdb-server clusters in your nodes ?
>
> Thanks
> Numan
>
>
> On Tue, Mar 27, 2018 at 11:04 PM, aginwala <aginw...@asu.edu> wrote:
>
>> Hu Numan:
>>
>> You need to use --db as you are now running db in cluster, you can access
>> data from any of the three dbs.
>>
>> So if the leader crashes, it re-elects from the other two. Below is the
>> e.g. command:
>>
>> # export remote="tcp:192.168.220.103:6641,tcp:192.168.220.102:6641,tcp:
>> 192.168.220.101:6641"
>> # kill -9 3985
>> # ovn-nbctl --db=$remote show
>> switch 1d86ab4e-c8bf-4747-a716-8832a285d58c (ls1)
>> # ovn-nbctl --db=$remote ls-del ls1
>>
>>
>>
>>
>>
>>
>>
>> Hope it helps!
>>
>> Regards,
>>
>>
>> On Tue, Mar 27, 2018 at 10:01 AM, Numan Siddique <nusid...@redhat.com>
>> wrote:
>>
>>> Hi Aliasgar,
>>>
>>> In your setup, if you kill the leader what is the behaviour ?  Are you
>>> still able to create or delete any resources ? Is a new leader elected ?
>>>
>>> In my setup, the command "ovn-nbctl ls-add" for example blocks until I
>>> restart the ovsdb-server in node 1. And I don't see any other ovsdb-server
>>> becoming leader. May be I have configured wrongly.
>>> Could you please test this scenario if not yet please and let me know
>>> your observations if possible.
>>>
>>> Thanks
>>> Numan
>>>
>>>
>>> On Thu, Mar 22, 2018 at 12:28 PM, Han Zhou <zhou...@gmail.com> wrote:
>>>
>>>> Sounds good.
>>>>
>>>> Just checked the patch, by default the C IDL has "leader_only" as true,
>>>>

Re: [ovs-discuss] raft ovsdb clustering

2018-03-27 Thread aginwala
Hu Numan:

You need to use --db as you are now running db in cluster, you can access
data from any of the three dbs.

So if the leader crashes, it re-elects from the other two. Below is the
e.g. command:

# export remote="tcp:192.168.220.103:6641,tcp:192.168.220.102:6641,tcp:
192.168.220.101:6641"
# kill -9 3985
# ovn-nbctl --db=$remote show
switch 1d86ab4e-c8bf-4747-a716-8832a285d58c (ls1)
# ovn-nbctl --db=$remote ls-del ls1







Hope it helps!

Regards,


On Tue, Mar 27, 2018 at 10:01 AM, Numan Siddique <nusid...@redhat.com>
wrote:

> Hi Aliasgar,
>
> In your setup, if you kill the leader what is the behaviour ?  Are you
> still able to create or delete any resources ? Is a new leader elected ?
>
> In my setup, the command "ovn-nbctl ls-add" for example blocks until I
> restart the ovsdb-server in node 1. And I don't see any other ovsdb-server
> becoming leader. May be I have configured wrongly.
> Could you please test this scenario if not yet please and let me know your
> observations if possible.
>
> Thanks
> Numan
>
>
> On Thu, Mar 22, 2018 at 12:28 PM, Han Zhou <zhou...@gmail.com> wrote:
>
>> Sounds good.
>>
>> Just checked the patch, by default the C IDL has "leader_only" as true,
>> which ensures that connection is to leader only. This is the case for
>> northd. So the lock works for northd hot active-standby purpose if all the
>> ovsdb endpoints of a cluster are specified to northd, since all northds are
>> connecting to the same DB, the leader.
>>
>> For neutron networking-ovn, this may not work yet, since I didn't see
>> such logic in the python IDL in current patch series. It would be good if
>> we add similar logic for python IDL. (@ben/numan, correct me if I am wrong)
>>
>>
>> On Wed, Mar 21, 2018 at 6:49 PM, aginwala <aginw...@asu.edu> wrote:
>>
>>> Hi :
>>>
>>> Just sorted out the correct settings and northd also works in ha in raft.
>>>
>>> There were 2 issues in the setup:
>>> 1. I had started nb db without --db-nb-create-insecure-remote
>>> 2. I also started northd locally on all 3 without remote which is like
>>> all three northd trying to lock the ovsdb locally.
>>>
>>> Hence, the duplicate logs were populated in the southbound datapath due
>>> to multiple northd trying to write the local copy.
>>>
>>> So, I now start nb db with --db-nb-create-insecure-remote and northd on
>>> all 3 nodes using below command:
>>>
>>> ovn-northd -vconsole:emer -vsyslog:err -vfile:info --ovnnb-db="tcp:
>>> 10.169.125.152:6641,tcp:10.169.125.131:6641,tcp:10.148.181.162:6641"
>>> --ovnsb-db="tcp:10.169.125.152:6642,tcp:10.169.125.131:6642,tcp:
>>> 10.148.181.162:6642" --no-chdir 
>>> --log-file=/var/log/openvswitch/ovn-northd.log
>>> --pidfile=/var/run/openvswitch/ovn-northd.pid --detach --monitor
>>>
>>>
>>> #At start, northd went active on the leader node and standby on other
>>> two nodes.
>>>
>>> #After old leader crashed and new leader got elected, northd goes active
>>> on any of the remaining 2 nodes as per sample logs below from non-leader
>>> node:
>>> 2018-03-22T00:20:30.732Z|00023|ovn_northd|INFO|ovn-northd lock lost.
>>> This ovn-northd instance is now on standby.
>>> 2018-03-22T00:20:30.743Z|00024|ovn_northd|INFO|ovn-northd lock
>>> acquired. This ovn-northd instance is now active.
>>>
>>> # Also ovn-controller works similar way if leader goes down and connects
>>> to any of the remaining 2 nodes:
>>> 2018-03-22T01:21:56.250Z|00029|ovsdb_idl|INFO|tcp:10.148.181.162:6642:
>>> clustered database server is disconnected from cluster; trying another
>>> server
>>> 2018-03-22T01:21:56.250Z|00030|reconnect|INFO|tcp:10.148.181.162:6642:
>>> connection attempt timed out
>>> 2018-03-22T01:21:56.250Z|00031|reconnect|INFO|tcp:10.148.181.162:6642:
>>> waiting 4 seconds before reconnect
>>> 2018-03-22T01:23:52.417Z|00043|reconnect|INFO|tcp:10.148.181.162:6642:
>>> connected
>>>
>>>
>>>
>>> Above settings will also work if we put all the nodes behind the vip and
>>> updates the ovn configs to use vips. So we don't need pacemaker explicitly
>>> for northd HA :).
>>>
>>> Since the setup is complete now, I will populate the same in scale test
>>> env and see how it behaves.
>>>
>>> @Numan: We can try the same with networking-ovn integration and see if
>>> we find anything weird there too. Not sure

Re: [ovs-discuss] raft ovsdb clustering

2018-03-21 Thread aginwala
Hi :

Just sorted out the correct settings and northd also works in ha in raft.

There were 2 issues in the setup:
1. I had started nb db without --db-nb-create-insecure-remote
2. I also started northd locally on all 3 without remote which is like all
three northd trying to lock the ovsdb locally.

Hence, the duplicate records were populated in the southbound Datapath_Binding
table, due to multiple northds each writing to its local copy.

So, I now start nb db with --db-nb-create-insecure-remote and northd on all
3 nodes using below command:

ovn-northd -vconsole:emer -vsyslog:err -vfile:info --ovnnb-db="tcp:
10.169.125.152:6641,tcp:10.169.125.131:6641,tcp:10.148.181.162:6641"
--ovnsb-db="tcp:10.169.125.152:6642,tcp:10.169.125.131:6642,tcp:
10.148.181.162:6642" --no-chdir --log-file=/var/log/openvswitch/ovn-northd.log
--pidfile=/var/run/openvswitch/ovn-northd.pid --detach --monitor


#At start, northd went active on the leader node and standby on other two
nodes.

#After old leader crashed and new leader got elected, northd goes active on
any of the remaining 2 nodes as per sample logs below from non-leader node:
2018-03-22T00:20:30.732Z|00023|ovn_northd|INFO|ovn-northd lock lost. This
ovn-northd instance is now on standby.
2018-03-22T00:20:30.743Z|00024|ovn_northd|INFO|ovn-northd lock acquired.
This ovn-northd instance is now active.

# Also ovn-controller works similar way if leader goes down and connects to
any of the remaining 2 nodes:
2018-03-22T01:21:56.250Z|00029|ovsdb_idl|INFO|tcp:10.148.181.162:6642:
clustered database server is disconnected from cluster; trying another
server
2018-03-22T01:21:56.250Z|00030|reconnect|INFO|tcp:10.148.181.162:6642:
connection attempt timed out
2018-03-22T01:21:56.250Z|00031|reconnect|INFO|tcp:10.148.181.162:6642:
waiting 4 seconds before reconnect
2018-03-22T01:23:52.417Z|00043|reconnect|INFO|tcp:10.148.181.162:6642:
connected



The above settings will also work if we put all the nodes behind a VIP and
update the OVN configs to use the VIP. So we don't need pacemaker explicitly
for northd HA :).

Since the setup is complete now, I will populate the same in scale test env
and see how it behaves.

@Numan: We can try the same with networking-ovn integration and see if we
find anything weird there too. Not sure if you have any exclusive findings
for this case.

Let me know if something else is missed here.




Regards,

On Wed, Mar 21, 2018 at 2:50 PM, Han Zhou <zhou...@gmail.com> wrote:

> Ali, sorry if I misunderstand what you are saying, but pacemaker here is
> for northd HA. pacemaker itself won't point to any ovsdb cluster node. All
> northds can point to a LB VIP for the ovsdb cluster, so if a member of
> ovsdb cluster is down it won't have impact to northd.
>
> Without clustering support of the ovsdb lock, I think this is what we have
> now for northd HA. Please suggest if anyone has any other idea. Thanks :)
>
> On Wed, Mar 21, 2018 at 1:12 PM, aginwala <aginw...@asu.edu> wrote:
>
>> :) The only thing is while using pacemaker, if the node that pacemaker if
>> pointing to is down, all the active/standby northd nodes have to be updated
>> to new node from the cluster. But will dig in more to see what else I can
>> find.
>>
>> @Ben: Any suggestions further?
>>
>>
>> Regards,
>>
>> On Wed, Mar 21, 2018 at 10:22 AM, Han Zhou <zhou...@gmail.com> wrote:
>>
>>>
>>>
>>> On Wed, Mar 21, 2018 at 9:49 AM, aginwala <aginw...@asu.edu> wrote:
>>>
>>>> Thanks Numan:
>>>>
>>>> Yup agree with the locking part. For now; yes I am running northd on
>>>> one node. I might right a script to monitor northd  in cluster so that if
>>>> the node where it's running goes down, script can spin up northd on one
>>>> other active nodes as a dirty hack.
>>>>
>>>> The "dirty hack" is pacemaker :)
>>>
>>>
>>>> Sure, will await for the inputs from Ben too on this and see how
>>>> complex would it be to roll out this feature.
>>>>
>>>>
>>>> Regards,
>>>>
>>>>
>>>> On Wed, Mar 21, 2018 at 5:43 AM, Numan Siddique <nusid...@redhat.com>
>>>> wrote:
>>>>
>>>>> Hi Aliasgar,
>>>>>
>>>>> ovsdb-server maintains locks per each connection and not across the
>>>>> db. A workaround for you now would be to configure all the ovn-northd
>>>>> instances to connect to one ovsdb-server if you want to have 
>>>>> active/standy.
>>>>>
>>>>> Probably Ben can answer if there is a plan to support ovsdb locks
>>>>> across the db. We also need this support in networking-ovn as it also

Re: [ovs-discuss] raft ovsdb clustering

2018-03-21 Thread aginwala
:) The only thing is while using pacemaker, if the node that pacemaker if
pointing to is down, all the active/standby northd nodes have to be updated
to new node from the cluster. But will dig in more to see what else I can
find.

@Ben: Any suggestions further?


Regards,

On Wed, Mar 21, 2018 at 10:22 AM, Han Zhou <zhou...@gmail.com> wrote:

>
>
> On Wed, Mar 21, 2018 at 9:49 AM, aginwala <aginw...@asu.edu> wrote:
>
>> Thanks Numan:
>>
>> Yup agree with the locking part. For now; yes I am running northd on one
>> node. I might right a script to monitor northd  in cluster so that if the
>> node where it's running goes down, script can spin up northd on one other
>> active nodes as a dirty hack.
>>
>> The "dirty hack" is pacemaker :)
>
>
>> Sure, will await for the inputs from Ben too on this and see how complex
>> would it be to roll out this feature.
>>
>>
>> Regards,
>>
>>
>> On Wed, Mar 21, 2018 at 5:43 AM, Numan Siddique <nusid...@redhat.com>
>> wrote:
>>
>>> Hi Aliasgar,
>>>
>>> ovsdb-server maintains locks per each connection and not across the db.
>>> A workaround for you now would be to configure all the ovn-northd instances
>>> to connect to one ovsdb-server if you want to have active/standy.
>>>
>>> Probably Ben can answer if there is a plan to support ovsdb locks across
>>> the db. We also need this support in networking-ovn as it also uses ovsdb
>>> locks.
>>>
>>> Thanks
>>> Numan
>>>
>>>
>>> On Wed, Mar 21, 2018 at 1:40 PM, aginwala <aginw...@asu.edu> wrote:
>>>
>>>> Hi Numan:
>>>>
>>>> Just figured out that ovn-northd is running as active on all 3 nodes
>>>> instead of one active instance as I continued to test further which results
>>>> in db errors as per logs.
>>>>
>>>>
>>>> # on node 3, I run ovn-nbctl ls-add ls2 ; it populates below logs in
>>>> ovn-north
>>>> 2018-03-21T06:01:59.442Z|7|ovsdb_idl|WARN|transaction error:
>>>> {"details":"Transaction causes multiple rows in \"Datapath_Binding\" table
>>>> to have identical values (1) for index on column \"tunnel_key\".  First
>>>> row, with UUID 8c5d9342-2b90-4229-8ea1-001a733a915c, was inserted by
>>>> this transaction.  Second row, with UUID 
>>>> 8e06f919-4cc7-4ffc-9a79-20ce6663b683,
>>>> existed in the database before this transaction and was not modified by the
>>>> transaction.","error":"constraint violation"}
>>>>
>>>> In southbound datapath list, 2 duplicate records gets created for same
>>>> switch.
>>>>
>>>> # ovn-sbctl list Datapath
>>>> _uuid   : b270ae30-3458-445f-95d2-b14e8ebddd01
>>>> external_ids: 
>>>> {logical-switch="4d6674e3-ff9f-4f38-b050-0fa9bec9e34d",
>>>> name="ls2"}
>>>> tunnel_key  : 2
>>>>
>>>> _uuid   : 8e06f919-4cc7-4ffc-9a79-20ce6663b683
>>>> external_ids: 
>>>> {logical-switch="4d6674e3-ff9f-4f38-b050-0fa9bec9e34d",
>>>> name="ls2"}
>>>> tunnel_key  : 1
>>>>
>>>>
>>>>
>>>> # on nodes 1 and 2 where northd is running, it gives below error:
>>>> 2018-03-21T06:01:59.437Z|8|ovsdb_idl|WARN|transaction error:
>>>> {"details":"cannot delete Datapath_Binding row
>>>> 8e06f919-4cc7-4ffc-9a79-20ce6663b683 because of 17 remaining
>>>> reference(s)","error":"referential integrity violation"}
>>>>
>>>> As per commit message, for northd I re-tried setting --ovnnb-db="tcp:
>>>> 10.169.125.152:6641,tcp:10.169.125.131:6641,tcp:10.148.181.162:6641"
>>>> and --ovnsb-db="tcp:10.169.125.152:6642,tcp:10.169.125.131:6642,tcp:
>>>> 10.148.181.162:6642" and it did not help either.
>>>>
>>>> There is no issue if I keep running only one instance of northd on any
>>>> of these 3 nodes. Hence, wanted to know is there something else
>>>> missing here to make only one northd instance as active and rest as
>>>> standby?
>>>>
>>>>
>>>> Regards,
>>>>
>>>> On Thu, Mar 15, 2018 at 3:09 AM, Numan Siddique <nusid...@redhat.com>
>>>> wrote:
>

Re: [ovs-discuss] raft ovsdb clustering

2018-03-21 Thread aginwala
Thanks Numan:

Yup, agree with the locking part. For now, yes, I am running northd on one
node. I might write a script to monitor northd in the cluster so that if the
node where it's running goes down, the script can spin up northd on one of the
other active nodes as a dirty hack.

Sure, will await for the inputs from Ben too on this and see how complex
would it be to roll out this feature.


Regards,


On Wed, Mar 21, 2018 at 5:43 AM, Numan Siddique <nusid...@redhat.com> wrote:

> Hi Aliasgar,
>
> ovsdb-server maintains locks per each connection and not across the db. A
> workaround for you now would be to configure all the ovn-northd instances
> to connect to one ovsdb-server if you want to have active/standy.
>
> Probably Ben can answer if there is a plan to support ovsdb locks across
> the db. We also need this support in networking-ovn as it also uses ovsdb
> locks.
>
> Thanks
> Numan
>
>
> On Wed, Mar 21, 2018 at 1:40 PM, aginwala <aginw...@asu.edu> wrote:
>
>> Hi Numan:
>>
>> Just figured out that ovn-northd is running as active on all 3 nodes
>> instead of one active instance as I continued to test further which results
>> in db errors as per logs.
>>
>>
>> # on node 3, I run ovn-nbctl ls-add ls2 ; it populates below logs in
>> ovn-north
>> 2018-03-21T06:01:59.442Z|7|ovsdb_idl|WARN|transaction error:
>> {"details":"Transaction causes multiple rows in \"Datapath_Binding\" table
>> to have identical values (1) for index on column \"tunnel_key\".  First
>> row, with UUID 8c5d9342-2b90-4229-8ea1-001a733a915c, was inserted by
>> this transaction.  Second row, with UUID 
>> 8e06f919-4cc7-4ffc-9a79-20ce6663b683,
>> existed in the database before this transaction and was not modified by the
>> transaction.","error":"constraint violation"}
>>
>> In southbound datapath list, 2 duplicate records gets created for same
>> switch.
>>
>> # ovn-sbctl list Datapath
>> _uuid   : b270ae30-3458-445f-95d2-b14e8ebddd01
>> external_ids: {logical-switch="4d6674e3-ff9f-4f38-b050-0fa9bec9e34d",
>> name="ls2"}
>> tunnel_key  : 2
>>
>> _uuid   : 8e06f919-4cc7-4ffc-9a79-20ce6663b683
>> external_ids: {logical-switch="4d6674e3-ff9f-4f38-b050-0fa9bec9e34d",
>> name="ls2"}
>> tunnel_key  : 1
>>
>>
>>
>> # on nodes 1 and 2 where northd is running, it gives below error:
>> 2018-03-21T06:01:59.437Z|8|ovsdb_idl|WARN|transaction error:
>> {"details":"cannot delete Datapath_Binding row
>> 8e06f919-4cc7-4ffc-9a79-20ce6663b683 because of 17 remaining
>> reference(s)","error":"referential integrity violation"}
>>
>> As per commit message, for northd I re-tried setting --ovnnb-db="tcp:
>> 10.169.125.152:6641,tcp:10.169.125.131:6641,tcp:10.148.181.162:6641"
>> and --ovnsb-db="tcp:10.169.125.152:6642,tcp:10.169.125.131:6642,tcp:
>> 10.148.181.162:6642" and it did not help either.
>>
>> There is no issue if I keep running only one instance of northd on any of
>> these 3 nodes. Hence, wanted to know is there something else missing
>> here to make only one northd instance as active and rest as standby?
>>
>>
>> Regards,
>>
>> On Thu, Mar 15, 2018 at 3:09 AM, Numan Siddique <nusid...@redhat.com>
>> wrote:
>>
>>> That's great
>>>
>>> Numan
>>>
>>>
>>> On Thu, Mar 15, 2018 at 2:57 AM, aginwala <aginw...@asu.edu> wrote:
>>>
>>>> Hi Numan:
>>>>
>>>> I tried on new nodes (kernel : 4.4.0-104-generic , Ubuntu 16.04)with
>>>> fresh installation and it worked super fine for both sb and nb dbs. Seems
>>>> like some kernel issue on the previous nodes when I re-installed raft patch
>>>> as I was running different ovs version on those nodes before.
>>>>
>>>>
>>>> For 2 HVs, I now set ovn-remote="tcp:10.169.125.152:6642, tcp:
>>>> 10.169.125.131:6642, tcp:10.148.181.162:6642"  and started controller
>>>> and it works super fine.
>>>>
>>>>
>>>> Did some failover testing by rebooting/killing the leader (
>>>> 10.169.125.152) and bringing it back up and it works as expected.
>>>> Nothing weird noted so far.
>>>>
>>>> # check-cluster gives below data one of the node(10.148.181.162) post
>>>> leader failure
>>>>
>>>> ovsdb-tool check-cluster /et

Re: [ovs-discuss] raft ovsdb clustering

2018-03-21 Thread aginwala
Hi Numan:

Just figured out that ovn-northd is running as active on all 3 nodes instead
of a single active instance, as I continued testing further, which results in
db errors as per the logs.


# on node 3, I run ovn-nbctl ls-add ls2 ; it populates below logs in
ovn-north
2018-03-21T06:01:59.442Z|7|ovsdb_idl|WARN|transaction error:
{"details":"Transaction causes multiple rows in \"Datapath_Binding\" table
to have identical values (1) for index on column \"tunnel_key\".  First
row, with UUID 8c5d9342-2b90-4229-8ea1-001a733a915c, was inserted by this
transaction.  Second row, with UUID 8e06f919-4cc7-4ffc-9a79-20ce6663b683,
existed in the database before this transaction and was not modified by the
transaction.","error":"constraint violation"}

In southbound datapath list, 2 duplicate records gets created for same
switch.

# ovn-sbctl list Datapath
_uuid   : b270ae30-3458-445f-95d2-b14e8ebddd01
external_ids:
{logical-switch="4d6674e3-ff9f-4f38-b050-0fa9bec9e34d", name="ls2"}
tunnel_key  : 2

_uuid   : 8e06f919-4cc7-4ffc-9a79-20ce6663b683
external_ids:
{logical-switch="4d6674e3-ff9f-4f38-b050-0fa9bec9e34d", name="ls2"}
tunnel_key  : 1



# on nodes 1 and 2 where northd is running, it gives below error:
2018-03-21T06:01:59.437Z|8|ovsdb_idl|WARN|transaction error:
{"details":"cannot delete Datapath_Binding row
8e06f919-4cc7-4ffc-9a79-20ce6663b683 because of 17 remaining
reference(s)","error":"referential integrity violation"}

As per the commit message, for northd I re-tried setting
--ovnnb-db="tcp:10.169.125.152:6641,tcp:10.169.125.131:6641,tcp:10.148.181.162:6641"
and
--ovnsb-db="tcp:10.169.125.152:6642,tcp:10.169.125.131:6642,tcp:10.148.181.162:6642",
and it did not help either.

There is no issue if I keep running only one instance of northd on any of
these 3 nodes. Hence, I wanted to know: is there something else missing here
to make only one northd instance active and the rest standby?
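
As a side note, the only way I am telling which instance thinks it is active
right now is the lock messages in the northd log (a rough check, not a proper
status command):

# Run on each of the three nodes:
grep 'ovn-northd lock' /var/log/openvswitch/ovn-northd.log | tail -n 2
# If more than one node shows "lock acquired" as its latest message, more than
# one northd believes it is active.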


Regards,

On Thu, Mar 15, 2018 at 3:09 AM, Numan Siddique <nusid...@redhat.com> wrote:

> That's great
>
> Numan
>
>
> On Thu, Mar 15, 2018 at 2:57 AM, aginwala <aginw...@asu.edu> wrote:
>
>> Hi Numan:
>>
>> I tried on new nodes (kernel : 4.4.0-104-generic , Ubuntu 16.04)with
>> fresh installation and it worked super fine for both sb and nb dbs. Seems
>> like some kernel issue on the previous nodes when I re-installed raft patch
>> as I was running different ovs version on those nodes before.
>>
>>
>> For 2 HVs, I now set ovn-remote="tcp:10.169.125.152:6642, tcp:
>> 10.169.125.131:6642, tcp:10.148.181.162:6642"  and started controller
>> and it works super fine.
>>
>>
>> Did some failover testing by rebooting/killing the leader (10.169.125.152)
>> and bringing it back up and it works as expected. Nothing weird noted so
>> far.
>>
>> # check-cluster gives below data one of the node(10.148.181.162) post
>> leader failure
>>
>> ovsdb-tool check-cluster /etc/openvswitch/ovnsb_db.db
>> ovsdb-tool: leader /etc/openvswitch/ovnsb_db.db for term 2 has log
>> entries only up to index 18446744073709551615, but index 9 was committed in
>> a previous term (e.g. by /etc/openvswitch/ovnsb_db.db)
>>
>>
>> For check-cluster, are we planning to add more output showing which node
>> is active(leader), etc in upcoming versions ?
>>
>>
>> Thanks a ton for helping sort this out.  I think the patch looks good to
>> be merged post addressing of the comments by Justin along with the man page
>> details for ovsdb-tool.
>>
>>
>> I will do some more crash testing for the cluster along with the scale
>> test and keep you posted if something unexpected is noted.
>>
>>
>>
>> Regards,
>>
>>
>>
>> On Tue, Mar 13, 2018 at 11:07 PM, Numan Siddique <nusid...@redhat.com>
>> wrote:
>>
>>>
>>>
>>> On Wed, Mar 14, 2018 at 7:51 AM, aginwala <aginw...@asu.edu> wrote:
>>>
>>>> Sure.
>>>>
>>>> To add on , I also ran for nb db too using different port  and Node2
>>>> crashes with same error :
>>>> # Node 2
>>>> /usr/share/openvswitch/scripts/ovn-ctl --db-nb-addr=10.99.152.138
>>>> --db-nb-port=6641 --db-nb-cluster-remote-addr="tcp:10.99.152.148:6645"
>>>> --db-nb-cluster-local-addr="tcp:10.99.152.138:6645" start_nb_ovsdb
>>>> ovsdb-server: ovsdb error: /etc/openvswitch/ovnnb_db.db: cannot
>>>> identify file type
>>>>
>>>>
>>&g

Re: [ovs-discuss] raft ovsdb clustering

2018-03-14 Thread aginwala
Hi Numan:

I tried on new nodes (kernel: 4.4.0-104-generic, Ubuntu 16.04) with a fresh
installation and it worked super fine for both the sb and nb dbs. It seems
like some kernel issue on the previous nodes when I re-installed the raft
patch, as I was running a different ovs version on those nodes before.


For 2 HVs, I now set
ovn-remote="tcp:10.169.125.152:6642,tcp:10.169.125.131:6642,tcp:10.148.181.162:6642"
and started ovn-controller, and it works super fine.
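
Concretely, the chassis side is set up roughly like this (same
external_ids:ovn-remote knob as before, just with all three cluster endpoints):

ovs-vsctl set open . external_ids:ovn-remote="tcp:10.169.125.152:6642,tcp:10.169.125.131:6642,tcp:10.148.181.162:6642"
ovs-vsctl get open . external_ids:ovn-remote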


Did some failover testing by rebooting/killing the leader (10.169.125.152)
and bringing it back up and it works as expected. Nothing weird noted so
far.

# check-cluster gives the below data on one of the nodes (10.148.181.162) post
leader failure

ovsdb-tool check-cluster /etc/openvswitch/ovnsb_db.db
ovsdb-tool: leader /etc/openvswitch/ovnsb_db.db for term 2 has log entries
only up to index 18446744073709551615, but index 9 was committed in a
previous term (e.g. by /etc/openvswitch/ovnsb_db.db)


For check-cluster, are we planning to add more output showing which node is
active (the leader), etc. in upcoming versions?


Thanks a ton for helping sort this out. I think the patch looks good to be
merged once Justin's comments are addressed, along with the man page details
for ovsdb-tool.


I will do some more crash testing for the cluster along with the scale test
and keep you posted if something unexpected is noted.



Regards,



On Tue, Mar 13, 2018 at 11:07 PM, Numan Siddique <nusid...@redhat.com>
wrote:

>
>
> On Wed, Mar 14, 2018 at 7:51 AM, aginwala <aginw...@asu.edu> wrote:
>
>> Sure.
>>
>> To add on , I also ran for nb db too using different port  and Node2
>> crashes with same error :
>> # Node 2
>> /usr/share/openvswitch/scripts/ovn-ctl --db-nb-addr=10.99.152.138
>> --db-nb-port=6641 --db-nb-cluster-remote-addr="tcp:10.99.152.148:6645"
>> --db-nb-cluster-local-addr="tcp:10.99.152.138:6645" start_nb_ovsdb
>> ovsdb-server: ovsdb error: /etc/openvswitch/ovnnb_db.db: cannot identify
>> file type
>>
>>
>>
> Hi Aliasgar,
>
> It worked for me. Can you delete the old db files in /etc/openvswitch/ and
> try running the commands again ?
>
> Below are the commands I ran in my setup.
>
> Node 1
> ---
> sudo /usr/share/openvswitch/scripts/ovn-ctl  --db-sb-addr=192.168.121.91
> --db-sb-port=6642 --db-sb-create-insecure-remote=yes
> --db-sb-cluster-local-addr=tcp:192.168.121.91:6644 start_sb_ovsdb
>
> Node 2
> -
> sudo /usr/share/openvswitch/scripts/ovn-ctl  --db-sb-addr=192.168.121.87
> --db-sb-port=6642 --db-sb-create-insecure-remote=yes
> --db-sb-cluster-local-addr="tcp:192.168.121.87:6644"
> --db-sb-cluster-remote-addr="tcp:192.168.121.91:6644"  start_sb_ovsdb
>
> Node 3
> -
> sudo /usr/share/openvswitch/scripts/ovn-ctl  --db-sb-addr=192.168.121.78
> --db-sb-port=6642 --db-sb-create-insecure-remote=yes
> --db-sb-cluster-local-addr="tcp:192.168.121.78:6644"
> --db-sb-cluster-remote-addr="tcp:192.168.121.91:6644"  start_sb_ovsdb
>
>
>
> Thanks
> Numan
>
>
>
>
>
>>
>> On Tue, Mar 13, 2018 at 9:40 AM, Numan Siddique <nusid...@redhat.com>
>> wrote:
>>
>>>
>>>
>>> On Tue, Mar 13, 2018 at 9:46 PM, aginwala <aginw...@asu.edu> wrote:
>>>
>>>> Thanks Numan for the response.
>>>>
>>>> There is no command start_cluster_sb_ovsdb in the source code too. Is
>>>> that in a separate commit somewhere? Hence, I used start_sb_ovsdb
>>>> which I think would not be a right choice?
>>>>
>>>
>>> Sorry, I meant start_sb_ovsdb. Strange that it didn't work for you. Let
>>> me try it out again and update this thread.
>>>
>>> Thanks
>>> Numan
>>>
>>>
>>>>
>>>> # Node1  came up as expected.
>>>> ovn-ctl --db-sb-addr=10.99.152.148 --db-sb-port=6642
>>>> --db-sb-create-insecure-remote=yes --db-sb-cluster-local-addr="tcp:
>>>> 10.99.152.148:6644" start_sb_ovsdb.
>>>>
>>>> # verifying its a clustered db with ovsdb-tool db-local-address
>>>> /etc/openvswitch/ovnsb_db.db
>>>> tcp:10.99.152.148:6644
>>>> # ovn-sbctl show works fine and chassis are being populated correctly.
>>>>
>>>> #Node 2 fails with error:
>>>> /usr/share/openvswitch/scripts/ovn-ctl --db-sb-addr=10.99.152.138
>>>> --db-sb-port=6642 --db-sb-create-insecure-remote=yes
>>>> --db-sb-cluster-remote-addr="tcp:10.99.152.148:6644"
>>>> --db-sb-cluster-local-addr="tcp:10.99.152.138:6644" start_sb_ovsdb
>>>

Re: [ovs-discuss] raft ovsdb clustering

2018-03-13 Thread aginwala
Sure.

To add on, I also ran it for the nb db too, using a different port, and Node 2
crashes with the same error:
# Node 2
/usr/share/openvswitch/scripts/ovn-ctl --db-nb-addr=10.99.152.138
--db-nb-port=6641 --db-nb-cluster-remote-addr="tcp:10.99.152.148:6645"
--db-nb-cluster-local-addr="tcp:10.99.152.138:6645" start_nb_ovsdb
ovsdb-server: ovsdb error: /etc/openvswitch/ovnnb_db.db: cannot identify
file type



On Tue, Mar 13, 2018 at 9:40 AM, Numan Siddique <nusid...@redhat.com> wrote:

>
>
> On Tue, Mar 13, 2018 at 9:46 PM, aginwala <aginw...@asu.edu> wrote:
>
>> Thanks Numan for the response.
>>
>> There is no command start_cluster_sb_ovsdb in the source code too. Is
>> that in a separate commit somewhere? Hence, I used start_sb_ovsdb which
>> I think would not be a right choice?
>>
>
> Sorry, I meant start_sb_ovsdb. Strange that it didn't work for you. Let me
> try it out again and update this thread.
>
> Thanks
> Numan
>
>
>>
>> # Node1  came up as expected.
>> ovn-ctl --db-sb-addr=10.99.152.148 --db-sb-port=6642
>> --db-sb-create-insecure-remote=yes --db-sb-cluster-local-addr="tcp:
>> 10.99.152.148:6644" start_sb_ovsdb.
>>
>> # verifying its a clustered db with ovsdb-tool db-local-address
>> /etc/openvswitch/ovnsb_db.db
>> tcp:10.99.152.148:6644
>> # ovn-sbctl show works fine and chassis are being populated correctly.
>>
>> #Node 2 fails with error:
>> /usr/share/openvswitch/scripts/ovn-ctl --db-sb-addr=10.99.152.138
>> --db-sb-port=6642 --db-sb-create-insecure-remote=yes
>> --db-sb-cluster-remote-addr="tcp:10.99.152.148:6644"
>> --db-sb-cluster-local-addr="tcp:10.99.152.138:6644" start_sb_ovsdb
>> ovsdb-server: ovsdb error: /etc/openvswitch/ovnsb_db.db: cannot identify
>> file type
>>
>> # So i did start the sb db the usual way using start_ovsdb to just get
>> the db file created and killed the sb pid and re-ran the command which gave
>> actual error where it complains for join-cluster command that is being
>> called internally
>> /usr/share/openvswitch/scripts/ovn-ctl --db-sb-addr=10.99.152.138
>> --db-sb-port=6642 --db-sb-create-insecure-remote=yes
>> --db-sb-cluster-remote-addr="tcp:10.99.152.148:6644"
>> --db-sb-cluster-local-addr="tcp:10.99.152.138:6644" start_sb_ovsdb
>> ovsdb-tool: /etc/openvswitch/ovnsb_db.db: not a clustered database
>>  * Backing up database to /etc/openvswitch/ovnsb_db.db.b
>> ackup1.15.0-70426956
>> ovsdb-tool: 'join-cluster' command requires at least 4 arguments
>>  * Creating cluster database /etc/openvswitch/ovnsb_db.db from existing
>> one
>>
>>
>> # based on above error I killed the sb db pid again and  try to create a
>> local cluster on node  then re-ran the join operation as per the source
>> code function.
>> ovsdb-tool join-cluster /etc/openvswitch/ovnsb_db.db OVN_Southbound tcp:
>> 10.99.152.138:6644 tcp:10.99.152.148:6644 which still complains
>> ovsdb-tool: I/O error: /etc/openvswitch/ovnsb_db.db: create failed (File
>> exists)
>>
>>
>> # Node 3: I did not try as I am assuming the same failure as node 2
>>
>>
>> Let me know may know further.
>>
>>
>> On Tue, Mar 13, 2018 at 3:08 AM, Numan Siddique <nusid...@redhat.com>
>> wrote:
>>
>>> Hi Aliasgar,
>>>
>>> On Tue, Mar 13, 2018 at 7:11 AM, aginwala <aginw...@asu.edu> wrote:
>>>
>>>> Hi Ben/Noman:
>>>>
>>>> I am trying to setup 3 node southbound db cluster  using raft10
>>>> <https://patchwork.ozlabs.org/patch/854298/> in review.
>>>>
>>>> # Node 1 create-cluster
>>>> ovsdb-tool create-cluster /etc/openvswitch/ovnsb_db.db
>>>> /root/ovs-reviews/ovn/ovn-sb.ovsschema tcp:10.99.152.148:6642
>>>>
>>>
>>> A different port is used for RAFT. So you have to choose another port
>>> like 6644 for example.
>>>
>>
>>>>
>>>> # Node 2
>>>> ovsdb-tool join-cluster /etc/openvswitch/ovnsb_db.db OVN_Southbound tcp:
>>>> 10.99.152.138:6642 tcp:10.99.152.148:6642 --cid
>>>> 5dfcb678-bb1d-4377-b02d-a380edec2982
>>>>
>>>> #Node 3
>>>> ovsdb-tool join-cluster /etc/openvswitch/ovnsb_db.db OVN_Southbound tcp:
>>>> 10.99.152.101:6642 tcp:10.99.152.138:6642 tcp:10.99.152.148:6642 --cid
>>>> 5dfcb678-bb1d-4377-b02d-a380edec2982
>>>>
>>>> # ovn remote is set to all 3 nodes
>>>> external_ids:ovn-remote="tcp:10.99.152.148:6642, 

Re: [ovs-discuss] raft ovsdb clustering

2018-03-13 Thread aginwala
Thanks Numan for the response.

There is no start_cluster_sb_ovsdb command in the source code either. Is that
in a separate commit somewhere? Hence, I used start_sb_ovsdb, which I think
may not be the right choice?

# Node1  came up as expected.
ovn-ctl --db-sb-addr=10.99.152.148 --db-sb-port=6642
--db-sb-create-insecure-remote=yes --db-sb-cluster-local-addr="tcp:
10.99.152.148:6644" start_sb_ovsdb.

# verifying it's a clustered db with ovsdb-tool db-local-address
/etc/openvswitch/ovnsb_db.db
tcp:10.99.152.148:6644
# ovn-sbctl show works fine and chassis are being populated correctly.

# Node 2 fails with error:
/usr/share/openvswitch/scripts/ovn-ctl --db-sb-addr=10.99.152.138
--db-sb-port=6642 --db-sb-create-insecure-remote=yes
--db-sb-cluster-remote-addr="tcp:10.99.152.148:6644"
--db-sb-cluster-local-addr="tcp:10.99.152.138:6644" start_sb_ovsdb
ovsdb-server: ovsdb error: /etc/openvswitch/ovnsb_db.db: cannot identify
file type

# So I started the sb db the usual way using start_ovsdb just to get the db
file created, killed the sb pid, and re-ran the command, which gave the
actual error: it complains about the join-cluster command that is being
called internally
/usr/share/openvswitch/scripts/ovn-ctl --db-sb-addr=10.99.152.138
--db-sb-port=6642 --db-sb-create-insecure-remote=yes
--db-sb-cluster-remote-addr="tcp:10.99.152.148:6644"
--db-sb-cluster-local-addr="tcp:10.99.152.138:6644" start_sb_ovsdb
ovsdb-tool: /etc/openvswitch/ovnsb_db.db: not a clustered database
 * Backing up database to /etc/openvswitch/ovnsb_db.db.backup1.15.0-70426956
ovsdb-tool: 'join-cluster' command requires at least 4 arguments
 * Creating cluster database /etc/openvswitch/ovnsb_db.db from existing one
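
For context, the internally invoked command expects at least four arguments,
roughly of this shape (per the ovsdb-tool usage in the raft series: the db
file, the schema name, the local address, and one or more remote addresses):

ovsdb-tool join-cluster DB SCHEMA_NAME LOCAL_ADDRESS REMOTE_ADDRESS...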


# Based on the above error, I killed the sb db pid again and tried to create
a local cluster on the node, then re-ran the join operation as per the source
code function.
ovsdb-tool join-cluster /etc/openvswitch/ovnsb_db.db OVN_Southbound tcp:
10.99.152.138:6644 tcp:10.99.152.148:6644 which still complains
ovsdb-tool: I/O error: /etc/openvswitch/ovnsb_db.db: create failed (File
exists)
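
As a sketch of what I suspect is the actual fix here (assuming the stale
standalone ovnsb_db.db just needs to be removed so that ovn-ctl can create
and join a proper clustered db):

# stop any running sb ovsdb first, then remove the stale standalone db
rm /etc/openvswitch/ovnsb_db.db
/usr/share/openvswitch/scripts/ovn-ctl --db-sb-addr=10.99.152.138 \
    --db-sb-port=6642 --db-sb-create-insecure-remote=yes \
    --db-sb-cluster-remote-addr="tcp:10.99.152.148:6644" \
    --db-sb-cluster-local-addr="tcp:10.99.152.138:6644" start_sb_ovsdb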


# Node 3: I did not try as I am assuming the same failure as node 2


Let me know if there is anything further I should try.

On Tue, Mar 13, 2018 at 3:08 AM, Numan Siddique <nusid...@redhat.com> wrote:

> Hi Aliasgar,
>
> On Tue, Mar 13, 2018 at 7:11 AM, aginwala <aginw...@asu.edu> wrote:
>
>> Hi Ben/Noman:
>>
>> I am trying to setup 3 node southbound db cluster  using raft10
>> <https://patchwork.ozlabs.org/patch/854298/> in review.
>>
>> # Node 1 create-cluster
>> ovsdb-tool create-cluster /etc/openvswitch/ovnsb_db.db
>> /root/ovs-reviews/ovn/ovn-sb.ovsschema tcp:10.99.152.148:6642
>>
>
> A different port is used for RAFT. So you have to choose another port like
> 6644 for example.
>

>>
>> # Node 2
>> ovsdb-tool join-cluster /etc/openvswitch/ovnsb_db.db OVN_Southbound tcp:
>> 10.99.152.138:6642 tcp:10.99.152.148:6642 --cid
>> 5dfcb678-bb1d-4377-b02d-a380edec2982
>>
>> #Node 3
>> ovsdb-tool join-cluster /etc/openvswitch/ovnsb_db.db OVN_Southbound tcp:
>> 10.99.152.101:6642 tcp:10.99.152.138:6642 tcp:10.99.152.148:6642 --cid
>> 5dfcb678-bb1d-4377-b02d-a380edec2982
>>
>> # ovn remote is set to all 3 nodes
>> external_ids:ovn-remote="tcp:10.99.152.148:6642, tcp:10.99.152.138:6642,
>> tcp:10.99.152.101:6642"
>>
>
>> # Starting sb db on node 1 using below command on node 1:
>>
>> ovsdb-server --detach --monitor -vconsole:off -vraft -vjsonrpc
>> --log-file=/var/log/openvswitch/ovsdb-server-sb.log
>> --pidfile=/var/run/openvswitch/ovnsb_db.pid
>> --remote=db:OVN_Southbound,SB_Global,connections --unixctl=ovnsb_db.ctl
>> --private-key=db:OVN_Southbound,SSL,private_key
>> --certificate=db:OVN_Southbound,SSL,certificate
>> --ca-cert=db:OVN_Southbound,SSL,ca_cert 
>> --ssl-protocols=db:OVN_Southbound,SSL,ssl_protocols
>> --ssl-ciphers=db:OVN_Southbound,SSL,ssl_ciphers
>> --remote=punix:/var/run/openvswitch/ovnsb_db.sock
>> /etc/openvswitch/ovnsb_db.db
>>
>> # check-cluster is returning nothing
>> ovsdb-tool check-cluster /etc/openvswitch/ovnsb_db.db
>>
>> # ovsdb-server-sb.log below shows the leader is elected with only one
>> server and there are rbac related debug logs with rpc replies and empty
>> params with no errors
>>
>> 2018-03-13T01:12:02Z|2|raft|DBG|server 63d1 added to configuration
>> 2018-03-13T01:12:02Z|3|raft|INFO|term 6: starting election
>> 2018-03-13T01:12:02Z|4|raft|INFO|term 6: elected leader by 1+ of 1
>> servers
>>
>>
>> Now Starting the ovsdb-server on the other clusters fails saying
>> ovsdb-server: ovsdb error: /etc/openvswitch/ovnsb_db.db: cann

Re: [ovs-discuss] OVN load balancing on same subnet failing

2018-03-02 Thread aginwala
Hi:

IRL, we always use different subnets for VIPs for OpenStack workloads in
production, for a couple of reasons:
1.  It's easier to fail over in case of outages if the VIP and pool members
are in different subnets.
2.  It is also easier for neutron's IPAM to manage 2 different subnets, one
for the VIP and the other for VMs/containers, instead of allocating from the
same subnet, because neutron doesn't care whether the allocated IP is being
used for a VIP or a VM/container.

Hence, I think it's ok to stick with the solution suggested by Guru. If folks
in the OpenStack community are specifically asking for this requirement, this
implementation is worth prioritizing.
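
For completeness, a rough sketch of the workaround Guru describes below (VIP
taken from a subnet different from the members, load balancer attached to the
logical router that the members' switch connects to); the 30.0.0.0/24 VIP
subnet and the router name lr0 are only illustrative:

# create the LB with a VIP outside the members' 10.1.0.0/24 subnet
lb=$(ovn-nbctl create load_balancer vips:30.0.0.10="10.1.0.2,10.1.0.3")
# attach it to the logical router so the router handles the load balancing
ovn-nbctl add logical_router lr0 load_balancer $lb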


On Fri, Mar 2, 2018 at 9:40 AM, Guru Shetty  wrote:

>
>
> On 1 March 2018 at 21:09, Anil Venkata  wrote:
>
>>
>>
>> On Fri, Mar 2, 2018 at 7:23 AM, Guru Shetty  wrote:
>>
>>>
>>>
>>> On 27 February 2018 at 03:13, Anil Venkata 
>>> wrote:
>>>
 For example, I have a 10.1.0.0/24 network and a load balancer is added
 to it with 10.1.0.10 as VIP and 10.1.0.2(MAC 50:54:00:00:00:01),
 10.1.0.3(MAC 50:54:00:00:00:02) as members.
 ovn-nbctl  create load_balancer vips:10.1.0.10="10.1.0.2,10.1.0.3"

>>>
>>> We currently need the VIP to be in a different subnet. You should
>>> connect switch it to a dummy logical router (or connect it to a external
>>> router). Since a VIP is in a different subnet, it sends an ARP for logical
>>> router IP and then things will work.
>>>
>>>
>>
>> Thanks Guru. Any reason for introducing this constraint(i.e VIP to be in
>> a different subnet)? Can we address this limitation?
>>
>
> It was just easy to implement with the constraint. You will need a ARP
> responder for the VIP. And now, you will have to specify the mac address
> for each VIP in the schema. So that is a bit of work - but not hard.
>
>
>>
>>
  When I try to send a request from client within the subnet(i.e
 10.1.0.33) its not reaching any load balancer members.
 I noticed ARP not resolved for VIP 10.1.0.10.

 I tried to resolve this in two ways
 1) Adding a new ARP reply ovs flow for VIP 10.1.0.10 with router port's
 MAC. When client tries to connect VIP, it will use router's MAC. Now router
 gets the packet after load balancing, and will forward the packet to
 appropriate member.

 2) Second approach,
a) Using a new MAC(example, 50:54:00:00:00:ab) for VIP 10.1.0.10,
 and adding a new ARP reply flow with this MAC.
b) As we are not using router, when load balancing changes
 destination ip, VIP MAC has to be replaced with corresponding member's MAC
 i.e
   sudo ovs-ofctl add-flow br-int "table=24,ip,priority=150,dl_dst=50:54:00:00:00:ab,nw_dst=10.1.0.2,action=mod_dl_dst:50:54:00:00:00:01,load:0x1->NXM_NX_REG15[],resubmit(,32)"
   sudo ovs-ofctl add-flow br-int "table=24,ip,priority=150,dl_dst=50:54:00:00:00:ab,nw_dst=10.1.0.3,action=mod_dl_dst:50:54:00:00:00:02,load:0x2->NXM_NX_REG15[],resubmit(,32)"

 Which approach will be better or is there any alternate solution?

 Thanks
 Anil


 ___
 discuss mailing list
 disc...@openvswitch.org
 https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


>>>
>>
>
> ___
> discuss mailing list
> disc...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>
>
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] ovs-vswitchd 100% CPU in OVN scale test

2018-02-19 Thread aginwala
Hi All:

As per the discussions/requests by Mark and Numan, I finally reverted the
mtu patch (commit-id 8c319e8b73032e06c7dd1832b3b31f8a1189dcd1) on
branch-2.9 and re-ran the test binding 10k lports on the farms, with 8 LRs
and 40 LSs, and the results improved. Since ovs did not go super hot,
binding all 10k ports to the HVs completed in 5 hrs 28 minutes vs 8 hours
with the mtu patch. Thus, the extra strcmp did add the overhead. A cpu idle
graph of a farm with 50 HVs running 2.9 with/without the mtu patch is
available @
https://raw.githubusercontent.com/noah8713/ovn-scale-test/4cef99dbe9a0677a1b2d852b7f4f429ede340875/results/overlay/farm_cpu_2.9.png
which shows that running without the mtu patch left a higher idle cpu
percentage than with it. In addition, I have also captured the ovs-vswitchd
process cpu utilization on the farm, since ovs itself was creating a
bottleneck by slowing down the port binding on the computes. That graph is
available @
https://raw.githubusercontent.com/noah8713/ovn-scale-test/scale_results/results/overlay/ovs-vswitchd_2.9_util.png
Hence, overall performance improved, which resulted in faster completion of
all 10k port bindings.
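
For anyone reproducing this, the revert itself was just that single commit on
a branch-2.9 checkout; a sketch, assuming a plain git clone of the ovs tree:

git checkout branch-2.9
git revert 8c319e8b73032e06c7dd1832b3b31f8a1189dcd1
# rebuild/reinstall ovs on the farm nodes before re-running the binding test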


On Thu, Feb 15, 2018 at 12:20 PM, Mark Michelson 
wrote:

>
>
> On 02/08/2018 07:55 PM, Han Zhou wrote:
>
>>
>>
>> On Wed, Feb 7, 2018 at 12:47 PM, Han Zhou <zhou...@gmail.com> wrote:
>>  >
>>  > When doing scale testing for OVN (using https://github.com/openvswitch
>> /ovn-scale-test), we had some interesting findings, and need some help
>> here.
>>  >
>>  > We ran the test "create and bind lports" against branch 2.9 and branch
>> 2.6, and we found that 2.6 was must faster. With some analysis, we found
>> out the reason is not because of OVN gets slower in 2.9, but because the
>> bottleneck of this test in branch 2.9 is ovs-vswitchd.
>>  >
>>  > The testing was run in an environment with 20 farm nodes, each has 50
>> sandbox HVs (I will just mention them as HVs in short). Before the test,
>> there are already 9500 lports bound in 950 HVs on 19 farm nodes. The test
>> run against the last farm node to bind the lport on the 50 HVs there. The
>> steps in the test scenario are:
>>  >
>>  > 1. Create 5 new LSs in NB (so that the LSs will not be shared with any
>> of HVs on other farm nodes)
>>  > 2. create 100 lports in NB on a LS
>>  > 3. bind these lports on HVs, 2 for each HV. They are bound
>> sequentially on each HV, and for each HV the 2 ports are bound using one
>> command together: ovs-vsctl add-port  -- set Interface
>> external-ids:...  -- add-port  -- set Interface external-ids:...
>> (the script didn't set type to internal, but I hope it is not an issue for
>> this test).
>>  > 4. wait the port stated changed to up in NB for all the 100 lports
>> (with a single ovn-nbctl command)
>>  >
>>  > These steps are repeated for 5 times, one for each LS. So in the end
>> we got 500 more lports created and bound (the total scale is then 1k HVs
>> and 10k lports).
>>  >
>>  > When running with 2.6, the ovn-controllers were taking most of the CPU
>> time. However, with 2.9, the CPU of ovn-controllers spikes but there is
>> always ovs-vswitchd on the top with 100% CPU. It means the ovs-vswitchd is
>> the bottleneck in this testing. There is only one ovs-vswitchd with 100% at
>> the same time and different ovs-vswitchd will spike one after another,
>> since the ports are bound sequentially on each HV. From the rally log, each
>> 2 ports binding takes around 4 - 5 seconds. This is just the ovs-vsctl
>> command execution time. The 100% CPU of ovs-vswitchd explains the slowness.
>>  >
>>  > So, based on this result, we can not using the total time to evaluate
>> the efficiency of OVN, instead we can evaluate by CPU cost of
>> ovn-controller processes. In fact, 2.9 ovn-controller costs around 70% less
>> CPU than 2.6, which I think is due to some optimization we made earlier.
>> (With my work-in-progress patch it saves much more, and I will post later
>> as RFC).
>>  >
>>  > However, I cannot explain why ovs-vswitchd is getting slower than 2.6
>> when doing port-binding. We need expert suggestions here, for what could be
>> the possible reason of this slowness. We can do more testing with different
>> versions between 2.6 and 2.9 to find out related change, but with some
>> pointers it might save some effort. Below are some logs of ovs-vswitchd
>> when port binding is happening:
>>  >
>>  > ==
>>  > 2018-02-07T00:12:54.558Z|01767|bridge|INFO|bridge br-int: added
>> interface lport_bc65cd_QFOU3v on port 1028
>>  > 2018-02-07T00:12:55.629Z|01768|timeval|WARN|Unreasonably long 1112ms
>> poll interval (1016ms user, 4ms system)
>>  > 2018-02-07T00:12:55.629Z|01769|timeval|WARN|faults: 336 minor, 0 major
>>  > 2018-02-07T00:12:55.629Z|01770|timeval|WARN|context switches: 0
>> voluntary, 13 involuntary
>>  > 2018-02-07T00:12:55.629Z|01771|coverage|INFO|Event coverage, avg rate
>> over last: 5 seconds, last minute, last hour,  hash=b256889c:
>>  >