Re: [ovs-dev] OVN meeting report

2017-04-14 Thread Ben Pfaff
On Fri, Apr 14, 2017 at 02:48:40PM +0500, Valentine Sinitsyn wrote:
> On 13.04.2017 20:53, Ben Pfaff wrote:
> >On Wed, Apr 12, 2017 at 06:09:28PM +0500, Valentine Sinitsyn wrote:
> >>Is there some design outline for the missing implementation bits?
> >>Specifically, it would be good to know the following:
> >>
> >>1. With clustered OVSDB, a client such as IDL needs two JSON-RPC
> >>connections: one to the leader (to commit transactions), and a read-only
> >>one to an arbitrary replica set member (to scale reads). Will this be
> >>implemented at the ovsdb_idl level or encapsulated inside jsonrpc_session?
> >>The former seems more natural, yet support for multiple remotes has
> >>already gone into jsonrpc_session.
> >
> >There are multiple possible approaches here.  The one that I am planning
> >to try out first is to have a client connect to only one randomly
> >selected server, and then have that server be responsible for relaying
> >write transactions to the leader.
> Yes, this is an option. However, our tests suggest that ovsdb-server doesn't
> scale well to hundreds or thousands of connections. This relay approach
> adds at most one new connection within the cluster per new client
> connection, which could become a bottleneck.

Relaying will take place over the Raft connections among the servers in
the cluster, not over the OVSDB JSON-RPC connections.  The Raft
connections are per-server (although there are N**2 of them for N
servers), so it shouldn't introduce additional per-client connections to
the cluster.
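As a rough worked example of that arithmetic, here is a small C sketch. It assumes each server keeps one outgoing connection to every other server, giving n * (n - 1) server-to-server links (the N**2 order mentioned above), and contrasts that with a hypothetical design that opens one extra cluster-internal connection per relayed client. The function names are made up for illustration.

```c
#include <assert.h>

/* Illustrative only: counts cluster-internal connections under two
 * relay designs, assuming each server keeps one outgoing connection to
 * every other server. */

/* Server-to-server connections in an n-server cluster: n * (n - 1),
 * on the order of n**2 and independent of the number of clients. */
unsigned long raft_mesh_links(unsigned long n)
{
    assert(n > 0);
    return n * (n - 1);
}

/* Upper bound if, instead, followers opened one extra connection to the
 * leader for every client they relay for. */
unsigned long per_client_relay_links(unsigned long n, unsigned long clients)
{
    return raft_mesh_links(n) + clients;
}
```

For a 3-server cluster the mesh stays at 6 connections whether there are 10 clients or 1,000, whereas the per-client design could grow to as many as 1,006.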
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] OVN meeting report

2017-04-14 Thread Valentine Sinitsyn

Hi Ben,

On 13.04.2017 20:53, Ben Pfaff wrote:

>On Wed, Apr 12, 2017 at 06:09:28PM +0500, Valentine Sinitsyn wrote:

>I'm not surprised that there are inconsistency bugs.  The testing I've
>done so far is really sketchy.  Let me assure you that I will implement
>much more thorough testing before I will propose anything to be merged.

Sure, I didn't expect it to be bug-free either.




>>Is there some design outline for the missing implementation bits?
>>Specifically, it would be good to know the following:
>>
>>1. With clustered OVSDB, a client such as IDL needs two JSON-RPC
>>connections: one to the leader (to commit transactions), and a read-only
>>one to an arbitrary replica set member (to scale reads). Will this be
>>implemented at the ovsdb_idl level or encapsulated inside jsonrpc_session?
>>The former seems more natural, yet support for multiple remotes has
>>already gone into jsonrpc_session.
>
>There are multiple possible approaches here.  The one that I am planning
>to try out first is to have a client connect to only one randomly
>selected server, and then have that server be responsible for relaying
>write transactions to the leader.
Yes, this is an option. However, our tests suggest that ovsdb-server
doesn't scale well to hundreds or thousands of connections. This relay
approach adds at most one new connection within the cluster per new
client connection, which could become a bottleneck.


Thanks,
Valentine




Re: [ovs-dev] OVN meeting report

2017-04-13 Thread Ben Pfaff
On Wed, Apr 12, 2017 at 06:09:28PM +0500, Valentine Sinitsyn wrote:
> I had some time to play with the raft3 branch last week.
> 
> I added very basic and hacky replica set support to IDL and brought up an
> OVN setup with a clustered southbound database. It works to some extent,
> yet if I throw several hundred logical ports into the mix, the database
> becomes inconsistent. The reason is probably the race window between when
> the Raft leader appends a log entry to other nodes (so a client such as
> ovn-northd already sees it) and when the entry actually appears in the
> leader's own log. I'm not sure whether this is my bug or not. The original
> code had some minor issues as well (which is absolutely normal for WIP); I
> can send my (rather trivial) patches if there is any interest.

I'm not surprised that there are inconsistency bugs.  The testing I've
done so far is really sketchy.  Let me assure you that I will implement
much more thorough testing before I will propose anything to be merged.

> Is there some design outline for the missing implementation bits?
> Specifically, it would be good to know the following:
> 
> 1. With clustered OVSDB, a client such as IDL needs two JSON-RPC
> connections: one to the leader (to commit transactions), and a read-only
> one to an arbitrary replica set member (to scale reads). Will this be
> implemented at the ovsdb_idl level or encapsulated inside jsonrpc_session?
> The former seems more natural, yet support for multiple remotes has
> already gone into jsonrpc_session.

There are multiple possible approaches here.  The one that I am planning
to try out first is to have a client connect to only one randomly
selected server, and then have that server be responsible for relaying
write transactions to the leader.
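A minimal sketch of the routing decision such a server might make, assuming a hypothetical server_state that records its own ID and the leader it currently believes in (none of these names come from OVS or the raft3 branch): reads are answered from the local copy, writes are committed directly if this server is the leader, and otherwise relayed to the leader over the existing server-to-server connection.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request classification and routing outcome. */
enum request_kind { REQ_READ, REQ_WRITE };

enum route {
    ROUTE_SERVE_LOCALLY,    /* Answer from this server's copy. */
    ROUTE_COMMIT_LOCALLY,   /* This server is the leader: commit via Raft. */
    ROUTE_RELAY_TO_LEADER,  /* Forward over the server-to-server link. */
    ROUTE_RETRY_LATER       /* No known leader, e.g. during an election. */
};

struct server_state {
    int self_id;            /* This server's ID within the cluster. */
    int leader_id;          /* Leader this server believes in, or -1. */
};

/* Decide what to do with a request from a client that is connected
 * only to this server. */
enum route route_request(const struct server_state *s, enum request_kind kind)
{
    assert(s != NULL);
    if (kind == REQ_READ) {
        return ROUTE_SERVE_LOCALLY;
    }
    if (s->leader_id < 0) {
        return ROUTE_RETRY_LATER;
    }
    return s->leader_id == s->self_id ? ROUTE_COMMIT_LOCALLY
                                      : ROUTE_RELAY_TO_LEADER;
}
```

Serving reads locally is what keeps them cheap, but it also means a client's view can lag the leader, which is the situation raised in question 3.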

> 2. How does the client know which replica set member is currently the leader?
> I just loop over the remotes until one accepts the transaction (which is an
> awful idea). It would be nice to send some sort of cluster metadata snapshot
> to the JSON-RPC client during the initial handshake. Alternatively, one could
> extend the "not leader" error object with the leader's URL.

If we do adopt the idea that followers relay write transactions to the
leader, then the client doesn't need to know the leader.  But if that
isn't practical, then the Raft thesis, section 6.2, suggests the same
idea as you did, of having the follower point to the leader if it knows
it.
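To illustrate the follower-points-to-leader idea, here is a C sketch of a client that follows such hints. It assumes a hypothetical reply format in which a server that rejects a write as "not leader" also reports the index (in the client's list of remotes) of the leader it knows about, or -1 if it doesn't know. The toy cluster at the end only exists so the loop can be exercised; none of this is the OVSDB wire protocol.

```c
#include <assert.h>

/* Hypothetical reply to a write: either accepted, or rejected with the
 * index (into the client's remote list) of the leader the server knows
 * about, or -1 if it does not know. Not the OVSDB wire format. */
struct write_reply {
    int ok;
    int leader_hint;
};

/* Transport hook: send one write to remote 'idx'. A real client would
 * do a JSON-RPC round trip here. */
typedef struct write_reply (*send_fn)(int idx, void *aux);

/* Follow leader hints; without a hint, fall back to the next remote.
 * Returns the index that accepted the write, or -1 after 'max_tries'. */
int commit_following_hints(int n_remotes, int start, int max_tries,
                           send_fn send, void *aux)
{
    assert(n_remotes > 0 && start >= 0 && start < n_remotes);
    int idx = start;
    for (int i = 0; i < max_tries; i++) {
        struct write_reply r = send(idx, aux);
        if (r.ok) {
            return idx;
        }
        if (r.leader_hint >= 0 && r.leader_hint < n_remotes) {
            idx = r.leader_hint;           /* Go straight to the leader. */
        } else {
            idx = (idx + 1) % n_remotes;   /* No hint: probe the next one. */
        }
    }
    return -1;
}

/* Toy cluster for exercising the loop: only 'leader' accepts writes. */
struct toy_cluster {
    int leader;
    int give_hints;   /* Whether followers report the leader. */
    int sends;        /* Round trips made so far. */
};

struct write_reply toy_send(int idx, void *aux)
{
    struct toy_cluster *c = aux;
    c->sends++;
    if (idx == c->leader) {
        return (struct write_reply) { 1, idx };
    }
    return (struct write_reply) { 0, c->give_hints ? c->leader : -1 };
}
```

With hints, a write that first lands on a follower reaches the leader in two round trips; without them the client degrades to the round-robin probing described in question 2.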

> 3. Because of eventual consistency, if an IDL reads from one member (A) but
> writes to another (B), it can try to delete a row that is not yet in A's
> database. This makes all further requests fail with an "inconsistent data"
> error, which is basically what I observe in my tests. How do you plan to
> overcome this?

This sounds like a bug in the existing code (not too surprising).  What
is supposed to happen is that the client waits until it receives updated
data from the server, which it knows will eventually arrive because it
knows that its write was against an inconsistent copy.  Then, it
recomposes its change against the updated database and sends a new
transaction.  This is similar to what the clients already do when their
transactions fail because another client has 

Re: [ovs-dev] OVN meeting report

2017-04-12 Thread Valentine Sinitsyn

Hi,

On 04.04.2017 15:29, Valentine Sinitsyn wrote:

>OK, I see this is ongoing work in your branch.


I had some time to play with the raft3 branch last week.

I added very basic and hacky replica set support to IDL and brought up
an OVN setup with a clustered southbound database. It works to some
extent, yet if I throw several hundred logical ports into the mix, the
database becomes inconsistent. The reason is probably the race window
between when the Raft leader appends a log entry to other nodes (so a
client such as ovn-northd already sees it) and when the entry actually
appears in the leader's own log. I'm not sure whether this is my bug or
not. The original code had some minor issues as well (which is
absolutely normal for WIP); I can send my (rather trivial) patches if
there is any interest.
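For comparison, here is a sketch of the commit rule from the Raft paper (Figure 2), which governs the ordering this hypothesis is about. In standard Raft the leader appends a new entry to its own log and then replicates it; the entry counts as committed, and may be exposed to clients, only once a majority of servers (including the leader) store it, and the leader commits by counting replicas only for entries from its current term. The function and parameter names are illustrative and are not taken from the raft3 branch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Descending order for qsort. */
static int cmp_desc(const void *a, const void *b)
{
    unsigned long x = *(const unsigned long *) a;
    unsigned long y = *(const unsigned long *) b;
    return (x < y) - (x > y);
}

/* match[i]: highest log index known to be stored on server i (the leader
 * included). log_terms[k]: term of the leader's entry k (k >= 1).
 * Returns the new commit index following the Raft commit rule. */
unsigned long advance_commit_index(const unsigned long *match, size_t n,
                                   const unsigned long *log_terms,
                                   unsigned long commit_index,
                                   unsigned long current_term)
{
    assert(n > 0);
    unsigned long *sorted = malloc(n * sizeof *sorted);
    if (sorted == NULL) {
        return commit_index;             /* Out of memory: no progress. */
    }
    memcpy(sorted, match, n * sizeof *sorted);
    qsort(sorted, n, sizeof *sorted, cmp_desc);

    /* After a descending sort, sorted[n / 2] is stored on at least
     * n / 2 + 1 servers, i.e. on a majority. */
    unsigned long candidate = sorted[n / 2];
    free(sorted);

    /* Only entries from the current term are committed by counting
     * replicas; older entries become committed along with them. */
    if (candidate > commit_index && log_terms[candidate] == current_term) {
        return candidate;
    }
    return commit_index;
}
```

For example, with a leader holding entries 1 to 5 (terms 1, 1, 2, 2, 2, current term 2) and followers matching 3 and 1, entry 3 is on two of three servers and becomes committed; entries 4 and 5 stay invisible to clients until another server stores them.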


Is there some design outline for the missing implementation bits? 
Specifically, it would be good to know the following:


1. With clustered OVSDB, a client such as IDL needs two JSON-RPC
connections: one to the leader (to commit transactions), and a read-only
one to an arbitrary replica set member (to scale reads). Will this be
implemented at the ovsdb_idl level or encapsulated inside jsonrpc_session?
The former seems more natural, yet support for multiple remotes has
already gone into jsonrpc_session.


2. How does the client know which replica set member is currently the
leader? I just loop over the remotes until one accepts the transaction
(which is an awful idea). It would be nice to send some sort of cluster
metadata snapshot to the JSON-RPC client during the initial handshake.
Alternatively, one could extend the "not leader" error object with the
leader's URL.


3. Because of eventual consistency, if an IDL reads from one member (A)
but writes to another (B), it can try to delete a row that is not yet in
A's database. This makes all further requests fail with an "inconsistent
data" error, which is basically what I observe in my tests. How do you
plan to overcome this?


Thanks in advance!

Valentine




--
Best regards,
Valentine Sinitsyn


Re: [ovs-dev] OVN meeting report

2017-04-04 Thread Valentine Sinitsyn

On 03.04.2017 20:29, Valentine Sinitsyn wrote:

>I've checked out your raft3 branch, and even learned how to create an
>OVSDB cluster. Thanks for the docs!
>
>What I don't get, though, is how I instruct the IDL to connect to the
>cluster now. Do I just connect to a random server, or should there be
>some dispatcher, or whatever?

OK, I see this is ongoing work in your branch.

Best,
Valentine





Re: [ovs-dev] OVN meeting report

2017-04-03 Thread Valentine Sinitsyn

Hi Ben,

On 23.03.2017 08:11, Ben Pfaff wrote:

>I've been spending what dev time I have on database clustering.  Today,
>I managed to get it working, with many caveats.  It will take weeks or
>months longer to get it finished, tested, and ready for posting.  (If
>you want what I have, check out the raft3 branch in my ovs-reviews repo
>at github.)

I've checked out your raft3 branch, and even learned how to create an
OVSDB cluster. Thanks for the docs!

What I don't get, though, is how I instruct the IDL to connect to the
cluster now. Do I just connect to a random server, or should there be
some dispatcher, or whatever?


Thanks,
Valentine




[ovs-dev] OVN meeting report

2017-03-22 Thread Ben Pfaff
Hello everyone.  I am not sure whether I am going to be able to attend
the OVN meeting tomorrow, because I will be in another possibly
distracting meeting, so I'm going to give my report here.

Toward the end of last week I did a full pass of reviews through
patchwork.  The most notable result, I think, is that I applied patches
that add 802.1ad support.  For OVN, this makes it more reasonable to
consider adding support for tagged logical ports (currently, OVN drops
all tagged logical packets), which I've heard requested once or twice:
with 802.1ad, packets that already carry a VLAN tag can now be gatewayed
to physical ports within an outer VLAN.  I don't have any plans to work
on that, but I think that it is worth pointing out.
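For readers less familiar with 802.1ad ("QinQ"): it adds an outer service tag (S-tag, TPID 0x88a8) in front of the customer's 802.1Q tag (C-tag, TPID 0x8100), so a frame that is already VLAN-tagged can be carried inside an outer VLAN. The byte-level C sketch below pushes such an outer tag; it only illustrates the frame layout and is not how OVS implements the feature. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ADDRS_LEN 12     /* Destination + source MAC addresses. */
#define VLAN_TAG_LEN 4       /* TPID (2 bytes) + TCI (2 bytes). */
#define TPID_8021Q 0x8100    /* Customer tag (C-tag). */
#define TPID_8021AD 0x88a8   /* Service tag (S-tag). */

/* Does the frame carry an 802.1Q C-tag right after the MAC addresses? */
int frame_has_ctag(const uint8_t *frame, size_t len)
{
    return len >= ETH_ADDRS_LEN + VLAN_TAG_LEN
           && frame[ETH_ADDRS_LEN] == (TPID_8021Q >> 8)
           && frame[ETH_ADDRS_LEN + 1] == (TPID_8021Q & 0xff);
}

/* Push an outer S-tag with the given VID and PCP (DEI left 0) in front of
 * whatever follows the MAC addresses. 'frame' must have room for
 * len + VLAN_TAG_LEN bytes. Returns the new length, or 0 on bad input. */
size_t push_stag(uint8_t *frame, size_t len, uint16_t vid, uint8_t pcp)
{
    assert(frame != NULL);
    if (len < ETH_ADDRS_LEN + 2 || vid > 0x0fff || pcp > 7) {
        return 0;
    }
    /* Make room for the new tag by shifting everything after the MACs. */
    memmove(frame + ETH_ADDRS_LEN + VLAN_TAG_LEN, frame + ETH_ADDRS_LEN,
            len - ETH_ADDRS_LEN);
    uint16_t tci = (uint16_t) ((pcp << 13) | vid);
    frame[ETH_ADDRS_LEN] = TPID_8021AD >> 8;
    frame[ETH_ADDRS_LEN + 1] = TPID_8021AD & 0xff;
    frame[ETH_ADDRS_LEN + 2] = (uint8_t) (tci >> 8);
    frame[ETH_ADDRS_LEN + 3] = (uint8_t) (tci & 0xff);
    return len + VLAN_TAG_LEN;
}
```

Pushing VID 200 onto a frame that carries C-tag VID 100 yields the S-tag for VLAN 200, then the original C-tag for VLAN 100, then the original EtherType.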

The OVS "Open Source Day" talks have been scheduled at OpenStack
Boston.  They are all on Wednesday:
https://www.openstack.org/summit/boston-2017/summit-schedule/#track=135

I've been spending what dev time I have on database clustering.  Today,
I managed to get it working, with many caveats.  It will take weeks or
months longer to get it finished, tested, and ready for posting.  (If
you want what I have, check out the raft3 branch in my ovs-reviews repo
at github.)