Re: [ovs-dev] OVN meeting report
On Fri, Apr 14, 2017 at 02:48:40PM +0500, Valentine Sinitsyn wrote:
> On 13.04.2017 20:53, Ben Pfaff wrote:
> >On Wed, Apr 12, 2017 at 06:09:28PM +0500, Valentine Sinitsyn wrote:
> >>Is there some design outline for the missing implementation bits?
> >>Specifically, it would be good to know the following:
> >>
> >>1. With clustered OVSDB, a client such as IDL needs two JSON RPC
> >>connections: to the leader (to commit transactions), and a read-only one to
> >>an arbitrary replica set (scaling reads). Will it be implemented on
> >>ovsdb_idl level or encapsulated inside jsonrpc_session? The former seems
> >>natural yet multiple remotes support went to jsonrpc_session already.
> >
> >There are multiple possible approaches here. The one that I am planning
> >to try out first is to have a client connect to only one randomly
> >selected server, and then have that server be responsible for relaying
> >write transactions to the leader.
> Yes, this is an option. However, our tests suggest that ovsdb-server doesn't
> scale well with respect to (hundreds to thousands) connections. This relay
> approach adds at most one new connection within the cluster per new client
> connection, which could be a bottleneck.

Relaying will take place over the Raft connections among the servers in
the cluster, not over the OVSDB JSON-RPC connections. The Raft
connections are per-server (although there are N**2 of them for N
servers), so it shouldn't introduce additional per-client connections to
the cluster.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
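[Editor's sketch] The connection-count argument in the reply above can be made concrete with a little arithmetic. The Python below is only an illustration under stated assumptions: the function names are hypothetical, the Raft mesh is assumed to be one directed link per ordered pair of servers (N*(N-1), which is on the order of the N**2 mentioned above), and the "per-client relay" model is the worst case Valentine was worried about, not something either design is known to do.

```python
def raft_mesh_connections(n_servers: int) -> int:
    """Server-to-server Raft links, assuming one directed link per ordered pair."""
    return n_servers * (n_servers - 1)


def connections_relay_over_raft(n_servers: int, n_clients: int) -> int:
    """Each client holds one JSON-RPC connection to some server.

    Writes are relayed over the existing Raft mesh, so the intra-cluster
    connection count does not depend on the number of clients.
    """
    return n_clients + raft_mesh_connections(n_servers)


def connections_per_client_relay(n_servers: int, n_clients: int) -> int:
    """Hypothetical worst case: every client connection also opens one
    extra intra-cluster connection for relaying its writes."""
    return 2 * n_clients + raft_mesh_connections(n_servers)


if __name__ == "__main__":
    for clients in (100, 1000):
        print(clients,
              connections_relay_over_raft(3, clients),
              connections_per_client_relay(3, clients))
```

With three servers, the mesh term stays fixed while only the client-facing term grows, which is the point of relaying over the Raft connections.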
Re: [ovs-dev] OVN meeting report
Hi Ben, On 13.04.2017 20:53, Ben Pfaff wrote: On Wed, Apr 12, 2017 at 06:09:28PM +0500, Valentine Sinitsyn wrote: Hi, On 04.04.2017 15:29, Valentine Sinitsyn wrote: On 03.04.2017 20:29, Valentine Sinitsyn wrote: Hi Ben, On 23.03.2017 08:11, Ben Pfaff wrote: Hello everyone. I am not sure whether I am going to be able to attend the OVN meeting tomorrow, because I will be in another possibly distracting meeting, so I'm going to give my report here. Toward the end of last week I did a full pass of reviews through patchwork. The most notable result, I think, is that I applied patches that add 802.1ad support. For OVN, this makes it more reasonable to consider adding support for tagged logical ports--currently, OVN drops all tagged logical packets--which I've heard requested once or twice, because it means that they can now be gatewayed to physical ports within an outer VLAN. I don't have any plans to work on that, but I think that it is worth pointing out. The OVS "Open Source Day" talks have been scheduled at OpenStack Boston. They are all on Wednesday: https://www.openstack.org/summit/boston-2017/summit-schedule/#track=135 I've been spending what dev time I have on database clustering. Today, I managed to get it working, with many caveats. It will take weeks or months longer to get it finished, tested, and ready for posting. (If you want what I have, check out the raft3 branch in my ovs-reviews repo at github.) I've checked out your raft3 branch, and even learned how to create an OVSDB cluster. Thanks for the docs! What I don't get though is how do I instruct IDL to connect to the cluster now? Do I just connect to a random server, or there should be some dispatcher, or whatever? OK I see this is an ongoing work in your branch. I had some time to play with raft3 branch last week. I added very basic and hacky replica set support to IDL and brought up an OVN setup with clustered southbound database. 
It works to some extent, yet if I try to throw several hundreds of logical ports into the mix, the database becomes inconsistent. The reason is probably the race window between when the raft leader appends a log entry to other nodes (so a client such as ovn-northd already sees it) and the entry really appears in the leader's log itself. Not sure if it is my bug or not. The original code had some minor issues as well (which is absolutely normal for WIP) - I can send my (rather trivial) patches if there is any interest. I'm not surprised that there are inconsistency bugs. The testing I've done so far is really sketchy. Let me assure you that I will implement much more thorough testing before I will propose anything to be merged. Sure, I didn't expect it to be bug free either. Is there some design outline for the missing implementation bits? Specifically, it would be good to know the following: 1. With clustered OVSDB, a client such as IDL needs two JSON RPC connections: to the leader (to commit transactions), and a read-only one to an arbitrary replica set (scaling reads). Will it be implemented on ovsdb_idl level or encapsulated inside jsonrpc_session? The former seems natural yet multiple remotes support went to jsonrpc_session already. There are multiple possible approaches here. The one that I am planning to try out first is to have a client connect to only one randomly selected server, and then have that server be responsible for relaying write transactions to the leader. Yes, this is an option. However, our tests suggest that ovsdb-server doesn't scale well with respect to (hundreds to thousands) connections. This relay approach adds at most one new connection within the cluster per new client connection, which could be a bottleneck. Thanks, Valentine 2. How does the client know which replica set member is currently a leader? I just loop over remotes until one accepts the transaction (which is an awful idea). 
It would be nice to send some sort of cluster metadata snapshot to JSON RPC client during initial handshake. Alternatively, one can extend the "not leader" error object with a leader URL. If we do adopt the idea that followers relay write transactions to the leader, then the client doesn't need to know the leader. But if that isn't practical, then the Raft thesis, section 6.2, suggests the same idea as you did, of having the follower point to the leader if it knows it. 3. For eventual consistency reasons, if an IDL reads from one member (A) but writes to another one (B), it can try to delete a row not yet in A's database. This would make all further requests fail with "inconsistent data" error and basically is what I observe in my tests. How do you plan to overcome this? This sounds like a bug in the existing code (not too surprising). What is supposed to happen is that the client waits until it receives updated data from the server, which it knows will eventually arrive because it knows that its write was against an inconsistent copy. Then, it
Re: [ovs-dev] OVN meeting report
On Wed, Apr 12, 2017 at 06:09:28PM +0500, Valentine Sinitsyn wrote:
> Hi,
>
> On 04.04.2017 15:29, Valentine Sinitsyn wrote:
> >On 03.04.2017 20:29, Valentine Sinitsyn wrote:
> >>Hi Ben,
> >>
> >>On 23.03.2017 08:11, Ben Pfaff wrote:
> >>>Hello everyone. I am not sure whether I am going to be able to attend
> >>>the OVN meeting tomorrow, because I will be in another possibly
> >>>distracting meeting, so I'm going to give my report here.
> >>>
> >>>Toward the end of last week I did a full pass of reviews through
> >>>patchwork. The most notable result, I think, is that I applied patches
> >>>that add 802.1ad support. For OVN, this makes it more reasonable to
> >>>consider adding support for tagged logical ports--currently, OVN drops
> >>>all tagged logical packets--which I've heard requested once or twice,
> >>>because it means that they can now be gatewayed to physical ports within
> >>>an outer VLAN. I don't have any plans to work on that, but I think that
> >>>it is worth pointing out.
> >>>
> >>>The OVS "Open Source Day" talks have been scheduled at OpenStack
> >>>Boston. They are all on Wednesday:
> >>>https://www.openstack.org/summit/boston-2017/summit-schedule/#track=135
> >>>
> >>>I've been spending what dev time I have on database clustering. Today,
> >>>I managed to get it working, with many caveats. It will take weeks or
> >>>months longer to get it finished, tested, and ready for posting. (If
> >>>you want what I have, check out the raft3 branch in my ovs-reviews repo
> >>>at github.)
> >>I've checked out your raft3 branch, and even learned how to create an
> >>OVSDB cluster. Thanks for the docs!
> >>
> >>What I don't get though is how do I instruct IDL to connect to the
> >>cluster now? Do I just connect to a random server, or there should be
> >>some dispatcher, or whatever?
> >OK I see this is an ongoing work in your branch.
>
> I had some time to play with raft3 branch last week.
>
> I added very basic and hacky replica set support to IDL and brought up an
> OVN setup with clustered southbound database. It works to some extent, yet
> if I try to throw several hundreds of logical ports into the mix, the
> database becomes inconsistent. The reason is probably the race window
> between when the raft leader appends a log entry to other nodes (so a client
> such as ovn-northd already sees it) and the entry really appears in the
> leader's log itself. Not sure if it is my bug or not. The original code had
> some minor issues as well (which is absolutely normal for WIP) - I can send
> my (rather trivial) patches if there is any interest.

I'm not surprised that there are inconsistency bugs. The testing I've
done so far is really sketchy. Let me assure you that I will implement
much more thorough testing before I will propose anything to be merged.

> Is there some design outline for the missing implementation bits?
> Specifically, it would be good to know the following:
>
> 1. With clustered OVSDB, a client such as IDL needs two JSON RPC
> connections: to the leader (to commit transactions), and a read-only one to
> an arbitrary replica set (scaling reads). Will it be implemented on
> ovsdb_idl level or encapsulated inside jsonrpc_session? The former seems
> natural yet multiple remotes support went to jsonrpc_session already.

There are multiple possible approaches here. The one that I am planning
to try out first is to have a client connect to only one randomly
selected server, and then have that server be responsible for relaying
write transactions to the leader.

> 2. How does the client know which replica set member is currently a leader?
> I just loop over remotes until one accepts the transaction (which is an
> awful idea). It would be nice to send some sort of cluster metadata snapshot
> to JSON RPC client during initial handshake. Alternatively, one can extend
> the "not leader" error object with a leader URL.

If we do adopt the idea that followers relay write transactions to the
leader, then the client doesn't need to know the leader. But if that
isn't practical, then the Raft thesis, section 6.2, suggests the same
idea as you did, of having the follower point to the leader if it knows
it.

> 3. For eventual consistency reasons, if an IDL reads from one member (A) but
> writes to another one (B), it can try to delete a row not yet in A's
> database. This would make all further requests fail with "inconsistent data"
> error and basically is what I observe in my tests. How do you plan to
> overcome this?

This sounds like a bug in the existing code (not too surprising). What
is supposed to happen is that the client waits until it receives updated
data from the server, which it knows will eventually arrive because it
knows that its write was against an inconsistent copy. Then, it
recomposes its change against the updated database and sends a new
transaction. This is similar to what the clients already do when their
transactions fail because another client has
Re: [ovs-dev] OVN meeting report
Hi, On 04.04.2017 15:29, Valentine Sinitsyn wrote: On 03.04.2017 20:29, Valentine Sinitsyn wrote: Hi Ben, On 23.03.2017 08:11, Ben Pfaff wrote: Hello everyone. I am not sure whether I am going to be able to attend the OVN meeting tomorrow, because I will be in another possibly distracting meeting, so I'm going to give my report here. Toward the end of last week I did a full pass of reviews through patchwork. The most notable result, I think, is that I applied patches that add 802.1ad support. For OVN, this makes it more reasonable to consider adding support for tagged logical ports--currently, OVN drops all tagged logical packets--which I've heard requested once or twice, because it means that they can now be gatewayed to physical ports within an outer VLAN. I don't have any plans to work on that, but I think that it is worth pointing out. The OVS "Open Source Day" talks have been scheduled at OpenStack Boston. They are all on Wednesday: https://www.openstack.org/summit/boston-2017/summit-schedule/#track=135 I've been spending what dev time I have on database clustering. Today, I managed to get it working, with many caveats. It will take weeks or months longer to get it finished, tested, and ready for posting. (If you want what I have, check out the raft3 branch in my ovs-reviews repo at github.) I've checked out your raft3 branch, and even learned how to create an OVSDB cluster. Thanks for the docs! What I don't get though is how do I instruct IDL to connect to the cluster now? Do I just connect to a random server, or there should be some dispatcher, or whatever? OK I see this is an ongoing work in your branch. I had some time to play with raft3 branch last week. I added very basic and hacky replica set support to IDL and brought up an OVN setup with clustered southbound database. It works to some extent, yet if I try to throw several hundreds of logical ports into the mix, the database becomes inconsistent. 
The reason is probably the race window between when the raft leader appends a log entry to other nodes (so a client such as ovn-northd already sees it) and the entry really appears in the leader's log itself. Not sure if it is my bug or not. The original code had some minor issues as well (which is absolutely normal for WIP) - I can send my (rather trivial) patches if there is any interest.

Is there some design outline for the missing implementation bits? Specifically, it would be good to know the following:

1. With clustered OVSDB, a client such as IDL needs two JSON RPC connections: to the leader (to commit transactions), and a read-only one to an arbitrary replica set (scaling reads). Will it be implemented on ovsdb_idl level or encapsulated inside jsonrpc_session? The former seems natural yet multiple remotes support went to jsonrpc_session already.

2. How does the client know which replica set member is currently a leader? I just loop over remotes until one accepts the transaction (which is an awful idea). It would be nice to send some sort of cluster metadata snapshot to JSON RPC client during initial handshake. Alternatively, one can extend the "not leader" error object with a leader URL.

3. For eventual consistency reasons, if an IDL reads from one member (A) but writes to another one (B), it can try to delete a row not yet in A's database. This would make all further requests fail with "inconsistent data" error and basically is what I observe in my tests. How do you plan to overcome this?

Thanks in advance!
Valentine

Best,
Valentine

Thanks,
Valentine

--
Best regards,
Valentine Sinitsyn
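[Editor's sketch] Question 2 above contrasts looping over remotes with following a leader URL carried in a "not leader" error. The Python below is a minimal sketch of that client-side logic only. Everything in it is hypothetical: the `Reply` type, the `leader_hint` field, and the `send` callback are illustrative names, not part of the OVSDB protocol, jsonrpc_session, or the raft3 branch.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Reply:
    """Hypothetical server reply: success, or "not leader" with an optional hint."""
    ok: bool
    leader_hint: Optional[str] = None


def commit_with_leader_hint(remotes: List[str],
                            send: Callable[[str, dict], Reply],
                            txn: dict,
                            max_attempts: int = 10) -> List[str]:
    """Try to commit txn; return the list of remotes contacted, in order.

    If a follower names the leader, go there next. Otherwise fall back to
    the next untried remote (the plain loop described in question 2).
    """
    untried = list(remotes)
    target = untried.pop(0)
    attempts: List[str] = []
    while len(attempts) < max_attempts:
        attempts.append(target)
        reply = send(target, txn)
        if reply.ok:
            return attempts
        hint = reply.leader_hint
        if hint is not None and hint not in attempts:
            # Follow the hint and make sure it is not retried later.
            target = hint
            if hint in untried:
                untried.remove(hint)
        elif untried:
            target = untried.pop(0)
        else:
            break
    raise RuntimeError("no remote accepted the transaction")
```

With a hint, a client that first lands on a follower reaches the leader in two attempts regardless of cluster size; without one, it may have to walk the whole list of remotes.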
Re: [ovs-dev] OVN meeting report
On 03.04.2017 20:29, Valentine Sinitsyn wrote:

Hi Ben,

On 23.03.2017 08:11, Ben Pfaff wrote:

Hello everyone. I am not sure whether I am going to be able to attend the OVN meeting tomorrow, because I will be in another possibly distracting meeting, so I'm going to give my report here.

Toward the end of last week I did a full pass of reviews through patchwork. The most notable result, I think, is that I applied patches that add 802.1ad support. For OVN, this makes it more reasonable to consider adding support for tagged logical ports--currently, OVN drops all tagged logical packets--which I've heard requested once or twice, because it means that they can now be gatewayed to physical ports within an outer VLAN. I don't have any plans to work on that, but I think that it is worth pointing out.

The OVS "Open Source Day" talks have been scheduled at OpenStack Boston. They are all on Wednesday:
https://www.openstack.org/summit/boston-2017/summit-schedule/#track=135

I've been spending what dev time I have on database clustering. Today, I managed to get it working, with many caveats. It will take weeks or months longer to get it finished, tested, and ready for posting. (If you want what I have, check out the raft3 branch in my ovs-reviews repo at github.)

I've checked out your raft3 branch, and even learned how to create an OVSDB cluster. Thanks for the docs!

What I don't get though is how do I instruct IDL to connect to the cluster now? Do I just connect to a random server, or there should be some dispatcher, or whatever?

OK I see this is an ongoing work in your branch.

Best,
Valentine

Thanks,
Valentine
Re: [ovs-dev] OVN meeting report
Hi Ben,

On 23.03.2017 08:11, Ben Pfaff wrote:

Hello everyone. I am not sure whether I am going to be able to attend the OVN meeting tomorrow, because I will be in another possibly distracting meeting, so I'm going to give my report here.

Toward the end of last week I did a full pass of reviews through patchwork. The most notable result, I think, is that I applied patches that add 802.1ad support. For OVN, this makes it more reasonable to consider adding support for tagged logical ports--currently, OVN drops all tagged logical packets--which I've heard requested once or twice, because it means that they can now be gatewayed to physical ports within an outer VLAN. I don't have any plans to work on that, but I think that it is worth pointing out.

The OVS "Open Source Day" talks have been scheduled at OpenStack Boston. They are all on Wednesday:
https://www.openstack.org/summit/boston-2017/summit-schedule/#track=135

I've been spending what dev time I have on database clustering. Today, I managed to get it working, with many caveats. It will take weeks or months longer to get it finished, tested, and ready for posting. (If you want what I have, check out the raft3 branch in my ovs-reviews repo at github.)

I've checked out your raft3 branch, and even learned how to create an OVSDB cluster. Thanks for the docs!

What I don't get though is how do I instruct IDL to connect to the cluster now? Do I just connect to a random server, or there should be some dispatcher, or whatever?

Thanks,
Valentine
[ovs-dev] OVN meeting report
Hello everyone. I am not sure whether I am going to be able to attend the OVN meeting tomorrow, because I will be in another possibly distracting meeting, so I'm going to give my report here.

Toward the end of last week I did a full pass of reviews through patchwork. The most notable result, I think, is that I applied patches that add 802.1ad support. For OVN, this makes it more reasonable to consider adding support for tagged logical ports--currently, OVN drops all tagged logical packets--which I've heard requested once or twice, because it means that they can now be gatewayed to physical ports within an outer VLAN. I don't have any plans to work on that, but I think that it is worth pointing out.

The OVS "Open Source Day" talks have been scheduled at OpenStack Boston. They are all on Wednesday:
https://www.openstack.org/summit/boston-2017/summit-schedule/#track=135

I've been spending what dev time I have on database clustering. Today, I managed to get it working, with many caveats. It will take weeks or months longer to get it finished, tested, and ready for posting. (If you want what I have, check out the raft3 branch in my ovs-reviews repo at github.)