On 2/23/22 19:50, Han Zhou wrote:
> On Wed, Feb 23, 2022 at 12:35 AM Dumitru Ceara <[email protected]> wrote:
>>
>> On 2/23/22 09:24, Dumitru Ceara wrote:
>>> On 2/23/22 07:27, Han Zhou wrote:
>>>> On Fri, Feb 11, 2022 at 12:52 AM Dumitru Ceara <[email protected]> wrote:
>>>>>
>>>>> On 2/11/22 00:54, Numan Siddique wrote:
>>>>>> On Thu, Jan 27, 2022 at 10:00 AM Dumitru Ceara <[email protected]> wrote:
>>>>>>>
>>>>>>> We still need to call ovsdb_idl_loop_commit_and_wait() on instances
>>>>>>> that are in standby mode.  That's because they need to try to take
>>>>>>> the lock.  But if they're in standby they won't actually build a
>>>>>>> transaction json and will return early because they don't own the
>>>>>>> SB lock.
>>>>>>>
>>>>>>> That's reported as an error by ovsdb_idl_loop_commit_and_wait() but
>>>>>>> we shouldn't act on it.
>>>>>>>
>>>>>>> Also, to ensure that no DB changes are missed, ovsdb_idl_track_clear()
>>>>>>> should be called only on active instances.  The standby or paused
>>>>>>> ones will get the incremental updates when they become active.
>>>>>>> Otherwise we would be forced to perform a full recompute when the
>>>>>>> instance becomes active.
>>>>>>
>>>>>> Hi Dumitru,
>>>>>>
>>>>>
>>>>> Hi Numan,
>>>>>
>>>>> Thanks for the review!
>>>>>
>>>>>> I have a question on the track clear being moved out of the standby
>>>>>> instances.  To ensure correctness, I suppose it's better to trigger
>>>>>> a full recompute when a standby instance becomes active.  What do
>>>>>> you think?
>>>>>>
>>>>>
>>>>> I might be wrong but I don't think that's necessary.  It may also be
>>>>> quite costly, as full recomputes can take a long time.
>>>>>
>>>>>> Also, let's say the CMS does the operations below:
>>>>>> - Add a logical switch S1.
>>>>>> - Add a logical port p1 in S1.
>>>>>> - Add a logical port p2 in S1.
>>>>>> - Delete logical port p2.
>>>>>> - Delete logical switch S1.
>>>>>>
>>>>>> With this patch, since we are not clearing the tracking information,
>>>>>> how does ovn-northd process the tracked changes when it becomes
>>>>>> active?
>>>>>
>>>>> When ovn-northd becomes active, from a Northbound database
>>>>> perspective, there were no changes; that is, S1 didn't exist when it
>>>>> was last active and it doesn't exist now either.
>>>>>
>>>>> So, there should be no NB change to process.  Accumulating tracked
>>>>> changes without calling clear() on the standby has exactly this
>>>>> effect.
>>>>
>>>> Hi Dumitru,
>>>>
>>>
>>> Hi Han,
>>>
>>>> I wonder how accumulating tracked changes without calling clear()
>>>> would work.
>>>>
>>>> Firstly, I was under the impression that ovsdb_idl_track_clear() must
>>>> be called before the next ovsdb_idl_run(), and that the current change
>>>> tracking implementation cannot safely carry tracking information
>>>> across iterations.  This was why in ovn-controller, whenever
>>>> (!engine_has_run() && engine_need_run()), we force a recompute in the
>>>> next iteration.  If changes could be carried over, we would have
>>>> incrementally processed the accumulated changes without forcing a
>>>> recompute.  I can't recall the details, and I checked the IDL again
>>>> briefly but didn't find the exact reason why it is not safe.  But I
>>>> believe it was never used this way before.  I should have added a TODO
>>>> for this (but somehow forgot to, sorry).
>>>>
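For context, the loop shape we're debating looks roughly like this; a
hedged sketch of the pattern rather than the literal ovn-northd code
(process_tracked_changes() is a hypothetical stand-in for the work the
active instance does):

    #include <stdbool.h>
    #include "ovsdb-idl.h"
    #include "openvswitch/poll-loop.h"

    /* Hypothetical consumer of tracked changes, defined elsewhere. */
    void process_tracked_changes(struct ovsdb_idl_txn *nb_txn,
                                 struct ovsdb_idl_txn *sb_txn);

    static void
    main_loop_iteration(struct ovsdb_idl_loop *nb_loop,
                        struct ovsdb_idl_loop *sb_loop)
    {
        struct ovsdb_idl_txn *nb_txn = ovsdb_idl_loop_run(nb_loop);
        struct ovsdb_idl_txn *sb_txn = ovsdb_idl_loop_run(sb_loop);

        /* Only the instance holding the SB lock is active. */
        bool active = ovsdb_idl_has_lock(sb_loop->idl);
        if (active && nb_txn && sb_txn) {
            process_tracked_changes(nb_txn, sb_txn);
        }

        /* Standby must still call commit_and_wait(): this is what keeps
         * retrying the lock.  Per the commit message above, it reports
         * an error for standbys that don't own the lock, and the caller
         * shouldn't act on that. */
        ovsdb_idl_loop_commit_and_wait(nb_loop);
        ovsdb_idl_loop_commit_and_wait(sb_loop);

        if (active) {
            /* Clearing only on the active instance is what makes a
             * standby accumulate tracked changes for later. */
            ovsdb_idl_track_clear(nb_loop->idl);
            ovsdb_idl_track_clear(sb_loop->idl);
        }

        poll_block();
    }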
>>>
>>> I've been looking at that code too and I don't see any reason why
>>> accumulating changes wouldn't work.  The IDL code is written such that
>>> it processes multiple jsonrpc updates in a single run anyway:
>>>
>>> ovsdb_idl_run() -> ovsdb_cs_run() -> which can receive up to 50
>>> (hardcoded) jsonrpc messages.
>>>
> Maybe you are right, but I am not sure.  It's true that an IDL run can
> process multiple messages, but there are functions called once for every
> IDL run, after processing all the messages:
> ovsdb_idl_reparse_refs_to_inserted(), ovsdb_idl_reparse_deleted(),
> ovsdb_idl_row_destroy_postprocess(), etc.
>
> Is it possible any of these could cause problems with change tracking
> left over from the previous run?  I haven't looked deeper yet.
>
>>> It's possible that I missed something though; the IDL functionality is
>>> complex, so maybe it's worth documenting that we recommend calling
>>> ovsdb_idl_track_clear() every time ovsdb_idl_run() is called, and why.
>>> What do you think?
>>>
>
> There is a comment above ovsdb_idl_track_clear() which suggests calling
> it before ovsdb_idl_run(), but it doesn't mention whether that is
> mandatory.  I might have just followed the comment.
>
>>>> Secondly, even if it is safe to accumulate changes, it is going to be
>>>> a memory problem.  Active-standby failover happens very rarely in a
>>>> healthy production environment, so even if the DB size is small, the
>>>> change tracking can grow without any limit.  I tried with this patch
>>>> by simply adding and deleting a single logical switch in a loop, 10k
>>>> times, on top of an empty NB DB, and the standby northd's memory grew
>>>> to 1.1GB.
>>>>
>>>
>>> This is surprising; it sounds like an issue in the IDL.  I would've
>>> expected the delete to just remove the record from the track list.
>>> I'll look into it.
>>>
> I wasn't surprised, because I thought tracked changes were append-only
> and only got cleared by ovsdb_idl_track_clear() or ovsdb_idl_clear().  I
> just recalled that ovsdb_idl_row_untrack_change() was added some time ago
> and it may change the behavior.
>
>>
>> I had a quick look; it's because deleting a row clears its datum, sets
>> tracked_old_datum and adds the row to the deleted track list.  The
>> subsequent "add" will not find this row, and will insert a new one.
>>
>
> I don't expect the subsequent add to find the deleted item and remove it
> from the tracking.  In general, every add creates a new object with a
> new UUID; the IDL doesn't and shouldn't understand whether it is the
> "same" as an earlier deleted object.  Think about a more general case
> where we keep adding and deleting items with different names every time.
>
> However, the problem here is more about delete-after-add.  I think it is
> possible to avoid maintaining items that are deleted before being
> processed by the client.  Be it a bug fix or a new feature, we can
> improve this, and then the tracked change size shouldn't grow much when
> the total DB size is stable.
>

You're right, this is probably the way to fix it: if the client didn't
process the "add" event yet, then just silently remove it when the
"delete" event is received.
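Something along these lines, as a rough sketch of the idea; the helpers
(row_insert_is_unseen(), untrack_and_free_row(),
move_to_deleted_track_list()) are hypothetical stand-ins, not the real
IDL internals:

    #include <stdbool.h>

    struct ovsdb_idl_row;   /* Opaque here; defined in lib/ovsdb-idl.c. */

    /* Hypothetical helpers standing in for the real IDL internals. */
    bool row_insert_is_unseen(const struct ovsdb_idl_row *);
    void untrack_and_free_row(struct ovsdb_idl_row *);
    void move_to_deleted_track_list(struct ovsdb_idl_row *);

    static void
    on_tracked_row_deleted(struct ovsdb_idl_row *row)
    {
        if (row_insert_is_unseen(row)) {
            /* Delete-after-add: the client never iterated over the
             * tracked insert, so the insert+delete pair is invisible
             * to it.  Drop the row from the track list entirely
             * instead of moving it to the deleted list; this keeps the
             * tracked change size bounded by the DB size. */
            untrack_and_free_row(row);
        } else {
            /* Normal path: keep the old datum around (as
             * tracked_old_datum) so the client can still read the
             * deleted row's last contents. */
            move_to_deleted_track_list(row);
        }
    }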
>> We probably need to fix this anyway, because even in the current
>> implementation the IDL can process up to 50 consecutive
>> "add/delete/add/delete/..." updates.  This can potentially cause issues
>> due to "spurious" delete events when the client finally processes the
>> tracked changes.
>>
>
> I *think* it shouldn't cause problems for the current implementation,
> except that there can be wasted cycles to process an object addition and
> its deletion in the same iteration, which should be fine because the
> case should be rare anyway.  Each add creates a new object with a new
> UUID, so to the client (ovn-controller) it just processes different
> objects.  Please let me know if you find anything possibly going wrong
> here; then it's a bug.
>

No, it's exactly what you pointed out in the previous paragraph
(delete-after-add); I'm not sure how easy it is to fix though.
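To make the "spurious delete" concern concrete, a hedged sketch of a
client consuming tracked NB changes, assuming the usual generated IDL
helpers; add_switch(), update_switch() and forget_switch() are
hypothetical consumer hooks:

    #include "ovsdb-idl.h"
    #include "lib/ovn-nb-idl.h"   /* Generated NB IDL. */

    /* Hypothetical consumer hooks, defined elsewhere. */
    void add_switch(const struct nbrec_logical_switch *);
    void update_switch(const struct nbrec_logical_switch *);
    void forget_switch(const struct nbrec_logical_switch *);

    static void
    handle_tracked_switches(struct ovsdb_idl *idl)
    {
        const struct nbrec_logical_switch *ls;

        NBREC_LOGICAL_SWITCH_FOR_EACH_TRACKED (ls, idl) {
            if (nbrec_logical_switch_is_deleted(ls)) {
                /* With tracked changes accumulated across iterations,
                 * this can fire for a switch whose insert was never
                 * processed by this loop (a "spurious" delete), so the
                 * consumer must tolerate unknown rows here. */
                forget_switch(ls);
            } else if (nbrec_logical_switch_is_new(ls)) {
                add_switch(ls);
            } else {
                update_switch(ls);
            }
        }
    }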
>>>> Thirdly, processing all the tracked changes when a standby becomes
>>>> active is likely to cost more than a recompute, if the tracked change
>>>> size is bigger than the DB size.
>>>>
>>>
>>> If changes are accumulated correctly, the total number of tracked
>>> record changes should never be larger than the size of the DB the last
>>> time changes were processed plus the size of the DB now, resulting in
>>> a processing cost of O(size-of-DB), the same as for a full recompute.
>>> Or am I missing something?
>>>
>
> My conclusion was based on the fact that the delete-after-add operations
> are accumulated.  If we can avoid that, and if the incremental
> processing is implemented efficiently, i.e. the extra overhead of
> handling each change is small enough, then yes, I agree with you that
> the worst case is similar to a full recompute.
> However, it would really depend on the implementation.  A bad example in
> ovn-controller is the change handling for anything referenced by a
> logical flow: it used to reprocess the same lflow again and again when
> there were multiple reference changes (this is now avoided).
> Recompute has the advantage that it guarantees an upper bound on time
> for the worst case, but of course it is not efficient if the tracked
> changes are not big enough.  In the failover scenario, since failover
> happens rarely, it is more likely that the accumulated change size is
> big, so in that case recompute is probably not a bad idea.
>
> Recompute may also avoid the complexity of corner cases during failover.
>
> But again, it depends on how we implement I-P in northd.  We can change
> it to I-P whenever we find it more feasible.
>

OK.

>>>> So for now I would suggest keeping the logic of clearing tracking on
>>>> every iteration and forcing a recompute on failover.
>>>>
>>>
>>> At least until we figure out the reason for the memory increase in the
>>> IDL, I agree to keep forcing a recompute on failover.  That's also
>>> because we currently don't really incrementally process much in
>>> ovn-northd.
>>>
>>> I'll send a v2.
>
> Thanks! Looking at it.
>
> Han
>

Thanks,
Dumitru

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev