On 2/23/22 19:50, Han Zhou wrote:
> On Wed, Feb 23, 2022 at 12:35 AM Dumitru Ceara <[email protected]> wrote:
>>
>> On 2/23/22 09:24, Dumitru Ceara wrote:
>>> On 2/23/22 07:27, Han Zhou wrote:
>>>> On Fri, Feb 11, 2022 at 12:52 AM Dumitru Ceara <[email protected]> wrote:
>>>>>
>>>>> On 2/11/22 00:54, Numan Siddique wrote:
>>>>>> On Thu, Jan 27, 2022 at 10:00 AM Dumitru Ceara <[email protected]> wrote:
>>>>>>>
>>>>>>> We still need to call ovsdb_idl_loop_commit_and_wait() on instances
>>>>>>> that are in standby mode.  That's because they need to try to take
>>>>>>> the lock.  But if they're in standby they won't actually build a
>>>>>>> transaction json and will return early because they don't own the
>>>>>>> SB lock.
>>>>>>>
>>>>>>> That's reported as an error by ovsdb_idl_loop_commit_and_wait() but
>>>>>>> we shouldn't act on it.
>>>>>>>
>>>>>>> Also, to ensure that no DB changes are missed, ovsdb_idl_track_clear()
>>>>>>> should be called only on active instances.  The standby or paused
>>>>>>> ones will get the incremental updates when they become active.
>>>>>>> Otherwise we would be forced to perform a full recompute when the
>>>>>>> instance becomes active.
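
To make the pattern concrete, the main loop described above looks roughly
like this (a sketch only; build_sb_contents() and handle_commit_failure()
are illustrative placeholders, not the actual northd code):

struct ovsdb_idl_loop sb_loop = OVSDB_IDL_LOOP_INITIALIZER(sb_idl);

while (!exiting) {
    struct ovsdb_idl_txn *txn = ovsdb_idl_loop_run(&sb_loop);
    bool is_active = ovsdb_idl_has_lock(sb_idl);

    if (is_active && txn) {
        build_sb_contents(txn);      /* Build the actual transaction. */
    }

    /* Standby instances must still call this so that they keep trying
     * to take the SB lock.  They return early (reported as an error)
     * because no transaction json was built; don't act on that. */
    if (!ovsdb_idl_loop_commit_and_wait(&sb_loop) && is_active) {
        handle_commit_failure();
    }

    /* Clear tracked changes only when active; standby and paused
     * instances keep accumulating them for when they take over. */
    if (is_active) {
        ovsdb_idl_track_clear(sb_idl);
    }

    poll_block();
}
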
>>>>>>
>>>>>> Hi Dumitru,
>>>>>>
>>>>>
>>>>> Hi Numan,
>>>>>
>>>>> Thanks for the review!
>>>>>
>>>>>> I have a question on the track clear being moved out of the standby
>>>>>> instances.  To ensure correctness, I suppose it's better to trigger a
>>>>>> full recompute when a standby instance becomes active. What do you
>>>>>> think?
>>>>>>
>>>>>
>>>>> I might be wrong, but I don't think that's necessary.  It may also be
>>>>> quite costly, as full recomputes can take quite a long time.
>>>>>
>>>>>> Also, let's say the CMS does the operations below:
>>>>>>      - Add a logical switch S1
>>>>>>      - Add a logical port p1 in S1
>>>>>>      - Add a logical port p2 in S1
>>>>>>      - Delete logical port p2
>>>>>>      - Delete logical switch S1
>>>>>>
>>>>>> With this patch, since we are not clearing the tracking information,
>>>>>> how does ovn-northd process the tracked changes when it becomes
>>>>>> active?
>>>>>
>>>>> When ovn-northd becomes active, from a Northbound database
>>>>> perspective, there were no changes; that is, S1 didn't exist when it
>>>>> was last active and it doesn't exist now either.
>>>>>
>>>>> So, there should be no NB change to process.  Accumulating tracked
>>>>> changes without calling clear() on the standby has exactly this
>>>>> effect.
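
To illustrate: once active, the instance would walk the accumulated tracked
changes with the IDL's generated helpers, roughly as below (the
handle_ls_*() functions are hypothetical):

const struct nbrec_logical_switch *ls;
NBREC_LOGICAL_SWITCH_FOR_EACH_TRACKED (ls, nb_idl) {
    if (nbrec_logical_switch_is_deleted(ls)) {
        /* Existed the last time changes were processed, gone now. */
        handle_ls_deleted(ls);
    } else if (nbrec_logical_switch_is_new(ls)) {
        /* Added since then.  In the example above, S1 was both added
         * and deleted while standby, so ideally it would not appear
         * here at all. */
        handle_ls_added(ls);
    } else {
        handle_ls_updated(ls);
    }
}
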
>>>>
>>>> Hi Dumitru,
>>>>
>>>
>>> Hi Han,
>>>
>>>> I wonder how accumulating tracked changes without calling clear() would
>>>> work.
>>>>
>>>> Firstly, I was under the impression that ovsdb_idl_track_clear() must
>>>> be called before the next ovsdb_idl_run(), and that the current change
>>>> tracking implementation cannot safely carry tracking information across
>>>> iterations.  This was why in ovn-controller, whenever (!engine_has_run()
>>>> && engine_need_run()), we force a recompute in the next iteration.  If
>>>> changes could be carried over, we would have incrementally processed
>>>> the accumulated changes without forcing a recompute.  I can't recall
>>>> the details, and I briefly checked the IDL again but didn't find the
>>>> exact reason why it is not safe.  But I believe it was never used this
>>>> way before.  I should have added a TODO for this (but somehow forgot
>>>> to, sorry).
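
For reference, the ovn-controller pattern mentioned here is roughly the
following (paraphrased, not a verbatim copy of the code):

engine_run(true);
if (!engine_has_run() && engine_need_run()) {
    /* Tracking is cleared below without the changes having been
     * processed incrementally, so force a full recompute next time. */
    engine_set_force_recompute(true);
}
ovsdb_idl_track_clear(ovnsb_idl);
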
>>>>
>>>
>>> I've been looking at that code too and I don't see any reason why
>>> accumulating changes wouldn't work.  The IDL code is written such that
>>> it processes multiple jsonrpc updates in a single run anyway:
>>>
>>> ovsdb_idl_run() -> ovsdb_cs_run(), which can receive up to 50
>>> (hardcoded) jsonrpc messages.
>>>
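
From memory, the receive loop in ovsdb_cs_run() is roughly like this
(paraphrased; process_msg() stands in for the internal dispatch):

for (int i = 0; i < 50; i++) {
    struct jsonrpc_msg *msg = jsonrpc_session_recv(cs->session);
    if (!msg) {
        break;
    }
    process_msg(cs, msg);   /* e.g. applying an <update> to the IDL */
}
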
> Maybe you are right, but I am not sure. It's true that an IDL run can
> process multiple messages, but there are functions that are called once
> per IDL run, after processing all the messages:
> ovsdb_idl_reparse_refs_to_inserted, ovsdb_idl_reparse_deleted,
> ovsdb_idl_row_destroy_postprocess, etc.
> 
> Is it possible any of these could cause problems with change tracking
> left over from the previous run? I haven't looked deeper yet.
> 
>>> It's possible that I missed something, though; the IDL functionality is
>>> complex, so maybe it's worth documenting that we recommend calling
>>> ovsdb_idl_track_clear() every time ovsdb_idl_run() is called, and why.
>>> What do you think?
>>>
> 
> There is a comment above ovsdb_idl_track_clear() which suggests calling
> it before ovsdb_idl_run(), but it doesn't mention whether that is
> mandatory. I might have just followed the comment.
> 
>>>> Secondly, even if it is safe to accumulate changes, it is going to be
>>>> a memory problem. Active-standby failover happens very rarely in a
>>>> healthy production environment. So even if the DB size is small, the
>>>> change tracking can grow without any limit. I tried this patch with a
>>>> simple loop adding and deleting a single logical switch 10k times on
>>>> top of an empty NB DB, and the standby northd's memory grew to 1.1GB.
>>>>
>>>
>>> This is surprising; it sounds like an issue in the IDL.  I would've
>>> expected the delete to just remove the record from the track list.
>>> I'll look into it.
>>>
> I wasn't surprised, because I thought tracked changes were append-only and
> only got cleared by ovsdb_idl_track_clear() or ovsdb_idl_clear(). But I
> just recalled that ovsdb_idl_row_untrack_change() was added some time ago,
> and it may have changed this behavior.
> 
>>
>> I had a quick look; it's because deleting a row clears its datum, sets
>> tracked_old_datum, and adds the row to the deleted track list.  The
>> subsequent "add" will not find this row and will insert a new one.
>>
> 
> I don't expect the subsequent add to find the deleted item and remove it
> from the tracking. In general, every add creates a new object with a new
> UUID. The IDL doesn't and shouldn't understand whether it is the "same" as
> an earlier deleted object. Think about the more general case where we keep
> adding and deleting items with different names each time.
> 
> However, the problem here is more about delete-after-add. I think it is
> possible to avoid maintaining the items that are deleted before being
> processed by the client. Be it a bug fix or a new feature, we can improve
> this, and then the tracked change size shouldn't grow much when the total
> DB size is stable.
> 

You're right, this is probably the way to fix it: if the client hasn't
processed the "add" event yet, then just silently remove the row when the
"delete" event is received.

>> We probably need to fix this anyway, because even in the current
>> implementation the IDL can process up to 50 consecutive
>> "add/delete/add/delete/..." updates.  This can potentially cause issues
>> due to "spurious" delete events when the client finally processes the
>> tracked changes.
>>
> 
> I *think* it shouldn't cause problems for the current implementation,
> except that there can be wasted cycles processing an object's addition and
> its deletion in the same iteration, which should be fine because that case
> should be rare anyway. Each add creates a new object with a new UUID, so
> the client (ovn-controller) just processes them as different objects.
> Please let me know if you find anything that could go wrong here; if so,
> it's a bug.
> 

No, it's exactly what you pointed out in the previous paragraph
(delete-after-add); I'm not sure how easy it is to fix, though.

>>>> Thirdly, processing all the tracked changes when a standby becomes
>>>> active likely costs more than a recompute if the tracked change size
>>>> is bigger than the DB size.
>>>>
>>>
>>> If changes are accumulated correctly, the total number of tracked
>>> record changes should never be larger than the size of the DB the last
>>> time changes were processed plus the size of the DB now, resulting in a
>>> processing cost of O(size-of-DB), the same as for a full recompute.  Or
>>> am I missing something?
>>>
> 
> My conclusion was based on the fact that the delete-after-add operations
> are accumulated. If we can avoid that, and if the incremental processing
> is implemented efficiently, i.e. the extra overhead of handling each
> change is small enough, then yes, I agree with you that the worst case is
> similar to a full recompute.
> However, it would really depend on the implementation. A bad example in
> ovn-controller is the change handling for anything referenced by a logical
> flow. It used to reprocess the same lflow again and again when there were
> multiple reference changes (now this is avoided).
> Recompute has the advantage of guaranteeing an upper bound on the time for
> the worst case, but of course it is not efficient if the tracked changes
> are not big enough. In the failover scenario, since failover happens
> rarely, it is more likely that the accumulated change size is big, so in
> that case recompute is probably not a bad idea.
> 
> Recompute may also avoid the complexity of corner cases during failover.
> 
> But again, it depends on how we implement I-P in northd. We can switch to
> I-P whenever we find it more feasible.
> 

OK.

>>>> So for now I would suggest keeping the logic of clearing tracking on
>>>> every iteration and forcing a recompute on failover.
>>>>
>>>
>>> At least until we figure out why memory increases in the IDL like this,
>>> I agree we should keep forcing a recompute on failover.  That's also
>>> because we currently don't really incrementally process much in
>>> ovn-northd.
>>>
>>> I'll send a v2.
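
For reference, the v2 direction would look roughly like this (a sketch;
full_recompute() and incremental_run() are placeholders, not the actual
northd code):

bool was_active = false;

while (!exiting) {
    ovsdb_idl_loop_run(&nb_loop);
    bool is_active = ovsdb_idl_has_lock(sb_idl);

    if (is_active) {
        if (!was_active) {
            /* Standby -> active transition: tracking was cleared on
             * every iteration while standby, so recompute once. */
            full_recompute();
        } else {
            incremental_run();
        }
    }
    was_active = is_active;

    ovsdb_idl_loop_commit_and_wait(&nb_loop);
    ovsdb_idl_track_clear(nb_idl);    /* Cleared every iteration. */
    poll_block();
}
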
> 
> Thanks! Looking at it.
> 
> Han
> 

Thanks,
Dumitru
