On 9/16/20 8:02 PM, Han Zhou wrote:
>
>
> On Wed, Sep 16, 2020 at 5:10 AM Dumitru Ceara <[email protected]
> <mailto:[email protected]>> wrote:
>>
>> Hi Han,
>>
>> On 9/15/20 8:40 PM, Han Zhou wrote:
>> > When a lport is referenced by a logical flow where port-binding refs
>> > needs to be added, currently it can add the same reference pair multiple
>> > times in below situations (introduced in commit ade4e77):
>> >
>> > 1) In add_matches_to_flow_table(), different matches from same lflow
>> > can have same inport/outport.
>> >
>> > 2) In is_chassis_resident_cb(), a lflow can have multiple
> is_chassis_resident
>> > check for same lport (although not very common), and at the same time
>> > the lport used in is_chassis_resident can overlap with the inport/
>> > outport of the same flow.
>> >
>> > Now because of the redundant entries added, it results in unexpected
> behavior
>> > such as same lflow being processed multiple times as a waste of
> processing.
>> > More severely, after commit 580aea72e it can result in orphaned
> pointer leading
>> > to crash, as reported in [0].
>> >
>> > This patch fixes the problems by checking existance of same
> reference before
>>
>> Nit: s/existance/existence
>
> Ack.
>>
>> > adding in lflow_resource_add(). To do this check efficiently, hmap
> is used to
>> > replace the list struct for the resource-to-lflow index.
>> >
>> > [0]
> https://mail.openvswitch.org/pipermail/ovs-dev/2020-September/374991.html
>> >
>> > Reported-by: Dumitru Ceara <[email protected]
> <mailto:[email protected]>>
>> > Reported-at:
> https://mail.openvswitch.org/pipermail/ovs-dev/2020-September/374991.html
>> > Fixes: ade4e779d3fb ("ovn-controller: Use the tracked runtime data
> changes for flow calculation.")
>> > Fixes: 580aea72e26f ("ovn-controller: Fix conjunction handling with
> incremental processing.")
>> > Signed-off-by: Han Zhou <[email protected] <mailto:[email protected]>>
>> > ---
>> > v1 -> v2:
>> > - Addressed Dumitru's comments.
>> > - Moved the unrelated change to a separate patch in the series:
>> > "lflow.c: Release ref_lflow_node as soon as it is not needed."
>> >
>> > controller/lflow.c | 69
> +++++++++++++++++++++++++++++++++++++-----------------
>> > controller/lflow.h | 27 +++++++++++----------
>> > tests/ovn.at <http://ovn.at> | 49
> ++++++++++++++++++++++++++++++++++++++
>> > 3 files changed, 112 insertions(+), 33 deletions(-)
>> >
>> > diff --git a/controller/lflow.c b/controller/lflow.c
>> > index e785866..86a5b76 100644
>> > --- a/controller/lflow.c
>> > +++ b/controller/lflow.c
>> > @@ -74,6 +74,15 @@ consider_logical_flow(const struct
> sbrec_logical_flow *lflow,
>> > struct lflow_ctx_out *l_ctx_out);
>> > static void lflow_resource_add(struct lflow_resource_ref *, enum
> ref_type,
>> > const char *ref_name, const struct
> uuid *);
>> > +static struct ref_lflow_node * ref_lflow_lookup(struct hmap
> *ref_lflow_table,
>> > + enum ref_type,
>> > + const char *ref_name);
>>
>> I don't see an explicit rule for pointer return types for functions in
>> our coding guidelines but from what I see most of the time we prefer to
>> attach the '*' to the function name.
>
> Ack.
>>
>> Or:
>>
>> Indent needs one more space.
>>
>> > +static struct lflow_ref_node * lflow_ref_lookup(struct hmap
> *lflow_ref_table,
>> > + const struct uuid
> *lflow_uuid);
>>
>> Same here.
>
> Ack.
>>
>> > +static void ref_lflow_node_destroy(struct ref_lflow_node *);
>> > +static void lflow_resource_destroy_lflow(struct lflow_resource_ref *,
>> > + const struct uuid
> *lflow_uuid);
>> > +
>> >
>> > static bool
>> > lookup_port_cb(const void *aux_, const char *port_name, unsigned
> int *portp)
>> > @@ -161,15 +170,14 @@ lflow_resource_destroy(struct
> lflow_resource_ref *lfrr)
>> > {
>> > struct ref_lflow_node *rlfn, *rlfn_next;
>> > HMAP_FOR_EACH_SAFE (rlfn, rlfn_next, node,
> &lfrr->ref_lflow_table) {
>> > - free(rlfn->ref_name);
>> > struct lflow_ref_list_node *lrln, *next;
>> > - LIST_FOR_EACH_SAFE (lrln, next, ref_list,
> &rlfn->ref_lflow_head) {
>> > - ovs_list_remove(&lrln->ref_list);
>> > - ovs_list_remove(&lrln->lflow_list);
>> > + HMAP_FOR_EACH_SAFE (lrln, next, hmap_node,
> &rlfn->lflow_uuids) {
>> > + ovs_list_remove(&lrln->list_node);
>> > + hmap_remove(&rlfn->lflow_uuids, &lrln->hmap_node);
>> > free(lrln);
>> > }
>> > hmap_remove(&lfrr->ref_lflow_table, &rlfn->node);
>> > - free(rlfn);
>> > + ref_lflow_node_destroy(rlfn);
>> > }
>> > hmap_destroy(&lfrr->ref_lflow_table);
>> >
>> > @@ -224,17 +232,28 @@ lflow_resource_add(struct lflow_resource_ref
> *lfrr, enum ref_type type,
>> > {
>> > struct ref_lflow_node *rlfn =
> ref_lflow_lookup(&lfrr->ref_lflow_table,
>> > type, ref_name);
>> > + struct lflow_ref_node *lfrn =
> lflow_ref_lookup(&lfrr->lflow_ref_table,
>> > + lflow_uuid);
>> > + if (rlfn && lfrn) {
>> > + /* Check if the mapping already existed before adding a new
> one. */
>>
>> I would move this to a separate function:
>>
>> static bool
>> ref_lflow_node_contains_uuid(const struct ref_lflow_node *, const struct
>> uuid *);
>
> I have no strong opinion on this, but in general I'd prefer fewer
> functions if the code is:
> 1. not reused (not creating redundancy) and
> 2. the size is small enough (not affecting readability).
>
I guess it depends. It's also a matter of preference. But in this case
without the comment "Check if the mapping already existed before adding
a new one" a reader would have to check what the lflow_uuids field
actually is and how it's populated. On the other hand, a call to
something like:
ref_lflow_node_contains_uuid(rlfn, flow_uuid)
should be self-explanatory and has the advantage that the implementation
of the check is close to where the structure and other helpers are defined.
But we can discuss this further if/when we refactor this part of the code.
>>
>> > + struct lflow_ref_list_node *n;
>> > + HMAP_FOR_EACH_WITH_HASH (n, hmap_node, uuid_hash(lflow_uuid),
>> > + &rlfn->lflow_uuids) {
>>
>> Indent needs an extra space.
>
> Ack.
>>
>> > + if (uuid_equals(&n->lflow_uuid, lflow_uuid)) {
>> > + return;
>> > + }
>> > + }
>> > + }
>> > +
>> > if (!rlfn) {
>> > rlfn = xzalloc(sizeof *rlfn);
>> > rlfn->node.hash = hash_string(ref_name, type);
>> > rlfn->type = type;
>> > rlfn->ref_name = xstrdup(ref_name);
>> > - ovs_list_init(&rlfn->ref_lflow_head);
>> > + hmap_init(&rlfn->lflow_uuids);
>> > hmap_insert(&lfrr->ref_lflow_table, &rlfn->node,
> rlfn->node.hash);
>>
>> As mentioned on the v1 thread, we have lookup functions, it would make
>> sense to have *_create() functions.
>>
> Same as above. In addition, I prefer fewer functions because it is
> easier (slightly) to tell from the prototypes that those are the only
> operations possible for this data structure. In this case, it tells that
> this is the only interface that allocates memory for the resource
> reference table. Of course with extra functions you can still tell that
> by checking the code and realize that there is only one place the
> _create() functions are called, but to me that is just more noise
> (slightly). I don't think every data structure needs all the
> constructors and destructors in C. The code should evolve naturally when
> such functions are necessary. But if there is a coding style kind of
> agreement that makes it a convention then I would just follow. Again, I
> don't have a strong opinion in this case.
>
I tend to agree in general but my concern in this specific case is that
it's very easy to make a typo and write "free(lfrn);" instead of
"free(lrln);".
>> > }
>> >
>> > - struct lflow_ref_node *lfrn =
> lflow_ref_lookup(&lfrr->lflow_ref_table,
>> > - lflow_uuid);
>> > if (!lfrn) {
>> > lfrn = xzalloc(sizeof *lfrn);
>> > lfrn->node.hash = uuid_hash(lflow_uuid);
>>
>> Same here.
>>
>> > @@ -245,8 +264,17 @@ lflow_resource_add(struct lflow_resource_ref
> *lfrr, enum ref_type type,
>> >
>> > struct lflow_ref_list_node *lrln = xzalloc(sizeof *lrln);
>> > lrln->lflow_uuid = *lflow_uuid;
>> > - ovs_list_push_back(&rlfn->ref_lflow_head, &lrln->ref_list);
>> > - ovs_list_push_back(&lfrn->lflow_ref_head, &lrln->lflow_list);
>> > + lrln->rlfn = rlfn;
>> > + hmap_insert(&rlfn->lflow_uuids, &lrln->hmap_node,
> uuid_hash(lflow_uuid));
>> > + ovs_list_push_back(&lfrn->lflow_ref_head, &lrln->list_node);
>>
>> Here too.
>>
>> > +}
>> > +
>> > +static void
>> > +ref_lflow_node_destroy(struct ref_lflow_node *rlfn)
>> > +{
>> > + free(rlfn->ref_name);
>> > + hmap_destroy(&rlfn->lflow_uuids);
>> > + free(rlfn);
>> > }
>> >
>> > static void
>> > @@ -261,9 +289,9 @@ lflow_resource_destroy_lflow(struct
> lflow_resource_ref *lfrr,
>> >
>> > hmap_remove(&lfrr->lflow_ref_table, &lfrn->node);
>> > struct lflow_ref_list_node *lrln, *next;
>> > - LIST_FOR_EACH_SAFE (lrln, next, lflow_list,
> &lfrn->lflow_ref_head) {
>> > - ovs_list_remove(&lrln->ref_list);
>> > - ovs_list_remove(&lrln->lflow_list);
>> > + LIST_FOR_EACH_SAFE (lrln, next, list_node, &lfrn->lflow_ref_head) {
>> > + ovs_list_remove(&lrln->list_node);
>> > + hmap_remove(&lrln->rlfn->lflow_uuids, &lrln->hmap_node);
>>
>> Here we remove every lflow_ref_list_node from lflow_ref_head and also
>> remove it from the lflow_uuids hashmap but in lflow_handle_changed_ref()
>> we split this operation in two stages: detach and later destroy.
>>
> They are different. Remember that we have two indexes to the reference
> table, lflow_uuids and resources (in the code I used ref_name + ref_type
> as the key for resources, so the data type was named *ref* instead of
> *resource*). In this API it is deleting all the entries related to a
> given lflow_uuid.
>
>> We have a comment in lflow_handle_changed_ref() but I think it would be
>> cleaner if we had a "lflow_ref_list_node_detach()" and
>> "lflow_ref_list_node_destroy()" API. We can even add
>> "ovs_list_remove(&lrln->list_node)" if we're a bit careful and check
>> that the list_node is used before removing it.
>>
>> My first impression when reading the code in lflow_handle_changed_ref()
>> was that we forgot to remove the node from the list.
>>
>
> I agree that this is the trickiest part of the whole data structure
> operations, but I am not sure if having a "detach" API makes it more
> clear. I guess better comments would be more helpful.
> In lflow_handle_changed_ref(), we can't simply destroy a node. The cross
> reference table needs to be manipulated in different directions in
> different stages.
> Originally the table looks like this (similar to
> https://github.com/ovn-org/ovn/blob/master/controller/ofctrl.c#L80-L100):
> * lflow UUIDs
>
> * +-----+-----+-----+-----+-----+-----+-----+
>
> * | | | | | | | |
>
> * +--+--+--+--+--+--+-----+--+--+--+--+--+--+
>
> * | | | | | |
>
> * resources | | | | | |
> * +----+ +-+-+ | +-+-+ | +-+-+ |
> * | +-------+ +-------+ +-------------+ | |
> * +----+ +---+ | +-+-+ | +---+ |
> * | | | | | |
>
> * +----+ | | +-+-+ |
>
> * | +-------------------------------+ | |
> * +----+ +---+ | +---+ |
> * | +-------------+ | | |
>
> * +----+ +---+ | |
>
> * | | | |
>
> * +----+ +-+-+ +-+-+
> * | +-------------------+ +-------------------+ |
>
> * +----+ +---+ +---+
> * | |
>
> * +----+
>
> Now the horizontal direction lists are replaced by hash tables (so it is
> even harder to represent in the diagram).
>
> 1. For a given resource (ref_type + ref_name), detach all the related
> nodes from the table. (Without this, step 3 would be messed up)
> 2. Traverse the list (now the hmap) of the resource that is detached in
> step 1, and do some processing for the lflows and get a new set of
> lflows that are to be deleted and re-parsed.
> 3. For the set of lflows got from step 2, destroy the related nodes in
> the cross reference table by calling lflow_resource_destroy_lflow(), and
> re-parse it by calling consider_logical_flow(), which would result in
> creating new nodes in the cross reference table.
> 4. Finally, destroy the detached list (now the hmap) from step 1.
>
> So, if following the steps with the help of the above diagram to
> visualize, the data structure and operations may become more clear.
> However, it seems not easy to abstract this to one of two APIs. It is
> more of a pattern than an interface.
>
Right, I thought it was supposed to be like that. Thanks for the
detailed explanation.
> These steps were commented in code but are embedded and scattered. Maybe
> I could summarize them in the beginning of the function to make it more
> clear.
>
That'd be great!
>
>> > free(lrln);
>> > }
>> > free(lfrn);
>> > @@ -531,12 +559,12 @@ lflow_handle_changed_ref(enum ref_type
> ref_type, const char *ref_name,
>> > hmap_remove(&l_ctx_out->lfrr->ref_lflow_table, &rlfn->node);
>> >
>> > struct lflow_ref_list_node *lrln, *next;
>> > - /* Detach the rlfn->ref_lflow_head nodes from the lfrr table
> and clean
>> > + /* Detach the rlfn->lflow_uuids nodes from the lfrr table and clean
>> > * up all other nodes related to the lflows that uses the resource,
>> > * so that the old nodes won't interfere with updating the lfrr
> table
>> > * when reparsing the lflows. */
>> > - LIST_FOR_EACH (lrln, ref_list, &rlfn->ref_lflow_head) {
>> > - ovs_list_remove(&lrln->lflow_list);
>> > + HMAP_FOR_EACH (lrln, hmap_node, &rlfn->lflow_uuids) {
>> > + ovs_list_remove(&lrln->list_node);
>> > }
>> >
>> > struct hmap dhcp_opts = HMAP_INITIALIZER(&dhcp_opts);
>> > @@ -565,7 +593,7 @@ lflow_handle_changed_ref(enum ref_type ref_type,
> const char *ref_name,
>> > /* Firstly, flood remove the flows from desired flow table. */
>> > struct hmap flood_remove_nodes =
> HMAP_INITIALIZER(&flood_remove_nodes);
>> > struct ofctrl_flood_remove_node *ofrn, *ofrn_next;
>> > - LIST_FOR_EACH (lrln, ref_list, &rlfn->ref_lflow_head) {
>> > + HMAP_FOR_EACH (lrln, hmap_node, &rlfn->lflow_uuids) {
>> > VLOG_DBG("Reprocess lflow "UUID_FMT" for resource type: %d,"
>> > " name: %s.",
>> > UUID_ARGS(&lrln->lflow_uuid),
>> > @@ -604,12 +632,11 @@ lflow_handle_changed_ref(enum ref_type
> ref_type, const char *ref_name,
>> > }
>> > hmap_destroy(&flood_remove_nodes);
>> >
>> > - LIST_FOR_EACH_SAFE (lrln, next, ref_list, &rlfn->ref_lflow_head) {
>> > - ovs_list_remove(&lrln->ref_list);
>> > + HMAP_FOR_EACH_SAFE (lrln, next, hmap_node, &rlfn->lflow_uuids) {
>> > + hmap_remove(&rlfn->lflow_uuids, &lrln->hmap_node);
>> > free(lrln);
>> > }
>> > - free(rlfn->ref_name);
>> > - free(rlfn);
>> > + ref_lflow_node_destroy(rlfn);
>>
>> I think detaching the "lflow_ref_list_node" entries from ref_lflow_node
>> and freeing them can be part of ref_lflow_node_destroy().
>>
>> We duplicate this code already in lflow_resource_destroy() and here,
>> lflow_handle_changed_ref().
>>
> They are not duplicated. In lflow_resource_destroy() it is different. It
> performs one more operation there: ovs_list_remove(&lrln->list_node);
> We can wrap a function that handles both cases: destroy a ref that is
> already detached from the cross reference table, and destroy a ref that
> is not detached. However, I am not sure if it makes the code more clear
> or more confusing. To achieve that, we need to re-init the list node
> when it is removed from the list, so that we can later check if it is in
> a list before removing it, to avoid double removing. It can work, but it
> is not a common practice at least in OVS/OVN code, so I would avoid it
> since the whole purpose is to make the code more readable.
Right, the only difference is the is the
ovs_list_remove(&lrln->list_node). I had exactly that in mind, reiniting
the list node after removal.
It is a bit unusual but we do use a similar pattern in at least one
place in OVN to check if a node is part of a list, the line where we
crash without your fix:
https://github.com/ovn-org/ovn/blob/master/controller/ofctrl.c#L1108
>> >
>> > dhcp_opts_destroy(&dhcp_opts);
>> > dhcp_opts_destroy(&dhcpv6_opts);
>> > diff --git a/controller/lflow.h b/controller/lflow.h
>> > index c66b318..1251fb0 100644
>> > --- a/controller/lflow.h
>> > +++ b/controller/lflow.h
>> > @@ -78,25 +78,28 @@ enum ref_type {
>> > REF_TYPE_PORTBINDING
>> > };
>> >
>> > -/* Maintains the relationship for a pair of named resource and
>> > - * a lflow, indexed by both ref_lflow_table and lflow_ref_table. */
>> > -struct lflow_ref_list_node {
>> > - struct ovs_list lflow_list; /* list for same lflow */
>> > - struct ovs_list ref_list; /* list for same ref */
>> > - struct uuid lflow_uuid;
>> > -};
>> > -
>> > struct ref_lflow_node {
>> > - struct hmap_node node;
>> > + struct hmap_node node; /* node in
> lflow_resource_ref.ref_lflow_table. */
>> > enum ref_type type; /* key */
>> > char *ref_name; /* key */
>>
>> Nit: I would align the comments with the one for the 'node' field.
> Ok, but I'd reformat this in a separate patch together with other
> refactors if needed, because it is not related to the fix and would add
> more noise to the patch itself.
>
> Thanks again for the detailed review and comments. To summarize, for the
> comments I responded "Ack", I addressed in v3. For the other comments,
> we can continue discussing and when we have agreement let's (either you
> or myself) post a separate patch because those are refactors related to
> existed code, not to the fix itself. I hope the fix can be merged soon
> because it is critical.
>
Ack, I'll have a look at v3 soon.
Thanks,
Dumitru
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev