Hi Antonio,

Thank you for the patches. Please find my comments below.



Regards
_Sugesh


> -----Original Message-----
> From: [email protected] [mailto:ovs-dev-
> [email protected]] On Behalf Of Fischetti, Antonio
> Sent: Tuesday, June 6, 2017 5:15 PM
> To: Darrell Ball <[email protected]>; [email protected]
> Subject: Re: [ovs-dev] [PATCH RFC v2] Conntrack: Avoid recirculation for
> established connections.
> 
> Thanks Darrell for your useful comments. I've tried to replicate the use cases
> you mentioned; please find some details inline.
> 
> Regards,
> Antonio
> 
> > -----Original Message-----
> > From: Darrell Ball [mailto:[email protected]]
> > Sent: Thursday, June 1, 2017 6:20 PM
> > To: Fischetti, Antonio <[email protected]>;
> > [email protected]
> > Subject: Re: [ovs-dev] [PATCH RFC v2] Conntrack: Avoid recirculation
> > for established connections.
> >
> > Comments inline
> >
> > On 5/29/17, 8:24 AM, "[email protected] on behalf of
> > Fischetti, Antonio" <[email protected] on behalf of
> > [email protected]> wrote:
> >
> >     Thanks Joe for your feedback and the interesting insights in
> > conntrack in your earlier communication.
> >     We have added all the details that we considered for this first
> > implementation. Also, some answers are inline.
> >
> >     The purpose of this implementation is to avoid recirculation just
> > for those packets that are part of established connections.
> >
> >     This shouldn't affect the packet recirculation for actions other
> > than conntrack. For example in MPLS, after a pop_mpls action the
> > packet will still be recirculated to follow the usual datapath.
> >
> >     Most importantly, the CT module isn't by-passed in this
> > implementation.
> >
> >     Besides the recirculation, all other action[s] shall be executed
> > as-is on each packet.
> >     Any new CT change or action set by the controller will be managed
> > as usual.
> >
> >     For our testing we set up a simple firewall, below are the flows.
> >
> >
> >      Flow Setup
> >      ----------
> >     table=0, priority=100,ct_state=-trk,ip actions=ct(table=1)
> >     table=0, priority=10,arp actions=NORMAL
> >     table=0, priority=1 actions=drop
> >     table=1, ct_state=+new+trk,ip,in_port=1 actions=ct(commit),output:2
> >     table=1, ct_state=+new+trk,ip,in_port=2 actions=drop
> >     table=1, ct_state=+est+trk,ip,in_port=1 actions=output:2
> >     table=1, ct_state=+est+trk,ip,in_port=2 actions=output:1
> >
> >
> >      Basic idea
> >      ----------
> >     With the above rules, all packets belonging to an established
> > connection will first hit the flow
> >     "ct_state=-trk,ip actions=ct(table=1)"
> >
> >     then on recirculation, they will hit
> >     "ct_state=+est+trk,ip,in_port=.. actions=output:X".
> >
> >     The basic idea is to do the following 2 steps.
> >     1. Combine the two sets of actions by removing the recirculation.
> >        a) Original actions:
> >         - "ct(zone=N), recirc(id)" [ i.e ct(table=1) ]
> >         - "output:X"
> >        b) Combined Actions after Removing recirculation:
> >         - "ct(zone=N), output:X".
> >
> >     2. All the subsequent packets shall hit a flow with the combined
> > actions.
> >
> >
> > [Darrell]
> >
> > 1) This would be constraining on how rules are written such that we
> > can’t go back to
> >               the beginning of the pipeline, just because a packet is
> > established.
> >               Meaning we can’t run stateless or stateful rules earlier
> > in the pipeline.
> >               There are lots of possible combinations.
> >               One case is:
> >               We may run a bunch of firewall checks in table X, later
> > on (table X+1) run the non-filtered packets through conntrack
> >               and then want to go back to the firewall table.
> >
> 
> [Antonio]
> I've tried to replicate the case you described. I've used the following set of
> rules. They may have no practical meaning, the purpose is to have - at least
> for packets from port 1 - that they traverse tables #0 => 1 => 2 and then are
> sent back to table#0.
> 
> NXST_FLOW reply (xid=0x4):
>  table=0, priority=100,ct_state=-trk,ip actions=ct(table=1)
>  table=1, ct_state=+new+trk,ip,in_port=1,nw_src=10.10.10.0/24 actions=ct(commit),output:2
>  table=1, ct_state=+est+trk,ip,in_port=1 actions=mod_nw_src:3.3.3.3,resubmit(,2)
>  table=2, ct_state=+est+trk,tcp,in_port=1 actions=drop
>  table=2, ct_state=+est+trk,udp,in_port=1 actions=resubmit(1,0)
>  table=0, ct_state=+est+trk,udp,in_port=1,nw_src=3.3.3.3 actions=output:2
>  table=1, ct_state=+new+trk,ip,in_port=2 actions=drop
>  table=1, ct_state=+est+trk,ip,in_port=2 actions=output:1
>  table=0, priority=10,arp actions=NORMAL
> 
> For packets from port1:
>  - table#0: untracked packets go through the CT, then are recirculated, i.e.
> go to table#1.
>  - table#1: new conn are committed.
>  - table#1: packets of est conn have their IPsrc modified to 3.3.3.3,
>             then are sent to table#2.
>  - table#2: TCP packets are dropped - but I'm sending just UDP traffic.
>  - table#2: UDP packets go back to table#0.
>  - table#0: est conn packets with IPsrc=3.3.3.3 are sent to port2.
> 
> In this case we still see that the packets of est conn will skip one 
> recirculation.
> 
> In fact what happens is the following.
> For packets belonging to est connections - and coming from
> port1 - the actions
>    mod_nw_src:3.3.3.3,resubmit(,2)
> become
>    ct(),mod_nw_src:3.3.3.3,resubmit(,2)
> 
> and the EMC entry that was hit by the original packet will point to the
> megaflow
>    table=1, ct_state=+est+trk,ip,in_port=1
> actions=ct(),mod_nw_src:3.3.3.3,resubmit(,2)
> So new packets of the same connection will skip one recirculation.
> 
> For packets from port2 the megaflow
>    table=1, ct_state=+est+trk,ip,in_port=2 actions=output:1
> becomes
>    table=1, ct_state=+est+trk,ip,in_port=2 actions=ct(),output:1
> and will be pointed to by the EMC entry hit by the original packet.
> So new packets of the same connection will skip one recirculation.
> 
> Is this use case somewhat similar to the one you were mentioning?
> Or maybe you can suggest some other rule setup?
> 
> 
> > 2) This allows the dataplane to override what the ofproto layer specifies.
> > This looks like a layering violation.
> >
> > I guess this also speaks to the datapath (userspace) specific design here.
> >
> >
> > I don’t think it matters what the performance advantage is, although
> > it is small in this case.
> >
> >
> >
> >
> >      Considerations
> >      --------------
> >     On EMC entries:
> >     All packets hitting a connection in the same direction will have
> > the same 5-tuple.
> >     They shall all hit the same EMC entry - here referred as
> > 'EMC_ORIG' - and point to the flow
> >     "ct_state=-trk,ip actions=ct(table=1)", referred from here as
> > 'FLOW_ORIG'.
> >
> >     On recirculation the packets shall hit another EMC entry -
> > referred as 'EMC_RECIRC' - and point to the flow
> >     "ct_state=+est+trk,ip,in_port=.. actions=output:X",
> >     referred from here as 'FLOW_RECIRC'.
> >
> >     FLOW_ORIG is the flow hit by the ingress packet while FLOW_RECIRC
> > is the flow hit
> >     by the recirculated packet.
> >
> >
> >      Implementation Details
> >      ----------------------
> >     We thought about different implementations, but for the purpose of
> > this RFC we stick to the following approach.
> >
> >     On packet recirculation, a new flow has to be created with a list
> > of appropriate actions. At this
> >     stage we prepend a new action called 'ct(Skip)' to the original
> > list of actions. This
> >     new action is used as a temporary place-holder that will be over
> > written later on.
> >
> >        Original Actions: "Output:2"    ==>    New Actions: "ct(Skip),
> > Output:2"
> >
> >     For all the packets of an established conn. (i.e marked with
> > +trk,+est bits), do the 3 steps:
> >     1. Retrieve Flows from EMC.
> >         - FLOW_ORIG whose actions include:   ct(zone=N), recirc(id)
> >         - FLOW_RECIRC whose actions include: ct(Skip), Output:X
> >
> >     2. Update the actions of FLOW_RECIRC by overwriting ct(skip) with
> > the ct action from FLOW_ORIG, ie ct(zone=N).
> >         ct(Skip), Output:X  =>  ct(zone=N), Output:X
> >
> >     3. Update the flow pointer in EMC_ORIG entry to 'FLOW_RECIRC' flow.
> >
> >     In this way, we avoid recirculating the packets belonging to
> > established connections, and this in no way skips sending the packets
> > to the CT module.
> >
> >
> >     Antonio
> >
> >     > -----Original Message-----
> >     > From: [email protected] [mailto:ovs-dev-
> >     > [email protected]] On Behalf Of [email protected]
> >     > Sent: Thursday, May 25, 2017 5:03 PM
> >     > To: [email protected]
> >     > Subject: [ovs-dev] [PATCH RFC v2] Conntrack: Avoid recirculation for
> >     > established connections.
> >     >
> >     > From: Antonio Fischetti <[email protected]>
> >     >
> >     > With conntrack enabled, packets get recirculated and this impacts
> >     > the performance with thousands of active concurrent connections.
> >     >
> >     > This patch is aimed at avoiding recirculation for packets
> > belonging to
> >     > established connections in steady state. This is achieved by
> >     > manipulating the megaflows and actions. In this case the actions
> >     > of the megaflow of the recirculated packet are initially updated
> >     > with a new 'CT_SKIP' action and later updated with the actions of
> >     > the megaflow of the original packet (to handle zones). Thereafter
> >     > the EMC entry belonging to the original packet is updated to point
> >     > to the 'megaflow' of the recirculated packet, thereby avoiding
> >     > recirculation.
> >     >
> >     > Signed-off-by: Antonio Fischetti <[email protected]>
> >     > Signed-off-by: Bhanuprakash Bodireddy
> > <[email protected]>
> >     > Co-authored-by: Bhanuprakash Bodireddy
> > <[email protected]>
> >     > ---
> >     > - ~10% performance improvement is observed in our testing with UDP
> >     > streams.
> >     > - Limited testing is done with RFC patch and the tests include
> > hundreds of
> >     >   concurrent active TCP connections.
> >     > - This is very early implementation and we are posting here to
> > get initial
> >     >   feedback.
> >     > - This RFC patch is implemented leveraging EMC, but optimization
> > could be
> >     >   extended to dpcls as well to handle higher flows.
> >     > - No VXLAN testing is done with this patch.
> >     >
> >     >  datapath/linux/compat/include/linux/openvswitch.h |   1 +
> >     >  lib/dpif-netdev.c                                 | 148
> >     > ++++++++++++++++++++--
> >     >  lib/packets.h                                     |   2 +
> >     >  3 files changed, 143 insertions(+), 8 deletions(-)
> >     >
> >     > diff --git a/datapath/linux/compat/include/linux/openvswitch.h
> >     > b/datapath/linux/compat/include/linux/openvswitch.h
> >     > index d22102e..6dc5674 100644
> >     > --- a/datapath/linux/compat/include/linux/openvswitch.h
> >     > +++ b/datapath/linux/compat/include/linux/openvswitch.h
> >     > @@ -762,6 +762,7 @@ enum ovs_ct_attr {
> >     >       OVS_CT_ATTR_NAT,        /* Nested OVS_NAT_ATTR_* */
> >     >       OVS_CT_ATTR_FORCE_COMMIT,  /* No argument */
> >     >       OVS_CT_ATTR_EVENTMASK,  /* u32 mask of IPCT_* events. */
> >     > +     OVS_CT_ATTR_SKIP,       /* No argument, does not invoke CT.
> > */
> >     >       __OVS_CT_ATTR_MAX
> >     >  };
> >     >
> >     > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> >     > index b50164b..fbbb42e 100644
> >     > --- a/lib/dpif-netdev.c
> >     > +++ b/lib/dpif-netdev.c
> >     > @@ -2101,7 +2101,8 @@ emc_probabilistic_insert(struct
> > dp_netdev_pmd_thread
> >     > *pmd,
> >     >  }
> >     >
> >     >  static inline struct dp_netdev_flow *
> >     > -emc_lookup(struct emc_cache *cache, const struct
> > netdev_flow_key
> > *key)
> >     > +emc_lookup(struct emc_cache *cache, const struct
> > netdev_flow_key *key,
> >     > +        void **pfound_entry)
> >     >  {
> >     >      struct emc_entry *current_entry;
> >     >
> >     > @@ -2110,11 +2111,13 @@ emc_lookup(struct emc_cache *cache,
> > const struct
> >     > netdev_flow_key *key)
> >     >              && emc_entry_alive(current_entry)
> >     >              && netdev_flow_key_equal_mf(&current_entry->key, &key-
> > >mf)) {
> >     >
> >     > +            *pfound_entry = current_entry;
> >     >              /* We found the entry with the 'key->mf' miniflow */
> >     >              return current_entry->flow;
> >     >          }
> >     >      }
> >     >
> >     > +    *pfound_entry = NULL;
> >     >      return NULL;
> >     >  }
> >     >
> >     > @@ -4583,10 +4586,12 @@ emc_processing(struct
> > dp_netdev_pmd_thread *pmd,
> >     >          key->hash = dpif_netdev_packet_get_rss_hash(packet, &key-
> > >mf);
> >     >
> >     >          /* If EMC is disabled skip emc_lookup */
> >     > -        flow = (cur_min == 0) ? NULL: emc_lookup(flow_cache, key);
> >     > +        flow = (cur_min == 0) ? NULL:
> >     > +                emc_lookup(flow_cache, key, &packet-
> > >md.prev_emc_entry);
> >     >          if (OVS_LIKELY(flow)) {
> >     >              dp_netdev_queue_batches(packet, flow, &key->mf,
> > batches,
> >     >                                      n_batches);
> >     > +            packet->md.prev_flow = flow;
> >     >          } else {
> >     >              /* Exact match cache missed. Group missed packets
> > together at
> >     >               * the beginning of the 'packets' array. */
> >     > @@ -4595,6 +4600,7 @@ emc_processing(struct
> dp_netdev_pmd_thread
> > *pmd,
> >     >               * must be returned to the caller. The next key should
> > be
> >     > extracted
> >     >               * to 'keys[n_missed + 1]'. */
> >     >              key = &keys[++n_missed];
> >     > +            packet->md.prev_flow = NULL;
> >     >          }
> >     >      }
> >     >
> >     > @@ -4604,6 +4610,9 @@ emc_processing(struct
> dp_netdev_pmd_thread
> > *pmd,
> >     >      return dp_packet_batch_size(packets_);
> >     >  }
> >     >
> >     > +#define CT_PREPEND_ACTION_LEN 0x0C
> >     > +#define CT_PREPEND_ACTION_SPARE_LEN 32
> >     > +#define CT_TOT_ACTIONS_LEN (CT_PREPEND_ACTION_LEN +
> >     > CT_PREPEND_ACTION_SPARE_LEN)
> >     >  static inline void
> >     >  handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
> >     >                       struct dp_packet *packet,
> >     > @@ -4663,12 +4672,44 @@ handle_packet_upcall(struct
> > dp_netdev_pmd_thread
> >     > *pmd,
> >     >          ovs_mutex_lock(&pmd->flow_mutex);
> >     >          netdev_flow = dp_netdev_pmd_lookup_flow(pmd, key, NULL);
> >     >          if (OVS_LIKELY(!netdev_flow)) {
> >     > -            netdev_flow = dp_netdev_flow_add(pmd, &match, &ufid,
> >     > -                                             add_actions->data,
> >     > -                                             add_actions->size);
> >     > +            /* Recirculated packets that belong to established
> >     > connections
> >     > +             * are treated differently.  A place-holder is
> > prepended to
> >     > the
> >     > +             * list of actions. */
> >     > +            if (!(packet->md.ct_state & CS_INVALID) &&
> >     > +                 (packet->md.ct_state & CS_TRACKED) &&
> >     > +                 (packet->md.ct_state & CS_ESTABLISHED) &&
> >     > +                 (packet->md.recirc_id)) {
> >     > +
> >
> >
> >     [Joe] "Just because the value of ct_state in the packet metadata
> > of this
> >     particular packet happens to have those particular state flags doesn't
> >     mean that the OpenFlow pipeline is enforcing this constraint for all
> >     packets hitting this flow. You would have to ensure that the mask in
> >     the flow is also ensuring that all future packets hitting the new flow
> >     will also have that particular ct_state."
> >
> >     This part of the code should prepend a ct(Skip) action to the
> >     FLOW_RECIRC only and just for packets that belong to an
> > established conn.
> >     The FLOW_ORIG shouldn't be altered, because it will be hit by all
> > ingress
> >     packets.
> >
> >     > +                /* Prepend an OVS_CT_ATTR_SKIP to the list of
> > actions
> >     > inside
> >     > +                 * add_actions.  It does not really invoke the
> > ConnTrack
> >     > module,
> >     > +                 * it's a Ct action that works as a place-holder.
> > It
> >     > will be
> >     > +                 * overwritten inside emc_patch_established_conn()
> > with
> >     > +                 * the proper ct(zone=X) action. */
> >     > +                struct dp_netdev_actions *new_actions;
> >     > +                uint8_t tmp_action[CT_TOT_ACTIONS_LEN] = {
> >     > +                        0x0C, 0x00, 0x0C, 0x00,
> >     > +                        0x06, 0x00, 0x09, 0x00,
> >     > +                        0x00, 0x00, 0x00, 0x00,
> >     > +                        /* will append here add_actions. */
> >     > +                };
> >     > +                memcpy(&tmp_action[CT_PREPEND_ACTION_LEN],
> >     > +                        add_actions->data,
> >     > +                        add_actions->size);
> >     > +                new_actions = dp_netdev_actions_create(
> >     > +                        (const struct nlattr *)tmp_action,
> >     > +                        CT_PREPEND_ACTION_LEN + add_actions->size);
> >
> >     [Joe] "Hmm. In some ways I think this would be cleaner to be
> > implemented in
> >     the xlate code, around compose_conntrack_action().. but that code is
> >     currently pretty much dataplane-oblivious and this suggestion is a
> >     userspace-specific optimization."
> >
> >
> >     > +                netdev_flow = dp_netdev_flow_add(pmd, &match,
> > &ufid,
> >     > +                                                 new_actions-
> > >actions,
> >     > +                                                 new_actions-
> > >size);
> >     > +            } else {
> >     > +                netdev_flow = dp_netdev_flow_add(pmd, &match,
> > &ufid,
> >     > +                                                 add_actions->data,
> >     > +                                                 add_actions-
> > >size);
> >     > +            }
> >     >          }
> >     >          ovs_mutex_unlock(&pmd->flow_mutex);
> >     > -        emc_probabilistic_insert(pmd, key, netdev_flow);
> >     > +        /* For this RFC, probabilistic emc insertion is disabled.
> > */
> >     > +        emc_insert(&pmd->flow_cache, key, netdev_flow);
> >     >      }
> >     >  }
> >     >
> >     > @@ -4761,7 +4802,8 @@ fast_path_processing(struct
> > dp_netdev_pmd_thread
> >     > *pmd,
> >     >
> >     >          flow = dp_netdev_flow_cast(rules[i]);
> >     >
> >     > -        emc_probabilistic_insert(pmd, &keys[i], flow);
> >     > +        /* For testing purposes the emc_insert is restored here. */
> >     > +        emc_insert(&pmd->flow_cache, &keys[i], flow);
> >     >          dp_netdev_queue_batches(packet, flow, &keys[i].mf, batches,
> >     > n_batches);
> >     >      }
> >     >
> >     > @@ -4998,6 +5040,78 @@ dp_execute_userspace_action(struct
> >     > dp_netdev_pmd_thread *pmd,
> >     >      }
> >     >  }
> >     >
> >     > +/* Search into EMC to retrieve the entry related to the
> > original packet
> >     > + * and the entry related to the recirculated packet.
> >     > + * If both EMC entries are alive, then:
> >     > + *  - The flow actions of the recirculated packet are updated with
> >     > + *    'ct(zone=N)' as retrieved from the actions of the original
> > flow.
> >     > + *  - The EMC orig entry flow is updated with the flow pointer
> > of recirc
> >     > entry.
> >     > + */
> >     > +static inline void
> >     > +emc_patch_established_conn(struct dp_netdev_pmd_thread *pmd,
> >     > +        struct dp_packet *packet, long long now OVS_UNUSED)
> >     > +{
> >     > +    struct emc_cache *cache = &pmd->flow_cache;
> >     > +    struct netdev_flow_key key;
> >     > +    struct emc_entry *orig_entry; /* EMC entry hit by the original
> >     > packet. */
> >     > +    struct emc_entry *recirc_entry; /* EMC entry for recirculated
> > packet.
> >     > */
> >     > +    uint32_t old_hash;
> >     > +
> >     > +    if (!packet->md.prev_emc_entry) {
> >     > +        return;
> >     > +    }
> >     > +
> >     > +    orig_entry = packet->md.prev_emc_entry;
> >     > +    if (!emc_entry_alive(orig_entry)) {
> >     > +        return;
> >     > +    }
> >     > +
> >     > +    /* Check that the original EMC entry was not overwritten. */
> >     > +    struct dp_netdev_flow *orig_flow = orig_entry->flow;
> >     > +    if (packet->md.prev_flow && (packet->md.prev_flow !=
> > orig_flow)) {
> >     > +       return;
> >     > +    }
> >     > +
> >     > +    /* TODO as we are calling miniflow_extract now, can we avoid to
> >     > invoke
> >     > +     * it again when we'll classify this recirculated packet? */
> >     > +    miniflow_extract(packet, &key.mf);
> >     > +    key.len = 0; /* Not computed yet. */
> >     > +    old_hash = dp_packet_get_rss_hash(packet);
> >     > +    key.hash = dpif_netdev_packet_get_rss_hash(packet, NULL);
> >     > +    dp_packet_set_rss_hash(packet, old_hash);
> >     > +
> >     > +    EMC_FOR_EACH_POS_WITH_HASH(cache, recirc_entry, key.hash) {
> >     > +        if (recirc_entry->key.hash == key.hash
> >     > +            && emc_entry_alive(recirc_entry)
> >     > +            && netdev_flow_key_equal_mf(&recirc_entry->key,
> > &key.mf)) {
> >     > +
> >     > +            if (orig_entry->flow == recirc_entry->flow) {
> >     > +                /* Nothing to do, already patched by a packet of
> > this
> >     > +                 * same batch. */
> >     > +                return;
> >     > +            }
> >     > +            /* We found the entry related to the recirculated
> > packet.
> >     > +             * Retrieve the actions for orig and recirc entries. *
> > */
> >     > +            struct dp_netdev_actions * orig_actions =
> >     > +                    dp_netdev_flow_get_actions(orig_entry->flow);
> >     > +            struct dp_netdev_actions * recirc_actions =
> >     > +                    dp_netdev_flow_get_actions(recirc_entry->flow);
> >     > +
> >     > +            /* The orig entry action contains a CT action with the
> > zone
> >     > info. */
[Sugesh] Is it safe to assume the first action would be ct?
> >     > +            struct nlattr *p = &orig_actions->actions[0];
> >     > +
> >     > +            /* Overwrite the CT Skip action of recirc entry with
> >     > ct(zone=N). */
> >     > +            memcpy(recirc_actions->actions, p, p->nla_len);
[Sugesh] What if the recirc_action size is smaller than p? Shouldn't we make
sure the size of the skip action matches the size of the actions present in
ct_table?

> >     > +
> >     > +            /* Update orig EMC entry. */
> >     > +            orig_entry->flow = recirc_entry->flow;
> >     > +            dp_netdev_flow_ref(orig_entry->flow);
> >     > +
> >     > +            return;
> >     > +        }
> >     > +    }
> >     > +}
> >     > +
> >     >  static void
> >     >  dp_execute_cb(void *aux_, struct dp_packet_batch *packets_,
> >     >                const struct nlattr *a, bool may_steal)
> >     > @@ -5024,7 +5138,6 @@ dp_execute_cb(void *aux_, struct
> > dp_packet_batch
> >     > *packets_,
> >     >              } else {
> >     >                  tx_qid = pmd->static_tx_qid;
> >     >              }
> >     > -
> >     >              netdev_send(p->port->netdev, tx_qid, packets_,
> > may_steal,
> >     >                          dynamic_txqs);
> >     >              return;
> >
> >     [Joe] "Unrelated style change."
> >
> >     Will do that.
> >
> >     > @@ -5145,10 +5258,21 @@ dp_execute_cb(void *aux_, struct
> > dp_packet_batch
> >     > *packets_,
> >     >              struct dp_packet *packet;
> >     >              DP_PACKET_BATCH_FOR_EACH (packet, packets_) {
> >     >                  packet->md.recirc_id = nl_attr_get_u32(a);
> >     > +
> >     > +                if (!(packet->md.ct_state & CS_INVALID) &&
> >     > +                     (packet->md.ct_state & CS_TRACKED) &&
> >     > +                     (packet->md.ct_state & CS_ESTABLISHED)) {
> >     > +                    (*depth)++;
> >     > +                    emc_patch_established_conn(pmd, packet, now);
> >     > +                    (*depth)--;
> >     > +                }
> >     >              }
> >     >
> >     >              (*depth)++;
> >     >              dp_netdev_recirculate(pmd, packets_);
> >     > +            DP_PACKET_BATCH_FOR_EACH (packet, packets_) {
> >     > +                packet->md.recirc_id = 0;
> >     > +            }
> >     >              (*depth)--;
> >     >
> >     >              return;
> >
> >     [Joe] "Can you give some more detail about why resetting the
> > recirc_id is
> >     correct here? Could/should this be applied as an independent change
> >     regardless of the rest of this series?"
> >
> >     This addition is needed so that in handle_packet_upcall we add the
> > ct(Skip)
> >     just to the FLOW_RECIRC and not to the FLOW_ORIG.
> >     I guess this could be also applied as an independent change?
> >
> >     > @@ -5166,6 +5290,7 @@ dp_execute_cb(void *aux_, struct
> > dp_packet_batch
> >     > *packets_,
> >     >          const char *helper = NULL;
> >     >          const uint32_t *setmark = NULL;
> >     >          const struct ovs_key_ct_labels *setlabel = NULL;
> >     > +        bool skip_ct = false;
> >     >
> >     >          NL_ATTR_FOR_EACH_UNSAFE (b, left, nl_attr_get(a),
> >     >                                   nl_attr_get_size(a)) {
> >     > @@ -5194,6 +5319,9 @@ dp_execute_cb(void *aux_, struct
> > dp_packet_batch
> >     > *packets_,
> >     >                  /* Silently ignored, as userspace datapath does not
> >     > generate
> >     >                   * netlink events. */
> >     >                  break;
> >     > +            case OVS_CT_ATTR_SKIP:
> >     > +                skip_ct = true;
> >     > +                break;
> >     >              case OVS_CT_ATTR_NAT:
> >     >              case OVS_CT_ATTR_UNSPEC:
> >     >              case __OVS_CT_ATTR_MAX:
> >     > @@ -5201,6 +5329,10 @@ dp_execute_cb(void *aux_, struct
> > dp_packet_batch
> >     > *packets_,
> >     >              }
> >     >          }
> >     >
> >     > +        if (OVS_UNLIKELY(skip_ct)) {
> >     > +            break;
> >     > +        }
> >     > +
> >     >          conntrack_execute(&dp->conntrack, packets_, aux->flow-
> > >dl_type,
> >     > force,
> >     >                            commit, zone, setmark, setlabel, helper);
> >     >          break;
> >     > diff --git a/lib/packets.h b/lib/packets.h
> >     > index 7dbf6dd..2c46a15 100644
> >     > --- a/lib/packets.h
> >     > +++ b/lib/packets.h
> >     > @@ -112,6 +112,8 @@ struct pkt_metadata {
> >     >      struct flow_tnl tunnel;     /* Encapsulating tunnel parameters.
> > Note
> >     > that
> >     >                                   * if 'ip_dst' == 0, the rest of
> > the
> >     > fields may
> >     >                                   * be uninitialized. */
[Sugesh] I don't see the initialization of these fields. Have you verified the
cost of these fields in pkt_metadata for normal traffic? I mean the
performance impact.
> >     > +    void *prev_emc_entry;       /* EMC entry that this packet
> > matched. */
> >     > +    void *prev_flow;            /* Flow pointed by the matching EMC
> >     > entry. */
> >     >  };
> >     >
> >     >  static inline void
> >     > --
> >     > 2.4.11
> >     >
[Sugesh] I cannot apply the patch on latest master due to some conflicts.
I am not sure how the stats get updated when you combine the flows in the
datapath. Earlier there were two UUIDs for the two flows, and now only one
UUID is being used.
I don't have expertise in conntrack; however, what would be the result when we
have actions like:

table=0, priority=100,ct_state=-trk,ip actions=clone(push_vlan(4), trunc(200), output:2), ct(table=1)
table=1, ct_state=+est+trk,ip,in_port=1 actions=output:2


Another example would be:

table=0, priority=100,ct_state=-trk,ip actions=push_vlan(4), ct(table=1), dec_ttl
table=1, ct_state=+est+trk,ip,in_port=1 actions=output:2

As Joe suggested before, I think the combining logic would be better placed in
the ofproto-xlate layer than here. Though it's userspace specific, it could be
handled the same way the userspace tunnel is handled there.


Another concern is about actions that modify fields along with connection
tracking. It's probably a valid use case with NAT. In this scenario, if the
flow fields are changed by an action, how is it possible to track back to the
original flow?
Or am I missing anything?



> >     > _______________________________________________
> >     > dev mailing list
> >     > [email protected]
> >     > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >     _______________________________________________
> >     dev mailing list
> >     [email protected]
> >     https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >
> 
> _______________________________________________
> dev mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
