Thanks Sugesh for your comments, pls see replies inline.
-Antonio
> -----Original Message-----
> From: Chandran, Sugesh
> Sent: Friday, June 23, 2017 4:20 PM
> To: Fischetti, Antonio <[email protected]>; [email protected]
> Subject: RE: [ovs-dev] [PATCH RFC v2] Conntrack: Avoid recirculation for
> established connections.
>
> Hi Antonio,
>
> Thank you for the patches,
>
> Please find my comments below.
>
>
>
> Regards
> _Sugesh
>
>
> > -----Original Message-----
> > From: [email protected] [mailto:ovs-dev-
> > [email protected]] On Behalf Of Fischetti, Antonio
> > Sent: Tuesday, June 6, 2017 5:15 PM
> > To: Darrell Ball <[email protected]>; [email protected]
> > Subject: Re: [ovs-dev] [PATCH RFC v2] Conntrack: Avoid recirculation for
> > established connections.
> >
> > Thanks Darrel for your useful comments, I've tried to replicate the
> usecases
> > you mentioned, please find inline some details.
> >
> > Regards,
> > Antonio
> >
> > > -----Original Message-----
> > > From: Darrell Ball [mailto:[email protected]]
> > > Sent: Thursday, June 1, 2017 6:20 PM
> > > To: Fischetti, Antonio <[email protected]>;
> > > [email protected]
> > > Subject: Re: [ovs-dev] [PATCH RFC v2] Conntrack: Avoid recirculation
> > > for established connections.
> > >
> > > Comments inline
> > >
> > > On 5/29/17, 8:24 AM, "[email protected] on behalf of
> > > Fischetti, Antonio" <[email protected] on behalf of
> > > [email protected]> wrote:
> > >
> > > Thanks Joe for your feedback and the interesting insights in
> > > conntrack in your earlier communication.
> > > We have added all the details that we considered for this first
> > > implementation. Also, some answers are inline.
> > >
> > > The purpose of this implementation is to avoid recirculation just
> > > for those packets that are part of established connections.
> > >
> > > This shouldn't affect the packet recirculation for actions other
> > > than conntrack. For example in MPLS, after a pop_mpls action the
> > > packet will still be recirculated to follow the usual datapath.
> > >
> > > Most importantly, the CT module isn't by-passed in this
> > > implementation.
> > >
> > > Besides the recirculation, all other action[s] shall be executed
> > > as-is on each packet.
> > > Any new CT change or action set by the controller will be managed
> > > as usual.
> > >
> > > For our testing we set up a simple firewall, below are the flows.
> > >
> > >
> > > Flow Setup
> > > ----------
> > > table=0, priority=100,ct_state=-trk,ip actions=ct(table=1)
> > > table=0, priority=10,arp actions=NORMAL
> > > table=0, priority=1 actions=drop
> > > table=1, ct_state=+new+trk,ip,in_port=1
> actions=ct(commit),output:2
> > > table=1, ct_state=+new+trk,ip,in_port=2 actions=drop
> > > table=1, ct_state=+est+trk,ip,in_port=1 actions=output:2
> > > table=1, ct_state=+est+trk,ip,in_port=2 actions=output:1
> > >
> > >
> > > Basic idea
> > > ----------
> > > With the above rules, all packets belonging to an established
> > > connection will first hit the flow
> > > "ct_state=-trk,ip actions=ct(table=1)"
> > >
> > > then on recirculation, they will hit
> > > "ct_state=+est+trk,ip,in_port=.. actions=output:X".
> > >
> > > The basic idea is to do the following 2 steps.
> > > 1. Combine the two sets of actions by removing the recirculation.
> > > a) Original actions:
> > > - "ct(zone=N), recirc(id)" [ i.e ct(table=1) ]
> > > - "output:X"
> > > b) Combined Actions after Removing recirculation:
> > > - "ct(zone=N), output:X".
> > >
> > > 2. All the subsequent packets shall hit a flow with the combined
> > > actions.
> > >
> > >
> > > [Darrell]
> > >
> > > 1) This would be constraining on how rules are written such that we
> > > can’t go back to
> > > the beginning of the pipeline, just because a packet is
> > > established.
> > > Meaning we can’t run stateless or stateful rules earlier
> > > in the pipeline.
> > > There are lots of possible combinations.
> > > One case is:
> > > We may run a bunch of firewall checks in table X, later
> > > on (table X+1) run the non-filtered packets through conntrack
> > > and then want to go back to the firewall table.
> > >
> >
> > [Antonio]
> > I've tried to replicate the case you described. I've used the following
> set of
> > rules. They may have no practical meaning, the purpose is to have - at
> least
> > for packets from port 1 - that they traverse tables #0 => 1 => 2 and
> then are
> > sent back to table#0.
> >
> > NXST_FLOW reply (xid=0x4):
> > table=0, priority=100,ct_state=-trk,ip actions=ct(table=1) table=1,
> > ct_state=+new+trk,ip,in_port=1,nw_src=10.10.10.0/24
> > actions=ct(commit),output:2 table=1, ct_state=+est+trk,ip,in_port=1
> > actions=mod_nw_src:3.3.3.3,resubmit(,2)
> > table=2, ct_state=+est+trk,tcp,in_port=1 actions=drop table=2,
> > ct_state=+est+trk,udp,in_port=1 actions=resubmit(1,0) table=0,
> > ct_state=+est+trk,udp,in_port=1,nw_src=3.3.3.3 actions=output:2
> table=1,
> > ct_state=+new+trk,ip,in_port=2 actions=drop table=1,
> > ct_state=+est+trk,ip,in_port=2 actions=output:1 table=0,
> priority=10,arp
> > actions=NORMAL
> >
> > For packets from port1:
> > - table#0: untracked packets go through the CT, then are recirculated,
> ie go
> > to table#1.
> > - table#1: new conn are committed.
> > - table#1: packets of est conn are modified in the IPsrc that becomes
> 3.3.3.3,
> > then sent to table#2.
> > - table#2: TCP packets are dropped - but I'm sending just UDP traffic.
> > - table#2: UDP packets go back to table#0.
> > - table#0: est conn packets with IPsrc=3.3.3.3 are sent to port2.
> >
> > In this case we still see that the packets of est conn will skip one
> recirculation.
> >
> > In fact what happens is the following.
> > For packets belonging to est connections - and coming from
> > port1 - the actions
> > mod_nw_src:3.3.3.3,resubmit(,2)
> > become
> > ct(),mod_nw_src:3.3.3.3,resubmit(,2)
> >
> > and the EMC entry that was hit by the original packet will point to the
> > megaflow
> > table=1, ct_state=+est+trk,ip,in_port=1
> > actions=ct(),mod_nw_src:3.3.3.3,resubmit(,2)
> > So new packets of the same connection will skip one recirculation.
> >
> > For packets from port2 the megaflow
> > table=1, ct_state=+est+trk,ip,in_port=2 actions=output:1 becomes
> > table=1, ct_state=+est+trk,ip,in_port=2 actions=ct(),output:1 and
> will be
> > pointed by the EMC entry hit by the original packet.
> > So new packets of the same connection will skip one recirculation.
> >
> > Is this usecase somewhat similar to the one you were mentioning?
> > Or maybe you can suggest some other rule setup?
> >
> >
> > > 2) This allows the dataplane to override what the ofproto layer
> specifies.
> > > This looks like a layering violation.
> > >
> > > I guess this also speaks to the datapath (userspace) specific design
> here.
> > >
> > >
> > > I don’t think it matters what the performance advantage is, although
> > > it is small in this case.
> > >
> > >
> > >
> > >
> > > Considerations
> > > --------------
> > > On EMC entries:
> > > All packets hitting a connection in the same direction will have
> > > the same 5-tuple.
> > > They shall all hit the same EMC entry - here referred as
> > > 'EMC_ORIG' - and point to the flow
> > > "ct_state=-trk,ip actions=ct(table=1)", referred from here as
> > > 'FLOW_ORIG'.
> > >
> > > On recirculation the packets shall hit another EMC entry -
> > > referred as 'EMC_RECIRC' - and point to the flow
> > > "ct_state=+est+trk,ip,in_port=.. actions=output:X",
> > > referred from here as 'FLOW_RECIRC'.
> > >
> > > FLOW_ORIG is the flow hit by the ingress packet while FLOW_RECIRC
> > > is the flow hit
> > > by the recirculated packet.
> > >
> > >
> > > Implementation Details
> > > ----------------------
> > > We thought about different implementations, but for the purpose of
> > > this RFC we stick to the following approach.
> > >
> > > On packet recirculation, a new flow has to be created with a list
> > > of appropriate actions. At this
> > > stage we prepend a new action called 'ct(Skip)' to the original
> > > list of actions. This
> > > new action is used as a temporary place-holder that will be over
> > > written later on.
> > >
> > > Original Actions: "Output:2" ==> New Actions: "ct(Skip),
> > > Output:2"
> > >
> > > For all the packets of an established conn. (i.e marked with
> > > +trk,+est bits), do the 3 steps:
> > > 1. Retrieve Flows from EMC.
> > > - FLOW_ORIG whose actions include: ct(zone=N), recirc(id)
> > > - FLOW_RECIRC whose actions include: ct(Skip), Output:X
> > >
> > > 2. Update the actions of FLOW_RECIRC by overwriting ct(skip) with
> > > the ct action from FLOW_ORIG, ie ct(zone=N).
> > > ct(Skip), Output:X => ct(zone=N), Output:X
> > >
> > > 3. Update the flow pointer in EMC_ORIG entry to 'FLOW_RECIRC'
> flow.
> > >
> > > In this way, we avoid recirculating the packets belonging to
> > > established connections and
> > > this no ways skips sending the packets to the CT module.
> > >
> > >
> > > Antonio
> > >
> > > > -----Original Message-----
> > > > From: [email protected] [mailto:ovs-dev-
> > > > [email protected]] On Behalf Of
> [email protected]
> > > > Sent: Thursday, May 25, 2017 5:03 PM
> > > > To: [email protected]
> > > > Subject: [ovs-dev] [PATCH RFC v2] Conntrack: Avoid recirculation
> for
> > > > established connections.
> > > >
> > > > From: Antonio Fischetti <[email protected]>
> > > >
> > > > With conntrack enabled, packets get recirculated and this
> impacts
> > > > the performance with thousands of active concurrent connections.
> > > >
> > > > This patch is aimed at avoiding recirculation for packets
> > > belonging to
> > > > established connections in steady state. This is achieved by
> > > > manipulating the megaflows and actions. In this case the actions
> > > of the
> > > > megaflow of recirculated packet is initially updated with new
> > > 'CT_SKIP'
> > > > action and later updated with the actions of the megaflow of the
> > > > original packet(to handle zones). There after the EMC entry
> > > > belonging to the original packet is updated to point to the
> > > 'megaflow' of
> > > > the recirculated packet there by avoiding recirculation.
> > > >
> > > > Signed-off-by: Antonio Fischetti <[email protected]>
> > > > Signed-off-by: Bhanuprakash Bodireddy
> > > <[email protected]>
> > > > Co-authored-by: Bhanuprakash Bodireddy
> > > <[email protected]>
> > > > ---
> > > > - ~10% performance improvement is observed in our testing with
> UDP
> > > > streams.
> > > > - Limited testing is done with RFC patch and the tests include
> > > hundreds of
> > > > concurrent active TCP connections.
> > > > - This is very early implementation and we are posting here to
> > > get initial
> > > > feedback.
> > > > - This RFC patch is implemented leveraging EMC, but optimization
> > > could be
> > > > extended to dpcls as well to handle higher flows.
> > > > - No VXLAN testing is done with this patch.
> > > >
> > > > datapath/linux/compat/include/linux/openvswitch.h | 1 +
> > > > lib/dpif-netdev.c | 148
> > > > ++++++++++++++++++++--
> > > > lib/packets.h | 2 +
> > > > 3 files changed, 143 insertions(+), 8 deletions(-)
> > > >
> > > > diff --git a/datapath/linux/compat/include/linux/openvswitch.h
> > > > b/datapath/linux/compat/include/linux/openvswitch.h
> > > > index d22102e..6dc5674 100644
> > > > --- a/datapath/linux/compat/include/linux/openvswitch.h
> > > > +++ b/datapath/linux/compat/include/linux/openvswitch.h
> > > > @@ -762,6 +762,7 @@ enum ovs_ct_attr {
> > > > OVS_CT_ATTR_NAT, /* Nested OVS_NAT_ATTR_* */
> > > > OVS_CT_ATTR_FORCE_COMMIT, /* No argument */
> > > > OVS_CT_ATTR_EVENTMASK, /* u32 mask of IPCT_* events. */
> > > > + OVS_CT_ATTR_SKIP, /* No argument, does not invoke CT.
> > > */
> > > > __OVS_CT_ATTR_MAX
> > > > };
> > > >
> > > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > > > index b50164b..fbbb42e 100644
> > > > --- a/lib/dpif-netdev.c
> > > > +++ b/lib/dpif-netdev.c
> > > > @@ -2101,7 +2101,8 @@ emc_probabilistic_insert(struct
> > > dp_netdev_pmd_thread
> > > > *pmd,
> > > > }
> > > >
> > > > static inline struct dp_netdev_flow *
> > > > -emc_lookup(struct emc_cache *cache, const struct
> > > netdev_flow_key
> > > *key)
> > > > +emc_lookup(struct emc_cache *cache, const struct
> > > netdev_flow_key *key,
> > > > + void **pfound_entry)
> > > > {
> > > > struct emc_entry *current_entry;
> > > >
> > > > @@ -2110,11 +2111,13 @@ emc_lookup(struct emc_cache *cache,
> > > const struct
> > > > netdev_flow_key *key)
> > > > && emc_entry_alive(current_entry)
> > > > && netdev_flow_key_equal_mf(¤t_entry->key,
> &key-
> > > >mf)) {
> > > >
> > > > + *pfound_entry = current_entry;
> > > > /* We found the entry with the 'key->mf' miniflow
> */
> > > > return current_entry->flow;
> > > > }
> > > > }
> > > >
> > > > + *pfound_entry = NULL;
> > > > return NULL;
> > > > }
> > > >
> > > > @@ -4583,10 +4586,12 @@ emc_processing(struct
> > > dp_netdev_pmd_thread *pmd,
> > > > key->hash = dpif_netdev_packet_get_rss_hash(packet,
> &key-
> > > >mf);
> > > >
> > > > /* If EMC is disabled skip emc_lookup */
> > > > - flow = (cur_min == 0) ? NULL: emc_lookup(flow_cache,
> key);
> > > > + flow = (cur_min == 0) ? NULL:
> > > > + emc_lookup(flow_cache, key, &packet-
> > > >md.prev_emc_entry);
> > > > if (OVS_LIKELY(flow)) {
> > > > dp_netdev_queue_batches(packet, flow, &key->mf,
> > > batches,
> > > > n_batches);
> > > > + packet->md.prev_flow = flow;
> > > > } else {
> > > > /* Exact match cache missed. Group missed packets
> > > together at
> > > > * the beginning of the 'packets' array. */
> > > > @@ -4595,6 +4600,7 @@ emc_processing(struct
> > dp_netdev_pmd_thread
> > > *pmd,
> > > > * must be returned to the caller. The next key
> should
> > > be
> > > > extracted
> > > > * to 'keys[n_missed + 1]'. */
> > > > key = &keys[++n_missed];
> > > > + packet->md.prev_flow = NULL;
> > > > }
> > > > }
> > > >
> > > > @@ -4604,6 +4610,9 @@ emc_processing(struct
> > dp_netdev_pmd_thread
> > > *pmd,
> > > > return dp_packet_batch_size(packets_);
> > > > }
> > > >
> > > > +#define CT_PREPEND_ACTION_LEN 0x0C
> > > > +#define CT_PREPEND_ACTION_SPARE_LEN 32
> > > > +#define CT_TOT_ACTIONS_LEN (CT_PREPEND_ACTION_LEN +
> > > > CT_PREPEND_ACTION_SPARE_LEN)
> > > > static inline void
> > > > handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
> > > > struct dp_packet *packet,
> > > > @@ -4663,12 +4672,44 @@ handle_packet_upcall(struct
> > > dp_netdev_pmd_thread
> > > > *pmd,
> > > > ovs_mutex_lock(&pmd->flow_mutex);
> > > > netdev_flow = dp_netdev_pmd_lookup_flow(pmd, key,
> NULL);
> > > > if (OVS_LIKELY(!netdev_flow)) {
> > > > - netdev_flow = dp_netdev_flow_add(pmd, &match,
> &ufid,
> > > > - add_actions->data,
> > > > - add_actions-
> >size);
> > > > + /* Recirculated packets that belong to established
> > > > connections
> > > > + * are treated differently. A place-holder is
> > > prepended to
> > > > the
> > > > + * list of actions. */
> > > > + if (!(packet->md.ct_state & CS_INVALID) &&
> > > > + (packet->md.ct_state & CS_TRACKED) &&
> > > > + (packet->md.ct_state & CS_ESTABLISHED) &&
> > > > + (packet->md.recirc_id)) {
> > > > +
> > >
> > >
> > > [Joe] "Just because the value of ct_state in the packet metadata
> > > of this
> > > particular packet happens to have those particular state flags
> doesn't
> > > mean that the OpenFlow pipeline is enforcing this constraint for
> all
> > > packets hitting this flow. You would have to ensure that the mask
> in
> > > the flow is also ensuring that all future packets hitting the new
> flow
> > > will also have that particular ct_state."
> > >
> > > This part of the code should prepend a ct(Skip) action to the
> > > FLOW_RECIRC only and just for packets that belong to an
> > > established conn.
> > > The FLOW_ORIG shouldn't be altered, because it will be hit by all
> > > ingress
> > > packets.
> > >
> > > > + /* Prepend an OVS_CT_ATTR_SKIP to the list of
> > > actions
> > > > inside
> > > > + * add_actions. It does not really invoke the
> > > ConnTrack
> > > > module,
> > > > + * it's a Ct action that works as a place-
> holder.
> > > It
> > > > will be
> > > > + * overwritten inside
> emc_patch_established_conn()
> > > with
> > > > + * the proper ct(zone=X) action. */
> > > > + struct dp_netdev_actions *new_actions;
> > > > + uint8_t tmp_action[CT_TOT_ACTIONS_LEN] = {
> > > > + 0x0C, 0x00, 0x0C, 0x00,
> > > > + 0x06, 0x00, 0x09, 0x00,
> > > > + 0x00, 0x00, 0x00, 0x00,
> > > > + /* will append here add_actions. */
> > > > + };
> > > > + memcpy(&tmp_action[CT_PREPEND_ACTION_LEN],
> > > > + add_actions->data,
> > > > + add_actions->size);
> > > > + new_actions = dp_netdev_actions_create(
> > > > + (const struct nlattr *)tmp_action,
> > > > + CT_PREPEND_ACTION_LEN + add_actions-
> >size);
> > >
> > > [Joe] "Hmm. In some ways I think this would be cleaner to be
> > > implemented in
> > > the xlate code, around compose_conntrack_action().. but that code
> is
> > > currently pretty much dataplane-oblivious and this suggestion is a
> > > userspace-specific optimization."
> > >
> > >
> > > > + netdev_flow = dp_netdev_flow_add(pmd, &match,
> > > &ufid,
> > > > + new_actions-
> > > >actions,
> > > > + new_actions-
> > > >size);
> > > > + } else {
> > > > + netdev_flow = dp_netdev_flow_add(pmd, &match,
> > > &ufid,
> > > > + add_actions-
> >data,
> > > > + add_actions-
> > > >size);
> > > > + }
> > > > }
> > > > ovs_mutex_unlock(&pmd->flow_mutex);
> > > > - emc_probabilistic_insert(pmd, key, netdev_flow);
> > > > + /* For this RFC, probabilistic emc insertion is
> disabled.
> > > */
> > > > + emc_insert(&pmd->flow_cache, key, netdev_flow);
> > > > }
> > > > }
> > > >
> > > > @@ -4761,7 +4802,8 @@ fast_path_processing(struct
> > > dp_netdev_pmd_thread
> > > > *pmd,
> > > >
> > > > flow = dp_netdev_flow_cast(rules[i]);
> > > >
> > > > - emc_probabilistic_insert(pmd, &keys[i], flow);
> > > > + /* For testing purposes the emc_insert is restored
> here. */
> > > > + emc_insert(&pmd->flow_cache, &keys[i], flow);
> > > > dp_netdev_queue_batches(packet, flow, &keys[i].mf,
> batches,
> > > > n_batches);
> > > > }
> > > >
> > > > @@ -4998,6 +5040,78 @@ dp_execute_userspace_action(struct
> > > > dp_netdev_pmd_thread *pmd,
> > > > }
> > > > }
> > > >
> > > > +/* Search into EMC to retrieve the entry related to the
> > > original packet
> > > > + * and the entry related to the recirculated packet.
> > > > + * If both EMC entries are alive, then:
> > > > + * - The flow actions of the recirculated packet are updated
> with
> > > > + * 'ct(zone=N)' as retrieved from the actions of the
> original
> > > flow.
> > > > + * - The EMC orig entry flow is updated with the flow pointer
> > > of recirc
> > > > entry.
> > > > + */
> > > > +static inline void
> > > > +emc_patch_established_conn(struct dp_netdev_pmd_thread *pmd,
> > > > + struct dp_packet *packet, long long now OVS_UNUSED)
> > > > +{
> > > > + struct emc_cache *cache = &pmd->flow_cache;
> > > > + struct netdev_flow_key key;
> > > > + struct emc_entry *orig_entry; /* EMC entry hit by the
> original
> > > > packet. */
> > > > + struct emc_entry *recirc_entry; /* EMC entry for
> recirculated
> > > packet.
> > > > */
> > > > + uint32_t old_hash;
> > > > +
> > > > + if (!packet->md.prev_emc_entry) {
> > > > + return;
> > > > + }
> > > > +
> > > > + orig_entry = packet->md.prev_emc_entry;
> > > > + if (!emc_entry_alive(orig_entry)) {
> > > > + return;
> > > > + }
> > > > +
> > > > + /* Check that the original EMC entry was not overwritten.
> */
> > > > + struct dp_netdev_flow *orig_flow = orig_entry->flow;
> > > > + if (packet->md.prev_flow && (packet->md.prev_flow !=
> > > orig_flow)) {
> > > > + return;
> > > > + }
> > > > +
> > > > + /* TODO as we are calling miniflow_extract now, can we
> avoid to
> > > > invoke
> > > > + * it again when we'll classify this recirculated packet?
> */
> > > > + miniflow_extract(packet, &key.mf);
> > > > + key.len = 0; /* Not computed yet. */
> > > > + old_hash = dp_packet_get_rss_hash(packet);
> > > > + key.hash = dpif_netdev_packet_get_rss_hash(packet, NULL);
> > > > + dp_packet_set_rss_hash(packet, old_hash);
> > > > +
> > > > + EMC_FOR_EACH_POS_WITH_HASH(cache, recirc_entry, key.hash) {
> > > > + if (recirc_entry->key.hash == key.hash
> > > > + && emc_entry_alive(recirc_entry)
> > > > + && netdev_flow_key_equal_mf(&recirc_entry->key,
> > > &key.mf)) {
> > > > +
> > > > + if (orig_entry->flow == recirc_entry->flow) {
> > > > + /* Nothing to do, already patched by a packet
> of
> > > this
> > > > + * same batch. */
> > > > + return;
> > > > + }
> > > > + /* We found the entry related to the recirculated
> > > packet.
> > > > + * Retrieve the actions for orig and recirc
> entries. *
> > > */
> > > > + struct dp_netdev_actions * orig_actions =
> > > > + dp_netdev_flow_get_actions(orig_entry-
> >flow);
> > > > + struct dp_netdev_actions * recirc_actions =
> > > > + dp_netdev_flow_get_actions(recirc_entry-
> >flow);
> > > > +
> > > > + /* The orig entry action contains a CT action with
> the
> > > zone
> > > > info. */
> [Sugesh] is it safe to assume first action would be ct ??
[Antonio] Right, in general it could be something different than ct.
For this RFC I'm assuming a simplified case.
> > > > + struct nlattr *p = &orig_actions->actions[0];
> > > > +
> > > > + /* Overwrite the CT Skip action of recirc entry
> with
> > > > ct(zone=N). */
> > > > + memcpy(recirc_actions->actions, p, p->nla_len);
> [Sugesh] What if recirc_action size is smaller than p? shouldn’t we make
> sure the size of skip action match with size of actions present in
> ct_table?
[Antonio] The skip action is just a place-holder to accommodate the list of
ct actions from the orig flow. It should be sufficient that the skip action
is bigger.
Worth remember that this solution is a quick implementation of the basic idea,
the purpose was to share the approach to better analyze all possible pros/cons
and corner cases.
A better implementation could be for example - instead of tampering the
actions of the recirc flow - to create a brand new flow which
has the combined actions picked up from the orig and the recirc flow.
>
> > > > +
> > > > + /* Update orig EMC entry. */
> > > > + orig_entry->flow = recirc_entry->flow;
> > > > + dp_netdev_flow_ref(orig_entry->flow);
> > > > +
> > > > + return;
> > > > + }
> > > > + }
> > > > +}
> > > > +
> > > > static void
> > > > dp_execute_cb(void *aux_, struct dp_packet_batch *packets_,
> > > > const struct nlattr *a, bool may_steal)
> > > > @@ -5024,7 +5138,6 @@ dp_execute_cb(void *aux_, struct
> > > dp_packet_batch
> > > > *packets_,
> > > > } else {
> > > > tx_qid = pmd->static_tx_qid;
> > > > }
> > > > -
> > > > netdev_send(p->port->netdev, tx_qid, packets_,
> > > may_steal,
> > > > dynamic_txqs);
> > > > return;
> > >
> > > [Joe] "Unrelated style change."
> > >
> > > Will do that.
> > >
> > > > @@ -5145,10 +5258,21 @@ dp_execute_cb(void *aux_, struct
> > > dp_packet_batch
> > > > *packets_,
> > > > struct dp_packet *packet;
> > > > DP_PACKET_BATCH_FOR_EACH (packet, packets_) {
> > > > packet->md.recirc_id = nl_attr_get_u32(a);
> > > > +
> > > > + if (!(packet->md.ct_state & CS_INVALID) &&
> > > > + (packet->md.ct_state & CS_TRACKED) &&
> > > > + (packet->md.ct_state & CS_ESTABLISHED)) {
> > > > + (*depth)++;
> > > > + emc_patch_established_conn(pmd, packet,
> now);
> > > > + (*depth)--;
> > > > + }
> > > > }
> > > >
> > > > (*depth)++;
> > > > dp_netdev_recirculate(pmd, packets_);
> > > > + DP_PACKET_BATCH_FOR_EACH (packet, packets_) {
> > > > + packet->md.recirc_id = 0;
> > > > + }
> > > > (*depth)--;
> > > >
> > > > return;
> > >
> > > [Joe] "Can you give some more detail about why resetting the
> > > recirc_id is
> > > correct here? Could/should this be applied as an independent
> change
> > > regardless of the rest of this series?"
> > >
> > > This addition is needed so that in handle_packet_upcall we add the
> > > ct(Skip)
> > > just to the FLOW_RECIRC and not to the FLOW_ORIG.
> > > I guess this could be also applied as an independent change?
> > >
> > > > @@ -5166,6 +5290,7 @@ dp_execute_cb(void *aux_, struct
> > > dp_packet_batch
> > > > *packets_,
> > > > const char *helper = NULL;
> > > > const uint32_t *setmark = NULL;
> > > > const struct ovs_key_ct_labels *setlabel = NULL;
> > > > + bool skip_ct = false;
> > > >
> > > > NL_ATTR_FOR_EACH_UNSAFE (b, left, nl_attr_get(a),
> > > > nl_attr_get_size(a)) {
> > > > @@ -5194,6 +5319,9 @@ dp_execute_cb(void *aux_, struct
> > > dp_packet_batch
> > > > *packets_,
> > > > /* Silently ignored, as userspace datapath does
> not
> > > > generate
> > > > * netlink events. */
> > > > break;
> > > > + case OVS_CT_ATTR_SKIP:
> > > > + skip_ct = true;
> > > > + break;
> > > > case OVS_CT_ATTR_NAT:
> > > > case OVS_CT_ATTR_UNSPEC:
> > > > case __OVS_CT_ATTR_MAX:
> > > > @@ -5201,6 +5329,10 @@ dp_execute_cb(void *aux_, struct
> > > dp_packet_batch
> > > > *packets_,
> > > > }
> > > > }
> > > >
> > > > + if (OVS_UNLIKELY(skip_ct)) {
> > > > + break;
> > > > + }
> > > > +
> > > > conntrack_execute(&dp->conntrack, packets_, aux->flow-
> > > >dl_type,
> > > > force,
> > > > commit, zone, setmark, setlabel,
> helper);
> > > > break;
> > > > diff --git a/lib/packets.h b/lib/packets.h
> > > > index 7dbf6dd..2c46a15 100644
> > > > --- a/lib/packets.h
> > > > +++ b/lib/packets.h
> > > > @@ -112,6 +112,8 @@ struct pkt_metadata {
> > > > struct flow_tnl tunnel; /* Encapsulating tunnel
> parameters.
> > > Note
> > > > that
> > > > * if 'ip_dst' == 0, the rest
> of
> > > the
> > > > fields may
> > > > * be uninitialized. */
> [Sugesh] I don’t see the initiation of these fields. Have you verified the
> cost of these fields in pkt_metadata for normal traffic,?. I mean the
> performance impact.?
[Antonio]
For sure this is an overhead for normal traffic.
> > > > + void *prev_emc_entry; /* EMC entry that this packet
> > > matched. */
> > > > + void *prev_flow; /* Flow pointed by the matching
> EMC
> > > > entry. */
> > > > };
> > > >
> > > > static inline void
> > > > --
> > > > 2.4.11
> > > >
> [Sugesh] I cannot apply the patch on latest master due to some conflicts.
> I am not sure , how does the stats get updated when you are combining the
> flows in datapath. Earlier there are two UUID for two flows and now only
> one UUID is being used.
> I don’t have a expertise in conntrack, However what would be the result
> when we have actions like
>
> table=0, priority=100,ct_state=-trk,ip actions=clone(push_vlan(4),
> trunc(200), output:2), ct(table=1)
> table=1, ct_state=+est+trk,ip,in_port=1 actions=output:2
>
>
> Also another example would be
> table=0, priority=100,ct_state=-trk,ip actions=push_vlan(4), ct(table=1)
> ,dec_ttl
> table=1, ct_state=+est+trk,ip,in_port=1 actions=output:2
>
[Antonio]
In general whenever you have some actions that imply recirculation 'before'
the ct action
actions=<other recirc here>, ct(zone=N,recirc)
it should work by combining
<other recirc here>
+ ct(zone=N) <=== recirc gets removed
<actions from recirc flow>
> As Joe suggested before, I think the logic of combine will be good to be
> in the ofproto-xlate layer than here. Though its userspace specific, it
> could be the same way how the userspace_tunnel handled there.
>
[Antonio] yes, that's interesting
>
> Also another concern is on the actions that does fields modification along
> with connection tracking. Probably it’s a valid use case with NAT. In this
> scenario, if the flow fields are changed by action, how does it possible
> to track back the original flow?
> Or am I missing anything ?
>
>
>
> > > > _______________________________________________
> > > > dev mailing list
> > > > [email protected]
> > > > https://urldefense.proofpoint.com/v2/url?u=https-
> > > 3A__mail.openvswitch.org_mailman_listinfo_ovs-
> > > 2Ddev&d=DwICAg&c=uilaK90D4TOVoH58JNXRgQ&r=BVhFA09CGX7JQ5Ih-
> > > uZnsw&m=nVfAqNkwbufuhX-dkxok67-OaTEtfFTDAj-
> > > Z39vSDZE&s=7SujU3lZVLCtG1pTSrZt1CMmmuv4iKnSkprKdkxSXZY&e=
> > > _______________________________________________
> > > dev mailing list
> > > [email protected]
> > > https://urldefense.proofpoint.com/v2/url?u=https-
> > > 3A__mail.openvswitch.org_mailman_listinfo_ovs-
> > > 2Ddev&d=DwICAg&c=uilaK90D4TOVoH58JNXRgQ&r=BVhFA09CGX7JQ5Ih-
> > > uZnsw&m=nVfAqNkwbufuhX-dkxok67-OaTEtfFTDAj-
> > > Z39vSDZE&s=7SujU3lZVLCtG1pTSrZt1CMmmuv4iKnSkprKdkxSXZY&e=
> > >
> >
> > _______________________________________________
> > dev mailing list
> > [email protected]
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev