Hi Antonio,

> -----Original Message-----
> From: Fischetti, Antonio
> Sent: Friday, June 23, 2017 11:06 PM
> To: O Mahony, Billy <billy.o.mah...@intel.com>; d...@openvswitch.org
> Subject: RE: [ovs-dev] [PATCH 4/4] dp-packet: Use memcpy to copy
> dp_packet fields.
> 
> Hi Billy, thanks for your review.
> Replies inline.
> 
> /Antonio
> 
> > -----Original Message-----
> > From: O Mahony, Billy
> > Sent: Friday, June 23, 2017 2:27 PM
> > To: Fischetti, Antonio <antonio.fische...@intel.com>;
> > d...@openvswitch.org
> > Subject: RE: [ovs-dev] [PATCH 4/4] dp-packet: Use memcpy to copy
> > dp_packet fields.
> >
> > Hi Antonio,
> >
> > I'm not sure that this approach will work. Mainly, the memcpy will not
> > take account of any padding that the compiler may insert into the
> > struct. Maybe if the struct were defined as packed that would address
> > this objection.
> 
> [AF] This patch is somewhat similar to the patch for initializing the
> pkt_metadata struct in pkt_metadata_init() at
> http://patchwork.ozlabs.org/patch/779696/
> 
> 
> >
> > Also anyone editing the structure in future would have to be aware that
> > these elements need to be kept contiguous in order for packet_clone to
> > keep working.
> 
> [AF] Agree, I should add a comment locally to the struct definition.
> 
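[[BO'M]] A comment helps, but a compile-time check would catch reordering or
padding automatically. Below is a minimal self-contained sketch of the idea.
The structs here are simplified stand-ins, not the real struct dp_packet from
lib/dp-packet.h, and in OVS the BUILD_ASSERT_DECL macro could be used in place
of C11 static_assert:

#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

struct pkt_metadata_sketch { uint32_t recirc_id; };   /* stand-in for md */

struct dp_packet_sketch {
    void *base;                     /* ...other fields... */
    uint16_t l2_pad_size;           /* first field of the copied block */
    uint16_t l2_5_ofs;
    uint16_t l3_ofs;
    uint16_t l4_ofs;
    uint32_t cutlen;
    uint32_t packet_type;
    struct pkt_metadata_sketch md;  /* last field of the copied block */
};

/* Compilation fails if a field is moved out of the block or padding
 * appears inside it, because then the span from the first field to the
 * end of the last one no longer equals the sum of the field sizes. */
static_assert(offsetof(struct dp_packet_sketch, md)
              + sizeof(struct pkt_metadata_sketch)
              - offsetof(struct dp_packet_sketch, l2_pad_size)
              == 4 * sizeof(uint16_t) + 2 * sizeof(uint32_t)
                 + sizeof(struct pkt_metadata_sketch),
              "copied dp_packet fields must be contiguous");
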
> >
> > Or if the relevant fields here were all placed in their own nested
> > struct, then sizeof that nested struct could be used in the memcpy call,
> > as the sizeof the nested struct would account for whatever padding the
> > compiler inserted.

[[BO'M]] I think at a minimum this nesting of structures would have to be
done. Just adding a comment won't address the issue with compiler padding,
which can change based on compiler, compiler version, compiler flags, target
architecture, etc.
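
One possible shape for that, using the same stand-in names as the sketch
above (illustrative only, not the real lib/dp-packet.h layout): group the
copied fields into a named nested struct, so that sizeof on the group
automatically includes any padding the compiler puts inside it.

#include <stdint.h>
#include <string.h>

struct pkt_metadata_sketch { uint32_t recirc_id; };   /* stand-in for md */

struct dp_packet_sketch {
    void *base;
    /* Everything dp_packet_clone_with_headroom() copies in one shot,
     * grouped so a single memcpy of sizeof(fields) is always correct. */
    struct {
        uint16_t l2_pad_size;
        uint16_t l2_5_ofs;
        uint16_t l3_ofs;
        uint16_t l4_ofs;
        uint32_t cutlen;
        uint32_t packet_type;
        struct pkt_metadata_sketch md;
    } fields;
};

static void
clone_fields(struct dp_packet_sketch *dst, const struct dp_packet_sketch *src)
{
    /* sizeof covers internal padding, so this stays correct even if the
     * compiler, flags, or target ABI change the layout of the group. */
    memcpy(&dst->fields, &src->fields, sizeof dst->fields);
}

The trade-off is that every existing accessor changes (buffer->l3_ofs becomes
buffer->fields.l3_ofs), so this would touch a lot of call sites.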

> >
> > Does this change give much of a performance increase?
> 
> I tested this while working on the connection tracker. I was using this
> particular set of flows - see the 4th line, with
> "action=ct(table=1),NORMAL" - to trigger a call to
> dp_packet_clone_with_headroom():
> 
> ovs-ofctl del-flows br0;
> ovs-ofctl add-flow br0 table=0,priority=1,action=drop;
> ovs-ofctl add-flow br0 table=0,priority=10,arp,action=normal;
> ovs-ofctl add-flow br0 table=0,priority=100,ip,ct_state=-trk,"action=ct(table=1),NORMAL";
> ovs-ofctl add-flow br0 table=1,in_port=1,ip,ct_state=+trk+new,"action=ct(commit),2";
> ovs-ofctl add-flow br0 table=1,in_port=1,ip,ct_state=+trk+est,"action=2";
> ovs-ofctl add-flow br0 table=1,in_port=2,ip,ct_state=+trk+new,"action=drop";
> ovs-ofctl add-flow br0 table=1,in_port=2,ip,ct_state=+trk+est,"action=1";
> 
> After running a Hotspot analysis with VTune for 60 secs on the original
> code, dp_packet_clone_with_headroom ranked in 2nd place:
> Function                       CPU Time
> -----------------------------+-----------
> __libc_malloc                    5.880s
> dp_packet_clone_with_headroom    4.530s
> emc_lookup                       4.050s
> free                             3.500s
> pthread_mutex_unlock             2.890s
> ...
> 
> After this change, instead, the same function consumed fewer CPU cycles:
> Function                       CPU Time
> -----------------------------+-----------
> __libc_malloc                    5.900s
> emc_lookup                       4.070s
> free                             4.010s
> dp_packet_clone_with_headroom    3.920s
> pthread_mutex_unlock             3.060s
> 
> 

[[BO'M]]
So we see a ~0.6s saving (out of the 60s test) in the time spent in the changed
function - about 1%. However, there is also a ~0.5s increase in the time
attributed to free(), which we'd expect not to change with this patch. So
hopefully these differences are not just sampling error and remain stable
across invocations, or across longer ones. Is there any change in the
cycles-per-packet statistic with this change? That could be a more reliable
metric than VTune, which has some sampling error involved.

Also, is dp_packet_clone_with_headroom hit for all packets, or is it only used
when conntrack is enabled?

> 
> 
> >
> > Also, I commented on 1/4 of this patchset about a cover letter - but if
> > the patchset members are independent of each other, then maybe they
> > should just be separate patches.
> 
> [AF] I grouped these patches together because they are all performance
> optimizations focused mainly on the conntrack use case. Maybe a better
> choice would have been to split them into separate patches.
> 
> >
> > Regards,
> > Billy.
> >
> > > -----Original Message-----
> > > From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> > > boun...@openvswitch.org] On Behalf Of antonio.fische...@intel.com
> > > Sent: Monday, June 19, 2017 11:12 AM
> > > To: d...@openvswitch.org
> > > Subject: [ovs-dev] [PATCH 4/4] dp-packet: Use memcpy to copy
> dp_packet
> > > fields.
> > >
> > > From: Antonio Fischetti <antonio.fische...@intel.com>
> > >
> > > memcpy replaces the single copies inside
> > > dp_packet_clone_with_headroom().
> > >
> > > Signed-off-by: Antonio Fischetti <antonio.fische...@intel.com>
> > > ---
> > >  lib/dp-packet.c | 17 +++++++++--------
> > >  1 file changed, 9 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/lib/dp-packet.c b/lib/dp-packet.c
> > > index 67aa406..5b1d416 100644
> > > --- a/lib/dp-packet.c
> > > +++ b/lib/dp-packet.c
> > > @@ -157,7 +157,7 @@ dp_packet_clone(const struct dp_packet *buffer)
> > >      return dp_packet_clone_with_headroom(buffer, 0);
> > >  }
> > >  
> > > -/* Creates and returns a new dp_packet whose data are copied from 'buffer'.   The
> > > +/* Creates and returns a new dp_packet whose data are copied from 'buffer'. The
> > >   * returned dp_packet will additionally have 'headroom' bytes of headroom. */
> > >  struct dp_packet *
> > >  dp_packet_clone_with_headroom(const struct dp_packet *buffer, size_t headroom)
> > > @@ -167,13 +167,14 @@ dp_packet_clone_with_headroom(const struct dp_packet *buffer, size_t headroom)
> > >      new_buffer = dp_packet_clone_data_with_headroom(dp_packet_data(buffer),
> > >                                                      dp_packet_size(buffer),
> > >                                                      headroom);
> > > -    new_buffer->l2_pad_size = buffer->l2_pad_size;
> > > -    new_buffer->l2_5_ofs = buffer->l2_5_ofs;
> > > -    new_buffer->l3_ofs = buffer->l3_ofs;
> > > -    new_buffer->l4_ofs = buffer->l4_ofs;
> > > -    new_buffer->md = buffer->md;
> > > -    new_buffer->cutlen = buffer->cutlen;
> > > -    new_buffer->packet_type = buffer->packet_type;
> > > +    memcpy(&new_buffer->l2_pad_size, &buffer->l2_pad_size,
> > > +            sizeof(new_buffer->l2_pad_size) +
> > > +            sizeof(new_buffer->l2_5_ofs) +
> > > +            sizeof(new_buffer->l3_ofs) +
> > > +            sizeof(new_buffer->l4_ofs) +
> > > +            sizeof(new_buffer->cutlen) +
> > > +            sizeof(new_buffer->packet_type) +
> > > +            sizeof(new_buffer->md));
> > >  #ifdef DPDK_NETDEV
> > >      new_buffer->mbuf.ol_flags = buffer->mbuf.ol_flags;
> > >  #else
> > > -- 
> > > 2.4.11
> > >