Re: [ovs-dev] [PATCH 4/4] dp-packet: Use memcpy to copy dp_packet fields.

2017-06-28 Thread O Mahony, Billy
Hi Antonio,


> -Original Message-
> From: Fischetti, Antonio
> Sent: Friday, June 23, 2017 11:06 PM
> To: O Mahony, Billy <billy.o.mah...@intel.com>; d...@openvswitch.org
> Subject: RE: [ovs-dev] [PATCH 4/4] dp-packet: Use memcpy to copy
> dp_packet fields.
> 
> Hi Billy, thanks for your review.
> Replies inline.
> 
> /Antonio
> 
> > -Original Message-
> > From: O Mahony, Billy
> > Sent: Friday, June 23, 2017 2:27 PM
> > To: Fischetti, Antonio <antonio.fische...@intel.com>;
> > d...@openvswitch.org
> > Subject: RE: [ovs-dev] [PATCH 4/4] dp-packet: Use memcpy to copy
> > dp_packet fields.
> >
> > Hi Antonio,
> >
> > I'm not sure that this approach will work. Mainly, the memcpy will not
> > take account of any padding that the compiler may insert into the
> > struct.  Maybe if the struct were defined as packed it could address this
> > objection.
> 
> [AF] This patch is somewhat similar to the patch for initializing the
> pkt_metadata struct in pkt_metadata_init() at
> http://patchwork.ozlabs.org/patch/779696/
> 
> 
> >
> > Also anyone editing the structure in future would have to be aware that
> > these elements need to be kept contiguous in order for packet_clone to
> > keep working.
> 
> [AF] Agree, I should add a comment locally to the struct definition.
> 
> >
> > Or if the relevant fields here were all placed in their own nested
> > struct then sizeof that nested struct could be used in the memcpy call, as
> > sizeof the nested struct would account for whatever padding the compiler
> > inserted.

[[BO'M]] I think at a minimum this nesting of structures would have to be
done. Just adding a comment won't address the issue with compiler padding,
which could change based on compiler, compiler version, compiler flags, target
architecture, etc.
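
For what it's worth, a minimal sketch of that nesting idea is below (purely
illustrative: 'example_pkt' and its members are made up, not the real struct
dp_packet). The members to be cloned sit in one named sub-struct, so a single
memcpy over sizeof that sub-struct copies exactly what the compiler laid out,
padding included, and keeps working if members are later added inside it.

```c
/* Sketch only: 'example_pkt' and its members are hypothetical and are not
 * the real struct dp_packet. */
#include <stdint.h>
#include <string.h>

struct example_pkt {
    void *base;                 /* Buffer pointer: not copied by the clone. */
    struct {
        uint32_t size;          /* Members a clone should copy verbatim.    */
        uint16_t l3_offset;
        uint16_t l4_offset;
        uint32_t rss_hash;
    } fields;                   /* sizeof covers the compiler's padding too. */
};

static void
example_clone_fields(struct example_pkt *dst, const struct example_pkt *src)
{
    /* One copy of the nested struct replaces per-member assignments and
     * stays correct if members are added to 'fields' later on. */
    memcpy(&dst->fields, &src->fields, sizeof dst->fields);
}
```

The contiguity requirement then becomes structural rather than a comment the
next editor has to notice.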

> >
> > Does this change give much of a performance increase?
> 
> [AF] I tested this while working on the connection tracker. I used this
> particular set of flows - see the 4th line, with "action=ct(table=1),NORMAL" -
> to trigger a call to dp_packet_clone_with_headroom():
> 
> ovs-ofctl del-flows br0;
> ovs-ofctl add-flow br0 table=0,priority=1,action=drop;
> ovs-ofctl add-flow br0 table=0,priority=10,arp,action=normal;
> ovs-ofctl add-flow br0 table=0,priority=100,ip,ct_state=-trk,"action=ct(table=1),NORMAL";
> ovs-ofctl add-flow br0 table=1,in_port=1,ip,ct_state=+trk+new,"action=ct(commit),2";
> ovs-ofctl add-flow br0 table=1,in_port=1,ip,ct_state=+trk+est,"action=2";
> ovs-ofctl add-flow br0 table=1,in_port=2,ip,ct_state=+trk+new,"action=drop";
> ovs-ofctl add-flow br0 table=1,in_port=2,ip,ct_state=+trk+est,"action=1";
> 
> After running a Hotspot analysis with VTune for 60 seconds on the original
> code, dp_packet_clone_with_headroom ranked 2nd:
> Function                         CPU Time
> --------------------------------+---------
> __libc_malloc                      5.880s
> dp_packet_clone_with_headroom      4.530s
> emc_lookup                         4.050s
> free                               3.500s
> pthread_mutex_unlock               2.890s
> ...
> 
> After this change, the same function consumed fewer CPU cycles:
> Function                         CPU Time
> --------------------------------+---------
> __libc_malloc                      5.900s
> emc_lookup                         4.070s
> free                               4.010s
> dp_packet_clone_with_headroom      3.920s
> pthread_mutex_unlock               3.060s
> 
> 

[[BO'M]]
So we see a 0.5s saving (out of the 60s test) on the time spent in the changed
function - almost 1%. However, there is also a change of about 0.5s in the time
attributed to free(), which we would not expect to change with this patch. So
hopefully these differences are not just sampling error and remain stable
across repeated or longer runs. Is there any change in the cycles-per-packet
statistic with this patch? That could be a more reliable metric than VTune,
which has some sampling error involved.

Also, is dp_packet_clone_with_headroom hit for all packets, or is it only used
when conntrack is enabled?

> 
> 
> >
> > Also I commented on 1/4 of this patchset about a cover letter - but if the
> > patchset members are independent of each other then maybe they should just
> > be separate patches.
> 
> [AF] I grouped these patches together because they are all performance
> optimizations, focused mainly on the conntrack use case. Maybe a better
> choice would have been to split them into separate patches.
> 
> 

Re: [ovs-dev] [PATCH 4/4] dp-packet: Use memcpy to copy dp_packet fields.

2017-06-23 Thread Fischetti, Antonio
Hi Billy, thanks for your review.
Replies inline.

/Antonio

> -Original Message-
> From: O Mahony, Billy
> Sent: Friday, June 23, 2017 2:27 PM
> To: Fischetti, Antonio <antonio.fische...@intel.com>; d...@openvswitch.org
> Subject: RE: [ovs-dev] [PATCH 4/4] dp-packet: Use memcpy to copy dp_packet
> fields.
> 
> Hi Antonio,
> 
> I'm not sure that this approach will work. Mainly, the memcpy will not take
> account of any padding that the compiler may insert into the struct.  Maybe
> if the struct were defined as packed it could address this objection.

[AF] This patch is somewhat similar to the patch for initializing 
the pkt_metadata struct in pkt_metadata_init() at
http://patchwork.ozlabs.org/patch/779696/


> 
> Also anyone editing the structure in future would have to be aware that
> these elements need to be kept contiguous in order for packet_clone to
> keep working.

[AF] Agree, I should add a comment locally to the struct definition. 

> 
> Or if the relevant fields here were all placed in their own nested struct
> then sizeof that nested struct could be used in the memcpy call, as sizeof
> the nested struct would account for whatever padding the compiler inserted.
> 
> Does this change give much of a performance increase?

[AF] I tested this while working on the connection tracker. I used this
particular set of flows - see the 4th line, with "action=ct(table=1),NORMAL" -
to trigger a call to dp_packet_clone_with_headroom():

ovs-ofctl del-flows br0;
ovs-ofctl add-flow br0 table=0,priority=1,action=drop;
ovs-ofctl add-flow br0 table=0,priority=10,arp,action=normal;
ovs-ofctl add-flow br0 table=0,priority=100,ip,ct_state=-trk,"action=ct(table=1),NORMAL";
ovs-ofctl add-flow br0 table=1,in_port=1,ip,ct_state=+trk+new,"action=ct(commit),2";
ovs-ofctl add-flow br0 table=1,in_port=1,ip,ct_state=+trk+est,"action=2";
ovs-ofctl add-flow br0 table=1,in_port=2,ip,ct_state=+trk+new,"action=drop";
ovs-ofctl add-flow br0 table=1,in_port=2,ip,ct_state=+trk+est,"action=1";

After running a Hotspot analysis with VTune for 60 seconds on the original
code, dp_packet_clone_with_headroom ranked 2nd:
Function                         CPU Time
--------------------------------+---------
__libc_malloc                      5.880s
dp_packet_clone_with_headroom      4.530s
emc_lookup                         4.050s
free                               3.500s
pthread_mutex_unlock               2.890s
...

After this change, the same function consumed fewer CPU cycles:
Function                         CPU Time
--------------------------------+---------
__libc_malloc                      5.900s
emc_lookup                         4.070s
free                               4.010s
dp_packet_clone_with_headroom      3.920s
pthread_mutex_unlock               3.060s




> 
> Also I commented on 1/4 of this patchset about a cover letter - but if the
> patchset members are independent of each other then maybe they should just
> be separate patches.

[AF] I grouped these patches together because they are all performance
optimizations, focused mainly on the conntrack use case. Maybe a better
choice would have been to split them into separate patches.

> 
> Regards,
> Billy.
> 
> > -Original Message-
> > From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> > boun...@openvswitch.org] On Behalf Of antonio.fische...@intel.com
> > Sent: Monday, June 19, 2017 11:12 AM
> > To: d...@openvswitch.org
> > Subject: [ovs-dev] [PATCH 4/4] dp-packet: Use memcpy to copy dp_packet
> > fields.
> >
> > From: Antonio Fischetti <antonio.fische...@intel.com>
> >
> > memcpy replaces the single copies inside
> > dp_packet_clone_with_headroom().
> >
> > Signed-off-by: Antonio Fischetti <antonio.fische...@intel.com>
> > ---
> >  lib/dp-packet.c | 17 +
> >  1 file changed, 9 insertions(+), 8 deletions(-)
> >
> > diff --git a/lib/dp-packet.c b/lib/dp-packet.c
> > index 67aa406..5b1d416 100644
> > --- a/lib/dp-packet.c
> > +++ b/lib/dp-packet.c
> > @@ -157,7 +157,7 @@ dp_packet_clone(const struct dp_packet *buffer)
> >      return dp_packet_clone_with_headroom(buffer, 0);
> >  }
> >
> > -/* Creates and returns a new dp_packet whose data are copied from 'buffer'.   The
> > +/* Creates and returns a new dp_packet whose data are copied from 'buffer'. The
> >   * returned dp_packet will additionally have 'headroom' bytes of headroom. */
> >  struct dp_packet *
> >  dp_packet_clone_with_headroom(const struct dp_packet *buffer, size_t headroom)
> > @@ -167,13 +167,14 @@ dp_packet_clone_with_headroom(const struct dp_packet *buffer, size_t headroom)
> >  new_buf
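
The quoted diff is cut off above, so as a rough illustration of the technique
the commit message describes - collapsing several per-member assignments into
one memcpy over a contiguous span - here is a self-contained sketch;
'demo_pkt', its members, and 'demo_clone_fields' are hypothetical and do not
reproduce the real dp_packet definition or the actual patch.

```c
/* Hypothetical sketch of the memcpy-based clone; not the real dp_packet code. */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct demo_pkt {
    void    *base;        /* Not cloned: the new packet gets its own buffer. */
    uint32_t allocated;   /* Not cloned: set from the new allocation.        */
    /* --- members below are cloned and must stay contiguous --- */
    uint32_t size;
    uint16_t l3_offset;
    uint16_t l4_offset;
    uint32_t rss_hash;
    uint32_t flags;
    /* --- end of cloned members --- */
};

static void
demo_clone_fields(struct demo_pkt *dst, const struct demo_pkt *src)
{
    /* Per-member version (what such a patch replaces):
     *     dst->size = src->size;
     *     dst->l3_offset = src->l3_offset;
     *     ...one assignment per member...
     *
     * Single-memcpy version: copy everything from 'size' up to and including
     * 'flags' in one call.  offsetof() keeps the length in sync with the
     * compiler's layout of this span, but a member added or moved outside the
     * span silently breaks the clone - hence the review comments about
     * contiguity, padding, and nesting the members instead. */
    size_t start = offsetof(struct demo_pkt, size);
    size_t end   = offsetof(struct demo_pkt, flags) + sizeof src->flags;

    memcpy((char *) dst + start, (const char *) src + start, end - start);
}
```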