[lng-odp] [Bug 3201] Performance degradation due to no-copy packet reference commits

2017-09-14 Thread bugzilla-daemon
https://bugs.linaro.org/show_bug.cgi?id=3201

Bill Fischofer  changed:

What        |Removed      |Added
----------------------------------------
Status      |IN_PROGRESS  |RESOLVED
Resolution  |---          |FIXED

--- Comment #7 from Bill Fischofer  ---
Resolved by PR #170

-- 
You are receiving this mail because:
You are on the CC list for the bug.

[lng-odp] [Bug 3201] Performance degradation due to no-copy packet reference commits

2017-08-31 Thread bugzilla-daemon
https://bugs.linaro.org/show_bug.cgi?id=3201

--- Comment #6 from Bill Fischofer  ---
Expecting update from Petri on packet rework.


[lng-odp] [Bug 3201] Performance degradation due to no-copy packet reference commits

2017-08-17 Thread bugzilla-daemon
https://bugs.linaro.org/show_bug.cgi?id=3201

--- Comment #5 from Bill Fischofer  ---
Waiting for feedback from Petri. He's reworking the packet internals so will
reevaluate after that work is completed.


[lng-odp] [Bug 3201] Performance degradation due to no-copy packet reference commits

2017-08-14 Thread bugzilla-daemon
https://bugs.linaro.org/show_bug.cgi?id=3201

Bill Fischofer  changed:

What            |Removed      |Added
--------------------------------------------
Status          |UNCONFIRMED  |IN_PROGRESS
Ever confirmed  |0            |1

--- Comment #4 from Bill Fischofer  ---
PR https://github.com/Linaro/odp/pull/125 posted to address this issue, at
least in part.


[lng-odp] [Bug 3201] Performance degradation due to no-copy packet reference commits

2017-08-09 Thread bugzilla-daemon
https://bugs.linaro.org/show_bug.cgi?id=3201

--- Comment #3 from Bill Fischofer  ---
Here's an example of what we're dealing with.

If you look at the delta in the path for odp_packet_alloc() you see that it
consists of exactly three additional odp_packet_hdr_t field assignments. The
rest of the code is identical before and after this patch set.

In init_segments() the pkt_hdr->ref_count field is set to 1. This is done via
direct assignment rather than an atomic op since at alloc time there can be no
other references to the packet. 

Then in packet_init() there are two additional assignments:

pkt_hdr->unshared_len = len;
pkt_hdr->ref_hdr = NULL;

So that's a total of three additional assignments for single-segment packet
allocs. For multi-segment allocs the init_segments() routine would set the
ref_count to 1 in each of the additional segments, so again a trivial delta.
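
The assignments described above can be sketched as follows. This is only an
illustration, not the actual ODP source: the struct is a hypothetical,
heavily reduced stand-in for odp_packet_hdr_t, the field types are assumed,
and the sketch_* function names are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for odp_packet_hdr_t. Only the three
 * fields named in the comment are modelled; their types are assumptions. */
typedef struct sketch_pkt_hdr {
	uint32_t frame_len;             /* illustrative pre-existing field */
	uint32_t ref_count;             /* added by the patch set */
	uint32_t unshared_len;          /* added by the patch set */
	struct sketch_pkt_hdr *ref_hdr; /* added by the patch set */
} sketch_pkt_hdr_t;

/* Stand-in for init_segments(): one extra store per segment header. */
static void sketch_init_segments(sketch_pkt_hdr_t *pkt_hdr)
{
	assert(pkt_hdr != NULL);
	/* Plain store rather than an atomic op: at alloc time no other
	 * reference to the packet can exist yet. */
	pkt_hdr->ref_count = 1;
}

/* Stand-in for packet_init(): two extra stores. */
static void sketch_packet_init(sketch_pkt_hdr_t *pkt_hdr, uint32_t len)
{
	assert(pkt_hdr != NULL);
	pkt_hdr->frame_len = len;    /* illustrative pre-existing work */
	pkt_hdr->unshared_len = len; /* extra assignment 2 */
	pkt_hdr->ref_hdr = NULL;     /* extra assignment 3 */
}

/* Single-segment alloc path: three additional stores in total. */
static void sketch_alloc_path(sketch_pkt_hdr_t *pkt_hdr, uint32_t len)
{
	sketch_init_segments(pkt_hdr);
	sketch_packet_init(pkt_hdr, len);
}
```

Under these assumptions the added work is three plain stores to a header
that the alloc path is already writing to.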

The total pathlength for packet allocation is well over 100 instructions, and
yet the microbenchmark is reporting a ~40% degradation introduced by these
three additional assignments. I don't see how such a measurement is possible.
Any theories to explain this?


[lng-odp] [Bug 3201] Performance degradation due to no-copy packet reference commits

2017-08-09 Thread bugzilla-daemon
https://bugs.linaro.org/show_bug.cgi?id=3201

Bill Fischofer  changed:

What  |Removed  |Added
----------------------------------------------------
CC    |         |bogdan.pric...@linaro.org,
      |         |dmitry.ereminsolenikov@linaro.org

--- Comment #2 from Bill Fischofer  ---
Adding Bogdan to the interest list since he has some experience in this area.
Adding Dmitry because he has some good insights as well.


[lng-odp] [Bug 3201] Performance degradation due to no-copy packet reference commits

2017-08-09 Thread bugzilla-daemon
https://bugs.linaro.org/show_bug.cgi?id=3201

--- Comment #1 from Bill Fischofer  ---
Thanks, Petri.

The L2fwd numbers are things we should be able to work with. Do you have any
sort of "hot spot" analysis to say where the extra pathlength is coming from in
those runs? The intersect with the DPDK zero-copy stuff is also interesting.

On the microbenchmarks, how repeatable are those numbers? I ask because the
differences even within a single test seem non-intuitive. For example, both
alloc and free tests individually show measurable degradation, but when
combined (alloc_free test) there is either no degradation or measured
improvement. How is that possible?

Similarly, packet_pull_tail is reporting a 77% improvement on 64 byte packets
and a 39% drop on 128 byte packets. Again I don't see how this is possible
since the code is not sensitive to packet size--the code path deltas should be
identical.

Or consider:
bench_packet_l4_offset_set  -39 %   -39 %   83 %   -37 %   -41 %   -37 %

Why should 256-byte packets get a significant boost when other sizes show
degradation?

Or again:
bench_packet_copy_from_mem  -2 %   0 %   -28 %   0 %   -15 %   10 %

These results are all over the map for no obvious reason.

These anomalies suggest that the microbenchmarks themselves are exhibiting some
randomness which may be obscuring what we're trying to measure / tune.
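
One rough way to check repeatability is to run each microbenchmark several
times before and after the patch set and see whether the two sets of results
even separate. The helper below is a minimal sketch of that idea, not part of
the ODP bench tools: the names run_stats_t, run_stats(), spread_pct() and
ranges_disjoint() are hypothetical, and the min/max overlap test is a
deliberately crude heuristic rather than a proper statistical test.

```c
#include <assert.h>
#include <stddef.h>

/* Summary of repeated runs of one benchmark (e.g. cycles per call). */
typedef struct {
	double min;
	double max;
	double mean;
} run_stats_t;

/* Compute min, max and mean over n repeated measurements. */
static run_stats_t run_stats(const double *samples, size_t n)
{
	assert(samples != NULL && n > 0);

	run_stats_t s = { samples[0], samples[0], 0.0 };
	double sum = 0.0;

	for (size_t i = 0; i < n; i++) {
		if (samples[i] < s.min)
			s.min = samples[i];
		if (samples[i] > s.max)
			s.max = samples[i];
		sum += samples[i];
	}
	s.mean = sum / (double)n;
	return s;
}

/* Run-to-run spread expressed as a percentage of the mean. */
static double spread_pct(run_stats_t s)
{
	return (s.max - s.min) / s.mean * 100.0;
}

/* Returns 1 when the before/after ranges do not overlap at all, i.e. the
 * reported delta is larger than the observed run-to-run noise. */
static int ranges_disjoint(run_stats_t before, run_stats_t after)
{
	return before.max < after.min || after.max < before.min;
}
```

If spread_pct() for a single configuration is comparable to the delta being
reported, or ranges_disjoint() returns 0, the difference is likely within the
noise of the benchmark itself.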
