Re: [ovs-dev] [PATCH v3 06/12] cmap: Remove prefetching in cmap_find_batch().
>-----Original Message-----
>From: Daniele Di Proietto [mailto:[email protected]]
>Sent: Tuesday, October 18, 2016 4:07 AM
>To: Bodireddy, Bhanuprakash
>Cc: [email protected]
>Subject: Re: [ovs-dev] [PATCH v3 06/12] cmap: Remove prefetching in
>cmap_find_batch().
>
>2016-10-14 7:37 GMT-07:00 Bhanuprakash Bodireddy:
>Prefetching the data into the caches isn't improving the performance of
>cmap_find_batch(). Moreover, it was found that there is a slight improvement
>in performance without prefetching.
>
>This patch removes prefetching from cmap_find_batch().
>
>Signed-off-by: Bhanuprakash Bodireddy
>Co-authored-by: Antonio Fischetti
>Signed-off-by: Antonio Fischetti
>
>I tested this patch in isolation, and on my system I didn't notice any
>improvement for a single flow (with EMC disabled); instead, I noticed a
>slight drop with 128 flows in the classifier.
>Probably this is because I haven't yet applied the first patch of the
>series (the one that increases the batch to 32), so I guess I'll defer this
>patch until we can apply the rest of the series.
>Also, if you guys see an improvement (and since you got some evidence with
>VTune), I don't think it matters that on one particular system (mine) I
>can't see any benefit.

I am testing this on Haswell, and VTune confirmed our observation. Also,
prefetching is done at four places in cmap_find_batch(), and at two of them
the prefetch is issued just before the data is accessed. As a prefetch
instruction has some overhead, prefetching should be done far enough in
advance to yield performance gains. Prefetching too early can also have a
negative effect, as the prefetched data can be evicted by other accesses.
We played around a bit and found that removing the prefetching doesn't impact
performance, and hence submitted this patch.

Regards,
Bhanu Prakash.

>
>Thanks,
>Daniele

___
dev mailing list
[email protected]
http://openvswitch.org/mailman/listinfo/dev
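To make the timing argument above concrete, here is a minimal sketch of a loop with a tunable prefetch distance. It is not taken from lib/cmap.c: the record type, the function name sum_indirect() and the distance parameter are all hypothetical, and __builtin_prefetch() assumes GCC or Clang. With a distance of 1 the prefetch is issued just one iteration before the access, which is the "too late" case; a very large distance risks the line being evicted before it is used. The best value, if any, has to be measured on the target machine.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record, padded to 64 bytes so that each one roughly
 * occupies its own cache line.  Used only for this illustration. */
struct record {
    uint64_t key;
    uint64_t value;
    char pad[48];
};

/* Sums the values of the records addressed by idx[0..n-1].
 *
 * 'distance' is the number of iterations to prefetch ahead; 0 disables
 * prefetching.  The prefetch is only a hint and never changes the result,
 * so the same function can be timed with different distances. */
static uint64_t
sum_indirect(const struct record *table, const uint32_t *idx, size_t n,
             size_t distance)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < n; i++) {
        if (distance && i + distance < n) {
            /* Request the record needed 'distance' iterations from now. */
            __builtin_prefetch(&table[idx[i + distance]]);
        }
        sum += table[idx[i]].value;
    }
    return sum;
}
```

Timing sum_indirect() with distance 0 against a few non-zero distances on a table larger than the last-level cache is one way to check, for a given CPU, whether the prefetch pays for its own overhead.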
Re: [ovs-dev] [PATCH v3 06/12] cmap: Remove prefetching in cmap_find_batch().
2016-10-14 7:37 GMT-07:00 Bhanuprakash Bodireddy <[email protected]>:

> Prefetching the data into the caches isn't improving the performance of
> cmap_find_batch(). Moreover, it was found that there is a slight improvement
> in performance without prefetching.
>
> This patch removes prefetching from cmap_find_batch().
>
> Signed-off-by: Bhanuprakash Bodireddy
> Co-authored-by: Antonio Fischetti
> Signed-off-by: Antonio Fischetti
>

I tested this patch in isolation, and on my system I didn't notice any
improvement for a single flow (with EMC disabled); instead, I noticed a
slight drop with 128 flows in the classifier.

Probably this is because I haven't yet applied the first patch of the
series (the one that increases the batch to 32), so I guess I'll defer this
patch until we can apply the rest of the series.

Also, if you guys see an improvement (and since you got some evidence with
VTune), I don't think it matters that on one particular system (mine) I
can't see any benefit.

Thanks,
Daniele
[ovs-dev] [PATCH v3 06/12] cmap: Remove prefetching in cmap_find_batch().
Prefetching the data into the caches isn't improving the performance of
cmap_find_batch(). Moreover, it was found that there is a slight improvement
in performance without prefetching.
This patch removes prefetching from cmap_find_batch().
Signed-off-by: Bhanuprakash Bodireddy
Co-authored-by: Antonio Fischetti
Signed-off-by: Antonio Fischetti
---
 lib/cmap.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/lib/cmap.c b/lib/cmap.c
index 8c7312d..8097b56 100644
--- a/lib/cmap.c
+++ b/lib/cmap.c
@@ -393,11 +393,10 @@ cmap_find_batch(const struct cmap *cmap, unsigned long map,
     const struct cmap_bucket *b2s[sizeof map * CHAR_BIT];
     uint32_t c1s[sizeof map * CHAR_BIT];
 
-    /* Compute hashes and prefetch 1st buckets. */
+    /* Compute hashes. */
     ULLONG_FOR_EACH_1(i, map) {
         h1s[i] = rehash(impl, hashes[i]);
         b1s[i] = &impl->buckets[h1s[i] & impl->mask];
-        OVS_PREFETCH(b1s[i]);
     }
     /* Lookups, Round 1. Only look up at the first bucket. */
     ULLONG_FOR_EACH_1(i, map) {
@@ -411,15 +410,13 @@ cmap_find_batch(const struct cmap *cmap, unsigned long map,
         } while (OVS_UNLIKELY(counter_changed(b1, c1)));
 
         if (!node) {
-            /* Not found (yet); Prefetch the 2nd bucket. */
+            /* Not found (yet). */
             b2s[i] = &impl->buckets[other_hash(h1s[i]) & impl->mask];
-            OVS_PREFETCH(b2s[i]);
             c1s[i] = c1; /* We may need to check this after Round 2. */
             continue;
         }
         /* Found. */
         ULLONG_SET0(map, i); /* Ignore this on round 2. */
-        OVS_PREFETCH(node);
         nodes[i] = node;
     }
     /* Round 2. Look into the 2nd bucket, if needed. */
@@ -453,7 +450,6 @@ cmap_find_batch(const struct cmap *cmap, unsigned long map,
             continue;
         }
 found:
-        OVS_PREFETCH(node);
         nodes[i] = node;
     }
     return result;
--
2.4.11
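
For readers who do not have lib/cmap.c at hand, the two-round structure visible in the hunks above (try the first candidate bucket for every key selected by the bitmap, then the second bucket only for keys still missing) can be sketched roughly as follows. This is a toy model under stated assumptions, not the OVS implementation: toy_bucket, toy_h1(), toy_h2(), toy_match() and toy_find_batch() are hypothetical names, each bucket holds a single key, and the counter checks that protect concurrent readers in the real code are left out.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define N_BUCKETS 8             /* Power of two, so that masking works. */
#define MASK (N_BUCKETS - 1)

/* Toy bucket holding a single key; 'used' marks an occupied slot. */
struct toy_bucket {
    uint32_t key;
    int value;
    int used;
};

/* Two hypothetical hash functions selecting the candidate buckets. */
static uint32_t toy_h1(uint32_t key) { return key & MASK; }
static uint32_t toy_h2(uint32_t key) { return (key >> 3) & MASK; }

/* Returns 'b' if it holds 'key', otherwise NULL. */
static const struct toy_bucket *
toy_match(const struct toy_bucket *b, uint32_t key)
{
    return b->used && b->key == key ? b : NULL;
}

/* Looks up keys[i] for every bit i set in 'map' and stores each hit in
 * found[i].  Returns 'map' with the bits of the missing keys cleared. */
static unsigned long
toy_find_batch(const struct toy_bucket *table, unsigned long map,
               const uint32_t keys[], const struct toy_bucket *found[])
{
    unsigned long result = map;
    size_t n_bits = sizeof map * CHAR_BIT;

    /* Round 1: only look in the first candidate bucket. */
    for (size_t i = 0; i < n_bits; i++) {
        if (!(map & (1UL << i))) {
            continue;
        }
        const struct toy_bucket *b = toy_match(&table[toy_h1(keys[i])],
                                               keys[i]);
        if (b) {
            found[i] = b;
            map &= ~(1UL << i);         /* Ignore this key in round 2. */
        }
    }
    /* Round 2: look in the second bucket for keys still pending. */
    for (size_t i = 0; i < n_bits; i++) {
        if (!(map & (1UL << i))) {
            continue;
        }
        const struct toy_bucket *b = toy_match(&table[toy_h2(keys[i])],
                                               keys[i]);
        if (b) {
            found[i] = b;
        } else {
            result &= ~(1UL << i);      /* Not found: fix the result. */
        }
    }
    return result;
}
```

In this shape, the prefetches removed by the patch would sit at the end of the hash computation and inside the two rounds, that is, close to the accesses they are meant to cover.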
