Hmm.
Forgive my ignorance, but isn't memcmp() on structs pretty prone to give
incorrect != results, given that there may be padding between members in
structs and that IIRC gcc struct assignment is member-wise?
What happens if there's padding between the jit_context and variant
members of
On Wed, 2011-06-29 at 16:16 -0700, Corbin Simpson wrote:
Okay, so maybe I'm failing to recognize the exact situation here, but
wouldn't it be possible to mark the FS state with a serial number and
just compare those? Or are these FS states not CSO-cached?
No, the struct being compared is
On Thu, 2011-06-30 at 03:36 +0200, Roland Scheidegger wrote:
Ok in fact there's a gcc bug about memcmp:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052
In short gcc's memcmp builtin is totally lame and loses to glibc's
memcmp (including call overhead, no knowledge about alignment etc.) even
Great work Roland! And thanks to Ajax for finding this hot spot.
We use memcmp a lot -- all CSO caching, so we should use this everywhere.
We should also code an SSE2 version with intrinsics for x86-64, which is
guaranteed to always have SSE2.
Jose
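A rough sketch of what such an SSE2 equality check could look like, for illustration only. The name blocks_differ is hypothetical and is not taken from any patch in this thread; the code only reports whether two blocks differ, not their ordering, and falls back to plain memcmp for the tail or when SSE2 is not available.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: returns 1 if the two blocks differ, 0 if equal. */
static int blocks_differ(const void *a, const void *b, size_t size)
{
    const unsigned char *pa = a;
    const unsigned char *pb = b;
    size_t i = 0;

    assert(a != NULL && b != NULL);

#if defined(__SSE2__)
    /* Compare 16 bytes per iteration using unaligned loads. */
    for (; i + 16 <= size; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(pa + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(pb + i));
        /* cmpeq sets each equal byte to 0xFF; movemask collects one bit
         * per byte, so 0xFFFF means all 16 bytes matched. */
        if (_mm_movemask_epi8(_mm_cmpeq_epi8(va, vb)) != 0xFFFF)
            return 1;
    }
#endif

    /* Remaining bytes (or everything, without SSE2). */
    return memcmp(pa + i, pb + i, size - i) != 0;
}
```

Whether this actually beats glibc's memcmp would need to be measured on the workload in question.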
- Original Message -
Actually I ran some
- Original Message -
Hmm.
Forgive my ignorance, but isn't memcmp() on structs pretty prone to give
incorrect != results, given that there may be padding between members in
structs and that IIRC gcc struct assignment is member-wise?
There's no alternative to bitwise comparison on
- Original Message -
On Thu, 2011-06-30 at 03:36 +0200, Roland Scheidegger wrote:
Ok in fact there's a gcc bug about memcmp:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052
In short gcc's memcmp builtin is totally lame and loses to glibc's
memcmp (including call overhead, no
On Thu, 2011-06-30 at 03:27 -0700, Jose Fonseca wrote:
- Original Message -
On Thu, 2011-06-30 at 03:36 +0200, Roland Scheidegger wrote:
Ok in fact there's a gcc bug about memcmp:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052
In short gcc's memcmp builtin is totally lame
On Thu, 2011-06-30 at 03:36 +0200, Roland Scheidegger wrote:
Ok in fact there's a gcc bug about memcmp:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052
In short gcc's memcmp builtin is totally lame and loses to glibc's
memcmp (including call overhead, no knowledge about alignment etc.) even
On 30.06.2011 12:14, Jose Fonseca wrote:
- Original Message -
Hmm.
Forgive my ignorance, but isn't memcmp() on structs pretty prone to give
incorrect != results, given that there may be padding between members in
structs and that IIRC gcc struct assignment is member-wise?
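To make the padding concern concrete, here is a minimal sketch. The struct and function names are made up for illustration and are not from the Mesa sources. It assumes the common convention of zeroing the whole struct before filling in its members, so that padding bytes are identical in practice; the C standard itself leaves padding contents unspecified after member stores.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical key struct; on most ABIs there are padding bytes
 * between 'tag' and 'value'. */
struct key {
    char tag;
    int value;
};

/* Zero the whole object first so padding starts out identical. */
static void key_init(struct key *k, char tag, int value)
{
    assert(k != NULL);
    memset(k, 0, sizeof *k);
    k->tag = tag;
    k->value = value;
}

/* Member-wise comparison: never looks at padding. */
static int key_equal_members(const struct key *a, const struct key *b)
{
    return a->tag == b->tag && a->value == b->value;
}

/* Bitwise comparison: only meaningful if padding was initialized
 * the same way in both objects (e.g. via key_init above). */
static int key_equal_bytes(const struct key *a, const struct key *b)
{
    return memcmp(a, b, sizeof *a) == 0;
}
```

If one of the two objects were filled in by member-wise assignment onto uninitialized storage instead, key_equal_bytes could report a difference even though every member matches.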
On 30.06.2011 16:14, Adam Jackson wrote:
On Thu, 2011-06-30 at 03:36 +0200, Roland Scheidegger wrote:
Ok in fact there's a gcc bug about memcmp:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052
In short gcc's memcmp builtin is totally lame and loses to glibc's
memcmp (including call
On Thu, 2011-06-30 at 17:53 +0200, Roland Scheidegger wrote:
On 30.06.2011 16:14, Adam Jackson wrote:
On Thu, 2011-06-30 at 03:36 +0200, Roland Scheidegger wrote:
Ok in fact there's a gcc bug about memcmp:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052
In short gcc's memcmp builtin
Perversely, do this by eliminating the comparison between stored and
current fs state. On ipers, a perf trace showed try_update_scene_state
using 31% of a CPU, and 98% of that was in 'repz cmpsb', i.e., the memcmp.
Taking that out takes try_update_scene_state down to 6.5% of the
profile; more
On Wed, 2011-06-29 at 13:19 -0400, Adam Jackson wrote:
Perversely, do this by eliminating the comparison between stored and
current fs state. On ipers, a perf trace showed try_update_scene_state
using 31% of a CPU, and 98% of that was in 'repz cmpsb', i.e., the memcmp.
Taking that out takes
Ohh that's interesting, you'd think the comparison shouldn't be that
expensive (though I guess in the ipers case the comparison is never true).
memcmp is quite extensively used everywhere. Maybe we could replace that
with something faster (since we only ever care if the blocks are the
same but not
Actually I ran some numbers here and tried out an optimized struct compare:
original ipers: 12.1 fps
ajax patch: 15.5 fps
optimized struct compare: 16.8 fps
This is the function I used for that (just enabled in that lp_setup
function):
static INLINE int util_cmp_struct(const void *src1, const
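The function above is cut off in this listing, so its actual body is not shown here. As a rough illustration of the general idea only (an equality test that walks the struct a word at a time instead of calling memcmp), a sketch might look like the following. The name words_differ is hypothetical, and the code assumes the size is a multiple of 4 bytes; it is not Roland's util_cmp_struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the blocks differ, 0 if equal.
 * Assumption: size is a multiple of sizeof(uint32_t). */
static int words_differ(const void *a, const void *b, size_t size)
{
    const unsigned char *pa = a;
    const unsigned char *pb = b;
    size_t i;

    assert(size % sizeof(uint32_t) == 0);

    for (i = 0; i < size; i += sizeof(uint32_t)) {
        uint32_t wa, wb;
        /* Fixed-size memcpy avoids alignment and aliasing problems;
         * compilers normally turn it into a single load. */
        memcpy(&wa, pa + i, sizeof wa);
        memcpy(&wb, pb + i, sizeof wb);
        if (wa != wb)
            return 1;
    }
    return 0;
}
```

Like memcmp, this compares padding bytes too, so it relies on the structs having been zero-initialized before being filled in.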
I didn't even look at that; I was just curious why the memcmp (which is
used a lot in other places) is slow. However, none of the other memcmp
calls seem to show up prominently (cso functions are quite low in profiles,
_mesa_search_program_cache uses memcmp too but it's not that high
either). So I guess
Ok in fact there's a gcc bug about memcmp:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052
In short gcc's memcmp builtin is totally lame and loses to glibc's
memcmp (including call overhead, no knowledge about alignment etc.) even
when comparing only very few bytes (and loses BIG time for lots