Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup

2011-06-30 Thread Thomas Hellstrom
Hmm. Forgive my ignorance, but isn't memcmp() on structs pretty prone to give incorrect != results, given that there may be padding between members in structs and that IIRC gcc struct assignment is member-wise. What happens if there's padding between the jit_context and variant members of

Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup

2011-06-30 Thread Keith Whitwell
On Wed, 2011-06-29 at 16:16 -0700, Corbin Simpson wrote: Okay, so maybe I'm failing to recognize the exact situation here, but wouldn't it be possible to mark the FS state with a serial number and just compare those? Or are these FS states not CSO-cached? No, the struct being compared is

Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup

2011-06-30 Thread Keith Whitwell
On Thu, 2011-06-30 at 03:36 +0200, Roland Scheidegger wrote: Ok in fact there's a gcc bug about memcmp: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052 In short gcc's memcmp builtin is totally lame and loses to glibc's memcmp (including call overhead, no knowledge about alignment etc.) even

Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup

2011-06-30 Thread Jose Fonseca
Great work Roland! And thanks Ajax to finding this hot spot. We use memcmp a lot -- all CSO caching, so we should use this everywhere. We should also code a sse2 version with intrinsics for x86-64, which is guaranteed to always have SSE2. Jose - Original Message - Actually I ran some

Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup

2011-06-30 Thread Jose Fonseca
- Original Message - Hmm. Forgive my ignorance, but isn't memcmp() on structs pretty prone to give incorrect != results, given that there may be padding between members in structs and that IIRC gcc struct assignment is member-wise. There's no alternative to bitwise comparison on

Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup

2011-06-30 Thread Jose Fonseca
- Original Message - On Thu, 2011-06-30 at 03:36 +0200, Roland Scheidegger wrote: Ok in fact there's a gcc bug about memcmp: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052 In short gcc's memcmp builtin is totally lame and loses to glibc's memcmp (including call overhead, no

Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup

2011-06-30 Thread Keith Whitwell
On Thu, 2011-06-30 at 03:27 -0700, Jose Fonseca wrote: - Original Message - On Thu, 2011-06-30 at 03:36 +0200, Roland Scheidegger wrote: Ok in fact there's a gcc bug about memcmp: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052 In short gcc's memcmp builtin is totally lame

Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup

2011-06-30 Thread Adam Jackson
On Thu, 2011-06-30 at 03:36 +0200, Roland Scheidegger wrote: Ok in fact there's a gcc bug about memcmp: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052 In short gcc's memcmp builtin is totally lame and loses to glibc's memcmp (including call overhead, no knowledge about alignment etc.) even

Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup

2011-06-30 Thread Roland Scheidegger
Am 30.06.2011 12:14, schrieb Jose Fonseca: - Original Message - Hmm. Forgive my ignorance, but isn't memcmp() on structs pretty prone to give incorrect != results, given that there may be padding between members in structs and that IIRC gcc struct assignment is member-wise.

Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup

2011-06-30 Thread Roland Scheidegger
Am 30.06.2011 16:14, schrieb Adam Jackson: On Thu, 2011-06-30 at 03:36 +0200, Roland Scheidegger wrote: Ok in fact there's a gcc bug about memcmp: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052 In short gcc's memcmp builtin is totally lame and loses to glibc's memcmp (including call

Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup

2011-06-30 Thread Keith Whitwell
On Thu, 2011-06-30 at 17:53 +0200, Roland Scheidegger wrote: Am 30.06.2011 16:14, schrieb Adam Jackson: On Thu, 2011-06-30 at 03:36 +0200, Roland Scheidegger wrote: Ok in fact there's a gcc bug about memcmp: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052 In short gcc's memcmp builtin

[Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup

2011-06-29 Thread Adam Jackson
Perversely, do this by eliminating the comparison between stored and current fs state. On ipers, a perf trace showed try_update_scene_state using 31% of a CPU, and 98% of that was in 'repz cmpsb', ie, the memcmp. Taking that out takes try_update_scene_state down to 6.5% of the profile; more

Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup

2011-06-29 Thread Keith Whitwell
On Wed, 2011-06-29 at 13:19 -0400, Adam Jackson wrote: Perversely, do this by eliminating the comparison between stored and current fs state. On ipers, a perf trace showed try_update_scene_state using 31% of a CPU, and 98% of that was in 'repz cmpsb', ie, the memcmp. Taking that out takes

Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup

2011-06-29 Thread Roland Scheidegger
Ohh that's interesting, you'd think the comparison shouldn't be that expensive (though I guess in ipers case the comparison is never true). memcmp is quite extensively used everywhere. Maybe we could replace that with something faster (since we only ever care if the blocks are the same but not

Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup

2011-06-29 Thread Roland Scheidegger
Actually I ran some numbers here and tried out a optimized struct compare: original ipers: 12.1 fps ajax patch: 15.5 fps optimized struct compare: 16.8 fps This is the function I used for that (just enabled in that lp_setup function): static INLINE int util_cmp_struct(const void *src1, const

Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup

2011-06-29 Thread Roland Scheidegger
I didn't even look at that was just curious why the memcmp (which is used a lot in other places) is slow. However, none of the other memcmp seem to show up prominently (cso functions are quite low in profiles, _mesa_search_program_cache uses memcmp too but it's not that high neither). So I guess

Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup

2011-06-29 Thread Roland Scheidegger
Ok in fact there's a gcc bug about memcmp: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052 In short gcc's memcmp builtin is totally lame and loses to glibc's memcmp (including call overhead, no knowledge about alignment etc.) even when comparing only very few bytes (and loses BIG time for lots