Re: [Mesa-dev] shader-db, and justifying an i965 compiler optimization.

2011-05-18 Thread Tom Stellard
On Wed, May 18, 2011 at 12:23:40PM -0700, Eric Anholt wrote:
> Incidentally, Tom Stellard has apparently been doing this across piglit
> already.  This makes me think that maybe I want to just roll the
> captured open-source shaders into glslparsertest, and just use the
> analysis stuff on piglit.
>

I use this piglit patch to help capture shader stats:
http://lists.freedesktop.org/archives/piglit/2010-December/000189.html

It redirects any line of output that begins with ~ to a stats file. Then
I use sdiff to compare stats files from different piglit runs.

The output looks like this:

shaders/glsl-orangebook-ch06-bump
 FRAGMENT PROGRAM ~~~
~  25 Instructions
~  25 Vector Instructions (RGB)
~   4 Scalar Instructions (Alpha)
~   0 Flow Control Instructions
~   0 Texture Instructions
~   2 Presub Operations
~   6 Temporary Registers
~~ END ~~
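As Marek notes, the same numbers can be pulled out of raw output with a few lines of scripting. A minimal Python sketch that reads stats in the format above and compares two runs, assuming each block starts with an unindented test-name line followed by `~`-prefixed counter lines. `parse_stats` and `diff_stats` are hypothetical helpers, not part of the piglit patch:

```python
import re

# A stats line: '~', whitespace, a count, whitespace, then the label,
# e.g. "~  25 Instructions" or "~   4 Scalar Instructions (Alpha)".
STAT_LINE = re.compile(r"^~\s+(\d+)\s+(.+?)\s*$")


def parse_stats(text):
    """Return {test_name: {label: count}} from a captured stats file."""
    stats = {}
    current = None
    for line in text.splitlines():
        m = STAT_LINE.match(line)
        if m:
            if current is not None:
                stats[current][m.group(2)] = int(m.group(1))
        elif line and not line[0].isspace() and not line.startswith("~"):
            # Assumption: an unindented line without '~' names the test.
            current = line.strip()
            stats[current] = {}
    return stats


def diff_stats(before, after):
    """Return {(test, label): (old, new)} for every counter that changed."""
    changes = {}
    for test in before.keys() & after.keys():
        for label in before[test].keys() & after[test].keys():
            old, new = before[test][label], after[test][label]
            if old != new:
                changes[(test, label)] = (old, new)
    return changes


SAMPLE = """shaders/glsl-orangebook-ch06-bump
 FRAGMENT PROGRAM ~~~
~  25 Instructions
~  25 Vector Instructions (RGB)
~   4 Scalar Instructions (Alpha)
~   0 Flow Control Instructions
~   0 Texture Instructions
~   2 Presub Operations
~   6 Temporary Registers
~~ END ~~"""

before = parse_stats(SAMPLE)
after = parse_stats(SAMPLE.replace("~  25 Instructions", "~  23 Instructions"))
print(diff_stats(before, after))
# {('shaders/glsl-orangebook-ch06-bump', 'Instructions'): (25, 23)}
```

This plays the same role as the sdiff step, but reports only the counters that actually changed.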

This patch is probably a little overkill, though, because as Marek
pointed out, the same thing could be accomplished by grep'ing the raw
output from piglit.  This has been useful for testing compiler
optimizations, but it would be much better if there were some real world
shaders in piglit.

Also, the glslparsertest hack isn't working on r300g, because shaders
don't get compiled in the r300 backend until the first time they are used.
It's done this way so the driver can emulate things like shadow samplers
in the shader.  I'm not sure what the best solution is for this.  Maybe
we could add an environment variable to force compilation at link time.

-Tom


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] shader-db, and justifying an i965 compiler optimization.

2011-05-18 Thread Jerome Glisse
On Wed, May 18, 2011 at 3:16 PM, Eric Anholt  wrote:
> On Wed, 18 May 2011 11:05:39 -0400, Jerome Glisse  wrote:
>> I have been thinking of doing something slightly different. Sadly,
>> instruction count is not necessarily the best metric for evaluating
>> the optimizations performed by a shader compiler. Hiding the texture
>> fetch latency of a shader can improve performance a lot more than
>> saving 2 instructions. So my idea was to write a GL app that renders
>> the same shader thousands of times into a framebuffer. The point of
>> using an FBO is to keep things like swapbuffers from playing a role,
>> since we are solely interested in shader performance. The FBO should
>> also be as big as possible, so the fragment shader has a lot of
>> pixels to go through, and I believe things like blending, the
>> z-buffer, ... should be disabled so that no other part of the
>> pipeline impacts the shader in any way.
>
> You might take a look at mesa-demos/src/perf for that.  I haven't had
> success using them for performance work due to the noisiness of the
> results.
>
> More generally, imo, the problem with that plan is you have to build the
> shaders yourself and justify to yourself why that shader you wrote is
> representative, and you spend all your time on building the tests when
> you just wanted to know if an instruction-reduction optimization did
> anything.  shader-db took me one evening to build and collect for all
> applications I had (I've got a personal branch for all the closed-source
> stuff :/ )

A shader takes a bunch of inputs, so for each shader collected the issue
is to provide proper inputs. Textures could be dummy textures unless the
shader has some dependency on the texture data (e.g. if the fetched
texture data determines the number of loop iterations or is used to kill
a fragment, ...). So it's all about going through known shaders and
building a reasonable set of inputs for each of them. It's time
consuming, but I believe it brings a lot more from a testing point of
view.

> For actual performance testing of apps without idsoftware-style
> timedemos, I'm way more excited by the potential of using apitrace with
> EXT_timer_query to decide which shaders I should be analyzing, and then
> I'd know afterward whether I impacted a real application by replaying
> the trace.  That is, assuming I didn't increase CPU costs in the
> process, which is where an apitrace replay would not be representative.
>
> Our perspective is: if we are driving the hardware anywhere below what
> is possible, that is a bug that we should fix.  Analyzing the costs of
> instructions, scheduling impacts, CPU overhead impacts, etc. may be out
> of scope for shader-db, but does make some types of analysis quick and
> easy (test all the shaders you have ever seen in a couple of minutes).

I agree that shader-db provides a useful tool; I am just convinced
that the number of instructions in a complex shader is a bad metric, especially
wh

Re: [Mesa-dev] shader-db, and justifying an i965 compiler optimization.

2011-05-18 Thread Eric Anholt
On Wed, 18 May 2011 11:05:39 -0400, Jerome Glisse  wrote:
> I have been thinking of doing something slightly different. Sadly,
> instruction count is not necessarily the best metric for evaluating
> the optimizations performed by a shader compiler. Hiding the texture
> fetch latency of a shader can improve performance a lot more than
> saving 2 instructions. So my idea was to write a GL app that renders
> the same shader thousands of times into a framebuffer. The point of
> using an FBO is to keep things like swapbuffers from playing a role,
> since we are solely interested in shader performance. The FBO should
> also be as big as possible, so the fragment shader has a lot of
> pixels to go through, and I believe things like blending, the
> z-buffer, ... should be disabled so that no other part of the
> pipeline impacts the shader in any way.

You might take a look at mesa-demos/src/perf for that.  I haven't had
success using them for performance work due to the noisiness of the
results.

More generally, imo, the problem with that plan is you have to build the
shaders yourself and justify to yourself why that shader you wrote is
representative, and you spend all your time on building the tests when
you just wanted to know if an instruction-reduction optimization did
anything.  shader-db took me one evening to build and collect for all
applications I had (I've got a personal branch for all the closed-source
stuff :/ )

For actual performance testing of apps without idsoftware-style
timedemos, I'm way more excited by the potential of using apitrace with
EXT_timer_query to decide which shaders I should be analyzing, and then
I'd know afterward whether I impacted a real application by replaying
the trace.  That is, assuming I didn't increase CPU costs in the
process, which is where an apitrace replay would not be representative.

Our perspective is: if we are driving the hardware anywhere below what
is possible, that is a bug that we should fix.  Analyzing the costs of
instructions, scheduling impacts, CPU overhead impacts, etc. may be out
of scope for shader-db, but does make some types of analysis quick and
easy (test all the shaders you have ever seen in a couple of minutes).




Re: [Mesa-dev] shader-db, and justifying an i965 compiler optimization.

2011-05-18 Thread Eric Anholt
On Wed, 18 May 2011 09:00:09 +0200, Ian Romanick  wrote:
> This is one of those ideas that seems so obvious after you hear about it
> that you can't believe you hadn't thought of it years ago.  This seems
> like something we'd want in piglit, but I'm not sure how that would look.

Incidentally, Tom Stellard has apparently been doing this across piglit
already.  This makes me think that maybe I want to just roll the
captured open-source shaders into glslparsertest, and just use the
analysis stuff on piglit.

> The first problem is, obviously, using INTEL_DEBUG=wm to get the
> instruction counts won't work. :)  Perhaps we could extend some of the
> existing assembly program queries (e.g.,
> GL_PROGRAM_NATIVE_INSTRUCTIONS_ARB) to GLSL.  That would help even if we
> didn't incorporate this into piglit.

You say it won't work, but I'm using it and it is working :)

Oh, you mean you want a clean solution and not a dirty hack?  Yeah, I'd
really like to have an interface for apps (read: shader debuggers) to
get our annotated assembly out.

> > And finally, patch #3 is something I built before but couldn't really
> > justify until now.  However, given that it reduced fragment shader
> > instructions 0.3% across 831 shaders (affecting 52 of them including
> > yofrankie, warsow, norsetto, and gstreamer) and didn't increase
> > instructions anywhere, I'm a lot happier now.
> 
> We'll probably want to be able to disable this once we have some sort of
> CSE on the low-level IR.  This sort of optimization can cause problems
> for CSE in cases where the same register is a source and a destination.
> Imagine something like:
> 
>   z = sqrt(x) + y;
>   z = z * w;
>   q = sqrt(x) + y;
> 
> If the result of the first 'sqrt(x) + y' is written directly to z, the
> value is "gone" when the second 'sqrt(x) + y' is executed.  If that
> result is written to a temporary register that is then copied to z, the
> value is still around at the second instance.
> 
> Since we don't have any CSE, this doesn't matter now.  However, it's
> something to keep in mind.

I think for CSE on 965 LIR, we'll want to be aggressive, and just
consider whether the RHS values are still around, so we can execute to a
temp and reuse it on math instructions.  Otherwise, you end up with
weird ordering requirements on the optimization passes to ensure that
register coalescing doesn't kill these CSE opportunities.
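A minimal Python sketch of this more aggressive approach, on a toy three-address IR of (dst, op, srcs) tuples rather than the real 965 LIR. An expression stays available while its source registers are unmodified; if the register that first received the value has since been overwritten, the earlier instruction is retargeted into a fresh temp followed by a copy, so the repeat becomes a move. The IR, the `cse` function and the `tmpN` naming are all hypothetical:

```python
from itertools import count


def cse(instrs):
    """Local CSE over (dst, op, srcs) tuples. An expression stays available
    while its source registers are unmodified; if the register that first
    received its value was overwritten, the earlier instruction is retargeted
    to a fresh temp plus a copy, so the value can still be reused."""
    out = []      # rewritten program
    avail = {}    # (op, srcs) -> index in `out` of the instruction computing it
    holder = {}   # (op, srcs) -> register holding the value, or None if clobbered
    fresh = count()

    for dst, op, srcs in instrs:
        key = (op, srcs)
        if op != "mov" and key in avail:
            if holder[key] is None:
                # Value was clobbered: compute it into a temp, copy to old dst.
                i = avail[key]
                tmp = f"tmp{next(fresh)}"
                old_dst, old_op, old_srcs = out[i]
                out[i] = (tmp, old_op, old_srcs)
                out.insert(i + 1, (old_dst, "mov", (tmp,)))
                for k, j in avail.items():  # later instructions shifted by one
                    if j > i:
                        avail[k] = j + 1
                holder[key] = tmp
            op, srcs = "mov", (holder[key],)
        out.append((dst, op, srcs))

        # Writing dst kills expressions that read dst and values held in dst.
        for k in [k for k in avail if dst in k[1]]:
            del avail[k], holder[k]
        for k in holder:
            if holder[k] == dst:
                holder[k] = None

        # Remember this expression unless it is a copy or reads its own dst.
        if op != "mov" and dst not in srcs:
            avail[key] = len(out) - 1
            holder[key] = dst
    return out


prog = [
    ("s", "sqrt", ("x",)),
    ("z", "add", ("s", "y")),   # result coalesced straight into z
    ("z", "mul", ("z", "w")),   # z overwritten: the add's value is gone
    ("q", "add", ("s", "y")),   # same expression, s and y unchanged
]
for ins in cse(prog):
    print(ins)
# ('s', 'sqrt', ('x',))
# ('tmp0', 'add', ('s', 'y'))
# ('z', 'mov', ('tmp0',))
# ('z', 'mul', ('z', 'w'))
# ('q', 'mov', ('tmp0',))
```

Because availability is keyed on the operands rather than on the destination register, this pass does not care whether register coalescing ran before it.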




Re: [Mesa-dev] shader-db, and justifying an i965 compiler optimization.

2011-05-18 Thread Jerome Glisse
On Tue, May 17, 2011 at 11:22 PM, Eric Anholt  wrote:
> And finally, patch #3 is something I built before but couldn't really
> justify until now.  However, given that it reduced fragment shader
> instructions 0.3% across 831 shaders (affecting 52 of them including
> yofrankie, warsow, norsetto, and gstreamer) and didn't increase
> instructions anywhere, I'm a lot happier now.

I have been thinking of doing something slightly different. Sadly,
instruction count is not necessarily the best metric for evaluating
the optimizations performed by a shader compiler. Hiding the texture
fetch latency of a shader can improve performance a lot more than
saving 2 instructions. So my idea was to write a GL app that renders
the same shader thousands of times into a framebuffer. The point of
using an FBO is to keep things like swapbuffers from playing a role,
since we are solely interested in shader performance. The FBO should
also be as big as possible, so the fragment shader has a lot of
pixels to go through, and I believe things like blending, the
z-buffer, ... should be disabled so that no other part of the
pipeline impacts the shader in any way.
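The arithmetic for such a test might look like the Python sketch below. It assumes each draw covers the whole FBO with exactly one fragment per pixel, and that elapsed GPU time is measured elsewhere (for instance with a timer query). The function names and the 2-standard-deviation noise threshold are arbitrary choices for illustration, not an existing tool:

```python
import statistics


def ns_per_fragment(elapsed_s, width, height, draws):
    """Average GPU time per fragment, assuming every draw covers the whole
    width x height FBO with exactly one fragment per pixel."""
    return elapsed_s * 1e9 / (width * height * draws)


def clearly_faster(before, after, k=2.0):
    """Treat `after` as a real improvement only if its mean time is lower
    than `before` by more than k sample standard deviations of the noisier
    run. The k=2 threshold is an arbitrary choice for this sketch."""
    noise = max(statistics.stdev(before), statistics.stdev(after))
    return statistics.mean(before) - statistics.mean(after) > k * noise


# Hypothetical numbers: 1000 full-screen draws into a 2048x2048 FBO in 2 s.
print(ns_per_fragment(2.0, 2048, 2048, 1000))  # 0.476837158203125
# Repeated timings (seconds) of the same shader before/after a compiler change.
print(clearly_faster([10.0, 10.2, 9.8], [9.0, 9.1, 8.9]))  # True
print(clearly_faster([10.0, 10.5, 9.5], [9.8, 10.3, 9.3]))  # False
```

Repeating each measurement and comparing against the spread between runs is one simple way to keep run-to-run noise from being mistaken for a compiler win.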

Other things might play a role too. For instance, if we provide small
dummy textures we might just hide the gain a texture fetch optimization
could give, as the GPU might be able to keep the texture in cache and
thus have very low latency on each texture fetch. Likewise, if we use
the same texture for all units, the texture cache might hide latency
that a real application would otherwise face. So I think we need dummy
textures that are big enough, like 512*512, and a different one for
each unit, and we should also try to provide random u,v for texture
fetches so that the texture cache doesn't hide too much of the latency.
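Generating such inputs is straightforward. A Python sketch (3.9+ for `Random.randbytes`) that builds one distinct random 512x512 RGBA8 texture per unit and a set of random (u, v) coordinates. Uploading them (e.g. with glTexImage2D) and feeding the coordinates to the shader are left out, and the helper names are hypothetical:

```python
import random


def make_dummy_textures(num_units, size=512, seed=0):
    """One distinct size x size RGBA8 texture per texture unit, filled with
    random bytes so no two units share texels (1 MiB each at 512x512)."""
    rng = random.Random(seed)
    return [rng.randbytes(size * size * 4) for _ in range(num_units)]


def random_uvs(count, seed=1):
    """Random (u, v) pairs in [0, 1), so successive fetches land on
    unrelated texels instead of neighbouring, likely-cached ones."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(count)]


textures = make_dummy_textures(4)   # e.g. a shader sampling from 4 units
uvs = random_uvs(1024)
print(len(textures), len(textures[0]))  # 4 1048576
```

Fixed seeds keep the inputs identical between the "before" and "after" compiler runs, so only the shader code changes.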

I am sure I am missing other factors that we should try to minimize
while testing shader performance.

I don't think such a tool is a good fit for piglit, but it could still
be added as a subtool (so that we don't add yet another repository).

Thanks a lot for extracting all those shaders. I am sure we can get
some people to write us shaders with somewhat advanced math under an
acceptable license.

Cheers,
Jerome


Re: [Mesa-dev] shader-db, and justifying an i965 compiler optimization.

2011-05-18 Thread Ian Romanick
On 05/18/2011 05:22 AM, Eric Anholt wrote:
> One of the pain points of working on compiler optimizations has been
> justifying them -- sometimes I come up with something I think is
> useful and spend a day or two on it, but the value doesn't show up as
> fps in the application that suggested the optimization to me.  Then I
> wonder if this transformation of the code is paying off in general,
> and thus if I should push it.  If I don't push it, I end up bringing
> that patch out on every application I look at that it could affect, to
> see if now I finally have justification to get it out of a private
> branch.
> 
> At a conference this week, we heard about how another team is
> using a database of (assembly) shaders, which they run through their
> compiler and count resulting instructions for testing purposes.  This
> sounded like a fun idea, so I threw one together.  Patch #1 is good in

This is one of those ideas that seems so obvious after you hear about it
that you can't believe you hadn't thought of it years ago.  This seems
like something we'd want in piglit, but I'm not sure how that would look.

The first problem is, obviously, using INTEL_DEBUG=wm to get the
instruction counts won't work. :)  Perhaps we could extend some of the
existing assembly program queries (e.g.,
GL_PROGRAM_NATIVE_INSTRUCTIONS_ARB) to GLSL.  That would help even if we
didn't incorporate this into piglit.

The other problem is what the test would report for a result.  Hmm...

> general (hey, link errors, finally!), but also means that a quick hack
> to glslparsertest makes it link any shader that compiles and therefore
> generate assembly that gets dumped under INTEL_DEBUG=wm.  Patch #2 I
> used for automatic scraping of shaders in every application I could
> find on my system at the time.  The open-source ones I pushed to:
> 
> http://cgit.freedesktop.org/~anholt/shader-db
> 
> And finally, patch #3 is something I built before but couldn't really
> justify until now.  However, given that it reduced fragment shader
> instructions 0.3% across 831 shaders (affecting 52 of them including
> yofrankie, warsow, norsetto, and gstreamer) and didn't increase
> instructions anywhere, I'm a lot happier now.

We'll probably want to be able to disable this once we have some sort of
CSE on the low-level IR.  This sort of optimization can cause problems
for CSE in cases where the same register is a source and a destination.
Imagine something like:

z = sqrt(x) + y;
z = z * w;
q = sqrt(x) + y;

If the result of the first 'sqrt(x) + y' is written directly to z, the
value is "gone" when the second 'sqrt(x) + y' is executed.  If that
result is written to a temporary register that is then copied to z, the
value is still around at the second instance.

Since we don't have any CSE, this doesn't matter now.  However, it's
something to keep in mind.
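The effect can be reproduced with a deliberately naive local CSE that only reuses a value while the register it was written to still holds it. A Python sketch over a hypothetical toy IR of (dst, op, srcs) tuples, not Mesa's actual low-level IR; sqrt(x) is computed once into `s` for brevity:

```python
def naive_cse(instrs):
    """Local CSE over (dst, op, srcs) tuples that reuses a value only while
    the register it was written to still holds it.
    Returns (rewritten_program, number_of_reuses)."""
    out, held, reused = [], {}, 0      # held: (op, srcs) -> holding register
    for dst, op, srcs in instrs:
        key = (op, srcs)
        if op != "mov" and key in held:
            out.append((dst, "mov", (held[key],)))
            reused += 1
        else:
            out.append((dst, op, srcs))
        # dst changed: drop expressions that read dst or whose value lived in dst.
        held = {k: r for k, r in held.items() if dst not in k[1] and r != dst}
        if op != "mov" and key not in held and dst not in srcs:
            held[key] = dst
    return out, reused


# z = sqrt(x) + y; z = z * w; q = sqrt(x) + y   (sqrt(x) kept in s for brevity)
direct = [                       # add result coalesced straight into z
    ("s", "sqrt", ("x",)),
    ("z", "add", ("s", "y")),
    ("z", "mul", ("z", "w")),
    ("q", "add", ("s", "y")),
]
via_temp = [                     # add result kept in t, then copied to z
    ("s", "sqrt", ("x",)),
    ("t", "add", ("s", "y")),
    ("z", "mov", ("t",)),
    ("z", "mul", ("z", "w")),
    ("q", "add", ("s", "y")),
]
print(naive_cse(direct)[1], naive_cse(via_temp)[1])  # 0 1
```

With the direct write the second 'sqrt(x) + y' has to be recomputed; with the temporary it becomes a single move.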



[Mesa-dev] shader-db, and justifying an i965 compiler optimization.

2011-05-17 Thread Eric Anholt
One of the pain points of working on compiler optimizations has been
justifying them -- sometimes I come up with something I think is
useful and spend a day or two on it, but the value doesn't show up as
fps in the application that suggested the optimization to me.  Then I
wonder if this transformation of the code is paying off in general,
and thus if I should push it.  If I don't push it, I end up bringing
that patch out on every application I look at that it could affect, to
see if now I finally have justification to get it out of a private
branch.

At a conference this week, we heard about how another team is
using a database of (assembly) shaders, which they run through their
compiler and count resulting instructions for testing purposes.  This
sounded like a fun idea, so I threw one together.  Patch #1 is good in
general (hey, link errors, finally!), but also means that a quick hack
to glslparsertest makes it link any shader that compiles and therefore
generate assembly that gets dumped under INTEL_DEBUG=wm.  Patch #2 I
used for automatic scraping of shaders in every application I could
find on my system at the time.  The open-source ones I pushed to:

http://cgit.freedesktop.org/~anholt/shader-db

And finally, patch #3 is something I built before but couldn't really
justify until now.  However, given that it reduced fragment shader
instructions 0.3% across 831 shaders (affecting 52 of them including
yofrankie, warsow, norsetto, and gstreamer) and didn't increase
instructions anywhere, I'm a lot happier now.
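Figures like these (total reduction, shaders affected, no increases anywhere) fall out of a small comparison over per-shader counts. A Python sketch, assuming the counts have already been scraped (e.g. from the INTEL_DEBUG=wm dumps) into dicts keyed by shader name. The helper and its input format are hypothetical, not the script shader-db actually uses, and the sample data is made up:

```python
def summarize_change(before, after):
    """Compare per-shader instruction counts from two compiler builds.

    `before` and `after` map a shader name to its instruction count; only
    shaders present in both are compared."""
    shared = before.keys() & after.keys()
    total_before = sum(before[s] for s in shared)
    total_after = sum(after[s] for s in shared)
    helped = sorted(s for s in shared if after[s] < before[s])
    hurt = sorted(s for s in shared if after[s] > before[s])
    pct = 100.0 * (total_before - total_after) / total_before if total_before else 0.0
    return {
        "shaders": len(shared),
        "helped": helped,
        "hurt": hurt,
        "reduction_pct": pct,
    }


# Hypothetical counts for three shaders, before and after a patch.
before = {"a.frag": 100, "b.frag": 50, "c.frag": 50}
after = {"a.frag": 98, "b.frag": 50, "c.frag": 50}
print(summarize_change(before, after))
# {'shaders': 3, 'helped': ['a.frag'], 'hurt': [], 'reduction_pct': 1.0}
```

An empty `hurt` list is the "didn't increase instructions anywhere" check.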

Hopefully we hook up EXT_timer_query to apitrace soon so I can do more
targeted optimizations and need this less :) In the meantime, I hope
this can prove useful to others -- if you want to contribute
appropriately-licensed shaders to the database so we track those, or
if you want to make the analysis work on your hardware backend, feel
free.
