Re: [Mesa-dev] [PATCH v2 08/13] mesa/format_utils: Add a general format conversion function

2014-09-12 Thread Marek Olšák
Have you considered turning all inline functions into macros, so that
the compiler doesn't have to inline them?

Marek

On Fri, Sep 12, 2014 at 12:58 AM, Jason Ekstrand ja...@jlekstrand.net wrote:


 Yeah, my recommendation would be hacking the macros to not unroll and keep
 the patch locally.  If you've got a better idea as to how to organize the
 code so the compiler likes it, I'm open as long as we don't lose
 performance.
 --Jason

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev



Re: [Mesa-dev] [PATCH v2 08/13] mesa/format_utils: Add a general format conversion function

2014-09-12 Thread Brian Paul

On 09/11/2014 04:58 PM, Jason Ekstrand wrote:



Yeah, my recommendation would be hacking the macros to not unroll and
keep the patch locally.  If you've got a better idea as to how to
organize the code so the compiler likes it, I'm open as long as we don't
lose performance.


It looks like a release build with MSVC is taking quite a while to 
compile this file too (actually at link time when the optimizer kicks in).


But even on my fast Linux system with gcc, the difference in compile 
time between -O0 and -O3 is pretty big (2 seconds vs. 1 minute, 3 seconds).


I'm still prototyping something but it looks like breaking the top-level 
switch cases in _mesa_swizzle_and_convert() into separate functions 
reduces the time quite a bit.  Let me pursue that a bit further and see 
how it goes...
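
As a rough illustration of the restructuring described above (one function per top-level case instead of one very large switch body), here is a minimal C sketch. Every name in it (chan_type, SWZ_ZERO, SWZ_ONE, ubyte_from_ushort, swizzle_and_convert_ubyte) is hypothetical, and it is not the code from the patch; the 16-bit to 8-bit conversion is a plain truncating shift chosen only to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not Mesa's actual code. */
enum chan_type { CHAN_UBYTE, CHAN_USHORT };

/* Swizzle values 0..3 pick a source channel; these two fill a constant. */
#define SWZ_ZERO 4
#define SWZ_ONE  5

/* Each source type gets its own function, so the optimizer works on
 * several small bodies instead of one very large one. */
static void
ubyte_from_ubyte(uint8_t *dst, const uint8_t *src, int src_chans,
                 const uint8_t swz[4], size_t count)
{
   for (size_t i = 0; i < count; i++, src += src_chans, dst += 4) {
      for (int c = 0; c < 4; c++) {
         if (swz[c] == SWZ_ZERO) {
            dst[c] = 0;
         } else if (swz[c] == SWZ_ONE) {
            dst[c] = UINT8_MAX;
         } else {
            assert(swz[c] < src_chans);
            dst[c] = src[swz[c]];
         }
      }
   }
}

static void
ubyte_from_ushort(uint8_t *dst, const uint16_t *src, int src_chans,
                  const uint8_t swz[4], size_t count)
{
   for (size_t i = 0; i < count; i++, src += src_chans, dst += 4) {
      for (int c = 0; c < 4; c++) {
         if (swz[c] == SWZ_ZERO) {
            dst[c] = 0;
         } else if (swz[c] == SWZ_ONE) {
            dst[c] = UINT8_MAX;
         } else {
            assert(swz[c] < src_chans);
            /* Truncating 16-bit to 8-bit conversion, for brevity only. */
            dst[c] = (uint8_t)(src[swz[c]] >> 8);
         }
      }
   }
}

/* The top-level switch only dispatches; returns 0 for unknown types. */
static int
swizzle_and_convert_ubyte(uint8_t *dst, const void *src,
                          enum chan_type src_type, int src_chans,
                          const uint8_t swz[4], size_t count)
{
   switch (src_type) {
   case CHAN_UBYTE:
      ubyte_from_ubyte(dst, src, src_chans, swz, count);
      return 1;
   case CHAN_USHORT:
      ubyte_from_ushort(dst, src, src_chans, swz, count);
      return 1;
   }
   return 0;
}
```

Whether a split like this costs any run-time performance depends on how the real loops are written, so it would still need to be measured.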


-Brian



Re: [Mesa-dev] [PATCH v2 08/13] mesa/format_utils: Add a general format conversion function

2014-09-12 Thread Jason Ekstrand
Forgot to reply-all.
On Sep 12, 2014 9:05 AM, Jason Ekstrand ja...@jlekstrand.net wrote:

 The teximage-colors test that I pushed to piglit a week or two ago takes a
 --benchmark flag that bumps the texture size and does the upload 1000 times
 and gives you the average time to upload.
 --Jason
 On Sep 12, 2014 9:01 AM, Brian Paul bri...@vmware.com wrote:


Re: [Mesa-dev] [PATCH v2 08/13] mesa/format_utils: Add a general format conversion function

2014-09-12 Thread Brian Paul

On 09/12/2014 08:09 AM, Brian Paul wrote:

It looks like a release build with MSVC is taking quite a while to
compile this file too (actually at link time when the optimizer kicks in).

But even on my fast Linux system with gcc, the difference in compile
time between -O0 and -O3 is pretty big (2 seconds vs. 1 minute, 3 seconds).

I'm still prototyping something but it looks like breaking the top-level
switch cases in _mesa_swizzle_and_convert() into separate functions
reduces the time quite a bit.  Let me pursue that a bit further and see
how it goes...


OK, I'm posting a couple patches:

mesa: break up _mesa_swizzle_and_convert() to reduce compile time

This reduces -O3 compile time with gcc to 1/4 of what it was.  Seems to 
reduce compile time with MSVC too, but I haven't really measured it. 
Dieter, can you test this patch on your system?



mesa: move i, j var decls into SWIZZLE_CONVERT_LOOP() macro

I think the optimizer can sometimes do a better job when loop variables 
are declared per-loop, rather than declared per-function.  But this 
patch increases the size of the .o file from 2556528 to 2933216 bytes 
(15%).  Jason, if you have a benchmark to measure the speed of this 
code, I'd be interested to know if this patch helps much.  Moving the 
declaration of 'j' 

Re: [Mesa-dev] [PATCH v2 08/13] mesa/format_utils: Add a general format conversion function

2014-09-11 Thread Dieter Nützel

On 15.08.2014 04:50, Jason Ekstrand wrote:

If we'd like, the way the macros are set up, it would be easy to
change it so that we do less unrolling in the cases where we are
actually doing substantial format conversion and wouldn't notice the
extra logic quite as much.  I'll play with it a bit tomorrow or next
week and see how much of a hit we would actually take if we
unrolled a little less in places.
 --Jason Ekstrand


Ping.

In a second it took 11+ minutes, here...

Thanks!
Dieter


Re: [Mesa-dev] [PATCH v2 08/13] mesa/format_utils: Add a general format conversion function

2014-09-11 Thread Jason Ekstrand
On Thu, Sep 11, 2014 at 2:55 PM, Dieter Nützel die...@nuetzel-hh.de wrote:


 Ping.

 In a second it took 11+ minutes, here...


11 minutes! What system are you running? And are you using -O3 or
something?  Yes, we can do something to cut it down, but it will probably
require a configure flag; the question is what flag.
--Jason


Re: [Mesa-dev] [PATCH v2 08/13] mesa/format_utils: Add a general format conversion function

2014-09-11 Thread Jason Ekstrand
On Thu, Sep 11, 2014 at 3:31 PM, Jason Ekstrand ja...@jlekstrand.net
wrote:



 11 minutes! What system are you running? And are you using -O3 or
 something?  Yes, we can do something to cut it down, but it will probably
 require a configure flag; the question is what flag.
 --Jason


I'm also open to re-factoring it in some way that's a little easier on
compilers.  That said, it's very sensitive to performance, so whatever
refactor gets done will have to be done carefully.
--Jason


Re: [Mesa-dev] [PATCH v2 08/13] mesa/format_utils: Add a general format conversion function

2014-09-11 Thread Dieter Nützel

On 12.09.2014 00:31, Jason Ekstrand wrote:

11 minutes! What system are you running? And are you using -O3 or
something?  Yes, we can do something to cut it down, but it will
probably require a configure flag; the question is what flag.

--Jason


See above, the old children's system... ;-)
-O2 -m32 -march=athlon-mp -mtune=athlon-mp -m3dnow -msse -mmmx 
-mfpmath=sse,387 -pipe


Bad? - Worked for ages on AthlonMP 8-)
Maybe it is bad on Duron (the MP thing, much smaller cache and better 
GCC), now.


Dieter


Re: [Mesa-dev] [PATCH v2 08/13] mesa/format_utils: Add a general format conversion function

2014-09-11 Thread Jason Ekstrand
On Thu, Sep 11, 2014 at 3:53 PM, Dieter Nützel die...@nuetzel-hh.de wrote:

 See above, the old children's system... ;-)
 -O2 -m32 -march=athlon-mp -mtune=athlon-mp -m3dnow -msse -mmmx
 -mfpmath=sse,387 -pipe

 Bad? - Worked for ages on AthlonMP 8-)
 Maybe it is bad on Duron (the MP thing, much smaller cache and better
 GCC), now.

 Dieter


Yeah, my recommendation would be hacking the macros to not unroll and keep
the patch locally.  If you've got a better idea as to how to organize the
code so the compiler likes it, I'm open as long as we don't lose
performance.
--Jason


Re: [Mesa-dev] [PATCH v2 08/13] mesa/format_utils: Add a general format conversion function

2014-08-14 Thread Dave Airlie
 On 08/02/2014 02:11 PM, Jason Ekstrand wrote:

 Most format conversion operations required by GL can be performed by
 converting one channel at a time, shuffling the channels around, and
 optionally filling missing channels with zeros and ones.  This adds a
 function to do just that in a general, yet efficient, way.

 v2:
   * Add better comments including full docs for functions
   * Don't use __typeof__
   * Use inline helpers instead of writing out conversions by hand,
   * Force full loop unrolling for better performance



This file seems to anger gcc a lot.

It seems to take upwards of a minute or two to compile here.

gcc 4.8.3 on 32-bit x86.

Dave.


Re: [Mesa-dev] [PATCH v2 08/13] mesa/format_utils: Add a general format conversion function

2014-08-14 Thread Dieter Nützel

On 15.08.2014 02:36, Dave Airlie wrote:

This file seems to anger gcc a lot.

It seems to take upwards of a minute or two to compile here.

gcc 4.8.3 on 32-bit x86.

Dave.


For me (on our poor little Duron 1800/2 GB) it ran ~5 minutes...

gcc 4.8.1 on 32-bit x86.

Dieter


Re: [Mesa-dev] [PATCH v2 08/13] mesa/format_utils: Add a general format conversion function

2014-08-14 Thread Jason Ekstrand
On Aug 14, 2014 7:13 PM, Dieter Nützel die...@nuetzel-hh.de wrote:

 For me (on our poor little Duron 1800/2 GB) it ran ~5 minutes...

 gcc 4.8.1 on 32-bit x86.

If we'd like, the way the macros are set up, it would be easy to change it
so that we do less unrolling in the cases where we are actually doing
substantial format conversion and wouldn't notice the extra logic quite as
much.  I'll play with it a bit tomorrow or next week and see how much
of a hit we would actually take if we unrolled a little less in places.
--Jason Ekstrand
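
To make the trade-off above concrete: forcing full unrolling usually means instantiating the loop body once per channel count, so that the inner loop bound is a literal the compiler can unroll. The C sketch below shows that general technique with a compile-time switch to turn it off. It is an assumption about the approach, not the macros from the patch, and COPY_LOOP, copy_pixels and SKETCH_FULL_UNROLL are made-up names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical knob: build with -DSKETCH_FULL_UNROLL=0 to emit a single
 * generic loop, trading some speed for a much smaller function. */
#ifndef SKETCH_FULL_UNROLL
#define SKETCH_FULL_UNROLL 1
#endif

/* Copy N channels per pixel.  The loop variables are declared inside
 * the macro, so each expansion has its own. */
#define COPY_LOOP(DST, SRC, N, COUNT)                        \
   do {                                                      \
      for (size_t i_ = 0; i_ < (COUNT); i_++)                \
         for (int c_ = 0; c_ < (N); c_++)                    \
            (DST)[i_ * (N) + c_] = (SRC)[i_ * (N) + c_];     \
   } while (0)

static void
copy_pixels(uint8_t *dst, const uint8_t *src, int chans, size_t count)
{
#if SKETCH_FULL_UNROLL
   /* One expansion per channel count: N is a literal in each case. */
   switch (chans) {
   case 1: COPY_LOOP(dst, src, 1, count); break;
   case 2: COPY_LOOP(dst, src, 2, count); break;
   case 3: COPY_LOOP(dst, src, 3, count); break;
   case 4: COPY_LOOP(dst, src, 4, count); break;
   default: assert(!"unsupported channel count"); break;
   }
#else
   /* One expansion with a run-time bound: less code to optimize. */
   COPY_LOOP(dst, src, chans, count);
#endif
}
```

With several source types, destination types and channel counts multiplied together, the unrolled variant expands into a very large number of loop bodies, which fits the compile times reported in this thread.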


Re: [Mesa-dev] [PATCH v2 08/13] mesa/format_utils: Add a general format conversion function

2014-08-04 Thread Brian Paul

On 08/02/2014 02:11 PM, Jason Ekstrand wrote:

Most format conversion operations required by GL can be performed by
converting one channel at a time, shuffling the channels around, and
optionally filling missing channels with zeros and ones.  This adds a
function to do just that in a general, yet efficient, way.

v2:
  * Add better comments including full docs for functions
  * Don't use __typeof__
  * Use inline helpers instead of writing out conversions by hand,
  * Force full loop unrolling for better performance

Signed-off-by: Jason Ekstrand jason.ekstr...@intel.com
---
  src/mesa/main/format_utils.c | 844 +++
  src/mesa/main/format_utils.h |   5 +
  2 files changed, 849 insertions(+)

diff --git a/src/mesa/main/format_utils.c b/src/mesa/main/format_utils.c
index 241c158..d60aeb3 100644
--- a/src/mesa/main/format_utils.c
+++ b/src/mesa/main/format_utils.c
@@ -54,3 +54,847 @@ _mesa_srgb_ubyte_to_linear_float(uint8_t cl)

 return lut[cl];
  }
+
+/* A bunch of format conversion macros and helper functions used below */
+
+/* Only guaranteed to work for BITS <= 32 */
+#define MAX_UINT(BITS) ((BITS) == 32 ? UINT32_MAX : ((1u << (BITS)) - 1))
+#define MAX_INT(BITS) (int)MAX_UINT((BITS) - 1)


I'd probably put one more set of parens around the whole macro body, 
just to be safe.
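
For readers wondering what the extra parentheses buy: a cast binds tighter than binary operators, so a cast-only body is mostly a defensive habit, but the hazard is easy to see with a ternary body. The sketch below contrasts a parenthesized MAX_UINT with a hypothetical MAX_UINT_BARE that lacks the outer parentheses (added here purely for contrast), and shows MAX_INT with the suggested parentheses applied. The helper function names are made up as well.

```c
#include <assert.h>
#include <stdint.h>

/* Whole body parenthesized. */
#define MAX_UINT(BITS) ((BITS) == 32 ? UINT32_MAX : ((1u << (BITS)) - 1))

/* Hypothetical variant without the outer parentheses, for contrast only. */
#define MAX_UINT_BARE(BITS) (BITS) == 32 ? UINT32_MAX : ((1u << (BITS)) - 1)

/* MAX_INT with the extra set of parentheses suggested in the review. */
#define MAX_INT(BITS) ((int)MAX_UINT((BITS) - 1))

static unsigned
twice_max_good(void)
{
   return 2 * MAX_UINT(8);   /* 2 * 255 */
}

static unsigned
twice_max_bare(void)
{
   /* Expands to (2 * (8)) == 32 ? UINT32_MAX : 255, so the
    * multiplication is absorbed into the condition. */
   return 2 * MAX_UINT_BARE(8);
}
```

Here twice_max_good() yields 510 while twice_max_bare() yields 255, because the surrounding expression leaks into the unparenthesized macro body.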


-Brian



Re: [Mesa-dev] [PATCH v2 08/13] mesa/format_utils: Add a general format conversion function

2014-08-04 Thread Jason Ekstrand
On Mon, Aug 4, 2014 at 7:55 AM, Brian Paul bri...@vmware.com wrote:

 +/* A bunch of format conversion macros and helper functions used below */
 +
 +/* Only guaranteed to work for BITS <= 32 */
 +#define MAX_UINT(BITS) ((BITS) == 32 ? UINT32_MAX : ((1u << (BITS)) - 1))
 +#define MAX_INT(BITS) (int)MAX_UINT((BITS) - 1)


 I'd probably put one more set of parens around the whole macro body, just
 to be safe.


Yup, fixed.
--Jason



 -Brian

