Re: [Mesa-dev] [PATCH 2/2] mesa: Add performance debug for meta code.

2013-04-20 Thread Kenneth Graunke

On 04/19/2013 11:35 AM, Eric Anholt wrote:

I noticed a fallback in regnum through sysprof, and wanted a nicer way to
get information about it.
---
  src/mesa/drivers/common/meta.c |   28 +++++++++++++++++++++++++---
  src/mesa/main/errors.h         |   10 ++++++++++
  2 files changed, 35 insertions(+), 3 deletions(-)


Looks nice to me, but I'm guessing this only shows up through 
ARB_debug_output?  It'd be nice to get this via printf when an 
environment variable is set.
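
A minimal sketch of the kind of environment-variable switch being suggested 
(illustrative only -- the helper name and the MESA_DEBUG value are assumptions, 
not Mesa's actual debug machinery):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: print performance warnings to stderr when
 * MESA_DEBUG contains "perf", in addition to ARB_debug_output. */
static void
perf_printf(const char *msg)
{
   static int enabled = -1;
   if (enabled < 0) {
      const char *env = getenv("MESA_DEBUG");
      enabled = (env && strstr(env, "perf")) ? 1 : 0;
   }
   if (enabled)
      fprintf(stderr, "Mesa perf warning: %s\n", msg);
}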


For both:
Reviewed-by: Kenneth Graunke kenn...@whitecape.org
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] radeon/llvm: Use LLVM C API for compiling LLVM IR to ISA.

2013-04-20 Thread Mathias Fröhlich

Hi Tom,

Maybe I need to tell where the problem really appears in real life.
OpenSceneGraph has some nifty features regarding multi channel rendering.
Assume a setup of multiple full screen views running on different graphics 
boards in a single machine, composing a view into a single scene.
Now the recommended way to do this with osg is to set up an X screen per 
graphics board, even if this spans multiple monitors/projectors. Set up a GL 
graphics context per graphics board and set up a viewport per projector in the 
graphics contexts. Rendering then happens in parallel for each graphics 
context. I drive such a thing here with two radeons and three monitors for 
testing, and here the problem appears.

When I start the favourite flight simulation application of my choice with this 
setup, it crashes almost immediately if llvm_start_multithreaded has not been 
called, whereas it works stably if we ensure llvm is multithreaded.

So, I tried to distill a piglit testcase out of this somewhat larger setup with 
flightgear, OpenSceneGraph, multiple gpu's and what not.

On Friday, April 19, 2013 20:08:54 Tom Stellard wrote:
 On Wed, Apr 17, 2013 at 07:54:32AM +0200, Mathias Fröhlich wrote:
  Tom,
  
   -class LLVMEnsureMultithreaded {
   -public:
   -   LLVMEnsureMultithreaded()
   -   {
   -  llvm_start_multithreaded();
   -   }
   -};
   -
   -static LLVMEnsureMultithreaded lLVMEnsureMultithreaded;
  
  Removing this leads to crashes in llvm with applications that concurrently
  work on different gl contexts.
 
 The test you wrote still passes with this patch.  Do you see that
 we are now calling the C API version of llvm_start_multithreaded(),
 LLVMStartMultithreaded(), from inside radeon_llvm_compile(), protected by a
 static variable?

Oh, no, I did not see this. I did not realize that the llvm_start_multithreaded 
call is not just plain C, so I thought grepping for the call I used was 
sufficient.

But no luck: if I actually apply your patch and try to run this with the above 
setup I get the crashes. The same with the piglit test here.

Too bad that reproducing races is racy in itself.
With the piglit test, in about 2/3 of the runs I get either glibc memory 
corruption aborts or one of the asserts below from llvm:

bool llvm::llvm_start_multithreaded(): Assertion `!multithreaded_mode && 
"Already multithreaded!"' failed.

void 
llvm::PassRegistry::removeRegistrationListener(llvm::PassRegistrationListener*): 
Assertion `I != Impl->Listeners.end() && "PassRegistrationListener not 
registered!"' failed.

bool llvm::sys::SmartMutex<mt_only>::release() [with bool mt_only = true]: 
Assertion `((recursive && acquired) || (acquired == 1)) && "Lock not acquired 
before release!"' failed.

So the biggest problem IIRC was the use of llvm::sys::SmartMutex<mt_only>, 
which is spread around here and there in llvm. The pass registry was (is?) one 
of its users. If you do not tell llvm to run multithreaded, these locks become 
no-ops and you end up concurrently accessing containers and so on ...

Looking at the first assert, the llvm guys have made this problem even worse 
IMO since I last looked at this. We need to check for multithreading being 
enabled before trying to set it. Both of these steps are racy in themselves, 
and neither is safe against llvm access already happening from another thread 
or another foreign user.
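
To make the shape of the problem concrete, here is a minimal sketch of such a 
guarded one-time initialization using the LLVM C API (the entry-point name is 
hypothetical, and as argued above the check-then-set remains racy against 
foreign LLVM users that do not go through this helper):

#include <pthread.h>
#include <llvm-c/Core.h>

static pthread_once_t llvm_init_once = PTHREAD_ONCE_INIT;

static void init_llvm_threading(void)
{
   /* Still racy against other users of the llvm libs, as discussed. */
   if (!LLVMIsMultithreaded())
      LLVMStartMultithreaded();
}

/* Hypothetical entry point: serializes callers within this driver only. */
void radeon_llvm_ensure_multithreaded(void)
{
   pthread_once(&llvm_init_once, init_llvm_threading);
}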

 Sorry about that. I didn't have piglit commit access at the time, and
 I forgot about the patch.  I fixed a few things and sent v3 to the list.
The same here. Thanks for this.

  Regarding the point where this function is called: I had chosen static
  initialization time, since llvm requires this function to be called
  single-threaded, which we cannot guarantee in any case. Keep in mind that you
  need to ensure this function is called non-concurrently even against
  applications that themselves already use the llvm libs in some way while the
  driver is loaded. But the best bet is to do this in the dynamic loader, which
  is itself serialized; that way I could avoid calling this function
  concurrently from the initialization of different contexts. That should at
  least shield against applications that do the same trick themselves by
  calling this function in the dlopen phase in some static initializer ...
  We may get around part of this problem by dlopening the driver with
  better isolation, but up to now the problem can get that far.
 
 This is a tricky problem, and I'm not sure that radeon_llvm_compile() is
 the best place to call llvm_start_multithreaded().  Maybe it would be
 better to move this into gallivm, because this problem could affect any
 driver that uses the gallivm code, which includes: llvmpipe, r300g, r600g,
 radeonsi, and i915g.  What do you think?

Yep, another place would be better.

I do not know the llvm tools well enough, but if I move the current c++ code 
into src/gallium/auxiliary/gallivm/lp_bld_misc.cpp it works for me (TM).
Seriously, I know of one guy who wants to use llvmpipe with 

[Mesa-dev] [PATCH 1/3] gallivm: Emit vector selects.

2013-04-20 Thread jfonseca
From: José Fonseca jfons...@vmware.com

They are supported on LLVM 3.1, at least on x86. (I haven't tested on PPC
though.)

Actually lp_build_linear_mip_levels() has already been emitting them for
some time.

This avoids intrinsics, which tend to be an obstacle for certain
optimization passes.
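
For reference, a sketch of what emitting such a select looks like through the 
LLVM C API (the helper is illustrative, not the lp_build_select() code in the 
diff below): the integer mask vector is truncated to a vector of booleans and 
fed to a plain select instruction, with no intrinsic involved.

/* Select between two n-wide vectors a and b using an i32 mask vector
 * whose elements are all-ones or all-zeros. */
LLVMValueRef
build_vector_select(LLVMBuilderRef builder, unsigned n,
                    LLVMValueRef mask, LLVMValueRef a, LLVMValueRef b)
{
   LLVMContextRef ctx = LLVMGetTypeContext(LLVMTypeOf(a));
   /* Truncate the <n x i32> mask to <n x i1>. */
   LLVMTypeRef bool_vec_type = LLVMVectorType(LLVMInt1TypeInContext(ctx), n);
   LLVMValueRef bool_mask = LLVMBuildTrunc(builder, mask, bool_vec_type, "");
   return LLVMBuildSelect(builder, bool_mask, a, b, "");
}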
---
 src/gallium/auxiliary/gallivm/lp_bld_logic.c |   14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_logic.c b/src/gallium/auxiliary/gallivm/lp_bld_logic.c
index f56b61b..cdb7e0a 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_logic.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_logic.c
@@ -458,20 +458,10 @@ lp_build_select(struct lp_build_context *bld,
   mask = LLVMBuildTrunc(builder, mask, LLVMInt1TypeInContext(lc), "");
   res = LLVMBuildSelect(builder, mask, a, b, "");
}
-   else if (0) {
+   else if (HAVE_LLVM >= 0x301) {
   /* Generate a vector select.
*
-   * XXX: Using vector selects would avoid emitting intrinsics, but they aren't
-   * properly supported yet.
-   *
-   * LLVM 3.0 includes experimental support provided the -promote-elements
-   * options is passed to LLVM's command line (e.g., via
-   * llvm::cl::ParseCommandLineOptions), but resulting code quality is much
-   * worse, probably because some optimization passes don't know how to
-   * handle vector selects.
-   *
-   * See also:
-   * - http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-October/043659.html
+   * Only supported on LLVM 3.1 onwards
*/
 
   /* Convert the mask to a vector of booleans.
-- 
1.7.9.5

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/3] gallivm: Disable LLVM 2.7 workaround on other versions.

2013-04-20 Thread jfonseca
From: José Fonseca jfons...@vmware.com

2.7 was a particularly trouble-ridden release.

Furthermore, the bug can no longer be reproduced ever since the
first_level state was taken into account.
---
 src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c |3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
index ced2103..beefdae 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
@@ -1149,11 +1149,10 @@ lp_build_sample_common(struct lp_build_sample_context *bld,
   /* fall-through */
case PIPE_TEX_MIPFILTER_NONE:
   /* always use mip level 0 */
-  if (target == PIPE_TEXTURE_CUBE) {
+  if (HAVE_LLVM == 0x0207 && target == PIPE_TEXTURE_CUBE) {
  /* XXX this is a work-around for an apparent bug in LLVM 2.7.
   * We should be able to set ilevel0 = const(0) but that causes
   * bad x86 code to be emitted.
-  * XXX should probably disable that on other llvm versions.
   */
  assert(*lod_ipart);
  lp_build_nearest_mip_level(bld, texture_index, *lod_ipart, ilevel0);
-- 
1.7.9.5

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/3] llvmpipe: Ignore depth-stencil state if format has no depth/stencil.

2013-04-20 Thread jfonseca
From: José Fonseca jfons...@vmware.com

Prevents assertion failures inside the driver for such state combinations.
---
 src/gallium/drivers/llvmpipe/lp_state_fs.c |   14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/llvmpipe/lp_state_fs.c b/src/gallium/drivers/llvmpipe/lp_state_fs.c
index 8712885..1a9a194 100644
--- a/src/gallium/drivers/llvmpipe/lp_state_fs.c
+++ b/src/gallium/drivers/llvmpipe/lp_state_fs.c
@@ -2478,12 +2478,18 @@ make_variant_key(struct llvmpipe_context *lp,
    memset(key, 0, shader->variant_key_size);
 
    if (lp->framebuffer.zsbuf) {
-      if (lp->depth_stencil->depth.enabled) {
-         key->zsbuf_format = lp->framebuffer.zsbuf->format;
+      enum pipe_format zsbuf_format = lp->framebuffer.zsbuf->format;
+      const struct util_format_description *zsbuf_desc =
+         util_format_description(zsbuf_format);
+
+      if (lp->depth_stencil->depth.enabled &&
+          util_format_has_depth(zsbuf_desc)) {
+         key->zsbuf_format = zsbuf_format;
          memcpy(&key->depth, &lp->depth_stencil->depth, sizeof key->depth);
       }
-      if (lp->depth_stencil->stencil[0].enabled) {
-         key->zsbuf_format = lp->framebuffer.zsbuf->format;
+      if (lp->depth_stencil->stencil[0].enabled &&
+          util_format_has_stencil(zsbuf_desc)) {
+         key->zsbuf_format = zsbuf_format;
          memcpy(&key->stencil, &lp->depth_stencil->stencil, sizeof key->stencil);
       }
    }
-- 
1.7.9.5

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] r600g: status of the r600-sb branch

2013-04-20 Thread Vadim Girlin

On 04/20/2013 03:11 AM, Marek Olšák wrote:

Please don't add any new environment variables and use R600_DEBUG
instead. The other environment variables are deprecated.


I agree, those vars probably need some cleanup, they were added before 
R600_DEBUG appeared.


Though I'm afraid some of my options won't fit well into the R600_DEBUG 
flags, unless we add support for name/value pairs with optional 
custom parsers.


E.g. I have a group of env vars to define the range of included/excluded 
shaders for optimization and the mode (include/exclude/off). I thought about 
doing this with a single var and a custom parser to specify the range, e.g. 
as "10-20", but after all it's just a debug feature, not intended for 
everyday use, and so far I have failed to convince myself that it's worth the 
effort.


I can implement support for custom parsers for R600_DEBUG, but do we 
really need it? Maybe it would be enough to add e.g. "sb" to the 
R600_DEBUG flags for enabling it, instead of an R600_SB var (probably together 
with other boolean options such as R600_SB_USE_NEW_BYTECODE), but leave 
the more complicated internal debug options as they are?


Vadim


There is a table for R600_DEBUG in r600_pipe.c and it even comes with
a help feature: R600_DEBUG=help

Marek
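
For reference, a minimal sketch of the kind of named-flag table Marek is 
referring to, using gallium's u_debug helpers (the flag name, bit value, and 
function are illustrative assumptions, not the actual r600_pipe.c contents):

#include "util/u_debug.h"

#define DBG_SB (1 << 0)  /* hypothetical bit for the sb backend */

static const struct debug_named_value r600_debug_options[] = {
   { "sb", DBG_SB, "Use the r600-sb shader backend" },
   DEBUG_NAMED_VALUE_END /* terminator; also enables R600_DEBUG=help */
};

/* Parsed once, e.g. at screen creation: */
static unsigned long
r600_get_debug_flags(void)
{
   return debug_get_flags_option("R600_DEBUG", r600_debug_options, 0);
}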

On Fri, Apr 19, 2013 at 4:48 PM, Vadim Girlin vadimgir...@gmail.com wrote:

Hi,

In the previous status update I said that the r600-sb branch is not ready to
be merged yet, but recently I've done some cleanups and reworks, and though
I haven't finished everything that I planned initially, I think now it's in
a better state and may be considered for merging.

I'm interested to know whether people think that merging the r600-sb
branch makes sense at all. I'll try to explain here why it makes sense to
me.

Although I understand that the development of the llvm backend is a primary goal
for the r600g developers, it's a complicated process and may require quite
some time to achieve good results regarding the shader/compiler performance,
and at the same time this branch already works and provides good results in
many cases. That's why I think it makes sense to merge this branch as a
non-default backend, at least as a temporary solution for shader performance
problems. We can always get rid of it if it becomes too much of a maintenance
burden or when the llvm backend catches up in terms of shader performance and
compilation speed/overhead.

Regarding the support and maintenance of this code, I'll try to do my best
to fix possible issues, and so far there are no known unfixed issues. I
tested it with many apps on evergreen and fixed all issues with other chips
that were reported to me on the list or privately after the last status
announcement. There are no piglit regressions on evergreen when this branch is
used with both default and llvm backends.

This code was intentionally separated as much as possible from the other
parts of the driver, basically there are just two functions used from r600g,
and the shader code is passed to/from r600-sb as a hardware bytecode that is
not going to change. I think it won't require any modifications at all to
keep it in sync with most changes in r600g.

Some work might be required though if we want to add support for the new
hw features that are currently unused, e.g. geometry shaders, new
instruction types for compute shaders, etc., but I think I'll be able to
catch up when it's implemented in the driver and the default or llvm backend.
E.g. this branch already works for me on evergreen with some simple OpenCL
kernels, including bfgminer where it increases performance of the kernel
compiled with llvm backend by more than 20% for me.

Besides the performance benefits, I think that an alternative backend also
might help with debugging of the default or llvm backend; in some cases it
helped me by exposing bugs that are not very obvious otherwise. E.g. it
may be hard to compare the dumps from the default and llvm backends to spot a
regression because they are too different, but after processing both shaders
with r600-sb the code is usually transformed to a more common form, and
often this makes it easier to compare and find the differences in shader
logic.

One additional feature that might help with llvm backend debugging is the
disassembler that works on the hardware bytecode instead of the internal
r600g bytecode structs. This results in more readable shader dumps for
instructions passed in native hw encoding from the llvm backend. I think this
also can help to catch more potential bugs related to bytecode building in
r600g/llvm. Currently r600-sb uses its bytecode disassembler for all shader
dumps, including the fetch shaders, even when optimization is not enabled.
Basically it can replace r600_bytecode_disasm and related code completely.

Below are some quick benchmarks for shader performance and compilation time,
to demonstrate that currently r600-sb might provide better performance for
users, at least in some cases.

As an example of the shaders with 

Re: [Mesa-dev] [PATCH 5/6] configure: remove IN_DRI_DRIVER

2013-04-20 Thread Andreas Boll
2013/4/20 Brian Paul bri...@vmware.com

 On 04/19/2013 01:56 PM, Andreas Boll wrote:

 From: Brian Paul bri...@vmware.com


 I don't recall authoring this particular patch.  But AFAICT, IN_DRI_DRIVER
 is no longer used, as you indicate.

 -Brian


See
http://cgit.freedesktop.org/~brianp/mesa/commit/?h=remove-mfeaturesid=22c61a33a29bf3f26cc2dacad6eb8ea6ea6ff864

Andreas.


  Not used anymore.

 v2: Andreas Boll andreas.boll.dev@gmail.com
 
  - split patch into two patches
 - remove more unused code
 ---
   configure.ac |   15 +++++----------
   1 file changed, 5 insertions(+), 10 deletions(-)

 diff --git a/configure.ac b/configure.ac
 index cdc8fd9..228c324 100644
 --- a/configure.ac
 +++ b/configure.ac
 @@ -731,11 +731,6 @@ if test "x$enable_dri" = xno; then
     enable_shared_glapi=no
  fi

 -if test "x$enable_shared_glapi" = xyes; then
 -    # libGL will use libglapi for function lookups (IN_DRI_DRIVER means to use
 -    # the remap table)
 -    DEFINES="$DEFINES -DIN_DRI_DRIVER"
 -fi
  AM_CONDITIONAL(HAVE_SHARED_GLAPI, test "x$enable_shared_glapi" = xyes)

  dnl
 @@ -959,7 +954,7 @@ if test "x$enable_dri" = xyes; then
      # Platform specific settings and drivers to build
      case "$host_os" in
      linux*)
 -        DEFINES="$DEFINES -DUSE_EXTERNAL_DXTN_LIB=1 -DIN_DRI_DRIVER"
 +        DEFINES="$DEFINES -DUSE_EXTERNAL_DXTN_LIB=1"
          DEFINES="$DEFINES -DHAVE_ALIAS"

          case "$host_cpu" in
 @@ -984,21 +979,21 @@ if test "x$enable_dri" = xyes; then
          ;;
      freebsd* | dragonfly* | *netbsd*)
          DEFINES="$DEFINES -DHAVE_PTHREAD -DUSE_EXTERNAL_DXTN_LIB=1"
 -        DEFINES="$DEFINES -DIN_DRI_DRIVER -DHAVE_ALIAS"
 +        DEFINES="$DEFINES -DHAVE_ALIAS"

          if test "x$DRI_DIRS" = xyes; then
              DRI_DIRS="i915 i965 nouveau r200 radeon swrast"
          fi
          ;;
      gnu*)
 -        DEFINES="$DEFINES -DUSE_EXTERNAL_DXTN_LIB=1 -DIN_DRI_DRIVER"
 +        DEFINES="$DEFINES -DUSE_EXTERNAL_DXTN_LIB=1"
          DEFINES="$DEFINES -DHAVE_ALIAS"
          ;;
      solaris*)
 -        DEFINES="$DEFINES -DUSE_EXTERNAL_DXTN_LIB=1 -DIN_DRI_DRIVER"
 +        DEFINES="$DEFINES -DUSE_EXTERNAL_DXTN_LIB=1"
          ;;
      cygwin*)
 -        DEFINES="$DEFINES -DUSE_EXTERNAL_DXTN_LIB=1 -DIN_DRI_DRIVER"
 +        DEFINES="$DEFINES -DUSE_EXTERNAL_DXTN_LIB=1"
          if test "x$DRI_DIRS" = xyes; then
              DRI_DIRS="swrast"
          fi



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3] R600/SI: Add pattern for AMDGPUurecip

2013-04-20 Thread Christian König

On 20.04.2013 06:06, Tom Stellard wrote:

On Thu, Apr 11, 2013 at 10:12:01AM +0200, Christian König wrote:

On 10.04.2013 18:50, Tom Stellard wrote:

On Wed, Apr 10, 2013 at 05:59:48PM +0200, Michel Dänzer wrote:

[SNIP]

We should start using the updated pattern syntax for all new patterns.
This means replacing register classes with types for the input patterns
and omitting the type in the output pattern:

def : Pat <
  (AMDGPUurecip i32:$src0),
  (V_CVT_U32_F32_e32
    (V_MUL_F32_e32 CONST.FP_UINT_MAX_PLUS_1,
      (V_RCP_IFLAG_F32_e32 (V_CVT_F32_U32_e32 $src0))))
>;

With that change:

Reviewed-by: Tom Stellard thomas.stell...@amd.com

BTW: I created the attached patches two weeks ago. They rework most
of the existing patterns on SI to use the new format, but I
currently don't have time to rebase, test & commit them. They
shouldn't change anything in functionality, so if you guys think
they are ok then please review and commit them.


Thanks for doing this.  I've thrown these patches into a branch along
with changes to the R600 patterns.  I will try to test them next week.
Is there any reason why we can't squash all these patches together before
we commit?


No, not really. I just usually split patches up for testing each one 
individually, so feel free to squash-merge them for commit.


Christian.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] llvmpipe: Don't support Z32_FLOAT_S8X24_UINT texture sampling support either.

2013-04-20 Thread jfonseca
From: José Fonseca jfons...@vmware.com

Because we don't support it, and the u_format fallback doesn't work for
ZS formats.
---
 src/gallium/drivers/llvmpipe/lp_screen.c |   10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/llvmpipe/lp_screen.c b/src/gallium/drivers/llvmpipe/lp_screen.c
index 5535f85..667ade1 100644
--- a/src/gallium/drivers/llvmpipe/lp_screen.c
+++ b/src/gallium/drivers/llvmpipe/lp_screen.c
@@ -361,6 +361,12 @@ llvmpipe_is_format_supported( struct pipe_screen *_screen,
  return FALSE;
}
 
+   /* TODO: Support Z32_FLOAT_S8X24_UINT. See lp_bld_depth.c. */
+   if (format_desc->colorspace == UTIL_FORMAT_COLORSPACE_ZS &&
+       format_desc->block.bits > 32) {
+      return FALSE;
+   }
+
   if (bind & PIPE_BIND_DEPTH_STENCIL) {
      if (format_desc->layout != UTIL_FORMAT_LAYOUT_PLAIN)
         return FALSE;
@@ -368,10 +374,6 @@ llvmpipe_is_format_supported( struct pipe_screen *_screen,
      if (format_desc->colorspace != UTIL_FORMAT_COLORSPACE_ZS)
         return FALSE;
 
-      /* TODO: Support Z32_FLOAT_S8X24_UINT. See lp_bld_depth.c. */
-      if (format_desc->block.bits > 32)
-         return FALSE;
-
      /* TODO: Support stencil-only formats */
      if (format_desc->swizzle[0] == UTIL_FORMAT_SWIZZLE_NONE) {
         return FALSE;
-- 
1.7.9.5

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] r600g: status of the r600-sb branch

2013-04-20 Thread Vadim Girlin

On 04/20/2013 01:42 PM, Christian König wrote:

On 19.04.2013 18:50, Vadim Girlin wrote:

On 04/19/2013 08:35 PM, Christian König wrote:

Hey Vadim,

On 19.04.2013 18:18, Vadim Girlin wrote:

[SNIP]

In theory, yes, some optimizations in this branch are typically used
on the earlier compilation stages, not on the target machine code. On
the other hand, there are some differences that might make it harder,
e.g. many algorithms require SSA form, and though it's possible to do
similar optimizations without SSA, it would be hard to implement. Also
I wanted to support both default backend and llvm backend for
increased testing coverage and to be able to compare the efficiency of
the algorithms in my experiments etc.


Yeah I know, missing an SSA implementation is also something that always
bothered me a bit with both TGSI and GLSL (while I haven't done much
with GLSL, so maybe I misjudge here).

Can you name the different algorithms used?


There is a short description of the algorithms and passes in the
notes.markdown file [1] in that branch, there are also links in the
end to the full description of some algorithms, though some of them
were modified/adapted for this branch.


It's not a strict prerequisite, but I think we both agree that doing
things like LICM on R600 bytecode isn't the best idea overall (when
doing it on GLSL would be beneficial for all drivers, not only r600).


In fact there is no special LICM pass, it's done by the GCM (Global
Code Motion, [2]), which probably could be also called global
scheduler. In fact in my branch this pass is combined with some
hw-specific scheduling logic, e.g. grouping fetch/alu instructions to
reduce clause type switching in the code and the number of required CF
instructions, potentially it can also schedule clauses to expose more
parallelism with the BARRIER bit usage.



Yeah I already thought that you're using something like this.

On one hand that is really good, cause it is specialized and so produces
really optimal code for the r600 target. But on the other hand it's bad,
cause it is specialized and so produces really optimal code ONLY for the
r600 target


I think such a pass on a higher level (GLSL IR or TGSI) would at least need 
some callbacks or caps to be tunable for the target.


Anyway, the result of the GCM pass is affected by the CFG structure, so when 
the target applies e.g. if-conversion or any other target-specific 
control flow optimization, you might want to apply a 
similar pass again at the target instruction level for better results, 
and then the previous pass on the higher-level IR looks not very useful.


Also there are some high-level operations that are translated to a 
bunch of target instructions, e.g. integer division on r600. A high-level 
pass can't hoist i/5 (where i is the loop counter) out of the loop, but 
after translation to target instructions it's possible to hoist some of 
the resulting instructions, producing more efficient code, as the sketch 
below illustrates.


One more point is that GCM achieves its best efficiency when used 
with a GVN (Global Value Numbering) pass: GCM allows GVN to not care 
about code placement during elimination of redundant operations, so 
you'll probably want to implement a high-level GVN pass as well.


I think it's possible to implement GVN-GCM at the GLSL or TGSI level, but I 
suspect it will require a lot more effort than the implementation of these 
passes in my branch did, and will be less efficient.




Just speculating, what would it take to make those passes run on the
LLVM Machine Instruction representation instead of your own representation?


The main difference between the IRs is the representation of control flow: 
r600-sb relies on the fact that the r600 arch doesn't have arbitrary control 
flow, which renders CFGs superfluous. Implementing these passes on 
CFGs would be more complicated; it would also require the computation of 
dominance frontiers, loop detection and analysis, etc. On r600-sb's 
IR these passes are greatly simplified.


Regarding the GCM, the original algorithm as described in that pdf works on 
the CFG, so it shouldn't be hard to implement in LLVM, but I'm not sure 
how it will fit into the LLVM infrastructure. LLVM has GVN-PRE, LICM and 
other passes that together do basically the same thing as GVN-GCM, so if 
you implement it, you might want to get rid of LLVM's own passes that 
duplicate the same functionality, and I'm not sure if this would be 
easy; possibly there are some interdependencies etc. Also I saw mentions 
of some plans (e.g. [1], [2]) regarding the implementation of global code 
motion in LLVM, so it looks like there is already some work in progress.


Vadim

[1] 
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20120709/146206.html
[2] 
http://markmail.org/message/2td3fnnggk6oripp#query:+page:1+mid:2td3fnnggk6oripp+state:results




Christian.


Vadim

 [1]
http://cgit.freedesktop.org/~vadimg/mesa/tree/src/gallium/drivers/r600/sb/notes.markdown?h=r600-sb

 [2]

[Mesa-dev] [PATCH] radeonsi: cleanup disabling tiling for UVD

2013-04-20 Thread Christian König
From: Christian König christian.koe...@amd.com

Should fix: https://bugs.freedesktop.org/show_bug.cgi?id=63702

Signed-off-by: Christian König christian.koe...@amd.com
---
 src/gallium/drivers/radeonsi/radeonsi_uvd.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/radeonsi_uvd.c b/src/gallium/drivers/radeonsi/radeonsi_uvd.c
index d49c088..20d079f 100644
--- a/src/gallium/drivers/radeonsi/radeonsi_uvd.c
+++ b/src/gallium/drivers/radeonsi/radeonsi_uvd.c
@@ -76,6 +76,7 @@ struct pipe_video_buffer *radeonsi_video_buffer_create(struct pipe_context *pipe
 	template.height = align(tmpl->height / depth, VL_MACROBLOCK_HEIGHT);
 
 	vl_vide_buffer_template(&templ, &template, resource_formats[0], depth, PIPE_USAGE_STATIC, 0);
+	templ.flags = R600_RESOURCE_FLAG_TRANSFER;
 	resources[0] = (struct r600_resource_texture *)
 		pipe->screen->resource_create(pipe->screen, &templ);
 	if (!resources[0])
@@ -83,6 +84,7 @@ struct pipe_video_buffer *radeonsi_video_buffer_create(struct pipe_context *pipe
 
 	if (resource_formats[1] != PIPE_FORMAT_NONE) {
 		vl_vide_buffer_template(&templ, &template, resource_formats[1], depth, PIPE_USAGE_STATIC, 1);
+		templ.flags = R600_RESOURCE_FLAG_TRANSFER;
 		resources[1] = (struct r600_resource_texture *)
 			pipe->screen->resource_create(pipe->screen, &templ);
 		if (!resources[1])
@@ -91,6 +93,7 @@ struct pipe_video_buffer *radeonsi_video_buffer_create(struct pipe_context *pipe
 
 	if (resource_formats[2] != PIPE_FORMAT_NONE) {
 		vl_vide_buffer_template(&templ, &template, resource_formats[2], depth, PIPE_USAGE_STATIC, 2);
+		templ.flags = R600_RESOURCE_FLAG_TRANSFER;
 		resources[2] = (struct r600_resource_texture *)
 			pipe->screen->resource_create(pipe->screen, &templ);
 		if (!resources[2])
@@ -114,9 +117,6 @@ struct pipe_video_buffer *radeonsi_video_buffer_create(struct pipe_context *pipe
 		/* recreate the CS handle */
 		resources[i]->resource.cs_buf = ctx->ws->buffer_get_cs_handle(
 			resources[i]->resource.buf);
-
-		/* TODO: tiling used to work with UVD on SI */
-		resources[i]->surface.level[0].mode = RADEON_SURF_MODE_LINEAR_ALIGNED;
 	}
 
 	template.height *= depth;
-- 
1.7.9.5

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] r600g: status of the r600-sb branch

2013-04-20 Thread Christian König

On 20.04.2013 13:12, Vadim Girlin wrote:

On 04/20/2013 01:42 PM, Christian König wrote:

On 19.04.2013 18:50, Vadim Girlin wrote:

On 04/19/2013 08:35 PM, Christian König wrote:

Hey Vadim,

On 19.04.2013 18:18, Vadim Girlin wrote:

[SNIP]

In theory, yes, some optimizations in this branch are typically used
on the earlier compilation stages, not on the target machine code. On
the other hand, there are some differences that might make it harder,
e.g. many algorithms require SSA form, and though it's possible to do
similar optimizations without SSA, it would be hard to implement. Also
I wanted to support both default backend and llvm backend for
increased testing coverage and to be able to compare the efficiency of
the algorithms in my experiments etc.


Yeah I know, missing an SSA implementation is also something that always
bothered me a bit with both TGSI and GLSL (while I haven't done much
with GLSL, so maybe I misjudge here).

Can you name the different algorithms used?


There is a short description of the algorithms and passes in the
notes.markdown file [1] in that branch, there are also links in the
end to the full description of some algorithms, though some of them
were modified/adapted for this branch.


It's not a strict prerequisite, but I think we both agree that doing
things like LICM on R600 bytecode isn't the best idea over all (when
doing it on GLSL would be beneficial for all drivers not only r600).


In fact there is no special LICM pass, it's done by the GCM (Global
Code Motion, [2]), which probably could be also called global
scheduler. In fact in my branch this pass is combined with some
hw-specific scheduling logic, e.g. grouping fetch/alu instructions to
reduce clause type switching in the code and the number of required CF
instructions, potentially it can also schedule clauses to expose more
parallelism with the BARRIER bit usage.



Yeah I already thought that you're using something like this.

On one hand that is really good, cause it is specialized and so produces
really optimal code for the r600 target. But on the other hand it's bad,
cause it is specialized and so produces really optimal code ONLY for the
r600 target


I think such pass on higher level (GLSL IR or TGSI) would at least 
need some callbacks or caps to be tunable for the target.


Anyway the result of GCM pass is affected by the CFG structure, so 
when the target applies e.g. if-conversion or any other 
target-specific control flow optimization, this means that you might 
want to apply similar pass again on the target instruction level for 
better results, and then previous pass on higher level IR looks not 
very useful.


Also there are some high level operations that are translated to the 
bunch of target instructions, e.g. integer division on r600. 
High-level pass can't hoist i/5 (where i is loop counter) out of the 
loop, but after translation to target instructions it's possible to 
hoist some of the resulting instructions, producing more efficient code.


One more point is that GCM allows to achieve best efficiency when used 
with GVN (Global Value Numbering) pass, e.g. GCM allows GVN to not 
care about code placement during elimination of redundant operations, 
so you'll probably want to implement high-level GVN pass as well.


I think it's possible to implement GVN-GCM on GLSL or TGSI level, but 
I suspect it will require a lot more efforts than it was required by 
implementation of these passes in my branch, and will be less efficient.




Just speculating, what would it take to make those passes run on the
LLVM Machine Instruction representation instead of your own 
representation?


Main difference between IRs is the representation of control flow, 
r600-sb relies on the fact that r600 arch doesn't have arbitrary 
control flow, this renders CFGs superfluous. Implementation of these 
passes on CFGs will be more complicated, it will also require the 
computation of dominance frontiers, loops detection and analysis, etc. 
On the r600-sb's IR these passes are greatly simplified.


Regarding the GCM, original algorithm as described in that pdf works 
on the CFG, so it shouldn't be hard to implement in LLVM, but I'm not 
sure how it will fit into the LLVM infrastructure. LLVM has GVN-PRE, 
LICM and other passes that together do basically the same thing as 
GVN-GCM, so if you implement it, you might want to get rid of LLVM's 
own passes that duplicate the same functionality, and I'm not sure if 
this would be easy, possibly there are some interdependencies etc. 
Also I saw mentions of some plans (e.g. [1],[2]) regarding the 
implementation of global code motion in LLVM, looks like there is 
already some work in progress.




Oh, I wasn't talking about replacing any LLVM passes, more like extending 
them to provide the same amount of functionality. Also I didn't have LLVM 
IR in mind while writing this, but more the machine instruction 
representation they use.


Well you have quite a lot of C++ 

Re: [Mesa-dev] [PATCH] radeon/llvm: Use LLVM C API for compiling LLVM IR to ISA.

2013-04-20 Thread Christian König

On 20.04.2013 09:27, Mathias Fröhlich wrote:

Hi Tom,

Maybe I need to tell where the problem really appears in real life.
OpenSceneGraph has some nifty features regarding multi channel rendering.
Assume a setup of multiple full screen views running on different graphics
boards in a single machine, composing a view into a single scene.
Now the recommended way to do this with osg is to set up an X screen per
graphics board, even if this spans multiple monitors/projectors. Set up a GL
graphics context per graphics board and set up a viewport per projector in the
graphics contexts. Rendering then happens in parallel for each graphics
context. I drive such a thing here with two radeons and three monitors for
testing, and here the problem appears.

When I start the favourite flight simulation application of my choice with this
setup, it crashes almost immediately if llvm_start_multithreaded has not been
called, whereas it works stably if we ensure llvm is multithreaded.

So, I tried to distill a piglit testcase out of this somewhat larger setup with
flightgear, OpenSceneGraph, multiple gpu's and what not.

On Friday, April 19, 2013 20:08:54 Tom Stellard wrote:

On Wed, Apr 17, 2013 at 07:54:32AM +0200, Mathias Fröhlich wrote:

Tom,


-class LLVMEnsureMultithreaded {
-public:
-   LLVMEnsureMultithreaded()
-   {
-  llvm_start_multithreaded();
-   }
-};
-
-static LLVMEnsureMultithreaded lLVMEnsureMultithreaded;

Removing this leads to crashes in llvm with applications that concurrently
work on different gl contexts.

The test you wrote still passes with this patch.  Do you see that
we are now calling the C API version of llvm_start_multithreaded(),
LLVMStartMultithreaded(), from inside radeon_llvm_compile(), protected by a
static variable?

Oh, no, I did not see this. I did not realize that the llvm_start_multithreaded
call is not just plain C, so I thought grepping for the call I used was
sufficient.

But no luck: if I actually apply your patch and try to run this with the above
setup I get the crashes. The same with the piglit test here.

Too bad that reproducing races is racy in itself.
With the piglit test, in about 2/3 of the runs I get either glibc memory
corruption aborts or one of the asserts below from llvm:

bool llvm::llvm_start_multithreaded(): Assertion `!multithreaded_mode &&
"Already multithreaded!"' failed.

void
llvm::PassRegistry::removeRegistrationListener(llvm::PassRegistrationListener*):
Assertion `I != Impl->Listeners.end() && "PassRegistrationListener not
registered!"' failed.

bool llvm::sys::SmartMutex<mt_only>::release() [with bool mt_only = true]:
Assertion `((recursive && acquired) || (acquired == 1)) && "Lock not acquired
before release!"' failed.

So the biggest problem IIRC was the use of llvm::sys::SmartMutex<mt_only>,
which is spread around here and there in llvm. The pass registry was (is?) one
of its users. If you do not tell llvm to run multithreaded, these locks become
no-ops and you end up concurrently accessing containers and so on ...

Looking at the first assert, the llvm guys have made this problem even worse
IMO since I last looked at this. We need to check for multithreading being
enabled before trying to set it. Both of these steps are racy in themselves,
and neither is safe against llvm access already happening from another thread
or another foreign user.


Sorry about that. I didn't have piglit commit access at the time, and
I forgot about the patch.  I fixed a few things and sent v3 to the list.

The same here. Thanks for this.


Regarding the point where this function is called: I had chosen static
initialization time, since llvm requires this function to be called
single-threaded, which we cannot guarantee in any case. Keep in mind that you
need to ensure this function is called non-concurrently even against
applications that themselves already use the llvm libs in some way while the
driver is loaded. But the best bet is to do this in the dynamic loader, which
is itself serialized; that way I could avoid calling this function
concurrently from the initialization of different contexts. That should at
least shield against applications that do the same trick themselves by
calling this function in the dlopen phase in some static initializer ...
We may get around part of this problem by dlopening the driver with
better isolation, but up to now the problem can get that far.

This is a tricky problem, and I'm not sure that radeon_llvm_compile() is
the best place to call llvm_start_multithreaded().  Maybe it would be
better to move this into gallivm, because this problem could affect any
driver that uses the gallivm code, which includes: llvmpipe, r300g, r600g,
radeonsi, and i915g.  What do you think?

Yep, another place would be better.

I do not know the llvm tools well enough, but if I move the current c++ code
into src/gallium/auxiliary/gallivm/lp_bld_misc.cpp it works for me (TM).
Seriously, I know of one guy who wants to use llvmpipe with Windows and he
would benefit from the c++ solution 

Re: [Mesa-dev] r600g: status of the r600-sb branch

2013-04-20 Thread Vadim Girlin

On 04/20/2013 03:38 PM, Christian König wrote:

On 20.04.2013 13:12, Vadim Girlin wrote:

On 04/20/2013 01:42 PM, Christian König wrote:

On 19.04.2013 18:50, Vadim Girlin wrote:

On 04/19/2013 08:35 PM, Christian König wrote:

Hey Vadim,

On 19.04.2013 18:18, Vadim Girlin wrote:

[SNIP]

In theory, yes, some optimizations in this branch are typically used
on the earlier compilation stages, not on the target machine code. On
the other hand, there are some differences that might make it harder,
e.g. many algorithms require SSA form, and though it's possible to do
similar optimizations without SSA, it would be hard to implement. Also
I wanted to support both default backend and llvm backend for
increased testing coverage and to be able to compare the efficiency of
the algorithms in my experiments etc.


Yeah I know, missing an SSA implementation is also something that always
bothered me a bit with both TGSI and GLSL (while I haven't done much
with GLSL, so maybe I misjudge here).

Can you name the different algorithms used?


There is a short description of the algorithms and passes in the
notes.markdown file [1] in that branch, there are also links in the
end to the full description of some algorithms, though some of them
were modified/adapted for this branch.


It's not a strict prerequisite, but I think we both agree that doing
things like LICM on R600 bytecode isn't the best idea over all (when
doing it on GLSL would be beneficial for all drivers not only r600).


In fact there is no special LICM pass, it's done by the GCM (Global
Code Motion, [2]), which probably could be also called global
scheduler. In fact in my branch this pass is combined with some
hw-specific scheduling logic, e.g. grouping fetch/alu instructions to
reduce clause type switching in the code and the number of required CF
instructions, potentially it can also schedule clauses to expose more
parallelism with the BARRIER bit usage.



Yeah I already thought that you're using something like this.

On one hand that is really good, cause it is specialized and so produces
really optimal code for the r600 target. But on the other hand it's bad,
cause it is specialized and so produces really optimal code ONLY for the
r600 target


I think such pass on higher level (GLSL IR or TGSI) would at least
need some callbacks or caps to be tunable for the target.

Anyway the result of GCM pass is affected by the CFG structure, so
when the target applies e.g. if-conversion or any other
target-specific control flow optimization, this means that you might
want to apply similar pass again on the target instruction level for
better results, and then previous pass on higher level IR looks not
very useful.

Also there are some high level operations that are translated to the
bunch of target instructions, e.g. integer division on r600.
High-level pass can't hoist i/5 (where i is loop counter) out of the
loop, but after translation to target instructions it's possible to
hoist some of the resulting instructions, producing more efficient code.

One more point is that GCM allows to achieve best efficiency when used
with GVN (Global Value Numbering) pass, e.g. GCM allows GVN to not
care about code placement during elimination of redundant operations,
so you'll probably want to implement high-level GVN pass as well.

I think it's possible to implement GVN-GCM on GLSL or TGSI level, but
I suspect it will require a lot more efforts than it was required by
implementation of these passes in my branch, and will be less efficient.



Just speculating, what would it take to make those passes run on the
LLVM Machine Instruction representation instead of your own
representation?


Main difference between IRs is the representation of control flow,
r600-sb relies on the fact that r600 arch doesn't have arbitrary
control flow, this renders CFGs superfluous. Implementation of these
passes on CFGs will be more complicated, it will also require the
computation of dominance frontiers, loops detection and analysis, etc.
On the r600-sb's IR these passes are greatly simplified.

Regarding the GCM, original algorithm as described in that pdf works
on the CFG, so it shouldn't be hard to implement in LLVM, but I'm not
sure how it will fit into the LLVM infrastructure. LLVM has GVN-PRE,
LICM and other passes that together do basically the same thing as
GVN-GCM, so if you implement it, you might want to get rid of LLVM's
own passes that duplicate the same functionality, and I'm not sure if
this would be easy, possibly there are some interdependencies etc.
Also I saw mentions of some plans (e.g. [1],[2]) regarding the
implementation of global code motion in LLVM, looks like there is
already some work in progress.



Oh, I wasn't talking about replacing any LLVM passes, more like extending
them to provide the same amount of functionality. Also I didn't have LLVM
IR in mind while writing this, but more the machine instruction
representation they use.

Well you have quite a lot of C++ 

Re: [Mesa-dev] r600g: status of the r600-sb branch

2013-04-20 Thread Henri Verbeet
On 19 April 2013 18:01, Vadim Girlin vadimgir...@gmail.com wrote:
 The choice of C++ (unlike in my previous branch that used C) was mostly
 driven by the fact that optimization algorithms usually deal with a lot of
 different complex data structures, containers, etc, and C++ allows to
 isolate implementation of all such things in separate and easily replaceable
 classes and concentrate on the logic, making the code more clean and
 readable.

I'm sure it would be good fun to have a discussion about the relative
merits of C and C++, though I think I've seen enough actual C++ that
you're not going to convince me it's the better language. However, I
don't think that should be the main consideration. It's probably more
important to consider what current and potential new contributors
prefer, and on Linux, particularly for the more low-level stuff, I
suspect that pretty much means C.

 I haven't tried to keep it as a series of independent patches because during
 the development most changes were pretty intrusive and introduced new
 features, some parts were seriously reworked/rewritten more than one time,
 requiring changes in other parts, especially when intermediate
 representation of the code was changed. It was usually easier for me to
 simply fix the new regressions in the new code than to revert any changes
 and lose new features, so bisection wouldn't be very helpful anyway. That's
 why I didn't even try to keep the history. Anyway most of the code in the
 branch is new, so I don't think that the history of the patches that rewrite
 the same code few times during a development would make it more readable
 than simply reading the final code.

I think I'm just going to disagree there. (But of course that's all
just my personal opinion, which probably doesn't carry a lot of weight
at the moment.)
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] r600g: status of the r600-sb branch

2013-04-20 Thread Vadim Girlin

On 04/20/2013 07:05 PM, Henri Verbeet wrote:

On 19 April 2013 18:01, Vadim Girlin vadimgir...@gmail.com wrote:

The choice of C++ (unlike in my previous branch that used C) was mostly
driven by the fact that optimization algorithms usually deal with a lot of
different complex data structures, containers, etc, and C++ allows to
isolate implementation of all such things in separate and easily replaceable
classes and concentrate on the logic, making the code more clean and
readable.


I'm sure it would be good fun to have a discussion about the relative
merits of C and C++, though I think I've seen enough actual C++ that
you're not going to convince me it's the better language.


I never wanted to convince you that C++ is a better language; I just 
wanted to explain why I decided to switch from C to C++ in this 
particular case.



However, I
don't think that should be the main consideration. It's probably more
important to consider what current and potential new contributors
prefer, and on Linux, particularly for the more low-level stuff, I
suspect that pretty much means C.


Well, it may be considered low-level stuff because it's a part of 
the driver. On the other hand, I'd rather think of it as a part of the 
compiler, and compilers (especially optimization algorithms) don't 
really look like low-level stuff to me. Depends on the definition of 
low-level stuff though.


To name a few examples, we can look at the compilers/optimizing backends 
used by mesa/gallium: the GLSL compiler (written in C++), LLVM (written in 
C++), the backends for the nvidia drivers (written in C++)...


Vadim




I haven't tried to keep it as a series of independent patches because during
the development most changes were pretty intrusive and introduced new
features, some parts were seriously reworked/rewritten more than one time,
requiring changes in other parts, especially when intermediate
representation of the code was changed. It was usually easier for me to
simply fix the new regressions in the new code than to revert any changes
and lose new features, so bisection wouldn't be very helpful anyway. That's
why I didn't even try to keep the history. Anyway most of the code in the
branch is new, so I don't think that the history of the patches that rewrite
the same code few times during a development would make it more readable
than simply reading the final code.


I think I'm just going to disagree there. (But of course that's all
just my personal opinion, which probably doesn't carry a lot of weight
at the moment.)



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] llvmpipe: Don't support Z32_FLOAT_S8X24_UINT texture sampling support either.

2013-04-20 Thread Brian Paul

On 04/20/2013 03:45 AM, jfons...@vmware.com wrote:

From: José Fonseca jfons...@vmware.com

Because we don't support it, and the u_format fallback doesn't work for
ZS formats.
---
  src/gallium/drivers/llvmpipe/lp_screen.c |   10 ++++++----
  1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/llvmpipe/lp_screen.c b/src/gallium/drivers/llvmpipe/lp_screen.c
index 5535f85..667ade1 100644
--- a/src/gallium/drivers/llvmpipe/lp_screen.c
+++ b/src/gallium/drivers/llvmpipe/lp_screen.c
@@ -361,6 +361,12 @@ llvmpipe_is_format_supported( struct pipe_screen *_screen,
   return FALSE;
 }

+   /* TODO: Support Z32_FLOAT_S8X24_UINT. See lp_bld_depth.c. */
+   if (format_desc->colorspace == UTIL_FORMAT_COLORSPACE_ZS &&
+       format_desc->block.bits > 32) {
+      return FALSE;
+   }
+
+
  if (bind & PIPE_BIND_DEPTH_STENCIL) {
     if (format_desc->layout != UTIL_FORMAT_LAYOUT_PLAIN)
        return FALSE;
@@ -368,10 +374,6 @@ llvmpipe_is_format_supported( struct pipe_screen *_screen,
     if (format_desc->colorspace != UTIL_FORMAT_COLORSPACE_ZS)
        return FALSE;

-      /* TODO: Support Z32_FLOAT_S8X24_UINT. See lp_bld_depth.c. */
-      if (format_desc->block.bits > 32)
-         return FALSE;
-
     /* TODO: Support stencil-only formats */
     if (format_desc->swizzle[0] == UTIL_FORMAT_SWIZZLE_NONE) {
        return FALSE;


Reviewed-by: Brian Paul bri...@vmware.com
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/3] gallivm: Emit vector selects.

2013-04-20 Thread Brian Paul

On 04/20/2013 02:57 AM, jfons...@vmware.com wrote:

From: José Fonseca jfons...@vmware.com

They are supported on LLVM 3.1, at least on x86. (I haven't tested on PPC
though.)

Actually lp_build_linear_mip_levels() has already been emitting them for
some time.

This avoids intrinsics, which tend to be an obstacle for certain
optimization passes.
---
  src/gallium/auxiliary/gallivm/lp_bld_logic.c |   14 ++------------
  1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_logic.c b/src/gallium/auxiliary/gallivm/lp_bld_logic.c
index f56b61b..cdb7e0a 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_logic.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_logic.c
@@ -458,20 +458,10 @@ lp_build_select(struct lp_build_context *bld,
mask = LLVMBuildTrunc(builder, mask, LLVMInt1TypeInContext(lc), "");
res = LLVMBuildSelect(builder, mask, a, b, "");
 }
-   else if (0) {
+   else if (HAVE_LLVM >= 0x301) {
/* Generate a vector select.
 *
-   * XXX: Using vector selects would avoid emitting intrinsics, but they aren't
-   * properly supported yet.
-   *
-   * LLVM 3.0 includes experimental support provided the -promote-elements
-   * options is passed to LLVM's command line (e.g., via
-   * llvm::cl::ParseCommandLineOptions), but resulting code quality is much
-   * worse, probably because some optimization passes don't know how to
-   * handle vector selects.
-   *
-   * See also:
-   * - http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-October/043659.html
+   * Only supported on LLVM 3.1 onwards
 */

/* Convert the mask to a vector of booleans.


For the series, Reviewed-by: Brian Paul bri...@vmware.com

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 63469] OSMesa Gallium llvmpipe VTK Test Failures

2013-04-20 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=63469

--- Comment #1 from Brian Paul bri...@vmware.com ---
Thanks for the traces.  It appears that we've got the sub-pixel positioning of
lines incorrect in llvmpipe.  Softpipe matches NVIDIA's driver but llvmpipe is
off by a half pixel both in x and y.  I'm digging into it.

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 63472] OSMesa Gallium Segfault in VTK Test

2013-04-20 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=63472

--- Comment #1 from Brian Paul bri...@vmware.com ---
Could you run this test with valgrind?  That should give us a bit more info.

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 63472] OSMesa Gallium Segfault in VTK Test

2013-04-20 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=63472

--- Comment #2 from Kevin Hobbs hob...@ohiou.edu ---
Created attachment 78281
  --> https://bugs.freedesktop.org/attachment.cgi?id=78281&action=edit
valgrind output

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/2] gallium: Add a new clip_halfz rasterizer state.

2013-04-20 Thread jfonseca
From: José Fonseca jfons...@vmware.com

gl_rasterization_rules lumps too many different flags.
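
For context, a sketch of the distinction the new flag captures (illustrative 
code, not part of the patch): OpenGL clips z against [-w, w], while D3D-style 
"half-z" clipping uses [0, w].

/* Near/far clip test selected by the new rasterizer bit: */
static int
z_is_clipped(float z, float w, int clip_halfz)
{
   if (clip_halfz)
      return z < 0.0f || z > w;  /* D3D-style:  0 <= z <= w */
   else
      return z < -w || z > w;    /* OpenGL:    -w <= z <= w */
}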
---
 src/gallium/auxiliary/draw/draw_context.c  |1 +
 src/gallium/auxiliary/draw/draw_llvm.c |2 +-
 src/gallium/auxiliary/draw/draw_pt.h   |2 +-
 .../auxiliary/draw/draw_pt_fetch_shade_pipeline.c  |9 +++--
 .../draw/draw_pt_fetch_shade_pipeline_llvm.c   |9 +++--
 src/gallium/auxiliary/draw/draw_pt_post_vs.c   |8 
 src/gallium/docs/source/cso/rasterizer.rst |4 
 src/gallium/include/pipe/p_state.h |6 ++
 8 files changed, 23 insertions(+), 18 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_context.c b/src/gallium/auxiliary/draw/draw_context.c
index 0f98021..5272951 100644
--- a/src/gallium/auxiliary/draw/draw_context.c
+++ b/src/gallium/auxiliary/draw/draw_context.c
@@ -712,6 +712,7 @@ draw_get_rasterizer_no_cull( struct draw_context *draw,
       rast.flatshade = flatshade;
       rast.front_ccw = 1;
       rast.gl_rasterization_rules = draw->rasterizer->gl_rasterization_rules;
+      rast.clip_halfz = draw->rasterizer->clip_halfz;
 
       draw->rasterizer_no_cull[scissor][flatshade] =
          pipe->create_rasterizer_state(pipe, &rast);
diff --git a/src/gallium/auxiliary/draw/draw_llvm.c b/src/gallium/auxiliary/draw/draw_llvm.c
index e0c0705..e1c08c6 100644
--- a/src/gallium/auxiliary/draw/draw_llvm.c
+++ b/src/gallium/auxiliary/draw/draw_llvm.c
@@ -1669,7 +1669,7 @@ draw_llvm_make_variant_key(struct draw_llvm *llvm, char *store)
    key->clip_z = llvm->draw->clip_z;
    key->clip_user = llvm->draw->clip_user;
    key->bypass_viewport = llvm->draw->identity_viewport;
-   key->clip_halfz = !llvm->draw->rasterizer->gl_rasterization_rules;
+   key->clip_halfz = llvm->draw->rasterizer->clip_halfz;
    key->need_edgeflags = (llvm->draw->vs.edgeflag_output ? TRUE : FALSE);
    key->ucp_enable = llvm->draw->rasterizer->clip_plane_enable;
    key->has_gs = llvm->draw->gs.geometry_shader != NULL;
diff --git a/src/gallium/auxiliary/draw/draw_pt.h b/src/gallium/auxiliary/draw/draw_pt.h
index 764d311..dca8368 100644
--- a/src/gallium/auxiliary/draw/draw_pt.h
+++ b/src/gallium/auxiliary/draw/draw_pt.h
@@ -233,7 +233,7 @@ void draw_pt_post_vs_prepare( struct pt_post_vs *pvs,
                              boolean clip_user,
                              boolean guard_band,
                              boolean bypass_viewport,
-                             boolean opengl,
+                             boolean clip_halfz,
                              boolean need_edgeflags );
 
 struct pt_post_vs *draw_pt_post_vs_create( struct draw_context *draw );
diff --git a/src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline.c b/src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline.c
index e17f161..8e48f46 100644
--- a/src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline.c
+++ b/src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline.c
@@ -105,17 +105,14 @@ static void fetch_pipeline_prepare( struct draw_pt_middle_end *middle,
                          vs->info.num_inputs,
                          fpme->vertex_size,
                          instance_id_index );
-   /* XXX: it's not really gl rasterization rules we care about here,
-    * but gl vs dx9 clip spaces.
-    */
    draw_pt_post_vs_prepare( fpme->post_vs,
                             draw->clip_xy,
                             draw->clip_z,
                             draw->clip_user,
                             draw->guard_band_xy,
-                            draw->identity_viewport,
-                            (boolean)draw->rasterizer->gl_rasterization_rules,
-                            (draw->vs.edgeflag_output ? TRUE : FALSE) );
+                            draw->identity_viewport,
+                            draw->rasterizer->clip_halfz,
+                            (draw->vs.edgeflag_output ? TRUE : FALSE) );
 
    draw_pt_so_emit_prepare( fpme->so_emit, FALSE );
 
diff --git a/src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline_llvm.c b/src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline_llvm.c
index d312dc4..4dff4f8 100644
--- a/src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline_llvm.c
+++ b/src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline_llvm.c
@@ -156,17 +156,14 @@ llvm_middle_end_prepare( struct draw_pt_middle_end *middle,
    fpme->vertex_size = sizeof(struct vertex_header) + nr * 4 * sizeof(float);
 
 
-   /* XXX: it's not really gl rasterization rules we care about here,
-    * but gl vs dx9 clip spaces.
-    */
    draw_pt_post_vs_prepare( fpme->post_vs,
                             draw->clip_xy,
                             draw->clip_z,
                             draw->clip_user,
                             draw->guard_band_xy,
-                            draw->identity_viewport,
-                            (boolean)draw->rasterizer->gl_rasterization_rules,
-                            (draw->vs.edgeflag_output ? TRUE : FALSE) );
+


Re: [Mesa-dev] [PATCH 1/2] gallium: Add a new clip_halfz rasterizer state.

2013-04-20 Thread Jose Fonseca
The second patch didn't make the list, probably because I'm not the author. I'm 
not sure exactly how to get git send-email to handle that properly; I'll retry 
sending it shortly.

Anyway, the change is at 
http://cgit.freedesktop.org/~jrfonseca/mesa/commit/?h=gl-rasterization-rules&id=a3910fbee7d95afd2fe9a359d1510b6bc090ce5c

Jose
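
(For context: clip_halfz picks between the two clip-space z conventions that 
gl_rasterization_rules used to imply -- when set, clip-space z runs over [0,w] 
as in D3D10; when unset, over [-w,w] as in OpenGL. A minimal sketch of what a 
consumer of the flag might do; this is hypothetical code, not part of the 
patch:)

   #include <stdbool.h>

   /* Hypothetical sketch: derive the NDC z clip bounds from clip_halfz.
    * The per-vertex clip test is then zmin*w <= z <= zmax*w. */
   static void get_z_clip_bounds(bool clip_halfz, float *zmin, float *zmax)
   {
      *zmin = clip_halfz ? 0.0f : -1.0f; /* D3D-style [0,w] vs GL-style [-w,w] */
      *zmax = 1.0f;                      /* upper bound is w in both conventions */
   }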

- Original Message -
 From: José Fonseca jfons...@vmware.com
 
 gl_rasterization_rules lumps too many different flags.
 ---
  src/gallium/auxiliary/draw/draw_context.c  |1 +
  src/gallium/auxiliary/draw/draw_llvm.c |2 +-
  src/gallium/auxiliary/draw/draw_pt.h   |2 +-
  .../auxiliary/draw/draw_pt_fetch_shade_pipeline.c  |9 +++--
  .../draw/draw_pt_fetch_shade_pipeline_llvm.c   |9 +++--
  src/gallium/auxiliary/draw/draw_pt_post_vs.c   |8 
  src/gallium/docs/source/cso/rasterizer.rst |4 
  src/gallium/include/pipe/p_state.h |6 ++
  8 files changed, 23 insertions(+), 18 deletions(-)
 
 diff --git a/src/gallium/auxiliary/draw/draw_context.c b/src/gallium/auxiliary/draw/draw_context.c
 index 0f98021..5272951 100644
 --- a/src/gallium/auxiliary/draw/draw_context.c
 +++ b/src/gallium/auxiliary/draw/draw_context.c
 @@ -712,6 +712,7 @@ draw_get_rasterizer_no_cull( struct draw_context *draw,
        rast.flatshade = flatshade;
        rast.front_ccw = 1;
        rast.gl_rasterization_rules = draw->rasterizer->gl_rasterization_rules;
 +      rast.clip_halfz = draw->rasterizer->clip_halfz;
  
        draw->rasterizer_no_cull[scissor][flatshade] =
           pipe->create_rasterizer_state(pipe, rast);
 diff --git a/src/gallium/auxiliary/draw/draw_llvm.c b/src/gallium/auxiliary/draw/draw_llvm.c
 index e0c0705..e1c08c6 100644
 --- a/src/gallium/auxiliary/draw/draw_llvm.c
 +++ b/src/gallium/auxiliary/draw/draw_llvm.c
 @@ -1669,7 +1669,7 @@ draw_llvm_make_variant_key(struct draw_llvm *llvm, char *store)
     key->clip_z = llvm->draw->clip_z;
     key->clip_user = llvm->draw->clip_user;
     key->bypass_viewport = llvm->draw->identity_viewport;
 -   key->clip_halfz = !llvm->draw->rasterizer->gl_rasterization_rules;
 +   key->clip_halfz = llvm->draw->rasterizer->clip_halfz;
     key->need_edgeflags = (llvm->draw->vs.edgeflag_output ? TRUE : FALSE);
     key->ucp_enable = llvm->draw->rasterizer->clip_plane_enable;
     key->has_gs = llvm->draw->gs.geometry_shader != NULL;
 diff --git a/src/gallium/auxiliary/draw/draw_pt.h b/src/gallium/auxiliary/draw/draw_pt.h
 index 764d311..dca8368 100644
 --- a/src/gallium/auxiliary/draw/draw_pt.h
 +++ b/src/gallium/auxiliary/draw/draw_pt.h
 @@ -233,7 +233,7 @@ void draw_pt_post_vs_prepare( struct pt_post_vs *pvs,
                               boolean clip_user,
                               boolean guard_band,
                               boolean bypass_viewport,
 -                             boolean opengl,
 +                             boolean clip_halfz,
                               boolean need_edgeflags );
  
  struct pt_post_vs *draw_pt_post_vs_create( struct draw_context *draw );
 diff --git a/src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline.c b/src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline.c
 index e17f161..8e48f46 100644
 --- a/src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline.c
 +++ b/src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline.c
 @@ -105,17 +105,14 @@ static void fetch_pipeline_prepare( struct draw_pt_middle_end *middle,
                           vs->info.num_inputs,
                           fpme->vertex_size,
                           instance_id_index );
 -   /* XXX: it's not really gl rasterization rules we care about here,
 -    * but gl vs dx9 clip spaces.
 -    */
     draw_pt_post_vs_prepare( fpme->post_vs,
                             draw->clip_xy,
                             draw->clip_z,
                             draw->clip_user,
                             draw->guard_band_xy,
 -                           draw->identity_viewport,
 -                           (boolean)draw->rasterizer->gl_rasterization_rules,
 -                           (draw->vs.edgeflag_output ? TRUE : FALSE) );
 +                            draw->identity_viewport,
 +                            draw->rasterizer->clip_halfz,
 +                            (draw->vs.edgeflag_output ? TRUE : FALSE) );
  
     draw_pt_so_emit_prepare( fpme->so_emit, FALSE );
  
 diff --git a/src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline_llvm.c b/src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline_llvm.c
 index d312dc4..4dff4f8 100644
 --- a/src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline_llvm.c
 +++ b/src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline_llvm.c
 @@ -156,17 +156,14 @@ llvm_middle_end_prepare( struct draw_pt_middle_end *middle,
     fpme->vertex_size = sizeof(struct vertex_header) + nr * 4 * sizeof(float);
  
  
 -   /* XXX: it's not really gl rasterization rules we care about here,
 -    * but gl vs dx9 clip spaces.
 

Re: [Mesa-dev] r600g: status of the r600-sb branch

2013-04-20 Thread Marek Olšák
Ah, I didn't know you had any other env vars. It's preferable to have
as many boolean flags as possible handled by a single env var, because
that's easier to use (R600_DUMP_SHADERS amounts to a pretty ugly list
of boolean flags hidden behind a magic number); see the sketch after
this message. Feel free to use separate env vars for more complex
parameters.

I skimmed through some of your code and the coding style looks good.
I'm also okay with C++; it really seems like the right choice here.
However, I agree with the argument that one header file per .cpp might
not always be a good idea, especially if the header file is pretty
small.

Marek
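
(As a rough illustration of the single-env-var approach -- a hypothetical
parser, not the actual r600g code; the real flag table lives in
r600_pipe.c:)

   #include <stdlib.h>
   #include <string.h>

   enum {
      DBG_SB           = 1 << 0,  /* hypothetical flag names */
      DBG_DUMP_SHADERS = 1 << 1,
   };

   static const struct { const char *name; unsigned flag; } dbg_flags[] = {
      { "sb", DBG_SB },
      { "dumpshaders", DBG_DUMP_SHADERS },
   };

   /* Parse a comma-separated flag list such as R600_DEBUG=sb,dumpshaders. */
   static unsigned parse_debug_flags(const char *env_name)
   {
      unsigned mask = 0, i;
      const char *env = getenv(env_name);
      char *copy, *tok;

      if (!env)
         return 0;
      copy = strdup(env);
      for (tok = strtok(copy, ","); tok; tok = strtok(NULL, ","))
         for (i = 0; i < sizeof(dbg_flags) / sizeof(dbg_flags[0]); i++)
            if (!strcmp(tok, dbg_flags[i].name))
               mask |= dbg_flags[i].flag;
      free(copy);
      return mask;
   }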

On Sat, Apr 20, 2013 at 11:02 AM, Vadim Girlin vadimgir...@gmail.com wrote:
 On 04/20/2013 03:11 AM, Marek Olšák wrote:

 Please don't add any new environment variables and use R600_DEBUG
 instead. The other environment variables are deprecated.


 I agree; those vars probably need some cleanup, as they were added
 before R600_DEBUG appeared.

 Though I'm afraid some of my options won't fit well into the
 R600_DEBUG flags unless we add support for name/value pairs with
 optional custom parsers.

 E.g. I have a group of env vars that define the range of
 included/excluded shaders for optimization and the mode
 (include/exclude/off). I thought about doing this with a single var
 and a custom parser to specify the range, e.g. as 10-20 (see the
 sketch below), but after all it's just a debug feature, not intended
 for everyday use, and so far I haven't convinced myself that it's
 worth the effort.

 I can implement support for custom parsers for R600_DEBUG, but do we
 really need it? Maybe it would be enough to add e.g. an sb flag to the
 R600_DEBUG flags instead of an R600_SB var for enabling it (probably
 together with other boolean options such as R600_SB_USE_NEW_BYTECODE),
 but leave the more complicated internal debug options as they are?

 Vadim
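
(A sketch of the kind of custom range parser being discussed, assuming a
hypothetical option spelled like shaders=10-20; illustrative only:)

   #include <stdio.h>

   /* Parse a shader-range spec such as "10-20" or a single index "10".
    * Returns 1 on success and fills first/last (inclusive). */
   static int parse_shader_range(const char *spec, int *first, int *last)
   {
      if (sscanf(spec, "%d-%d", first, last) == 2)
         return *first <= *last;
      if (sscanf(spec, "%d", first) == 1) {
         *last = *first;
         return 1;
      }
      return 0;
   }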


 There is a table for R600_DEBUG in r600_pipe.c and it even comes with
 a help feature: R600_DEBUG=help

 Marek

 On Fri, Apr 19, 2013 at 4:48 PM, Vadim Girlin vadimgir...@gmail.com
 wrote:

 Hi,

 In the previous status update I said that the r600-sb branch was not
 ready to be merged yet, but recently I've done some cleanups and
 reworks, and though I haven't finished everything I planned initially,
 I think it's now in a better state and may be considered for merging.

 I'm interested to know whether people think that merging the r600-sb
 branch makes sense at all. I'll try to explain why it makes sense to
 me.

 Although I understand that the development of the llvm backend is a
 primary goal for the r600g developers, it's a complicated process and
 may require quite some time to achieve good results in shader/compiler
 performance, while this branch already works and provides good results
 in many cases. That's why I think it makes sense to merge this branch
 as a non-default backend, at least as a temporary solution for shader
 performance problems. We can always get rid of it if it becomes too
 much of a maintenance burden, or when the llvm backend catches up in
 terms of shader performance and compilation speed/overhead.

 Regarding the support and maintenance of this code, I'll do my best to
 fix possible issues, and so far there are no known unfixed issues. I
 tested it with many apps on evergreen and fixed all the issues with
 other chips that were reported to me on the list or privately after
 the last status announcement. There are no piglit regressions on
 evergreen when this branch is used with both the default and llvm
 backends.

 This code was intentionally separated as much as possible from the
 other parts of the driver; basically there are just two functions used
 from r600g, and the shader code is passed to/from r600-sb as hardware
 bytecode that is not going to change. I think it won't require any
 modifications at all to stay in sync with most changes in r600g.

 Some work might be required, though, if we want to add support for new
 hw features that are currently unused, e.g. geometry shaders or new
 instruction types for compute shaders, but I think I'll be able to
 catch up when they are implemented in the driver and in the default or
 llvm backend. E.g. this branch already works for me on evergreen with
 some simple OpenCL kernels, including bfgminer, where it increases the
 performance of the kernel compiled with the llvm backend by more than
 20% for me.

 Besides the performance benefits, I think an alternative backend might
 also help with debugging the default or llvm backend; in some cases it
 helped me by exposing bugs that are not very obvious otherwise. E.g.
 it may be hard to compare the dumps from the default and llvm backends
 to spot a regression because they are too different, but after
 processing both shaders with r600-sb the code is usually transformed
 into a more common form, which often makes it easier to compare and
 find the differences in shader logic.

 One additional feature that might help with llvm backend 

[Mesa-dev] [PATCH] i965/fs: Don't save value returned by emit() if it's not used.

2013-04-20 Thread Matt Turner
Probably a copy-n-paste mistake.
---
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index 422816d..f1539d5 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -521,7 +521,7 @@ fs_visitor::visit(ir_expression *ir)
       break;
 
    case ir_unop_b2i:
-      inst = emit(AND(this->result, op[0], fs_reg(1)));
+      emit(AND(this->result, op[0], fs_reg(1)));
       break;
    case ir_unop_b2f:
       temp = fs_reg(this, glsl_type::int_type);
@@ -541,14 +541,14 @@ fs_visitor::visit(ir_expression *ir)
       break;
    case ir_unop_ceil:
       op[0].negate = !op[0].negate;
-      inst = emit(RNDD(this->result, op[0]));
+      emit(RNDD(this->result, op[0]));
       this->result.negate = true;
       break;
    case ir_unop_floor:
-      inst = emit(RNDD(this->result, op[0]));
+      emit(RNDD(this->result, op[0]));
       break;
    case ir_unop_fract:
-      inst = emit(FRC(this->result, op[0]));
+      emit(FRC(this->result, op[0]));
       break;
    case ir_unop_round_even:
       emit(RNDE(this->result, op[0]));
@@ -585,27 +585,27 @@ fs_visitor::visit(ir_expression *ir)
       break;
 
    case ir_unop_bit_not:
-      inst = emit(NOT(this->result, op[0]));
+      emit(NOT(this->result, op[0]));
       break;
    case ir_binop_bit_and:
-      inst = emit(AND(this->result, op[0], op[1]));
+      emit(AND(this->result, op[0], op[1]));
       break;
    case ir_binop_bit_xor:
-      inst = emit(XOR(this->result, op[0], op[1]));
+      emit(XOR(this->result, op[0], op[1]));
       break;
    case ir_binop_bit_or:
-      inst = emit(OR(this->result, op[0], op[1]));
+      emit(OR(this->result, op[0], op[1]));
       break;
 
    case ir_binop_lshift:
-      inst = emit(SHL(this->result, op[0], op[1]));
+      emit(SHL(this->result, op[0], op[1]));
       break;
 
    case ir_binop_rshift:
       if (ir->type->base_type == GLSL_TYPE_INT)
-         inst = emit(ASR(this->result, op[0], op[1]));
+         emit(ASR(this->result, op[0], op[1]));
       else
-         inst = emit(SHR(this->result, op[0], op[1]));
+         emit(SHR(this->result, op[0], op[1]));
       break;
    case ir_binop_pack_half_2x16_split:
       emit(FS_OPCODE_PACK_HALF_2x16_SPLIT, this->result, op[0], op[1]);
-- 
1.8.1.5
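
(A minimal self-contained C analogue of the pattern -- hypothetical code,
not the i965 emit() API: the return value is only worth capturing when
the instruction is modified after being emitted.)

   #include <stdbool.h>

   struct fs_inst { bool saturate; };

   static struct fs_inst insts[16];
   static int n_insts;

   /* Append an instruction and return its (stable) address. */
   static struct fs_inst *emit(struct fs_inst i)
   {
      insts[n_insts] = i;
      return &insts[n_insts++];
   }

   static void example(void)
   {
      struct fs_inst *inst = emit((struct fs_inst){ false });
      inst->saturate = true;           /* captured: modified afterwards */

      emit((struct fs_inst){ false }); /* never touched again: no capture */
   }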



[Mesa-dev] [PATCH] i965: Fix a mistake in the comments for software counters.

2013-04-20 Thread Kenneth Graunke
The code doesn't set brw->query.obj to NULL; it sets query->bo to NULL.
---
 src/mesa/drivers/dri/i965/brw_queryobj.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_queryobj.c b/src/mesa/drivers/dri/i965/brw_queryobj.c
index 194725c..81e975a 100644
--- a/src/mesa/drivers/dri/i965/brw_queryobj.c
+++ b/src/mesa/drivers/dri/i965/brw_queryobj.c
@@ -421,7 +421,7 @@ brw_end_query(struct gl_context *ctx, struct gl_query_object *q)
       query->Base.Result = brw->sol.primitives_generated;
       brw->sol.counting_primitives_generated = false;
 
-      /* And set brw->query.obj to NULL so that this query won't try to wait
+      /* And set query->bo to NULL so that this query won't try to wait
        * for any rendering to complete.
        */
       query->bo = NULL;
@@ -435,7 +435,7 @@ brw_end_query(struct gl_context *ctx, struct gl_query_object *q)
       query->Base.Result = brw->sol.primitives_written;
       brw->sol.counting_primitives_written = false;
 
-      /* And set brw->query.obj to NULL so that this query won't try to wait
+      /* And set query->bo to NULL so that this query won't try to wait
        * for any rendering to complete.
        */
       query->bo = NULL;
-- 
1.8.2.1
