On Wed, 2018-12-05 at 11:33 -0600, Jason Ekstrand wrote:
> On Tue, Dec 4, 2018 at 1:17 AM Iago Toral Quiroga
> wrote:
> > ---
> >
> > src/compiler/spirv/vtn_glsl450.c | 36 +---
> >
> >
> > 1 file changed, 24 insertions(+), 12 deletions(-)
> >
> >
> >
> > diff
On Friday, 2018-11-09 18:04:12 +, Silvestrs Timofejevs wrote:
> Feature to print out EGL returned configs for debug purposes.
>
> 'eglChooseConfig' and 'eglGetConfigs' debug information printout is
> enabled when the log level equals '_EGL_DEBUG'. The configs are
> printed, and if any of them
Instead of a single i2b and b2i, we now have i2b32 and b2iN where N is
one of 8, 16, 32, or 64. This leads to having a few more opcodes but
now everything is consistent and booleans aren't a weird special case
anymore.
Reviewed-by: Connor Abbott
---
src/amd/common/ac_nir_to_llvm.c | 12
Instead of using an OrderedDict, just have a (necessarily sorted) array
of transforms and a set of opcodes.
Reviewed-by: Connor Abbott
---
src/compiler/nir/nir_algebraic.py | 21 +++--
1 file changed, 11 insertions(+), 10 deletions(-)
diff --git
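For the nir_algebraic change above that replaces the OrderedDict with a sorted array of transforms and a set of opcodes, a minimal Python sketch of that shape could look like the following. The class and method names (TransformTable, add, for_opcode) are hypothetical and are not taken from nir_algebraic.py; the sketch only illustrates the data layout.

```python
class TransformTable:
    """Hypothetical container: a list of transforms plus a set of opcodes."""

    def __init__(self):
        # (opcode, search, replace) tuples, kept in insertion order.
        self.xforms = []
        # Opcodes that have at least one transform registered.
        self.opcodes = set()

    def add(self, opcode, search, replace):
        self.xforms.append((opcode, search, replace))
        self.opcodes.add(opcode)

    def for_opcode(self, opcode):
        """Return the transforms for one opcode, preserving their order."""
        if opcode not in self.opcodes:
            return []
        return [x for x in self.xforms if x[0] == opcode]


table = TransformTable()
table.add("iadd", "(iadd a 0)", "a")
table.add("imul", "(imul a 1)", "a")
table.add("iadd", "(iadd 0 a)", "a")
print(sorted(table.opcodes))
print(table.for_opcode("iadd"))
```

The set gives a cheap membership test, while the list keeps the transforms in a stable order without needing an ordered mapping.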
This was originally added for the out-of-tree Mali driver but I think
we've all agreed it's easy enough for them to just do in their back-end.
Cc: Alyssa Rosenzweig
---
src/compiler/nir/nir.h | 3 ---
src/compiler/nir/nir_opt_algebraic.py | 5 +
2 files changed, 1
While we're at it, we rework them a bit to all use regular expressions
and assert more.
Reviewed-by: Connor Abbott
---
src/compiler/nir/nir_constant_expressions.py | 25 ++
src/compiler/nir/nir_opcodes.py | 34 +---
src/compiler/nir/nir_opcodes_c.py
Reviewed-by: Connor Abbott
---
src/compiler/nir/nir_opcodes.py | 22 +++---
1 file changed, 11 insertions(+), 11 deletions(-)
diff --git a/src/compiler/nir/nir_opcodes.py b/src/compiler/nir/nir_opcodes.py
index d69d09d30ce..00720708305 100644
---
---
src/compiler/nir/nir_lower_alu_to_scalar.c | 8 ++---
src/compiler/nir/nir_opcodes.py | 34 +++---
2 files changed, 21 insertions(+), 21 deletions(-)
diff --git a/src/compiler/nir/nir_lower_alu_to_scalar.c
b/src/compiler/nir/nir_lower_alu_to_scalar.c
index
Both of these things are already handled in the Value base class so we
don't need to handle them explicitly in Constant.
Reviewed-by: Connor Abbott
---
src/compiler/nir/nir_algebraic.py | 4
1 file changed, 4 deletions(-)
diff --git a/src/compiler/nir/nir_algebraic.py
This commit contains three related changes. First, we define boolN_t
for N = 8, 16, and 64 and move the definition of boolN_vec to the loop
with the other vec definitions. Second, there's no reason why we need
the != 0 on the source because that happens implicitly when it's
converted to bool.
Later in this series, bool is not going to imply 32-bit.
---
src/compiler/nir/nir_opt_algebraic.py | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/compiler/nir/nir_opt_algebraic.py
b/src/compiler/nir/nir_opt_algebraic.py
index 57abe7c9952..1f1dd9e8b77 100644
---
All conversion opcodes require a destination size but this makes
constructing certain algebraic expressions rather cumbersome. This
commit adds support to nir_search and nir_algebraic for writing
conversion opcodes without a size. These meta-opcodes match any
conversion of that type regardless
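The idea of a size-less conversion opcode that matches any sized variant can be sketched in Python as below. This is only an illustration under stated assumptions: the function name conversion_matches and the regular expression are hypothetical, and the real nir_search matching logic is not shown in this message.

```python
import re

# Assumed naming shape for sized conversions: <src>2<dst><bits>, e.g. "i2f32".
_SIZED_CONVERSION = re.compile(r"^(?P<base>[a-z]2[a-z])(?P<bits>\d+)$")


def conversion_matches(pattern_op, instr_op):
    """Return True if pattern_op matches instr_op.

    A sized pattern (e.g. "i2f32") matches only itself. A size-less
    pattern (e.g. "i2f") matches every sized variant of that conversion.
    """
    if pattern_op == instr_op:
        return True
    m = _SIZED_CONVERSION.match(instr_op)
    return m is not None and m.group("base") == pattern_op


for op in ("i2f16", "i2f32", "i2f64", "f2i32"):
    print(op, conversion_matches("i2f", op))
```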
Reviewed-by: Connor Abbott
---
src/compiler/nir/nir_opt_algebraic.py | 9 ++---
1 file changed, 2 insertions(+), 7 deletions(-)
diff --git a/src/compiler/nir/nir_opt_algebraic.py
b/src/compiler/nir/nir_opt_algebraic.py
index f2a7be0c403..27c90cebaee 100644
---
This commit adds support for 1-bit Booleans and integers. Booleans
obviously take a value of true or false. Because we have to define the
semantics of 1-bit signed and unsigned integers, we define uint1_t to
take values of 0 and 1 and int1_t to take values of 0 and -1. 1-bit
arithmetic is then
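The values given above for uint1_t and int1_t are what ordinary two's-complement interpretation yields at a width of one bit. A small Python sketch, with hypothetical helper names as_unsigned and as_signed, shows this:

```python
def as_unsigned(bits, width):
    """Interpret the low `width` bits as an unsigned integer."""
    return bits & ((1 << width) - 1)


def as_signed(bits, width):
    """Interpret the low `width` bits as a two's-complement integer."""
    value = as_unsigned(bits, width)
    sign_bit = 1 << (width - 1)
    # If the top bit is set, the value wraps around to the negative range.
    return value - (1 << width) if value & sign_bit else value


for pattern in (0, 1):
    print(pattern, as_unsigned(pattern, 1), as_signed(pattern, 1))
```

With a single bit, the only bit is also the sign bit, which is why the set pattern reads as 1 when unsigned and as -1 when signed.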
This is a v2 of my series to switch NIR over to 1-bit Booleans. The first
version of the series can be found here:
https://patchwork.freedesktop.org/series/51351/
Since then, a bit of work has been done on NIR to make the transition a bit
smoother. Connor rewrote the entire bit-size inference
Shader-db results on Kaby Lake:
total instructions in shared programs: 15072525 -> 15072525 (0.00%)
instructions in affected programs: 0 -> 0
helped: 0
HURT: 0
This helps prevent regressions in later commits.
---
src/compiler/nir/nir_opt_algebraic.py | 2 ++
1 file changed, 2
Suffixes are dropped from a bunch of conversion opcodes when it makes
sense to do so. Others are kept if we really do want the bit-size
restriction.
Reviewed-by: Connor Abbott
---
src/compiler/nir/nir_opt_algebraic.py | 58 +--
1 file changed, 29 insertions(+), 29
Jordan Justen writes:
> This documents a process for using GitLab Merge Requests as a second
> way to submit code changes for Mesa. Only one of the two methods is
> allowed for each patch series.
>
> We will *not* require all patches to be emailed. Some code changes may
> be reviewed and merged
Ugh... This should not be 29 patches. I used the wrong base. The real
series starts at "nir/algebraic: Optimize x2b(xneg(a)) -> a"
On Thu, Dec 6, 2018 at 1:45 PM Jason Ekstrand wrote:
> This is a v2 of my series to switch NIR over to 1-bit Booleans. The first
> version of the series can be
Instead of looking at input_sizes[i] which contains the number of
components for each source, we look at the bit size of input_types[i].
This fixes a regression in the 1-bit boolean series though I have no
idea how we haven't seen it before now.
Fixes: 35baee5dce5 "nir/constant_folding: fix
We also have to add support for 1-bit integers while we're here so we
get 1-bit variants of iand, ior, and inot.
---
src/compiler/nir/nir_lower_alu_to_scalar.c | 4 +++
src/compiler/nir/nir_opcodes.py | 29 --
src/compiler/nir/nir_search.c | 4 ++-
3
Generated with a little hand-editing and the following sed commands:
sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' **/*.c
sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' **/*.c
sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' **/*.c
sed -i
---
src/compiler/nir/nir_algebraic.py | 40 ++
src/compiler/nir/nir_opt_algebraic.py | 103 +-
2 files changed, 57 insertions(+), 86 deletions(-)
diff --git a/src/compiler/nir/nir_algebraic.py
b/src/compiler/nir/nir_algebraic.py
index
---
src/compiler/nir/nir.h | 24 -
src/compiler/nir/nir_loop_analyze.c| 28 +-
src/compiler/nir/nir_opt_if.c | 2 +-
src/compiler/nir/nir_opt_peephole_select.c | 2 +-
src/compiler/nir/nir_opt_undef.c | 2 +-
---
src/compiler/glsl/glsl_to_nir.cpp | 15 +++
src/compiler/nir/nir.h | 2 +-
src/compiler/nir/nir_builder.h | 4 ++--
src/compiler/nir_types.h | 4 +++-
src/compiler/spirv/spirv_to_nir.c | 2 +-
5 files changed, 18 insertions(+), 9 deletions(-)
diff --git
Reviewed-by: Connor Abbott
---
src/compiler/nir/nir_opt_algebraic.py | 112 +-
1 file changed, 56 insertions(+), 56 deletions(-)
diff --git a/src/compiler/nir/nir_opt_algebraic.py
b/src/compiler/nir/nir_opt_algebraic.py
index c482bde8b3b..aa1a7a94e6e 100644
---
This just makes it nicely scale across bit sizes.
---
src/compiler/nir/nir_opt_algebraic.py | 11 +--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/src/compiler/nir/nir_opt_algebraic.py
b/src/compiler/nir/nir_opt_algebraic.py
index 1f1dd9e8b77..72445ee830e 100644
---
---
src/compiler/nir/nir_algebraic.py | 30 ++
1 file changed, 30 insertions(+)
diff --git a/src/compiler/nir/nir_algebraic.py
b/src/compiler/nir/nir_algebraic.py
index c16cadbdc58..9a28421b799 100644
--- a/src/compiler/nir/nir_algebraic.py
+++
We also enable it in all of the NIR drivers.
---
src/amd/vulkan/radv_shader.c | 2 +
src/broadcom/compiler/vir.c | 2 +
src/compiler/Makefile.sources| 1 +
src/compiler/nir/meson.build | 1 +
src/compiler/nir/nir.h
---
src/compiler/nir/nir_builder_opcodes_h.py | 39 +--
1 file changed, 1 insertion(+), 38 deletions(-)
diff --git a/src/compiler/nir/nir_builder_opcodes_h.py
b/src/compiler/nir/nir_builder_opcodes_h.py
index 5c38818d4ec..34b8c4371e1 100644
---
---
src/compiler/nir/nir_builder.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/compiler/nir/nir_builder.h b/src/compiler/nir/nir_builder.h
index 08c5f1e8b6c..826e549019a 100644
--- a/src/compiler/nir/nir_builder.h
+++ b/src/compiler/nir/nir_builder.h
@@ -212,9
D3D Booleans use a 32-bit 0/-1 representation. Because this previously
matched NIR exactly, we didn't have to really optimize for it. Now that
we have 1-bit Booleans, we need some specific optimizations to chew
through the D3D12-style Booleans.
Shader-db results on Kaby Lake:
total
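As a rough illustration of the 32-bit 0/-1 representation mentioned above, the following Python sketch converts between a plain Boolean and a 32-bit word. The helper names are hypothetical and the sketch does not model any actual NIR optimization.

```python
MASK32 = 0xFFFFFFFF


def bool_to_d3d(value):
    """Encode a Boolean as a 32-bit word: all zeros or all ones."""
    return MASK32 if value else 0


def d3d_to_bool(word):
    """Decode a 32-bit word: any non-zero pattern counts as true."""
    return (word & MASK32) != 0


def to_signed32(word):
    """View a 32-bit word as a signed integer, so all ones reads as -1."""
    word &= MASK32
    return word - (1 << 32) if word & 0x80000000 else word


print(to_signed32(bool_to_d3d(True)), to_signed32(bool_to_d3d(False)))
```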
---
src/compiler/nir/nir_opt_large_constants.c | 14 +-
src/compiler/nir_types.cpp | 2 +-
2 files changed, 14 insertions(+), 2 deletions(-)
diff --git a/src/compiler/nir/nir_opt_large_constants.c
b/src/compiler/nir/nir_opt_large_constants.c
index
---
src/compiler/nir/nir_builder_opcodes_h.py | 39 ++-
1 file changed, 38 insertions(+), 1 deletion(-)
diff --git a/src/compiler/nir/nir_builder_opcodes_h.py
b/src/compiler/nir/nir_builder_opcodes_h.py
index 34b8c4371e1..5c38818d4ec 100644
---
https://bugs.freedesktop.org/show_bug.cgi?id=108961
Jason Ekstrand changed:
What     | Removed                     | Added
Assignee | mesa-dev@lists.freedesktop. | cwabbo...@gmail.com
Thanks!
On Thu, Dec 6, 2018 at 2:24 PM Alyssa Rosenzweig
wrote:
> Reviewed-by: Alyssa Rosenzweig
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
https://bugs.freedesktop.org/show_bug.cgi?id=108961
Bug ID: 108961
Summary: make check test_replace_src_bitsize failure
Product: Mesa
Version: git
Hardware: x86-64 (AMD64)
OS: All
Status: NEW
Keywords:
Quoting Jordan Justen (2018-12-05 15:32:05)
> This documents a process for using GitLab Merge Requests as a second
> way to submit code changes for Mesa. Only one of the two methods is
> allowed for each patch series.
>
> We will *not* require all patches to be emailed. Some code changes may
>
On 06.12.18 00:32, Jordan Justen wrote:
This documents a process for using GitLab Merge Requests as a second
way to submit code changes for Mesa. Only one of the two methods is
allowed for each patch series.
We will *not* require all patches to be emailed. Some code changes may
be reviewed and
On 2018-12-06 13:57:09, Nicolai Hähnle wrote:
> On 06.12.18 00:32, Jordan Justen wrote:
> > + To participate in code review, you should monitor the
> > + https://lists.freedesktop.org/mailman/listinfo/mesa-dev;>
> > + mesa-dev email list and the GitLab
> > + Mesa >
On Thu, Dec 6, 2018 at 3:57 PM Nicolai Hähnle wrote:
> On 06.12.18 00:32, Jordan Justen wrote:
> > This documents a process for using GitLab Merge Requests as a second
> > way to submit code changes for Mesa. Only one of the two methods is
> > allowed for each patch series.
> >
> > We will
On 12/05/2018 10:12 AM, Connor Abbott wrote:
> This won't work, since this optimization in nir_opt_algebraic will undo it:
>
> # For any float comparison operation, "cmp", if you have "a == a && a cmp b"
> # then the "a == a" is redundant because it's equivalent to "a is not NaN"
> # and, if a is
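The redundancy described in the quoted comment can be checked with ordinary IEEE floats in Python. This is a sketch, not the nir_opt_algebraic rule itself: the function name is hypothetical, and it only covers <, >= and ==, since != is true when an operand is NaN and would not fit the pattern.

```python
import operator

# Comparisons that evaluate to False whenever either operand is NaN.
COMPARISONS = (operator.lt, operator.ge, operator.eq)


def self_compare_is_redundant(values):
    """Check that (a == a) and cmp(a, b) equals cmp(a, b) for all pairs."""
    for a in values:
        for b in values:
            for cmp in COMPARISONS:
                if ((a == a) and cmp(a, b)) != cmp(a, b):
                    return False
    return True


samples = [float("nan"), float("-inf"), -1.0, 0.0, 2.5, float("inf")]
print(self_compare_is_redundant(samples))
```

When a is NaN, both a == a and the second comparison are false, so dropping the first test does not change the result.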
Is this going to be used by an extension? If you don't have a use for
it yet, it would probably be better to wait.
On Thu, Dec 6, 2018 at 3:01 PM Nicolai Hähnle wrote:
>
> From: Nicolai Hähnle
>
> Order-aware scan/reduce can trade-off LDS traffic for external atomics
> memory traffic in
From: Nicolai Hähnle
Linking against LLVM built with BUILD_SHARED_LIBS fails otherwise,
as the component is required for the draw module.
---
meson.build | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/meson.build b/meson.build
index 1aeef95f722..0177716c476 100644
---
From: Nicolai Hähnle
---
src/gallium/drivers/radeonsi/si_perfcounter.c | 2 +-
src/gallium/drivers/radeonsi/si_query.c | 6 +++---
src/gallium/drivers/radeonsi/si_query.h | 2 +-
3 files changed, 5 insertions(+), 5 deletions(-)
diff --git
From: Nicolai Hähnle
Other callers of si_set_constant_buffer don't need it.
---
src/gallium/drivers/radeonsi/si_descriptors.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/src/gallium/drivers/radeonsi/si_descriptors.c
b/src/gallium/drivers/radeonsi/si_descriptors.c
From: Nicolai Hähnle
---
src/gallium/drivers/radeon/r600_perfcounter.c | 639
src/gallium/drivers/radeonsi/Makefile.sources | 1 -
src/gallium/drivers/radeonsi/meson.build | 1 -
src/gallium/drivers/radeonsi/si_perfcounter.c | 688 --
From: Nicolai Hähnle
---
src/gallium/drivers/radeonsi/si_debug.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/src/gallium/drivers/radeonsi/si_debug.c
b/src/gallium/drivers/radeonsi/si_debug.c
index 22019741d80..fe2970a0ea3 100644
---
From: Nicolai Hähnle
This is a move towards using composition instead of inheritance for
different query types.
This change weakens out-of-memory error reporting somewhat, though this
should be acceptable since we didn't consistently report such errors in
the first place.
---
From: Nicolai Hähnle
---
src/gallium/drivers/radeonsi/si_state_shaders.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c
b/src/gallium/drivers/radeonsi/si_state_shaders.c
index ad7d21e7816..0d4e1956037 100644
---
From: Nicolai Hähnle
---
src/gallium/drivers/radeonsi/si_blit.c | 2 +-
src/gallium/drivers/radeonsi/si_pipe.h | 2 +-
src/gallium/drivers/radeonsi/si_texture.c | 4 ++--
3 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/src/gallium/drivers/radeonsi/si_blit.c
From: Nicolai Hähnle
This is rather important for merged VS/TCS as LSHS shaders...
---
src/gallium/drivers/radeonsi/si_debug.c | 10 --
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/src/gallium/drivers/radeonsi/si_debug.c
b/src/gallium/drivers/radeonsi/si_debug.c
index
From: Nicolai Hähnle
This helps some debugging cases by initializing addrlib with
slightly more appropriate settings.
---
src/gallium/drivers/radeonsi/si_pipe.c | 34 --
src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c | 36 +++
2 files changed, 36
From: Nicolai Hähnle
---
src/amd/common/ac_surface.c | 8 +---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/src/amd/common/ac_surface.c b/src/amd/common/ac_surface.c
index d8d927ee1c5..aeba5e161c9 100644
--- a/src/amd/common/ac_surface.c
+++ b/src/amd/common/ac_surface.c
@@
From: Nicolai Hähnle
Prepare for some later refactoring.
---
src/gallium/drivers/radeonsi/si_shader.c | 43 ++--
1 file changed, 25 insertions(+), 18 deletions(-)
diff --git a/src/gallium/drivers/radeonsi/si_shader.c
b/src/gallium/drivers/radeonsi/si_shader.c
index
From: Nicolai Hähnle
Order-aware scan/reduce can trade-off LDS traffic for external atomics
memory traffic in producer/consumer compute shaders.
---
src/amd/common/ac_llvm_build.c | 195 -
src/amd/common/ac_llvm_build.h | 36 ++
2 files changed, 227
From: Nicolai Hähnle
There is never a read-after-write hazard because the command doesn't read.
---
src/gallium/drivers/radeonsi/si_cp_dma.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/src/gallium/drivers/radeonsi/si_cp_dma.c
b/src/gallium/drivers/radeonsi/si_cp_dma.c
From: Nicolai Hähnle
Reduce the number of places that encode buffer descriptors.
---
.../drivers/radeonsi/si_state_streamout.c | 61 ---
1 file changed, 11 insertions(+), 50 deletions(-)
diff --git a/src/gallium/drivers/radeonsi/si_state_streamout.c
From: Nicolai Hähnle
---
src/gallium/drivers/radeonsi/si_descriptors.c | 107 ++
src/gallium/drivers/radeonsi/si_state.h | 2 +
2 files changed, 64 insertions(+), 45 deletions(-)
diff --git a/src/gallium/drivers/radeonsi/si_descriptors.c
From: Nicolai Hähnle
Remove a level of indirection to make the code more explicit -- should
make it easier to follow what's going on.
---
src/gallium/drivers/radeonsi/si_perfcounter.c | 143 --
1 file changed, 93 insertions(+), 50 deletions(-)
diff --git
From: Nicolai Hähnle
---
src/gallium/drivers/radeonsi/si_perfcounter.c | 13 ++--
src/gallium/drivers/radeonsi/si_query.c | 75 ++-
src/gallium/drivers/radeonsi/si_query.h | 18 +++--
3 files changed, 62 insertions(+), 44 deletions(-)
diff --git
We reuse the helpers we created.
---
src/gallium/drivers/virgl/virgl_buffer.c | 28 +++---
src/gallium/drivers/virgl/virgl_resource.h | 1 +
2 files changed, 10 insertions(+), 19 deletions(-)
diff --git a/src/gallium/drivers/virgl/virgl_buffer.c
Previously, we ignored the glUnmap(..) operation and
flushed before we flush the cbuf. Now, let's just flush
the data when we unmap.
Neither method is optimal, for example:
glMapBufferRange(.., 0, 100, GL_MAP_FLUSH_EXPLICIT_BIT)
glFlushMappedBufferRange(.., 25, 30)
Will be reused.
---
src/gallium/drivers/virgl/virgl_resource.c | 24 +++
src/gallium/drivers/virgl/virgl_resource.h | 3 +++
src/gallium/drivers/virgl/virgl_texture.c | 27 +-
3 files changed, 28 insertions(+), 26 deletions(-)
diff --git
A resource is just a buffer with some metadata.
---
src/gallium/drivers/virgl/virgl_buffer.c | 51 +++--
src/gallium/drivers/virgl/virgl_context.c | 5 +-
src/gallium/drivers/virgl/virgl_resource.h | 21 +-
src/gallium/drivers/virgl/virgl_texture.c | 85 +++---
4
We can remove some duplicated code.
---
src/gallium/drivers/virgl/virgl_buffer.c | 33 +
src/gallium/drivers/virgl/virgl_resource.c | 84 +++---
src/gallium/drivers/virgl/virgl_resource.h | 16 ++---
src/gallium/drivers/virgl/virgl_texture.c | 70 ++
4
Will be reused.
---
src/gallium/drivers/virgl/virgl_resource.h | 11 ---
src/gallium/drivers/virgl/virgl_texture.c | 19 ++-
2 files changed, 18 insertions(+), 12 deletions(-)
diff --git a/src/gallium/drivers/virgl/virgl_resource.h
Will be reused.
---
src/gallium/drivers/virgl/virgl_resource.c | 37 +++
src/gallium/drivers/virgl/virgl_resource.h | 4 ++
src/gallium/drivers/virgl/virgl_texture.c | 52 +-
3 files changed, 51 insertions(+), 42 deletions(-)
diff --git
It's used for all types of resources.
---
src/gallium/drivers/virgl/virgl_buffer.c | 4 ++--
src/gallium/drivers/virgl/virgl_context.c | 4 ++--
src/gallium/drivers/virgl/virgl_context.h | 2 +-
src/gallium/drivers/virgl/virgl_screen.c | 4 ++--
src/gallium/drivers/virgl/virgl_screen.h | 2 +-
We could allocate and destroy transfers in one place.
---
src/gallium/drivers/virgl/virgl_buffer.c | 2 +-
src/gallium/drivers/virgl/virgl_resource.c | 47 +++---
src/gallium/drivers/virgl/virgl_resource.h | 14 --
src/gallium/drivers/virgl/virgl_texture.c | 58
util_format_get_blocksize returns 1 for R8 formats (all
PIPE_BUFFERs are R8).
---
src/gallium/drivers/virgl/virgl_resource.c | 6 --
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/src/gallium/drivers/virgl/virgl_resource.c
b/src/gallium/drivers/virgl/virgl_resource.c
index
The ioctls don't even pass this (though they should).
Let's calculate this correctly in one place and then pass it down.
Note -- If anyone is using vtest with protocol version 1 (why?),
then you'll need this host side CL too since the layer stride
is forwarded for non-array textures.
With commit 89b479, we moved to tracking buffer cleanliness
when binding.
TEST=dEQP-GLES31.functional.image_load_store.buffer.load_store.r32ui
---
src/gallium/drivers/virgl/virgl_buffer.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/src/gallium/drivers/virgl/virgl_buffer.c
On 07.12.18 at 03:20, Matt Turner wrote:
> Since this is for an extension that will be BDW+ can we use the
> _cvtss_sh() intrinsic instead? It corresponds to an IVB+ instruction
> and even takes the rounding mode directly as an immediate argument.
Not saying trying to use it isn't a good idea,
From: Roland Scheidegger
AoS sampling tries to use integers for coord wrapping when possible,
as it should be faster. However, for AVX, this was suboptimal, because
only floats can use 8x32bit vectors, whereas integers have to be split
into 4x32bit vectors. (I believe part of why it was slower
Since this is for an extension that will be BDW+ can we use the
_cvtss_sh() intrinsic instead? It corresponds to an IVB+ instruction
and even takes the rounding mode directly as an immediate argument.
Reviewed-by: Thomas Helland
---
src/compiler/nir/nir_loop_analyze.c | 31 ++---
1 file changed, 10 insertions(+), 21 deletions(-)
diff --git a/src/compiler/nir/nir_loop_analyze.c
b/src/compiler/nir/nir_loop_analyze.c
index 9c3fd2f286..c779383b36 100644
---
Following commits will introduce additional fields such as
guessed_trip_count. Renaming these will help avoid confusion
as our unrolling feature set grows.
Reviewed-by: Thomas Helland
---
src/compiler/nir/nir.h | 8 +---
src/compiler/nir/nir_loop_analyze.c| 14
This is three series combined. I've sent the first two previously
(patch 1-11 & patch 12-15) and they have been partially reviewed
by Thomas. Please see the previous sends of those series for cover
letters.
There is a small bug fix in patch 11 that was discovered by some
new piglit tests [1].
In order to stop continuously partially unrolling the same loop,
we add the bool partialy_unrolled to nir_loop. We add it here
rather than in nir_loop_info because nir_loop_info is only set
via loop analysis and is intended to be cleared before each
analysis. Also, nir_loop_info is never cloned.
---
This adds support to loop analysis for loops where the induction
variable is compared to the result of min(variable, constant).
For example:
for (int i = 0; i < imin(x, 4); i++)
...
We add a new bool to the loop terminator struct in order to
differentiate terminators with this exit
This will be used to help find the trip count of loops that look
like the following:
while (a < x && i < 8) {
...
i++;
}
Where the NIR will end up looking something like this:
vec1 32 ssa_0 = load_const (0x /* 0.00 */)
vec1 32 ssa_1 = load_const (0x0008
Reviewed-by: Thomas Helland
---
src/compiler/nir/nir_loop_analyze.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/compiler/nir/nir_loop_analyze.c
b/src/compiler/nir/nir_loop_analyze.c
index fbaa638884..ef69422c12 100644
--- a/src/compiler/nir/nir_loop_analyze.c
+++
This will allow us to improve analysis to find more induction
variables.
Reviewed-by: Thomas Helland
---
src/compiler/nir/nir_loop_analyze.c | 34 ++---
1 file changed, 21 insertions(+), 13 deletions(-)
diff --git a/src/compiler/nir/nir_loop_analyze.c
Here we create a helper is_supported_terminator_condition()
and use that rather than embedding all the trip count code
inside a switch.
The new helper will also be used in a following patch.
---
src/compiler/nir/nir_loop_analyze.c | 172 +++-
1 file changed, 93
This allows loop analysis to detect induction variables that
are incremented in both branches of an if rather than in a main
loop block. For example:
loop {
block block_1:
/* preds: block_0 block_7 */
vec1 32 ssa_8 = phi block_0: ssa_4, block_7: ssa_20
vec1 32 ssa_9 =
Some loops can have a single terminator but the exact trip
count is still unknown. For example:
for (int i = 0; i < imin(x, 4); i++)
...
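For a loop of the shape shown above, the exact trip count depends on x, but the constant still gives an upper bound. The following Python sketch simulates the loop to illustrate that; the function name and the default limit of 4 are only there to mirror the example and are not Mesa code.

```python
def trip_count(x, limit=4):
    """Count iterations of: for (int i = 0; i < imin(x, limit); i++)."""
    i = 0
    trips = 0
    while i < min(x, limit):
        trips += 1
        i += 1
    return trips


for x in (-1, 0, 2, 4, 10):
    print(x, trip_count(x))
```

Whatever value x takes at run time, the loop body never executes more than `limit` times, which is the kind of bound an unrolling pass can work with.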
Shader-db results radeonsi (all affected are from Tropico 5):
Totals from affected shaders:
SGPRS: 200 -> 208 (4.00 %)
VGPRS: 164 -> 148 (-9.76
Rather than getting this from the alu instruction this allows us
some flexibility. In the following pass we instead pass the
inverse op.
---
src/compiler/nir/nir_loop_analyze.c | 17 ++---
1 file changed, 10 insertions(+), 7 deletions(-)
diff --git
From: Danylo Piliaiev
Removing the last continue can allow more loops to unroll. Also
inserting code into the if branch can allow the various if opts
to progress further.
The insertion of some loops into the if branch also reduces VGPR
use in some shaders.
vkpipeline-db results (VEGA):
Totals
This helps make find_trip_count() a little easier to follow but
will also be used by a following patch.
---
src/compiler/nir/nir_loop_analyze.c | 41 ++---
1 file changed, 26 insertions(+), 15 deletions(-)
diff --git a/src/compiler/nir/nir_loop_analyze.c
Here we rework force_unroll_array_access() so that we can reuse
the induction variable detection in a following patch.
Reviewed-by: Thomas Helland
---
src/compiler/nir/nir_loop_analyze.c | 49 -
1 file changed, 35 insertions(+), 14 deletions(-)
diff --git
This adds partial loop unrolling support and makes use of a
guessed trip count based on array access.
The code is written so that we could use partial unrolling
more generally, but for now it's only used when we have guessed
the trip count.
We use partial unrolling for this guessed trip count
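The general shape of partial unrolling with a guessed trip count can be modelled in Python as below. This is a sketch under assumptions, not the nir_opt_loop_unroll implementation: the function names are hypothetical, the `for` loop stands in for replicated copies of the body, and keeping the original loop as a fallback is an assumption about how a wrong guess stays safe.

```python
def run_loop(cond, body, state):
    """Reference behaviour: the original, rolled loop."""
    while cond(state):
        state = body(state)
    return state


def run_partially_unrolled(cond, body, state, guessed_trip_count):
    """Model of a partial unroll using a guessed trip count."""
    # Unrolled prefix: each copy of the body keeps its own exit check.
    for _ in range(guessed_trip_count):
        if not cond(state):
            return state
        state = body(state)
    # The guess may be too low, so the remaining loop is kept.
    while cond(state):
        state = body(state)
    return state


cond = lambda s: s["i"] < s["n"]
body = lambda s: {"i": s["i"] + 1, "n": s["n"], "sum": s["sum"] + s["i"]}
start = {"i": 0, "n": 6, "sum": 0}
print(run_loop(cond, body, start) == run_partially_unrolled(cond, body, start, 4))
```

Because every copy retains its exit test, the result matches the rolled loop whether the guess is too high, too low, or exact.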
Reviewed-by: Thomas Helland
---
src/compiler/nir/nir_opt_loop_unroll.c | 76 ++
1 file changed, 28 insertions(+), 48 deletions(-)
diff --git a/src/compiler/nir/nir_opt_loop_unroll.c
b/src/compiler/nir/nir_opt_loop_unroll.c
index 0e9966320b..c267c185b6 100644
---
Reviewed-by: Thomas Helland
---
src/compiler/nir/nir_opt_loop_unroll.c | 115 ++---
1 file changed, 64 insertions(+), 51 deletions(-)
diff --git a/src/compiler/nir/nir_opt_loop_unroll.c
b/src/compiler/nir/nir_opt_loop_unroll.c
index c267c185b6..8406880204 100644
---
This detects an induction variable used as an array index to guess
the trip count of the loop. This enables us to do a partial
unroll of the loop, which can eventually result in the loop being
eliminated.
---
src/compiler/nir/nir.h | 4 ++
src/compiler/nir/nir_loop_analyze.c | 78
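The guess described above, using an induction variable that indexes an array to estimate the trip count, could be sketched in Python as follows. Everything here is an assumption for illustration: the function name is hypothetical, and taking the smallest indexed array length rests on the premise that the loop does not index out of bounds.

```python
def guess_trip_count(indexed_array_lengths):
    """Guess a trip count from arrays indexed by the induction variable.

    Assumption: the induction variable starts at 0, steps by 1, and the
    loop never indexes out of bounds, so the smallest array bounds it.
    Returns None when no array access gives a hint.
    """
    if not indexed_array_lengths:
        return None
    return min(indexed_array_lengths)


# A loop that reads colors[i] (length 4) and weights[i] (length 8).
print(guess_trip_count([4, 8]))
print(guess_trip_count([]))
```

The result is only a guess, which is why it pairs with a partial unroll that keeps its exit checks instead of a full unroll.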
Reviewed-by: Thomas Helland
---
src/compiler/nir/nir_control_flow.h | 10 ++
1 file changed, 10 insertions(+)
diff --git a/src/compiler/nir/nir_control_flow.h
b/src/compiler/nir/nir_control_flow.h
index 2ea460e5df..9111b30a29 100644
--- a/src/compiler/nir/nir_control_flow.h
+++
In preparation for the definition of a function to log a formatted
string.
---
src/mesa/drivers/dri/i915/intel_context.h | 18 +--
src/mesa/drivers/dri/i915/intel_fbo.c | 10 +++---
src/mesa/drivers/dri/i965/brw_context.h | 18 +--