Re: [Freedreno] [RFC 0/4] freedreno: Move some compiler offset computations to NIR

2019-01-25 Thread Rob Clark
On Fri, Jan 25, 2019 at 10:48 AM Eduardo Lima Mitev  wrote:
>
> There are a bunch of instructions emitted on ir3_compiler_nir related to
> offset computations for IO opcodes (ssbo, image, etc). This small series
> explores the possibility of moving these instructions to NIR, where we
> have higher chances of optimizing them.
>
> The series introduces a new, freedreno specific NIR pass,
> 'ir3_nir_lower_sampler_io' (final name not set). The pass is executed
> early on ir3_optimize_nir(), and the goal is to centralize all these
> computations there, hoping that later NIR passes will produce better
> code than what is currently emitted.

I can think of a few other things that would be interesting to lower
to driver specific nir opcodes (imul and various lowering for tex
instructions come to mind.. but probably also ubo and ssbo address
calculation.. maybe it could even make sense for some of the single
src alu instructions that translate into multiple ir3 instructions,
not sure)..

Are you thinking about having separate passes for each?  I guess at
least for alu instructions we might able to use nir_algebraic so
having things split up might be easier.

> So far, we have just implemented byte-offset computation for image store
> and atomics. This seemed like a good first target given the amount of
> instructions being emitted for it by the backend.
>
> This is an RFC series because there are a few open questions, but we
> wanted to gather feedback already now, in case this effort is something
> not worth it; and also hoping that somebody else will give it a try
> against other shaders and on other gens (we have just tried this on
> a5xx).
>
> * We have so far been unable to see any improvement in generated code
> (not a penalty either). shader-db has not been specially useful. Few
> shaders there exercise image store or image atomic ops, and of those
> that do, most require higher versions of GLSL than what freedreno
> supports, so they get skipped. The few that do actually run, don't
> show any meaningful difference.

I guess it would be easy enough to construct shaders that would
benefit from this, but maybe that is cheating..

Possibly UBO and SSBO is a better target, I guess there you might be
more likely to see patterns of access of successive elements (ie.
foo[idx], foo[idx+1], etc)?

Anyways, since we don't try to do (and I'd rather not do) any sort of
CSE post nir->ir3 I think starting to introduce more ir3 specific
nir->nir lowering seems like a thing we need, so I'm pretty happy that
someone is looking at this :-)

BR,
-R

> Then other shaders picked from tests suites are simple enough not to
> produce any difference in code either.
>
> There is still on-going work looking for cases where the pass helps
> instruction stats, whether writing custom shaders or porting complex
> shader from shader-db to run on GLES 310.
>
> There is though an open question here as to whether moving backend
> code to NIR is a benefit in and of itself.
>
> * The series adds a nir_op_imad opcode that didn't exist before, and
> perhaps not generally useful even for freedreno outside this pass,
> because it maps to IR3_MAD_S24 which is probably not suitable for
> generic integer multiply-add.
>
> * The pass currently has 2 alternative code-paths to emit the
> multiplication by the bytes-per-pixel of an image format. In one
> case, since this value can be obtained at compile time, it is
> emitted as an immediate by nir_imul_imm. The other alternative is
> emitting an nir_imul with an SSA value that will map to
> image_dims[0] at shader runtime.
>
> The former case is uglier but produces better code (a single SHL
> instruction), whereas the latter involves a generic imul, for which
> the backend emits a lot of code to cover for overflow.
>
> The doubt here is whether we should introduce a (lower precision)
> version of imul that maps directly to IR3_IMUL_S.
>
>
> A live (WIP) tree of the series can be found at:
> 
>
> We plan to continue moving computations to the pass if we see
> good opportunities.
>
> Feedback very welcome,
>
> cheers,
> Eduardo
>
> Eduardo Lima Mitev (4):
>   nir: Add a new intrinsic 'load_image_stride'
>   nir: Add a new ALU nir_op_imad
>   ir3/nir: Add a new pass 'ir3_nir_lower_sampler_io'
>   ir3: Use ir3_nir_lower_sampler_io pass
>
>  src/compiler/nir/nir_intrinsics.py   |   2 +
>  src/compiler/nir/nir_opcodes.py  |   1 +
>  src/freedreno/Makefile.sources   |   1 +
>  src/freedreno/ir3/ir3_compiler_nir.c |  61 ++--
>  src/freedreno/ir3/ir3_nir.c  |   1 +
>  src/freedreno/ir3/ir3_nir.h  |   1 +
>  src/freedreno/ir3/ir3_nir_lower_sampler_io.c | 349 +++
>  7 files changed, 383 insertions(+), 33 deletions(-)
>  create mode 100644 src/freedreno/ir3/ir3_nir_lower_sampler_io.c
>
> --
> 2.20.1
>
> ___
> 

Re: [Freedreno] [Mesa-dev] [RFC 2/4] nir: Add a new ALU nir_op_imad

2019-01-25 Thread Eric Anholt
Eduardo Lima Mitev  writes:

> ir3 compiler has an integer multiply-add instruction (IMAD_S24)
> that is used for different offset calculations in the backend.
> Since we intend to move some of these calculations to NIR, we need
> a new ALU op that can represent it.
> ---
>  src/compiler/nir/nir_opcodes.py | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/src/compiler/nir/nir_opcodes.py b/src/compiler/nir/nir_opcodes.py
> index d32005846a6..b61845fd514 100644
> --- a/src/compiler/nir/nir_opcodes.py
> +++ b/src/compiler/nir/nir_opcodes.py
> @@ -754,6 +754,7 @@ def triop_horiz(name, output_size, src1_size, src2_size, 
> src3_size, const_expr):
> [tuint, tuint, tuint], "", const_expr)
>  
>  triop("ffma", tfloat, "src0 * src1 + src2")
> +triop("imad", tint, "src0 * src1 + src2")

The constant-folding expression should be corrected to what the backend
operation actually does, and you should probably call it imad24 or
something in that case.


signature.asc
Description: PGP signature
___
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno


Re: [Freedreno] [Mesa-dev] [RFC 1/4] nir: Add a new intrinsic 'load_image_stride'

2019-01-25 Thread Eric Anholt
Eduardo Lima Mitev  writes:

> This is an internal intrinsic intended to be injected by a
> (freedreno-specific) 'lower_sampler_io' pass that will be introduced
> later in this series; and consumed by ir3_compiler_nir.
>
> The intrinsic will load in SSA values for various constants
> for images (image_dims), namely the format's bytes-per-pixel,
> y-stride and z-stride; for which the backend compiler will emit
> the corresponding uniforms.
>
> const_index[0] is the image index, and const_index[1] is the index
> into image_dims' register file for a given image: 0 for bpp, 1 to
> y-stride and 2 for z-stride.

Can you move this documentation of the meaning of the intrinsic into a
comment in the file?


signature.asc
Description: PGP signature
___
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno


Re: [Freedreno] [Mesa-dev] [RFC 2/4] nir: Add a new ALU nir_op_imad

2019-01-25 Thread Ilia Mirkin
The specification in NIR has to be exact. Otherwise it will
constant-fold in a way that doesn't reflect what the hardware would
do, leading to subtle bugs.

On Fri, Jan 25, 2019 at 11:06 AM Eduardo Lima Mitev  wrote:
>
> On 1/25/19 5:01 PM, Ilia Mirkin wrote:
> > On Fri, Jan 25, 2019 at 10:58 AM Ilia Mirkin  wrote:
> >>
> >> IMAD_S24 isn't src0 * src1 + src2 though. I think this could be called
> >> imad24, which I suspect exits on many GPUs (nv50-era NVIDIA definitely
> >> had this, and I think maxwell+ has a variant of this implemented by
> >> XMAD):
> >>
> >> (src0 * src1) & 0xff + src2
> >
> > And of course even that's wrong... the 24th bit has to get
> > sign-extended on that. Can express it with shifts.
> >
>
> IMAD_S24 is what is currently used in
> ir3_compiler_nir::get_image_offset(), so the pass doesn't change
> anything regarding computations.
>
> I agree that the nir opcode should hint at the bit limit, so probably
> nir_op_imad24. That is one of the open questions.
>
> Thanks,
>
> Eduardo
>
>
___
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno


Re: [Freedreno] [Mesa-dev] [RFC 2/4] nir: Add a new ALU nir_op_imad

2019-01-25 Thread Eduardo Lima Mitev
On 1/25/19 5:01 PM, Ilia Mirkin wrote:
> On Fri, Jan 25, 2019 at 10:58 AM Ilia Mirkin  wrote:
>>
>> IMAD_S24 isn't src0 * src1 + src2 though. I think this could be called
>> imad24, which I suspect exits on many GPUs (nv50-era NVIDIA definitely
>> had this, and I think maxwell+ has a variant of this implemented by
>> XMAD):
>>
>> (src0 * src1) & 0xff + src2
> 
> And of course even that's wrong... the 24th bit has to get
> sign-extended on that. Can express it with shifts.
>

IMAD_S24 is what is currently used in
ir3_compiler_nir::get_image_offset(), so the pass doesn't change
anything regarding computations.

I agree that the nir opcode should hint at the bit limit, so probably
nir_op_imad24. That is one of the open questions.

Thanks,

Eduardo


___
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno


Re: [Freedreno] [Mesa-dev] [RFC 2/4] nir: Add a new ALU nir_op_imad

2019-01-25 Thread Ilia Mirkin
On Fri, Jan 25, 2019 at 10:58 AM Ilia Mirkin  wrote:
>
> IMAD_S24 isn't src0 * src1 + src2 though. I think this could be called
> imad24, which I suspect exits on many GPUs (nv50-era NVIDIA definitely
> had this, and I think maxwell+ has a variant of this implemented by
> XMAD):
>
> (src0 * src1) & 0xff + src2

And of course even that's wrong... the 24th bit has to get
sign-extended on that. Can express it with shifts.
___
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno


Re: [Freedreno] [Mesa-dev] [RFC 2/4] nir: Add a new ALU nir_op_imad

2019-01-25 Thread Ilia Mirkin
IMAD_S24 isn't src0 * src1 + src2 though. I think this could be called
imad24, which I suspect exits on many GPUs (nv50-era NVIDIA definitely
had this, and I think maxwell+ has a variant of this implemented by
XMAD):

(src0 * src1) & 0xff + src2

Cheers,

  -ilia

On Fri, Jan 25, 2019 at 10:49 AM Eduardo Lima Mitev  wrote:
>
> ir3 compiler has an integer multiply-add instruction (IMAD_S24)
> that is used for different offset calculations in the backend.
> Since we intend to move some of these calculations to NIR, we need
> a new ALU op that can represent it.
> ---
>  src/compiler/nir/nir_opcodes.py | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/src/compiler/nir/nir_opcodes.py b/src/compiler/nir/nir_opcodes.py
> index d32005846a6..b61845fd514 100644
> --- a/src/compiler/nir/nir_opcodes.py
> +++ b/src/compiler/nir/nir_opcodes.py
> @@ -754,6 +754,7 @@ def triop_horiz(name, output_size, src1_size, src2_size, 
> src3_size, const_expr):
> [tuint, tuint, tuint], "", const_expr)
>
>  triop("ffma", tfloat, "src0 * src1 + src2")
> +triop("imad", tint, "src0 * src1 + src2")
>
>  triop("flrp", tfloat, "src0 * (1 - src2) + src1 * src2")
>
> --
> 2.20.1
>
> ___
> mesa-dev mailing list
> mesa-...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno


[Freedreno] [RFC 2/4] nir: Add a new ALU nir_op_imad

2019-01-25 Thread Eduardo Lima Mitev
ir3 compiler has an integer multiply-add instruction (IMAD_S24)
that is used for different offset calculations in the backend.
Since we intend to move some of these calculations to NIR, we need
a new ALU op that can represent it.
---
 src/compiler/nir/nir_opcodes.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/compiler/nir/nir_opcodes.py b/src/compiler/nir/nir_opcodes.py
index d32005846a6..b61845fd514 100644
--- a/src/compiler/nir/nir_opcodes.py
+++ b/src/compiler/nir/nir_opcodes.py
@@ -754,6 +754,7 @@ def triop_horiz(name, output_size, src1_size, src2_size, 
src3_size, const_expr):
[tuint, tuint, tuint], "", const_expr)
 
 triop("ffma", tfloat, "src0 * src1 + src2")
+triop("imad", tint, "src0 * src1 + src2")
 
 triop("flrp", tfloat, "src0 * (1 - src2) + src1 * src2")
 
-- 
2.20.1

___
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno


[Freedreno] [RFC 0/4] freedreno: Move some compiler offset computations to NIR

2019-01-25 Thread Eduardo Lima Mitev
There are a bunch of instructions emitted on ir3_compiler_nir related to
offset computations for IO opcodes (ssbo, image, etc). This small series
explores the possibility of moving these instructions to NIR, where we
have higher chances of optimizing them.

The series introduces a new, freedreno specific NIR pass,
'ir3_nir_lower_sampler_io' (final name not set). The pass is executed
early on ir3_optimize_nir(), and the goal is to centralize all these
computations there, hoping that later NIR passes will produce better
code than what is currently emitted.

So far, we have just implemented byte-offset computation for image store
and atomics. This seemed like a good first target given the amount of
instructions being emitted for it by the backend.

This is an RFC series because there are a few open questions, but we
wanted to gather feedback already now, in case this effort is something
not worth it; and also hoping that somebody else will give it a try
against other shaders and on other gens (we have just tried this on
a5xx).

* We have so far been unable to see any improvement in generated code
(not a penalty either). shader-db has not been specially useful. Few
shaders there exercise image store or image atomic ops, and of those
that do, most require higher versions of GLSL than what freedreno
supports, so they get skipped. The few that do actually run, don't
show any meaningful difference.

Then other shaders picked from tests suites are simple enough not to
produce any difference in code either.

There is still on-going work looking for cases where the pass helps
instruction stats, whether writing custom shaders or porting complex
shader from shader-db to run on GLES 310.

There is though an open question here as to whether moving backend
code to NIR is a benefit in and of itself.

* The series adds a nir_op_imad opcode that didn't exist before, and
perhaps not generally useful even for freedreno outside this pass,
because it maps to IR3_MAD_S24 which is probably not suitable for
generic integer multiply-add.

* The pass currently has 2 alternative code-paths to emit the
multiplication by the bytes-per-pixel of an image format. In one
case, since this value can be obtained at compile time, it is
emitted as an immediate by nir_imul_imm. The other alternative is
emitting an nir_imul with an SSA value that will map to
image_dims[0] at shader runtime.

The former case is uglier but produces better code (a single SHL
instruction), whereas the latter involves a generic imul, for which
the backend emits a lot of code to cover for overflow.

The doubt here is whether we should introduce a (lower precision)
version of imul that maps directly to IR3_IMUL_S.


A live (WIP) tree of the series can be found at:


We plan to continue moving computations to the pass if we see
good opportunities.

Feedback very welcome,

cheers,
Eduardo

Eduardo Lima Mitev (4):
  nir: Add a new intrinsic 'load_image_stride'
  nir: Add a new ALU nir_op_imad
  ir3/nir: Add a new pass 'ir3_nir_lower_sampler_io'
  ir3: Use ir3_nir_lower_sampler_io pass

 src/compiler/nir/nir_intrinsics.py   |   2 +
 src/compiler/nir/nir_opcodes.py  |   1 +
 src/freedreno/Makefile.sources   |   1 +
 src/freedreno/ir3/ir3_compiler_nir.c |  61 ++--
 src/freedreno/ir3/ir3_nir.c  |   1 +
 src/freedreno/ir3/ir3_nir.h  |   1 +
 src/freedreno/ir3/ir3_nir_lower_sampler_io.c | 349 +++
 7 files changed, 383 insertions(+), 33 deletions(-)
 create mode 100644 src/freedreno/ir3/ir3_nir_lower_sampler_io.c

-- 
2.20.1

___
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno


[Freedreno] [RFC 4/4] ir3: Use ir3_nir_lower_sampler_io pass

2019-01-25 Thread Eduardo Lima Mitev
This effectively removes all offset calculations in
ir3_compiler_nir::get_image_offset().

No regressions observed on affected tests from Khronos CTS and piglit
suites, compared to master.

Collecting useful stats on helps/hurts caused by this pass is WIP. Very
few shaders in shader-db data-base exercise image store or image
atomic ops, and of those that do, most require higher versions of
GLSL than what freedreno supports, so they get skipped.

There is on-going work writing/porting shaders to collect useful
stats. So far, all tested show no meaningful difference compared
to master.
---
 src/freedreno/ir3/ir3_compiler_nir.c | 61 +---
 src/freedreno/ir3/ir3_nir.c  |  1 +
 2 files changed, 29 insertions(+), 33 deletions(-)

diff --git a/src/freedreno/ir3/ir3_compiler_nir.c 
b/src/freedreno/ir3/ir3_compiler_nir.c
index fd641735620..fe329db658c 100644
--- a/src/freedreno/ir3/ir3_compiler_nir.c
+++ b/src/freedreno/ir3/ir3_compiler_nir.c
@@ -548,6 +548,9 @@ emit_alu(struct ir3_context *ctx, nir_alu_instr *alu)
ir3_MADSH_M16(b, src[0], 0, src[1], 0,
ir3_MULL_U(b, src[0], 0, 
src[1], 0), 0), 0);
break;
+   case nir_op_imad:
+   dst[0] = ir3_MAD_S24(b, src[0], 0, src[1], 0, src[2], 0);
+   break;
case nir_op_ineg:
dst[0] = ir3_ABSNEG_S(b, src[0], IR3_REG_SNEG);
break;
@@ -1172,44 +1175,19 @@ get_image_type(const nir_variable *var)
 
 static struct ir3_instruction *
 get_image_offset(struct ir3_context *ctx, const nir_variable *var,
-   struct ir3_instruction * const *coords, bool byteoff)
+   struct ir3_instruction * const *coords)
 {
struct ir3_block *b = ctx->block;
-   struct ir3_instruction *offset;
-   unsigned ncoords = get_image_coords(var, NULL);
-
-   /* to calculate the byte offset (yes, uggg) we need (up to) three
-* const values to know the bytes per pixel, and y and z stride:
-*/
-   unsigned cb = regid(ctx->so->constbase.image_dims, 0) +
-   ctx->so->const_layout.image_dims.off[var->data.driver_location];
 
debug_assert(ctx->so->const_layout.image_dims.mask &
(1 << var->data.driver_location));
 
-   /* offset = coords.x * bytes_per_pixel: */
-   offset = ir3_MUL_S(b, coords[0], 0, create_uniform(b, cb + 0), 0);
-   if (ncoords > 1) {
-   /* offset += coords.y * y_pitch: */
-   offset = ir3_MAD_S24(b, create_uniform(b, cb + 1), 0,
-   coords[1], 0, offset, 0);
-   }
-   if (ncoords > 2) {
-   /* offset += coords.z * z_pitch: */
-   offset = ir3_MAD_S24(b, create_uniform(b, cb + 2), 0,
-   coords[2], 0, offset, 0);
-   }
-
-   if (!byteoff) {
-   /* Some cases, like atomics, seem to use dword offset instead
-* of byte offsets.. blob just puts an extra shr.b in there
-* in those cases:
-*/
-   offset = ir3_SHR_B(b, offset, 0, create_immed(b, 2), 0);
-   }
-
+   /* ir3_nir_lower_sampler_io pass should have placed the final
+* byte-offset (or dword offset for atomics) at the 4th component
+* of the coordinate vector.
+*/
return ir3_create_collect(ctx, (struct ir3_instruction*[]){
-   offset,
+   coords[3],
create_immed(b, 0),
}, 2);
 }
@@ -1341,7 +1319,7 @@ emit_intrinsic_store_image(struct ir3_context *ctx, 
nir_intrinsic_instr *intr)
 * src2 is 64b byte offset
 */
 
-   offset = get_image_offset(ctx, var, coords, true);
+   offset = get_image_offset(ctx, var, coords);
 
/* NOTE: stib seems to take byte offset, but stgb.typed can be used
 * too and takes a dword offset.. not quite sure yet why blob uses
@@ -1443,7 +1421,7 @@ emit_intrinsic_atomic_image(struct ir3_context *ctx, 
nir_intrinsic_instr *intr)
 */
src0 = ir3_get_src(ctx, >src[3])[0];
src1 = ir3_create_collect(ctx, coords, ncoords);
-   src2 = get_image_offset(ctx, var, coords, false);
+   src2 = get_image_offset(ctx, var, coords);
 
switch (intr->intrinsic) {
case nir_intrinsic_image_deref_atomic_add:
@@ -1612,6 +1590,23 @@ emit_intrinsic(struct ir3_context *ctx, 
nir_intrinsic_instr *intr)
}
 
switch (intr->intrinsic) {
+   case nir_intrinsic_load_image_stride: {
+   idx = intr->const_index[0];
+
+   /* this is the index into image_dims offsets, which can take
+* values 0, 1 or 2 (bpp, y-stride, z-stride respectively).
+*/
+   uint8_t off = intr->const_index[1];
+   debug_assert(off <= 2);
+
+   unsigned cb = regid(ctx->so->constbase.image_dims, 0) +
+  

[Freedreno] [RFC 3/4] ir3/nir: Add a new pass 'ir3_nir_lower_sampler_io'

2019-01-25 Thread Eduardo Lima Mitev
This pass moves to NIR some offset calculations that are currently
implemented on the backend compiler, to allow NIR to possibly
optimize them.

For now, only coordinate byte-offset calculations for imageStore
and image atomic operations are implemented.
---
 src/freedreno/Makefile.sources   |   1 +
 src/freedreno/ir3/ir3_nir.h  |   1 +
 src/freedreno/ir3/ir3_nir_lower_sampler_io.c | 349 +++
 3 files changed, 351 insertions(+)
 create mode 100644 src/freedreno/ir3/ir3_nir_lower_sampler_io.c

diff --git a/src/freedreno/Makefile.sources b/src/freedreno/Makefile.sources
index 7fea9de39ef..fd4f7f294cd 100644
--- a/src/freedreno/Makefile.sources
+++ b/src/freedreno/Makefile.sources
@@ -31,6 +31,7 @@ ir3_SOURCES := \
ir3/ir3_legalize.c \
ir3/ir3_nir.c \
ir3/ir3_nir.h \
+   ir3/ir3_nir_lower_sampler_io.c \
ir3/ir3_nir_lower_tg4_to_tex.c \
ir3/ir3_print.c \
ir3/ir3_ra.c \
diff --git a/src/freedreno/ir3/ir3_nir.h b/src/freedreno/ir3/ir3_nir.h
index 74201d34160..52809ba099e 100644
--- a/src/freedreno/ir3/ir3_nir.h
+++ b/src/freedreno/ir3/ir3_nir.h
@@ -36,6 +36,7 @@ void ir3_nir_scan_driver_consts(nir_shader *shader, struct 
ir3_driver_const_layo
 
 bool ir3_nir_apply_trig_workarounds(nir_shader *shader);
 bool ir3_nir_lower_tg4_to_tex(nir_shader *shader);
+bool ir3_nir_lower_sampler_io(nir_shader *shader);
 
 const nir_shader_compiler_options * ir3_get_compiler_options(struct 
ir3_compiler *compiler);
 bool ir3_key_lowers_nir(const struct ir3_shader_key *key);
diff --git a/src/freedreno/ir3/ir3_nir_lower_sampler_io.c 
b/src/freedreno/ir3/ir3_nir_lower_sampler_io.c
new file mode 100644
index 000..e2910d8906d
--- /dev/null
+++ b/src/freedreno/ir3/ir3_nir_lower_sampler_io.c
@@ -0,0 +1,349 @@
+/*
+ * Copyright © 2018 Igalia S.L.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include "ir3_nir.h"
+#include "compiler/nir/nir_builder.h"
+
+/**
+ * The goal of this pass is to move to NIR some offset calculations for
+ * different I/O that are currently implemented on the backend compiler,
+ * to allow NIR to possibly optimize them.
+ *
+ * Currently, only offset calculations for image store and image
+ * atomic operations are implemented.
+ */
+
+
+/* This flag enables/disables a code-path where the bytes-per-pixel of
+ * an image is obtained directly from the format, which is known
+ * at shader compile time; as opposed to using image_dims[0] constant
+ * available only at shader runtime.
+ *
+ * Inlining the bytes-per-pixel here as an immediate has the advantage
+ * that it gets converted to a single (SHL) instruction (because all
+ * possible values are powers of two); while loading it as a uniform and
+ * emitting an IMUL will cause the backend to expand it to quite a few
+ * instructions (see ir3_compiler_nir for imul), thus ultimately hurting
+ * instruction count.
+ */
+#define INLINE_BPP 1
+
+
+static bool
+intrinsic_is_image_atomic(unsigned intrinsic)
+{
+   switch (intrinsic) {
+   case nir_intrinsic_image_deref_atomic_add:
+   case nir_intrinsic_image_deref_atomic_min:
+   case nir_intrinsic_image_deref_atomic_max:
+   case nir_intrinsic_image_deref_atomic_and:
+   case nir_intrinsic_image_deref_atomic_or:
+   case nir_intrinsic_image_deref_atomic_xor:
+   case nir_intrinsic_image_deref_atomic_exchange:
+   case nir_intrinsic_image_deref_atomic_comp_swap:
+   return true;
+   default:
+   break;
+   }
+
+   return false;
+}
+
+static bool
+intrinsic_is_image_store_or_atomic(unsigned intrinsic)
+{
+   if (intrinsic == nir_intrinsic_image_deref_store)
+   return true;
+   else
+   return intrinsic_is_image_atomic(intrinsic);
+}
+
+/*
+ * FIXME: shamelessly copied from ir3_compiler_nir until it gets factorized
+ * out at some point.
+ */

[Freedreno] [RFC 1/4] nir: Add a new intrinsic 'load_image_stride'

2019-01-25 Thread Eduardo Lima Mitev
This is an internal intrinsic intended to be injected by a
(freedreno-specific) 'lower_sampler_io' pass that will be introduced
later in this series; and consumed by ir3_compiler_nir.

The intrinsic will load in SSA values for various constants
for images (image_dims), namely the format's bytes-per-pixel,
y-stride and z-stride; for which the backend compiler will emit
the corresponding uniforms.

const_index[0] is the image index, and const_index[1] is the index
into image_dims' register file for a given image: 0 for bpp, 1 to
y-stride and 2 for z-stride.
---
 src/compiler/nir/nir_intrinsics.py | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/compiler/nir/nir_intrinsics.py 
b/src/compiler/nir/nir_intrinsics.py
index a5cc3f7401c..169c835d746 100644
--- a/src/compiler/nir/nir_intrinsics.py
+++ b/src/compiler/nir/nir_intrinsics.py
@@ -590,6 +590,8 @@ load("shared", 1, [BASE, ALIGN_MUL, ALIGN_OFFSET], 
[CAN_ELIMINATE])
 load("push_constant", 1, [BASE, RANGE], [CAN_ELIMINATE, CAN_REORDER])
 # src[] = { offset }. const_index[] = { base, range }
 load("constant", 1, [BASE, RANGE], [CAN_ELIMINATE, CAN_REORDER])
+# src[] = { offset }. const_index[] = { image_idx, dim_idx }
+load("image_stride", 1, [], [CAN_REORDER])
 
 # Stores work the same way as loads, except now the first source is the value
 # to store and the second (and possibly third) source specify where to store
-- 
2.20.1

___
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno


Re: [Freedreno] [PATCH 01/26] drm/irq: Don't check for DRIVER_HAVE_IRQ in drm_irq_(un)install

2019-01-25 Thread Emil Velikov
On Thu, 24 Jan 2019 at 16:58, Daniel Vetter  wrote:
>
> If a non-legacy driver calls these it's valid to assume there is
> interrupt support. The flag is really only needed for legacy drivers.

... legacy drivers which issue the IRQ via the DRM_IOCTL_CONTROL legacy IOCTL.

At a later stage, we might as well add LEGACY to the name:
DRIVER_LEGACY_HAVE_IRQ?


>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c  | 2 +-
>  drivers/gpu/drm/arm/hdlcd_drv.c  | 2 +-
>  drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c | 2 +-
>  drivers/gpu/drm/drm_irq.c| 6 --
>  drivers/gpu/drm/fsl-dcu/fsl_dcu_drm_drv.c| 2 +-
>  drivers/gpu/drm/gma500/psb_drv.c | 2 +-
>  drivers/gpu/drm/i915/i915_drv.c  | 2 +-
>  drivers/gpu/drm/meson/meson_drv.c| 2 +-
>  drivers/gpu/drm/msm/msm_drv.c| 3 +--
>  drivers/gpu/drm/mxsfb/mxsfb_drv.c| 3 +--
>  drivers/gpu/drm/qxl/qxl_drv.c| 2 +-
>  drivers/gpu/drm/radeon/radeon_drv.c  | 2 +-
>  drivers/gpu/drm/shmobile/shmob_drm_drv.c | 2 +-
>  drivers/gpu/drm/tilcdc/tilcdc_drv.c  | 2 +-
>  drivers/gpu/drm/vc4/vc4_drv.c| 1 -
>  drivers/gpu/drm/vmwgfx/vmwgfx_drv.c  | 2 +-
>  drivers/staging/vboxvideo/vbox_drv.c | 2 +-

Local grep shows one more non-legacy entry in
drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_drv.c

With that file addressed and trivial comment additions, patches: 1, 2,
3 and 26 are
Reviewed-by: Emil Velikov 

HTH
Emil
___
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno


Re: [Freedreno] [PATCH 01/26] drm/irq: Don't check for DRIVER_HAVE_IRQ in drm_irq_(un)install

2019-01-25 Thread Sam Ravnborg
Hi Daniel.

On Thu, Jan 24, 2019 at 05:58:06PM +0100, Daniel Vetter wrote:
> If a non-legacy driver calls these it's valid to assume there is
> interrupt support. The flag is really only needed for legacy drivers.
> 
> Also remove all the flag usage from non-legacy drivers.
> 
> Signed-off-by: Daniel Vetter 
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: intel-...@lists.freedesktop.org
> Cc: linux-amlo...@lists.infradead.org
> Cc: linux-arm-...@vger.kernel.org
> Cc: freedreno@lists.freedesktop.org
> Cc: virtualizat...@lists.linux-foundation.org
> Cc: spice-de...@lists.freedesktop.org
> Cc: amd-...@lists.freedesktop.org
> Cc: linux-renesas-...@vger.kernel.org

The actual code changes looks good.
But if this is the right thing to do I cannot tell.

On this (limited) basis I provide an:
Reviewed-by: Sam Ravnborg 
___
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno