Re: [Mesa-dev] clover's interface to clang

2020-11-11 Thread Francisco Jerez
I don't remember the specifics of why we ended up interfacing with Clang
this way.  What is technically wrong with it, specifically?  I don't
have any objection to switching to the Driver and Compilation interface,
nor to translating the "-cl-denorms-are-zero" option to whatever the
current option name is so the current Clang interfacing keeps working.
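
A minimal sketch of the option-translation idea mentioned above, assuming
the arguments are held in a plain array of strings before being handed to
Clang.  The helper name translate_option is hypothetical, and the
replacement flag is left as a caller-supplied parameter because the thread
does not name the current option:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: rewrite argv in place, swapping every occurrence of
 * old_opt for new_opt.  Returns the number of options replaced.  The caller
 * decides what new_opt is; nothing here knows Clang's actual flag names. */
static size_t
translate_option(const char **argv, size_t argc,
                 const char *old_opt, const char *new_opt)
{
   size_t replaced = 0;

   for (size_t i = 0; i < argc; i++) {
      if (strcmp(argv[i], old_opt) == 0) {
         argv[i] = new_opt;
         replaced++;
      }
   }

   return replaced;
}
```

This only keeps the existing interfacing working; it does not address the
question of whether the arguments should come from the Driver instead.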

Dave Airlie  writes:

> Hey all (mostly Tom).
>
> I've been learning new things today since Matt pushed a patch to clang
> to remove "-cl-denorms-are-zero" from cc1 options. I thought this was
> a regression or we should hack things to pass a different flag (which
> I did locally for testing), but Matt informed me clover is likely
> interfacing to clang wrong.
>
> The correct way to do things seems to be to build up a set of command
> line args, pass those to the Driver, and create a Compilation object with
> jobs, then execute each job in turn. One of the jobs would be a
> cc1 job and the driver would have all the correct arguments for it.
>
> Now I'll likely dig into this a bit more, but I was wondering if
> anyone knows historically why this wasn't done. I know for example
> with clover we really only want to use the cc1 pass since at least
> for the NIR backend we just want to emit LLVM bytecode and pass it to
> the spirv conversion, so using the driver might be overkill.
>
> Dave.


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Rust drivers in Mesa

2020-10-01 Thread Francisco Jerez
Alyssa Rosenzweig  writes:

> Hi all,
>
> Recently I've been thinking about the potential for the Rust programming
> language in Mesa. Rust bills itself as a safe systems programming language
> with comparable performance to C [0], which is a natural fit for
> graphics driver development.
>
> Mesa today is written primarily in C, a notoriously low-level language,
> with some components in C++. To handle the impedance mismatch, we've
> built up a number of abstractions in-tree, including multiple ad hoc
> code generators (GenXML, NIR algebraic passes, Bifrost disassembler). A
> higher level language can help avoid the web of metaprogramming and
> effect code that is simpler and easier to reason about. Similarly, a
> better type system can aid static analysis.
>
> Beyond abstraction, Rust's differentiating feature is the borrow checker
> to guarantee memory safety. Historically, safety has not been a primary
> concern of graphics drivers, since drivers are implemented as regular
> userspace code running in the process of the application calling them.
> Unfortunately, now that OpenGL is being exposed to untrusted code via
> WebGL, the driver does become an attack vector.
>
> For the time being, Mesa attempts to minimize memory bugs with defensive
> programming, safe in-tree abstractions (including ralloc), and static
> analysis via Coverity. Nevertheless, these are all heuristic solutions.
> Static analysis is imperfect and in our case, proprietary software.
> Ideally, the bugs we've been fixing via Coverity could be caught at
> compile-time with a free and open source toolchain.
>
> As Rust would allow exactly this, I see the primary benefit of Rust in
> verifying correctness and robustness, rather than security concerns per
> se.  Indeed, safety guarantees do translate well beyond WebGL.
>
> Practically, how would Rust fit in with our existing C codebase?
> Obviously I'm not suggesting a rewrite of Mesa's more than 15 million
> lines of C. Instead, I see value in introducing Rust in targeted parts
> of the tree. In particular, I envision backend compilers written in part
> in Rust. While creating an idiomatic Rust wrapper for NIR or Gallium
> would be prohibitively costly for now, a backend compiler could be
> written in Rust with IR builders exported for use of the NIR -> backend
> IR translator written in C.
>
> This would have minimal impact on the tree. Users that are not building
> such a driver would be unaffected. For those who _are_ building Rust
> code, the Rust compiler would be added as a build-time dependency and
> the (statically linked) Rust standard library would be added as a
> runtime dependency. There is concern about the Rust compiler requiring
> LLVM as a dependency, but again this is build-time, and no worse than
> Mesa already requiring LLVM as a runtime dependency for llvmpipe and
> clover. As for the standard library, it is possible to eliminate the
> dependency as embedded Rust does, perhaps calling out to the C standard
> library via the FFI, but this is likely quixotic. I do regret the binary
> size increase, however.
>
> Implications for the build system vary. Rust prefers to be built by its
> own package manager, Cargo, which is tricky to integrate with other
> build systems. Actually, Meson has native support for Rust, invoking the
> compiler directly and skipping Cargo, as if it were C code. This support
> is not widely adopted as it prevents linking with external libraries
> ("crates", in Rust parlance), with discussions between Rust and Meson
> developers ending in a stand-still [1]. For Mesa, this might be just
> fine. Our out-of-tree run-time dependencies are minimal for the C code,
> and Rust's standard library largely avoids the need for us to maintain a
> Rust version of util/ in-tree. If this proves impractical in the
> long-term, it is possible to integrate Cargo with Meson on our end [2].
>
> One outstanding concern is build-time, which has been a notorious
> growing pain for Rust due to both language design and LLVM itself [3],
> although there is active work to improve both fronts [4][5]. I build
> Mesa on my Arm laptop, so I suppose I'd be hurt more than many of us.
> There are also awkward bootstrapping questions, but there is work here too
> [6].
>
> If this is of interest, please discuss. It's clear to me Rust is not
> going away any time soon, and I see value in Mesa embracing the new
> technology. I'd like to hear other Mesa developers' thoughts.
>
> Thanks,
>
> Alyssa
>

I fully agree with the memory safety, generic programming and type
system benefits of Rust over C you're talking about, particularly while
writing any minimally complex piece of code like a compiler back-end.  I
have no objection to new back-ends being written in Rust instead of C.
But just saying, most of those benefits can be reasonably achieved with
modern dialects of C++ (modern as in >10 years old), which we already
have hooked up to the build system and doesn't require any additional

[Mesa-dev] [PATCHv2 4/4] anv/gen9: Optimize slice and subslice load balancing behavior.

2019-08-12 Thread Francisco Jerez
See "i965/gen9: Optimize slice and subslice load balancing behavior."
for the rationale.  According to Jason, improves Aztec Ruins
performance by 2.7%.

Reviewed-by: Kenneth Graunke  (v1)

v2: Undo CPU performance micro-optimization done in i965 and iris due
to lack of data justifying it on anv.  Use
cmd_buffer_apply_pipe_flushes wrapper instead of emitting pipe
control command directly.  (Jason)
---
 src/intel/vulkan/anv_genX.h|  4 ++
 src/intel/vulkan/anv_private.h |  6 ++
 src/intel/vulkan/genX_blorp_exec.c |  4 ++
 src/intel/vulkan/genX_cmd_buffer.c | 95 ++
 4 files changed, 109 insertions(+)

diff --git a/src/intel/vulkan/anv_genX.h b/src/intel/vulkan/anv_genX.h
index a5435e566a3..06c6b467acf 100644
--- a/src/intel/vulkan/anv_genX.h
+++ b/src/intel/vulkan/anv_genX.h
@@ -44,6 +44,10 @@ void genX(cmd_buffer_apply_pipe_flushes)(struct 
anv_cmd_buffer *cmd_buffer);
 
 void genX(cmd_buffer_emit_gen7_depth_flush)(struct anv_cmd_buffer *cmd_buffer);
 
+void genX(cmd_buffer_emit_hashing_mode)(struct anv_cmd_buffer *cmd_buffer,
+unsigned width, unsigned height,
+unsigned scale);
+
 void genX(flush_pipeline_select_3d)(struct anv_cmd_buffer *cmd_buffer);
 void genX(flush_pipeline_select_gpgpu)(struct anv_cmd_buffer *cmd_buffer);
 
diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
index 2465f264354..b381386a716 100644
--- a/src/intel/vulkan/anv_private.h
+++ b/src/intel/vulkan/anv_private.h
@@ -2421,6 +2421,12 @@ struct anv_cmd_state {
 
bool conditional_render_enabled;
 
+   /**
+* Last rendering scale argument provided to
+* genX(cmd_buffer_emit_hashing_mode)().
+*/
+   unsigned current_hash_scale;
+
/**
 * Array length is anv_cmd_state::pass::attachment_count. Array content is
 * valid only when recording a render pass instance.
diff --git a/src/intel/vulkan/genX_blorp_exec.c 
b/src/intel/vulkan/genX_blorp_exec.c
index 1592e7f7e3d..239bfb47433 100644
--- a/src/intel/vulkan/genX_blorp_exec.c
+++ b/src/intel/vulkan/genX_blorp_exec.c
@@ -223,6 +223,10 @@ genX(blorp_exec)(struct blorp_batch *batch,
   genX(cmd_buffer_config_l3)(cmd_buffer, cfg);
}
 
+   const unsigned scale = params->fast_clear_op ? UINT_MAX : 1;
+   genX(cmd_buffer_emit_hashing_mode)(cmd_buffer, params->x1 - params->x0,
+  params->y1 - params->y0, scale);
+
 #if GEN_GEN >= 11
/* The PIPE_CONTROL command description says:
 *
diff --git a/src/intel/vulkan/genX_cmd_buffer.c 
b/src/intel/vulkan/genX_cmd_buffer.c
index 86ef1663ac4..10da132115c 100644
--- a/src/intel/vulkan/genX_cmd_buffer.c
+++ b/src/intel/vulkan/genX_cmd_buffer.c
@@ -1595,6 +1595,7 @@ genX(CmdExecuteCommands)(
 */
primary->state.current_pipeline = UINT32_MAX;
primary->state.current_l3_config = NULL;
+   primary->state.current_hash_scale = 0;
 
/* Each of the secondary command buffers will use its own state base
 * address.  We need to re-emit state base address for the primary after
@@ -2663,6 +2664,8 @@ genX(cmd_buffer_flush_state)(struct anv_cmd_buffer 
*cmd_buffer)
 
genX(cmd_buffer_config_l3)(cmd_buffer, pipeline->urb.l3_config);
 
+   genX(cmd_buffer_emit_hashing_mode)(cmd_buffer, UINT_MAX, UINT_MAX, 1);
+
genX(flush_pipeline_select_3d)(cmd_buffer);
 
if (vb_emit) {
@@ -3925,6 +3928,98 @@ genX(cmd_buffer_emit_gen7_depth_flush)(struct 
anv_cmd_buffer *cmd_buffer)
}
 }
 
+/**
+ * Update the pixel hashing modes that determine the balancing of PS threads
+ * across subslices and slices.
+ *
+ * \param width Width bound of the rendering area (already scaled down if \p
+ *  scale is greater than 1).
+ * \param height Height bound of the rendering area (already scaled down if \p
+ *   scale is greater than 1).
+ * \param scale The number of framebuffer samples that could potentially be
+ *  affected by an individual channel of the PS thread.  This is
+ *  typically one for single-sampled rendering, but for operations
+ *  like CCS resolves and fast clears a single PS invocation may
+ *  update a huge number of pixels, in which case a finer
+ *  balancing is desirable in order to maximally utilize the
+ *  bandwidth available.  UINT_MAX can be used as shorthand for
+ *  "finest hashing mode available".
+ */
+void
+genX(cmd_buffer_emit_hashing_mode)(struct anv_cmd_buffer *cmd_buffer,
+   unsigned width, unsigned height,
+   unsigned scale)
+{
+#if GEN_GEN == 9
+   const struct gen_device_info *devinfo = &cmd_buffer->device->info;
+   const unsigned slice_hashing[] = {
+  /* Because all Gen9 platforms with more than one slice require
+   * three-way subslice hashing, a 

Re: [Mesa-dev] [PATCH 4/4] OPTIONAL: anv/gen9: Optimize slice and subslice load balancing behavior.

2019-08-10 Thread Francisco Jerez
Jason Ekstrand  writes:

> On Sat, Aug 10, 2019 at 2:22 PM Francisco Jerez 
> wrote:
>
>> Jason Ekstrand  writes:
>>
>> > On Fri, Aug 9, 2019 at 7:22 PM Francisco Jerez 
>> > wrote:
>> >
>> >> See "i965/gen9: Optimize slice and subslice load balancing behavior."
>> >> for the rationale.  Marked optional because no performance evaluation
>> >> has been done on this commit, it is provided to match the hashing
>> >> settings of the Iris driver.  Test reports welcome.
>> >>
>> >
>> > Improves Aztec Ruins by 2.7% (tested 5 times in each configuration).  I
>> > also tested Batman: Arkham City but it only reports integer FPS numbers and
>> > no change was noticeable.
>> >
>> >
>> >> ---
>> >>  src/intel/vulkan/anv_genX.h|  4 ++
>> >>  src/intel/vulkan/anv_private.h |  6 ++
>> >>  src/intel/vulkan/genX_blorp_exec.c |  6 ++
>> >>  src/intel/vulkan/genX_cmd_buffer.c | 96 ++
>> >>  4 files changed, 112 insertions(+)
>> >>
>> >> diff --git a/src/intel/vulkan/anv_genX.h b/src/intel/vulkan/anv_genX.h
>> >> index a5435e566a3..06c6b467acf 100644
>> >> --- a/src/intel/vulkan/anv_genX.h
>> >> +++ b/src/intel/vulkan/anv_genX.h
>> >> @@ -44,6 +44,10 @@ void genX(cmd_buffer_apply_pipe_flushes)(struct
>> >> anv_cmd_buffer *cmd_buffer);
>> >>
>> >>  void genX(cmd_buffer_emit_gen7_depth_flush)(struct anv_cmd_buffer
>> >> *cmd_buffer);
>> >>
>> >> +void genX(cmd_buffer_emit_hashing_mode)(struct anv_cmd_buffer
>> *cmd_buffer,
>> >> +unsigned width, unsigned
>> height,
>> >> +unsigned scale);
>> >> +
>> >>  void genX(flush_pipeline_select_3d)(struct anv_cmd_buffer *cmd_buffer);
>> >>  void genX(flush_pipeline_select_gpgpu)(struct anv_cmd_buffer
>> *cmd_buffer);
>> >>
>> >> diff --git a/src/intel/vulkan/anv_private.h
>> >> b/src/intel/vulkan/anv_private.h
>> >> index 2465f264354..b381386a716 100644
>> >> --- a/src/intel/vulkan/anv_private.h
>> >> +++ b/src/intel/vulkan/anv_private.h
>> >> @@ -2421,6 +2421,12 @@ struct anv_cmd_state {
>> >>
>> >> bool
>> >>  conditional_render_enabled;
>> >>
>> >> +   /**
>> >> +* Last rendering scale argument provided to
>> >> +* genX(cmd_buffer_emit_hashing_mode)().
>> >> +*/
>> >> +   unsigned current_hash_scale;
>> >> +
>> >> /**
>> >>  * Array length is anv_cmd_state::pass::attachment_count. Array
>> >> content is
>> >>  * valid only when recording a render pass instance.
>> >> diff --git a/src/intel/vulkan/genX_blorp_exec.c
>> >> b/src/intel/vulkan/genX_blorp_exec.c
>> >> index 1592e7f7e3d..e9eedc06696 100644
>> >> --- a/src/intel/vulkan/genX_blorp_exec.c
>> >> +++ b/src/intel/vulkan/genX_blorp_exec.c
>> >> @@ -223,6 +223,12 @@ genX(blorp_exec)(struct blorp_batch *batch,
>> >>genX(cmd_buffer_config_l3)(cmd_buffer, cfg);
>> >> }
>> >>
>> >> +   const unsigned scale = params->fast_clear_op ? UINT_MAX : 1;
>> >> +   if (cmd_buffer->state.current_hash_scale != scale) {
>> >> +  genX(cmd_buffer_emit_hashing_mode)(cmd_buffer, params->x1 -
>> >> params->x0,
>> >> + params->y1 - params->y0,
>> scale);
>> >> +   }
>> >> +
>> >>  #if GEN_GEN >= 11
>> >> /* The PIPE_CONTROL command description says:
>> >>  *
>> >> diff --git a/src/intel/vulkan/genX_cmd_buffer.c
>> >> b/src/intel/vulkan/genX_cmd_buffer.c
>> >> index 86ef1663ac4..e9e5570d49f 100644
>> >> --- a/src/intel/vulkan/genX_cmd_buffer.c
>> >> +++ b/src/intel/vulkan/genX_cmd_buffer.c
>> >> @@ -1595,6 +1595,7 @@ genX(CmdExecuteCommands)(
>> >>  */
>> >> primary->state.current_pipeline = UINT32_MAX;
>> >> primary->state.current_l3_config = NULL;
>> >> +   primary->state.current_hash_scale = 0;
>> >>
>> >> /* Each of the secondary command buffers will use its own state base

Re: [Mesa-dev] [PATCH 4/4] OPTIONAL: anv/gen9: Optimize slice and subslice load balancing behavior.

2019-08-10 Thread Francisco Jerez
Jason Ekstrand  writes:

> On Fri, Aug 9, 2019 at 7:22 PM Francisco Jerez 
> wrote:
>
>> See "i965/gen9: Optimize slice and subslice load balancing behavior."
>> for the rationale.  Marked optional because no performance evaluation
>> has been done on this commit, it is provided to match the hashing
>> settings of the Iris driver.  Test reports welcome.
>>
>
> Improves Aztec Ruins by 2.7% (tested 5 times in each configuration).  I
> also tested Batman: Arkham City but it only reports integer FPS numbers and
> no change was noticeable.
>
>
>> ---
>>  src/intel/vulkan/anv_genX.h|  4 ++
>>  src/intel/vulkan/anv_private.h |  6 ++
>>  src/intel/vulkan/genX_blorp_exec.c |  6 ++
>>  src/intel/vulkan/genX_cmd_buffer.c | 96 ++
>>  4 files changed, 112 insertions(+)
>>
>> diff --git a/src/intel/vulkan/anv_genX.h b/src/intel/vulkan/anv_genX.h
>> index a5435e566a3..06c6b467acf 100644
>> --- a/src/intel/vulkan/anv_genX.h
>> +++ b/src/intel/vulkan/anv_genX.h
>> @@ -44,6 +44,10 @@ void genX(cmd_buffer_apply_pipe_flushes)(struct
>> anv_cmd_buffer *cmd_buffer);
>>
>>  void genX(cmd_buffer_emit_gen7_depth_flush)(struct anv_cmd_buffer
>> *cmd_buffer);
>>
>> +void genX(cmd_buffer_emit_hashing_mode)(struct anv_cmd_buffer *cmd_buffer,
>> +unsigned width, unsigned height,
>> +unsigned scale);
>> +
>>  void genX(flush_pipeline_select_3d)(struct anv_cmd_buffer *cmd_buffer);
>>  void genX(flush_pipeline_select_gpgpu)(struct anv_cmd_buffer *cmd_buffer);
>>
>> diff --git a/src/intel/vulkan/anv_private.h
>> b/src/intel/vulkan/anv_private.h
>> index 2465f264354..b381386a716 100644
>> --- a/src/intel/vulkan/anv_private.h
>> +++ b/src/intel/vulkan/anv_private.h
>> @@ -2421,6 +2421,12 @@ struct anv_cmd_state {
>>
>> bool
>>  conditional_render_enabled;
>>
>> +   /**
>> +* Last rendering scale argument provided to
>> +* genX(cmd_buffer_emit_hashing_mode)().
>> +*/
>> +   unsigned current_hash_scale;
>> +
>> /**
>>  * Array length is anv_cmd_state::pass::attachment_count. Array
>> content is
>>  * valid only when recording a render pass instance.
>> diff --git a/src/intel/vulkan/genX_blorp_exec.c
>> b/src/intel/vulkan/genX_blorp_exec.c
>> index 1592e7f7e3d..e9eedc06696 100644
>> --- a/src/intel/vulkan/genX_blorp_exec.c
>> +++ b/src/intel/vulkan/genX_blorp_exec.c
>> @@ -223,6 +223,12 @@ genX(blorp_exec)(struct blorp_batch *batch,
>>genX(cmd_buffer_config_l3)(cmd_buffer, cfg);
>> }
>>
>> +   const unsigned scale = params->fast_clear_op ? UINT_MAX : 1;
>> +   if (cmd_buffer->state.current_hash_scale != scale) {
>> +  genX(cmd_buffer_emit_hashing_mode)(cmd_buffer, params->x1 -
>> params->x0,
>> + params->y1 - params->y0, scale);
>> +   }
>> +
>>  #if GEN_GEN >= 11
>> /* The PIPE_CONTROL command description says:
>>  *
>> diff --git a/src/intel/vulkan/genX_cmd_buffer.c
>> b/src/intel/vulkan/genX_cmd_buffer.c
>> index 86ef1663ac4..e9e5570d49f 100644
>> --- a/src/intel/vulkan/genX_cmd_buffer.c
>> +++ b/src/intel/vulkan/genX_cmd_buffer.c
>> @@ -1595,6 +1595,7 @@ genX(CmdExecuteCommands)(
>>  */
>> primary->state.current_pipeline = UINT32_MAX;
>> primary->state.current_l3_config = NULL;
>> +   primary->state.current_hash_scale = 0;
>>
>> /* Each of the secondary command buffers will use its own state base
>>  * address.  We need to re-emit state base address for the primary
>> after
>> @@ -2663,6 +2664,9 @@ genX(cmd_buffer_flush_state)(struct anv_cmd_buffer
>> *cmd_buffer)
>>
>> genX(cmd_buffer_config_l3)(cmd_buffer, pipeline->urb.l3_config);
>>
>> +   if (cmd_buffer->state.current_hash_scale != 1)
>> +  genX(cmd_buffer_emit_hashing_mode)(cmd_buffer, UINT_MAX, UINT_MAX,
>> 1);
>>
>
> Can we do the check against current_hash_scale as an early return inside
> the emit function since that's already where the "current" value is
> updated?  Maybe call it cmd_buffer_config_hashing_mode if "emit" no longer
> seems right with that change?  I don't really care about the rename.
>
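
A minimal sketch of the two placements of the current_hash_scale check under
discussion, assuming a simplified stand-in for the driver state.  The struct
and function names below are hypothetical, and a counter stands in for the
actual register write:

```c
#include <limits.h>

/* Hypothetical stand-in for the command buffer state. */
struct hash_state {
   unsigned current_hash_scale; /* 0 means "unknown, must re-emit" */
   unsigned emit_count;         /* counts simulated register writes */
};

/* Placement 1: the check lives inside the function as an early return,
 * so every caller pays for the call but none repeats the comparison. */
static void
config_hashing_mode(struct hash_state *st, unsigned scale)
{
   if (st->current_hash_scale == scale)
      return;

   st->emit_count++;               /* stands in for the register write */
   st->current_hash_scale = scale;
}

/* Placement 2: the function emits unconditionally... */
static void
emit_hashing_mode(struct hash_state *st, unsigned scale)
{
   st->emit_count++;               /* stands in for the register write */
   st->current_hash_scale = scale;
}

/* ...and the caller on the hot path does the comparison itself, so the
 * common "nothing changed" case costs no function call at all. */
static void
flush_state(struct hash_state *st)
{
   if (st->current_hash_scale != 1)
      emit_hashing_mode(st, 1);
}
```

Both variants skip redundant emits; they differ only in whether the
unchanged case still incurs a call, which is the trade-off being weighed.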

That's what my first revision did.  I moved the current_hash_scale check
one level up because the function call overhead was visible in
brw_upload_pipeline_s

[Mesa-dev] [PATCH 3/4] iris/gen9: Optimize slice and subslice load balancing behavior.

2019-08-09 Thread Francisco Jerez
See "i965/gen9: Optimize slice and subslice load balancing behavior."
for the rationale.

Reviewed-by: Kenneth Graunke 
---
 src/gallium/drivers/iris/iris_blorp.c   |  6 ++
 src/gallium/drivers/iris/iris_context.c |  1 +
 src/gallium/drivers/iris/iris_context.h |  3 +
 src/gallium/drivers/iris/iris_genx_protos.h |  4 +
 src/gallium/drivers/iris/iris_state.c   | 96 +
 5 files changed, 110 insertions(+)

diff --git a/src/gallium/drivers/iris/iris_blorp.c 
b/src/gallium/drivers/iris/iris_blorp.c
index 7298e23d23c..7aae5ea7002 100644
--- a/src/gallium/drivers/iris/iris_blorp.c
+++ b/src/gallium/drivers/iris/iris_blorp.c
@@ -307,6 +307,12 @@ iris_blorp_exec(struct blorp_batch *blorp_batch,
 
iris_require_command_space(batch, 1400);
 
+   const unsigned scale = params->fast_clear_op ? UINT_MAX : 1;
+   if (ice->state.current_hash_scale != scale) {
+  genX(emit_hashing_mode)(ice, batch, params->x1 - params->x0,
+  params->y1 - params->y0, scale);
+   }
+
blorp_exec(blorp_batch, params);
 
/* We've smashed all state compared to what the normal 3D pipeline
diff --git a/src/gallium/drivers/iris/iris_context.c 
b/src/gallium/drivers/iris/iris_context.c
index 8710f010ebf..02b74d39619 100644
--- a/src/gallium/drivers/iris/iris_context.c
+++ b/src/gallium/drivers/iris/iris_context.c
@@ -98,6 +98,7 @@ iris_lost_context_state(struct iris_batch *batch)
}
 
ice->state.dirty = ~0ull;
+   ice->state.current_hash_scale = 0;
memset(ice->state.last_grid, 0, sizeof(ice->state.last_grid));
batch->last_surface_base_address = ~0ull;
ice->vtbl.lost_genx_state(ice, batch);
diff --git a/src/gallium/drivers/iris/iris_context.h 
b/src/gallium/drivers/iris/iris_context.h
index f25c91fb317..6237c6f7014 100644
--- a/src/gallium/drivers/iris/iris_context.h
+++ b/src/gallium/drivers/iris/iris_context.h
@@ -726,6 +726,9 @@ struct iris_context {
 
   /** Records the size of variable-length state for INTEL_DEBUG=bat */
   struct hash_table_u64 *sizes;
+
+  /** Last rendering scale argument provided to genX(emit_hashing_mode). */
+  unsigned current_hash_scale;
} state;
 };
 
diff --git a/src/gallium/drivers/iris/iris_genx_protos.h 
b/src/gallium/drivers/iris/iris_genx_protos.h
index 623eb6b4802..16da78d7e9f 100644
--- a/src/gallium/drivers/iris/iris_genx_protos.h
+++ b/src/gallium/drivers/iris/iris_genx_protos.h
@@ -33,6 +33,10 @@ void genX(emit_urb_setup)(struct iris_context *ice,
   struct iris_batch *batch,
   const unsigned size[4],
   bool tess_present, bool gs_present);
+void genX(emit_hashing_mode)(struct iris_context *ice,
+ struct iris_batch *batch,
+ unsigned width, unsigned height,
+ unsigned scale);
 
 /* iris_blorp.c */
 void genX(init_blorp)(struct iris_context *ice);
diff --git a/src/gallium/drivers/iris/iris_state.c 
b/src/gallium/drivers/iris/iris_state.c
index 7932df23e3d..9e255d4cf89 100644
--- a/src/gallium/drivers/iris/iris_state.c
+++ b/src/gallium/drivers/iris/iris_state.c
@@ -5192,6 +5192,9 @@ iris_upload_dirty_render_state(struct iris_context *ice,
   }
}
 
+   if (ice->state.current_hash_scale != 1)
+  genX(emit_hashing_mode)(ice, batch, UINT_MAX, UINT_MAX, 1);
+
/* TODO: Gen8 PMA fix */
 }
 
@@ -6450,6 +6453,99 @@ iris_lost_genx_state(struct iris_context *ice, struct 
iris_batch *batch)
memset(genx->last_index_buffer, 0, sizeof(genx->last_index_buffer));
 }
 
+/**
+ * Update the pixel hashing modes that determine the balancing of PS threads
+ * across subslices and slices.
+ *
+ * \param width Width bound of the rendering area (already scaled down if \p
+ *  scale is greater than 1).
+ * \param height Height bound of the rendering area (already scaled down if \p
+ *   scale is greater than 1).
+ * \param scale The number of framebuffer samples that could potentially be
+ *  affected by an individual channel of the PS thread.  This is
+ *  typically one for single-sampled rendering, but for operations
+ *  like CCS resolves and fast clears a single PS invocation may
+ *  update a huge number of pixels, in which case a finer
+ *  balancing is desirable in order to maximally utilize the
+ *  bandwidth available.  UINT_MAX can be used as shorthand for
+ *  "finest hashing mode available".
+ */
+void
+genX(emit_hashing_mode)(struct iris_context *ice, struct iris_batch *batch,
+unsigned width, unsigned height, unsigned scale)
+{
+#if GEN_GEN == 9
+   const struct gen_device_info *devinfo = >screen->devinfo;
+   const unsigned slice_hashing[] = {
+  /* Because all Gen9 platforms with more than one slice require
+   * three-way subslice hashing, a single "normal" 16x16 slice hashing
+   * block 

[Mesa-dev] [PATCH 4/4] OPTIONAL: anv/gen9: Optimize slice and subslice load balancing behavior.

2019-08-09 Thread Francisco Jerez
See "i965/gen9: Optimize slice and subslice load balancing behavior."
for the rationale.  Marked optional because no performance evaluation
has been done on this commit, it is provided to match the hashing
settings of the Iris driver.  Test reports welcome.
---
 src/intel/vulkan/anv_genX.h|  4 ++
 src/intel/vulkan/anv_private.h |  6 ++
 src/intel/vulkan/genX_blorp_exec.c |  6 ++
 src/intel/vulkan/genX_cmd_buffer.c | 96 ++
 4 files changed, 112 insertions(+)

diff --git a/src/intel/vulkan/anv_genX.h b/src/intel/vulkan/anv_genX.h
index a5435e566a3..06c6b467acf 100644
--- a/src/intel/vulkan/anv_genX.h
+++ b/src/intel/vulkan/anv_genX.h
@@ -44,6 +44,10 @@ void genX(cmd_buffer_apply_pipe_flushes)(struct 
anv_cmd_buffer *cmd_buffer);
 
 void genX(cmd_buffer_emit_gen7_depth_flush)(struct anv_cmd_buffer *cmd_buffer);
 
+void genX(cmd_buffer_emit_hashing_mode)(struct anv_cmd_buffer *cmd_buffer,
+unsigned width, unsigned height,
+unsigned scale);
+
 void genX(flush_pipeline_select_3d)(struct anv_cmd_buffer *cmd_buffer);
 void genX(flush_pipeline_select_gpgpu)(struct anv_cmd_buffer *cmd_buffer);
 
diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
index 2465f264354..b381386a716 100644
--- a/src/intel/vulkan/anv_private.h
+++ b/src/intel/vulkan/anv_private.h
@@ -2421,6 +2421,12 @@ struct anv_cmd_state {
 
bool conditional_render_enabled;
 
+   /**
+* Last rendering scale argument provided to
+* genX(cmd_buffer_emit_hashing_mode)().
+*/
+   unsigned current_hash_scale;
+
/**
 * Array length is anv_cmd_state::pass::attachment_count. Array content is
 * valid only when recording a render pass instance.
diff --git a/src/intel/vulkan/genX_blorp_exec.c 
b/src/intel/vulkan/genX_blorp_exec.c
index 1592e7f7e3d..e9eedc06696 100644
--- a/src/intel/vulkan/genX_blorp_exec.c
+++ b/src/intel/vulkan/genX_blorp_exec.c
@@ -223,6 +223,12 @@ genX(blorp_exec)(struct blorp_batch *batch,
   genX(cmd_buffer_config_l3)(cmd_buffer, cfg);
}
 
+   const unsigned scale = params->fast_clear_op ? UINT_MAX : 1;
+   if (cmd_buffer->state.current_hash_scale != scale) {
+  genX(cmd_buffer_emit_hashing_mode)(cmd_buffer, params->x1 - params->x0,
+ params->y1 - params->y0, scale);
+   }
+
 #if GEN_GEN >= 11
/* The PIPE_CONTROL command description says:
 *
diff --git a/src/intel/vulkan/genX_cmd_buffer.c 
b/src/intel/vulkan/genX_cmd_buffer.c
index 86ef1663ac4..e9e5570d49f 100644
--- a/src/intel/vulkan/genX_cmd_buffer.c
+++ b/src/intel/vulkan/genX_cmd_buffer.c
@@ -1595,6 +1595,7 @@ genX(CmdExecuteCommands)(
 */
primary->state.current_pipeline = UINT32_MAX;
primary->state.current_l3_config = NULL;
+   primary->state.current_hash_scale = 0;
 
/* Each of the secondary command buffers will use its own state base
 * address.  We need to re-emit state base address for the primary after
@@ -2663,6 +2664,9 @@ genX(cmd_buffer_flush_state)(struct anv_cmd_buffer 
*cmd_buffer)
 
genX(cmd_buffer_config_l3)(cmd_buffer, pipeline->urb.l3_config);
 
+   if (cmd_buffer->state.current_hash_scale != 1)
+  genX(cmd_buffer_emit_hashing_mode)(cmd_buffer, UINT_MAX, UINT_MAX, 1);
+
genX(flush_pipeline_select_3d)(cmd_buffer);
 
if (vb_emit) {
@@ -3925,6 +3929,98 @@ genX(cmd_buffer_emit_gen7_depth_flush)(struct 
anv_cmd_buffer *cmd_buffer)
}
 }
 
+/**
+ * Update the pixel hashing modes that determine the balancing of PS threads
+ * across subslices and slices.
+ *
+ * \param width Width bound of the rendering area (already scaled down if \p
+ *  scale is greater than 1).
+ * \param height Height bound of the rendering area (already scaled down if \p
+ *   scale is greater than 1).
+ * \param scale The number of framebuffer samples that could potentially be
+ *  affected by an individual channel of the PS thread.  This is
+ *  typically one for single-sampled rendering, but for operations
+ *  like CCS resolves and fast clears a single PS invocation may
+ *  update a huge number of pixels, in which case a finer
+ *  balancing is desirable in order to maximally utilize the
+ *  bandwidth available.  UINT_MAX can be used as shorthand for
+ *  "finest hashing mode available".
+ */
+void
+genX(cmd_buffer_emit_hashing_mode)(struct anv_cmd_buffer *cmd_buffer,
+   unsigned width, unsigned height,
+   unsigned scale)
+{
+#if GEN_GEN == 9
+   const struct gen_device_info *devinfo = &cmd_buffer->device->info;
+   const unsigned slice_hashing[] = {
+  /* Because all Gen9 platforms with more than one slice require
+   * three-way subslice hashing, a single "normal" 16x16 slice 

[Mesa-dev] [PATCH 1/4] i965/gen9: Optimize slice and subslice load balancing behavior.

2019-08-09 Thread Francisco Jerez
The default pixel hashing mode settings used for slice and subslice
load balancing are far from optimal under certain conditions (see the
comments below for the gory details).  The top-of-the-line GT4 parts
suffer from a particularly severe performance problem currently due to
a subslice load balancing issue.  Fixing this seems to improve
graphics performance across the board for most of the benchmarks in my
test set, up to ~20% in some cases, e.g. from SKL GT4:

unigine/valley:3.44% ±0.11%
gfxbench/gl_manhattan31:   3.99% ±0.13%
gputest/pixmark_volplosion:8.05% ±0.11%
synmark/OglTexFilterAniso:15.22% ±0.07%
synmark/OglTexMem128: 22.26% ±0.06%

Lower-end platforms are also affected by some subslice load imbalance
to a lesser degree, especially during CCS resolve and fast clear
operations, which are handled specially here due to rasterization
occurring in reduced CCS coordinates, which changes the semantics of
the pixel hashing mode settings.

No regressions seen during my tests on some SKL, KBL and BXT
configurations.  Additional benchmark reports welcome on any Gen9
platforms (that includes anything with Skylake, Broxton, Kabylake,
Geminilake, Coffeelake, Whiskey Lake, Comet Lake or Amber Lake in your
renderer string).

P.S.: A similar problem is likely to be present on other non-Gen9
  platforms, especially for CCS resolve and fast clear operations.
  Will follow-up with additional patches fixing the hashing mode
  for those once I have enough performance data to justify it.

Reviewed-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_context.h  |  5 ++
 src/mesa/drivers/dri/i965/brw_defines.h  |  5 ++
 src/mesa/drivers/dri/i965/brw_misc_state.c   | 90 
 src/mesa/drivers/dri/i965/brw_state_upload.c |  9 +-
 src/mesa/drivers/dri/i965/genX_blorp_exec.c  |  6 ++
 5 files changed, 109 insertions(+), 6 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index 2ac443bf032..17639bf5995 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -1219,6 +1219,9 @@ struct brw_context
 
enum gen9_astc5x5_wa_tex_type gen9_astc5x5_wa_tex_mask;
 
+   /** Last rendering scale argument provided to brw_emit_hashing_mode(). */
+   unsigned current_hash_scale;
+
__DRIcontext *driContext;
struct intel_screen *screen;
 };
@@ -1265,6 +1268,8 @@ GLboolean brwCreateContext(gl_api api,
  */
 void brw_workaround_depthstencil_alignment(struct brw_context *brw,
GLbitfield clear_mask);
+void brw_emit_hashing_mode(struct brw_context *brw, unsigned width,
+   unsigned height, unsigned scale);
 
 /* brw_object_purgeable.c */
 void brw_init_object_purgeable_functions(struct dd_function_table *functions);
diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
index 425f5534110..33d042be869 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -1570,6 +1570,11 @@ enum brw_pixel_shader_coverage_mask_mode {
 # define GEN9_SUBSLICE_HASHING_8x4  (2 << 8)
 # define GEN9_SUBSLICE_HASHING_16x16(3 << 8)
 # define GEN9_SUBSLICE_HASHING_MASK_BITS REG_MASK(3 << 8)
+# define GEN9_SLICE_HASHING_NORMAL  (0 << 11)
+# define GEN9_SLICE_HASHING_DISABLED(1 << 11)
+# define GEN9_SLICE_HASHING_32x16   (2 << 11)
+# define GEN9_SLICE_HASHING_32x32   (3 << 11)
+# define GEN9_SLICE_HASHING_MASK_BITS REG_MASK(3 << 11)
 
 /* Predicate registers */
 #define MI_PREDICATE_SRC0   0x2400
diff --git a/src/mesa/drivers/dri/i965/brw_misc_state.c 
b/src/mesa/drivers/dri/i965/brw_misc_state.c
index e73cadc5d3e..1291470d479 100644
--- a/src/mesa/drivers/dri/i965/brw_misc_state.c
+++ b/src/mesa/drivers/dri/i965/brw_misc_state.c
@@ -601,6 +601,96 @@ brw_emit_select_pipeline(struct brw_context *brw, enum 
brw_pipeline pipeline)
}
 }
 
+/**
+ * Update the pixel hashing modes that determine the balancing of PS threads
+ * across subslices and slices.
+ *
+ * \param width Width bound of the rendering area (already scaled down if \p
+ *  scale is greater than 1).
+ * \param height Height bound of the rendering area (already scaled down if \p
+ *   scale is greater than 1).
+ * \param scale The number of framebuffer samples that could potentially be
+ *  affected by an individual channel of the PS thread.  This is
+ *  typically one for single-sampled rendering, but for operations
+ *  like CCS resolves and fast clears a single PS invocation may
+ *  update a huge number of pixels, in which case a finer
+ *  balancing is desirable in order to maximally utilize the
+ *  bandwidth available.  UINT_MAX can be used as shorthand for
+ *  "finest hashing mode available".
+ */
+void

[Mesa-dev] [PATCH 2/4] intel/genxml: Add GT_MODE hashing defs for Gen9.

2019-08-09 Thread Francisco Jerez
Reviewed-by: Kenneth Graunke 
---
 src/intel/genxml/gen9.xml | 17 +
 1 file changed, 17 insertions(+)

diff --git a/src/intel/genxml/gen9.xml b/src/intel/genxml/gen9.xml
index 9df7cd82738..0d037489df9 100644
--- a/src/intel/genxml/gen9.xml
+++ b/src/intel/genxml/gen9.xml
@@ -6477,6 +6477,23 @@
 
   
 
+  
+
+  
+  
+  
+  
+
+
+
+  
+  
+  
+  
+
+
+  
+
   
 
   
-- 
2.22.0

[Mesa-dev] [PATCH] intel/ir: Fix CFG corruption in opt_predicated_break().

2019-07-23 Thread Francisco Jerez
Specifically the optimization of a conditional BREAK + WHILE sequence
into a conditional WHILE seems pretty broken.  The list of successors
of "earlier_block" (where the conditional BREAK was found) is emptied
and then re-created with the same edges for no apparent reason.  On
top of that the list of predecessors of the block immediately after
the WHILE loop is emptied, but only one of the original edges will be
added back, which means that potentially several blocks that still
have it on their list of successors won't be on its list of
predecessors anymore, causing all sorts of hilarity due to the
inconsistency in the control flow graph.

The solution is to remove the code that's removing valid edges from
the CFG.  cfg_t::remove_block() will already clean up after itself.
The assert in bblock_t::combine_with() also needs to be removed since
we will be merging a block with multiple children into the first one
of them.
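
To make the inconsistency concrete, here is a minimal sketch of the
invariant being violated.  The "block" struct and the helper names are
hypothetical stand-ins, not the real bblock_t/cfg_t interface; the only
point is that a block's successor list and its successors' predecessor
lists have to agree.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a basic block; NOT the real bblock_t.
struct block {
   std::vector<block *> children; // successors
   std::vector<block *> parents;  // predecessors
};

// Record an edge in both directions so the two lists cannot disagree.
inline void add_edge(block &from, block &to) {
   from.children.push_back(&to);
   to.parents.push_back(&from);
}

inline bool contains(const std::vector<block *> &v, const block *b) {
   return std::find(v.begin(), v.end(), b) != v.end();
}

// True if every successor edge has a matching predecessor edge and
// vice versa.
inline bool cfg_is_consistent(const std::vector<block *> &blocks) {
   for (const block *b : blocks) {
      for (const block *c : b->children)
         if (!contains(c->parents, b))
            return false;
      for (const block *p : b->parents)
         if (!contains(p->children, b))
            return false;
   }
   return true;
}

// Mimics the failure mode: two blocks jump to the block after the
// loop, its predecessor list is emptied, and only one edge is put back.
inline bool demo_broken_rebuild() {
   block a, b, after_loop;
   add_edge(a, after_loop);
   add_edge(b, after_loop);

   after_loop.parents.clear();
   after_loop.parents.push_back(&a); // only one of the two edges restored

   // 'b' still lists 'after_loop' as a successor, so this is false.
   return cfg_is_consistent({ &a, &b, &after_loop });
}
```

A check along these lines fails as soon as one side of an edge is
dropped, which is the state the removed code could leave the graph in.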

Found the issue on a hardware enabling branch originally, but
apparently somebody reproduced the same problem independently on
master in the meantime.

Fixes: d13bcdb3a9f ("i965/fs: Extend predicated break pass to predicate WHILE.")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111009
Cc: jiradet...@gmail.com
Cc: Sergii Romantsov 
Cc: Matt Turner 
Cc: mesa-sta...@lists.freedesktop.org
---
 src/intel/compiler/brw_cfg.cpp  | 3 ---
 src/intel/compiler/brw_predicated_break.cpp | 6 --
 2 files changed, 9 deletions(-)

diff --git a/src/intel/compiler/brw_cfg.cpp b/src/intel/compiler/brw_cfg.cpp
index 600b428a492..6c40889088d 100644
--- a/src/intel/compiler/brw_cfg.cpp
+++ b/src/intel/compiler/brw_cfg.cpp
@@ -128,9 +128,6 @@ void
 bblock_t::combine_with(bblock_t *that)
 {
assert(this->can_combine_with(that));
-   foreach_list_typed (bblock_link, link, link, >children) {
-  assert(link->block == that);
-   }
foreach_list_typed (bblock_link, link, link, >parents) {
   assert(link->block == this);
}
diff --git a/src/intel/compiler/brw_predicated_break.cpp 
b/src/intel/compiler/brw_predicated_break.cpp
index 607715dace4..e60052f3608 100644
--- a/src/intel/compiler/brw_predicated_break.cpp
+++ b/src/intel/compiler/brw_predicated_break.cpp
@@ -128,14 +128,8 @@ opt_predicated_break(backend_shader *s)
  while_inst->predicate = jump_inst->predicate;
  while_inst->predicate_inverse = !jump_inst->predicate_inverse;
 
- earlier_block->children.make_empty();
- earlier_block->add_successor(s->cfg->mem_ctx, while_block);
-
  assert(earlier_block->can_combine_with(while_block));
  earlier_block->combine_with(while_block);
-
- earlier_block->next()->parents.make_empty();
- earlier_block->add_successor(s->cfg->mem_ctx, earlier_block->next());
   }
 
   progress = true;
-- 
2.22.0

Re: [Mesa-dev] [PATCH 07/15] clover/spirv: Add functions for parsing arguments, linking programs, etc.

2019-05-11 Thread Francisco Jerez
Karol Herbst  writes:

> From: Pierre Moreau 
>
> v2 (Karol Herbst):
>   silence warnings about unhandled enum values
> ---
>  .../clover/spirv/invocation.cpp   | 598 ++
>  .../clover/spirv/invocation.hpp   |  12 +
>  2 files changed, 610 insertions(+)
>
> diff --git a/src/gallium/state_trackers/clover/spirv/invocation.cpp 
> b/src/gallium/state_trackers/clover/spirv/invocation.cpp
> index b874f2f061c..62886e77495 100644
> --- a/src/gallium/state_trackers/clover/spirv/invocation.cpp
> +++ b/src/gallium/state_trackers/clover/spirv/invocation.cpp
> @@ -22,10 +22,24 @@
>  
>  #include "invocation.hpp"
>  
> +#include 

You don't seem to be using this include.

> +#include 
> +#include 
> +#include 
> +#include 
> +
>  #ifdef CLOVER_ALLOW_SPIRV
>  #include 
> +#include 
>  #endif
>  
> +#include "core/error.hpp"
> +#include "core/platform.hpp"
> +#include "invocation.hpp"
> +#include "llvm/util.hpp"
> +#include "pipe/p_state.h"
> +#include "util/algorithm.hpp"
> +#include "util/functional.hpp"
>  #include "util/u_math.h"
>  
>  #include "compiler/spirv/spirv.h"
> @@ -34,6 +48,472 @@ using namespace clover;
>  
>  namespace {
>  
> +   template
> +   T get(const char *source, size_t index) {
> +  const uint32_t *word_ptr = reinterpret_cast(source);
> +  return static_cast(word_ptr[index]);
> +   }
> +
> +   enum module::argument::type
> +   convertStorageClass(SpvStorageClass storage_class, std::string ) {

Use underscores instead of camel case here and below.

> +  switch (storage_class) {
> +  case SpvStorageClassFunction:
> + return module::argument::scalar;
> +  case SpvStorageClassUniformConstant:
> + return module::argument::constant;
> +  case SpvStorageClassWorkgroup:
> + return module::argument::local;
> +  case SpvStorageClassCrossWorkgroup:
> + return module::argument::global;
> +  default:
> + err += "Invalid storage type " + std::to_string(storage_class) + 
> "\n";
> + throw build_error();
> +  }
> +   }
> +
> +   enum module::argument::type
> +   convertImageType(SpvId id, SpvDim dim, SpvAccessQualifier access,
> +std::string ) {
> +#define APPEND_DIM(d) \
> +  switch(access) { \
> +  case SpvAccessQualifierReadOnly: \
> + return module::argument::image##d##_rd; \
> +  case SpvAccessQualifierWriteOnly: \
> + return module::argument::image##d##_wr; \
> +  default: \
> + err += "Unsupported access qualifier " #d " for image " + \
> +std::to_string(id); \

Error message seems broken, #d is the image dimensionality, not the
access qualifier.

> + throw build_error(); \
> +  }
> +
> +  switch (dim) {
> +  case SpvDim2D:
> + APPEND_DIM(2d)

I was briefly confused about whether this could fall through.  It would
be cleaner to do this with a single if/else ladder like
clover/llvm/codegen/common.cpp.  You would only be able to output a
single error message though (e.g. "Unsupported qualifiers $DIM $ACCESS
on image $ID"), but that seems good enough to me (do we even need an
error message here instead of an assertion failure?).
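
Roughly what I have in mind, as a sketch only.  The enums below are
hypothetical stand-ins for SpvDim, SpvAccessQualifier and
module::argument::type so that the snippet builds on its own, and
std::runtime_error stands in for build_error:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for the SPIR-V and clover enums.
enum image_dim { dim_2d, dim_3d, dim_other };
enum image_access { access_read_only, access_write_only, access_read_write };
enum argument_type { image2d_rd, image2d_wr, image3d_rd, image3d_wr };

inline argument_type
convert_image_type(unsigned id, image_dim dim, image_access access,
                   std::string &err) {
   // One ladder over both qualifiers, so nothing can fall through.
   if (dim == dim_2d && access == access_read_only)
      return image2d_rd;
   else if (dim == dim_2d && access == access_write_only)
      return image2d_wr;
   else if (dim == dim_3d && access == access_read_only)
      return image3d_rd;
   else if (dim == dim_3d && access == access_write_only)
      return image3d_wr;

   // Single error message covering every unsupported combination.
   err += "Unsupported qualifiers " + std::to_string(static_cast<int>(dim)) +
          " " + std::to_string(static_cast<int>(access)) + " on image " +
          std::to_string(id) + "\n";
   throw std::runtime_error("build error");
}
```

The macro and the nested switches go away, at the cost of reporting the
dimensionality and the access qualifier together.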

> +  case SpvDim3D:
> + APPEND_DIM(3d)
> +  default:
> + err += "Unsupported dimension " + std::to_string(dim) + " for image 
> " +
> +std::to_string(id);
> + throw build_error();
> +  }
> +
> +#undef APPEND_DIM
> +   }
> +
> +   module::section
> +   make_text_section(const std::vector ,
> + enum module::section::type section_type) {
> +  const pipe_llvm_program_header header { uint32_t(code.size()) };
> +  module::section text { 0, section_type, header.num_bytes, {} };
> +
> +  text.data.insert(text.data.end(), reinterpret_cast *>(),
> +   reinterpret_cast() + 
> sizeof(header));
> +  text.data.insert(text.data.end(), code.begin(), code.end());
> +
> +  return text;
> +   }
> +
> +   module
> +   create_module_from_spirv(const std::vector ,
> +size_t pointer_byte_size,
> +std::string ) {
> +  const size_t length = source.size() / sizeof(uint32_t);
> +  size_t i = 5u; // Skip header

It would be nice if there was a define for this, but okay...

> +
> +  std::string kernel_name;
> +  size_t kernel_nb = 0u;
> +  std::vector args;
> +
> +  module m;
> +
> +  std::unordered_map kernels;
> +  std::unordered_map types;
> +  std::unordered_map pointer_types;
> +  std::unordered_map constants;
> +  std::unordered_set packed_structures;
> +  std::unordered_map>
> + func_param_attr_map;
> +
> +#define GET_OPERAND(type, operand_id) get(source.data(), i + 
> operand_id)
> +
> +  while (i < length) {

You could declare a 'const auto inst = [i * sizeof(uint32_t)]'
pointer here, then do 'get(inst, j)' instead of 'GET_OPERAND(T, j)',
and get rid of 

Re: [Mesa-dev] [PATCH 06/15] clover/spirv: Add functions for validating SPIR-V binaries

2019-05-11 Thread Francisco Jerez
_msg(spv_message_level_t level, const char * /* source */,
> +const spv_position_t , const char *message) 
> {
> +  auto const level_to_string = [](spv_message_level_t level){

"const auto" would be more idiomatic.

> + switch (level) {
> +case SPV_MSG_FATAL:
> +   return std::string("Fatal");
> +case SPV_MSG_INTERNAL_ERROR:
> +   return std::string("Internal error");
> +case SPV_MSG_ERROR:
> +   return std::string("Error");
> +case SPV_MSG_WARNING:
> +   return std::string("Warning");
> +case SPV_MSG_INFO:
> +   return std::string("Info");
> +case SPV_MSG_DEBUG:
> +   return std::string("Debug");
> + }
> + return std::string();
> +  };

You could just use a 'level == X ? "Y" : ...' ladder here instead since
the 'level' argument is already known at the point the level_to_string
function is defined.
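
For illustration, such a ladder might look roughly like this.  The
message_level enum is a hypothetical stand-in for spv_message_level_t so
the sketch builds without SPIRV-Tools, and the position is reduced to a
plain word index:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for spv_message_level_t.
enum message_level {
   msg_fatal, msg_internal_error, msg_error,
   msg_warning, msg_info, msg_debug
};

inline std::string
format_msg(message_level level, size_t index, const char *message) {
   // 'level' is already known here, so no lambda is required.
   const std::string level_str =
      level == msg_fatal          ? "Fatal" :
      level == msg_internal_error ? "Internal error" :
      level == msg_error          ? "Error" :
      level == msg_warning        ? "Warning" :
      level == msg_info           ? "Info" :
      level == msg_debug          ? "Debug" : "";

   return "[" + level_str + "] At word No." + std::to_string(index) +
          ": \"" + message + "\"\n";
}
```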

> +  return "[" + level_to_string(level) + "] At word No." +
> + std::to_string(position.index) + ": \"" + message + "\"\n";
> +   }
> +
> +   spv_target_env
> +   convert_opencl_str_to_target_env(const std::string _version) {
> +  if (opencl_version == "2.2") {
> + return SPV_ENV_OPENCL_2_2;
> +  } else if (opencl_version == "2.1") {
> + return SPV_ENV_OPENCL_2_1;
> +  } else if (opencl_version == "2.0") {
> + return SPV_ENV_OPENCL_2_0;
> +  } else if (opencl_version == "1.2" ||
> + opencl_version == "1.1" ||
> + opencl_version == "1.0") {
> + // SPIR-V is only defined for OpenCL >= 1.2, however some drivers
> + // might use it with OpenCL 1.0 and 1.1.
> + return SPV_ENV_OPENCL_1_2;
> +  } else {
> + throw build_error("Invalid OpenCL version");
> +  }
> +   }
> +#endif
> +
> +}
> +
> +bool
> +clover::spirv::is_binary_spirv(const char *il, size_t length)
> +{
> +   const uint32_t *binary = reinterpret_cast(il);
> +
> +   // A SPIR-V binary is at the very least 5 32-bit words, which represent 
> the
> +   // SPIR-V header.
> +   if (length < 20u)
> +  return false;
> +
> +   const uint32_t first_word = binary[0u];
> +   return (first_word == SpvMagicNumber) ||
> +  (util_bswap32(first_word) == SpvMagicNumber);
> +}
> +

This function seems like dead code.  Maybe drop it?  Then you'll be able
to use a single #ifdef preprocessor conditional to guard the whole file.

> +#ifdef CLOVER_ALLOW_SPIRV
> +bool
> +clover::spirv::is_valid_spirv(const uint32_t *binary, size_t length,
> +  const std::string _version,
> +  const context::notify_action ) {

We don't need an extra level of call-backs here, you can pass a
'std::string _log' reference and append the errors there, just like
the LLVM codepaths do to return error strings.
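
A rough sketch of the shape I mean.  The function name and the checks
are made up for illustration (the real validation is done by
SPIRV-Tools); only the way errors are reported through the string
reference matters here:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical validator: it only checks the header length and the
// magic number, and appends any diagnostics to the caller's log.
inline bool
is_valid_binary(const uint32_t *binary, size_t num_words,
                std::string &r_log) {
   if (num_words < 5) {
      r_log += "Binary is shorter than the 5-word SPIR-V header\n";
      return false;
   }

   // 0x07230203 is the SPIR-V magic number (SpvMagicNumber).
   if (binary[0] != 0x07230203u) {
      r_log += "First word is not the SPIR-V magic number\n";
      return false;
   }

   return true;
}
```

The caller then decides what to do with the accumulated log, e.g.
attach it to the build error, without the validator having to know
about notify callbacks.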

Other than that looks good:

Reviewed-by: Francisco Jerez 

> +   auto const validator_consumer =
> +  [](spv_message_level_t level, const char *source,
> +const spv_position_t , const char *message) {
> +  if (!notify)
> + return;
> +
> +  const std::string log = format_validator_msg(level, source, position,
> +   message);
> +  notify(log.c_str());
> +   };
> +
> +   const spv_target_env target_env =
> +  convert_opencl_str_to_target_env(opencl_version);
> +   spvtools::SpirvTools spvTool(target_env);
> +   spvTool.SetMessageConsumer(validator_consumer);
> +
> +   return spvTool.Validate(binary, length);
> +}
> +#else
> +bool
> +clover::spirv::is_valid_spirv(const uint32_t * /*binary*/, size_t /*length*/,
> +  const std::string &/*opencl_version*/,
> +  const context::notify_action &/*notify*/) {
> +   return false;
> +}
> +#endif
> diff --git a/src/gallium/state_trackers/clover/spirv/invocation.hpp 
> b/src/gallium/state_trackers/clover/spirv/invocation.hpp
> new file mode 100644
> index 000..92255a68a8e
> --- /dev/null
> +++ b/src/gallium/state_trackers/clover/spirv/invocation.hpp
> @@ -0,0 +1,47 @@
> +//
> +// Copyright 2018 Pierre Moreau
> +//
> +// Permission is hereby granted, free of charge, to any person obtaining a
> +// copy of this software and associated documentation files (the "Software"),
> +// to deal in the So

Re: [Mesa-dev] [PATCH 05/15] meson: Check for SPIRV-Tools and llvm-spirv

2019-05-11 Thread Francisco Jerez
Karol Herbst  writes:

> From: Pierre Moreau 
>
> Changes since:
> * v11 (Karol Herbst):
>   - only set new defines for clover to speed up recompilation
>   - remove autotools
> * v10:
>   - Add a new flag (`--enable-opencl-spirv` for autotools, and
> `-Dopencl-spirv=true` for meson) for enabling SPIR-V support in
> clover, and never automagically enable it without that flag. (Dylan Baker)
>   - When enabling the SPIR-V support, the SPIRV-Tools and
> SPIRV-LLVM-Translator libraries are now required dependencies.
> * v7:
>   - Properly align LLVMSPIRVLib comment (Dylan Baker)
>   - Only define CLOVER_ALLOW_SPIRV when **both** dependencies are found:
> autotools was only requiring one or the other.
> * v6: Replace the llvm-spirv repository by the new official
>   SPIRV-LLVM-Translator.
> * v4: Add a comment saying where to find llvm-spirv (Karol Herbst).
> * v3:
>   - make SPIRV-Tools and llvm-spirv optional (Francisco Jerez);
>   - bump requirement for llvm-spirv to version 0.2
> * v2:
>   - Bump the required version of SPIRV-Tools to the latest release;
>   - Add a dependency on llvm-spirv.
>
> Reviewed-by: Dylan Baker  (v10)
> Reviewed-by: Karol Herbst 
> ---
>  meson.build   | 13 +
>  meson_options.txt |  6 ++
>  src/gallium/state_trackers/clover/meson.build |  9 +++--
>  3 files changed, 26 insertions(+), 2 deletions(-)
>
> diff --git a/meson.build b/meson.build
> index 2cefbb3f204..dba9f35b28b 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -693,6 +693,16 @@ if _opencl != 'disabled'
>with_gallium_opencl = true
>with_opencl_icd = _opencl == 'icd'
>  
> +  with_opencl_spirv = get_option('opencl-spirv')
> +  if with_opencl_spirv
> +dep_spirv_tools = dependency('SPIRV-Tools', required : true, version : 
> '>= 2018.0')
> +# LLVMSPIRVLib is available at 
> https://github.com/KhronosGroup/SPIRV-LLVM-Translator
> +dep_llvmspirvlib = dependency('LLVMSPIRVLib', required : true, version : 
> '>= 0.2.1')
> +  else
> +dep_spirv_tools = null_dep
> +dep_llvmspirvlib = null_dep
> +  endif
> +
>if host_machine.cpu_family().startswith('ppc') and cpp.compiles('''
>#if !defined(__VEC__) || !defined(__ALTIVEC__)
>#error "AltiVec not enabled"
> @@ -702,8 +712,11 @@ if _opencl != 'disabled'
>endif
>  else
>dep_clc = null_dep
> +  dep_spirv_tools = null_dep
> +  dep_llvmspirvlib = null_dep
>with_gallium_opencl = false
>with_opencl_icd = false
> +  with_opencl_spirv = false
>  endif
>  
>  gl_pkgconfig_c_flags = []
> diff --git a/meson_options.txt b/meson_options.txt
> index 1f72faabee8..00f2e7bc949 100644
> --- a/meson_options.txt
> +++ b/meson_options.txt
> @@ -142,6 +142,12 @@ option(
>value : 'disabled',
>description : 'build gallium "clover" OpenCL state tracker.',
>  )
> +option(
> +  'opencl-spirv',
> +  type : 'boolean',
> +  value : false,
> +  description : 'build gallium "clover" OpenCL state tracker with SPIR-V 
> binary support.',
> +)
>  option(
>'d3d-drivers-path',
>type : 'string',
> diff --git a/src/gallium/state_trackers/clover/meson.build 
> b/src/gallium/state_trackers/clover/meson.build
> index 2ff060bf35b..311dcb69a6b 100644
> --- a/src/gallium/state_trackers/clover/meson.build
> +++ b/src/gallium/state_trackers/clover/meson.build
> @@ -19,12 +19,17 @@
>  # SOFTWARE.
>  
>  clover_cpp_args = []
> +clover_spirv_cpp_args = []
>  clover_incs = [inc_include, inc_src, inc_gallium, inc_gallium_aux]
>  
>  if with_opencl_icd
>clover_cpp_args += '-DHAVE_CLOVER_ICD'
>  endif
>  
> +if with_opencl_spirv
> +  clover_spirv_cpp_args += '-DCLOVER_ALLOW_SPIRV'

Maybe name this HAVE_CLOVER_SPIRV for consistency with the other
preprocessor defines.  With that fixed:

Reviewed-by: Francisco Jerez 

> +endif
> +
>  libclllvm = static_library(
>'clllvm',
>files(
> @@ -40,7 +45,7 @@ libclllvm = static_library(
>),
>include_directories : clover_incs,
>cpp_args : [
> -cpp_vis_args,
> +clover_spirv_cpp_args, cpp_vis_args,
>  
> '-DLIBCLC_INCLUDEDIR="@0@/"'.format(dep_clc.get_pkgconfig_variable('includedir')),
>  
> '-DLIBCLC_LIBEXECDIR="@0@/"'.format(dep_clc.get_pkgconfig_variable('libexecdir')),
>  '-DCLANG_RESOURCE_DIR="@0@"'.format(join_paths(
> @@ -111,7 +116,7 @@ libclover = static_library(
>'clover',
>[clover_files, sha1_h],
>include_directories : clover_incs,
> -  cpp_args : [clover_cpp_args, cpp_vis_args],
> +  cpp_args : [clover_spir

Re: [Mesa-dev] [PATCH 11/15] rename pipe_llvm_program_header to pipe_binary_program_header

2019-05-11 Thread Francisco Jerez
Karol Herbst  writes:

> We want to use it for other formats as well, so give it a more generic name
>
> Signed-off-by: Karol Herbst 
> Reviewed-by: Francisco Jerez 
> ---
>  src/gallium/drivers/r600/evergreen_compute.c  | 2 +-
>  src/gallium/drivers/radeonsi/si_compute.c | 2 +-
>  src/gallium/include/pipe/p_state.h| 2 +-
>  src/gallium/state_trackers/clover/llvm/codegen/common.cpp | 2 +-
>  src/gallium/state_trackers/clover/spirv/invocation.cpp| 4 ++--
>  5 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/evergreen_compute.c 
> b/src/gallium/drivers/r600/evergreen_compute.c
> index 34e5755696f..2f4d84405db 100644
> --- a/src/gallium/drivers/r600/evergreen_compute.c
> +++ b/src/gallium/drivers/r600/evergreen_compute.c
> @@ -410,7 +410,7 @@ static void *evergreen_create_compute_state(struct 
> pipe_context *ctx,
>   struct r600_context *rctx = (struct r600_context *)ctx;
>   struct r600_pipe_compute *shader = CALLOC_STRUCT(r600_pipe_compute);
>  #ifdef HAVE_OPENCL
> - const struct pipe_llvm_program_header *header;
> + const struct pipe_binary_program_header *header;
>   void *p;
>   boolean use_kill;
>  #endif
> diff --git a/src/gallium/drivers/radeonsi/si_compute.c 
> b/src/gallium/drivers/radeonsi/si_compute.c
> index ae10709f2f1..72fc3a197e7 100644
> --- a/src/gallium/drivers/radeonsi/si_compute.c
> +++ b/src/gallium/drivers/radeonsi/si_compute.c
> @@ -232,7 +232,7 @@ static void *si_create_compute_state(
>   >compiler_ctx_state,
>   program, 
> si_create_compute_state_async);
>   } else {
> - const struct pipe_llvm_program_header *header;
> + const struct pipe_binary_program_header *header;
>   header = cso->prog;
>  
>   ac_elf_read(header->blob, header->num_bytes, 
> >shader.binary);
> diff --git a/src/gallium/include/pipe/p_state.h 
> b/src/gallium/include/pipe/p_state.h
> index 27350091b82..c94dfb0ba78 100644
> --- a/src/gallium/include/pipe/p_state.h
> +++ b/src/gallium/include/pipe/p_state.h
> @@ -881,7 +881,7 @@ struct pipe_grid_info
>  /**
>   * Structure used as a header for serialized LLVM programs.

Don't forget to update the comment here and below.

>   */
> -struct pipe_llvm_program_header
> +struct pipe_binary_program_header
>  {
> uint32_t num_bytes; /**< Number of bytes in the LLVM bytecode program. */
> char blob[];
> diff --git a/src/gallium/state_trackers/clover/llvm/codegen/common.cpp 
> b/src/gallium/state_trackers/clover/llvm/codegen/common.cpp
> index 98a9d5ffb5e..3879fb61a02 100644
> --- a/src/gallium/state_trackers/clover/llvm/codegen/common.cpp
> +++ b/src/gallium/state_trackers/clover/llvm/codegen/common.cpp
> @@ -177,7 +177,7 @@ namespace {
>  
> module::section
> make_text_section(const std::vector ) {
> -  const pipe_llvm_program_header header { uint32_t(code.size()) };
> +  const pipe_binary_program_header header { uint32_t(code.size()) };
>module::section text { 0, module::section::text_executable,
>   header.num_bytes, {} };
>  
> diff --git a/src/gallium/state_trackers/clover/spirv/invocation.cpp 
> b/src/gallium/state_trackers/clover/spirv/invocation.cpp
> index 2fd5a876a32..5f71e94bf42 100644
> --- a/src/gallium/state_trackers/clover/spirv/invocation.cpp
> +++ b/src/gallium/state_trackers/clover/spirv/invocation.cpp
> @@ -103,7 +103,7 @@ namespace {
> module::section
> make_text_section(const std::vector ,
>   enum module::section::type section_type) {
> -  const pipe_llvm_program_header header { uint32_t(code.size()) };
> +  const pipe_binary_program_header header { uint32_t(code.size()) };
>module::section text { 0, section_type, header.num_bytes, {} };
>  
>text.data.insert(text.data.end(), reinterpret_cast *>(),
> @@ -649,7 +649,7 @@ clover::spirv::link_program(const std::vector 
> ,
>   assert(false);
>}
>  
> -  const auto c_il = ((struct 
> pipe_llvm_program_header*)msec->data.data())->blob;
> +  const auto c_il = ((struct 
> pipe_binary_program_header*)msec->data.data())->blob;
>const auto length = msec->size;
>  
>sections.push_back(reinterpret_cast(c_il));
> -- 
> 2.21.0


Re: [Mesa-dev] [PATCH 2/5] intel/fs: Lower integer multiply correctly when destination stride equals 4.

2019-04-29 Thread Francisco Jerez
Francisco Jerez  writes:

> Because the "low" temporary needs to be accessed with word type and
> twice the original stride, attempting to preserve the alignment of the
> original destination can potentially lead to instructions with illegal
> destination stride greater than four.  Because the CHV/BXT alignment
> restrictions are now being enforced by the regioning lowering pass run
> after lower_integer_multiplication(), there is no real need to
> preserve the original strides anymore.
>
> Note that this bug can be reproduced on stable branches, but
> back-porting would be non-trivial, because the fix relies on the
> regioning lowering pass recently introduced.
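
The stride arithmetic in the quoted commit message can be sketched as
follows.  The helper names are hypothetical, and the limit of four is
taken from the commit message's "illegal destination stride greater
than four" rather than looked up in the hardware documentation:

```cpp
#include <cassert>

// Largest legal destination stride, per the commit message.
constexpr unsigned max_dst_stride = 4;

// The "low" temporary is accessed with word type, which doubles the
// stride of the original dword region.
constexpr unsigned word_view_stride(unsigned dword_stride) {
   return 2 * dword_stride;
}

// True if keeping the original stride on the temporary would yield an
// illegal word-typed destination, i.e. a fresh packed temporary is
// needed instead.
constexpr bool needs_fresh_temporary(unsigned dst_stride) {
   return word_view_stride(dst_stride) > max_dst_stride;
}
```

For the valid strides 1, 2 and 4 this only triggers at 4, which matches
the "inst->dst.stride >= 4" condition added by the patch.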

This and PATCH 3 of this series should have landed with a:

Cc: 19.0 

The paragraph above recommending against their inclusion in stable
branches was referring to the *then* stable branches when the patch was
written, a couple of weeks before the 19.0 branch point.  Unfortunately
the series landed after the branchpoint and patches 2 and 3 missed the
19.0 branch they were intended for.  Any chance we could get these into
the next 19.0 stable release?

Thanks to Tapani for realizing the problem!

> ---
>  src/intel/compiler/brw_fs.cpp | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
> index f475b617df2..5768e0d6542 100644
> --- a/src/intel/compiler/brw_fs.cpp
> +++ b/src/intel/compiler/brw_fs.cpp
> @@ -3962,13 +3962,11 @@ fs_visitor::lower_integer_multiplication()
>  regions_overlap(inst->dst, inst->size_written,
>  inst->src[0], inst->size_read(0)) ||
>  regions_overlap(inst->dst, inst->size_written,
> -inst->src[1], inst->size_read(1))) {
> +inst->src[1], inst->size_read(1)) ||
> +inst->dst.stride >= 4) {
> needs_mov = true;
> -   /* Get a new VGRF but keep the same stride as inst->dst */
> low = fs_reg(VGRF, alloc.allocate(regs_written(inst)),
>  inst->dst.type);
> -   low.stride = inst->dst.stride;
> -   low.offset = inst->dst.offset % REG_SIZE;
>  }
>  
>  /* Get a new VGRF but keep the same stride as inst->dst */
> -- 
> 2.19.2


Re: [Mesa-dev] [PATCH v7] intel/compiler: validate region restrictions for mixed float mode

2019-04-17 Thread Francisco Jerez
RESS_DIRECT),
> +"Indirect addressing on source is not supported when source and "
> +"destination data types are mixed float");
> +
> +   /* From the SKL PRM, Special Restrictions for Handling Mixed Mode
> +* Float Operations:
> +*
> +*"No SIMD16 in mixed mode when destination is f32. Instruction
> +* execution size must be no more than 8."
> +*/
> +   ERROR_IF(exec_size > 8 && dst_type == BRW_REGISTER_TYPE_F,
> +"Mixed float mode with 32-bit float destination is limited "
> +"to SIMD8");
> +
> +   if (is_align16) {
> +  /* From the SKL PRM, Special Restrictions for Handling Mixed Mode
> +   * Float Operations:
> +   *
> +   *   "In Align16 mode, when half float and float data types are mixed
> +   *between source operands OR between source and destination 
> operands,
> +   *the register content are assumed to be packed."
> +   *
> +   * Since Align16 doesn't have a concept of horizontal stride (or 
> width),
> +   * it means that vertical stride must always be 4, since 0 and 2 would
> +   * lead to replicated data, and any other value is disallowed in 
> Align16.
> +   * However, the PRM also says:
> +   *
> +   *   "In Align16, vertical stride can never be zero for f16"
> +   *
> +   * Which is oddly redundant and specific considering the more general
> +   * assumption that all operands are assumed to be packed, so we
> +   * understand that this might be hinting that there may be an exception
> +   * for f32 operands with a vstride of 0, so we don't validate this for
> +   * them while we don't have empirical evidence that it is forbidden.
> +   *
> +   *"Math operations for mixed mode:
> +   * - In Align16, only packed format is supported"
> +   *
> +   * It is not clear what this is restricting since as stated in previous
> +   * spec quotes, Align16 always assumes packed data. However, since
> +   * we are allowing vstride of 0 on f32, we check again here without 
> that
> +   * exception.
> +

The comment text from "However, the PRM also says" up to here seems to
have been made obsolete by your last changes.  Please remove it.

With that fixed:

Reviewed-by: Francisco Jerez 

I'm guessing that's all the reviews you needed on this series?

> +   */
> +  ERROR_IF(brw_inst_src0_vstride(devinfo, inst) != BRW_VERTICAL_STRIDE_4,
> +   "Align16 mixed float mode assumes packed data (vstride must 
> be 4");
> +
> +  ERROR_IF(num_sources >= 2 &&
> +   brw_inst_src1_vstride(devinfo, inst) != BRW_VERTICAL_STRIDE_4,
> +   "Align16 mixed float mode assumes packed data (vstride must 
> be 4");
> +
> +  /* From the SKL PRM, Special Restrictions for Handling Mixed Mode
> +   * Float Operations:
> +   *
> +   *   "For Align16 mixed mode, both input and output packed f16 data
> +   *must be oword aligned, no oword crossing in packed f16."
> +   *
> +   * The previous rule requires that Align16 operands are always packed,
> +   * and since there is only one bit for Align16 subnr, which represents
> +   * offsets 0B and 16B, this rule is always enforced and we don't need 
> to
> +   * validate it.
> +   */
> +
> +  /* From the SKL PRM, Special Restrictions for Handling Mixed Mode
> +   * Float Operations:
> +   *
> +   *"No SIMD16 in mixed mode when destination is packed f16 for both
> +   * Align1 and Align16."
> +   *
> +   * And:
> +   *
> +   *   "In Align16 mode, when half float and float data types are mixed
> +   *between source operands OR between source and destination 
> operands,
> +   *the register content are assumed to be packed."
> +   *
> +   * Which implies that SIMD16 is not available in Align16. This is 
> further
> +   * confirmed by:
> +   *
> +   *"For Align16 mixed mode, both input and output packed f16 data
> +   * must be oword aligned, no oword crossing in packed f16"
> +   *
> +   * Since oword-aligned packed f16 data would cross oword boundaries 
> when
> +   * the execution size is larger than 8.
> +   */
> +  ERROR_IF(exec_size > 8, "Align16 mixed float mode is limited to 
> SIMD8");
> +
> +  /* From the SKL PRM, Special Restrictions for Handling Mixed Mode
> +   * Float Operations:
> +   *
> +   *"No accumulator read acces

Re: [Mesa-dev] [PATCH v4 00/40] intel: VK_KHR_shader_float16_int8 implementation

2019-04-13 Thread Francisco Jerez
Jason Ekstrand  writes:

> Quick status check.  Mesa 19.1 is supposed to branch in two weeks.  Are we
> about ready to land this?
>

Seems pretty close to ready to me...


> On Mon, Mar 25, 2019 at 11:13 AM Juan A. Suarez Romero 
> wrote:
>
>> On Fri, 2019-03-22 at 17:53 +0100, Iago Toral wrote:
>> > Yes, I think those should be fine to land now, they are very few
>> > actually. Jason, any objections?
>> >
>>
>> Pushed:
>> - [PATCH v4 10/40] compiler/nir: add lowering option for 16-bit fmod
>> - [PATCH v4 11/40] compiler/nir: add lowering for 16-bit flrp
>> - [PATCH v4 12/40] compiler/nir: add lowering for 16-bit ldexp
>>
>> J.A.
>>
>> > Iago
>> >
>> > On Fri, 2019-03-22 at 17:26 +0100, Samuel Pitoiset wrote:
>> > > Can you eventually merge all NIR patches now? We should be able to
>> > > hook
>> > > up that extension for RADV quite soon.
>> > >
>> > > On 2/12/19 12:55 PM, Iago Toral Quiroga wrote:
>> > > > The changes in this version address review feedback to v3. The most
>> > > > significant
>> > > > changes include:
>> > > >
>> > > > 1. A more generic constant combining pass that can handle more
>> > > > constant types (not just F and HF) requested by Jason.
>> > > >
>> > > > 2. The addition of assembly validation for half-float restrictions,
>> > > > and also
>> > > > for mixed float mode, requested by Curro. It should be noted that
>> > > > this
>> > > > implementation of VK_KHR_shader_float16_int8 does not emit any
>> > > > mixed mode float
>> > > > instructions at this moment so I have not empirically validated the
>> > > > restictions
>> > > > implemented here.
>> > > >
>> > > > As always, a branch with these patches is available for testing in
>> > > > the
>> > > > itoral/VK_KHR_shader_float16_int8 branch of the Igalia Mesa
>> > > > repository at
>> > > > https://github.com/Igalia/mesa.
>> > > >
>> > > > Iago Toral Quiroga (40):
>> > > >compiler/nir: add an is_conversion field to nir_op_info
>> > > >intel/compiler: add a NIR pass to lower conversions
>> > > >intel/compiler: split float to 64-bit opcodes from int to 64-bit
>> > > >intel/compiler: handle b2i/b2f with other integer conversion
>> > > > opcodes
>> > > >intel/compiler: assert restrictions on conversions to half-float
>> > > >intel/compiler: lower some 16-bit float operations to 32-bit
>> > > >intel/compiler: handle extended math restrictions for half-float
>> > > >intel/compiler: implement 16-bit fsign
>> > > >intel/compiler: drop unnecessary temporary from 32-bit fsign
>> > > >  implementation
>> > > >compiler/nir: add lowering option for 16-bit fmod
>> > > >compiler/nir: add lowering for 16-bit flrp
>> > > >compiler/nir: add lowering for 16-bit ldexp
>> > > >intel/compiler: add instruction setters for Src1Type and
>> > > > Src2Type.
>> > > >intel/compiler: add new half-float register type for 3-src
>> > > >  instructions
>> > > >intel/compiler: don't compact 3-src instructions with Src1Type
>> > > > or
>> > > >  Src2Type bits
>> > > >intel/compiler: allow half-float on 3-source instructions since
>> > > > gen8
>> > > >intel/compiler: set correct precision fields for 3-source float
>> > > >  instructions
>> > > >intel/compiler: fix ddx and ddy for 16-bit float
>> > > >intel/compiler: fix ddy for half-float in Broadwell
>> > > >intel/compiler: workaround for SIMD8 half-float MAD in gen8
>> > > >intel/compiler: split is_partial_write() into two variants
>> > > >intel/compiler: activate 16-bit bit-size lowerings also for 8-
>> > > > bit
>> > > >intel/compiler: rework conversion opcodes
>> > > >intel/compiler: implement isign for int8
>> > > >intel/compiler: ask for an integer type if requesting an 8-bit
>> > > > type
>> > > >intel/eu: force stride of 2 on NULL register for Byte
>> > > > instructions
>> > > >intel/compiler: generalize the combine constants pass
>> > > >intel/compiler: implement is_zero, is_one, is_negative_one for
>> > > >  8-bit/16-bit
>> > > >intel/compiler: add a brw_reg_type_is_integer helper
>> > > >intel/compiler: fix cmod propagation for non 32-bit types
>> > > >intel/compiler: remove inexact algebraic optimizations from the
>> > > >  backend
>> > > >intel/compiler: skip MAD algebraic optimization for half-float
>> > > > or
>> > > >  mixed mode
>> > > >intel/compiler: also set F execution type for mixed float mode
>> > > > in BDW
>> > > >intel/compiler: validate region restrictions for half-float
>> > > >  conversions
>> > > >intel/compiler: validate conversions between 64-bit and 8-bit
>> > > > types
>> > > >intel/compiler: skip validating restrictions on operand types
>> > > > for
>> > > >  mixed float
>> > > >intel/compiler: validate region restrictions for mixed float
>> > > > mode
>> > > >compiler/spirv: move the check for Int8 capability
>> > > >anv/pipeline: support Float16 and Int8 SPIR-V capabilities in
>> > 

Re: [Mesa-dev] [PATCH v6 32/35] intel/compiler: validate region restrictions for mixed float mode

2019-04-13 Thread Francisco Jerez
"Juan A. Suarez Romero"  writes:

> On Wed, 2019-04-10 at 17:13 -0700, Francisco Jerez wrote:
>> "Juan A. Suarez Romero"  writes:
>> 
>> > From: Iago Toral Quiroga 
>> > 
>> > v2:
>> >  - Adapted unit tests to make them consistent with the changes done
>> >to the validation of half-float conversions.
>> > 
>> > v3 (Curro):
>> > - Check all the accumulators
>> > - Constify declarations
>> > - Do not check src1 type in single-source instructions.
>> > - Check for all instructions that read accumulator (either implicitly or
>> >   explicitly)
>> > - Check restrictions in src1 too.
>> > - Merge conditional block
>> > - Add invalid test case.
>> > ---
>> >  src/intel/compiler/brw_eu_validate.c| 290 +++
>> >  src/intel/compiler/test_eu_validate.cpp | 631 
>> >  2 files changed, 921 insertions(+)
>> > 
>> > diff --git a/src/intel/compiler/brw_eu_validate.c 
>> > b/src/intel/compiler/brw_eu_validate.c
>> > index cfaf126e2f5..4a735641c86 100644
>> > --- a/src/intel/compiler/brw_eu_validate.c
>> > +++ b/src/intel/compiler/brw_eu_validate.c
>> > @@ -170,6 +170,20 @@ src1_is_null(const struct gen_device_info *devinfo, 
>> > const brw_inst *inst)
>> >brw_inst_src1_da_reg_nr(devinfo, inst) == BRW_ARF_NULL;
>> >  }
>> >  
>> > +static bool
>> > +src0_is_acc(const struct gen_device_info *devinfo, const brw_inst *inst)
>> > +{
>> > +   return brw_inst_src0_reg_file(devinfo, inst) == 
>> > BRW_ARCHITECTURE_REGISTER_FILE &&
>> > +  (brw_inst_src0_da_reg_nr(devinfo, inst) & 0xF0) == 
>> > BRW_ARF_ACCUMULATOR;
>> > +}
>> > +
>> > +static bool
>> > +src1_is_acc(const struct gen_device_info *devinfo, const brw_inst *inst)
>> > +{
>> > +   return brw_inst_src1_reg_file(devinfo, inst) == 
>> > BRW_ARCHITECTURE_REGISTER_FILE &&
>> > +  (brw_inst_src1_da_reg_nr(devinfo, inst) & 0xF0) == 
>> > BRW_ARF_ACCUMULATOR;
>> > +}
>> > +
>> >  static bool
>> >  src0_is_grf(const struct gen_device_info *devinfo, const brw_inst *inst)
>> >  {
>> > @@ -275,6 +289,27 @@ sources_not_null(const struct gen_device_info 
>> > *devinfo,
>> > return error_msg;
>> >  }
>> >  
>> > +static bool
>> > +inst_uses_src_acc(const struct gen_device_info *devinfo, const brw_inst 
>> > *inst)
>> > +{
>> > +   /* Check instructions that use implicit accumulator sources */
>> > +   switch (brw_inst_opcode(devinfo, inst)) {
>> > +   case BRW_OPCODE_MAC:
>> > +   case BRW_OPCODE_MACH:
>> > +   case BRW_OPCODE_SADA2:
>> > +  return true;
>> > +   }
>> > +
>> > +   /* Instructions with three source operands cannot use explicit 
>> > accumulator
>> > +* operands.
>> > +*/
>> 
>> They can on Gen10+.  Yeah, I know, it's quite a pain to have to
>> special-case 3src instructions everywhere in the validator code...
>
>
> Checking other parts of the code, I'll assume that srcN_is_acc() should
> return false for Align16 mode; at least there are assertions there that in
> this mode sources can only be GRF.
>

Yes, only Align1 3-source instructions on Gen10 and greater can use the
accumulator explicitly.
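
Roughly, as a minimal sketch of that rule (the helper name and its
parameters are made up for illustration, nothing like this exists in
brw_eu_validate.c as far as this thread shows):

```c
#include <stdbool.h>

/* Hypothetical helper, not part of the Mesa tree: returns whether a
 * 3-source instruction may name the accumulator as an explicit operand.
 * Per the statement above, that takes both Gen10+ and Align1 mode.
 */
static bool
mock_3src_may_use_explicit_acc(unsigned gen, bool is_align1)
{
   return gen >= 10 && is_align1;
}
```

A real implementation would read both values from devinfo and from the
instruction's access mode rather than taking them as arguments.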

> OTOH, is it worth handling here the case of 3src instructions allowing
> an explicit accumulator? If other parts of the driver assume this is not
> possible, I understand it would be better to fix this across all the code
> in a separate patchset (not related to float16).


I think it's at least worth adding an assert(num_sources < 3) and a
FINISHME comment so whenever this function starts getting used to
validate 3-source instructions people notice it's incomplete.
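
Something along these lines, as a self-contained sketch.  The mock_inst
struct is a hypothetical stand-in for the fields the real function would
decode from a brw_inst, so none of the names below are actual Mesa code:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a decoded instruction. */
struct mock_inst {
   unsigned num_sources;
   bool implicit_acc;   /* opcode reads the accumulator implicitly */
   bool src0_is_acc;
   bool src1_is_acc;
};

static bool
mock_inst_uses_src_acc(const struct mock_inst *inst)
{
   /* Instructions with implicit accumulator sources (MAC, MACH, SADA2
    * in the patch).
    */
   if (inst->implicit_acc)
      return true;

   /* FINISHME: Align1 3-source instructions on Gen10+ can use explicit
    * accumulator operands, which this function doesn't handle yet.
    */
   assert(inst->num_sources < 3);

   return inst->src0_is_acc ||
          (inst->num_sources > 1 && inst->src1_is_acc);
}
```

The point of the assert is only to make the gap loud: a caller that
starts passing 3-source instructions trips it in debug builds instead of
silently getting "false".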

Honestly it's kind of disturbing that 3-source instructions aren't being
validated at all for the most part in the EU validator.  But that's not
really your fault.

[A bunch more comments below]

>
>
>> 
>> > +   const unsigned num_sources = num_sources_from_inst(devinfo, inst);
>> > +   if (num_sources > 2)
>> > +  return false;
>> > +
>> > +   return src0_is_acc(devinfo, inst) || (num_sources > 1 && 
>> > src1_is_acc(devinfo, inst));
>> > +}
>> > +
>> >  static struct string
>> >  send_restrictions(const struct gen_device_info *devinfo,
>> >   

Re: [Mesa-dev] [PATCH v6 32/35] intel/compiler: validate region restrictions for mixed float mode

2019-04-10 Thread Francisco Jerez
"Juan A. Suarez Romero"  writes:

> From: Iago Toral Quiroga 
>
> v2:
>  - Adapted unit tests to make them consistent with the changes done
>to the validation of half-float conversions.
>
> v3 (Curro):
> - Check all the accumulators
> - Constify declarations
> - Do not check src1 type in single-source instructions.
> - Check for all instructions that read accumulator (either implicitly or
>   explicitly)
> - Check restrictions in src1 too.
> - Merge conditional block
> - Add invalid test case.
> ---
>  src/intel/compiler/brw_eu_validate.c| 290 +++
>  src/intel/compiler/test_eu_validate.cpp | 631 
>  2 files changed, 921 insertions(+)
>
> diff --git a/src/intel/compiler/brw_eu_validate.c 
> b/src/intel/compiler/brw_eu_validate.c
> index cfaf126e2f5..4a735641c86 100644
> --- a/src/intel/compiler/brw_eu_validate.c
> +++ b/src/intel/compiler/brw_eu_validate.c
> @@ -170,6 +170,20 @@ src1_is_null(const struct gen_device_info *devinfo, 
> const brw_inst *inst)
>brw_inst_src1_da_reg_nr(devinfo, inst) == BRW_ARF_NULL;
>  }
>  
> +static bool
> +src0_is_acc(const struct gen_device_info *devinfo, const brw_inst *inst)
> +{
> +   return brw_inst_src0_reg_file(devinfo, inst) == 
> BRW_ARCHITECTURE_REGISTER_FILE &&
> +  (brw_inst_src0_da_reg_nr(devinfo, inst) & 0xF0) == 
> BRW_ARF_ACCUMULATOR;
> +}
> +
> +static bool
> +src1_is_acc(const struct gen_device_info *devinfo, const brw_inst *inst)
> +{
> +   return brw_inst_src1_reg_file(devinfo, inst) == 
> BRW_ARCHITECTURE_REGISTER_FILE &&
> +  (brw_inst_src1_da_reg_nr(devinfo, inst) & 0xF0) == 
> BRW_ARF_ACCUMULATOR;
> +}
> +
>  static bool
>  src0_is_grf(const struct gen_device_info *devinfo, const brw_inst *inst)
>  {
> @@ -275,6 +289,27 @@ sources_not_null(const struct gen_device_info *devinfo,
> return error_msg;
>  }
>  
> +static bool
> +inst_uses_src_acc(const struct gen_device_info *devinfo, const brw_inst 
> *inst)
> +{
> +   /* Check instructions that use implicit accumulator sources */
> +   switch (brw_inst_opcode(devinfo, inst)) {
> +   case BRW_OPCODE_MAC:
> +   case BRW_OPCODE_MACH:
> +   case BRW_OPCODE_SADA2:
> +  return true;
> +   }
> +
> +   /* Instructions with three source operands cannot use explicit accumulator
> +* operands.
> +*/

They can on Gen10+.  Yeah, I know, it's quite a pain to have to
special-case 3src instructions everywhere in the validator code...

> +   const unsigned num_sources = num_sources_from_inst(devinfo, inst);
> +   if (num_sources > 2)
> +  return false;
> +
> +   return src0_is_acc(devinfo, inst) || (num_sources > 1 && 
> src1_is_acc(devinfo, inst));
> +}
> +
>  static struct string
>  send_restrictions(const struct gen_device_info *devinfo,
>const brw_inst *inst)
> @@ -938,6 +973,260 @@ general_restrictions_on_region_parameters(const struct 
> gen_device_info *devinfo,
> return error_msg;
>  }
>  
> +static struct string
> +special_restrictions_for_mixed_float_mode(const struct gen_device_info 
> *devinfo,
> +  const brw_inst *inst)
> +{
> +   struct string error_msg = { .str = NULL, .len = 0 };
> +
> +   const unsigned opcode = brw_inst_opcode(devinfo, inst);
> +   const unsigned num_sources = num_sources_from_inst(devinfo, inst);
> +   if (num_sources >= 3)
> +  return error_msg;
> +
> +   if (!is_mixed_float(devinfo, inst))
> +  return error_msg;
> +
> +   unsigned exec_size = 1 << brw_inst_exec_size(devinfo, inst);
> +   bool is_align16 = brw_inst_access_mode(devinfo, inst) == BRW_ALIGN_16;
> +
> +   enum brw_reg_type src0_type = brw_inst_src0_type(devinfo, inst);
> +   enum brw_reg_type src1_type = num_sources > 1 ?
> + brw_inst_src1_type(devinfo, inst) : 0;
> +   enum brw_reg_type dst_type = brw_inst_dst_type(devinfo, inst);
> +
> +   unsigned dst_stride = STRIDE(brw_inst_dst_hstride(devinfo, inst));
> +   bool dst_is_packed = is_packed(exec_size * dst_stride, exec_size, 
> dst_stride);
> +
> +   /* From the SKL PRM, Special Restrictions for Handling Mixed Mode
> +* Float Operations:
> +*
> +*"Indirect addressing on source is not supported when source and
> +* destination data types are mixed float."
> +*/
> +   ERROR_IF((types_are_mixed_float(dst_type, src0_type) &&
> + brw_inst_src0_address_mode(devinfo, inst) != 
> BRW_ADDRESS_DIRECT) ||
> +(num_sources > 1 &&
> + types_are_mixed_float(dst_type, src1_type) &&
> + brw_inst_src1_address_mode(devinfo, inst) != 
> BRW_ADDRESS_DIRECT),
> +"Indirect addressing on source is not supported when source and "
> +"destination data types are mixed float");
> +
> +   /* From the SKL PRM, Special Restrictions for Handling Mixed Mode
> +* Float Operations:
> +*
> +*"No SIMD16 in mixed mode when destination is f32. Instruction
> +* execution size must be no more than 

Re: [Mesa-dev] [PATCH v6 30/35] intel/compiler: validate region restrictions for half-float conversions

2019-04-10 Thread Francisco Jerez
"Juan A. Suarez Romero"  writes:

> From: Iago Toral Quiroga 
>
> v2:
>  - Consider implicit conversions in 2-src instructions too (Curro)
>  - For restrictions that involve destination stride requirements
>only validate them for Align1, since Align16 always requires
>packed data.
>  - Skip general rule for the dst/execution type size ratio for
>mixed float instructions on CHV and SKL+, these have their own
>set of rules that will be validated separately.
>
> v3 (Curro):
>  - Do not check src1 type in single-source instructions.
>  - Check restriction on src1.
>  - Remove invalid test.

Reviewed-by: Francisco Jerez 

> ---
>  src/intel/compiler/brw_eu_validate.c| 155 +++-
>  src/intel/compiler/test_eu_validate.cpp | 116 ++
>  2 files changed, 270 insertions(+), 1 deletion(-)
>
> diff --git a/src/intel/compiler/brw_eu_validate.c 
> b/src/intel/compiler/brw_eu_validate.c
> index bd0e48a5e5c..54bffb3af03 100644
> --- a/src/intel/compiler/brw_eu_validate.c
> +++ b/src/intel/compiler/brw_eu_validate.c
> @@ -469,6 +469,66 @@ is_packed(unsigned vstride, unsigned width, unsigned 
> hstride)
> return false;
>  }
>  
> +/**
> + * Returns whether an instruction is an explicit or implicit conversion
> + * to/from half-float.
> + */
> +static bool
> +is_half_float_conversion(const struct gen_device_info *devinfo,
> + const brw_inst *inst)
> +{
> +   enum brw_reg_type dst_type = brw_inst_dst_type(devinfo, inst);
> +
> +   unsigned num_sources = num_sources_from_inst(devinfo, inst);
> +   enum brw_reg_type src0_type = brw_inst_src0_type(devinfo, inst);
> +
> +   if (dst_type != src0_type &&
> +   (dst_type == BRW_REGISTER_TYPE_HF || src0_type == 
> BRW_REGISTER_TYPE_HF)) {
> +  return true;
> +   } else if (num_sources > 1) {
> +  enum brw_reg_type src1_type = brw_inst_src1_type(devinfo, inst);
> +  return dst_type != src1_type &&
> +(dst_type == BRW_REGISTER_TYPE_HF ||
> + src1_type == BRW_REGISTER_TYPE_HF);
> +   }
> +
> +   return false;
> +}
> +
> +/*
> + * Returns whether an instruction is using mixed float operation mode
> + */
> +static bool
> +is_mixed_float(const struct gen_device_info *devinfo, const brw_inst *inst)
> +{
> +   if (devinfo->gen < 8)
> +  return false;
> +
> +   if (inst_is_send(devinfo, inst))
> +  return false;
> +
> +   unsigned opcode = brw_inst_opcode(devinfo, inst);
> +   const struct opcode_desc *desc = brw_opcode_desc(devinfo, opcode);
> +   if (desc->ndst == 0)
> +  return false;
> +
> +   /* FIXME: support 3-src instructions */
> +   unsigned num_sources = num_sources_from_inst(devinfo, inst);
> +   assert(num_sources < 3);
> +
> +   enum brw_reg_type dst_type = brw_inst_dst_type(devinfo, inst);
> +   enum brw_reg_type src0_type = brw_inst_src0_type(devinfo, inst);
> +
> +   if (num_sources == 1)
> +  return types_are_mixed_float(src0_type, dst_type);
> +
> +   enum brw_reg_type src1_type = brw_inst_src1_type(devinfo, inst);
> +
> +   return types_are_mixed_float(src0_type, src1_type) ||
> +  types_are_mixed_float(src0_type, dst_type) ||
> +  types_are_mixed_float(src1_type, dst_type);
> +}
> +
>  /**
>   * Checks restrictions listed in "General Restrictions Based on Operand 
> Types"
>   * in the "Register Region Restrictions" section.
> @@ -539,7 +599,100 @@ general_restrictions_based_on_operand_types(const 
> struct gen_device_info *devinf
> exec_type_size == 8 && dst_type_size == 4)
>dst_type_size = 8;
>  
> -   if (exec_type_size > dst_type_size) {
> +   if (is_half_float_conversion(devinfo, inst)) {
> +  /**
> +   * Validates the following restriction
> +   * from the BDW+ PRM, Volume 2a, Command Reference, Instructions - MOV:
> +   *
> +   *"There is no direct conversion from HF to DF or DF to HF.
> +   * There is no direct conversion from HF to Q/UQ or Q/UQ to HF."
> +   *
> +   * Even if these restrictions are listed for the MOV instruction, we
> +   * validate this more generally, since there is the possibility
> +   * of implicit conversions from other instructions, such as implicit
> +   * conversion from integer to HF with the ADD instruction in SKL+.
> +   */
> +  enum brw_reg_type src0_type = brw_inst_src0_type(devinfo, inst);
> +  enum brw_reg_type src1_type = num_sources > 1 ?
> +brw_inst_src1_type(devinfo,

Re: [Mesa-dev] [PATCH v7 28/35] intel/compiler: implement SIMD16 restrictions for mixed-float instructions

2019-04-10 Thread Francisco Jerez
Iago Toral  writes:

> On Mon, 2019-04-08 at 12:00 -0700, Francisco Jerez wrote:
>> "Juan A. Suarez Romero"  writes:
>> 
>> > From: Iago Toral Quiroga 
>> > 
>> > v2: f32to16/f16to32 can use a :W destination (Curro)
>> > ---
>> >  src/intel/compiler/brw_fs.cpp | 71
>> > +++
>> >  1 file changed, 71 insertions(+)
>> > 
>> > diff --git a/src/intel/compiler/brw_fs.cpp
>> > b/src/intel/compiler/brw_fs.cpp
>> > index d4803c63b93..48b5cc6c403 100644
>> > --- a/src/intel/compiler/brw_fs.cpp
>> > +++ b/src/intel/compiler/brw_fs.cpp
>> > @@ -5606,6 +5606,48 @@ fs_visitor::lower_logical_sends()
>> > return progress;
>> >  }
>> >  
>> > +static bool
>> > +is_mixed_float_with_fp32_dst(const fs_inst *inst)
>> > +{
>> > +   /* This opcode sometimes uses :W type on the source even if the
>> > operand is
>> > +* a :HF, because in gen7 there is no support for :HF, and thus
>> > it uses :W.
>> > +*/
>> > +   if (inst->opcode == BRW_OPCODE_F16TO32)
>> > +  return true;
>> > +
>> > +   if (inst->dst.type != BRW_REGISTER_TYPE_F)
>> > +  return false;
>> > +
>> > +   for (int i = 0; i < inst->sources; i++) {
>> > +  if (inst->src[i].type == BRW_REGISTER_TYPE_HF)
>> > + return true;
>> > +   }
>> > +
>> > +   return false;
>> > +}
>> > +
>> > +static bool
>> > +is_mixed_float_with_packed_fp16_dst(const fs_inst *inst)
>> > +{
>> > +   /* This opcode sometimes uses :W type on the destination even
>> > if the
>> > +* destination is a :HF, because in gen7 there is no support
>> > for :HF, and
>> > +* thus it uses :W.
>> > +*/
>> > +   if (inst->opcode == BRW_OPCODE_F32TO16)
>> 
>> Don't you need to check whether the destination is packed here?
>
> Yes, we also need that, like in the code below.
>

Patch would be Reviewed-by: Francisco Jerez  with
that change.

>> > +  return true;
>> > +
>> > +   if (inst->dst.type != BRW_REGISTER_TYPE_HF ||
>> > +   inst->dst.stride != 1)
>> > +  return false;
>> > +
>> > +   for (int i = 0; i < inst->sources; i++) {
>> > +  if (inst->src[i].type == BRW_REGISTER_TYPE_F)
>> > + return true;
>> > +   }
>> > +
>> > +   return false;
>> > +}
>> > +
>> >  /**
>> >   * Get the closest allowed SIMD width for instruction \p inst
>> > accounting for
>> >   * some common regioning and execution control restrictions that
>> > apply to FPU
>> > @@ -5768,6 +5810,35 @@ get_fpu_lowered_simd_width(const struct
>> > gen_device_info *devinfo,
>> >   max_width = MIN2(max_width, 4);
>> > }
>> >  
>> > +   /* From the SKL PRM, Special Restrictions for Handling Mixed
>> > Mode
>> > +* Float Operations:
>> > +*
>> > +*"No SIMD16 in mixed mode when destination is f32.
>> > Instruction
>> > +* execution size must be no more than 8."
>> > +*
>> > +* FIXME: the simulator doesn't seem to complain if we don't do
>> > this and
>> > +* empirical testing with existing CTS tests show that they
>> > pass just fine
>> > +* without implementing this, however, since our interpretation
>> > of the PRM
>> > +* is that conversion MOVs between HF and F are still mixed-
>> > float
>> > +* instructions (and therefore subject to this restriction) we
>> > decided to
>> > +* split them to be safe. Might be useful to do additional
>> > investigation to
>> > +* lift the restriction if we can ensure that it is safe
>> > though, since these
>> > +* conversions are common when half-float types are involved
>> > since many
>> > +* instructions do not support HF types and conversions from/to
>> > F are
>> > +* required.
>> > +*/
>> > +   if (is_mixed_float_with_fp32_dst(inst))
>> > +  max_width = MIN2(max_width, 8);
>> > +
>> > +   /* From the SKL PRM, Special Restrictions for Handling Mixed
>> > Mode
>> > +* Float Operations:
>> > +*
>> > +*"No SIMD16 in mixed mode when destination is packed f16
>> > for both
>> > +* Align1 and Align16."
>> > +*/
>> > +   if (is_mixed_float_with_packed_fp16_dst(inst))
>> > +  max_width = MIN2(max_width, 8);
>> > +
>> > /* Only power-of-two execution sizes are representable in the
>> > instruction
>> >  * control fields.
>> >  */
>> > -- 
>> > 2.20.1



Re: [Mesa-dev] [PATCH v7 28/35] intel/compiler: implement SIMD16 restrictions for mixed-float instructions

2019-04-08 Thread Francisco Jerez
"Juan A. Suarez Romero"  writes:

> From: Iago Toral Quiroga 
>
> v2: f32to16/f16to32 can use a :W destination (Curro)
> ---
>  src/intel/compiler/brw_fs.cpp | 71 +++
>  1 file changed, 71 insertions(+)
>
> diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
> index d4803c63b93..48b5cc6c403 100644
> --- a/src/intel/compiler/brw_fs.cpp
> +++ b/src/intel/compiler/brw_fs.cpp
> @@ -5606,6 +5606,48 @@ fs_visitor::lower_logical_sends()
> return progress;
>  }
>  
> +static bool
> +is_mixed_float_with_fp32_dst(const fs_inst *inst)
> +{
> +   /* This opcode sometimes uses :W type on the source even if the operand is
> +* a :HF, because in gen7 there is no support for :HF, and thus it uses 
> :W.
> +*/
> +   if (inst->opcode == BRW_OPCODE_F16TO32)
> +  return true;
> +
> +   if (inst->dst.type != BRW_REGISTER_TYPE_F)
> +  return false;
> +
> +   for (int i = 0; i < inst->sources; i++) {
> +  if (inst->src[i].type == BRW_REGISTER_TYPE_HF)
> + return true;
> +   }
> +
> +   return false;
> +}
> +
> +static bool
> +is_mixed_float_with_packed_fp16_dst(const fs_inst *inst)
> +{
> +   /* This opcode sometimes uses :W type on the destination even if the
> +* destination is a :HF, because in gen7 there is no support for :HF, and
> +* thus it uses :W.
> +*/
> +   if (inst->opcode == BRW_OPCODE_F32TO16)

Don't you need to check whether the destination is packed here?
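
To illustrate what I mean, here is a rough sketch with the stride check
hoisted so that it also covers the F32TO16 case.  The struct and all
mock_* names are hypothetical simplifications of fs_inst, not the actual
backend types:

```c
#include <stdbool.h>

enum mock_type { MOCK_TYPE_F, MOCK_TYPE_HF, MOCK_TYPE_W };

/* Hypothetical, simplified view of the fs_inst fields the check needs. */
struct mock_inst {
   bool is_f32to16;            /* stands in for opcode == F32TO16 */
   enum mock_type dst_type;
   unsigned dst_stride;
   unsigned sources;
   enum mock_type src_type[3];
};

static bool
mock_is_mixed_float_with_packed_fp16_dst(const struct mock_inst *inst)
{
   /* The restriction only applies to packed destinations, whatever the
    * opcode is.
    */
   if (inst->dst_stride != 1)
      return false;

   /* F32TO16 may use a :W destination that really holds :HF data. */
   if (inst->is_f32to16)
      return true;

   if (inst->dst_type != MOCK_TYPE_HF)
      return false;

   for (unsigned i = 0; i < inst->sources; i++) {
      if (inst->src_type[i] == MOCK_TYPE_F)
         return true;
   }

   return false;
}
```

With this ordering an F32TO16 with a strided destination no longer gets
its SIMD width clamped for a restriction that doesn't apply to it.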

> +  return true;
> +
> +   if (inst->dst.type != BRW_REGISTER_TYPE_HF ||
> +   inst->dst.stride != 1)
> +  return false;
> +
> +   for (int i = 0; i < inst->sources; i++) {
> +  if (inst->src[i].type == BRW_REGISTER_TYPE_F)
> + return true;
> +   }
> +
> +   return false;
> +}
> +
>  /**
>   * Get the closest allowed SIMD width for instruction \p inst accounting for
>   * some common regioning and execution control restrictions that apply to FPU
> @@ -5768,6 +5810,35 @@ get_fpu_lowered_simd_width(const struct 
> gen_device_info *devinfo,
>   max_width = MIN2(max_width, 4);
> }
>  
> +   /* From the SKL PRM, Special Restrictions for Handling Mixed Mode
> +* Float Operations:
> +*
> +*"No SIMD16 in mixed mode when destination is f32. Instruction
> +* execution size must be no more than 8."
> +*
> +* FIXME: the simulator doesn't seem to complain if we don't do this and
> +* empirical testing with existing CTS tests show that they pass just fine
> +* without implementing this, however, since our interpretation of the PRM
> +* is that conversion MOVs between HF and F are still mixed-float
> +* instructions (and therefore subject to this restriction) we decided to
> +* split them to be safe. Might be useful to do additional investigation 
> to
> +* lift the restriction if we can ensure that it is safe though, since 
> these
> +* conversions are common when half-float types are involved since many
> +* instructions do not support HF types and conversions from/to F are
> +* required.
> +*/
> +   if (is_mixed_float_with_fp32_dst(inst))
> +  max_width = MIN2(max_width, 8);
> +
> +   /* From the SKL PRM, Special Restrictions for Handling Mixed Mode
> +* Float Operations:
> +*
> +*"No SIMD16 in mixed mode when destination is packed f16 for both
> +* Align1 and Align16."
> +*/
> +   if (is_mixed_float_with_packed_fp16_dst(inst))
> +  max_width = MIN2(max_width, 8);
> +
> /* Only power-of-two execution sizes are representable in the instruction
>  * control fields.
>  */
> -- 
> 2.20.1



Re: [Mesa-dev] [PATCH v5 35/38] intel/compiler: validate region restrictions for mixed float mode

2019-04-01 Thread Francisco Jerez
"Juan A. Suarez Romero"  writes:

> On Wed, 2019-03-27 at 19:37 -0700, Francisco Jerez wrote:
>> "Juan A. Suarez Romero"  writes:
>> 
>> > From: Iago Toral Quiroga 
>> > 
>> > v2:
>> >  - Adapted unit tests to make them consistent with the changes done
>> >to the validation of half-float conversions.
>> > ---
>> >  src/intel/compiler/brw_eu_validate.c| 256 ++
>> >  src/intel/compiler/test_eu_validate.cpp | 620 
>> >  2 files changed, 876 insertions(+)
>> > 
>> > diff --git a/src/intel/compiler/brw_eu_validate.c 
>> > b/src/intel/compiler/brw_eu_validate.c
>> > index 18c95efb05b..5eea02f5c94 100644
>> > --- a/src/intel/compiler/brw_eu_validate.c
>> > +++ b/src/intel/compiler/brw_eu_validate.c
>> > @@ -170,6 +170,13 @@ src1_is_null(const struct gen_device_info *devinfo, 
>> > const brw_inst *inst)
>> >brw_inst_src1_da_reg_nr(devinfo, inst) == BRW_ARF_NULL;
>> >  }
>> >  
>> > +static bool
>> > +src0_is_acc(const struct gen_device_info *devinfo, const brw_inst *inst)
>> > +{
>> > +   return brw_inst_src0_reg_file(devinfo, inst) == 
>> > BRW_ARCHITECTURE_REGISTER_FILE &&
>> > +  brw_inst_src0_da_reg_nr(devinfo, inst) == BRW_ARF_ACCUMULATOR;
>> > +}
>> > +
>> 
>> There are multiple accumulator registers.  The above only checks for
>> acc0.
>> 
>
>> >  static bool
>> >  src0_is_grf(const struct gen_device_info *devinfo, const brw_inst *inst)
>> >  {
>> > @@ -937,6 +944,254 @@ general_restrictions_on_region_parameters(const 
>> > struct gen_device_info *devinfo,
>> > return error_msg;
>> >  }
>> >  
>> > +static struct string
>> > +special_restrictions_for_mixed_float_mode(const struct gen_device_info 
>> > *devinfo,
>> > +  const brw_inst *inst)
>> > +{
>> > +   struct string error_msg = { .str = NULL, .len = 0 };
>> > +
>> > +   unsigned opcode = brw_inst_opcode(devinfo, inst);
>> 
>> Constify this and the declarations below.
>> 
>> > +   unsigned num_sources = num_sources_from_inst(devinfo, inst);
>> > +   if (num_sources >= 3)
>> > +  return error_msg;
>> > +
>> > +   if (!is_mixed_float(devinfo, inst))
>> > +  return error_msg;
>> > +
>> > +   unsigned exec_size = 1 << brw_inst_exec_size(devinfo, inst);
>> > +   bool is_align16 = brw_inst_access_mode(devinfo, inst) == BRW_ALIGN_16;
>> > +
>> > +   enum brw_reg_type src0_type = brw_inst_src0_type(devinfo, inst);
>> > +   enum brw_reg_type src1_type = brw_inst_src1_type(devinfo, inst);
>> 
>> Same comment as in the previous patch, this can possibly blow up for
>> instructions with fewer than two sources.
>> 
>> > +   enum brw_reg_type dst_type = brw_inst_dst_type(devinfo, inst);
>> > +
>> > +   unsigned dst_stride = STRIDE(brw_inst_dst_hstride(devinfo, inst));
>> > +   bool dst_is_packed = is_packed(exec_size * dst_stride, exec_size, 
>> > dst_stride);
>> > +
>> > +   /* From the SKL PRM, Special Restrictions for Handling Mixed Mode
>> > +* Float Operations:
>> > +*
>> > +*"Indirect addressing on source is not supported when source and
>> > +* destination data types are mixed float."
>> > +*
>> > +* Indirect addressing is only supported on the first source, so we 
>> > only
>> > +* check that.
>> 
>> I don't think that's true.  The hardware spec has the following example
>> of a valid but kind of funky instruction with indirect regioning:
>> 
>> | add (16) r[a0.0]:f r[a0.2]:f r[a0.4]:f 
>
> Interesting, because looking at the driver implementation of
> brw_set_src1, for anything that is not an immediate we do:
>
> /* This is a hardware restriction, which may or may not be lifted
>  * in the future:
>  */
> assert (reg.address_mode == BRW_ADDRESS_DIRECT);
>
> I guess this assertion was written for older platforms then?
>

Yes, that assert seems very outdated.  Apparently the only platform that
doesn't support indirect addressing in src1 is BRW (i.e. the original
i965).  All later platforms seem to support it, but they are restricted
to 1x1 mode.

>> > +*/
>> > +   ERROR_IF(types_are_mixed_float(dst_type, src0_type) &&
>> 
>> I doubt that it makes a d

Re: [Mesa-dev] [PATCH v5 35/38] intel/compiler: validate region restrictions for mixed float mode

2019-03-27 Thread Francisco Jerez
"Juan A. Suarez Romero"  writes:

> From: Iago Toral Quiroga 
>
> v2:
>  - Adapted unit tests to make them consistent with the changes done
>to the validation of half-float conversions.
> ---
>  src/intel/compiler/brw_eu_validate.c| 256 ++
>  src/intel/compiler/test_eu_validate.cpp | 620 
>  2 files changed, 876 insertions(+)
>
> diff --git a/src/intel/compiler/brw_eu_validate.c 
> b/src/intel/compiler/brw_eu_validate.c
> index 18c95efb05b..5eea02f5c94 100644
> --- a/src/intel/compiler/brw_eu_validate.c
> +++ b/src/intel/compiler/brw_eu_validate.c
> @@ -170,6 +170,13 @@ src1_is_null(const struct gen_device_info *devinfo, 
> const brw_inst *inst)
>brw_inst_src1_da_reg_nr(devinfo, inst) == BRW_ARF_NULL;
>  }
>  
> +static bool
> +src0_is_acc(const struct gen_device_info *devinfo, const brw_inst *inst)
> +{
> +   return brw_inst_src0_reg_file(devinfo, inst) == 
> BRW_ARCHITECTURE_REGISTER_FILE &&
> +  brw_inst_src0_da_reg_nr(devinfo, inst) == BRW_ARF_ACCUMULATOR;
> +}
> +

There are multiple accumulator registers.  The above only checks for
acc0.
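
A minimal sketch of the difference, assuming the usual ARF register
number layout where the high nibble selects the register type and the
low nibble the index (the constant below is a mock with the value I
believe BRW_ARF_ACCUMULATOR has, so double-check it against
brw_eu_defines.h):

```c
#include <stdbool.h>

/* Assumption: accumulators are encoded as 0x20 | index. */
#define MOCK_ARF_ACCUMULATOR 0x20u

/* What the patch does: an exact match, which only recognizes acc0. */
static bool
mock_is_acc0_only(unsigned reg_nr)
{
   return reg_nr == MOCK_ARF_ACCUMULATOR;
}

/* Masking off the index nibble recognizes every accumulator. */
static bool
mock_is_any_acc(unsigned reg_nr)
{
   return (reg_nr & 0xF0u) == MOCK_ARF_ACCUMULATOR;
}
```

Under that assumption acc1 (0x21) fails the first check and passes the
second.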

>  static bool
>  src0_is_grf(const struct gen_device_info *devinfo, const brw_inst *inst)
>  {
> @@ -937,6 +944,254 @@ general_restrictions_on_region_parameters(const struct 
> gen_device_info *devinfo,
> return error_msg;
>  }
>  
> +static struct string
> +special_restrictions_for_mixed_float_mode(const struct gen_device_info 
> *devinfo,
> +  const brw_inst *inst)
> +{
> +   struct string error_msg = { .str = NULL, .len = 0 };
> +
> +   unsigned opcode = brw_inst_opcode(devinfo, inst);

Constify this and the declarations below.

> +   unsigned num_sources = num_sources_from_inst(devinfo, inst);
> +   if (num_sources >= 3)
> +  return error_msg;
> +
> +   if (!is_mixed_float(devinfo, inst))
> +  return error_msg;
> +
> +   unsigned exec_size = 1 << brw_inst_exec_size(devinfo, inst);
> +   bool is_align16 = brw_inst_access_mode(devinfo, inst) == BRW_ALIGN_16;
> +
> +   enum brw_reg_type src0_type = brw_inst_src0_type(devinfo, inst);
> +   enum brw_reg_type src1_type = brw_inst_src1_type(devinfo, inst);

Same comment as in the previous patch, this can possibly blow up for
instructions with fewer than two sources.
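
One way to guard it, as a sketch only.  The struct and enum are
hypothetical stand-ins for brw_inst and brw_reg_type, and the "none"
placeholder value is my assumption:

```c
enum mock_type { MOCK_TYPE_NONE = 0, MOCK_TYPE_F, MOCK_TYPE_HF };

/* Hypothetical stand-in for the encoded instruction.  src1_bits models
 * whatever happens to be stored in the src1 type field, which is
 * meaningless when the instruction has a single source.
 */
struct mock_inst {
   unsigned num_sources;
   enum mock_type src0_type;
   enum mock_type src1_bits;
};

/* Only decode src1 when the instruction actually has a second source. */
static enum mock_type
mock_src1_type(const struct mock_inst *inst)
{
   return inst->num_sources > 1 ? inst->src1_bits : MOCK_TYPE_NONE;
}
```

Any later check that looks at the src1 type then also needs a
num_sources > 1 condition, otherwise the placeholder value could be
mistaken for a real type.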

> +   enum brw_reg_type dst_type = brw_inst_dst_type(devinfo, inst);
> +
> +   unsigned dst_stride = STRIDE(brw_inst_dst_hstride(devinfo, inst));
> +   bool dst_is_packed = is_packed(exec_size * dst_stride, exec_size, 
> dst_stride);
> +
> +   /* From the SKL PRM, Special Restrictions for Handling Mixed Mode
> +* Float Operations:
> +*
> +*"Indirect addressing on source is not supported when source and
> +* destination data types are mixed float."
> +*
> +* Indirect addressing is only supported on the first source, so we only
> +* check that.

I don't think that's true.  The hardware spec has the following example
of a valid but kind of funky instruction with indirect regioning:

| add (16) r[a0.0]:f r[a0.2]:f r[a0.4]:f 

> +*/
> +   ERROR_IF(types_are_mixed_float(dst_type, src0_type) &&

I doubt that it makes a difference whether there is a mismatch between
the type of src0 and the type of the destination for indirect addressing
to be disallowed.  Things are likely to blow up with indirect addressing
in src0 even if the instruction is mixed-mode due to the effect of the
type of src1 alone.  But it's hard to tell for sure, spec wording seems
fairly ambiguous...
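
To make the two readings concrete, here is a rough sketch for a
two-source instruction.  All mock_* names are made up for illustration,
and which reading matches the hardware is exactly what I'm unsure about:

```c
#include <stdbool.h>

enum mock_type { MOCK_TYPE_F, MOCK_TYPE_HF, MOCK_TYPE_D };

/* Hypothetical, simplified two-source instruction. */
struct mock_inst {
   enum mock_type dst, src0, src1;
   bool src0_indirect, src1_indirect;
};

static bool
mock_types_are_mixed_float(enum mock_type a, enum mock_type b)
{
   return (a == MOCK_TYPE_F && b == MOCK_TYPE_HF) ||
          (a == MOCK_TYPE_HF && b == MOCK_TYPE_F);
}

/* Reading used by the patch: only flag src0 when src0 itself
 * mismatches the destination.
 */
static bool
mock_indirect_error_per_source(const struct mock_inst *inst)
{
   return mock_types_are_mixed_float(inst->dst, inst->src0) &&
          inst->src0_indirect;
}

/* Conservative reading: flag any indirect source as soon as the
 * instruction is mixed-mode for whatever reason.
 */
static bool
mock_indirect_error_conservative(const struct mock_inst *inst)
{
   const bool mixed =
      mock_types_are_mixed_float(inst->dst, inst->src0) ||
      mock_types_are_mixed_float(inst->dst, inst->src1) ||
      mock_types_are_mixed_float(inst->src0, inst->src1);

   return mixed && (inst->src0_indirect || inst->src1_indirect);
}
```

An instruction with an F destination, an indirect F src0 and an HF src1
passes the first check but is rejected by the second, which is the case
I'd expect to be problematic.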

> +brw_inst_src0_address_mode(devinfo, inst) != BRW_ADDRESS_DIRECT,
> +"Indirect addressing on source is not supported when source and "
> +"destination data types are mixed float");
> +
> +   /* From the SKL PRM, Special Restrictions for Handling Mixed Mode
> +* Float Operations:
> +*
> +*"No SIMD16 in mixed mode when destination is f32. Instruction
> +* execution size must be no more than 8."
> +*/
> +   ERROR_IF(exec_size > 8 && dst_type == BRW_REGISTER_TYPE_F,
> +"Mixed float mode with 32-bit float destination is limited "
> +"to SIMD8");
> +
> +   if (is_align16) {
> +  /* From the SKL PRM, Special Restrictions for Handling Mixed Mode
> +   * Float Operations:
> +   *
> +   *   "In Align16 mode, when half float and float data types are mixed
> +   *between source operands OR between source and destination 
> operands,
> +   *the register content are assumed to be packed."
> +   *
> +   * Since Align16 doesn't have a concept of horizontal stride (or 
> width),
> +   * it means that vertical stride must always be 4, since 0 and 2 would
> +   * lead to replicated data, and any other value is disallowed in 
> Align16.
> +   * However, the PRM also says:
> +   *
> +   *   "In Align16, vertical stride can never be zero for f16"
> +   *
> +   * Which is oddly redundant and specific considering the 

Re: [Mesa-dev] [PATCH v5 33/38] intel/compiler: validate region restrictions for half-float conversions

2019-03-27 Thread Francisco Jerez
"Juan A. Suarez Romero"  writes:

> From: Iago Toral Quiroga 
>
> v2:
>  - Consider implicit conversions in 2-src instructions too (Curro)
>  - For restrictions that involve destination stride requirements
>only validate them for Align1, since Align16 always requires
>packed data.
>  - Skip general rule for the dst/execution type size ratio for
>mixed float instructions on CHV and SKL+, these have their own
>set of rules that will be validated separately.
> ---
>  src/intel/compiler/brw_eu_validate.c| 156 +++-
>  src/intel/compiler/test_eu_validate.cpp | 117 ++
>  2 files changed, 272 insertions(+), 1 deletion(-)
>
> diff --git a/src/intel/compiler/brw_eu_validate.c b/src/intel/compiler/brw_eu_validate.c
> index bd0e48a5e5c..cad50469c65 100644
> --- a/src/intel/compiler/brw_eu_validate.c
> +++ b/src/intel/compiler/brw_eu_validate.c
> @@ -469,6 +469,66 @@ is_packed(unsigned vstride, unsigned width, unsigned hstride)
> return false;
>  }
>  
> +/**
> + * Returns whether an instruction is an explicit or implicit conversion
> + * to/from half-float.
> + */
> +static bool
> +is_half_float_conversion(const struct gen_device_info *devinfo,
> + const brw_inst *inst)
> +{
> +   enum brw_reg_type dst_type = brw_inst_dst_type(devinfo, inst);
> +
> +   unsigned num_sources = num_sources_from_inst(devinfo, inst);
> +   enum brw_reg_type src0_type = brw_inst_src0_type(devinfo, inst);
> +
> +   if (dst_type != src0_type &&
> +   (dst_type == BRW_REGISTER_TYPE_HF || src0_type == BRW_REGISTER_TYPE_HF)) {
> +  return true;
> +   } else if (num_sources > 1) {
> +  enum brw_reg_type src1_type = brw_inst_src1_type(devinfo, inst);
> +  return dst_type != src1_type &&
> +(dst_type == BRW_REGISTER_TYPE_HF ||
> + src1_type == BRW_REGISTER_TYPE_HF);
> +   }
> +
> +   return false;
> +}
> +
> +/*
> + * Returns whether an instruction is using mixed float operation mode
> + */
> +static bool
> +is_mixed_float(const struct gen_device_info *devinfo, const brw_inst *inst)
> +{
> +   if (devinfo->gen < 8)
> +  return false;
> +
> +   if (inst_is_send(devinfo, inst))
> +  return false;
> +
> +   unsigned opcode = brw_inst_opcode(devinfo, inst);
> +   const struct opcode_desc *desc = brw_opcode_desc(devinfo, opcode);
> +   if (desc->ndst == 0)
> +  return false;
> +
> +   /* FIXME: support 3-src instructions */
> +   unsigned num_sources = num_sources_from_inst(devinfo, inst);
> +   assert(num_sources < 3);
> +
> +   enum brw_reg_type dst_type = brw_inst_dst_type(devinfo, inst);
> +   enum brw_reg_type src0_type = brw_inst_src0_type(devinfo, inst);
> +
> +   if (num_sources == 1)
> +  return types_are_mixed_float(src0_type, dst_type);
> +
> +   enum brw_reg_type src1_type = brw_inst_src1_type(devinfo, inst);
> +
> +   return types_are_mixed_float(src0_type, src1_type) ||
> +  types_are_mixed_float(src0_type, dst_type) ||
> +  types_are_mixed_float(src1_type, dst_type);
> +}
> +
>  /**
>   * Checks restrictions listed in "General Restrictions Based on Operand Types"
>   * in the "Register Region Restrictions" section.
> @@ -539,7 +599,101 @@ general_restrictions_based_on_operand_types(const struct gen_device_info *devinf
> exec_type_size == 8 && dst_type_size == 4)
>dst_type_size = 8;
>  
> -   if (exec_type_size > dst_type_size) {
> +   if (is_half_float_conversion(devinfo, inst)) {
> +  /**
> +   * Validate the following restriction
> +   * from the BDW+ PRM, Volume 2a, Command Reference, Instructions - MOV:
> +   *
> +   *"There is no direct conversion from HF to DF or DF to HF.
> +   * There is no direct conversion from HF to Q/UQ or Q/UQ to HF."
> +   *
> +   * Even if these restrictions are listed for the MOV instruction, we
> +   * validate this more generally, since there is the possibility
> +   * of implicit conversions from other instructions, such as implicit
> +   * conversion from integer to HF with the ADD instruction in SKL+.
> +   */
> +  enum brw_reg_type src0_type = brw_inst_src0_type(devinfo, inst);
> +  enum brw_reg_type src1_type = brw_inst_src1_type(devinfo, inst);

I think the evaluation of brw_inst_src1_type() above can blow up if the
instruction has a single source and the src1 type field happens to
overlap with something else (e.g. an immediate).
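
One way to guard against this, as a minimal standalone sketch: only
decode the src1 type when the instruction actually has a second source,
and treat it as invalid otherwise.  The v_* types and helpers below are
hypothetical stand-ins invented for the example, not the brw_inst API,
and the raw src1_bits field is only an assumption about how the
encoding overlap might look:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for illustration only; these are not the brw_inst API. */
enum v_type { V_TYPE_HF, V_TYPE_F, V_TYPE_DF, V_TYPE_Q, V_TYPE_INVALID };

struct v_inst {
   unsigned num_sources;     /* 1 or 2 in this sketch */
   enum v_type dst_type;
   enum v_type src0_type;
   /* Raw bits that hold the src1 type only when num_sources > 1; for a
    * single-source instruction they may be part of an immediate.
    */
   unsigned src1_bits;
};

static unsigned
v_type_sz(enum v_type t)
{
   switch (t) {
   case V_TYPE_HF: return 2;
   case V_TYPE_F:  return 4;
   case V_TYPE_DF:
   case V_TYPE_Q:  return 8;
   default:        return 0;
   }
}

/* Decode the src1 type only when a second source actually exists. */
static enum v_type
v_src1_type(const struct v_inst *inst)
{
   assert(inst->num_sources >= 1 && inst->num_sources <= 2);

   if (inst->num_sources < 2 || inst->src1_bits >= V_TYPE_INVALID)
      return V_TYPE_INVALID;

   return (enum v_type)inst->src1_bits;
}

/* True for direct conversions between HF and any 64-bit type. */
static bool
v_has_hf_64bit_conversion(const struct v_inst *inst)
{
   const enum v_type src1_type = v_src1_type(inst);
   const bool any_src_64 = v_type_sz(inst->src0_type) == 8 ||
                           v_type_sz(src1_type) == 8;
   const bool any_src_hf = inst->src0_type == V_TYPE_HF ||
                           src1_type == V_TYPE_HF;

   return (inst->dst_type == V_TYPE_HF && any_src_64) ||
          (v_type_sz(inst->dst_type) == 8 && any_src_hf);
}
```

Because the guarded helper returns an invalid type for single-source
instructions, the two checks no longer need a separate
"num_sources > 1" condition, and stray immediate bits can't be
misread as a 64-bit or HF source type.
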

> +  ERROR_IF(dst_type == BRW_REGISTER_TYPE_HF &&
> +   (type_sz(src0_type) == 8 ||
> +(num_sources > 1 && type_sz(src1_type) == 8)),
> +   "There are no direct conversions between 64-bit types and HF");
> +
> +  ERROR_IF(type_sz(dst_type) == 8 &&
> +   (src0_type == BRW_REGISTER_TYPE_HF ||
> +(num_sources > 1 && src1_type == BRW_REGISTER_TYPE_HF)),
> +   "There are no direct conversions 

Re: [Mesa-dev] [PATCH v6 31/38] intel/compiler: implement SIMD16 restrictions for mixed-float instructions

2019-03-27 Thread Francisco Jerez
"Juan A. Suarez Romero"  writes:

> From: Iago Toral Quiroga 
>
> ---
>  src/intel/compiler/brw_fs.cpp | 65 +++
>  1 file changed, 65 insertions(+)
>
> diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
> index 2fc7793709b..3616a7afc31 100644
> --- a/src/intel/compiler/brw_fs.cpp
> +++ b/src/intel/compiler/brw_fs.cpp
> @@ -5607,6 +5607,42 @@ fs_visitor::lower_logical_sends()
> return progress;
>  }
>  
> +static bool
> +is_mixed_float_with_fp32_dst(const fs_inst *inst)
> +{
> +   /* FIXME: This opcode sometimes uses :W type on the source
> +* for some reason even if the operand is a half-float, we
> +* should probably fix it to use the correct type.
> +*/
> +   if (inst->opcode == BRW_OPCODE_F16TO32)
> +  return true;
> +
> +   if (inst->dst.type != BRW_REGISTER_TYPE_F)
> +  return false;
> +
> +   for (int i = 0; i < inst->sources; i++) {
> +  if (inst->src[i].type == BRW_REGISTER_TYPE_HF)
> + return true;
> +   }
> +
> +   return false;
> +}
> +
> +static bool
> +is_mixed_float_with_packed_fp16_dst(const fs_inst *inst)
> +{

F32TO16 is also sometimes used with a W type as destination IIRC (since
it's a Gen7 instruction that predates HF types).  Not sure if it's ever
packed, but it might be sensible to return true here if that's ever the
case.

Other than that looks okay to me.
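
For illustration, the F32TO16 case could be folded in roughly as
follows.  This is a standalone sketch with toy t_* types invented for
the example (not fs_inst or the real opcode enums), and it assumes that
F32TO16 always produces half-float data even when its destination is
typed W:

```c
#include <stdbool.h>

/* Toy stand-ins for illustration only; these are not the fs_inst API. */
enum t_opcode { T_OPCODE_MOV, T_OPCODE_F32TO16 };
enum t_type { T_TYPE_F, T_TYPE_HF, T_TYPE_W };

struct t_reg {
   enum t_type type;
   unsigned stride;
};

struct t_inst {
   enum t_opcode opcode;
   struct t_reg dst;
   unsigned sources;
   struct t_reg src[3];
};

static bool
t_is_mixed_float_with_packed_fp16_dst(const struct t_inst *inst)
{
   /* Only packed destinations are of interest here. */
   if (inst->dst.stride != 1)
      return false;

   /* Assumption: F32TO16 writes half-float data even when dst is typed W. */
   if (inst->opcode == T_OPCODE_F32TO16)
      return true;

   if (inst->dst.type != T_TYPE_HF)
      return false;

   for (unsigned i = 0; i < inst->sources; i++) {
      if (inst->src[i].type == T_TYPE_F)
         return true;
   }

   return false;
}
```

The opcode test sits after the stride test so that a strided F32TO16
destination is still treated as not packed, which matches the "if
that's ever the case" caveat above.
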

> +   if (inst->dst.type != BRW_REGISTER_TYPE_HF ||
> +   inst->dst.stride != 1)
> +  return false;
> +
> +   for (int i = 0; i < inst->sources; i++) {
> +  if (inst->src[i].type == BRW_REGISTER_TYPE_F)
> + return true;
> +   }
> +
> +   return false;
> +}
> +
>  /**
>   * Get the closest allowed SIMD width for instruction \p inst accounting for
>   * some common regioning and execution control restrictions that apply to FPU
> @@ -5769,6 +5805,35 @@ get_fpu_lowered_simd_width(const struct gen_device_info *devinfo,
>   max_width = MIN2(max_width, 4);
> }
>  
> +   /* From the SKL PRM, Special Restrictions for Handling Mixed Mode
> +* Float Operations:
> +*
> +*"No SIMD16 in mixed mode when destination is f32. Instruction
> +* execution size must be no more than 8."
> +*
> +* FIXME: the simulator doesn't seem to complain if we don't do this and
> +* empirical testing with existing CTS tests show that they pass just fine
> +* without implementing this, however, since our interpretation of the PRM
> +* is that conversion MOVs between HF and F are still mixed-float
> +* instructions (and therefore subject to this restriction) we decided to
> +* split them to be safe. Might be useful to do additional investigation to
> +* lift the restriction if we can ensure that it is safe though, since these
> +* conversions are common when half-float types are involved since many
> +* instructions do not support HF types and conversions from/to F are
> +* required.
> +*/
> +   if (is_mixed_float_with_fp32_dst(inst))
> +  max_width = MIN2(max_width, 8);
> +
> +   /* From the SKL PRM, Special Restrictions for Handling Mixed Mode
> +* Float Operations:
> +*
> +*"No SIMD16 in mixed mode when destination is packed f16 for both
> +* Align1 and Align16."
> +*/
> +   if (is_mixed_float_with_packed_fp16_dst(inst))
> +  max_width = MIN2(max_width, 8);
> +
> /* Only power-of-two execution sizes are representable in the instruction
>  * control fields.
>  */
> -- 
> 2.20.1



Re: [Mesa-dev] [PATCH v4 34/40] intel/compiler: validate region restrictions for half-float conversions

2019-03-13 Thread Francisco Jerez
Iago Toral  writes:

> On Tue, 2019-03-12 at 15:44 -0700, Francisco Jerez wrote:
>> Iago Toral  writes:
>> 
>> > On Tue, 2019-03-05 at 07:35 +0100, Iago Toral wrote:
>> > > On Mon, 2019-03-04 at 15:36 -0800, Francisco Jerez wrote:
>> > > > Iago Toral  writes:
>> > > > 
>> > > > > On Fri, 2019-03-01 at 19:04 -0800, Francisco Jerez wrote:
>> > > > > > Iago Toral  writes:
>> > > > > > 
>> > > > > > > On Thu, 2019-02-28 at 09:54 -0800, Francisco Jerez wrote:
>> > > > > > > > Iago Toral  writes:
>> > > > > > > > 
>> > > > > > > > > On Wed, 2019-02-27 at 13:47 -0800, Francisco Jerez
>> > > > > > > > > wrote:
>> > > > > > > > > > Iago Toral  writes:
>> > > > > > > > > > 
>> > > > > > > > > > > On Tue, 2019-02-26 at 14:54 -0800, Francisco
>> > > > > > > > > > > Jerez
>> > > > > > > > > > > wrote:
>> > > > > > > > > > > > Iago Toral Quiroga  writes:
>> > > > > > > > > > > > 
>> > > > > > > > > > > > > ---
>> > > > > > > > > > > > >  src/intel/compiler/brw_eu_validate.c|  6
>> > > > > > > > > > > > > 4
>> > > > > > > > > > > > > -
>> > > > > > > > > > > > >  src/intel/compiler/test_eu_validate.cpp |
>> > > > > > > > > > > > > 122
>> > > > > > > > > > > > > 
>> > > > > > > > > > > > >  2 files changed, 185 insertions(+), 1
>> > > > > > > > > > > > > deletion(-
>> > > > > > > > > > > > > )
>> > > > > > > > > > > > > 
>> > > > > > > > > > > > > diff --git
>> > > > > > > > > > > > > a/src/intel/compiler/brw_eu_validate.c
>> > > > > > > > > > > > > b/src/intel/compiler/brw_eu_validate.c
>> > > > > > > > > > > > > index 000a05cb6ac..203641fecb9 100644
>> > > > > > > > > > > > > --- a/src/intel/compiler/brw_eu_validate.c
>> > > > > > > > > > > > > +++ b/src/intel/compiler/brw_eu_validate.c
>> > > > > > > > > > > > > @@ -531,7 +531,69 @@
>> > > > > > > > > > > > > general_restrictions_based_on_operand_types(c
>> > > > > > > > > > > > > onst
>> > > > > > > > > > > > > struct
>> > > > > > > > > > > > > gen_device_info *devinf
>> > > > > > > > > > > > > exec_type_size == 8 && dst_type_size
>> > > > > > > > > > > > > ==
>> > > > > > > > > > > > > 4)
>> > > > > > > > > > > > >dst_type_size = 8;
>> > > > > > > > > > > > >  
>> > > > > > > > > > > > > -   if (exec_type_size > dst_type_size) {
>> > > > > > > > > > > > > +   /* From the BDW+ PRM:
>> > > > > > > > > > > > > +*
>> > > > > > > > > > > > > +*"There is no direct conversion from
>> > > > > > > > > > > > > HF
>> > > > > > > > > > > > > to
>> > > > > > > > > > > > > DF
>> > > > > > > > > > > > > or
>> > > > > > > > > > > > > DF to
>> > > > > > > > > > > > > HF.
>> > > > > > > > > > > > > +* There is no direct conversion from
>> > > > > > > > > > > > > HF
>> > > > > > > > > > > > > to
>> > > > > >

Re: [Mesa-dev] [PATCH v4 34/40] intel/compiler: validate region restrictions for half-float conversions

2019-03-12 Thread Francisco Jerez
Francisco Jerez  writes:

> Iago Toral  writes:
>
>> On Tue, 2019-03-05 at 07:35 +0100, Iago Toral wrote:
>>> On Mon, 2019-03-04 at 15:36 -0800, Francisco Jerez wrote:
>>> > Iago Toral  writes:
>>> > 
>>> > > On Fri, 2019-03-01 at 19:04 -0800, Francisco Jerez wrote:
>>> > > > Iago Toral  writes:
>>> > > > 
>>> > > > > On Thu, 2019-02-28 at 09:54 -0800, Francisco Jerez wrote:
>>> > > > > > Iago Toral  writes:
>>> > > > > > 
>>> > > > > > > On Wed, 2019-02-27 at 13:47 -0800, Francisco Jerez wrote:
>>> > > > > > > > Iago Toral  writes:
>>> > > > > > > > 
>>> > > > > > > > > On Tue, 2019-02-26 at 14:54 -0800, Francisco Jerez
>>> > > > > > > > > wrote:
>>> > > > > > > > > > Iago Toral Quiroga  writes:
>>> > > > > > > > > > 
>>> > > > > > > > > > > ---
>>> > > > > > > > > > >  src/intel/compiler/brw_eu_validate.c|  64
>>> > > > > > > > > > > -
>>> > > > > > > > > > >  src/intel/compiler/test_eu_validate.cpp | 122
>>> > > > > > > > > > > 
>>> > > > > > > > > > >  2 files changed, 185 insertions(+), 1 deletion(-
>>> > > > > > > > > > > )
>>> > > > > > > > > > > 
>>> > > > > > > > > > > diff --git a/src/intel/compiler/brw_eu_validate.c
>>> > > > > > > > > > > b/src/intel/compiler/brw_eu_validate.c
>>> > > > > > > > > > > index 000a05cb6ac..203641fecb9 100644
>>> > > > > > > > > > > --- a/src/intel/compiler/brw_eu_validate.c
>>> > > > > > > > > > > +++ b/src/intel/compiler/brw_eu_validate.c
>>> > > > > > > > > > > @@ -531,7 +531,69 @@
>>> > > > > > > > > > > general_restrictions_based_on_operand_types(const
>>> > > > > > > > > > > struct
>>> > > > > > > > > > > gen_device_info *devinf
>>> > > > > > > > > > > exec_type_size == 8 && dst_type_size ==
>>> > > > > > > > > > > 4)
>>> > > > > > > > > > >dst_type_size = 8;
>>> > > > > > > > > > >  
>>> > > > > > > > > > > -   if (exec_type_size > dst_type_size) {
>>> > > > > > > > > > > +   /* From the BDW+ PRM:
>>> > > > > > > > > > > +*
>>> > > > > > > > > > > +*"There is no direct conversion from HF
>>> > > > > > > > > > > to
>>> > > > > > > > > > > DF
>>> > > > > > > > > > > or
>>> > > > > > > > > > > DF to
>>> > > > > > > > > > > HF.
>>> > > > > > > > > > > +* There is no direct conversion from HF
>>> > > > > > > > > > > to
>>> > > > > > > > > > > Q/UQ or
>>> > > > > > > > > > > Q/UQ to
>>> > > > > > > > > > > HF."
>>> > > > > > > > > > > +*/
>>> > > > > > > > > > > +   enum brw_reg_type src0_type =
>>> > > > > > > > > > > brw_inst_src0_type(devinfo,
>>> > > > > > > > > > > inst);
>>> > > > > > > > > > > +   ERROR_IF(brw_inst_opcode(devinfo, inst) ==
>>> > > > > > > > > > > BRW_OPCODE_MOV
>>> > > > > > > > > > > &&
>>> > > > > > > > > > 
>>> > > > > > > > > > Why is only the MOV instruction handled here and
>>> > > > > > > > > > below?  Aren't
>>

Re: [Mesa-dev] [PATCH v4 34/40] intel/compiler: validate region restrictions for half-float conversions

2019-03-12 Thread Francisco Jerez
Iago Toral  writes:

> On Wed, 2019-03-06 at 09:21 +0100, Iago Toral wrote:
>> On Tue, 2019-03-05 at 07:35 +0100, Iago Toral wrote:
>> > On Mon, 2019-03-04 at 15:36 -0800, Francisco Jerez wrote:
>> > > Iago Toral  writes:
>> > > 
>> > > > On Fri, 2019-03-01 at 19:04 -0800, Francisco Jerez wrote:
>> > > > > Iago Toral  writes:
>> > > > > 
>> > > > > > On Thu, 2019-02-28 at 09:54 -0800, Francisco Jerez wrote:
>> > > > > > > Iago Toral  writes:
>> > > > > > > 
>> > > > > > > > On Wed, 2019-02-27 at 13:47 -0800, Francisco Jerez
>> > > > > > > > wrote:
>> > > > > > > > > Iago Toral  writes:
>> > > > > > > > > 
>> > > > > > > > > > On Tue, 2019-02-26 at 14:54 -0800, Francisco Jerez
>> > > > > > > > > > wrote:
>> > > > > > > > > > > Iago Toral Quiroga  writes:
>> > > > > > > > > > > 
>> > > > > > > > > > > > ---
>> > > > > > > > > > > >  src/intel/compiler/brw_eu_validate.c|  64
>> > > > > > > > > > > > -
>> > > > > > > > > > > >  src/intel/compiler/test_eu_validate.cpp | 122
>> > > > > > > > > > > > 
>> > > > > > > > > > > >  2 files changed, 185 insertions(+), 1
>> > > > > > > > > > > > deletion(-
>> > > > > > > > > > > > )
>> > > > > > > > > > > > 
>> > > > > > > > > > > > diff --git
>> > > > > > > > > > > > a/src/intel/compiler/brw_eu_validate.c
>> > > > > > > > > > > > b/src/intel/compiler/brw_eu_validate.c
>> > > > > > > > > > > > index 000a05cb6ac..203641fecb9 100644
>> > > > > > > > > > > > --- a/src/intel/compiler/brw_eu_validate.c
>> > > > > > > > > > > > +++ b/src/intel/compiler/brw_eu_validate.c
>> > > > > > > > > > > > @@ -531,7 +531,69 @@
>> > > > > > > > > > > > general_restrictions_based_on_operand_types(con
>> > > > > > > > > > > > st
>> > > > > > > > > > > > struct
>> > > > > > > > > > > > gen_device_info *devinf
>> > > > > > > > > > > > exec_type_size == 8 && dst_type_size ==
>> > > > > > > > > > > > 4)
>> > > > > > > > > > > >dst_type_size = 8;
>> > > > > > > > > > > >  
>> > > > > > > > > > > > -   if (exec_type_size > dst_type_size) {
>> > > > > > > > > > > > +   /* From the BDW+ PRM:
>> > > > > > > > > > > > +*
>> > > > > > > > > > > > +*"There is no direct conversion from
>> > > > > > > > > > > > HF
>> > > > > > > > > > > > to
>> > > > > > > > > > > > DF
>> > > > > > > > > > > > or
>> > > > > > > > > > > > DF to
>> > > > > > > > > > > > HF.
>> > > > > > > > > > > > +* There is no direct conversion from
>> > > > > > > > > > > > HF
>> > > > > > > > > > > > to
>> > > > > > > > > > > > Q/UQ or
>> > > > > > > > > > > > Q/UQ to
>> > > > > > > > > > > > HF."
>> > > > > > > > > > > > +*/
>> > > > > > > > > > > > +   enum brw_reg_type src0_type =
>> > > > > > > > > > > > brw_inst_src0_type(devinfo,
>> > > > > > > > > > > > inst);
>> > > > > > > > > > > > +   ERROR_IF(brw

Re: [Mesa-dev] [PATCH v4 34/40] intel/compiler: validate region restrictions for half-float conversions

2019-03-12 Thread Francisco Jerez
Iago Toral  writes:

> On Tue, 2019-03-05 at 07:35 +0100, Iago Toral wrote:
>> On Mon, 2019-03-04 at 15:36 -0800, Francisco Jerez wrote:
>> > Iago Toral  writes:
>> > 
>> > > On Fri, 2019-03-01 at 19:04 -0800, Francisco Jerez wrote:
>> > > > Iago Toral  writes:
>> > > > 
>> > > > > On Thu, 2019-02-28 at 09:54 -0800, Francisco Jerez wrote:
>> > > > > > Iago Toral  writes:
>> > > > > > 
>> > > > > > > On Wed, 2019-02-27 at 13:47 -0800, Francisco Jerez wrote:
>> > > > > > > > Iago Toral  writes:
>> > > > > > > > 
>> > > > > > > > > On Tue, 2019-02-26 at 14:54 -0800, Francisco Jerez
>> > > > > > > > > wrote:
>> > > > > > > > > > Iago Toral Quiroga  writes:
>> > > > > > > > > > 
>> > > > > > > > > > > ---
>> > > > > > > > > > >  src/intel/compiler/brw_eu_validate.c|  64
>> > > > > > > > > > > -
>> > > > > > > > > > >  src/intel/compiler/test_eu_validate.cpp | 122
>> > > > > > > > > > > 
>> > > > > > > > > > >  2 files changed, 185 insertions(+), 1 deletion(-
>> > > > > > > > > > > )
>> > > > > > > > > > > 
>> > > > > > > > > > > diff --git a/src/intel/compiler/brw_eu_validate.c
>> > > > > > > > > > > b/src/intel/compiler/brw_eu_validate.c
>> > > > > > > > > > > index 000a05cb6ac..203641fecb9 100644
>> > > > > > > > > > > --- a/src/intel/compiler/brw_eu_validate.c
>> > > > > > > > > > > +++ b/src/intel/compiler/brw_eu_validate.c
>> > > > > > > > > > > @@ -531,7 +531,69 @@
>> > > > > > > > > > > general_restrictions_based_on_operand_types(const
>> > > > > > > > > > > struct
>> > > > > > > > > > > gen_device_info *devinf
>> > > > > > > > > > > exec_type_size == 8 && dst_type_size ==
>> > > > > > > > > > > 4)
>> > > > > > > > > > >dst_type_size = 8;
>> > > > > > > > > > >  
>> > > > > > > > > > > -   if (exec_type_size > dst_type_size) {
>> > > > > > > > > > > +   /* From the BDW+ PRM:
>> > > > > > > > > > > +*
>> > > > > > > > > > > +*"There is no direct conversion from HF
>> > > > > > > > > > > to
>> > > > > > > > > > > DF
>> > > > > > > > > > > or
>> > > > > > > > > > > DF to
>> > > > > > > > > > > HF.
>> > > > > > > > > > > +* There is no direct conversion from HF
>> > > > > > > > > > > to
>> > > > > > > > > > > Q/UQ or
>> > > > > > > > > > > Q/UQ to
>> > > > > > > > > > > HF."
>> > > > > > > > > > > +*/
>> > > > > > > > > > > +   enum brw_reg_type src0_type =
>> > > > > > > > > > > brw_inst_src0_type(devinfo,
>> > > > > > > > > > > inst);
>> > > > > > > > > > > +   ERROR_IF(brw_inst_opcode(devinfo, inst) ==
>> > > > > > > > > > > BRW_OPCODE_MOV
>> > > > > > > > > > > &&
>> > > > > > > > > > 
>> > > > > > > > > > Why is only the MOV instruction handled here and
>> > > > > > > > > > below?  Aren't
>> > > > > > > > > > other
>> > > > > > > > > > instructions able to do implicit
>> > > > > > > > > > conversions?  Probably
>> > > > > > > > > > means
>> > >

Re: [Mesa-dev] [PATCH v4 34/40] intel/compiler: validate region restrictions for half-float conversions

2019-03-12 Thread Francisco Jerez
Iago Toral  writes:

> On Mon, 2019-03-04 at 15:36 -0800, Francisco Jerez wrote:
>> Iago Toral  writes:
>> 
>> > On Fri, 2019-03-01 at 19:04 -0800, Francisco Jerez wrote:
>> > > Iago Toral  writes:
>> > > 
>> > > > On Thu, 2019-02-28 at 09:54 -0800, Francisco Jerez wrote:
>> > > > > Iago Toral  writes:
>> > > > > 
>> > > > > > On Wed, 2019-02-27 at 13:47 -0800, Francisco Jerez wrote:
>> > > > > > > Iago Toral  writes:
>> > > > > > > 
>> > > > > > > > On Tue, 2019-02-26 at 14:54 -0800, Francisco Jerez
>> > > > > > > > wrote:
>> > > > > > > > > Iago Toral Quiroga  writes:
>> > > > > > > > > 
>> > > > > > > > > > ---
>> > > > > > > > > >  src/intel/compiler/brw_eu_validate.c|  64
>> > > > > > > > > > -
>> > > > > > > > > >  src/intel/compiler/test_eu_validate.cpp | 122
>> > > > > > > > > > 
>> > > > > > > > > >  2 files changed, 185 insertions(+), 1 deletion(-)
>> > > > > > > > > > 
>> > > > > > > > > > diff --git a/src/intel/compiler/brw_eu_validate.c
>> > > > > > > > > > b/src/intel/compiler/brw_eu_validate.c
>> > > > > > > > > > index 000a05cb6ac..203641fecb9 100644
>> > > > > > > > > > --- a/src/intel/compiler/brw_eu_validate.c
>> > > > > > > > > > +++ b/src/intel/compiler/brw_eu_validate.c
>> > > > > > > > > > @@ -531,7 +531,69 @@
>> > > > > > > > > > general_restrictions_based_on_operand_types(const
>> > > > > > > > > > struct
>> > > > > > > > > > gen_device_info *devinf
>> > > > > > > > > > exec_type_size == 8 && dst_type_size == 4)
>> > > > > > > > > >dst_type_size = 8;
>> > > > > > > > > >  
>> > > > > > > > > > -   if (exec_type_size > dst_type_size) {
>> > > > > > > > > > +   /* From the BDW+ PRM:
>> > > > > > > > > > +*
>> > > > > > > > > > +*"There is no direct conversion from HF to
>> > > > > > > > > > DF
>> > > > > > > > > > or
>> > > > > > > > > > DF to
>> > > > > > > > > > HF.
>> > > > > > > > > > +* There is no direct conversion from HF to
>> > > > > > > > > > Q/UQ or
>> > > > > > > > > > Q/UQ to
>> > > > > > > > > > HF."
>> > > > > > > > > > +*/
>> > > > > > > > > > +   enum brw_reg_type src0_type =
>> > > > > > > > > > brw_inst_src0_type(devinfo,
>> > > > > > > > > > inst);
>> > > > > > > > > > +   ERROR_IF(brw_inst_opcode(devinfo, inst) ==
>> > > > > > > > > > BRW_OPCODE_MOV
>> > > > > > > > > > &&
>> > > > > > > > > 
>> > > > > > > > > Why is only the MOV instruction handled here and
>> > > > > > > > > below?  Aren't
>> > > > > > > > > other
>> > > > > > > > > instructions able to do implicit
>> > > > > > > > > conversions?  Probably
>> > > > > > > > > means
>> > > > > > > > > you
>> > > > > > > > > need
>> > > > > > > > > to deal with two sources rather than one.
>> > > > > > > > 
>> > > > > > > > This comes from the programming notes of the MOV
>> > > > > > > > instruction
>> > > > > > > > (Volume
>> > > > > > > > 2a, Command Reference - Instructions - MOV), so it is
>> > > > > > > > described
>> > > >

Re: [Mesa-dev] [PATCH v4 34/40] intel/compiler: validate region restrictions for half-float conversions

2019-03-04 Thread Francisco Jerez
Iago Toral  writes:

> On Fri, 2019-03-01 at 19:04 -0800, Francisco Jerez wrote:
>> Iago Toral  writes:
>> 
>> > On Thu, 2019-02-28 at 09:54 -0800, Francisco Jerez wrote:
>> > > Iago Toral  writes:
>> > > 
>> > > > On Wed, 2019-02-27 at 13:47 -0800, Francisco Jerez wrote:
>> > > > > Iago Toral  writes:
>> > > > > 
>> > > > > > On Tue, 2019-02-26 at 14:54 -0800, Francisco Jerez wrote:
>> > > > > > > Iago Toral Quiroga  writes:
>> > > > > > > 
>> > > > > > > > ---
>> > > > > > > >  src/intel/compiler/brw_eu_validate.c|  64
>> > > > > > > > -
>> > > > > > > >  src/intel/compiler/test_eu_validate.cpp | 122
>> > > > > > > > 
>> > > > > > > >  2 files changed, 185 insertions(+), 1 deletion(-)
>> > > > > > > > 
>> > > > > > > > diff --git a/src/intel/compiler/brw_eu_validate.c
>> > > > > > > > b/src/intel/compiler/brw_eu_validate.c
>> > > > > > > > index 000a05cb6ac..203641fecb9 100644
>> > > > > > > > --- a/src/intel/compiler/brw_eu_validate.c
>> > > > > > > > +++ b/src/intel/compiler/brw_eu_validate.c
>> > > > > > > > @@ -531,7 +531,69 @@
>> > > > > > > > general_restrictions_based_on_operand_types(const
>> > > > > > > > struct
>> > > > > > > > gen_device_info *devinf
>> > > > > > > > exec_type_size == 8 && dst_type_size == 4)
>> > > > > > > >dst_type_size = 8;
>> > > > > > > >  
>> > > > > > > > -   if (exec_type_size > dst_type_size) {
>> > > > > > > > +   /* From the BDW+ PRM:
>> > > > > > > > +*
>> > > > > > > > +*"There is no direct conversion from HF to DF
>> > > > > > > > or
>> > > > > > > > DF to
>> > > > > > > > HF.
>> > > > > > > > +* There is no direct conversion from HF to
>> > > > > > > > Q/UQ or
>> > > > > > > > Q/UQ to
>> > > > > > > > HF."
>> > > > > > > > +*/
>> > > > > > > > +   enum brw_reg_type src0_type =
>> > > > > > > > brw_inst_src0_type(devinfo,
>> > > > > > > > inst);
>> > > > > > > > +   ERROR_IF(brw_inst_opcode(devinfo, inst) ==
>> > > > > > > > BRW_OPCODE_MOV
>> > > > > > > > &&
>> > > > > > > 
>> > > > > > > Why is only the MOV instruction handled here and
>> > > > > > > below?  Aren't
>> > > > > > > other
>> > > > > > > instructions able to do implicit conversions?  Probably
>> > > > > > > means
>> > > > > > > you
>> > > > > > > need
>> > > > > > > to deal with two sources rather than one.
>> > > > > > 
>> > > > > > This comes from the programming notes of the MOV
>> > > > > > instruction
>> > > > > > (Volume
>> > > > > > 2a, Command Reference - Instructions - MOV), so it is
>> > > > > > described
>> > > > > > specifically for the MOV instruction. I should probably
>> > > > > > have
>> > > > > > made
>> > > > > > this
>> > > > > > clear in the comment.
>> > > > > > 
>> > > > > 
>> > > > > Maybe the one above is specified in the MOV page only,
>> > > > > probably
>> > > > > due
>> > > > > to
>> > > > > an oversight (If these restrictions were really specific to
>> > > > > the
>> > > > > MOV
>> > > > > instruction, what would prevent you from implementing such
>> > > > > conversions
>> > > > > through a different instruction?  E.g. "ADD dst:df, src:hf,
>

Re: [Mesa-dev] [PATCH v4 34/40] intel/compiler: validate region restrictions for half-float conversions

2019-03-01 Thread Francisco Jerez
Iago Toral  writes:

> On Fri, 2019-03-01 at 09:39 +0100, Iago Toral wrote:
>> On Thu, 2019-02-28 at 09:54 -0800, Francisco Jerez wrote:
>> > Iago Toral  writes:
>> > 
>> > > On Wed, 2019-02-27 at 13:47 -0800, Francisco Jerez wrote:
>> > > > Iago Toral  writes:
>> > > > 
>> > > > > On Tue, 2019-02-26 at 14:54 -0800, Francisco Jerez wrote:
>> > > > > > Iago Toral Quiroga  writes:
>> > > > > > 
>> > > > > > > ---
>> > > > > > >  src/intel/compiler/brw_eu_validate.c|  64
>> > > > > > > -
>> > > > > > >  src/intel/compiler/test_eu_validate.cpp | 122
>> > > > > > > 
>> > > > > > >  2 files changed, 185 insertions(+), 1 deletion(-)
>> > > > > > > 
>> > > > > > > diff --git a/src/intel/compiler/brw_eu_validate.c
>> > > > > > > b/src/intel/compiler/brw_eu_validate.c
>> > > > > > > index 000a05cb6ac..203641fecb9 100644
>> > > > > > > --- a/src/intel/compiler/brw_eu_validate.c
>> > > > > > > +++ b/src/intel/compiler/brw_eu_validate.c
>> > > > > > > @@ -531,7 +531,69 @@
>> > > > > > > general_restrictions_based_on_operand_types(const struct
>> > > > > > > gen_device_info *devinf
>> > > > > > > exec_type_size == 8 && dst_type_size == 4)
>> > > > > > >dst_type_size = 8;
>> > > > > > >  
>> > > > > > > -   if (exec_type_size > dst_type_size) {
>> > > > > > > +   /* From the BDW+ PRM:
>> > > > > > > +*
>> > > > > > > +*"There is no direct conversion from HF to DF or
>> > > > > > > DF to
>> > > > > > > HF.
>> > > > > > > +* There is no direct conversion from HF to Q/UQ
>> > > > > > > or
>> > > > > > > Q/UQ to
>> > > > > > > HF."
>> > > > > > > +*/
>> > > > > > > +   enum brw_reg_type src0_type =
>> > > > > > > brw_inst_src0_type(devinfo,
>> > > > > > > inst);
>> > > > > > > +   ERROR_IF(brw_inst_opcode(devinfo, inst) ==
>> > > > > > > BRW_OPCODE_MOV
>> > > > > > > &&
>> > > > > > 
>> > > > > > Why is only the MOV instruction handled here and
>> > > > > > below?  Aren't
>> > > > > > other
>> > > > > > instructions able to do implicit conversions?  Probably
>> > > > > > means
>> > > > > > you
>> > > > > > need
>> > > > > > to deal with two sources rather than one.
>> > > > > 
>> > > > > This comes from the programming notes of the MOV instruction
>> > > > > (Volume
>> > > > > 2a, Command Reference - Instructions - MOV), so it is
>> > > > > described
>> > > > > specifically for the MOV instruction. I should probably have
>> > > > > made
>> > > > > this
>> > > > > clear in the comment.
>> > > > > 
>> > > > 
>> > > > Maybe the one above is specified in the MOV page only, probably
>> > > > due
>> > > > to
>> > > > an oversight (If these restrictions were really specific to the
>> > > > MOV
>> > > > instruction, what would prevent you from implementing such
>> > > > conversions
>> > > > through a different instruction?  E.g. "ADD dst:df, src:hf, 0"
>> > > > which
>> > > > would be substantially more efficient than what you're doing in
>> > > > PATCH
>> > > > 02)
>> > > 
>> > > Instructions that take HF can only be strictly HF or mix F and HF
>> > > (mixed mode float), with MOV being the only exception. That means
>> > > that
>> > > any instruction like the one you use above are invalid. Maybe we
>> > > should
>> > > validate explicitly that instructions that use HF are strictly HF
>> > > or
>> > >

Re: [Mesa-dev] [PATCH v4 34/40] intel/compiler: validate region restrictions for half-float conversions

2019-03-01 Thread Francisco Jerez
Iago Toral  writes:

> On Thu, 2019-02-28 at 09:54 -0800, Francisco Jerez wrote:
>> Iago Toral  writes:
>> 
>> > On Wed, 2019-02-27 at 13:47 -0800, Francisco Jerez wrote:
>> > > Iago Toral  writes:
>> > > 
>> > > > On Tue, 2019-02-26 at 14:54 -0800, Francisco Jerez wrote:
>> > > > > Iago Toral Quiroga  writes:
>> > > > > 
>> > > > > > ---
>> > > > > >  src/intel/compiler/brw_eu_validate.c|  64
>> > > > > > -
>> > > > > >  src/intel/compiler/test_eu_validate.cpp | 122
>> > > > > > 
>> > > > > >  2 files changed, 185 insertions(+), 1 deletion(-)
>> > > > > > 
>> > > > > > diff --git a/src/intel/compiler/brw_eu_validate.c
>> > > > > > b/src/intel/compiler/brw_eu_validate.c
>> > > > > > index 000a05cb6ac..203641fecb9 100644
>> > > > > > --- a/src/intel/compiler/brw_eu_validate.c
>> > > > > > +++ b/src/intel/compiler/brw_eu_validate.c
>> > > > > > @@ -531,7 +531,69 @@
>> > > > > > general_restrictions_based_on_operand_types(const struct
>> > > > > > gen_device_info *devinf
>> > > > > > exec_type_size == 8 && dst_type_size == 4)
>> > > > > >dst_type_size = 8;
>> > > > > >  
>> > > > > > -   if (exec_type_size > dst_type_size) {
>> > > > > > +   /* From the BDW+ PRM:
>> > > > > > +*
>> > > > > > +*"There is no direct conversion from HF to DF or
>> > > > > > DF to
>> > > > > > HF.
>> > > > > > +* There is no direct conversion from HF to Q/UQ or
>> > > > > > Q/UQ to
>> > > > > > HF."
>> > > > > > +*/
>> > > > > > +   enum brw_reg_type src0_type =
>> > > > > > brw_inst_src0_type(devinfo,
>> > > > > > inst);
>> > > > > > +   ERROR_IF(brw_inst_opcode(devinfo, inst) ==
>> > > > > > BRW_OPCODE_MOV
>> > > > > > &&
>> > > > > 
>> > > > > Why is only the MOV instruction handled here and
>> > > > > below?  Aren't
>> > > > > other
>> > > > > instructions able to do implicit conversions?  Probably means
>> > > > > you
>> > > > > need
>> > > > > to deal with two sources rather than one.
>> > > > 
>> > > > This comes from the programming notes of the MOV instruction
>> > > > (Volume
>> > > > 2a, Command Reference - Instructions - MOV), so it is described
>> > > > specifically for the MOV instruction. I should probably have
>> > > > made
>> > > > this
>> > > > clear in the comment.
>> > > > 
>> > > 
>> > > Maybe the one above is specified in the MOV page only, probably
>> > > due to an oversight (if these restrictions were really specific to
>> > > the MOV instruction, what would prevent you from implementing such
>> > > conversions through a different instruction?  E.g. "ADD dst:df,
>> > > src:hf, 0" which would be substantially more efficient than what
>> > > you're doing in PATCH 02)
>> > 
>> > Instructions that take HF can only be strictly HF or mix F and HF
>> > (mixed mode float), with MOV being the only exception. That means that
>> > any instruction like the one you use above is invalid. Maybe we should
>> > validate explicitly that instructions that use HF are strictly HF or
>> > mixed-float mode only?
>> > 
>> 
>> So you're acknowledging that the conversions checked above are illegal
>> whether the instruction is a MOV or something else.  Where are we
>> preventing instructions other than MOV with such conversions from being
>> accepted by the validator?
>
> We aren't, because the validator is not checking, in general, for
> accepted type combinations for specific instructions anywhere as far as
> I can see.

Luckily these type conversion restrictions aren't really specif

Re: [Mesa-dev] [PATCH v4 34/40] intel/compiler: validate region restrictions for half-float conversions

2019-02-28 Thread Francisco Jerez
Iago Toral  writes:

> On Wed, 2019-02-27 at 13:47 -0800, Francisco Jerez wrote:
>> Iago Toral  writes:
>> 
>> > On Tue, 2019-02-26 at 14:54 -0800, Francisco Jerez wrote:
>> > > Iago Toral Quiroga  writes:
>> > > 
>> > > > ---
>> > > >  src/intel/compiler/brw_eu_validate.c|  64 -
>> > > >  src/intel/compiler/test_eu_validate.cpp | 122
>> > > > 
>> > > >  2 files changed, 185 insertions(+), 1 deletion(-)
>> > > > 
>> > > > diff --git a/src/intel/compiler/brw_eu_validate.c
>> > > > b/src/intel/compiler/brw_eu_validate.c
>> > > > index 000a05cb6ac..203641fecb9 100644
>> > > > --- a/src/intel/compiler/brw_eu_validate.c
>> > > > +++ b/src/intel/compiler/brw_eu_validate.c
>> > > > @@ -531,7 +531,69 @@
>> > > > general_restrictions_based_on_operand_types(const struct
>> > > > gen_device_info *devinf
>> > > > exec_type_size == 8 && dst_type_size == 4)
>> > > >dst_type_size = 8;
>> > > >  
>> > > > -   if (exec_type_size > dst_type_size) {
>> > > > +   /* From the BDW+ PRM:
>> > > > +*
>> > > > +*"There is no direct conversion from HF to DF or DF to
>> > > > HF.
>> > > > +* There is no direct conversion from HF to Q/UQ or
>> > > > Q/UQ to
>> > > > HF."
>> > > > +*/
>> > > > +   enum brw_reg_type src0_type = brw_inst_src0_type(devinfo,
>> > > > inst);
>> > > > +   ERROR_IF(brw_inst_opcode(devinfo, inst) == BRW_OPCODE_MOV
>> > > > &&
>> > > 
>> > > Why is only the MOV instruction handled here and below?  Aren't
>> > > other instructions able to do implicit conversions?  Probably means
>> > > you need to deal with two sources rather than one.
>> > 
>> > This comes from the programming notes of the MOV instruction (Volume
>> > 2a, Command Reference - Instructions - MOV), so it is described
>> > specifically for the MOV instruction. I should probably have made
>> > this clear in the comment.
>> > 
>> 
>> Maybe the one above is specified in the MOV page only, probably due to
>> an oversight (if these restrictions were really specific to the MOV
>> instruction, what would prevent you from implementing such conversions
>> through a different instruction?  E.g. "ADD dst:df, src:hf, 0" which
>> would be substantially more efficient than what you're doing in PATCH
>> 02)
>
> Instructions that take HF can only be strictly HF or mix F and HF
> (mixed mode float), with MOV being the only exception. That means that
> any instruction like the one you use above is invalid. Maybe we should
> validate explicitly that instructions that use HF are strictly HF or
> mixed-float mode only?
>

So you're acknowledging that the conversions checked above are illegal
whether the instruction is a MOV or something else.  Where are we
preventing instructions other than MOV with such conversions from being
accepted by the validator?

>>  but I see other restriction checks in this patch which are
>> certainly specified in the generic regioning restrictions page and
>> you're limiting to the MOV instruction...
>
> There are two rules below:
>
> 1. The one about conversions between integer and half-float. Again,
> these can only happen through MOV for the same reasons, so I think this
> one should be fine.
>

Why do you think that can only happen through a MOV instruction?  The
hardware spec lists the following as a valid example in the register
region restrictions page:

| add (8) r10.0<2>:hf r11.0<8;8,1>:w r12.0<8;8,1>:w
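
For reference, the destination region in that example does satisfy the
DWord rule quoted in the patch: an HF destination with stride 2 advances
2 * 2 = 4 bytes per channel and starts at subregister offset 0.  A
minimal standalone sketch of that arithmetic follows.  The helper name
and its parameters are hypothetical, not the validator's real interface:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not part of brw_eu_validate.c.  It checks the rule
 * "Conversion between Integer and HF must be DWord-aligned and strided by
 * a DWord on the destination" for a destination described by its element
 * size in bytes, its horizontal stride in elements and its subregister
 * offset in bytes.
 */
static bool
dst_is_dword_strided_and_aligned(unsigned type_size, unsigned stride,
                                 unsigned subreg_bytes)
{
   assert(type_size > 0);

   return stride * type_size == 4 && subreg_bytes % 4 == 0;
}
```

With r10.0<2>:hf the arguments are (2, 2, 0), which passes.  A packed
<1>:hf destination, or one starting at byte offset 2, would not.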

> 2. The one about word destinations (of which we are only really
> implementing conversions from F->HF). Here the rule is more generic and
> I agree that expanding this to include any other mixed float mode
> instruction would make sense. However, validation for mixed float mode
> has its own set of rules, some of which are incompatible with the general
> region restrictions being validated here, so I think it is inconvenient
> to try and do that validation here (see patch 36 and then patch 37).
> What I would propose, if you agree, is that we only implement this for
> MOV here, and then for mixed float mode instructions, we implement the
> more generic versio

Re: [Mesa-dev] [PATCH v5 33/40] intel/compiler: also set F execution type for mixed float mode in BDW

2019-02-28 Thread Francisco Jerez
Iago Toral  writes:

> On Wed, 2019-02-27 at 15:44 -0800, Francisco Jerez wrote:
>> Iago Toral Quiroga  writes:
>> 
>> > The section 'Execution Data Types' of 3D Media GPGPU volume, which
>> > describes execution types, is exactly the same in BDW and SKL+.
>> > 
>> > Also, this section states that there is a single execution type, so
>> > it makes sense that this is the wider of the two floating point types
>> > involved in mixed float mode, which is what we do for SKL+ and CHV.
>> > 
>> > v2:
>> >  - Make sure we also account for the destination type in mixed mode
>> > (Curro).
>> > ---
>> >  src/intel/compiler/brw_eu_validate.c | 39 +---
>> > 
>> >  1 file changed, 24 insertions(+), 15 deletions(-)
>> > 
>> > diff --git a/src/intel/compiler/brw_eu_validate.c
>> > b/src/intel/compiler/brw_eu_validate.c
>> > index 358a0347a93..e0010f0fb07 100644
>> > --- a/src/intel/compiler/brw_eu_validate.c
>> > +++ b/src/intel/compiler/brw_eu_validate.c
>> > @@ -348,6 +348,17 @@ is_unsupported_inst(const struct
>> > gen_device_info *devinfo,
>> > return brw_opcode_desc(devinfo, brw_inst_opcode(devinfo, inst))
>> > == NULL;
>> >  }
>> >  
>> > +/**
>> > + * Returns whether a combination of two types would qualify as
>> > mixed float
>> > + * operation mode
>> > + */
>> > +static inline bool
>> > +types_are_mixed_float(enum brw_reg_type t0, enum brw_reg_type t1)
>> > +{
>> > +   return (t0 == BRW_REGISTER_TYPE_F && t1 ==
>> > BRW_REGISTER_TYPE_HF) ||
>> > +  (t1 == BRW_REGISTER_TYPE_F && t0 ==
>> > BRW_REGISTER_TYPE_HF);
>> > +}
>> > +
>> >  static enum brw_reg_type
>> >  execution_type_for_type(enum brw_reg_type type)
>> >  {
>> > @@ -390,20 +401,24 @@ execution_type(const struct gen_device_info
>> > *devinfo, const brw_inst *inst)
>> > enum brw_reg_type src0_exec_type, src1_exec_type;
>> >  
>> > /* Execution data type is independent of destination data type,
>> > except in
>> > -* mixed F/HF instructions on CHV and SKL+.
>> > +* mixed F/HF instructions.
>> >  */
>> > enum brw_reg_type dst_exec_type = brw_inst_dst_type(devinfo,
>> > inst);
>> >  
>> > src0_exec_type =
>> > execution_type_for_type(brw_inst_src0_type(devinfo, inst));
>> > if (num_sources == 1) {
>> > -  if ((devinfo->gen >= 9 || devinfo->is_cherryview) &&
>> > -  src0_exec_type == BRW_REGISTER_TYPE_HF) {
>> > +  if (src0_exec_type == BRW_REGISTER_TYPE_HF)
>> >   return dst_exec_type;
>> > -  }
>> >return src0_exec_type;
>> > }
>> >  
>> > src1_exec_type =
>> > execution_type_for_type(brw_inst_src1_type(devinfo, inst));
>> > +   if (types_are_mixed_float(src0_exec_type, src1_exec_type) ||
>> > +   types_are_mixed_float(src0_exec_type, dst_exec_type) ||
>> > +   types_are_mixed_float(src1_exec_type, dst_exec_type)) {
>> > +  return BRW_REGISTER_TYPE_F;
>> > +   }
>> > +
>> > if (src0_exec_type == src1_exec_type)
>> >return src0_exec_type;
>> >  
>> > @@ -431,18 +446,12 @@ execution_type(const struct gen_device_info
>> > *devinfo, const brw_inst *inst)
>> > src1_exec_type == BRW_REGISTER_TYPE_DF)
>> >return BRW_REGISTER_TYPE_DF;
>> >  
>> > -   if (devinfo->gen >= 9 || devinfo->is_cherryview) {
>> > -  if (dst_exec_type == BRW_REGISTER_TYPE_F ||
>> > -  src0_exec_type == BRW_REGISTER_TYPE_F ||
>> > -  src1_exec_type == BRW_REGISTER_TYPE_F) {
>> > - return BRW_REGISTER_TYPE_F;
>> > -  } else {
>> > - return BRW_REGISTER_TYPE_HF;
>> > -  }
>> > -   }
>> > +   if (src0_exec_type == BRW_REGISTER_TYPE_F ||
>> > +   src1_exec_type == BRW_REGISTER_TYPE_F)
>> > +  return BRW_REGISTER_TYPE_F;
>> >  
>> > -   assert(src0_exec_type == BRW_REGISTER_TYPE_F);
>> > -   return BRW_REGISTER_TYPE_F;
>> > +   assert(src0_exec_type == BRW_REGISTER_TYPE_HF);
>> > +   return BRW_REGISTER_TYPE_HF;
>> 
>> Not really convinced the function is fully correct, but it should be
>> strictly better with this patch:
>
> Is it because of this patch in particular or are you talking about the
> function in general?
>

Talking about the function in general, patch looks okay to me.

>> Acked-by: Francisco Jerez 
>> 
>> >  }
>> >  
>> >  /**
>> > -- 
>> > 2.17.1



Re: [Mesa-dev] [PATCH v5 33/40] intel/compiler: also set F execution type for mixed float mode in BDW

2019-02-27 Thread Francisco Jerez
Iago Toral Quiroga  writes:

> The section 'Execution Data Types' of 3D Media GPGPU volume, which
> describes execution types, is exactly the same in BDW and SKL+.
>
> Also, this section states that there is a single execution type, so it
> makes sense that this is the wider of the two floating point types
> involved in mixed float mode, which is what we do for SKL+ and CHV.
>
> v2:
>  - Make sure we also account for the destination type in mixed mode (Curro).
> ---
>  src/intel/compiler/brw_eu_validate.c | 39 +---
>  1 file changed, 24 insertions(+), 15 deletions(-)
>
> diff --git a/src/intel/compiler/brw_eu_validate.c 
> b/src/intel/compiler/brw_eu_validate.c
> index 358a0347a93..e0010f0fb07 100644
> --- a/src/intel/compiler/brw_eu_validate.c
> +++ b/src/intel/compiler/brw_eu_validate.c
> @@ -348,6 +348,17 @@ is_unsupported_inst(const struct gen_device_info 
> *devinfo,
> return brw_opcode_desc(devinfo, brw_inst_opcode(devinfo, inst)) == NULL;
>  }
>  
> +/**
> + * Returns whether a combination of two types would qualify as mixed float
> + * operation mode
> + */
> +static inline bool
> +types_are_mixed_float(enum brw_reg_type t0, enum brw_reg_type t1)
> +{
> +   return (t0 == BRW_REGISTER_TYPE_F && t1 == BRW_REGISTER_TYPE_HF) ||
> +  (t1 == BRW_REGISTER_TYPE_F && t0 == BRW_REGISTER_TYPE_HF);
> +}
> +
>  static enum brw_reg_type
>  execution_type_for_type(enum brw_reg_type type)
>  {
> @@ -390,20 +401,24 @@ execution_type(const struct gen_device_info *devinfo, 
> const brw_inst *inst)
> enum brw_reg_type src0_exec_type, src1_exec_type;
>  
> /* Execution data type is independent of destination data type, except in
> -* mixed F/HF instructions on CHV and SKL+.
> +* mixed F/HF instructions.
>  */
> enum brw_reg_type dst_exec_type = brw_inst_dst_type(devinfo, inst);
>  
> src0_exec_type = execution_type_for_type(brw_inst_src0_type(devinfo, 
> inst));
> if (num_sources == 1) {
> -  if ((devinfo->gen >= 9 || devinfo->is_cherryview) &&
> -  src0_exec_type == BRW_REGISTER_TYPE_HF) {
> +  if (src0_exec_type == BRW_REGISTER_TYPE_HF)
>   return dst_exec_type;
> -  }
>return src0_exec_type;
> }
>  
> src1_exec_type = execution_type_for_type(brw_inst_src1_type(devinfo, 
> inst));
> +   if (types_are_mixed_float(src0_exec_type, src1_exec_type) ||
> +   types_are_mixed_float(src0_exec_type, dst_exec_type) ||
> +   types_are_mixed_float(src1_exec_type, dst_exec_type)) {
> +  return BRW_REGISTER_TYPE_F;
> +   }
> +
> if (src0_exec_type == src1_exec_type)
>return src0_exec_type;
>  
> @@ -431,18 +446,12 @@ execution_type(const struct gen_device_info *devinfo, 
> const brw_inst *inst)
> src1_exec_type == BRW_REGISTER_TYPE_DF)
>return BRW_REGISTER_TYPE_DF;
>  
> -   if (devinfo->gen >= 9 || devinfo->is_cherryview) {
> -  if (dst_exec_type == BRW_REGISTER_TYPE_F ||
> -  src0_exec_type == BRW_REGISTER_TYPE_F ||
> -  src1_exec_type == BRW_REGISTER_TYPE_F) {
> - return BRW_REGISTER_TYPE_F;
> -  } else {
> - return BRW_REGISTER_TYPE_HF;
> -  }
> -   }
> +   if (src0_exec_type == BRW_REGISTER_TYPE_F ||
> +   src1_exec_type == BRW_REGISTER_TYPE_F)
> +  return BRW_REGISTER_TYPE_F;
>  
> -   assert(src0_exec_type == BRW_REGISTER_TYPE_F);
> -   return BRW_REGISTER_TYPE_F;
> +   assert(src0_exec_type == BRW_REGISTER_TYPE_HF);
> +   return BRW_REGISTER_TYPE_HF;

Not really convinced the function is fully correct, but it should be
strictly better with this patch:

Acked-by: Francisco Jerez 

>  }
>  
>  /**
> -- 
> 2.17.1



Re: [Mesa-dev] [PATCH v4 34/40] intel/compiler: validate region restrictions for half-float conversions

2019-02-27 Thread Francisco Jerez
Iago Toral  writes:

> On Tue, 2019-02-26 at 14:54 -0800, Francisco Jerez wrote:
>> Iago Toral Quiroga  writes:
>> 
>> > ---
>> >  src/intel/compiler/brw_eu_validate.c|  64 -
>> >  src/intel/compiler/test_eu_validate.cpp | 122
>> > 
>> >  2 files changed, 185 insertions(+), 1 deletion(-)
>> > 
>> > diff --git a/src/intel/compiler/brw_eu_validate.c
>> > b/src/intel/compiler/brw_eu_validate.c
>> > index 000a05cb6ac..203641fecb9 100644
>> > --- a/src/intel/compiler/brw_eu_validate.c
>> > +++ b/src/intel/compiler/brw_eu_validate.c
>> > @@ -531,7 +531,69 @@
>> > general_restrictions_based_on_operand_types(const struct
>> > gen_device_info *devinf
>> > exec_type_size == 8 && dst_type_size == 4)
>> >dst_type_size = 8;
>> >  
>> > -   if (exec_type_size > dst_type_size) {
>> > +   /* From the BDW+ PRM:
>> > +*
>> > +*"There is no direct conversion from HF to DF or DF to HF.
>> > +* There is no direct conversion from HF to Q/UQ or Q/UQ to
>> > HF."
>> > +*/
>> > +   enum brw_reg_type src0_type = brw_inst_src0_type(devinfo,
>> > inst);
>> > +   ERROR_IF(brw_inst_opcode(devinfo, inst) == BRW_OPCODE_MOV &&
>> 
>> Why is only the MOV instruction handled here and below?  Aren't other
>> instructions able to do implicit conversions?  Probably means you need
>> to deal with two sources rather than one.
>
> This comes from the programming notes of the MOV instruction (Volume
> 2a, Command Reference - Instructions - MOV), so it is described
> specifically for the MOV instruction. I should probably have made this
> clear in the comment.
>

Maybe the one above is specified in the MOV page only, probably due to
an oversight (if these restrictions were really specific to the MOV
instruction, what would prevent you from implementing such conversions
through a different instruction?  E.g. "ADD dst:df, src:hf, 0" which
would be substantially more efficient than what you're doing in PATCH
02), but I see other restriction checks in this patch which are
certainly specified in the generic regioning restrictions page and
you're limiting to the MOV instruction...
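
To make the suggestion concrete, here is a rough sketch of what an
opcode-independent version of the HF/64-bit check could look like, where
the destination is compared against every source rather than src0 of a
MOV only.  Everything in it (the reduced enum, type_size(),
has_hf_64bit_conversion()) is hypothetical and only illustrates the
idea.  Whether this is the right generalization for the hardware is an
assumption, not something the PRM text quoted here confirms:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced register type enum for illustration only. */
enum reg_type { TYPE_HF, TYPE_F, TYPE_DF, TYPE_Q, TYPE_UQ, TYPE_W, TYPE_D };

static unsigned
type_size(enum reg_type t)
{
   switch (t) {
   case TYPE_HF: case TYPE_W:                return 2;
   case TYPE_F:  case TYPE_D:                return 4;
   case TYPE_DF: case TYPE_Q: case TYPE_UQ:  return 8;
   }
   assert(0 && "unknown type");
   return 0;
}

/* True if the pair (a, b) mixes HF with a 64-bit type. */
static bool
is_hf_64bit_pair(enum reg_type a, enum reg_type b)
{
   return (a == TYPE_HF && type_size(b) == 8) ||
          (b == TYPE_HF && type_size(a) == 8);
}

/* Opcode-independent variant: compares the destination with each source. */
static bool
has_hf_64bit_conversion(enum reg_type dst, const enum reg_type *srcs,
                        unsigned num_srcs)
{
   for (unsigned i = 0; i < num_srcs; i++) {
      if (is_hf_64bit_pair(dst, srcs[i]))
         return true;
   }
   return false;
}
```

Under this model a two-source instruction with a DF destination and an
HF source is flagged regardless of its opcode, while a mixed F/HF
instruction is not.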

>> > +((dst_type == BRW_REGISTER_TYPE_HF &&
>> > type_sz(src0_type) == 8) ||
>> > + (dst_type_size == 8 && src0_type ==
>> > BRW_REGISTER_TYPE_HF)),
>> > +"There are no direct conversion between 64-bit types
>> > and HF");
>> > +
>> > +   /* From the BDW+ PRM:
>> > +*
>> > +*   "Conversion between Integer and HF (Half Float) must be
>> > +*DWord-aligned and strided by a DWord on the destination."
>> > +*
>> > +* But this seems to be expanded on CHV and SKL+ by:
>> > +*
>> > +*   "There is a relaxed alignment rule for word destinations.
>> > When
>> > +*the destination type is word (UW, W, HF), destination
>> > data types
>> > +*can be aligned to either the lowest word or the second
>> > lowest
>> > +*word of the execution channel. This means the destination
>> > data
>> > +*words can be either all in the even word locations or all
>> > in the
>> > +*odd word locations."
>> > +*
>> > +* We do not implement the second rule as is though, since
>> > empirical testing
>> > +* shows inconsistencies:
>> > +*   - It suggests that packed 16-bit is not allowed, which is
>> > not true.
>> > +*   - It suggests that conversions from Q/DF to W (which need
>> > to be 64-bit
>> > +* aligned on the destination) are not possible, which is
>> > not true.
>> > +*   - It suggests that conversions from 16-bit executions
>> > types to W need
>> > +* to be 32-bit aligned, which doesn't seem to be
>> > necessary.
>> > +*
>> > +* So from this rule we only validate the implication that
>> > conversion from
>> > +* F to HF needs to be DWord aligned too (in BDW this is
>> > limited to
>> > +* conversions from integer types).
>> > +*/
>> > +   bool is_half_float_conversion =
>> > +   brw_inst_opcode(devinfo, inst) == BRW_OPCODE_MOV &&
>> > +   dst_type != src0_type &&
>> > +   (dst_type == BRW_REGISTE

Re: [Mesa-dev] [PATCH v4 35/40] intel/compiler: validate conversions between 64-bit and 8-bit types

2019-02-26 Thread Francisco Jerez
Iago Toral Quiroga  writes:

> ---
>  src/intel/compiler/brw_eu_validate.c| 10 +-
>  src/intel/compiler/test_eu_validate.cpp | 46 +
>  2 files changed, 55 insertions(+), 1 deletion(-)
>
> diff --git a/src/intel/compiler/brw_eu_validate.c 
> b/src/intel/compiler/brw_eu_validate.c
> index 203641fecb9..b1fdd1ce941 100644
> --- a/src/intel/compiler/brw_eu_validate.c
> +++ b/src/intel/compiler/brw_eu_validate.c
> @@ -533,10 +533,18 @@ general_restrictions_based_on_operand_types(const 
> struct gen_device_info *devinf
>  
> /* From the BDW+ PRM:
>  *
> -*"There is no direct conversion from HF to DF or DF to HF.
> +*"There is no direct conversion from B/UB to DF or DF to B/UB.
> +* There is no direct conversion from B/UB to Q/UQ or Q/UQ to B/UB.
> +* There is no direct conversion from HF to DF or DF to HF.
>  * There is no direct conversion from HF to Q/UQ or Q/UQ to HF."
>  */
> enum brw_reg_type src0_type = brw_inst_src0_type(devinfo, inst);
> +
> +   ERROR_IF(brw_inst_opcode(devinfo, inst) == BRW_OPCODE_MOV &&
> +((dst_type_size == 1 && type_sz(src0_type) == 8) ||
> + (dst_type_size == 8 && type_sz(src0_type) == 1)),
> +"There are no direct conversion between 64-bit types and B/UB");
> +

Same comment here as for the last patch -- Why is this only handling the
MOV instruction?

> ERROR_IF(brw_inst_opcode(devinfo, inst) == BRW_OPCODE_MOV &&
>  ((dst_type == BRW_REGISTER_TYPE_HF && type_sz(src0_type) == 8) ||
>   (dst_type_size == 8 && src0_type == BRW_REGISTER_TYPE_HF)),
> diff --git a/src/intel/compiler/test_eu_validate.cpp 
> b/src/intel/compiler/test_eu_validate.cpp
> index 1557b6d2452..06beb53eb5d 100644
> --- a/src/intel/compiler/test_eu_validate.cpp
> +++ b/src/intel/compiler/test_eu_validate.cpp
> @@ -848,6 +848,52 @@ TEST_P(validation_test, 
> byte_destination_relaxed_alignment)
> }
>  }
>  
> +TEST_P(validation_test, byte_64bit_conversion)
> +{
> +   static const struct {
> +  enum brw_reg_type dst_type;
> +  enum brw_reg_type src_type;
> +  unsigned dst_stride;
> +  bool expected_result;
> +   } inst[] = {
> +#define INST(dst_type, src_type, dst_stride, expected_result) \
> +  {   \
> + BRW_REGISTER_TYPE_##dst_type,\
> + BRW_REGISTER_TYPE_##src_type,\
> + BRW_HORIZONTAL_STRIDE_##dst_stride,  \
> + expected_result, \
> +  }
> +
> +  INST(B,  Q, 1, false),
> +  INST(B, UQ, 1, false),
> +  INST(B, DF, 1, false),
> +
> +  INST(B,  Q, 2, false),
> +  INST(B, UQ, 2, false),
> +  INST(B, DF, 2, false),
> +
> +  INST(B,  Q, 4, false),
> +  INST(B, UQ, 4, false),
> +  INST(B, DF, 4, false),
> +
> +#undef INST
> +   };
> +
> +   if (devinfo.gen < 8)
> +  return;
> +
> +   for (unsigned i = 0; i < sizeof(inst) / sizeof(inst[0]); i++) {
> +  if (!devinfo.has_64bit_types && type_sz(inst[i].src_type) == 8)
> + continue;
> +
> +  brw_MOV(p, retype(g0, inst[i].dst_type), retype(g0, inst[i].src_type));
> +  brw_inst_set_dst_hstride(&devinfo, last_inst, inst[i].dst_stride);
> +  EXPECT_EQ(inst[i].expected_result, validate(p));
> +
> +  clear_instructions(p);
> +   }
> +}
> +
>  TEST_P(validation_test, half_float_conversion)
>  {
> static const struct {
> -- 
> 2.17.1
>



Re: [Mesa-dev] [PATCH v4 34/40] intel/compiler: validate region restrictions for half-float conversions

2019-02-26 Thread Francisco Jerez
Iago Toral Quiroga  writes:

> ---
>  src/intel/compiler/brw_eu_validate.c|  64 -
>  src/intel/compiler/test_eu_validate.cpp | 122 
>  2 files changed, 185 insertions(+), 1 deletion(-)
>
> diff --git a/src/intel/compiler/brw_eu_validate.c 
> b/src/intel/compiler/brw_eu_validate.c
> index 000a05cb6ac..203641fecb9 100644
> --- a/src/intel/compiler/brw_eu_validate.c
> +++ b/src/intel/compiler/brw_eu_validate.c
> @@ -531,7 +531,69 @@ general_restrictions_based_on_operand_types(const struct 
> gen_device_info *devinf
> exec_type_size == 8 && dst_type_size == 4)
>dst_type_size = 8;
>  
> -   if (exec_type_size > dst_type_size) {
> +   /* From the BDW+ PRM:
> +*
> +*"There is no direct conversion from HF to DF or DF to HF.
> +* There is no direct conversion from HF to Q/UQ or Q/UQ to HF."
> +*/
> +   enum brw_reg_type src0_type = brw_inst_src0_type(devinfo, inst);
> +   ERROR_IF(brw_inst_opcode(devinfo, inst) == BRW_OPCODE_MOV &&

Why is only the MOV instruction handled here and below?  Aren't other
instructions able to do implicit conversions?  Probably means you need
to deal with two sources rather than one.

> +((dst_type == BRW_REGISTER_TYPE_HF && type_sz(src0_type) == 8) ||
> + (dst_type_size == 8 && src0_type == BRW_REGISTER_TYPE_HF)),
> +"There are no direct conversion between 64-bit types and HF");
> +
> +   /* From the BDW+ PRM:
> +*
> +*   "Conversion between Integer and HF (Half Float) must be
> +*DWord-aligned and strided by a DWord on the destination."
> +*
> +* But this seems to be expanded on CHV and SKL+ by:
> +*
> +*   "There is a relaxed alignment rule for word destinations. When
> +*the destination type is word (UW, W, HF), destination data types
> +*can be aligned to either the lowest word or the second lowest
> +*word of the execution channel. This means the destination data
> +*words can be either all in the even word locations or all in the
> +*odd word locations."
> +*
> +* We do not implement the second rule as is though, since empirical 
> testing
> +* shows inconsistencies:
> +*   - It suggests that packed 16-bit is not allowed, which is not true.
> +*   - It suggests that conversions from Q/DF to W (which need to be 
> 64-bit
> +* aligned on the destination) are not possible, which is not true.
> +*   - It suggests that conversions from 16-bit executions types to W need
> +* to be 32-bit aligned, which doesn't seem to be necessary.
> +*
> +* So from this rule we only validate the implication that conversion from
> +* F to HF needs to be DWord aligned too (in BDW this is limited to
> +* conversions from integer types).
> +*/
> +   bool is_half_float_conversion =
> +   brw_inst_opcode(devinfo, inst) == BRW_OPCODE_MOV &&
> +   dst_type != src0_type &&
> +   (dst_type == BRW_REGISTER_TYPE_HF || src0_type == 
> BRW_REGISTER_TYPE_HF);
> +
> +   if (is_half_float_conversion) {
> +  assert(devinfo->gen >= 8);
> +
> +  if ((dst_type == BRW_REGISTER_TYPE_HF && 
> brw_reg_type_is_integer(src0_type)) ||
> +  (brw_reg_type_is_integer(dst_type) && src0_type == 
> BRW_REGISTER_TYPE_HF)) {
> + ERROR_IF(dst_stride * dst_type_size != 4,
> +  "Conversions between integer and half-float must be 
> strided "
> +  "by a DWord on the destination");
> +
> + unsigned subreg = brw_inst_dst_da1_subreg_nr(devinfo, inst);
> + ERROR_IF(subreg % 4 != 0,
> +  "Conversions between integer and half-float must be 
> aligned "
> +  "to a DWord on the destination");
> +  } else if ((devinfo->is_cherryview || devinfo->gen >= 9) &&
> + dst_type == BRW_REGISTER_TYPE_HF) {
> + ERROR_IF(dst_stride != 2,
> +  "Conversions to HF must have either all words in even word 
> "
> +  "locations or all words in odd word locations");
> +  }
> +
> +   } else if (exec_type_size > dst_type_size) {
>if (!(dst_type_is_byte && inst_is_raw_move(devinfo, inst))) {
>   ERROR_IF(dst_stride * dst_type_size != exec_type_size,
>"Destination stride must be equal to the ratio of the 
> sizes "
> diff --git a/src/intel/compiler/test_eu_validate.cpp 
> b/src/intel/compiler/test_eu_validate.cpp
> index 73300b23122..1557b6d2452 100644
> --- a/src/intel/compiler/test_eu_validate.cpp
> +++ b/src/intel/compiler/test_eu_validate.cpp
> @@ -848,6 +848,128 @@ TEST_P(validation_test, 
> byte_destination_relaxed_alignment)
> }
>  }
>  
> +TEST_P(validation_test, half_float_conversion)
> +{
> +   static const struct {
> +  enum brw_reg_type dst_type;
> +  enum brw_reg_type src_type;
> +  unsigned dst_stride;
> +  unsigned dst_subnr;
> +  bool expected_result;
> +   } inst[] = {
> 

Re: [Mesa-dev] [PATCH v4 33/40] intel/compiler: also set F execution type for mixed float mode in BDW

2019-02-26 Thread Francisco Jerez
Iago Toral Quiroga  writes:

> The section 'Execution Data Types' of 3D Media GPGPU volume, which
> describes execution types, is exactly the same in BDW and SKL+.
>
> Also, this section states that there is a single execution type, so it
> makes sense that this is the wider of the two floating point types
> involved in mixed float mode, which is what we do for SKL+ and CHV.
> ---
>  src/intel/compiler/brw_eu_validate.c | 18 +++---
>  1 file changed, 7 insertions(+), 11 deletions(-)
>
> diff --git a/src/intel/compiler/brw_eu_validate.c 
> b/src/intel/compiler/brw_eu_validate.c
> index 358a0347a93..000a05cb6ac 100644
> --- a/src/intel/compiler/brw_eu_validate.c
> +++ b/src/intel/compiler/brw_eu_validate.c
> @@ -431,18 +431,14 @@ execution_type(const struct gen_device_info *devinfo, 
> const brw_inst *inst)
> src1_exec_type == BRW_REGISTER_TYPE_DF)
>return BRW_REGISTER_TYPE_DF;
>  
> -   if (devinfo->gen >= 9 || devinfo->is_cherryview) {
> -  if (dst_exec_type == BRW_REGISTER_TYPE_F ||
> -  src0_exec_type == BRW_REGISTER_TYPE_F ||
> -  src1_exec_type == BRW_REGISTER_TYPE_F) {
> - return BRW_REGISTER_TYPE_F;
> -  } else {
> - return BRW_REGISTER_TYPE_HF;
> -  }
> +   if (dst_exec_type == BRW_REGISTER_TYPE_F ||
> +   src0_exec_type == BRW_REGISTER_TYPE_F ||
> +   src1_exec_type == BRW_REGISTER_TYPE_F) {
> +  return BRW_REGISTER_TYPE_F;
> +   } else {
> +  assert(devinfo->gen >= 8 && src0_exec_type == BRW_REGISTER_TYPE_HF);
> +  return BRW_REGISTER_TYPE_HF;

I'm having trouble convincing myself that this is correct.  Aren't there
four earlier return statements in this function you may potentially hit
on BDW that will still fail to consider the destination type for
instructions with HF operands?
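
A reduced model of the concern, assuming only F and HF operands and
exactly two sources.  The enum and both functions are hypothetical
simplifications, not the real execution_type().  The first function
returns early when both source types match and so never looks at the
destination.  The second runs a mixed-float check before that early
return, which is one possible way to account for the destination:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced register type enum for illustration only. */
enum reg_type { TYPE_HF, TYPE_F };

static bool
types_are_mixed_float(enum reg_type t0, enum reg_type t1)
{
   return (t0 == TYPE_F && t1 == TYPE_HF) ||
          (t1 == TYPE_F && t0 == TYPE_HF);
}

/* Early return on equal source types: the destination is not consulted. */
static enum reg_type
exec_type_sources_first(enum reg_type dst, enum reg_type src0,
                        enum reg_type src1)
{
   if (src0 == src1)
      return src0;

   if (dst == TYPE_F || src0 == TYPE_F || src1 == TYPE_F)
      return TYPE_F;

   return TYPE_HF;
}

/* Mixed-float check placed ahead of the equal-sources return. */
static enum reg_type
exec_type_mixed_first(enum reg_type dst, enum reg_type src0,
                      enum reg_type src1)
{
   if (types_are_mixed_float(src0, src1) ||
       types_are_mixed_float(src0, dst) ||
       types_are_mixed_float(src1, dst))
      return TYPE_F;

   /* Nothing is mixed, so with only F and HF all three types agree. */
   assert(src0 == src1);
   return src0;
}
```

For an instruction with an F destination and two HF sources, the first
variant reports HF and the second reports F.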

> }
> -
> -   assert(src0_exec_type == BRW_REGISTER_TYPE_F);
> -   return BRW_REGISTER_TYPE_F;
>  }
>  
>  /**
> -- 
> 2.17.1
>



Re: [Mesa-dev] [PATCH 1/5] intel/fs: Exclude control sources from execution type and region alignment calculations.

2019-02-15 Thread Francisco Jerez
Jason Ekstrand  writes:

> On Fri, Jan 18, 2019 at 6:09 PM Francisco Jerez 
> wrote:
>
>> Currently the execution type calculation will return a bogus value in
>> cases like:
>>
>>   mov_indirect(8) vgrf0:w, vgrf1:w, vgrf2:ud, 32u
>>
>> Which will be considered to have a 32-bit integer execution type even
>> though the actual indirect move operation will be carried out with
>> 16-bit precision.
>>
>> Similarly there's no need to apply the CHV/BXT double-precision region
>> alignment restrictions to such control sources, since they aren't
>> directly involved in the double-precision arithmetic operations
>> emitted by these virtual instructions.  Applying the CHV/BXT
>> restrictions to control sources was expected to be harmless if mildly
>> inefficient, but unfortunately it exposed problems at codegen level
>> for virtual instructions (namely the SHUFFLE instruction used for the
>> Vulkan 1.1 subgroup feature) that weren't prepared to accept control
>> sources with an arbitrary strided region.
>>
>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109328
>> Reported-by: Mark Janes 
>> Fixes: efa4e4bc5fc "intel/fs: Introduce regioning lowering pass."
>> ---
>>  src/intel/compiler/brw_fs.cpp | 54 +++
>>  src/intel/compiler/brw_fs_lower_regioning.cpp |  6 +--
>>  src/intel/compiler/brw_ir_fs.h| 10 +++-
>>  3 files changed, 66 insertions(+), 4 deletions(-)
>>
>> diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
>> index 0359eb079f7..f475b617df2 100644
>> --- a/src/intel/compiler/brw_fs.cpp
>> +++ b/src/intel/compiler/brw_fs.cpp
>> @@ -271,6 +271,60 @@ fs_inst::is_send_from_grf() const
>> }
>>  }
>>
>> +bool
>> +fs_inst::is_control_source(unsigned arg) const
>> +{
>> +   switch (opcode) {
>> +   case FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD:
>> +   case FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD_GEN7:
>> +   case FS_OPCODE_VARYING_PULL_CONSTANT_LOAD_GEN4:
>> +   case FS_OPCODE_VARYING_PULL_CONSTANT_LOAD_GEN7:
>> +  return arg == 0;
>> +
>> +   case SHADER_OPCODE_BROADCAST:
>> +   case SHADER_OPCODE_SHUFFLE:
>> +   case SHADER_OPCODE_QUAD_SWIZZLE:
>> +   case FS_OPCODE_INTERPOLATE_AT_SAMPLE:
>> +   case FS_OPCODE_INTERPOLATE_AT_SHARED_OFFSET:
>> +   case FS_OPCODE_INTERPOLATE_AT_PER_SLOT_OFFSET:
>> +   case SHADER_OPCODE_IMAGE_SIZE:
>> +   case SHADER_OPCODE_GET_BUFFER_SIZE:
>> +  return arg == 1;
>> +
>> +   case SHADER_OPCODE_MOV_INDIRECT:
>> +   case SHADER_OPCODE_CLUSTER_BROADCAST:
>> +   case SHADER_OPCODE_TEX:
>> +   case FS_OPCODE_TXB:
>> +   case SHADER_OPCODE_TXD:
>> +   case SHADER_OPCODE_TXF:
>> +   case SHADER_OPCODE_TXF_LZ:
>> +   case SHADER_OPCODE_TXF_CMS:
>> +   case SHADER_OPCODE_TXF_CMS_W:
>> +   case SHADER_OPCODE_TXF_UMS:
>> +   case SHADER_OPCODE_TXF_MCS:
>> +   case SHADER_OPCODE_TXL:
>> +   case SHADER_OPCODE_TXL_LZ:
>> +   case SHADER_OPCODE_TXS:
>> +   case SHADER_OPCODE_LOD:
>> +   case SHADER_OPCODE_TG4:
>> +   case SHADER_OPCODE_TG4_OFFSET:
>> +   case SHADER_OPCODE_SAMPLEINFO:
>> +   case SHADER_OPCODE_UNTYPED_ATOMIC:
>> +   case SHADER_OPCODE_UNTYPED_ATOMIC_FLOAT:
>> +   case SHADER_OPCODE_UNTYPED_SURFACE_READ:
>> +   case SHADER_OPCODE_UNTYPED_SURFACE_WRITE:
>> +   case SHADER_OPCODE_BYTE_SCATTERED_READ:
>> +   case SHADER_OPCODE_BYTE_SCATTERED_WRITE:
>> +   case SHADER_OPCODE_TYPED_ATOMIC:
>> +   case SHADER_OPCODE_TYPED_SURFACE_READ:
>> +   case SHADER_OPCODE_TYPED_SURFACE_WRITE:
>>
>
> As of b284d222db, we are no longer using many of the opcodes in this list
> (gen7 pull constant loads, [un]typed surface reads/writes, etc.)  It will
> need to be rebased and we need to add SHADER_OPCODE_SEND to the list.
> Fortunately, the changes to add SHADER_OPCODE_SEND landed before the 19.0
> cut-off so there is no need to make two versions for backporting.
>

Yes, that's roughly what I had done during one of my previous rebases of
this series, see:

https://cgit.freedesktop.org/~currojerez/mesa/commit/?h=jenkins&id=30f8f3ff48b02ead688705e0679a98c0d6c9c87e

> Other than that, this patch seems perfectly reasonable to me
>
> Reviewed-by: Jason Ekstrand 
>
> If you want me to hand-review the new list of opcodes, feel free to send a
> v2 and cc me.
>
>
>> +  return arg == 1 || arg == 2;
>> +
>> +   default:
>> +  return false;
>> +   }
>> +}
>> +
>>  /**
>>   * Returns true if this instruction's sou

Re: [Mesa-dev] [PATCH 2/5] intel/fs: Lower integer multiply correctly when destination stride equals 4.

2019-02-15 Thread Francisco Jerez
Jason Ekstrand  writes:

> On Fri, Jan 18, 2019 at 6:09 PM Francisco Jerez 
> wrote:
>
>> Because the "low" temporary needs to be accessed with word type and
>> twice the original stride, attempting to preserve the alignment of the
>> original destination can potentially lead to instructions with illegal
>> destination stride greater than four.  Because the CHV/BXT alignment
>> restrictions are now being enforced by the regioning lowering pass run
>> after lower_integer_multiplication(), there is no real need to
>> preserve the original strides anymore.
>>
>> Note that this bug can be reproduced on stable branches, but
>> back-porting would be non-trivial, because the fix relies on the
>> regioning lowering pass recently introduced.
>> ---
>>  src/intel/compiler/brw_fs.cpp | 6 ++----
>>  1 file changed, 2 insertions(+), 4 deletions(-)
>>
>> diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
>> index f475b617df2..5768e0d6542 100644
>> --- a/src/intel/compiler/brw_fs.cpp
>> +++ b/src/intel/compiler/brw_fs.cpp
>> @@ -3962,13 +3962,11 @@ fs_visitor::lower_integer_multiplication()
>>  regions_overlap(inst->dst, inst->size_written,
>>  inst->src[0], inst->size_read(0)) ||
>>  regions_overlap(inst->dst, inst->size_written,
>> -inst->src[1], inst->size_read(1))) {
>> +inst->src[1], inst->size_read(1)) ||
>> +inst->dst.stride >= 4) {
>>
>
> It would be nice to throw in a quick comment as to why we're adding a
> temporary when stride >= 4.
>

There seemed to be no pre-existing comment about any of the other
conditions to allocate a temporary, so I've added the following:

+ /* Get a new VGRF for the "low" 32x16-bit multiplication result if
+  * reusing the original destination is impossible due to hardware
+  * restrictions, source/destination overlap, or it being the null
+  * register.
+  */
  
>
>> needs_mov = true;
>> -   /* Get a new VGRF but keep the same stride as inst->dst */
>> low = fs_reg(VGRF, alloc.allocate(regs_written(inst)),
>>  inst->dst.type);
>> -   low.stride = inst->dst.stride;
>> -   low.offset = inst->dst.offset % REG_SIZE;
>>  }
>>
>>  /* Get a new VGRF but keep the same stride as inst->dst */
>> --
>> 2.19.2
>>
>> ___
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>



Re: [Mesa-dev] [PATCH 3/5] intel/fs: Cap dst-aligned region stride to maximum representable hstride value.

2019-02-15 Thread Francisco Jerez
Jason Ekstrand  writes:

> On Fri, Jan 18, 2019 at 6:09 PM Francisco Jerez 
> wrote:
>
>> This is required in combination with the following commit, because
>> otherwise if a source region with an extended 8+ stride is present in
>> the instruction (which we're about to declare legal) we'll end up
>> emitting code that attempts to write to such a region, even though
>> strides greater than four are still illegal for the destination.
>> ---
>>  src/intel/compiler/brw_fs_lower_regioning.cpp | 20 +++++++++++++++-----
>>  1 file changed, 15 insertions(+), 5 deletions(-)
>>
>> diff --git a/src/intel/compiler/brw_fs_lower_regioning.cpp b/src/intel/compiler/brw_fs_lower_regioning.cpp
>> index 6a3c23892b4..b86e95ed9eb 100644
>> --- a/src/intel/compiler/brw_fs_lower_regioning.cpp
>> +++ b/src/intel/compiler/brw_fs_lower_regioning.cpp
>> @@ -71,15 +71,25 @@ namespace {
>>!is_byte_raw_mov(inst)) {
>>   return get_exec_type_size(inst);
>>} else {
>> - unsigned stride = inst->dst.stride * type_sz(inst->dst.type);
>> + /* Calculate the maximum byte stride and the minimum type size across
>> +  * all source and destination operands.
>> +  */
>> + unsigned max_stride = inst->dst.stride * type_sz(inst->dst.type);
>> + unsigned min_size = type_sz(inst->dst.type);
>>
>>   for (unsigned i = 0; i < inst->sources; i++) {
>> -if (!is_uniform(inst->src[i]) && !inst->is_control_source(i))
>> -   stride = MAX2(stride, inst->src[i].stride *
>> - type_sz(inst->src[i].type));
>> +if (!is_uniform(inst->src[i]) && !inst->is_control_source(i)) {
>> +   max_stride = MAX2(max_stride, inst->src[i].stride *
>> + type_sz(inst->src[i].type));
>> +   min_size = MIN2(min_size, type_sz(inst->src[i].type));
>> +}
>>   }
>>
>> - return stride;
>> + /* Attempt to use the largest byte stride among all present operands,
>> +  * but never exceed a stride of 4 since that would lead to illegal
>> +  * destination regions during lowering.
>> +  */
>> + return MIN2(max_stride, 4 * min_size);
>>
>
> Why not just fall back to tightly packed in this case?  I think I can
> answer my own question:  Because using something that's equal to one of the
> strides reduces the likelihood that we'll need a temporary.  If that's the
> correct answer, then maybe what we want is the maximum of all strides with
> stride_in_bytes <= 4 * type_sz?
>

We also want the result to be greater than or equal to the size of the
largest non-uniform, non-control source type, since packing a vector of
such a type into a temporary of lower byte stride than its size is
impossible.  This patch guarantees that as long as max_size <= 4 *
min_size, which is necessary for the lowering code that calls this
function to work at all.

It would be possible to preserve this guarantee while attempting to pick
one of the strides of the pre-existing sources as you say -- I would be
happy to review that change as a follow-up micro-optimization patch, but
there are some corner cases to consider I don't necessarily want to
bother with in the patch doing the functional change, for the sake of
bisectability.

It may make sense to add an assert here that max_size <= 4 * min_size,
in case such an instruction doesn't already blow up at the EU validator
(it doesn't look like the validator is currently enforcing the lack of
conversions between 8 and 64 bit types?).  It would just involve
calculating max_size in addition to max_stride and min_size above.
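
For illustration, here is a rough standalone sketch in C of the capped
stride calculation together with the assert suggested above.  The
`operand` struct and the `capped_byte_stride` name are hypothetical
stand-ins for the real fs_inst/fs_reg data, and the filtering of
uniform and control sources is left out, so this is not the actual
Mesa code:

```c
#include <assert.h>

#define MAX2(a, b) ((a) > (b) ? (a) : (b))
#define MIN2(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical, simplified operand description. */
struct operand {
   unsigned stride;   /* stride in units of elements */
   unsigned type_sz;  /* size of the data type in bytes */
};

/* Take the largest byte stride among all operands, but never exceed an
 * element stride of 4 for the smallest type involved.  Every source
 * passed in is assumed to be non-uniform and non-control.
 */
static unsigned
capped_byte_stride(const struct operand *dst,
                   const struct operand *srcs, unsigned n)
{
   unsigned max_stride = dst->stride * dst->type_sz;
   unsigned min_size = dst->type_sz;
   unsigned max_size = 0;

   for (unsigned i = 0; i < n; i++) {
      max_stride = MAX2(max_stride, srcs[i].stride * srcs[i].type_sz);
      min_size = MIN2(min_size, srcs[i].type_sz);
      max_size = MAX2(max_size, srcs[i].type_sz);
   }

   /* Without this the capped result could end up smaller than the
    * largest source type, which cannot be packed at that stride.
    */
   assert(max_size <= 4 * min_size);

   return MIN2(max_stride, 4 * min_size);
}
```

With a 32-bit destination of stride 1 and a single 16-bit source of
stride 8, max_stride is 16 bytes and min_size is 2 bytes, so the result
is capped to 8 bytes.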

> --Jason
>
>
>>}
>> }
>>
>> --
>> 2.19.2
>>
>> ___
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>



Re: [Mesa-dev] [PATCH 4/5] intel/fs: Implement extended strides greater than 4 for IR source regions.

2019-02-14 Thread Francisco Jerez
Jason Ekstrand  writes:

> On Fri, Jan 18, 2019 at 6:09 PM Francisco Jerez 
> wrote:
>
>> Strides up to 32B can be implemented for the source regions of most
>> instructions by leveraging either the vertical or the horizontal
>> stride of the hardware Align1 region.  The main motivation for this is
>> that currently the lower_integer_multiplication() pass will happily
>> double the stride of one of the 32-bit sources, which can blow up if
>> the stride of the original source was already the maximum value
>> allowed by the hardware.
>>
>
> I thought this looked familiar so I did some digging...
>
> On Nov 2 of 2017, I wrote almost exactly this same patch which was
> committed on Nov 7 as e8c9e65185de3e821e1
> On Nov 14, Matt reverted it in a31d0382084c8aa8 because it wasn't needed
> anymore and he wasn't sure of its correctness.
>

That's funny, I wasn't aware of e8c9e65185de3e821e1 nor of its revert.
The change was certainly still needed on Nov 14 due to
lower_integer_multiplication().

> And here we are again
>
> I still believe it to be correct so it is
>
> Reviewed-by: Jason Ekstrand 
>
> My one major request is that you include some of the history of this change
> in the commit message.  As far as the patch itself goes, it's identical to
> mine except for the unneeded whitespace change and one additional assert
> which I believe to be a good addition.
>

Added a comment locally about your previous attempt to do the same thing.

> I've also CC'd matt in case he wants to throw in his $.02
>
> --Jason
>
>> An alternative would be to use the regioning legalization pass in
>> order to lower such strides into the composition of multiple legal
>> strides, but that would be somewhat less efficient.
>>
>> This showed up as a regression from my commit cbea91eb57a501bebb1ca2
>> in Vulkan 1.1 CTS tests on CHV/BXT platforms, however it was really a
>> pre-existing problem that had affected conformance on other platforms
>> without native support for integer multiplication.  CHV/BXT were
>> getting around it because the code I removed in that commit had the
>> "fortunate" side effect of emitting narrower regions that didn't hit
>> the hardware stride limit after lowering.  Beyond fixing the
>> regression this fixes ~90 additional Vulkan 1.1 subgroup CTS tests on
>> ICL (that's why this patch is marked for inclusion in mesa-stable even
>> though the original regressing patch was not).
>>
>> Cc: mesa-sta...@lists.freedesktop.org
>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109328
>> Reported-by: Mark Janes
>> ---
>>  src/intel/compiler/brw_fs_generator.cpp | 14 +++++++++++---
>>  1 file changed, 11 insertions(+), 3 deletions(-)
>>
>> diff --git a/src/intel/compiler/brw_fs_generator.cpp b/src/intel/compiler/brw_fs_generator.cpp
>> index 5fc6cf5f8cc..b169eacf15b 100644
>> --- a/src/intel/compiler/brw_fs_generator.cpp
>> +++ b/src/intel/compiler/brw_fs_generator.cpp
>> @@ -90,9 +90,17 @@ brw_reg_from_fs_reg(const struct gen_device_info *devinfo, fs_inst *inst,
>>*   different execution size when the number of components
>>*   written to each destination GRF is not the same.
>>*/
>> - const unsigned width = MIN2(reg_width, phys_width);
>> - brw_reg = brw_vecn_reg(width, brw_file_from_reg(reg), reg->nr, 0);
>> - brw_reg = stride(brw_reg, width * reg->stride, width, reg->stride);
>> + if (reg->stride > 4) {
>> +assert(reg != &inst->dst);
>> +assert(reg->stride * type_sz(reg->type) <= REG_SIZE);
>> +brw_reg = brw_vecn_reg(1, brw_file_from_reg(reg), reg->nr, 0);
>> +brw_reg = stride(brw_reg, reg->stride, 1, 0);
>> +
>>
>
> Extra whitespace?
>
>
>> + } else {
>> +const unsigned width = MIN2(reg_width, phys_width);
>> +brw_reg = brw_vecn_reg(width, brw_file_from_reg(reg), reg->nr, 0);
>> +brw_reg = stride(brw_reg, width * reg->stride, width, reg->stride);
>> + }
>>
>>   if (devinfo->gen == 7 && !devinfo->is_haswell) {
>>  /* From the IvyBridge PRM (EU Changes by Processor Generation, page 13):
>> --
>> 2.19.2
>>
>> ___
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>
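
The source-region encoding used by the quoted patch can be sketched in
C as follows.  This is only an illustration of the addressing
arithmetic, assuming the usual Align1 region semantics
<vstride; width, hstride>; the function names and the `max_width`
parameter (standing in for MIN2(reg_width, phys_width)) are
hypothetical and not part of Mesa:

```c
#include <assert.h>

/* Element offset of a channel within an Align1 region
 * <vstride; width, hstride>, in units of elements: rows of 'width'
 * elements are 'vstride' apart, and the elements within a row are
 * 'hstride' apart.
 */
static unsigned
region_element_offset(unsigned vstride, unsigned width, unsigned hstride,
                      unsigned channel)
{
   assert(width > 0);
   return (channel / width) * vstride + (channel % width) * hstride;
}

/* Offset of a channel for a given IR stride, choosing the region the
 * way the quoted patch does: strides greater than 4 use one-element
 * rows so that the vertical stride carries the whole stride.
 */
static unsigned
ir_stride_element_offset(unsigned ir_stride, unsigned max_width,
                         unsigned channel)
{
   if (ir_stride > 4)
      return region_element_offset(ir_stride, 1, 0, channel);
   else
      return region_element_offset(max_width * ir_stride, max_width,
                                   ir_stride, channel);
}
```

In both branches channel i ends up at element offset i * ir_stride; the
width-1 form simply avoids needing a horizontal stride greater than 4.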



[Mesa-dev] [PATCH] intel/dump_gpu: Disambiguate between BOs from different GEM handle spaces.

2019-02-07 Thread Francisco Jerez
This fixes a rather astonishing problem that came up while debugging
an issue in the Vulkan CTS.  Apparently the Vulkan CTS framework has
the tendency to create multiple VkDevices, each one with a separate
DRM device FD and therefore a disjoint GEM buffer object handle space.
Because the intel_dump_gpu tool wasn't making any distinction between
buffers from the different handle spaces, it was confusing the
instruction state pools from both devices, which happened to have the
exact same GEM handle and PPGTT virtual address, but completely
different shader contents.  This was causing the simulator to believe
that the vertex pipeline was executing a fragment shader, which didn't
end well.
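
As a rough sketch of the idea (not the exact code in the patch below),
each buffer object can be looked up by the pair (fd, handle) flattened
into a single table index.  The `bo_index` helper and its SIZE_MAX
error value are hypothetical, introduced only for this example:

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_FD_COUNT 64
#define MAX_BO_COUNT (64 * 1024)

/* Flatten (fd, handle) into an index into a table holding
 * MAX_FD_COUNT * MAX_BO_COUNT entries, so that equal GEM handles from
 * different DRM file descriptors map to different slots.  Returns
 * SIZE_MAX if either value is out of range.
 */
static size_t
bo_index(unsigned fd, uint32_t handle)
{
   if (fd >= MAX_FD_COUNT || handle >= MAX_BO_COUNT)
      return SIZE_MAX;

   return (size_t)handle + (size_t)fd * MAX_BO_COUNT;
}
```

Two devices that both hand out GEM handle 5 on fds 3 and 4 then land in
slots 3 * MAX_BO_COUNT + 5 and 4 * MAX_BO_COUNT + 5 respectively.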
---
 src/intel/tools/intel_dump_gpu.c | 41 ++--
 1 file changed, 23 insertions(+), 18 deletions(-)

diff --git a/src/intel/tools/intel_dump_gpu.c b/src/intel/tools/intel_dump_gpu.c
index ffe49b10108..19e054c894c 100644
--- a/src/intel/tools/intel_dump_gpu.c
+++ b/src/intel/tools/intel_dump_gpu.c
@@ -58,6 +58,7 @@ static FILE *output_file = NULL;
 static int verbose = 0;
 static bool device_override;
 
+#define MAX_FD_COUNT 64
 #define MAX_BO_COUNT 64 * 1024
 
 struct bo {
@@ -94,12 +95,13 @@ fail_if(int cond, const char *format, ...)
 }
 
 static struct bo *
-get_bo(uint32_t handle)
+get_bo(unsigned fd, uint32_t handle)
 {
struct bo *bo;
 
fail_if(handle >= MAX_BO_COUNT, "bo handle too large\n");
-   bo = &bos[handle];
+   fail_if(fd >= MAX_FD_COUNT, "bo fd too large\n");
+   bo = &bos[handle + fd * MAX_BO_COUNT];
 
return bo;
 }
@@ -115,7 +117,7 @@ static uint32_t device = 0;
 static struct aub_file aub_file;
 
 static void *
-relocate_bo(struct bo *bo, const struct drm_i915_gem_execbuffer2 *execbuffer2,
+relocate_bo(int fd, struct bo *bo, const struct drm_i915_gem_execbuffer2 *execbuffer2,
 const struct drm_i915_gem_exec_object2 *obj)
 {
const struct drm_i915_gem_exec_object2 *exec_objects =
@@ -137,7 +139,7 @@ relocate_bo(struct bo *bo, const struct drm_i915_gem_execbuffer2 *execbuffer2,
  handle = relocs[i].target_handle;
 
   aub_write_reloc(, ((char *)relocated) + relocs[i].offset,
-  get_bo(handle)->offset + relocs[i].delta);
+  get_bo(fd, handle)->offset + relocs[i].delta);
}
 
return relocated;
@@ -226,7 +228,7 @@ dump_execbuffer2(int fd, struct drm_i915_gem_execbuffer2 *execbuffer2)
 
for (uint32_t i = 0; i < execbuffer2->buffer_count; i++) {
   obj = &exec_objects[i];
-  bo = get_bo(obj->handle);
+  bo = get_bo(fd, obj->handle);
 
   /* If bo->size == 0, this means they passed us an invalid
* buffer.  The kernel will reject it and so should we.
@@ -262,13 +264,13 @@ dump_execbuffer2(int fd, struct drm_i915_gem_execbuffer2 *execbuffer2)
 
batch_index = (execbuffer2->flags & I915_EXEC_BATCH_FIRST) ? 0 :
   execbuffer2->buffer_count - 1;
-   batch_bo = get_bo(exec_objects[batch_index].handle);
+   batch_bo = get_bo(fd, exec_objects[batch_index].handle);
for (uint32_t i = 0; i < execbuffer2->buffer_count; i++) {
   obj = &exec_objects[i];
-  bo = get_bo(obj->handle);
+  bo = get_bo(fd, obj->handle);
 
   if (obj->relocation_count > 0)
- data = relocate_bo(bo, execbuffer2, obj);
+ data = relocate_bo(fd, bo, execbuffer2, obj);
   else
  data = bo->map;
 
@@ -306,11 +308,12 @@ dump_execbuffer2(int fd, struct drm_i915_gem_execbuffer2 *execbuffer2)
 }
 
 static void
-add_new_bo(int handle, uint64_t size, void *map)
+add_new_bo(unsigned fd, int handle, uint64_t size, void *map)
 {
-   struct bo *bo = &bos[handle];
+   struct bo *bo = &bos[handle + fd * MAX_BO_COUNT];
 
fail_if(handle >= MAX_BO_COUNT, "bo handle out of range\n");
+   fail_if(fd >= MAX_FD_COUNT, "bo fd out of range\n");
fail_if(size == 0, "bo size is invalid\n");
 
bo->size = size;
@@ -318,9 +321,9 @@ add_new_bo(int handle, uint64_t size, void *map)
 }
 
 static void
-remove_bo(int handle)
+remove_bo(int fd, int handle)
 {
-   struct bo *bo = get_bo(handle);
+   struct bo *bo = get_bo(fd, handle);
 
if (bo->map && !IS_USERPTR(bo->map))
   munmap(bo->map, bo->size);
@@ -383,7 +386,7 @@ maybe_init(void)
}
fclose(config);
 
-   bos = calloc(MAX_BO_COUNT, sizeof(bos[0]));
+   bos = calloc(MAX_FD_COUNT * MAX_BO_COUNT, sizeof(bos[0]));
fail_if(bos == NULL, "out of memory\n");
 }
 
@@ -455,7 +458,7 @@ ioctl(int fd, unsigned long request, ...)
 
  ret = libc_ioctl(fd, request, argp);
  if (ret == 0)
-add_new_bo(create->handle, create->size, NULL);
+add_new_bo(fd, create->handle, create->size, NULL);
 
  return ret;
   }
@@ -465,15 +468,16 @@ ioctl(int fd, unsigned long request, ...)
 
  ret = libc_ioctl(fd, request, argp);
  if (ret == 0)
-add_new_bo(userptr->handle, userptr->user_size,
+add_new_bo(fd, userptr->handle, userptr->user_size,

Re: [Mesa-dev] [PATCH] intel/compiler: update validator to account for half-float exec type promotion

2019-02-04 Thread Francisco Jerez
Iago Toral  writes:

> On Mon, 2019-02-04 at 08:50 +0100, Iago Toral wrote:
>> On Fri, 2019-02-01 at 11:23 -0800, Francisco Jerez wrote:
>> > Iago Toral  writes:
>> > 
>> > > On Fri, 2019-01-25 at 12:54 -0800, Francisco Jerez wrote:
>> > > > Iago Toral  writes:
>> > > > 
>> > > > > On Thu, 2019-01-24 at 11:45 -0800, Francisco Jerez wrote:
>> > > > > > Iago Toral  writes:
>> > > > > > 
>> > > > > > > On Wed, 2019-01-23 at 06:03 -0800, Francisco Jerez wrote:
>> > > > > > > > Iago Toral Quiroga  writes:
>> > > > > > > > 
>> > > > > > > > > Commit c84ec70b3a72 implemented execution type promotion
>> > > > > > > > > to 32-bit for conversions involving half-float registers,
>> > > > > > > > > which empirical testing suggested was required, but it did
>> > > > > > > > > not incorporate this change into the assembly validator
>> > > > > > > > > logic. This commit adds that, preventing validation errors
>> > > > > > > > > like this:
>> > > > > > > > >
>> > > > > > > >
>> > > > > > > > I don't think we should be validating empirical assumptions
>> > > > > > > > in the EU validator.
>> > > > > > >
>> > > > > > > I am not sure I get your point, isn't c84ec70b3a72 also based
>> > > > > > > on empirical testing after all?
>> > > > > > >
>> > > > > >
>> > > > > > To some extent, but it doesn't attempt to enforce ISA
>> > > > > > restrictions based on information obtained empirically.
>> > > > > >
>> > > > > > >
>> > > > > > > > > mov(16)  g9<4>B   g3<16,8,2>HF { align1 1H };
>> > > > > > > > > ERROR: Destination stride must be equal to the ratio of the
>> > > > > > > > >        sizes of the execution data type to the destination type
>> > > > > > > > >
>> > > > > > > > > Fixes: c84ec70b3a72 "intel/fs: Promote execution type to
>> > > > > > > > > 32-bit when any half-float conversion is needed."
>> > > > > > > >
>> > > > > > > > I don't think this "fixes" anything that ever worked.
>> > > > > > >
>> > > > > > > It is true that the code in that trace above is not something
>> > > > > > > we can produce right now, because it is a conversion from HF
>> > > > > > > to B and that should only happen within the context of
>> > > > > > > VK_KHR_shader_float16_int8, however, this is a consequence of
>> > > > > > > the fact that since c84ec70b3a72 there is an inconsistency
>> > > > > > > between what we do at the IR level regarding
>> > > &

Re: [Mesa-dev] [PATCH] intel/compiler: update validator to account for half-float exec type promotion

2019-02-01 Thread Francisco Jerez
Iago Toral  writes:

> On Fri, 2019-01-25 at 12:54 -0800, Francisco Jerez wrote:
>> Iago Toral  writes:
>> 
>> > On Thu, 2019-01-24 at 11:45 -0800, Francisco Jerez wrote:
>> > > Iago Toral  writes:
>> > > 
>> > > > On Wed, 2019-01-23 at 06:03 -0800, Francisco Jerez wrote:
>> > > > > Iago Toral Quiroga  writes:
>> > > > > 
>> > > > > > Commit c84ec70b3a72 implemented execution type promotion to
>> > > > > > 32-bit for conversions involving half-float registers, which
>> > > > > > empirical testing suggested was required, but it did not
>> > > > > > incorporate this change into the assembly validator logic.
>> > > > > > This commit adds that, preventing validation errors like this:
>> > > > > >
>> > > > >
>> > > > > I don't think we should be validating empirical assumptions in
>> > > > > the EU validator.
>> > > >
>> > > > I am not sure I get your point, isn't c84ec70b3a72 also based on
>> > > > empirical testing after all?
>> > > >
>> > >
>> > > To some extent, but it doesn't attempt to enforce ISA restrictions
>> > > based on information obtained empirically.
>> > >
>> > > >
>> > > > > > mov(16)  g9<4>B   g3<16,8,2>HF { align1 1H };
>> > > > > > ERROR: Destination stride must be equal to the ratio of the
>> > > > > >        sizes of the execution data type to the destination type
>> > > > > >
>> > > > > > Fixes: c84ec70b3a72 "intel/fs: Promote execution type to 32-bit
>> > > > > > when any half-float conversion is needed."
>> > > > >
>> > > > > I don't think this "fixes" anything that ever worked.
>> > > >
>> > > > It is true that the code in that trace above is not something we
>> > > > can produce right now, because it is a conversion from HF to B and
>> > > > that should only happen within the context of
>> > > > VK_KHR_shader_float16_int8, however, this is a consequence of the
>> > > > fact that since c84ec70b3a72 there is an inconsistency between
>> > > > what we do at the IR level regarding execution size of HF
>> > > > conversions and what the EU validator is doing, and from that
>> > > > perspective this is really fixing an inconsistency that didn't
>> > > > exist before, and I thought we would want to address that sooner
>> > > > rather than later and track it down to the original change that
>> > > > introduced that inconsistency so we know where this is coming
>> > > > from.
>> > > >
>> > >
>> > > The "inconsistency" between the IR's get_exec_type() and the EU
>> > > validator's execution_type() has existed ever since
>> > > a05b6f25bf4bfad7 removed the HF assert from get_exec_type() without
>> > > actually implementing the code required to handle HF operands
>> > > (which is what my commit c84ec70b3a72 did).
>> >
>> > I agree with the fact that since a05b6f25bf4bfad7 the validator could
>> > reject valid code and that had nothing to do with your patch,
>>
>> The validator rejected the same valid HF code since it was written;
>> that had nothing to do with either a05b6f25bf4bfad7 or my patch, and
>> it is the real problem this patch was working around.
>>
>> > but the inconsistency I am talking about here, that this patch fixes,
>> > is the one about get_exec_type() in the IR and execution_type() in
>> > the
&

Re: [Mesa-dev] [PATCH] intel/compiler: update validator to account for half-float exec type promotion

2019-01-25 Thread Francisco Jerez
Iago Toral  writes:

> On Thu, 2019-01-24 at 11:45 -0800, Francisco Jerez wrote:
>> Iago Toral  writes:
>> 
>> > On Wed, 2019-01-23 at 06:03 -0800, Francisco Jerez wrote:
>> > > Iago Toral Quiroga  writes:
>> > > 
>> > > > Commit c84ec70b3a72 implemented execution type promotion to 32-bit
>> > > > for conversions involving half-float registers, which empirical
>> > > > testing suggested was required, but it did not incorporate this
>> > > > change into the assembly validator logic. This commit adds that,
>> > > > preventing validation errors like this:
>> > > >
>> > >
>> > > I don't think we should be validating empirical assumptions in the EU
>> > > validator.
>> >
>> > I am not sure I get your point, isn't c84ec70b3a72 also based on
>> > empirical testing after all?
>> >
>>
>> To some extent, but it doesn't attempt to enforce ISA restrictions based
>> on information obtained empirically.
>>
>> >
>> > > > mov(16)  g9<4>B   g3<16,8,2>HF { align1 1H };
>> > > > ERROR: Destination stride must be equal to the ratio of the sizes
>> > > >        of the execution data type to the destination type
>> > > >
>> > > > Fixes: c84ec70b3a72 "intel/fs: Promote execution type to 32-bit
>> > > > when any half-float conversion is needed."
>> > >
>> > > I don't think this "fixes" anything that ever worked.
>> >
>> > It is true that the code in that trace above is not something we can
>> > produce right now, because it is a conversion from HF to B and that
>> > should only happen within the context of VK_KHR_shader_float16_int8,
>> > however, this is a consequence of the fact that since c84ec70b3a72
>> > there is an inconsistency between what we do at the IR level regarding
>> > execution size of HF conversions and what the EU validator is doing,
>> > and from that perspective this is really fixing an inconsistency that
>> > didn't exist before, and I thought we would want to address that sooner
>> > rather than later and track it down to the original change that
>> > introduced that inconsistency so we know where this is coming from.
>> >
>>
>> The "inconsistency" between the IR's get_exec_type() and the EU
>> validator's execution_type() has existed ever since a05b6f25bf4bfad7
>> removed the HF assert from get_exec_type() without actually implementing
>> the code required to handle HF operands (which is what my commit
>> c84ec70b3a72 did).
>
> I agree with the fact that since a05b6f25bf4bfad7 the validator could
> reject valid code and that had nothing to do with your patch,

The validator rejected the same valid HF code since it was written; that
had nothing to do with either a05b6f25bf4bfad7 or my patch, and
it is the real problem this patch was working around.

> but the inconsistency I am talking about here, that this patch fixes,
> is the one about get_exec_type() in the IR and execution_type() in the
> validator doing different things for HF instructions, which only
> exists since your patch and which you discuss below.
>

The "inconsistency" exists ever since get_exec_type() was introduced
without correct handling of HF types (even though execution_type()
already attempted to handle it).  And I disagree that it's a real
inconsistency except due to the fact that the validator is incorrectly
attempting to validate the alignment of the destination region according
to a rule that doesn't apply to HF types.

>> > Anyway, that was my rationale for the Fixes tag, but if you think this
>> > is not useful I am happy to drop this patch and just include it as part
>> > of my series without the tag.
>> > 
>> 
>> I'd like to see the actual regioning restrictions for HF types
>> implemented in the EU validator as part of your series.
>
> Ok, let's see if we can agree on what restrictions we should implement
> then. I can implement this restriction as documented:
>
> "Conversion between Integer and HF (Half Float) must be DWord-aligned
> and strided by a DWord on the destination"
>
> Instead of trying to apply the g

Re: [Mesa-dev] [PATCH] intel/compiler: Add a file-level description of brw_eu_validate.c

2019-01-24 Thread Francisco Jerez
Matt Turner  writes:

> ---
>  src/intel/compiler/brw_eu_validate.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/src/intel/compiler/brw_eu_validate.c b/src/intel/compiler/brw_eu_validate.c
> index a25010b225c..7f1580a5bb3 100644
> --- a/src/intel/compiler/brw_eu_validate.c
> +++ b/src/intel/compiler/brw_eu_validate.c
> @@ -1,5 +1,5 @@
>  /*
> - * Copyright © 2015 Intel Corporation
> + * Copyright © 2015-2019 Intel Corporation
>   *
>   * Permission is hereby granted, free of charge, to any person obtaining a
>   * copy of this software and associated documentation files (the "Software"),
> @@ -24,6 +24,18 @@
>  /** @file brw_eu_validate.c
>   *
>   * This file implements a pass that validates shader assembly.
> + *
> + * The restrictions implemented herein are intended to verify that instructions
> + * in shader assembly do not violate restrictions documented in the graphics
> + * programming reference manuals.
> + *
> + * The restrictions are difficult for humans to quickly verify due to their
> + * complexity and abundance.
> + *
> + * It is critical that this code is thoroughly unit tested because false
> + * results it will lead developers astray, which is worse than having no

Redundant "it".

> + * validator at all. Patches to this file without corresponding unit tests (in
> + * test_eu_validate.cpp) will be rejected.

Strictly by that rule this patch should be rejected ;).  Maybe say
"functional changes" instead of "patches"?  Other than that:

Reviewed-by: Francisco Jerez 

>   */
>  
>  #include "brw_eu.h"
> -- 
> 2.19.2




Re: [Mesa-dev] [PATCH] intel/compiler: update validator to account for half-float exec type promotion

2019-01-24 Thread Francisco Jerez
Iago Toral  writes:

> On Wed, 2019-01-23 at 06:03 -0800, Francisco Jerez wrote:
>> Iago Toral Quiroga  writes:
>> 
>> > Commit c84ec70b3a72 implemented execution type promotion to 32-bit
>> > for conversions involving half-float registers, which empirical
>> > testing suggested was required, but it did not incorporate this
>> > change into the assembly validator logic. This commit adds that,
>> > preventing validation errors like this:
>> > 
>> 
>> I don't think we should be validating empirical assumptions in the EU
>> validator.
>
> I am not sure I get your point, isn't c84ec70b3a72 also based on
> empirical testing after all?
>

To some extent, but it doesn't attempt to enforce ISA restrictions based
on information obtained empirically.

>
>> > mov(16)  g9<4>B   g3<16,8,2>HF { align1 1H };
>> > ERROR: Destination stride must be equal to the ratio of the sizes
>> >        of the execution data type to the destination type
>> > 
>> > Fixes: c84ec70b3a72 "intel/fs: Promote execution type to 32-bit
>> > when any half-float conversion is needed."
>> 
>> I don't think this "fixes" anything that ever worked.
>
> It is true that the code in that trace above is not something we can
> produce right now, because it is a conversion from HF to B and that
> should only happen within the context of VK_KHR_shader_float16_int8,
> however, this is a consequence of the fact that since c84ec70b3a72
> there is an inconsistency between what we do at the IR level regarding
> execution size of HF conversions and what the EU validator is doing,
> and from that perspective this is really fixing an inconsistency that
> didn't exist before, and I thought we would want to address that sooner
> rather than later and track it down to the original change that
> introduced that inconsistency so we know where this is coming from.
>

The "inconsistency" between the IR's get_exec_type() and the EU
validator's execution_type() has existed ever since a05b6f25bf4bfad7
removed the HF assert from get_exec_type() without actually implementing
the code required to handle HF operands (which is what my commit
c84ec70b3a72 did).

> Anyway, that was my rationale for the Fixes tag, but if you think this
> is not useful I am happy to drop this patch and just include it as part
> of my series without the tag.
>

I'd like to see the actual regioning restrictions for HF types
implemented in the EU validator as part of your series.  I don't see the
need to introduce any empirical assumptions into the EU validator's
execution_type() in order to achieve that, even if that means that
get_exec_type() and execution_type() don't do the exact same calculation
-- What you call an inconsistency is the consequence of execution_type()
being the hardware spec's opinion on what the execution type is, which
we assume is what we need to use while enforcing a regioning restriction
that refers to the execution type of the instruction.
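
To make the disputed rule concrete, here is a minimal sketch of the
general check quoted in the error message.  It is a simplified model,
not the code in brw_eu_validate.c: the function name and the reduction
of register types to byte sizes are assumptions made for illustration.

```cpp
#include <cassert>

// Hypothetical, simplified model of the general rule from the error
// message.  Type sizes are in bytes; the stride is in units of the
// destination type.
bool dst_stride_matches_exec_type(unsigned dst_stride,
                                  unsigned dst_type_size,
                                  unsigned exec_type_size)
{
   assert(dst_type_size > 0);

   // The rule only constrains instructions whose execution type is wider
   // than the destination type.
   if (exec_type_size <= dst_type_size)
      return true;

   return dst_stride == exec_type_size / dst_type_size;
}
```

For "mov(16) g9<4>B g3<16,8,2>HF" the destination is a byte with stride
4.  With an HF (2-byte) execution type the rule asks for stride 2 and
the instruction is rejected; with an F (4-byte) execution type it asks
for stride 4 and the instruction is accepted.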

>>   The validator is
>> still missing an implementation of the quirky HF restrictions, and it
>> wasn't the purpose of c84ec70b3a72 to do such a thing.
>
> While this is true in general, the EU validator does consider the
> execution type of the instruction to validate general rules such as the
> one I mentioned in the commit message in this patch. And that part of
> the validator is inconsistent with c84ec70b3a72.

That part of the validator was also inconsistent with the code generated
by your original VK_KHR_shader_float16_int8 series even before I
committed c84ec70b3a72.  The reason is that it is trying to validate a
restriction that rejects working code, because the "general" rule it's
trying to enforce isn't supposed to apply to instructions with HF
operands, which is the real bug.

> In fact, the EU validator is accounting for execution size promotion
> of HF instructions to 32-bit in SKL+ and CHV only, for conversions
> from HF->F and mixed float mode instructions... which is part of what
> c84ec70b3a72 addresses at the IR level, which it actually does for all
> hardware platforms and in more cases.
>

I'm fine with fixing execution_type() to do the right thing in more
cases and platforms, but I don't think that should involve making
empirical assumptions in the validator.

>>   You *should*
>> definitely implement those restrictions (as they're stated in the
>> hardware spec, without empirical assumptions) in the validator as part
>> of your VK_KHR_shader_float16_int8 series,
>
> Again, I am not sure what you mean by "without empirical assumptions".

I was just paraphrasing your comment.  If

Re: [Mesa-dev] [PATCH] intel/compiler: update validator to account for half-float exec type promotion

2019-01-23 Thread Francisco Jerez
Iago Toral Quiroga  writes:

> Commit c84ec70b3a72 implemented execution type promotion to 32-bit for
> conversions involving half-float registers, which empirical testing suggested
> was required, but it did not incorporate this change into the assembly
> validator logic. This commit adds that, preventing validation errors like this:
>

I don't think we should be validating empirical assumptions in the EU
validator.

> mov(16)  g9<4>B   g3<16,8,2>HF { align1 1H };
> ERROR: Destination stride must be equal to the ratio of the sizes of the
>execution data type to the destination type
>
> Fixes: c84ec70b3a72 "intel/fs: Promote execution type to 32-bit when any 
> half-float conversion is needed."

I don't think this "fixes" anything that ever worked.  The validator is
still missing an implementation of the quirky HF restrictions, and it
wasn't the purpose of c84ec70b3a72 to do such a thing.  You *should*
definitely implement those restrictions (as they're stated in the
hardware spec, without empirical assumptions) in the validator as part
of your VK_KHR_shader_float16_int8 series, if anything because currently
it will reject working code that uses HF types.

> ---
>  src/intel/compiler/brw_eu_validate.c | 27 ++-
>  1 file changed, 14 insertions(+), 13 deletions(-)
>
> diff --git a/src/intel/compiler/brw_eu_validate.c 
> b/src/intel/compiler/brw_eu_validate.c
> index a25010b225c..3bb37677672 100644
> --- a/src/intel/compiler/brw_eu_validate.c
> +++ b/src/intel/compiler/brw_eu_validate.c
> @@ -325,17 +325,20 @@ execution_type(const struct gen_device_info *devinfo, 
> const brw_inst *inst)
> unsigned num_sources = num_sources_from_inst(devinfo, inst);
> enum brw_reg_type src0_exec_type, src1_exec_type;
>  
> -   /* Execution data type is independent of destination data type, except in
> -* mixed F/HF instructions on CHV and SKL+.
> +   /* Empirical testing suggests that type conversions involving half-float
> +* promote execution type to 32-bit. See get_exec_type() in brw_ir_fs.h.
>  */
> enum brw_reg_type dst_exec_type = brw_inst_dst_type(devinfo, inst);
>  
> src0_exec_type = execution_type_for_type(brw_inst_src0_type(devinfo, 
> inst));
> if (num_sources == 1) {
> -  if ((devinfo->gen >= 9 || devinfo->is_cherryview) &&
> -  src0_exec_type == BRW_REGISTER_TYPE_HF) {
> - return dst_exec_type;
> +  if (type_sz(src0_exec_type) == 2 && dst_exec_type != src0_exec_type) {
> + if (src0_exec_type == BRW_REGISTER_TYPE_HF)
> +return BRW_REGISTER_TYPE_F;
> + else if (dst_exec_type == BRW_REGISTER_TYPE_HF)
> +return BRW_REGISTER_TYPE_D;
>}
> +
>return src0_exec_type;
> }
>  
> @@ -367,14 +370,12 @@ execution_type(const struct gen_device_info *devinfo, 
> const brw_inst *inst)
> src1_exec_type == BRW_REGISTER_TYPE_DF)
>return BRW_REGISTER_TYPE_DF;
>  
> -   if (devinfo->gen >= 9 || devinfo->is_cherryview) {
> -  if (dst_exec_type == BRW_REGISTER_TYPE_F ||
> -  src0_exec_type == BRW_REGISTER_TYPE_F ||
> -  src1_exec_type == BRW_REGISTER_TYPE_F) {
> - return BRW_REGISTER_TYPE_F;
> -  } else {
> - return BRW_REGISTER_TYPE_HF;
> -  }
> +   if (dst_exec_type == BRW_REGISTER_TYPE_F ||
> +   src0_exec_type == BRW_REGISTER_TYPE_F ||
> +   src1_exec_type == BRW_REGISTER_TYPE_F) {
> +  return BRW_REGISTER_TYPE_F;
> +   } else {
> +  return BRW_REGISTER_TYPE_HF;
> }
>  
> assert(src0_exec_type == BRW_REGISTER_TYPE_F);
> -- 
> 2.17.1




Re: [Mesa-dev] [PATCH] intel/compiler: Reset default flag register in brw_find_live_channel()

2019-01-22 Thread Francisco Jerez
Matt Turner  writes:

> emit_uniformize() emits SHADER_OPCODE_FIND_LIVE_CHANNEL with its
> flag_subreg set, so that the IR knows which flag is accessed. However
> the flag is only used on Gen7 in Align1 mode.
>
> To avoid setting unnecessary bits in the instruction words, get the
> information we need and reset the default flag register. This allows
> round-tripping through the assembler/disassembler.
> ---
>  src/intel/compiler/brw_eu_emit.c | 12 ++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/src/intel/compiler/brw_eu_emit.c 
> b/src/intel/compiler/brw_eu_emit.c
> index 45e2552783b..7c5b40af3ae 100644
> --- a/src/intel/compiler/brw_eu_emit.c
> +++ b/src/intel/compiler/brw_eu_emit.c
> @@ -3312,6 +3312,13 @@ brw_find_live_channel(struct brw_codegen *p, struct 
> brw_reg dst,
>  
> brw_push_insn_state(p);
>  
> +   /* The flag register is only used on Gen7 in align1 mode, so avoid setting
> +* unnecessary bits in the instruction words, get the information we need
> +* and reset the default flag register.

Maybe mention here that this also allows more instructions to be
compacted.  Looks good otherwise:

Reviewed-by: Francisco Jerez 

> +*/
> +   const unsigned flag_subreg = p->current->flag_subreg;
> +   brw_set_default_flag_reg(p, 0, 0);
> +
> if (brw_get_default_access_mode(p) == BRW_ALIGN_1) {
>brw_set_default_mask_control(p, BRW_MASK_DISABLE);
>  
> @@ -3345,8 +3352,7 @@ brw_find_live_channel(struct brw_codegen *p, struct 
> brw_reg dst,
>*/
>   inst = brw_FBL(p, vec1(dst), exec_mask);
>} else {
> - const struct brw_reg flag = brw_flag_reg(p->current->flag_subreg / 
> 2,
> -  p->current->flag_subreg % 
> 2);
> + const struct brw_reg flag = brw_flag_subreg(flag_subreg);
>  
>   brw_set_default_exec_size(p, BRW_EXECUTE_1);
>   brw_MOV(p, retype(flag, BRW_REGISTER_TYPE_UD), brw_imm_ud(0));
> @@ -3366,6 +3372,8 @@ brw_find_live_channel(struct brw_codegen *p, struct 
> brw_reg dst,
>  brw_inst_set_group(devinfo, inst, lower_size * i + 8 * 
> qtr_control);
>  brw_inst_set_cond_modifier(devinfo, inst, BRW_CONDITIONAL_Z);
>  brw_inst_set_exec_size(devinfo, inst, cvt(lower_size) - 1);
> +brw_inst_set_flag_reg_nr(devinfo, inst, flag_subreg / 2);
> +brw_inst_set_flag_subreg_nr(devinfo, inst, flag_subreg % 2);
>   }
>  
>   /* Find the first bit set in the exec_size-wide portion of the flag
> -- 
> 2.19.2




Re: [Mesa-dev] [PATCH] intel/compiler: Reset default flag register in brw_find_live_channel()

2019-01-22 Thread Francisco Jerez
Matt Turner  writes:

> emit_uniformize() emits SHADER_OPCODE_FIND_LIVE_CHANNEL with its
> flag_subreg set, so that the IR knows which flag is accessed. However
> the flag is only used on Gen7 in Align1 mode.
>
> To avoid setting unnecessary bits in the instruction words, get the
> information we need and reset the default flag register. This allows
> round-tripping through the assembler/disassembler.
> ---
>  src/intel/compiler/brw_eu_emit.c | 16 ++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/src/intel/compiler/brw_eu_emit.c 
> b/src/intel/compiler/brw_eu_emit.c
> index 45e2552783b..d05ea506353 100644
> --- a/src/intel/compiler/brw_eu_emit.c
> +++ b/src/intel/compiler/brw_eu_emit.c
> @@ -3312,6 +3312,14 @@ brw_find_live_channel(struct brw_codegen *p, struct 
> brw_reg dst,
>  
> brw_push_insn_state(p);
>  
> +   /* The flag register is only used on Gen7 in align1 mode, so avoid setting
> +* unnecessary bits in the instruction words, get the information we need
> +* and reset the default flag register.
> +*/
> +   int flag_reg = p->current->flag_subreg / 2;
> +   int flag_subreg = p->current->flag_subreg % 2;

You can replace the two lines above with:

+ const unsigned flag_subreg = p->current->flag_subreg;

> +   brw_set_default_flag_reg(p, 0, 0);
> +
> if (brw_get_default_access_mode(p) == BRW_ALIGN_1) {
>brw_set_default_mask_control(p, BRW_MASK_DISABLE);
>  
> @@ -3345,12 +3353,14 @@ brw_find_live_channel(struct brw_codegen *p, struct 
> brw_reg dst,
>*/
>   inst = brw_FBL(p, vec1(dst), exec_mask);
>} else {
> - const struct brw_reg flag = brw_flag_reg(p->current->flag_subreg / 
> 2,
> -  p->current->flag_subreg % 
> 2);
> + const struct brw_reg flag = brw_flag_reg(flag_reg, flag_subreg);

so this will just be "brw_flag_subreg(flag_subreg)".

>  
>   brw_set_default_exec_size(p, BRW_EXECUTE_1);
>   brw_MOV(p, retype(flag, BRW_REGISTER_TYPE_UD), brw_imm_ud(0));
>  
> + brw_push_insn_state(p);
> + brw_set_default_flag_reg(p, flag_reg, flag_subreg);
> +

No need to push and pop another entry on the default instruction state
stack; you can just call "brw_inst_set_flag_*reg_nr()" on the MOV
instruction.

>   /* Run enough instructions returning zero with execution masking and
>* a conditional modifier enabled in order to get the full execution
>* mask in f1.0.  We could use a single 32-wide move here if it
> @@ -3368,6 +3378,8 @@ brw_find_live_channel(struct brw_codegen *p, struct 
> brw_reg dst,
>  brw_inst_set_exec_size(devinfo, inst, cvt(lower_size) - 1);
>   }
>  
> + brw_pop_insn_state(p);
> +
>   /* Find the first bit set in the exec_size-wide portion of the flag
>* register that was updated by the last sequence of MOV
>* instructions.
> -- 
> 2.19.2




Re: [Mesa-dev] [PATCH] intel/compiler: Reset default flag register in brw_find_live_channel()

2019-01-22 Thread Francisco Jerez
Matt Turner  writes:

> emit_uniformize() emits SHADER_OPCODE_FIND_LIVE_CHANNEL with its
> flag_subreg set, so that the IR knows which flag is accessed. However
> the flag is only used on Gen7 in Align1 mode, and it is used as an
> explicit source and destination.
>
> To avoid setting unnecessary bits in the instruction words, get the
> information we need and reset the default flag register. This allows
> round-tripping through the assembler/disassembler.
> ---
>  src/intel/compiler/brw_eu_emit.c | 11 ---
>  1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/src/intel/compiler/brw_eu_emit.c 
> b/src/intel/compiler/brw_eu_emit.c
> index 45e2552783b..e6f6d6419d2 100644
> --- a/src/intel/compiler/brw_eu_emit.c
> +++ b/src/intel/compiler/brw_eu_emit.c
> @@ -3312,6 +3312,14 @@ brw_find_live_channel(struct brw_codegen *p, struct 
> brw_reg dst,
>  
> brw_push_insn_state(p);
>  
> +   /* The flag register is only used on Gen7 in align1 mode, so avoid setting
> +* unnecessary bits in the instruction words, get the information we need
> +* and reset the default flag register.
> +*/
> +   const struct brw_reg flag = brw_flag_reg(p->current->flag_subreg / 2,
> +p->current->flag_subreg % 2);
> +   brw_set_default_flag_reg(p, 0, 0);
> +

I think this is going to break Gen7, because the MOV instructions
emitted in the loop below have conditional mod enabled and won't be
pointing at the right flag register anymore after this change.

> if (brw_get_default_access_mode(p) == BRW_ALIGN_1) {
>brw_set_default_mask_control(p, BRW_MASK_DISABLE);
>  
> @@ -3345,9 +3353,6 @@ brw_find_live_channel(struct brw_codegen *p, struct 
> brw_reg dst,
>*/
>   inst = brw_FBL(p, vec1(dst), exec_mask);
>} else {
> - const struct brw_reg flag = brw_flag_reg(p->current->flag_subreg / 
> 2,
> -  p->current->flag_subreg % 
> 2);
> -
>   brw_set_default_exec_size(p, BRW_EXECUTE_1);
>   brw_MOV(p, retype(flag, BRW_REGISTER_TYPE_UD), brw_imm_ud(0));
>  
> -- 
> 2.19.2




[Mesa-dev] [PATCH 1/5] intel/fs: Exclude control sources from execution type and region alignment calculations.

2019-01-18 Thread Francisco Jerez
Currently the execution type calculation will return a bogus value in
cases like:

  mov_indirect(8) vgrf0:w, vgrf1:w, vgrf2:ud, 32u

Which will be considered to have a 32-bit integer execution type even
though the actual indirect move operation will be carried out with
16-bit precision.

Similarly there's no need to apply the CHV/BXT double-precision region
alignment restrictions to such control sources, since they aren't
directly involved in the double-precision arithmetic operations
emitted by these virtual instructions.  Applying the CHV/BXT
restrictions to control sources was expected to be harmless if mildly
inefficient, but unfortunately it exposed problems at codegen level
for virtual instructions (namely the SHUFFLE instruction used for the
Vulkan 1.1 subgroup feature) that weren't prepared to accept control
sources with an arbitrary strided region.
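
The effect on the mov_indirect example can be sketched as follows.  This
is a reduced model under stated assumptions: sources are described only
by their type size in bytes and a control flag, and the execution type
is taken as the widest remaining source type.  The real get_exec_type()
handles more cases; src_model and exec_type_size() are hypothetical
names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical, reduced model of an IR source: type size in bytes and
// whether the opcode treats the source as a control source.
struct src_model {
   unsigned type_size;
   bool is_control;
};

// Execution type size taken as the widest type among the sources that
// are considered.  Passing skip_control = false models the old behaviour.
unsigned exec_type_size(const std::vector<src_model> &srcs, bool skip_control)
{
   unsigned size = 0;
   for (const src_model &s : srcs) {
      if (skip_control && s.is_control)
         continue;
      size = std::max(size, s.type_size);
   }
   return size;
}
```

For "mov_indirect(8) vgrf0:w, vgrf1:w, vgrf2:ud, 32u" the sources have
sizes 2, 4 and 4, and the last two are control sources.  Counting every
source yields 4 bytes; skipping the control sources yields the expected
2 bytes.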

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109328
Reported-by: Mark Janes 
Fixes: efa4e4bc5fc "intel/fs: Introduce regioning lowering pass."
---
 src/intel/compiler/brw_fs.cpp | 54 +++
 src/intel/compiler/brw_fs_lower_regioning.cpp |  6 +--
 src/intel/compiler/brw_ir_fs.h| 10 +++-
 3 files changed, 66 insertions(+), 4 deletions(-)

diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index 0359eb079f7..f475b617df2 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -271,6 +271,60 @@ fs_inst::is_send_from_grf() const
}
 }
 
+bool
+fs_inst::is_control_source(unsigned arg) const
+{
+   switch (opcode) {
+   case FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD:
+   case FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD_GEN7:
+   case FS_OPCODE_VARYING_PULL_CONSTANT_LOAD_GEN4:
+   case FS_OPCODE_VARYING_PULL_CONSTANT_LOAD_GEN7:
+  return arg == 0;
+
+   case SHADER_OPCODE_BROADCAST:
+   case SHADER_OPCODE_SHUFFLE:
+   case SHADER_OPCODE_QUAD_SWIZZLE:
+   case FS_OPCODE_INTERPOLATE_AT_SAMPLE:
+   case FS_OPCODE_INTERPOLATE_AT_SHARED_OFFSET:
+   case FS_OPCODE_INTERPOLATE_AT_PER_SLOT_OFFSET:
+   case SHADER_OPCODE_IMAGE_SIZE:
+   case SHADER_OPCODE_GET_BUFFER_SIZE:
+  return arg == 1;
+
+   case SHADER_OPCODE_MOV_INDIRECT:
+   case SHADER_OPCODE_CLUSTER_BROADCAST:
+   case SHADER_OPCODE_TEX:
+   case FS_OPCODE_TXB:
+   case SHADER_OPCODE_TXD:
+   case SHADER_OPCODE_TXF:
+   case SHADER_OPCODE_TXF_LZ:
+   case SHADER_OPCODE_TXF_CMS:
+   case SHADER_OPCODE_TXF_CMS_W:
+   case SHADER_OPCODE_TXF_UMS:
+   case SHADER_OPCODE_TXF_MCS:
+   case SHADER_OPCODE_TXL:
+   case SHADER_OPCODE_TXL_LZ:
+   case SHADER_OPCODE_TXS:
+   case SHADER_OPCODE_LOD:
+   case SHADER_OPCODE_TG4:
+   case SHADER_OPCODE_TG4_OFFSET:
+   case SHADER_OPCODE_SAMPLEINFO:
+   case SHADER_OPCODE_UNTYPED_ATOMIC:
+   case SHADER_OPCODE_UNTYPED_ATOMIC_FLOAT:
+   case SHADER_OPCODE_UNTYPED_SURFACE_READ:
+   case SHADER_OPCODE_UNTYPED_SURFACE_WRITE:
+   case SHADER_OPCODE_BYTE_SCATTERED_READ:
+   case SHADER_OPCODE_BYTE_SCATTERED_WRITE:
+   case SHADER_OPCODE_TYPED_ATOMIC:
+   case SHADER_OPCODE_TYPED_SURFACE_READ:
+   case SHADER_OPCODE_TYPED_SURFACE_WRITE:
+  return arg == 1 || arg == 2;
+
+   default:
+  return false;
+   }
+}
+
 /**
  * Returns true if this instruction's sources and destinations cannot
  * safely be the same register.
diff --git a/src/intel/compiler/brw_fs_lower_regioning.cpp 
b/src/intel/compiler/brw_fs_lower_regioning.cpp
index df50993dee6..6a3c23892b4 100644
--- a/src/intel/compiler/brw_fs_lower_regioning.cpp
+++ b/src/intel/compiler/brw_fs_lower_regioning.cpp
@@ -74,7 +74,7 @@ namespace {
  unsigned stride = inst->dst.stride * type_sz(inst->dst.type);
 
  for (unsigned i = 0; i < inst->sources; i++) {
-if (!is_uniform(inst->src[i]))
+if (!is_uniform(inst->src[i]) && !inst->is_control_source(i))
stride = MAX2(stride, inst->src[i].stride *
  type_sz(inst->src[i].type));
  }
@@ -92,7 +92,7 @@ namespace {
required_dst_byte_offset(const fs_inst *inst)
{
   for (unsigned i = 0; i < inst->sources; i++) {
- if (!is_uniform(inst->src[i]))
+ if (!is_uniform(inst->src[i]) && !inst->is_control_source(i))
 if (reg_offset(inst->src[i]) % REG_SIZE !=
 reg_offset(inst->dst) % REG_SIZE)
return 0;
@@ -109,7 +109,7 @@ namespace {
has_invalid_src_region(const gen_device_info *devinfo, const fs_inst *inst,
   unsigned i)
{
-  if (is_unordered(inst)) {
+  if (is_unordered(inst) || inst->is_control_source(i)) {
  return false;
   } else {
  const unsigned dst_byte_stride = inst->dst.stride * 
type_sz(inst->dst.type);
diff --git a/src/intel/compiler/brw_ir_fs.h b/src/intel/compiler/brw_ir_fs.h
index 08e3d83d910..0a0ba1d363a 100644
--- a/src/intel/compiler/brw_ir_fs.h
+++ b/src/intel/compiler/brw_ir_fs.h
@@ -358,6 +358,13 @@ public:
bool can_change_types() const;

[Mesa-dev] [PATCH 5/5] intel/fs: Rely on undocumented unrestricted regioning for 32x16-bit integer multiply.

2019-01-18 Thread Francisco Jerez
Even though the hardware spec claims that any "integer DWord multiply"
operation is affected by the regioning restrictions of CHV/BXT/GLK,
this is inconsistent with the behavior of the simulator and with
empirical evidence -- Return false from has_dst_aligned_region_restriction()
for such instructions as a micro-optimization.
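
A minimal sketch of the narrowed condition, assuming operands are
reduced to their type sizes in bytes (the function name is hypothetical
and the floating-point and opcode checks of the real helper are left
out):

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical reduced form of the check: an integer multiply counts as
// a restricted DWord multiply only if both sources are at least 32 bits.
bool is_restricted_int_mul(unsigned src0_size, unsigned src1_size)
{
   assert(src0_size > 0 && src1_size > 0);
   return std::min(src0_size, src1_size) >= 4;
}
```

Under this model a 32x32-bit multiply stays restricted while a 32x16-bit
multiply does not.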
---
 src/intel/compiler/brw_ir_fs.h | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/src/intel/compiler/brw_ir_fs.h b/src/intel/compiler/brw_ir_fs.h
index 0a0ba1d363a..c50df45922a 100644
--- a/src/intel/compiler/brw_ir_fs.h
+++ b/src/intel/compiler/brw_ir_fs.h
@@ -543,11 +543,19 @@ has_dst_aligned_region_restriction(const gen_device_info 
*devinfo,
const fs_inst *inst)
 {
const brw_reg_type exec_type = get_exec_type(inst);
-   const bool is_int_multiply = !brw_reg_type_is_floating_point(exec_type) &&
- (inst->opcode == BRW_OPCODE_MUL || inst->opcode == BRW_OPCODE_MAD);
+   /* Even though the hardware spec claims that "integer DWord multiply"
+* operations are restricted, empirical evidence and the behavior of the
+* simulator suggest that only 32x32-bit integer multiplication is
+* restricted.
+*/
+   const bool is_dword_multiply = !brw_reg_type_is_floating_point(exec_type) &&
+  ((inst->opcode == BRW_OPCODE_MUL &&
+MIN2(type_sz(inst->src[0].type), type_sz(inst->src[1].type)) >= 4) ||
+   (inst->opcode == BRW_OPCODE_MAD &&
+MIN2(type_sz(inst->src[1].type), type_sz(inst->src[2].type)) >= 4));
 
if (type_sz(inst->dst.type) > 4 || type_sz(exec_type) > 4 ||
-   (type_sz(exec_type) == 4 && is_int_multiply))
+   (type_sz(exec_type) == 4 && is_dword_multiply))
   return devinfo->is_cherryview || gen_device_info_is_9lp(devinfo);
else
   return false;
-- 
2.19.2



[Mesa-dev] [PATCH 2/5] intel/fs: Lower integer multiply correctly when destination stride equals 4.

2019-01-18 Thread Francisco Jerez
Because the "low" temporary needs to be accessed with word type and
twice the original stride, attempting to preserve the alignment of the
original destination can potentially lead to instructions with illegal
destination stride greater than four.  Because the CHV/BXT alignment
restrictions are now being enforced by the regioning lowering pass run
after lower_integer_multiplication(), there is no real need to
preserve the original strides anymore.

Note that this bug can be reproduced on stable branches, but
back-porting would be non-trivial, because the fix relies on the
regioning lowering pass recently introduced.
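
The arithmetic behind the failure can be sketched like this.  The limit
of four elements on destination strides is taken from the description
above; the helper names are hypothetical.

```cpp
#include <cassert>

// Assumption from the commit message: destination strides greater than
// four elements are illegal.
constexpr unsigned max_dst_stride = 4;

// Stride, in 16-bit words, needed to access the low halves of a 32-bit
// region that has the given dword stride.
unsigned low_word_stride(unsigned dword_stride)
{
   return 2 * dword_stride;
}

// True if the "low" temporary could keep the original destination stride.
bool low_temporary_is_legal(unsigned dword_stride)
{
   return low_word_stride(dword_stride) <= max_dst_stride;
}
```

A destination stride of 2 doubles to a legal word stride of 4, while a
destination stride of 4 doubles to 8, which is why the patch takes the
needs_mov path for strides of 4 and above.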
---
 src/intel/compiler/brw_fs.cpp | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index f475b617df2..5768e0d6542 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -3962,13 +3962,11 @@ fs_visitor::lower_integer_multiplication()
 regions_overlap(inst->dst, inst->size_written,
 inst->src[0], inst->size_read(0)) ||
 regions_overlap(inst->dst, inst->size_written,
-inst->src[1], inst->size_read(1))) {
+inst->src[1], inst->size_read(1)) ||
+inst->dst.stride >= 4) {
needs_mov = true;
-   /* Get a new VGRF but keep the same stride as inst->dst */
low = fs_reg(VGRF, alloc.allocate(regs_written(inst)),
 inst->dst.type);
-   low.stride = inst->dst.stride;
-   low.offset = inst->dst.offset % REG_SIZE;
 }
 
 /* Get a new VGRF but keep the same stride as inst->dst */
-- 
2.19.2



[Mesa-dev] [PATCH 3/5] intel/fs: Cap dst-aligned region stride to maximum representable hstride value.

2019-01-18 Thread Francisco Jerez
This is required in combination with the following commit, because
otherwise if a source region with an extended 8+ stride is present in
the instruction (which we're about to declare legal) we'll end up
emitting code that attempts to write to such a region, even though
strides greater than four are still illegal for the destination.
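
The cap introduced by the patch reduces to one expression.  The sketch
below isolates it, assuming strides and type sizes are given in bytes;
capped_byte_stride() is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>

// Largest byte stride among the operands, limited to four elements of
// the smallest operand type so that the destination region stays legal.
unsigned capped_byte_stride(unsigned max_stride, unsigned min_type_size)
{
   assert(min_type_size > 0);
   return std::min(max_stride, 4 * min_type_size);
}
```

For example, a dword source with an element stride of 8 has a byte
stride of 32; with dword operands only the result is capped to 16
bytes, and to 8 bytes if a word operand is also present.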
---
 src/intel/compiler/brw_fs_lower_regioning.cpp | 20 ++-
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/src/intel/compiler/brw_fs_lower_regioning.cpp 
b/src/intel/compiler/brw_fs_lower_regioning.cpp
index 6a3c23892b4..b86e95ed9eb 100644
--- a/src/intel/compiler/brw_fs_lower_regioning.cpp
+++ b/src/intel/compiler/brw_fs_lower_regioning.cpp
@@ -71,15 +71,25 @@ namespace {
   !is_byte_raw_mov(inst)) {
  return get_exec_type_size(inst);
   } else {
- unsigned stride = inst->dst.stride * type_sz(inst->dst.type);
+ /* Calculate the maximum byte stride and the minimum type size across
+  * all source and destination operands.
+  */
+ unsigned max_stride = inst->dst.stride * type_sz(inst->dst.type);
+ unsigned min_size = type_sz(inst->dst.type);
 
  for (unsigned i = 0; i < inst->sources; i++) {
-if (!is_uniform(inst->src[i]) && !inst->is_control_source(i))
-   stride = MAX2(stride, inst->src[i].stride *
- type_sz(inst->src[i].type));
+if (!is_uniform(inst->src[i]) && !inst->is_control_source(i)) {
+   max_stride = MAX2(max_stride, inst->src[i].stride *
+ type_sz(inst->src[i].type));
+   min_size = MIN2(min_size, type_sz(inst->src[i].type));
+}
  }
 
- return stride;
+ /* Attempt to use the largest byte stride among all present operands,
+  * but never exceed a stride of 4 since that would lead to illegal
+  * destination regions during lowering.
+  */
+ return MIN2(max_stride, 4 * min_size);
   }
}
 
-- 
2.19.2



[Mesa-dev] [PATCH 4/5] intel/fs: Implement extended strides greater than 4 for IR source regions.

2019-01-18 Thread Francisco Jerez
Strides up to 32B can be implemented for the source regions of most
instructions by leveraging either the vertical or the horizontal
stride of the hardware Align1 region.  The main motivation for this is
that currently the lower_integer_multiplication() pass will happily
double the stride of one of the 32-bit sources, which can blow up if
the stride of the original source was already the maximum value
allowed by the hardware.

An alternative would be to use the regioning legalization pass in
order to lower such strides into the composition of multiple legal
strides, but that would be somewhat less efficient.
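
How a large stride maps onto an Align1 region can be sketched as
follows.  This is a model of the <vstride;width,hstride> description in
units of elements, with hypothetical names; it is not the generator
code and it ignores the register-size limit that the patch asserts.

```cpp
#include <cassert>

// Hypothetical model of an Align1 source region <vstride;width,hstride>,
// all in units of elements.
struct region_model {
   unsigned vstride, width, hstride;
};

// Strides above 4 use one element per row and let the vertical stride
// carry the step; smaller strides use the horizontal stride.
region_model region_for_stride(unsigned stride, unsigned width)
{
   assert(width > 0);
   if (stride > 4)
      return { stride, 1, 0 };
   return { width * stride, width, stride };
}

// Element offset of channel i within the region.
unsigned channel_offset(const region_model &r, unsigned i)
{
   return (i / r.width) * r.vstride + (i % r.width) * r.hstride;
}
```

In both encodings channel i ends up at element offset i * stride, which
is the property the extended-stride path relies on.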

This showed up as a regression from my commit cbea91eb57a501bebb1ca2
in Vulkan 1.1 CTS tests on CHV/BXT platforms, however it was really a
pre-existing problem that had affected conformance on other platforms
without native support for integer multiplication.  CHV/BXT were
getting around it because the code I removed in that commit had the
"fortunate" side effect of emitting narrower regions that didn't hit
the hardware stride limit after lowering.  Beyond fixing the
regression this fixes ~90 additional Vulkan 1.1 subgroup CTS tests on
ICL (that's why this patch is marked for inclusion in mesa-stable even
though the original regressing patch was not).

Cc: mesa-sta...@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109328
Reported-by: Mark Janes 
---
 src/intel/compiler/brw_fs_generator.cpp | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/src/intel/compiler/brw_fs_generator.cpp 
b/src/intel/compiler/brw_fs_generator.cpp
index 5fc6cf5f8cc..b169eacf15b 100644
--- a/src/intel/compiler/brw_fs_generator.cpp
+++ b/src/intel/compiler/brw_fs_generator.cpp
@@ -90,9 +90,17 @@ brw_reg_from_fs_reg(const struct gen_device_info *devinfo, 
fs_inst *inst,
   *   different execution size when the number of components
   *   written to each destination GRF is not the same.
   */
- const unsigned width = MIN2(reg_width, phys_width);
- brw_reg = brw_vecn_reg(width, brw_file_from_reg(reg), reg->nr, 0);
- brw_reg = stride(brw_reg, width * reg->stride, width, reg->stride);
+ if (reg->stride > 4) {
+assert(reg != &inst->dst);
+assert(reg->stride * type_sz(reg->type) <= REG_SIZE);
+brw_reg = brw_vecn_reg(1, brw_file_from_reg(reg), reg->nr, 0);
+brw_reg = stride(brw_reg, reg->stride, 1, 0);
+
+ } else {
+const unsigned width = MIN2(reg_width, phys_width);
+brw_reg = brw_vecn_reg(width, brw_file_from_reg(reg), reg->nr, 0);
+brw_reg = stride(brw_reg, width * reg->stride, width, reg->stride);
+ }
 
  if (devinfo->gen == 7 && !devinfo->is_haswell) {
 /* From the IvyBridge PRM (EU Changes by Processor Generation, 
page 13):
-- 
2.19.2



Re: [Mesa-dev] [PATCH v10 09/20] clover: Track flags per module section

2019-01-18 Thread Francisco Jerez
Pierre Moreau  writes:

> One flag that needs to be tracked is whether a library is allowed to
> receive mathematics optimisations or not, as the authorisation is given
> when creating the library while the optimisations are specified when
> creating the executable.
>
> Reviewed-by: Aaron Watry 
>
> Changes since:
> * v3: drop the modification to the tgsi backend, as already dropped
>   (Aaron Watry)
>
> Signed-off-by: Pierre Moreau 
> ---
>  src/gallium/state_trackers/clover/core/module.cpp   |  1 +
>  src/gallium/state_trackers/clover/core/module.hpp   | 13 +
>  .../state_trackers/clover/llvm/codegen/bitcode.cpp  |  3 ++-
>  .../state_trackers/clover/llvm/codegen/common.cpp   |  2 +-
>  4 files changed, 13 insertions(+), 6 deletions(-)
>
> diff --git a/src/gallium/state_trackers/clover/core/module.cpp 
> b/src/gallium/state_trackers/clover/core/module.cpp
> index a6c5b98d8e0..0e11506d0d7 100644
> --- a/src/gallium/state_trackers/clover/core/module.cpp
> +++ b/src/gallium/state_trackers/clover/core/module.cpp
> @@ -163,6 +163,7 @@ namespace {
>proc(S &s, QT &x) {
>   _proc(s, x.id);
>   _proc(s, x.type);
> + _proc(s, x.flags);
>   _proc(s, x.size);
>   _proc(s, x.data);
>}
> diff --git a/src/gallium/state_trackers/clover/core/module.hpp 
> b/src/gallium/state_trackers/clover/core/module.hpp
> index 2ddd26426fb..ff7e9b6234a 100644
> --- a/src/gallium/state_trackers/clover/core/module.hpp
> +++ b/src/gallium/state_trackers/clover/core/module.hpp
> @@ -41,14 +41,19 @@ namespace clover {
>  data_local,
>  data_private
>   };
> + enum class flags_t {

You probably want the type to be "enum flags" for consistency with the
other enums defined here.

> +none,
> +allow_link_options

And explicitly define allow_link_options = 1u, assuming that this is
going to be a bit-mask with multiple flags.
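
A minimal sketch of what the two suggestions above could look like
together.  The second flag, allow_math_opts, and the has_flag() helper
are hypothetical and only exist here to show how a second bit would be
combined and tested.

```cpp
#include <cassert>

// Unscoped enum, matching the style of the other enums in the struct,
// with explicit bit values so that flags can be OR-ed together.
enum flags {
   none = 0u,
   allow_link_options = 1u << 0,
   allow_math_opts = 1u << 1   // hypothetical second flag
};

// True if the given flag bit is set in the mask.
bool has_flag(unsigned mask, flags f)
{
   return (mask & f) != 0;
}
```

Giving allow_link_options an explicit value keeps it a distinct bit if
more flags are added later, instead of relying on the implicit
enumerator numbering.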

Is this patch being used at all in this series?

> + };
>  
> - section(resource_id id, enum type type, size_t size,
> - const std::vector ) :
> - id(id), type(type), size(size), data(data) { }
> - section() : id(0), type(text_intermediate), size(0), data() { }
> + section(resource_id id, enum type type, flags_t flags,
> + size_t size, const std::vector ) :
> + id(id), type(type), flags(flags), size(size), data(data) { }
> + section() : id(0), type(text_intermediate), flags(flags_t::none), 
> size(0), data() { }
>  
>   resource_id id;
>   type type;
> + flags_t flags;
>   size_t size;
>   std::vector data;
>};
> diff --git a/src/gallium/state_trackers/clover/llvm/codegen/bitcode.cpp 
> b/src/gallium/state_trackers/clover/llvm/codegen/bitcode.cpp
> index 40bb426218d..8e9d4c7e85c 100644
> --- a/src/gallium/state_trackers/clover/llvm/codegen/bitcode.cpp
> +++ b/src/gallium/state_trackers/clover/llvm/codegen/bitcode.cpp
> @@ -84,7 +84,8 @@ clover::llvm::build_module_library(const ::llvm::Module 
> ,
> enum module::section::type section_type) {
> module m;
> const auto code = emit_code(mod);
> -   m.secs.emplace_back(0, section_type, code.size(), code);
> +   m.secs.emplace_back(0, section_type, module::section::flags_t::none,
> +   code.size(), code);
> return m;
>  }
>  
> diff --git a/src/gallium/state_trackers/clover/llvm/codegen/common.cpp 
> b/src/gallium/state_trackers/clover/llvm/codegen/common.cpp
> index ca5f78940d2..a278e675003 100644
> --- a/src/gallium/state_trackers/clover/llvm/codegen/common.cpp
> +++ b/src/gallium/state_trackers/clover/llvm/codegen/common.cpp
> @@ -178,7 +178,7 @@ namespace {
> make_text_section(const std::vector ) {
>const pipe_llvm_program_header header { uint32_t(code.size()) };
>module::section text { 0, module::section::text_executable,
> - header.num_bytes, {} };
> + module::section::flags_t::none, 
> header.num_bytes, {} };
>  
>text.data.insert(text.data.end(), reinterpret_cast *>(),
> reinterpret_cast() + 
> sizeof(header));
> -- 
> 2.20.1




Re: [Mesa-dev] [PATCH v10 06/20] clover/api: Rework the validation of devices for building

2019-01-18 Thread Francisco Jerez
Pierre Moreau  writes:

> Reviewed-by: Francisco Jerez 
>
> Changes since:
> * v5:
>   - Drop the `valid_devs` argument to `validate_build_common()`
>     (Francisco Jerez)
>   - Change `clLinkProgram()` to initialise `prog`’s devices prior to
> calling `validate_build_common()`.
> * v2:
>   - validate_build_common no longer returns a list of devices (Francisco
> Jerez);
>   - Dropped duplicate checks (Francisco Jerez).
>
> Signed-off-by: Pierre Moreau 

The current revision of this patch is still:

Reviewed-by: Francisco Jerez 

> ---
>  .../state_trackers/clover/api/program.cpp  | 18 +-
>  .../state_trackers/clover/core/program.cpp |  3 ++-
>  2 files changed, 11 insertions(+), 10 deletions(-)
>
> diff --git a/src/gallium/state_trackers/clover/api/program.cpp 
> b/src/gallium/state_trackers/clover/api/program.cpp
> index 9d59668f8f6..891a002f3d0 100644
> --- a/src/gallium/state_trackers/clover/api/program.cpp
> +++ b/src/gallium/state_trackers/clover/api/program.cpp
> @@ -41,7 +41,7 @@ namespace {
>   throw error(CL_INVALID_OPERATION);
>  
>if (any_of([&](const device &dev) {
> -   return !count(dev, prog.context().devices());
> +   return !count(dev, prog.devices());
>  }, objs(d_devs, num_devs)))
>   throw error(CL_INVALID_DEVICE);
> }
> @@ -176,8 +176,8 @@ clBuildProgram(cl_program d_prog, cl_uint num_devs,
> void (*pfn_notify)(cl_program, void *),
> void *user_data) try {
> auto &prog = obj(d_prog);
> -   auto devs = (d_devs ? objs(d_devs, num_devs) :
> -ref_vector(prog.context().devices()));
> +   auto devs =
> +  (d_devs ? objs(d_devs, num_devs) : ref_vector(prog.devices()));
> const auto opts = std::string(p_opts ? p_opts : "") + " " +
>   debug_get_option("CLOVER_EXTRA_BUILD_OPTIONS", "");
>  
> @@ -202,8 +202,8 @@ clCompileProgram(cl_program d_prog, cl_uint num_devs,
>   void (*pfn_notify)(cl_program, void *),
>   void *user_data) try {
> auto &prog = obj(d_prog);
> -   auto devs = (d_devs ? objs(d_devs, num_devs) :
> -ref_vector(prog.context().devices()));
> +   auto devs =
> +   (d_devs ? objs(d_devs, num_devs) : 
> ref_vector(prog.devices()));
> const auto opts = std::string(p_opts ? p_opts : "") + " " +
>   debug_get_option("CLOVER_EXTRA_COMPILE_OPTIONS", "");
> header_map headers;
> @@ -279,10 +279,10 @@ clLinkProgram(cl_context d_ctx, cl_uint num_devs, const 
> cl_device_id *d_devs,
> const auto opts = std::string(p_opts ? p_opts : "") + " " +
>   debug_get_option("CLOVER_EXTRA_LINK_OPTIONS", "");
> auto progs = objs(d_progs, num_progs);
> -   auto prog = create(ctx);
> -   auto devs = validate_link_devices(progs,
> - (d_devs ? objs(d_devs, num_devs) :
> -  ref_vector(ctx.devices(;
> +   auto all_devs =
> +  (d_devs ? objs(d_devs, num_devs) : ref_vector(ctx.devices()));
> +   auto prog = create(ctx, all_devs);
> +   auto devs = validate_link_devices(progs, all_devs);
>  
> validate_build_common(prog, num_devs, d_devs, pfn_notify, user_data);
>  
> diff --git a/src/gallium/state_trackers/clover/core/program.cpp 
> b/src/gallium/state_trackers/clover/core/program.cpp
> index ec71d99b017..62fa13efbf9 100644
> --- a/src/gallium/state_trackers/clover/core/program.cpp
> +++ b/src/gallium/state_trackers/clover/core/program.cpp
> @@ -26,7 +26,8 @@
>  using namespace clover;
>  
>  program::program(clover::context &ctx, const std::string &source) :
> -   has_source(true), context(ctx), _source(source), _kernel_ref_counter(0) {
> +   has_source(true), context(ctx), _devices(ctx.devices()), _source(source),
> +   _kernel_ref_counter(0) {
>  }
>  
>  program::program(clover::context ,
> -- 
> 2.20.1




Re: [Mesa-dev] [PATCH] intel/fs: Don't apply the des stride alignment rule to accumulators

2019-01-17 Thread Francisco Jerez
Jason Ekstrand  writes:

> On Thu, Jan 17, 2019 at 3:34 PM Francisco Jerez 
> wrote:
>
>> Jason Ekstrand  writes:
>>
>> > Bah... previous e-mail unfinished.  Please ignore.
>> >
>> > On Thu, Jan 17, 2019 at 4:15 AM Francisco Jerez 
>> > wrote:
>> >
>> >> Jason Ekstrand  writes:
>> >>
>> >> > The pass was discovered to cause problems with the MUL+MACH
>> combination
>> >> > we emit for nir_[iu]mul_high.  In an experimental branch of mine, I
>> ran
>> >> > into issues where the MUL+MACH ended up using a strided source due to
>> >> > working on half of a uint64_t and the new lowering pass helpfully
>> tried
>> >> > to fix the multiply which wrote to an unstriated accumulator.
>> >>
>> >> > Not only did the multiply not need to be fixed
>> >>
>> >> That's far from clear, and inconsistent with what this patch is doing,
>> >> since the fix is still being applied (Wouldn't it make sense to clarify
>> >> that in the commit message since it's slightly misleading about it?).
>> >>
>> >> The original instruction was technically violating the first CHV/BXT
>> >> double-precision regioning restriction before the pass was introduced,
>> >> that's why it made any changes in the first place.  The integer
>> >> multiplication lowering code was just lucky enough that violating the
>> >> restriction didn't matter in this case, but I doubt that the reason for
>> >> that had anything to do with the accumulator being the explicit
>> >> destination...
>> >>
>> >
>> > Explicit, no, but I do suspect that does have to do with it being the
>> > accumulator.  This restriction isn't theoretical; if you violate it
>> > with a GRF, you will get data corruption; I've seen it myself.
>>
>> The BSpec language is vague and frequently inconsistent.  Obviously it
>> was being violated before because the text doesn't name the accumulator
>> as an exemption from that rule.  The fact that you've seen it blow up
>> with corruption before doesn't guarantee it will always blow up under
>> the conditions stated on the hardware spec (because those conditions are
>> a highly imperfect abstraction of the hardware logic rather than the
>> hardware logic itself).  It's because the restriction (as it's
>> enunciated in the BSpec) was purely theoretical that the MULH
>> implementation worked in the first place.
>>
>> > I could see two possible explanations:
>> >
>> >  1. Under the hood the accumulator is written with a Q type and an
>> internal
>> > stride of 8 bytes, hence the restriction does apply but is implicitly
>> > satisfied for D type source strides of 1 and 2.
>> >  2. The data path to the accumulator is a special case in the hardware
>> and
>> > doesn't use the normal general-purpose regioning logic and so doesn't
>> > require the restriction.
>> >
>>
>> I don't see any evidence for any of these explanations.  I believe that
>> the actual reason why the MULH implementation didn't suffer the effects
>> of violating these restrictions is that in fact they don't apply to
>> *any* 32x16-bit integer multiply operations at all despite what the
>> hardware spec says, whether the destination is the accumulator or not.
>>
>> I've verified it by doing a daily CI run on the following patch:
>>
>>
>> https://cgit.freedesktop.org/~currojerez/mesa/commit/?h=jenkins=c1a32c4e1e53d70b0c8f6254f0f53f0230b7e21b
>>
>> It disables legalization of the integer multiply instruction and then
>> adds a hack to lower_integer_multiplication() for it to intentionally
>> break the alignment rule.  No regressions on CHV/BXT/GLK.  My reading of
>> the simulator confirms that 32x16-bit multiplication isn't affected by
>> the restriction.
>>
>
> That explanation makes sense especially when combined with the fact that
> DxD -> Q and DxD -> D was added on gen8 and DxW -> D or DxW -> acc0 has
> been around for a long time.
>
>
>> I'm tempted to send a patch that disables regioning alignment lowering
>> for 32x16-bit integer multiplication strictly for performance.  But
>> that's really an orthogonal change to this patch, since due to the issue
>> of precision loss we still need to make sure not to touch accumulator
>> destinations in instructions that *do* have this restriction.
>>
>
> I just searched through the fs code and implementing MULH is th

Re: [Mesa-dev] [PATCH] intel/fs: Don't apply the des stride alignment rule to accumulators

2019-01-17 Thread Francisco Jerez
Subject is still inaccurate.  How about "intel/fs: Don't touch
accumulator destination while applying regioning alignment rule."

Jason Ekstrand  writes:

> In some shaders, you can end up with a stride in the source of a
> SHADER_OPCODE_MULH.  One way this can happen is if the MULH is acting on
> the top bits of a 64-bit value due to 64-bit integer lowering.  In this
> case, the compiler will produce something like this:
>
> mul(8)acc0<1>UD   g5<8,4,2>UD   0x0004UW  { align1 1Q };
> mach(8)   g6<1>UD g5<8,4,2>UD   0x0004UD  { align1 1Q AccWrEnable };
>
> The new region fixup pass looks at the MUL and sees a strided source and
> unstrided destination and determines that the sequence is illegal.  It
> then attempts to fix the illegal stride by replacing the destination of
> the MUL with a temporary and emitting a MOV into the accumulator:
>
> mul(8)g9<2>UD g5<8,4,2>UD   0x0004UW  { align1 1Q };
> mov(8)acc0<1>UD   g9<8,4,2>UD { align1 1Q };
> mach(8)   g6<1>UD g5<8,4,2>UD   0x0004UD  { align1 1Q AccWrEnable };
>
> Unfortunately, this new sequence isn't correct because MOV accesses the
> accumulator with a different precision to MUL and, instead of filling
> the bottom 32 bits with the source and zeroing the top 32 bits, it
> leaves the top 32 (or maybe 31) bits alone and full of garbage.  When
> the MACH comes along and tries to complete the multiplication, the
> result is correct in the bottom 32 bits (which we throw away) and
> garbage in the top 32 bits which are actually returned by MACH.
>
> This commit does two things:  First, it adds an assert to ensure that we
> don't try to rewrite accumulator destinations of MUL instructions so we
> can avoid this precision issue.  Second, it modifies
> required_dst_byte_stride to require a tightly packed stride so that we
> fix up the sources instead and the actual code which gets emitted is
> this:
>
> mov(8)g9<1>UD g5<8,4,2>UD { align1 1Q };
> mul(8)acc0<1>UD   g9<8,8,1>UD   0x0004UW  { align1 1Q };
> mach(8)   g6<1>UD g5<8,4,2>UD   0x0004UD  { align1 1Q AccWrEnable };
>
> Fixes: efa4e4bc5fc "intel/fs: Introduce regioning lowering pass"
> Cc: Francisco Jerez 
> ---
>  src/intel/compiler/brw_fs_lower_regioning.cpp | 24 ++-
>  1 file changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/src/intel/compiler/brw_fs_lower_regioning.cpp 
> b/src/intel/compiler/brw_fs_lower_regioning.cpp
> index cc4163b4c2c..00cb5769ebe 100644
> --- a/src/intel/compiler/brw_fs_lower_regioning.cpp
> +++ b/src/intel/compiler/brw_fs_lower_regioning.cpp
> @@ -53,7 +53,21 @@ namespace {
> unsigned
> required_dst_byte_stride(const fs_inst *inst)
> {
> -  if (type_sz(inst->dst.type) < get_exec_type_size(inst) &&
> +  if (inst->dst.is_accumulator()) {
> + /* If the destination is an accumulator, insist that we leave the
> +  * stride alone.  We cannot "fix" accumulator destinations by 
> writing
> +  * to a temporary and emitting a MOV into the original destination.
> +  * For multiply instructions (our one use of the accumulator), the
> +  * MUL writes the full 66 bits of the accumulator whereas the MOV we
> +  * would emit only writes 33 bits ane leaves the top 33 bits

and*

> +  * undefined.
> +  *
> +  * It's safe to just require the original stride here because the
> +      * lowering pass will detect the mismatch in 
> required_src_byte_stride
> +  * just fix up the sources of the multiply instead of the 
> destination.

There isn't such a thing as "required_src_byte_stride".  Conjunction
missing between that sentence and the "just fix up..." one.

Code is still:

Reviewed-by: Francisco Jerez 

> +  */
> + return inst->dst.stride * type_sz(inst->dst.type);
> +  } else if (type_sz(inst->dst.type) < get_exec_type_size(inst) &&
>!is_byte_raw_mov(inst)) {
>   return get_exec_type_size(inst);
>} else {
> @@ -316,6 +330,14 @@ namespace {
> bool
> lower_dst_region(fs_visitor *v, bblock_t *block, fs_inst *inst)
> {
> +  /* We cannot replace the result of an integer multiply which writes the
> +   * accumulator because MUL+MACH pairs act on the accumulator as a 
> 66-bit
> +   * value whereas the MOV will act on only 32 or 33 bits of the
> +   * accumulator.
> +   */
> +  assert(inst->opcode != BRW_OPCODE_MUL || !inst->dst.is_accumulator() ||
> + brw_reg_type_is_floating_point(inst->dst.type));
> +
>const fs_builder ibld(v, block, inst);
>const unsigned stride = required_dst_byte_stride(inst) /
>type_sz(inst->dst.type);
> -- 
> 2.20.1




Re: [Mesa-dev] [PATCH] intel/fs: Don't apply the des stride alignment rule to accumulators

2019-01-17 Thread Francisco Jerez
Jason Ekstrand  writes:

> Bah... previous e-mail unfinished.  Please ignore.
>
> On Thu, Jan 17, 2019 at 4:15 AM Francisco Jerez 
> wrote:
>
>> Jason Ekstrand  writes:
>>
>> > The pass was discovered to cause problems with the MUL+MACH combination
>> > we emit for nir_[iu]mul_high.  In an experimental branch of mine, I ran
>> > into issues where the MUL+MACH ended up using a strided source due to
>> > working on half of a uint64_t and the new lowering pass helpfully tried
>> > to fix the multiply which wrote to an unstriated accumulator.
>>
>> > Not only did the multiply not need to be fixed
>>
>> That's far from clear, and inconsistent with what this patch is doing,
>> since the fix is still being applied (Wouldn't it make sense to clarify
>> that in the commit message since it's slightly misleading about it?).
>>
>> The original instruction was technically violating the first CHV/BXT
>> double-precision regioning restriction before the pass was introduced,
>> that's why it made any changes in the first place.  The integer
>> multiplication lowering code was just lucky enough that violating the
>> restriction didn't matter in this case, but I doubt that the reason for
>> that had anything to do with the accumulator being the explicit
>> destination...
>>
>
> Explicit, no, but I do suspect that does have to do with it being the
> accumulator.  This restriction isn't theoretical; if you violate it
> with a GRF, you will get data corruption; I've seen it myself.

The BSpec language is vague and frequently inconsistent.  Obviously it
was being violated before because the text doesn't name the accumulator
as an exemption from that rule.  The fact that you've seen it blow up
with corruption before doesn't guarantee it will always blow up under
the conditions stated on the hardware spec (because those conditions are
a highly imperfect abstraction of the hardware logic rather than the
hardware logic itself).  It's because the restriction (as it's
enunciated in the BSpec) was purely theoretical that the MULH
implementation worked in the first place.

> I could see two possible explanations:
>
>  1. Under the hood the accumulator is written with a Q type and an internal
> stride of 8 bytes, hence the restriction does apply but is implicitly
> satisfied for D type source strides of 1 and 2.
>  2. The data path to the accumulator is a special case in the hardware and
> doesn't use the normal general-purpose regioning logic and so doesn't
> require the restriction.
>

I don't see any evidence for any of these explanations.  I believe that
the actual reason why the MULH implementation didn't suffer the effects
of violating these restrictions is that in fact they don't apply to
*any* 32x16-bit integer multiply operations at all despite what the
hardware spec says, whether the destination is the accumulator or not.

I've verified it by doing a daily CI run on the following patch:

https://cgit.freedesktop.org/~currojerez/mesa/commit/?h=jenkins&id=c1a32c4e1e53d70b0c8f6254f0f53f0230b7e21b

It disables legalization of the integer multiply instruction and then
adds a hack to lower_integer_multiplication() for it to intentionally
break the alignment rule.  No regressions on CHV/BXT/GLK.  My reading of
the simulator confirms that 32x16-bit multiplication isn't affected by
the restriction.

I'm tempted to send a patch that disables regioning alignment lowering
for 32x16-bit integer multiplication strictly for performance.  But
that's really an orthogonal change to this patch, since due to the issue
of precision loss we still need to make sure not to touch accumulator
destinations in instructions that *do* have this restriction.

> I don't find the first one very convincing at all.  Among other things, if
> it were the true reason, it would imply that we would need to use a stride
> of exactly 2 on D type sources which but empirical evidence suggests that
> "mul(8) acc0<1> g5<8,8,1>UD g9<16,8,2>UW" works just fine.
>
>
>> > but the "fix" ended up breaking it because a MOV to the accumulator is
>> > not the same as using it as a multiply destination due to the magic
>> > way the 33/64 bits of the
>>
>> Technically it has 66 bits (it wasn't a typo when I said that to you
>> earlier on IRC).  That's how it can t hold the result of a SIMD16
>> 16x16-bit integer multiplication with 33-bit signed precision per scalar
>> component.
>>
>
> Yes, there are 33 bits available for WxW multiplies but this is dealing
> with a DxD multiply which only has 64 bits according to this bit of bspec
> text:
>
> As there are only 64 bits per channel in DWord mode (D and UD), it is
> sufficient to store the m

Re: [Mesa-dev] [PATCH] intel/fs: Don't apply the des stride alignment rule to accumulators

2019-01-17 Thread Francisco Jerez
Jason Ekstrand  writes:

> The pass was discovered to cause problems with the MUL+MACH combination
> we emit for nir_[iu]mul_high.  In an experimental branch of mine, I ran
> into issues where the MUL+MACH ended up using a strided source due to
> working on half of a uint64_t and the new lowering pass helpfully tried
> to fix the multiply which wrote to an unstriated accumulator.

> Not only did the multiply not need to be fixed

That's far from clear, and inconsistent with what this patch is doing,
since the fix is still being applied (Wouldn't it make sense to clarify
that in the commit message since it's slightly misleading about it?).

The original instruction was technically violating the first CHV/BXT
double-precision regioning restriction before the pass was introduced,
that's why it made any changes in the first place.  The integer
multiplication lowering code was just lucky enough that violating the
restriction didn't matter in this case, but I doubt that the reason for
that had anything to do with the accumulator being the explicit
destination...

> but the "fix" ended up breaking it because a MOV to the accumulator is
> not the same as using it as a multiply destination due to the magic
> way the 33/64 bits of the

Technically it has 66 bits (it wasn't a typo when I said that to you
earlier on IRC).  That's how it can hold the result of a SIMD16
16x16-bit integer multiplication with 33-bit signed precision per scalar
component.

> accumulator are handled for different instruction types.
>
> Fixes: efa4e4bc5fc "intel/fs: Introduce regioning lowering pass"
> Cc: Francisco Jerez 
> ---
>  src/intel/compiler/brw_fs_lower_regioning.cpp | 16 +++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/src/intel/compiler/brw_fs_lower_regioning.cpp 
> b/src/intel/compiler/brw_fs_lower_regioning.cpp
> index cc4163b4c2c..b8a89e82272 100644
> --- a/src/intel/compiler/brw_fs_lower_regioning.cpp
> +++ b/src/intel/compiler/brw_fs_lower_regioning.cpp
> @@ -53,7 +53,13 @@ namespace {
> unsigned
> required_dst_byte_stride(const fs_inst *inst)
> {
> -  if (type_sz(inst->dst.type) < get_exec_type_size(inst) &&
> +  if (inst->dst.is_accumulator()) {
> + /* Even though it's not explicitly documented in the PRMs or the
> +  * BSpec, writes to the accumulator appear to not need any special
> +  * treatment with respect too their destination stride alignment.
> +  */

The code is not really doing what the comment says.  The
destination/source stride alignment restriction will still be honored
for this instruction.  It's just that the destination *has* to be left
untouched while doing that in the case of an integer MUL/MACH
instruction (that's the only reason I asked you to return the original
byte stride of the destination), because splitting off the region into a
MOV would lead to data loss due to the inconsistent semantics of the
accumulator destination for integer MUL/MACH (which update the whole 66
bits) and every other integer arithmetic instruction (which update the
bottom 33 bits and *apparently* leave the top 33 bits uninitialized) --
IOW this is only here so that the assert below doesn't fire.
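
To make the data-loss argument concrete, here is a minimal sketch in
plain C++.  It is only a model under stated assumptions, not a
description of the hardware: the accumulator is represented as a 64-bit
integer, the MUL is assumed to deposit the full 32x16-bit partial
product, and the MOV is assumed to preserve only the low 32 bits (the
lost bits are modelled as zero, whereas on hardware they are apparently
undefined).  All function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed model of MUL: the accumulator receives the full product of
// the 32-bit source and the low 16 bits of the other operand.
constexpr uint64_t mul_low_partial(uint32_t a, uint32_t b)
{
   return uint64_t(a) * (b & 0xffffu);
}

// Assumed model of MACH: add the high 32x16-bit partial product to the
// accumulator contents and return bits 63:32 of the sum.
constexpr uint32_t mach_high(uint64_t acc, uint32_t a, uint32_t b)
{
   return uint32_t((acc + ((uint64_t(a) * (b >> 16)) << 16)) >> 32);
}

// MUL writes the accumulator directly: all partial product bits survive.
constexpr uint32_t umul_high(uint32_t a, uint32_t b)
{
   return mach_high(mul_low_partial(a, b), a, b);
}

// MUL writes a temporary which is then MOVed into the accumulator:
// everything above bit 31 of the partial product is lost on the way.
constexpr uint32_t umul_high_via_mov(uint32_t a, uint32_t b)
{
   return mach_high(mul_low_partial(a, b) & 0xffffffffu, a, b);
}

inline void run_checks()
{
   // The direct path matches a plain 64-bit reference multiplication.
   assert(umul_high(0xffffffffu, 0xffffu) ==
          uint32_t((uint64_t(0xffffffffu) * 0xffffu) >> 32));
   // Bouncing through the MOV drops the bits that MACH depends on.
   assert(umul_high_via_mov(0xffffffffu, 0xffffu) !=
          umul_high(0xffffffffu, 0xffffu));
}
```

In this model any partial product wider than 32 bits gives a wrong high
half once it goes through the MOV, which is why the destination has to
be left alone and the sources fixed up instead.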

> + return inst->dst.stride * type_sz(inst->dst.type);
> +  } else if (type_sz(inst->dst.type) < get_exec_type_size(inst) &&

The code changes themselves are just as I wished, so this gets my:

Reviewed-by: Francisco Jerez 

assuming that you clarify the commit message and comment above.

>!is_byte_raw_mov(inst)) {
>   return get_exec_type_size(inst);
>} else {
> @@ -316,6 +322,14 @@ namespace {
> bool
> lower_dst_region(fs_visitor *v, bblock_t *block, fs_inst *inst)
> {
> +  /* We cannot replace the result of an integer multiply which writes the
> +   * accumulator because MUL+MACH pairs act on the accumulator as a 
> 64-bit
> +   * value whereas the MOV will act on only 32 or 33 bits of the
> +   * accumulator.
> +   */
> +  assert(inst->opcode != BRW_OPCODE_MUL || !inst->dst.is_accumulator() ||
> + brw_reg_type_is_floating_point(inst->dst.type));
> +
>const fs_builder ibld(v, block, inst);
>const unsigned stride = required_dst_byte_stride(inst) /
>type_sz(inst->dst.type);
> -- 
> 2.20.1




[Mesa-dev] [PATCH] intel/fs: Promote execution type to 32-bit when any half-float conversion is needed.

2019-01-15 Thread Francisco Jerez
The docs are fairly incomplete and inconsistent about it, but this
seems to be the reason why half-float destinations are required to be
DWORD-aligned on BDW+ projects.  This way the regioning lowering pass
will make sure that the destination components of W to HF and HF to W
conversions are aligned like the corresponding conversion operation
with 32-bit execution data type.
---
 src/intel/compiler/brw_ir_fs.h | 21 +
 1 file changed, 21 insertions(+)

diff --git a/src/intel/compiler/brw_ir_fs.h b/src/intel/compiler/brw_ir_fs.h
index 3c23fb375e4..08e3d83d910 100644
--- a/src/intel/compiler/brw_ir_fs.h
+++ b/src/intel/compiler/brw_ir_fs.h
@@ -477,6 +477,27 @@ get_exec_type(const fs_inst *inst)
 
assert(exec_type != BRW_REGISTER_TYPE_B);
 
+   /* Promotion of the execution type to 32-bit for conversions from or to
+* half-float seems to be consistent with the following text from the
+* Cherryview PRM Vol. 7, "Execution Data Type":
+*
+* "When single precision and half precision floats are mixed between
+*  source operands or between source and destination operand [..] single
+*  precision float is the execution datatype."
+*
+* and from "Register Region Restrictions":
+*
+* "Conversion between Integer and HF (Half Float) must be DWord aligned
+*  and strided by a DWord on the destination."
+*/
+   if (type_sz(exec_type) == 2 &&
+   inst->dst.type != exec_type) {
+  if (exec_type == BRW_REGISTER_TYPE_HF)
+ exec_type = BRW_REGISTER_TYPE_F;
+  else if (inst->dst.type == BRW_REGISTER_TYPE_HF)
+ exec_type = BRW_REGISTER_TYPE_D;
+   }
+
return exec_type;
 }
 
-- 
2.19.2



Re: [Mesa-dev] [PATCH v3 01/42] intel/compiler: handle conversions between int and half-float on atom

2019-01-15 Thread Francisco Jerez
Iago Toral Quiroga  writes:

> v2: adapted to work with the new regioning lowering pass
>
> Reviewed-by: Topi Pohjolainen  (v1)
> ---
>  src/intel/compiler/brw_ir_fs.h | 33 ++---
>  1 file changed, 26 insertions(+), 7 deletions(-)
>
> diff --git a/src/intel/compiler/brw_ir_fs.h b/src/intel/compiler/brw_ir_fs.h
> index 3c23fb375e4..ba4d6a95720 100644
> --- a/src/intel/compiler/brw_ir_fs.h
> +++ b/src/intel/compiler/brw_ir_fs.h
> @@ -497,9 +497,10 @@ is_unordered(const fs_inst *inst)
>  }
>  
>  /**
> - * Return whether the following regioning restriction applies to the 
> specified
> - * instruction.  From the Cherryview PRM Vol 7. "Register Region
> - * Restrictions":
> + * Return whether one of the the following regioning restrictions apply to 
> the
> + * specified instruction.
> + *
> + * From the Cherryview PRM Vol 7. "Register Region Restrictions":
>   *
>   * "When source or destination datatype is 64b or operation is integer DWord
>   *  multiply, regioning in Align1 must follow these rules:
> @@ -508,6 +509,14 @@ is_unordered(const fs_inst *inst)
>   *  2. Regioning must ensure Src.Vstride = Src.Width * Src.Hstride.
>   *  3. Source and Destination offset must be the same, except the case of
>   * scalar source."
> + *
> + * From the Cherryview PRM Vol 7. "Register Region Restrictions":
> + *
> + *"Conversion between Integer and HF (Half Float) must be DWord
> + * aligned and strided by a DWord on the destination."
> + *
> + *The same restriction is listed for other hardware platforms, however,
> + *empirical testing suggests that only atom platforms are affected.
>   */
>  static inline bool
>  has_dst_aligned_region_restriction(const gen_device_info *devinfo,
> @@ -518,10 +527,20 @@ has_dst_aligned_region_restriction(const 
> gen_device_info *devinfo,
>   (inst->opcode == BRW_OPCODE_MUL || inst->opcode == BRW_OPCODE_MAD);
>  
> if (type_sz(inst->dst.type) > 4 || type_sz(exec_type) > 4 ||
> -   (type_sz(exec_type) == 4 && is_int_multiply))
> -  return devinfo->is_cherryview || gen_device_info_is_9lp(devinfo);
> -   else
> -  return false;
> +   (type_sz(exec_type) == 4 && is_int_multiply)) {
> +  if (devinfo->is_cherryview || gen_device_info_is_9lp(devinfo))
> + return true;
> +   }
> +
> +   const bool dst_type_is_hf = inst->dst.type == BRW_REGISTER_TYPE_HF;
> +   const bool exec_type_is_hf = exec_type == BRW_REGISTER_TYPE_HF;
> +   if ((dst_type_is_hf && !brw_reg_type_is_floating_point(exec_type)) ||
> +   (exec_type_is_hf && !brw_reg_type_is_floating_point(inst->dst.type))) 
> {
> +  if (devinfo->is_cherryview || gen_device_info_is_9lp(devinfo))
> + return true;
> +   }

While looking into this closely, I'm seeing substantial divergence
between the behavior of the simulator, the hardware docs, and the
restriction this is implementing...  The docs are certainly inconsistent
about how and where this should be handled.

I'm suspecting that this restriction is more similar in nature to the
one referred to in the regioning lowering pass as
"is_narrowing_conversion", rather than the one handled by
has_dst_aligned_region_restriction().  Probably we don't need to change
this function nor the regioning pass for it to be honored, because that
restriction is already implemented.  I have a feeling that the reason
for this may be that the 16-bit pipeline lacks the ability to handle
conversions from or to half-float, so the execution type is implicitly
promoted to the matching (integer or floating-point) 32-bit type where
any HF conversion would be needed.  And on those the usual alignment
restriction of the destination to a larger execution type applies.  From
the hardware docs for CHV *only*:

| When single precision and half precision floats are mixed between
| source operands or between source and destination operand. In such
| cases, single precision float is the execution datatype.

This would mean that an "add dst:f, src:hf, src:hf" is really computed
with single precision (!).
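
A rough sketch of that hypothesis in standalone C++, to show what it
would imply for the destination stride.  The enum and function names
are hypothetical stand-ins rather than the real brw_* definitions, and
the stride rule is reduced to the single case discussed here (a
destination narrower than the execution type); the real lowering pass
handles more cases.

```cpp
#include <cassert>

// Hypothetical stand-ins for the register types under discussion.
enum class RegType { W, D, HF, F };

constexpr unsigned type_size(RegType t)
{
   return (t == RegType::W || t == RegType::HF) ? 2 : 4;
}

// Assumed promotion rule: a 16-bit execution type is widened to the
// matching 32-bit type whenever a conversion from or to HF is involved.
constexpr RegType promote_exec_type(RegType exec, RegType dst)
{
   if (type_size(exec) == 2 && dst != exec) {
      if (exec == RegType::HF)
         return RegType::F;
      if (dst == RegType::HF)
         return RegType::D;
   }
   return exec;
}

// Destination stride in elements: a destination narrower than the
// (promoted) execution type has to be spread out to match it.
constexpr unsigned required_dst_stride(RegType exec, RegType dst)
{
   return type_size(dst) < type_size(promote_exec_type(exec, dst)) ?
          type_size(promote_exec_type(exec, dst)) / type_size(dst) : 1;
}

inline void run_checks()
{
   // W -> HF and HF -> W both end up with a destination stride of 2.
   assert(required_dst_stride(RegType::W, RegType::HF) == 2);
   assert(required_dst_stride(RegType::HF, RegType::W) == 2);
   // HF sources with an F destination execute as F, packed destination.
   assert(promote_exec_type(RegType::HF, RegType::F) == RegType::F);
   assert(required_dst_stride(RegType::HF, RegType::F) == 1);
}
```

Under these assumptions the DWord stride on the destination falls out
of the usual narrowing-conversion handling without a dedicated rule.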

The restriction you're quoting seems to be the following:

| BDW+
|
| Conversion between Integer and HF (Half Float) must be DWord-aligned
| and strided by a DWord on the destination.
|
| // Example:
| add (8) r10.0<2>:hf r11.0<8;8,1>:w r12.0<8;8,1>:w
| // Destination stride must be 2.
| mov (8) r10.0<2>:w r11.0<8;8,1>:hf
| // Destination stride must be 2.

However that restriction is apparently overridden on *most* projects
except for BDW (where you aren't applying any restriction at all) by the
following:

| Project:  CHV, SKL+
|
| There is a relaxed alignment rule for word destinations. When the
| destination type is word (UW, W, HF), destination data types can be
| aligned to either the lowest word or the second lowest word of the
| execution channel. This means the destination data words can be either
| all in the even word locations or all in the odd word locations.
| 
| // Example:
| add (8)  r10.0<2>:hf 

Re: [Mesa-dev] [PATCH 03/10] intel/fs: Fix bug in lower_simd_width while splitting an instruction which was already split.

2019-01-07 Thread Francisco Jerez
Iago Toral  writes:

> On Mon, 2019-01-07 at 11:58 -0800, Francisco Jerez wrote:
>> Iago Toral  writes:
>> 
>> > On Sat, 2018-12-29 at 12:38 -0800, Francisco Jerez wrote:
>> > > This seems to be a problem in combination with the
>> > > lower_regioning
>> > > pass introduced by a future commit, which can modify a SIMD-split
>> > > instruction causing its execution size to become illegal
>> > > again.  A
>> > > subsequent call to lower_simd_width() would hit this bug on a
>> > > future
>> > > platform.
>> > > 
>> > > Cc: mesa-sta...@lists.freedesktop.org
>> > > ---
>> > >  src/intel/compiler/brw_fs.cpp | 4 ++--
>> > >  1 file changed, 2 insertions(+), 2 deletions(-)
>> > > 
>> > > diff --git a/src/intel/compiler/brw_fs.cpp
>> > > b/src/intel/compiler/brw_fs.cpp
>> > > index 97544fdf465..4aacc72a1b7 100644
>> > > --- a/src/intel/compiler/brw_fs.cpp
>> > > +++ b/src/intel/compiler/brw_fs.cpp
>> > > @@ -5666,7 +5666,7 @@ static fs_reg
>> > >  emit_unzip(const fs_builder &lbld, fs_inst *inst, unsigned i)
>> > >  {
>> > > /* Specified channel group from the source region. */
>> > > -   const fs_reg src = horiz_offset(inst->src[i], lbld.group());
>> > > +   const fs_reg src = horiz_offset(inst->src[i], lbld.group() -
>> > > inst->group);
>> > 
>> > Should we assert that lbld.group >= inst->group? Same below.
>> > 
>> 
>> The IR will fail validation anytime that's not the case.  But I can
>> add
>> the assertions in both places if that makes you feel more
>> comfortable.
>
> I guess you are referring to this assert at codegen time:
>
> assert(inst->force_writemask_all || inst->group % inst->exec_size ==
> 0);
>
> I guess that is probably enough, but I would still prefer to add the
> asserts here too if that's okay.
>

Nah, I was thinking of the i965 IR validator that checks for
out-of-bounds VGRF register accesses.  But the asserts would be more
strict -- Just added them locally.

Thanks!
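
For reference, the offset arithmetic the patch fixes can be modelled
with a small standalone sketch.  This is a toy under stated assumptions,
not the i965 IR: the Inst struct, make_inst() and split() are
hypothetical, and a region is just a vector holding one value per
channel, starting at the data for the instruction's first channel.

```cpp
#include <cassert>
#include <vector>

// Toy instruction covering channels [group, group + exec_size).  Its
// region already starts at the data for channel `group`.
struct Inst {
   unsigned group;
   unsigned exec_size;
   std::vector<int> region;
};

// Build an instruction whose region holds the absolute channel index
// of every channel it covers.
inline Inst make_inst(unsigned group, unsigned exec_size)
{
   Inst inst{group, exec_size, {}};
   for (unsigned i = 0; i < exec_size; i++)
      inst.region.push_back(int(group + i));
   return inst;
}

// Split an instruction into pieces of `width` channels each.  The
// offset into the region is relative to inst.group; using the absolute
// group instead would run past the end of the region of an instruction
// that was already split once.
inline std::vector<Inst> split(const Inst &inst, unsigned width)
{
   std::vector<Inst> out;
   for (unsigned g = inst.group; g < inst.group + inst.exec_size; g += width) {
      const unsigned rel = g - inst.group;
      assert(g >= inst.group && rel + width <= inst.region.size());
      out.push_back({g, width,
                     std::vector<int>(inst.region.begin() + rel,
                                      inst.region.begin() + rel + width)});
   }
   return out;
}
```

Splitting make_inst(0, 32) into halves of 16 and then splitting the
second half into pieces of 8 yields groups 16 and 24 whose regions
start at channels 16 and 24.  With an absolute offset the second split
would ask for elements 16 and 24 of a 16-element region.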

>> > > if (needs_src_copy(lbld, inst, i)) {
>> > >/* Builder of the right width to perform the copy avoiding
>> > > uninitialized
>> > > @@ -5757,7 +5757,7 @@ emit_zip(const fs_builder &lbld_before,
>> > > const
>> > > fs_builder &lbld_after,
>> > > assert(lbld_before.group() == lbld_after.group());
>> > >  
>> > > /* Specified channel group from the destination region. */
>> > > -   const fs_reg dst = horiz_offset(inst->dst,
>> > > lbld_after.group());
>> > > +   const fs_reg dst = horiz_offset(inst->dst, lbld_after.group()
>> > > -
>> > > inst->group);
>> > > const unsigned dst_size = inst->size_written /
>> > >inst->dst.component_size(inst->exec_size);
>> > >  




Re: [Mesa-dev] [PATCH 08/10] intel/fs: Remove existing lower_conversions pass.

2019-01-07 Thread Francisco Jerez
Iago Toral  writes:

> On Sat, 2018-12-29 at 12:39 -0800, Francisco Jerez wrote:
>> It's redundant with the functionality provided by lower_regioning
>> now.
>> ---
>>  src/intel/Makefile.sources|   1 -
>>  src/intel/compiler/brw_fs.cpp |   1 -
>>  src/intel/compiler/brw_fs.h   |   1 -
>>  .../compiler/brw_fs_lower_conversions.cpp | 132 
>> --
>>  src/intel/compiler/meson.build|   1 -
>>  5 files changed, 136 deletions(-)
>>  delete mode 100644 src/intel/compiler/brw_fs_lower_conversions.cpp
>> 
>> diff --git a/src/intel/Makefile.sources b/src/intel/Makefile.sources
>> index 6b9874d2b80..fe06a57b42e 100644
>> --- a/src/intel/Makefile.sources
>> +++ b/src/intel/Makefile.sources
>> @@ -62,7 +62,6 @@ COMPILER_FILES = \
>>  compiler/brw_fs.h \
>>  compiler/brw_fs_live_variables.cpp \
>>  compiler/brw_fs_live_variables.h \
>> -compiler/brw_fs_lower_conversions.cpp \
>>  compiler/brw_fs_lower_pack.cpp \
>>  compiler/brw_fs_lower_regioning.cpp \
>>  compiler/brw_fs_nir.cpp \
>> diff --git a/src/intel/compiler/brw_fs.cpp
>> b/src/intel/compiler/brw_fs.cpp
>> index caa7a798332..d6280d558ec 100644
>> --- a/src/intel/compiler/brw_fs.cpp
>> +++ b/src/intel/compiler/brw_fs.cpp
>> @@ -6472,7 +6472,6 @@ fs_visitor::optimize()
>> }
>>  
>> progress = false;
>> -   OPT(lower_conversions);
>> OPT(lower_regioning);
>> if (progress) {
>>OPT(opt_copy_propagation);
>
> If you didn't do this in the previous patch, then maybe do it here:
>
> if (OPT(lower_regioning)) {
>...
> }
>
> and avoid resetting progress.
>

I left this lying around because there is another legalization pass
coming up that should cause the same post-lowering optimization passes
to be executed if progress is made.  I can clean things up if you like,
though, and re-introduce the reset of the progress flag in that future
commit.
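
As a rough illustration of why the flag is reset and shared, here is a
stand-alone sketch.  It is not the actual fs_visitor::optimize() code:
Pass, run_legalization() and run_cleanup_if_needed() are names made up
for this example.  Every legalization pass runs unconditionally, and the
clean-up passes run if any of them reported progress.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for a lowering pass: returns true if it changed
// the program.
using Pass = std::function<bool()>;

// Run every legalization pass once and report whether any of them made
// progress.  This models resetting a shared flag and letting each pass
// accumulate into it, instead of testing a single pass in an if ().
inline bool
run_legalization(const std::vector<Pass> &passes)
{
   bool progress = false;
   for (const Pass &pass : passes)
      progress = pass() || progress;   // always call the pass first
   return progress;
}

// Run the clean-up passes only if some legalization pass made progress.
// Returns the number of clean-up passes that were executed.
inline unsigned
run_cleanup_if_needed(bool progress, const std::vector<Pass> &cleanup)
{
   unsigned executed = 0;
   if (progress) {
      for (const Pass &pass : cleanup) {
         pass();
         executed++;
      }
   }
   return executed;
}
```

With this shape a further legalization pass can be appended to the list
without touching the condition that guards the clean-up passes.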

>> diff --git a/src/intel/compiler/brw_fs.h
>> b/src/intel/compiler/brw_fs.h
>> index 36825754931..7edaa3af43c 100644
>> --- a/src/intel/compiler/brw_fs.h
>> +++ b/src/intel/compiler/brw_fs.h
>> @@ -165,7 +165,6 @@ public:
>> bool lower_load_payload();
>> bool lower_pack();
>> bool lower_regioning();
>> -   bool lower_conversions();
>> bool lower_logical_sends();
>> bool lower_integer_multiplication();
>> bool lower_minmax();
>> diff --git a/src/intel/compiler/brw_fs_lower_conversions.cpp
>> b/src/intel/compiler/brw_fs_lower_conversions.cpp
>> deleted file mode 100644
>> index 145fb55f995..000
>> --- a/src/intel/compiler/brw_fs_lower_conversions.cpp
>> +++ /dev/null
>> @@ -1,132 +0,0 @@
>> -/*
>> - * Copyright © 2015 Connor Abbott
>> - *
>> - * Permission is hereby granted, free of charge, to any person
>> obtaining a
>> - * copy of this software and associated documentation files (the
>> "Software"),
>> - * to deal in the Software without restriction, including without
>> limitation
>> - * the rights to use, copy, modify, merge, publish, distribute,
>> sublicense,
>> - * and/or sell copies of the Software, and to permit persons to whom
>> the
>> - * Software is furnished to do so, subject to the following
>> conditions:
>> - *
>> - * The above copyright notice and this permission notice (including
>> the next
>> - * paragraph) shall be included in all copies or substantial
>> portions of the
>> - * Software.
>> - *
>> - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
>> EXPRESS OR
>> - * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>> MERCHANTABILITY,
>> - * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO
>> EVENT SHALL
>> - * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES
>> OR OTHER
>> - * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> ARISING
>> - * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>> OTHER DEALINGS
>> - * IN THE SOFTWARE.
>> - */
>> -
>> -#include "brw_fs.h"
>> -#include "brw_cfg.h"
>> -#include "brw_fs_builder.h"
>> -
>> -using namespace brw;
>> -
>> -static bool
>> -supports_type_conversion(const fs_inst *inst) {
>> -   switch (inst->opcode) {
>> -   case BRW_OPCODE_MOV:
>> -   case SHADER_OPCODE_MOV_INDIRECT:
>> -  return true;
>> 

Re: [Mesa-dev] [PATCHv2 07/10] intel/fs: Introduce regioning lowering pass.

2019-01-07 Thread Francisco Jerez
Iago Toral  writes:

> On Sat, 2019-01-05 at 14:03 -0800, Francisco Jerez wrote:
>> This legalization pass is meant to handle situations where the source
>> or destination regioning controls of an instruction are unsupported
>> by
>> the hardware and need to be lowered away into separate instructions.
>> This should be more reliable and future-proof than the current
>> approach of handling CHV/BXT restrictions manually all over the
>> visitor.  The same mechanism is leveraged to lower unsupported type
>> conversions easily, which obsoletes the lower_conversions pass.
>> 
>> v2: Give conditional modifiers the same treatment as predicates for
>> SEL instructions in lower_dst_modifiers() (Iago).  Special-case a
>> couple of other instructions with inconsistent conditional mod
>> semantics in lower_dst_modifiers() (Curro).
>> ---
>>  src/intel/Makefile.sources|   1 +
>>  src/intel/compiler/brw_fs.cpp |   5 +-
>>  src/intel/compiler/brw_fs.h   |  21 +-
>>  src/intel/compiler/brw_fs_lower_regioning.cpp | 399
>> ++
>>  src/intel/compiler/brw_ir_fs.h|  10 +
>>  src/intel/compiler/meson.build|   1 +
>>  6 files changed, 418 insertions(+), 19 deletions(-)
>>  create mode 100644 src/intel/compiler/brw_fs_lower_regioning.cpp
>> 
>> diff --git a/src/intel/Makefile.sources b/src/intel/Makefile.sources
>> index 5e7d32293b7..6b9874d2b80 100644
>> --- a/src/intel/Makefile.sources
>> +++ b/src/intel/Makefile.sources
>> @@ -64,6 +64,7 @@ COMPILER_FILES = \
>>  compiler/brw_fs_live_variables.h \
>>  compiler/brw_fs_lower_conversions.cpp \
>>  compiler/brw_fs_lower_pack.cpp \
>> +compiler/brw_fs_lower_regioning.cpp \
>>  compiler/brw_fs_nir.cpp \
>>  compiler/brw_fs_reg_allocate.cpp \
>>  compiler/brw_fs_register_coalesce.cpp \
>> diff --git a/src/intel/compiler/brw_fs.cpp
>> b/src/intel/compiler/brw_fs.cpp
>> index 889509badab..caa7a798332 100644
>> --- a/src/intel/compiler/brw_fs.cpp
>> +++ b/src/intel/compiler/brw_fs.cpp
>> @@ -6471,7 +6471,10 @@ fs_visitor::optimize()
>>OPT(dead_code_eliminate);
>> }
>>  
>> -   if (OPT(lower_conversions)) {
>> +   progress = false;
>> +   OPT(lower_conversions);
>> +   OPT(lower_regioning);
>> +   if (progress) {
>
> This is a small nitpick but since this makes lower_conversions
> redundant, maybe it makes more sense to just remove the call to it here
> already in this patch so you don't have to reset the progress variable
> and simply do:
>
> if (OPT(lower_regioning)) {
>...
> }
>

The main reason for this is that in the event of a regression this will
allow identifying from the bisection result whether the reason for the
failure is the lack of a condition in the lower_regioning pass which was
previously handled by lower_conversions, or whether it's a bug in the
lowering code of lower_regioning itself.

>>OPT(opt_copy_propagation);
>>OPT(dead_code_eliminate);
>>OPT(lower_simd_width);
>> diff --git a/src/intel/compiler/brw_fs.h
>> b/src/intel/compiler/brw_fs.h
>> index dc36ecc21ac..36825754931 100644
>> --- a/src/intel/compiler/brw_fs.h
>> +++ b/src/intel/compiler/brw_fs.h
>> @@ -164,6 +164,7 @@ public:
>> void lower_uniform_pull_constant_loads();
>> bool lower_load_payload();
>> bool lower_pack();
>> +   bool lower_regioning();
>> bool lower_conversions();
>> bool lower_logical_sends();
>> bool lower_integer_multiplication();
>> @@ -536,24 +537,8 @@ namespace brw {
>>}
>> }
>>  
>> -   /**
>> -* Remove any modifiers from the \p i-th source region of the
>> instruction,
>> -* including negate, abs and any implicit type conversion to the
>> execution
>> -* type.  Instead any source modifiers will be implemented as a
>> separate
>> -* MOV instruction prior to the original instruction.
>> -*/
>> -   inline bool
>> -   lower_src_modifiers(fs_visitor *v, bblock_t *block, fs_inst
>> *inst, unsigned i)
>> -   {
>> -  assert(inst->components_read(i) == 1);
>> -  const fs_builder ibld(v, block, inst);
>> -  const fs_reg tmp = ibld.vgrf(get_exec_type(inst));
>> -
>> -  ibld.MOV(tmp, inst->src[i]);
>> -  inst->src[i] = tmp;
>> -
>> -  return true;
>> -   }
>> +   bool
>> +   lower_src_modifiers(fs_visitor *v, bblock_t *block, fs_inst
>> *

Re: [Mesa-dev] [PATCH 05/10] intel/fs: Respect CHV/BXT regioning restrictions in copy propagation pass.

2019-01-07 Thread Francisco Jerez
Iago Toral  writes:

> On Sat, 2018-12-29 at 12:38 -0800, Francisco Jerez wrote:
>> Currently the visitor attempts to enforce the regioning restrictions
>> that apply to double-precision instructions on CHV/BXT at NIR-to-i965
>> translation time.  It is possible though for the copy propagation
>> pass
>> to violate this restriction if a strided move is propagated into one
>> of the affected instructions.  I've only reproduced this issue on a
>> future platform but it could affect CHV/BXT too under the right
>> conditions.
>> 
>> Cc: mesa-sta...@lists.freedesktop.org
>> ---
>>  .../compiler/brw_fs_copy_propagation.cpp  | 10 +++
>>  src/intel/compiler/brw_ir_fs.h| 28
>> +++
>>  2 files changed, 38 insertions(+)
>> 
>> diff --git a/src/intel/compiler/brw_fs_copy_propagation.cpp
>> b/src/intel/compiler/brw_fs_copy_propagation.cpp
>> index a8ec1c34630..c23ce1ef426 100644
>> --- a/src/intel/compiler/brw_fs_copy_propagation.cpp
>> +++ b/src/intel/compiler/brw_fs_copy_propagation.cpp
>> @@ -315,6 +315,16 @@ can_take_stride(fs_inst *inst, unsigned arg,
>> unsigned stride,
>> if (stride > 4)
>>return false;
>>  
>> +   /* Bail if the channels of the source need to be aligned to the
>> byte offset
>> +* of the corresponding channel of the destination, and the
>> provided stride
>> +* would break this restriction.
>> +*/
>> +   if (has_dst_aligned_region_restriction(devinfo, inst) &&
>> +   !(type_sz(inst->src[arg].type) * stride ==
>> +   type_sz(inst->dst.type) * inst->dst.stride ||
>> + stride == 0))
>> +  return false;
>> +
>> /* 3-source instructions can only be Align16, which restricts
>> what strides
>>  * they can take. They can only take a stride of 1 (the usual
>> case), or 0
>>  * with a special "repctrl" bit. But the repctrl bit doesn't work
>> for
>> diff --git a/src/intel/compiler/brw_ir_fs.h
>> b/src/intel/compiler/brw_ir_fs.h
>> index 07e7224e0f8..95b069a2e02 100644
>> --- a/src/intel/compiler/brw_ir_fs.h
>> +++ b/src/intel/compiler/brw_ir_fs.h
>> @@ -486,4 +486,32 @@ get_exec_type_size(const fs_inst *inst)
>> return type_sz(get_exec_type(inst));
>>  }
>>  
>> +/**
>> + * Return whether the following regioning restriction applies to the
>> specified
>> + * instruction.  From the Cherryview PRM Vol 7. "Register Region
>> + * Restrictions":
>> + *
>> + * "When source or destination datatype is 64b or operation is
>> integer DWord
>> + *  multiply, regioning in Align1 must follow these rules:
>> + *
>> + *  1. Source and Destination horizontal stride must be aligned to
>> the same qword.
>> + *  2. Regioning must ensure Src.Vstride = Src.Width * Src.Hstride.
>> + *  3. Source and Destination offset must be the same, except the
>> case of
>> + * scalar source."
>> + */
>> +static inline bool
>> +has_dst_aligned_region_restriction(const gen_device_info *devinfo,
>> +   const fs_inst *inst)
>> +{
>> +   const brw_reg_type exec_type = get_exec_type(inst);
>> +   const bool is_int_multiply =
>> !brw_reg_type_is_floating_point(exec_type) &&
>> + (inst->opcode == BRW_OPCODE_MUL || inst->opcode ==
>> BRW_OPCODE_MAD);
>
> Should this be extended to include MAC and MACH too?
>

The documentation is unclear, but it doesn't look like that's the case
according to the simulator, because those instructions don't do more
than a 16x16 or 32x16 bit integer multiply respectively.
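
For illustration only, here is a stand-alone model of the predicate
from the quoted hunk.  inst_desc, device_desc and the function name are
hypothetical simplifications of fs_inst, gen_device_info and the real
helper.  It only shows that MUL and MAD are treated as DWord integer
multiplies while MAC and MACH are not.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for fs_inst and gen_device_info.
enum class opcode { MOV, MUL, MAD, MAC, MACH };

struct inst_desc {
   opcode op;
   unsigned dst_type_sz;     // size in bytes of the destination type
   unsigned exec_type_sz;    // size in bytes of the execution type
   bool exec_type_is_float;
};

struct device_desc {
   bool is_cherryview;
   bool is_9lp;              // low-power gen9 parts, e.g. BXT
};

inline bool
has_dst_aligned_region_restriction_model(const device_desc &dev,
                                         const inst_desc &inst)
{
   // Only MUL and MAD count as integer multiplies here; MAC and MACH are
   // left out, following the reasoning in the reply above.
   const bool is_int_multiply = !inst.exec_type_is_float &&
      (inst.op == opcode::MUL || inst.op == opcode::MAD);

   if (inst.dst_type_sz > 4 || inst.exec_type_sz > 4 ||
       (inst.exec_type_sz == 4 && is_int_multiply))
      return dev.is_cherryview || dev.is_9lp;
   else
      return false;
}
```

A DWord integer MUL on CHV is reported as restricted, while a MACH
with the same types is not.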

>> +
>> +   if (type_sz(inst->dst.type) > 4 || type_sz(exec_type) > 4 ||
>> +   (type_sz(exec_type) == 4 && is_int_multiply))
>> +  return devinfo->is_cherryview ||
>> gen_device_info_is_9lp(devinfo);
>
> How about:
>
> if (devinfo->is_cherryview || gen_device_info_is_9lp(devinfo)) {
>...
> } else {
>return false;
> }
>
> since we only really need to do these checks in those platforms it
> might make a bit more sense to do it this way.
>

Right now the difference is purely cosmetic, but in the future that
won't work for the platform this was designed for.  I can send you more
details off-list.

>> +   else
>> +  return false;
>> +}
>> +
>>  #endif




Re: [Mesa-dev] [PATCH 03/10] intel/fs: Fix bug in lower_simd_width while splitting an instruction which was already split.

2019-01-07 Thread Francisco Jerez
Iago Toral  writes:

> On Sat, 2018-12-29 at 12:38 -0800, Francisco Jerez wrote:
>> This seems to be a problem in combination with the lower_regioning
>> pass introduced by a future commit, which can modify a SIMD-split
>> instruction causing its execution size to become illegal again.  A
>> subsequent call to lower_simd_width() would hit this bug on a future
>> platform.
>> 
>> Cc: mesa-sta...@lists.freedesktop.org
>> ---
>>  src/intel/compiler/brw_fs.cpp | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>> 
>> diff --git a/src/intel/compiler/brw_fs.cpp
>> b/src/intel/compiler/brw_fs.cpp
>> index 97544fdf465..4aacc72a1b7 100644
>> --- a/src/intel/compiler/brw_fs.cpp
>> +++ b/src/intel/compiler/brw_fs.cpp
>> @@ -5666,7 +5666,7 @@ static fs_reg
>>  emit_unzip(const fs_builder &lbld, fs_inst *inst, unsigned i)
>>  {
>> /* Specified channel group from the source region. */
>> -   const fs_reg src = horiz_offset(inst->src[i], lbld.group());
>> +   const fs_reg src = horiz_offset(inst->src[i], lbld.group() -
>> inst->group);
>
> Should we assert that lbld.group >= inst->group? Same below.
>

The IR will fail validation anytime that's not the case.  But I can add
the assertions in both places if that makes you feel more comfortable.
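
To spell out the arithmetic the fix relies on, here is a minimal
stand-alone sketch.  split_inst and relative_channel_offset() are
hypothetical names, not Mesa API.  The regions of an instruction are
addressed relative to its first channel, so the offset for a lowered
builder is its group minus the group of the instruction being split.

```cpp
#include <cassert>

// Hypothetical, simplified view of an instruction: the first channel it
// covers (group) and the number of channels it executes (exec_size).
struct split_inst {
   unsigned group;
   unsigned exec_size;
};

// Channel offset into the instruction's own regions for a lowered
// builder that starts at absolute channel lbld_group.
inline unsigned
relative_channel_offset(const split_inst &inst, unsigned lbld_group)
{
   // Valid IR keeps the lowered group inside the original instruction.
   assert(lbld_group >= inst.group &&
          lbld_group < inst.group + inst.exec_size);
   return lbld_group - inst.group;
}
```

For an instruction that was never split (group 0) the subtraction is a
no-op, which would explain why the problem only shows up when an
already-split instruction is split again.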

>> if (needs_src_copy(lbld, inst, i)) {
>>/* Builder of the right width to perform the copy avoiding
>> uninitialized
>> @@ -5757,7 +5757,7 @@ emit_zip(const fs_builder &lbld_before, const
>> fs_builder &lbld_after,
>> assert(lbld_before.group() == lbld_after.group());
>>  
>> /* Specified channel group from the destination region. */
>> -   const fs_reg dst = horiz_offset(inst->dst, lbld_after.group());
>> +   const fs_reg dst = horiz_offset(inst->dst, lbld_after.group() -
>> inst->group);
>> const unsigned dst_size = inst->size_written /
>>inst->dst.component_size(inst->exec_size);
>>  




Re: [Mesa-dev] [Mesa-stable] [PATCH 02/10] intel/fs: Implement quad swizzles on ICL+.

2019-01-07 Thread Francisco Jerez
Iago Toral  writes:

> On Sat, 2018-12-29 at 12:38 -0800, Francisco Jerez wrote:
>> Align16 is no longer a thing, so a new implementation is provided
>> using Align1 instead.  Not all possible swizzles can be represented
>> as
>> a single Align1 region, but some fast paths are provided for
>> frequently used swizzles that can be represented efficiently in
>> Align1
>> mode.
>> 
>> Fixes ~90 subgroup quad swap Vulkan CTS tests.
>> 
>> Cc: mesa-sta...@lists.freedesktop.org
>> ---
>>  src/intel/compiler/brw_fs.cpp   | 25 +++-
>>  src/intel/compiler/brw_fs.h |  4 ++
>>  src/intel/compiler/brw_fs_generator.cpp | 82 ---
>> --
>>  3 files changed, 93 insertions(+), 18 deletions(-)
>> 
>> diff --git a/src/intel/compiler/brw_fs.cpp
>> b/src/intel/compiler/brw_fs.cpp
>> index 2f0f0151219..97544fdf465 100644
>> --- a/src/intel/compiler/brw_fs.cpp
>> +++ b/src/intel/compiler/brw_fs.cpp
>> @@ -315,6 +315,20 @@ fs_inst::has_source_and_destination_hazard()
>> const
>> * may stomp all over it.
>> */
>>return true;
>> +   case SHADER_OPCODE_QUAD_SWIZZLE:
>> +  switch (src[1].ud) {
>
> Maybe it is worth adding a small comment here indicating that these are
> the cases where we implement the opcode as a single instruction and
> refer to the generator for details?
>

Yeah, fixed up locally.

>> +  case BRW_SWIZZLE_XXXX:
>> +  case BRW_SWIZZLE_YYYY:
>> +  case BRW_SWIZZLE_ZZZZ:
>> +  case BRW_SWIZZLE_WWWW:
>> +  case BRW_SWIZZLE_XXZZ:
>> +  case BRW_SWIZZLE_YYWW:
>> +  case BRW_SWIZZLE_XYXY:
>> +  case BRW_SWIZZLE_ZWZW:
>> + return false;
>> +  default:
>> + return !is_uniform(src[0]);
>
> Shouldn't this be:
>
> return !is_uniform(src[0]) ||
>(devinfo->gen < 11 && type_sz(src.type) == 4);
>
> Since in that case we also implement the opcode with a single ALIGN16
> instruction.
>

Not really.  Maybe you mean "!is_uniform(src[0]) &&
(devinfo->gen >= 11 || type_sz(src.type) != 4)" instead?  That would be
somewhat more accurate than the expression in my patch, but
unfortunately the devinfo pointer is not available here.  I wouldn't
mind plumbing it through, but this patch is meant for mesa-stable, and
being stricter than necessary about source/destination hazards
shouldn't affect correctness.
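
A small stand-alone check of that claim for the default case of the
switch.  The function names are made up for this example, and the
generations and type sizes enumerated are arbitrary sample values.  It
confirms that whenever the more accurate expression reports a hazard,
the one in the patch does too.

```cpp
#include <cassert>

// Condition used by the patch for swizzles outside the fast paths.
constexpr bool
hazard_in_patch(bool src_is_uniform)
{
   return !src_is_uniform;
}

// The more accurate condition, which additionally needs the hardware
// generation and the size in bytes of the source type.
constexpr bool
hazard_refined(bool src_is_uniform, unsigned gen, unsigned src_type_sz)
{
   return !src_is_uniform && (gen >= 11 || src_type_sz != 4);
}

// Returns true if the patch's condition reports a hazard in every sampled
// case where the refined condition does.
inline bool
patch_is_at_least_as_strict()
{
   const bool uniform_values[] = { false, true };
   const unsigned gens[] = { 9, 11 };
   const unsigned type_sizes[] = { 2, 4, 8 };

   for (bool uniform : uniform_values)
      for (unsigned gen : gens)
         for (unsigned sz : type_sizes)
            if (hazard_refined(uniform, gen, sz) && !hazard_in_patch(uniform))
               return false;

   return true;
}
```

The only place where the two differ is a non-uniform 32-bit source
before Gen11, where the patch reports a hazard that the refined
expression would not.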

>> +  }
>> default:
>>/* The SIMD16 compressed instruction
>> *
>> @@ -5579,9 +5593,14 @@ get_lowered_simd_width(const struct
>> gen_device_info *devinfo,
>> case SHADER_OPCODE_URB_WRITE_SIMD8_MASKED_PER_SLOT:
>>return MIN2(8, inst->exec_size);
>>  
>> -   case SHADER_OPCODE_QUAD_SWIZZLE:
>> -  return 8;
>> -
>> +   case SHADER_OPCODE_QUAD_SWIZZLE: {
>> +  const unsigned swiz = inst->src[1].ud;
>> +  return (is_uniform(inst->src[0]) ?
>> + get_fpu_lowered_simd_width(devinfo, inst) :
>> +  devinfo->gen < 11 && type_sz(inst->src[0].type) == 4 ?
>> 8 :
>> +  swiz == BRW_SWIZZLE_XYXY || swiz == BRW_SWIZZLE_ZWZW ?
>> 4 :
>> +  get_fpu_lowered_simd_width(devinfo, inst));
>> +   }
>> case SHADER_OPCODE_MOV_INDIRECT: {
>>/* From IVB and HSW PRMs:
>> *
>> diff --git a/src/intel/compiler/brw_fs.h
>> b/src/intel/compiler/brw_fs.h
>> index 53d9b6ce7bf..dc36ecc21ac 100644
>> --- a/src/intel/compiler/brw_fs.h
>> +++ b/src/intel/compiler/brw_fs.h
>> @@ -480,6 +480,10 @@ private:
>>   struct brw_reg src,
>>   struct brw_reg idx);
>>  
>> +   void generate_quad_swizzle(const fs_inst *inst,
>> +  struct brw_reg dst, struct brw_reg
>> src,
>> +  unsigned swiz);
>> +
>> bool patch_discard_jumps_to_fb_writes();
>>  
>> const struct brw_compiler *compiler;
>> diff --git a/src/intel/compiler/brw_fs_generator.cpp
>> b/src/intel/compiler/brw_fs_generator.cpp
>> index 08dd83dded7..84627e83132 100644
>> --- a/src/intel/compiler/brw_fs_generator.cpp
>> +++ b/src/intel/compiler/brw_fs_generator.cpp
>> @@ -582,6 +582,72 @@ fs_generator::generate_shuffle(fs_inst *inst,
>> }
>>  }
>>  
>> +void
>> +fs_generator::generate_quad_swizzle(const fs_inst *inst,
>> +struct brw_reg dst, struct
>> brw_reg src,

[Mesa-dev] [PATCHv2 07/10] intel/fs: Introduce regioning lowering pass.

2019-01-05 Thread Francisco Jerez
This legalization pass is meant to handle situations where the source
or destination regioning controls of an instruction are unsupported by
the hardware and need to be lowered away into separate instructions.
This should be more reliable and future-proof than the current
approach of handling CHV/BXT restrictions manually all over the
visitor.  The same mechanism is leveraged to lower unsupported type
conversions easily, which obsoletes the lower_conversions pass.

v2: Give conditional modifiers the same treatment as predicates for
SEL instructions in lower_dst_modifiers() (Iago).  Special-case a
couple of other instructions with inconsistent conditional mod
semantics in lower_dst_modifiers() (Curro).
---
 src/intel/Makefile.sources|   1 +
 src/intel/compiler/brw_fs.cpp |   5 +-
 src/intel/compiler/brw_fs.h   |  21 +-
 src/intel/compiler/brw_fs_lower_regioning.cpp | 399 ++
 src/intel/compiler/brw_ir_fs.h|  10 +
 src/intel/compiler/meson.build|   1 +
 6 files changed, 418 insertions(+), 19 deletions(-)
 create mode 100644 src/intel/compiler/brw_fs_lower_regioning.cpp

diff --git a/src/intel/Makefile.sources b/src/intel/Makefile.sources
index 5e7d32293b7..6b9874d2b80 100644
--- a/src/intel/Makefile.sources
+++ b/src/intel/Makefile.sources
@@ -64,6 +64,7 @@ COMPILER_FILES = \
compiler/brw_fs_live_variables.h \
compiler/brw_fs_lower_conversions.cpp \
compiler/brw_fs_lower_pack.cpp \
+   compiler/brw_fs_lower_regioning.cpp \
compiler/brw_fs_nir.cpp \
compiler/brw_fs_reg_allocate.cpp \
compiler/brw_fs_register_coalesce.cpp \
diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index 889509badab..caa7a798332 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -6471,7 +6471,10 @@ fs_visitor::optimize()
   OPT(dead_code_eliminate);
}
 
-   if (OPT(lower_conversions)) {
+   progress = false;
+   OPT(lower_conversions);
+   OPT(lower_regioning);
+   if (progress) {
   OPT(opt_copy_propagation);
   OPT(dead_code_eliminate);
   OPT(lower_simd_width);
diff --git a/src/intel/compiler/brw_fs.h b/src/intel/compiler/brw_fs.h
index dc36ecc21ac..36825754931 100644
--- a/src/intel/compiler/brw_fs.h
+++ b/src/intel/compiler/brw_fs.h
@@ -164,6 +164,7 @@ public:
void lower_uniform_pull_constant_loads();
bool lower_load_payload();
bool lower_pack();
+   bool lower_regioning();
bool lower_conversions();
bool lower_logical_sends();
bool lower_integer_multiplication();
@@ -536,24 +537,8 @@ namespace brw {
   }
}
 
-   /**
-* Remove any modifiers from the \p i-th source region of the instruction,
-* including negate, abs and any implicit type conversion to the execution
-* type.  Instead any source modifiers will be implemented as a separate
-* MOV instruction prior to the original instruction.
-*/
-   inline bool
-   lower_src_modifiers(fs_visitor *v, bblock_t *block, fs_inst *inst, unsigned 
i)
-   {
-  assert(inst->components_read(i) == 1);
-  const fs_builder ibld(v, block, inst);
-  const fs_reg tmp = ibld.vgrf(get_exec_type(inst));
-
-  ibld.MOV(tmp, inst->src[i]);
-  inst->src[i] = tmp;
-
-  return true;
-   }
+   bool
+   lower_src_modifiers(fs_visitor *v, bblock_t *block, fs_inst *inst, unsigned 
i);
 }
 
 void shuffle_from_32bit_read(const brw::fs_builder ,
diff --git a/src/intel/compiler/brw_fs_lower_regioning.cpp 
b/src/intel/compiler/brw_fs_lower_regioning.cpp
new file mode 100644
index 000..d7c97e1442a
--- /dev/null
+++ b/src/intel/compiler/brw_fs_lower_regioning.cpp
@@ -0,0 +1,399 @@
+/*
+ * Copyright © 2018 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include "brw_fs.h"
+#include "brw_cfg.h"
+#include 

Re: [Mesa-dev] [PATCH 07/10] intel/fs: Introduce regioning lowering pass.

2019-01-05 Thread Francisco Jerez
Francisco Jerez  writes:

> Iago Toral  writes:
>
>> On Sat, 2018-12-29 at 12:39 -0800, Francisco Jerez wrote:
>>> This legalization pass is meant to handle situations where the source
>>> or destination regioning controls of an instruction are unsupported
>>> by
>>> the hardware and need to be lowered away into separate instructions.
>>> This should be more reliable and future-proof than the current
>>> approach of handling CHV/BXT restrictions manually all over the
>>> visitor.  The same mechanism is leveraged to lower unsupported type
>>> conversions easily, which obsoletes the lower_conversions pass.
>>> ---
>>>  src/intel/Makefile.sources|   1 +
>>>  src/intel/compiler/brw_fs.cpp |   5 +-
>>>  src/intel/compiler/brw_fs.h   |  21 +-
>>>  src/intel/compiler/brw_fs_lower_regioning.cpp | 382
>>> ++
>>>  src/intel/compiler/brw_ir_fs.h|  10 +
>>>  src/intel/compiler/meson.build|   1 +
>>>  6 files changed, 401 insertions(+), 19 deletions(-)
>>>  create mode 100644 src/intel/compiler/brw_fs_lower_regioning.cpp
>>> 
>>> diff --git a/src/intel/Makefile.sources b/src/intel/Makefile.sources
>>> index 5e7d32293b7..6b9874d2b80 100644
>>> --- a/src/intel/Makefile.sources
>>> +++ b/src/intel/Makefile.sources
>>> @@ -64,6 +64,7 @@ COMPILER_FILES = \
>>> compiler/brw_fs_live_variables.h \
>>> compiler/brw_fs_lower_conversions.cpp \
>>> compiler/brw_fs_lower_pack.cpp \
>>> +   compiler/brw_fs_lower_regioning.cpp \
>>> compiler/brw_fs_nir.cpp \
>>> compiler/brw_fs_reg_allocate.cpp \
>>> compiler/brw_fs_register_coalesce.cpp \
>>> diff --git a/src/intel/compiler/brw_fs.cpp
>>> b/src/intel/compiler/brw_fs.cpp
>>> index 889509badab..caa7a798332 100644
>>> --- a/src/intel/compiler/brw_fs.cpp
>>> +++ b/src/intel/compiler/brw_fs.cpp
>>> @@ -6471,7 +6471,10 @@ fs_visitor::optimize()
>>>OPT(dead_code_eliminate);
>>> }
>>>  
>>> -   if (OPT(lower_conversions)) {
>>> +   progress = false;
>>> +   OPT(lower_conversions);
>>> +   OPT(lower_regioning);
>>> +   if (progress) {
>>>OPT(opt_copy_propagation);
>>>OPT(dead_code_eliminate);
>>>OPT(lower_simd_width);
>>> diff --git a/src/intel/compiler/brw_fs.h
>>> b/src/intel/compiler/brw_fs.h
>>> index dc36ecc21ac..36825754931 100644
>>> --- a/src/intel/compiler/brw_fs.h
>>> +++ b/src/intel/compiler/brw_fs.h
>>> @@ -164,6 +164,7 @@ public:
>>> void lower_uniform_pull_constant_loads();
>>> bool lower_load_payload();
>>> bool lower_pack();
>>> +   bool lower_regioning();
>>> bool lower_conversions();
>>> bool lower_logical_sends();
>>> bool lower_integer_multiplication();
>>> @@ -536,24 +537,8 @@ namespace brw {
>>>}
>>> }
>>>  
>>> -   /**
>>> -* Remove any modifiers from the \p i-th source region of the
>>> instruction,
>>> -* including negate, abs and any implicit type conversion to the
>>> execution
>>> -* type.  Instead any source modifiers will be implemented as a
>>> separate
>>> -* MOV instruction prior to the original instruction.
>>> -*/
>>> -   inline bool
>>> -   lower_src_modifiers(fs_visitor *v, bblock_t *block, fs_inst
>>> *inst, unsigned i)
>>> -   {
>>> -  assert(inst->components_read(i) == 1);
>>> -  const fs_builder ibld(v, block, inst);
>>> -  const fs_reg tmp = ibld.vgrf(get_exec_type(inst));
>>> -
>>> -  ibld.MOV(tmp, inst->src[i]);
>>> -  inst->src[i] = tmp;
>>> -
>>> -  return true;
>>> -   }
>>> +   bool
>>> +   lower_src_modifiers(fs_visitor *v, bblock_t *block, fs_inst
>>> *inst, unsigned i);
>>>  }
>>>  
>>>  void shuffle_from_32bit_read(const brw::fs_builder ,
>>> diff --git a/src/intel/compiler/brw_fs_lower_regioning.cpp
>>> b/src/intel/compiler/brw_fs_lower_regioning.cpp
>>> new file mode 100644
>>> index 000..9578622401d
>>> --- /dev/null
>>> +++ b/src/intel/compiler/brw_fs_lower_regioning.cpp
>>> @@ -0,0 +1,382 @@
>>> +/*
>>> + * Copyright © 2018 Intel Corporation
>>>

Re: [Mesa-dev] [PATCH v2 39/53] intel/compiler: add a helper to do conversions between integer and half-float

2019-01-04 Thread Francisco Jerez
Iago Toral  writes:

> On Wed, 2019-01-02 at 15:00 -0800, Francisco Jerez wrote:
>> Iago Toral Quiroga  writes:
>> 
>> > There are hardware restrictions to consider that seem to affect
>> > atom platforms
>> > only.
>> 
>> Same comment here as for PATCH 13 of this series.  This and PATCH 40
>> shouldn't be necessary anymore with [1] in place.  Please drop them.
>
> Actually, I think your pass doesn't handle this case. I have just
> tested this and I get various regressions for conversions between
> integers (specifically integers lower than 32-bit, so I wonder if this
> restriction only affects these cases) and half-float.
>
> Here is an example of final IR generated with your pass and without the
> call to fixup_int_half_float_conversion from my series:
>
> mov(8) vgrf1+0.0:HF, vgrf0<2>:W 
>
> Which should be:
>
> mov(8) vgrf1<2>:HF, vgrf0<2>:W
>
> It seems your pass doesn't act on this code since INTEL_DEBUG=optimizer
> doesn't show any trace of it.
>
> If this is not a bug in your pass and just that it doesn't handle this
> case, I am happy to add the support for it as part of my series if that
> makes sense to you, just let me know if that is the case.
>

It's not really a bug.  You just need to add a case to
has_dst_aligned_region_restriction() so that it returns true for FP16
instructions with this restriction.  Sorry I didn't point that out
before.
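
To make the two MOVs from your message concrete, here is a stand-alone
sketch of the byte-stride arithmetic.  It borrows the condition from
the can_take_stride() hunk of PATCH 05.  Whether lower_regioning
applies exactly this rule once the FP16 case is added is an assumption
of the example, and region_desc, byte_stride() and strides_match() are
hypothetical names.

```cpp
#include <cassert>

// Hypothetical model of a register region: element size in bytes and
// horizontal stride in elements.
struct region_desc {
   unsigned type_sz;
   unsigned stride;
};

// Distance in bytes between consecutive channels of a region.
constexpr unsigned
byte_stride(const region_desc &r)
{
   return r.type_sz * r.stride;
}

// True if the source is scalar (stride 0) or its channels advance by the
// same number of bytes as the channels of the destination.
constexpr bool
strides_match(const region_desc &src, const region_desc &dst)
{
   return src.stride == 0 || byte_stride(src) == byte_stride(dst);
}
```

With a <2>:W source (4 bytes per channel), a packed HF destination
advances by 2 bytes and fails the check, while a <2>:HF destination
advances by 4 bytes, i.e. a DWord, and passes.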

> Iago
>
>> [1] 
>> https://lists.freedesktop.org/archives/mesa-dev/2018-December/212775.html
>> 
>> > ---
>> >  src/intel/compiler/brw_fs_nir.cpp | 32
>> > +++
>> >  1 file changed, 32 insertions(+)
>> > 
>> > diff --git a/src/intel/compiler/brw_fs_nir.cpp
>> > b/src/intel/compiler/brw_fs_nir.cpp
>> > index 802f5cb0944..a9fd98bab68 100644
>> > --- a/src/intel/compiler/brw_fs_nir.cpp
>> > +++ b/src/intel/compiler/brw_fs_nir.cpp
>> > @@ -696,6 +696,38 @@ fixup_64bit_conversion(const fs_builder ,
>> > return false;
>> >  }
>> >  
>> > +static bool
>> > +fixup_int_half_float_conversion(const fs_builder &bld,
>> > +fs_reg dst, fs_reg src,
>> > +bool saturate,
>> > +const struct gen_device_info
>> > *devinfo)
>> > +{
>> > +   /* CHV PRM, 3D Media GPGPU Engine, Register Region
>> > Restrictions,
>> > +* Special Restrictions:
>> > +*
>> > +*"Conversion between Integer and HF (Half Float) must be
>> > DWord
>> > +* aligned and strided by a DWord on the destination."
>> > +*
>> > +* The same restriction is listed for other hardware platforms,
>> > however,
>> > +* empirical testing suggests that only atom platforms are
>> > affected.
>> > +*/
>> > +   if (!devinfo->is_cherryview &&
>> > !gen_device_info_is_9lp(devinfo))
>> > +  return false;
>> > +
>> > +   if (!((dst.type == BRW_REGISTER_TYPE_HF &&
>> > !brw_reg_type_is_floating_point(src.type)) ||
>> > + (src.type == BRW_REGISTER_TYPE_HF &&
>> > !brw_reg_type_is_floating_point(dst.type))))
>> > +  return false;
>> > +
>> > +   fs_reg tmp = horiz_stride(retype(bld.vgrf(BRW_REGISTER_TYPE_F,
>> > 1),
>> > +dst.type),
>> > + 2);
>> > +   bld.MOV(tmp, src);
>> > +   fs_inst *inst = bld.MOV(dst, tmp);
>> > +   inst->saturate = saturate;
>> > +
>> > +   return true;
>> > +}
>> > +
>> >  void
>> >  fs_visitor::nir_emit_alu(const fs_builder , nir_alu_instr
>> > *instr)
>> >  {
>> > -- 
>> > 2.17.1
>> > 
>> > ___
>> > mesa-dev mailing list
>> > mesa-dev@lists.freedesktop.org
>> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev




Re: [Mesa-dev] [PATCH 07/10] intel/fs: Introduce regioning lowering pass.

2019-01-04 Thread Francisco Jerez
Iago Toral  writes:

> On Sat, 2018-12-29 at 12:39 -0800, Francisco Jerez wrote:
>> This legalization pass is meant to handle situations where the source
>> or destination regioning controls of an instruction are unsupported
>> by
>> the hardware and need to be lowered away into separate instructions.
>> This should be more reliable and future-proof than the current
>> approach of handling CHV/BXT restrictions manually all over the
>> visitor.  The same mechanism is leveraged to lower unsupported type
>> conversions easily, which obsoletes the lower_conversions pass.
>> ---
>>  src/intel/Makefile.sources|   1 +
>>  src/intel/compiler/brw_fs.cpp |   5 +-
>>  src/intel/compiler/brw_fs.h   |  21 +-
>>  src/intel/compiler/brw_fs_lower_regioning.cpp | 382
>> ++
>>  src/intel/compiler/brw_ir_fs.h|  10 +
>>  src/intel/compiler/meson.build|   1 +
>>  6 files changed, 401 insertions(+), 19 deletions(-)
>>  create mode 100644 src/intel/compiler/brw_fs_lower_regioning.cpp
>> 
>> diff --git a/src/intel/Makefile.sources b/src/intel/Makefile.sources
>> index 5e7d32293b7..6b9874d2b80 100644
>> --- a/src/intel/Makefile.sources
>> +++ b/src/intel/Makefile.sources
>> @@ -64,6 +64,7 @@ COMPILER_FILES = \
>>  compiler/brw_fs_live_variables.h \
>>  compiler/brw_fs_lower_conversions.cpp \
>>  compiler/brw_fs_lower_pack.cpp \
>> +compiler/brw_fs_lower_regioning.cpp \
>>  compiler/brw_fs_nir.cpp \
>>  compiler/brw_fs_reg_allocate.cpp \
>>  compiler/brw_fs_register_coalesce.cpp \
>> diff --git a/src/intel/compiler/brw_fs.cpp
>> b/src/intel/compiler/brw_fs.cpp
>> index 889509badab..caa7a798332 100644
>> --- a/src/intel/compiler/brw_fs.cpp
>> +++ b/src/intel/compiler/brw_fs.cpp
>> @@ -6471,7 +6471,10 @@ fs_visitor::optimize()
>>OPT(dead_code_eliminate);
>> }
>>  
>> -   if (OPT(lower_conversions)) {
>> +   progress = false;
>> +   OPT(lower_conversions);
>> +   OPT(lower_regioning);
>> +   if (progress) {
>>OPT(opt_copy_propagation);
>>OPT(dead_code_eliminate);
>>OPT(lower_simd_width);
>> diff --git a/src/intel/compiler/brw_fs.h
>> b/src/intel/compiler/brw_fs.h
>> index dc36ecc21ac..36825754931 100644
>> --- a/src/intel/compiler/brw_fs.h
>> +++ b/src/intel/compiler/brw_fs.h
>> @@ -164,6 +164,7 @@ public:
>> void lower_uniform_pull_constant_loads();
>> bool lower_load_payload();
>> bool lower_pack();
>> +   bool lower_regioning();
>> bool lower_conversions();
>> bool lower_logical_sends();
>> bool lower_integer_multiplication();
>> @@ -536,24 +537,8 @@ namespace brw {
>>}
>> }
>>  
>> -   /**
>> -* Remove any modifiers from the \p i-th source region of the
>> instruction,
>> -* including negate, abs and any implicit type conversion to the
>> execution
>> -* type.  Instead any source modifiers will be implemented as a
>> separate
>> -* MOV instruction prior to the original instruction.
>> -*/
>> -   inline bool
>> -   lower_src_modifiers(fs_visitor *v, bblock_t *block, fs_inst
>> *inst, unsigned i)
>> -   {
>> -  assert(inst->components_read(i) == 1);
>> -  const fs_builder ibld(v, block, inst);
>> -  const fs_reg tmp = ibld.vgrf(get_exec_type(inst));
>> -
>> -  ibld.MOV(tmp, inst->src[i]);
>> -  inst->src[i] = tmp;
>> -
>> -  return true;
>> -   }
>> +   bool
>> +   lower_src_modifiers(fs_visitor *v, bblock_t *block, fs_inst
>> *inst, unsigned i);
>>  }
>>  
>>  void shuffle_from_32bit_read(const brw::fs_builder ,
>> diff --git a/src/intel/compiler/brw_fs_lower_regioning.cpp
>> b/src/intel/compiler/brw_fs_lower_regioning.cpp
>> new file mode 100644
>> index 000..9578622401d
>> --- /dev/null
>> +++ b/src/intel/compiler/brw_fs_lower_regioning.cpp
>> @@ -0,0 +1,382 @@
>> +/*
>> + * Copyright © 2018 Intel Corporation
>> + *
>> + * Permission is hereby granted, free of charge, to any person
>> obtaining a
>> + * copy of this software and associated documentation files (the
>> "Software"),
>> + * to deal in the Software without restriction, including without
>> limitation
>> + * the rights to use, copy, modify, merge, publish, distribute,
>> sublicense,
>> + * and/or sell copies of the Software, and to permit

Re: [Mesa-dev] [PATCH v2 39/53] intel/compiler: add a helper to do conversions between integer and half-float

2019-01-02 Thread Francisco Jerez
Iago Toral Quiroga  writes:

> There are hardware restrictions to consider that seem to affect atom platforms
> only.

Same comment here as for PATCH 13 of this series.  This and PATCH 40
shouldn't be necessary anymore with [1] in place.  Please drop them.

[1] https://lists.freedesktop.org/archives/mesa-dev/2018-December/212775.html

> ---
>  src/intel/compiler/brw_fs_nir.cpp | 32 +++
>  1 file changed, 32 insertions(+)
>
> diff --git a/src/intel/compiler/brw_fs_nir.cpp 
> b/src/intel/compiler/brw_fs_nir.cpp
> index 802f5cb0944..a9fd98bab68 100644
> --- a/src/intel/compiler/brw_fs_nir.cpp
> +++ b/src/intel/compiler/brw_fs_nir.cpp
> @@ -696,6 +696,38 @@ fixup_64bit_conversion(const fs_builder ,
> return false;
>  }
>  
> +static bool
> +fixup_int_half_float_conversion(const fs_builder &bld,
> +fs_reg dst, fs_reg src,
> +bool saturate,
> +const struct gen_device_info *devinfo)
> +{
> +   /* CHV PRM, 3D Media GPGPU Engine, Register Region Restrictions,
> +* Special Restrictions:
> +*
> +*"Conversion between Integer and HF (Half Float) must be DWord
> +* aligned and strided by a DWord on the destination."
> +*
> +* The same restriction is listed for other hardware platforms, however,
> +* empirical testing suggests that only atom platforms are affected.
> +*/
> +   if (!devinfo->is_cherryview && !gen_device_info_is_9lp(devinfo))
> +  return false;
> +
> +   if (!((dst.type == BRW_REGISTER_TYPE_HF && 
> !brw_reg_type_is_floating_point(src.type)) ||
> + (src.type == BRW_REGISTER_TYPE_HF && 
> !brw_reg_type_is_floating_point(dst.type
> +  return false;
> +
> +   fs_reg tmp = horiz_stride(retype(bld.vgrf(BRW_REGISTER_TYPE_F, 1),
> +dst.type),
> + 2);
> +   bld.MOV(tmp, src);
> +   fs_inst *inst = bld.MOV(dst, tmp);
> +   inst->saturate = saturate;
> +
> +   return true;
> +}
> +
>  void
>  fs_visitor::nir_emit_alu(const fs_builder , nir_alu_instr *instr)
>  {
> -- 
> 2.17.1
>




Re: [Mesa-dev] [PATCH v2 13/53] intel/compiler: add a helper to handle conversions to 64-bit in atom

2019-01-02 Thread Francisco Jerez
This patch is redundant with the regioning lowering pass I sent a few
days ago [1].  The problem with this approach is twofold: on the one
hand, it's easy for the back-end compiler to accidentally make code that
was legalized at NIR translation time illegal again; on the other hand,
there's the maintainability issue of having workarounds for the exact
same restriction scattered all over the place.

Please drop it: manual workarounds at NIR translation time should no
longer be needed for the CHV/BXT regioning restrictions to be honored.

[1] https://lists.freedesktop.org/archives/mesa-dev/2018-December/212775.html

Iago Toral Quiroga  writes:

> ---
>  src/intel/compiler/brw_fs_nir.cpp | 55 ++-
>  1 file changed, 33 insertions(+), 22 deletions(-)
>
> diff --git a/src/intel/compiler/brw_fs_nir.cpp 
> b/src/intel/compiler/brw_fs_nir.cpp
> index 92ec85a27cc..15715651aa6 100644
> --- a/src/intel/compiler/brw_fs_nir.cpp
> +++ b/src/intel/compiler/brw_fs_nir.cpp
> @@ -664,6 +664,38 @@ brw_rnd_mode_from_nir_op (const nir_op op) {
> }
>  }
>  
> +static bool
> +fixup_64bit_conversion(const fs_builder &bld,
> +   fs_reg dst, fs_reg src,
> +   bool saturate,
> +   const struct gen_device_info *devinfo)
> +{
> +   /* CHV PRM, vol07, 3D Media GPGPU Engine, Register Region Restrictions:
> +*
> +*"When source or destination is 64b (...), regioning in Align1
> +* must follow these rules:
> +*
> +* 1. Source and destination horizontal stride must be aligned to
> +*the same qword.
> +* (...)"
> +*
> +* This means that conversions from bit-sizes smaller than 64-bit to
> +* 64-bit need to have the source data elements aligned to 64-bit.
> +* This restriction does not apply to BDW and later.
> +*/
> +   if (type_sz(dst.type) == 8 && type_sz(src.type) < 8 &&
> +   (devinfo->is_cherryview || gen_device_info_is_9lp(devinfo))) {
> +  fs_reg tmp = bld.vgrf(dst.type, 1);
> +  tmp = subscript(tmp, src.type, 0);
> +  bld.MOV(tmp, src);
> +  fs_inst *inst = bld.MOV(dst, tmp);
> +  inst->saturate = saturate;
> +  return true;
> +   }
> +
> +   return false;
> +}
> +
>  void
>  fs_visitor::nir_emit_alu(const fs_builder , nir_alu_instr *instr)
>  {
> @@ -805,29 +837,8 @@ fs_visitor::nir_emit_alu(const fs_builder , 
> nir_alu_instr *instr)
> case nir_op_i2i64:
> case nir_op_u2f64:
> case nir_op_u2u64:
> -  /* CHV PRM, vol07, 3D Media GPGPU Engine, Register Region Restrictions:
> -   *
> -   *"When source or destination is 64b (...), regioning in Align1
> -   * must follow these rules:
> -   *
> -   * 1. Source and destination horizontal stride must be aligned to
> -   *the same qword.
> -   * (...)"
> -   *
> -   * This means that conversions from bit-sizes smaller than 64-bit to
> -   * 64-bit need to have the source data elements aligned to 64-bit.
> -   * This restriction does not apply to BDW and later.
> -   */
> -  if (nir_dest_bit_size(instr->dest.dest) == 64 &&
> -  nir_src_bit_size(instr->src[0].src) < 64 &&
> -  (devinfo->is_cherryview || gen_device_info_is_9lp(devinfo))) {
> - fs_reg tmp = bld.vgrf(result.type, 1);
> - tmp = subscript(tmp, op[0].type, 0);
> - inst = bld.MOV(tmp, op[0]);
> - inst = bld.MOV(result, tmp);
> - inst->saturate = instr->dest.saturate;
> +  if (fixup_64bit_conversion(bld, result, op[0], instr->dest.saturate, 
> devinfo))
>   break;
> -  }
>/* fallthrough */
> case nir_op_f2f32:
> case nir_op_f2i32:
> -- 
> 2.17.1
>




[Mesa-dev] [PATCH 02/10] intel/fs: Implement quad swizzles on ICL+.

2018-12-29 Thread Francisco Jerez
Align16 is no longer a thing, so a new implementation is provided
using Align1 instead.  Not all possible swizzles can be represented as
a single Align1 region, but some fast paths are provided for
frequently used swizzles that can be represented efficiently in Align1
mode.
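
The fast paths can be checked against a simple model of Align1 region
addressing.  The sketch below assumes the usual reading of a source
region <vstride;width,hstride>, where channel ch reads element
origin + (ch / width) * vstride + (ch % width) * hstride.  The helpers
region_offsets() and quad_swizzle_offsets() are hypothetical names
written for illustration only; they are not Mesa functions.

```cpp
#include <cassert>
#include <vector>

// Element offsets read by each channel of an Align1 source region
// <vstride; width, hstride> that starts at element `origin`.
// (Illustrative model, not a Mesa API.)
std::vector<unsigned>
region_offsets(unsigned origin, unsigned exec_size,
               unsigned vstride, unsigned width, unsigned hstride)
{
   assert(width > 0 && exec_size % width == 0);
   std::vector<unsigned> offsets;
   for (unsigned ch = 0; ch < exec_size; ch++)
      offsets.push_back(origin + (ch / width) * vstride +
                        (ch % width) * hstride);
   return offsets;
}

// Element offsets a quad swizzle is expected to read: position p of
// every quad reads component c0..c3 of that same quad.
std::vector<unsigned>
quad_swizzle_offsets(unsigned exec_size,
                     unsigned c0, unsigned c1, unsigned c2, unsigned c3)
{
   assert(exec_size % 4 == 0);
   const unsigned comp[4] = { c0, c1, c2, c3 };
   std::vector<unsigned> offsets;
   for (unsigned ch = 0; ch < exec_size; ch++)
      offsets.push_back((ch & ~3u) + comp[ch % 4]);
   return offsets;
}
```

In this model a <4;4,0> region offset by the selected component matches
a broadcast swizzle, and <2;2,0> matches a swizzle selecting components
0,0,2,2, at any execution size.  A <0;2,1> region only matches 0,1,0,1
for a single quad, which is consistent with the patch limiting that case
to an execution size of 4.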

Fixes ~90 subgroup quad swap Vulkan CTS tests.

Cc: mesa-sta...@lists.freedesktop.org
---
 src/intel/compiler/brw_fs.cpp   | 25 +++-
 src/intel/compiler/brw_fs.h |  4 ++
 src/intel/compiler/brw_fs_generator.cpp | 82 -
 3 files changed, 93 insertions(+), 18 deletions(-)

diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index 2f0f0151219..97544fdf465 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -315,6 +315,20 @@ fs_inst::has_source_and_destination_hazard() const
* may stomp all over it.
*/
   return true;
+   case SHADER_OPCODE_QUAD_SWIZZLE:
+  switch (src[1].ud) {
+  case BRW_SWIZZLE_:
+  case BRW_SWIZZLE_:
+  case BRW_SWIZZLE_:
+  case BRW_SWIZZLE_:
+  case BRW_SWIZZLE_XXZZ:
+  case BRW_SWIZZLE_YYWW:
+  case BRW_SWIZZLE_XYXY:
+  case BRW_SWIZZLE_ZWZW:
+ return false;
+  default:
+ return !is_uniform(src[0]);
+  }
default:
   /* The SIMD16 compressed instruction
*
@@ -5579,9 +5593,14 @@ get_lowered_simd_width(const struct gen_device_info 
*devinfo,
case SHADER_OPCODE_URB_WRITE_SIMD8_MASKED_PER_SLOT:
   return MIN2(8, inst->exec_size);
 
-   case SHADER_OPCODE_QUAD_SWIZZLE:
-  return 8;
-
+   case SHADER_OPCODE_QUAD_SWIZZLE: {
+  const unsigned swiz = inst->src[1].ud;
+  return (is_uniform(inst->src[0]) ?
+ get_fpu_lowered_simd_width(devinfo, inst) :
+  devinfo->gen < 11 && type_sz(inst->src[0].type) == 4 ? 8 :
+  swiz == BRW_SWIZZLE_XYXY || swiz == BRW_SWIZZLE_ZWZW ? 4 :
+  get_fpu_lowered_simd_width(devinfo, inst));
+   }
case SHADER_OPCODE_MOV_INDIRECT: {
   /* From IVB and HSW PRMs:
*
diff --git a/src/intel/compiler/brw_fs.h b/src/intel/compiler/brw_fs.h
index 53d9b6ce7bf..dc36ecc21ac 100644
--- a/src/intel/compiler/brw_fs.h
+++ b/src/intel/compiler/brw_fs.h
@@ -480,6 +480,10 @@ private:
  struct brw_reg src,
  struct brw_reg idx);
 
+   void generate_quad_swizzle(const fs_inst *inst,
+  struct brw_reg dst, struct brw_reg src,
+  unsigned swiz);
+
bool patch_discard_jumps_to_fb_writes();
 
const struct brw_compiler *compiler;
diff --git a/src/intel/compiler/brw_fs_generator.cpp 
b/src/intel/compiler/brw_fs_generator.cpp
index 08dd83dded7..84627e83132 100644
--- a/src/intel/compiler/brw_fs_generator.cpp
+++ b/src/intel/compiler/brw_fs_generator.cpp
@@ -582,6 +582,72 @@ fs_generator::generate_shuffle(fs_inst *inst,
}
 }
 
+void
+fs_generator::generate_quad_swizzle(const fs_inst *inst,
+struct brw_reg dst, struct brw_reg src,
+unsigned swiz)
+{
+   /* Requires a quad. */
+   assert(inst->exec_size >= 4);
+
+   if (src.file == BRW_IMMEDIATE_VALUE ||
+   has_scalar_region(src)) {
+  /* The value is uniform across all channels */
+  brw_MOV(p, dst, src);
+
+   } else if (devinfo->gen < 11 && type_sz(src.type) == 4) {
+  /* This only works on 8-wide 32-bit values */
+  assert(inst->exec_size == 8);
+  assert(src.hstride == BRW_HORIZONTAL_STRIDE_1);
+  assert(src.vstride == src.width + 1);
+  brw_set_default_access_mode(p, BRW_ALIGN_16);
+  struct brw_reg swiz_src = stride(src, 4, 4, 1);
+  swiz_src.swizzle = swiz;
+  brw_MOV(p, dst, swiz_src);
+
+   } else {
+  assert(src.hstride == BRW_HORIZONTAL_STRIDE_1);
+  assert(src.vstride == src.width + 1);
+  const struct brw_reg src_0 = suboffset(src, BRW_GET_SWZ(swiz, 0));
+
+  switch (swiz) {
+  case BRW_SWIZZLE_:
+  case BRW_SWIZZLE_:
+  case BRW_SWIZZLE_:
+  case BRW_SWIZZLE_:
+ brw_MOV(p, dst, stride(src_0, 4, 4, 0));
+ break;
+
+  case BRW_SWIZZLE_XXZZ:
+  case BRW_SWIZZLE_YYWW:
+ brw_MOV(p, dst, stride(src_0, 2, 2, 0));
+ break;
+
+  case BRW_SWIZZLE_XYXY:
+  case BRW_SWIZZLE_ZWZW:
+ assert(inst->exec_size == 4);
+ brw_MOV(p, dst, stride(src_0, 0, 2, 1));
+ break;
+
+  default:
+ assert(inst->force_writemask_all);
+ brw_set_default_exec_size(p, cvt(inst->exec_size / 4) - 1);
+
+ for (unsigned c = 0; c < 4; c++) {
+brw_inst *insn = brw_MOV(
+   p, stride(suboffset(dst, c),
+ 4 * inst->dst.stride, 1, 4 * inst->dst.stride),
+   stride(suboffset(src, BRW_GET_SWZ(swiz, c)), 4, 1, 0));
+
+brw_inst_set_no_dd_clear(devinfo, insn, c < 3);
+   

[Mesa-dev] Assorted bug fixes and improvements back-ported from an internal branch.

2018-12-29 Thread Francisco Jerez
These are a number of fixes and clean-ups we've been carrying around
for a while in an internal branch.  Most of the fixes are required for
conformance of a future platform, but due to their nature some of them
are likely to affect shipping platforms as well -- especially the
issues addressed by patches 1 and 5, and certainly the issue addressed
by PATCH 2 which was causing Vulkan CTS failures on ICL.

PATCH 7 introduces a more automated approach to enforce any regioning
restrictions of the hardware, which should be more reliable than the
current approach of enforcing them manually at NIR translation time
hoping that the optimizer will leave the workarounds untouched.  It
has some potential to fix bugs in certain scenarios, but it's
intrusive enough that it's not marked for inclusion in mesa-stable
yet.

Patches 8-9 take advantage of the lowering pass in order to get rid of
a bunch of code that is now redundant.  The code removed by PATCH 10
has been redundant ever since the FS IR gained the ability to
represent strided sources.

[PATCH 01/10] intel/fs: Handle source modifiers in 
lower_integer_multiplication().
[PATCH 02/10] intel/fs: Implement quad swizzles on ICL+.
[PATCH 03/10] intel/fs: Fix bug in lower_simd_width while splitting an 
instruction which was already split.
[PATCH 04/10] intel/eu/gen7: Fix brw_MOV() with DF destination and strided 
source.
[PATCH 05/10] intel/fs: Respect CHV/BXT regioning restrictions in copy 
propagation pass.
[PATCH 06/10] intel/fs: Constify fs_inst::can_do_source_mods().
[PATCH 07/10] intel/fs: Introduce regioning lowering pass.
[PATCH 08/10] intel/fs: Remove existing lower_conversions pass.
[PATCH 09/10] intel/fs: Remove nasty open-coded CHV/BXT 64-bit workarounds.
[PATCH 10/10] intel/fs: Remove FS_OPCODE_UNPACK_HALF_2x16_SPLIT opcodes.



[Mesa-dev] [PATCH 07/10] intel/fs: Introduce regioning lowering pass.

2018-12-29 Thread Francisco Jerez
This legalization pass is meant to handle situations where the source
or destination regioning controls of an instruction are unsupported by
the hardware and need to be lowered away into separate instructions.
This should be more reliable and future-proof than the current
approach of handling CHV/BXT restrictions manually all over the
visitor.  The same mechanism is leveraged to lower unsupported type
conversions easily, which obsoletes the lower_conversions pass.
---
 src/intel/Makefile.sources|   1 +
 src/intel/compiler/brw_fs.cpp |   5 +-
 src/intel/compiler/brw_fs.h   |  21 +-
 src/intel/compiler/brw_fs_lower_regioning.cpp | 382 ++
 src/intel/compiler/brw_ir_fs.h|  10 +
 src/intel/compiler/meson.build|   1 +
 6 files changed, 401 insertions(+), 19 deletions(-)
 create mode 100644 src/intel/compiler/brw_fs_lower_regioning.cpp

diff --git a/src/intel/Makefile.sources b/src/intel/Makefile.sources
index 5e7d32293b7..6b9874d2b80 100644
--- a/src/intel/Makefile.sources
+++ b/src/intel/Makefile.sources
@@ -64,6 +64,7 @@ COMPILER_FILES = \
compiler/brw_fs_live_variables.h \
compiler/brw_fs_lower_conversions.cpp \
compiler/brw_fs_lower_pack.cpp \
+   compiler/brw_fs_lower_regioning.cpp \
compiler/brw_fs_nir.cpp \
compiler/brw_fs_reg_allocate.cpp \
compiler/brw_fs_register_coalesce.cpp \
diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index 889509badab..caa7a798332 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -6471,7 +6471,10 @@ fs_visitor::optimize()
   OPT(dead_code_eliminate);
}
 
-   if (OPT(lower_conversions)) {
+   progress = false;
+   OPT(lower_conversions);
+   OPT(lower_regioning);
+   if (progress) {
   OPT(opt_copy_propagation);
   OPT(dead_code_eliminate);
   OPT(lower_simd_width);
diff --git a/src/intel/compiler/brw_fs.h b/src/intel/compiler/brw_fs.h
index dc36ecc21ac..36825754931 100644
--- a/src/intel/compiler/brw_fs.h
+++ b/src/intel/compiler/brw_fs.h
@@ -164,6 +164,7 @@ public:
void lower_uniform_pull_constant_loads();
bool lower_load_payload();
bool lower_pack();
+   bool lower_regioning();
bool lower_conversions();
bool lower_logical_sends();
bool lower_integer_multiplication();
@@ -536,24 +537,8 @@ namespace brw {
   }
}
 
-   /**
-* Remove any modifiers from the \p i-th source region of the instruction,
-* including negate, abs and any implicit type conversion to the execution
-* type.  Instead any source modifiers will be implemented as a separate
-* MOV instruction prior to the original instruction.
-*/
-   inline bool
-   lower_src_modifiers(fs_visitor *v, bblock_t *block, fs_inst *inst, unsigned 
i)
-   {
-  assert(inst->components_read(i) == 1);
-  const fs_builder ibld(v, block, inst);
-  const fs_reg tmp = ibld.vgrf(get_exec_type(inst));
-
-  ibld.MOV(tmp, inst->src[i]);
-  inst->src[i] = tmp;
-
-  return true;
-   }
+   bool
+   lower_src_modifiers(fs_visitor *v, bblock_t *block, fs_inst *inst, unsigned 
i);
 }
 
 void shuffle_from_32bit_read(const brw::fs_builder ,
diff --git a/src/intel/compiler/brw_fs_lower_regioning.cpp 
b/src/intel/compiler/brw_fs_lower_regioning.cpp
new file mode 100644
index 000..9578622401d
--- /dev/null
+++ b/src/intel/compiler/brw_fs_lower_regioning.cpp
@@ -0,0 +1,382 @@
+/*
+ * Copyright © 2018 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include "brw_fs.h"
+#include "brw_cfg.h"
+#include "brw_fs_builder.h"
+
+using namespace brw;
+
+namespace {
+   /* From the SKL PRM Vol 2a, "Move":
+*
+* "A mov with the same source and destination type, no source modifier,
+*  and no saturation is a raw move. A packed byte destination region (B
+*  

[Mesa-dev] [PATCH 03/10] intel/fs: Fix bug in lower_simd_width while splitting an instruction which was already split.

2018-12-29 Thread Francisco Jerez
This seems to be a problem in combination with the lower_regioning
pass introduced by a future commit, which can modify a SIMD-split
instruction causing its execution size to become illegal again.  A
subsequent call to lower_simd_width() would hit this bug on a future
platform.
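
A minimal sketch of the indexing issue, assuming (as the fix below
suggests) that the regions of an instruction already point at the data
for its own first channel, so that a piece produced by a second split
has to be addressed relative to inst->group rather than relative to
channel 0.  Inst and piece_offset() are hypothetical stand-ins, not
Mesa types or functions.

```cpp
#include <cassert>

// Hypothetical stand-in for an instruction that covers the channels
// [group, group + exec_size) of the dispatch.
struct Inst {
   unsigned group;
   unsigned exec_size;
};

// Channel offset to apply to the instruction's own regions in order to
// reach the data of the piece starting at absolute channel piece_group.
unsigned
piece_offset(const Inst &inst, unsigned piece_group)
{
   assert(piece_group >= inst.group &&
          piece_group < inst.group + inst.exec_size);
   return piece_group - inst.group;
}
```

For an instruction that was never split (group 0) the result equals the
absolute group, which would explain why the problem only shows up when
an already split instruction is split again.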

Cc: mesa-sta...@lists.freedesktop.org
---
 src/intel/compiler/brw_fs.cpp | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index 97544fdf465..4aacc72a1b7 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -5666,7 +5666,7 @@ static fs_reg
 emit_unzip(const fs_builder &lbld, fs_inst *inst, unsigned i)
 {
/* Specified channel group from the source region. */
-   const fs_reg src = horiz_offset(inst->src[i], lbld.group());
+   const fs_reg src = horiz_offset(inst->src[i], lbld.group() - inst->group);
 
if (needs_src_copy(lbld, inst, i)) {
   /* Builder of the right width to perform the copy avoiding uninitialized
@@ -5757,7 +5757,7 @@ emit_zip(const fs_builder _before, const fs_builder 
_after,
assert(lbld_before.group() == lbld_after.group());
 
/* Specified channel group from the destination region. */
-   const fs_reg dst = horiz_offset(inst->dst, lbld_after.group());
+   const fs_reg dst = horiz_offset(inst->dst, lbld_after.group() - 
inst->group);
const unsigned dst_size = inst->size_written /
   inst->dst.component_size(inst->exec_size);
 
-- 
2.19.2



[Mesa-dev] [PATCH 04/10] intel/eu/gen7: Fix brw_MOV() with DF destination and strided source.

2018-12-29 Thread Francisco Jerez
I triggered this bug while prototyping code for a future platform on
IVB.  Could be a problem today though if a strided move is
copy-propagated into a type-converting move with DF destination.
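
To illustrate what the adjusted region reads, here is a small model of
the region set up by the code below: vertical stride equal to the
original horizontal stride, width 2 and horizontal stride 0.  It assumes
the usual Align1 addressing rule (channel ch reads element
(ch / width) * vstride + (ch % width) * hstride);
doubled_region_offsets() is a hypothetical helper, not part of Mesa.

```cpp
#include <cassert>
#include <vector>

// Element offsets read by each channel when a source with horizontal
// stride src_hstride is rewritten to <src_hstride; 2, 0>, so that every
// element is read by two consecutive channels.  (Illustrative model.)
std::vector<unsigned>
doubled_region_offsets(unsigned exec_size, unsigned src_hstride)
{
   assert(exec_size % 2 == 0);
   const unsigned vstride = src_hstride, width = 2, hstride = 0;
   std::vector<unsigned> offsets;
   for (unsigned ch = 0; ch < exec_size; ch++)
      offsets.push_back((ch / width) * vstride + (ch % width) * hstride);
   return offsets;
}
```

With a stride of 2 the even channels read elements 0, 2, 4, ..., the
same elements the original strided region would have delivered, which a
hard-coded vertical stride of 1 would not.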

Cc: mesa-sta...@lists.freedesktop.org
---
 src/intel/compiler/brw_eu_emit.c | 11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/src/intel/compiler/brw_eu_emit.c b/src/intel/compiler/brw_eu_emit.c
index 483037345e9..3c8f7b9ab93 100644
--- a/src/intel/compiler/brw_eu_emit.c
+++ b/src/intel/compiler/brw_eu_emit.c
@@ -942,8 +942,8 @@ brw_MOV(struct brw_codegen *p, struct brw_reg dest, struct 
brw_reg src0)
const struct gen_device_info *devinfo = p->devinfo;
 
/* When converting F->DF on IVB/BYT, every odd source channel is ignored.
-* To avoid the problems that causes, we use a <1,2,0> source region to read
-* each element twice.
+* To avoid the problems that causes, we use an  source region to
+* read each element twice.
 */
if (devinfo->gen == 7 && !devinfo->is_haswell &&
brw_get_default_access_mode(p) == BRW_ALIGN_1 &&
@@ -952,11 +952,8 @@ brw_MOV(struct brw_codegen *p, struct brw_reg dest, struct 
brw_reg src0)
 src0.type == BRW_REGISTER_TYPE_D ||
 src0.type == BRW_REGISTER_TYPE_UD) &&
!has_scalar_region(src0)) {
-  assert(src0.vstride == BRW_VERTICAL_STRIDE_4 &&
- src0.width == BRW_WIDTH_4 &&
- src0.hstride == BRW_HORIZONTAL_STRIDE_1);
-
-  src0.vstride = BRW_VERTICAL_STRIDE_1;
+  assert(src0.vstride == src0.width + src0.hstride);
+  src0.vstride = src0.hstride;
   src0.width = BRW_WIDTH_2;
   src0.hstride = BRW_HORIZONTAL_STRIDE_0;
}
-- 
2.19.2



[Mesa-dev] [PATCH 05/10] intel/fs: Respect CHV/BXT regioning restrictions in copy propagation pass.

2018-12-29 Thread Francisco Jerez
Currently the visitor attempts to enforce the regioning restrictions
that apply to double-precision instructions on CHV/BXT at NIR-to-i965
translation time.  It is possible though for the copy propagation pass
to violate this restriction if a strided move is propagated into one
of the affected instructions.  I've only reproduced this issue on a
future platform but it could affect CHV/BXT too under the right
conditions.
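
The condition added to can_take_stride() below can be restated as a
standalone predicate.  This is only a sketch of that one check, with
sizes in bytes and strides in elements; stride_keeps_dst_alignment() is
a hypothetical name and the function ignores every other restriction
the real pass enforces.

```cpp
#include <cassert>

// Returns true if a source of element size src_type_sz read with the
// given stride keeps each channel at the same byte offset as the
// corresponding destination channel.  A stride of 0 (scalar source) is
// always accepted, mirroring the exception in the quoted PRM text.
bool
stride_keeps_dst_alignment(unsigned src_type_sz, unsigned src_stride,
                           unsigned dst_type_sz, unsigned dst_stride)
{
   assert(src_type_sz > 0 && dst_type_sz > 0);
   return src_stride == 0 ||
          src_type_sz * src_stride == dst_type_sz * dst_stride;
}
```

For example, a 32-bit source feeding a packed 64-bit destination is
acceptable with a stride of 2 (8 bytes per channel on both sides) but
not with a stride of 1.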

Cc: mesa-sta...@lists.freedesktop.org
---
 .../compiler/brw_fs_copy_propagation.cpp  | 10 +++
 src/intel/compiler/brw_ir_fs.h| 28 +++
 2 files changed, 38 insertions(+)

diff --git a/src/intel/compiler/brw_fs_copy_propagation.cpp 
b/src/intel/compiler/brw_fs_copy_propagation.cpp
index a8ec1c34630..c23ce1ef426 100644
--- a/src/intel/compiler/brw_fs_copy_propagation.cpp
+++ b/src/intel/compiler/brw_fs_copy_propagation.cpp
@@ -315,6 +315,16 @@ can_take_stride(fs_inst *inst, unsigned arg, unsigned 
stride,
if (stride > 4)
   return false;
 
+   /* Bail if the channels of the source need to be aligned to the byte offset
+* of the corresponding channel of the destination, and the provided stride
+* would break this restriction.
+*/
+   if (has_dst_aligned_region_restriction(devinfo, inst) &&
+   !(type_sz(inst->src[arg].type) * stride ==
+   type_sz(inst->dst.type) * inst->dst.stride ||
+ stride == 0))
+  return false;
+
/* 3-source instructions can only be Align16, which restricts what strides
 * they can take. They can only take a stride of 1 (the usual case), or 0
 * with a special "repctrl" bit. But the repctrl bit doesn't work for
diff --git a/src/intel/compiler/brw_ir_fs.h b/src/intel/compiler/brw_ir_fs.h
index 07e7224e0f8..95b069a2e02 100644
--- a/src/intel/compiler/brw_ir_fs.h
+++ b/src/intel/compiler/brw_ir_fs.h
@@ -486,4 +486,32 @@ get_exec_type_size(const fs_inst *inst)
return type_sz(get_exec_type(inst));
 }
 
+/**
+ * Return whether the following regioning restriction applies to the specified
+ * instruction.  From the Cherryview PRM Vol 7. "Register Region
+ * Restrictions":
+ *
+ * "When source or destination datatype is 64b or operation is integer DWord
+ *  multiply, regioning in Align1 must follow these rules:
+ *
+ *  1. Source and Destination horizontal stride must be aligned to the same 
qword.
+ *  2. Regioning must ensure Src.Vstride = Src.Width * Src.Hstride.
+ *  3. Source and Destination offset must be the same, except the case of
+ * scalar source."
+ */
+static inline bool
+has_dst_aligned_region_restriction(const gen_device_info *devinfo,
+   const fs_inst *inst)
+{
+   const brw_reg_type exec_type = get_exec_type(inst);
+   const bool is_int_multiply = !brw_reg_type_is_floating_point(exec_type) &&
+ (inst->opcode == BRW_OPCODE_MUL || inst->opcode == BRW_OPCODE_MAD);
+
+   if (type_sz(inst->dst.type) > 4 || type_sz(exec_type) > 4 ||
+   (type_sz(exec_type) == 4 && is_int_multiply))
+  return devinfo->is_cherryview || gen_device_info_is_9lp(devinfo);
+   else
+  return false;
+}
+
 #endif
-- 
2.19.2



[Mesa-dev] [PATCH 09/10] intel/fs: Remove nasty open-coded CHV/BXT 64-bit workarounds.

2018-12-29 Thread Francisco Jerez
---
 src/intel/compiler/brw_fs_builder.h | 68 +-
 src/intel/compiler/brw_fs_nir.cpp   | 89 +++--
 2 files changed, 12 insertions(+), 145 deletions(-)

diff --git a/src/intel/compiler/brw_fs_builder.h 
b/src/intel/compiler/brw_fs_builder.h
index 4846820722c..c50af4c1f55 100644
--- a/src/intel/compiler/brw_fs_builder.h
+++ b/src/intel/compiler/brw_fs_builder.h
@@ -451,43 +451,13 @@ namespace brw {
 
  if (cluster_size > 1) {
 const fs_builder ubld = exec_all().group(dispatch_width() / 2, 0);
-dst_reg left = horiz_stride(tmp, 2);
-dst_reg right = horiz_stride(horiz_offset(tmp, 1), 2);
-
-/* From the Cherryview PRM Vol. 7, "Register Region Restrictiosn":
- *
- *"When source or destination datatype is 64b or operation is
- *integer DWord multiply, regioning in Align1 must follow
- *these rules:
- *
- *[...]
- *
- *3. Source and Destination offset must be the same, except
- *   the case of scalar source."
- *
- * In order to work around this, we create a temporary register
- * and shift left over to match right.  If we have a 64-bit type,
- * we have to use two integer MOVs instead of a 64-bit MOV.
- */
-if (need_matching_subreg_offset(opcode, tmp.type)) {
-   dst_reg tmp2 = vgrf(tmp.type);
-   dst_reg new_left = horiz_stride(horiz_offset(tmp2, 1), 2);
-   if (type_sz(tmp.type) > 4) {
-  ubld.MOV(subscript(new_left, BRW_REGISTER_TYPE_D, 0),
-   subscript(left, BRW_REGISTER_TYPE_D, 0));
-  ubld.MOV(subscript(new_left, BRW_REGISTER_TYPE_D, 1),
-   subscript(left, BRW_REGISTER_TYPE_D, 1));
-   } else {
-  ubld.MOV(new_left, left);
-   }
-   left = new_left;
-}
+const dst_reg left = horiz_stride(tmp, 2);
+const dst_reg right = horiz_stride(horiz_offset(tmp, 1), 2);
 set_condmod(mod, ubld.emit(opcode, right, left, right));
  }
 
  if (cluster_size > 2) {
-if (type_sz(tmp.type) <= 4 &&
-!need_matching_subreg_offset(opcode, tmp.type)) {
+if (type_sz(tmp.type) <= 4) {
const fs_builder ubld =
   exec_all().group(dispatch_width() / 4, 0);
src_reg left = horiz_stride(horiz_offset(tmp, 1), 4);
@@ -787,38 +757,6 @@ namespace brw {
  }
   }
 
-
-  /* From the Cherryview PRM Vol. 7, "Register Region Restrictiosn":
-   *
-   *"When source or destination datatype is 64b or operation is
-   *integer DWord multiply, regioning in Align1 must follow
-   *these rules:
-   *
-   *[...]
-   *
-   *3. Source and Destination offset must be the same, except
-   *   the case of scalar source."
-   *
-   * This helper just detects when we're in this case.
-   */
-  bool
-  need_matching_subreg_offset(enum opcode opcode,
-  enum brw_reg_type type) const
-  {
- if (!shader->devinfo->is_cherryview &&
- !gen_device_info_is_9lp(shader->devinfo))
-return false;
-
- if (type_sz(type) > 4)
-return true;
-
- if (opcode == BRW_OPCODE_MUL &&
- !brw_reg_type_is_floating_point(type))
-return true;
-
- return false;
-  }
-
   bblock_t *block;
   exec_node *cursor;
 
diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index 92ec85a27cc..312cd22de79 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -805,30 +805,6 @@ fs_visitor::nir_emit_alu(const fs_builder , 
nir_alu_instr *instr)
case nir_op_i2i64:
case nir_op_u2f64:
case nir_op_u2u64:
-  /* CHV PRM, vol07, 3D Media GPGPU Engine, Register Region Restrictions:
-   *
-   *"When source or destination is 64b (...), regioning in Align1
-   * must follow these rules:
-   *
-   * 1. Source and destination horizontal stride must be aligned to
-   *the same qword.
-   * (...)"
-   *
-   * This means that conversions from bit-sizes smaller than 64-bit to
-   * 64-bit need to have the source data elements aligned to 64-bit.
-   * This restriction does not apply to BDW and later.
-   */
-  if (nir_dest_bit_size(instr->dest.dest) == 64 &&
-  nir_src_bit_size(instr->src[0].src) < 64 &&
-  (devinfo->is_cherryview || gen_device_info_is_9lp(devinfo))) {
- fs_reg tmp = bld.vgrf(result.type, 1);
- tmp = subscript(tmp, op[0].type, 0);
- inst = bld.MOV(tmp, op[0]);
- inst 

[Mesa-dev] [PATCH 01/10] intel/fs: Handle source modifiers in lower_integer_multiplication().

2018-12-29 Thread Francisco Jerez
lower_integer_multiplication() implements 32x32-bit multiplication on
some platforms by bit-casting one of the 32-bit sources into two
16-bit unsigned integer portions.  This can give incorrect results if
the original instruction specified a source modifier.  Fix it by
emitting an additional MOV instruction implementing the source
modifiers where necessary.
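
A simplified arithmetic model of the problem, not a description of the
exact hardware modifier semantics: it assumes that an abs modifier left
on the bit-cast unsigned 16-bit halves has no effect on them, so the
product is computed from the unmodified source.  All function names
here are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Low 32 bits of x * y, computed from the two unsigned 16-bit halves
// of y, in the spirit of the lowering described above.
uint32_t
mul_via_halves(uint32_t x, uint32_t y)
{
   const uint32_t lo = y & 0xffffu;
   const uint32_t hi = y >> 16;
   return x * lo + ((x * hi) << 16);
}

// x * |y| with the modifier resolved by a separate step before the
// split, which is the role of the additional MOV.
uint32_t
mul_abs_lowered(uint32_t x, int32_t y)
{
   const uint32_t abs_y = y < 0 ? 0u - static_cast<uint32_t>(y)
                                : static_cast<uint32_t>(y);
   return mul_via_halves(x, abs_y);
}

// Model of the bug: the abs modifier is lost on the unsigned halves,
// so the raw bits of y are multiplied instead of |y|.
uint32_t
mul_abs_on_halves(uint32_t x, int32_t y)
{
   return mul_via_halves(x, static_cast<uint32_t>(y));
}
```

With x = 5 and y = -3 the lowered version yields 15, while the model of
the unfixed path yields the 32-bit pattern of -15.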

Cc: mesa-sta...@lists.freedesktop.org
---
 src/intel/compiler/brw_fs.cpp | 20 ++--
 src/intel/compiler/brw_fs.h   | 19 +++
 2 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index 590a9b32a8e..2f0f0151219 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -3862,6 +3862,9 @@ fs_visitor::lower_integer_multiplication()
 high.offset = inst->dst.offset % REG_SIZE;
 
 if (devinfo->gen >= 7) {
+   if (inst->src[1].abs)
+  lower_src_modifiers(this, block, inst, 1);
+
if (inst->src[1].file == IMM) {
   ibld.MUL(low, inst->src[0],
brw_imm_uw(inst->src[1].ud & 0x));
@@ -3874,6 +3877,9 @@ fs_visitor::lower_integer_multiplication()
subscript(inst->src[1], BRW_REGISTER_TYPE_UW, 1));
}
 } else {
+   if (inst->src[0].abs)
+  lower_src_modifiers(this, block, inst, 0);
+
ibld.MUL(low, subscript(inst->src[0], BRW_REGISTER_TYPE_UW, 0),
 inst->src[1]);
ibld.MUL(high, subscript(inst->src[0], BRW_REGISTER_TYPE_UW, 1),
@@ -3891,6 +3897,18 @@ fs_visitor::lower_integer_multiplication()
  }
 
   } else if (inst->opcode == SHADER_OPCODE_MULH) {
+ /* According to the BDW+ BSpec page for the "Multiply Accumulate
+  * High" instruction:
+  *
+  *  "An added preliminary mov is required for source modification on
+  *   src1:
+  *  mov (8) r3.0<1>:d -r3<8;8,1>:d
+  *  mul (8) acc0:d r2.0<8;8,1>:d r3.0<16;8,2>:uw
+  *  mach (8) r5.0<1>:d r2.0<8;8,1>:d r3.0<8;8,1>:d"
+  */
+ if (devinfo->gen >= 8 && (inst->src[1].negate || inst->src[1].abs))
+lower_src_modifiers(this, block, inst, 1);
+
  /* Should have been lowered to 8-wide. */
  assert(inst->exec_size <= get_lowered_simd_width(devinfo, inst));
  const fs_reg acc = retype(brw_acc_reg(inst->exec_size),
@@ -3906,8 +3924,6 @@ fs_visitor::lower_integer_multiplication()
  * On Gen8, the multiply instruction does a full 32x32-bit
  * multiply, but in order to do a 64-bit multiply we can simulate
  * the previous behavior and then use a MACH instruction.
- *
- * FINISHME: Don't use source modifiers on src1.
  */
 assert(mul->src[1].type == BRW_REGISTER_TYPE_D ||
mul->src[1].type == BRW_REGISTER_TYPE_UD);
diff --git a/src/intel/compiler/brw_fs.h b/src/intel/compiler/brw_fs.h
index 163c0008820..53d9b6ce7bf 100644
--- a/src/intel/compiler/brw_fs.h
+++ b/src/intel/compiler/brw_fs.h
@@ -531,6 +531,25 @@ namespace brw {
  return fs_reg(retype(brw_vec8_grf(regs[0], 0), type));
   }
}
+
+   /**
+* Remove any modifiers from the \p i-th source region of the instruction,
+* including negate, abs and any implicit type conversion to the execution
+* type.  Instead any source modifiers will be implemented as a separate
+* MOV instruction prior to the original instruction.
+*/
+   inline bool
+   lower_src_modifiers(fs_visitor *v, bblock_t *block, fs_inst *inst, unsigned i)
+   {
+  assert(inst->components_read(i) == 1);
+  const fs_builder ibld(v, block, inst);
+  const fs_reg tmp = ibld.vgrf(get_exec_type(inst));
+
+  ibld.MOV(tmp, inst->src[i]);
+  inst->src[i] = tmp;
+
+  return true;
+   }
 }
 
 void shuffle_from_32bit_read(const brw::fs_builder &bld,
-- 
2.19.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
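
The abs case from PATCH 01/10 above can be modelled on the CPU.  What follows is only a sketch under assumptions: the function names are made up for illustration, the arithmetic is plain wrap-around uint32_t math rather than anything hardware-exact, and "abs applied to an unsigned 16-bit half" is modelled as a no-op.  It illustrates why the modifier has to be resolved by a separate MOV before the source is split into UW halves.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical CPU model, not Mesa code: low 32 bits of a 32x32-bit
// multiply assembled from two 32x16-bit multiplies.
static uint32_t mul_split16(uint32_t a, uint32_t b)
{
   const uint32_t lo = b & 0xffffu;   // plays the role of subscript(src, UW, 0)
   const uint32_t hi = b >> 16;       // plays the role of subscript(src, UW, 1)
   return a * lo + ((a * hi) << 16);
}

// Two's-complement |x| of a 32-bit value stored in a uint32_t.
static uint32_t abs32(uint32_t x)
{
   return (x & 0x80000000u) ? 0u - x : x;
}

// Modifier left on the split source: each half is an unsigned word, so
// taking its absolute value changes nothing and |b| is never computed.
static uint32_t mul_abs_naive(uint32_t a, uint32_t b)
{
   return mul_split16(a, b);
}

// Modifier resolved first, as a preliminary "MOV tmp, |b|" would do.
static uint32_t mul_abs_lowered(uint32_t a, uint32_t b)
{
   const uint32_t tmp = abs32(b);
   const uint32_t r = mul_split16(a, tmp);
   assert(r == a * tmp);   // the split itself is exact modulo 2^32
   return r;
}
```

With a = 5 and b = -3 the lowered variant yields 15, while the naive one yields the bit pattern of -15, because the sign information is lost once the source is viewed as two unsigned words.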


[Mesa-dev] [PATCH 10/10] intel/fs: Remove FS_OPCODE_UNPACK_HALF_2x16_SPLIT opcodes.

2018-12-29 Thread Francisco Jerez
These are broken on a future platform, but it turns out we don't need
to fix them, since they're just type-converting moves with strided
source.  Kill them.
---
 src/intel/compiler/brw_eu_defines.h |  2 --
 src/intel/compiler/brw_fs.cpp   |  2 --
 src/intel/compiler/brw_fs.h |  3 ---
 src/intel/compiler/brw_fs_generator.cpp | 34 -
 src/intel/compiler/brw_fs_nir.cpp   |  6 +++--
 src/intel/compiler/brw_shader.cpp   |  4 ---
 6 files changed, 4 insertions(+), 47 deletions(-)

diff --git a/src/intel/compiler/brw_eu_defines.h b/src/intel/compiler/brw_eu_defines.h
index affe977835b..b7bd104be59 100644
--- a/src/intel/compiler/brw_eu_defines.h
+++ b/src/intel/compiler/brw_eu_defines.h
@@ -523,8 +523,6 @@ enum opcode {
FS_OPCODE_DISCARD_JUMP,
FS_OPCODE_SET_SAMPLE_ID,
FS_OPCODE_PACK_HALF_2x16_SPLIT,
-   FS_OPCODE_UNPACK_HALF_2x16_SPLIT_X,
-   FS_OPCODE_UNPACK_HALF_2x16_SPLIT_Y,
FS_OPCODE_PLACEHOLDER_HALT,
FS_OPCODE_INTERPOLATE_AT_SAMPLE,
FS_OPCODE_INTERPOLATE_AT_SHARED_OFFSET,
diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index d6280d558ec..42fa507c4ec 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -5471,8 +5471,6 @@ get_lowered_simd_width(const struct gen_device_info *devinfo,
case FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD:
case FS_OPCODE_VARYING_PULL_CONSTANT_LOAD_GEN7:
case FS_OPCODE_PACK_HALF_2x16_SPLIT:
-   case FS_OPCODE_UNPACK_HALF_2x16_SPLIT_X:
-   case FS_OPCODE_UNPACK_HALF_2x16_SPLIT_Y:
case FS_OPCODE_INTERPOLATE_AT_SAMPLE:
case FS_OPCODE_INTERPOLATE_AT_SHARED_OFFSET:
case FS_OPCODE_INTERPOLATE_AT_PER_SLOT_OFFSET:
diff --git a/src/intel/compiler/brw_fs.h b/src/intel/compiler/brw_fs.h
index 7edaa3af43c..d141a9237df 100644
--- a/src/intel/compiler/brw_fs.h
+++ b/src/intel/compiler/brw_fs.h
@@ -461,9 +461,6 @@ private:
   struct brw_reg dst,
   struct brw_reg x,
   struct brw_reg y);
-   void generate_unpack_half_2x16_split(fs_inst *inst,
-struct brw_reg dst,
-struct brw_reg src);
 
void generate_shader_time_add(fs_inst *inst,
  struct brw_reg payload,
diff --git a/src/intel/compiler/brw_fs_generator.cpp b/src/intel/compiler/brw_fs_generator.cpp
index 84627e83132..9088c97d92b 100644
--- a/src/intel/compiler/brw_fs_generator.cpp
+++ b/src/intel/compiler/brw_fs_generator.cpp
@@ -1755,35 +1755,6 @@ fs_generator::generate_pack_half_2x16_split(fs_inst *,
brw_F32TO16(p, dst_w, x);
 }
 
-void
-fs_generator::generate_unpack_half_2x16_split(fs_inst *inst,
-  struct brw_reg dst,
-  struct brw_reg src)
-{
-   assert(devinfo->gen >= 7);
-   assert(dst.type == BRW_REGISTER_TYPE_F);
-   assert(src.type == BRW_REGISTER_TYPE_UD);
-
-   /* From the Ivybridge PRM, Vol4, Part3, Section 6.26 f16to32:
-*
-*   Because this instruction does not have a 16-bit floating-point type,
-*   the source data type must be Word (W). The destination type must be
-*   F (Float).
-*/
-   struct brw_reg src_w = spread(retype(src, BRW_REGISTER_TYPE_W), 2);
-
-   /* Each channel of src has the form of unpackHalf2x16's input: 0xhhhhllll.
-* For the Y case, we wish to access only the upper word; therefore
-* a 16-bit subregister offset is needed.
-*/
-   assert(inst->opcode == FS_OPCODE_UNPACK_HALF_2x16_SPLIT_X ||
-  inst->opcode == FS_OPCODE_UNPACK_HALF_2x16_SPLIT_Y);
-   if (inst->opcode == FS_OPCODE_UNPACK_HALF_2x16_SPLIT_Y)
-  src_w.subnr += 2;
-
-   brw_F16TO32(p, dst, src_w);
-}
-
 void
 fs_generator::generate_shader_time_add(fs_inst *,
struct brw_reg payload,
@@ -2421,11 +2392,6 @@ fs_generator::generate_code(const cfg_t *cfg, int dispatch_width)
   generate_pack_half_2x16_split(inst, dst, src[0], src[1]);
   break;
 
-  case FS_OPCODE_UNPACK_HALF_2x16_SPLIT_X:
-  case FS_OPCODE_UNPACK_HALF_2x16_SPLIT_Y:
- generate_unpack_half_2x16_split(inst, dst, src[0]);
- break;
-
   case FS_OPCODE_PLACEHOLDER_HALT:
  /* This is the place where the final HALT needs to be inserted if
   * we've emitted any discards.  If not, this will emit no code.
diff --git a/src/intel/compiler/brw_fs_nir.cpp b/src/intel/compiler/brw_fs_nir.cpp
index 312cd22de79..fe3dff016ad 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -1319,11 +1319,13 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, nir_alu_instr *instr)
   unreachable("not reached: should be handled by lower_packing_builtins");
 
case nir_op_unpack_half_2x16_split_x:
-  inst = bld.emit(FS_OPCODE_UNPACK_HALF_2x16_SPLIT_X, result, op[0]);
+  
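
The claim in PATCH 10/10 that these opcodes are "just type-converting moves with strided source" can be illustrated with a CPU-side sketch.  Everything below is hypothetical illustration code, not the replacement the patch emits (the diff above is cut off before that point): each 32-bit channel is viewed as two 16-bit words, one of them is selected, and the selected half-float is converted to a 32-bit float.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Software model of a half-float to float conversion (IEEE 754 binary16
// bit pattern in, binary32 value out).
static float half_to_float(uint16_t h)
{
   const uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
   const uint32_t exp = (h >> 10) & 0x1fu;
   uint32_t mant = h & 0x3ffu;
   uint32_t bits;

   if (exp == 0x1f) {                      // Inf or NaN
      bits = sign | 0x7f800000u | (mant << 13);
   } else if (exp != 0) {                  // normal: rebias 15 -> 127
      bits = sign | ((exp + 112) << 23) | (mant << 13);
   } else if (mant == 0) {                 // signed zero
      bits = sign;
   } else {                                // subnormal: normalize by hand
      uint32_t e = 113;
      while (!(mant & 0x400u)) {
         mant <<= 1;
         --e;
      }
      bits = sign | (e << 23) | ((mant & 0x3ffu) << 13);
   }

   float f;
   std::memcpy(&f, &bits, sizeof f);
   return f;
}

// word == 0 models the _SPLIT_X case (low word of each dword), word == 1
// the _SPLIT_Y case (high word).  Picking every second 16-bit word of the
// source is what a stride-2 word region expresses.
static void unpack_half_2x16_split(const uint32_t *src, float *dst,
                                   unsigned n, unsigned word)
{
   assert(word < 2);
   for (unsigned i = 0; i < n; i++)
      dst[i] = half_to_float((uint16_t)(src[i] >> (16 * word)));
}
```

For a source dword of 0xc0003c00 the X half converts to 1.0 and the Y half to -2.0.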

[Mesa-dev] [PATCH 06/10] intel/fs: Constify fs_inst::can_do_source_mods().

2018-12-29 Thread Francisco Jerez
---
 src/intel/compiler/brw_fs.cpp  | 2 +-
 src/intel/compiler/brw_ir_fs.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index 4aacc72a1b7..889509badab 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -395,7 +395,7 @@ fs_inst::is_copy_payload(const brw::simple_allocator &grf_alloc) const
 }
 
 bool
-fs_inst::can_do_source_mods(const struct gen_device_info *devinfo)
+fs_inst::can_do_source_mods(const struct gen_device_info *devinfo) const
 {
if (devinfo->gen == 6 && is_math())
   return false;
diff --git a/src/intel/compiler/brw_ir_fs.h b/src/intel/compiler/brw_ir_fs.h
index 95b069a2e02..5bb92e4cc86 100644
--- a/src/intel/compiler/brw_ir_fs.h
+++ b/src/intel/compiler/brw_ir_fs.h
@@ -353,7 +353,7 @@ public:
bool is_copy_payload(const brw::simple_allocator &grf_alloc) const;
unsigned components_read(unsigned i) const;
unsigned size_read(int arg) const;
-   bool can_do_source_mods(const struct gen_device_info *devinfo);
+   bool can_do_source_mods(const struct gen_device_info *devinfo) const;
bool can_do_cmod();
bool can_change_types() const;
bool has_source_and_destination_hazard() const;
-- 
2.19.2



[Mesa-dev] [PATCH 08/10] intel/fs: Remove existing lower_conversions pass.

2018-12-29 Thread Francisco Jerez
It's redundant with the functionality provided by lower_regioning now.
---
 src/intel/Makefile.sources|   1 -
 src/intel/compiler/brw_fs.cpp |   1 -
 src/intel/compiler/brw_fs.h   |   1 -
 .../compiler/brw_fs_lower_conversions.cpp | 132 --
 src/intel/compiler/meson.build|   1 -
 5 files changed, 136 deletions(-)
 delete mode 100644 src/intel/compiler/brw_fs_lower_conversions.cpp

diff --git a/src/intel/Makefile.sources b/src/intel/Makefile.sources
index 6b9874d2b80..fe06a57b42e 100644
--- a/src/intel/Makefile.sources
+++ b/src/intel/Makefile.sources
@@ -62,7 +62,6 @@ COMPILER_FILES = \
compiler/brw_fs.h \
compiler/brw_fs_live_variables.cpp \
compiler/brw_fs_live_variables.h \
-   compiler/brw_fs_lower_conversions.cpp \
compiler/brw_fs_lower_pack.cpp \
compiler/brw_fs_lower_regioning.cpp \
compiler/brw_fs_nir.cpp \
diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index caa7a798332..d6280d558ec 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -6472,7 +6472,6 @@ fs_visitor::optimize()
}
 
progress = false;
-   OPT(lower_conversions);
OPT(lower_regioning);
if (progress) {
   OPT(opt_copy_propagation);
diff --git a/src/intel/compiler/brw_fs.h b/src/intel/compiler/brw_fs.h
index 36825754931..7edaa3af43c 100644
--- a/src/intel/compiler/brw_fs.h
+++ b/src/intel/compiler/brw_fs.h
@@ -165,7 +165,6 @@ public:
bool lower_load_payload();
bool lower_pack();
bool lower_regioning();
-   bool lower_conversions();
bool lower_logical_sends();
bool lower_integer_multiplication();
bool lower_minmax();
diff --git a/src/intel/compiler/brw_fs_lower_conversions.cpp b/src/intel/compiler/brw_fs_lower_conversions.cpp
deleted file mode 100644
index 145fb55f995..00000000000
--- a/src/intel/compiler/brw_fs_lower_conversions.cpp
+++ /dev/null
@@ -1,132 +0,0 @@
-/*
- * Copyright © 2015 Connor Abbott
- *
- * Permission is hereby granted, free of charge, to any person obtaining a
- * copy of this software and associated documentation files (the "Software"),
- * to deal in the Software without restriction, including without limitation
- * the rights to use, copy, modify, merge, publish, distribute, sublicense,
- * and/or sell copies of the Software, and to permit persons to whom the
- * Software is furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice (including the next
- * paragraph) shall be included in all copies or substantial portions of the
- * Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
- * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
- * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
- * IN THE SOFTWARE.
- */
-
-#include "brw_fs.h"
-#include "brw_cfg.h"
-#include "brw_fs_builder.h"
-
-using namespace brw;
-
-static bool
-supports_type_conversion(const fs_inst *inst) {
-   switch (inst->opcode) {
-   case BRW_OPCODE_MOV:
-   case SHADER_OPCODE_MOV_INDIRECT:
-  return true;
-   case BRW_OPCODE_SEL:
-  return inst->dst.type == get_exec_type(inst);
-   default:
-  /* FIXME: We assume the opcodes don't explicitly mentioned
-   * before just work fine with arbitrary conversions.
-   */
-  return true;
-   }
-}
-
-/* From the SKL PRM Vol 2a, "Move":
- *
- *"A mov with the same source and destination type, no source modifier,
- * and no saturation is a raw move. A packed byte destination region (B
- * or UB type with HorzStride == 1 and ExecSize > 1) can only be written
- * using raw move."
- */
-static bool
-is_byte_raw_mov (const fs_inst *inst)
-{
-   return type_sz(inst->dst.type) == 1 &&
-  inst->opcode == BRW_OPCODE_MOV &&
-  inst->src[0].type == inst->dst.type &&
-  !inst->saturate &&
-  !inst->src[0].negate &&
-  !inst->src[0].abs;
-}
-
-bool
-fs_visitor::lower_conversions()
-{
-   bool progress = false;
-
-   foreach_block_and_inst(block, fs_inst, inst, cfg) {
-  const fs_builder ibld(this, block, inst);
-  fs_reg dst = inst->dst;
-  bool saturate = inst->saturate;
-
-  if (supports_type_conversion(inst)) {
- if (type_sz(inst->dst.type) < get_exec_type_size(inst) &&
- !is_byte_raw_mov(inst)) {
-/* From the Broadwell PRM, 3D Media GPGPU, "Double Precision Float to
- * Single Precision Float":
- *
- *The upper Dword of every Qword will be written with undefined
- *value when converting DF to F.
- *
-  

Re: [Mesa-dev] [PATCH] intel/compiler: Flag surface reads as having side-effects

2018-11-26 Thread Francisco Jerez
Jason Ekstrand  writes:

> ---
>  src/intel/compiler/brw_shader.cpp | 12 
>  1 file changed, 12 insertions(+)
>
> diff --git a/src/intel/compiler/brw_shader.cpp b/src/intel/compiler/brw_shader.cpp
> index 34b8f3acf93..5cb91e0dce9 100644
> --- a/src/intel/compiler/brw_shader.cpp
> +++ b/src/intel/compiler/brw_shader.cpp
> @@ -1007,6 +1007,18 @@ backend_instruction::has_side_effects() const
> case SHADER_OPCODE_SEND:
>return send_has_side_effects;
>  
> +   case SHADER_OPCODE_TYPED_SURFACE_READ:
> +   case SHADER_OPCODE_TYPED_SURFACE_READ_LOGICAL:
> +   case SHADER_OPCODE_UNTYPED_SURFACE_READ:
> +   case SHADER_OPCODE_UNTYPED_SURFACE_READ_LOGICAL:
> +   case SHADER_OPCODE_BYTE_SCATTERED_READ:
> +   case SHADER_OPCODE_BYTE_SCATTERED_READ_LOGICAL:
> +  /* The back-end compilers don't know how to properly re-order reads with
> +   * respect to writes.  In order to prevent accidental re-ordering and
> +   * CSE, flag them as having side-effects.
> +   */
> +  return true;
> +

Why would that be necessary?  Don't surface writes and atomics act as a
scheduling barrier?

> case SHADER_OPCODE_UNTYPED_ATOMIC:
> case SHADER_OPCODE_UNTYPED_ATOMIC_LOGICAL:
> case SHADER_OPCODE_UNTYPED_ATOMIC_FLOAT:
> -- 
> 2.19.1
>




Re: [Mesa-dev] [PATCH 5/5] anv/icl: Set use full ways in L3CNTLREG

2018-11-26 Thread Francisco Jerez
Anuj Phogat  writes:

> L3 allocation table in h/w specification recommends using 4 KB
> granularity for programming allocation fields in L3CNTLREG.
>
> Signed-off-by: Anuj Phogat 
> Cc: Kenneth Graunke 
> Cc: Francisco Jerez 
> Cc: Lionel Landwerlin 

Reviewed-by: Francisco Jerez 

> ---
>  src/intel/genxml/gen11.xml | 1 +
>  src/intel/vulkan/genX_cmd_buffer.c | 1 +
>  2 files changed, 2 insertions(+)
>
> diff --git a/src/intel/genxml/gen11.xml b/src/intel/genxml/gen11.xml
> index b975fe94776..1239ed011ed 100644
> --- a/src/intel/genxml/gen11.xml
> +++ b/src/intel/genxml/gen11.xml
> @@ -3547,6 +3547,7 @@
>  
>  
>   type="bool"/>
> +
>  
>  
>  
> diff --git a/src/intel/vulkan/genX_cmd_buffer.c b/src/intel/vulkan/genX_cmd_buffer.c
> index ed88157170d..c7e5ef9596e 100644
> --- a/src/intel/vulkan/genX_cmd_buffer.c
> +++ b/src/intel/vulkan/genX_cmd_buffer.c
> @@ -1623,6 +1623,7 @@ genX(cmd_buffer_config_l3)(struct anv_cmd_buffer *cmd_buffer,
>  * desirable behavior.
> */
> .ErrorDetectionBehaviorControl = true,
> +   .UseFullWays = true,
>  #endif
> .URBAllocation = cfg->n[GEN_L3P_URB],
> .ROAllocation = cfg->n[GEN_L3P_RO],
> -- 
> 2.17.1




Re: [Mesa-dev] [PATCH 3/5] intel/icl: Set way_size_per_bank to 4

2018-11-26 Thread Francisco Jerez
Anuj Phogat  writes:

> Signed-off-by: Anuj Phogat 
> Cc: Kenneth Graunke 
> Cc: Francisco Jerez 
> Cc: Lionel Landwerlin 

Reviewed-by: Francisco Jerez 

> ---
>  src/intel/common/gen_l3_config.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/src/intel/common/gen_l3_config.c b/src/intel/common/gen_l3_config.c
> index 079608198bc..de16ad23017 100644
> --- a/src/intel/common/gen_l3_config.c
> +++ b/src/intel/common/gen_l3_config.c
> @@ -313,7 +313,8 @@ static unsigned
>  get_l3_way_size(const struct gen_device_info *devinfo)
>  {
> const unsigned way_size_per_bank =
> -  devinfo->gen >= 9 && devinfo->l3_banks == 1 ? 4 : 2;
> +  (devinfo->gen >= 9 && devinfo->l3_banks == 1) || devinfo->gen == 11 ?
> +  4 : 2;
>  
> assert(devinfo->l3_banks);
> return way_size_per_bank * devinfo->l3_banks;
> -- 
> 2.17.1




Re: [Mesa-dev] [PATCH 2/5] i965/icl: Set use full ways in L3CNTLREG

2018-11-26 Thread Francisco Jerez
Anuj Phogat  writes:

> L3 allocation table in h/w specification recommends using 4 KB
> granularity for programming allocation fields in L3CNTLREG.
>
> Signed-off-by: Anuj Phogat 
> Cc: Kenneth Graunke 
> Cc: Francisco Jerez 
> Cc: Lionel Landwerlin 
> ---
>  src/mesa/drivers/dri/i965/brw_defines.h   | 1 +
>  src/mesa/drivers/dri/i965/gen7_l3_state.c | 1 +
>  2 files changed, 2 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h
> index 897c91aa31e..b8ada02d6eb 100644
> --- a/src/mesa/drivers/dri/i965/brw_defines.h
> +++ b/src/mesa/drivers/dri/i965/brw_defines.h
> @@ -1647,6 +1647,7 @@ enum brw_pixel_shader_coverage_mask_mode {
>  # define GEN8_L3CNTLREG_ALL_ALLOC_SHIFT 25
>  # define GEN8_L3CNTLREG_ALL_ALLOC_MASK INTEL_MASK(31, 25)
>  # define GEN8_L3CNTLREG_EDBC_NO_HANG   (1 << 9)
> +# define GEN8_L3CNTLREG_USE_FULL_WAYS  (1 << 10)
>  

This bit only exists in Gen11, you should probably prefix the define
with GEN11 instead.  With that fixed:

Reviewed-by: Francisco Jerez 

>  #define GEN10_CACHE_MODE_SS 0x0e420
>  #define GEN10_FLOAT_BLEND_OPTIMIZATION_ENABLE (1 << 4)
> diff --git a/src/mesa/drivers/dri/i965/gen7_l3_state.c b/src/mesa/drivers/dri/i965/gen7_l3_state.c
> index 8c6c4c47481..fb9b2703a50 100644
> --- a/src/mesa/drivers/dri/i965/gen7_l3_state.c
> +++ b/src/mesa/drivers/dri/i965/gen7_l3_state.c
> @@ -119,6 +119,7 @@ setup_l3_config(struct brw_context *brw, const struct gen_l3_config *cfg)
>assert(!cfg->n[GEN_L3P_IS] && !cfg->n[GEN_L3P_C] && !cfg->n[GEN_L3P_T]);
>  
>const unsigned imm_data = ((has_slm ? GEN8_L3CNTLREG_SLM_ENABLE : 0) |
> + (devinfo->gen == 11 ? GEN8_L3CNTLREG_USE_FULL_WAYS : 0) |
>   SET_FIELD(cfg->n[GEN_L3P_URB], GEN8_L3CNTLREG_URB_ALLOC) |
>   SET_FIELD(cfg->n[GEN_L3P_RO], GEN8_L3CNTLREG_RO_ALLOC) |
>   SET_FIELD(cfg->n[GEN_L3P_DC], GEN8_L3CNTLREG_DC_ALLOC) |
> -- 
> 2.17.1




Re: [Mesa-dev] [PATCH V2 1/4] i965/icl: Fix L3 configurations

2018-11-26 Thread Francisco Jerez
Anuj Phogat  writes:

> Use L3 configuration specified in h/w specification.
>
> V2: Drop configs which do under allocation of l3 cache.
> Bump up the comment above table.
>
> Signed-off-by: Anuj Phogat 
> Cc: Kenneth Graunke 
> Cc: Francisco Jerez 

Reviewed-by: Francisco Jerez 

> ---
>  src/intel/common/gen_l3_config.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/src/intel/common/gen_l3_config.c b/src/intel/common/gen_l3_config.c
> index b977c6ab136..32264394fb6 100644
> --- a/src/intel/common/gen_l3_config.c
> +++ b/src/intel/common/gen_l3_config.c
> @@ -134,15 +134,15 @@ static const struct gen_l3_config cnl_l3_configs[] = {
>  
>  /**
>   * ICL validated L3 configurations.  \sa icl_l3_configs.
> + * Zeroth entry in below table has been commented out intentionally
> + * due to known issues with this configuration. Many other entries
> + * suggested by h/w specification aren't added here because they
> + * do under allocation of L3 cache with below partitioning.
>   */
>  static const struct gen_l3_config icl_l3_configs[] = {
> /* SLM URB ALL DC  RO  IS   C   T */
> -   {{  0, 64, 64,  0,  0,  0,  0,  0 }},
> -   {{  0, 64,  0, 16, 48,  0,  0,  0 }},
> -   {{  0, 48,  0, 16, 64,  0,  0,  0 }},
> -   {{  0, 32,  0,  0, 96,  0,  0,  0 }},
> -   {{  0, 32, 96,  0,  0,  0,  0,  0 }},
> -   {{  0, 32,  0, 16, 80,  0,  0,  0 }},
> +   /*{{  0, 16, 80,  0,  0,  0,  0,  0 }},*/
> +   {{  0, 32, 64,  0,  0,  0,  0,  0 }},
> {{  0 }}
>  };
>  
> -- 
> 2.17.1




Re: [Mesa-dev] [PATCH 1/5] i965/icl: Fix L3 configurations

2018-11-16 Thread Francisco Jerez
Anuj Phogat  writes:

> On Fri, Nov 16, 2018 at 6:21 AM Eero Tamminen wrote:
>>
>> Hi,
>>
>> On 16.11.2018 10.33, Francisco Jerez wrote:
>> > Kenneth Graunke  writes:
>> [...]
>> >> Perhaps we'll get both configs working, and then will want to be able
>> >> to select between them.  I question whether the additional URB is truly
>> >> that valuable - how large are the actual gains? - considering that we
>> >> have to stall in order to reconfigure everything anyway...
>>
>> It's more about value of additional space for caching textures.
>>
>> One can calculate required max URB space when GS/TS isn't used, whereas
>> textures can fill all available cache.  For example, if draw does just a
>> single quad, L3 is better utilized with minimal URB space and leaving
>> rest for texture caching.
>>
> Right. URB (16) and ALL (80) config is the one with minimum URB allocation.
> But, it's not working probably because of a hardware bug. Inferring from above
> comments by ken and Eero, If we ever get it working, we should always be using
> just that one config and that's the config which h/w documentation recommends
> as well. Correct me if that's not what you meant.

I don't think anybody said that.  There is a use-case for the 32/64
configuration even after we get the other configuration working, which
is why the hardware gives you the choice in the first place.

> In that case, I would prefer to bypass all this code and do it in
> brw_upload_initial_gpu_state().
>

There is no real benefit from that.  It would be more complexity than
using the exact same codepath for all platforms.  It won't improve
runtime performance measurably.  And it will close the door to several
performance optimizations which are still valuable on ICL.

>>
>> > That just means that the update frequency needs to be low enough for the
>> > stall overhead to be negligible -- E.g. at batch buffer boundaries or
>> > wherever we're getting stalled anyway.
>>
>>
>> - Eero




Re: [Mesa-dev] [PATCH 1/5] i965/icl: Fix L3 configurations

2018-11-16 Thread Francisco Jerez
Kenneth Graunke  writes:

> On Thursday, November 15, 2018 11:16:09 PM PST Francisco Jerez wrote:
>> Kenneth Graunke  writes:
>> 
>> > On Thursday, November 15, 2018 5:51:18 PM PST Francisco Jerez wrote:
>> >> Anuj Phogat  writes:
>> >> 
>> >> > Use L3 configuration table specified in h/w specification.
>> >> >
>> >> > Signed-off-by: Anuj Phogat 
>> >> > Cc: Kenneth Graunke 
>> >> > Cc: Francisco Jerez 
>> >> > Cc: Lionel Landwerlin 
>> >> > ---
>> >> >  src/intel/common/gen_l3_config.c | 16 ++--
>> >> >  1 file changed, 10 insertions(+), 6 deletions(-)
>> >> >
>> >> > diff --git a/src/intel/common/gen_l3_config.c b/src/intel/common/gen_l3_config.c
>> >> > index b977c6ab136..079608198bc 100644
>> >> > --- a/src/intel/common/gen_l3_config.c
>> >> > +++ b/src/intel/common/gen_l3_config.c
>> >> > @@ -137,12 +137,16 @@ static const struct gen_l3_config cnl_l3_configs[] = {
>> >> >   */
>> >> >  static const struct gen_l3_config icl_l3_configs[] = {
>> >> > /* SLM URB ALL DC  RO  IS   C   T */
>> >> > -   {{  0, 64, 64,  0,  0,  0,  0,  0 }},
>> >> > -   {{  0, 64,  0, 16, 48,  0,  0,  0 }},
>> >> > -   {{  0, 48,  0, 16, 64,  0,  0,  0 }},
>> >> > -   {{  0, 32,  0,  0, 96,  0,  0,  0 }},
>> >> > -   {{  0, 32, 96,  0,  0,  0,  0,  0 }},
>> >> > -   {{  0, 32,  0, 16, 80,  0,  0,  0 }},
>> >> > +   {{  0, 32, 32,  0,  0,  0,  0,  0 }},
>> >> 
>> >> This configuration is inherently inefficient since it will always leave
>> >> a third of the L3 cache unallocated.  According to the hardware docs
>> >> it's only included for backwards compatibility.  I think we should
>> >> remove it so we don't end up using it accidentally.
>> >> 
>> >> > +   {{  0, 32, 28,  0,  0,  0,  0,  0 }},
>> >> > +   {{  0, 24,  0,  8, 28,  0,  0,  0 }},
>> >> > +   {{  0, 16,  0,  0, 44,  0,  0,  0 }},
>> >> > +   {{  0, 16, 12,  0,  0,  0,  0,  0 }},
>> >> > +   {{  0, 16,  0,  0, 12,  0,  0,  0 }},
>> >> 
>> >> The configurations above won't work right now because we aren't setting
>> >> up the command buffer and tile cache-related partitions in the L3
>> >> control registers.  You either need to hook up the new partitions (and
>> >> add array entries for them in gen_l3_config), or remove/comment out the
>> >> five lines above.
>> >> 
>> >> > +   {{  0, 16, 80,  0,  0,  0,  0,  0 }},
>> >> 
>> >> From the results of the experiments we ran it seems like the last
>> >> configuration above is busted due to some hardware bug.  It would make
>> >> sense to remove or at least comment out the line so we don't use it
>> >> accidentally until we get some better workaround from the hardware team.
>> >> 
>> >> > +   {{  0, 16, 48,  0,  0,  0,  0,  0 }},
>> >> > +   {{  0, 16, 44,  0,  0,  0,  0,  0 }},
>> >> 
>> >> As before the above two configurations won't work due to the missing
>> >> partitions introduced in ICL.  With these changes in place there's
>> >> probably no need for PATCH 4 of this series.
>> >> 
>> >> > +   {{  0, 32, 64,  0,  0,  0,  0,  0 }},
>> >> > {{  0 }}
>> >> >  };
>> >> >  
>> >> 
>> >
>> > So, one of the main motivations for dynamically reconfiguring the L3
>> > on the fly was to add/remove the SLM partition as needed.  This isn't
>> > required on Gen11+, as SLM is handled separately.  Furthermore, we just
>> > use the "ALL" configs rather than manually partitioning things between
>> > DC, RO, and so on.
>> >
>> > With that in mind, it seems like we basically always want to use the
>> > last config (32/64) - or maybe the smaller URB one (16/80).  I am not
>> > sure that we really need the ability to dynamically switch on the fly.
>> >
>> 
>> There's no optimal choice between the two beforehand, that's why the
>> hardware still provides the ability to reconfigure the L3 cache.
>> Different workloads can benefit from different ratios of URB to the rest
>> of the cache based on how bandwidth and geometry intensive they are.
>

Re: [Mesa-dev] [PATCH 1/5] i965/icl: Fix L3 configurations

2018-11-15 Thread Francisco Jerez
Kenneth Graunke  writes:

> On Thursday, November 15, 2018 5:51:18 PM PST Francisco Jerez wrote:
>> Anuj Phogat  writes:
>> 
>> > Use L3 configuration table specified in h/w specification.
>> >
>> > Signed-off-by: Anuj Phogat 
>> > Cc: Kenneth Graunke 
>> > Cc: Francisco Jerez 
>> > Cc: Lionel Landwerlin 
>> > ---
>> >  src/intel/common/gen_l3_config.c | 16 ++--
>> >  1 file changed, 10 insertions(+), 6 deletions(-)
>> >
>> > diff --git a/src/intel/common/gen_l3_config.c b/src/intel/common/gen_l3_config.c
>> > index b977c6ab136..079608198bc 100644
>> > --- a/src/intel/common/gen_l3_config.c
>> > +++ b/src/intel/common/gen_l3_config.c
>> > @@ -137,12 +137,16 @@ static const struct gen_l3_config cnl_l3_configs[] = {
>> >   */
>> >  static const struct gen_l3_config icl_l3_configs[] = {
>> > /* SLM URB ALL DC  RO  IS   C   T */
>> > -   {{  0, 64, 64,  0,  0,  0,  0,  0 }},
>> > -   {{  0, 64,  0, 16, 48,  0,  0,  0 }},
>> > -   {{  0, 48,  0, 16, 64,  0,  0,  0 }},
>> > -   {{  0, 32,  0,  0, 96,  0,  0,  0 }},
>> > -   {{  0, 32, 96,  0,  0,  0,  0,  0 }},
>> > -   {{  0, 32,  0, 16, 80,  0,  0,  0 }},
>> > +   {{  0, 32, 32,  0,  0,  0,  0,  0 }},
>> 
>> This configuration is inherently inefficient since it will always leave
>> a third of the L3 cache unallocated.  According to the hardware docs
>> it's only included for backwards compatibility.  I think we should
>> remove it so we don't end up using it accidentally.
>> 
>> > +   {{  0, 32, 28,  0,  0,  0,  0,  0 }},
>> > +   {{  0, 24,  0,  8, 28,  0,  0,  0 }},
>> > +   {{  0, 16,  0,  0, 44,  0,  0,  0 }},
>> > +   {{  0, 16, 12,  0,  0,  0,  0,  0 }},
>> > +   {{  0, 16,  0,  0, 12,  0,  0,  0 }},
>> 
>> The configurations above won't work right now because we aren't setting
>> up the command buffer and tile cache-related partitions in the L3
>> control registers.  You either need to hook up the new partitions (and
>> add array entries for them in gen_l3_config), or remove/comment out the
>> five lines above.
>> 
>> > +   {{  0, 16, 80,  0,  0,  0,  0,  0 }},
>> 
>> From the results of the experiments we ran it seems like the last
>> configuration above is busted due to some hardware bug.  It would make
>> sense to remove or at least comment out the line so we don't use it
>> accidentally until we get some better workaround from the hardware team.
>> 
>> > +   {{  0, 16, 48,  0,  0,  0,  0,  0 }},
>> > +   {{  0, 16, 44,  0,  0,  0,  0,  0 }},
>> 
>> As before the above two configurations won't work due to the missing
>> partitions introduced in ICL.  With these changes in place there's
>> probably no need for PATCH 4 of this series.
>> 
>> > +   {{  0, 32, 64,  0,  0,  0,  0,  0 }},
>> > {{  0 }}
>> >  };
>> >  
>> 
>
> So, one of the main motivations for dynamically reconfiguring the L3
> on the fly was to add/remove the SLM partition as needed.  This isn't
> required on Gen11+, as SLM is handled separately.  Furthermore, we just
> use the "ALL" configs rather than manually partitioning things between
> DC, RO, and so on.
>
> With that in mind, it seems like we basically always want to use the
> last config (32/64) - or maybe the smaller URB one (16/80).  I am not
> sure that we really need the ability to dynamically switch on the fly.
>

Neither of the two is the optimal choice in advance, which is why the
hardware still provides the ability to reconfigure the L3 cache.
Different workloads can benefit from different ratios of URB to the rest
of the cache, depending on how bandwidth- and geometry-intensive they are.

> I had suggested to Anuj earlier to make brw_upload_initial_gpu_state()
> program one of the two configs directly, then remove the gen7_l3_state
> atom from the Gen11+ list.  We'd bypass all of this code entirely.  No
> more lists of things we don't want, with manipulated weights to pick the
> one thing we do want, and draw-time state flagging to re-select the same
> config every time...
>
> It seems like it would be dramatically simpler.  This code is great for
> Gen7+, I just don't think it makes sense for Gen11+.
>
> Curro, what do you think?
>

It definitely still makes sense on Gen11+ AFAICT.  And we may still need
to switch L3 partitions on the fly because of everybody's favorite ICL
performance hardware feature.  I doubt that making the allocation static
is a good plan.  What is your concern exactly?  Does it show up in your
profiling logs at all?

> --Ken




Re: [Mesa-dev] [PATCH 1/5] i965/icl: Fix L3 configurations

2018-11-15 Thread Francisco Jerez
Anuj Phogat writes:

> Use L3 configuration table specified in h/w specification.
>
> Signed-off-by: Anuj Phogat 
> Cc: Kenneth Graunke 
> Cc: Francisco Jerez 
> Cc: Lionel Landwerlin 
> ---
>  src/intel/common/gen_l3_config.c | 16 ++--
>  1 file changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/src/intel/common/gen_l3_config.c b/src/intel/common/gen_l3_config.c
> index b977c6ab136..079608198bc 100644
> --- a/src/intel/common/gen_l3_config.c
> +++ b/src/intel/common/gen_l3_config.c
> @@ -137,12 +137,16 @@ static const struct gen_l3_config cnl_l3_configs[] = {
>   */
>  static const struct gen_l3_config icl_l3_configs[] = {
> /* SLM URB ALL DC  RO  IS   C   T */
> -   {{  0, 64, 64,  0,  0,  0,  0,  0 }},
> -   {{  0, 64,  0, 16, 48,  0,  0,  0 }},
> -   {{  0, 48,  0, 16, 64,  0,  0,  0 }},
> -   {{  0, 32,  0,  0, 96,  0,  0,  0 }},
> -   {{  0, 32, 96,  0,  0,  0,  0,  0 }},
> -   {{  0, 32,  0, 16, 80,  0,  0,  0 }},
> +   {{  0, 32, 32,  0,  0,  0,  0,  0 }},

This configuration is inherently inefficient since it will always leave
a third of the L3 cache unallocated.  According to the hardware docs
it's only included for backwards compatibility.  I think we should
remove it so we don't end up using it accidentally.
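
A quick way to see the "third" is to sum the partitions of an entry and
compare against the total.  The sketch below assumes the total is 96
table units, which is only inferred from the 32/64 and 16/80 entries
adding up to that; the names l3_config_sketch, ICL_L3_TOTAL_SKETCH,
l3_allocated and l3_unallocated are hypothetical.

```c
/* Partition order follows the table header used in the patch:
 * SLM, URB, ALL, DC, RO, IS, C, T. */
enum { L3P_COUNT = 8 };

/* Hypothetical stand-in for an L3 table entry. */
struct l3_config_sketch {
   unsigned n[L3P_COUNT];
};

/* ASSUMPTION: total L3 size in the same units as the table, inferred
 * from the full-size entries (32 + 64 and 16 + 80). */
#define ICL_L3_TOTAL_SKETCH 96u

/* Sum of all partitions of one entry. */
static unsigned
l3_allocated(const struct l3_config_sketch *cfg)
{
   unsigned sum = 0;
   for (int i = 0; i < L3P_COUNT; i++)
      sum += cfg->n[i];
   return sum;
}

/* Units left without an owner under the assumed total. */
static unsigned
l3_unallocated(const struct l3_config_sketch *cfg)
{
   const unsigned used = l3_allocated(cfg);
   return used < ICL_L3_TOTAL_SKETCH ? ICL_L3_TOTAL_SKETCH - used : 0;
}
```

For the 32/32 entry this gives 64 allocated and 32 unallocated, i.e. one
third of the assumed total, while the 32/64 entry leaves nothing unused.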

> +   {{  0, 32, 28,  0,  0,  0,  0,  0 }},
> +   {{  0, 24,  0,  8, 28,  0,  0,  0 }},
> +   {{  0, 16,  0,  0, 44,  0,  0,  0 }},
> +   {{  0, 16, 12,  0,  0,  0,  0,  0 }},
> +   {{  0, 16,  0,  0, 12,  0,  0,  0 }},

The configurations above won't work right now because we aren't setting
up the command buffer and tile cache-related partitions in the L3
control registers.  You either need to hook up the new partitions (and
add array entries for them in gen_l3_config), or remove/comment out the
five lines above.

> +   {{  0, 16, 80,  0,  0,  0,  0,  0 }},

From the results of the experiments we ran it seems like the last
configuration above is busted due to some hardware bug.  It would make
sense to remove or at least comment out the line so we don't use it
accidentally until we get some better workaround from the hardware team.

> +   {{  0, 16, 48,  0,  0,  0,  0,  0 }},
> +   {{  0, 16, 44,  0,  0,  0,  0,  0 }},

As before the above two configurations won't work due to the missing
partitions introduced in ICL.  With these changes in place there's
probably no need for PATCH 4 of this series.

> +   {{  0, 32, 64,  0,  0,  0,  0,  0 }},
> {{  0 }}
>  };
>  
> -- 
> 2.17.1



