Re: [Mesa-dev] NIR constant problem for GPU which doesn't have native integer support

2019-01-03 Thread Roland Scheidegger
Am 03.01.19 um 20:50 schrieb Jason Ekstrand:
> > The problem you're describing is in converting from NIR to another IR,
> > not to hardware.  In LLVM they made a choice to put types on SSA
> values
> > but then to have the actual semantics be based on the instruction
> > itself.  This leads to lots of redundancy in the IR but also lots of
> > things you can validate which is kind-of nice.  Is that redundancy
> > really useful?  In our experience with NIR, we haven't found it to be
> > other than booleans (now sorted), this one constant issue, and
> > translating to IRs that have that redundancy.  Then why did they add
> > it?  I'm really not sure but it may, at least somewhat, be related to
> > the fact that they allow arrays and structs in their SSA values and so
> > need full types.  This is another design decision in LLVM which I find
> > highly questionable.  What you're is more-or-less that NIR should
> carry,
> > maintain, and validate extra useless information just so we can
> pass it
> > on to LLVM which is going to use it for what, exactly?  Sorry if I'm
> > extremely reluctant to make fairly fundamental changes to NIR with no
> > better reason than "LLVM did it that way".
> >
> > There's a second reason why I don't like the idea of putting types on
> > SSA values: It's extremely deceptive.  In SPIR-V you're allowed to
> do an
> > OpSConvert with a source that is an unsigned 16-bit integer and a
> > destination that is an unsigned 32-bit integer.  What happens?  Your
> > uint -> uint cast sign extends!  Yup That's what happens.  No
> joke. 
> > The same is true of signed vs. unsigned division or modulus.  The end
> > result is that the types in SPIR-V are useless and you can't actually
> > trust them for anything except bit-size and sometimes labelling
> > something as a float vs. int.
> This is really a decision of spir-v only though, llvm doesn't have that
> nonsense (maybe for making it easier to translate to other languages?) -
> there's just int and float types there, with no distinction between
> signed and unsigned.
> 
> 
> I think with SPIR-V you could probably just pick one and make everything
> either signed or unsigned.  I'm not immediately aware of any opcodes
> that require one signedness or the other; most just require an integer
> or require a float.  That said, this is SPIR-V so I'm not going to bet
> money on that...
>  
> 
> You are quite right though that float vs. int is somewhat redundant too
> due to the instructions indicating the type. I suppose there might be
> reasons why there's different types - hw may use different registers for
> instance (whereas of course noone in their right mind would use
> different registers for signed vs. unsigned ints), or there might be
> some real cost of bitcasts (at least bypass delays are common for cpus
> when transitioning values between int and float execution units). For
> instance, it makes a difference with x86 sse if you use float or int
> loads, which otherwise you couldn't indicate directly (llvm can optimize
> this into the right kind of load nowadays even if you use the wrong kind
> of variable for the load, e.g. int load then bitcast to float and do
> some float op will change it into a float load, but this is IIRC
> actually a pretty new ability, and possibly doesn't happen if you
> disable enough optimization passes).
> 
> 
> Having it for the purpose of register allocation makes sense in the CPU
> world.  In the GPU world, everything is typically designed float-first
> and I'm not aware of any hardware has separate int and float registers
> like x86 does.  That said, hardware changes over time and it's entirely
> possible that someone will decide that a heterogeneous register file is
> a good idea on a GPU.  (Technically, most GPUs do have flag regs or
> accumulators or something but it's not as bad as x86.)  Our of
> curiosity, do you (or anyone else) know if LLVM actually uses them for
> that purpose?  I could see that information getting lost in the back-end
> and them using something else to choose registers.
> 
> --Jason

For llvm with x86 sse, I don't think a variable being characterized as
float or int via bitcast actually makes a difference whatsoever (at
least not with optimization). llvm would determine if it's float or int
on its own (based on preceeding / succeeding instructions). For example,
as you might know, sse has both float and int logic ops, whereas llvm of
course does not - but it would still use float logic ops appropriately
(when the value is coming out of / going into a "true" float op),
despite that you have to cast it to int in llvm to do the logic op. (A
more interesting examples are actually shuffles, since again due to sse
being quite horrid there some are only directly possible with ints, some

[Mesa-dev] [PATCH] st/nine: Ignore null sized windows

2019-01-03 Thread Axel Davy
If for some reason the window size detected
is null, just render at normal size.

Fixes the regression caused by:
commit 2318ca68bbeb4fa6e21a4d8c650cec3f64246596
"st/nine: Handle window resize when a presentation buffer is used"

Fixes: https://github.com/iXit/Mesa-3D/issues/331
Fixes: https://github.com/iXit/Mesa-3D/issues/332

Cc: mesa-sta...@lists.freedesktop.org
Signed-off-by: Axel Davy 
---
It may not apply cleanly to mesa stable. I can do another
patch for stable if needed. Basically it's the very same code
except the part "/* Switch to using presentation buffers ...*/"
would be replaced by "pipe = NineDevice9_GetPipe(This->base.device);"

 src/gallium/state_trackers/nine/swapchain9.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/gallium/state_trackers/nine/swapchain9.c 
b/src/gallium/state_trackers/nine/swapchain9.c
index 6c22be24c7c..ceaa1cd848a 100644
--- a/src/gallium/state_trackers/nine/swapchain9.c
+++ b/src/gallium/state_trackers/nine/swapchain9.c
@@ -753,6 +753,12 @@ present( struct NineSwapChain9 *This,
 ID3DPresent_GetWindowInfo(This->present, hDestWindowOverride, 
_width, _height, _depth);
 (void)target_depth;
 
+/* Can happen for a few frames. */
+if (target_width == 0 || target_height == 0) {
+target_width = resource->width0;
+target_height = resource->height0;
+}
+
 /* Switch to using presentation buffers on window resize.
  * Note: Most apps should resize the d3d back buffers when
  * a window resize is detected, which will result in a call to
-- 
2.19.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] NIR constant problem for GPU which doesn't have native integer support

2019-01-03 Thread Jason Ekstrand
On Thu, Jan 3, 2019 at 2:03 PM Bas Nieuwenhuizen 
wrote:

> On Thu, Jan 3, 2019 at 6:59 PM Jason Ekstrand 
> wrote:
> >
> > On Thu, Jan 3, 2019 at 3:39 AM Erik Faye-Lund <
> erik.faye-l...@collabora.com> wrote:
> >>
> >> On Wed, 2019-01-02 at 10:16 -0600, Jason Ekstrand wrote:
> >> > On Wed, Jan 2, 2019 at 9:43 AM Ilia Mirkin 
> >> > wrote:
> >> > > Have a look at the first 4 patches in the series from Jonathan
> >> > > Marek
> >> > > to address some of these issues:
> >> > >
> >> > > https://patchwork.freedesktop.org/series/54295/
> >> > >
> >> > > Not sure exactly what state that work is in, but I've added
> >> > > Jonathan
> >> > > to CC, perhaps he can provide an update.
> >> > >
> >> > > Cheers,
> >> > >
> >> > >   -ilia
> >> > >
> >> > > On Wed, Jan 2, 2019 at 6:28 AM Qiang Yu  wrote:
> >> > > >
> >> > > > Hi guys,
> >> > > >
> >> > > > I found the problem with this test fragment shader when lima
> >> > > development:
> >> > > > uniform int color;
> >> > > > void main() {
> >> > > > if (color > 1)
> >> > > > gl_FragColor = vec4(1.0, 0.0, 0.0, 1);
> >> > > > else
> >> > > > gl_FragColor = vec4(0.0, 1.0, 0.0, 1);
> >> > > > }
> >> > > >
> >> > > > nir_print_shader output:
> >> > > > impl main {
> >> > > > block block_0:
> >> > > > /* preds: */
> >> > > > vec1 32 ssa_0 = load_const (0x0001 /* 0.00 */)
> >> > > > vec4 32 ssa_1 = load_const (0x3f80 /* 1.00 */,
> >> > > > 0x /* 0.00 */, 0x /* 0.00 */, 0x3f80
> >> > > /*
> >> > > > 1.00 */)
> >> > > > vec4 32 ssa_2 = load_const (0x /* 0.00 */,
> >> > > > 0x3f80 /* 1.00 */, 0x /* 0.00 */, 0x3f80
> >> > > /*
> >> > > > 1.00 */)
> >> > > > vec1 32 ssa_3 = load_const (0x /* 0.00 */)
> >> > > > vec1 32 ssa_4 = intrinsic load_uniform (ssa_3) (0, 1, 0)
> >> > > /*
> >> > > > base=0 */ /* range=1 */ /* component=0 */   /* color */
> >> > > > vec1 32 ssa_5 = slt ssa_0, ssa_4
> >> > > > vec1 32 ssa_6 = fnot ssa_5
> >> > > > vec4 32 ssa_7 = bcsel ssa_6., ssa_2, ssa_1
> >> > > > intrinsic store_output (ssa_7, ssa_3) (0, 15, 0) /*
> >> > > base=0 */
> >> > > > /* wrmask=xyzw */ /* component=0 */   /* gl_FragColor */
> >> > > > /* succs: block_1 */
> >> > > > block block_1:
> >> > > > }
> >> > > >
> >> > > > ssa0 is not converted to float when glsl to nir. I see
> >> > > glsl_to_nir.cpp
> >> > > > will create flt/ilt/ult
> >> > > > based on source type for gpu support native integer, but for gpu
> >> > > not
> >> > > > support native
> >> > > > integer, just create slt for all source type. And in
> >> > > > nir_lower_constant_initializers,
> >> > > > there's also no type conversion for integer constant.
> >> >
> >> > This is a generally sticky issue.  In NIR, we have no concept of
> >> > types on SSA values which has proven perfectly reasonable and
> >> > actually very powerful in a world where integers are supported
> >> > natively.  Unfortunately, it causes significant problems for float-
> >> > only architectures.
> >>
> >> I would like to take this chance to say that this untyped SSA-value
> >> choice has lead to issues in both radeon_si (because LLVM values are
> >> typed) and zink (similarly, because SPIR-V values are typed), where we
> >> need to to bitcasts on every access because there's just not enough
> >> information available to emit variables with the right type.
> >
> >
> > I'm not sure if I agree that the two problems are the same or not...
> More on that in a bit.
> >
> >>
> >> It took us a lot of time to realize that the meta-data from the opcodes
> >> doesn't *really* provide this, because the rest of nir doesn't treat
> >> values consistently. In fact, this feels arguably more like buggy
> >> behavior; why do we even have fmov when all of the time the compiler
> >> will emit imovs for floating-point values...? Or why do we have bitcast
> >
> >
> > Why do we have different mov opcodes?  Because they have different
> behavior in the presence of source/destination modifiers.  You likely don't
> use those in radeon or Zink but we do use them on Intel.  That being said,
> I've very seriously considered adding a generic nir_op_mov which is
> entirely typeless and doesn't support modifiers at all and make most of
> core NIR use that.  For that matter, there's no real reason why we need
> fmov with modifiers at all when we could we could easily replace "ssa1 =
> fmov.sat |x|" with "ssa1 = fsat |x|" or "ssa1 = fabs.sat x".  If we had a
> generic nir_op_mov to deal with swizzling etc., the only thing having
> i/fmov would really accomplish is lowering the number of different ways you
> can write "fmov.sat |x|".  The only reason I haven't added this nir_op_mov
> opcode is because it's a pile of work and anyone who can't do integers can
> just implement imov as fmov and it's not a big deal.
>
> Is there anything blocking just 

Re: [Mesa-dev] [PATCH 6/7] util/queue: add util_queue_adjust_num_threads

2019-01-03 Thread Ian Romanick
This patch is

Reviewed-by: Ian Romanick 

On 11/28/18 6:59 PM, Marek Olšák wrote:
> From: Marek Olšák 
> 
> for ARB_parallel_shader_compile
> ---
>  src/util/u_queue.c | 50 --
>  src/util/u_queue.h |  8 
>  2 files changed, 52 insertions(+), 6 deletions(-)
> 
> diff --git a/src/util/u_queue.c b/src/util/u_queue.c
> index 612ad5e83c6..383a9c09919 100644
> --- a/src/util/u_queue.c
> +++ b/src/util/u_queue.c
> @@ -27,42 +27,43 @@
>  #include "u_queue.h"
>  
>  #include 
>  
>  #include "util/os_time.h"
>  #include "util/u_string.h"
>  #include "util/u_thread.h"
>  #include "u_process.h"
>  
>  static void
> -util_queue_kill_threads(struct util_queue *queue, unsigned keep_num_threads);
> +util_queue_kill_threads(struct util_queue *queue, unsigned keep_num_threads,
> +bool finish_locked);
>  
>  /
>   * Wait for all queues to assert idle when exit() is called.
>   *
>   * Otherwise, C++ static variable destructors can be called while threads
>   * are using the static variables.
>   */
>  
>  static once_flag atexit_once_flag = ONCE_FLAG_INIT;
>  static struct list_head queue_list;
>  static mtx_t exit_mutex = _MTX_INITIALIZER_NP;
>  
>  static void
>  atexit_handler(void)
>  {
> struct util_queue *iter;
>  
> mtx_lock(_mutex);
> /* Wait for all queues to assert idle. */
> LIST_FOR_EACH_ENTRY(iter, _list, head) {
> -  util_queue_kill_threads(iter, 0);
> +  util_queue_kill_threads(iter, 0, false);
> }
> mtx_unlock(_mutex);
>  }
>  
>  static void
>  global_init(void)
>  {
> LIST_INITHEAD(_list);
> atexit(atexit_handler);
>  }
> @@ -333,20 +334,53 @@ util_queue_create_thread(struct util_queue *queue, 
> unsigned index)
> *
> * Note that Linux only allows decreasing the priority. The original
> * priority can't be restored.
> */
>pthread_setschedparam(queue->threads[index], SCHED_IDLE, _param);
>  #endif
> }
> return true;
>  }
>  
> +void
> +util_queue_adjust_num_threads(struct util_queue *queue, unsigned num_threads)
> +{
> +   num_threads = MIN2(num_threads, queue->max_threads);
> +   num_threads = MAX2(num_threads, 1);
> +
> +   mtx_lock(>finish_lock);
> +   unsigned old_num_threads = queue->num_threads;
> +
> +   if (num_threads == old_num_threads) {
> +  mtx_unlock(>finish_lock);
> +  return;
> +   }
> +
> +   if (num_threads < old_num_threads) {
> +  util_queue_kill_threads(queue, num_threads, true);
> +  mtx_unlock(>finish_lock);
> +  return;
> +   }
> +
> +   /* Create threads.
> +*
> +* We need to update num_threads first, because threads terminate
> +* when thread_index < num_threads.
> +*/
> +   queue->num_threads = num_threads;
> +   for (unsigned i = old_num_threads; i < num_threads; i++) {
> +  if (!util_queue_create_thread(queue, i))
> + break;
> +   }
> +   mtx_unlock(>finish_lock);
> +}
> +
>  bool
>  util_queue_init(struct util_queue *queue,
>  const char *name,
>  unsigned max_jobs,
>  unsigned num_threads,
>  unsigned flags)
>  {
> unsigned i;
>  
> /* Form the thread name from process_name and name, limited to 13
> @@ -371,20 +405,21 @@ util_queue_init(struct util_queue *queue,
> memset(queue, 0, sizeof(*queue));
>  
> if (process_len) {
>util_snprintf(queue->name, sizeof(queue->name), "%.*s:%s",
>  process_len, process_name, name);
> } else {
>util_snprintf(queue->name, sizeof(queue->name), "%s", name);
> }
>  
> queue->flags = flags;
> +   queue->max_threads = num_threads;
> queue->num_threads = num_threads;
> queue->max_jobs = max_jobs;
>  
> queue->jobs = (struct util_queue_job*)
>   calloc(max_jobs, sizeof(struct util_queue_job));
> if (!queue->jobs)
>goto fail;
>  
> (void) mtx_init(>lock, mtx_plain);
> (void) mtx_init(>finish_lock, mtx_plain);
> @@ -422,48 +457,51 @@ fail:
>cnd_destroy(>has_queued_cond);
>mtx_destroy(>lock);
>free(queue->jobs);
> }
> /* also util_queue_is_initialized can be used to check for success */
> memset(queue, 0, sizeof(*queue));
> return false;
>  }
>  
>  static void
> -util_queue_kill_threads(struct util_queue *queue, unsigned keep_num_threads)
> +util_queue_kill_threads(struct util_queue *queue, unsigned keep_num_threads,
> +bool finish_locked)
>  {
> unsigned i;
>  
> /* Signal all threads to terminate. */
> -   mtx_lock(>finish_lock);
> +   if (!finish_locked)
> +  mtx_lock(>finish_lock);
>  
> if (keep_num_threads >= queue->num_threads) {
>mtx_unlock(>finish_lock);
>return;
> }
>  
> mtx_lock(>lock);
> unsigned old_num_threads = queue->num_threads;
> queue->num_threads = 

Re: [Mesa-dev] NIR constant problem for GPU which doesn't have native integer support

2019-01-03 Thread Bas Nieuwenhuizen
On Thu, Jan 3, 2019 at 6:59 PM Jason Ekstrand  wrote:
>
> On Thu, Jan 3, 2019 at 3:39 AM Erik Faye-Lund  
> wrote:
>>
>> On Wed, 2019-01-02 at 10:16 -0600, Jason Ekstrand wrote:
>> > On Wed, Jan 2, 2019 at 9:43 AM Ilia Mirkin 
>> > wrote:
>> > > Have a look at the first 4 patches in the series from Jonathan
>> > > Marek
>> > > to address some of these issues:
>> > >
>> > > https://patchwork.freedesktop.org/series/54295/
>> > >
>> > > Not sure exactly what state that work is in, but I've added
>> > > Jonathan
>> > > to CC, perhaps he can provide an update.
>> > >
>> > > Cheers,
>> > >
>> > >   -ilia
>> > >
>> > > On Wed, Jan 2, 2019 at 6:28 AM Qiang Yu  wrote:
>> > > >
>> > > > Hi guys,
>> > > >
>> > > > I found the problem with this test fragment shader when lima
>> > > development:
>> > > > uniform int color;
>> > > > void main() {
>> > > > if (color > 1)
>> > > > gl_FragColor = vec4(1.0, 0.0, 0.0, 1);
>> > > > else
>> > > > gl_FragColor = vec4(0.0, 1.0, 0.0, 1);
>> > > > }
>> > > >
>> > > > nir_print_shader output:
>> > > > impl main {
>> > > > block block_0:
>> > > > /* preds: */
>> > > > vec1 32 ssa_0 = load_const (0x0001 /* 0.00 */)
>> > > > vec4 32 ssa_1 = load_const (0x3f80 /* 1.00 */,
>> > > > 0x /* 0.00 */, 0x /* 0.00 */, 0x3f80
>> > > /*
>> > > > 1.00 */)
>> > > > vec4 32 ssa_2 = load_const (0x /* 0.00 */,
>> > > > 0x3f80 /* 1.00 */, 0x /* 0.00 */, 0x3f80
>> > > /*
>> > > > 1.00 */)
>> > > > vec1 32 ssa_3 = load_const (0x /* 0.00 */)
>> > > > vec1 32 ssa_4 = intrinsic load_uniform (ssa_3) (0, 1, 0)
>> > > /*
>> > > > base=0 */ /* range=1 */ /* component=0 */   /* color */
>> > > > vec1 32 ssa_5 = slt ssa_0, ssa_4
>> > > > vec1 32 ssa_6 = fnot ssa_5
>> > > > vec4 32 ssa_7 = bcsel ssa_6., ssa_2, ssa_1
>> > > > intrinsic store_output (ssa_7, ssa_3) (0, 15, 0) /*
>> > > base=0 */
>> > > > /* wrmask=xyzw */ /* component=0 */   /* gl_FragColor */
>> > > > /* succs: block_1 */
>> > > > block block_1:
>> > > > }
>> > > >
>> > > > ssa0 is not converted to float when glsl to nir. I see
>> > > glsl_to_nir.cpp
>> > > > will create flt/ilt/ult
>> > > > based on source type for gpu support native integer, but for gpu
>> > > not
>> > > > support native
>> > > > integer, just create slt for all source type. And in
>> > > > nir_lower_constant_initializers,
>> > > > there's also no type conversion for integer constant.
>> >
>> > This is a generally sticky issue.  In NIR, we have no concept of
>> > types on SSA values which has proven perfectly reasonable and
>> > actually very powerful in a world where integers are supported
>> > natively.  Unfortunately, it causes significant problems for float-
>> > only architectures.
>>
>> I would like to take this chance to say that this untyped SSA-value
>> choice has lead to issues in both radeon_si (because LLVM values are
>> typed) and zink (similarly, because SPIR-V values are typed), where we
>> need to to bitcasts on every access because there's just not enough
>> information available to emit variables with the right type.
>
>
> I'm not sure if I agree that the two problems are the same or not...  More on 
> that in a bit.
>
>>
>> It took us a lot of time to realize that the meta-data from the opcodes
>> doesn't *really* provide this, because the rest of nir doesn't treat
>> values consistently. In fact, this feels arguably more like buggy
>> behavior; why do we even have fmov when all of the time the compiler
>> will emit imovs for floating-point values...? Or why do we have bitcast
>
>
> Why do we have different mov opcodes?  Because they have different behavior 
> in the presence of source/destination modifiers.  You likely don't use those 
> in radeon or Zink but we do use them on Intel.  That being said, I've very 
> seriously considered adding a generic nir_op_mov which is entirely typeless 
> and doesn't support modifiers at all and make most of core NIR use that.  For 
> that matter, there's no real reason why we need fmov with modifiers at all 
> when we could we could easily replace "ssa1 = fmov.sat |x|" with "ssa1 = fsat 
> |x|" or "ssa1 = fabs.sat x".  If we had a generic nir_op_mov to deal with 
> swizzling etc., the only thing having i/fmov would really accomplish is 
> lowering the number of different ways you can write "fmov.sat |x|".  The only 
> reason I haven't added this nir_op_mov opcode is because it's a pile of work 
> and anyone who can't do integers can just implement imov as fmov and it's not 
> a big deal.

Is there anything blocking just always using fmov and removing imov
from nir? Currently anyone that want to add a modifier to an imov can
just change the opcode to an fmov and add the modifier, is it not?
>
>>
>> I would really love it if we could at least consider making this "we
>> can just treat floats 

Re: [Mesa-dev] [PATCH 5/7] util/queue: hold a lock when reading num_threads in util_queue_finish

2019-01-03 Thread Ian Romanick
This patch is

Reviewed-by: Ian Romanick 

On 11/28/18 6:59 PM, Marek Olšák wrote:
> From: Marek Olšák 
> 
> ---
>  src/util/u_queue.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/src/util/u_queue.c b/src/util/u_queue.c
> index 5aaf60ae78e..612ad5e83c6 100644
> --- a/src/util/u_queue.c
> +++ b/src/util/u_queue.c
> @@ -582,29 +582,29 @@ util_queue_finish_execute(void *data, int num_thread)
> util_barrier_wait(barrier);
>  }
>  
>  /**
>   * Wait until all previously added jobs have completed.
>   */
>  void
>  util_queue_finish(struct util_queue *queue)
>  {
> util_barrier barrier;
> -   struct util_queue_fence *fences = malloc(queue->num_threads * 
> sizeof(*fences));
> -
> -   util_barrier_init(, queue->num_threads);
> +   struct util_queue_fence *fences;
>  
> /* If 2 threads were adding jobs for 2 different barries at the same time,
>  * a deadlock would happen, because 1 barrier requires that all threads
>  * wait for it exclusively.
>  */
> mtx_lock(>finish_lock);
> +   fences = malloc(queue->num_threads * sizeof(*fences));
> +   util_barrier_init(, queue->num_threads);
>  
> for (unsigned i = 0; i < queue->num_threads; ++i) {
>util_queue_fence_init([i]);
>util_queue_add_job(queue, , [i], 
> util_queue_finish_execute, NULL);
> }
>  
> for (unsigned i = 0; i < queue->num_threads; ++i) {
>util_queue_fence_wait([i]);
>util_queue_fence_destroy([i]);
> }
> 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 4/7] util/queue: add ability to kill a subset of threads

2019-01-03 Thread Ian Romanick
On 11/28/18 6:59 PM, Marek Olšák wrote:
> From: Marek Olšák 
> 
> for ARB_parallel_shader_compile
> ---
>  src/util/u_queue.c | 49 +-
>  src/util/u_queue.h |  5 ++---
>  2 files changed, 33 insertions(+), 21 deletions(-)
> 
> diff --git a/src/util/u_queue.c b/src/util/u_queue.c
> index 48c5c79552d..5aaf60ae78e 100644
> --- a/src/util/u_queue.c
> +++ b/src/util/u_queue.c
> @@ -26,42 +26,43 @@
>  
>  #include "u_queue.h"
>  
>  #include 
>  
>  #include "util/os_time.h"
>  #include "util/u_string.h"
>  #include "util/u_thread.h"
>  #include "u_process.h"
>  
> -static void util_queue_killall_and_wait(struct util_queue *queue);
> +static void
> +util_queue_kill_threads(struct util_queue *queue, unsigned keep_num_threads);
>  
>  /
>   * Wait for all queues to assert idle when exit() is called.
>   *
>   * Otherwise, C++ static variable destructors can be called while threads
>   * are using the static variables.
>   */
>  
>  static once_flag atexit_once_flag = ONCE_FLAG_INIT;
>  static struct list_head queue_list;
>  static mtx_t exit_mutex = _MTX_INITIALIZER_NP;
>  
>  static void
>  atexit_handler(void)
>  {
> struct util_queue *iter;
>  
> mtx_lock(_mutex);
> /* Wait for all queues to assert idle. */
> LIST_FOR_EACH_ENTRY(iter, _list, head) {
> -  util_queue_killall_and_wait(iter);
> +  util_queue_kill_threads(iter, 0);
> }
> mtx_unlock(_mutex);
>  }
>  
>  static void
>  global_init(void)
>  {
> LIST_INITHEAD(_list);
> atexit(atexit_handler);
>  }
> @@ -259,55 +260,58 @@ util_queue_thread_func(void *input)
>u_thread_setname(name);
> }
>  
> while (1) {
>struct util_queue_job job;
>  
>mtx_lock(>lock);
>assert(queue->num_queued >= 0 && queue->num_queued <= queue->max_jobs);
>  
>/* wait if the queue is empty */
> -  while (!queue->kill_threads && queue->num_queued == 0)
> +  while (thread_index < queue->num_threads && queue->num_queued == 0)
>   cnd_wait(>has_queued_cond, >lock);
>  
> -  if (queue->kill_threads) {
> +  /* only kill threads that are above "num_threads" */
> +  if (thread_index >= queue->num_threads) {
>   mtx_unlock(>lock);
>   break;
>}
>  
>job = queue->jobs[queue->read_idx];
>memset(>jobs[queue->read_idx], 0, sizeof(struct 
> util_queue_job));
>queue->read_idx = (queue->read_idx + 1) % queue->max_jobs;
>  
>queue->num_queued--;
>cnd_signal(>has_space_cond);
>mtx_unlock(>lock);
>  
>if (job.job) {
>   job.execute(job.job, thread_index);
>   util_queue_fence_signal(job.fence);
>   if (job.cleanup)
>  job.cleanup(job.job, thread_index);
>}
> }
>  
> -   /* signal remaining jobs before terminating */
> +   /* signal remaining jobs if all threads are being terminated */
> mtx_lock(>lock);
> -   for (unsigned i = queue->read_idx; i != queue->write_idx;
> -i = (i + 1) % queue->max_jobs) {
> -  if (queue->jobs[i].job) {
> - util_queue_fence_signal(queue->jobs[i].fence);
> - queue->jobs[i].job = NULL;
> +   if (queue->num_threads == 0) {
> +  for (unsigned i = queue->read_idx; i != queue->write_idx;
> +   i = (i + 1) % queue->max_jobs) {
> + if (queue->jobs[i].job) {
> +util_queue_fence_signal(queue->jobs[i].fence);
> +queue->jobs[i].job = NULL;
> + }
>}
> +  queue->read_idx = queue->write_idx;
> +  queue->num_queued = 0;
> }
> -   queue->read_idx = queue->write_idx;
> -   queue->num_queued = 0;
> mtx_unlock(>lock);
> return 0;
>  }
>  
>  static bool
>  util_queue_create_thread(struct util_queue *queue, unsigned index)
>  {
> struct thread_input *input =
>(struct thread_input *) malloc(sizeof(struct thread_input));
> input->queue = queue;
> @@ -418,60 +422,69 @@ fail:
>cnd_destroy(>has_queued_cond);
>mtx_destroy(>lock);
>free(queue->jobs);
> }
> /* also util_queue_is_initialized can be used to check for success */
> memset(queue, 0, sizeof(*queue));
> return false;
>  }
>  
>  static void
> -util_queue_killall_and_wait(struct util_queue *queue)
> +util_queue_kill_threads(struct util_queue *queue, unsigned keep_num_threads)
>  {
> unsigned i;
>  
> /* Signal all threads to terminate. */
> +   mtx_lock(>finish_lock);
> +
> +   if (keep_num_threads >= queue->num_threads) {
> +  mtx_unlock(>finish_lock);
> +  return;
> +   }
> +
> mtx_lock(>lock);
> -   queue->kill_threads = 1;
> +   unsigned old_num_threads = queue->num_threads;
> +   queue->num_threads = keep_num_threads;

Shouldn't this still be set below, after the threads are joined?

> cnd_broadcast(>has_queued_cond);
> mtx_unlock(>lock);
>  
> -   for (i = 0; i < queue->num_threads; i++)
> 

Re: [Mesa-dev] NIR constant problem for GPU which doesn't have native integer support

2019-01-03 Thread Jason Ekstrand
On Thu, Jan 3, 2019 at 1:37 PM Roland Scheidegger 
wrote:

> Am 03.01.19 um 18:58 schrieb Jason Ekstrand:
> > On Thu, Jan 3, 2019 at 3:39 AM Erik Faye-Lund
> > mailto:erik.faye-l...@collabora.com>>
> wrote:
> >
> > On Wed, 2019-01-02 at 10:16 -0600, Jason Ekstrand wrote:
> > > On Wed, Jan 2, 2019 at 9:43 AM Ilia Mirkin  > >
> > > wrote:
> > > > Have a look at the first 4 patches in the series from Jonathan
> > > > Marek
> > > > to address some of these issues:
> > > >
> > > > https://patchwork.freedesktop.org/series/54295/
> > <
> https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwork.freedesktop.org%2Fseries%2F54295%2F=02%7C01%7Csroland%40vmware.com%7Cb5a1b18f854a4533274508d671a52647%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C636821351481601457=RPh1ZhwqvjBP8eSmE9D%2BErOVgCcmb4kobJVy%2BpJeyIc%3D=0
> >
> > > >
> > > > Not sure exactly what state that work is in, but I've added
> > > > Jonathan
> > > > to CC, perhaps he can provide an update.
> > > >
> > > > Cheers,
> > > >
> > > >   -ilia
> > > >
> > > > On Wed, Jan 2, 2019 at 6:28 AM Qiang Yu  > > wrote:
> > > > >
> > > > > Hi guys,
> > > > >
> > > > > I found the problem with this test fragment shader when lima
> > > > development:
> > > > > uniform int color;
> > > > > void main() {
> > > > > if (color > 1)
> > > > > gl_FragColor = vec4(1.0, 0.0, 0.0, 1);
> > > > > else
> > > > > gl_FragColor = vec4(0.0, 1.0, 0.0, 1);
> > > > > }
> > > > >
> > > > > nir_print_shader output:
> > > > > impl main {
> > > > > block block_0:
> > > > > /* preds: */
> > > > > vec1 32 ssa_0 = load_const (0x0001 /* 0.00 */)
> > > > > vec4 32 ssa_1 = load_const (0x3f80 /* 1.00 */,
> > > > > 0x /* 0.00 */, 0x /* 0.00 */,
> 0x3f80
> > > > /*
> > > > > 1.00 */)
> > > > > vec4 32 ssa_2 = load_const (0x /* 0.00 */,
> > > > > 0x3f80 /* 1.00 */, 0x /* 0.00 */,
> 0x3f80
> > > > /*
> > > > > 1.00 */)
> > > > > vec1 32 ssa_3 = load_const (0x /* 0.00 */)
> > > > > vec1 32 ssa_4 = intrinsic load_uniform (ssa_3) (0, 1,
> 0)
> > > > /*
> > > > > base=0 */ /* range=1 */ /* component=0 */   /* color */
> > > > > vec1 32 ssa_5 = slt ssa_0, ssa_4
> > > > > vec1 32 ssa_6 = fnot ssa_5
> > > > > vec4 32 ssa_7 = bcsel ssa_6., ssa_2, ssa_1
> > > > > intrinsic store_output (ssa_7, ssa_3) (0, 15, 0) /*
> > > > base=0 */
> > > > > /* wrmask=xyzw */ /* component=0 */   /* gl_FragColor */
> > > > > /* succs: block_1 */
> > > > > block block_1:
> > > > > }
> > > > >
> > > > > ssa0 is not converted to float when glsl to nir. I see
> > > > glsl_to_nir.cpp
> > > > > will create flt/ilt/ult
> > > > > based on source type for gpu support native integer, but for
> gpu
> > > > not
> > > > > support native
> > > > > integer, just create slt for all source type. And in
> > > > > nir_lower_constant_initializers,
> > > > > there's also no type conversion for integer constant.
> > >
> > > This is a generally sticky issue.  In NIR, we have no concept of
> > > types on SSA values which has proven perfectly reasonable and
> > > actually very powerful in a world where integers are supported
> > > natively.  Unfortunately, it causes significant problems for float-
> > > only architectures.
> >
> > I would like to take this chance to say that this untyped SSA-value
> > choice has lead to issues in both radeon_si (because LLVM values are
> > typed) and zink (similarly, because SPIR-V values are typed), where
> we
> > need to to bitcasts on every access because there's just not enough
> > information available to emit variables with the right type.
> >
> >
> > I'm not sure if I agree that the two problems are the same or not...
> > More on that in a bit.
> >
> >
> > It took us a lot of time to realize that the meta-data from the
> opcodes
> > doesn't *really* provide this, because the rest of nir doesn't treat
> > values consistently. In fact, this feels arguably more like buggy
> > behavior; why do we even have fmov when all of the time the compiler
> > will emit imovs for floating-point values...? Or why do we have
> bitcast
> >
> >
> > Why do we have different mov opcodes?  Because they have different
> > behavior in the presence of source/destination modifiers.  You likely
> > don't use those in radeon or Zink but we do use them on Intel.  That
> > being said, I've very seriously considered adding a generic nir_op_mov
> > which is entirely typeless and doesn't support modifiers at 

Re: [Mesa-dev] [PATCH 3/7] util/queue: move thread creation into a separate function

2019-01-03 Thread Ian Romanick
Looks to be a simple enough refactor.  This patch is

Reviewed-by: Ian Romanick 

On 11/28/18 6:59 PM, Marek Olšák wrote:
> From: Marek Olšák 
> 
> ---
>  src/util/u_queue.c | 56 ++
>  1 file changed, 32 insertions(+), 24 deletions(-)
> 
> diff --git a/src/util/u_queue.c b/src/util/u_queue.c
> index 3812c824b6d..48c5c79552d 100644
> --- a/src/util/u_queue.c
> +++ b/src/util/u_queue.c
> @@ -298,20 +298,51 @@ util_queue_thread_func(void *input)
>   util_queue_fence_signal(queue->jobs[i].fence);
>   queue->jobs[i].job = NULL;
>}
> }
> queue->read_idx = queue->write_idx;
> queue->num_queued = 0;
> mtx_unlock(>lock);
> return 0;
>  }
>  
> +static bool
> +util_queue_create_thread(struct util_queue *queue, unsigned index)
> +{
> +   struct thread_input *input =
> +  (struct thread_input *) malloc(sizeof(struct thread_input));
> +   input->queue = queue;
> +   input->thread_index = index;
> +
> +   queue->threads[index] = u_thread_create(util_queue_thread_func, input);
> +
> +   if (!queue->threads[index]) {
> +  free(input);
> +  return false;
> +   }
> +
> +   if (queue->flags & UTIL_QUEUE_INIT_USE_MINIMUM_PRIORITY) {
> +#if defined(__linux__) && defined(SCHED_IDLE)
> +  struct sched_param sched_param = {0};
> +
> +  /* The nice() function can only set a maximum of 19.
> +   * SCHED_IDLE is the same as nice = 20.
> +   *
> +   * Note that Linux only allows decreasing the priority. The original
> +   * priority can't be restored.
> +   */
> +  pthread_setschedparam(queue->threads[index], SCHED_IDLE, _param);
> +#endif
> +   }
> +   return true;
> +}
> +
>  bool
>  util_queue_init(struct util_queue *queue,
>  const char *name,
>  unsigned max_jobs,
>  unsigned num_threads,
>  unsigned flags)
>  {
> unsigned i;
>  
> /* Form the thread name from process_name and name, limited to 13
> @@ -357,53 +388,30 @@ util_queue_init(struct util_queue *queue,
> queue->num_queued = 0;
> cnd_init(>has_queued_cond);
> cnd_init(>has_space_cond);
>  
> queue->threads = (thrd_t*) calloc(num_threads, sizeof(thrd_t));
> if (!queue->threads)
>goto fail;
>  
> /* start threads */
> for (i = 0; i < num_threads; i++) {
> -  struct thread_input *input =
> - (struct thread_input *) malloc(sizeof(struct thread_input));
> -  input->queue = queue;
> -  input->thread_index = i;
> -
> -  queue->threads[i] = u_thread_create(util_queue_thread_func, input);
> -
> -  if (!queue->threads[i]) {
> - free(input);
> -
> +  if (!util_queue_create_thread(queue, i)) {
>   if (i == 0) {
>  /* no threads created, fail */
>  goto fail;
>   } else {
>  /* at least one thread created, so use it */
>  queue->num_threads = i;
>  break;
>   }
>}
> -
> -  if (flags & UTIL_QUEUE_INIT_USE_MINIMUM_PRIORITY) {
> -   #if defined(__linux__) && defined(SCHED_IDLE)
> - struct sched_param sched_param = {0};
> -
> - /* The nice() function can only set a maximum of 19.
> -  * SCHED_IDLE is the same as nice = 20.
> -  *
> -  * Note that Linux only allows decreasing the priority. The original
> -  * priority can't be restored.
> -  */
> - pthread_setschedparam(queue->threads[i], SCHED_IDLE, _param);
> -   #endif
> -  }
> }
>  
> add_to_atexit_list(queue);
> return true;
>  
>  fail:
> free(queue->threads);
>  
> if (queue->jobs) {
>cnd_destroy(>has_space_cond);
> 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/7] mesa: implement ARB/KHR_parallel_shader_compile

2019-01-03 Thread Ian Romanick
On 1/3/19 11:40 AM, Ian Romanick wrote:
> On 11/28/18 6:59 PM, Marek Olšák wrote:
>> From: Marek Olšák 
>>
>> Tested by piglit.
> 
> It doesn't look like there are any piglit test

Ignore that.  I started typing, checked the piglit list, then forgot to
delete it.

>> ---
>>  docs/features.txt   |  2 +-
>>  docs/relnotes/19.0.0.html   |  2 ++
>>  src/mapi/glapi/gen/gl_API.xml   | 15 ++-
>>  src/mesa/main/dd.h  |  7 +++
>>  src/mesa/main/extensions_table.h|  2 ++
>>  src/mesa/main/get_hash_params.py|  3 +++
>>  src/mesa/main/hint.c| 12 
>>  src/mesa/main/hint.h|  4 
>>  src/mesa/main/mtypes.h  |  1 +
>>  src/mesa/main/shaderapi.c   | 10 ++
>>  src/mesa/main/tests/dispatch_sanity.cpp |  4 
>>  11 files changed, 60 insertions(+), 2 deletions(-)
>>
>> diff --git a/docs/features.txt b/docs/features.txt
>> index 8999e42519c..7b827de6a92 100644
>> --- a/docs/features.txt
>> +++ b/docs/features.txt
>> @@ -295,21 +295,21 @@ GLES3.2, GLSL ES 3.2 -- all DONE: i965/gen9+, 
>> radeonsi, virgl
>>GL_OES_texture_storage_multisample_2d_array   DONE (all drivers 
>> that support GL_ARB_texture_multisample)
>>  
>>  Khronos, ARB, and OES extensions that are not part of any OpenGL or OpenGL 
>> ES version:
>>  
>>GL_ARB_bindless_texture   DONE (nvc0, 
>> radeonsi)
>>GL_ARB_cl_event   not started
>>GL_ARB_compute_variable_group_sizeDONE (nvc0, 
>> radeonsi)
>>GL_ARB_ES3_2_compatibilityDONE (i965/gen8+, 
>> radeonsi, virgl)
>>GL_ARB_fragment_shader_interlock  DONE (i965)
>>GL_ARB_gpu_shader_int64   DONE (i965/gen8+, 
>> nvc0, radeonsi, softpipe, llvmpipe)
>> -  GL_ARB_parallel_shader_compilenot started, but 
>> Chia-I Wu did some related work in 2014
>> +  GL_ARB_parallel_shader_compileDONE (all drivers)
>>GL_ARB_post_depth_coverageDONE (i965, nvc0)
>>GL_ARB_robustness_isolation   not started
>>GL_ARB_sample_locations   DONE (nvc0)
>>GL_ARB_seamless_cubemap_per_texture   DONE (freedreno, 
>> i965, nvc0, radeonsi, r600, softpipe, swr, virgl)
>>GL_ARB_shader_ballot  DONE (i965/gen8+, 
>> nvc0, radeonsi)
>>GL_ARB_shader_clock   DONE (i965/gen7+, 
>> nv50, nvc0, r600, radeonsi, virgl)
>>GL_ARB_shader_stencil_export  DONE (i965/gen9+, 
>> r600, radeonsi, softpipe, llvmpipe, swr, virgl)
>>GL_ARB_shader_viewport_layer_arrayDONE (i965/gen6+, 
>> nvc0, radeonsi)
>>GL_ARB_sparse_buffer  DONE (radeonsi/CIK+)
>>GL_ARB_sparse_texture not started
>> diff --git a/docs/relnotes/19.0.0.html b/docs/relnotes/19.0.0.html
>> index bc1776e8f4e..540482bca5f 100644
>> --- a/docs/relnotes/19.0.0.html
>> +++ b/docs/relnotes/19.0.0.html
>> @@ -33,24 +33,26 @@ Compatibility contexts may report a lower version 
>> depending on each driver.
>>  SHA256 checksums
>>  
>>  TBD.
>>  
>>  
>>  
>>  New features
>>  
>>  
>>  GL_AMD_texture_texture4 on all GL 4.0 drivers.
>> +GL_ARB_parallel_shader_compile on all drivers.
>>  GL_EXT_shader_implicit_conversions on all drivers (ES extension).
>>  GL_EXT_texture_compression_bptc on all GL 4.0 drivers (ES 
>> extension).
>>  GL_EXT_texture_compression_rgtc on all GL 3.0 drivers (ES 
>> extension).
>>  GL_EXT_texture_view on drivers supporting texture views (ES 
>> extension).
>> +GL_KHR_parallel_shader_compile on all drivers.
>>  GL_OES_texture_view on drivers supporting texture views (ES 
>> extension).
>>  
>>  
>>  Bug fixes
>>  
>>  
>>  TBD
>>  
>>  
>>  Changes
>> diff --git a/src/mapi/glapi/gen/gl_API.xml b/src/mapi/glapi/gen/gl_API.xml
>> index f4d0808f13b..4ce691b361b 100644
>> --- a/src/mapi/glapi/gen/gl_API.xml
>> +++ b/src/mapi/glapi/gen/gl_API.xml
>> @@ -8402,21 +8402,34 @@
>>  
>>  
>>  
>>  
>>  
>>  
>>  
>>  
>>  > xmlns:xi="http://www.w3.org/2001/XInclude"/>
>>  
>> -
>> +
>> +
>> +
>> +
>> +
>> +
>> +
>> +
>> +> alias="MaxShaderCompilerThreadsKHR">
>> +
>> +
>> +
>> +
>> +
>>  
>>  > xmlns:xi="http://www.w3.org/2001/XInclude"/>
>>  
>>  
>>  
>>  
>>  
>>  
>>  
>>  
>> diff --git a/src/mesa/main/dd.h b/src/mesa/main/dd.h
>> index f14c3e04e91..92b6ecac33c 100644
>> --- a/src/mesa/main/dd.h
>> +++ b/src/mesa/main/dd.h
>> @@ -1292,20 +1292,27 @@ struct dd_function_table {
>> /**
>>  * Called to initialize gl_program::driver_cache_blob (and size) with a
>>  * ralloc 

Re: [Mesa-dev] [PATCH 1/7] mesa: implement ARB/KHR_parallel_shader_compile

2019-01-03 Thread Ian Romanick
On 11/28/18 6:59 PM, Marek Olšák wrote:
> From: Marek Olšák 
> 
> Tested by piglit.

It doesn't look like there are any piglit test

> ---
>  docs/features.txt   |  2 +-
>  docs/relnotes/19.0.0.html   |  2 ++
>  src/mapi/glapi/gen/gl_API.xml   | 15 ++-
>  src/mesa/main/dd.h  |  7 +++
>  src/mesa/main/extensions_table.h|  2 ++
>  src/mesa/main/get_hash_params.py|  3 +++
>  src/mesa/main/hint.c| 12 
>  src/mesa/main/hint.h|  4 
>  src/mesa/main/mtypes.h  |  1 +
>  src/mesa/main/shaderapi.c   | 10 ++
>  src/mesa/main/tests/dispatch_sanity.cpp |  4 
>  11 files changed, 60 insertions(+), 2 deletions(-)
> 
> diff --git a/docs/features.txt b/docs/features.txt
> index 8999e42519c..7b827de6a92 100644
> --- a/docs/features.txt
> +++ b/docs/features.txt
> @@ -295,21 +295,21 @@ GLES3.2, GLSL ES 3.2 -- all DONE: i965/gen9+, radeonsi, 
> virgl
>GL_OES_texture_storage_multisample_2d_array   DONE (all drivers 
> that support GL_ARB_texture_multisample)
>  
>  Khronos, ARB, and OES extensions that are not part of any OpenGL or OpenGL 
> ES version:
>  
>GL_ARB_bindless_texture   DONE (nvc0, radeonsi)
>GL_ARB_cl_event   not started
>GL_ARB_compute_variable_group_sizeDONE (nvc0, radeonsi)
>GL_ARB_ES3_2_compatibilityDONE (i965/gen8+, 
> radeonsi, virgl)
>GL_ARB_fragment_shader_interlock  DONE (i965)
>GL_ARB_gpu_shader_int64   DONE (i965/gen8+, 
> nvc0, radeonsi, softpipe, llvmpipe)
> -  GL_ARB_parallel_shader_compilenot started, but 
> Chia-I Wu did some related work in 2014
> +  GL_ARB_parallel_shader_compileDONE (all drivers)
>GL_ARB_post_depth_coverageDONE (i965, nvc0)
>GL_ARB_robustness_isolation   not started
>GL_ARB_sample_locations   DONE (nvc0)
>GL_ARB_seamless_cubemap_per_texture   DONE (freedreno, 
> i965, nvc0, radeonsi, r600, softpipe, swr, virgl)
>GL_ARB_shader_ballot  DONE (i965/gen8+, 
> nvc0, radeonsi)
>GL_ARB_shader_clock   DONE (i965/gen7+, 
> nv50, nvc0, r600, radeonsi, virgl)
>GL_ARB_shader_stencil_export  DONE (i965/gen9+, 
> r600, radeonsi, softpipe, llvmpipe, swr, virgl)
>GL_ARB_shader_viewport_layer_arrayDONE (i965/gen6+, 
> nvc0, radeonsi)
>GL_ARB_sparse_buffer  DONE (radeonsi/CIK+)
>GL_ARB_sparse_texture not started
> diff --git a/docs/relnotes/19.0.0.html b/docs/relnotes/19.0.0.html
> index bc1776e8f4e..540482bca5f 100644
> --- a/docs/relnotes/19.0.0.html
> +++ b/docs/relnotes/19.0.0.html
> @@ -33,24 +33,26 @@ Compatibility contexts may report a lower version 
> depending on each driver.
>  SHA256 checksums
>  
>  TBD.
>  
>  
>  
>  New features
>  
>  
>  GL_AMD_texture_texture4 on all GL 4.0 drivers.
> +GL_ARB_parallel_shader_compile on all drivers.
>  GL_EXT_shader_implicit_conversions on all drivers (ES extension).
>  GL_EXT_texture_compression_bptc on all GL 4.0 drivers (ES extension).
>  GL_EXT_texture_compression_rgtc on all GL 3.0 drivers (ES extension).
>  GL_EXT_texture_view on drivers supporting texture views (ES 
> extension).
> +GL_KHR_parallel_shader_compile on all drivers.
>  GL_OES_texture_view on drivers supporting texture views (ES 
> extension).
>  
>  
>  Bug fixes
>  
>  
>  TBD
>  
>  
>  Changes
> diff --git a/src/mapi/glapi/gen/gl_API.xml b/src/mapi/glapi/gen/gl_API.xml
> index f4d0808f13b..4ce691b361b 100644
> --- a/src/mapi/glapi/gen/gl_API.xml
> +++ b/src/mapi/glapi/gen/gl_API.xml
> @@ -8402,21 +8402,34 @@
>  
>  
>  
>  
>  
>  
>  
>  
>   xmlns:xi="http://www.w3.org/2001/XInclude"/>
>  
> -
> +
> +
> +
> +
> +
> +
> +
> +
> + alias="MaxShaderCompilerThreadsKHR">
> +
> +
> +
> +
> +
>  
>   xmlns:xi="http://www.w3.org/2001/XInclude"/>
>  
>  
>  
>  
>  
>  
>  
>  
> diff --git a/src/mesa/main/dd.h b/src/mesa/main/dd.h
> index f14c3e04e91..92b6ecac33c 100644
> --- a/src/mesa/main/dd.h
> +++ b/src/mesa/main/dd.h
> @@ -1292,20 +1292,27 @@ struct dd_function_table {
> /**
>  * Called to initialize gl_program::driver_cache_blob (and size) with a
>  * ralloc allocated buffer.
>  *
>  * This buffer will be saved and restored as part of the gl_program
>  * serialization and deserialization.
>  */
> void (*ShaderCacheSerializeDriverBlob)(struct gl_context *ctx,
>struct gl_program 

Re: [Mesa-dev] NIR constant problem for GPU which doesn't have native integer support

2019-01-03 Thread Roland Scheidegger
Am 03.01.19 um 18:58 schrieb Jason Ekstrand:
> On Thu, Jan 3, 2019 at 3:39 AM Erik Faye-Lund
> mailto:erik.faye-l...@collabora.com>> wrote:
> 
> On Wed, 2019-01-02 at 10:16 -0600, Jason Ekstrand wrote:
> > On Wed, Jan 2, 2019 at 9:43 AM Ilia Mirkin  >
> > wrote:
> > > Have a look at the first 4 patches in the series from Jonathan
> > > Marek
> > > to address some of these issues:
> > >
> > > https://patchwork.freedesktop.org/series/54295/
> 
> 
> > >
> > > Not sure exactly what state that work is in, but I've added
> > > Jonathan
> > > to CC, perhaps he can provide an update.
> > >
> > > Cheers,
> > >
> > >   -ilia
> > >
> > > On Wed, Jan 2, 2019 at 6:28 AM Qiang Yu  > wrote:
> > > >
> > > > Hi guys,
> > > >
> > > > I found the problem with this test fragment shader when lima
> > > development:
> > > > uniform int color;
> > > > void main() {
> > > >     if (color > 1)
> > > >         gl_FragColor = vec4(1.0, 0.0, 0.0, 1);
> > > >     else
> > > >         gl_FragColor = vec4(0.0, 1.0, 0.0, 1);
> > > > }
> > > >
> > > > nir_print_shader output:
> > > > impl main {
> > > >         block block_0:
> > > >         /* preds: */
> > > >         vec1 32 ssa_0 = load_const (0x0001 /* 0.00 */)
> > > >         vec4 32 ssa_1 = load_const (0x3f80 /* 1.00 */,
> > > > 0x /* 0.00 */, 0x /* 0.00 */, 0x3f80
> > > /*
> > > > 1.00 */)
> > > >         vec4 32 ssa_2 = load_const (0x /* 0.00 */,
> > > > 0x3f80 /* 1.00 */, 0x /* 0.00 */, 0x3f80
> > > /*
> > > > 1.00 */)
> > > >         vec1 32 ssa_3 = load_const (0x /* 0.00 */)
> > > >         vec1 32 ssa_4 = intrinsic load_uniform (ssa_3) (0, 1, 0)
> > > /*
> > > > base=0 */ /* range=1 */ /* component=0 */   /* color */
> > > >         vec1 32 ssa_5 = slt ssa_0, ssa_4
> > > >         vec1 32 ssa_6 = fnot ssa_5
> > > >         vec4 32 ssa_7 = bcsel ssa_6., ssa_2, ssa_1
> > > >         intrinsic store_output (ssa_7, ssa_3) (0, 15, 0) /*
> > > base=0 */
> > > > /* wrmask=xyzw */ /* component=0 */       /* gl_FragColor */
> > > >         /* succs: block_1 */
> > > >         block block_1:
> > > > }
> > > >
> > > > ssa0 is not converted to float when glsl to nir. I see
> > > glsl_to_nir.cpp
> > > > will create flt/ilt/ult
> > > > based on source type for gpu support native integer, but for gpu
> > > not
> > > > support native
> > > > integer, just create slt for all source type. And in
> > > > nir_lower_constant_initializers,
> > > > there's also no type conversion for integer constant.
> >
> > This is a generally sticky issue.  In NIR, we have no concept of
> > types on SSA values which has proven perfectly reasonable and
> > actually very powerful in a world where integers are supported
> > natively.  Unfortunately, it causes significant problems for float-
> > only architectures.
> 
> I would like to take this chance to say that this untyped SSA-value
> choice has lead to issues in both radeon_si (because LLVM values are
> typed) and zink (similarly, because SPIR-V values are typed), where we
> need to to bitcasts on every access because there's just not enough
> information available to emit variables with the right type.
> 
> 
> I'm not sure if I agree that the two problems are the same or not... 
> More on that in a bit.
>  
> 
> It took us a lot of time to realize that the meta-data from the opcodes
> doesn't *really* provide this, because the rest of nir doesn't treat
> values consistently. In fact, this feels arguably more like buggy
> behavior; why do we even have fmov when all of the time the compiler
> will emit imovs for floating-point values...? Or why do we have bitcast
> 
> 
> Why do we have different mov opcodes?  Because they have different
> behavior in the presence of source/destination modifiers.  You likely
> don't use those in radeon or Zink but we do use them on Intel.  That
> being said, I've very seriously considered adding a generic nir_op_mov
> which is entirely typeless and doesn't support modifiers at all and make
> most of core NIR use that.  For that matter, there's no real reason why
> we need fmov with modifiers at all when we could we could easily replace
> "ssa1 = fmov.sat |x|" with "ssa1 = fsat |x|" or "ssa1 = fabs.sat x".  If
> we had a generic nir_op_mov to deal 

Re: [Mesa-dev] [PATCH mesa 1/2] docs: fix the meson aarch64 cross-file

2019-01-03 Thread Dylan Baker
Quoting Eric Engestrom (2019-01-03 08:37:03)
> `gcc-ar` is preferred over the generic `ar`, and the `arm` family is
> for 32-bit ARM [1].
> 
> [1] https://mesonbuild.com/Reference-tables.html#cpu-families
> 
> Signed-off-by: Eric Engestrom 
> ---
>  docs/meson.html | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/docs/meson.html b/docs/meson.html
> index e7ae061e6096fe12a8b7..eff13aa1c515825eb564 100644
> --- a/docs/meson.html
> +++ b/docs/meson.html
> @@ -286,14 +286,14 @@ 2. Cross-compilation and 
> 32-bit builds
>  [binaries]
>  c = '/usr/bin/aarch64-linux-gnu-gcc'
>  cpp = '/usr/bin/aarch64-linux-gnu-g++'
> -ar = '/usr/bin/aarch64-linux-gnu-ar'
> +ar = '/usr/bin/aarch64-linux-gnu-gcc-ar'
>  strip = '/usr/bin/aarch64-linux-gnu-strip'
>  pkgconfig = '/usr/bin/aarch64-linux-gnu-pkg-config'
>  exe_wrapper = '/usr/bin/qemu-aarch64-static'
>  
>  [host_machine]
>  system = 'linux'
> -cpu_family = 'arm'
> +cpu_family = 'aarch64'
>  cpu = 'aarch64'
>  endian = 'little'
>  
> -- 
> Cheers,
>   Eric
> 

for the series:
Reviewed-by: Dylan Baker 


signature.asc
Description: signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] NIR constant problem for GPU which doesn't have native integer support

2019-01-03 Thread Jason Ekstrand
On Thu, Jan 3, 2019 at 3:39 AM Erik Faye-Lund 
wrote:

> On Wed, 2019-01-02 at 10:16 -0600, Jason Ekstrand wrote:
> > On Wed, Jan 2, 2019 at 9:43 AM Ilia Mirkin 
> > wrote:
> > > Have a look at the first 4 patches in the series from Jonathan
> > > Marek
> > > to address some of these issues:
> > >
> > > https://patchwork.freedesktop.org/series/54295/
> > >
> > > Not sure exactly what state that work is in, but I've added
> > > Jonathan
> > > to CC, perhaps he can provide an update.
> > >
> > > Cheers,
> > >
> > >   -ilia
> > >
> > > On Wed, Jan 2, 2019 at 6:28 AM Qiang Yu  wrote:
> > > >
> > > > Hi guys,
> > > >
> > > > I found the problem with this test fragment shader when lima
> > > development:
> > > > uniform int color;
> > > > void main() {
> > > > if (color > 1)
> > > > gl_FragColor = vec4(1.0, 0.0, 0.0, 1);
> > > > else
> > > > gl_FragColor = vec4(0.0, 1.0, 0.0, 1);
> > > > }
> > > >
> > > > nir_print_shader output:
> > > > impl main {
> > > > block block_0:
> > > > /* preds: */
> > > > vec1 32 ssa_0 = load_const (0x0001 /* 0.00 */)
> > > > vec4 32 ssa_1 = load_const (0x3f80 /* 1.00 */,
> > > > 0x /* 0.00 */, 0x /* 0.00 */, 0x3f80
> > > /*
> > > > 1.00 */)
> > > > vec4 32 ssa_2 = load_const (0x /* 0.00 */,
> > > > 0x3f80 /* 1.00 */, 0x /* 0.00 */, 0x3f80
> > > /*
> > > > 1.00 */)
> > > > vec1 32 ssa_3 = load_const (0x /* 0.00 */)
> > > > vec1 32 ssa_4 = intrinsic load_uniform (ssa_3) (0, 1, 0)
> > > /*
> > > > base=0 */ /* range=1 */ /* component=0 */   /* color */
> > > > vec1 32 ssa_5 = slt ssa_0, ssa_4
> > > > vec1 32 ssa_6 = fnot ssa_5
> > > > vec4 32 ssa_7 = bcsel ssa_6., ssa_2, ssa_1
> > > > intrinsic store_output (ssa_7, ssa_3) (0, 15, 0) /*
> > > base=0 */
> > > > /* wrmask=xyzw */ /* component=0 */   /* gl_FragColor */
> > > > /* succs: block_1 */
> > > > block block_1:
> > > > }
> > > >
> > > > ssa0 is not converted to float when glsl to nir. I see
> > > glsl_to_nir.cpp
> > > > will create flt/ilt/ult
> > > > based on source type for gpu support native integer, but for gpu
> > > not
> > > > support native
> > > > integer, just create slt for all source type. And in
> > > > nir_lower_constant_initializers,
> > > > there's also no type conversion for integer constant.
> >
> > This is a generally sticky issue.  In NIR, we have no concept of
> > types on SSA values which has proven perfectly reasonable and
> > actually very powerful in a world where integers are supported
> > natively.  Unfortunately, it causes significant problems for float-
> > only architectures.
>
> I would like to take this chance to say that this untyped SSA-value
> choice has lead to issues in both radeon_si (because LLVM values are
> typed) and zink (similarly, because SPIR-V values are typed), where we
> need to to bitcasts on every access because there's just not enough
> information available to emit variables with the right type.
>

I'm not sure if I agree that the two problems are the same or not...  More
on that in a bit.


> It took us a lot of time to realize that the meta-data from the opcodes
> doesn't *really* provide this, because the rest of nir doesn't treat
> values consistently. In fact, this feels arguably more like buggy
> behavior; why do we even have fmov when all of the time the compiler
> will emit imovs for floating-point values...? Or why do we have bitcast
>

Why do we have different mov opcodes?  Because they have different behavior
in the presence of source/destination modifiers.  You likely don't use
those in radeon or Zink but we do use them on Intel.  That being said, I've
very seriously considered adding a generic nir_op_mov which is entirely
typeless and doesn't support modifiers at all and make most of core NIR use
that.  For that matter, there's no real reason why we need fmov with
modifiers at all when we could we could easily replace "ssa1 = fmov.sat
|x|" with "ssa1 = fsat |x|" or "ssa1 = fabs.sat x".  If we had a generic
nir_op_mov to deal with swizzling etc., the only thing having i/fmov would
really accomplish is lowering the number of different ways you can write
"fmov.sat |x|".  The only reason I haven't added this nir_op_mov opcode is
because it's a pile of work and anyone who can't do integers can just
implement imov as fmov and it's not a big deal.


> I would really love it if we could at least consider making this "we
> can just treat floats as ints without bitcasts if we feel like it"-
> design optional for the backend somehow.
>
> I guess the assumption is that bitcasts are for free? They aren't once
> you have to emit them and have a back-end remove a bunch of redundant
> ones... We should already have all the information to trivially place
> casts for backends that care, yet we currently make it hard (unless
> your HW/backend happens to 

[Mesa-dev] MR: ARB_gl_spirv xfb improvement

2019-01-03 Thread apinheiro
https://gitlab.freedesktop.org/mesa/mesa/merge_requests/63

This series started as just trying to use the general NIR xfb gathering
pass, added for Vulkan consumption, and ended fixing some corner cases
on such gathering, and on the ARB_gl_spirv support for xfb itself. All
that can be tested on piglit here:
https://github.com/Igalia/piglit/tree/apinheiro/xfb

For now a personal branch. It is mostly done, but it is pending the
double checking before sending to review.

As part of this series, we also change how input arrays of blocks are
splitted, in order to support interpolators qualifiers for such case.



pEpkey.asc
Description: application/pgp-keys
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] MR: i965: fix VF cache issue workaround

2019-01-03 Thread Lionel Landwerlin

https://gitlab.freedesktop.org/mesa/mesa/merge_requests/62

A small series to fix an issue observed in Blender : 
https://bugs.freedesktop.org/show_bug.cgi?id=109072


Cheers,

-
Lionel
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH mesa 2/2] docs: advertise distro-provided meson cross-files

2019-01-03 Thread Eric Engestrom
Hopefully we can kick start the revolution and other distros will start
providing them as well :)

Signed-off-by: Eric Engestrom 
---
 docs/meson.html | 9 +
 1 file changed, 9 insertions(+)

diff --git a/docs/meson.html b/docs/meson.html
index eff13aa1c515825eb564..082333b19d3673af1340 100644
--- a/docs/meson.html
+++ b/docs/meson.html
@@ -255,6 +255,15 @@ 2. Cross-compilation and 32-bit 
builds
 Below are a few example of cross files, but keep in mind that you
 will likely have to alter them for your system.
 
+
+Those running on ArchLinux can use the AUR-maintained packages for some
+of those, as they'll have the right values for your system:
+
+  https://aur.archlinux.org/packages/meson-cross-x86-linux-gnu;>meson-cross-x86-linux-gnu
+  https://aur.archlinux.org/packages/meson-cross-aarch64-linux-gnu;>meson-cross-aarch64-linux-gnu
+
+
+
 
 32-bit build on x86 linux:
 
-- 
Cheers,
  Eric

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH mesa 1/2] docs: fix the meson aarch64 cross-file

2019-01-03 Thread Eric Engestrom
`gcc-ar` is preferred over the generic `ar`, and the `arm` family is
for 32-bit ARM [1].

[1] https://mesonbuild.com/Reference-tables.html#cpu-families

Signed-off-by: Eric Engestrom 
---
 docs/meson.html | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/meson.html b/docs/meson.html
index e7ae061e6096fe12a8b7..eff13aa1c515825eb564 100644
--- a/docs/meson.html
+++ b/docs/meson.html
@@ -286,14 +286,14 @@ 2. Cross-compilation and 
32-bit builds
 [binaries]
 c = '/usr/bin/aarch64-linux-gnu-gcc'
 cpp = '/usr/bin/aarch64-linux-gnu-g++'
-ar = '/usr/bin/aarch64-linux-gnu-ar'
+ar = '/usr/bin/aarch64-linux-gnu-gcc-ar'
 strip = '/usr/bin/aarch64-linux-gnu-strip'
 pkgconfig = '/usr/bin/aarch64-linux-gnu-pkg-config'
 exe_wrapper = '/usr/bin/qemu-aarch64-static'
 
 [host_machine]
 system = 'linux'
-cpu_family = 'arm'
+cpu_family = 'aarch64'
 cpu = 'aarch64'
 endian = 'little'
 
-- 
Cheers,
  Eric

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 108900] Non-recoverable GPU hangs with GfxBench v5 Aztec Ruins Vulkan test

2019-01-03 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=108900

Eero Tamminen  changed:

   What|Removed |Added

   See Also||https://bugs.freedesktop.or
   ||g/show_bug.cgi?id=108898

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 109058] Machine freezes during early stages of gfxbench startup

2019-01-03 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=109058

--- Comment #3 from Eero Tamminen  ---
(In reply to cstout from comment #0)
> Is this hardware known to work?
> Is there a recommended test app to validate that the driver works on my 
> system?
> I'm trying to run gfxbench 5.0, the vulkan_5_normal scenario.

All the Linux 3D benchmarks we tried (before Mesa AMD driver stopped building
with LLVM v6) worked OK on HadesCanyon. GfxBench v5 (and especially its
AztecRuins test) was the only exception, but there's no public GfxBench v5
release for Linux yet...

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] virgl: use primconvert provoking vertex properly

2019-01-03 Thread Gert Wollny
Am Donnerstag, den 03.01.2019, 14:28 +0100 schrieb Gert Wollny:
> Tested it, looks fine, 
> Reviewed-By: Gert Wollny 
Sorry, that should be 


> 
> Am Freitag, den 28.12.2018, 16:54 +1000 schrieb Dave Airlie:
> > From: Dave Airlie 
> > 
> > This stores the raster state and calls the correct primconvert
> > interface
> > using the currently bound raster state.
> > 
> > This fixes piglit glsl-fs-flat-color
> > ---
> >  src/gallium/drivers/virgl/virgl_context.c | 26 ---
> > --
> > --
> >  src/gallium/drivers/virgl/virgl_context.h |  6 ++
> >  2 files changed, 24 insertions(+), 8 deletions(-)
> > 
> > diff --git a/src/gallium/drivers/virgl/virgl_context.c
> > b/src/gallium/drivers/virgl/virgl_context.c
> > index f095920489f..6ed4e2f8394 100644
> > --- a/src/gallium/drivers/virgl/virgl_context.c
> > +++ b/src/gallium/drivers/virgl/virgl_context.c
> > @@ -322,19 +322,27 @@ static void
> > *virgl_create_rasterizer_state(struct pipe_context *ctx,
> > const struct
> > pipe_rasterizer_state *rs_state)
> >  {
> > struct virgl_context *vctx = virgl_context(ctx);
> > -   uint32_t handle;
> > -   handle = virgl_object_assign_handle();
> > +   struct virgl_rasterizer_state *vrs =
> > CALLOC_STRUCT(virgl_rasterizer_state);
> >  
> > -   virgl_encode_rasterizer_state(vctx, handle, rs_state);
> > -   return (void *)(unsigned long)handle;
> > +   if (!vrs)
> > +  return NULL;
> > +   vrs->rs = *rs_state;
> > +   vrs->handle = virgl_object_assign_handle();
> > +
> > +   virgl_encode_rasterizer_state(vctx, vrs->handle, rs_state);
> > +   return (void *)vrs;
> >  }
> >  
> >  static void virgl_bind_rasterizer_state(struct pipe_context *ctx,
> >  void *rs_state)
> >  {
> > struct virgl_context *vctx = virgl_context(ctx);
> > -   uint32_t handle = (unsigned long)rs_state;
> > -
> > +   uint32_t handle = 0;
> > +   if (rs_state) {
> > +  struct virgl_rasterizer_state *vrs = rs_state;
> > +  vctx->rs_state = *vrs;
> > +  handle = vrs->handle;
> > +   }
> > virgl_encode_bind_object(vctx, handle,
> > VIRGL_OBJECT_RASTERIZER);
> >  }
> >  
> > @@ -342,8 +350,9 @@ static void
> > virgl_delete_rasterizer_state(struct
> > pipe_context *ctx,
> >   void *rs_state)
> >  {
> > struct virgl_context *vctx = virgl_context(ctx);
> > -   uint32_t handle = (unsigned long)rs_state;
> > -   virgl_encode_delete_object(vctx, handle,
> > VIRGL_OBJECT_RASTERIZER);
> > +   struct virgl_rasterizer_state *vrs = rs_state;
> > +   virgl_encode_delete_object(vctx, vrs->handle,
> > VIRGL_OBJECT_RASTERIZER);
> > +   FREE(vrs);
> >  }
> >  
> >  static void virgl_set_framebuffer_state(struct pipe_context *ctx,
> > @@ -695,6 +704,7 @@ static void virgl_draw_vbo(struct pipe_context
> > *ctx,
> >return;
> >  
> > if (!(rs->caps.caps.v1.prim_mask & (1 << dinfo->mode))) {
> > +  util_primconvert_save_rasterizer_state(vctx->primconvert,
> > >rs_state.rs);
> >util_primconvert_draw_vbo(vctx->primconvert, dinfo);
> >return;
> > }
> > diff --git a/src/gallium/drivers/virgl/virgl_context.h
> > b/src/gallium/drivers/virgl/virgl_context.h
> > index 0f51d730985..79a1a73e615 100644
> > --- a/src/gallium/drivers/virgl/virgl_context.h
> > +++ b/src/gallium/drivers/virgl/virgl_context.h
> > @@ -49,6 +49,11 @@ struct virgl_textures_info {
> > uint32_t enabled_mask;
> >  };
> >  
> > +struct virgl_rasterizer_state {
> > +   struct pipe_rasterizer_state rs;
> > +   uint32_t handle;
> > +};
> > +
> >  struct virgl_context {
> > struct pipe_context base;
> > struct virgl_cmd_buf *cbuf;
> > @@ -66,6 +71,7 @@ struct virgl_context {
> > unsigned num_vertex_buffers;
> > boolean vertex_array_dirty;
> >  
> > +   struct virgl_rasterizer_state rs_state;
> > struct virgl_so_target so_targets[PIPE_MAX_SO_BUFFERS];
> > unsigned num_so_targets;
> >  
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] virgl: use primconvert provoking vertex properly

2019-01-03 Thread Gert Wollny
Tested it, looks fine, 
Reviewed-By: Gert Wollny 

Am Freitag, den 28.12.2018, 16:54 +1000 schrieb Dave Airlie:
> From: Dave Airlie 
> 
> This stores the raster state and calls the correct primconvert
> interface
> using the currently bound raster state.
> 
> This fixes piglit glsl-fs-flat-color
> ---
>  src/gallium/drivers/virgl/virgl_context.c | 26 -
> --
>  src/gallium/drivers/virgl/virgl_context.h |  6 ++
>  2 files changed, 24 insertions(+), 8 deletions(-)
> 
> diff --git a/src/gallium/drivers/virgl/virgl_context.c
> b/src/gallium/drivers/virgl/virgl_context.c
> index f095920489f..6ed4e2f8394 100644
> --- a/src/gallium/drivers/virgl/virgl_context.c
> +++ b/src/gallium/drivers/virgl/virgl_context.c
> @@ -322,19 +322,27 @@ static void
> *virgl_create_rasterizer_state(struct pipe_context *ctx,
> const struct
> pipe_rasterizer_state *rs_state)
>  {
> struct virgl_context *vctx = virgl_context(ctx);
> -   uint32_t handle;
> -   handle = virgl_object_assign_handle();
> +   struct virgl_rasterizer_state *vrs =
> CALLOC_STRUCT(virgl_rasterizer_state);
>  
> -   virgl_encode_rasterizer_state(vctx, handle, rs_state);
> -   return (void *)(unsigned long)handle;
> +   if (!vrs)
> +  return NULL;
> +   vrs->rs = *rs_state;
> +   vrs->handle = virgl_object_assign_handle();
> +
> +   virgl_encode_rasterizer_state(vctx, vrs->handle, rs_state);
> +   return (void *)vrs;
>  }
>  
>  static void virgl_bind_rasterizer_state(struct pipe_context *ctx,
>  void *rs_state)
>  {
> struct virgl_context *vctx = virgl_context(ctx);
> -   uint32_t handle = (unsigned long)rs_state;
> -
> +   uint32_t handle = 0;
> +   if (rs_state) {
> +  struct virgl_rasterizer_state *vrs = rs_state;
> +  vctx->rs_state = *vrs;
> +  handle = vrs->handle;
> +   }
> virgl_encode_bind_object(vctx, handle, VIRGL_OBJECT_RASTERIZER);
>  }
>  
> @@ -342,8 +350,9 @@ static void virgl_delete_rasterizer_state(struct
> pipe_context *ctx,
>   void *rs_state)
>  {
> struct virgl_context *vctx = virgl_context(ctx);
> -   uint32_t handle = (unsigned long)rs_state;
> -   virgl_encode_delete_object(vctx, handle,
> VIRGL_OBJECT_RASTERIZER);
> +   struct virgl_rasterizer_state *vrs = rs_state;
> +   virgl_encode_delete_object(vctx, vrs->handle,
> VIRGL_OBJECT_RASTERIZER);
> +   FREE(vrs);
>  }
>  
>  static void virgl_set_framebuffer_state(struct pipe_context *ctx,
> @@ -695,6 +704,7 @@ static void virgl_draw_vbo(struct pipe_context
> *ctx,
>return;
>  
> if (!(rs->caps.caps.v1.prim_mask & (1 << dinfo->mode))) {
> +  util_primconvert_save_rasterizer_state(vctx->primconvert,
> >rs_state.rs);
>util_primconvert_draw_vbo(vctx->primconvert, dinfo);
>return;
> }
> diff --git a/src/gallium/drivers/virgl/virgl_context.h
> b/src/gallium/drivers/virgl/virgl_context.h
> index 0f51d730985..79a1a73e615 100644
> --- a/src/gallium/drivers/virgl/virgl_context.h
> +++ b/src/gallium/drivers/virgl/virgl_context.h
> @@ -49,6 +49,11 @@ struct virgl_textures_info {
> uint32_t enabled_mask;
>  };
>  
> +struct virgl_rasterizer_state {
> +   struct pipe_rasterizer_state rs;
> +   uint32_t handle;
> +};
> +
>  struct virgl_context {
> struct pipe_context base;
> struct virgl_cmd_buf *cbuf;
> @@ -66,6 +71,7 @@ struct virgl_context {
> unsigned num_vertex_buffers;
> boolean vertex_array_dirty;
>  
> +   struct virgl_rasterizer_state rs_state;
> struct virgl_so_target so_targets[PIPE_MAX_SO_BUFFERS];
> unsigned num_so_targets;
>  
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] st/va: Return correct status from vlVaQuerySurfaceStatus

2019-01-03 Thread Das, Indrajit-kumar
From: Indrajit Das 
Date: Thu, 3 Jan 2019 14:36:33 +0530
Subject: [PATCH] st/va: Return correct status from vlVaQuerySurfaceStatus

Signed-off-by: Indrajit Das 
---
 src/gallium/state_trackers/va/surface.c | 31 +
 1 file changed, 31 insertions(+)

diff --git a/src/gallium/state_trackers/va/surface.c 
b/src/gallium/state_trackers/va/surface.c
index cc26efe..e7ed64b 100644
--- a/src/gallium/state_trackers/va/surface.c
+++ b/src/gallium/state_trackers/va/surface.c
@@ -146,9 +146,40 @@ vlVaSyncSurface(VADriverContextP ctx, VASurfaceID 
render_target)
 VAStatus
 vlVaQuerySurfaceStatus(VADriverContextP ctx, VASurfaceID render_target, 
VASurfaceStatus *status)
 {
+   vlVaDriver *drv;
+   vlVaSurface *surf;
+   vlVaContext *context;
+
if (!ctx)
   return VA_STATUS_ERROR_INVALID_CONTEXT;
 
+   drv = VL_VA_DRIVER(ctx);
+   if (!drv)
+  return VA_STATUS_ERROR_INVALID_CONTEXT;
+
+   mtx_lock(>mutex);
+
+   surf = handle_table_get(drv->htab, render_target);
+   if (!surf || !surf->buffer) {
+  mtx_unlock(>mutex);
+  return VA_STATUS_ERROR_INVALID_SURFACE;
+   }
+
+   context = handle_table_get(drv->htab, surf->ctx);
+   if (!context) {
+  mtx_unlock(>mutex);
+  return VA_STATUS_ERROR_INVALID_CONTEXT;
+   }
+
+   if (context->decoder->entrypoint == PIPE_VIDEO_ENTRYPOINT_ENCODE) {
+  if(surf->feedback == NULL)
+ *status=VASurfaceReady;
+  else
+ *status=VASurfaceRendering;
+   }
+
+   mtx_unlock(>mutex);
+
return VA_STATUS_SUCCESS;
 }
 
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] NIR constant problem for GPU which doesn't have native integer support

2019-01-03 Thread Erik Faye-Lund
On Wed, 2019-01-02 at 10:16 -0600, Jason Ekstrand wrote:
> On Wed, Jan 2, 2019 at 9:43 AM Ilia Mirkin 
> wrote:
> > Have a look at the first 4 patches in the series from Jonathan
> > Marek
> > to address some of these issues:
> > 
> > https://patchwork.freedesktop.org/series/54295/
> > 
> > Not sure exactly what state that work is in, but I've added
> > Jonathan
> > to CC, perhaps he can provide an update.
> > 
> > Cheers,
> > 
> >   -ilia
> > 
> > On Wed, Jan 2, 2019 at 6:28 AM Qiang Yu  wrote:
> > >
> > > Hi guys,
> > >
> > > I found the problem with this test fragment shader when lima
> > development:
> > > uniform int color;
> > > void main() {
> > > if (color > 1)
> > > gl_FragColor = vec4(1.0, 0.0, 0.0, 1);
> > > else
> > > gl_FragColor = vec4(0.0, 1.0, 0.0, 1);
> > > }
> > >
> > > nir_print_shader output:
> > > impl main {
> > > block block_0:
> > > /* preds: */
> > > vec1 32 ssa_0 = load_const (0x0001 /* 0.00 */)
> > > vec4 32 ssa_1 = load_const (0x3f80 /* 1.00 */,
> > > 0x /* 0.00 */, 0x /* 0.00 */, 0x3f80
> > /*
> > > 1.00 */)
> > > vec4 32 ssa_2 = load_const (0x /* 0.00 */,
> > > 0x3f80 /* 1.00 */, 0x /* 0.00 */, 0x3f80
> > /*
> > > 1.00 */)
> > > vec1 32 ssa_3 = load_const (0x /* 0.00 */)
> > > vec1 32 ssa_4 = intrinsic load_uniform (ssa_3) (0, 1, 0)
> > /*
> > > base=0 */ /* range=1 */ /* component=0 */   /* color */
> > > vec1 32 ssa_5 = slt ssa_0, ssa_4
> > > vec1 32 ssa_6 = fnot ssa_5
> > > vec4 32 ssa_7 = bcsel ssa_6., ssa_2, ssa_1
> > > intrinsic store_output (ssa_7, ssa_3) (0, 15, 0) /*
> > base=0 */
> > > /* wrmask=xyzw */ /* component=0 */   /* gl_FragColor */
> > > /* succs: block_1 */
> > > block block_1:
> > > }
> > >
> > > ssa0 is not converted to float when glsl to nir. I see
> > glsl_to_nir.cpp
> > > will create flt/ilt/ult
> > > based on source type for gpu support native integer, but for gpu
> > not
> > > support native
> > > integer, just create slt for all source type. And in
> > > nir_lower_constant_initializers,
> > > there's also no type conversion for integer constant.
> 
> This is a generally sticky issue.  In NIR, we have no concept of
> types on SSA values which has proven perfectly reasonable and
> actually very powerful in a world where integers are supported
> natively.  Unfortunately, it causes significant problems for float-
> only architectures.

I would like to take this chance to say that this untyped SSA-value
choice has lead to issues in both radeon_si (because LLVM values are
typed) and zink (similarly, because SPIR-V values are typed), where we
need to to bitcasts on every access because there's just not enough
information available to emit variables with the right type.

It took us a lot of time to realize that the meta-data from the opcodes
doesn't *really* provide this, because the rest of nir doesn't treat
values consistently. In fact, this feels arguably more like buggy
behavior; why do we even have fmov when all of the time the compiler
will emit imovs for floating-point values...? Or why do we have bitcast

I would really love it if we could at least consider making this "we
can just treat floats as ints without bitcasts if we feel like it"-
design optional for the backend somehow.

I guess the assumption is that bitcasts are for free? They aren't once
you have to emit them and have a back-end remove a bunch of redundant
ones... We should already have all the information to trivially place
casts for backends that care, yet we currently make it hard (unless
your HW/backend happens to be untyped)...

Is there some way we can perhaps improve this for backends that care?

>   There are two general possible solutions:
> 
>  1. convert all integers to floats in glsl_to_nir and prog_to_nir and
> adjust various lowering/optimization passes to handle
> nir_compiler_options::supports_native_integers == false.
> 
>  2. Just allow integers all the way through until you get very close
> to the end and then lower integers to floats at the last possible
> moment.
> 
> Both of these come with significant issues.  With the first approach,
> there are potentially a lot of passes that will need to be adjusted
> and it's not 100% clear what to do with UBO offsets and indirect
> addressing; fortunately, you should be able to disable most of those
> optimizations to get going so it shouldn't be too bad.  The second
> would be less invasive to NIR because it doesn't require modifying as
> many passes.  However, doing such a lowering would be very tricky to
> get right primarily because of constants.  With everything else, you
> can just sort of assume that inputs are floats (you lowered, right?)
> and lower to produce a float output.  With constants, however, you
> don't know whether or not it's an integer that needs lowering.  We
> could, 

[Mesa-dev] DriConf replacement

2019-01-03 Thread Rithvik Patibandla

Hi,

I am a college senior from India. I have been looking for new open 
source projects that I can contribute to and I found a project idea on 
Xorg's website named `DriConf replacement` [1]. I want to work on this 
project and I was wondering if it is still open and up-for-grabs? If 
yes, can anyone direct me on how to proceed? I built a GUI using Qt as 
part of GSoC 2017 [2] so I have some experience in that area but I am 
new to Xorg, so any help would be appreciated.


[1] https://www.x.org/wiki/SummerOfCodeIdeas/
[2] https://github.com/rithvikp1998/cpd


Thank You

- Rithvik

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 28/53] intel/compiler: add new half-float register type for 3-src instructions

2019-01-03 Thread Iago Toral
On Thu, 2019-01-03 at 10:25 +0200, Pohjolainen, Topi wrote:
> On Thu, Jan 03, 2019 at 07:50:03AM +0100, Iago Toral wrote:
> > On Wed, 2019-01-02 at 11:35 +0200, Pohjolainen, Topi wrote:
> > > On Wed, Dec 19, 2018 at 12:50:56PM +0100, Iago Toral Quiroga
> > > wrote:
> > > > This is available since gen8.
> > > > ---
> > > >  src/intel/compiler/brw_reg_type.c | 35
> > > > +++
> > > >  1 file changed, 31 insertions(+), 4 deletions(-)
> > > > 
> > > > diff --git a/src/intel/compiler/brw_reg_type.c
> > > > b/src/intel/compiler/brw_reg_type.c
> > > > index 60240ba1513..72295a2bd75 100644
> > > > --- a/src/intel/compiler/brw_reg_type.c
> > > > +++ b/src/intel/compiler/brw_reg_type.c
> > > > @@ -138,6 +138,7 @@ enum hw_3src_reg_type {
> > > > GEN7_3SRC_TYPE_D  = 1,
> > > > GEN7_3SRC_TYPE_UD = 2,
> > > > GEN7_3SRC_TYPE_DF = 3,
> > > > +   GEN8_3SRC_TYPE_HF = 4,
> > > >  
> > > > /** When ExecutionDatatype is 1: @{ */
> > > > GEN10_ALIGN1_3SRC_REG_TYPE_HF = 0b000,
> > > > @@ -166,6 +167,14 @@ static const struct hw_3src_type {
> > > > [BRW_REGISTER_TYPE_D]  = { GEN7_3SRC_TYPE_D  },
> > > > [BRW_REGISTER_TYPE_UD] = { GEN7_3SRC_TYPE_UD },
> > > > [BRW_REGISTER_TYPE_DF] = { GEN7_3SRC_TYPE_DF },
> > > > +}, gen8_hw_3src_type[] = {
> > > > +   [0 ... BRW_REGISTER_TYPE_LAST] = { INVALID },
> > > > +
> > > > +   [BRW_REGISTER_TYPE_F]  = { GEN7_3SRC_TYPE_F  },
> > > > +   [BRW_REGISTER_TYPE_D]  = { GEN7_3SRC_TYPE_D  },
> > > > +   [BRW_REGISTER_TYPE_UD] = { GEN7_3SRC_TYPE_UD },
> > > > +   [BRW_REGISTER_TYPE_DF] = { GEN7_3SRC_TYPE_DF },
> > > > +   [BRW_REGISTER_TYPE_HF] = { GEN8_3SRC_TYPE_HF },
> > > >  }, gen10_hw_3src_align1_type[] = {
> > > >  #define E(x) BRW_ALIGN1_3SRC_EXEC_TYPE_##x
> > > > [0 ... BRW_REGISTER_TYPE_LAST] = { INVALID },
> > > > @@ -249,6 +258,20 @@ brw_hw_type_to_reg_type(const struct
> > > > gen_device_info *devinfo,
> > > > unreachable("not reached");
> > > >  }
> > > >  
> > > > +static inline const struct hw_3src_type *
> > > > +get_hw_3src_type_map(const struct gen_device_info *devinfo,
> > > > uint32_t *size)
> > > > +{
> > > > +   if (devinfo->gen < 8) {
> > > > +  if (size)
> > > > + *size = ARRAY_SIZE(gen7_hw_3src_type);
> > > > +  return gen7_hw_3src_type;
> > > > +   } else {
> > > > +  if (size)
> > > > + *size = ARRAY_SIZE(gen8_hw_3src_type);
> > > > +  return gen8_hw_3src_type;
> > > > +   }
> > > > +}
> > > > +
> > > >  /**
> > > >   * Convert a brw_reg_type enumeration value into the hardware
> > > > representation
> > > >   * for a 3-src align16 instruction
> > > > @@ -257,9 +280,11 @@ unsigned
> > > >  brw_reg_type_to_a16_hw_3src_type(const struct gen_device_info
> > > > *devinfo,
> > > >   enum brw_reg_type type)
> > > >  {
> > > > -   assert(type < ARRAY_SIZE(gen7_hw_3src_type));
> > > > -   assert(gen7_hw_3src_type[type].reg_type != (enum
> > > > hw_3src_reg_type)INVALID);
> > > > -   return gen7_hw_3src_type[type].reg_type;
> > > > +   uint32_t map_size;
> > > > +   const struct hw_3src_type *hw_3src_type_map =
> > > > +  get_hw_3src_type_map(devinfo, _size);
> > > > +   assert(hw_3src_type_map[type].reg_type != (enum
> > > > hw_3src_reg_type)INVALID);
> > > > +   return hw_3src_type_map[type].reg_type;
> > > 
> > > I wonder if we should use a style equivalent to
> > > brw_reg_type_to_hw_type() and
> > > brw_hw_type_to_reg_type() and inline the table (or map)
> > > selection:
> > 
> > I don't have a strong opinion, but since we need this in at least
> > two
> > different places I think it is best to have that code in a single
> > function that we can reuse rather than replicating it wherever we
> > need
> > it. I'd be more in favor of changing the other functions to follow
> > a
> > similar pattern for the same reason.
> 
> I don't have a preference either. Having similar logic in both would
> be nice
> though. We could, for example, go with your patch here and as a
> follow-up
> modify the existing. Hence, this patch:
> 
> Reviewed-by: Topi Pohjolainen 

Sure, I'll write that patch.

Iago

> > 
> > Iago
> > 
> > >   const struct hw_type *table;
> > > 
> > >   if (devinfo->gen >= 8) {
> > >  assert(type < ARRAY_SIZE(gen8_hw_3src_type));
> > >  table = gen7_hw_3src_type;
> > >   } else {
> > >  assert(type < ARRAY_SIZE(gen7_hw_3src_type));
> > >  table = gen7_hw_3src_type;
> > >   }
> > > 
> > >   assert(table[type].reg_type != (enum hw_reg_type)INVALID);
> > > 
> > >   return table[type].reg_type;
> > > 
> > > >  }
> > > >  
> > > >  /**
> > > > @@ -283,8 +308,10 @@ enum brw_reg_type
> > > >  brw_a16_hw_3src_type_to_reg_type(const struct gen_device_info
> > > > *devinfo,
> > > >   unsigned hw_type)
> > > >  {
> > > > +   const struct hw_3src_type *hw_3src_type_map =
> > > > +  get_hw_3src_type_map(devinfo, NULL);
> > > > for (enum brw_reg_type i = 0; i 

Re: [Mesa-dev] [PATCH v2 28/53] intel/compiler: add new half-float register type for 3-src instructions

2019-01-03 Thread Pohjolainen, Topi
On Thu, Jan 03, 2019 at 07:50:03AM +0100, Iago Toral wrote:
> On Wed, 2019-01-02 at 11:35 +0200, Pohjolainen, Topi wrote:
> > On Wed, Dec 19, 2018 at 12:50:56PM +0100, Iago Toral Quiroga wrote:
> > > This is available since gen8.
> > > ---
> > >  src/intel/compiler/brw_reg_type.c | 35
> > > +++
> > >  1 file changed, 31 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/src/intel/compiler/brw_reg_type.c
> > > b/src/intel/compiler/brw_reg_type.c
> > > index 60240ba1513..72295a2bd75 100644
> > > --- a/src/intel/compiler/brw_reg_type.c
> > > +++ b/src/intel/compiler/brw_reg_type.c
> > > @@ -138,6 +138,7 @@ enum hw_3src_reg_type {
> > > GEN7_3SRC_TYPE_D  = 1,
> > > GEN7_3SRC_TYPE_UD = 2,
> > > GEN7_3SRC_TYPE_DF = 3,
> > > +   GEN8_3SRC_TYPE_HF = 4,
> > >  
> > > /** When ExecutionDatatype is 1: @{ */
> > > GEN10_ALIGN1_3SRC_REG_TYPE_HF = 0b000,
> > > @@ -166,6 +167,14 @@ static const struct hw_3src_type {
> > > [BRW_REGISTER_TYPE_D]  = { GEN7_3SRC_TYPE_D  },
> > > [BRW_REGISTER_TYPE_UD] = { GEN7_3SRC_TYPE_UD },
> > > [BRW_REGISTER_TYPE_DF] = { GEN7_3SRC_TYPE_DF },
> > > +}, gen8_hw_3src_type[] = {
> > > +   [0 ... BRW_REGISTER_TYPE_LAST] = { INVALID },
> > > +
> > > +   [BRW_REGISTER_TYPE_F]  = { GEN7_3SRC_TYPE_F  },
> > > +   [BRW_REGISTER_TYPE_D]  = { GEN7_3SRC_TYPE_D  },
> > > +   [BRW_REGISTER_TYPE_UD] = { GEN7_3SRC_TYPE_UD },
> > > +   [BRW_REGISTER_TYPE_DF] = { GEN7_3SRC_TYPE_DF },
> > > +   [BRW_REGISTER_TYPE_HF] = { GEN8_3SRC_TYPE_HF },
> > >  }, gen10_hw_3src_align1_type[] = {
> > >  #define E(x) BRW_ALIGN1_3SRC_EXEC_TYPE_##x
> > > [0 ... BRW_REGISTER_TYPE_LAST] = { INVALID },
> > > @@ -249,6 +258,20 @@ brw_hw_type_to_reg_type(const struct
> > > gen_device_info *devinfo,
> > > unreachable("not reached");
> > >  }
> > >  
> > > +static inline const struct hw_3src_type *
> > > +get_hw_3src_type_map(const struct gen_device_info *devinfo,
> > > uint32_t *size)
> > > +{
> > > +   if (devinfo->gen < 8) {
> > > +  if (size)
> > > + *size = ARRAY_SIZE(gen7_hw_3src_type);
> > > +  return gen7_hw_3src_type;
> > > +   } else {
> > > +  if (size)
> > > + *size = ARRAY_SIZE(gen8_hw_3src_type);
> > > +  return gen8_hw_3src_type;
> > > +   }
> > > +}
> > > +
> > >  /**
> > >   * Convert a brw_reg_type enumeration value into the hardware
> > > representation
> > >   * for a 3-src align16 instruction
> > > @@ -257,9 +280,11 @@ unsigned
> > >  brw_reg_type_to_a16_hw_3src_type(const struct gen_device_info
> > > *devinfo,
> > >   enum brw_reg_type type)
> > >  {
> > > -   assert(type < ARRAY_SIZE(gen7_hw_3src_type));
> > > -   assert(gen7_hw_3src_type[type].reg_type != (enum
> > > hw_3src_reg_type)INVALID);
> > > -   return gen7_hw_3src_type[type].reg_type;
> > > +   uint32_t map_size;
> > > +   const struct hw_3src_type *hw_3src_type_map =
> > > +  get_hw_3src_type_map(devinfo, _size);
> > > +   assert(hw_3src_type_map[type].reg_type != (enum
> > > hw_3src_reg_type)INVALID);
> > > +   return hw_3src_type_map[type].reg_type;
> > 
> > I wonder if we should use a style equivalent to
> > brw_reg_type_to_hw_type() and
> > brw_hw_type_to_reg_type() and inline the table (or map) selection:
> 
> I don't have a strong opinion, but since we need this in at least two
> different places I think it is best to have that code in a single
> function that we can reuse rather than replicating it wherever we need
> it. I'd be more in favor of changing the other functions to follow a
> similar pattern for the same reason.

I don't have a preference either. Having similar logic in both would be nice
though. We could, for example, go with your patch here and as a follow-up
modify the existing. Hence, this patch:

Reviewed-by: Topi Pohjolainen 

> 
> Iago
> 
> >   const struct hw_type *table;
> > 
> >   if (devinfo->gen >= 8) {
> >  assert(type < ARRAY_SIZE(gen8_hw_3src_type));
> >  table = gen7_hw_3src_type;
> >   } else {
> >  assert(type < ARRAY_SIZE(gen7_hw_3src_type));
> >  table = gen7_hw_3src_type;
> >   }
> > 
> >   assert(table[type].reg_type != (enum hw_reg_type)INVALID);
> > 
> >   return table[type].reg_type;
> > 
> > >  }
> > >  
> > >  /**
> > > @@ -283,8 +308,10 @@ enum brw_reg_type
> > >  brw_a16_hw_3src_type_to_reg_type(const struct gen_device_info
> > > *devinfo,
> > >   unsigned hw_type)
> > >  {
> > > +   const struct hw_3src_type *hw_3src_type_map =
> > > +  get_hw_3src_type_map(devinfo, NULL);
> > > for (enum brw_reg_type i = 0; i <= BRW_REGISTER_TYPE_LAST; i++)
> > > {
> > > -  if (gen7_hw_3src_type[i].reg_type == hw_type) {
> > > +  if (hw_3src_type_map[i].reg_type == hw_type) {
> > >   return i;
> > >}
> > > }
> > > -- 
> > > 2.17.1
> > > 
> > > ___
> > > mesa-dev mailing list
> > > mesa-dev@lists.freedesktop.org

Re: [Mesa-dev] [PATCH v2 13/53] intel/compiler: add a helper to handle conversions to 64-bit in atom

2019-01-03 Thread Iago Toral
Thanks Curro, I'll check it out.

Iago

On Wed, 2019-01-02 at 15:00 -0800, Francisco Jerez wrote:
> This patch is redundant with the regioning lowering pass I sent a few
> days ago [1].  The problem with this approach is that on the one hand
> it's easy for the back-end compiler to cause code which was legalized
> at
> NIR translation time to become illegal again accidentally, on the
> other
> hand there's the maintainability issue of having workarounds for the
> exact same restriction scattered all over the place.
> 
> Please drop it, there shouldn't be any need to do manual workarounds
> at
> NIR translation time for the CHV/BXT regioning restrictions to be
> honored anymore.
> 
> [1] 
> https://lists.freedesktop.org/archives/mesa-dev/2018-December/212775.html
> 
> Iago Toral Quiroga  writes:
> 
> > ---
> >  src/intel/compiler/brw_fs_nir.cpp | 55 ++-
> > 
> >  1 file changed, 33 insertions(+), 22 deletions(-)
> > 
> > diff --git a/src/intel/compiler/brw_fs_nir.cpp
> > b/src/intel/compiler/brw_fs_nir.cpp
> > index 92ec85a27cc..15715651aa6 100644
> > --- a/src/intel/compiler/brw_fs_nir.cpp
> > +++ b/src/intel/compiler/brw_fs_nir.cpp
> > @@ -664,6 +664,38 @@ brw_rnd_mode_from_nir_op (const nir_op op) {
> > }
> >  }
> >  
> > +static bool
> > +fixup_64bit_conversion(const fs_builder ,
> > +   fs_reg dst, fs_reg src,
> > +   bool saturate,
> > +   const struct gen_device_info *devinfo)
> > +{
> > +   /* CHV PRM, vol07, 3D Media GPGPU Engine, Register Region
> > Restrictions:
> > +*
> > +*"When source or destination is 64b (...), regioning in
> > Align1
> > +* must follow these rules:
> > +*
> > +* 1. Source and destination horizontal stride must be
> > aligned to
> > +*the same qword.
> > +* (...)"
> > +*
> > +* This means that conversions from bit-sizes smaller than 64-
> > bit to
> > +* 64-bit need to have the source data elements aligned to 64-
> > bit.
> > +* This restriction does not apply to BDW and later.
> > +*/
> > +   if (type_sz(dst.type) == 8 && type_sz(src.type) < 8 &&
> > +   (devinfo->is_cherryview ||
> > gen_device_info_is_9lp(devinfo))) {
> > +  fs_reg tmp = bld.vgrf(dst.type, 1);
> > +  tmp = subscript(tmp, src.type, 0);
> > +  bld.MOV(tmp, src);
> > +  fs_inst *inst = bld.MOV(dst, tmp);
> > +  inst->saturate = saturate;
> > +  return true;
> > +   }
> > +
> > +   return false;
> > +}
> > +
> >  void
> >  fs_visitor::nir_emit_alu(const fs_builder , nir_alu_instr
> > *instr)
> >  {
> > @@ -805,29 +837,8 @@ fs_visitor::nir_emit_alu(const fs_builder
> > , nir_alu_instr *instr)
> > case nir_op_i2i64:
> > case nir_op_u2f64:
> > case nir_op_u2u64:
> > -  /* CHV PRM, vol07, 3D Media GPGPU Engine, Register Region
> > Restrictions:
> > -   *
> > -   *"When source or destination is 64b (...), regioning in
> > Align1
> > -   * must follow these rules:
> > -   *
> > -   * 1. Source and destination horizontal stride must be
> > aligned to
> > -   *the same qword.
> > -   * (...)"
> > -   *
> > -   * This means that conversions from bit-sizes smaller than
> > 64-bit to
> > -   * 64-bit need to have the source data elements aligned to
> > 64-bit.
> > -   * This restriction does not apply to BDW and later.
> > -   */
> > -  if (nir_dest_bit_size(instr->dest.dest) == 64 &&
> > -  nir_src_bit_size(instr->src[0].src) < 64 &&
> > -  (devinfo->is_cherryview ||
> > gen_device_info_is_9lp(devinfo))) {
> > - fs_reg tmp = bld.vgrf(result.type, 1);
> > - tmp = subscript(tmp, op[0].type, 0);
> > - inst = bld.MOV(tmp, op[0]);
> > - inst = bld.MOV(result, tmp);
> > - inst->saturate = instr->dest.saturate;
> > +  if (fixup_64bit_conversion(bld, result, op[0], instr-
> > >dest.saturate, devinfo))
> >   break;
> > -  }
> >/* fallthrough */
> > case nir_op_f2f32:
> > case nir_op_f2i32:
> > -- 
> > 2.17.1
> > 
> > ___
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev