Re: [Mesa-dev] [PATCH v3 1/2] nir: Add nir_lower_viewport_transform

2019-04-12 Thread Jason Ekstrand
On Fri, Apr 12, 2019 at 6:46 PM Alyssa Rosenzweig 
wrote:

> On Mali hardware (supported by Panfrost and Lima), the fixed-function
> transformation from world-space to screen-space coordinates is done in
> the vertex shader prior to writing out the gl_Position varying, rather
> than in dedicated hardware. This commit adds a shared NIR pass for
> implementing coordinate transformation and lowering gl_Position writes
> into screen-space gl_Position writes.
>
> v2: Run directly on derefs before io/vars are lowered to clean up the
> code substantially. Thank you to Qiang for this suggestion!
>
> v3: Bikeshed continues.
>
> Signed-off-by: Alyssa Rosenzweig 
> Suggested-by: Qiang Yu 
> Cc: Jason Ekstrand 
> Cc: Eric Anholt 
> ---
>  src/compiler/nir/meson.build  |   1 +
>  src/compiler/nir/nir.h|   1 +
>  .../nir/nir_lower_viewport_transform.c| 101 ++
>

For a short while longer, we still build with autotools, so please add
this file to Makefile.sources.  That is also required for the Android
build (which may actually matter for Mali).


>  3 files changed, 103 insertions(+)
>  create mode 100644 src/compiler/nir/nir_lower_viewport_transform.c
>
> diff --git a/src/compiler/nir/meson.build b/src/compiler/nir/meson.build
> index c65f2ff62ff..c274361bdc4 100644
> --- a/src/compiler/nir/meson.build
> +++ b/src/compiler/nir/meson.build
> @@ -151,6 +151,7 @@ files_libnir = files(
>    'nir_lower_vars_to_ssa.c',
>    'nir_lower_var_copies.c',
>    'nir_lower_vec_to_movs.c',
> +  'nir_lower_viewport_transform.c',
>    'nir_lower_wpos_center.c',
>    'nir_lower_wpos_ytransform.c',
>    'nir_lower_bit_size.c',
> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
> index bc72d8f83f5..0f6ed734efa 100644
> --- a/src/compiler/nir/nir.h
> +++ b/src/compiler/nir/nir.h
> @@ -3124,6 +3124,7 @@ void nir_lower_io_to_scalar(nir_shader *shader, nir_variable_mode mask);
>  void nir_lower_io_to_scalar_early(nir_shader *shader, nir_variable_mode mask);
>  bool nir_lower_io_to_vector(nir_shader *shader, nir_variable_mode mask);
>
> +void nir_lower_viewport_transform(nir_shader *shader);
>  bool nir_lower_uniforms_to_ubo(nir_shader *shader, int multiplier);
>
>  typedef struct nir_lower_subgroups_options {
> diff --git a/src/compiler/nir/nir_lower_viewport_transform.c b/src/compiler/nir/nir_lower_viewport_transform.c
> new file mode 100644
> index 00000000000..66085b8da5a
> --- /dev/null
> +++ b/src/compiler/nir/nir_lower_viewport_transform.c
> @@ -0,0 +1,101 @@
> +/*
> + * Copyright (C) 2019 Alyssa Rosenzweig
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + */
> +
> +/* On some hardware (particularly, all current versions of Mali GPUs),
> + * vertex shaders do not output gl_Position in world-space. Instead, they
> + * output gl_Position in transformed screen space via the "pseudo"
> + * position varying. Thus, this pass finds writes to gl_Position and
> + * changes them to transformed writes, still to gl_Position. The
> + * outputted screen space is still written back to VARYING_SLOT_POS,
> + * which is semantically ambiguous but nevertheless a good match for
> + * Gallium/NIR/Mali.
> + *
> + * Implements coordinate transformation as defined in section 12.5
> + * "Coordinate Transformation" of the OpenGL ES 3.2 full specification.
> + *
> + * This pass must run before lower_vars/lower_io such that derefs are
> + * still in place.
> + */
> +
> +#include "nir/nir.h"
> +#include "nir/nir_builder.h"
> +
> +void
> +nir_lower_viewport_transform(nir_shader *shader)
> +{
> +   assert(shader->info.stage == MESA_SHADER_VERTEX);
> +
> +   nir_foreach_function(func, shader) {
> +      nir_foreach_block(block, func->impl) {
> +         nir_foreach_instr_safe(instr, block) {
> +            if (instr->type != nir_instr_type_intrinsic)
> +   

Re: [Mesa-dev] [PATCH v3 1/2] nir: Add nir_lower_viewport_transform

2019-04-12 Thread Qiang Yu
The patch series is:
Reviewed-by: Qiang Yu 

Regards,
Qiang


On Sat, Apr 13, 2019 at 7:46 AM Alyssa Rosenzweig  wrote:
>
> On Mali hardware (supported by Panfrost and Lima), the fixed-function
> transformation from world-space to screen-space coordinates is done in
> the vertex shader prior to writing out the gl_Position varying, rather
> than in dedicated hardware. This commit adds a shared NIR pass for
> implementing coordinate transformation and lowering gl_Position writes
> into screen-space gl_Position writes.
>
> v2: Run directly on derefs before io/vars are lowered to clean up the
> code substantially. Thank you to Qiang for this suggestion!
>
> v3: Bikeshed continues.
>
> Signed-off-by: Alyssa Rosenzweig 
> Suggested-by: Qiang Yu 
> Cc: Jason Ekstrand 
> Cc: Eric Anholt 
> ---
>  src/compiler/nir/meson.build  |   1 +
>  src/compiler/nir/nir.h|   1 +
>  .../nir/nir_lower_viewport_transform.c| 101 ++
>  3 files changed, 103 insertions(+)
>  create mode 100644 src/compiler/nir/nir_lower_viewport_transform.c
>
> diff --git a/src/compiler/nir/meson.build b/src/compiler/nir/meson.build
> index c65f2ff62ff..c274361bdc4 100644
> --- a/src/compiler/nir/meson.build
> +++ b/src/compiler/nir/meson.build
> @@ -151,6 +151,7 @@ files_libnir = files(
>    'nir_lower_vars_to_ssa.c',
>    'nir_lower_var_copies.c',
>    'nir_lower_vec_to_movs.c',
> +  'nir_lower_viewport_transform.c',
>    'nir_lower_wpos_center.c',
>    'nir_lower_wpos_ytransform.c',
>    'nir_lower_bit_size.c',
> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
> index bc72d8f83f5..0f6ed734efa 100644
> --- a/src/compiler/nir/nir.h
> +++ b/src/compiler/nir/nir.h
> @@ -3124,6 +3124,7 @@ void nir_lower_io_to_scalar(nir_shader *shader, nir_variable_mode mask);
>  void nir_lower_io_to_scalar_early(nir_shader *shader, nir_variable_mode mask);
>  bool nir_lower_io_to_vector(nir_shader *shader, nir_variable_mode mask);
>
> +void nir_lower_viewport_transform(nir_shader *shader);
>  bool nir_lower_uniforms_to_ubo(nir_shader *shader, int multiplier);
>
>  typedef struct nir_lower_subgroups_options {
> diff --git a/src/compiler/nir/nir_lower_viewport_transform.c b/src/compiler/nir/nir_lower_viewport_transform.c
> new file mode 100644
> index 00000000000..66085b8da5a
> --- /dev/null
> +++ b/src/compiler/nir/nir_lower_viewport_transform.c
> @@ -0,0 +1,101 @@
> +/*
> + * Copyright (C) 2019 Alyssa Rosenzweig
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + */
> +
> +/* On some hardware (particularly, all current versions of Mali GPUs),
> + * vertex shaders do not output gl_Position in world-space. Instead, they
> + * output gl_Position in transformed screen space via the "pseudo"
> + * position varying. Thus, this pass finds writes to gl_Position and
> + * changes them to transformed writes, still to gl_Position. The
> + * outputted screen space is still written back to VARYING_SLOT_POS,
> + * which is semantically ambiguous but nevertheless a good match for
> + * Gallium/NIR/Mali.
> + *
> + * Implements coordinate transformation as defined in section 12.5
> + * "Coordinate Transformation" of the OpenGL ES 3.2 full specification.
> + *
> + * This pass must run before lower_vars/lower_io such that derefs are
> + * still in place.
> + */
> +
> +#include "nir/nir.h"
> +#include "nir/nir_builder.h"
> +
> +void
> +nir_lower_viewport_transform(nir_shader *shader)
> +{
> +   assert(shader->info.stage == MESA_SHADER_VERTEX);
> +
> +   nir_foreach_function(func, shader) {
> +      nir_foreach_block(block, func->impl) {
> +         nir_foreach_instr_safe(instr, block) {
> +            if (instr->type != nir_instr_type_intrinsic)
> +               continue;
> +
> +            nir_intrinsic_instr *intr = nir_instr_as_intrinsic(instr);
> +            if (intr->intrinsic !=

Re: [Mesa-dev] [PATCH v3 1/2] nir: Add nir_lower_viewport_transform

2019-04-12 Thread Ian Romanick
This patch is

Reviewed-by: Ian Romanick 

On 4/12/19 4:46 PM, Alyssa Rosenzweig wrote:
> On Mali hardware (supported by Panfrost and Lima), the fixed-function
> transformation from world-space to screen-space coordinates is done in
> the vertex shader prior to writing out the gl_Position varying, rather
> than in dedicated hardware. This commit adds a shared NIR pass for
> implementing coordinate transformation and lowering gl_Position writes
> into screen-space gl_Position writes.
> 
> v2: Run directly on derefs before io/vars are lowered to clean up the
> code substantially. Thank you to Qiang for this suggestion!
> 
> v3: Bikeshed continues.
> 
> Signed-off-by: Alyssa Rosenzweig 
> Suggested-by: Qiang Yu 
> Cc: Jason Ekstrand 
> Cc: Eric Anholt 
> ---
>  src/compiler/nir/meson.build  |   1 +
>  src/compiler/nir/nir.h|   1 +
>  .../nir/nir_lower_viewport_transform.c| 101 ++
>  3 files changed, 103 insertions(+)
>  create mode 100644 src/compiler/nir/nir_lower_viewport_transform.c
> 
> diff --git a/src/compiler/nir/meson.build b/src/compiler/nir/meson.build
> index c65f2ff62ff..c274361bdc4 100644
> --- a/src/compiler/nir/meson.build
> +++ b/src/compiler/nir/meson.build
> @@ -151,6 +151,7 @@ files_libnir = files(
>    'nir_lower_vars_to_ssa.c',
>    'nir_lower_var_copies.c',
>    'nir_lower_vec_to_movs.c',
> +  'nir_lower_viewport_transform.c',
>    'nir_lower_wpos_center.c',
>    'nir_lower_wpos_ytransform.c',
>    'nir_lower_bit_size.c',
> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
> index bc72d8f83f5..0f6ed734efa 100644
> --- a/src/compiler/nir/nir.h
> +++ b/src/compiler/nir/nir.h
> @@ -3124,6 +3124,7 @@ void nir_lower_io_to_scalar(nir_shader *shader, nir_variable_mode mask);
>  void nir_lower_io_to_scalar_early(nir_shader *shader, nir_variable_mode mask);
>  bool nir_lower_io_to_vector(nir_shader *shader, nir_variable_mode mask);
>  
> +void nir_lower_viewport_transform(nir_shader *shader);
>  bool nir_lower_uniforms_to_ubo(nir_shader *shader, int multiplier);
>  
>  typedef struct nir_lower_subgroups_options {
> diff --git a/src/compiler/nir/nir_lower_viewport_transform.c b/src/compiler/nir/nir_lower_viewport_transform.c
> new file mode 100644
> index 00000000000..66085b8da5a
> --- /dev/null
> +++ b/src/compiler/nir/nir_lower_viewport_transform.c
> @@ -0,0 +1,101 @@
> +/*
> + * Copyright (C) 2019 Alyssa Rosenzweig
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + */
> +
> +/* On some hardware (particularly, all current versions of Mali GPUs),
> + * vertex shaders do not output gl_Position in world-space. Instead, they
> + * output gl_Position in transformed screen space via the "pseudo"
> + * position varying. Thus, this pass finds writes to gl_Position and
> + * changes them to transformed writes, still to gl_Position. The
> + * outputted screen space is still written back to VARYING_SLOT_POS,
> + * which is semantically ambiguous but nevertheless a good match for
> + * Gallium/NIR/Mali.
> + *
> + * Implements coordinate transformation as defined in section 12.5
> + * "Coordinate Transformation" of the OpenGL ES 3.2 full specification.
> + *
> + * This pass must run before lower_vars/lower_io such that derefs are
> + * still in place.
> + */
> +
> +#include "nir/nir.h"
> +#include "nir/nir_builder.h"
> +
> +void
> +nir_lower_viewport_transform(nir_shader *shader)
> +{
> +   assert(shader->info.stage == MESA_SHADER_VERTEX);
> +
> +   nir_foreach_function(func, shader) {
> +      nir_foreach_block(block, func->impl) {
> +         nir_foreach_instr_safe(instr, block) {
> +            if (instr->type != nir_instr_type_intrinsic)
> +               continue;
> +
> +            nir_intrinsic_instr *intr = nir_instr_as_intrinsic(instr);
> +            if (intr->intrinsic != nir_intrinsic_store_deref)
> + 

Re: [Mesa-dev] [PATCH v2 1/2] nir: Add nir_lower_viewport_transform

2019-04-12 Thread Ian Romanick
On 4/12/19 5:11 PM, Ian Romanick wrote:
> On 4/8/19 5:34 AM, Thomas Helland wrote:
>> man. 8. apr. 2019 kl. 06:30 skrev Alyssa Rosenzweig :
>>>
>>> On Mali hardware (supported by Panfrost and Lima), the fixed-function
>>> transformation from world-space to screen-space coordinates is done in
>>> the vertex shader prior to writing out the gl_Position varying, rather
>>> than in dedicated hardware. This commit adds a shared NIR pass for
>>> implementing coordinate transformation and lowering gl_Position writes
>>> into screen-space gl_Position writes.
>>>
>>> v2: Run directly on derefs before io/vars are lowered to clean up the
>>> code substantially. Thank you to Qiang for this suggestion!
>>>
>>> Signed-off-by: Alyssa Rosenzweig 
>>> Suggested-by: Qiang Yu 
>>> Cc: Jason Ekstrand 
>>> Cc: Eric Anholt 
>>> ---
>>>  src/compiler/nir/meson.build  |  1 +
>>>  src/compiler/nir/nir.h|  1 +
>>>  .../nir/nir_lower_viewport_transform.c| 98 +++
>>>  3 files changed, 100 insertions(+)
>>>  create mode 100644 src/compiler/nir/nir_lower_viewport_transform.c
>>>
>>> diff --git a/src/compiler/nir/meson.build b/src/compiler/nir/meson.build
>>> index c65f2ff62ff..c274361bdc4 100644
>>> --- a/src/compiler/nir/meson.build
>>> +++ b/src/compiler/nir/meson.build
>>> @@ -151,6 +151,7 @@ files_libnir = files(
>>>    'nir_lower_vars_to_ssa.c',
>>>    'nir_lower_var_copies.c',
>>>    'nir_lower_vec_to_movs.c',
>>> +  'nir_lower_viewport_transform.c',
>>>    'nir_lower_wpos_center.c',
>>>    'nir_lower_wpos_ytransform.c',
>>>    'nir_lower_bit_size.c',
>>> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
>>> index bc72d8f83f5..0f6ed734efa 100644
>>> --- a/src/compiler/nir/nir.h
>>> +++ b/src/compiler/nir/nir.h
>>> @@ -3124,6 +3124,7 @@ void nir_lower_io_to_scalar(nir_shader *shader, nir_variable_mode mask);
>>>  void nir_lower_io_to_scalar_early(nir_shader *shader, nir_variable_mode mask);
>>>  bool nir_lower_io_to_vector(nir_shader *shader, nir_variable_mode mask);
>>>
>>> +void nir_lower_viewport_transform(nir_shader *shader);
>>>  bool nir_lower_uniforms_to_ubo(nir_shader *shader, int multiplier);
>>>
>>>  typedef struct nir_lower_subgroups_options {
>>> diff --git a/src/compiler/nir/nir_lower_viewport_transform.c b/src/compiler/nir/nir_lower_viewport_transform.c
>>> new file mode 100644
>>> index 00000000000..9646b72c053
>>> --- /dev/null
>>> +++ b/src/compiler/nir/nir_lower_viewport_transform.c
>>> @@ -0,0 +1,98 @@
>>> +/*
>>> + * Copyright (C) 2019 Alyssa Rosenzweig
>>> + *
>>> + * Permission is hereby granted, free of charge, to any person obtaining a
>>> + * copy of this software and associated documentation files (the "Software"),
>>> + * to deal in the Software without restriction, including without limitation
>>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>>> + * and/or sell copies of the Software, and to permit persons to whom the
>>> + * Software is furnished to do so, subject to the following conditions:
>>> + *
>>> + * The above copyright notice and this permission notice (including the next
>>> + * paragraph) shall be included in all copies or substantial portions of the
>>> + * Software.
>>> + *
>>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
>>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
>>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
>>> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
>>> + * IN THE SOFTWARE.
>>> + */
>>> +
>>> +/* On some hardware (particularly, all current versions of Mali GPUs),
>>> + * vertex shaders do not output gl_Position in world-space. Instead, they
>>> + * output gl_Position in transformed screen space via the "pseudo"
>>> + * position varying. Thus, this pass finds writes to gl_Position and
>>> + * changes them to transformed writes, still to gl_Position. The
>>> + * outputted screen space is still written back to VARYING_SLOT_POS,
>>> + * which is semantically ambiguous but nevertheless a good match for
>>> + * Gallium/NIR/Mali.
>>> + *
>>> + * Implements coordinate transformation as defined in section 12.5
>>> + * "Coordinate Transformation" of the OpenGL ES 3.2 full specification.
>>> + *
>>> + * This pass must run before lower_vars/lower_io such that derefs are
>>> + * still in place.
>>> + */
>>> +
>>> +#include "nir/nir.h"
>>> +#include "nir/nir_builder.h"
>>> +
>>> +void
>>> +nir_lower_viewport_transform(nir_shader *shader)
>>> +{
>>> +   assert(shader->info.stage == MESA_SHADER_VERTEX);
>>> +
>>> +   nir_foreach_function(func, shader) {
>>> +      nir_foreach_block(block, func->impl) {
>>> + 

Re: [Mesa-dev] [PATCH v2 1/2] nir: Add nir_lower_viewport_transform

2019-04-12 Thread Ian Romanick
On 4/8/19 5:34 AM, Thomas Helland wrote:
> man. 8. apr. 2019 kl. 06:30 skrev Alyssa Rosenzweig :
>>
>> On Mali hardware (supported by Panfrost and Lima), the fixed-function
>> transformation from world-space to screen-space coordinates is done in
>> the vertex shader prior to writing out the gl_Position varying, rather
>> than in dedicated hardware. This commit adds a shared NIR pass for
>> implementing coordinate transformation and lowering gl_Position writes
>> into screen-space gl_Position writes.
>>
>> v2: Run directly on derefs before io/vars are lowered to clean up the
>> code substantially. Thank you to Qiang for this suggestion!
>>
>> Signed-off-by: Alyssa Rosenzweig 
>> Suggested-by: Qiang Yu 
>> Cc: Jason Ekstrand 
>> Cc: Eric Anholt 
>> ---
>>  src/compiler/nir/meson.build  |  1 +
>>  src/compiler/nir/nir.h|  1 +
>>  .../nir/nir_lower_viewport_transform.c| 98 +++
>>  3 files changed, 100 insertions(+)
>>  create mode 100644 src/compiler/nir/nir_lower_viewport_transform.c
>>
>> diff --git a/src/compiler/nir/meson.build b/src/compiler/nir/meson.build
>> index c65f2ff62ff..c274361bdc4 100644
>> --- a/src/compiler/nir/meson.build
>> +++ b/src/compiler/nir/meson.build
>> @@ -151,6 +151,7 @@ files_libnir = files(
>>    'nir_lower_vars_to_ssa.c',
>>    'nir_lower_var_copies.c',
>>    'nir_lower_vec_to_movs.c',
>> +  'nir_lower_viewport_transform.c',
>>    'nir_lower_wpos_center.c',
>>    'nir_lower_wpos_ytransform.c',
>>    'nir_lower_bit_size.c',
>> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
>> index bc72d8f83f5..0f6ed734efa 100644
>> --- a/src/compiler/nir/nir.h
>> +++ b/src/compiler/nir/nir.h
>> @@ -3124,6 +3124,7 @@ void nir_lower_io_to_scalar(nir_shader *shader, nir_variable_mode mask);
>>  void nir_lower_io_to_scalar_early(nir_shader *shader, nir_variable_mode mask);
>>  bool nir_lower_io_to_vector(nir_shader *shader, nir_variable_mode mask);
>>
>> +void nir_lower_viewport_transform(nir_shader *shader);
>>  bool nir_lower_uniforms_to_ubo(nir_shader *shader, int multiplier);
>>
>>  typedef struct nir_lower_subgroups_options {
>> diff --git a/src/compiler/nir/nir_lower_viewport_transform.c b/src/compiler/nir/nir_lower_viewport_transform.c
>> new file mode 100644
>> index 00000000000..9646b72c053
>> --- /dev/null
>> +++ b/src/compiler/nir/nir_lower_viewport_transform.c
>> @@ -0,0 +1,98 @@
>> +/*
>> + * Copyright (C) 2019 Alyssa Rosenzweig
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice (including the next
>> + * paragraph) shall be included in all copies or substantial portions of the
>> + * Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
>> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
>> + * IN THE SOFTWARE.
>> + */
>> +
>> +/* On some hardware (particularly, all current versions of Mali GPUs),
>> + * vertex shaders do not output gl_Position in world-space. Instead, they
>> + * output gl_Position in transformed screen space via the "pseudo"
>> + * position varying. Thus, this pass finds writes to gl_Position and
>> + * changes them to transformed writes, still to gl_Position. The
>> + * outputted screen space is still written back to VARYING_SLOT_POS,
>> + * which is semantically ambiguous but nevertheless a good match for
>> + * Gallium/NIR/Mali.
>> + *
>> + * Implements coordinate transformation as defined in section 12.5
>> + * "Coordinate Transformation" of the OpenGL ES 3.2 full specification.
>> + *
>> + * This pass must run before lower_vars/lower_io such that derefs are
>> + * still in place.
>> + */
>> +
>> +#include "nir/nir.h"
>> +#include "nir/nir_builder.h"
>> +
>> +void
>> +nir_lower_viewport_transform(nir_shader *shader)
>> +{
>> +   assert(shader->info.stage == MESA_SHADER_VERTEX);
>> +
>> +   nir_foreach_function(func, shader) {
>> +      nir_foreach_block(block, func->impl) {
>> +         nir_foreach_instr_safe(instr, block) {
>> +            if (instr->type != nir_instr_type_intrinsic) continue;
>> +
>> +            nir_intrinsic_instr *intr = nir_instr_as_intrinsic(instr);
>> +  

Re: [Mesa-dev] [PATCH v2 1/2] nir: Add nir_lower_viewport_transform

2019-04-12 Thread Alyssa Rosenzweig
> I believe the agreed convention in Mesa is that the if ( ... ) goes on one
> line, and the continue, return, etc. go on a new one.

Fixed in v3.
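
For anyone skimming the thread, the convention in question looks roughly
like the sketch below. The instr_type enum and count_intrinsics() helper
are made up for illustration and do not exist in NIR; only the layout of
the if/continue pair is the point.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an instruction type tag; not the NIR enum. */
enum instr_type { INSTR_ALU, INSTR_INTRINSIC };

/* Count the intrinsic entries in an array of type tags. */
static size_t
count_intrinsics(const enum instr_type *types, size_t n)
{
   assert(types != NULL || n == 0);

   size_t count = 0;

   for (size_t i = 0; i < n; i++) {
      /* The condition sits on its own line and the jump on the next. */
      if (types[i] != INSTR_INTRINSIC)
         continue;

      count++;
   }

   return count;
}
```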
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH v2 1/2] nir: Add nir_lower_viewport_transform

2019-04-12 Thread Alyssa Rosenzweig
> This should be done per nir function, and I suggest to divide this function
> into more sub-functions to follow conventions of other nir_lower_*
> implementation.

You should only be writing gl_Position once anyway; why reinit for
functions that likely won't use them?

lower_alpha_test does it this way; it's fine.

[Mesa-dev] [PATCH v3 2/2] panfrost/midgard: Use shared nir_lower_viewport_transform

2019-04-12 Thread Alyssa Rosenzweig
v2: Run before lowering I/O.

Signed-off-by: Alyssa Rosenzweig 
---
 .../panfrost/midgard/midgard_compile.c| 105 +-
 1 file changed, 4 insertions(+), 101 deletions(-)

diff --git a/src/gallium/drivers/panfrost/midgard/midgard_compile.c b/src/gallium/drivers/panfrost/midgard/midgard_compile.c
index f91fa972246..5216cc124fa 100644
--- a/src/gallium/drivers/panfrost/midgard/midgard_compile.c
+++ b/src/gallium/drivers/panfrost/midgard/midgard_compile.c
@@ -3059,100 +3059,6 @@ actualise_ssa_to_alias(compiler_context *ctx)
 emit_leftover_move(ctx);
 }
 
-/* Vertex shaders do not write gl_Position as is; instead, they write a
- * transformed screen space position as a varying. See section 12.5 "Coordinate
- * Transformation" of the ES 3.2 full specification for details.
- *
- * This transformation occurs early on, as NIR and prior to optimisation, in
- * order to take advantage of NIR optimisation passes of the transform itself.
- * */
-
-static void
-write_transformed_position(nir_builder *b, nir_src input_point_src)
-{
-        nir_ssa_def *input_point = nir_ssa_for_src(b, input_point_src, 4);
-        nir_ssa_def *scale = nir_load_viewport_scale(b);
-        nir_ssa_def *offset = nir_load_viewport_offset(b);
-
-        /* World space to normalised device coordinates to screen space */
-
-        nir_ssa_def *w_recip = nir_frcp(b, nir_channel(b, input_point, 3));
-        nir_ssa_def *ndc_point = nir_fmul(b, nir_channels(b, input_point, 0x7), w_recip);
-        nir_ssa_def *screen = nir_fadd(b, nir_fmul(b, ndc_point, scale), offset);
-
-        /* gl_Position will be written out in screenspace xyz, with w set to
-         * the reciprocal we computed earlier. The transformed w component is
-         * then used for perspective-correct varying interpolation. The
-         * transformed w component must preserve its original sign; this is
-         * used in depth clipping computations */
-
-        nir_ssa_def *screen_space = nir_vec4(b,
-                                             nir_channel(b, screen, 0),
-                                             nir_channel(b, screen, 1),
-                                             nir_channel(b, screen, 2),
-                                             w_recip);
-
-        /* Finally, write out the transformed values to the varying */
-
-        nir_intrinsic_instr *store;
-        store = nir_intrinsic_instr_create(b->shader, nir_intrinsic_store_output);
-        store->num_components = 4;
-        nir_intrinsic_set_base(store, 0);
-        nir_intrinsic_set_write_mask(store, 0xf);
-        store->src[0].ssa = screen_space;
-        store->src[0].is_ssa = true;
-        store->src[1] = nir_src_for_ssa(nir_imm_int(b, 0));
-        nir_builder_instr_insert(b, &store->instr);
-}
-
-static void
-transform_position_writes(nir_shader *shader)
-{
-        nir_foreach_function(func, shader) {
-                nir_foreach_block(block, func->impl) {
-                        nir_foreach_instr_safe(instr, block) {
-                                if (instr->type != nir_instr_type_intrinsic) continue;
-
-                                nir_intrinsic_instr *intr = nir_instr_as_intrinsic(instr);
-                                nir_variable *out = NULL;
-
-                                switch (intr->intrinsic) {
-                                case nir_intrinsic_store_output:
-                                        /* already had i/o lowered.. lookup the matching output var: */
-                                        nir_foreach_variable(var, &shader->outputs) {
-                                                int drvloc = var->data.driver_location;
-
-                                                if (nir_intrinsic_base(intr) == drvloc) {
-                                                        out = var;
-                                                        break;
-                                                }
-                                        }
-
-                                        break;
-
-                                default:
-                                        break;
-                                }
-
-                                if (!out) continue;
-
-                                if (out->data.mode != nir_var_shader_out)
-                                        continue;
-
-                                if (out->data.location != VARYING_SLOT_POS)
-                                        continue;
-
-                                nir_builder b;
-                                nir_builder_init(&b, func->impl);
-                                b.cursor = nir_before_instr(instr);
-
-                                write_transformed_position(&b, intr->src[0]);
-                                nir_instr_remove(instr);
-                        }
-                }
-        }
-}
-
 static void
 emit_fragment_epilogue(compiler_context *ctx)
 {
@@ -3522,7 +3428,10 @@ midgard_compile_shader_nir(nir_shader *nir, 

[Mesa-dev] [PATCH v3 1/2] nir: Add nir_lower_viewport_transform

2019-04-12 Thread Alyssa Rosenzweig
On Mali hardware (supported by Panfrost and Lima), the fixed-function
transformation from world-space to screen-space coordinates is done in
the vertex shader prior to writing out the gl_Position varying, rather
than in dedicated hardware. This commit adds a shared NIR pass for
implementing coordinate transformation and lowering gl_Position writes
into screen-space gl_Position writes.
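
As a rough illustration of the arithmetic involved, here is a minimal
scalar sketch in plain C. It follows the math in the Midgard
write_transformed_position() helper that patch 2/2 removes: divide xyz by
w, scale and offset by the viewport, and carry 1/w in the w channel. The
vec3/vec4 types and the viewport_transform() name are hypothetical and not
part of NIR, and the assert on w is a choice of this sketch only.

```c
#include <assert.h>

/* Hypothetical plain-C types for illustration; these are not NIR types. */
typedef struct { float x, y, z, w; } vec4;
typedef struct { float x, y, z; } vec3;

/* Scalar model of the transform: perspective divide, then viewport
 * scale and offset. The w output carries the reciprocal of the input w. */
static vec4
viewport_transform(vec4 pos, vec3 scale, vec3 offset)
{
   /* Assumption of this sketch only: callers never pass w == 0. */
   assert(pos.w != 0.0f);

   float w_recip = 1.0f / pos.w;

   vec4 screen = {
      .x = pos.x * w_recip * scale.x + offset.x,
      .y = pos.y * w_recip * scale.y + offset.y,
      .z = pos.z * w_recip * scale.z + offset.z,
      .w = w_recip,
   };

   return screen;
}
```

For example, with scale and offset both set to (400, 300, 0.5), the input
(1, -1, 0, 2) maps to (600, 150, 0.5) with w = 0.5.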

v2: Run directly on derefs before io/vars are lowered to clean up the
code substantially. Thank you to Qiang for this suggestion!

v3: Bikeshed continues.

Signed-off-by: Alyssa Rosenzweig 
Suggested-by: Qiang Yu 
Cc: Jason Ekstrand 
Cc: Eric Anholt 
---
 src/compiler/nir/meson.build  |   1 +
 src/compiler/nir/nir.h|   1 +
 .../nir/nir_lower_viewport_transform.c| 101 ++
 3 files changed, 103 insertions(+)
 create mode 100644 src/compiler/nir/nir_lower_viewport_transform.c

diff --git a/src/compiler/nir/meson.build b/src/compiler/nir/meson.build
index c65f2ff62ff..c274361bdc4 100644
--- a/src/compiler/nir/meson.build
+++ b/src/compiler/nir/meson.build
@@ -151,6 +151,7 @@ files_libnir = files(
   'nir_lower_vars_to_ssa.c',
   'nir_lower_var_copies.c',
   'nir_lower_vec_to_movs.c',
+  'nir_lower_viewport_transform.c',
   'nir_lower_wpos_center.c',
   'nir_lower_wpos_ytransform.c',
   'nir_lower_bit_size.c',
diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index bc72d8f83f5..0f6ed734efa 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -3124,6 +3124,7 @@ void nir_lower_io_to_scalar(nir_shader *shader, nir_variable_mode mask);
 void nir_lower_io_to_scalar_early(nir_shader *shader, nir_variable_mode mask);
 bool nir_lower_io_to_vector(nir_shader *shader, nir_variable_mode mask);
 
+void nir_lower_viewport_transform(nir_shader *shader);
 bool nir_lower_uniforms_to_ubo(nir_shader *shader, int multiplier);
 
 typedef struct nir_lower_subgroups_options {
diff --git a/src/compiler/nir/nir_lower_viewport_transform.c b/src/compiler/nir/nir_lower_viewport_transform.c
new file mode 100644
index 00000000000..66085b8da5a
--- /dev/null
+++ b/src/compiler/nir/nir_lower_viewport_transform.c
@@ -0,0 +1,101 @@
+/*
+ * Copyright (C) 2019 Alyssa Rosenzweig
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+/* On some hardware (particularly, all current versions of Mali GPUs),
+ * vertex shaders do not output gl_Position in world-space. Instead, they
+ * output gl_Position in transformed screen space via the "pseudo"
+ * position varying. Thus, this pass finds writes to gl_Position and
+ * changes them to transformed writes, still to gl_Position. The
+ * outputted screen space is still written back to VARYING_SLOT_POS,
+ * which is semantically ambiguous but nevertheless a good match for
+ * Gallium/NIR/Mali.
+ *
+ * Implements coordinate transformation as defined in section 12.5
+ * "Coordinate Transformation" of the OpenGL ES 3.2 full specification.
+ *
+ * This pass must run before lower_vars/lower_io such that derefs are
+ * still in place.
+ */
+
+#include "nir/nir.h"
+#include "nir/nir_builder.h"
+
+void
+nir_lower_viewport_transform(nir_shader *shader)
+{
+   assert(shader->info.stage == MESA_SHADER_VERTEX);
+
+   nir_foreach_function(func, shader) {
+  nir_foreach_block(block, func->impl) {
+ nir_foreach_instr_safe(instr, block) {
+if (instr->type != nir_instr_type_intrinsic)
+   continue;
+
+nir_intrinsic_instr *intr = nir_instr_as_intrinsic(instr);
+if (intr->intrinsic != nir_intrinsic_store_deref)
+   continue;
+
+nir_variable *var = nir_intrinsic_get_var(intr, 0);
+if (var->data.location != VARYING_SLOT_POS)
+   continue;
+
+nir_builder b;
+nir_builder_init(&b, func->impl);
+b.cursor = nir_before_instr(instr);
+
+/* Grab the 
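
Editorial note on the (truncated) patch above: the coordinate transformation its header comment refers to is a perspective divide followed by the viewport scale and offset. A minimal sketch in plain C follows, assuming a viewport given as origin, size and depth range. This is not the NIR pass itself; the names `vec3`, `vec4`, `viewport` and `viewport_transform` are made up for this illustration.

```c
#include <assert.h>

struct vec3 { float x, y, z; };
struct vec4 { float x, y, z, w; };

/* Hypothetical viewport description: origin, size and depth range. */
struct viewport {
   float x, y, width, height;
   float near_val, far_val;
};

/* Clip space -> window (screen) space. */
static struct vec3
viewport_transform(struct vec4 clip, const struct viewport *vp)
{
   assert(clip.w != 0.0f);

   /* Perspective divide: clip space -> normalized device coordinates. */
   float ndc_x = clip.x / clip.w;
   float ndc_y = clip.y / clip.w;
   float ndc_z = clip.z / clip.w;

   /* Viewport scale and offset: NDC [-1, 1] -> window coordinates. */
   struct vec3 win;
   win.x = 0.5f * vp->width * ndc_x + (vp->x + 0.5f * vp->width);
   win.y = 0.5f * vp->height * ndc_y + (vp->y + 0.5f * vp->height);
   win.z = 0.5f * (vp->far_val - vp->near_val) * ndc_z +
           0.5f * (vp->far_val + vp->near_val);
   return win;
}
```

With an 800x600 viewport at the origin and a depth range of [0, 1], the clip-space point (1, -1, 0, 2) lands at window position (600, 150) with depth 0.5.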

Re: [Mesa-dev] [PATCH] android: fix LLVM version string related building errors

2019-04-12 Thread Mauro Rossi
Just a message to Eric,

as per our previous private thread,

I've checked that the Android build works,
but we use libLLVM70 name in library dependency.

Please adapt and apply the patch to the mesa dev branch
to fix the breakage.

Adding a \ before each " in the LLVM version string value, touching only
Android.mk, solves the problem for me.

Mauro

On Sat, Apr 13, 2019 at 1:27 AM Mauro Rossi  wrote:
>
> Fixes the following building errors:
>
> external/mesa/src/gallium/drivers/r600/r600_pipe_common.c:1290:14:
> error: expected ')'
>  ", LLVM " MESA_LLVM_VERSION_STRING
>^
> :8:34: note: expanded from here
>  ^
> external/mesa/src/gallium/drivers/r600/r600_pipe_common.c:1287:10:
> note: to match this '('
> snprintf(rscreen->renderer_string, sizeof(rscreen->renderer_string),
> ^
> 1 error generated.
>
> Fixes: 05b114e ("simplify LLVM version string printing")
> Signed-off-by: Mauro Rossi 
> ---
>  Android.mk | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/Android.mk b/Android.mk
> index 09139e86d1..b835eb64e9 100644
> --- a/Android.mk
> +++ b/Android.mk
> @@ -97,13 +97,13 @@ define mesa-build-with-llvm
>$(if $(filter $(MESA_ANDROID_MAJOR_VERSION), 4 5), \
>  $(warning Unsupported LLVM version in Android 
> $(MESA_ANDROID_MAJOR_VERSION)),) \
>$(if $(filter 6,$(MESA_ANDROID_MAJOR_VERSION)), \
> -$(eval LOCAL_CFLAGS += -DHAVE_LLVM=0x0307 
> -DMESA_LLVM_VERSION_STRING="3.7")) \
> +$(eval LOCAL_CFLAGS += -DHAVE_LLVM=0x0307 
> -DMESA_LLVM_VERSION_STRING=\"3.7\")) \
>$(if $(filter 7,$(MESA_ANDROID_MAJOR_VERSION)), \
> -$(eval LOCAL_CFLAGS += -DHAVE_LLVM=0x0700 
> -DMESA_LLVM_VERSION_STRING="7.0")) \
> +$(eval LOCAL_CFLAGS += -DHAVE_LLVM=0x0700 
> -DMESA_LLVM_VERSION_STRING=\"7.0\")) \
>$(if $(filter 8,$(MESA_ANDROID_MAJOR_VERSION)), \
> -$(eval LOCAL_CFLAGS += -DHAVE_LLVM=0x0700 
> -DMESA_LLVM_VERSION_STRING="7.0")) \
> +$(eval LOCAL_CFLAGS += -DHAVE_LLVM=0x0700 
> -DMESA_LLVM_VERSION_STRING=\"7.0\")) \
>$(if $(filter 9,$(MESA_ANDROID_MAJOR_VERSION)), \
> -$(eval LOCAL_CFLAGS += -DHAVE_LLVM=0x0309 
> -DMESA_LLVM_VERSION_STRING="3.9")) \
> +$(eval LOCAL_CFLAGS += -DHAVE_LLVM=0x0309 
> -DMESA_LLVM_VERSION_STRING=\"3.9\")) \
>$(eval LOCAL_SHARED_LIBRARIES += libLLVM70)
>  endef
>
> --
> 2.20.1
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] android: fix LLVM version string related building errors

2019-04-12 Thread Mauro Rossi
Fixes the following build errors:

external/mesa/src/gallium/drivers/r600/r600_pipe_common.c:1290:14:
error: expected ')'
 ", LLVM " MESA_LLVM_VERSION_STRING
   ^
:8:34: note: expanded from here
 ^
external/mesa/src/gallium/drivers/r600/r600_pipe_common.c:1287:10:
note: to match this '('
snprintf(rscreen->renderer_string, sizeof(rscreen->renderer_string),
^
1 error generated.

Fixes: 05b114e ("simplify LLVM version string printing")
Signed-off-by: Mauro Rossi 
---
 Android.mk | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/Android.mk b/Android.mk
index 09139e86d1..b835eb64e9 100644
--- a/Android.mk
+++ b/Android.mk
@@ -97,13 +97,13 @@ define mesa-build-with-llvm
   $(if $(filter $(MESA_ANDROID_MAJOR_VERSION), 4 5), \
 $(warning Unsupported LLVM version in Android $(MESA_ANDROID_MAJOR_VERSION)),) \
   $(if $(filter 6,$(MESA_ANDROID_MAJOR_VERSION)), \
-$(eval LOCAL_CFLAGS += -DHAVE_LLVM=0x0307 -DMESA_LLVM_VERSION_STRING="3.7")) \
+$(eval LOCAL_CFLAGS += -DHAVE_LLVM=0x0307 -DMESA_LLVM_VERSION_STRING=\"3.7\")) \
   $(if $(filter 7,$(MESA_ANDROID_MAJOR_VERSION)), \
-$(eval LOCAL_CFLAGS += -DHAVE_LLVM=0x0700 -DMESA_LLVM_VERSION_STRING="7.0")) \
+$(eval LOCAL_CFLAGS += -DHAVE_LLVM=0x0700 -DMESA_LLVM_VERSION_STRING=\"7.0\")) \
   $(if $(filter 8,$(MESA_ANDROID_MAJOR_VERSION)), \
-$(eval LOCAL_CFLAGS += -DHAVE_LLVM=0x0700 -DMESA_LLVM_VERSION_STRING="7.0")) \
+$(eval LOCAL_CFLAGS += -DHAVE_LLVM=0x0700 -DMESA_LLVM_VERSION_STRING=\"7.0\")) \
   $(if $(filter 9,$(MESA_ANDROID_MAJOR_VERSION)), \
-$(eval LOCAL_CFLAGS += -DHAVE_LLVM=0x0309 -DMESA_LLVM_VERSION_STRING="3.9")) \
+$(eval LOCAL_CFLAGS += -DHAVE_LLVM=0x0309 -DMESA_LLVM_VERSION_STRING=\"3.9\")) \
   $(eval LOCAL_SHARED_LIBRARIES += libLLVM70)
 endef
 
-- 
2.20.1
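
Editorial note on the patch above: the compile error is consistent with the macro expanding to a bare `7.0` token instead of a string literal, presumably because the unescaped quotes do not survive to the compiler command line. C only concatenates adjacent string literals, so the macro has to expand to `"7.0"` including the quotes. A minimal sketch, in which `format_renderer_string` is a hypothetical helper and the `#define` stands in for what the escaped `-D` flag provides:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for the escaped -D flag: the macro must expand to a quoted
 * string literal, not to the bare token 7.0. */
#ifndef MESA_LLVM_VERSION_STRING
#define MESA_LLVM_VERSION_STRING "7.0"
#endif

/* Hypothetical helper, loosely modelled on the renderer string code. */
static void
format_renderer_string(char *buf, size_t size, const char *chip)
{
   assert(size > 0);

   /* Adjacent string literals are concatenated at compile time, so this
    * only compiles when the macro is itself a string literal. */
   snprintf(buf, size, "%s (LLVM " MESA_LLVM_VERSION_STRING ")", chip);
}
```

If the macro were defined as plain `7.0`, the format argument would read `"%s (LLVM " 7.0 ")"`, which is not a valid expression and produces an error of the kind quoted in the commit message.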


Re: [Mesa-dev] [PATCH] radeonsi: use CP DMA for the null const buffer clear on CIK

2019-04-12 Thread Marek Olšák
Done locally.

Marek

On Fri, Apr 12, 2019 at 12:20 PM Samuel Pitoiset 
wrote:

> I would suggest to document that workaround somewhere in the code.
>
> On 4/12/19 5:17 PM, Marek Olšák wrote:
> > From: Marek Olšák 
> >
> > This is a workaround for a thread deadlock that I have no idea
> > why it occurs.
> >
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108879
> > Fixes: 9b331e462e5021d994859756d46cd2519d9c9c6e
> > ---
> >   src/gallium/drivers/radeonsi/si_clear.c| 6 +++---
> >   src/gallium/drivers/radeonsi/si_compute_blit.c | 8 +---
> >   src/gallium/drivers/radeonsi/si_pipe.c | 2 +-
> >   src/gallium/drivers/radeonsi/si_pipe.h | 3 ++-
> >   src/gallium/drivers/radeonsi/si_test_dma.c | 2 +-
> >   5 files changed, 12 insertions(+), 9 deletions(-)
> >
> > diff --git a/src/gallium/drivers/radeonsi/si_clear.c
> b/src/gallium/drivers/radeonsi/si_clear.c
> > index e1805f2a1c9..ead680b857b 100644
> > --- a/src/gallium/drivers/radeonsi/si_clear.c
> > +++ b/src/gallium/drivers/radeonsi/si_clear.c
> > @@ -256,21 +256,21 @@ void vi_dcc_clear_level(struct si_context *sctx,
> >* would be more efficient than separate per-layer clear
> operations.
> >*/
> >   assert(tex->buffer.b.b.nr_storage_samples <= 2 ||
> num_layers == 1);
> >
> >   dcc_offset +=
> tex->surface.u.legacy.level[level].dcc_offset;
> >   clear_size =
> tex->surface.u.legacy.level[level].dcc_fast_clear_size *
> >num_layers;
> >   }
> >
> >   si_clear_buffer(sctx, dcc_buffer, dcc_offset, clear_size,
> > - &clear_value, 4, SI_COHERENCY_CB_META);
> > + &clear_value, 4, SI_COHERENCY_CB_META, false);
> >   }
> >
> >   /* Set the same micro tile mode as the destination of the last MSAA
> resolve.
> >* This allows hitting the MSAA resolve fast path, which requires that
> both
> >* src and dst micro tile modes match.
> >*/
> >   static void si_set_optimal_micro_tile_mode(struct si_screen *sscreen,
> >  struct si_texture *tex)
> >   {
> >   if (tex->buffer.b.is_shared ||
> > @@ -489,21 +489,21 @@ static void si_do_fast_color_clear(struct
> si_context *sctx,
> >
> >   /* DCC fast clear with MSAA should clear CMASK to
> 0xC. */
> >   if (tex->buffer.b.b.nr_samples >= 2 &&
> tex->cmask_buffer) {
> >   /* TODO: This doesn't work with MSAA. */
> >   if (eliminate_needed)
> >   continue;
> >
> >   uint32_t clear_value = 0x;
> >   si_clear_buffer(sctx,
> >   &tex->cmask_buffer->b.b,
> >   tex->cmask_offset,
> tex->surface.cmask_size,
> > - &clear_value, 4,
> SI_COHERENCY_CB_META);
> > + &clear_value, 4,
> SI_COHERENCY_CB_META, false);
> >   fmask_decompress_needed = true;
> >   }
> >
> >   vi_dcc_clear_level(sctx, tex, 0, reset_value);
> >   tex->separate_dcc_dirty = true;
> >   } else {
> >   if (too_small)
> >   continue;
> >
> >   /* 128-bit formats are unusupported */
> > @@ -517,21 +517,21 @@ static void si_do_fast_color_clear(struct
> si_context *sctx,
> >
> >   /* ensure CMASK is enabled */
> >   si_alloc_separate_cmask(sctx->screen, tex);
> >   if (!tex->cmask_buffer)
> >   continue;
> >
> >   /* Do the fast clear. */
> >   uint32_t clear_value = 0;
> >   si_clear_buffer(sctx, &tex->cmask_buffer->b.b,
> >   tex->cmask_offset,
> tex->surface.cmask_size,
> > - &clear_value, 4,
> SI_COHERENCY_CB_META);
> > + &clear_value, 4,
> SI_COHERENCY_CB_META, false);
> >   eliminate_needed = true;
> >   }
> >
> >   if ((eliminate_needed || fmask_decompress_needed) &&
> >   !(tex->dirty_level_mask & (1 << level))) {
> >   tex->dirty_level_mask |= 1 << level;
> >
>  p_atomic_inc(&sctx->screen->compressed_colortex_counter);
> >   }
> >
> >   /* We can change the micro tile mode before a full clear.
> */
> > diff --git a/src/gallium/drivers/radeonsi/si_compute_blit.c
> b/src/gallium/drivers/radeonsi/si_compute_blit.c
> > index 1abeac6adb0..fb0d8d2f1b6 100644
> > --- a/src/gallium/drivers/radeonsi/si_compute_blit.c
> > +++ b/src/gallium/drivers/radeonsi/si_compute_blit.c
> > @@ -179,21 +179,22 @@ static void si_compute_do_clear_or_copy(struct
> 

[Mesa-dev] [PATCH 1/8] etnaviv: create optional 2d pipe in screen

2019-04-12 Thread Lucas Stach
The 2D pipe is useful to implement fast planar and interleaved YUV buffer
imports. Not all systems have a 2D capable core, so this is strictly
optional.

Signed-off-by: Lucas Stach 
---
 src/gallium/drivers/etnaviv/etnaviv_context.c |  6 ++
 src/gallium/drivers/etnaviv/etnaviv_context.h |  1 +
 src/gallium/drivers/etnaviv/etnaviv_screen.c  | 68 +++
 src/gallium/drivers/etnaviv/etnaviv_screen.h  |  6 ++
 4 files changed, 81 insertions(+)

diff --git a/src/gallium/drivers/etnaviv/etnaviv_context.c b/src/gallium/drivers/etnaviv/etnaviv_context.c
index a59338490b62..631f551d0ad4 100644
--- a/src/gallium/drivers/etnaviv/etnaviv_context.c
+++ b/src/gallium/drivers/etnaviv/etnaviv_context.c
@@ -78,6 +78,9 @@ etna_context_destroy(struct pipe_context *pctx)
if (ctx->stream)
   etna_cmd_stream_del(ctx->stream);
 
+   if (ctx->stream2d)
+  etna_cmd_stream_del(ctx->stream2d);
+
slab_destroy_child(&ctx->transfer_pool);
 
if (ctx->in_fence_fd != -1)
@@ -434,6 +437,9 @@ etna_context_create(struct pipe_screen *pscreen, void *priv, unsigned flags)
if (ctx->stream == NULL)
   goto fail;
 
+   if (screen->pipe2d)
+  ctx->stream2d = etna_cmd_stream_new(screen->pipe2d, 0x1000, NULL, NULL);
+
/* context ctxate setup */
ctx->specs = screen->specs;
ctx->screen = screen;
diff --git a/src/gallium/drivers/etnaviv/etnaviv_context.h b/src/gallium/drivers/etnaviv/etnaviv_context.h
index a79d739100d9..2c6e5d6c3db1 100644
--- a/src/gallium/drivers/etnaviv/etnaviv_context.h
+++ b/src/gallium/drivers/etnaviv/etnaviv_context.h
@@ -110,6 +110,7 @@ struct etna_context {
struct etna_specs specs;
struct etna_screen *screen;
struct etna_cmd_stream *stream;
+   struct etna_cmd_stream *stream2d;
 
/* which state objects need to be re-emit'd: */
enum {
diff --git a/src/gallium/drivers/etnaviv/etnaviv_screen.c b/src/gallium/drivers/etnaviv/etnaviv_screen.c
index 62b6f1c80fae..0dea6056c75a 100644
--- a/src/gallium/drivers/etnaviv/etnaviv_screen.c
+++ b/src/gallium/drivers/etnaviv/etnaviv_screen.c
@@ -95,6 +95,12 @@ etna_screen_destroy(struct pipe_screen *pscreen)
if (screen->gpu)
   etna_gpu_del(screen->gpu);
 
+   if (screen->pipe2d)
+  etna_pipe_del(screen->pipe2d);
+
+   if (screen->gpu2d)
+  etna_gpu_del(screen->gpu2d);
+
if (screen->ro)
   FREE(screen->ro);
 
@@ -891,6 +897,66 @@ etna_screen_bo_from_handle(struct pipe_screen *pscreen,
return bo;
 }
 
+static void etna_screen_init_2d(struct etna_screen *screen)
+{
+   struct etna_gpu *gpu2d = NULL;
+   uint64_t val;
+   int ret, i;
+
+   /* If the current GPU is a combined 2d/3D core, use it as 2D engine */
+   if (screen->features[0] & chipFeatures_PIPE_2D)
+  gpu2d = screen->gpu;
+
+   /* otherwise search for a 2D capable core */
+   if (!gpu2d) {
+  for (i = 0;; i++) {
+ gpu2d = etna_gpu_new(screen->dev, i);
+ if (!gpu2d)
+return;
+
+ ret = etna_gpu_get_param(gpu2d, ETNA_GPU_FEATURES_0, &val);
+ if (!ret && (val & chipFeatures_PIPE_2D)) {
+screen->gpu2d = gpu2d;
+break;
+ }
+
+ etna_gpu_del(gpu2d);
+  }
+   }
+
+   if (etna_gpu_get_param(screen->gpu2d, ETNA_GPU_FEATURES_0, &val))
+  return;
+   screen->features2d[0] = val;
+
+   if (etna_gpu_get_param(screen->gpu2d, ETNA_GPU_FEATURES_1, &val))
+  return;
+   screen->features2d[1] = val;
+
+   if (etna_gpu_get_param(screen->gpu2d, ETNA_GPU_FEATURES_2, &val))
+  return;
+   screen->features2d[2] = val;
+
+   if (etna_gpu_get_param(screen->gpu2d, ETNA_GPU_FEATURES_3, &val))
+  return;
+   screen->features2d[3] = val;
+
+   if (etna_gpu_get_param(screen->gpu2d, ETNA_GPU_FEATURES_4, &val))
+  return;
+   screen->features2d[4] = val;
+
+   if (etna_gpu_get_param(screen->gpu2d, ETNA_GPU_FEATURES_5, &val))
+  return;
+   screen->features2d[5] = val;
+
+   if (etna_gpu_get_param(screen->gpu2d, ETNA_GPU_FEATURES_6, &val))
+  return;
+   screen->features2d[6] = val;
+
+   screen->pipe2d = etna_pipe_new(gpu2d, ETNA_PIPE_2D);
+   if (!screen->pipe2d)
+  DBG("could not create 2d pipe");
+}
+
 struct pipe_screen *
 etna_screen_create(struct etna_device *dev, struct etna_gpu *gpu,
struct renderonly *ro)
@@ -984,6 +1050,8 @@ etna_screen_create(struct etna_device *dev, struct etna_gpu *gpu,
}
screen->features[6] = val;
 
+   etna_screen_init_2d(screen);
+
if (!etna_get_specs(screen))
   goto fail;
 
diff --git a/src/gallium/drivers/etnaviv/etnaviv_screen.h b/src/gallium/drivers/etnaviv/etnaviv_screen.h
index 9757985526ec..82733a379430 100644
--- a/src/gallium/drivers/etnaviv/etnaviv_screen.h
+++ b/src/gallium/drivers/etnaviv/etnaviv_screen.h
@@ -60,6 +60,9 @@ enum viv_features_word {
 #define VIV_FEATURE(screen, word, feature) \
((screen->features[viv_ ## word] & (word ## _ ## feature)) != 0)
 
+#define VIV_2D_FEATURE(screen, word, feature) \
+   ((screen->features2d[viv_ ## word] & (word ## _ ## feature)) != 0)
+
 

[Mesa-dev] [PATCH 7/8] etnaviv: improve PIPE_BIND_LINEAR handling

2019-04-12 Thread Lucas Stach
We weren't handling this flag at all, which broke some assumptions
made by the users of the resource_create interface. As we can't render
to a linear surface and the usefulness of yet another layout transition
to handle this case seems limited, we only respect the flag when the
resource isn't used for rendering.

Signed-off-by: Lucas Stach 
---
 src/gallium/drivers/etnaviv/etnaviv_resource.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/src/gallium/drivers/etnaviv/etnaviv_resource.c b/src/gallium/drivers/etnaviv/etnaviv_resource.c
index f405b880a6c0..650c8e7eb7f5 100644
--- a/src/gallium/drivers/etnaviv/etnaviv_resource.c
+++ b/src/gallium/drivers/etnaviv/etnaviv_resource.c
@@ -369,6 +369,14 @@ etna_resource_create(struct pipe_screen *pscreen,
if (templat->target == PIPE_TEXTURE_3D)
   layout = ETNA_LAYOUT_LINEAR;
 
+   /* The render pipe can't handle linear and there is no code to do yet another
+* layout transformation for this case, so we only respect the linear flag
+* if the resource isn't meant to be rendered.
+*/
+   if ((templat->bind & PIPE_BIND_LINEAR) &&
+   !(templat->bind & PIPE_BIND_RENDER_TARGET))
+  layout = ETNA_LAYOUT_LINEAR;
+
/* modifier is only used for scanout surfaces, so safe to use LINEAR here */
return etna_resource_alloc(pscreen, layout, mode, DRM_FORMAT_MOD_LINEAR, templat);
 }
-- 
2.20.1
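
Editorial note on the patch above: the rule it adds is that the linear flag is honoured only when the resource is not also a render target. A minimal standalone sketch follows; the flag values, the `layout` enum and `pick_layout` are hypothetical stand-ins, not the Gallium or etnaviv definitions.

```c
/* Hypothetical stand-ins for the Gallium bind flags and etnaviv layouts. */
enum {
   BIND_RENDER_TARGET = 1 << 0,
   BIND_LINEAR        = 1 << 1,
};

enum layout {
   LAYOUT_TILED,
   LAYOUT_LINEAR,
};

/* Honour the linear request only for resources that are not rendered to,
 * since the render pipe cannot handle linear surfaces. */
static enum layout
pick_layout(unsigned bind, enum layout default_layout)
{
   if ((bind & BIND_LINEAR) && !(bind & BIND_RENDER_TARGET))
      return LAYOUT_LINEAR;

   return default_layout;
}
```

A resource requested with both flags keeps whatever layout was already chosen, which is the trade-off the commit message describes.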


[Mesa-dev] [PATCH 2/8] etnaviv: clear out next pointer when allocating resource

2019-04-12 Thread Lucas Stach
We copy the template resource content into the newly allocated resource.
If the template is derived from a planar resource, this leads to a copy of the
next resource pointer that is not reference counted. Make sure to clear this out when
allocating a new resource.

Signed-off-by: Lucas Stach 
---
 src/gallium/drivers/etnaviv/etnaviv_resource.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/gallium/drivers/etnaviv/etnaviv_resource.c b/src/gallium/drivers/etnaviv/etnaviv_resource.c
index 83179d3cd088..77d027ac806b 100644
--- a/src/gallium/drivers/etnaviv/etnaviv_resource.c
+++ b/src/gallium/drivers/etnaviv/etnaviv_resource.c
@@ -274,6 +274,7 @@ etna_resource_alloc(struct pipe_screen *pscreen, unsigned layout,
   return NULL;
 
rsc->base = *templat;
+   rsc->base.next = NULL;
rsc->base.screen = pscreen;
rsc->base.nr_samples = nr_samples;
rsc->layout = layout;
-- 
2.20.1
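
Editorial note on the patch above: a C struct assignment copies pointer members verbatim, so the new resource would alias the template's `next` plane without holding a reference to it. A minimal sketch of the hazard and the fix, using a simplified hypothetical `resource` struct rather than `pipe_resource`:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical stand-in for a reference-counted resource. */
struct resource {
   int refcount;
   struct resource *next; /* next plane of a planar resource, if any */
   unsigned width, height;
};

static void
resource_init_from_template(struct resource *dst,
                            const struct resource *templat)
{
   assert(dst && templat);

   /* The struct copy also copies the raw next pointer ... */
   *dst = *templat;

   /* ... but no reference was taken on it, so drop the alias. */
   dst->next = NULL;
   dst->refcount = 1;
}
```

Without the reset, releasing the copy could drop a reference on a plane it never owned.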


[Mesa-dev] [PATCH 6/8] etnaviv: use filter blit for 2D YUV import on old GC320

2019-04-12 Thread Lucas Stach
The GC320 without the 2D tiling feature supports neither regular blits
with YUV input nor tiled output. On those cores we need to do a filter
blit for the YUV->RGB conversion into a temporary linear buffer, then do
a tiling blit into the texture buffer using the RS engine on the 3D core.

This is not the most efficient path, but it at least gives us the same
level of functionality as on the newer GC320 cores and looks the same to
the application.

Signed-off-by: Lucas Stach 
---
 src/gallium/drivers/etnaviv/etnaviv_2d.c | 198 ---
 1 file changed, 180 insertions(+), 18 deletions(-)

diff --git a/src/gallium/drivers/etnaviv/etnaviv_2d.c b/src/gallium/drivers/etnaviv/etnaviv_2d.c
index 457fa4e0cbd0..31b6bf4313dd 100644
--- a/src/gallium/drivers/etnaviv/etnaviv_2d.c
+++ b/src/gallium/drivers/etnaviv/etnaviv_2d.c
@@ -25,13 +25,16 @@
 #include "etnaviv_context.h"
 #include "etnaviv_emit.h"
 #include "etnaviv_screen.h"
+#include "etnaviv_rs.h"
 
 #include "pipe/p_state.h"
 #include "util/u_format.h"
 
 #include "hw/state_2d.xml.h"
+#include "hw/common.xml.h"
 
 #include 
+#include 
 
 #define EMIT_STATE(state_name, src_value) \
etna_coalsence_emit(stream, &coalesce, VIVS_##state_name, src_value)
@@ -39,15 +42,85 @@
 #define EMIT_STATE_RELOC(state_name, src_value) \
etna_coalsence_emit_reloc(stream, &coalesce, VIVS_##state_name, src_value)
 
+/* stolen from xf86-video-armada */
+#define KERNEL_ROWS 17
+#define KERNEL_INDICES  9
+#define KERNEL_SIZE (KERNEL_ROWS * KERNEL_INDICES)
+#define KERNEL_STATE_SZ ((KERNEL_SIZE + 1) / 2)
+
+static bool filter_kernel_initialized;
+static uint32_t filter_kernel[KERNEL_STATE_SZ];
+
+static inline float
+sinc (float x)
+{
+  return x != 0.0 ? sinf (x) / x : 1.0;
+}
+
+static void
+etnaviv_init_filter_kernel(void)
+{
+   unsigned row, idx, i;
+   int16_t kernel_val[KERNEL_STATE_SZ * 2];
+   float row_ofs = 0.5;
+   float radius = 4.0;
+
+   /* Compute lanczos filter kernel */
+   for (row = i = 0; row < KERNEL_ROWS; row++) {
+  float kernel[KERNEL_INDICES] = { 0.0 };
+  float sum = 0.0;
+
+  for (idx = 0; idx < KERNEL_INDICES; idx++) {
+ float x = idx - 4.0 + row_ofs;
+
+ if (fabs (x) <= radius)
+kernel[idx] = sinc (M_PI * x) * sinc (M_PI * x / radius);
+
+ sum += kernel[idx];
+   }
+
+   /* normalise the row */
+   if (sum)
+  for (idx = 0; idx < KERNEL_INDICES; idx++)
+ kernel[idx] /= sum;
+
+   /* convert to 1.14 format */
+   for (idx = 0; idx < KERNEL_INDICES; idx++) {
+  int val = kernel[idx] * (float) (1 << 14);
+
+  if (val < -0x8000)
+ val = -0x8000;
+  else if (val > 0x7fff)
+ val = 0x7fff;
+
+  kernel_val[i++] = val;
+   }
+
+   row_ofs -= 1.0 / ((KERNEL_ROWS - 1) * 2);
+   }
+
+   kernel_val[KERNEL_SIZE] = 0;
+
+   /* Now convert the kernel values into state values */
+   for (i = 0; i < KERNEL_STATE_SZ * 2; i += 2)
+  filter_kernel[i / 2] =
+ VIVS_DE_FILTER_KERNEL_COEFFICIENT0 (kernel_val[i]) |
+ VIVS_DE_FILTER_KERNEL_COEFFICIENT1 (kernel_val[i + 1]);
+}
+
 bool etna_try_2d_blit(struct pipe_context *pctx,
   const struct pipe_blit_info *blit_info)
 {
struct etna_context *ctx = etna_context(pctx);
+   struct etna_screen *screen = ctx->screen;
struct etna_cmd_stream *stream = ctx->stream2d;
struct etna_coalesce coalesce;
struct etna_reloc ry, ru, rv, rdst;
struct pipe_resource *res_y, *res_u, *res_v, *res_dst;
+   struct etna_bo *temp_bo = NULL;
uint32_t src_format;
+   bool ext_blt = VIV_2D_FEATURE(screen, chipMinorFeatures2, 2D_TILING);
+   uint32_t dst_config;
 
assert(util_format_is_yuv(blit_info->src.format));
assert(blit_info->dst.format == PIPE_FORMAT_R8G8B8A8_UNORM);
@@ -55,6 +128,11 @@ bool etna_try_2d_blit(struct pipe_context *pctx,
if (!stream)
   return FALSE;
 
+  if (unlikely(!ext_blt && !filter_kernel_initialized)) {
+  etnaviv_init_filter_kernel();
+  filter_kernel_initialized = true;
+  }
+
switch (blit_info->src.format) {
case PIPE_FORMAT_NV12:
   src_format = DE_FORMAT_NV12;
@@ -66,6 +144,18 @@ bool etna_try_2d_blit(struct pipe_context *pctx,
   return FALSE;
}
 
+   res_dst = blit_info->dst.resource;
+
+   if (!ext_blt && etna_resource(res_dst)->layout != ETNA_LAYOUT_LINEAR) {
+  struct etna_resource *dst = etna_resource(blit_info->dst.resource);
+  unsigned int bo_size = dst->levels[blit_info->dst.level].stride *
+ dst->levels[blit_info->dst.level].padded_height;
+
+  temp_bo = etna_bo_new(screen->dev, bo_size, DRM_ETNA_GEM_CACHE_WC);
+  if (!temp_bo)
+ return FALSE;
+   }
+
res_y = blit_info->src.resource;
res_u = res_y->next ? res_y->next : res_y;
res_v = res_u->next ? res_u->next : res_u;
@@ -79,8 +169,7 @@ bool etna_try_2d_blit(struct pipe_context *pctx,
 
ry.flags = ru.flags = rv.flags = 
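
Editorial note on the (truncated) patch above: the kernel setup converts each normalised Lanczos weight to signed 1.14 fixed point and clamps it to the 16-bit range. That conversion step is sketched below on its own; `to_fixed_1_14` is a hypothetical helper name, and the clamping limits are the ones used in the patch.

```c
#include <stdint.h>

/* Convert a float coefficient to signed 1.14 fixed point (14 fractional
 * bits), clamping to the range of a 16-bit signed value. */
static int16_t
to_fixed_1_14(float coefficient)
{
   int val = (int)(coefficient * (float)(1 << 14));

   if (val < -0x8000)
      val = -0x8000;
   else if (val > 0x7fff)
      val = 0x7fff;

   return (int16_t)val;
}
```

In this format 1.0 maps to 16384, so weights of magnitude 2.0 or more saturate; normalised kernel rows stay well inside that range.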

[Mesa-dev] [PATCH 8/8] etnaviv: handle YUV textures with the 2D GPU

2019-04-12 Thread Lucas Stach
This allows color space conversion and tiling in a single step, as
well as handling multi-planar formats like NV12, which are really
useful when dealing with hardware video decoders.

Signed-off-by: Lucas Stach 
---
 .../drivers/etnaviv/etnaviv_clear_blit.c  |  2 +-
 src/gallium/drivers/etnaviv/etnaviv_format.c  |  5 +++-
 .../drivers/etnaviv/etnaviv_resource.c| 16 
 src/gallium/drivers/etnaviv/etnaviv_rs.c  |  5 
 src/gallium/drivers/etnaviv/etnaviv_screen.c  |  5 +++-
 src/gallium/drivers/etnaviv/etnaviv_texture.c | 25 ---
 6 files changed, 46 insertions(+), 12 deletions(-)

diff --git a/src/gallium/drivers/etnaviv/etnaviv_clear_blit.c b/src/gallium/drivers/etnaviv/etnaviv_clear_blit.c
index 45c30cbf5076..5214162d8798 100644
--- a/src/gallium/drivers/etnaviv/etnaviv_clear_blit.c
+++ b/src/gallium/drivers/etnaviv/etnaviv_clear_blit.c
@@ -159,7 +159,7 @@ etna_copy_resource(struct pipe_context *pctx, struct pipe_resource *dst,
struct etna_resource *src_priv = etna_resource(src);
struct etna_resource *dst_priv = etna_resource(dst);
 
-   assert(src->format == dst->format);
+   assert(src->format == dst->format || util_format_is_yuv(src->format));
assert(src->array_size == dst->array_size);
assert(last_level <= dst->last_level && last_level <= src->last_level);
 
diff --git a/src/gallium/drivers/etnaviv/etnaviv_format.c b/src/gallium/drivers/etnaviv/etnaviv_format.c
index 29e81c4a8b04..0879ddd6a6c8 100644
--- a/src/gallium/drivers/etnaviv/etnaviv_format.c
+++ b/src/gallium/drivers/etnaviv/etnaviv_format.c
@@ -282,8 +282,11 @@ static struct etna_format formats[PIPE_FORMAT_COUNT] = {
_T(ASTC_12x12_SRGB, ASTC_SRGB8_ALPHA8_12x12 | ASTC_FORMAT,  SWIZ(X, Y, Z, W), NONE, NONE),
 
/* YUV */
-   _T(YUYV, YUY2, SWIZ(X, Y, Z, W), YUY2, NONE),
+   _T(YUYV, X8B8G8R8, SWIZ(X, Y, Z, W), NONE, NONE),
_T(UYVY, UYVY, SWIZ(X, Y, Z, W), NONE, NONE),
+
+   /* multi-planar YUV */
+   _T(NV12, X8B8G8R8, SWIZ(X, Y, Z, W), NONE, NONE),
 };
 
 uint32_t
diff --git a/src/gallium/drivers/etnaviv/etnaviv_resource.c b/src/gallium/drivers/etnaviv/etnaviv_resource.c
index 650c8e7eb7f5..5ba3eba5bd33 100644
--- a/src/gallium/drivers/etnaviv/etnaviv_resource.c
+++ b/src/gallium/drivers/etnaviv/etnaviv_resource.c
@@ -560,6 +560,17 @@ etna_resource_from_handle(struct pipe_screen *pscreen,
   
level->padded_height);
level->size = level->layer_stride;
 
+   rsc->pending_ctx = _mesa_set_create(NULL, _mesa_hash_pointer,
+   _mesa_key_pointer_equal);
+   if (!rsc->pending_ctx)
+  goto fail;
+
+   /* YUV resources are handled by the 2D GPU, so the below constraint checks
+* are invalid.
+*/
+   if (util_format_is_yuv(tmpl->format))
+  return prsc;
+
/* The DDX must give us a BO which conforms to our padding size.
 * The stride of the BO must be greater or equal to our padded
 * stride. The size of the BO must accomodate the padded height. */
@@ -576,11 +587,6 @@ etna_resource_from_handle(struct pipe_screen *pscreen,
   goto fail;
}
 
-   rsc->pending_ctx = _mesa_set_create(NULL, _mesa_hash_pointer,
-   _mesa_key_pointer_equal);
-   if (!rsc->pending_ctx)
-  goto fail;
-
if (rsc->layout == ETNA_LAYOUT_LINEAR) {
   /*
* Both sampler and pixel pipes can't handle linear, create a compatible
diff --git a/src/gallium/drivers/etnaviv/etnaviv_rs.c 
b/src/gallium/drivers/etnaviv/etnaviv_rs.c
index fcc2342aedc3..22d07d8f9726 100644
--- a/src/gallium/drivers/etnaviv/etnaviv_rs.c
+++ b/src/gallium/drivers/etnaviv/etnaviv_rs.c
@@ -26,6 +26,7 @@
 
 #include "etnaviv_rs.h"
 
+#include "etnaviv_2d.h"
 #include "etnaviv_clear_blit.h"
 #include "etnaviv_context.h"
 #include "etnaviv_emit.h"
@@ -775,6 +776,10 @@ etna_blit_rs(struct pipe_context *pctx, const struct pipe_blit_info *blit_info)
   return;
}
 
+   if (util_format_is_yuv(blit_info->src.format) &&
+   etna_try_2d_blit(pctx, blit_info))
+  return;
+
if (etna_try_rs_blit(pctx, blit_info))
   return;
 
diff --git a/src/gallium/drivers/etnaviv/etnaviv_screen.c b/src/gallium/drivers/etnaviv/etnaviv_screen.c
index 0dea6056c75a..b0630e27b507 100644
--- a/src/gallium/drivers/etnaviv/etnaviv_screen.c
+++ b/src/gallium/drivers/etnaviv/etnaviv_screen.c
@@ -535,6 +535,9 @@ gpu_supports_texure_format(struct etna_screen *screen, uint32_t fmt,
   supported = screen->specs.tex_astc;
}
 
+   if (util_format_is_yuv(format))
+  supported = !!screen->gpu2d;
+
if (!supported)
   return false;
 
@@ -658,7 +661,7 @@ etna_screen_query_dmabuf_modifiers(struct pipe_screen *pscreen,
   if (modifiers)
  modifiers[num_modifiers] = supported_modifiers[i];
   if (external_only)
- external_only[num_modifiers] = util_format_is_yuv(format) ? 1 : 0;
+ external_only[num_modifiers] = 0;
  

[Mesa-dev] [PATCH 3/8] etnaviv: remember data offset into BO

2019-04-12 Thread Lucas Stach
Imported resources might not start at offset 0 into the buffer object.
Make sure to remember the offset that is provided with the handle on
import.

Signed-off-by: Lucas Stach 
---
 src/gallium/drivers/etnaviv/etnaviv_resource.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/gallium/drivers/etnaviv/etnaviv_resource.c b/src/gallium/drivers/etnaviv/etnaviv_resource.c
index 77d027ac806b..f405b880a6c0 100644
--- a/src/gallium/drivers/etnaviv/etnaviv_resource.c
+++ b/src/gallium/drivers/etnaviv/etnaviv_resource.c
@@ -535,6 +535,7 @@ etna_resource_from_handle(struct pipe_screen *pscreen,
 
level->width = tmpl->width0;
level->height = tmpl->height0;
+   level->offset = handle->offset;
 
/* Determine padding of the imported resource. */
unsigned paddingX = 0, paddingY = 0;
-- 
2.20.1


[Mesa-dev] [PATCH 4/8] etnaviv: add 2D GPU YUV->RGB blitter

2019-04-12 Thread Lucas Stach
This adds a blit path using the 2D GPU for a linear YUV to tiled RGB
blit. This allows importing planar YUV textures with a single copy.

Signed-off-by: Lucas Stach 
---
 src/gallium/drivers/etnaviv/Makefile.sources  |2 +
 src/gallium/drivers/etnaviv/etnaviv_2d.c  |  164 ++
 src/gallium/drivers/etnaviv/etnaviv_2d.h  |   37 +
 src/gallium/drivers/etnaviv/hw/state_2d.xml.h | 1499 +
 src/gallium/drivers/etnaviv/meson.build   |3 +
 5 files changed, 1705 insertions(+)
 create mode 100644 src/gallium/drivers/etnaviv/etnaviv_2d.c
 create mode 100644 src/gallium/drivers/etnaviv/etnaviv_2d.h
 create mode 100644 src/gallium/drivers/etnaviv/hw/state_2d.xml.h

diff --git a/src/gallium/drivers/etnaviv/Makefile.sources b/src/gallium/drivers/etnaviv/Makefile.sources
index 01e7e49a38ad..36dd7d1b6aa4 100644
--- a/src/gallium/drivers/etnaviv/Makefile.sources
+++ b/src/gallium/drivers/etnaviv/Makefile.sources
@@ -3,11 +3,13 @@ C_SOURCES :=  \
hw/common.xml.h \
hw/common_3d.xml.h \
hw/isa.xml.h \
+   hw/state_2d.xml.h \
hw/state_3d.xml.h \
hw/state_blt.xml.h \
hw/state.xml.h \
hw/texdesc_3d.xml.h \
\
+   etnaviv_2d.c \
etnaviv_asm.c \
etnaviv_asm.h \
etnaviv_blend.c \
diff --git a/src/gallium/drivers/etnaviv/etnaviv_2d.c b/src/gallium/drivers/etnaviv/etnaviv_2d.c
new file mode 100644
index ..457fa4e0cbd0
--- /dev/null
+++ b/src/gallium/drivers/etnaviv/etnaviv_2d.c
@@ -0,0 +1,164 @@
+/*
+ * Copyright (c) 2018 Etnaviv Project
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sub license,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the
+ * next paragraph) shall be included in all copies or substantial portions
+ * of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#include "etnaviv_2d.h"
+#include "etnaviv_context.h"
+#include "etnaviv_emit.h"
+#include "etnaviv_screen.h"
+
+#include "pipe/p_state.h"
+#include "util/u_format.h"
+
+#include "hw/state_2d.xml.h"
+
+#include 
+
+#define EMIT_STATE(state_name, src_value) \
+   etna_coalsence_emit(stream, &coalesce, VIVS_##state_name, src_value)
+
+#define EMIT_STATE_RELOC(state_name, src_value) \
+   etna_coalsence_emit_reloc(stream, &coalesce, VIVS_##state_name, src_value)
+
+bool etna_try_2d_blit(struct pipe_context *pctx,
+  const struct pipe_blit_info *blit_info)
+{
+   struct etna_context *ctx = etna_context(pctx);
+   struct etna_cmd_stream *stream = ctx->stream2d;
+   struct etna_coalesce coalesce;
+   struct etna_reloc ry, ru, rv, rdst;
+   struct pipe_resource *res_y, *res_u, *res_v, *res_dst;
+   uint32_t src_format;
+
+   assert(util_format_is_yuv(blit_info->src.format));
+   assert(blit_info->dst.format == PIPE_FORMAT_R8G8B8A8_UNORM);
+
+   if (!stream)
+  return FALSE;
+
+   switch (blit_info->src.format) {
+   case PIPE_FORMAT_NV12:
+  src_format = DE_FORMAT_NV12;
+  break;
+   case PIPE_FORMAT_YUYV:
+  src_format = DE_FORMAT_YUY2;
+  break;
+   default:
+  return FALSE;
+   }
+
+   res_y = blit_info->src.resource;
+   res_u = res_y->next ? res_y->next : res_y;
+   res_v = res_u->next ? res_u->next : res_u;
+
+   ry.bo = etna_resource(res_y)->bo;
+   ry.offset = etna_resource(res_y)->levels[blit_info->src.level].offset;
+   ru.bo = etna_resource(res_u)->bo;
+   ru.offset = etna_resource(res_u)->levels[blit_info->src.level].offset;
+   rv.bo = etna_resource(res_v)->bo;
+   rv.offset = etna_resource(res_v)->levels[blit_info->src.level].offset;
+
+   ry.flags = ru.flags = rv.flags = ETNA_RELOC_READ;
+
+   res_dst = blit_info->dst.resource;
+   rdst.bo = etna_resource(res_dst)->bo;
+   rdst.flags = ETNA_RELOC_WRITE;
+   rdst.offset = 0;
+
+   etna_coalesce_start(stream, &coalesce);
+
+   EMIT_STATE_RELOC(DE_SRC_ADDRESS, &ry);
+   EMIT_STATE(DE_SRC_STRIDE, etna_resource(res_y)->levels[0].stride);
+
+   EMIT_STATE_RELOC(DE_UPLANE_ADDRESS, &ru);
+   EMIT_STATE(DE_UPLANE_STRIDE, etna_resource(res_u)->levels[0].stride);
+   EMIT_STATE_RELOC(DE_VPLANE_ADDRESS, &rv);
+   EMIT_STATE(DE_VPLANE_STRIDE, 

[Mesa-dev] [PATCH 5/8] etnaviv: export etna_submit_rs_state

2019-04-12 Thread Lucas Stach
The new 2D YUV blit needs this in some cases, so make it available.

Signed-off-by: Lucas Stach 
---
 src/gallium/drivers/etnaviv/etnaviv_rs.c | 2 +-
 src/gallium/drivers/etnaviv/etnaviv_rs.h | 4 
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/etnaviv/etnaviv_rs.c 
b/src/gallium/drivers/etnaviv/etnaviv_rs.c
index a9d3872ad41b..fcc2342aedc3 100644
--- a/src/gallium/drivers/etnaviv/etnaviv_rs.c
+++ b/src/gallium/drivers/etnaviv/etnaviv_rs.c
@@ -171,7 +171,7 @@ etna_modify_rs_clearbits(struct compiled_rs_state *cs, 
uint32_t clear_bits)
 
 /* submit RS state, without any processing and no dependence on context
  * except TS if this is a source-to-destination blit. */
-static void
+void
 etna_submit_rs_state(struct etna_context *ctx,
  const struct compiled_rs_state *cs)
 {
diff --git a/src/gallium/drivers/etnaviv/etnaviv_rs.h 
b/src/gallium/drivers/etnaviv/etnaviv_rs.h
index 125a13a9ad34..81ef05955a79 100644
--- a/src/gallium/drivers/etnaviv/etnaviv_rs.h
+++ b/src/gallium/drivers/etnaviv/etnaviv_rs.h
@@ -84,6 +84,10 @@ void
 etna_compile_rs_state(struct etna_context *ctx, struct compiled_rs_state *cs,
   const struct rs_state *rs);
 
+void
+etna_submit_rs_state(struct etna_context *ctx,
+ const struct compiled_rs_state *cs);
+
 /* Context initialization for RS clear_blit functions. */
 void
 etna_clear_blit_rs_init(struct pipe_context *pctx);
-- 
2.20.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 3/6] st/dri: fix dri2_from_planar for multiplanar images

2019-04-12 Thread Lucas Stach
From: Philipp Zabel 

Fix the gbm_dri_bo_get_handle_for_plane use case by allowing plane > 0
in dri2_from_planar for images with multiple planes in separate chained
texture resources.

Signed-off-by: Philipp Zabel 
---
 src/gallium/state_trackers/dri/dri2.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/src/gallium/state_trackers/dri/dri2.c 
b/src/gallium/state_trackers/dri/dri2.c
index 4c8ea485cc70..f139bd6722b9 100644
--- a/src/gallium/state_trackers/dri/dri2.c
+++ b/src/gallium/state_trackers/dri/dri2.c
@@ -1275,10 +1275,18 @@ static __DRIimage *
 dri2_from_planar(__DRIimage *image, int plane, void *loaderPrivate)
 {
__DRIimage *img;
+   struct pipe_resource *tex = image->texture;
+   int i;
 
-   if (plane != 0)
+   if (plane >= 3)
   return NULL;
 
+   for (i = 0; i < plane; i++) {
+  tex = tex->next;
+  if (!tex)
+ return NULL;
+   }
+
if (image->dri_components == 0)
   return NULL;
 
@@ -1286,6 +1294,8 @@ dri2_from_planar(__DRIimage *image, int plane, void 
*loaderPrivate)
if (img == NULL)
   return NULL;
 
+   pipe_resource_reference(&img->texture, tex);
+
if (img->texture->screen->resource_changed)
   img->texture->screen->resource_changed(img->texture->screen,
  img->texture);
-- 
2.20.1
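
The loop added to dri2_from_planar walks a chain of per-plane resources linked through `next`. A minimal standalone sketch of that walk follows. It assumes a hypothetical `struct plane_res` in place of `struct pipe_resource`; the struct and function names are illustrative only and are not Mesa API.

```c
#include <stddef.h>

/* Hypothetical stand-in for a chained per-plane resource. */
struct plane_res {
   int id;
   struct plane_res *next;
};

/* Return the resource backing the given plane, or NULL if the index is
 * out of range or the chain has fewer than plane + 1 entries. */
static struct plane_res *
plane_res_get(struct plane_res *first, int plane)
{
   struct plane_res *res = first;

   if (plane < 0 || plane >= 3)
      return NULL;

   /* Step along the chain once per plane index. */
   for (int i = 0; i < plane && res; i++)
      res = res->next;

   return res;
}
```

For a single-plane image, plane 0 yields the head of the chain and any higher index yields NULL, which matches the early-return behaviour of the patch.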

[Mesa-dev] [PATCH 5/6] st/dri: don't re-write plane format if supported by driver

2019-04-12 Thread Lucas Stach
If the driver supports multi-planar formats natively we don't want to
re-write the format of the planes on import. Split this out in a
separate function for clarity.

Signed-off-by: Lucas Stach 
---
 src/gallium/state_trackers/dri/dri2.c | 32 ++-
 1 file changed, 22 insertions(+), 10 deletions(-)

diff --git a/src/gallium/state_trackers/dri/dri2.c 
b/src/gallium/state_trackers/dri/dri2.c
index 4243a00cb38d..38a8e28ff439 100644
--- a/src/gallium/state_trackers/dri/dri2.c
+++ b/src/gallium/state_trackers/dri/dri2.c
@@ -776,6 +776,24 @@ dri2_update_tex_buffer(struct dri_drawable *drawable,
/* no-op */
 }
 
+static enum pipe_format get_plane_format(struct pipe_screen *pscreen,
+ enum pipe_format pf,
+ enum pipe_texture_target target,
+ unsigned usage, int plane)
+{
+
+   /* If the driver supports the format natively, no need to re-write */
+   if (pscreen->is_format_supported(pscreen, pf, target, 0, 0, usage))
+  return pf;
+
+   if (pf == PIPE_FORMAT_IYUV || (pf == PIPE_FORMAT_NV12 && plane == 0))
+  return PIPE_FORMAT_R8_UNORM;
+
+   if (pf == PIPE_FORMAT_NV12 && plane == 1)
+  return PIPE_FORMAT_RG88_UNORM;
+
+   return PIPE_FORMAT_NONE;
+}
 static __DRIimage *
 dri2_create_image_from_winsys(__DRIscreen *_screen,
   int width, int height, enum pipe_format pf,
@@ -800,9 +818,9 @@ dri2_create_image_from_winsys(__DRIscreen *_screen,
   /* YUV format sampling can be emulated by the Mesa state tracker by
* using multiple R8/RG88 samplers. So try to rewrite the pipe format.
*/
-  pf = PIPE_FORMAT_R8_UNORM;
 
-  if (pscreen->is_format_supported(pscreen, pf, screen->target, 0, 0,
+  if (pscreen->is_format_supported(pscreen, PIPE_FORMAT_R8_UNORM,
+   screen->target, 0, 0,
PIPE_BIND_SAMPLER_VIEW))
  tex_usage |= PIPE_BIND_SAMPLER_VIEW;
}
@@ -829,24 +847,18 @@ dri2_create_image_from_winsys(__DRIscreen *_screen,
   case 0:
  templ.width0 = width;
  templ.height0 = height;
- templ.format = pf;
  break;
   case 1:
- templ.width0 = width / 2;
- templ.height0 = height / 2;
- templ.format = (num_handles == 2) ?
-   PIPE_FORMAT_RG88_UNORM :   /* NV12, etc */
-   PIPE_FORMAT_R8_UNORM;  /* I420, etc */
- break;
   case 2:
  templ.width0 = width / 2;
  templ.height0 = height / 2;
- templ.format = PIPE_FORMAT_R8_UNORM;
  break;
   default:
  unreachable("too many planes!");
   }
 
+  templ.format = get_plane_format(pscreen, pf, screen->target, tex_usage, 
i);
+
   tex = pscreen->resource_from_handle(pscreen,
 , [i], PIPE_HANDLE_USAGE_FRAMEBUFFER_WRITE);
   if (!tex) {
-- 
2.20.1
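
The decision made by get_plane_format can be shown in isolation. The sketch below assumes a hypothetical `enum fmt` standing in for `enum pipe_format`, and a plain `native` flag standing in for the `is_format_supported` query; neither is Mesa API.

```c
#include <stdbool.h>

/* Hypothetical subset of formats, standing in for enum pipe_format. */
enum fmt { FMT_NONE, FMT_R8, FMT_RG88, FMT_NV12, FMT_IYUV };

/* Pick the format used to import one plane of a (possibly YUV) image.
 * 'native' says whether the driver handles 'pf' directly. */
static enum fmt
plane_format(enum fmt pf, int plane, bool native)
{
   /* Native support: keep the format untouched. */
   if (native)
      return pf;

   /* Emulation: every IYUV plane and the NV12 luma plane are 8-bit. */
   if (pf == FMT_IYUV || (pf == FMT_NV12 && plane == 0))
      return FMT_R8;

   /* Emulation: the NV12 chroma plane holds interleaved U/V pairs. */
   if (pf == FMT_NV12 && plane == 1)
      return FMT_RG88;

   return FMT_NONE;
}
```

Keeping the mapping in one function means the per-plane switch in the caller only has to deal with plane dimensions.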

[Mesa-dev] [PATCH 6/6] st/mesa: skip any extra handling of YUV textures if driver supports them

2019-04-12 Thread Lucas Stach
If the driver provides native support for YUV textures we can skip
adding additional samplers and re-writing the shaders.

Signed-off-by: Lucas Stach 
---
 src/mesa/state_tracker/st_atom_sampler.c |  6 ++
 src/mesa/state_tracker/st_atom_texture.c |  6 ++
 src/mesa/state_tracker/st_program.h  |  6 ++
 src/mesa/state_tracker/st_sampler_view.c | 21 +
 4 files changed, 31 insertions(+), 8 deletions(-)

diff --git a/src/mesa/state_tracker/st_atom_sampler.c 
b/src/mesa/state_tracker/st_atom_sampler.c
index 27e4da315817..9def70c9432a 100644
--- a/src/mesa/state_tracker/st_atom_sampler.c
+++ b/src/mesa/state_tracker/st_atom_sampler.c
@@ -303,10 +303,16 @@ update_shader_samplers(struct st_context *st,
   struct st_texture_object *stObj =
 st_get_texture_object(st->ctx, prog, unit);
   struct pipe_sampler_state *sampler = samplers + unit;
+  struct pipe_screen *pscreen = st->pipe->screen;
 
   if (!stObj)
  continue;
 
+  if (pscreen->is_format_supported(pscreen, st_get_view_format(stObj),
+   PIPE_TEXTURE_2D, 0, 0,
+   PIPE_BIND_SAMPLER_VIEW))
+ continue;
+
   switch (st_get_view_format(stObj)) {
   case PIPE_FORMAT_NV12:
  /* we need one additional sampler: */
diff --git a/src/mesa/state_tracker/st_atom_texture.c 
b/src/mesa/state_tracker/st_atom_texture.c
index ce7755f0c588..df84a472e722 100644
--- a/src/mesa/state_tracker/st_atom_texture.c
+++ b/src/mesa/state_tracker/st_atom_texture.c
@@ -174,11 +174,17 @@ update_textures(struct st_context *st,
   GLuint extra = 0;
   struct st_texture_object *stObj =
 st_get_texture_object(st->ctx, prog, unit);
+  struct pipe_screen *pscreen = st->pipe->screen;
   struct pipe_sampler_view tmpl;
 
   if (!stObj)
  continue;
 
+  if (pscreen->is_format_supported(pscreen, st_get_view_format(stObj),
+   sampler_views[unit]->target, 0, 0,
+   PIPE_BIND_SAMPLER_VIEW))
+ continue;
+
   /* use original view as template: */
   tmpl = *sampler_views[unit];
 
diff --git a/src/mesa/state_tracker/st_program.h 
b/src/mesa/state_tracker/st_program.h
index f67ea5eb2087..bc5f3e3d9642 100644
--- a/src/mesa/state_tracker/st_program.h
+++ b/src/mesa/state_tracker/st_program.h
@@ -67,6 +67,12 @@ st_get_external_sampler_key(struct st_context *st, struct 
gl_program *prog)
   unsigned unit = u_bit_scan();
   struct st_texture_object *stObj =
 st_get_texture_object(st->ctx, prog, unit);
+  struct pipe_screen *pscreen = st->pipe->screen;
+
+  if (pscreen->is_format_supported(pscreen, st_get_view_format(stObj),
+   PIPE_TEXTURE_2D, 0, 0,
+   PIPE_BIND_SAMPLER_VIEW))
+ continue;
 
   switch (st_get_view_format(stObj)) {
   case PIPE_FORMAT_NV12:
diff --git a/src/mesa/state_tracker/st_sampler_view.c 
b/src/mesa/state_tracker/st_sampler_view.c
index eb97f2bb6b7d..30dfa20af6b5 100644
--- a/src/mesa/state_tracker/st_sampler_view.c
+++ b/src/mesa/state_tracker/st_sampler_view.c
@@ -471,6 +471,7 @@ get_sampler_view_format(struct st_context *st,
 const struct st_texture_object *stObj,
 bool srgb_skip_decode)
 {
+   struct pipe_screen *pscreen = st->pipe->screen;
enum pipe_format format;
 
GLenum baseFormat = _mesa_base_tex_image(>base)->_BaseFormat;
@@ -489,15 +490,19 @@ get_sampler_view_format(struct st_context *st,
if (srgb_skip_decode)
   format = util_format_linear(format);
 
-   /* Use R8_UNORM for video formats */
-   switch (format) {
-   case PIPE_FORMAT_NV12:
-   case PIPE_FORMAT_IYUV:
-  format = PIPE_FORMAT_R8_UNORM;
-  break;
-   default:
-  break;
+   if (!pscreen->is_format_supported(pscreen, format, PIPE_TEXTURE_2D, 0, 0,
+ PIPE_BIND_SAMPLER_VIEW)) {
+  /* Use R8_UNORM for video formats */
+  switch (format) {
+  case PIPE_FORMAT_NV12:
+  case PIPE_FORMAT_IYUV:
+ format = PIPE_FORMAT_R8_UNORM;
+ break;
+  default:
+ break;
+  }
}
+
return format;
 }
 
-- 
2.20.1

[Mesa-dev] [PATCH 4/6] st/dri: handle emulated YUV texture sampling in query_dma_buf_modifiers

2019-04-12 Thread Lucas Stach
The Mesa state tracker will emulate YUV texture sampling for drivers that
don't support it natively by using multiple R8/RG88 samplers. Teach
dri2_query_dma_buf_modifiers about this special case.
This allows clients like GStreamer glupload or Kodi to properly query
the supported dma-buf import formats, without the need to take a special
code path for the emulated OES_external texture handling.

Signed-off-by: Lucas Stach 
---
 src/gallium/state_trackers/dri/dri2.c | 17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/src/gallium/state_trackers/dri/dri2.c 
b/src/gallium/state_trackers/dri/dri2.c
index f139bd6722b9..4243a00cb38d 100644
--- a/src/gallium/state_trackers/dri/dri2.c
+++ b/src/gallium/state_trackers/dri/dri2.c
@@ -1358,19 +1358,30 @@ dri2_query_dma_buf_modifiers(__DRIscreen *_screen, int 
fourcc, int max,
const struct dri2_format_mapping *map = dri2_get_mapping_by_fourcc(fourcc);
enum pipe_format format;
 
-   if (!map)
+   if (!map || !pscreen->query_dmabuf_modifiers)
   return false;
 
format = map->pipe_format;
 
-   if (pscreen->query_dmabuf_modifiers != NULL &&
-   (pscreen->is_format_supported(pscreen, format, screen->target, 0, 0,
+   if ((pscreen->is_format_supported(pscreen, format, screen->target, 0, 0,
  PIPE_BIND_RENDER_TARGET) ||
 pscreen->is_format_supported(pscreen, format, screen->target, 0, 0,
  PIPE_BIND_SAMPLER_VIEW))) {
   pscreen->query_dmabuf_modifiers(pscreen, format, max, modifiers,
   external_only, count);
   return true;
+   } else if (util_format_is_yuv(format) &&
+  pscreen->is_format_supported(pscreen, PIPE_FORMAT_R8_UNORM,
+   screen->target, 0, 0,
+   PIPE_BIND_SAMPLER_VIEW)) {
+  /* YUV format sampling can be emulated by the Mesa state tracker by
+   * using multiple R8/RG88 samplers if the driver doesn't support those
+   * formats natively, so we need a special case here to give a mostly
+   * accurate answer to the modifiers query.
+   */
+  pscreen->query_dmabuf_modifiers(pscreen, PIPE_FORMAT_R8_UNORM, max,
+  modifiers, external_only, count);
+  return true;
}
return false;
 }
-- 
2.20.1

[Mesa-dev] [PATCH 2/6] st/dri: fix dri2_query_image for multiplanar images

2019-04-12 Thread Lucas Stach
From: Philipp Zabel 

Images with multiple planes in separate chained texture resources should
report the correct number of planes.

Signed-off-by: Philipp Zabel 
---
 src/gallium/state_trackers/dri/dri2.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/gallium/state_trackers/dri/dri2.c 
b/src/gallium/state_trackers/dri/dri2.c
index 510b7f8d04a7..4c8ea485cc70 100644
--- a/src/gallium/state_trackers/dri/dri2.c
+++ b/src/gallium/state_trackers/dri/dri2.c
@@ -1083,7 +1083,9 @@ static GLboolean
 dri2_query_image(__DRIimage *image, int attrib, int *value)
 {
struct winsys_handle whandle;
+   struct pipe_resource *tex;
unsigned usage;
+   int i;
 
if (image->use & __DRI_IMAGE_USE_BACKBUFFER)
   usage = PIPE_HANDLE_USAGE_EXPLICIT_FLUSH;
@@ -1157,7 +1159,9 @@ dri2_query_image(__DRIimage *image, int attrib, int 
*value)
   }
   return GL_TRUE;
case __DRI_IMAGE_ATTRIB_NUM_PLANES:
-  *value = 1;
+  for (i = 0, tex = image->texture; i < 4 && tex; tex = tex->next)
+ i++;
+  *value = i;
   return GL_TRUE;
case __DRI_IMAGE_ATTRIB_MODIFIER_UPPER:
   whandle.type = WINSYS_HANDLE_TYPE_KMS;
-- 
2.20.1
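
The plane count reported for __DRI_IMAGE_ATTRIB_NUM_PLANES is the length of the resource chain, capped at four. A small sketch of that counting loop, assuming a hypothetical `struct plane_node` rather than the real `struct pipe_resource`:

```c
#include <stddef.h>

/* Hypothetical stand-in for a chained per-plane resource. */
struct plane_node {
   struct plane_node *next;
};

/* Count chained planes, never reporting more than four. */
static int
plane_count(const struct plane_node *first)
{
   int n = 0;

   for (const struct plane_node *res = first; n < 4 && res; res = res->next)
      n++;

   return n;
}
```

The cap keeps the loop bounded even if a chain is unexpectedly long.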

[Mesa-dev] [PATCH 1/6] st/dri: allow to create image for formats that only support SV or RT binding

2019-04-12 Thread Lucas Stach
Unconditionally requesting both bindings can lead to premature
failure to create a valid image.

Signed-off-by: Lucas Stach 
---
 src/gallium/state_trackers/dri/dri2.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/src/gallium/state_trackers/dri/dri2.c 
b/src/gallium/state_trackers/dri/dri2.c
index efb43c0d7973..510b7f8d04a7 100644
--- a/src/gallium/state_trackers/dri/dri2.c
+++ b/src/gallium/state_trackers/dri/dri2.c
@@ -987,14 +987,23 @@ dri2_create_image_common(__DRIscreen *_screen,
 {
const struct dri2_format_mapping *map = dri2_get_mapping_by_format(format);
struct dri_screen *screen = dri_screen(_screen);
+   struct pipe_screen *pscreen = screen->base.screen;
__DRIimage *img;
struct pipe_resource templ;
-   unsigned tex_usage;
+   unsigned tex_usage = 0;
 
if (!map)
   return NULL;
 
-   tex_usage = PIPE_BIND_RENDER_TARGET | PIPE_BIND_SAMPLER_VIEW;
+   if (pscreen->is_format_supported(pscreen, map->pipe_format, screen->target,
+0, 0, PIPE_BIND_RENDER_TARGET))
+  tex_usage |= PIPE_BIND_RENDER_TARGET;
+   if (pscreen->is_format_supported(pscreen, map->pipe_format, screen->target,
+0, 0, PIPE_BIND_SAMPLER_VIEW))
+  tex_usage |= PIPE_BIND_SAMPLER_VIEW;
+
+   if (!tex_usage)
+  return NULL;
 
if (use & __DRI_IMAGE_USE_SCANOUT)
   tex_usage |= PIPE_BIND_SCANOUT;
-- 
2.20.1
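
The change above probes each binding separately and only fails when neither is available. A standalone sketch of that pattern follows. The `BIND_*` flags, the `supports_fn` callback type and the two example callbacks are hypothetical stand-ins for `PIPE_BIND_*` and `pscreen->is_format_supported`, not the real Gallium interface.

```c
#include <stdbool.h>

#define BIND_RENDER_TARGET (1u << 0)
#define BIND_SAMPLER_VIEW  (1u << 1)

/* Hypothetical capability query: can 'format' be used with this one binding? */
typedef bool (*supports_fn)(int format, unsigned bind);

/* Collect every binding the format supports. A result of 0 means the
 * caller should refuse to create the image. */
static unsigned
probe_bindings(supports_fn supports, int format)
{
   unsigned usage = 0;

   if (supports(format, BIND_RENDER_TARGET))
      usage |= BIND_RENDER_TARGET;
   if (supports(format, BIND_SAMPLER_VIEW))
      usage |= BIND_SAMPLER_VIEW;

   return usage;
}

/* Example callback: a format that can only be sampled. */
static bool
sampler_only(int format, unsigned bind)
{
   (void)format;
   return bind == BIND_SAMPLER_VIEW;
}

/* Example callback: a format the driver cannot use at all. */
static bool
unsupported(int format, unsigned bind)
{
   (void)format;
   (void)bind;
   return false;
}
```

With `sampler_only`, requesting both bindings up front would have failed, while probing yields a usable sampler-only image.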

Re: [Mesa-dev] [PATCH] radeonsi: use CP DMA for the null const buffer clear on CIK

2019-04-12 Thread Samuel Pitoiset

I would suggest documenting that workaround somewhere in the code.

On 4/12/19 5:17 PM, Marek Olšák wrote:

From: Marek Olšák 

This is a workaround for a thread deadlock that I have no idea
why it occurs.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108879
Fixes: 9b331e462e5021d994859756d46cd2519d9c9c6e
---
  src/gallium/drivers/radeonsi/si_clear.c| 6 +++---
  src/gallium/drivers/radeonsi/si_compute_blit.c | 8 +---
  src/gallium/drivers/radeonsi/si_pipe.c | 2 +-
  src/gallium/drivers/radeonsi/si_pipe.h | 3 ++-
  src/gallium/drivers/radeonsi/si_test_dma.c | 2 +-
  5 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_clear.c 
b/src/gallium/drivers/radeonsi/si_clear.c
index e1805f2a1c9..ead680b857b 100644
--- a/src/gallium/drivers/radeonsi/si_clear.c
+++ b/src/gallium/drivers/radeonsi/si_clear.c
@@ -256,21 +256,21 @@ void vi_dcc_clear_level(struct si_context *sctx,
 * would be more efficient than separate per-layer clear 
operations.
 */
assert(tex->buffer.b.b.nr_storage_samples <= 2 || num_layers == 
1);
  
  		dcc_offset += tex->surface.u.legacy.level[level].dcc_offset;

clear_size = 
tex->surface.u.legacy.level[level].dcc_fast_clear_size *
 num_layers;
}
  
  	si_clear_buffer(sctx, dcc_buffer, dcc_offset, clear_size,

-   _value, 4, SI_COHERENCY_CB_META);
+   _value, 4, SI_COHERENCY_CB_META, false);
  }
  
  /* Set the same micro tile mode as the destination of the last MSAA resolve.

   * This allows hitting the MSAA resolve fast path, which requires that both
   * src and dst micro tile modes match.
   */
  static void si_set_optimal_micro_tile_mode(struct si_screen *sscreen,
   struct si_texture *tex)
  {
if (tex->buffer.b.is_shared ||
@@ -489,21 +489,21 @@ static void si_do_fast_color_clear(struct si_context 
*sctx,
  
  			/* DCC fast clear with MSAA should clear CMASK to 0xC. */

if (tex->buffer.b.b.nr_samples >= 2 && 
tex->cmask_buffer) {
/* TODO: This doesn't work with MSAA. */
if (eliminate_needed)
continue;
  
  uint32_t clear_value = 0x;

si_clear_buffer(sctx, >cmask_buffer->b.b,
tex->cmask_offset, 
tex->surface.cmask_size,
-   _value, 4, 
SI_COHERENCY_CB_META);
+   _value, 4, 
SI_COHERENCY_CB_META, false);
fmask_decompress_needed = true;
}
  
  			vi_dcc_clear_level(sctx, tex, 0, reset_value);

tex->separate_dcc_dirty = true;
} else {
if (too_small)
continue;
  
  			/* 128-bit formats are unusupported */

@@ -517,21 +517,21 @@ static void si_do_fast_color_clear(struct si_context 
*sctx,
  
  			/* ensure CMASK is enabled */

si_alloc_separate_cmask(sctx->screen, tex);
if (!tex->cmask_buffer)
continue;
  
  			/* Do the fast clear. */

uint32_t clear_value = 0;
si_clear_buffer(sctx, >cmask_buffer->b.b,
tex->cmask_offset, 
tex->surface.cmask_size,
-   _value, 4, SI_COHERENCY_CB_META);
+   _value, 4, SI_COHERENCY_CB_META, 
false);
eliminate_needed = true;
}
  
  		if ((eliminate_needed || fmask_decompress_needed) &&

!(tex->dirty_level_mask & (1 << level))) {
tex->dirty_level_mask |= 1 << level;

p_atomic_inc(>screen->compressed_colortex_counter);
}
  
  		/* We can change the micro tile mode before a full clear. */

diff --git a/src/gallium/drivers/radeonsi/si_compute_blit.c 
b/src/gallium/drivers/radeonsi/si_compute_blit.c
index 1abeac6adb0..fb0d8d2f1b6 100644
--- a/src/gallium/drivers/radeonsi/si_compute_blit.c
+++ b/src/gallium/drivers/radeonsi/si_compute_blit.c
@@ -179,21 +179,22 @@ static void si_compute_do_clear_or_copy(struct si_context 
*sctx,
  
  	/* Restore states. */

ctx->bind_compute_state(ctx, saved_cs);
ctx->set_shader_buffers(ctx, PIPE_SHADER_COMPUTE, 0, src ? 2 : 1, 
saved_sb,
saved_writable_mask);
si_compute_internal_end(sctx);
  }
  
  void si_clear_buffer(struct si_context *sctx, struct pipe_resource *dst,

 uint64_t offset, uint64_t size, uint32_t *clear_value,
-uint32_t clear_value_size, enum 

[Mesa-dev] [Bug 110345] Unrecoverable GPU crash with DiRT 4

2019-04-12 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=110345

--- Comment #11 from Samuel Pitoiset  ---
Does the problem still happen if you try to downgrade your kernel?

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.

[Mesa-dev] [Bug 109326] mesa: Meson configuration summary should be printed

2019-04-12 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=109326

--- Comment #1 from Erik Faye-Lund  ---
An MR has been submitted here:
https://gitlab.freedesktop.org/mesa/mesa/merge_requests/648

-- 
You are receiving this mail because:
You are the QA Contact for the bug.

Re: [Mesa-dev] [PATCH] ac: fix possibly incorrect bindless atomic code in visit_image_atomic

2019-04-12 Thread Samuel Pitoiset

possibly? :)

Reviewed-by: Samuel Pitoiset 

On 4/12/19 5:40 PM, Marek Olšák wrote:

From: Marek Olšák 

---
  src/amd/common/ac_nir_to_llvm.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index 55c64e2aacb..afdd9318fff 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -2543,25 +2543,25 @@ static LLVMValueRef visit_image_atomic(struct 
ac_nir_context *ctx,
int param_count = 0;
  
  	bool cmpswap = instr->intrinsic == nir_intrinsic_image_deref_atomic_comp_swap ||

   instr->intrinsic == 
nir_intrinsic_bindless_image_atomic_comp_swap;
const char *atomic_name;
char intrinsic_name[64];
enum ac_atomic_op atomic_subop;
MAYBE_UNUSED int length;
  
  	enum glsl_sampler_dim dim;

-   bool is_unsigned;
+   bool is_unsigned = false;
bool is_array;
if (bindless) {
-   if (instr->intrinsic == nir_intrinsic_image_atomic_min ||
-   instr->intrinsic == nir_intrinsic_image_atomic_max) {
+   if (instr->intrinsic == nir_intrinsic_bindless_image_atomic_min 
||
+   instr->intrinsic == 
nir_intrinsic_bindless_image_atomic_max) {
const GLenum format = nir_intrinsic_format(instr);
assert(format == GL_R32UI || format == GL_R32I);
is_unsigned = format == GL_R32UI;
}
dim = nir_intrinsic_image_dim(instr);
is_array = nir_intrinsic_image_array(instr);
} else {
const struct glsl_type *type = get_image_deref(instr)->type;
is_unsigned = glsl_get_sampler_result_type(type) == 
GLSL_TYPE_UINT;
dim = glsl_get_sampler_dim(type);

Re: [Mesa-dev] [PATCH] mesa: don't overwrite existing shader files with MESA_SHADER_CAPTURE_PATH

2019-04-12 Thread Eric Engestrom
On Friday, 2019-04-12 11:50:11 -0400, Marek Olšák wrote:
> On Fri, Apr 12, 2019 at 11:41 AM Eric Engestrom 
> wrote:
> 
> > On Friday, 2019-04-12 11:00:56 -0400, Marek Olšák wrote:
> > > On Thu, Apr 11, 2019 at 2:53 AM Tapani Pälli 
> > wrote:
> > > > On 4/11/19 3:32 AM, Marek Olšák wrote:
> > > > > -  file = fopen(filename, "w");
> > > > > + }
> > > > > + FILE *file = fopen(filename, "r");
> > > > > + if (!file)
> > > > > +break;
> > > >
> > > > I'm surprised we don't have some helper like 'util_path_exists' but
> > this
> > > > works, I guess then we should have 'util_path_isdir|isfile' and others
> > > > as well.
> > > >
> > >
> > > There is no standard API for checking whether a file exists. fopen is the
> > > only standard way to do it.
> >
> > What about `access(filename, F_OK)` ?
> >
> 
> That's not standard C API.

Right, that's POSIX and that can't be used on windows, which this file
can be built on.
My bad :)
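
For reference, the kind of helper being discussed can be written with nothing but standard C. This is only a sketch: `file_exists` is a hypothetical name, not an existing Mesa utility, and it reports "can be opened for reading" rather than strict existence.

```c
#include <stdbool.h>
#include <stdio.h>

/* Return true if 'path' can be opened for reading. A file that exists
 * but is not readable is reported as missing, and on some platforms a
 * directory may be reported as present. */
static bool
file_exists(const char *path)
{
   FILE *f = fopen(path, "r");

   if (!f)
      return false;

   fclose(f);
   return true;
}
```

A capture loop could call this on each candidate filename and pick the first one that does not exist yet.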

Re: [Mesa-dev] [PATCH 1/3] llvmpipe: add lp_fence_timedwait() helper

2019-04-12 Thread Roland Scheidegger
Looks correct to me.
For the series,
Reviewed-by: Roland Scheidegger 

Am 11.04.19 um 18:05 schrieb Emil Velikov:
> The function is analogous to lp_fence_wait() while taking a timeout
> (ns) parameter, as needed for EGL fence/sync.
> 
> Cc: Roland Scheidegger 
> Signed-off-by: Emil Velikov 
> ---
>  src/gallium/drivers/llvmpipe/lp_fence.c | 22 ++
>  src/gallium/drivers/llvmpipe/lp_fence.h |  3 +++
>  2 files changed, 25 insertions(+)
> 
> diff --git a/src/gallium/drivers/llvmpipe/lp_fence.c 
> b/src/gallium/drivers/llvmpipe/lp_fence.c
> index 20cd91cd63d..f8b31a9d6a5 100644
> --- a/src/gallium/drivers/llvmpipe/lp_fence.c
> +++ b/src/gallium/drivers/llvmpipe/lp_fence.c
> @@ -125,3 +125,25 @@ lp_fence_wait(struct lp_fence *f)
>  }
>  
>  
> +boolean
> +lp_fence_timedwait(struct lp_fence *f, uint64_t timeout)
> +{
> +   struct timespec ts = {
> +      .tv_nsec = timeout % 1000000000L,
> +      .tv_sec = timeout / 1000000000L,
> +   };
> +   int ret;
> +
> +   if (LP_DEBUG & DEBUG_FENCE)
> +  debug_printf("%s %d\n", __FUNCTION__, f->id);
> +
> +   mtx_lock(&f->mutex);
> +   assert(f->issued);
> +   while (f->count < f->rank) {
> +      ret = cnd_timedwait(&f->signalled, &f->mutex, &ts);
> +   }
> +   mtx_unlock(&f->mutex);
> +   return ret == thrd_success;
> +}
> +
> +
> diff --git a/src/gallium/drivers/llvmpipe/lp_fence.h 
> b/src/gallium/drivers/llvmpipe/lp_fence.h
> index b72026492c6..5ba746d22d1 100644
> --- a/src/gallium/drivers/llvmpipe/lp_fence.h
> +++ b/src/gallium/drivers/llvmpipe/lp_fence.h
> @@ -65,6 +65,9 @@ lp_fence_signalled(struct lp_fence *fence);
>  void
>  lp_fence_wait(struct lp_fence *fence);
>  
> +boolean
> +lp_fence_timedwait(struct lp_fence *fence, uint64_t timeout);
> +
>  void
>  llvmpipe_init_screen_fence_funcs(struct pipe_screen *screen);
>  
> 
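
The nanosecond-to-timespec split used by lp_fence_timedwait can be checked on its own. The sketch below uses hypothetical helper names (`ns_to_timespec`, `timespec_add_ns`) that are not part of llvmpipe. The second helper is included on the assumption that the wait primitive follows C11 semantics, where `cnd_timedwait` takes an absolute TIME_UTC-based deadline rather than a relative duration.

```c
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000ULL

/* Split a nanosecond count into whole seconds and leftover nanoseconds. */
static struct timespec
ns_to_timespec(uint64_t ns)
{
   struct timespec ts;

   ts.tv_sec = (time_t)(ns / NSEC_PER_SEC);
   ts.tv_nsec = (long)(ns % NSEC_PER_SEC);
   return ts;
}

/* Add a relative timeout to a base time, keeping tv_nsec below one second. */
static struct timespec
timespec_add_ns(struct timespec base, uint64_t ns)
{
   struct timespec rel = ns_to_timespec(ns);
   struct timespec out;

   out.tv_sec = base.tv_sec + rel.tv_sec;
   out.tv_nsec = base.tv_nsec + rel.tv_nsec;
   if (out.tv_nsec >= (long)NSEC_PER_SEC) {
      out.tv_nsec -= (long)NSEC_PER_SEC;
      out.tv_sec += 1;
   }
   return out;
}
```

Under that assumption, a caller would fetch the current time with `timespec_get(&now, TIME_UTC)` and pass `timespec_add_ns(now, timeout)` as the deadline.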

Re: [Mesa-dev] [PATCH] mesa: don't overwrite existing shader files with MESA_SHADER_CAPTURE_PATH

2019-04-12 Thread Marek Olšák
On Fri, Apr 12, 2019 at 11:41 AM Eric Engestrom 
wrote:

> On Friday, 2019-04-12 11:00:56 -0400, Marek Olšák wrote:
> > On Thu, Apr 11, 2019 at 2:53 AM Tapani Pälli 
> wrote:
> > > On 4/11/19 3:32 AM, Marek Olšák wrote:
> > > > -  file = fopen(filename, "w");
> > > > + }
> > > > + FILE *file = fopen(filename, "r");
> > > > + if (!file)
> > > > +break;
> > >
> > > I'm surprised we don't have some helper like 'util_path_exists' but
> this
> > > works, I guess then we should have 'util_path_isdir|isfile' and others
> > > as well.
> > >
> >
> > There is no standard API for checking whether a file exists. fopen is the
> > only standard way to do it.
>
> What about `access(filename, F_OK)` ?
>

That's not standard C API.

Marek

Re: [Mesa-dev] [PATCH] mesa: don't overwrite existing shader files with MESA_SHADER_CAPTURE_PATH

2019-04-12 Thread Eric Engestrom
On Friday, 2019-04-12 11:00:56 -0400, Marek Olšák wrote:
> On Thu, Apr 11, 2019 at 2:53 AM Tapani Pälli  wrote:
> > On 4/11/19 3:32 AM, Marek Olšák wrote:
> > > -  file = fopen(filename, "w");
> > > + }
> > > + FILE *file = fopen(filename, "r");
> > > + if (!file)
> > > +break;
> >
> > I'm surprised we don't have some helper like 'util_path_exists' but this
> > works, I guess then we should have 'util_path_isdir|isfile' and others
> > as well.
> >
> 
> There is no standard API for checking whether a file exists. fopen is the
> only standard way to do it.

What about `access(filename, F_OK)` ?

[Mesa-dev] [PATCH] ac: fix possibly incorrect bindless atomic code in visit_image_atomic

2019-04-12 Thread Marek Olšák
From: Marek Olšák 

---
 src/amd/common/ac_nir_to_llvm.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index 55c64e2aacb..afdd9318fff 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -2543,25 +2543,25 @@ static LLVMValueRef visit_image_atomic(struct 
ac_nir_context *ctx,
int param_count = 0;
 
bool cmpswap = instr->intrinsic == 
nir_intrinsic_image_deref_atomic_comp_swap ||
   instr->intrinsic == 
nir_intrinsic_bindless_image_atomic_comp_swap;
const char *atomic_name;
char intrinsic_name[64];
enum ac_atomic_op atomic_subop;
MAYBE_UNUSED int length;
 
enum glsl_sampler_dim dim;
-   bool is_unsigned;
+   bool is_unsigned = false;
bool is_array;
if (bindless) {
-   if (instr->intrinsic == nir_intrinsic_image_atomic_min ||
-   instr->intrinsic == nir_intrinsic_image_atomic_max) {
+   if (instr->intrinsic == nir_intrinsic_bindless_image_atomic_min 
||
+   instr->intrinsic == 
nir_intrinsic_bindless_image_atomic_max) {
const GLenum format = nir_intrinsic_format(instr);
assert(format == GL_R32UI || format == GL_R32I);
is_unsigned = format == GL_R32UI;
}
dim = nir_intrinsic_image_dim(instr);
is_array = nir_intrinsic_image_array(instr);
} else {
const struct glsl_type *type = get_image_deref(instr)->type;
is_unsigned = glsl_get_sampler_result_type(type) == 
GLSL_TYPE_UINT;
dim = glsl_get_sampler_dim(type);
-- 
2.17.1

[Mesa-dev] [Bug 110253] glBlitFramebuffer fails on MSAA fbo source.

2019-04-12 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=110253

--- Comment #2 from Gregory Popovitch  ---
Could this possibly affect the AMD Radeon Pro 19.Q1.1 drivers?

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.

Re: [Mesa-dev] Low interpolation precision for 8 bit textures using llvmpipe

2019-04-12 Thread Roland Scheidegger
Am 12.04.19 um 14:34 schrieb Dominik Drees:
> Hi Roland!
> 
> On 4/11/19 8:18 PM, Roland Scheidegger wrote:
>> What version of mesa are you using?
> The original results were generated using version 19.0.2 (from the arch
> linux repositories), but I got the same results using the current git
> version (98934e6aa19795072a353dae6020dafadc76a1e3).
Alright, both of these would use the GALLIVM_PERF var.

>> The debug flags were changed a while ago (so that those perf tweaks can
>> be disabled on release builds too), it needs to be either:
>> GALLIVM_PERF=no_rho_approx,no_brilinear,no_quad_lod
>> or easier
>> GALLIVM_PERF=no_filter_hacks (which disables these 3 things above
>> together)
>>
>> Although all of that only really affects filtering with mipmaps (not
>> sure if you do?).
> Using GALLIVM_PERF does not make a difference, either, but that should
> be expected because I'm not using mipmaps, just "regular" linear
> filtering (GL_LINEAR).
>>
>>
>> (more below)
> See my responses below as well.
>>
>>
>> Am 11.04.19 um 18:00 schrieb Dominik Drees:
>>> Running with the suggested flags in the environment does not change the
>>> result for the test case I described below. The results with and without
>>> the environment variables set are pixel-wise equal.
>>>
>>> By the way, and if this of interest: For GL_NEAREST sampling the results
>>> from hardware and llvmpipe are equal as well.
>>>
>>> Best,
>>> Dominik
>>>
>>> On 4/11/19 4:36 PM, Ilia Mirkin wrote:
>>>> llvmpipe takes a number of shortcuts in the interest of speed which
>>>> cause inaccurate texturing. Try running with
>>>>
>>>> GALLIVM_DEBUG=no_rho_approx,no_brilinear,no_quad_lod
>>>>
>>>> and see if the issue still occurs.
>>>>
>>>> Cheers,
>>>>
>>>>     -ilia
>>>>
>>>> On Thu, Apr 11, 2019 at 8:30 AM Dominik Drees
>>>> wrote:
>
> Hello, everyone!
>
> I have a question regarding the interpolation precision of llvmpipe.
> Feel free to redirect me to somewhere else if this is not the right
> place to ask. Consider the following scenario: In a fragment shader we
> are sampling from a 16x16, 8 bit texture with values between 0 and 3
> using linear interpolation. Then we write white to the screen if the
> sampled value is > 1/255 and black otherwise. The output looks very
> different when rendered with llvmpipe compared to the result
> produced by
> rendering hardware (for both intel (mesa i965) and nvidia (proprietary
> driver)).
>
> I've uploaded examplary output images here
> (https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fimgur.com%2Fa%2FD1udpezdata=02%7C01%7Csroland%40vmware.com%7Cbdef52eb504c4078f9f808d6be96da17%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C636905952501149697sdata=vymggYHZTDLwKNh7RpcM1eSyhVA2L%2BfHNchvYS8yQPQ%3Dreserved=0)
>
>
> and the corresponding fragment shader here
> (https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpastebin.com%2Fpa808Reqdata=02%7C01%7Csroland%40vmware.com%7Cbdef52eb504c4078f9f808d6be96da17%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C636905952501149697sdata=%2FqKVJCXFS4UswynKeSoqCKivTHAb2o%2FZwVE1nwNms3M%3Dreserved=0).
>
>> The shader looks iffy to me, how do you use that vec4 in the if clause?
>>
>>
>>>>>
>>>>> My hypothesis is that llvmpipe (in contrast to hardware) only uses
>>>>> 8 bit for the interpolation computation when reading from 8 bit
>>>>> textures and thus loses precision in the lower bits. Is that correct?
>>>>> If so, does anyone know of a workaround?
>>
>> So, in theory it is indeed possible the results are less accurate with
>> llvmpipe (I believe all recent hw does rgba8 filtering with more than 8
>> bit precision).
>> For formats fitting into rgba8, we have a fast path in llvmpipe
>> (gallivm) for the lerp, which unpacks the 8bit values into 16bit values,
>> does the lerp with that and packs back to 8 bit. The result is
>> accurately rounded there (to 8 bit) but only for 1 lerp step - for a 2d
>> texture there are 3 of those (one per direction, and a final one
>> combining the result). And yes this means the filtered result only has 8
>> bits.
> Do I understand you correctly in that for the 2D case, the results of
> the first two lerps (done in 16 bit) are converted to 8 bit, then
> converted back to 16 bit for the final (second stage) lerp?
Yes. Even the final lerp is converted back to 8 bit before being finally
converted to float. (In theory we could avoid this for the final lerp,
but this would need some refactoring, since the last lerp isn't always
the same - if you have mipmaps for instance there's yet another lerp in
the end between the results of each mip.)
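
To make the rounding effect concrete, here is a minimal C sketch of a bilinear
filter that rounds back to 8 bits after every lerp step. It is not the gallivm
code: the function names and the 8.8 fixed-point weight format (0..256) are
assumptions chosen purely for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* One lerp step: 8-bit inputs, wider intermediate math, result rounded
 * to nearest and packed back to 8 bits.
 * weight is fixed point in [0, 256] (hypothetical format). */
static uint8_t lerp_u8(uint8_t a, uint8_t b, unsigned weight)
{
    assert(weight <= 256);
    /* At most 255 * 256 = 65280, so the sum fits in 16 bits. */
    unsigned v = a * (256 - weight) + b * weight;
    return (uint8_t)((v + 128) >> 8);
}

/* Bilinear filter built from three lerp steps, each rounded to 8 bits.
 * t[] holds the 2x2 footprint: top-left, top-right, bottom-left,
 * bottom-right. */
static uint8_t bilinear_u8_staged(const uint8_t t[4], unsigned wx, unsigned wy)
{
    uint8_t top = lerp_u8(t[0], t[1], wx);
    uint8_t bot = lerp_u8(t[2], t[3], wx);
    return lerp_u8(top, bot, wy);
}

/* Float reference without intermediate rounding. */
static float bilinear_float(const uint8_t t[4], float fx, float fy)
{
    float top = t[0] + (t[1] - t[0]) * fx;
    float bot = t[2] + (t[3] - t[2]) * fx;
    return top + (bot - top) * fy;
}
```

With texels {0, 1, 0, 0} and both weights at 128 (i.e. 0.5), the staged
version returns 1, whereas the float reference yields 0.25, which would round
to 0. Small texel values near a threshold are exactly where the intermediate
rounding becomes visible.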


> 
> If so and if I'm understanding this correctly, for 2D (i.e., a 2-stage
> linear interpolation) we potentially have an error in the order of one
> bit for the final 8 bit value due to the intermediate 16->8->16
> conversion. For sampling from a 3D texture (i.e., a 3-stage linear
> 

[Mesa-dev] [PATCH] radeonsi: use CP DMA for the null const buffer clear on CIK

2019-04-12 Thread Marek Olšák
From: Marek Olšák 

This is a workaround for a thread deadlock that I have no idea
why it occurs.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108879
Fixes: 9b331e462e5021d994859756d46cd2519d9c9c6e
---
 src/gallium/drivers/radeonsi/si_clear.c| 6 +++---
 src/gallium/drivers/radeonsi/si_compute_blit.c | 8 +---
 src/gallium/drivers/radeonsi/si_pipe.c | 2 +-
 src/gallium/drivers/radeonsi/si_pipe.h | 3 ++-
 src/gallium/drivers/radeonsi/si_test_dma.c | 2 +-
 5 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_clear.c 
b/src/gallium/drivers/radeonsi/si_clear.c
index e1805f2a1c9..ead680b857b 100644
--- a/src/gallium/drivers/radeonsi/si_clear.c
+++ b/src/gallium/drivers/radeonsi/si_clear.c
@@ -256,21 +256,21 @@ void vi_dcc_clear_level(struct si_context *sctx,
 * would be more efficient than separate per-layer clear 
operations.
 */
assert(tex->buffer.b.b.nr_storage_samples <= 2 || num_layers == 
1);
 
dcc_offset += tex->surface.u.legacy.level[level].dcc_offset;
clear_size = 
tex->surface.u.legacy.level[level].dcc_fast_clear_size *
 num_layers;
}
 
si_clear_buffer(sctx, dcc_buffer, dcc_offset, clear_size,
-   &clear_value, 4, SI_COHERENCY_CB_META);
+   &clear_value, 4, SI_COHERENCY_CB_META, false);
 }
 
 /* Set the same micro tile mode as the destination of the last MSAA resolve.
  * This allows hitting the MSAA resolve fast path, which requires that both
  * src and dst micro tile modes match.
  */
 static void si_set_optimal_micro_tile_mode(struct si_screen *sscreen,
   struct si_texture *tex)
 {
if (tex->buffer.b.is_shared ||
@@ -489,21 +489,21 @@ static void si_do_fast_color_clear(struct si_context 
*sctx,
 
/* DCC fast clear with MSAA should clear CMASK to 0xC. 
*/
if (tex->buffer.b.b.nr_samples >= 2 && 
tex->cmask_buffer) {
/* TODO: This doesn't work with MSAA. */
if (eliminate_needed)
continue;
 
uint32_t clear_value = 0xCCCCCCCC;
si_clear_buffer(sctx, &tex->cmask_buffer->b.b,
tex->cmask_offset, 
tex->surface.cmask_size,
-   &clear_value, 4, 
SI_COHERENCY_CB_META);
+   &clear_value, 4, 
SI_COHERENCY_CB_META, false);
fmask_decompress_needed = true;
}
 
vi_dcc_clear_level(sctx, tex, 0, reset_value);
tex->separate_dcc_dirty = true;
} else {
if (too_small)
continue;
 
/* 128-bit formats are unusupported */
@@ -517,21 +517,21 @@ static void si_do_fast_color_clear(struct si_context 
*sctx,
 
/* ensure CMASK is enabled */
si_alloc_separate_cmask(sctx->screen, tex);
if (!tex->cmask_buffer)
continue;
 
/* Do the fast clear. */
uint32_t clear_value = 0;
si_clear_buffer(sctx, &tex->cmask_buffer->b.b,
tex->cmask_offset, 
tex->surface.cmask_size,
-   &clear_value, 4, SI_COHERENCY_CB_META);
+   &clear_value, 4, SI_COHERENCY_CB_META, 
false);
eliminate_needed = true;
}
 
if ((eliminate_needed || fmask_decompress_needed) &&
!(tex->dirty_level_mask & (1 << level))) {
tex->dirty_level_mask |= 1 << level;

p_atomic_inc(&sctx->screen->compressed_colortex_counter);
}
 
/* We can change the micro tile mode before a full clear. */
diff --git a/src/gallium/drivers/radeonsi/si_compute_blit.c 
b/src/gallium/drivers/radeonsi/si_compute_blit.c
index 1abeac6adb0..fb0d8d2f1b6 100644
--- a/src/gallium/drivers/radeonsi/si_compute_blit.c
+++ b/src/gallium/drivers/radeonsi/si_compute_blit.c
@@ -179,21 +179,22 @@ static void si_compute_do_clear_or_copy(struct si_context 
*sctx,
 
/* Restore states. */
ctx->bind_compute_state(ctx, saved_cs);
ctx->set_shader_buffers(ctx, PIPE_SHADER_COMPUTE, 0, src ? 2 : 1, 
saved_sb,
saved_writable_mask);
si_compute_internal_end(sctx);
 }
 
 void si_clear_buffer(struct si_context *sctx, struct pipe_resource *dst,
 uint64_t offset, uint64_t size, uint32_t *clear_value,
-uint32_t 

Re: [Mesa-dev] [PATCH 2/2] ac: use the common helper ac_apply_fmask_to_sample

2019-04-12 Thread Samuel Pitoiset


On 4/12/19 5:04 PM, Marek Olšák wrote:


On Thu, Apr 11, 2019 at 3:15 AM Samuel Pitoiset wrote:



On 4/11/19 3:30 AM, Marek Olšák wrote:
> From: Marek Olšák
>
> ---
>   src/amd/common/ac_nir_to_llvm.c | 70
+++--
>   1 file changed, 5 insertions(+), 65 deletions(-)
>
> diff --git a/src/amd/common/ac_nir_to_llvm.c
b/src/amd/common/ac_nir_to_llvm.c
> index 3d2f738edec..3abde6e0969 100644
> --- a/src/amd/common/ac_nir_to_llvm.c
> +++ b/src/amd/common/ac_nir_to_llvm.c
> @@ -2323,92 +2323,32 @@ static int
image_type_to_components_count(enum glsl_sampler_dim dim, bool array)
>       case GLSL_SAMPLER_DIM_SUBPASS:
>               return 2;
>       case GLSL_SAMPLER_DIM_SUBPASS_MS:
>               return 3;
>       default:
>               break;
>       }
>       return 0;
>   }
>
> -
> -/* Adjust the sample index according to FMASK.
> - *
> - * For uncompressed MSAA surfaces, FMASK should return 0x76543210,
> - * which is the identity mapping. Each nibble says which
physical sample
> - * should be fetched to get that sample.
> - *
> - * For example, 0x11111100 means there are only 2 samples
stored and
> - * the second sample covers 3/4 of the pixel. When reading
samples 0
> - * and 1, return physical sample 0 (determined by the first two 0s
> - * in FMASK), otherwise return physical sample 1.
> - *
> - * The sample index should be adjusted as follows:
> - *   sample_index = (fmask >> (sample_index * 4)) & 0xF;
> - */
>   static LLVMValueRef adjust_sample_index_using_fmask(struct
ac_llvm_context *ctx,
>  LLVMValueRef coord_x, LLVMValueRef coord_y,
>  LLVMValueRef coord_z,
>  LLVMValueRef sample_index,
>  LLVMValueRef fmask_desc_ptr)
>   {
> -     struct ac_image_args args = {0};
> -     LLVMValueRef res;
> +     unsigned sample_chan = coord_z ? 3 : 2;
> +     LLVMValueRef addr[4] = {coord_x, coord_y, coord_z};
> +     addr[sample_chan] = sample_index;
>
> -     args.coords[0] = coord_x;
> -     args.coords[1] = coord_y;
> -     if (coord_z)
> -             args.coords[2] = coord_z;
> -
> -     args.opcode = ac_image_load;
> -     args.dim = coord_z ? ac_image_2darray : ac_image_2d;
> -     args.resource = fmask_desc_ptr;
> -     args.dmask = 0xf;
> -     args.attributes = AC_FUNC_ATTR_READNONE;
> -
> -     res = ac_build_image_opcode(ctx, &args);
> -
> -     res = ac_to_integer(ctx, res);
> -     LLVMValueRef four = LLVMConstInt(ctx->i32, 4, false);
> -     LLVMValueRef F = LLVMConstInt(ctx->i32, 0xf, false);
> -
> -     LLVMValueRef fmask = LLVMBuildExtractElement(ctx->builder,
> -                                                  res,
> - ctx->i32_0, "");
> -
> -     LLVMValueRef sample_index4 =
> -             LLVMBuildMul(ctx->builder, sample_index, four, "");
> -     LLVMValueRef shifted_fmask =
> -             LLVMBuildLShr(ctx->builder, fmask, sample_index4, "");
> -     LLVMValueRef final_sample =
> -             LLVMBuildAnd(ctx->builder, shifted_fmask, F, "");

The only difference is the mask (ie. ac_apply_fmask_to_sample uses
0x7)
while this code uses 0xF.


Yes.


According to the comment in that function, I assume 0x7 is the
correct
value?


Yes, it's for EQAA. Only samples 0-7 can occur with MSAA. If EQAA is 
used, 0x8 means the color of the sample is unknown, which is mapped to 
sample 0 by the 0x7 mask.

Reviewed-by: Samuel Pitoiset 


Marek

Re: [Mesa-dev] [PATCH 2/2] ac: use the common helper ac_apply_fmask_to_sample

2019-04-12 Thread Marek Olšák
On Thu, Apr 11, 2019 at 3:15 AM Samuel Pitoiset 
wrote:

>
> On 4/11/19 3:30 AM, Marek Olšák wrote:
> > From: Marek Olšák 
> >
> > ---
> >   src/amd/common/ac_nir_to_llvm.c | 70 +++--
> >   1 file changed, 5 insertions(+), 65 deletions(-)
> >
> > diff --git a/src/amd/common/ac_nir_to_llvm.c
> b/src/amd/common/ac_nir_to_llvm.c
> > index 3d2f738edec..3abde6e0969 100644
> > --- a/src/amd/common/ac_nir_to_llvm.c
> > +++ b/src/amd/common/ac_nir_to_llvm.c
> > @@ -2323,92 +2323,32 @@ static int image_type_to_components_count(enum
> glsl_sampler_dim dim, bool array)
> >   case GLSL_SAMPLER_DIM_SUBPASS:
> >   return 2;
> >   case GLSL_SAMPLER_DIM_SUBPASS_MS:
> >   return 3;
> >   default:
> >   break;
> >   }
> >   return 0;
> >   }
> >
> > -
> > -/* Adjust the sample index according to FMASK.
> > - *
> > - * For uncompressed MSAA surfaces, FMASK should return 0x76543210,
> > - * which is the identity mapping. Each nibble says which physical sample
> > - * should be fetched to get that sample.
> > - *
> > - * For example, 0x11111100 means there are only 2 samples stored and
> > - * the second sample covers 3/4 of the pixel. When reading samples 0
> > - * and 1, return physical sample 0 (determined by the first two 0s
> > - * in FMASK), otherwise return physical sample 1.
> > - *
> > - * The sample index should be adjusted as follows:
> > - *   sample_index = (fmask >> (sample_index * 4)) & 0xF;
> > - */
> >   static LLVMValueRef adjust_sample_index_using_fmask(struct
> ac_llvm_context *ctx,
> >   LLVMValueRef coord_x,
> LLVMValueRef coord_y,
> >   LLVMValueRef coord_z,
> >   LLVMValueRef
> sample_index,
> >   LLVMValueRef
> fmask_desc_ptr)
> >   {
> > - struct ac_image_args args = {0};
> > - LLVMValueRef res;
> > + unsigned sample_chan = coord_z ? 3 : 2;
> > + LLVMValueRef addr[4] = {coord_x, coord_y, coord_z};
> > + addr[sample_chan] = sample_index;
> >
> > - args.coords[0] = coord_x;
> > - args.coords[1] = coord_y;
> > - if (coord_z)
> > - args.coords[2] = coord_z;
> > -
> > - args.opcode = ac_image_load;
> > - args.dim = coord_z ? ac_image_2darray : ac_image_2d;
> > - args.resource = fmask_desc_ptr;
> > - args.dmask = 0xf;
> > - args.attributes = AC_FUNC_ATTR_READNONE;
> > -
> > - res = ac_build_image_opcode(ctx, &args);
> > -
> > - res = ac_to_integer(ctx, res);
> > - LLVMValueRef four = LLVMConstInt(ctx->i32, 4, false);
> > - LLVMValueRef F = LLVMConstInt(ctx->i32, 0xf, false);
> > -
> > - LLVMValueRef fmask = LLVMBuildExtractElement(ctx->builder,
> > -  res,
> > -  ctx->i32_0, "");
> > -
> > - LLVMValueRef sample_index4 =
> > - LLVMBuildMul(ctx->builder, sample_index, four, "");
> > - LLVMValueRef shifted_fmask =
> > - LLVMBuildLShr(ctx->builder, fmask, sample_index4, "");
> > - LLVMValueRef final_sample =
> > - LLVMBuildAnd(ctx->builder, shifted_fmask, F, "");
>
> The only difference is the mask (ie. ac_apply_fmask_to_sample uses 0x7)
> while this code uses 0xF.
>

Yes.


>
> According to the comment in that function, I assume 0x7 is the correct
> value?
>

Yes, it's for EQAA. Only samples 0-7 can occur with MSAA. If EQAA is used,
0x8 means the color of the sample is unknown, which is mapped to sample 0
by the 0x7 mask.
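
As a plain C illustration of the remapping (a sketch only, not the
ac_apply_fmask_to_sample implementation; the function name below is made up
for this example):

```c
#include <assert.h>
#include <stdint.h>

/* Each nibble of fmask names the physical sample to fetch for one
 * logical sample. Masking with 0x7 instead of 0xF folds the EQAA
 * "unknown color" value 0x8 onto physical sample 0. */
static unsigned fmask_to_physical_sample(uint32_t fmask, unsigned sample_index)
{
    assert(sample_index < 8); /* eight nibbles in a 32-bit FMASK word */
    return (fmask >> (sample_index * 4)) & 0x7;
}
```

For the identity mapping 0x76543210, logical sample 5 stays sample 5. For
0x11111100, logical samples 0 and 1 read physical sample 0 and all others read
physical sample 1. A nibble of 0x8 yields 0 with the 0x7 mask, where a 0xF
mask would have returned 8.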

Marek

Re: [Mesa-dev] [PATCH] mesa: don't overwrite existing shader files with MESA_SHADER_CAPTURE_PATH

2019-04-12 Thread Marek Olšák
On Thu, Apr 11, 2019 at 2:53 AM Tapani Pälli  wrote:

>
> On 4/11/19 3:32 AM, Marek Olšák wrote:
> > From: Marek Olšák 
> >
> > ---
> >   src/mesa/main/shaderapi.c | 20 +---
> >   1 file changed, 17 insertions(+), 3 deletions(-)
> >
> > diff --git a/src/mesa/main/shaderapi.c b/src/mesa/main/shaderapi.c
> > index 01342c04e8f..6b73e6c7e7a 100644
> > --- a/src/mesa/main/shaderapi.c
> > +++ b/src/mesa/main/shaderapi.c
> > @@ -1233,24 +1233,38 @@ link_program(struct gl_context *ctx, struct
> gl_shader_program *shProg,
> >if (shProg->_LinkedShaders[stage])
> >   prog = shProg->_LinkedShaders[stage]->Program;
> >
> >_mesa_use_program(ctx, stage, shProg, prog, ctx->_Shader);
> > }
> >  }
> >
> >  /* Capture .shader_test files. */
> >  const char *capture_path = _mesa_get_shader_capture_path();
> >  if (shProg->Name != 0 && shProg->Name != ~0 && capture_path !=
> NULL) {
> > -  FILE *file;
> > -  char *filename = ralloc_asprintf(NULL, "%s/%u.shader_test",
> > +  /* Find an unused filename. */
> > +  char *filename = NULL;
> > +  for (unsigned i = 0;; i++) {
> > + if (i) {
> > +filename = ralloc_asprintf(NULL, "%s/%u-%u.shader_test",
> > +   capture_path, shProg->Name, i);
> > + } else {
> > +filename = ralloc_asprintf(NULL, "%s/%u.shader_test",
> >  capture_path, shProg->Name);
>
> How about just having the counter always there, to simplify a bit and
> have consistent filename scheme? Just a suggestion.
>

My personal preference is to keep the original name without the -%u suffix
if %u == 0.


>
> > -  file = fopen(filename, "w");
> > + }
> > + FILE *file = fopen(filename, "r");
> > + if (!file)
> > +break;
>
> I'm surprised we don't have some helper like 'util_path_exists' but this
> works, I guess then we should have 'util_path_isdir|isfile' and others
> as well.
>

There is no standard API for checking whether a file exists. fopen is the
only standard way to do it.
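
For reference, the probing loop can be sketched as a standalone C helper. This
is only an illustration under stated assumptions: the helper name and the
caller-provided buffer are hypothetical, and the patch itself uses
ralloc_asprintf rather than snprintf.

```c
#include <assert.h>
#include <stdio.h>

/* Writes the first path of the form "<dir>/<name>.shader_test" or
 * "<dir>/<name>-<i>.shader_test" that cannot be opened for reading
 * into out. Returns 0 on success, -1 if the path does not fit. */
static int pick_unused_filename(char *out, size_t out_size,
                                const char *dir, unsigned name)
{
    assert(out && dir && out_size > 0);

    for (unsigned i = 0;; i++) {
        /* Keep the original name for i == 0, add a suffix otherwise. */
        int n = i ? snprintf(out, out_size, "%s/%u-%u.shader_test",
                             dir, name, i)
                  : snprintf(out, out_size, "%s/%u.shader_test", dir, name);
        if (n < 0 || (size_t)n >= out_size)
            return -1;

        /* fopen in read mode doubles as the existence check. */
        FILE *file = fopen(out, "r");
        if (!file)
            return 0;
        fclose(file);
    }
}
```

Note that a failing fopen does not strictly prove the file is absent (it can
also fail on permissions), and another process could create the file between
the check and the later open for writing. For a debugging aid like shader
capture that is probably acceptable.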


>
> With or without the suggestion;
> Reviewed-by: Tapani Pälli 
>
> > + fclose(file);
> > + ralloc_free(filename);
> > +  }
> > +
> > +  FILE *file = fopen(filename, "w");
> > if (file) {
> >fprintf(file, "[require]\nGLSL%s >= %u.%02u\n",
> >shProg->IsES ? " ES" : "",
> >shProg->data->Version / 100, shProg->data->Version %
> 100);
> >if (shProg->SeparateShader)
> >   fprintf(file, "GL_ARB_separate_shader_objects\nSSO
> ENABLED\n");
> >fprintf(file, "\n");
> >
> >for (unsigned i = 0; i < shProg->NumShaders; i++) {
> >   fprintf(file, "[%s shader]\n%s\n",
> >
>

Re: [Mesa-dev] [PATCH] panfrost: Fix ->set_vertex_buffers() for partial vertex bufs updates

2019-04-12 Thread Alyssa Rosenzweig
Reviewed-by: Alyssa Rosenzweig 

Looks great! pan_context.c is definitely some of the most fragile code
in the driver right now, so I'm thrilled to get some attention to clean
it up :)

(Not tested by me yet -- I won't be able to test until next week)

[Mesa-dev] [Bug 99781] Some Unity games fail assertion on startup in glXCreateContextAttribsARB

2019-04-12 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=99781

tele4...@hotmail.com changed:

   What|Removed |Added

 CC||tele4...@hotmail.com

--- Comment #17 from tele4...@hotmail.com ---
I happened across this issue with a Unity 5.3.8 game and Mesa 18.3.6 on a
Sandybridge / xf86-video-intel laptop. The patch linked in comment #15 fixed it
for me.

Please consider CC'ing mesa-stable with the fix.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.

Re: [Mesa-dev] Low interpolation precision for 8 bit textures using llvmpipe

2019-04-12 Thread Dominik Drees

Hi Roland!

On 4/11/19 8:18 PM, Roland Scheidegger wrote:

> What version of mesa are you using?
The original results were generated using version 19.0.2 (from the arch 
linux repositories), but I got the same results using the current git 
version (98934e6aa19795072a353dae6020dafadc76a1e3).

> The debug flags were changed a while ago (so that those perf tweaks can
> be disabled on release builds too), it needs to be either:
> GALLIVM_PERF=no_rho_approx,no_brilinear,no_quad_lod
> or easier
> GALLIVM_PERF=no_filter_hacks (which disables these 3 things above together)
>
> Although all of that only really affects filtering with mipmaps (not
> sure if you do?).
Using GALLIVM_PERF does not make a difference, either, but that should 
be expected because I'm not using mipmaps, just "regular" linear 
filtering (GL_LINEAR).



> (more below)

See my responses below as well.

> On 11.04.19 at 18:00, Dominik Drees wrote:
>> Running with the suggested flags in the environment does not change the
>> result for the test case I described below. The results with and without
>> the environment variables set are pixel-wise equal.
>>
>> By the way, and if this is of interest: For GL_NEAREST sampling the results
>> from hardware and llvmpipe are equal as well.
>>
>> Best,
>> Dominik
>>
>> On 4/11/19 4:36 PM, Ilia Mirkin wrote:
>>> llvmpipe takes a number of shortcuts in the interest of speed which
>>> cause inaccurate texturing. Try running with
>>>
>>> GALLIVM_DEBUG=no_rho_approx,no_brilinear,no_quad_lod
>>>
>>> and see if the issue still occurs.
>>>
>>> Cheers,
>>>
>>>     -ilia
>>>
>>> On Thu, Apr 11, 2019 at 8:30 AM Dominik Drees wrote:


>>>> Hello, everyone!
>>>>
>>>> I have a question regarding the interpolation precision of llvmpipe.
>>>> Feel free to redirect me to somewhere else if this is not the right
>>>> place to ask. Consider the following scenario: In a fragment shader we
>>>> are sampling from a 16x16, 8 bit texture with values between 0 and 3
>>>> using linear interpolation. Then we write white to the screen if the
>>>> sampled value is > 1/255 and black otherwise. The output looks very
>>>> different when rendered with llvmpipe compared to the result produced by
>>>> rendering hardware (for both intel (mesa i965) and nvidia (proprietary
>>>> driver)).
>>>>
>>>> I've uploaded exemplary output images here
>>>> (https://imgur.com/a/D1udpez)
>>>>
>>>> and the corresponding fragment shader here
>>>> (https://pastebin.com/pa808Req).
>
> The shader looks iffy to me, how do you use that vec4 in the if clause?
>
>>>> My hypothesis is that llvmpipe (in contrast to hardware) only uses 8 bit
>>>> for the interpolation computation when reading from 8 bit textures and
>>>> thus loses precision in the lower bits. Is that correct? If so, does
>>>> anyone know of a workaround?


> So, in theory it is indeed possible the results are less accurate with
> llvmpipe (I believe all recent hw does rgba8 filtering with more than 8
> bit precision).
> For formats fitting into rgba8, we have a fast path in llvmpipe
> (gallivm) for the lerp, which unpacks the 8bit values into 16bit values,
> does the lerp with that and packs back to 8 bit. The result is
> accurately rounded there (to 8 bit) but only for 1 lerp step - for a 2d
> texture there are 3 of those (one per direction, and a final one
> combining the result). And yes this means the filtered result only has 8
> bits.
Do I understand you correctly in that for the 2D case, the results of 
the first two lerps (done in 16 bit) are converted to 8 bit, then 
converted back to 16 bit for the final (second stage) lerp?


If so and if I'm understanding this correctly, for 2D (i.e., a 2-stage 
linear interpolation) we potentially have an error in the order of one 
bit for the final 8 bit value due to the intermediate 16->8->16 
conversion. For sampling from a 3D texture (i.e., a 3-stage linear 
interpolation) the effect would be amplified: The extra stage could 
cause an error with a magnitude of two bits of the final 8 bit result 
(if I'm doing the math in my head correctly).


Is there any (conceptual) reason why the result of a one dimensional 
interpolation step is reduced back to 8 bits before the second stage 
interpolation? Would avoiding these conversions not actually be faster 
(in addition to the improved accuracy)?


> I do believe you should not rely on implementations having more accuracy
> - as far as I know the filtering we do is conformant there (it is tricky
> to do better using the fast path).
In principle you are correct. In our regression tests we actually have 
(per test) configurable thresholds for maximum pixel distance/maximum 
number of differing 

Re: [Mesa-dev] [PATCH] nir,ac/nir: fix cube_face_coord

2019-04-12 Thread Samuel Pitoiset

Reviewed-by: Samuel Pitoiset 

On 4/12/19 12:15 PM, Rhys Perry wrote:

Seems it was missing the "/ ma + 0.5" and the order was swapped.

Fixes: a1a2a8dfda7b9cac7e ('nir: add AMD_gcn_shader extended instructions')
Signed-off-by: Rhys Perry 
---
  src/amd/common/ac_nir_to_llvm.c | 11 +--
  src/compiler/nir/nir_opcodes.py | 21 +++--
  2 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index 0c8891d26a0..12c4c21a8d9 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -1081,10 +1081,17 @@ static void visit_alu(struct ac_nir_context *ctx, const 
nir_alu_instr *instr)
LLVMValueRef in[3];
for (unsigned chan = 0; chan < 3; chan++)
in[chan] = ac_llvm_extract_elem(&ctx->ac, src[0], chan);
-   results[0] = ac_build_intrinsic(&ctx->ac, "llvm.amdgcn.cubetc",
+   results[0] = ac_build_intrinsic(&ctx->ac, "llvm.amdgcn.cubesc",
ctx->ac.f32, in, 3, 
AC_FUNC_ATTR_READNONE);
-   results[1] = ac_build_intrinsic(&ctx->ac, "llvm.amdgcn.cubesc",
+   results[1] = ac_build_intrinsic(&ctx->ac, "llvm.amdgcn.cubetc",
ctx->ac.f32, in, 3, 
AC_FUNC_ATTR_READNONE);
+   LLVMValueRef ma = ac_build_intrinsic(&ctx->ac, 
"llvm.amdgcn.cubema",
+ctx->ac.f32, in, 3, 
AC_FUNC_ATTR_READNONE);
+   results[0] = ac_build_fdiv(&ctx->ac, results[0], ma);
+   results[1] = ac_build_fdiv(&ctx->ac, results[1], ma);
+   LLVMValueRef offset = LLVMConstReal(ctx->ac.f32, 0.5);
+   results[0] = LLVMBuildFAdd(ctx->ac.builder, results[0], offset, 
"");
+   results[1] = LLVMBuildFAdd(ctx->ac.builder, results[1], offset, 
"");
result = ac_build_gather_values(&ctx->ac, results, 2);
break;
}
diff --git a/src/compiler/nir/nir_opcodes.py b/src/compiler/nir/nir_opcodes.py
index 90f7aed0c0d..0f56dd9596c 100644
--- a/src/compiler/nir/nir_opcodes.py
+++ b/src/compiler/nir/nir_opcodes.py
@@ -410,12 +410,21 @@ dst.x = dst.y = 0.0;
  float absX = fabs(src0.x);
  float absY = fabs(src0.y);
  float absZ = fabs(src0.z);
-if (src0.x >= 0 && absX >= absY && absX >= absZ) { dst.x = -src0.y; dst.y = 
-src0.z; }
-if (src0.x < 0 && absX >= absY && absX >= absZ) { dst.x = -src0.y; dst.y = 
src0.z; }
-if (src0.y >= 0 && absY >= absX && absY >= absZ) { dst.x = src0.z; dst.y = 
src0.x; }
-if (src0.y < 0 && absY >= absX && absY >= absZ) { dst.x = -src0.z; dst.y = 
src0.x; }
-if (src0.z >= 0 && absZ >= absX && absZ >= absY) { dst.x = -src0.y; dst.y = 
src0.x; }
-if (src0.z < 0 && absZ >= absX && absZ >= absY) { dst.x = -src0.y; dst.y = 
-src0.x; }
+
+float ma = 0.0;
+if (absX >= absY && absX >= absZ) { ma = 2 * src0.x; }
+if (absY >= absX && absY >= absZ) { ma = 2 * src0.y; }
+if (absZ >= absX && absZ >= absY) { ma = 2 * src0.z; }
+
+if (src0.x >= 0 && absX >= absY && absX >= absZ) { dst.x = -src0.z; dst.y = 
-src0.y; }
+if (src0.x < 0 && absX >= absY && absX >= absZ) { dst.x = src0.z; dst.y = 
-src0.y; }
+if (src0.y >= 0 && absY >= absX && absY >= absZ) { dst.x = src0.x; dst.y = 
src0.z; }
+if (src0.y < 0 && absY >= absX && absY >= absZ) { dst.x = src0.x; dst.y = 
-src0.z; }
+if (src0.z >= 0 && absZ >= absX && absZ >= absY) { dst.x = src0.x; dst.y = 
-src0.y; }
+if (src0.z < 0 && absZ >= absX && absZ >= absY) { dst.x = -src0.x; dst.y = 
-src0.y; }
+
+dst.x = dst.x / ma + 0.5;
+dst.y = dst.y / ma + 0.5;
  """)
  
  unop_horiz("cube_face_index", 1, tfloat32, 3, tfloat32, """


[Mesa-dev] [PATCH] nir,ac/nir: fix cube_face_coord

2019-04-12 Thread Rhys Perry
Seems it was missing the "/ ma + 0.5" and the order was swapped.

Fixes: a1a2a8dfda7b9cac7e ('nir: add AMD_gcn_shader extended instructions')
Signed-off-by: Rhys Perry 
---
 src/amd/common/ac_nir_to_llvm.c | 11 +--
 src/compiler/nir/nir_opcodes.py | 21 +++--
 2 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index 0c8891d26a0..12c4c21a8d9 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -1081,10 +1081,17 @@ static void visit_alu(struct ac_nir_context *ctx, const 
nir_alu_instr *instr)
LLVMValueRef in[3];
for (unsigned chan = 0; chan < 3; chan++)
in[chan] = ac_llvm_extract_elem(&ctx->ac, src[0], chan);
-   results[0] = ac_build_intrinsic(&ctx->ac, "llvm.amdgcn.cubetc",
+   results[0] = ac_build_intrinsic(&ctx->ac, "llvm.amdgcn.cubesc",
ctx->ac.f32, in, 3, 
AC_FUNC_ATTR_READNONE);
-   results[1] = ac_build_intrinsic(&ctx->ac, "llvm.amdgcn.cubesc",
+   results[1] = ac_build_intrinsic(&ctx->ac, "llvm.amdgcn.cubetc",
ctx->ac.f32, in, 3, 
AC_FUNC_ATTR_READNONE);
+   LLVMValueRef ma = ac_build_intrinsic(&ctx->ac, 
"llvm.amdgcn.cubema",
+ctx->ac.f32, in, 3, 
AC_FUNC_ATTR_READNONE);
+   results[0] = ac_build_fdiv(&ctx->ac, results[0], ma);
+   results[1] = ac_build_fdiv(&ctx->ac, results[1], ma);
+   LLVMValueRef offset = LLVMConstReal(ctx->ac.f32, 0.5);
+   results[0] = LLVMBuildFAdd(ctx->ac.builder, results[0], offset, 
"");
+   results[1] = LLVMBuildFAdd(ctx->ac.builder, results[1], offset, 
"");
result = ac_build_gather_values(&ctx->ac, results, 2);
break;
}
diff --git a/src/compiler/nir/nir_opcodes.py b/src/compiler/nir/nir_opcodes.py
index 90f7aed0c0d..0f56dd9596c 100644
--- a/src/compiler/nir/nir_opcodes.py
+++ b/src/compiler/nir/nir_opcodes.py
@@ -410,12 +410,21 @@ dst.x = dst.y = 0.0;
 float absX = fabs(src0.x);
 float absY = fabs(src0.y);
 float absZ = fabs(src0.z);
-if (src0.x >= 0 && absX >= absY && absX >= absZ) { dst.x = -src0.y; dst.y = 
-src0.z; }
-if (src0.x < 0 && absX >= absY && absX >= absZ) { dst.x = -src0.y; dst.y = 
src0.z; }
-if (src0.y >= 0 && absY >= absX && absY >= absZ) { dst.x = src0.z; dst.y = 
src0.x; }
-if (src0.y < 0 && absY >= absX && absY >= absZ) { dst.x = -src0.z; dst.y = 
src0.x; }
-if (src0.z >= 0 && absZ >= absX && absZ >= absY) { dst.x = -src0.y; dst.y = 
src0.x; }
-if (src0.z < 0 && absZ >= absX && absZ >= absY) { dst.x = -src0.y; dst.y = 
-src0.x; }
+
+float ma = 0.0;
+if (absX >= absY && absX >= absZ) { ma = 2 * src0.x; }
+if (absY >= absX && absY >= absZ) { ma = 2 * src0.y; }
+if (absZ >= absX && absZ >= absY) { ma = 2 * src0.z; }
+
+if (src0.x >= 0 && absX >= absY && absX >= absZ) { dst.x = -src0.z; dst.y = 
-src0.y; }
+if (src0.x < 0 && absX >= absY && absX >= absZ) { dst.x = src0.z; dst.y = 
-src0.y; }
+if (src0.y >= 0 && absY >= absX && absY >= absZ) { dst.x = src0.x; dst.y = 
src0.z; }
+if (src0.y < 0 && absY >= absX && absY >= absZ) { dst.x = src0.x; dst.y = 
-src0.z; }
+if (src0.z >= 0 && absZ >= absX && absZ >= absY) { dst.x = src0.x; dst.y = 
-src0.y; }
+if (src0.z < 0 && absZ >= absX && absZ >= absY) { dst.x = -src0.x; dst.y = 
-src0.y; }
+
+dst.x = dst.x / ma + 0.5;
+dst.y = dst.y / ma + 0.5;
 """)
 
 unop_horiz("cube_face_index", 1, tfloat32, 3, tfloat32, """
-- 
2.20.1
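
For readers who want to check the face coordinate math outside of NIR, the
following standalone C sketch follows the sc/tc selection used in the patch
above. It is an illustration only: the struct and function names are
hypothetical, and the sketch divides by the absolute value of the doubled
major axis, as the OpenGL cube map face selection table does, whereas the
pseudocode in the patch keeps the sign of `ma`. Treat that detail as an
assumption of this sketch.

```c
#include <assert.h>
#include <math.h>

struct face_coord { float s, t; };

/* Maps a direction vector to [0, 1] coordinates on the selected cube
 * face. On ties, Z wins over Y and Y wins over X, matching the order in
 * which the sequential ifs of the patch overwrite each other.
 * The result is undefined for the zero vector (division by zero). */
static struct face_coord cube_face_coord(float x, float y, float z)
{
    float ax = fabsf(x), ay = fabsf(y), az = fabsf(z);
    float sc, tc, ma;

    if (az >= ax && az >= ay) {          /* +Z / -Z face */
        ma = 2.0f * az;
        sc = z >= 0.0f ? x : -x;
        tc = -y;
    } else if (ay >= ax && ay >= az) {   /* +Y / -Y face */
        ma = 2.0f * ay;
        sc = x;
        tc = y >= 0.0f ? z : -z;
    } else {                             /* +X / -X face */
        ma = 2.0f * ax;
        sc = x >= 0.0f ? -z : z;
        tc = -y;
    }

    struct face_coord r = { sc / ma + 0.5f, tc / ma + 0.5f };
    return r;
}
```

For example, the direction (1, 0.5, -0.5) selects the +X face with sc = 0.5
and tc = -0.5, giving (s, t) = (0.75, 0.25); an axis-aligned direction such as
(0, 0, -1) lands in the face center at (0.5, 0.5).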


[Mesa-dev] [PATCH] panfrost: Fix ->set_vertex_buffers() for partial vertex bufs updates

2019-04-12 Thread Boris Brezillon
The caller of ->set_vertex_buffers() might want to update only a subset
of the currently active vertex buffers, hence the start_slot and
num_buffers arguments.
The panfrost driver was assuming that the whole set of vertex buffers
was to be freed/updated each time this hook is called, which leads to
issues when the caller wants to update only some of those buffers (this
is what the gallium-cso layer does when it saves/restores the vertex
buf context).

Fix the logic to allow partial vertex bufs updates.

Signed-off-by: Boris Brezillon 
---
Hi,

I faced this issue while trying to make gallium-hud work with the
panfrost driver. I'm still struggling to fix at least 3 remaining
issues:

- the Y coordinates are inverted, and the HUD is displayed on the
  bottom-left corner instead of top-left corner, plus it's mirrored
- transparency/alpha-blending does not work correctly (I get black
  squares instead of semi transparent grey ones)
- the fonts shader (or something related to it) does not seem to work
  (no legend on the graph)

If anyone has an idea where this could come from, please let me know.

Thanks,

Boris
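
For anyone who wants to reason about the partial update semantics in
isolation, here is a small standalone C sketch. It does not use the Gallium
types: `struct vbuf`, `struct ctx` and `MAX_SLOTS` are stand-ins invented for
this example (for pipe_vertex_buffer, panfrost_context and PIPE_MAX_ATTRIBS),
and it assumes the `buffers` array holds `num_buffers` entries, the first of
which belongs to `start_slot`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SLOTS 16                    /* stand-in for PIPE_MAX_ATTRIBS */

struct vbuf { unsigned stride, offset; };       /* stand-in buffer state */
struct ctx  { struct vbuf *slots[MAX_SLOTS]; }; /* stand-in context */

/* Update only slots [start_slot, start_slot + num_buffers).
 * buffers == NULL unbinds that range; slots outside it are untouched. */
static void set_vertex_buffers(struct ctx *c, unsigned start_slot,
                               unsigned num_buffers,
                               const struct vbuf *buffers)
{
    assert(start_slot + num_buffers <= MAX_SLOTS);

    for (unsigned i = 0; i < num_buffers; i++) {
        unsigned slot = start_slot + i;

        free(c->slots[slot]);           /* free(NULL) is a no-op */
        c->slots[slot] = NULL;

        if (!buffers)
            continue;

        /* buffers[] is indexed relative to start_slot. */
        c->slots[slot] = malloc(sizeof(*buffers));
        if (c->slots[slot])
            memcpy(c->slots[slot], &buffers[i], sizeof(*buffers));
    }
}

/* Number of slots that currently hold a buffer. */
static unsigned count_bound(const struct ctx *c)
{
    unsigned n = 0;
    for (unsigned i = 0; i < MAX_SLOTS; i++)
        n += c->slots[i] != NULL;
    return n;
}
```

Binding two buffers at slot 0 and then replacing only slot 1 leaves slot 0
intact, which is the save/restore pattern described above.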
---
 src/gallium/drivers/panfrost/pan_context.c | 34 --
 src/gallium/drivers/panfrost/pan_context.h |  3 +-
 2 files changed, 20 insertions(+), 17 deletions(-)

diff --git a/src/gallium/drivers/panfrost/pan_context.c 
b/src/gallium/drivers/panfrost/pan_context.c
index 9f401b1a7a12..7c346e475269 100644
--- a/src/gallium/drivers/panfrost/pan_context.c
+++ b/src/gallium/drivers/panfrost/pan_context.c
@@ -742,9 +742,13 @@ panfrost_emit_vertex_data(struct panfrost_context *ctx)
 union mali_attr attrs[PIPE_MAX_ATTRIBS];
 
 unsigned invocation_count = 
MALI_NEGATIVE(ctx->payload_tiler.prefix.invocation_count);
+unsigned vertex_buffer_count = 0;
+
+for (int i = 0; i < ARRAY_SIZE(ctx->vertex_buffers); ++i) {
+struct pipe_vertex_buffer *buf = ctx->vertex_buffers[i];
+if (!buf)
+continue;
 
-for (int i = 0; i < ctx->vertex_buffer_count; ++i) {
-struct pipe_vertex_buffer *buf = &ctx->vertex_buffers[i];
 struct panfrost_resource *rsrc = (struct panfrost_resource *) (buf->buffer.resource);
 
 /* Let's figure out the layout of the attributes in memory so
@@ -782,9 +786,10 @@ panfrost_emit_vertex_data(struct panfrost_context *ctx)
 } else {
 /* Leave unset? */
 }
+vertex_buffer_count++;
 }
 
-ctx->payload_vertex.postfix.attributes = panfrost_upload_transient(ctx, attrs, ctx->vertex_buffer_count * sizeof(union mali_attr));
+ctx->payload_vertex.postfix.attributes = panfrost_upload_transient(ctx, attrs, vertex_buffer_count * sizeof(union mali_attr));
 
 panfrost_emit_varying_descriptor(ctx, invocation_count);
 }
@@ -1817,21 +1822,20 @@ panfrost_set_vertex_buffers(
 const struct pipe_vertex_buffer *buffers)
 {
 struct panfrost_context *ctx = pan_context(pctx);
-assert(num_buffers <= PIPE_MAX_ATTRIBS);
+unsigned i;
+assert(start_slot + num_buffers <= PIPE_MAX_ATTRIBS);
 
-/* XXX: Dirty tracking? etc */
-if (buffers) {
-size_t sz = sizeof(buffers[0]) * num_buffers;
-ctx->vertex_buffers = malloc(sz);
-ctx->vertex_buffer_count = num_buffers;
-memcpy(ctx->vertex_buffers, buffers, sz);
-} else {
-if (ctx->vertex_buffers) {
-free(ctx->vertex_buffers);
-ctx->vertex_buffers = NULL;
+for (i = start_slot; i < start_slot + num_buffers; i++) {
+if (ctx->vertex_buffers[i]) {
+free(ctx->vertex_buffers[i]);
+ctx->vertex_buffers[i] = NULL;
 }
 
-ctx->vertex_buffer_count = 0;
+if (!buffers)
+continue;
+
+ctx->vertex_buffers[i] = malloc(sizeof(*buffers));
+memcpy(ctx->vertex_buffers[i], buffers + i, sizeof(*buffers));
 }
 }
 
diff --git a/src/gallium/drivers/panfrost/pan_context.h b/src/gallium/drivers/panfrost/pan_context.h
index d071da1c62fa..2a10fede334a 100644
--- a/src/gallium/drivers/panfrost/pan_context.h
+++ b/src/gallium/drivers/panfrost/pan_context.h
@@ -187,8 +187,7 @@ struct panfrost_context {
 
 struct panfrost_vertex_state *vertex;
 
-struct pipe_vertex_buffer *vertex_buffers;
-unsigned vertex_buffer_count;
+struct pipe_vertex_buffer *vertex_buffers[PIPE_MAX_ATTRIBS];
 
 struct panfrost_sampler_state *samplers[PIPE_SHADER_TYPES][PIPE_MAX_SAMPLERS];
 unsigned sampler_count[PIPE_SHADER_TYPES];
-- 
2.20.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org

[Mesa-dev] [PATCH 2/2] radv: enable VK_KHR_shader_float16_int8

2019-04-12 Thread Samuel Pitoiset
Signed-off-by: Samuel Pitoiset 
---
 src/amd/vulkan/radv_device.c | 2 +-
 src/amd/vulkan/radv_shader.c | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
index dacaac173ae..c517b56cd0f 100644
--- a/src/amd/vulkan/radv_device.c
+++ b/src/amd/vulkan/radv_device.c
@@ -902,7 +902,7 @@ void radv_GetPhysicalDeviceFeatures2(
VkPhysicalDeviceFloat16Int8FeaturesKHR *features =
(VkPhysicalDeviceFloat16Int8FeaturesKHR*)ext;
bool enabled = pdevice->rad_info.chip_class >= VI;
-   features->shaderFloat16 = VK_FALSE;
+   features->shaderFloat16 = enabled && HAVE_LLVM >= 0x0800;
features->shaderInt8 = enabled;
break;
}
diff --git a/src/amd/vulkan/radv_shader.c b/src/amd/vulkan/radv_shader.c
index 7cde5e728e4..898195a71d4 100644
--- a/src/amd/vulkan/radv_shader.c
+++ b/src/amd/vulkan/radv_shader.c
@@ -252,6 +252,7 @@ radv_shader_compile_to_nir(struct radv_device *device,
.variable_pointers = true,
.storage_8bit = true,
.int8 = true,
+   .float16 = true,
},
.ubo_ptr_type = glsl_vector_type(GLSL_TYPE_UINT, 2),
.ssbo_ptr_type = glsl_vector_type(GLSL_TYPE_UINT, 2),
-- 
2.21.0


[Mesa-dev] [PATCH 1/2] spirv: add SpvCapabilityFloat16 support

2019-04-12 Thread Samuel Pitoiset
Signed-off-by: Samuel Pitoiset 
---
 src/compiler/shader_info.h| 1 +
 src/compiler/spirv/spirv_to_nir.c | 5 -
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/compiler/shader_info.h b/src/compiler/shader_info.h
index 0b67082a732..45ba2982884 100644
--- a/src/compiler/shader_info.h
+++ b/src/compiler/shader_info.h
@@ -70,6 +70,7 @@ struct spirv_supported_capabilities {
bool transform_feedback;
bool trinary_minmax;
bool variable_pointers;
+   bool float16;
 };
 
 typedef struct shader_info {
diff --git a/src/compiler/spirv/spirv_to_nir.c b/src/compiler/spirv/spirv_to_nir.c
index 98205a1b8b2..e7d6aa02379 100644
--- a/src/compiler/spirv/spirv_to_nir.c
+++ b/src/compiler/spirv/spirv_to_nir.c
@@ -3611,7 +3611,6 @@ vtn_handle_preamble_instruction(struct vtn_builder *b, SpvOp opcode,
   case SpvCapabilityLinkage:
   case SpvCapabilityVector16:
   case SpvCapabilityFloat16Buffer:
-  case SpvCapabilityFloat16:
   case SpvCapabilitySparseResidency:
  vtn_warn("Unsupported SPIR-V capability: %s",
   spirv_capability_to_string(cap));
@@ -3781,6 +3780,10 @@ vtn_handle_preamble_instruction(struct vtn_builder *b, SpvOp opcode,
  spv_check_supported(derivative_group, cap);
  break;
 
+  case SpvCapabilityFloat16:
+ spv_check_supported(float16, cap);
+ break;
+
   default:
  vtn_fail("Unhandled capability");
   }
-- 
2.21.0


[Mesa-dev] [Bug 110408] Lima cannot handle too many vertices because of limited pre-allocated buffer

2019-04-12 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=110408

Icenowy Zheng  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|Other                       |Drivers/Gallium/Lima
         QA Contact|mesa-dev@lists.freedesktop. |
                   |org                         |

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.

[Mesa-dev] [Bug 110408] Lima cannot handle too many vertices because of limited pre-allocated buffer

2019-04-12 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=110408

            Bug ID: 110408
           Summary: Lima cannot handle too many vertices because of
                    limited pre-allocated buffer
           Product: Mesa
           Version: git
          Hardware: ARM
                OS: Linux (All)
            Status: NEW
          Severity: normal
          Priority: medium
         Component: Other
          Assignee: mesa-dev@lists.freedesktop.org
          Reporter: icenowy...@gmail.com
        QA Contact: mesa-dev@lists.freedesktop.org

When I try to run the bump:bump-render=high-poly test of glmark2-es2-drm on
Lima, it crashes with a segfault.

The backtrace shows:
#0  0xb3ab03e0 lima_ctx_buff_va (sun4i-drm_dri.so)
#1  0xb3ab43b4 lima_update_varying (sun4i-drm_dri.so)
#2  0xb3619304 u_vbuf_draw_vbo (sun4i-drm_dri.so)
#3  0xb37b15e4 st_draw_vbo (sun4i-drm_dri.so)
#4  0xb3852938 _mesa_draw_arrays (sun4i-drm_dri.so)
#5  0xde1d3954 _ZN4Mesh10render_vboEv (glmark2-es2-drm)
#6  0xde16dce4 _ZN9SceneBump4drawEv (glmark2-es2-drm)
#7  0xde156d10 _ZN8MainLoop4drawEv (glmark2-es2-drm)
#8  0xde157540 _ZN8MainLoop4stepEv (glmark2-es2-drm)
#9  0xde14aca0 _Z12do_benchmarkR6Canvas (glmark2-es2-drm)
#10 0xde148948 main (glmark2-es2-drm)
#11 0xb400c638 __libc_start_main (libc.so.6)
#12 0xde14aaf0 $x (glmark2-es2-drm)
#13 0xde14aaf0 $x (glmark2-es2-drm)

From inspection, lima_ctx_buff_va seems to be referring to a context buffer that
hasn't been successfully allocated, and in lima_update_varying() the
allocation failure seems to be because the requested memory size is too big
(because the vertex count is big, 144000 for bump:bump-render=high-poly).

Should we use some better way than u_suballocator to allocate the memory for
the varying storage?
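
To make the size argument concrete, here is a rough sketch of one possible
approach: compute the varying storage size up front and only use the
pre-allocated pool when the request fits. Everything in it is hypothetical.
PREALLOC_SIZE, the 32-byte stride and the function names are made-up values
for illustration, not taken from the Lima driver or from u_suballocator.

```c
#include <stdint.h>

/* Hypothetical size of the pre-allocated, suballocated pool (1 MiB). */
#define PREALLOC_SIZE (1u << 20)

enum alloc_path {
    ALLOC_SUBALLOC,  /* request fits in the pre-allocated pool */
    ALLOC_DEDICATED, /* request needs its own, larger buffer */
};

/* Bytes of varying storage for a draw; 64-bit to avoid overflow. */
static uint64_t
varying_bytes(uint32_t vertex_count, uint32_t stride)
{
    return (uint64_t)vertex_count * stride;
}

/* Decide where the varying storage should come from, instead of
 * assuming the pool is always large enough. */
static enum alloc_path
choose_varying_alloc(uint32_t vertex_count, uint32_t stride)
{
    if (varying_bytes(vertex_count, stride) <= PREALLOC_SIZE)
        return ALLOC_SUBALLOC;

    return ALLOC_DEDICATED;
}
```

With the assumed 32-byte stride, the 144000 vertices of the high-poly test
need 4608000 bytes, which would not fit in a 1 MiB pool, so this sketch would
pick the dedicated path for that draw.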
