[Mesa-dev] two radv regressions with latest addrlib import

2017-04-03 Thread Dave Airlie
Just documenting these here, I'm out for a few days so might not get
time to fix them:

I've sent a patch for the first.

1. Small 1D mipmapped textures tests all failed after
Fixes: 36149998 amdgpu/addrlib: Rewrite tile mode optimization code

This looks like the 1D base level gets reduced to a linear aligned,
and things go wrong after that, the fix stops it.

2. S8_UINT is broken

Since it looks like radeonsi never uses s8_uint

amdgpu/addrlib: Add new flags minimizePadding and maxBaseAlign
is what breaks it, it appears to be validating
HwlGetAlignmentInfoMacroTiled that is failing
as we loop through all the non-stencil levels, and we get a return and
stop before calculating the image size fully. Even hacking that
doesn't make it instantly fix it, but I'll try and track it down a bit
more.

Dave.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] mesa/glthread: Avoid unnecessary batch reallocation

2017-04-03 Thread Timothy Arceri

On 04/04/17 13:54, Bartosz Tomczyk wrote:

Thank you Timothy.
Sorry about that, I'm still quite to new git/Mesa workflow.  I will do
better in future.


Not a problem :) Thanks for the patches.




On Apr 4, 2017 01:54, "Timothy Arceri" > wrote:

I've pushed this. Thanks!

In future please add the version to the subject when sending new
revisions. You can do this with the -v option when using git
send-email e.g -v2 will result in [PATCH v2]

Also please add changes to the patch since v1 in the commit message.
e.g.

V2: set batch->used = 0 rather than memsetting the batch to 0.

This helps those reviewing to quickly compare the updates since the
first revision.



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] mesa/glthread: Avoid unnecessary batch reallocation

2017-04-03 Thread Bartosz Tomczyk
Thank you Timothy.
Sorry about that, I'm still quite to new git/Mesa workflow.  I will do
better in future.


On Apr 4, 2017 01:54, "Timothy Arceri"  wrote:

I've pushed this. Thanks!

In future please add the version to the subject when sending new revisions.
You can do this with the -v option when using git send-email e.g -v2 will
result in [PATCH v2]

Also please add changes to the patch since v1 in the commit message. e.g.

V2: set batch->used = 0 rather than memsetting the batch to 0.

This helps those reviewing to quickly compare the updates since the first
revision.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] addrlib: don't use linear aligned when pow2Pad is selected.

2017-04-03 Thread Dave Airlie
From: Dave Airlie 

The below commit caused a regression on radv with 1D textures,
this was because the base level for these small textures was
being degraded to a linear level due to this code. This stops
the degradation to linear when the pow2Pad bit it set (this
is set when there are miplevels).

I'm not sure if this is the correct fix, it works at least,
maybe there is still a bug in radv somewhere.

Fixes: 36149998 amdgpu/addrlib: Rewrite tile mode optmization code

Signed-off-by: Dave Airlie 
---
 src/amd/addrlib/core/addrlib1.cpp | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/amd/addrlib/core/addrlib1.cpp 
b/src/amd/addrlib/core/addrlib1.cpp
index 2d640cf..9f783a5 100644
--- a/src/amd/addrlib/core/addrlib1.cpp
+++ b/src/amd/addrlib/core/addrlib1.cpp
@@ -3616,6 +3616,7 @@ VOID Lib::OptimizeTileMode(
 (ElemLib::IsBlockCompressed(pInOut->format) == FALSE) &&
 (pInOut->flags.depth == FALSE) &&
 (pInOut->flags.stencil == FALSE) &&
+(pInOut->flags.pow2Pad == FALSE) &&
 (m_configFlags.disableLinearOpt == FALSE) &&
 (pInOut->flags.disableLinearOpt == FALSE))
 {
-- 
2.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 07/19] tgsi: add BALLOT/READ_* opcodes

2017-04-03 Thread Ilia Mirkin
On Mon, Apr 3, 2017 at 10:01 PM, Boyan Ding  wrote:
> 2017-04-01 1:14 GMT+08:00 Nicolai Hähnle :
>> From: Ilia Mirkin 
>>
>> v2 (Nicolai):
>> - BALLOT isn't per-channel
>> - expand the documentation (also for VOTE_*)
>>
>> Signed-off-by: Ilia Mirkin 
>> Signed-off-by: Nicolai Hähnle 
>> ---
>>  src/gallium/auxiliary/tgsi/tgsi_info.c |  6 +--
>>  src/gallium/docs/source/tgsi.rst   | 67 
>> +-
>>  src/gallium/include/pipe/p_shader_tokens.h |  6 +--
>>  3 files changed, 62 insertions(+), 17 deletions(-)
>>
>> diff --git a/src/gallium/auxiliary/tgsi/tgsi_info.c 
>> b/src/gallium/auxiliary/tgsi/tgsi_info.c
>> index 5a6a9bc..30bad6d 100644
>> --- a/src/gallium/auxiliary/tgsi/tgsi_info.c
>> +++ b/src/gallium/auxiliary/tgsi/tgsi_info.c
>> @@ -106,51 +106,51 @@ static const struct tgsi_opcode_info 
>> opcode_info[TGSI_OPCODE_LAST] =
>> { 1, 3, 0, 0, 0, 0, 0, COMP, "CMP", TGSI_OPCODE_CMP },
>> { 1, 1, 0, 0, 0, 0, 0, CHAN, "SCS", TGSI_OPCODE_SCS },
>> { 1, 2, 1, 0, 0, 0, 0, OTHR, "TXB", TGSI_OPCODE_TXB },
>> { 1, 1, 0, 0, 0, 0, 0, OTHR, "FBFETCH", TGSI_OPCODE_FBFETCH },
>> { 1, 2, 0, 0, 0, 0, 0, COMP, "DIV", TGSI_OPCODE_DIV },
>> { 1, 2, 0, 0, 0, 0, 0, REPL, "DP2", TGSI_OPCODE_DP2 },
>> { 1, 2, 1, 0, 0, 0, 0, OTHR, "TXL", TGSI_OPCODE_TXL },
>> { 0, 0, 0, 0, 0, 0, 0, NONE, "BRK", TGSI_OPCODE_BRK },
>> { 0, 1, 0, 0, 1, 0, 1, NONE, "IF", TGSI_OPCODE_IF },
>> { 0, 1, 0, 0, 1, 0, 1, NONE, "UIF", TGSI_OPCODE_UIF },
>> -   { 0, 1, 0, 0, 0, 0, 1, NONE, "", 76 },  /* removed */
>> +   { 1, 2, 0, 0, 0, 0, 0, COMP, "READ_INVOC", TGSI_OPCODE_READ_INVOC },
>> { 0, 0, 0, 0, 1, 1, 1, NONE, "ELSE", TGSI_OPCODE_ELSE },
>> { 0, 0, 0, 0, 0, 1, 0, NONE, "ENDIF", TGSI_OPCODE_ENDIF },
>> { 1, 1, 0, 0, 0, 0, 0, COMP, "DDX_FINE", TGSI_OPCODE_DDX_FINE },
>> { 1, 1, 0, 0, 0, 0, 0, COMP, "DDY_FINE", TGSI_OPCODE_DDY_FINE },
>> { 0, 1, 0, 0, 0, 0, 0, NONE, "PUSHA", TGSI_OPCODE_PUSHA },
>> { 1, 0, 0, 0, 0, 0, 0, NONE, "POPA", TGSI_OPCODE_POPA },
>> { 1, 1, 0, 0, 0, 0, 0, COMP, "CEIL", TGSI_OPCODE_CEIL },
>> { 1, 1, 0, 0, 0, 0, 0, COMP, "I2F", TGSI_OPCODE_I2F },
>> { 1, 1, 0, 0, 0, 0, 0, COMP, "NOT", TGSI_OPCODE_NOT },
>> { 1, 1, 0, 0, 0, 0, 0, COMP, "TRUNC", TGSI_OPCODE_TRUNC },
>> { 1, 2, 0, 0, 0, 0, 0, COMP, "SHL", TGSI_OPCODE_SHL },
>> -   { 0, 0, 0, 0, 0, 0, 0, NONE, "", 88 },  /* removed */
>> +   { 1, 1, 0, 0, 0, 0, 0, OTHR, "BALLOT", TGSI_OPCODE_BALLOT },
>> { 1, 2, 0, 0, 0, 0, 0, COMP, "AND", TGSI_OPCODE_AND },
>> { 1, 2, 0, 0, 0, 0, 0, COMP, "OR", TGSI_OPCODE_OR },
>> { 1, 2, 0, 0, 0, 0, 0, COMP, "MOD", TGSI_OPCODE_MOD },
>> { 1, 2, 0, 0, 0, 0, 0, COMP, "XOR", TGSI_OPCODE_XOR },
>> { 1, 3, 0, 0, 0, 0, 0, COMP, "SAD", TGSI_OPCODE_SAD },
>> { 1, 2, 1, 0, 0, 0, 0, OTHR, "TXF", TGSI_OPCODE_TXF },
>> { 1, 2, 1, 0, 0, 0, 0, OTHR, "TXQ", TGSI_OPCODE_TXQ },
>> { 0, 0, 0, 0, 0, 0, 0, NONE, "CONT", TGSI_OPCODE_CONT },
>> { 0, 1, 0, 0, 0, 0, 0, NONE, "EMIT", TGSI_OPCODE_EMIT },
>> { 0, 1, 0, 0, 0, 0, 0, NONE, "ENDPRIM", TGSI_OPCODE_ENDPRIM },
>> { 0, 0, 0, 0, 1, 0, 1, NONE, "BGNLOOP", TGSI_OPCODE_BGNLOOP },
>> { 0, 0, 0, 0, 0, 0, 1, NONE, "BGNSUB", TGSI_OPCODE_BGNSUB },
>> { 0, 0, 0, 0, 1, 1, 0, NONE, "ENDLOOP", TGSI_OPCODE_ENDLOOP },
>> { 0, 0, 0, 0, 0, 1, 0, NONE, "ENDSUB", TGSI_OPCODE_ENDSUB },
>> { 1, 1, 1, 0, 0, 0, 0, OTHR, "TXQ_LZ", TGSI_OPCODE_TXQ_LZ },
>> { 1, 1, 1, 0, 0, 0, 0, OTHR, "TXQS", TGSI_OPCODE_TXQS },
>> { 1, 1, 0, 0, 0, 0, 0, OTHR, "RESQ", TGSI_OPCODE_RESQ },
>> -   { 0, 0, 0, 0, 0, 0, 0, NONE, "", 106 }, /* removed */
>> +   { 1, 1, 0, 0, 0, 0, 0, COMP, "READ_FIRST", TGSI_OPCODE_READ_FIRST },
>> { 0, 0, 0, 0, 0, 0, 0, NONE, "NOP", TGSI_OPCODE_NOP },
>> { 1, 2, 0, 0, 0, 0, 0, COMP, "FSEQ", TGSI_OPCODE_FSEQ },
>> { 1, 2, 0, 0, 0, 0, 0, COMP, "FSGE", TGSI_OPCODE_FSGE },
>> { 1, 2, 0, 0, 0, 0, 0, COMP, "FSLT", TGSI_OPCODE_FSLT },
>> { 1, 2, 0, 0, 0, 0, 0, COMP, "FSNE", TGSI_OPCODE_FSNE },
>> { 0, 1, 0, 0, 0, 0, 0, OTHR, "MEMBAR", TGSI_OPCODE_MEMBAR },
>> { 0, 1, 0, 0, 0, 0, 0, NONE, "CALLNZ", TGSI_OPCODE_CALLNZ },
>> { 0, 1, 0, 0, 0, 0, 0, NONE, "", 114 }, /* removed */
>> { 0, 1, 0, 0, 0, 0, 0, NONE, "BREAKC", TGSI_OPCODE_BREAKC },
>> { 0, 1, 0, 0, 0, 0, 0, NONE, "KILL_IF", TGSI_OPCODE_KILL_IF },
>> diff --git a/src/gallium/docs/source/tgsi.rst 
>> b/src/gallium/docs/source/tgsi.rst
>> index 05b06ce..7e9b47c 100644
>> --- a/src/gallium/docs/source/tgsi.rst
>> +++ b/src/gallium/docs/source/tgsi.rst
>> @@ -2852,36 +2852,81 @@ only be used with 32-bit integer image formats.
>>
>>The following operation is performed atomically:
>>
>>  .. math::
>>
>>dst_x = resource[offset]
>>
>>resource[offset] = (dst_x > src_x ? dst_x : src_x)
>>
>>
>> -.. _voteopcodes:
>> +.. 

Re: [Mesa-dev] [PATCH 07/19] tgsi: add BALLOT/READ_* opcodes

2017-04-03 Thread Boyan Ding
2017-04-01 1:14 GMT+08:00 Nicolai Hähnle :
> From: Ilia Mirkin 
>
> v2 (Nicolai):
> - BALLOT isn't per-channel
> - expand the documentation (also for VOTE_*)
>
> Signed-off-by: Ilia Mirkin 
> Signed-off-by: Nicolai Hähnle 
> ---
>  src/gallium/auxiliary/tgsi/tgsi_info.c |  6 +--
>  src/gallium/docs/source/tgsi.rst   | 67 
> +-
>  src/gallium/include/pipe/p_shader_tokens.h |  6 +--
>  3 files changed, 62 insertions(+), 17 deletions(-)
>
> diff --git a/src/gallium/auxiliary/tgsi/tgsi_info.c 
> b/src/gallium/auxiliary/tgsi/tgsi_info.c
> index 5a6a9bc..30bad6d 100644
> --- a/src/gallium/auxiliary/tgsi/tgsi_info.c
> +++ b/src/gallium/auxiliary/tgsi/tgsi_info.c
> @@ -106,51 +106,51 @@ static const struct tgsi_opcode_info 
> opcode_info[TGSI_OPCODE_LAST] =
> { 1, 3, 0, 0, 0, 0, 0, COMP, "CMP", TGSI_OPCODE_CMP },
> { 1, 1, 0, 0, 0, 0, 0, CHAN, "SCS", TGSI_OPCODE_SCS },
> { 1, 2, 1, 0, 0, 0, 0, OTHR, "TXB", TGSI_OPCODE_TXB },
> { 1, 1, 0, 0, 0, 0, 0, OTHR, "FBFETCH", TGSI_OPCODE_FBFETCH },
> { 1, 2, 0, 0, 0, 0, 0, COMP, "DIV", TGSI_OPCODE_DIV },
> { 1, 2, 0, 0, 0, 0, 0, REPL, "DP2", TGSI_OPCODE_DP2 },
> { 1, 2, 1, 0, 0, 0, 0, OTHR, "TXL", TGSI_OPCODE_TXL },
> { 0, 0, 0, 0, 0, 0, 0, NONE, "BRK", TGSI_OPCODE_BRK },
> { 0, 1, 0, 0, 1, 0, 1, NONE, "IF", TGSI_OPCODE_IF },
> { 0, 1, 0, 0, 1, 0, 1, NONE, "UIF", TGSI_OPCODE_UIF },
> -   { 0, 1, 0, 0, 0, 0, 1, NONE, "", 76 },  /* removed */
> +   { 1, 2, 0, 0, 0, 0, 0, COMP, "READ_INVOC", TGSI_OPCODE_READ_INVOC },
> { 0, 0, 0, 0, 1, 1, 1, NONE, "ELSE", TGSI_OPCODE_ELSE },
> { 0, 0, 0, 0, 0, 1, 0, NONE, "ENDIF", TGSI_OPCODE_ENDIF },
> { 1, 1, 0, 0, 0, 0, 0, COMP, "DDX_FINE", TGSI_OPCODE_DDX_FINE },
> { 1, 1, 0, 0, 0, 0, 0, COMP, "DDY_FINE", TGSI_OPCODE_DDY_FINE },
> { 0, 1, 0, 0, 0, 0, 0, NONE, "PUSHA", TGSI_OPCODE_PUSHA },
> { 1, 0, 0, 0, 0, 0, 0, NONE, "POPA", TGSI_OPCODE_POPA },
> { 1, 1, 0, 0, 0, 0, 0, COMP, "CEIL", TGSI_OPCODE_CEIL },
> { 1, 1, 0, 0, 0, 0, 0, COMP, "I2F", TGSI_OPCODE_I2F },
> { 1, 1, 0, 0, 0, 0, 0, COMP, "NOT", TGSI_OPCODE_NOT },
> { 1, 1, 0, 0, 0, 0, 0, COMP, "TRUNC", TGSI_OPCODE_TRUNC },
> { 1, 2, 0, 0, 0, 0, 0, COMP, "SHL", TGSI_OPCODE_SHL },
> -   { 0, 0, 0, 0, 0, 0, 0, NONE, "", 88 },  /* removed */
> +   { 1, 1, 0, 0, 0, 0, 0, OTHR, "BALLOT", TGSI_OPCODE_BALLOT },
> { 1, 2, 0, 0, 0, 0, 0, COMP, "AND", TGSI_OPCODE_AND },
> { 1, 2, 0, 0, 0, 0, 0, COMP, "OR", TGSI_OPCODE_OR },
> { 1, 2, 0, 0, 0, 0, 0, COMP, "MOD", TGSI_OPCODE_MOD },
> { 1, 2, 0, 0, 0, 0, 0, COMP, "XOR", TGSI_OPCODE_XOR },
> { 1, 3, 0, 0, 0, 0, 0, COMP, "SAD", TGSI_OPCODE_SAD },
> { 1, 2, 1, 0, 0, 0, 0, OTHR, "TXF", TGSI_OPCODE_TXF },
> { 1, 2, 1, 0, 0, 0, 0, OTHR, "TXQ", TGSI_OPCODE_TXQ },
> { 0, 0, 0, 0, 0, 0, 0, NONE, "CONT", TGSI_OPCODE_CONT },
> { 0, 1, 0, 0, 0, 0, 0, NONE, "EMIT", TGSI_OPCODE_EMIT },
> { 0, 1, 0, 0, 0, 0, 0, NONE, "ENDPRIM", TGSI_OPCODE_ENDPRIM },
> { 0, 0, 0, 0, 1, 0, 1, NONE, "BGNLOOP", TGSI_OPCODE_BGNLOOP },
> { 0, 0, 0, 0, 0, 0, 1, NONE, "BGNSUB", TGSI_OPCODE_BGNSUB },
> { 0, 0, 0, 0, 1, 1, 0, NONE, "ENDLOOP", TGSI_OPCODE_ENDLOOP },
> { 0, 0, 0, 0, 0, 1, 0, NONE, "ENDSUB", TGSI_OPCODE_ENDSUB },
> { 1, 1, 1, 0, 0, 0, 0, OTHR, "TXQ_LZ", TGSI_OPCODE_TXQ_LZ },
> { 1, 1, 1, 0, 0, 0, 0, OTHR, "TXQS", TGSI_OPCODE_TXQS },
> { 1, 1, 0, 0, 0, 0, 0, OTHR, "RESQ", TGSI_OPCODE_RESQ },
> -   { 0, 0, 0, 0, 0, 0, 0, NONE, "", 106 }, /* removed */
> +   { 1, 1, 0, 0, 0, 0, 0, COMP, "READ_FIRST", TGSI_OPCODE_READ_FIRST },
> { 0, 0, 0, 0, 0, 0, 0, NONE, "NOP", TGSI_OPCODE_NOP },
> { 1, 2, 0, 0, 0, 0, 0, COMP, "FSEQ", TGSI_OPCODE_FSEQ },
> { 1, 2, 0, 0, 0, 0, 0, COMP, "FSGE", TGSI_OPCODE_FSGE },
> { 1, 2, 0, 0, 0, 0, 0, COMP, "FSLT", TGSI_OPCODE_FSLT },
> { 1, 2, 0, 0, 0, 0, 0, COMP, "FSNE", TGSI_OPCODE_FSNE },
> { 0, 1, 0, 0, 0, 0, 0, OTHR, "MEMBAR", TGSI_OPCODE_MEMBAR },
> { 0, 1, 0, 0, 0, 0, 0, NONE, "CALLNZ", TGSI_OPCODE_CALLNZ },
> { 0, 1, 0, 0, 0, 0, 0, NONE, "", 114 }, /* removed */
> { 0, 1, 0, 0, 0, 0, 0, NONE, "BREAKC", TGSI_OPCODE_BREAKC },
> { 0, 1, 0, 0, 0, 0, 0, NONE, "KILL_IF", TGSI_OPCODE_KILL_IF },
> diff --git a/src/gallium/docs/source/tgsi.rst 
> b/src/gallium/docs/source/tgsi.rst
> index 05b06ce..7e9b47c 100644
> --- a/src/gallium/docs/source/tgsi.rst
> +++ b/src/gallium/docs/source/tgsi.rst
> @@ -2852,36 +2852,81 @@ only be used with 32-bit integer image formats.
>
>The following operation is performed atomically:
>
>  .. math::
>
>dst_x = resource[offset]
>
>resource[offset] = (dst_x > src_x ? dst_x : src_x)
>
>
> -.. _voteopcodes:
> +.. _interlaneopcodes:
> +
> +Inter-lane opcodes
> +^^
> +
> +These opcodes reduce the given value across the shader invocations
> +running in the current SIMD group. 

Re: [Mesa-dev] [PATCH v2 10/18] anv: Move queues, events, and semaphores to their own file

2017-04-03 Thread Chad Versace
On Mon 13 Mar 2017, Jason Ekstrand wrote:
> Things are about to get more complicated, especially as far as
> semaphores are concerned.
> ---
>  src/intel/Makefile.sources|   1 +
>  src/intel/vulkan/Makefile.sources |  86 +++
>  src/intel/vulkan/anv_device.c | 440 ---
>  src/intel/vulkan/anv_queue.c  | 470 
> ++
>  4 files changed, 557 insertions(+), 440 deletions(-)
>  create mode 100644 src/intel/vulkan/Makefile.sources
>  create mode 100644 src/intel/vulkan/anv_queue.c
> 
> diff --git a/src/intel/Makefile.sources b/src/intel/Makefile.sources
> index 1337574..4382d79 100644
> --- a/src/intel/Makefile.sources
> +++ b/src/intel/Makefile.sources
> @@ -184,6 +184,7 @@ VULKAN_FILES := \
>   vulkan/anv_pipeline.c \
>   vulkan/anv_pipeline_cache.c \
>   vulkan/anv_private.h \
> + vulkan/anv_queue.c \
>   vulkan/anv_util.c \
>   vulkan/anv_wsi.c \
>   vulkan/vk_format_info.h
> diff --git a/src/intel/vulkan/Makefile.sources 
> b/src/intel/vulkan/Makefile.sources
> new file mode 100644
> index 000..aa243a1
> --- /dev/null
> +++ b/src/intel/vulkan/Makefile.sources
> @@ -0,0 +1,86 @@
> +# Copyright © 2015 Intel Corporation
> +#
> +# Permission is hereby granted, free of charge, to any person obtaining a
> +# copy of this software and associated documentation files (the "Software"),
> +# to deal in the Software without restriction, including without limitation
> +# the rights to use, copy, modify, merge, publish, distribute, sublicense,
> +# and/or sell copies of the Software, and to permit persons to whom the
> +# Software is furnished to do so, subject to the following conditions:
> +#
> +# The above copyright notice and this permission notice (including the next
> +# paragraph) shall be included in all copies or substantial portions of the
> +# Software.
> +#
> +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> +# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> +# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 
> DEALINGS
> +# IN THE SOFTWARE.

Oops. This patch adds an extra Makefile.sources.

Other than the dup'd file, patch 10 is
Reviewed-by: Chad Versace 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 09/18] anv: Implement VK_KHX_external_memory_fd

2017-04-03 Thread Chad Versace
On Mon 13 Mar 2017, Jason Ekstrand wrote:
> v2 (chadv):
>   - Rebase.
>   - Fix vkGetPhysicalDeviceImageFormatProperties2KHR when
> handleType == 0.
>   - Move handleType-independency comments out of handleType-switch, in
> vkGetPhysicalDeviceExternalBufferPropertiesKHX.  Reduces diff in
> future dma_buf patches.
> 
> Co-authored-with: Chad Versace 
> ---
>  src/intel/vulkan/anv_device.c   | 71 
> -
>  src/intel/vulkan/anv_entrypoints_gen.py |  1 +
>  src/intel/vulkan/anv_formats.c  | 59 +++
>  3 files changed, 113 insertions(+), 18 deletions(-)

Patch 9 is
Reviewed-by: Chad Versace 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3 07/18] anv/allocator: Add a BO cache

2017-04-03 Thread Jason Ekstrand
On Mon, Apr 3, 2017 at 5:19 PM, Chad Versace 
wrote:

> On Wed 15 Mar 2017, Jason Ekstrand wrote:
> > This cache allows us to easily ensure that we have a unique anv_bo for
> > each gem handle.  We'll need this in order to support multiple-import of
> > memory objects and semaphores.
> >
> > v2 (Jason Ekstrand):
> >  - Reject BO imports if the size doesn't match the prime fd size as
> >reported by lseek().
> >
> > v3 (Jason Ekstrand):
> >  - Fix reference counting around cache_release (Chris Willson)
> >  - Move the mutex_unlock() later in cache_release
> > ---
> >  src/intel/vulkan/anv_allocator.c | 261 ++
> +
> >  src/intel/vulkan/anv_private.h   |  26 
> >  2 files changed, 287 insertions(+)
>
>
>
> > +VkResult anv_bo_cache_alloc(struct anv_device *device,
> > +struct anv_bo_cache *cache,
> > +uint64_t size, struct anv_bo **bo,
> > +VkAllocationCallbacks *alloc);
>
> > +VkResult anv_bo_cache_import(struct anv_device *device,
> > + struct anv_bo_cache *cache,
> > + int fd, uint64_t size, struct anv_bo **bo,
> > + VkAllocationCallbacks *alloc);
>
> > +void anv_bo_cache_release(struct anv_device *device,
> > +  struct anv_bo_cache *cache,
> > +  struct anv_bo *bo,
> > +  VkAllocationCallbacks *alloc);
> > +
>
> The app may do this:
>
>   // fd1 != fd2
>   // fd1 and fd2 wrap the same gem handle
>
>   mem1 = vkAllocateMemory(import fd1, pAllocator=alloc1); // anv_cached_bo
> is allocated here
>   mem2 = vkAllocateMemory(import fd2, pAllocator=alloc2);
>
>   ...
>
>   vkFreeMemory(mem1, alloc1);
>   vkFreeMemory(mem2, alloc2); // anv_cached_bo is freed here
>
> So we can't use either pAllocator to allocate/free the anv_cached_bo,
> unless anv_cached_bo also cached its original allocator.
> If we don't cache the allocator (and please, let's not), then the
> 'alloc' params to anv_bo_cache_alloc/import/release are dead params.
> We must use an allocator that is not bound
> to the anv_cached_bo, such as device->alloc.
>
>
> Patch 8/18 correctly passes device->alloc into these functions. But,
> since the 'alloc' params aren't really used, they should be dropped to
> prevent accidental misuse.
>

I'm not quite sure what you're asking for here.  Do you want me to just use
device->alloc and drop the parameter?  I'm happy to do that.

--Jason
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3 07/18] anv/allocator: Add a BO cache

2017-04-03 Thread Chad Versace
On Wed 15 Mar 2017, Jason Ekstrand wrote:
> This cache allows us to easily ensure that we have a unique anv_bo for
> each gem handle.  We'll need this in order to support multiple-import of
> memory objects and semaphores.
> 
> v2 (Jason Ekstrand):
>  - Reject BO imports if the size doesn't match the prime fd size as
>reported by lseek().
> 
> v3 (Jason Ekstrand):
>  - Fix reference counting around cache_release (Chris Willson)
>  - Move the mutex_unlock() later in cache_release
> ---
>  src/intel/vulkan/anv_allocator.c | 261 
> +++
>  src/intel/vulkan/anv_private.h   |  26 
>  2 files changed, 287 insertions(+)



> +VkResult anv_bo_cache_alloc(struct anv_device *device,
> +struct anv_bo_cache *cache,
> +uint64_t size, struct anv_bo **bo,
> +VkAllocationCallbacks *alloc);

> +VkResult anv_bo_cache_import(struct anv_device *device,
> + struct anv_bo_cache *cache,
> + int fd, uint64_t size, struct anv_bo **bo,
> + VkAllocationCallbacks *alloc);

> +void anv_bo_cache_release(struct anv_device *device,
> +  struct anv_bo_cache *cache,
> +  struct anv_bo *bo,
> +  VkAllocationCallbacks *alloc);
> +

The app may do this:

  // fd1 != fd2
  // fd1 and fd2 wrap the same gem handle

  mem1 = vkAllocateMemory(import fd1, pAllocator=alloc1); // anv_cached_bo is 
allocated here
  mem2 = vkAllocateMemory(import fd2, pAllocator=alloc2);

  ...

  vkFreeMemory(mem1, alloc1);
  vkFreeMemory(mem2, alloc2); // anv_cached_bo is freed here

So we can't use either pAllocator to allocate/free the anv_cached_bo,
unless anv_cached_bo also cached its original allocator.
If we don't cache the allocator (and please, let's not), then the
'alloc' params to anv_bo_cache_alloc/import/release are dead params.
We must use an allocator that is not bound
to the anv_cached_bo, such as device->alloc.


Patch 8/18 correctly passes device->alloc into these functions. But,
since the 'alloc' params aren't really used, they should be dropped to
prevent accidental misuse.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 08/18] anv: Use the BO cache for DeviceMemory allocations

2017-04-03 Thread Chad Versace
On Mon 13 Mar 2017, Jason Ekstrand wrote:
> ---
>  src/intel/vulkan/anv_device.c  | 27 ---
>  src/intel/vulkan/anv_image.c   |  2 +-
>  src/intel/vulkan/anv_intel.c   | 14 ++
>  src/intel/vulkan/anv_private.h |  4 +++-
>  src/intel/vulkan/anv_wsi.c |  6 +++---
>  5 files changed, 29 insertions(+), 24 deletions(-)

Patch 8 is
Reviewed-by: Chad Versace 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] intel/vec4: Add some fall through comments

2017-04-03 Thread Jason Ekstrand
On Mon, Apr 3, 2017 at 4:43 PM, Ilia Mirkin  wrote:

> On Mon, Apr 3, 2017 at 7:24 PM, Jason Ekstrand 
> wrote:
> > ---
> >  src/intel/compiler/brw_vec4_nir.cpp | 4 
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/src/intel/compiler/brw_vec4_nir.cpp
> b/src/intel/compiler/brw_vec4_nir.cpp
> > index 2384265..613c695 100644
> > --- a/src/intel/compiler/brw_vec4_nir.cpp
> > +++ b/src/intel/compiler/brw_vec4_nir.cpp
> > @@ -1310,6 +1310,7 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
> >
> > case nir_op_iadd:
> >assert(nir_dest_bit_size(instr->dest.dest) < 64);
> > +  /* fall through */
> > case nir_op_fadd:
> >inst = emit(ADD(dst, op[0], op[1]));
> >inst->saturate = instr->dest.saturate;
> > @@ -1539,6 +1540,7 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
> >
> > case nir_op_imin:
> > case nir_op_umin:
> > +  /* fall through */
>
> Normally these are placed at the end rather than the beginning. Not
> sure if this will shut coverity up or not.
>

I think it would but I fixed it locally anyway.  Thanks!

--Jason


> >assert(nir_dest_bit_size(instr->dest.dest) < 64);
> > case nir_op_fmin:
> >inst = emit_minmax(BRW_CONDITIONAL_L, dst, op[0], op[1]);
> > @@ -1548,6 +1550,7 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
> > case nir_op_imax:
> > case nir_op_umax:
> >assert(nir_dest_bit_size(instr->dest.dest) < 64);
> > +  /* fall through */
> > case nir_op_fmax:
> >inst = emit_minmax(BRW_CONDITIONAL_GE, dst, op[0], op[1]);
> >inst->saturate = instr->dest.saturate;
> > @@ -2054,6 +2057,7 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
> > case nir_op_iabs:
> > case nir_op_ineg:
> >assert(nir_dest_bit_size(instr->dest.dest) < 64);
> > +  /* fall through */
> > case nir_op_fabs:
> > case nir_op_fneg:
> > case nir_op_fsat:
> > --
> > 2.5.0.400.gff86faf
> >
> > ___
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] mesa/glthread: Avoid unnecessary batch reallocation

2017-04-03 Thread Timothy Arceri

I've pushed this. Thanks!

In future please add the version to the subject when sending new 
revisions. You can do this with the -v option when using git send-email 
e.g -v2 will result in [PATCH v2]


Also please add changes to the patch since v1 in the commit message. e.g.

V2: set batch->used = 0 rather than memsetting the batch to 0.

This helps those reviewing to quickly compare the updates since the 
first revision.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] intel/vec4: Add some fall through comments

2017-04-03 Thread Ilia Mirkin
On Mon, Apr 3, 2017 at 7:24 PM, Jason Ekstrand  wrote:
> ---
>  src/intel/compiler/brw_vec4_nir.cpp | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/src/intel/compiler/brw_vec4_nir.cpp 
> b/src/intel/compiler/brw_vec4_nir.cpp
> index 2384265..613c695 100644
> --- a/src/intel/compiler/brw_vec4_nir.cpp
> +++ b/src/intel/compiler/brw_vec4_nir.cpp
> @@ -1310,6 +1310,7 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
>
> case nir_op_iadd:
>assert(nir_dest_bit_size(instr->dest.dest) < 64);
> +  /* fall through */
> case nir_op_fadd:
>inst = emit(ADD(dst, op[0], op[1]));
>inst->saturate = instr->dest.saturate;
> @@ -1539,6 +1540,7 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
>
> case nir_op_imin:
> case nir_op_umin:
> +  /* fall through */

Normally these are placed at the end rather than the beginning. Not
sure if this will shut coverity up or not.

>assert(nir_dest_bit_size(instr->dest.dest) < 64);
> case nir_op_fmin:
>inst = emit_minmax(BRW_CONDITIONAL_L, dst, op[0], op[1]);
> @@ -1548,6 +1550,7 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
> case nir_op_imax:
> case nir_op_umax:
>assert(nir_dest_bit_size(instr->dest.dest) < 64);
> +  /* fall through */
> case nir_op_fmax:
>inst = emit_minmax(BRW_CONDITIONAL_GE, dst, op[0], op[1]);
>inst->saturate = instr->dest.saturate;
> @@ -2054,6 +2057,7 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
> case nir_op_iabs:
> case nir_op_ineg:
>assert(nir_dest_bit_size(instr->dest.dest) < 64);
> +  /* fall through */
> case nir_op_fabs:
> case nir_op_fneg:
> case nir_op_fsat:
> --
> 2.5.0.400.gff86faf
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] intel/vec4: Add some fall through comments

2017-04-03 Thread Matt Turner
Thank you. These coverity warnings have been coming back repeatedly it seems.

Reviewed-by: Matt Turner 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] [RFC] mesa/glthread: misaligned address access

2017-04-03 Thread Timothy Arceri


On 04/04/17 05:12, Bartosz Tomczyk wrote:

Address sanitizer reports lot of misaligned access:
SUMMARY: AddressSanitizer: undefined-behavior main/marshal.c:276:31 in
main/marshal.c:276:31: runtime error: load of misaligned address 0x631000104866 
for type
'const GLuint' (aka 'const unsigned int'), which requires 4 byte alignment
0x631000104866: note: pointer points here
 92 88 00 00 00 00  00 00 4a 03 0c 00 93 88  00 00 00 00 00 00 02 01  0c 00 40 
8d 00 00 00 00  00 00
 ^
SUMMARY: AddressSanitizer: undefined-behavior main/marshal_generated.c:28725:12 
in
main/marshal_generated.c:28726:12: runtime error: member access within 
misaligned address 0x6310003fc874 for type
'struct marshal_cmd_VertexAttribPointer', which requires 8 byte alignment
0x6310003fc874: note: pointer points here
  01 00 00 00 7a 02 20 00  00 00 00 00 be be be be  be be be be be be be be  be 
be be be be be be be
  ^
SUMMARY: AddressSanitizer: undefined-behavior main/marshal_generated.c:28726:12 
in
main/marshal_generated.c:28726:12: runtime error: store to misaligned address 
0x6310003fc87c for type
'GLint' (aka 'int'), which requires 8 byte alignment
0x6310003fc87c: note: pointer points here
  00 00 00 00 be be be be  be be be be be be be be  be be be be be be be be  be 
be be be be be be be

This probably isn't issue for x86 as misaligned access shoul be as fast
as aligned ones. But this could be issue for different architectures
like ARM/MIPS, any exert here ?

Any idea how to get correct aligment on different platforms ?


Thanks. I've had the same patch locally for a while, I think it makes 
sense to push this as is for now (which I've done). We can adjust for 
other platforms later if needed.


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3 07/18] anv/allocator: Add a BO cache

2017-04-03 Thread Jason Ekstrand
On Mon, Apr 3, 2017 at 4:26 PM, Chad Versace 
wrote:

> On Wed 15 Mar 2017, Jason Ekstrand wrote:
> > This cache allows us to easily ensure that we have a unique anv_bo for
> > each gem handle.  We'll need this in order to support multiple-import of
> > memory objects and semaphores.
> >
> > v2 (Jason Ekstrand):
> >  - Reject BO imports if the size doesn't match the prime fd size as
> >reported by lseek().
> >
> > v3 (Jason Ekstrand):
> >  - Fix reference counting around cache_release (Chris Willson)
> >  - Move the mutex_unlock() later in cache_release
> > ---
> >  src/intel/vulkan/anv_allocator.c | 261 ++
> +
> >  src/intel/vulkan/anv_private.h   |  26 
> >  2 files changed, 287 insertions(+)
>
>
>
> > +VkResult
> > +anv_bo_cache_import(struct anv_device *device,
> > +struct anv_bo_cache *cache,
> > +int fd, uint64_t size, struct anv_bo **bo_out,
> > +VkAllocationCallbacks *alloc)
> > +{
> > +   pthread_mutex_lock(>mutex);
> > +
> > +   /* The kernel is going to give us whole pages anyway */
> > +   size = align_u64(size, 4096);
> > +
> > +   uint32_t gem_handle = anv_gem_fd_to_handle(device, fd);
> > +   if (!gem_handle) {
> > +  pthread_mutex_unlock(>mutex);
> > +  return vk_error(VK_ERROR_INVALID_EXTERNAL_HANDLE_KHX);
> > +   }
> > +
> > +   struct anv_cached_bo *bo = anv_bo_cache_lookup_locked(cache,
> gem_handle);
> > +   if (bo) {
> > +  assert(bo->bo.size == size);
>
> For security's sake, we must always check the size, even if the bo is in
> the cache. An assertion isn't enough. A malicious client may tell the
> truth the first time when sending the fd to the trusted app. The
> malicious client may then dup the fd and send it to the trusted app with
> a false size.
>

Yup.  Fixed locally.

--Jason


> > +  __sync_fetch_and_add(>refcount, 1);
> > +   } else {
> > +  /* For security purposes, we reject BO imports where the size
> does not
> > +   * match exactly.  This prevents a malicious client from passing a
> > +   * buffer to a trusted client, lying about the size, and telling
> the
> > +   * trusted client to try and texture from an image that goes
> > +   * out-of-bounds.  This sort of thing could lead to GPU hangs or
> worse
> > +   * in the trusted client.  The trusted client can protect itself
> against
> > +   * this sort of attack but only if it can trust the buffer size.
> > +   */
>
> > +  off_t import_size = lseek(fd, 0, SEEK_END);
> > +  if (import_size == (off_t)-1 || import_size != size) {
> > + anv_gem_close(device, gem_handle);
> > + pthread_mutex_unlock(>mutex);
> > + return vk_error(VK_ERROR_INVALID_EXTERNAL_HANDLE_KHX);
> > +  }
> > +
> > +  struct anv_cached_bo *bo =
> > + vk_alloc(alloc, size, 8, VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
> > +  if (!bo) {
> > + anv_gem_close(device, gem_handle);
> > + pthread_mutex_unlock(>mutex);
> > + return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
> > +  }
> > +
> > +  bo->refcount = 1;
> > +
> > +  anv_bo_init(>bo, gem_handle, size);
> > +
> > +  _mesa_hash_table_insert(cache->bo_map, (void
> *)(uintptr_t)gem_handle, bo);
> > +   }
> > +
> > +   pthread_mutex_unlock(>mutex);
> > +
> > +   /* From the Vulkan spec:
> > +*
> > +*"Importing memory from a file descriptor transfers ownership of
> > +*the file descriptor from the application to the Vulkan
> > +*implementation. The application must not perform any
> operations on
> > +*the file descriptor after a successful import."
> > +*
> > +* If the import fails, we leave the file descriptor open.
> > +*/
> > +   close(fd);
> > +
> > +   *bo_out = >bo;
> > +
> > +   return VK_SUCCESS;
> > +}
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3 07/18] anv/allocator: Add a BO cache

2017-04-03 Thread Chad Versace
On Wed 15 Mar 2017, Jason Ekstrand wrote:
> This cache allows us to easily ensure that we have a unique anv_bo for
> each gem handle.  We'll need this in order to support multiple-import of
> memory objects and semaphores.
> 
> v2 (Jason Ekstrand):
>  - Reject BO imports if the size doesn't match the prime fd size as
>reported by lseek().
> 
> v3 (Jason Ekstrand):
>  - Fix reference counting around cache_release (Chris Willson)
>  - Move the mutex_unlock() later in cache_release
> ---
>  src/intel/vulkan/anv_allocator.c | 261 
> +++
>  src/intel/vulkan/anv_private.h   |  26 
>  2 files changed, 287 insertions(+)



> +VkResult
> +anv_bo_cache_import(struct anv_device *device,
> +struct anv_bo_cache *cache,
> +int fd, uint64_t size, struct anv_bo **bo_out,
> +VkAllocationCallbacks *alloc)
> +{
> +   pthread_mutex_lock(>mutex);
> +
> +   /* The kernel is going to give us whole pages anyway */
> +   size = align_u64(size, 4096);
> +
> +   uint32_t gem_handle = anv_gem_fd_to_handle(device, fd);
> +   if (!gem_handle) {
> +  pthread_mutex_unlock(>mutex);
> +  return vk_error(VK_ERROR_INVALID_EXTERNAL_HANDLE_KHX);
> +   }
> +
> +   struct anv_cached_bo *bo = anv_bo_cache_lookup_locked(cache, gem_handle);
> +   if (bo) {
> +  assert(bo->bo.size == size);

For security's sake, we must always check the size, even if the bo is in
the cache. An assertion isn't enough. A malicious client may tell the
truth the first time when sending the fd to the trusted app. The
malicious client may then dup the fd and send it to the trusted app with
a false size.

> +  __sync_fetch_and_add(>refcount, 1);
> +   } else {
> +  /* For security purposes, we reject BO imports where the size does not
> +   * match exactly.  This prevents a malicious client from passing a
> +   * buffer to a trusted client, lying about the size, and telling the
> +   * trusted client to try and texture from an image that goes
> +   * out-of-bounds.  This sort of thing could lead to GPU hangs or worse
> +   * in the trusted client.  The trusted client can protect itself 
> against
> +   * this sort of attack but only if it can trust the buffer size.
> +   */

> +  off_t import_size = lseek(fd, 0, SEEK_END);
> +  if (import_size == (off_t)-1 || import_size != size) {
> + anv_gem_close(device, gem_handle);
> + pthread_mutex_unlock(>mutex);
> + return vk_error(VK_ERROR_INVALID_EXTERNAL_HANDLE_KHX);
> +  }
> +
> +  struct anv_cached_bo *bo =
> + vk_alloc(alloc, size, 8, VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
> +  if (!bo) {
> + anv_gem_close(device, gem_handle);
> + pthread_mutex_unlock(>mutex);
> + return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
> +  }
> +
> +  bo->refcount = 1;
> +
> +  anv_bo_init(>bo, gem_handle, size);
> +
> +  _mesa_hash_table_insert(cache->bo_map, (void *)(uintptr_t)gem_handle, 
> bo);
> +   }
> +
> +   pthread_mutex_unlock(>mutex);
> +
> +   /* From the Vulkan spec:
> +*
> +*"Importing memory from a file descriptor transfers ownership of
> +*the file descriptor from the application to the Vulkan
> +*implementation. The application must not perform any operations on
> +*the file descriptor after a successful import."
> +*
> +* If the import fails, we leave the file descriptor open.
> +*/
> +   close(fd);
> +
> +   *bo_out = >bo;
> +
> +   return VK_SUCCESS;
> +}
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3 07/18] anv/allocator: Add a BO cache

2017-04-03 Thread Jason Ekstrand
On Mon, Apr 3, 2017 at 3:45 PM, Chad Versace 
wrote:

> On Mon 03 Apr 2017, Jason Ekstrand wrote:
> > On Mon, Apr 3, 2017 at 12:31 PM, Chad Versace 
> > wrote:
> >
> > > On Fri 31 Mar 2017, Chad Versace wrote:
> > > > On Wed 15 Mar 2017, Jason Ekstrand wrote:
> > > > > This cache allows us to easily ensure that we have a unique anv_bo
> for
> > > > > each gem handle.  We'll need this in order to support
> multiple-import
> > > of
> > > > > memory objects and semaphores.
> > > > >
> > > > > v2 (Jason Ekstrand):
> > > > >  - Reject BO imports if the size doesn't match the prime fd size as
> > > > >reported by lseek().
> > > > >
> > > > > v3 (Jason Ekstrand):
> > > > >  - Fix reference counting around cache_release (Chris Willson)
> > > > >  - Move the mutex_unlock() later in cache_release
> > > > > ---
> > > > >  src/intel/vulkan/anv_allocator.c | 261
> ++
> > > +
> > > > >  src/intel/vulkan/anv_private.h   |  26 
> > > > >  2 files changed, 287 insertions(+)
> > > >
> > > >
> > > > > +static uint32_t
> > > > > +hash_uint32_t(const void *key)
> > > > > +{
> > > > > +   return (uint32_t)(uintptr_t)key;
> > > > > +}
> > > >
> > > > This hash function does not appear hashy.
> > > >
> > > > If I correctly understand the details of Mesa's struct hash_table,
> > > > choosing the identify function for the hash function causes unwanted
> > > > clustering when inserting consecutive gem handles.  Since the kernel
> does
> > > > allocate gem handles consecutively, the problem is real.
> > > >
> > > > For proof, consider the following:
> > > >
> > > >- Suppose a long-running process (maybe the compositor) has
> thrashed
> > > on the
> > > >  hash table long enough that its bucket count
> > > >  is ht->size = hash_sizes[7].size = 283. Suppose a spike of
> > > >  compositor activity raises the hash table's density to about
> 0.5.
> > > >  And suppose the hash table buckets are filled with the
> consecutive
> > > gem
> > > >  handles
> > > >
> > > >  {0, 0, 0, 0, 4, 5, 6, 7, 8, 9, ..., 127, 128, 0, 0, 0, ..., 0 }
> > > >
> > > >  The exact density is (128 - 4 + 1) / 283 = 0.4417.
> > > >
> > > >- Next, some other in-process activity (maybe OpenGL) generated
> > > >  a lot of gem handles after Vulkan's most recently imported
> > > >  gem handle, 128.
> > >
> > > This point in the example---the reason why the gem handles in the
> > > anv_bo_cache skip from 128 to 287---is bogus in Vulkan. The problem
> *is*
> > > real for multiple in-process OpenGL contexts derived from the same
> > > EGLDisplay, using EGL_EXT_image_dma_buf_import, because each context
> > > shares the same intel_screen, and therefore the same drm device fd. But
> > > in Vulkan, each VkDevice opens its own drm device id. So, bogus
> example.
> > >
> > > BUT, that leads to a new question...
> > >
> > > Since each VkDevice has a unique drm device fd, and since the kernel
> > > allocates gem handles consecutively on the fd, and since struct
> > > hash_table only grows and never shrinks, and since patch 8/18 inserts
> > > every VkDeviceMemory into the cache... I believe no collisions are
> > > possible in anv_bo_cache.
> > >
> >
> > Does this fall under the category of unbreakable kernel ABI or is it
> just a
> > side-effect of the implementation?  If not, then I'm reluctant to trust
> it.
>
> I'm not certain. But krh, sitting beside me, says "It's ABI at this point.
> The kernel uses a 'idr' structure which guarantees that behavior".
>

If we think we can rely on it, I'm happy to make it into a flat array but
it'll basically be a simplified hash table at that point.


> > > If there are no collisions, then the hash table is only adding
> overhead,
> >
> > Sure, but a no-collision hash table is pretty cheap...
> >
> > > and we should use a direct-addressing lookup table. The bo cache should
> > > look like this:
> > >
> > > struct anv_bo_cache {
> > >/* The array indices are gem handles. Null entries are legal. */
> > >struct anv_bo **bos;
> > >
> > >/* Length of the array. Because the array can have holes, this
> > > * is *not* the number of gem handles in the array.
> > > */
> > >size_t len;
> > >
> > >pthread_mutex_t mutex;
> > > };
> > >
> > > struct anv_bo *
> > > anv_bo_cache_lookup(struct anv_bo_cache *cache, uint32_t
> gem_handle)
> > > {
> > >struct anv_bo *bo = NULL;
> > >
> > >pthread_mutex_lock(>mutex);
> > >
> > >if (gem_handle < cache->len)
> > >   bo = cache->entries[gem_handle];
> > >
> > >pthread_mutex_unlock(>mutex);
> > >
> > >return bo;
> > >
> > > }
> > >
> > > BUT, that leads to yet another question...
> > >
> > > Why is patch 8/18 inserting every VkDeviceMemory into the cache? If
> > > I understand things correctly, we only *need* to insert a
> VkDeviceMemory
> > > into the 

[Mesa-dev] [PATCH] intel/vec4: Add some fall through comments

2017-04-03 Thread Jason Ekstrand
---
 src/intel/compiler/brw_vec4_nir.cpp | 4 
 1 file changed, 4 insertions(+)

diff --git a/src/intel/compiler/brw_vec4_nir.cpp 
b/src/intel/compiler/brw_vec4_nir.cpp
index 2384265..613c695 100644
--- a/src/intel/compiler/brw_vec4_nir.cpp
+++ b/src/intel/compiler/brw_vec4_nir.cpp
@@ -1310,6 +1310,7 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
 
case nir_op_iadd:
   assert(nir_dest_bit_size(instr->dest.dest) < 64);
+  /* fall through */
case nir_op_fadd:
   inst = emit(ADD(dst, op[0], op[1]));
   inst->saturate = instr->dest.saturate;
@@ -1539,6 +1540,7 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
 
case nir_op_imin:
case nir_op_umin:
+  /* fall through */
   assert(nir_dest_bit_size(instr->dest.dest) < 64);
case nir_op_fmin:
   inst = emit_minmax(BRW_CONDITIONAL_L, dst, op[0], op[1]);
@@ -1548,6 +1550,7 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
case nir_op_imax:
case nir_op_umax:
   assert(nir_dest_bit_size(instr->dest.dest) < 64);
+  /* fall through */
case nir_op_fmax:
   inst = emit_minmax(BRW_CONDITIONAL_GE, dst, op[0], op[1]);
   inst->saturate = instr->dest.saturate;
@@ -2054,6 +2057,7 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
case nir_op_iabs:
case nir_op_ineg:
   assert(nir_dest_bit_size(instr->dest.dest) < 64);
+  /* fall through */
case nir_op_fabs:
case nir_op_fneg:
case nir_op_fsat:
-- 
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/3] intel/gen_decoder: return -1 for unknown command formats

2017-04-03 Thread Jordan Justen
Decoding with aubinator encountered a command of 0x. With the
previous code, it caused aubinator to jump 255 + 2 dwords to start
decoding again.

Instead we can attempt to detect the known instruction formats. If the
format is not recognized, then we can advance just 1 dword.

Signed-off-by: Jordan Justen 
---
 src/intel/common/gen_decoder.c| 21 +++--
 src/intel/tools/aubinator.c   |  4 ++--
 src/mesa/drivers/dri/i965/intel_batchbuffer.c |  2 ++
 3 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/src/intel/common/gen_decoder.c b/src/intel/common/gen_decoder.c
index 1244f4c4480..d5ba3e6265e 100644
--- a/src/intel/common/gen_decoder.c
+++ b/src/intel/common/gen_decoder.c
@@ -692,28 +692,37 @@ gen_group_get_length(struct gen_group *group, const 
uint32_t *p)
 
case 3: /* Render */ {
   uint32_t subtype = field(h, 27, 28);
+  uint32_t opcode = field(h, 24, 26);
   switch (subtype) {
   case 0:
- return field(h, 0, 7) + 2;
+ if (opcode < 2)
+return field(h, 0, 7) + 2;
+ else
+return -1;
   case 1:
- return 1;
+ if (opcode < 2)
+return 1;
+ else
+return -1;
   case 2: {
- uint32_t opcode = field(h, 24, 26);
- assert(opcode < 3 /* 3 and above currently reserved */);
  if (opcode == 0)
 return field(h, 0, 7) + 2;
  else if (opcode < 3)
 return field(h, 0, 15) + 2;
  else
-return 1; /* FIXME: if more opcodes are added */
+return -1;
   }
   case 3:
- return field(h, 0, 7) + 2;
+ if (opcode < 4)
+return field(h, 0, 7) + 2;
+ else
+return -1;
   }
}
}
 
unreachable("bad opcode");
+   return -1;
 }
 
 void
diff --git a/src/intel/tools/aubinator.c b/src/intel/tools/aubinator.c
index cae578babac..a64bce7a536 100644
--- a/src/intel/tools/aubinator.c
+++ b/src/intel/tools/aubinator.c
@@ -687,12 +687,12 @@ parse_commands(struct gen_spec *spec, uint32_t *cmds, int 
size, int engine)
 
for (p = cmds; p < end; p += length) {
   inst = gen_spec_find_instruction(spec, p);
+  length = gen_group_get_length(inst, p);
   if (inst == NULL) {
  fprintf(outfile, "unknown instruction %08x\n", p[0]);
- length = (p[0] & 0xff) + 2;
+ length = MIN2(1, length);
  continue;
   }
-  length = gen_group_get_length(inst, p);
 
   const char *color, *reset_color = NORMAL;
   uint64_t offset;
diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c 
b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
index 54bab9efb02..d16b4622456 100644
--- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
+++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
@@ -318,6 +318,8 @@ do_batch_dump(struct brw_context *brw)
   }
 
   length = gen_group_get_length(inst, p);
+  assert(length > 0);
+  length = MAX2(length, 1);
}
 
if (ret == 0) {
-- 
2.11.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] glsl: Fix blob memory leak

2017-04-03 Thread Timothy Arceri

Pushed, Thanks!
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/3] intel/aubinator: Stop searching after a custom handler is found

2017-04-03 Thread Jordan Justen
Signed-off-by: Jordan Justen 
---
 src/intel/tools/aubinator.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/intel/tools/aubinator.c b/src/intel/tools/aubinator.c
index a64bce7a536..0d33c392384 100644
--- a/src/intel/tools/aubinator.c
+++ b/src/intel/tools/aubinator.c
@@ -724,8 +724,10 @@ parse_commands(struct gen_spec *spec, uint32_t *cmds, int 
size, int engine)
  decode_group(inst, p, 0);
 
  for (i = 0; i < ARRAY_LENGTH(custom_handlers); i++) {
-if (gen_group_get_opcode(inst) == custom_handlers[i].opcode)
+if (gen_group_get_opcode(inst) == custom_handlers[i].opcode) {
custom_handlers[i].handle(spec, p);
+   break;
+}
  }
   }
 
-- 
2.11.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/3] intel/gen_decoder: Fix length for Media State/Object commands

2017-04-03 Thread Jordan Justen
From BDW PRM, Volume 6: Command Stream Programming, 'Render Command
Header Format'.

Signed-off-by: Jordan Justen 
---
 src/intel/common/gen_decoder.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/src/intel/common/gen_decoder.c b/src/intel/common/gen_decoder.c
index 1ae78c88e3f..1244f4c4480 100644
--- a/src/intel/common/gen_decoder.c
+++ b/src/intel/common/gen_decoder.c
@@ -697,8 +697,16 @@ gen_group_get_length(struct gen_group *group, const 
uint32_t *p)
  return field(h, 0, 7) + 2;
   case 1:
  return 1;
-  case 2:
- return 2;
+  case 2: {
+ uint32_t opcode = field(h, 24, 26);
+ assert(opcode < 3 /* 3 and above currently reserved */);
+ if (opcode == 0)
+return field(h, 0, 7) + 2;
+ else if (opcode < 3)
+return field(h, 0, 15) + 2;
+ else
+return 1; /* FIXME: if more opcodes are added */
+  }
   case 3:
  return field(h, 0, 7) + 2;
   }
-- 
2.11.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC PATCH] egl/android: Dequeue buffers inside EGL calls

2017-04-03 Thread Mauro Rossi
Hi Tomasz,

2017-04-03 10:00 GMT+02:00 Tomasz Figa :

> Hi Mauro,
>
> On Mon, Apr 3, 2017 at 2:48 AM, Mauro Rossi  wrote:
> >
> >
> > 2017-03-31 13:05 GMT+02:00 Tapani Pälli :
> >>
> >>
> >>
> >> On 03/31/2017 10:12 AM, Tapani Pälli wrote:
> >>>
> >>>
> >>>
> >>> On 03/31/2017 09:06 AM, Tapani Pälli wrote:
> 
> 
> 
>  On 03/31/2017 08:24 AM, Rob Clark wrote:
> >
> > On Fri, Mar 31, 2017 at 12:22 AM, Tapani Pälli
> >  wrote:
> >>
> >>
> >>
> >> On 03/30/2017 05:57 PM, Emil Velikov wrote:
> >>>
> >>>
> >>> On 30 March 2017 at 15:30, Tomasz Figa  wrote:
> 
> 
>  On Thu, Mar 30, 2017 at 11:17 PM, Emil Velikov
>  
>  wrote:
> >
> >
> >
> > On 30 March 2017 at 11:55, Tomasz Figa 
> wrote:
> >>
> >>
> >> Android buffer queues can be abandoned, which results in failing
> >> to
> >> dequeue next buffer. Currently this would fail somewhere deep
> >> within
> >> the DRI stack calling loader's getBuffers*(), without any error
> >> reporting to the client app. However Android framework code
> >> relies on
> >> proper signaling of this event, so we move buffer dequeue to
> >> createWindowSurface() and swapBuffers() call, which can generate
> >> proper
> >> EGL errors. To keep the performance benefits of delayed buffer
> >> handling,
> >> if any, fence wait and DRI image creation is kept delayed until
> >> getBuffers*() is called by the DRI driver.
> >>
> > Thank you Tomasz.
> >
> > I'm fairly confident that this should resolve the crash [in
> > swap_buffers] that Mauro was seeing.
> > Mauro can you give it a test ?
> 
> 
> 
>  Ah, I actually noticed a problem with existing code, supposedly
>  fixed
>  by [1], but I'm afraid it's still wrong.
> 
>  Current swap_buffers calls get_back_bo(), but doesn't call
>  update_buffers(), which is the function that should be called
> before
>  to actually dequeue a buffer from Android's buffer queue. Given
>  that,
>  get_back_bo() would simply fail with !dri2_surf->buffer, because
> no
>  buffer was dequeued.
> 
> >>> Right - I was wondering why we don't hit that on EGL/GBM or
> >>> EGL/Wayland.
> >>> From a quick look - may be because EGL/Android drops the dpy mutex
> in
> >>> droid_window_enqueue_buffer().
> >>>
>  My patch removes update_buffers() and changes the buffer
>  management so
>  that there is always a buffer dequeued, starting from surface
>  creation, unless there was an error somewhere.
> 
> >>> Of the top of your head - is there something stopping us from using
> >>> the same method on $other platforms?
> >>>
>  [1]
> 
>  https://cgit.freedesktop.org/mesa/mesa/commit/src/egl/
> drivers/dri2/platform_android.c?id=4d4558411db166d2d66f8cec9cb581
> 149dbe1597
> 
> 
> 
> >
> >
> > Not that huge of an expert on the Android specifics, so just a
> > humble
> > request:
> > Can we seek the code resuffle (droid_{alloc,free}_local_buffer,
> >>>
> >>>
> >>> Oops silly typo - s/seek/split/.
> >>>
> > other?) separate from the functionality changes ?
> 
> 
> 
>  Sure. Thanks for suggestion.
> 
> >>> Please give it a day or two for others to comment.
> >>
> >>
> >>
> >> I'm trying to debug why this causes our homescreen (wallpaper) to be
> >> black.
> >> Otherwise I haven't seen any issues with these changes.
> >>
> >
> > wallpaper seems to be a special sorta hell..  I wonder if there is
> > somehow some sort of interaction with what I fixed / worked-around in
> > a5e733c6b52e93de3000647d075f5ca2f55fcb71 ??
> >
> > Maybe at least try commenting out the temp-pbuffer thing to get max
> > texture size, and see if that "fixes" things
> 
> 
>  Can you give more details, I still live in la la land and don't know
>  about 'temp-pbuffer thing'?
> 
> >>>
> >>> aa I did not recall the problem, you mean the 'dummy pbuffer' in
> >>> SurfaceFlinger .. yes I will check if this is related.
> >>>
> >>
> >> If I take away that dummy pbuffer usage (which is useless anyway),
> couple
> >> of errors disappear from the log. They are:
> >>
> >> SurfaceFlinger: releasePendingBuffer failed: Unknown error -1 (1)
> >> SurfaceFlinger: releasePendingBuffer failed: Unknown error -1 (1)
> >>
> >> but otherwise the desktop still stays black, 

Re: [Mesa-dev] [PATCH 6/9] gallium: decrease the size of pipe_box - 24 -> 16 bytes

2017-04-03 Thread Roland Scheidegger
Am 03.04.2017 um 17:11 schrieb Alex Deucher:
> On Sun, Apr 2, 2017 at 2:00 PM, Marek Olšák  wrote:
>> From: Marek Olšák 
>>
>> Also:
>>
>> pipe_transfer: 48 -> 40 bytes.
>> pipe_blit_info = 176 -> 160 bytes.
>> ---
>>  src/gallium/include/pipe/p_state.h | 8 
>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/src/gallium/include/pipe/p_state.h 
>> b/src/gallium/include/pipe/p_state.h
>> index 392bb8b..6a147ef 100644
>> --- a/src/gallium/include/pipe/p_state.h
>> +++ b/src/gallium/include/pipe/p_state.h
>> @@ -472,25 +472,25 @@ struct pipe_image_view
>> } u;
>>  };
>>
>>
>>  /**
>>   * Subregion of 1D/2D/3D image resource.
>>   */
>>  struct pipe_box
>>  {
>> int x;
>> -   int y;
>> -   int z;
>> +   int16_t y;
>> +   int16_t z;
>> int width;
>> -   int height;
>> -   int depth;
>> +   int16_t height;
>> +   int16_t depth;
>>  };
> 
> Not specific to this patch per se, but will this cause any issues in
> the future as max surface sizes supported by hw increase?  I feel like
> we'll need to change this back at some point.
I don't think this can increase easily. With 8 subpixel bits for
rasterization (which is the standard nowadays) the absolute maximum you
could theoretically support would be 64k when using float math, and there's
probably reasons why you don't want to go quite to the aboluste limit.
That said, with int16 the largest pot size possible would be 16k, which
indeed is what everybody already supports, not leaving even a factor 2
increase... So I'm not sure it's really a good idea.


> Alex
> 
>>
>>
>>  /**
>>   * A memory object/resource such as a vertex buffer or texture.
>>   */
>>  struct pipe_resource
>>  {
>> struct pipe_reference reference;
>> struct pipe_screen *screen; /**< screen that this texture belongs to */
>> --
>> 2.7.4
>>
>> ___
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2] util/u_atomic: provide 64bit atomics where they're missing

2017-04-03 Thread Matt Turner
On Mon, Apr 3, 2017 at 1:37 PM, Mark Janes  wrote:
>
> This commit appears to intermittently provoke gpu hangs on 32-bit Intel
> systems when running
>
> piglit.shaders.glsl-max-varyings >max_varying_components
>
> Since the behavior is intermittent, I may have identified the wrong
> patch.  Please let me know if this patch seems unlikely to affect the
> test.

I think it's not this patch, but I haven't figured out which it is yet.

Applying this patch doesn't change i965_dri.so (md5sum is the same).
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] gallium/util: libunwind support

2017-04-03 Thread Rob Clark
On Mon, Apr 3, 2017 at 4:57 PM, Rob Clark  wrote:
> On Mon, Apr 3, 2017 at 4:06 PM, Thomas Hellstrom  
> wrote:
>> On 04/03/2017 07:33 PM, Thomas Hellstrom wrote:
>>> On 04/03/2017 07:13 PM, Rob Clark wrote:
 On Mon, Apr 3, 2017 at 12:56 PM, Thomas Hellstrom  
 wrote:
> Hi, Rob,
>
> On 03/24/2017 10:21 PM, Rob Clark wrote:
>> It's kinda sad that (a) we don't have debug_backtrace support on !X86
>> and that (b) we re-invent our own crude backtrace support in the first
>> place.  If available, use libunwind instead.  The backtrace format is
>> based on what xserver and weston use, since it is nice not to have to
>> figure out a different format.
>>
>> Signed-off-by: Rob Clark 
> Did you consider glibc "backtrace()", I think it's also available on 
> ARM...
 I had not.. although xserver and weston are already using libunwind.
 I'm not sure about portability of libunwind to other libc
 implementations (but I guess it is at least not worse than using a
 glibc specific API).

 I suppose we could always add a fallback to backtrace().

>> Hmm. This commit (bisected) appears to break svga/vmwgfx in DEBUG mode:
>>
>> *** Error in `glxgears': malloc(): memory corruption: 0x025c09c0 ***
>> Aborted (core dumped)
>>
>> The svga linux winsys makes extensive use of the backtrace functionality
>> using u_debug_flush.c
>
> ok, I can reproduce.. hopefully patch following shortly..
>

So, I found one other minor issue, but the big problem is
debug_stack_frame size difference, because u_debug_flush.c doesn't
#include config.h..  adding:

#ifdef HAVE_CONFIG_H
#include 
#endif

in u_debug_stack.h solves that..  although I think usually folks
prefer to have that in .c files.  (Not sure if I remember the reason.)

So we should either add that in u_debug_stack.h or track down all the
.c files that #include it and add there.

(Or is there a reason we don't just do -include config.h globally in
the build system?)

BR,
-R
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3 07/18] anv/allocator: Add a BO cache

2017-04-03 Thread Jason Ekstrand
On Mon, Apr 3, 2017 at 12:31 PM, Chad Versace 
wrote:

> On Fri 31 Mar 2017, Chad Versace wrote:
> > On Wed 15 Mar 2017, Jason Ekstrand wrote:
> > > This cache allows us to easily ensure that we have a unique anv_bo for
> > > each gem handle.  We'll need this in order to support multiple-import
> of
> > > memory objects and semaphores.
> > >
> > > v2 (Jason Ekstrand):
> > >  - Reject BO imports if the size doesn't match the prime fd size as
> > >reported by lseek().
> > >
> > > v3 (Jason Ekstrand):
> > >  - Fix reference counting around cache_release (Chris Willson)
> > >  - Move the mutex_unlock() later in cache_release
> > > ---
> > >  src/intel/vulkan/anv_allocator.c | 261 ++
> +
> > >  src/intel/vulkan/anv_private.h   |  26 
> > >  2 files changed, 287 insertions(+)
> >
> >
> > > +static uint32_t
> > > +hash_uint32_t(const void *key)
> > > +{
> > > +   return (uint32_t)(uintptr_t)key;
> > > +}
> >
> > This hash function does not appear hashy.
> >
> > If I correctly understand the details of Mesa's struct hash_table,
> > choosing the identify function for the hash function causes unwanted
> > clustering when inserting consecutive gem handles.  Since the kernel does
> > allocate gem handles consecutively, the problem is real.
> >
> > For proof, consider the following:
> >
> >- Suppose a long-running process (maybe the compositor) has thrashed
> on the
> >  hash table long enough that its bucket count
> >  is ht->size = hash_sizes[7].size = 283. Suppose a spike of
> >  compositor activity raises the hash table's density to about 0.5.
> >  And suppose the hash table buckets are filled with the consecutive
> gem
> >  handles
> >
> >  {0, 0, 0, 0, 4, 5, 6, 7, 8, 9, ..., 127, 128, 0, 0, 0, ..., 0 }
> >
> >  The exact density is (128 - 4 + 1) / 283 = 0.4417.
> >
> >- Next, some other in-process activity (maybe OpenGL) generated
> >  a lot of gem handles after Vulkan's most recently imported
> >  gem handle, 128.
>
> This point in the example---the reason why the gem handles in the
> anv_bo_cache skip from 128 to 287---is bogus in Vulkan. The problem *is*
> real for multiple in-process OpenGL contexts derived from the same
> EGLDisplay, using EGL_EXT_image_dma_buf_import, because each context
> shares the same intel_screen, and therefore the same drm device fd. But
> in Vulkan, each VkDevice opens its own drm device id. So, bogus example.
>
> BUT, that leads to a new question...
>
> Since each VkDevice has a unique drm device fd, and since the kernel
> allocates gem handles consecutively on the fd, and since struct
> hash_table only grows and never shrinks, and since patch 8/18 inserts
> every VkDeviceMemory into the cache... I believe no collisions are
> possible in anv_bo_cache.
>

Does this fall under the category of unbreakable kernel ABI or is it just a
side-effect of the implementation?  If not, then I'm reluctant to trust it.


> If there are no collisions, then the hash table is only adding overhead,
>

Sure, but a no-collision hash table is pretty cheap...

and we should use a direct-addressing lookup table. The bo cache should
> look like this:
>
> struct anv_bo_cache {
>/* The array indices are gem handles. Null entries are legal. */
>struct anv_bo **bos;
>
>/* Length of the array. Because the array can have holes, this
> * is *not* the number of gem handles in the array.
> */
>size_t len;
>
>pthread_mutex_t mutex;
> };
>
> struct anv_bo *
> anv_bo_cache_lookup(struct anv_bo_cache *cache, uint32_t gem_handle)
> {
>struct anv_bo *bo = NULL;
>
>pthread_mutex_lock(>mutex);
>
>if (gem_handle < cache->len)
>   bo = cache->entries[gem_handle] == NULL)
>
>pthread_mutex_unlock(>mutex);
>
>return bo;
>
> }
>
> BUT, that leads to yet another question...
>
> Why is patch 8/18 inserting every VkDeviceMemory into the cache? If
> I understand things correctly, we only *need* to insert a VkDeviceMemory
> into the cache if, in vkAllocateMemory, either (1)
> VkExportMemoryAllocateInfoKHX::handleTypes != 0 or (2)
> VkMemoryAllocateInfo's pNext chain contains an import structure.
>

Because I'm lazy.  In order to start using the bo cache,
anv_device_memory::bo needs to be a pointer (well, it's makes the BO cache
API simpler and more efficient if it's a pointer).  This would mean that we
would have to allocate an additional chunk of memory or go through some
other hoops in order to make it work.  At the end of the day, just stuffing
everything in the cache was simpler and kept us to a single path.


> If we insert into the cache only those VkDeviceMemories that are
> imported or that will be exported, then the bo cache remains small, and
> we *should* use a hash table.
>

Maybe.  But the client isn't supposed to be allocating hundreds of
VkDeviceMemory objects.  It's supposed to 

Re: [Mesa-dev] [PATCH 2/2] gallium/util: libunwind support

2017-04-03 Thread Rob Clark
On Mon, Apr 3, 2017 at 4:06 PM, Thomas Hellstrom  wrote:
> On 04/03/2017 07:33 PM, Thomas Hellstrom wrote:
>> On 04/03/2017 07:13 PM, Rob Clark wrote:
>>> On Mon, Apr 3, 2017 at 12:56 PM, Thomas Hellstrom  
>>> wrote:
 Hi, Rob,

 On 03/24/2017 10:21 PM, Rob Clark wrote:
> It's kinda sad that (a) we don't have debug_backtrace support on !X86
> and that (b) we re-invent our own crude backtrace support in the first
> place.  If available, use libunwind instead.  The backtrace format is
> based on what xserver and weston use, since it is nice not to have to
> figure out a different format.
>
> Signed-off-by: Rob Clark 
 Did you consider glibc "backtrace()", I think it's also available on ARM...
>>> I had not.. although xserver and weston are already using libunwind.
>>> I'm not sure about portability of libunwind to other libc
>>> implementations (but I guess it is at least not worse than using a
>>> glibc specific API).
>>>
>>> I suppose we could always add a fallback to backtrace().
>>>
> Hmm. This commit (bisected) appears to break svga/vmwgfx in DEBUG mode:
>
> *** Error in `glxgears': malloc(): memory corruption: 0x025c09c0 ***
> Aborted (core dumped)
>
> The svga linux winsys makes extensive use of the backtrace functionality
> using u_debug_flush.c

ok, I can reproduce.. hopefully patch following shortly..

BR,
-R
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2] util/u_atomic: provide 64bit atomics where they're missing

2017-04-03 Thread Mark Janes
This commit appears to intermittently provoke gpu hangs on 32-bit Intel
systems when running

piglit.shaders.glsl-max-varyings >max_varying_components

Since the behavior is intermittent, I may have identified the wrong
patch.  Please let me know if this patch seems unlikely to affect the
test.

Matt Turner  writes:

> On Thu, Mar 30, 2017 at 3:47 PM, Matt Turner  wrote:
>> On Thu, Mar 30, 2017 at 3:26 PM, Grazvydas Ignotas 
> wrote:
>>> There are still some distributions trying to support unfortunate people
>>> with old or exotic CPUs that don't have 64bit atomic operations. When
>>> compiling for such a machine, gcc conveniently inserts a library call to
>>> a helper, but it's implementation is missing and we get a linker error.
>>> This allows us to provide our own implementation, which is marked weak
>>> to prefer a better implementation, should one exist.
>>>
>>> v2: changed copyright, some style adjustments
>>>
>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93089
>>> Signed-off-by: Grazvydas Ignotas 
>>> Reviewed-by: Matt Turner 
>>
>> Thanks, I'll commit this.
>>
>>> ---
>>>  no commit access, but request sent:
>>>  https://bugs.freedesktop.org/show_bug.cgi?id=100467
>>
>> Thanks. I commented on the bug and said I approve :)
>
> I modified the patch slightly to print the results of the test with
> AC_MSG_CHECKING/AC_MSG_RESULT and pushed it as commit a6a38a038bd62e.
>
> I'll do some build tests on various architectures. I think I like the idea
> of this going to the stable 17.0 branch.
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] radv: Enable VK_KHR_incremental_present.

2017-04-03 Thread Jason Ekstrand

Reviewed-by: Jason Ekstrand 


On April 3, 2017 12:36:16 PM Bas Nieuwenhuizen  wrote:


Just enabling the driver-independent implementation that Jason did.

Signed-off-by: Bas Nieuwenhuizen 
---
 src/amd/vulkan/radv_device.c   |  4 
 src/amd/vulkan/radv_entrypoints_gen.py |  1 +
 src/amd/vulkan/radv_wsi.c  | 11 ++-
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
index 5c48be1d11a..6456eb17f56 100644
--- a/src/amd/vulkan/radv_device.c
+++ b/src/amd/vulkan/radv_device.c
@@ -92,6 +92,10 @@ static const VkExtensionProperties instance_extensions[] = {

 static const VkExtensionProperties common_device_extensions[] = {
{
+   .extensionName = VK_KHR_INCREMENTAL_PRESENT_EXTENSION_NAME,
+   .specVersion = 1,
+   },
+   {
.extensionName = VK_KHR_MAINTENANCE1_EXTENSION_NAME,
.specVersion = 1,
},
diff --git a/src/amd/vulkan/radv_entrypoints_gen.py 
b/src/amd/vulkan/radv_entrypoints_gen.py

index b7b2bcf97e4..bee29dc1202 100644
--- a/src/amd/vulkan/radv_entrypoints_gen.py
+++ b/src/amd/vulkan/radv_entrypoints_gen.py
@@ -31,6 +31,7 @@ supported_extensions = [
'VK_AMD_draw_indirect_count',
'VK_NV_dedicated_allocation',
'VK_KHR_get_physical_device_properties2',
+   'VK_KHR_incremental_present',
'VK_KHR_maintenance1',
'VK_KHR_sampler_mirror_clamp_to_edge',
'VK_KHR_shader_draw_parameters',
diff --git a/src/amd/vulkan/radv_wsi.c b/src/amd/vulkan/radv_wsi.c
index 8b66095b263..b8999f4eb02 100644
--- a/src/amd/vulkan/radv_wsi.c
+++ b/src/amd/vulkan/radv_wsi.c
@@ -26,6 +26,7 @@
 #include "radv_private.h"
 #include "radv_meta.h"
 #include "wsi_common.h"
+#include "util/vk_util.h"

 static const struct wsi_callbacks wsi_cbs = {
.get_phys_device_format_properties = radv_GetPhysicalDeviceFormatProperties,
@@ -452,9 +453,14 @@ VkResult radv_QueuePresentKHR(
RADV_FROM_HANDLE(radv_queue, queue, _queue);
VkResult result = VK_SUCCESS;

+   const VkPresentRegionsKHR *regions =
+vk_find_struct_const(pPresentInfo->pNext, PRESENT_REGIONS_KHR);
+
for (uint32_t i = 0; i < pPresentInfo->swapchainCount; i++) {
RADV_FROM_HANDLE(wsi_swapchain, swapchain, 
pPresentInfo->pSwapchains[i]);
struct radeon_winsys_cs *cs;
+   const VkPresentRegionKHR *region = NULL;
+
assert(radv_device_from_handle(swapchain->device) == 
queue->device);
if (swapchain->fences[0] == VK_NULL_HANDLE) {
result = 
radv_CreateFence(radv_device_to_handle(queue->device),
@@ -484,9 +490,12 @@ VkResult radv_QueuePresentKHR(
 pPresentInfo->waitSemaphoreCount, 
NULL, 0, false, base_fence);
fence->submitted = true;

+   if (regions && regions->pRegions)
+   region = >pRegions[i];
+
result = swapchain->queue_present(swapchain,
  
pPresentInfo->pImageIndices[i],
- NULL);
+ region);
/* TODO: What if one of them returns OUT_OF_DATE? */
if (result != VK_SUCCESS)
return result;
--
2.12.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] gallium/util: libunwind support

2017-04-03 Thread Rob Clark
On Mon, Apr 3, 2017 at 4:06 PM, Thomas Hellstrom  wrote:
> On 04/03/2017 07:33 PM, Thomas Hellstrom wrote:
>> On 04/03/2017 07:13 PM, Rob Clark wrote:
>>> On Mon, Apr 3, 2017 at 12:56 PM, Thomas Hellstrom  
>>> wrote:
 Hi, Rob,

 On 03/24/2017 10:21 PM, Rob Clark wrote:
> It's kinda sad that (a) we don't have debug_backtrace support on !X86
> and that (b) we re-invent our own crude backtrace support in the first
> place.  If available, use libunwind instead.  The backtrace format is
> based on what xserver and weston use, since it is nice not to have to
> figure out a different format.
>
> Signed-off-by: Rob Clark 
 Did you consider glibc "backtrace()", I think it's also available on ARM...
>>> I had not.. although xserver and weston are already using libunwind.
>>> I'm not sure about portability of libunwind to other libc
>>> implementations (but I guess it is at least not worse than using a
>>> glibc specific API).
>>>
>>> I suppose we could always add a fallback to backtrace().
>>>
> Hmm. This commit (bisected) appears to break svga/vmwgfx in DEBUG mode:
>
> *** Error in `glxgears': malloc(): memory corruption: 0x025c09c0 ***
> Aborted (core dumped)
>
> The svga linux winsys makes extensive use of the backtrace functionality
> using u_debug_flush.c

hmm, do you think I should be able to reproduce by sprinkling some
calls to debug_flush_alert()?

I was using it mostly w/ the refcnt logging before..

BR,
-R
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v3] intel: tools: add aubinator_error_decode tool

2017-04-03 Thread Lionel Landwerlin
This is pretty much the same tool as what i-g-t has, only with a more
fancy decoding of the instructions/registers. It also doesn't support
anything before gen4.

v2 (from Matt): Drop authors
Remove undefined automake variable

v3: Fix incorrect offsets for dword > 1 (Jordan)

Signed-off-by: Lionel Landwerlin 
Acked-by: Matt Turner 
---
 src/intel/Makefile.tools.am  |  19 +-
 src/intel/common/gen_decoder.c   |  11 +
 src/intel/common/gen_decoder.h   |   1 +
 src/intel/tools/.gitignore   |   1 +
 src/intel/tools/aubinator_error_decode.c | 733 +++
 5 files changed, 764 insertions(+), 1 deletion(-)
 create mode 100644 src/intel/tools/aubinator_error_decode.c

diff --git a/src/intel/Makefile.tools.am b/src/intel/Makefile.tools.am
index dc073e8daa..576beea4f2 100644
--- a/src/intel/Makefile.tools.am
+++ b/src/intel/Makefile.tools.am
@@ -19,7 +19,9 @@
 # FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
 # IN THE SOFTWARE.

-noinst_PROGRAMS += tools/aubinator
+noinst_PROGRAMS += \
+   tools/aubinator \
+   tools/aubinator_error_decode

 tools_aubinator_SOURCES = \
tools/aubinator.c \
@@ -42,3 +44,18 @@ tools_aubinator_LDADD = \
$(EXPAT_LIBS) \
$(ZLIB_LIBS) \
-lm
+
+
+tools_aubinator_error_decode_SOURCES = \
+   tools/aubinator_error_decode.c
+
+tools_aubinator_error_decode_LDADD = \
+   common/libintel_common.la \
+   $(top_builddir)/src/util/libmesautil.la \
+   $(EXPAT_LIBS) \
+   $(ZLIB_LIBS)
+
+tools_aubinator_error_decode_CFLAGS = \
+   $(AM_CFLAGS) \
+   $(EXPAT_CFLAGS) \
+   $(ZLIB_CFLAGS)
diff --git a/src/intel/common/gen_decoder.c b/src/intel/common/gen_decoder.c
index 6cc6896b05..1ae78c88e3 100644
--- a/src/intel/common/gen_decoder.c
+++ b/src/intel/common/gen_decoder.c
@@ -108,6 +108,17 @@ gen_spec_find_register(struct gen_spec *spec, uint32_t 
offset)
return NULL;
 }

+struct gen_group *
+gen_spec_find_register_by_name(struct gen_spec *spec, const char *name)
+{
+   for (int i = 0; i < spec->nregisters; i++) {
+  if (strcmp(spec->registers[i]->name, name) == 0)
+ return spec->registers[i];
+   }
+
+   return NULL;
+}
+
 struct gen_enum *
 gen_spec_find_enum(struct gen_spec *spec, const char *name)
 {
diff --git a/src/intel/common/gen_decoder.h b/src/intel/common/gen_decoder.h
index be37e8a542..870bd7f784 100644
--- a/src/intel/common/gen_decoder.h
+++ b/src/intel/common/gen_decoder.h
@@ -45,6 +45,7 @@ struct gen_spec *gen_spec_load_from_path(const struct 
gen_device_info *devinfo,
 uint32_t gen_spec_get_gen(struct gen_spec *spec);
 struct gen_group *gen_spec_find_instruction(struct gen_spec *spec, const 
uint32_t *p);
 struct gen_group *gen_spec_find_register(struct gen_spec *spec, uint32_t 
offset);
+struct gen_group *gen_spec_find_register_by_name(struct gen_spec *spec, const 
char *name);
 int gen_group_get_length(struct gen_group *group, const uint32_t *p);
 const char *gen_group_get_name(struct gen_group *group);
 uint32_t gen_group_get_opcode(struct gen_group *group);
diff --git a/src/intel/tools/.gitignore b/src/intel/tools/.gitignore
index 0c80a6fed2..27437f9eef 100644
--- a/src/intel/tools/.gitignore
+++ b/src/intel/tools/.gitignore
@@ -1 +1,2 @@
 /aubinator
+/aubinator_error_decode
diff --git a/src/intel/tools/aubinator_error_decode.c 
b/src/intel/tools/aubinator_error_decode.c
new file mode 100644
index 00..b55cc33761
--- /dev/null
+++ b/src/intel/tools/aubinator_error_decode.c
@@ -0,0 +1,733 @@
+/*
+ * Copyright © 2007-2017 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 

Re: [Mesa-dev] [PATCH 2/2] gallium/util: libunwind support

2017-04-03 Thread Thomas Hellstrom
On 04/03/2017 07:33 PM, Thomas Hellstrom wrote:
> On 04/03/2017 07:13 PM, Rob Clark wrote:
>> On Mon, Apr 3, 2017 at 12:56 PM, Thomas Hellstrom  
>> wrote:
>>> Hi, Rob,
>>>
>>> On 03/24/2017 10:21 PM, Rob Clark wrote:
 It's kinda sad that (a) we don't have debug_backtrace support on !X86
 and that (b) we re-invent our own crude backtrace support in the first
 place.  If available, use libunwind instead.  The backtrace format is
 based on what xserver and weston use, since it is nice not to have to
 figure out a different format.

 Signed-off-by: Rob Clark 
>>> Did you consider glibc "backtrace()", I think it's also available on ARM...
>> I had not.. although xserver and weston are already using libunwind.
>> I'm not sure about portability of libunwind to other libc
>> implementations (but I guess it is at least not worse than using a
>> glibc specific API).
>>
>> I suppose we could always add a fallback to backtrace().
>>
Hmm. This commit (bisected) appears to break svga/vmwgfx in DEBUG mode:

*** Error in `glxgears': malloc(): memory corruption: 0x025c09c0 ***
Aborted (core dumped)

The svga linux winsys makes extensive use of the backtrace functionality
using u_debug_flush.c

/Thomas

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965: oa-counters: keep on reading reports until delimiting timestamp

2017-04-03 Thread Robert Bragg
On Mar 30, 2017 16:16, "Lionel Landwerlin" 
wrote:

While exercising reading report with moderate load, we might have to
wait for all the reports to land in the OA buffer, otherwise we might
miss some reports. That means we need to keep on reading the OA stream
until the last report we read has a timestamp older that the timestamp
recorded by the MI_REPORT_PERF_COUNT at the end of the performance
query.


The expectation is that since we have received the second
MI_REPORT_PERF_COUNT report that implies any periodic/automatic reports
that the could possibly relate to a query must have been written to the
OABUFFER. Since we read() as much as is available from i915 perf when we
come to finish processing a query then we shouldn't miss anything.

We shouldn't synchronously wait in a busy loop until we've found a report
that has a timestamp >= our end MI_RPC because that could be a really
significant amount of time (e.g. 50+ milliseconds on HSW). NB. the periodic
samples are just to account for counter overflow. On Gen8+ the loop would
likely block until the next context switch happens.

Was this proposal based on a code inspection or did you maybe hit a problem
correctly measuring something?

Maybe if based on inspection we can find somewhere to put a comment to
clarify the assumptions above?

Br,
- Robert


Signed-off-by: Lionel Landwerlin 
Cc: Robert Bragg 
---
 src/mesa/drivers/dri/i965/brw_performance_query.c | 51
++-
 1 file changed, 32 insertions(+), 19 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_performance_query.c
b/src/mesa/drivers/dri/i965/brw_performance_query.c
index 4a94e4b3cc..076c59e633 100644
--- a/src/mesa/drivers/dri/i965/brw_performance_query.c
+++ b/src/mesa/drivers/dri/i965/brw_performance_query.c
@@ -647,38 +647,50 @@ discard_all_queries(struct brw_context *brw)
 }

 static bool
-read_oa_samples(struct brw_context *brw)
+read_oa_samples_until(struct brw_context *brw,
+  uint32_t start_timestamp,
+  uint32_t end_timestamp)
 {
-   while (1) {
+   uint32_t last_timestamp = start_timestamp;
+
+   while (last_timestamp < end_timestamp) {
   struct brw_oa_sample_buf *buf = get_free_sample_buf(brw);
+  uint32_t offset;
   int len;

   while ((len = read(brw->perfquery.oa_stream_fd, buf->buf,
- sizeof(buf->buf))) < 0 && errno == EINTR)
+ sizeof(buf->buf))) < 0 &&
+ (errno == EINTR || errno == EAGAIN))
  ;

   if (len <= 0) {
  exec_list_push_tail(>perfquery.free_sample_buffers,
>link);

- if (len < 0) {
-if (errno == EAGAIN)
-   return true;
-else {
-   DBG("Error reading i915 perf samples: %m\n");
-   return false;
-}
- } else {
+ if (len < 0)
+DBG("Error reading i915 perf samples: %m\n");
+ else
 DBG("Spurious EOF reading i915 perf samples\n");
-return false;
- }
+
+ return false;
   }

   buf->len = len;
   exec_list_push_tail(>perfquery.sample_buffers, >link);
+
+  /* Go through the reports and update the last timestamp. */
+  while (offset < buf->len) {
+ const struct drm_i915_perf_record_header *header =
+(const struct drm_i915_perf_record_header *) >buf[offset];
+ uint32_t *report = (uint32_t *) (header + 1);
+
+ if (header->type == DRM_I915_PERF_RECORD_SAMPLE)
+last_timestamp = report[1];
+
+ offset += header->size;
+  }
}

-   unreachable("not reached");
-   return false;
+   return true;
 }

 /**
@@ -709,10 +721,6 @@ accumulate_oa_reports(struct brw_context *brw,

assert(o->Ready);

-   /* Collect the latest periodic OA reports from i915 perf */
-   if (!read_oa_samples(brw))
-  goto error;
-
drm_intel_bo_map(obj->oa.bo, false);
query_buffer = obj->oa.bo->virtual;

@@ -728,6 +736,11 @@ accumulate_oa_reports(struct brw_context *brw,
   goto error;
}

+   /* Collect the latest periodic OA reports from i915 perf */
+   if (!read_oa_samples_until(brw, start[1], end[1]))
+  goto error;
+
+
/* See if we have any periodic reports to accumulate too... */

/* N.B. The oa.samples_head was set when the query began and
--
2.11.0
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/2] radv: overhaul fragment shader sample positions.

2017-04-03 Thread Bas Nieuwenhuizen
Series is

Reviewed-by: Bas Nieuwenhuizen 

Not really a fan of yet another flag in the command buffer, but not
sure what would be the best solution.

On Mon, Apr 3, 2017 at 5:44 AM, Dave Airlie  wrote:
> From: Dave Airlie 
>
> The current code was broken, and I decided to redesign it instead.
>
> This puts the sample positions for all samples into the queue
> constant descriptor buffer after all the spill/ring descriptors.
>
> It then uses a single offset register to point how far into the
> samples the samples for num_samples are. This saves one user sgpr
> and means we only generate the sample position data in the rare
> single case where we need it currently.
>
> This doesn't fix the failing CTS tests without the followup
> fix.
>
> Signed-off-by: Dave Airlie 
> ---
>  src/amd/common/ac_nir_to_llvm.c  | 31 +++-
>  src/amd/common/ac_nir_to_llvm.h  |  4 ++-
>  src/amd/vulkan/radv_cmd_buffer.c | 61 
> 
>  src/amd/vulkan/radv_device.c | 40 ++
>  src/amd/vulkan/radv_private.h|  2 ++
>  5 files changed, 87 insertions(+), 51 deletions(-)
>
> diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
> index 520e4cf..aabcabe 100644
> --- a/src/amd/common/ac_nir_to_llvm.c
> +++ b/src/amd/common/ac_nir_to_llvm.c
> @@ -109,7 +109,7 @@ struct nir_to_llvm_context {
> LLVMValueRef hs_ring_tess_factor;
>
> LLVMValueRef prim_mask;
> -   LLVMValueRef sample_positions;
> +   LLVMValueRef sample_pos_offset;
> LLVMValueRef persp_sample, persp_center, persp_centroid;
> LLVMValueRef linear_sample, linear_center, linear_centroid;
> LLVMValueRef front_face;
> @@ -573,6 +573,7 @@ static void create_function(struct nir_to_llvm_context 
> *ctx)
> ctx->stage == MESA_SHADER_VERTEX ||
> ctx->stage == MESA_SHADER_TESS_CTRL ||
> ctx->stage == MESA_SHADER_TESS_EVAL ||
> +   ctx->stage == MESA_SHADER_FRAGMENT ||
> ctx->is_gs_copy_shader)
> need_ring_offsets = true;
>
> @@ -584,7 +585,7 @@ static void create_function(struct nir_to_llvm_context 
> *ctx)
> need_push_constants = false;
>
> if (need_ring_offsets && !ctx->options->supports_spill) {
> -   arg_types[arg_idx++] = const_array(ctx->v16i8, 8); /* address 
> of rings */
> +   arg_types[arg_idx++] = const_array(ctx->v16i8, 16); /* 
> address of rings */
> }
>
> /* 1 for each descriptor set */
> @@ -679,7 +680,7 @@ static void create_function(struct nir_to_llvm_context 
> *ctx)
> arg_types[arg_idx++] = ctx->i32; // GS instance id
> break;
> case MESA_SHADER_FRAGMENT:
> -   arg_types[arg_idx++] = const_array(ctx->f32, 32); /* sample 
> positions */
> +   arg_types[arg_idx++] = ctx->i32; /* sample position offset */
> user_sgpr_count = arg_idx;
> arg_types[arg_idx++] = ctx->i32; /* prim mask */
> sgpr_count = arg_idx;
> @@ -735,7 +736,7 @@ static void create_function(struct nir_to_llvm_context 
> *ctx)
>
> LLVMPointerType(ctx->i8, CONST_ADDR_SPACE),
>NULL, 0, 
> AC_FUNC_ATTR_READNONE);
> ctx->ring_offsets = LLVMBuildBitCast(ctx->builder, 
> ctx->ring_offsets,
> -
> const_array(ctx->v16i8, 8), "");
> +
> const_array(ctx->v16i8, 16), "");
> } else
> ctx->ring_offsets = LLVMGetParam(ctx->main_function, 
> arg_idx++);
> }
> @@ -844,9 +845,9 @@ static void create_function(struct nir_to_llvm_context 
> *ctx)
> ctx->gs_invocation_id = LLVMGetParam(ctx->main_function, 
> arg_idx++);
> break;
> case MESA_SHADER_FRAGMENT:
> -   set_userdata_location_shader(ctx, AC_UD_PS_SAMPLE_POS, 
> user_sgpr_idx, 2);
> -   user_sgpr_idx += 2;
> -   ctx->sample_positions = LLVMGetParam(ctx->main_function, 
> arg_idx++);
> +   set_userdata_location_shader(ctx, AC_UD_PS_SAMPLE_POS_OFFSET, 
> user_sgpr_idx, 1);
> +   user_sgpr_idx += 1;
> +   ctx->sample_pos_offset = LLVMGetParam(ctx->main_function, 
> arg_idx++);
> ctx->prim_mask = LLVMGetParam(ctx->main_function, arg_idx++);
> ctx->persp_sample = LLVMGetParam(ctx->main_function, 
> arg_idx++);
> ctx->persp_center = LLVMGetParam(ctx->main_function, 
> arg_idx++);
> @@ -3518,15 +3519,17 @@ static LLVMValueRef lookup_interp_param(struct 
> nir_to_llvm_context *ctx,
>  static LLVMValueRef 

[Mesa-dev] [PATCH] radv: Enable VK_KHR_incremental_present.

2017-04-03 Thread Bas Nieuwenhuizen
Just enabling the driver-independent implementation that Jason did.

Signed-off-by: Bas Nieuwenhuizen 
---
 src/amd/vulkan/radv_device.c   |  4 
 src/amd/vulkan/radv_entrypoints_gen.py |  1 +
 src/amd/vulkan/radv_wsi.c  | 11 ++-
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
index 5c48be1d11a..6456eb17f56 100644
--- a/src/amd/vulkan/radv_device.c
+++ b/src/amd/vulkan/radv_device.c
@@ -92,6 +92,10 @@ static const VkExtensionProperties instance_extensions[] = {
 
 static const VkExtensionProperties common_device_extensions[] = {
{
+   .extensionName = VK_KHR_INCREMENTAL_PRESENT_EXTENSION_NAME,
+   .specVersion = 1,
+   },
+   {
.extensionName = VK_KHR_MAINTENANCE1_EXTENSION_NAME,
.specVersion = 1,
},
diff --git a/src/amd/vulkan/radv_entrypoints_gen.py 
b/src/amd/vulkan/radv_entrypoints_gen.py
index b7b2bcf97e4..bee29dc1202 100644
--- a/src/amd/vulkan/radv_entrypoints_gen.py
+++ b/src/amd/vulkan/radv_entrypoints_gen.py
@@ -31,6 +31,7 @@ supported_extensions = [
'VK_AMD_draw_indirect_count',
'VK_NV_dedicated_allocation',
'VK_KHR_get_physical_device_properties2',
+   'VK_KHR_incremental_present',
'VK_KHR_maintenance1',
'VK_KHR_sampler_mirror_clamp_to_edge',
'VK_KHR_shader_draw_parameters',
diff --git a/src/amd/vulkan/radv_wsi.c b/src/amd/vulkan/radv_wsi.c
index 8b66095b263..b8999f4eb02 100644
--- a/src/amd/vulkan/radv_wsi.c
+++ b/src/amd/vulkan/radv_wsi.c
@@ -26,6 +26,7 @@
 #include "radv_private.h"
 #include "radv_meta.h"
 #include "wsi_common.h"
+#include "util/vk_util.h"
 
 static const struct wsi_callbacks wsi_cbs = {
.get_phys_device_format_properties = radv_GetPhysicalDeviceFormatProperties,
@@ -452,9 +453,14 @@ VkResult radv_QueuePresentKHR(
RADV_FROM_HANDLE(radv_queue, queue, _queue);
VkResult result = VK_SUCCESS;
 
+   const VkPresentRegionsKHR *regions =
+vk_find_struct_const(pPresentInfo->pNext, PRESENT_REGIONS_KHR);
+
for (uint32_t i = 0; i < pPresentInfo->swapchainCount; i++) {
RADV_FROM_HANDLE(wsi_swapchain, swapchain, 
pPresentInfo->pSwapchains[i]);
struct radeon_winsys_cs *cs;
+   const VkPresentRegionKHR *region = NULL;
+
assert(radv_device_from_handle(swapchain->device) == 
queue->device);
if (swapchain->fences[0] == VK_NULL_HANDLE) {
result = 
radv_CreateFence(radv_device_to_handle(queue->device),
@@ -484,9 +490,12 @@ VkResult radv_QueuePresentKHR(
 pPresentInfo->waitSemaphoreCount, 
NULL, 0, false, base_fence);
fence->submitted = true;
 
+   if (regions && regions->pRegions)
+   region = >pRegions[i];
+
result = swapchain->queue_present(swapchain,
  
pPresentInfo->pImageIndices[i],
- NULL);
+ region);
/* TODO: What if one of them returns OUT_OF_DATE? */
if (result != VK_SUCCESS)
return result;
-- 
2.12.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3 07/18] anv/allocator: Add a BO cache

2017-04-03 Thread Chad Versace
On Fri 31 Mar 2017, Chad Versace wrote:
> On Wed 15 Mar 2017, Jason Ekstrand wrote:
> > This cache allows us to easily ensure that we have a unique anv_bo for
> > each gem handle.  We'll need this in order to support multiple-import of
> > memory objects and semaphores.
> > 
> > v2 (Jason Ekstrand):
> >  - Reject BO imports if the size doesn't match the prime fd size as
> >reported by lseek().
> > 
> > v3 (Jason Ekstrand):
> >  - Fix reference counting around cache_release (Chris Willson)
> >  - Move the mutex_unlock() later in cache_release
> > ---
> >  src/intel/vulkan/anv_allocator.c | 261 
> > +++
> >  src/intel/vulkan/anv_private.h   |  26 
> >  2 files changed, 287 insertions(+)
> 
> 
> > +static uint32_t
> > +hash_uint32_t(const void *key)
> > +{
> > +   return (uint32_t)(uintptr_t)key;
> > +}
> 
> This hash function does not appear hashy.
> 
> If I correctly understand the details of Mesa's struct hash_table,
> choosing the identify function for the hash function causes unwanted
> clustering when inserting consecutive gem handles.  Since the kernel does
> allocate gem handles consecutively, the problem is real.
> 
> For proof, consider the following:
> 
>- Suppose a long-running process (maybe the compositor) has thrashed on the
>  hash table long enough that its bucket count
>  is ht->size = hash_sizes[7].size = 283. Suppose a spike of
>  compositor activity raises the hash table's density to about 0.5.
>  And suppose the hash table buckets are filled with the consecutive gem
>  handles
>  
>  {0, 0, 0, 0, 4, 5, 6, 7, 8, 9, ..., 127, 128, 0, 0, 0, ..., 0 }
>  
>  The exact density is (128 - 4 + 1) / 283 = 0.4417.
> 
>- Next, some other in-process activity (maybe OpenGL) generated
>  a lot of gem handles after Vulkan's most recently imported
>  gem handle, 128.

This point in the example---the reason why the gem handles in the
anv_bo_cache skip from 128 to 287---is bogus in Vulkan. The problem *is*
real for multiple in-process OpenGL contexts derived from the same
EGLDisplay, using EGL_EXT_image_dma_buf_import, because each context
shares the same intel_screen, and therefore the same drm device fd. But
in Vulkan, each VkDevice opens its own drm device id. So, bogus example.

BUT, that leads to a new question...

Since each VkDevice has a unique drm device fd, and since the kernel
allocates gem handles consecutively on the fd, and since struct
hash_table only grows and never shrinks, and since patch 8/18 inserts
every VkDeviceMemory into the cache... I believe no collisions are
possible in anv_bo_cache.

If there are no collisions, then the hash table is only adding overhead,
and we should use a direct-addressing lookup table. The bo cache should
look like this:

struct anv_bo_cache {
   /* The array indices are gem handles. Null entries are legal. */
   struct anv_bo **bos;

   /* Length of the array. Because the array can have holes, this
* is *not* the number of gem handles in the array. 
*/
   size_t len; 

   pthread_mutex_t mutex;
};

struct anv_bo *
anv_bo_cache_lookup(struct anv_bo_cache *cache, uint32_t gem_handle)
{
   struct anv_bo *bo = NULL;

   pthread_mutex_lock(>mutex);

   if (gem_handle < cache->len)
  bo = cache->entries[gem_handle] == NULL)

   pthread_mutex_unlock(>mutex);

   return bo;

}

BUT, that leads to yet another question...

Why is patch 8/18 inserting every VkDeviceMemory into the cache? If
I understand things correctly, we only *need* to insert a VkDeviceMemory
into the cache if, in vkAllocateMemory, either (1)
VkExportMemoryAllocateInfoKHX::handleTypes != 0 or (2)
VkMemoryAllocateInfo's pNext chain contains an import structure.

If we insert into the cache only those VkDeviceMemories that are
imported or that will be exported, then the bo cache remains small, and
we *should* use a hash table.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/2] radv/ac: round cube array coordinate before fixup.

2017-04-03 Thread Bas Nieuwenhuizen
Series is:

Reviewed-by: Bas Nieuwenhuizen 

On Mon, Apr 3, 2017 at 5:46 AM, Dave Airlie  wrote:
> From: Dave Airlie 
>
> This fixes:
> dEQP-VK.glsl.texture_functions.texture.samplercubearray*
>
> Signed-off-by: Dave Airlie 
> ---
>  src/amd/common/ac_nir_to_llvm.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
> index 6985371..adba539 100644
> --- a/src/amd/common/ac_nir_to_llvm.c
> +++ b/src/amd/common/ac_nir_to_llvm.c
> @@ -4237,6 +4237,8 @@ static void visit_tex(struct nir_to_llvm_context *ctx, 
> nir_tex_instr *instr)
> }
>
> if (instr->sampler_dim == GLSL_SAMPLER_DIM_CUBE && coord) {
> +   if (instr->is_array && instr->op != nir_texop_lod)
> +   coords[3] = apply_round_slice(ctx, coords[3]);
> for (chan = 0; chan < instr->coord_components; chan++)
> coords[chan] = to_float(ctx, coords[chan]);
> if (instr->coord_components == 3)
> @@ -4264,7 +4266,9 @@ static void visit_tex(struct nir_to_llvm_context *ctx, 
> nir_tex_instr *instr)
> }
> if (instr->coord_components > 2) {
> /* This seems like a bit of a hack - but it passes 
> Vulkan CTS with it */
> -   if (instr->sampler_dim != GLSL_SAMPLER_DIM_3D && 
> instr->op != nir_texop_txf) {
> +   if (instr->sampler_dim != GLSL_SAMPLER_DIM_3D &&
> +   instr->sampler_dim != GLSL_SAMPLER_DIM_CUBE &&
> +   instr->op != nir_texop_txf) {
> coords[2] = apply_round_slice(ctx, coords[2]);
> }
> address[count++] = coords[2];
> --
> 2.7.4
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] mesa/glthread: Avoid unnecessary batch reallocation

2017-04-03 Thread Bartosz Tomczyk
---
 src/mesa/main/glthread.c | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/src/mesa/main/glthread.c b/src/mesa/main/glthread.c
index 3f07c420d4..c4d3f4a434 100644
--- a/src/mesa/main/glthread.c
+++ b/src/mesa/main/glthread.c
@@ -53,7 +53,8 @@ glthread_allocate_batch(struct gl_context *ctx)
 }
 
 static void
-glthread_unmarshal_batch(struct gl_context *ctx, struct glthread_batch *batch)
+glthread_unmarshal_batch(struct gl_context *ctx, struct glthread_batch *batch,
+ const bool release_batch)
 {
size_t pos = 0;
 
@@ -64,7 +65,10 @@ glthread_unmarshal_batch(struct gl_context *ctx, struct 
glthread_batch *batch)
 
assert(pos == batch->used);
 
-   free(batch);
+   if (release_batch)
+  free(batch);
+   else
+  batch->used = 0;
 }
 
 static void *
@@ -103,7 +107,7 @@ glthread_worker(void *data)
   glthread->busy = true;
   pthread_mutex_unlock(>mutex);
 
-  glthread_unmarshal_batch(ctx, batch);
+  glthread_unmarshal_batch(ctx, batch, true);
 
   pthread_mutex_lock(>mutex);
   glthread->busy = false;
@@ -214,7 +218,7 @@ _mesa_glthread_flush_batch_locked(struct gl_context *ctx)
 * need to restore it when it returns.
 */
if (false) {
-  glthread_unmarshal_batch(ctx, batch);
+  glthread_unmarshal_batch(ctx, batch, true);
   _glapi_set_dispatch(ctx->CurrentClientDispatch);
   return;
}
@@ -269,9 +273,8 @@ _mesa_glthread_finish(struct gl_context *ctx)
if (!(glthread->batch_queue || glthread->busy)) {
   if (glthread->batch && glthread->batch->used) {
  struct _glapi_table *dispatch = _glapi_get_dispatch();
- glthread_unmarshal_batch(ctx, glthread->batch);
+ glthread_unmarshal_batch(ctx, glthread->batch, false);
  _glapi_set_dispatch(dispatch);
- glthread_allocate_batch(ctx);
   }
} else {
   _mesa_glthread_flush_batch_locked(ctx);
-- 
2.12.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] radv: move to using common buffer load format.

2017-04-03 Thread Bas Nieuwenhuizen
Reviewed-by: Bas Nieuwenhuizen 

On Mon, Apr 3, 2017 at 8:57 PM, Dave Airlie  wrote:
> From: Dave Airlie 
>
> Get rid of usage of SI.vs.load.input.
>
> Signed-off-by: Dave Airlie 
> ---
>  src/amd/common/ac_nir_to_llvm.c | 13 +
>  1 file changed, 5 insertions(+), 8 deletions(-)
>
> diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
> index 520e4cf..da38331 100644
> --- a/src/amd/common/ac_nir_to_llvm.c
> +++ b/src/amd/common/ac_nir_to_llvm.c
> @@ -4564,7 +4564,6 @@ handle_vs_input_decl(struct nir_to_llvm_context *ctx,
> LLVMValueRef t_list_ptr = ctx->vertex_buffers;
> LLVMValueRef t_offset;
> LLVMValueRef t_list;
> -   LLVMValueRef args[3];
> LLVMValueRef input;
> LLVMValueRef buffer_index;
> int index = variable->data.location - VERT_ATTRIB_GENERIC0;
> @@ -4586,13 +4585,11 @@ handle_vs_input_decl(struct nir_to_llvm_context *ctx,
> t_offset = LLVMConstInt(ctx->i32, index + i, false);
>
> t_list = ac_build_indexed_load_const(>ac, t_list_ptr, 
> t_offset);
> -   args[0] = t_list;
> -   args[1] = LLVMConstInt(ctx->i32, 0, false);
> -   args[2] = buffer_index;
> -   input = ac_build_intrinsic(>ac,
> -   "llvm.SI.vs.load.input", ctx->v4f32, args, 3,
> -   AC_FUNC_ATTR_READNONE | AC_FUNC_ATTR_NOUNWIND |
> -   AC_FUNC_ATTR_LEGACY);
> +
> +   input = ac_build_buffer_load_format(>ac, t_list,
> +   buffer_index,
> +   LLVMConstInt(ctx->i32, 0, 
> false),
> +   true);
>
> for (unsigned chan = 0; chan < 4; chan++) {
> LLVMValueRef llvm_chan = LLVMConstInt(ctx->i32, chan, 
> false);
> --
> 2.9.3
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] [RFC] mesa/glthread: misaligned address access

2017-04-03 Thread Bartosz Tomczyk
Address sanitizer reports lot of misaligned access:
SUMMARY: AddressSanitizer: undefined-behavior main/marshal.c:276:31 in
main/marshal.c:276:31: runtime error: load of misaligned address 0x631000104866 
for type
'const GLuint' (aka 'const unsigned int'), which requires 4 byte alignment
0x631000104866: note: pointer points here
 92 88 00 00 00 00  00 00 4a 03 0c 00 93 88  00 00 00 00 00 00 02 01  0c 00 40 
8d 00 00 00 00  00 00
 ^
SUMMARY: AddressSanitizer: undefined-behavior main/marshal_generated.c:28725:12 
in
main/marshal_generated.c:28726:12: runtime error: member access within 
misaligned address 0x6310003fc874 for type
'struct marshal_cmd_VertexAttribPointer', which requires 8 byte alignment
0x6310003fc874: note: pointer points here
  01 00 00 00 7a 02 20 00  00 00 00 00 be be be be  be be be be be be be be  be 
be be be be be be be
  ^
SUMMARY: AddressSanitizer: undefined-behavior main/marshal_generated.c:28726:12 
in
main/marshal_generated.c:28726:12: runtime error: store to misaligned address 
0x6310003fc87c for type
'GLint' (aka 'int'), which requires 8 byte alignment
0x6310003fc87c: note: pointer points here
  00 00 00 00 be be be be  be be be be be be be be  be be be be be be be be  be 
be be be be be be be

This probably isn't issue for x86 as misaligned access shoul be as fast
as aligned ones. But this could be issue for different architectures
like ARM/MIPS, any exert here ?

Any idea how to get correct aligment on different platforms ?
---
 src/mesa/main/marshal.h | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/mesa/main/marshal.h b/src/mesa/main/marshal.h
index 2f1509b2d5..4842d27eeb 100644
--- a/src/mesa/main/marshal.h
+++ b/src/mesa/main/marshal.h
@@ -32,6 +32,7 @@
 
 #include "main/glthread.h"
 #include "main/context.h"
+#include "main/macros.h"
 
 struct marshal_cmd_base
 {
@@ -55,15 +56,16 @@ _mesa_glthread_allocate_command(struct gl_context *ctx,
 {
struct glthread_state *glthread = ctx->GLThread;
struct marshal_cmd_base *cmd_base;
+   const size_t aligned_size = ALIGN(size, 8);
 
if (unlikely(glthread->batch->used + size > MARSHAL_MAX_CMD_SIZE))
   _mesa_glthread_flush_batch(ctx);
 
cmd_base = (struct marshal_cmd_base *)
   >batch->buffer[glthread->batch->used];
-   glthread->batch->used += size;
+   glthread->batch->used += aligned_size;
cmd_base->cmd_id = cmd_id;
-   cmd_base->cmd_size = size;
+   cmd_base->cmd_size = aligned_size;
return cmd_base;
 }
 
-- 
2.12.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] mesa/glthread: Avoid unnecessary batch reallocation

2017-04-03 Thread Nicolai Hähnle

On 03.04.2017 20:53, Bartosz Tomczyk wrote:

Actually, I can just set only batch->used to 0, but it seems to error
prone. When someone adds some fields to batch struct, it will be easy to
miss that it should be initialized in glthread_unmarshal_batch.


Better to have it fail early and loudly with garbage data, rather than 
silently if 0 happens to be an accepted value. I think it should be 
changed (unless there's a good reason to rely on it on purpose, but I 
don't see one...).


Cheers,
Nicolai




Anyway I can change it if you want to.

On Mon, Apr 3, 2017 at 8:42 PM, Nicolai Hähnle > wrote:

On 03.04.2017 20:38, Bartosz Tomczyk wrote:

---
 src/mesa/main/glthread.c | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/src/mesa/main/glthread.c b/src/mesa/main/glthread.c
index 3f07c420d4..aa14292e59 100644
--- a/src/mesa/main/glthread.c
+++ b/src/mesa/main/glthread.c
@@ -53,7 +53,8 @@ glthread_allocate_batch(struct gl_context *ctx)
 }

 static void
-glthread_unmarshal_batch(struct gl_context *ctx, struct
glthread_batch *batch)
+glthread_unmarshal_batch(struct gl_context *ctx, struct
glthread_batch *batch,
+ const bool release_batch)
 {
size_t pos = 0;

@@ -64,7 +65,10 @@ glthread_unmarshal_batch(struct gl_context
*ctx, struct glthread_batch *batch)

assert(pos == batch->used);

-   free(batch);
+   if (release_batch)
+  free(batch);
+   else
+  memset(batch, 0, offsetof(struct glthread_batch, buffer));


Hmm. Why do we memset the batch? Seems like a trivial optimization
to just not do that...

Apart from that, the patch looks good to me.

Cheers,
Nicolai



 }

 static void *
@@ -103,7 +107,7 @@ glthread_worker(void *data)
   glthread->busy = true;
   pthread_mutex_unlock(>mutex);

-  glthread_unmarshal_batch(ctx, batch);
+  glthread_unmarshal_batch(ctx, batch, true);

   pthread_mutex_lock(>mutex);
   glthread->busy = false;
@@ -214,7 +218,7 @@ _mesa_glthread_flush_batch_locked(struct
gl_context *ctx)
 * need to restore it when it returns.
 */
if (false) {
-  glthread_unmarshal_batch(ctx, batch);
+  glthread_unmarshal_batch(ctx, batch, true);
   _glapi_set_dispatch(ctx->CurrentClientDispatch);
   return;
}
@@ -269,9 +273,8 @@ _mesa_glthread_finish(struct gl_context *ctx)
if (!(glthread->batch_queue || glthread->busy)) {
   if (glthread->batch && glthread->batch->used) {
  struct _glapi_table *dispatch = _glapi_get_dispatch();
- glthread_unmarshal_batch(ctx, glthread->batch);
+ glthread_unmarshal_batch(ctx, glthread->batch, false);
  _glapi_set_dispatch(dispatch);
- glthread_allocate_batch(ctx);
   }
} else {
   _mesa_glthread_flush_batch_locked(ctx);



--
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.





--
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] radv: move to using common buffer load format.

2017-04-03 Thread Dave Airlie
From: Dave Airlie 

Get rid of usage of SI.vs.load.input.

Signed-off-by: Dave Airlie 
---
 src/amd/common/ac_nir_to_llvm.c | 13 +
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index 520e4cf..da38331 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -4564,7 +4564,6 @@ handle_vs_input_decl(struct nir_to_llvm_context *ctx,
LLVMValueRef t_list_ptr = ctx->vertex_buffers;
LLVMValueRef t_offset;
LLVMValueRef t_list;
-   LLVMValueRef args[3];
LLVMValueRef input;
LLVMValueRef buffer_index;
int index = variable->data.location - VERT_ATTRIB_GENERIC0;
@@ -4586,13 +4585,11 @@ handle_vs_input_decl(struct nir_to_llvm_context *ctx,
t_offset = LLVMConstInt(ctx->i32, index + i, false);
 
t_list = ac_build_indexed_load_const(>ac, t_list_ptr, 
t_offset);
-   args[0] = t_list;
-   args[1] = LLVMConstInt(ctx->i32, 0, false);
-   args[2] = buffer_index;
-   input = ac_build_intrinsic(>ac,
-   "llvm.SI.vs.load.input", ctx->v4f32, args, 3,
-   AC_FUNC_ATTR_READNONE | AC_FUNC_ATTR_NOUNWIND |
-   AC_FUNC_ATTR_LEGACY);
+
+   input = ac_build_buffer_load_format(>ac, t_list,
+   buffer_index,
+   LLVMConstInt(ctx->i32, 0, 
false),
+   true);
 
for (unsigned chan = 0; chan < 4; chan++) {
LLVMValueRef llvm_chan = LLVMConstInt(ctx->i32, chan, 
false);
-- 
2.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] mesa/glthread: Avoid unnecessary batch reallocation

2017-04-03 Thread Bartosz Tomczyk
Actually, I can just set only batch->used to 0, but it seems to error
prone. When someone adds some fields to batch struct, it will be easy to
miss that it should be initialized in glthread_unmarshal_batch.

Anyway I can change it if you want to.

On Mon, Apr 3, 2017 at 8:42 PM, Nicolai Hähnle  wrote:

> On 03.04.2017 20:38, Bartosz Tomczyk wrote:
>
>> ---
>>  src/mesa/main/glthread.c | 15 +--
>>  1 file changed, 9 insertions(+), 6 deletions(-)
>>
>> diff --git a/src/mesa/main/glthread.c b/src/mesa/main/glthread.c
>> index 3f07c420d4..aa14292e59 100644
>> --- a/src/mesa/main/glthread.c
>> +++ b/src/mesa/main/glthread.c
>> @@ -53,7 +53,8 @@ glthread_allocate_batch(struct gl_context *ctx)
>>  }
>>
>>  static void
>> -glthread_unmarshal_batch(struct gl_context *ctx, struct glthread_batch
>> *batch)
>> +glthread_unmarshal_batch(struct gl_context *ctx, struct glthread_batch
>> *batch,
>> + const bool release_batch)
>>  {
>> size_t pos = 0;
>>
>> @@ -64,7 +65,10 @@ glthread_unmarshal_batch(struct gl_context *ctx,
>> struct glthread_batch *batch)
>>
>> assert(pos == batch->used);
>>
>> -   free(batch);
>> +   if (release_batch)
>> +  free(batch);
>> +   else
>> +  memset(batch, 0, offsetof(struct glthread_batch, buffer));
>>
>
> Hmm. Why do we memset the batch? Seems like a trivial optimization to just
> not do that...
>
> Apart from that, the patch looks good to me.
>
> Cheers,
> Nicolai
>
>
>
>  }
>>
>>  static void *
>> @@ -103,7 +107,7 @@ glthread_worker(void *data)
>>glthread->busy = true;
>>pthread_mutex_unlock(>mutex);
>>
>> -  glthread_unmarshal_batch(ctx, batch);
>> +  glthread_unmarshal_batch(ctx, batch, true);
>>
>>pthread_mutex_lock(>mutex);
>>glthread->busy = false;
>> @@ -214,7 +218,7 @@ _mesa_glthread_flush_batch_locked(struct gl_context
>> *ctx)
>>  * need to restore it when it returns.
>>  */
>> if (false) {
>> -  glthread_unmarshal_batch(ctx, batch);
>> +  glthread_unmarshal_batch(ctx, batch, true);
>>_glapi_set_dispatch(ctx->CurrentClientDispatch);
>>return;
>> }
>> @@ -269,9 +273,8 @@ _mesa_glthread_finish(struct gl_context *ctx)
>> if (!(glthread->batch_queue || glthread->busy)) {
>>if (glthread->batch && glthread->batch->used) {
>>   struct _glapi_table *dispatch = _glapi_get_dispatch();
>> - glthread_unmarshal_batch(ctx, glthread->batch);
>> + glthread_unmarshal_batch(ctx, glthread->batch, false);
>>   _glapi_set_dispatch(dispatch);
>> - glthread_allocate_batch(ctx);
>>}
>> } else {
>>_mesa_glthread_flush_batch_locked(ctx);
>>
>>
>
> --
> Lerne, wie die Welt wirklich ist,
> Aber vergiss niemals, wie sie sein sollte.
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/3] radeonsi: use i32_0/1 instead of *int_bld.zero/one in most places

2017-04-03 Thread Dave Airlie
On 3 April 2017 at 19:52, Marek Olšák  wrote:
> From: Marek Olšák 

radv uses i32zero and i32one, it might be nice to be consistent, but I
don't mind which way.

Dave.
>
> ---
>  src/gallium/drivers/radeonsi/si_shader.c   | 88 
> ++
>  .../drivers/radeonsi/si_shader_tgsi_setup.c| 14 ++--
>  2 files changed, 47 insertions(+), 55 deletions(-)
>
> diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
> b/src/gallium/drivers/radeonsi/si_shader.c
> index 21efd9a..a5d370b 100644
> --- a/src/gallium/drivers/radeonsi/si_shader.c
> +++ b/src/gallium/drivers/radeonsi/si_shader.c
> @@ -538,38 +538,38 @@ static void declare_input_vs(
> break;
> }
>  }
>
>  static LLVMValueRef get_primitive_id(struct lp_build_tgsi_context *bld_base,
>  unsigned swizzle)
>  {
> struct si_shader_context *ctx = si_shader_context(bld_base);
>
> if (swizzle > 0)
> -   return bld_base->uint_bld.zero;
> +   return ctx->i32_0;
>
> switch (ctx->type) {
> case PIPE_SHADER_VERTEX:
> return LLVMGetParam(ctx->main_fn,
> ctx->param_vs_prim_id);
> case PIPE_SHADER_TESS_CTRL:
> return LLVMGetParam(ctx->main_fn,
> SI_PARAM_PATCH_ID);
> case PIPE_SHADER_TESS_EVAL:
> return LLVMGetParam(ctx->main_fn,
> ctx->param_tes_patch_id);
> case PIPE_SHADER_GEOMETRY:
> return LLVMGetParam(ctx->main_fn,
> SI_PARAM_PRIMITIVE_ID);
> default:
> assert(0);
> -   return bld_base->uint_bld.zero;
> +   return ctx->i32_0;
> }
>  }
>
>  /**
>   * Return the value of tgsi_ind_register for indexing.
>   * This is the indirect index with the constant offset added to it.
>   */
>  static LLVMValueRef get_indirect_index(struct si_shader_context *ctx,
>const struct tgsi_ind_register *ind,
>int rel_index)
> @@ -1096,28 +1096,28 @@ static LLVMValueRef fetch_input_gs(
> vtx_offset_param += SI_PARAM_VTX2_OFFSET - 2;
> }
> vtx_offset = lp_build_mul_imm(uint,
>   LLVMGetParam(ctx->main_fn,
>vtx_offset_param),
>   4);
>
> param = si_shader_io_get_unique_index(semantic_name, semantic_index);
> soffset = LLVMConstInt(ctx->i32, (param * 4 + swizzle) * 256, 0);
>
> -   value = ac_build_buffer_load(>ac, ctx->esgs_ring, 1, uint->zero,
> +   value = ac_build_buffer_load(>ac, ctx->esgs_ring, 1, ctx->i32_0,
>  vtx_offset, soffset, 0, 1, 0, true);
> if (tgsi_type_is_64bit(type)) {
> LLVMValueRef value2;
> soffset = LLVMConstInt(ctx->i32, (param * 4 + swizzle + 1) * 
> 256, 0);
>
> value2 = ac_build_buffer_load(>ac, ctx->esgs_ring, 1,
> - uint->zero, vtx_offset, soffset,
> + ctx->i32_0, vtx_offset, soffset,
>   0, 1, 0, true);
> return si_llvm_emit_fetch_64bit(bld_base, type,
> value, value2);
> }
> return LLVMBuildBitCast(gallivm->builder,
> value,
> tgsi2llvmtype(bld_base, type), "");
>  }
>
>  static int lookup_interp_param_index(unsigned interpolate, unsigned location)
> @@ -1169,21 +1169,20 @@ static void interp_fs_input(struct si_shader_context 
> *ctx,
> unsigned semantic_index,
> unsigned num_interp_inputs,
> unsigned colors_read_mask,
> LLVMValueRef interp_param,
> LLVMValueRef prim_mask,
> LLVMValueRef face,
> LLVMValueRef result[4])
>  {
> struct lp_build_tgsi_context *bld_base = >bld_base;
> struct lp_build_context *base = _base->base;
> -   struct lp_build_context *uint = _base->uint_bld;
> struct gallivm_state *gallivm = base->gallivm;
> LLVMValueRef attr_number;
> LLVMValueRef i, j;
>
> unsigned chan;
>
> /* fs.constant returns the param from the middle vertex, so it's not
>  * really useful for flat shading. It's meant to be used for custom
>  * interpolation (but the intrinsic can't fetch from the other two
>  * vertices).
> @@ -1198,41 +1197,41 @@ static void interp_fs_input(struct si_shader_context 
> *ctx,
>  

Re: [Mesa-dev] [PATCH] travis: Support LLVM 3.8+ on Trusty-based Travis-CI via apt-get not apt addon

2017-04-03 Thread Emil Velikov
On 2 April 2017 at 21:48, Rhys Kidd  wrote:
> Per comments by Travis-CI, the apt addon is only really needed for the
> container-based Precise builds, as they don't yet support Trusty on that 
> platform.
>
> Mesa currently uses Trusty fully-virtualized environment (due to sudo: 
> required).
>
> See further:
> https://docs.travis-ci.com/user/trusty-ci-environment/#Fully-virtualized-via-sudo%3A-required
> https://github.com/travis-ci/apt-source-whitelist/pull/205#issuecomment-216054237
>
> Signed-off-by: Rhys Kidd 
You're a start Rhys, thank you !

Reviewed-by: Emil Velikov  and pushed to master.

-Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] gallium/util: libunwind support

2017-04-03 Thread Emil Velikov
Hi Rob,

On 24 March 2017 at 21:21, Rob Clark  wrote:
> It's kinda sad that (a) we don't have debug_backtrace support on !X86
> and that (b) we re-invent our own crude backtrace support in the first
> place.  If available, use libunwind instead.  The backtrace format is
> based on what xserver and weston use, since it is nice not to have to
> figure out a different format.
>
> Signed-off-by: Rob Clark 
> ---
>  configure.ac   | 24 
>  src/gallium/Automake.inc   |  1 +
>  src/gallium/auxiliary/util/u_debug_stack.c | 91 
> ++
>  src/gallium/auxiliary/util/u_debug_stack.h | 15 -
>  4 files changed, 129 insertions(+), 2 deletions(-)
>
> diff --git a/configure.ac b/configure.ac
> index a99684b..5046acb 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -1025,6 +1025,30 @@ AC_SUBST([LLVM_LIBS])
>  AC_SUBST([LLVM_LDFLAGS])
>  AC_SUBST([LLVM_INCLUDEDIR])
>
> +dnl
> +dnl libunwind
> +dnl
> +AC_ARG_ENABLE([libunwind],
> +[AS_HELP_STRING([--enable-libunwind],
> +[Use libunwind for backtracing (default: auto)])],
> +[LIBUNWIND="$enableval"],
> +[LIBUNWIND="auto"])
> +
> +PKG_CHECK_MODULES(LIBUNWIND, libunwind, [HAVE_LIBUNWIND=yes], 
> [HAVE_LIBUNWIND=no])
> +if test "x$LIBUNWIND" = "xauto"; then
> +LIBUNWIND="$HAVE_LIBUNWIND"
> +fi
> +
> +if test "x$LIBUNWIND" = "xyes"; then
> +if test "x$HAVE_LIBUNWIND" != "xyes"; then
> +AC_MSG_ERROR([libunwind requested but not installed.])
> +fi
> +AC_DEFINE(HAVE_LIBUNWIND, 1, [Have libunwind support])
> +fi
> +
> +AM_CONDITIONAL(HAVE_LIBUNWIND, [test "x$LIBUNWIND" = xyes])
> +
> +
>  dnl Options for APIs
>  AC_ARG_ENABLE([opengl],
>  [AS_HELP_STRING([--disable-opengl],
> diff --git a/src/gallium/Automake.inc b/src/gallium/Automake.inc
> index a01fa54..48b5a44 100644
> --- a/src/gallium/Automake.inc
> +++ b/src/gallium/Automake.inc
> @@ -46,6 +46,7 @@ GALLIUM_TARGET_CFLAGS = \
>
>  GALLIUM_COMMON_LIB_DEPS = \
> -lm \
> +   $(LIBUNWIND_LIBS) \

We're using the LIBUNWIND_LIBS but LIBUNWIND_CFLAGS is missing. Please
it to src/gallium/auxiliary/Makefile.am's AM_CFLAGS


> diff --git a/src/gallium/auxiliary/util/u_debug_stack.h 
> b/src/gallium/auxiliary/util/u_debug_stack.h
> index 04eba08..0effcbe 100644
> --- a/src/gallium/auxiliary/util/u_debug_stack.h
> +++ b/src/gallium/auxiliary/util/u_debug_stack.h
> @@ -30,6 +30,11 @@
>
>  #include 
>
> +#ifdef HAVE_LIBUNWIND
> +#define UNW_LOCAL_ONLY
> +#include 
> +#endif
> +
This hunk is not needed in the header. Please move it to the C file.

With the above
Reviewed-by: Emil Velikov 

-Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] mesa/glthread: Avoid unnecessary batch reallocation

2017-04-03 Thread Nicolai Hähnle

On 03.04.2017 20:38, Bartosz Tomczyk wrote:

---
 src/mesa/main/glthread.c | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/src/mesa/main/glthread.c b/src/mesa/main/glthread.c
index 3f07c420d4..aa14292e59 100644
--- a/src/mesa/main/glthread.c
+++ b/src/mesa/main/glthread.c
@@ -53,7 +53,8 @@ glthread_allocate_batch(struct gl_context *ctx)
 }

 static void
-glthread_unmarshal_batch(struct gl_context *ctx, struct glthread_batch *batch)
+glthread_unmarshal_batch(struct gl_context *ctx, struct glthread_batch *batch,
+ const bool release_batch)
 {
size_t pos = 0;

@@ -64,7 +65,10 @@ glthread_unmarshal_batch(struct gl_context *ctx, struct 
glthread_batch *batch)

assert(pos == batch->used);

-   free(batch);
+   if (release_batch)
+  free(batch);
+   else
+  memset(batch, 0, offsetof(struct glthread_batch, buffer));


Hmm. Why do we memset the batch? Seems like a trivial optimization to 
just not do that...


Apart from that, the patch looks good to me.

Cheers,
Nicolai



 }

 static void *
@@ -103,7 +107,7 @@ glthread_worker(void *data)
   glthread->busy = true;
   pthread_mutex_unlock(>mutex);

-  glthread_unmarshal_batch(ctx, batch);
+  glthread_unmarshal_batch(ctx, batch, true);

   pthread_mutex_lock(>mutex);
   glthread->busy = false;
@@ -214,7 +218,7 @@ _mesa_glthread_flush_batch_locked(struct gl_context *ctx)
 * need to restore it when it returns.
 */
if (false) {
-  glthread_unmarshal_batch(ctx, batch);
+  glthread_unmarshal_batch(ctx, batch, true);
   _glapi_set_dispatch(ctx->CurrentClientDispatch);
   return;
}
@@ -269,9 +273,8 @@ _mesa_glthread_finish(struct gl_context *ctx)
if (!(glthread->batch_queue || glthread->busy)) {
   if (glthread->batch && glthread->batch->used) {
  struct _glapi_table *dispatch = _glapi_get_dispatch();
- glthread_unmarshal_batch(ctx, glthread->batch);
+ glthread_unmarshal_batch(ctx, glthread->batch, false);
  _glapi_set_dispatch(dispatch);
- glthread_allocate_batch(ctx);
   }
} else {
   _mesa_glthread_flush_batch_locked(ctx);




--
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/2] intel: gen_decoder: store pointer to current decoded field in iterator

2017-04-03 Thread Matt Turner
Both are

Reviewed-by: Matt Turner 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] mesa/glthread: Avoid unnecessary batch reallocation

2017-04-03 Thread Bartosz Tomczyk
---
 src/mesa/main/glthread.c | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/src/mesa/main/glthread.c b/src/mesa/main/glthread.c
index 3f07c420d4..aa14292e59 100644
--- a/src/mesa/main/glthread.c
+++ b/src/mesa/main/glthread.c
@@ -53,7 +53,8 @@ glthread_allocate_batch(struct gl_context *ctx)
 }
 
 static void
-glthread_unmarshal_batch(struct gl_context *ctx, struct glthread_batch *batch)
+glthread_unmarshal_batch(struct gl_context *ctx, struct glthread_batch *batch,
+ const bool release_batch)
 {
size_t pos = 0;
 
@@ -64,7 +65,10 @@ glthread_unmarshal_batch(struct gl_context *ctx, struct 
glthread_batch *batch)
 
assert(pos == batch->used);
 
-   free(batch);
+   if (release_batch)
+  free(batch);
+   else
+  memset(batch, 0, offsetof(struct glthread_batch, buffer));
 }
 
 static void *
@@ -103,7 +107,7 @@ glthread_worker(void *data)
   glthread->busy = true;
   pthread_mutex_unlock(>mutex);
 
-  glthread_unmarshal_batch(ctx, batch);
+  glthread_unmarshal_batch(ctx, batch, true);
 
   pthread_mutex_lock(>mutex);
   glthread->busy = false;
@@ -214,7 +218,7 @@ _mesa_glthread_flush_batch_locked(struct gl_context *ctx)
 * need to restore it when it returns.
 */
if (false) {
-  glthread_unmarshal_batch(ctx, batch);
+  glthread_unmarshal_batch(ctx, batch, true);
   _glapi_set_dispatch(ctx->CurrentClientDispatch);
   return;
}
@@ -269,9 +273,8 @@ _mesa_glthread_finish(struct gl_context *ctx)
if (!(glthread->batch_queue || glthread->busy)) {
   if (glthread->batch && glthread->batch->used) {
  struct _glapi_table *dispatch = _glapi_get_dispatch();
- glthread_unmarshal_batch(ctx, glthread->batch);
+ glthread_unmarshal_batch(ctx, glthread->batch, false);
  _glapi_set_dispatch(dispatch);
- glthread_allocate_batch(ctx);
   }
} else {
   _mesa_glthread_flush_batch_locked(ctx);
-- 
2.12.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] bin/get-fixes-pick-list.sh: fix typo

2017-04-03 Thread Emil Velikov
Reviewed-by: Emil Velikov 

-Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/3] util: fix MSVC warning in u_align_u32()

2017-04-03 Thread Emil Velikov
On 3 April 2017 at 17:09, Brian Paul  wrote:
> To silence
> C:\Users\Brian\projects\mesa\src\util/u_vector.h(41) : warning C4146: unary
> minus operator applied to unsigned type, result still unsigned

For the series:
Reviewed-by: Emil Velikov 

-Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] radv/winsys: only workout color/depth levels if we have color/depth

2017-04-03 Thread Bas Nieuwenhuizen
Reviewed-by: Bas Nieuwenhuizen 

On Mon, Apr 3, 2017 at 7:17 AM, Dave Airlie  wrote:
> From: Dave Airlie 
>
> This fixes an old bug that seems to get triggered by
> dEQP-VK.memory.requirements.image.sparse_tiling_optimal
>
> We return early when allocating S8_UINT due to there being
> no color or depth, and end up with image size of 0.
>
> Signed-off-by: Dave Airlie 
> ---
>  src/amd/vulkan/winsys/amdgpu/radv_amdgpu_surface.c | 48 
> --
>  1 file changed, 26 insertions(+), 22 deletions(-)
>
> diff --git a/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_surface.c 
> b/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_surface.c
> index 0433952..53ca6ff 100644
> --- a/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_surface.c
> +++ b/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_surface.c
> @@ -481,28 +481,32 @@ static int radv_amdgpu_winsys_surface_init(struct 
> radeon_winsys *_ws,
> surf->htile_size = surf->htile_slice_size = 0;
> surf->htile_alignment = 1;
>
> -   /* Calculate texture layout information. */
> -   for (level = 0; level <= surf->last_level; level++) {
> -   r = radv_compute_level(ws->addrlib, surf, false, level, type, 
> compressed,
> -  , , 
> , );
> -   if (r)
> -   return r;
> -
> -   if (level == 0) {
> -   surf->bo_alignment = AddrSurfInfoOut.baseAlign;
> -   surf->pipe_config = 
> AddrSurfInfoOut.pTileInfo->pipeConfig - 1;
> -   radv_set_micro_tile_mode(surf, >info);
> -
> -   /* For 2D modes only. */
> -   if (AddrSurfInfoOut.tileMode >= 
> ADDR_TM_2D_TILED_THIN1) {
> -   surf->bankw = 
> AddrSurfInfoOut.pTileInfo->bankWidth;
> -   surf->bankh = 
> AddrSurfInfoOut.pTileInfo->bankHeight;
> -   surf->mtilea = 
> AddrSurfInfoOut.pTileInfo->macroAspectRatio;
> -   surf->tile_split = 
> AddrSurfInfoOut.pTileInfo->tileSplitBytes;
> -   surf->num_banks = 
> AddrSurfInfoOut.pTileInfo->banks;
> -   surf->macro_tile_index = 
> AddrSurfInfoOut.macroModeIndex;
> -   } else {
> -   surf->macro_tile_index = 0;
> +   if (AddrSurfInfoIn.flags.color || AddrSurfInfoIn.flags.depth) {
> +   /* Calculate texture layout information. */
> +   for (level = 0; level <= surf->last_level; level++) {
> +   r = radv_compute_level(ws->addrlib, surf, false, 
> level, type, compressed,
> +  , 
> , , );
> +   if (r) {
> +   assert(0);
> +   return r;
> +   }
> +
> +   if (level == 0) {
> +   surf->bo_alignment = 
> AddrSurfInfoOut.baseAlign;
> +   surf->pipe_config = 
> AddrSurfInfoOut.pTileInfo->pipeConfig - 1;
> +   radv_set_micro_tile_mode(surf, >info);
> +
> +   /* For 2D modes only. */
> +   if (AddrSurfInfoOut.tileMode >= 
> ADDR_TM_2D_TILED_THIN1) {
> +   surf->bankw = 
> AddrSurfInfoOut.pTileInfo->bankWidth;
> +   surf->bankh = 
> AddrSurfInfoOut.pTileInfo->bankHeight;
> +   surf->mtilea = 
> AddrSurfInfoOut.pTileInfo->macroAspectRatio;
> +   surf->tile_split = 
> AddrSurfInfoOut.pTileInfo->tileSplitBytes;
> +   surf->num_banks = 
> AddrSurfInfoOut.pTileInfo->banks;
> +   surf->macro_tile_index = 
> AddrSurfInfoOut.macroModeIndex;
> +   } else {
> +   surf->macro_tile_index = 0;
> +   }
> }
> }
> }
> --
> 2.9.3
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2] util/u_atomic: provide 64bit atomics where they're missing

2017-04-03 Thread Matt Turner
On Thu, Mar 30, 2017 at 3:47 PM, Matt Turner  wrote:
> On Thu, Mar 30, 2017 at 3:26 PM, Grazvydas Ignotas 
wrote:
>> There are still some distributions trying to support unfortunate people
>> with old or exotic CPUs that don't have 64bit atomic operations. When
>> compiling for such a machine, gcc conveniently inserts a library call to
>> a helper, but it's implementation is missing and we get a linker error.
>> This allows us to provide our own implementation, which is marked weak
>> to prefer a better implementation, should one exist.
>>
>> v2: changed copyright, some style adjustments
>>
>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93089
>> Signed-off-by: Grazvydas Ignotas 
>> Reviewed-by: Matt Turner 
>
> Thanks, I'll commit this.
>
>> ---
>>  no commit access, but request sent:
>>  https://bugs.freedesktop.org/show_bug.cgi?id=100467
>
> Thanks. I commented on the bug and said I approve :)

I modified the patch slightly to print the results of the test with
AC_MSG_CHECKING/AC_MSG_RESULT and pushed it as commit a6a38a038bd62e.

I'll do some build tests on various architectures. I think I like the idea
of this going to the stable 17.0 branch.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] radv: Increase descriptor limits.

2017-04-03 Thread Bas Nieuwenhuizen
We supported more generally. Decreased the dynamic buffers though, as
we only support 16 for uniform+storage.

Signed-off-by: Bas Nieuwenhuizen 
---
 src/amd/vulkan/radv_device.c | 28 ++--
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
index 5c48be1d11a..d3aac90468c 100644
--- a/src/amd/vulkan/radv_device.c
+++ b/src/amd/vulkan/radv_device.c
@@ -545,21 +545,21 @@ void radv_GetPhysicalDeviceProperties(
.bufferImageGranularity   = 64, /* A cache line 
*/
.sparseAddressSpaceSize   = 0xu, /* 
buffer max size */
.maxBoundDescriptorSets   = MAX_SETS,
-   .maxPerStageDescriptorSamplers= 64,
-   .maxPerStageDescriptorUniformBuffers  = 64,
-   .maxPerStageDescriptorStorageBuffers  = 64,
-   .maxPerStageDescriptorSampledImages   = 64,
-   .maxPerStageDescriptorStorageImages   = 64,
-   .maxPerStageDescriptorInputAttachments= 64,
-   .maxPerStageResources = 128,
+   .maxPerStageDescriptorSamplers= (1u << 31) / 16,
+   .maxPerStageDescriptorUniformBuffers  = (1u << 31) / 16,
+   .maxPerStageDescriptorStorageBuffers  = (1u << 31) / 16,
+   .maxPerStageDescriptorSampledImages   = (1u << 31) / 96,
+   .maxPerStageDescriptorStorageImages   = (1u << 31) / 64,
+   .maxPerStageDescriptorInputAttachments= (1u << 31) / 64,
+   .maxPerStageResources = (1u << 31) / 32,
.maxDescriptorSetSamplers = 256,
-   .maxDescriptorSetUniformBuffers   = 256,
-   .maxDescriptorSetUniformBuffersDynamic= 256,
-   .maxDescriptorSetStorageBuffers   = 256,
-   .maxDescriptorSetStorageBuffersDynamic= 256,
-   .maxDescriptorSetSampledImages= 256,
-   .maxDescriptorSetStorageImages= 256,
-   .maxDescriptorSetInputAttachments = 256,
+   .maxDescriptorSetUniformBuffers   = (1u << 31) / 16,
+   .maxDescriptorSetUniformBuffersDynamic= 8,
+   .maxDescriptorSetStorageBuffers   = (1u << 31) / 16,
+   .maxDescriptorSetStorageBuffersDynamic= 8,
+   .maxDescriptorSetSampledImages= (1u << 31) / 96,
+   .maxDescriptorSetStorageImages= (1u << 31) / 64,
+   .maxDescriptorSetInputAttachments = (1u << 31) / 64,
.maxVertexInputAttributes = 32,
.maxVertexInputBindings   = 32,
.maxVertexInputAttributeOffset= 2047,
-- 
2.12.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] glsl: Fix blob memory leak

2017-04-03 Thread Bartosz Tomczyk
---
 src/compiler/glsl/blob.h | 11 +++
 src/compiler/glsl/shader_cache.cpp   |  2 +-
 src/compiler/glsl/tests/blob_test.c  |  8 
 src/mesa/state_tracker/st_shader_cache.c |  2 +-
 4 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/src/compiler/glsl/blob.h b/src/compiler/glsl/blob.h
index 9de12e6eb8..940c81e13b 100644
--- a/src/compiler/glsl/blob.h
+++ b/src/compiler/glsl/blob.h
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifdef __cplusplus
 extern "C" {
@@ -80,6 +81,16 @@ struct blob *
 blob_create(void);
 
 /**
+ * Destroy a blob and free its memory.
+ */
+static inline void
+blob_destroy(struct blob *blob)
+{
+   free(blob->data);
+   free(blob);
+}
+
+/**
  * Add some unstructured, fixed-size data to a blob.
  *
  * \return True unless allocation failed.
diff --git a/src/compiler/glsl/shader_cache.cpp 
b/src/compiler/glsl/shader_cache.cpp
index ea1bc01f02..f5e6a22bb9 100644
--- a/src/compiler/glsl/shader_cache.cpp
+++ b/src/compiler/glsl/shader_cache.cpp
@@ -1273,7 +1273,7 @@ shader_cache_write_program_metadata(struct gl_context 
*ctx,
 
disk_cache_put(cache, prog->data->sha1, metadata->data, metadata->size);
 
-   free(metadata);
+   blob_destroy(metadata);
 
if (ctx->_Shader->Flags & GLSL_CACHE_INFO) {
   _mesa_sha1_format(sha1_buf, prog->data->sha1);
diff --git a/src/compiler/glsl/tests/blob_test.c 
b/src/compiler/glsl/tests/blob_test.c
index 01b5ef0b2f..df0e91af35 100644
--- a/src/compiler/glsl/tests/blob_test.c
+++ b/src/compiler/glsl/tests/blob_test.c
@@ -184,7 +184,7 @@ test_write_and_read_functions (void)
 "read_consumes_all_bytes");
expect_equal(false, reader.overrun, "read_does_not_overrun");
 
-   free(blob);
+   blob_destroy(blob);
 }
 
 /* Test that data values are written and read with proper alignment. */
@@ -242,7 +242,7 @@ test_alignment(void)
"aligned read of intptr_t");
}
 
-   free(blob);
+   blob_destroy(blob);
 }
 
 /* Test that we detect overrun. */
@@ -264,7 +264,7 @@ test_overrun(void)
expect_equal(0, blob_read_uint32(), "read at overrun");
expect_equal(true, reader.overrun, "overrun flag set");
 
-   free(blob);
+   blob_destroy(blob);
 }
 
 /* Test that we can read and write some large objects, (exercising the code in
@@ -308,7 +308,7 @@ test_big_objects(void)
expect_equal(false, reader.overrun,
 "overrun flag not set reading large objects");
 
-   free(blob);
+   blob_destroy(blob);
ralloc_free(ctx);
 }
 
diff --git a/src/mesa/state_tracker/st_shader_cache.c 
b/src/mesa/state_tracker/st_shader_cache.c
index e8c7289ec6..1a11f1135d 100644
--- a/src/mesa/state_tracker/st_shader_cache.c
+++ b/src/mesa/state_tracker/st_shader_cache.c
@@ -135,7 +135,7 @@ st_store_tgsi_in_disk_cache(struct st_context *st, struct 
gl_program *prog,
   _mesa_shader_stage_to_string(prog->info.stage), sha1_buf);
}
 
-   free(blob);
+   blob_destroy(blob);
 }
 
 static void
-- 
2.12.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] gallium/util: libunwind support

2017-04-03 Thread Thomas Hellstrom
On 04/03/2017 07:13 PM, Rob Clark wrote:
> On Mon, Apr 3, 2017 at 12:56 PM, Thomas Hellstrom  
> wrote:
>> Hi, Rob,
>>
>> On 03/24/2017 10:21 PM, Rob Clark wrote:
>>> It's kinda sad that (a) we don't have debug_backtrace support on !X86
>>> and that (b) we re-invent our own crude backtrace support in the first
>>> place.  If available, use libunwind instead.  The backtrace format is
>>> based on what xserver and weston use, since it is nice not to have to
>>> figure out a different format.
>>>
>>> Signed-off-by: Rob Clark 
>> Did you consider glibc "backtrace()", I think it's also available on ARM...
> I had not.. although xserver and weston are already using libunwind.
> I'm not sure about portability of libunwind to other libc
> implementations (but I guess it is at least not worse than using a
> glibc specific API).
>
> I suppose we could always add a fallback to backtrace().
>
>> Also is the output format the same as before, or at least compatible with
>> gallium/tools/addr2line.sh?
> quite possibly not.. I chose to align to the format that xserver and
> weston was already using.  Otoh, not sure if you would need to use
> addr2line.sh since it already decodes things to human readable
> fxn/file names.
>
> BR,
> -R

backtrace() (or the homebrew i386 implementation) + addr2line.sh gives
you both function name, source file and line number, which IMO is pretty
useful. I'll give the new format a shot. Perhaps if needed we could have
a config option to choose the method. I guess this is mostly used in
DEBUG builds anyway, so a developer can choose the desired method...

/Thomas



>
>> Thanks,
>> Thomas
>>


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] gallium/util: libunwind support

2017-04-03 Thread Kristian Høgsberg
On Mon, Apr 3, 2017 at 10:13 AM, Rob Clark  wrote:
> On Mon, Apr 3, 2017 at 12:56 PM, Thomas Hellstrom  
> wrote:
>> Hi, Rob,
>>
>> On 03/24/2017 10:21 PM, Rob Clark wrote:
>>> It's kinda sad that (a) we don't have debug_backtrace support on !X86
>>> and that (b) we re-invent our own crude backtrace support in the first
>>> place.  If available, use libunwind instead.  The backtrace format is
>>> based on what xserver and weston use, since it is nice not to have to
>>> figure out a different format.
>>>
>>> Signed-off-by: Rob Clark 
>>
>> Did you consider glibc "backtrace()", I think it's also available on ARM...
>
> I had not.. although xserver and weston are already using libunwind.
> I'm not sure about portability of libunwind to other libc
> implementations (but I guess it is at least not worse than using a
> glibc specific API).

We did use glibc backtrace initially in weston, but switched to
libbacktrace to get better support for symbols in dlopened modules.
Here's the commit message:

compositor: Use libunwind if available for better backtraces

libunwind has a dwarf parser and automatically queries the dlinfo
for location of dlopened modules.  The resulting backtrace is much
better and includes stack frames in dynamically loaded modules.

krh: Originally submitted for Xorg, adapted for weston:

  http://lists.x.org/archives/xorg-devel/2013-February/035493.html

Note this require libunwind at least 1.1 to get the pkg-config files.

Signed-off-by: Marcin Slusarz 

See 
https://cgit.freedesktop.org/wayland/weston/commit/?id=554a0da74a3f6fc945503a3eb1bfcc9038441b39

Kristian

> I suppose we could always add a fallback to backtrace().
>
>> Also is the output format the same as before, or at least compatible with
>> gallium/tools/addr2line.sh?
>
> quite possibly not.. I chose to align to the format that xserver and
> weston was already using.  Otoh, not sure if you would need to use
> addr2line.sh since it already decodes things to human readable
> fxn/file names.
>
> BR,
> -R
>
>> Thanks,
>> Thomas
>>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] gallium/util: libunwind support

2017-04-03 Thread Rob Clark
On Mon, Apr 3, 2017 at 12:56 PM, Thomas Hellstrom  wrote:
> Hi, Rob,
>
> On 03/24/2017 10:21 PM, Rob Clark wrote:
>> It's kinda sad that (a) we don't have debug_backtrace support on !X86
>> and that (b) we re-invent our own crude backtrace support in the first
>> place.  If available, use libunwind instead.  The backtrace format is
>> based on what xserver and weston use, since it is nice not to have to
>> figure out a different format.
>>
>> Signed-off-by: Rob Clark 
>
> Did you consider glibc "backtrace()", I think it's also available on ARM...

I had not.. although xserver and weston are already using libunwind.
I'm not sure about portability of libunwind to other libc
implementations (but I guess it is at least not worse than using a
glibc specific API).

I suppose we could always add a fallback to backtrace().

> Also is the output format the same as before, or at least compatible with
> gallium/tools/addr2line.sh?

quite possibly not.. I chose to align to the format that xserver and
weston was already using.  Otoh, not sure if you would need to use
addr2line.sh since it already decodes things to human readable
fxn/file names.

BR,
-R

> Thanks,
> Thomas
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] gallium/util: libunwind support

2017-04-03 Thread Thomas Hellstrom
Hi, Rob,

On 03/24/2017 10:21 PM, Rob Clark wrote:
> It's kinda sad that (a) we don't have debug_backtrace support on !X86
> and that (b) we re-invent our own crude backtrace support in the first
> place.  If available, use libunwind instead.  The backtrace format is
> based on what xserver and weston use, since it is nice not to have to
> figure out a different format.
>
> Signed-off-by: Rob Clark 

Did you consider glibc "backtrace()", I think it's also available on ARM...

Also is the output format the same as before, or at least compatible with
gallium/tools/addr2line.sh?

Thanks,
Thomas

> ---
>  configure.ac   | 24 
>  src/gallium/Automake.inc   |  1 +
>  src/gallium/auxiliary/util/u_debug_stack.c | 91 
> ++
>  src/gallium/auxiliary/util/u_debug_stack.h | 15 -
>  4 files changed, 129 insertions(+), 2 deletions(-)
>
> diff --git a/configure.ac b/configure.ac
> index a99684b..5046acb 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -1025,6 +1025,30 @@ AC_SUBST([LLVM_LIBS])
>  AC_SUBST([LLVM_LDFLAGS])
>  AC_SUBST([LLVM_INCLUDEDIR])
>  
> +dnl
> +dnl libunwind
> +dnl
> +AC_ARG_ENABLE([libunwind],
> +[AS_HELP_STRING([--enable-libunwind],
> +[Use libunwind for backtracing (default: auto)])],
> +[LIBUNWIND="$enableval"],
> +[LIBUNWIND="auto"])
> +
> +PKG_CHECK_MODULES(LIBUNWIND, libunwind, [HAVE_LIBUNWIND=yes], 
> [HAVE_LIBUNWIND=no])
> +if test "x$LIBUNWIND" = "xauto"; then
> +LIBUNWIND="$HAVE_LIBUNWIND"
> +fi
> +
> +if test "x$LIBUNWIND" = "xyes"; then
> +if test "x$HAVE_LIBUNWIND" != "xyes"; then
> +AC_MSG_ERROR([libunwind requested but not installed.])
> +fi
> +AC_DEFINE(HAVE_LIBUNWIND, 1, [Have libunwind support])
> +fi
> +
> +AM_CONDITIONAL(HAVE_LIBUNWIND, [test "x$LIBUNWIND" = xyes])
> +
> +
>  dnl Options for APIs
>  AC_ARG_ENABLE([opengl],
>  [AS_HELP_STRING([--disable-opengl],
> diff --git a/src/gallium/Automake.inc b/src/gallium/Automake.inc
> index a01fa54..48b5a44 100644
> --- a/src/gallium/Automake.inc
> +++ b/src/gallium/Automake.inc
> @@ -46,6 +46,7 @@ GALLIUM_TARGET_CFLAGS = \
>  
>  GALLIUM_COMMON_LIB_DEPS = \
>   -lm \
> + $(LIBUNWIND_LIBS) \
>   $(LIBSENSORS_LIBS) \
>   $(CLOCK_LIB) \
>   $(PTHREAD_LIBS) \
> diff --git a/src/gallium/auxiliary/util/u_debug_stack.c 
> b/src/gallium/auxiliary/util/u_debug_stack.c
> index f941234..cf05f13 100644
> --- a/src/gallium/auxiliary/util/u_debug_stack.c
> +++ b/src/gallium/auxiliary/util/u_debug_stack.c
> @@ -36,6 +36,95 @@
>  #include "u_debug_symbol.h"
>  #include "u_debug_stack.h"
>  
> +#if defined(HAVE_LIBUNWIND)
> +
> +#ifndef _GNU_SOURCE
> +#define _GNU_SOURCE
> +#endif
> +#include 
> +
> +void
> +debug_backtrace_capture(struct debug_stack_frame *backtrace,
> +unsigned start_frame,
> +unsigned nr_frames)
> +{
> +   unw_cursor_t cursor;
> +   unw_context_t context;
> +   unw_proc_info_t pip;
> +   unsigned i = 0;
> +   int ret;
> +
> +   pip.unwind_info = NULL;
> +
> +   unw_getcontext();
> +   unw_init_local(, );
> +
> +   while ((start_frame > 0) && (unw_step() > 0))
> +  start_frame--;
> +
> +   while (unw_step() > 0) {
> +  char procname[256];
> +  const char *filename;
> +  unw_word_t off;
> +  Dl_info dlinfo;
> +
> +  unw_get_proc_info(, );
> +
> +  ret = unw_get_proc_name(, procname, 256, );
> +  if (ret && ret != -UNW_ENOMEM) {
> + procname[0] = '?';
> + procname[1] = 0;
> +  }
> +
> +   if (dladdr((void *)(uintptr_t)(pip.start_ip + off), ) && 
> dlinfo.dli_fname &&
> +   *dlinfo.dli_fname)
> +   filename = dlinfo.dli_fname;
> +   else
> +   filename = "?";
> +
> +  snprintf(backtrace[i].buf, sizeof(backtrace[i].buf),
> +"%u: %s (%s%s+0x%x) [%p]", i, filename, procname,
> +ret == -UNW_ENOMEM ? "..." : "", (int)off,
> +(void *)(uintptr_t)(pip.start_ip + off));
> +
> +  i++;
> +   }
> +
> +   while (i < nr_frames) {
> +  backtrace[i].buf[0] = '\0';
> +  i++;
> +   }
> +}
> +
> +void
> +debug_backtrace_dump(const struct debug_stack_frame *backtrace,
> + unsigned nr_frames)
> +{
> +   unsigned i;
> +
> +   for (i = 0; i < nr_frames; ++i) {
> +  if (backtrace[i].buf[0] == '\0')
> + break;
> +  debug_printf("\t%s\n", backtrace[i].buf);
> +   }
> +}
> +
> +void
> +debug_backtrace_print(FILE *f,
> +  const struct debug_stack_frame *backtrace,
> +  unsigned nr_frames)
> +{
> +   unsigned i;
> +
> +   for (i = 0; i < nr_frames; ++i) {
> +  if (backtrace[i].buf[0] == '\0')
> + break;
> +  fprintf(f, "\t%s\n", backtrace[i].buf);
> +   }
> +}
> +
> +#else /* ! HAVE_LIBUNWIND */
> +
>  #if defined(PIPE_OS_WINDOWS)
>  #include 
>  #endif
> @@ -179,3 +268,5 @@ debug_backtrace_print(FILE 

Re: [Mesa-dev] [PATCH v2] anv: add support for allocating more than 1 block of memory

2017-04-03 Thread Jason Ekstrand
On Mon, Apr 3, 2017 at 7:44 AM, Jason Ekstrand  wrote:

> On Mon, Apr 3, 2017 at 5:02 AM, Juan A. Suarez Romero  > wrote:
>
>> On Wed, 2017-03-29 at 12:06 -0700, Jason Ekstrand wrote:
>> > Looking over the patch, I think I've convinced myself that it's
>> correct.  (I honestly wasn't expecting to come to that conclusion without
>> more iteration.)  That said, this raises some interesting questions.  I
>> added Kristian to the Cc in case he has any input.
>> >
>> >  1. Should we do powers of two or linear.  I'm still a fan of powers of
>> two.
>>
>> In which specific part do you mean? In free block lists? or in the
>> allocator_new?
>>
>
> In the state pool, we just round all allocation sizes up to the nearest
> power-of-two, and then the index into the array of free lists isn't "size -
> base", it's "ilog2(size) - base".
>
>
>> >
>> >  2. Should block pools even have a block size at all? We could just
>> make every block pool allow any power-of-two size from 4 KiB up to. say, 1
>> MiB and then make the block size part of the state pool or stream that's
>> allocating from it.  At the moment, I like this idea, but I've given it
>> very little thought.
>> >
>> So IIUC, the idea would be the block pool is just a flat chunk of
>> memory, where we later fetch blocks of memory from, as requested by
>> applications. Is that correct?
>>
>
Thinking about this again, I think your statement may have been more
correct than I thought.  If we make the state_stream chain off of a
state_pool rather than the block_pool, we could make the block pool
structure a simple bi-directional growing BO and trust in the state pool
for 100% of the re-use.  That would probably be a way simpler structure.
For that matter, the state pool could just own the block_pool and
setup/teardown would be easier.


> Sorry, but this patch gave me some sudden revelations and things are still
> in the process of reforming in my brain.  In the past, we assumed a
> two-layered allocation strategy where we had a block pool which was the
> base and then the state pool and state stream allocators sat on top of it.
> Originally, the state pool allocator was just for a single size as well.
>
> Now that the block pool is going to be capable of allocating multiple
> sizes, the whole mental model of the separation falls apart.  The new
> future that I see is one where the block pool and state pool aren't
> separate.  Instead, we have a single thing which I'll call state_pool (we
> have to pick one of the names) which lets you allocate a state of any size
> from 32B to 2MB.  The state stream will then allocate off of a state_pool
> instead of a block_pool since they're now the same.  For smaller states, we
> still want to allocate reasonably large chunks (probably 4K) so that we
> ensure that things are nicely aligned.  I think I've got a pretty good idea
> of how it should work at this point and can write more if you'd like.
>
> Before we dive in and do a pile of refactoring, I think this patch is
> pretty much good-to-go assuming we fix the power-of-two thing and it fixes
> a bug so let's focus there.  Are you  interested in doing the refactoring?
> If not, that's ok, I'm happy to do it and it wouldn't take me long now that
> I've gotten a chance to think about it.  If you are interested, go for it!
>
>
>> >  3. If we go with the idea in 2. should we still call it block_pool?  I
>> think we can keep the name but it doesn't it as well as it once did.
>> >
>> > Thanks for working on this!  I'm sorry it's taken so long to respond.
>> Every time I've looked at it, my brain hasn't been in the right state to
>> think about lock-free code. :-/
>> >
>> > On Wed, Mar 15, 2017 at 5:05 AM, Juan A. Suarez Romero <
>> jasua...@igalia.com> wrote:
>> > > Current Anv allocator assign memory in terms of a fixed block size.
>> > >
>> > > But there can be cases where this block is not enough for a memory
>> > > request, and thus several blocks must be assigned in a row.
>> > >
>> > > This commit adds support for specifying how many blocks of memory must
>> > > be assigned.
>> > >
>> > > This fixes a number dEQP-VK.pipeline.render_to_image.* tests that
>> crash.
>> > >
>> > > v2: lock-free free-list is not handled correctly (Jason)
>> > > ---
>> > >  src/intel/vulkan/anv_allocator.c   | 81
>> +++---
>> > >  src/intel/vulkan/anv_batch_chain.c |  4 +-
>> > >  src/intel/vulkan/anv_private.h |  7 +++-
>> > >  3 files changed, 66 insertions(+), 26 deletions(-)
>> > >
>> > > diff --git a/src/intel/vulkan/anv_allocator.c
>> b/src/intel/vulkan/anv_allocator.c
>> > > index 45c663b..3924551 100644
>> > > --- a/src/intel/vulkan/anv_allocator.c
>> > > +++ b/src/intel/vulkan/anv_allocator.c
>> > > @@ -257,7 +257,8 @@ anv_block_pool_init(struct anv_block_pool *pool,
>> > > pool->device = device;
>> > > anv_bo_init(>bo, 0, 0);
>> > > pool->block_size = block_size;
>> > > -   pool->free_list = 

Re: [Mesa-dev] [PATCH v2] anv: add support for allocating more than 1 block of memory

2017-04-03 Thread Jason Ekstrand
On Mon, Apr 3, 2017 at 9:40 AM, Kristian Høgsberg 
wrote:

> On Wed, Mar 29, 2017 at 12:06 PM, Jason Ekstrand 
> wrote:
> > Looking over the patch, I think I've convinced myself that it's
> correct.  (I
> > honestly wasn't expecting to come to that conclusion without more
> > iteration.)  That said, this raises some interesting questions.  I added
> > Kristian to the Cc in case he has any input.
>
> I haven't looked closely, and I agree it's increasingly tricky code to
> review. I'd be careful about making this a fully generic any-block
> size allocator. The premise, when we first designed this, was that for
> something like a fixed-size, power-of-two pool, we could write a fast,
> lock-less and fragmentation free allocator without getting in over our
> heads.


Yeah, it's getting tricky but I don't think it's outside the realm of
humans yet. :-)


> However, if this evolves (devolves?) into "malloc, but for
> bos", it may be time to take a step back and look if something like
> jemalloc with bo backing is a better choice.
>

Yeah, it may be time to start considering that.  Unfortunately, I don't
think we can use jemalloc for this in a safe way.  jemalloc does provide a
way to manage pools but it does so, not with a pool pointer, but with an
arena number in a global namespace.  If an app were to use jemalloc (I
wouldn't be surprised in the gaming world if jemalloc is fairly
common-place) and uses arenas, we could end up with silent collisions.

At some point (probably later this year), I hope to look into pulling in a
foreign memory allocator and use that for BO placement and start
softpinning the universe (no relocations!).  That may be a good time to
revisit allocation strategies a bit.

Kristian
>
> >
> >  1. Should we do powers of two or linear.  I'm still a fan of powers of
> two.
> >
> >  2. Should block pools even have a block size at all? We could just make
> > every block pool allow any power-of-two size from 4 KiB up to. say, 1 MiB
> > and then make the block size part of the state pool or stream that's
> > allocating from it.  At the moment, I like this idea, but I've given it
> very
> > little thought.
> >
> >  3. If we go with the idea in 2. should we still call it block_pool?  I
> > think we can keep the name but it doesn't it as well as it once did.
> >
> > Thanks for working on this!  I'm sorry it's taken so long to respond.
> Every
> > time I've looked at it, my brain hasn't been in the right state to think
> > about lock-free code. :-/
> >
> > On Wed, Mar 15, 2017 at 5:05 AM, Juan A. Suarez Romero <
> jasua...@igalia.com>
> > wrote:
> >>
> >> Current Anv allocator assign memory in terms of a fixed block size.
> >>
> >> But there can be cases where this block is not enough for a memory
> >> request, and thus several blocks must be assigned in a row.
> >>
> >> This commit adds support for specifying how many blocks of memory must
> >> be assigned.
> >>
> >> This fixes a number dEQP-VK.pipeline.render_to_image.* tests that
> crash.
> >>
> >> v2: lock-free free-list is not handled correctly (Jason)
> >> ---
> >>  src/intel/vulkan/anv_allocator.c   | 81
> >> +++---
> >>  src/intel/vulkan/anv_batch_chain.c |  4 +-
> >>  src/intel/vulkan/anv_private.h |  7 +++-
> >>  3 files changed, 66 insertions(+), 26 deletions(-)
> >>
> >> diff --git a/src/intel/vulkan/anv_allocator.c
> >> b/src/intel/vulkan/anv_allocator.c
> >> index 45c663b..3924551 100644
> >> --- a/src/intel/vulkan/anv_allocator.c
> >> +++ b/src/intel/vulkan/anv_allocator.c
> >> @@ -257,7 +257,8 @@ anv_block_pool_init(struct anv_block_pool *pool,
> >> pool->device = device;
> >> anv_bo_init(>bo, 0, 0);
> >> pool->block_size = block_size;
> >> -   pool->free_list = ANV_FREE_LIST_EMPTY;
> >> +   for (uint32_t i = 0; i < ANV_MAX_BLOCKS; i++)
> >> +  pool->free_list[i] = ANV_FREE_LIST_EMPTY;
> >> pool->back_free_list = ANV_FREE_LIST_EMPTY;
> >>
> >> pool->fd = memfd_create("block pool", MFD_CLOEXEC);
> >> @@ -500,30 +501,35 @@ fail:
> >>
> >>  static uint32_t
> >>  anv_block_pool_alloc_new(struct anv_block_pool *pool,
> >> - struct anv_block_state *pool_state)
> >> + struct anv_block_state *pool_state,
> >> + uint32_t n_blocks)
> >
> >
> > Maybe have this take a size rather than n_blocks?  It's only ever called
> by
> > stuff in the block pool so the caller can do the multiplication.  It
> would
> > certainly make some of the math below easier.
> >
> >>
> >>  {
> >> struct anv_block_state state, old, new;
> >>
> >> while (1) {
> >> -  state.u64 = __sync_fetch_and_add(_state->u64,
> >> pool->block_size);
> >> -  if (state.next < state.end) {
> >> +  state.u64 = __sync_fetch_and_add(_state->u64, n_blocks *
> >> pool->block_size);
> >> +  if (state.next > state.end) {
> >> + futex_wait(_state->end, state.end);
> >> + continue;
> >> 

[Mesa-dev] [PATCH] bin/get-fixes-pick-list.sh: fix typo

2017-04-03 Thread Juan A. Suarez Romero
Replace "nore" by "more".
---
 bin/get-fixes-pick-list.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/bin/get-fixes-pick-list.sh b/bin/get-fixes-pick-list.sh
index 59bcae4..75242a2 100755
--- a/bin/get-fixes-pick-list.sh
+++ b/bin/get-fixes-pick-list.sh
@@ -27,7 +27,7 @@ do
# For each one try to extract the tag
fixes_count=`git show $sha | grep -i "fixes:" | wc -l`
if [ "x$fixes_count" != x1 ] ; then
-   echo WARNING: Commit $sha has nore than one Fixes tag
+   echo WARNING: Commit $sha has more than one Fixes tag
fi
fixes=`git show $sha | grep -i "fixes:" | head -n 1`
# The following sed/cut combination is borrowed from GregKH
-- 
2.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 6/9] gallium: decrease the size of pipe_box - 24 -> 16 bytes

2017-04-03 Thread Marek Olšák
On Apr 3, 2017 5:11 PM, "Alex Deucher"  wrote:

On Sun, Apr 2, 2017 at 2:00 PM, Marek Olšák  wrote:
> From: Marek Olšák 
>
> Also:
>
> pipe_transfer: 48 -> 40 bytes.
> pipe_blit_info = 176 -> 160 bytes.
> ---
>  src/gallium/include/pipe/p_state.h | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/src/gallium/include/pipe/p_state.h
b/src/gallium/include/pipe/p_state.h
> index 392bb8b..6a147ef 100644
> --- a/src/gallium/include/pipe/p_state.h
> +++ b/src/gallium/include/pipe/p_state.h
> @@ -472,25 +472,25 @@ struct pipe_image_view
> } u;
>  };
>
>
>  /**
>   * Subregion of 1D/2D/3D image resource.
>   */
>  struct pipe_box
>  {
> int x;
> -   int y;
> -   int z;
> +   int16_t y;
> +   int16_t z;
> int width;
> -   int height;
> -   int depth;
> +   int16_t height;
> +   int16_t depth;
>  };

Not specific to this patch per se, but will this cause any issues in
the future as max surface sizes supported by hw increase?  I feel like
we'll need to change this back at some point.


We should be fine as long as OpenGL and/or games don't require bigger sizes.

Marek


Alex

>
>
>  /**
>   * A memory object/resource such as a vertex buffer or texture.
>   */
>  struct pipe_resource
>  {
> struct pipe_reference reference;
> struct pipe_screen *screen; /**< screen that this texture belongs to
*/
> --
> 2.7.4
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2] anv: add support for allocating more than 1 block of memory

2017-04-03 Thread Kristian Høgsberg
On Wed, Mar 29, 2017 at 12:06 PM, Jason Ekstrand  wrote:
> Looking over the patch, I think I've convinced myself that it's correct.  (I
> honestly wasn't expecting to come to that conclusion without more
> iteration.)  That said, this raises some interesting questions.  I added
> Kristian to the Cc in case he has any input.

I haven't looked closely, and I agree it's increasingly tricky code to
review. I'd be careful about making this a fully generic any-block
size allocator. The premise, when we first designed this, was that for
something like a fixed-size, power-of-two pool, we could write a fast,
lock-less and fragmentation free allocator without getting in over our
heads. However, if this evolves (devolves?) into "malloc, but for
bos", it may be time to take a step back and look if something like
jemalloc with bo backing is a better choice.

Kristian

>
>  1. Should we do powers of two or linear.  I'm still a fan of powers of two.
>
>  2. Should block pools even have a block size at all? We could just make
> every block pool allow any power-of-two size from 4 KiB up to. say, 1 MiB
> and then make the block size part of the state pool or stream that's
> allocating from it.  At the moment, I like this idea, but I've given it very
> little thought.
>
>  3. If we go with the idea in 2. should we still call it block_pool?  I
> think we can keep the name but it doesn't it as well as it once did.
>
> Thanks for working on this!  I'm sorry it's taken so long to respond.  Every
> time I've looked at it, my brain hasn't been in the right state to think
> about lock-free code. :-/
>
> On Wed, Mar 15, 2017 at 5:05 AM, Juan A. Suarez Romero 
> wrote:
>>
>> Current Anv allocator assign memory in terms of a fixed block size.
>>
>> But there can be cases where this block is not enough for a memory
>> request, and thus several blocks must be assigned in a row.
>>
>> This commit adds support for specifying how many blocks of memory must
>> be assigned.
>>
>> This fixes a number dEQP-VK.pipeline.render_to_image.* tests that crash.
>>
>> v2: lock-free free-list is not handled correctly (Jason)
>> ---
>>  src/intel/vulkan/anv_allocator.c   | 81
>> +++---
>>  src/intel/vulkan/anv_batch_chain.c |  4 +-
>>  src/intel/vulkan/anv_private.h |  7 +++-
>>  3 files changed, 66 insertions(+), 26 deletions(-)
>>
>> diff --git a/src/intel/vulkan/anv_allocator.c
>> b/src/intel/vulkan/anv_allocator.c
>> index 45c663b..3924551 100644
>> --- a/src/intel/vulkan/anv_allocator.c
>> +++ b/src/intel/vulkan/anv_allocator.c
>> @@ -257,7 +257,8 @@ anv_block_pool_init(struct anv_block_pool *pool,
>> pool->device = device;
>> anv_bo_init(>bo, 0, 0);
>> pool->block_size = block_size;
>> -   pool->free_list = ANV_FREE_LIST_EMPTY;
>> +   for (uint32_t i = 0; i < ANV_MAX_BLOCKS; i++)
>> +  pool->free_list[i] = ANV_FREE_LIST_EMPTY;
>> pool->back_free_list = ANV_FREE_LIST_EMPTY;
>>
>> pool->fd = memfd_create("block pool", MFD_CLOEXEC);
>> @@ -500,30 +501,35 @@ fail:
>>
>>  static uint32_t
>>  anv_block_pool_alloc_new(struct anv_block_pool *pool,
>> - struct anv_block_state *pool_state)
>> + struct anv_block_state *pool_state,
>> + uint32_t n_blocks)
>
>
> Maybe have this take a size rather than n_blocks?  It's only ever called by
> stuff in the block pool so the caller can do the multiplication.  It would
> certainly make some of the math below easier.
>
>>
>>  {
>> struct anv_block_state state, old, new;
>>
>> while (1) {
>> -  state.u64 = __sync_fetch_and_add(_state->u64,
>> pool->block_size);
>> -  if (state.next < state.end) {
>> +  state.u64 = __sync_fetch_and_add(_state->u64, n_blocks *
>> pool->block_size);
>> +  if (state.next > state.end) {
>> + futex_wait(_state->end, state.end);
>> + continue;
>> +  } else if ((state.next + (n_blocks - 1) * pool->block_size) <
>> state.end) {
>
>
> First off, please keep the if's in the same order unless we have a reason to
> re-arrange them.  It would make this way easier to review. :-)
>
> Second, I think this would be much easier to read as:
>
> if (state.next + size <= state.end) {
>/* Success */
> } else if (state.next <= state.end) {
>/* Our block is the one that crosses the line */
> } else {
>/* Wait like everyone else */
> }
>
>>
>>   assert(pool->map);
>>   return state.next;
>> -  } else if (state.next == state.end) {
>> - /* We allocated the first block outside the pool, we have to
>> grow it.
>> -  * pool_state->next acts a mutex: threads who try to allocate
>> now will
>> -  * get block indexes above the current limit and hit futex_wait
>> -  * below. */
>> - new.next = state.next + pool->block_size;
>> +  } else {
>> + /* We allocated the firsts blocks outside the pool, we have to
>> grow
>> + 

[Mesa-dev] [Bug 100259] [EGL] [GBM] undefined reference to `gbm_bo_create_with_modifiers'

2017-04-03 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=100259

--- Comment #14 from Emil Velikov  ---
Ouch that is some nasty bug in Slackware packaging.

Note you want /usr/local/ and /usr/ in the same order across
--with-pkg-config-dir and --with-system-libdir.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 9/9] gallium: decrease the size of pipe_draw_info - 88 -> 80 bytes

2017-04-03 Thread Marek Olšák
On Apr 3, 2017 4:04 PM, "Brian Paul"  wrote:

I need to test this series on Windows and with MinGW first.  I'm worried
about enums with bitfields.


Hopefully it'll work. Enums without bitfields always occupy 4 bytes. If
bitfields don't work with those, we'll have to stop using enums in these
cases.

Marek



-Brian


On Mon, Apr 3, 2017 at 2:46 AM, Nicolai Hähnle  wrote:

> Some of these are a bit subtle, but I think they're fine. Series is:
>
> Reviewed-by: Nicolai Hähnle 
>
>
> On 02.04.2017 20:00, Marek Olšák wrote:
>
>> From: Marek Olšák 
>>
>> ---
>>  src/gallium/auxiliary/indices/u_primconvert.c | 10 --
>>  src/gallium/include/pipe/p_state.h|  2 +-
>>  2 files changed, 9 insertions(+), 3 deletions(-)
>>
>> diff --git a/src/gallium/auxiliary/indices/u_primconvert.c
>> b/src/gallium/auxiliary/indices/u_primconvert.c
>> index 2bdfade..1ffca4b 100644
>> --- a/src/gallium/auxiliary/indices/u_primconvert.c
>> +++ b/src/gallium/auxiliary/indices/u_primconvert.c
>> @@ -121,39 +121,45 @@ util_primconvert_draw_vbo(struct
>> primconvert_context *pc,
>> util_draw_init_info(_info);
>> new_info.indexed = true;
>> new_info.min_index = info->min_index;
>> new_info.max_index = info->max_index;
>> new_info.index_bias = info->index_bias;
>> new_info.start_instance = info->start_instance;
>> new_info.instance_count = info->instance_count;
>> new_info.primitive_restart = info->primitive_restart;
>> new_info.restart_index = info->restart_index;
>> if (info->indexed) {
>> +  enum pipe_prim_type mode = 0;
>> +
>>u_index_translator(pc->primtypes_mask,
>>   info->mode, pc->saved_ib.index_size,
>> info->count,
>>   pc->api_pv, pc->api_pv,
>>   info->primitive_restart ? PR_ENABLE :
>> PR_DISABLE,
>> - _info.mode, _ib.index_size,
>> _info.count,
>> + , _ib.index_size, _info.count,
>>   _func);
>> +  new_info.mode = mode;
>>src = ib->user_buffer;
>>if (!src) {
>>   src = pipe_buffer_map(pc->pipe, ib->buffer,
>> PIPE_TRANSFER_READ, _transfer);
>>}
>>src = (const uint8_t *)src + ib->offset;
>> }
>> else {
>> +  enum pipe_prim_type mode = 0;
>> +
>>u_index_generator(pc->primtypes_mask,
>>  info->mode, info->start, info->count,
>>  pc->api_pv, pc->api_pv,
>> -_info.mode, _ib.index_size,
>> _info.count,
>> +, _ib.index_size, _info.count,
>>  _func);
>> +  new_info.mode = mode;
>> }
>>
>> u_upload_alloc(pc->pipe->stream_uploader, 0, new_ib.index_size *
>> new_info.count, 4,
>>_ib.offset, _ib.buffer, );
>>
>> if (info->indexed) {
>>trans_func(src, info->start, info->count, new_info.count,
>> info->restart_index, dst);
>> }
>> else {
>>gen_func(info->start, new_info.count, dst);
>> diff --git a/src/gallium/include/pipe/p_state.h
>> b/src/gallium/include/pipe/p_state.h
>> index d68a4d4..eeabf8b 100644
>> --- a/src/gallium/include/pipe/p_state.h
>> +++ b/src/gallium/include/pipe/p_state.h
>> @@ -634,21 +634,21 @@ struct pipe_index_buffer
>> const void *user_buffer;  /**< pointer to a user buffer if buffer ==
>> NULL */
>>  };
>>
>>
>>  /**
>>   * Information to describe a draw_vbo call.
>>   */
>>  struct pipe_draw_info
>>  {
>> boolean indexed;  /**< use index buffer */
>> -   enum pipe_prim_type mode;  /**< the mode of the primitive */
>> +   enum pipe_prim_type mode:8;  /**< the mode of the primitive */
>> boolean primitive_restart;
>> ubyte vertices_per_patch; /**< the number of vertices per patch */
>>
>> unsigned start;  /**< the index of the first vertex */
>> unsigned count;  /**< number of vertices */
>>
>> unsigned start_instance; /**< first instance id */
>> unsigned instance_count; /**< number of instances */
>>
>> unsigned drawid; /**< id of this draw in a multidraw */
>>
>>
>
> --
> Lerne, wie die Welt wirklich ist,
> Aber vergiss niemals, wie sie sein sollte.
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/3] util: #include "c99_compat.h" to fix Windows build

2017-04-03 Thread Jose Fonseca

On 03/04/17 17:09, Brian Paul wrote:

Otherwise, we were getting the definition for 'inline' by chance from
some other preceeding #include.
---
 src/util/list.h  | 1 +
 src/util/mesa-sha1.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/src/util/list.h b/src/util/list.h
index 07eb9f3..6edb750 100644
--- a/src/util/list.h
+++ b/src/util/list.h
@@ -41,6 +41,7 @@
 #include 
 #include 
 #include 
+#include "c99_compat.h"


 struct list_head
diff --git a/src/util/mesa-sha1.h b/src/util/mesa-sha1.h
index d3f7aff..bde50ba 100644
--- a/src/util/mesa-sha1.h
+++ b/src/util/mesa-sha1.h
@@ -24,6 +24,7 @@
 #define MESA_SHA1_H

 #include 
+#include "c99_compat.h"
 #include "sha1/sha1.h"

 #ifdef __cplusplus



Looks good.

Series is

Reviewed-by: Jose Fonseca 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] anv/pipeline: do not disable depth writes if depth testing is disabled

2017-04-03 Thread Jason Ekstrand
On Mon, Apr 3, 2017 at 8:55 AM, Nanley Chery  wrote:

> On Mon, Apr 03, 2017 at 08:02:54AM +0200, Iago Toral wrote:
> > Can anyone review this one?
> >
> > On Wed, 2017-03-29 at 08:58 +0200, Iago Toral Quiroga wrote:
> > > Writing and testing are two different things and they can be set
> > > separately
> > > by the application. If an application wants to record depth data
> > > without
> > > caring for the depth test, it can enable depth test and set the depth
> > > compare function to VK_COMPARE_OP_ALWAYS or it can simply disable
> > > depth testing altogether. Some CTS tests do the latter.
> > >
>
> Doesn't disabling the depth test prevent any updates to the depth
> buffer?
>
> Section 25.10 Depth Test from the Vulkan spec says:
>
>When disabled, the depth comparison and subsequent possible updates
>to the value of the depth component of the depth/stencil attachment
>are bypassed and the fragment is passed to the next operation.
>

Thanks!  Maybe I wasn't crazy when I wrote that line of code.  Consider my
review retracted until we're sure this isn't a test bug.


> -Nanley
>
> > > Fixes all multisample tests with depth-only formats in:
> > > dEQP-VK.renderpass.multisample.*
> > > ---
> > >  src/intel/vulkan/genX_pipeline.c | 4 
> > >  1 file changed, 4 deletions(-)
> > >
> > > diff --git a/src/intel/vulkan/genX_pipeline.c
> > > b/src/intel/vulkan/genX_pipeline.c
> > > index 85a9e4f..dc393cb 100644
> > > --- a/src/intel/vulkan/genX_pipeline.c
> > > +++ b/src/intel/vulkan/genX_pipeline.c
> > > @@ -728,10 +728,6 @@
> > > sanitize_ds_state(VkPipelineDepthStencilStateCreateInfo *state,
> > >  {
> > > *stencilWriteEnable = state->stencilTestEnable;
> > >
> > > -   /* If the depth test is disabled, we won't be writing anything.
> > > */
> > > -   if (!state->depthTestEnable)
> > > -  state->depthWriteEnable = false;
> > > -
> > > /* The Vulkan spec requires that if either depth or stencil is
> > > not present,
> > >  * the pipeline is to act as if the test silently passes.
> > >  */
> > ___
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/3] util: fix MSVC warning in u_align_u32()

2017-04-03 Thread Brian Paul
To silence
C:\Users\Brian\projects\mesa\src\util/u_vector.h(41) : warning C4146: unary
minus operator applied to unsigned type, result still unsigned
---
 src/util/u_vector.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/util/u_vector.h b/src/util/u_vector.h
index f97a8b4..c7fcb37 100644
--- a/src/util/u_vector.h
+++ b/src/util/u_vector.h
@@ -38,7 +38,7 @@
 static inline uint32_t
 u_align_u32(uint32_t v, uint32_t a)
 {
-   assert(a != 0 && a == (a & -a));
+   assert(a != 0 && a == (a & -((int32_t) a)));
return (v + a - 1) & ~(a - 1);
 }
 
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/3] util: s/SHA1_H/MESA_SHA1_H/

2017-04-03 Thread Brian Paul
To follow the convention of other header include guards.
---
 src/util/mesa-sha1.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/util/mesa-sha1.h b/src/util/mesa-sha1.h
index a81aba9..d3f7aff 100644
--- a/src/util/mesa-sha1.h
+++ b/src/util/mesa-sha1.h
@@ -20,8 +20,8 @@
  * DEALINGS IN THE SOFTWARE.
  */
 
-#ifndef SHA1_H
-#define SHA1_H
+#ifndef MESA_SHA1_H
+#define MESA_SHA1_H
 
 #include 
 #include "sha1/sha1.h"
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/3] util: #include "c99_compat.h" to fix Windows build

2017-04-03 Thread Brian Paul
Otherwise, we were getting the definition for 'inline' by chance from
some other preceeding #include.
---
 src/util/list.h  | 1 +
 src/util/mesa-sha1.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/src/util/list.h b/src/util/list.h
index 07eb9f3..6edb750 100644
--- a/src/util/list.h
+++ b/src/util/list.h
@@ -41,6 +41,7 @@
 #include 
 #include 
 #include 
+#include "c99_compat.h"
 
 
 struct list_head
diff --git a/src/util/mesa-sha1.h b/src/util/mesa-sha1.h
index d3f7aff..bde50ba 100644
--- a/src/util/mesa-sha1.h
+++ b/src/util/mesa-sha1.h
@@ -24,6 +24,7 @@
 #define MESA_SHA1_H
 
 #include 
+#include "c99_compat.h"
 #include "sha1/sha1.h"
 
 #ifdef __cplusplus
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] anv/nir: Delete the apply_dynamic_offsets prototype

2017-04-03 Thread Nanley Chery
On Wed, Mar 22, 2017 at 03:13:56PM -0700, Jason Ekstrand wrote:
> That pass hasn't existed since dd4db84640bbb694f180dd50850c3388f67228be
> but the prototype stuck around for no reason.
> ---
>  src/intel/vulkan/anv_nir.h | 3 ---
>  1 file changed, 3 deletions(-)
> 

This patch is
Reviewed-by: Nanley Chery 

> diff --git a/src/intel/vulkan/anv_nir.h b/src/intel/vulkan/anv_nir.h
> index a929dd9..3f97701 100644
> --- a/src/intel/vulkan/anv_nir.h
> +++ b/src/intel/vulkan/anv_nir.h
> @@ -35,9 +35,6 @@ void anv_nir_lower_input_attachments(nir_shader *shader);
>  
>  void anv_nir_lower_push_constants(nir_shader *shader);
>  
> -void anv_nir_apply_dynamic_offsets(struct anv_pipeline *pipeline,
> -   nir_shader *shader,
> -   struct brw_stage_prog_data *prog_data);
>  void anv_nir_apply_pipeline_layout(struct anv_pipeline *pipeline,
> nir_shader *shader,
> struct brw_stage_prog_data *prog_data,
> -- 
> 2.5.0.400.gff86faf
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2 1/3] nv50/ir: fix AlgebraicOpt for slcts with mods

2017-04-03 Thread Karol Herbst
Signed-off-by: Karol Herbst 
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 4c92a1efb5..bd60a84998 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -1797,10 +1797,10 @@ AlgebraicOpt::handleSLCT(Instruction *slct)
   if (slct->getSrc(2)->asImm()->compare(slct->asCmp()->setCond, 0.0f))
  slct->setSrc(0, slct->getSrc(1));
} else
-   if (slct->getSrc(0) != slct->getSrc(1)) {
+   if (slct->getSrc(0) != slct->getSrc(1) || slct->src(0).mod != 
slct->src(1).mod)
   return;
-   }
-   slct->op = OP_MOV;
+   slct->op = slct->src(0).mod.getOp();
+   slct->src(0).mod = slct->src(0).mod ^ Modifier(slct->op);
slct->setSrc(1, NULL);
slct->setSrc(2, NULL);
 }
-- 
2.12.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2 2/3] nv50/ir: handle logops with NOT in AlgebraicOpt

2017-04-03 Thread Karol Herbst
Signed-off-by: Karol Herbst 
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index bd60a84998..0de84fe9fc 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -1856,6 +1856,12 @@ AlgebraicOpt::handleLOGOP(Instruction *logop)
 
   set0 = cloneForward(func, set0);
   set1 = cloneShallow(func, set1);
+
+  if (logop->src(0).mod == Modifier(NV50_IR_MOD_NOT))
+ set0->asCmp()->setCond = inverseCondCode(set0->asCmp()->setCond);
+  if (logop->src(1).mod == Modifier(NV50_IR_MOD_NOT))
+ set1->asCmp()->setCond = inverseCondCode(set1->asCmp()->setCond);
+
   logop->bb->insertAfter(logop, set1);
   logop->bb->insertAfter(logop, set0);
 
-- 
2.12.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2 3/3] nv50/ir: run some passes multiple times

2017-04-03 Thread Karol Herbst
With the shader cache, compilation time matters less.

As a side effect we can write more optimizations to produce better optimized
code.

total instructions in shared programs : 3931743 -> 3917512 (-0.36%)
total gprs used in shared programs: 481460 -> 481680 (0.05%)
total local used in shared programs   : 27481 -> 26761 (-2.62%)
total bytes used in shared programs   : 36032672 -> 35902648 (-0.36%)

localgpr   inst  bytes
helped  48 13338433843
  hurt   1 295  75  75

Signed-off-by: Karol Herbst 
---
 .../drivers/nouveau/codegen/nv50_ir_peephole.cpp| 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 0de84fe9fc..505de08573 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -3729,12 +3729,17 @@ Program::optimizeSSA(int level)
RUN_PASS(1, CopyPropagation, run);
RUN_PASS(1, MergeSplits, run);
RUN_PASS(2, GlobalCSE, run);
-   RUN_PASS(1, LocalCSE, run);
-   RUN_PASS(2, AlgebraicOpt, run);
-   RUN_PASS(2, ModifierFolding, run); // before load propagation -> less checks
-   RUN_PASS(1, ConstantFolding, foldAll);
-   RUN_PASS(2, LateAlgebraicOpt, run);
-   RUN_PASS(1, Split64BitOpPreRA, run);
+   for (int i = 0; i < 2; ++i) {
+  RUN_PASS(1, LocalCSE, run);
+  RUN_PASS(2, AlgebraicOpt, run);
+  RUN_PASS(2, ModifierFolding, run); // before load propagation -> less 
checks
+  RUN_PASS(1, ConstantFolding, foldAll);
+  RUN_PASS(2, LateAlgebraicOpt, run);
+  // only once
+  if (i == 0)
+ RUN_PASS(1, Split64BitOpPreRA, run);
+  RUN_PASS(1, DeadCodeElim, buryAll);
+   }
RUN_PASS(1, LoadPropagation, run);
RUN_PASS(1, IndirectPropagation, run);
RUN_PASS(2, MemoryOpt, run);
-- 
2.12.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2 0/3] nv50/ir: Preapre for running Opts inside a loop

2017-04-03 Thread Karol Herbst
Slowly we are getting to the point, that we miss enough optimization
opportunities as the result of our own passes.

For this we need to fix AlgebraicOpt to be able to handle mods on sources
without creating new issues.

The last patch enables looping opts.

v2: update commit author

Karol Herbst (3):
  nv50/ir: fix AlgebraicOpt for slcts with mods
  nv50/ir: handle logops with NOT in AlgebraicOpt
  nv50/ir: run some passes multiple times

 .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   | 29 +++---
 1 file changed, 20 insertions(+), 9 deletions(-)

-- 
2.12.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 0/3] nv50/ir: Preapre for running Opts inside a loop

2017-04-03 Thread Karol Herbst
Slowly we are getting to the point, that we miss enough optimization
opportunities as the result of our own passes.

For this we need to fix AlgebraicOpt to be able to handle mods on sources
without creating new issues.

The last patch enables looping opts.

Karol Herbst (3):
  nv50/ir: fix AlgebraicOpt for slcts with mods
  nv50/ir: handle logops with NOT in AlgebraicOpt
  nv50/ir: run some passes multiple times

 .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   | 29 +++---
 1 file changed, 20 insertions(+), 9 deletions(-)

-- 
2.12.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/3] nv50/ir: handle logops with NOT in AlgebraicOpt

2017-04-03 Thread Karol Herbst
Signed-off-by: Karol Herbst 
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index bd60a84998..0de84fe9fc 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -1856,6 +1856,12 @@ AlgebraicOpt::handleLOGOP(Instruction *logop)
 
   set0 = cloneForward(func, set0);
   set1 = cloneShallow(func, set1);
+
+  if (logop->src(0).mod == Modifier(NV50_IR_MOD_NOT))
+ set0->asCmp()->setCond = inverseCondCode(set0->asCmp()->setCond);
+  if (logop->src(1).mod == Modifier(NV50_IR_MOD_NOT))
+ set1->asCmp()->setCond = inverseCondCode(set1->asCmp()->setCond);
+
   logop->bb->insertAfter(logop, set1);
   logop->bb->insertAfter(logop, set0);
 
-- 
2.12.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/3] nv50/ir: run some passes multiple times

2017-04-03 Thread Karol Herbst
From: Karol Herbst 

With the shader cache, compilation time matters less.

As a side effect we can write more optimizations to produce better optimized
code.

total instructions in shared programs : 3931743 -> 3917512 (-0.36%)
total gprs used in shared programs: 481460 -> 481680 (0.05%)
total local used in shared programs   : 27481 -> 26761 (-2.62%)
total bytes used in shared programs   : 36032672 -> 35902648 (-0.36%)

localgpr   inst  bytes
helped  48 13338433843
  hurt   1 295  75  75

Signed-off-by: Karol Herbst 
---
 .../drivers/nouveau/codegen/nv50_ir_peephole.cpp| 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 0de84fe9fc..505de08573 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -3729,12 +3729,17 @@ Program::optimizeSSA(int level)
RUN_PASS(1, CopyPropagation, run);
RUN_PASS(1, MergeSplits, run);
RUN_PASS(2, GlobalCSE, run);
-   RUN_PASS(1, LocalCSE, run);
-   RUN_PASS(2, AlgebraicOpt, run);
-   RUN_PASS(2, ModifierFolding, run); // before load propagation -> less checks
-   RUN_PASS(1, ConstantFolding, foldAll);
-   RUN_PASS(2, LateAlgebraicOpt, run);
-   RUN_PASS(1, Split64BitOpPreRA, run);
+   for (int i = 0; i < 2; ++i) {
+  RUN_PASS(1, LocalCSE, run);
+  RUN_PASS(2, AlgebraicOpt, run);
+  RUN_PASS(2, ModifierFolding, run); // before load propagation -> less 
checks
+  RUN_PASS(1, ConstantFolding, foldAll);
+  RUN_PASS(2, LateAlgebraicOpt, run);
+  // only once
+  if (i == 0)
+ RUN_PASS(1, Split64BitOpPreRA, run);
+  RUN_PASS(1, DeadCodeElim, buryAll);
+   }
RUN_PASS(1, LoadPropagation, run);
RUN_PASS(1, IndirectPropagation, run);
RUN_PASS(2, MemoryOpt, run);
-- 
2.12.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/3] nv50/ir: fix AlgebraicOpt for slcts with mods

2017-04-03 Thread Karol Herbst
From: Karol Herbst 

Signed-off-by: Karol Herbst 
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 4c92a1efb5..bd60a84998 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -1797,10 +1797,10 @@ AlgebraicOpt::handleSLCT(Instruction *slct)
   if (slct->getSrc(2)->asImm()->compare(slct->asCmp()->setCond, 0.0f))
  slct->setSrc(0, slct->getSrc(1));
} else
-   if (slct->getSrc(0) != slct->getSrc(1)) {
+   if (slct->getSrc(0) != slct->getSrc(1) || slct->src(0).mod != 
slct->src(1).mod)
   return;
-   }
-   slct->op = OP_MOV;
+   slct->op = slct->src(0).mod.getOp();
+   slct->src(0).mod = slct->src(0).mod ^ Modifier(slct->op);
slct->setSrc(1, NULL);
slct->setSrc(2, NULL);
 }
-- 
2.12.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] anv/pipeline: do not disable depth writes if depth testing is disabled

2017-04-03 Thread Nanley Chery
On Mon, Apr 03, 2017 at 08:02:54AM +0200, Iago Toral wrote:
> Can anyone review this one?
> 
> On Wed, 2017-03-29 at 08:58 +0200, Iago Toral Quiroga wrote:
> > Writing and testing are two different things and they can be set
> > separately
> > by the application. If an application wants to record depth data
> > without
> > caring for the depth test, it can enable depth test and set the depth
> > compare function to VK_COMPARE_OP_ALWAYS or it can simply disable
> > depth testing altogether. Some CTS tests do the latter.
> > 

Doesn't disabling the depth test prevent any updates to the depth
buffer?

Section 25.10 Depth Test from the Vulkan spec says:

   When disabled, the depth comparison and subsequent possible updates
   to the value of the depth component of the depth/stencil attachment
   are bypassed and the fragment is passed to the next operation.

-Nanley

> > Fixes all multisample tests with depth-only formats in:
> > dEQP-VK.renderpass.multisample.*
> > ---
> >  src/intel/vulkan/genX_pipeline.c | 4 
> >  1 file changed, 4 deletions(-)
> > 
> > diff --git a/src/intel/vulkan/genX_pipeline.c
> > b/src/intel/vulkan/genX_pipeline.c
> > index 85a9e4f..dc393cb 100644
> > --- a/src/intel/vulkan/genX_pipeline.c
> > +++ b/src/intel/vulkan/genX_pipeline.c
> > @@ -728,10 +728,6 @@
> > sanitize_ds_state(VkPipelineDepthStencilStateCreateInfo *state,
> >  {
> > *stencilWriteEnable = state->stencilTestEnable;
> >  
> > -   /* If the depth test is disabled, we won't be writing anything.
> > */
> > -   if (!state->depthTestEnable)
> > -  state->depthWriteEnable = false;
> > -
> > /* The Vulkan spec requires that if either depth or stencil is
> > not present,
> >  * the pipeline is to act as if the test silently passes.
> >  */
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 6/9] gallium: decrease the size of pipe_box - 24 -> 16 bytes

2017-04-03 Thread Alex Deucher
On Sun, Apr 2, 2017 at 2:00 PM, Marek Olšák  wrote:
> From: Marek Olšák 
>
> Also:
>
> pipe_transfer: 48 -> 40 bytes.
> pipe_blit_info = 176 -> 160 bytes.
> ---
>  src/gallium/include/pipe/p_state.h | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/src/gallium/include/pipe/p_state.h 
> b/src/gallium/include/pipe/p_state.h
> index 392bb8b..6a147ef 100644
> --- a/src/gallium/include/pipe/p_state.h
> +++ b/src/gallium/include/pipe/p_state.h
> @@ -472,25 +472,25 @@ struct pipe_image_view
> } u;
>  };
>
>
>  /**
>   * Subregion of 1D/2D/3D image resource.
>   */
>  struct pipe_box
>  {
> int x;
> -   int y;
> -   int z;
> +   int16_t y;
> +   int16_t z;
> int width;
> -   int height;
> -   int depth;
> +   int16_t height;
> +   int16_t depth;
>  };

Not specific to this patch per se, but will this cause any issues in
the future as max surface sizes supported by hw increase?  I feel like
we'll need to change this back at some point.

Alex

>
>
>  /**
>   * A memory object/resource such as a vertex buffer or texture.
>   */
>  struct pipe_resource
>  {
> struct pipe_reference reference;
> struct pipe_screen *screen; /**< screen that this texture belongs to */
> --
> 2.7.4
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 6/9] gallium: decrease the size of pipe_box - 24 -> 16 bytes

2017-04-03 Thread Brian Paul

On 04/02/2017 12:00 PM, Marek Olšák wrote:

From: Marek Olšák 

Also:

pipe_transfer: 48 -> 40 bytes.
pipe_blit_info = 176 -> 160 bytes.
---
  src/gallium/include/pipe/p_state.h | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/gallium/include/pipe/p_state.h 
b/src/gallium/include/pipe/p_state.h
index 392bb8b..6a147ef 100644
--- a/src/gallium/include/pipe/p_state.h
+++ b/src/gallium/include/pipe/p_state.h
@@ -472,25 +472,25 @@ struct pipe_image_view
 } u;
  };


  /**
   * Subregion of 1D/2D/3D image resource.
   */
  struct pipe_box
  {
 int x;
-   int y;
-   int z;
+   int16_t y;
+   int16_t z;
 int width;
-   int height;
-   int depth;
+   int16_t height;
+   int16_t depth;


I think a comment explaining why x/width are int but y/z/height/depth 
are int16_t (texture buffer objects, right?) would be good.  Otherwise, 
someone's going to wonder about that and maybe submit a patch to change it.


Same thing for the pipe_resource change.

I tested the series on Windows/MSVC and MinGW and didn't see any problems.

With the above comments, Reviewed-by: Brian Paul 

BTW, some years ago I looked at using bitfields in gl_texture_image and 
gl_texture_object and it looked like a substantial size reduction was 
possible.  Maybe you or someone else would like to look into that.


-Brian



  };


  /**
   * A memory object/resource such as a vertex buffer or texture.
   */
  struct pipe_resource
  {
 struct pipe_reference reference;
 struct pipe_screen *screen; /**< screen that this texture belongs to */



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 4/5] vulkan/wsi/wayland: Pass damage through to the compositor

2017-04-03 Thread Daniel Stone
Hi,

On 3 April 2017 at 15:45, Jason Ekstrand  wrote:
> On Mon, Apr 3, 2017 at 1:54 AM, Daniel Stone  wrote:
>> Very scrupulous version check, but you forgot to actually use
>> wl_surface_damage_buffer. ;)
>
> Gah!!!  I'll get that fixed.  Assuming that change, now that you've looked
> at it all, would you mind reviewing at least this patch?

Yeah, with a trivial change to use surface_buffer, the series is:
Reviewed-by: Daniel Stone 

Cheers,
Daniel
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] anv/pipeline: do not disable depth writes if depth testing is disabled

2017-04-03 Thread Jason Ekstrand
Sorry.  I thought I had.

Reviewed-by: Jason Ekstrand 

On Sun, Apr 2, 2017 at 11:02 PM, Iago Toral  wrote:

> Can anyone review this one?
>
> On Wed, 2017-03-29 at 08:58 +0200, Iago Toral Quiroga wrote:
> > Writing and testing are two different things and they can be set
> > separately
> > by the application. If an application wants to record depth data
> > without
> > caring for the depth test, it can enable depth test and set the depth
> > compare function to VK_COMPARE_OP_ALWAYS or it can simply disable
> > depth testing altogether. Some CTS tests do the latter.
> >
> > Fixes all multisample tests with depth-only formats in:
> > dEQP-VK.renderpass.multisample.*
> > ---
> >  src/intel/vulkan/genX_pipeline.c | 4 
> >  1 file changed, 4 deletions(-)
> >
> > diff --git a/src/intel/vulkan/genX_pipeline.c
> > b/src/intel/vulkan/genX_pipeline.c
> > index 85a9e4f..dc393cb 100644
> > --- a/src/intel/vulkan/genX_pipeline.c
> > +++ b/src/intel/vulkan/genX_pipeline.c
> > @@ -728,10 +728,6 @@
> > sanitize_ds_state(VkPipelineDepthStencilStateCreateInfo *state,
> >  {
> > *stencilWriteEnable = state->stencilTestEnable;
> >
> > -   /* If the depth test is disabled, we won't be writing anything.
> > */
> > -   if (!state->depthTestEnable)
> > -  state->depthWriteEnable = false;
> > -
> > /* The Vulkan spec requires that if either depth or stencil is
> > not present,
> >  * the pipeline is to act as if the test silently passes.
> >  */
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 4/5] vulkan/wsi/wayland: Pass damage through to the compositor

2017-04-03 Thread Jason Ekstrand
On Mon, Apr 3, 2017 at 1:54 AM, Daniel Stone  wrote:

> Hi Jason,
>
> On 1 April 2017 at 06:37, Jason Ekstrand  wrote:
> > @@ -594,7 +595,19 @@ wsi_wl_swapchain_queue_present(struct
> wsi_swapchain *wsi_chain,
> >
> > assert(image_index < chain->base.image_count);
> > wl_surface_attach(chain->surface, chain->images[image_index].buffer,
> 0, 0);
> > -   wl_surface_damage(chain->surface, 0, 0, INT32_MAX, INT32_MAX);
> > +
> > +   if (chain->surface_version >= 4 && damage &&
> > +   damage->pRectangles && damage->rectangleCount > 0) {
> > +  for (unsigned i = 0; i < damage->rectangleCount; i++) {
> > + const VkRectLayerKHR *rect = >pRectangles[i];
> > + assert(rect->layer == 0);
> > + wl_surface_damage(chain->surface,
> > +   rect->offset.x, rect->offset.y,
> > +   rect->extent.width, rect->extent.height);
>
> Very scrupulous version check, but you forgot to actually use
> wl_surface_damage_buffer. ;)
>

Gah!!!  I'll get that fixed.  Assuming that change, now that you've looked
at it all, would you mind reviewing at least this patch?
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2] anv: add support for allocating more than 1 block of memory

2017-04-03 Thread Jason Ekstrand
On Mon, Apr 3, 2017 at 5:02 AM, Juan A. Suarez Romero 
wrote:

> On Wed, 2017-03-29 at 12:06 -0700, Jason Ekstrand wrote:
> > Looking over the patch, I think I've convinced myself that it's
> correct.  (I honestly wasn't expecting to come to that conclusion without
> more iteration.)  That said, this raises some interesting questions.  I
> added Kristian to the Cc in case he has any input.
> >
> >  1. Should we do powers of two or linear.  I'm still a fan of powers of
> two.
>
> In which specific part do you mean? In free block lists? or in the
> allocator_new?
>

In the state pool, we just round all allocation sizes up to the nearest
power-of-two, and then the index into the array of free lists isn't "size -
base", it's "ilog2(size) - base".


> >
> >  2. Should block pools even have a block size at all? We could just make
> every block pool allow any power-of-two size from 4 KiB up to. say, 1 MiB
> and then make the block size part of the state pool or stream that's
> allocating from it.  At the moment, I like this idea, but I've given it
> very little thought.
> >
> So IIUC, the idea would be the block pool is just a flat chunk of
> memory, where we later fetch blocks of memory from, as requested by
> applications. Is that correct?
>

Sorry, but this patch gave me some sudden revelations and things are still
in the process of reforming in my brain.  In the past, we assumed a
two-layered allocation strategy where we had a block pool which was the
base and then the state pool and state stream allocators sat on top of it.
Originally, the state pool allocator was just for a single size as well.

Now that the block pool is going to be capable of allocating multiple
sizes, the whole mental model of the separation falls apart.  The new
future that I see is one where the block pool and state pool aren't
separate.  Instead, we have a single thing which I'll call state_pool (we
have to pick one of the names) which lets you allocate a state of any size
from 32B to 2MB.  The state stream will then allocate off of a state_pool
instead of a block_pool since they're now the same.  For smaller states, we
still want to allocate reasonably large chunks (probably 4K) so that we
ensure that things are nicely aligned.  I think I've got a pretty good idea
of how it should work at this point and can write more if you'd like.

Before we dive in and do a pile of refactoring, I think this patch is
pretty much good-to-go assuming we fix the power-of-two thing and it fixes
a bug so let's focus there.  Are you  interested in doing the refactoring?
If not, that's ok, I'm happy to do it and it wouldn't take me long now that
I've gotten a chance to think about it.  If you are interested, go for it!


> >  3. If we go with the idea in 2. should we still call it block_pool?  I
> think we can keep the name but it doesn't it as well as it once did.
> >
> > Thanks for working on this!  I'm sorry it's taken so long to respond.
> Every time I've looked at it, my brain hasn't been in the right state to
> think about lock-free code. :-/
> >
> > On Wed, Mar 15, 2017 at 5:05 AM, Juan A. Suarez Romero <
> jasua...@igalia.com> wrote:
> > > Current Anv allocator assign memory in terms of a fixed block size.
> > >
> > > But there can be cases where this block is not enough for a memory
> > > request, and thus several blocks must be assigned in a row.
> > >
> > > This commit adds support for specifying how many blocks of memory must
> > > be assigned.
> > >
> > > This fixes a number dEQP-VK.pipeline.render_to_image.* tests that
> crash.
> > >
> > > v2: lock-free free-list is not handled correctly (Jason)
> > > ---
> > >  src/intel/vulkan/anv_allocator.c   | 81
> +++---
> > >  src/intel/vulkan/anv_batch_chain.c |  4 +-
> > >  src/intel/vulkan/anv_private.h |  7 +++-
> > >  3 files changed, 66 insertions(+), 26 deletions(-)
> > >
> > > diff --git a/src/intel/vulkan/anv_allocator.c b/src/intel/vulkan/anv_
> allocator.c
> > > index 45c663b..3924551 100644
> > > --- a/src/intel/vulkan/anv_allocator.c
> > > +++ b/src/intel/vulkan/anv_allocator.c
> > > @@ -257,7 +257,8 @@ anv_block_pool_init(struct anv_block_pool *pool,
> > > pool->device = device;
> > > anv_bo_init(>bo, 0, 0);
> > > pool->block_size = block_size;
> > > -   pool->free_list = ANV_FREE_LIST_EMPTY;
> > > +   for (uint32_t i = 0; i < ANV_MAX_BLOCKS; i++)
> > > +  pool->free_list[i] = ANV_FREE_LIST_EMPTY;
> > > pool->back_free_list = ANV_FREE_LIST_EMPTY;
> > >
> > > pool->fd = memfd_create("block pool", MFD_CLOEXEC);
> > > @@ -500,30 +501,35 @@ fail:
> > >
> > >  static uint32_t
> > >  anv_block_pool_alloc_new(struct anv_block_pool *pool,
> > > - struct anv_block_state *pool_state)
> > > + struct anv_block_state *pool_state,
> > > + uint32_t n_blocks)
> >
> > Maybe have this take a size rather than n_blocks?  It's only ever called
> by stuff in 

Re: [Mesa-dev] [PATCH 9/9] gallium: decrease the size of pipe_draw_info - 88 -> 80 bytes

2017-04-03 Thread Brian Paul
I need to test this series on Windows and with MinGW first.  I'm worried
about enums with bitfields.

-Brian


On Mon, Apr 3, 2017 at 2:46 AM, Nicolai Hähnle  wrote:

> Some of these are a bit subtle, but I think they're fine. Series is:
>
> Reviewed-by: Nicolai Hähnle 
>
>
> On 02.04.2017 20:00, Marek Olšák wrote:
>
>> From: Marek Olšák 
>>
>> ---
>>  src/gallium/auxiliary/indices/u_primconvert.c | 10 --
>>  src/gallium/include/pipe/p_state.h|  2 +-
>>  2 files changed, 9 insertions(+), 3 deletions(-)
>>
>> diff --git a/src/gallium/auxiliary/indices/u_primconvert.c
>> b/src/gallium/auxiliary/indices/u_primconvert.c
>> index 2bdfade..1ffca4b 100644
>> --- a/src/gallium/auxiliary/indices/u_primconvert.c
>> +++ b/src/gallium/auxiliary/indices/u_primconvert.c
>> @@ -121,39 +121,45 @@ util_primconvert_draw_vbo(struct
>> primconvert_context *pc,
>> util_draw_init_info(_info);
>> new_info.indexed = true;
>> new_info.min_index = info->min_index;
>> new_info.max_index = info->max_index;
>> new_info.index_bias = info->index_bias;
>> new_info.start_instance = info->start_instance;
>> new_info.instance_count = info->instance_count;
>> new_info.primitive_restart = info->primitive_restart;
>> new_info.restart_index = info->restart_index;
>> if (info->indexed) {
>> +  enum pipe_prim_type mode = 0;
>> +
>>u_index_translator(pc->primtypes_mask,
>>   info->mode, pc->saved_ib.index_size,
>> info->count,
>>   pc->api_pv, pc->api_pv,
>>   info->primitive_restart ? PR_ENABLE :
>> PR_DISABLE,
>> - _info.mode, _ib.index_size,
>> _info.count,
>> + , _ib.index_size, _info.count,
>>   _func);
>> +  new_info.mode = mode;
>>src = ib->user_buffer;
>>if (!src) {
>>   src = pipe_buffer_map(pc->pipe, ib->buffer,
>> PIPE_TRANSFER_READ, _transfer);
>>}
>>src = (const uint8_t *)src + ib->offset;
>> }
>> else {
>> +  enum pipe_prim_type mode = 0;
>> +
>>u_index_generator(pc->primtypes_mask,
>>  info->mode, info->start, info->count,
>>  pc->api_pv, pc->api_pv,
>> -_info.mode, _ib.index_size,
>> _info.count,
>> +, _ib.index_size, _info.count,
>>  _func);
>> +  new_info.mode = mode;
>> }
>>
>> u_upload_alloc(pc->pipe->stream_uploader, 0, new_ib.index_size *
>> new_info.count, 4,
>>_ib.offset, _ib.buffer, );
>>
>> if (info->indexed) {
>>trans_func(src, info->start, info->count, new_info.count,
>> info->restart_index, dst);
>> }
>> else {
>>gen_func(info->start, new_info.count, dst);
>> diff --git a/src/gallium/include/pipe/p_state.h
>> b/src/gallium/include/pipe/p_state.h
>> index d68a4d4..eeabf8b 100644
>> --- a/src/gallium/include/pipe/p_state.h
>> +++ b/src/gallium/include/pipe/p_state.h
>> @@ -634,21 +634,21 @@ struct pipe_index_buffer
>> const void *user_buffer;  /**< pointer to a user buffer if buffer ==
>> NULL */
>>  };
>>
>>
>>  /**
>>   * Information to describe a draw_vbo call.
>>   */
>>  struct pipe_draw_info
>>  {
>> boolean indexed;  /**< use index buffer */
>> -   enum pipe_prim_type mode;  /**< the mode of the primitive */
>> +   enum pipe_prim_type mode:8;  /**< the mode of the primitive */
>> boolean primitive_restart;
>> ubyte vertices_per_patch; /**< the number of vertices per patch */
>>
>> unsigned start;  /**< the index of the first vertex */
>> unsigned count;  /**< number of vertices */
>>
>> unsigned start_instance; /**< first instance id */
>> unsigned instance_count; /**< number of instances */
>>
>> unsigned drawid; /**< id of this draw in a multidraw */
>>
>>
>
> --
> Lerne, wie die Welt wirklich ist,
> Aber vergiss niemals, wie sie sein sollte.
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2] anv: add support for allocating more than 1 block of memory

2017-04-03 Thread Juan A. Suarez Romero
On Wed, 2017-03-29 at 12:06 -0700, Jason Ekstrand wrote:
> Looking over the patch, I think I've convinced myself that it's correct.  (I 
> honestly wasn't expecting to come to that conclusion without more iteration.) 
>  That said, this raises some interesting questions.  I added Kristian to the 
> Cc in case he has any input.
> 
>  1. Should we do powers of two or linear.  I'm still a fan of powers of two.

In which specific part do you mean? In free block lists? or in the
allocator_new? 

> 
>  2. Should block pools even have a block size at all? We could just make 
> every block pool allow any power-of-two size from 4 KiB up to. say, 1 MiB and 
> then make the block size part of the state pool or stream that's allocating 
> from it.  At the moment, I like this idea, but I've given it very little 
> thought.
> 
So IIUC, the idea would be the block pool is just a flat chunk of
memory, where we later fetch blocks of memory from, as requested by
applications. Is that correct?



>  3. If we go with the idea in 2. should we still call it block_pool?  I think 
> we can keep the name but it doesn't it as well as it once did.
> 
> Thanks for working on this!  I'm sorry it's taken so long to respond.  Every 
> time I've looked at it, my brain hasn't been in the right state to think 
> about lock-free code. :-/
> 
> On Wed, Mar 15, 2017 at 5:05 AM, Juan A. Suarez Romero  
> wrote:
> > Current Anv allocator assign memory in terms of a fixed block size.
> > 
> > But there can be cases where this block is not enough for a memory
> > request, and thus several blocks must be assigned in a row.
> > 
> > This commit adds support for specifying how many blocks of memory must
> > be assigned.
> > 
> > This fixes a number dEQP-VK.pipeline.render_to_image.* tests that crash.
> > 
> > v2: lock-free free-list is not handled correctly (Jason)
> > ---
> >  src/intel/vulkan/anv_allocator.c   | 81 
> > +++---
> >  src/intel/vulkan/anv_batch_chain.c |  4 +-
> >  src/intel/vulkan/anv_private.h     |  7 +++-
> >  3 files changed, 66 insertions(+), 26 deletions(-)
> > 
> > diff --git a/src/intel/vulkan/anv_allocator.c 
> > b/src/intel/vulkan/anv_allocator.c
> > index 45c663b..3924551 100644
> > --- a/src/intel/vulkan/anv_allocator.c
> > +++ b/src/intel/vulkan/anv_allocator.c
> > @@ -257,7 +257,8 @@ anv_block_pool_init(struct anv_block_pool *pool,
> >     pool->device = device;
> >     anv_bo_init(>bo, 0, 0);
> >     pool->block_size = block_size;
> > -   pool->free_list = ANV_FREE_LIST_EMPTY;
> > +   for (uint32_t i = 0; i < ANV_MAX_BLOCKS; i++)
> > +      pool->free_list[i] = ANV_FREE_LIST_EMPTY;
> >     pool->back_free_list = ANV_FREE_LIST_EMPTY;
> > 
> >     pool->fd = memfd_create("block pool", MFD_CLOEXEC);
> > @@ -500,30 +501,35 @@ fail:
> > 
> >  static uint32_t
> >  anv_block_pool_alloc_new(struct anv_block_pool *pool,
> > -                         struct anv_block_state *pool_state)
> > +                         struct anv_block_state *pool_state,
> > +                         uint32_t n_blocks)
> 
> Maybe have this take a size rather than n_blocks?  It's only ever called by 
> stuff in the block pool so the caller can do the multiplication.  It would 
> certainly make some of the math below easier.
>  
> >  {
> >     struct anv_block_state state, old, new;
> > 
> >     while (1) {
> > -      state.u64 = __sync_fetch_and_add(_state->u64, pool->block_size);
> > -      if (state.next < state.end) {
> > +      state.u64 = __sync_fetch_and_add(_state->u64, n_blocks * 
> > pool->block_size);
> > +      if (state.next > state.end) {
> > +         futex_wait(_state->end, state.end);
> > +         continue;
> > +      } else if ((state.next + (n_blocks - 1) * pool->block_size) < 
> > state.end) {
> 
> First off, please keep the if's in the same order unless we have a reason to 
> re-arrange them.  It would make this way easier to review. :-)
> 
> Second, I think this would be much easier to read as:
> 
> if (state.next + size <= state.end) {
>    /* Success */
> } else if (state.next <= state.end) {
>    /* Our block is the one that crosses the line */
> } else {
>    /* Wait like everyone else */
> }
>  
> >           assert(pool->map);
> >           return state.next;
> > -      } else if (state.next == state.end) {
> > -         /* We allocated the first block outside the pool, we have to grow 
> > it.
> > -          * pool_state->next acts a mutex: threads who try to allocate now 
> > will
> > -          * get block indexes above the current limit and hit futex_wait
> > -          * below. */
> > -         new.next = state.next + pool->block_size;
> > +      } else {
> > +         /* We allocated the firsts blocks outside the pool, we have to 
> > grow
> > +          * it. pool_state->next acts a mutex: threads who try to allocate
> > +          * now will get block indexes above the current limit and hit
> > +          * futex_wait below.
> > +          */
> > +         

[Mesa-dev] [PATCH v2] docs: mark GL_ARB_vertex_attrib_64bit and OpenGL 4.2 as supported by i965/gen7+

2017-04-03 Thread Juan A. Suarez Romero
v2 (Andreas Boll):
- Mark GL 4.1 as supported by i965/gen7+
- Mark GL_ARB_shader_precision as supported by i965/gen7+
- Update release notes
---
 docs/features.txt | 4 ++--
 docs/relnotes/17.1.0.html | 3 +++
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/docs/features.txt b/docs/features.txt
index 5c22acf..e71b197 100644
--- a/docs/features.txt
+++ b/docs/features.txt
@@ -136,12 +136,12 @@ GL 4.0, GLSL 4.00 --- all DONE: i965/gen7+, nvc0, r600, 
radeonsi
   GL_ARB_transform_feedback3DONE (i965/gen7+, 
llvmpipe, softpipe, swr)
 
 
-GL 4.1, GLSL 4.10 --- all DONE: i965/hsw+, nvc0, r600, radeonsi
+GL 4.1, GLSL 4.10 --- all DONE: i965/gen7+, nvc0, r600, radeonsi
 
   GL_ARB_ES2_compatibility  DONE (i965, nv50, 
llvmpipe, softpipe, swr)
   GL_ARB_get_program_binary DONE (0 binary formats)
   GL_ARB_separate_shader_objectsDONE (all drivers)
-  GL_ARB_shader_precision   DONE (i965/hsw+, all 
drivers that support GLSL 4.10)
+  GL_ARB_shader_precision   DONE (i965/gen7+, all 
drivers that support GLSL 4.10)
   GL_ARB_vertex_attrib_64bitDONE (i965/gen7+, 
llvmpipe, softpipe)
   GL_ARB_viewport_array DONE (i965, nv50, 
llvmpipe, softpipe)
 
diff --git a/docs/relnotes/17.1.0.html b/docs/relnotes/17.1.0.html
index ada1e38..dbf8379 100644
--- a/docs/relnotes/17.1.0.html
+++ b/docs/relnotes/17.1.0.html
@@ -44,9 +44,12 @@ Note: some of the new features are only available with 
certain drivers.
 
 
 
+OpenGL 4.2 on i965/ivb
 GL_ARB_gpu_shader_int64 on i965/gen8+, nvc0, radeonsi, softpipe, 
llvmpipe
+GL_ARB_shader_precision on i965/ivb
 GL_ARB_transform_feedback2 on i965/gen6
 GL_ARB_transform_feedback_overflow_query on i965/gen6+
+GL_ARB_vertex_attrib_64bit on i965/ivb
 Geometry shaders enabled on swr
 
 
-- 
2.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 100402] [d3d9 bisected] Diablo III fails to start after commit 0630d3600bfb770cf3b23761c45b3add3b277c6b

2017-04-03 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=100402

Vedran Miletić  changed:

   What|Removed |Added

 CC||ved...@miletic.net

-- 
You are receiving this mail because:
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] targets/va: export radeon winsys_create functions

2017-04-03 Thread Christian König

Am 03.04.2017 um 12:08 schrieb Marek Olšák:

From: Marek Olšák 

It silences the following radeonsi LLVM warning due to a previous
commit adding an LLVM workaround:
   "mesa: for the -simplifycfg-sink-common option: may only occur zero or one
times!"

Cc: 17.0 


Reviewed-by: Christian König 


---
  src/gallium/targets/omx/omx.sym  | 5 +
  src/gallium/targets/pipe-loader/pipe.sym | 5 +
  src/gallium/targets/va/va.sym| 5 +
  3 files changed, 15 insertions(+)

diff --git a/src/gallium/targets/omx/omx.sym b/src/gallium/targets/omx/omx.sym
index af22aed..e8a2876 100644
--- a/src/gallium/targets/omx/omx.sym
+++ b/src/gallium/targets/omx/omx.sym
@@ -1,6 +1,11 @@
  {
global:
omx_component_library_Setup;
+
+   # Workaround for an LLVM warning with -simplifycfg-sink-common
+   # due to LLVM being initialized multiple times.
+   radeon_drm_winsys_create;
+   amdgpu_winsys_create;
local:
*;
  };
diff --git a/src/gallium/targets/pipe-loader/pipe.sym 
b/src/gallium/targets/pipe-loader/pipe.sym
index b2fa619..605cb83 100644
--- a/src/gallium/targets/pipe-loader/pipe.sym
+++ b/src/gallium/targets/pipe-loader/pipe.sym
@@ -1,7 +1,12 @@
  {
global:
driver_descriptor;
swrast_driver_descriptor;
+
+   # Workaround for an LLVM warning with -simplifycfg-sink-common
+   # due to LLVM being initialized multiple times.
+   radeon_drm_winsys_create;
+   amdgpu_winsys_create;
local:
*;
  };
diff --git a/src/gallium/targets/va/va.sym b/src/gallium/targets/va/va.sym
index c925b2e..917c3d3 100644
--- a/src/gallium/targets/va/va.sym
+++ b/src/gallium/targets/va/va.sym
@@ -1,6 +1,11 @@
  {
global:
__vaDriverInit_*_*;
+
+   # Workaround for an LLVM warning with -simplifycfg-sink-common
+   # due to LLVM being initialized multiple times.
+   radeon_drm_winsys_create;
+   amdgpu_winsys_create;
local:
*;
  };



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 4/4] r600g: check rasterizer primitive states like in radeonsi

2017-04-03 Thread Marek Olšák
For the series:

Reviewed-by: Marek Olšák 

Marek

On Sun, Apr 2, 2017 at 7:33 PM, Constantine Kharlamov
 wrote:
> Specifically, non-line primitives skipped, and defaulting to reset on
> each packet.
>
> The skip of non-line primitives saves ≈110 resetting of
> PA_SC_LINE_STIPPLE register per frame in Kane
>
> Signed-off-by: Constantine Kharlamov 
> ---
>  src/gallium/drivers/r600/r600_state_common.c | 21 +
>  1 file changed, 13 insertions(+), 8 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/r600_state_common.c 
> b/src/gallium/drivers/r600/r600_state_common.c
> index e4d1660933..4de2a7344b 100644
> --- a/src/gallium/drivers/r600/r600_state_common.c
> +++ b/src/gallium/drivers/r600/r600_state_common.c
> @@ -1670,19 +1670,24 @@ void r600_emit_clip_misc_state(struct r600_context 
> *rctx, struct r600_atom *atom
>  static inline void r600_emit_rasterizer_prim_state(struct r600_context *rctx)
>  {
> struct radeon_winsys_cs *cs = rctx->b.gfx.cs;
> -   unsigned ls_mask = 0;
> enum pipe_prim_type rast_prim = rctx->current_rast_prim;
> -   if (rast_prim == rctx->last_rast_prim)
> +
> +   /* Skip this if not rendering lines. */
> +   if (rast_prim != PIPE_PRIM_LINES &&
> +   rast_prim != PIPE_PRIM_LINE_LOOP &&
> +   rast_prim != PIPE_PRIM_LINE_STRIP &&
> +   rast_prim != PIPE_PRIM_LINES_ADJACENCY &&
> +   rast_prim != PIPE_PRIM_LINE_STRIP_ADJACENCY)
> return;
>
> -   if (rast_prim == PIPE_PRIM_LINES)
> -   ls_mask = 1;
> -   else if (rast_prim == PIPE_PRIM_LINE_STRIP ||
> -rast_prim == PIPE_PRIM_LINE_LOOP)
> -   ls_mask = 2;
> +   if (rast_prim == rctx->last_rast_prim)
> +   return;
>
> +   /* For lines, reset the stipple pattern at each primitive. Otherwise,
> +* reset the stipple pattern at each packet (line strips, line loops).
> +*/
> radeon_set_context_reg(cs, R_028A0C_PA_SC_LINE_STIPPLE,
> -  S_028A0C_AUTO_RESET_CNTL(ls_mask) |
> +  S_028A0C_AUTO_RESET_CNTL(rast_prim == 
> PIPE_PRIM_LINES ? 1 : 2) |
>(rctx->rasterizer ? 
> rctx->rasterizer->pa_sc_line_stipple : 0));
> rctx->last_rast_prim = rast_prim;
>  }
> --
> 2.12.0
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] targets/va: export radeon winsys_create functions

2017-04-03 Thread Marek Olšák
I've changed the commit message to "targets: export radeon
winsys_create functions to silence LLVM warning".

Marek

On Mon, Apr 3, 2017 at 12:08 PM, Marek Olšák  wrote:
> From: Marek Olšák 
>
> It silences the following radeonsi LLVM warning due to a previous
> commit adding an LLVM workaround:
>   "mesa: for the -simplifycfg-sink-common option: may only occur zero or one
>times!"
>
> Cc: 17.0 
> ---
>  src/gallium/targets/omx/omx.sym  | 5 +
>  src/gallium/targets/pipe-loader/pipe.sym | 5 +
>  src/gallium/targets/va/va.sym| 5 +
>  3 files changed, 15 insertions(+)
>
> diff --git a/src/gallium/targets/omx/omx.sym b/src/gallium/targets/omx/omx.sym
> index af22aed..e8a2876 100644
> --- a/src/gallium/targets/omx/omx.sym
> +++ b/src/gallium/targets/omx/omx.sym
> @@ -1,6 +1,11 @@
>  {
> global:
> omx_component_library_Setup;
> +
> +   # Workaround for an LLVM warning with -simplifycfg-sink-common
> +   # due to LLVM being initialized multiple times.
> +   radeon_drm_winsys_create;
> +   amdgpu_winsys_create;
> local:
> *;
>  };
> diff --git a/src/gallium/targets/pipe-loader/pipe.sym 
> b/src/gallium/targets/pipe-loader/pipe.sym
> index b2fa619..605cb83 100644
> --- a/src/gallium/targets/pipe-loader/pipe.sym
> +++ b/src/gallium/targets/pipe-loader/pipe.sym
> @@ -1,7 +1,12 @@
>  {
> global:
> driver_descriptor;
> swrast_driver_descriptor;
> +
> +   # Workaround for an LLVM warning with -simplifycfg-sink-common
> +   # due to LLVM being initialized multiple times.
> +   radeon_drm_winsys_create;
> +   amdgpu_winsys_create;
> local:
> *;
>  };
> diff --git a/src/gallium/targets/va/va.sym b/src/gallium/targets/va/va.sym
> index c925b2e..917c3d3 100644
> --- a/src/gallium/targets/va/va.sym
> +++ b/src/gallium/targets/va/va.sym
> @@ -1,6 +1,11 @@
>  {
> global:
> __vaDriverInit_*_*;
> +
> +   # Workaround for an LLVM warning with -simplifycfg-sink-common
> +   # due to LLVM being initialized multiple times.
> +   radeon_drm_winsys_create;
> +   amdgpu_winsys_create;
> local:
> *;
>  };
> --
> 2.7.4
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] targets/va: export radeon winsys_create functions

2017-04-03 Thread Marek Olšák
From: Marek Olšák 

It silences the following radeonsi LLVM warning due to a previous
commit adding an LLVM workaround:
  "mesa: for the -simplifycfg-sink-common option: may only occur zero or one
   times!"

Cc: 17.0 
---
 src/gallium/targets/omx/omx.sym  | 5 +
 src/gallium/targets/pipe-loader/pipe.sym | 5 +
 src/gallium/targets/va/va.sym| 5 +
 3 files changed, 15 insertions(+)

diff --git a/src/gallium/targets/omx/omx.sym b/src/gallium/targets/omx/omx.sym
index af22aed..e8a2876 100644
--- a/src/gallium/targets/omx/omx.sym
+++ b/src/gallium/targets/omx/omx.sym
@@ -1,6 +1,11 @@
 {
global:
omx_component_library_Setup;
+
+   # Workaround for an LLVM warning with -simplifycfg-sink-common
+   # due to LLVM being initialized multiple times.
+   radeon_drm_winsys_create;
+   amdgpu_winsys_create;
local:
*;
 };
diff --git a/src/gallium/targets/pipe-loader/pipe.sym 
b/src/gallium/targets/pipe-loader/pipe.sym
index b2fa619..605cb83 100644
--- a/src/gallium/targets/pipe-loader/pipe.sym
+++ b/src/gallium/targets/pipe-loader/pipe.sym
@@ -1,7 +1,12 @@
 {
global:
driver_descriptor;
swrast_driver_descriptor;
+
+   # Workaround for an LLVM warning with -simplifycfg-sink-common
+   # due to LLVM being initialized multiple times.
+   radeon_drm_winsys_create;
+   amdgpu_winsys_create;
local:
*;
 };
diff --git a/src/gallium/targets/va/va.sym b/src/gallium/targets/va/va.sym
index c925b2e..917c3d3 100644
--- a/src/gallium/targets/va/va.sym
+++ b/src/gallium/targets/va/va.sym
@@ -1,6 +1,11 @@
 {
global:
__vaDriverInit_*_*;
+
+   # Workaround for an LLVM warning with -simplifycfg-sink-common
+   # due to LLVM being initialized multiple times.
+   radeon_drm_winsys_create;
+   amdgpu_winsys_create;
local:
*;
 };
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] st/omx/dec: Properly undefine DEBUG macro

2017-04-03 Thread Shaleen
Yes, that is correct. I wasn't thinking clearly.

On Mon, Apr 3, 2017 at 1:57 PM Christian König 
wrote:

> Am 03.04.2017 um 06:35 schrieb Shaleen Jain:
> > ---
> >   src/gallium/state_trackers/omx/vid_dec.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/src/gallium/state_trackers/omx/vid_dec.c
> b/src/gallium/state_trackers/omx/vid_dec.c
> > index 9a6efb8e28..94664eba04 100644
> > --- a/src/gallium/state_trackers/omx/vid_dec.c
> > +++ b/src/gallium/state_trackers/omx/vid_dec.c
> > @@ -37,7 +37,7 @@
> >   #include 
> >
> >   /* bellagio defines a DEBUG macro that we don't want */
> > -#ifndef DEBUG
> > +#ifdef DEBUG
>
> NAK, the logic locked correct as it was before.
>
> Why do you want to invert it?
>
> Regards,
> Christian.
>
> >   #include 
> >   #undef DEBUG
> >   #else
>
>
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/3] radeonsi: use ctx->types instead of bld->types etc.

2017-04-03 Thread Marek Olšák
From: Marek Olšák 

even vec_type is f32.
---
 src/gallium/drivers/radeonsi/si_shader.c   | 28 ++
 .../drivers/radeonsi/si_shader_tgsi_setup.c| 16 ++---
 2 files changed, 21 insertions(+), 23 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index a5d370b..0200172 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -3306,21 +3306,21 @@ static LLVMValueRef image_fetch_coords(
struct gallivm_state *gallivm = bld_base->base.gallivm;
LLVMBuilderRef builder = gallivm->builder;
unsigned target = inst->Memory.Texture;
unsigned num_coords = tgsi_util_get_texture_coord_dim(target);
LLVMValueRef coords[4];
LLVMValueRef tmp;
int chan;
 
for (chan = 0; chan < num_coords; ++chan) {
tmp = lp_build_emit_fetch(bld_base, inst, src, chan);
-   tmp = LLVMBuildBitCast(builder, tmp, 
bld_base->uint_bld.elem_type, "");
+   tmp = LLVMBuildBitCast(builder, tmp, ctx->i32, "");
coords[chan] = tmp;
}
 
/* 1D textures are allocated and used as 2D on GFX9. */
if (ctx->screen->b.chip_class >= GFX9) {
if (target == TGSI_TEXTURE_1D) {
coords[1] = ctx->i32_0;
num_coords++;
} else if (target == TGSI_TEXTURE_1D_ARRAY) {
coords[2] = coords[1];
@@ -3414,31 +3414,31 @@ static void buffer_append_args(
 static void load_fetch_args(
struct lp_build_tgsi_context * bld_base,
struct lp_build_emit_data * emit_data)
 {
struct si_shader_context *ctx = si_shader_context(bld_base);
struct gallivm_state *gallivm = bld_base->base.gallivm;
const struct tgsi_full_instruction * inst = emit_data->inst;
unsigned target = inst->Memory.Texture;
LLVMValueRef rsrc;
 
-   emit_data->dst_type = LLVMVectorType(bld_base->base.elem_type, 4);
+   emit_data->dst_type = ctx->v4f32;
 
if (inst->Src[0].Register.File == TGSI_FILE_BUFFER) {
LLVMBuilderRef builder = gallivm->builder;
LLVMValueRef offset;
LLVMValueRef tmp;
 
rsrc = shader_buffer_fetch_rsrc(ctx, >Src[0]);
 
tmp = lp_build_emit_fetch(bld_base, inst, 1, 0);
-   offset = LLVMBuildBitCast(builder, tmp, 
bld_base->uint_bld.elem_type, "");
+   offset = LLVMBuildBitCast(builder, tmp, ctx->i32, "");
 
buffer_append_args(ctx, emit_data, rsrc, ctx->i32_0,
   offset, false, false);
} else if (inst->Src[0].Register.File == TGSI_FILE_IMAGE) {
LLVMValueRef coords;
 
image_fetch_rsrc(bld_base, >Src[0], false, target, );
coords = image_fetch_coords(bld_base, inst, 1);
 
if (target == TGSI_TEXTURE_BUFFER) {
@@ -3522,32 +3522,31 @@ static LLVMValueRef get_memory_ptr(struct 
si_shader_context *ctx,
ptr = LLVMBuildBitCast(builder, ptr, LLVMPointerType(type, addr_space), 
"");
 
return ptr;
 }
 
 static void load_emit_memory(
struct si_shader_context *ctx,
struct lp_build_emit_data *emit_data)
 {
const struct tgsi_full_instruction *inst = emit_data->inst;
-   struct lp_build_context *base = >bld_base.base;
struct gallivm_state *gallivm = >gallivm;
LLVMBuilderRef builder = gallivm->builder;
unsigned writemask = inst->Dst[0].Register.WriteMask;
LLVMValueRef channels[4], ptr, derived_ptr, index;
int chan;
 
-   ptr = get_memory_ptr(ctx, inst, base->elem_type, 1);
+   ptr = get_memory_ptr(ctx, inst, ctx->f32, 1);
 
for (chan = 0; chan < 4; ++chan) {
if (!(writemask & (1 << chan))) {
-   channels[chan] = LLVMGetUndef(base->elem_type);
+   channels[chan] = LLVMGetUndef(ctx->f32);
continue;
}
 
index = LLVMConstInt(ctx->i32, chan, 0);
derived_ptr = LLVMBuildGEP(builder, ptr, , 1, "");
channels[chan] = LLVMBuildLoad(builder, derived_ptr, "");
}
emit_data->output[emit_data->chan] = lp_build_gather_values(gallivm, 
channels, 4);
 }
 
@@ -3692,21 +3691,21 @@ static void store_fetch_args(
 
memory = tgsi_full_src_register_from_dst(>Dst[0]);
 
if (inst->Dst[0].Register.File == TGSI_FILE_BUFFER) {
LLVMValueRef offset;
LLVMValueRef tmp;
 
rsrc = shader_buffer_fetch_rsrc(ctx, );
 
tmp = lp_build_emit_fetch(bld_base, inst, 0, 0);
-   offset = LLVMBuildBitCast(builder, tmp, 
bld_base->uint_bld.elem_type, "");
+   offset = LLVMBuildBitCast(builder, tmp, 

  1   2   >