Re: [Mesa-dev] [PATCH] Revert "st/vdpau: use linear layout for output surfaces"

2016-09-14 Thread Ilia Mirkin
On Wed, Sep 14, 2016 at 11:58 PM, Dave Airlie  wrote:
> From: Dave Airlie 
>
> This reverts commit d180de35320eafa3df3d76f0e82b332656530126.
>
> This is a radeon specific hack that causes problems on nouveau
> when combined with the SHARED flag later. If radeonsi needs a fix
> for this, please fix it in the driver.
>
> Signed-off-by: Dave Airlie 

Tested-by: Ilia Mirkin 
Cc: "12.0" 

> ---
>  src/gallium/state_trackers/vdpau/output.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/gallium/state_trackers/vdpau/output.c 
> b/src/gallium/state_trackers/vdpau/output.c
> index 85751ea..09a1517 100644
> --- a/src/gallium/state_trackers/vdpau/output.c
> +++ b/src/gallium/state_trackers/vdpau/output.c
> @@ -82,7 +82,7 @@ vlVdpOutputSurfaceCreate(VdpDevice device,
> res_tmpl.depth0 = 1;
> res_tmpl.array_size = 1;
> res_tmpl.bind = PIPE_BIND_SAMPLER_VIEW | PIPE_BIND_RENDER_TARGET |
> -   PIPE_BIND_LINEAR | PIPE_BIND_SHARED;
> +   PIPE_BIND_SHARED;
> res_tmpl.usage = PIPE_USAGE_DEFAULT;
>
> pipe_mutex_lock(dev->mutex);
> --
> 2.5.5
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] Revert "st/vdpau: use linear layout for output surfaces"

2016-09-14 Thread Dave Airlie
From: Dave Airlie 

This reverts commit d180de35320eafa3df3d76f0e82b332656530126.

This is a radeon specific hack that causes problems on nouveau
when combined with the SHARED flag later. If radeonsi needs a fix
for this, please fix it in the driver.

Signed-off-by: Dave Airlie 
---
 src/gallium/state_trackers/vdpau/output.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/state_trackers/vdpau/output.c 
b/src/gallium/state_trackers/vdpau/output.c
index 85751ea..09a1517 100644
--- a/src/gallium/state_trackers/vdpau/output.c
+++ b/src/gallium/state_trackers/vdpau/output.c
@@ -82,7 +82,7 @@ vlVdpOutputSurfaceCreate(VdpDevice device,
res_tmpl.depth0 = 1;
res_tmpl.array_size = 1;
res_tmpl.bind = PIPE_BIND_SAMPLER_VIEW | PIPE_BIND_RENDER_TARGET |
-   PIPE_BIND_LINEAR | PIPE_BIND_SHARED;
+   PIPE_BIND_SHARED;
res_tmpl.usage = PIPE_USAGE_DEFAULT;
 
pipe_mutex_lock(dev->mutex);
-- 
2.5.5

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 09/10] st/vdpau: implement the new DMA-buf based interop

2016-09-14 Thread Dave Airlie
On 15 September 2016 at 13:03, Ilia Mirkin  wrote:
> On Wed, Sep 14, 2016 at 10:15 PM, Michel Dänzer  wrote:
>>> No, the current impl is pretty radeon-specific (note - it doesn't work
>>> on nouveau, and no other drivers support the interfaces, so ... it's
>>> radeon-specific).
>>
>> We're getting into semantics here, but since the reason it doesn't work
>> well with nouveau is a fundamental issue in nouveau (which should also
>> affect at least DRI3 in general), while you may call it "de facto radeon
>> specific" if you're so inclined, that doesn't make the implementation
>> actually radeon specific.
>
> No one's reported any issues with DRI3, I use it on my home desktop
> every day. And VDPAU used to work great until these changes to
> st/vdpau went in. Prior to those changes in st/vdpau, saying that
> "shared == gart" was a perfectly reasonable thing to say, since no one
> tried blending/readback on those surfaces (or at least not enough for
> it to matter). But now ... poof ... it doesn't work [actually, worse -
> it works - but can't come close to keeping up with 24fps video].
>
> Anyways, I realize this is a losing argument. Interfaces and usages
> move forward and change over time. This happens to be a change that
> leaves nouveau behind. As a spare-time contributor, I can't keep up
> with multiple full timers. I had hoped that there'd be some way to
> make it all still work, but that doesn't seem to be the case.
> Unfortunately end users are going to lose out on functionality as a
> result.

So (a) this is a regression, regressions aren't allowed, so it would
be good to back the change out until it can be fixed.

The problem is the combo of LINEAR and SHARED means that
GART placement is most likely, radeon should be doing the same
in most circumstances.

We should possible introduced SHARED_OTHER_GPU maybe
and use that throughout the stack where it matters.

Dave.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Problem with RX 480 on Alien: Isolation and Dota 2

2016-09-14 Thread Romain Failliot
2016-09-13 13:53 GMT-04:00 Marek Olšák :
> LLVM 32-bit:
>
> mkdir -p build32
> cd build32
> cmake .. -G Ninja -DCMAKE_INSTALL_PREFIX=/usr/llvm/i386-linux-gnu
> -DLLVM_TARGETS_TO_BUILD="X86;AMDGPU" -DLLVM_ENABLE_ASSERTIONS=ON
>   -DCMAKE_BUILD_TYPE=RelWithDebInfo
> -DLLVM_BUILD_LLVM_DYLIB=ON -DLLVM_LINK_LLVM_DYLIB=ON \
>   -DCMAKE_C_FLAGS_RELWITHDEBINFO="-O2 -g -DNDEBUG
> -fno-omit-frame-pointer" \
>   -DCMAKE_CXX_FLAGS_RELWITHDEBINFO="-O2 -g -DNDEBUG
> -fno-omit-frame-pointer" \
>   -DLLVM_BUILD_32_BITS=ON

I have a problem with the 32-bit compilation of llvm.

I get this error:

-- Target triple: x86_64-unknown-linux-gnu
-- Native target architecture is X86
-- Threads enabled.
-- Doxygen disabled.
-- Sphinx disabled.
-- Go bindings disabled.
-- Could NOT find OCaml (missing:  OCAMLFIND OCAML_VERSION OCAML_STDLIB_PATH)
-- Could NOT find OCaml (missing:  OCAMLFIND OCAML_VERSION OCAML_STDLIB_PATH)
-- OCaml bindings disabled.
-- Building with -fPIC
-- Building 32 bits executables and libraries.
CMake Error at cmake/modules/HandleLLVMOptions.cmake:469 (message):
  LLVM requires C++11 support but the '-std=c++11' flag isn't supported.
Call Stack (most recent call first):
  CMakeLists.txt:473 (include)


-- Configuring incomplete, errors occurred!

I don't know why my LLVM doesn't handle C++11... Any idea?
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 09/10] st/vdpau: implement the new DMA-buf based interop

2016-09-14 Thread Ilia Mirkin
On Wed, Sep 14, 2016 at 10:15 PM, Michel Dänzer  wrote:
>> No, the current impl is pretty radeon-specific (note - it doesn't work
>> on nouveau, and no other drivers support the interfaces, so ... it's
>> radeon-specific).
>
> We're getting into semantics here, but since the reason it doesn't work
> well with nouveau is a fundamental issue in nouveau (which should also
> affect at least DRI3 in general), while you may call it "de facto radeon
> specific" if you're so inclined, that doesn't make the implementation
> actually radeon specific.

No one's reported any issues with DRI3, I use it on my home desktop
every day. And VDPAU used to work great until these changes to
st/vdpau went in. Prior to those changes in st/vdpau, saying that
"shared == gart" was a perfectly reasonable thing to say, since no one
tried blending/readback on those surfaces (or at least not enough for
it to matter). But now ... poof ... it doesn't work [actually, worse -
it works - but can't come close to keeping up with 24fps video].

Anyways, I realize this is a losing argument. Interfaces and usages
move forward and change over time. This happens to be a change that
leaves nouveau behind. As a spare-time contributor, I can't keep up
with multiple full timers. I had hoped that there'd be some way to
make it all still work, but that doesn't seem to be the case.
Unfortunately end users are going to lose out on functionality as a
result.

  -ilia
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 09/10] st/vdpau: implement the new DMA-buf based interop

2016-09-14 Thread Ilia Mirkin
On Wed, Sep 14, 2016 at 10:15 PM, Michel Dänzer  wrote:
>> This shouldn't be too hard - just make the compositor output to a temporary
>> surface before copying things out to the prime-shareable one.
>
> You mean something like https://patchwork.freedesktop.org/patch/110375/ ?

Something like that, but presumably something would have to be applied
to st/vdpau so that it doesn't create its output surfaces with the
PIPE_BIND_SHARED flag either.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 09/10] st/vdpau: implement the new DMA-buf based interop

2016-09-14 Thread Michel Dänzer
On 15/09/16 11:01 AM, Ilia Mirkin wrote:
> On Wed, Sep 14, 2016 at 9:42 PM, Michel Dänzer  wrote:
>> On 15/09/16 08:20 AM, Ilia Mirkin wrote:
>>> On Wed, Sep 7, 2016 at 12:06 PM, Marek Olšák  wrote:
 On Wed, Sep 7, 2016 at 5:36 PM, Ilia Mirkin  wrote:
> On Wed, Sep 7, 2016 at 4:08 AM, Michel Dänzer  wrote:
>> On 07/09/16 04:19 AM, Christian König wrote:
>>> Am 06.09.2016 um 21:05 schrieb Ilia Mirkin:
 On Tue, Sep 6, 2016 at 2:22 PM, Christian König
  wrote:
> Am 06.09.2016 um 16:23 schrieb Ilia Mirkin:
>> On Mon, Sep 5, 2016 at 2:48 AM, Michel Dänzer 
>> wrote:
>>> On 05/09/16 04:37 AM, Ilia Mirkin wrote:
 On Tue, Mar 8, 2016 at 7:21 AM, Christian König
  wrote:
> @@ -80,7 +82,7 @@ vlVdpOutputSurfaceCreate(VdpDevice device,
>   res_tmpl.depth0 = 1;
>   res_tmpl.array_size = 1;
>   res_tmpl.bind = PIPE_BIND_SAMPLER_VIEW |
> PIPE_BIND_RENDER_TARGET |
> -   PIPE_BIND_LINEAR;
> +   PIPE_BIND_LINEAR | PIPE_BIND_SHARED;
 Hi Christian,

 This change appears to have semi-broken vdpau on nouveau. Whenever 
 I
 flip on the OSD in mplayer, the rendering becomes *extremely* slow.
 However regular up-scaling without the OSD is plenty fast. This
 effectively is forcing the output surfaces to live in GART instead 
 of
 VRAM.
>>> Strictly speaking, they'd only need to be forced to GART while 
>>> they're
>>> actually being shared between different GPUs. That's how it works 
>>> with
>>> the amdgpu and radeon kernel drivers.
>> Any suggestions on how to handle this? Perhaps reallocate + copy the
>> surface in st/vdpau when actual dmabuf sharing is requested?
>>
>> To be clear - with this change, vdpau with nouveau is unusable in the
>> presence of an OSD in mplayer. The OSD comes up whenever you seek
>> around in the video, so in effect, it's unusable. Used to work great.
>
> Well I think you should clearly figure out why adding
> PIPE_BIND_SHARED has
> such dramatic effect.
 Because the buffer goes into GART. And then you try to blend on it,
 which involves readback from GART (that's how the functions OSD is
 based on work, I believe). We normally don't allocate renderable
 surfaces or textures in GART.

> We not only need this for DMA-buf based interop, but also for the
> DRI3 based
> sharing of buffers with X.
>
> So that clearly sounds like a bug in nouveau to me.
 OK, so SHARED != GART? With nouveau, buffers are placed statically in
 either VRAM or GART, so I think that if it's shared it has to end up
 in GART, no?
>>>
>>> As far as I understand it no. Shared just means that we can share it
>>> between applications, doesn't it? Or does it mean the buffer should be
>>> shareable between GPUs?
>>>
>>> Could be that my understanding was wrong and so if it's the later feel
>>> free to provide a patch to just remove the flag.
>>>
 I'm pretty weak on all these concepts, as well as how the DRI3 stuff
 works, unfortunately.
>>>
>>> I have to confess I'm not so deeply into this stuff either. Marek,
>>> Michel what exactly is the meaning of the flag?
>>
>> According to src/gallium/docs/source/screen.rst:
>>
>> * ``PIPE_BIND_SHARED``: A sharable buffer that can be given to another
>>   process.
>>
>> It's also used e.g. for buffers shared via DRI3. So I'm afraid this is
>> something nouveau has to deal with better.
>
> Any suggestions that don't involve rewriting nouveau bo handling at
> every level (kernel, ddx, mesa)?
>
> Otherwise I'll send a revert for this change.

 PIPE_BIND_SHARED means texture_get_handle is expected to be used on
 the resource, meaning that inter-API, inter-process, or inter-device
 sharing is possible. All window back buffers should have the flag. If
 they don't, it's a bug. If the flag causes nouveau to put the buffer
 in GART, it's a bug too. There is no reason to use GART for inter-API
 and inter-process sharing like VDPAU and DRI3 are.

 To be honest, the flag is pratically useless with respect to EGL and
 VDPAU, which allow sharing almost any texture.

 I suggest you fix nouveau. The first step would be to become less
 dependent on BIND flags whose existence is already questionable.
>>>
>>> As I suspected, merely flipping away from using 

Re: [Mesa-dev] [PATCH 09/10] st/vdpau: implement the new DMA-buf based interop

2016-09-14 Thread Ilia Mirkin
On Wed, Sep 14, 2016 at 9:42 PM, Michel Dänzer  wrote:
> On 15/09/16 08:20 AM, Ilia Mirkin wrote:
>> On Wed, Sep 7, 2016 at 12:06 PM, Marek Olšák  wrote:
>>> On Wed, Sep 7, 2016 at 5:36 PM, Ilia Mirkin  wrote:
 On Wed, Sep 7, 2016 at 4:08 AM, Michel Dänzer  wrote:
> On 07/09/16 04:19 AM, Christian König wrote:
>> Am 06.09.2016 um 21:05 schrieb Ilia Mirkin:
>>> On Tue, Sep 6, 2016 at 2:22 PM, Christian König
>>>  wrote:
 Am 06.09.2016 um 16:23 schrieb Ilia Mirkin:
> On Mon, Sep 5, 2016 at 2:48 AM, Michel Dänzer 
> wrote:
>> On 05/09/16 04:37 AM, Ilia Mirkin wrote:
>>> On Tue, Mar 8, 2016 at 7:21 AM, Christian König
>>>  wrote:
 @@ -80,7 +82,7 @@ vlVdpOutputSurfaceCreate(VdpDevice device,
   res_tmpl.depth0 = 1;
   res_tmpl.array_size = 1;
   res_tmpl.bind = PIPE_BIND_SAMPLER_VIEW |
 PIPE_BIND_RENDER_TARGET |
 -   PIPE_BIND_LINEAR;
 +   PIPE_BIND_LINEAR | PIPE_BIND_SHARED;
>>> Hi Christian,
>>>
>>> This change appears to have semi-broken vdpau on nouveau. Whenever I
>>> flip on the OSD in mplayer, the rendering becomes *extremely* slow.
>>> However regular up-scaling without the OSD is plenty fast. This
>>> effectively is forcing the output surfaces to live in GART instead 
>>> of
>>> VRAM.
>> Strictly speaking, they'd only need to be forced to GART while 
>> they're
>> actually being shared between different GPUs. That's how it works 
>> with
>> the amdgpu and radeon kernel drivers.
> Any suggestions on how to handle this? Perhaps reallocate + copy the
> surface in st/vdpau when actual dmabuf sharing is requested?
>
> To be clear - with this change, vdpau with nouveau is unusable in the
> presence of an OSD in mplayer. The OSD comes up whenever you seek
> around in the video, so in effect, it's unusable. Used to work great.

 Well I think you should clearly figure out why adding
 PIPE_BIND_SHARED has
 such dramatic effect.
>>> Because the buffer goes into GART. And then you try to blend on it,
>>> which involves readback from GART (that's how the functions OSD is
>>> based on work, I believe). We normally don't allocate renderable
>>> surfaces or textures in GART.
>>>
 We not only need this for DMA-buf based interop, but also for the
 DRI3 based
 sharing of buffers with X.

 So that clearly sounds like a bug in nouveau to me.
>>> OK, so SHARED != GART? With nouveau, buffers are placed statically in
>>> either VRAM or GART, so I think that if it's shared it has to end up
>>> in GART, no?
>>
>> As far as I understand it no. Shared just means that we can share it
>> between applications, doesn't it? Or does it mean the buffer should be
>> shareable between GPUs?
>>
>> Could be that my understanding was wrong and so if it's the later feel
>> free to provide a patch to just remove the flag.
>>
>>> I'm pretty weak on all these concepts, as well as how the DRI3 stuff
>>> works, unfortunately.
>>
>> I have to confess I'm not so deeply into this stuff either. Marek,
>> Michel what exactly is the meaning of the flag?
>
> According to src/gallium/docs/source/screen.rst:
>
> * ``PIPE_BIND_SHARED``: A sharable buffer that can be given to another
>   process.
>
> It's also used e.g. for buffers shared via DRI3. So I'm afraid this is
> something nouveau has to deal with better.

 Any suggestions that don't involve rewriting nouveau bo handling at
 every level (kernel, ddx, mesa)?

 Otherwise I'll send a revert for this change.
>>>
>>> PIPE_BIND_SHARED means texture_get_handle is expected to be used on
>>> the resource, meaning that inter-API, inter-process, or inter-device
>>> sharing is possible. All window back buffers should have the flag. If
>>> they don't, it's a bug. If the flag causes nouveau to put the buffer
>>> in GART, it's a bug too. There is no reason to use GART for inter-API
>>> and inter-process sharing like VDPAU and DRI3 are.
>>>
>>> To be honest, the flag is pratically useless with respect to EGL and
>>> VDPAU, which allow sharing almost any texture.
>>>
>>> I suggest you fix nouveau. The first step would be to become less
>>> dependent on BIND flags whose existence is already questionable.
>>
>> As I suspected, merely flipping away from using PIPE_BIND_SHARED
>> doesn't work. By flipping the logic like this:
>>
>> diff --git a/src/gallium/drivers/nouveau/nv50/nv50_miptree.c
>> 

Re: [Mesa-dev] [PATCH] st/vdpau: remove nouveau target

2016-09-14 Thread Ilia Mirkin
On Wed, Sep 14, 2016 at 9:52 PM, Ilia Mirkin  wrote:
> Recent changes have been made to the VDPAU state tracker to make it
> unusable with nouveau. Don't provide users with an awfully slow
> "hardware" decoding option.
>
> [To preemptively answer the question that will invariably be asked -
> this is due to the state tracker's use of PIPE_BIND_SHARED, which
> nouveau uses to force GART placement to make things with with PRIME.
> However when this is used for output surfaces, which are then blended on
> (the most common way of implementing an OSD), this results in
> *incredibly* slow operation.]
>
> Signed-off-by: Ilia Mirkin 

Oops, meant to add a CC to mesa-stable, since the breakage was
introduced in 12.0.

> ---
>  src/gallium/targets/vdpau/Makefile.am | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/src/gallium/targets/vdpau/Makefile.am 
> b/src/gallium/targets/vdpau/Makefile.am
> index d388f8b..9549a23 100644
> --- a/src/gallium/targets/vdpau/Makefile.am
> +++ b/src/gallium/targets/vdpau/Makefile.am
> @@ -49,8 +49,6 @@ TARGET_DRIVERS =
>  TARGET_CPPFLAGS =
>  TARGET_LIB_DEPS =
>
> -include $(top_srcdir)/src/gallium/drivers/nouveau/Automake.inc
> -
>  include $(top_srcdir)/src/gallium/drivers/r300/Automake.inc
>  include $(top_srcdir)/src/gallium/drivers/r600/Automake.inc
>  include $(top_srcdir)/src/gallium/drivers/radeonsi/Automake.inc
> --
> 2.7.3
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] st/vdpau: remove nouveau target

2016-09-14 Thread Ilia Mirkin
Recent changes have been made to the VDPAU state tracker to make it
unusable with nouveau. Don't provide users with an awfully slow
"hardware" decoding option.

[To preemptively answer the question that will invariably be asked -
this is due to the state tracker's use of PIPE_BIND_SHARED, which
nouveau uses to force GART placement to make things with with PRIME.
However when this is used for output surfaces, which are then blended on
(the most common way of implementing an OSD), this results in
*incredibly* slow operation.]

Signed-off-by: Ilia Mirkin 
---
 src/gallium/targets/vdpau/Makefile.am | 2 --
 1 file changed, 2 deletions(-)

diff --git a/src/gallium/targets/vdpau/Makefile.am 
b/src/gallium/targets/vdpau/Makefile.am
index d388f8b..9549a23 100644
--- a/src/gallium/targets/vdpau/Makefile.am
+++ b/src/gallium/targets/vdpau/Makefile.am
@@ -49,8 +49,6 @@ TARGET_DRIVERS =
 TARGET_CPPFLAGS =
 TARGET_LIB_DEPS =
 
-include $(top_srcdir)/src/gallium/drivers/nouveau/Automake.inc
-
 include $(top_srcdir)/src/gallium/drivers/r300/Automake.inc
 include $(top_srcdir)/src/gallium/drivers/r600/Automake.inc
 include $(top_srcdir)/src/gallium/drivers/radeonsi/Automake.inc
-- 
2.7.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Problem with RX 480 on Alien: Isolation and Dota 2

2016-09-14 Thread Michel Dänzer
On 14/09/16 07:41 PM, Marek Olšák wrote:
> On Wed, Sep 14, 2016 at 5:26 AM, Michel Dänzer  wrote:
>> On 14/09/16 02:53 AM, Marek Olšák wrote:
>>>
>>> cmake .. -G Ninja -DCMAKE_INSTALL_PREFIX=/usr/llvm/x86_64-linux-gnu
>>> -DLLVM_TARGETS_TO_BUILD="X86;AMDGPU" -DLLVM_ENABLE_ASSERTIONS=O
>>>   -DCMAKE_BUILD_TYPE=RelWithDebInfo
>>> -DLLVM_BUILD_LLVM_DYLIB=ON -DLLVM_LINK_LLVM_DYLIB=ON \
>>>   -DCMAKE_C_FLAGS_RELWITHDEBINFO="-O2 -g -DNDEBUG
>>> -fno-omit-frame-pointer" \
>>>   -DCMAKE_CXX_FLAGS_RELWITHDEBINFO="-O2 -g -DNDEBUG
>>> -fno-omit-frame-pointer".
>>
>> FWIW, I recommend enabling assertions, i.e. setting
>> -DLLVM_ENABLE_ASSERTIONS=1 and removing -DNDEBUG.
> 
> That should have been:
> 
> -DLLVM_ENABLE_ASSERTIONS=ON \
> 
> It was cut when I was copy-pasting it.

Doesn't -DNDEBUG disable assertions anyway though? When was the last
time an LLVM assertion failed for you? :)


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 09/10] st/vdpau: implement the new DMA-buf based interop

2016-09-14 Thread Michel Dänzer
On 15/09/16 08:20 AM, Ilia Mirkin wrote:
> On Wed, Sep 7, 2016 at 12:06 PM, Marek Olšák  wrote:
>> On Wed, Sep 7, 2016 at 5:36 PM, Ilia Mirkin  wrote:
>>> On Wed, Sep 7, 2016 at 4:08 AM, Michel Dänzer  wrote:
 On 07/09/16 04:19 AM, Christian König wrote:
> Am 06.09.2016 um 21:05 schrieb Ilia Mirkin:
>> On Tue, Sep 6, 2016 at 2:22 PM, Christian König
>>  wrote:
>>> Am 06.09.2016 um 16:23 schrieb Ilia Mirkin:
 On Mon, Sep 5, 2016 at 2:48 AM, Michel Dänzer 
 wrote:
> On 05/09/16 04:37 AM, Ilia Mirkin wrote:
>> On Tue, Mar 8, 2016 at 7:21 AM, Christian König
>>  wrote:
>>> @@ -80,7 +82,7 @@ vlVdpOutputSurfaceCreate(VdpDevice device,
>>>   res_tmpl.depth0 = 1;
>>>   res_tmpl.array_size = 1;
>>>   res_tmpl.bind = PIPE_BIND_SAMPLER_VIEW |
>>> PIPE_BIND_RENDER_TARGET |
>>> -   PIPE_BIND_LINEAR;
>>> +   PIPE_BIND_LINEAR | PIPE_BIND_SHARED;
>> Hi Christian,
>>
>> This change appears to have semi-broken vdpau on nouveau. Whenever I
>> flip on the OSD in mplayer, the rendering becomes *extremely* slow.
>> However regular up-scaling without the OSD is plenty fast. This
>> effectively is forcing the output surfaces to live in GART instead of
>> VRAM.
> Strictly speaking, they'd only need to be forced to GART while they're
> actually being shared between different GPUs. That's how it works with
> the amdgpu and radeon kernel drivers.
 Any suggestions on how to handle this? Perhaps reallocate + copy the
 surface in st/vdpau when actual dmabuf sharing is requested?

 To be clear - with this change, vdpau with nouveau is unusable in the
 presence of an OSD in mplayer. The OSD comes up whenever you seek
 around in the video, so in effect, it's unusable. Used to work great.
>>>
>>> Well I think you should clearly figure out why adding
>>> PIPE_BIND_SHARED has
>>> such dramatic effect.
>> Because the buffer goes into GART. And then you try to blend on it,
>> which involves readback from GART (that's how the functions OSD is
>> based on work, I believe). We normally don't allocate renderable
>> surfaces or textures in GART.
>>
>>> We not only need this for DMA-buf based interop, but also for the
>>> DRI3 based
>>> sharing of buffers with X.
>>>
>>> So that clearly sounds like a bug in nouveau to me.
>> OK, so SHARED != GART? With nouveau, buffers are placed statically in
>> either VRAM or GART, so I think that if it's shared it has to end up
>> in GART, no?
>
> As far as I understand it no. Shared just means that we can share it
> between applications, doesn't it? Or does it mean the buffer should be
> shareable between GPUs?
>
> Could be that my understanding was wrong and so if it's the later feel
> free to provide a patch to just remove the flag.
>
>> I'm pretty weak on all these concepts, as well as how the DRI3 stuff
>> works, unfortunately.
>
> I have to confess I'm not so deeply into this stuff either. Marek,
> Michel what exactly is the meaning of the flag?

 According to src/gallium/docs/source/screen.rst:

 * ``PIPE_BIND_SHARED``: A sharable buffer that can be given to another
   process.

 It's also used e.g. for buffers shared via DRI3. So I'm afraid this is
 something nouveau has to deal with better.
>>>
>>> Any suggestions that don't involve rewriting nouveau bo handling at
>>> every level (kernel, ddx, mesa)?
>>>
>>> Otherwise I'll send a revert for this change.
>>
>> PIPE_BIND_SHARED means texture_get_handle is expected to be used on
>> the resource, meaning that inter-API, inter-process, or inter-device
>> sharing is possible. All window back buffers should have the flag. If
>> they don't, it's a bug. If the flag causes nouveau to put the buffer
>> in GART, it's a bug too. There is no reason to use GART for inter-API
>> and inter-process sharing like VDPAU and DRI3 are.
>>
>> To be honest, the flag is pratically useless with respect to EGL and
>> VDPAU, which allow sharing almost any texture.
>>
>> I suggest you fix nouveau. The first step would be to become less
>> dependent on BIND flags whose existence is already questionable.
> 
> As I suspected, merely flipping away from using PIPE_BIND_SHARED
> doesn't work. By flipping the logic like this:
> 
> diff --git a/src/gallium/drivers/nouveau/nv50/nv50_miptree.c
> b/src/gallium/drivers/nouveau/nv50/nv50_miptree.c
> index f2e304f..5532794 100644
> --- a/src/gallium/drivers/nouveau/nv50/nv50_miptree.c
> +++ b/src/gallium/drivers/nouveau/nv50/nv50_miptree.c
> @@ -377,7 +377,8 @@ nv50_miptree_create(struct 

[Mesa-dev] [Bug 97260] [bisected] R9 290 low performance in Linux 4.7

2016-09-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=97260

--- Comment #49 from Michel Dänzer  ---
Seems like there are various different issues at play here. The bottom line is:
Don't expect your issue to get fixed without bisecting it.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] mesa: Expose RESET_NOTIFICATION_STRATEGY with KHR_robustness.

2016-09-14 Thread Ilia Mirkin
Reviewed-by: Ilia Mirkin 

On Wed, Sep 14, 2016 at 8:47 PM, Kenneth Graunke  wrote:
> This is supposed to be exposed with the GL_KHR_robustness extension,
> which we support on ES 2.0 and later.  On desktop GL, it's also exposed
> by GL_ARB_robustness, which is supported by all drivers ("dummy_true").
> so we also allow desktop GL.
>
> Fixes:
> - ES32-CTS.robust.robustness.noResetNotification
> - ES32-CTS.robust.robustness.loseContextOnReset
>
> Cc: mesa-sta...@lists.freedesktop.org
> Signed-off-by: Kenneth Graunke 
> ---
>  src/mesa/main/get.c  | 7 +++
>  src/mesa/main/get_hash_params.py | 6 +++---
>  2 files changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/src/mesa/main/get.c b/src/mesa/main/get.c
> index 3cabb2b..e7ebc7f 100644
> --- a/src/mesa/main/get.c
> +++ b/src/mesa/main/get.c
> @@ -423,6 +423,13 @@ static const int extra_ES32[] = {
> EXTRA_END
>  };
>
> +static const int extra_KHR_robustness_or_GL[] = {
> +   EXT(KHR_robustness),
> +   EXTRA_API_GL,
> +   EXTRA_API_GL_CORE,
> +   EXTRA_END
> +};
> +
>  EXTRA_EXT(ARB_texture_cube_map);
>  EXTRA_EXT(EXT_texture_array);
>  EXTRA_EXT(NV_fog_distance);
> diff --git a/src/mesa/main/get_hash_params.py 
> b/src/mesa/main/get_hash_params.py
> index 4b86697..1f63dc3 100644
> --- a/src/mesa/main/get_hash_params.py
> +++ b/src/mesa/main/get_hash_params.py
> @@ -341,6 +341,9 @@ descriptor=[
>
>  # GL_KHR_blend_equation_advanced_coherent
>[ "BLEND_ADVANCED_COHERENT_KHR", "CONTEXT_BOOL(Color.BlendCoherent), 
> extra_KHR_blend_equation_advanced_coherent" ],
> +
> +# GL_ARB_robustness / GL_KHR_robustness
> +  [ "RESET_NOTIFICATION_STRATEGY_ARB", "CONTEXT_ENUM(Const.ResetStrategy), 
> extra_KHR_robustness_or_GL" ],
>  ]},
>
>  # GLES3 is not a typo.
> @@ -901,9 +904,6 @@ descriptor=[
>  # GL 3.2
>[ "CONTEXT_PROFILE_MASK", "CONTEXT_INT(Const.ProfileMask), 
> extra_version_32" ],
>
> -# GL_ARB_robustness
> -  [ "RESET_NOTIFICATION_STRATEGY_ARB", "CONTEXT_ENUM(Const.ResetStrategy), 
> NO_EXTRA" ],
> -
>  # GL_ARB_timer_query
>[ "TIMESTAMP", "LOC_CUSTOM, TYPE_INT64, 0, extra_ARB_timer_query" ],
>
> --
> 2.9.3
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] mesa: Expose RESET_NOTIFICATION_STRATEGY with KHR_robustness.

2016-09-14 Thread Kenneth Graunke
This is supposed to be exposed with the GL_KHR_robustness extension,
which we support on ES 2.0 and later.  On desktop GL, it's also exposed
by GL_ARB_robustness, which is supported by all drivers ("dummy_true").
so we also allow desktop GL.

Fixes:
- ES32-CTS.robust.robustness.noResetNotification
- ES32-CTS.robust.robustness.loseContextOnReset

Cc: mesa-sta...@lists.freedesktop.org
Signed-off-by: Kenneth Graunke 
---
 src/mesa/main/get.c  | 7 +++
 src/mesa/main/get_hash_params.py | 6 +++---
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/src/mesa/main/get.c b/src/mesa/main/get.c
index 3cabb2b..e7ebc7f 100644
--- a/src/mesa/main/get.c
+++ b/src/mesa/main/get.c
@@ -423,6 +423,13 @@ static const int extra_ES32[] = {
EXTRA_END
 };
 
+static const int extra_KHR_robustness_or_GL[] = {
+   EXT(KHR_robustness),
+   EXTRA_API_GL,
+   EXTRA_API_GL_CORE,
+   EXTRA_END
+};
+
 EXTRA_EXT(ARB_texture_cube_map);
 EXTRA_EXT(EXT_texture_array);
 EXTRA_EXT(NV_fog_distance);
diff --git a/src/mesa/main/get_hash_params.py b/src/mesa/main/get_hash_params.py
index 4b86697..1f63dc3 100644
--- a/src/mesa/main/get_hash_params.py
+++ b/src/mesa/main/get_hash_params.py
@@ -341,6 +341,9 @@ descriptor=[
 
 # GL_KHR_blend_equation_advanced_coherent
   [ "BLEND_ADVANCED_COHERENT_KHR", "CONTEXT_BOOL(Color.BlendCoherent), 
extra_KHR_blend_equation_advanced_coherent" ],
+
+# GL_ARB_robustness / GL_KHR_robustness
+  [ "RESET_NOTIFICATION_STRATEGY_ARB", "CONTEXT_ENUM(Const.ResetStrategy), 
extra_KHR_robustness_or_GL" ],
 ]},
 
 # GLES3 is not a typo.
@@ -901,9 +904,6 @@ descriptor=[
 # GL 3.2
   [ "CONTEXT_PROFILE_MASK", "CONTEXT_INT(Const.ProfileMask), extra_version_32" 
],
 
-# GL_ARB_robustness
-  [ "RESET_NOTIFICATION_STRATEGY_ARB", "CONTEXT_ENUM(Const.ResetStrategy), 
NO_EXTRA" ],
-
 # GL_ARB_timer_query
   [ "TIMESTAMP", "LOC_CUSTOM, TYPE_INT64, 0, extra_ARB_timer_query" ],
 
-- 
2.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 97260] [bisected] R9 290 low performance in Linux 4.7

2016-09-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=97260

--- Comment #48 from alvarex  ---
edit: It's from september the 6th not from the 7th.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 97260] [bisected] R9 290 low performance in Linux 4.7

2016-09-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=97260

--- Comment #47 from alvarex  ---
*edit* I get 50 fps we newer kernel 4.8rc5; with kernel 4.8rc5 and Mesa 12.0.2
the performance is the same.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 97260] [bisected] R9 290 low performance in Linux 4.7

2016-09-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=97260

--- Comment #45 from Clésio Luiz  ---
Padoka PPA finally updated, version 2.1~git1600912162600.546bc07~x~padoka0.

Here the bug continues. This time a take the time to test in various games.
Valve's Source Engine take less hit by this bug, only about 50% loss in
performance. But others, like American Truck Simulator and Unigine Valley take
a 70/80 % hit in loss of performance, both cannot pass 10/12 FPS in a beefy
Core i7 and a R9 290.

I tested in kernel 4.7.3 and 4.8-RC6.

--- Comment #46 from alvarex  ---
I have hit this bug too with r7 260x. I don't have time for bisecting but with
Mesa from 7th september the perfomance is fine . I ve also noticed some
perfomance delta dif between the 12.0.2 release and git from september 7th .
With dirt showdown with kernel 4.6.7 and LIBL_DRI3_DISABLE, I get an average of
35~30 framerates with Mesa from 7th september I get 50 fps. 
Anyway here I have setup a repo if someone else on Opensuse is hitting the same
bug. 

https://build.opensuse.org/package/show?project=home%3Aalvarex%3Abranches%3Ahome%3Apontostroy%3AX11=Mesa

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 09/10] st/vdpau: implement the new DMA-buf based interop

2016-09-14 Thread Ilia Mirkin
On Wed, Sep 7, 2016 at 12:06 PM, Marek Olšák  wrote:
> On Wed, Sep 7, 2016 at 5:36 PM, Ilia Mirkin  wrote:
>> On Wed, Sep 7, 2016 at 4:08 AM, Michel Dänzer  wrote:
>>> On 07/09/16 04:19 AM, Christian König wrote:
 Am 06.09.2016 um 21:05 schrieb Ilia Mirkin:
> On Tue, Sep 6, 2016 at 2:22 PM, Christian König
>  wrote:
>> Am 06.09.2016 um 16:23 schrieb Ilia Mirkin:
>>> On Mon, Sep 5, 2016 at 2:48 AM, Michel Dänzer 
>>> wrote:
 On 05/09/16 04:37 AM, Ilia Mirkin wrote:
> On Tue, Mar 8, 2016 at 7:21 AM, Christian König
>  wrote:
>> @@ -80,7 +82,7 @@ vlVdpOutputSurfaceCreate(VdpDevice device,
>>   res_tmpl.depth0 = 1;
>>   res_tmpl.array_size = 1;
>>   res_tmpl.bind = PIPE_BIND_SAMPLER_VIEW |
>> PIPE_BIND_RENDER_TARGET |
>> -   PIPE_BIND_LINEAR;
>> +   PIPE_BIND_LINEAR | PIPE_BIND_SHARED;
> Hi Christian,
>
> This change appears to have semi-broken vdpau on nouveau. Whenever I
> flip on the OSD in mplayer, the rendering becomes *extremely* slow.
> However regular up-scaling without the OSD is plenty fast. This
> effectively is forcing the output surfaces to live in GART instead of
> VRAM.
 Strictly speaking, they'd only need to be forced to GART while they're
 actually being shared between different GPUs. That's how it works with
 the amdgpu and radeon kernel drivers.
>>> Any suggestions on how to handle this? Perhaps reallocate + copy the
>>> surface in st/vdpau when actual dmabuf sharing is requested?
>>>
>>> To be clear - with this change, vdpau with nouveau is unusable in the
>>> presence of an OSD in mplayer. The OSD comes up whenever you seek
>>> around in the video, so in effect, it's unusable. Used to work great.
>>
>> Well I think you should clearly figure out why adding
>> PIPE_BIND_SHARED has
>> such dramatic effect.
> Because the buffer goes into GART. And then you try to blend on it,
> which involves readback from GART (that's how the functions OSD is
> based on work, I believe). We normally don't allocate renderable
> surfaces or textures in GART.
>
>> We not only need this for DMA-buf based interop, but also for the
>> DRI3 based
>> sharing of buffers with X.
>>
>> So that clearly sounds like a bug in nouveau to me.
> OK, so SHARED != GART? With nouveau, buffers are placed statically in
> either VRAM or GART, so I think that if it's shared it has to end up
> in GART, no?

 As far as I understand it no. Shared just means that we can share it
 between applications, doesn't it? Or does it mean the buffer should be
 shareable between GPUs?

 Could be that my understanding was wrong and so if it's the later feel
 free to provide a patch to just remove the flag.

> I'm pretty weak on all these concepts, as well as how the DRI3 stuff
> works, unfortunately.

 I have to confess I'm not so deeply into this stuff either. Marek,
 Michel what exactly is the meaning of the flag?
>>>
>>> According to src/gallium/docs/source/screen.rst:
>>>
>>> * ``PIPE_BIND_SHARED``: A sharable buffer that can be given to another
>>>   process.
>>>
>>> It's also used e.g. for buffers shared via DRI3. So I'm afraid this is
>>> something nouveau has to deal with better.
>>
>> Any suggestions that don't involve rewriting nouveau bo handling at
>> every level (kernel, ddx, mesa)?
>>
>> Otherwise I'll send a revert for this change.
>
> PIPE_BIND_SHARED means texture_get_handle is expected to be used on
> the resource, meaning that inter-API, inter-process, or inter-device
> sharing is possible. All window back buffers should have the flag. If
> they don't, it's a bug. If the flag causes nouveau to put the buffer
> in GART, it's a bug too. There is no reason to use GART for inter-API
> and inter-process sharing like VDPAU and DRI3 are.
>
> To be honest, the flag is pratically useless with respect to EGL and
> VDPAU, which allow sharing almost any texture.
>
> I suggest you fix nouveau. The first step would be to become less
> dependent on BIND flags whose existence is already questionable.

As I suspected, merely flipping away from using PIPE_BIND_SHARED
doesn't work. By flipping the logic like this:

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_miptree.c
b/src/gallium/drivers/nouveau/nv50/nv50_miptree.c
index f2e304f..5532794 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_miptree.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_miptree.c
@@ -377,7 +377,8 @@ nv50_miptree_create(struct pipe_screen *pscreen,
}
bo_config.nv50.tile_mode = mt->level[0].tile_mode;

-   if (!bo_config.nv50.memtype && (pt->bind & PIPE_BIND_SHARED))
+   

[Mesa-dev] [PATCH v2] st/vdpau: fix argument type to vlVdpOutputSurfaceDMABuf

2016-09-14 Thread Ilia Mirkin
Signed-off-by: Ilia Mirkin 
---

v1 -> v2: adjust typedef in vdpau_dmabuf.h, per Nayan

 src/gallium/include/state_tracker/vdpau_dmabuf.h | 2 +-
 src/gallium/state_trackers/vdpau/output.c| 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/gallium/include/state_tracker/vdpau_dmabuf.h 
b/src/gallium/include/state_tracker/vdpau_dmabuf.h
index 886c344..f838c92 100644
--- a/src/gallium/include/state_tracker/vdpau_dmabuf.h
+++ b/src/gallium/include/state_tracker/vdpau_dmabuf.h
@@ -87,7 +87,7 @@ typedef VdpStatus VdpVideoSurfaceDMABuf(
 );
 
 typedef VdpStatus VdpOutputSurfaceDMABuf(
-   VdpVideoSurface   surface,
+   VdpOutputSurface  surface,
struct VdpSurfaceDMABufDesc * result
 );
 
diff --git a/src/gallium/state_trackers/vdpau/output.c 
b/src/gallium/state_trackers/vdpau/output.c
index 85751ea..f4d62a3 100644
--- a/src/gallium/state_trackers/vdpau/output.c
+++ b/src/gallium/state_trackers/vdpau/output.c
@@ -773,7 +773,7 @@ struct pipe_resource 
*vlVdpOutputSurfaceGallium(VdpOutputSurface surface)
return vlsurface->surface->texture;
 }
 
-VdpStatus vlVdpOutputSurfaceDMABuf(VdpVideoSurface surface,
+VdpStatus vlVdpOutputSurfaceDMABuf(VdpOutputSurface surface,
struct VdpSurfaceDMABufDesc *result)
 {
vlVdpOutputSurface *vlsurface;
-- 
2.7.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] st/vdpau: fix argument type to vlVdpOutputSurfaceDMABuf

2016-09-14 Thread Nayan Deshmukh
Hi Ilia,

You also need to change the arguments in gallium/include/st/vdpau_dmabuf.h.

Regards,
Nayan.

On Thu, Sep 15, 2016 at 3:56 AM, Ilia Mirkin  wrote:

> Signed-off-by: Ilia Mirkin 
> ---
>  src/gallium/state_trackers/vdpau/output.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/gallium/state_trackers/vdpau/output.c
> b/src/gallium/state_trackers/vdpau/output.c
> index 85751ea..f4d62a3 100644
> --- a/src/gallium/state_trackers/vdpau/output.c
> +++ b/src/gallium/state_trackers/vdpau/output.c
> @@ -773,7 +773,7 @@ struct pipe_resource 
> *vlVdpOutputSurfaceGallium(VdpOutputSurface
> surface)
> return vlsurface->surface->texture;
>  }
>
> -VdpStatus vlVdpOutputSurfaceDMABuf(VdpVideoSurface surface,
> +VdpStatus vlVdpOutputSurfaceDMABuf(VdpOutputSurface surface,
> struct VdpSurfaceDMABufDesc *result)
>  {
> vlVdpOutputSurface *vlsurface;
> --
> 2.7.3
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] st/vdpau: fix argument type to vlVdpOutputSurfaceDMABuf

2016-09-14 Thread Ilia Mirkin
Signed-off-by: Ilia Mirkin 
---
 src/gallium/state_trackers/vdpau/output.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/state_trackers/vdpau/output.c 
b/src/gallium/state_trackers/vdpau/output.c
index 85751ea..f4d62a3 100644
--- a/src/gallium/state_trackers/vdpau/output.c
+++ b/src/gallium/state_trackers/vdpau/output.c
@@ -773,7 +773,7 @@ struct pipe_resource 
*vlVdpOutputSurfaceGallium(VdpOutputSurface surface)
return vlsurface->surface->texture;
 }
 
-VdpStatus vlVdpOutputSurfaceDMABuf(VdpVideoSurface surface,
+VdpStatus vlVdpOutputSurfaceDMABuf(VdpOutputSurface surface,
struct VdpSurfaceDMABufDesc *result)
 {
vlVdpOutputSurface *vlsurface;
-- 
2.7.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/3] i965/reg: Make brw_sr0_reg take a subnr and return a vec1 reg

2016-09-14 Thread Jason Ekstrand
The state register sr0 is really a collection of dwords not a SIMD8
anything.  It's much more convenient for brw_sr0_reg to return the
particular dword you're looking for rather than a giant blob you have to
massage into what you want.

Signed-off-by: Jason Ekstrand 
---
 src/mesa/drivers/dri/i965/brw_fs.cpp |  2 +-
 src/mesa/drivers/dri/i965/brw_reg.h  | 20 
 2 files changed, 9 insertions(+), 13 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 75642d3..ff0b25e 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -6268,7 +6268,7 @@ fs_visitor::run_cs()
if (devinfo->is_haswell && prog_data->total_shared > 0) {
   /* Move SLM index from g0.0[27:24] to sr0.1[11:8] */
   const fs_builder abld = bld.exec_all().group(1, 0);
-  abld.MOV(retype(suboffset(brw_sr0_reg(), 1), BRW_REGISTER_TYPE_UW),
+  abld.MOV(retype(brw_sr0_reg(1), BRW_REGISTER_TYPE_UW),
suboffset(retype(brw_vec1_grf(0, 0), BRW_REGISTER_TYPE_UW), 1));
}
 
diff --git a/src/mesa/drivers/dri/i965/brw_reg.h 
b/src/mesa/drivers/dri/i965/brw_reg.h
index d6f22ed..531d24e 100644
--- a/src/mesa/drivers/dri/i965/brw_reg.h
+++ b/src/mesa/drivers/dri/i965/brw_reg.h
@@ -567,6 +567,12 @@ brw_uw1_reg(enum brw_reg_file file, unsigned nr, unsigned 
subnr)
 }
 
 static inline struct brw_reg
+brw_ud1_reg(enum brw_reg_file file, unsigned nr, unsigned subnr)
+{
+   return suboffset(retype(brw_vec1_reg(file, nr, 0), BRW_REGISTER_TYPE_UD), 
subnr);
+}
+
+static inline struct brw_reg
 brw_imm_reg(enum brw_reg_type type)
 {
return brw_reg(BRW_IMMEDIATE_VALUE,
@@ -789,19 +795,9 @@ brw_notification_reg(void)
 }
 
 static inline struct brw_reg
-brw_sr0_reg(void)
+brw_sr0_reg(unsigned subnr)
 {
-   return brw_reg(BRW_ARCHITECTURE_REGISTER_FILE,
-  BRW_ARF_STATE,
-  0,
-  0,
-  0,
-  BRW_REGISTER_TYPE_UD,
-  BRW_VERTICAL_STRIDE_8,
-  BRW_WIDTH_8,
-  BRW_HORIZONTAL_STRIDE_1,
-  BRW_SWIZZLE_XYZW,
-  WRITEMASK_XYZW);
+   return brw_ud1_reg(BRW_ARCHITECTURE_REGISTER_FILE, BRW_ARF_STATE, subnr);
 }
 
 static inline struct brw_reg
-- 
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/3] i965/fs: Take Dispatch/Vector mask into account in FIND_LIVE_CHANNEL

2016-09-14 Thread Jason Ekstrand
On at least Sky Lake, ce0 does not contain the full story as far as enabled
channels goes.  It is possible to have completely disabled channels where
the corresponding bits in ce0 are 1.  In order to get the correct execution
mask, you have to mask off those channels which were disabled from the
beginning by taking the AND of ce0 with either sr0.2 or sr0.3 depending on
the shader stage.  Failure to do so can result in FIND_LIVE_CHANNEL
returning a completely dead channel.

Signed-off-by: Jason Ekstrand 
Cc: Francisco Jerez 
---
 src/mesa/drivers/dri/i965/brw_eu.h   |  3 ++-
 src/mesa/drivers/dri/i965/brw_eu_emit.c  | 33 +---
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp   |  7 +++--
 src/mesa/drivers/dri/i965/brw_reg.h  | 12 +
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp |  2 +-
 5 files changed, 44 insertions(+), 13 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_eu.h 
b/src/mesa/drivers/dri/i965/brw_eu.h
index 3e52764..737a335 100644
--- a/src/mesa/drivers/dri/i965/brw_eu.h
+++ b/src/mesa/drivers/dri/i965/brw_eu.h
@@ -488,7 +488,8 @@ brw_pixel_interpolator_query(struct brw_codegen *p,
 
 void
 brw_find_live_channel(struct brw_codegen *p,
-  struct brw_reg dst);
+  struct brw_reg dst,
+  struct brw_reg mask);
 
 void
 brw_broadcast(struct brw_codegen *p,
diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c 
b/src/mesa/drivers/dri/i965/brw_eu_emit.c
index 3b12030..04fca74 100644
--- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
@@ -3361,7 +3361,8 @@ brw_pixel_interpolator_query(struct brw_codegen *p,
 }
 
 void
-brw_find_live_channel(struct brw_codegen *p, struct brw_reg dst)
+brw_find_live_channel(struct brw_codegen *p, struct brw_reg dst,
+  struct brw_reg mask)
 {
const struct gen_device_info *devinfo = p->devinfo;
const unsigned exec_size = 1 << brw_inst_exec_size(devinfo, p->current);
@@ -3377,18 +3378,32 @@ brw_find_live_channel(struct brw_codegen *p, struct 
brw_reg dst)
 
   if (devinfo->gen >= 8) {
  /* Getting the first active channel index is easy on Gen8: Just find
-  * the first bit set in the mask register.  The same register exists
-  * on HSW already but it reads back as all ones when the current
+  * the first bit set in the execution mask.  The register exists
+  * on HSW already but it reads back ec0 as all ones when the current
   * instruction has execution masking disabled, so it's kind of
   * useless.
   */
- inst = brw_FBL(p, vec1(dst),
-retype(brw_mask_reg(0), BRW_REGISTER_TYPE_UD));
+ struct brw_reg exec_mask =
+retype(brw_mask_reg(0), BRW_REGISTER_TYPE_UD);
+
+ if (mask.file != BRW_IMMEDIATE_VALUE || mask.ud != 0x) {
+/* Unfortunately, ec0 does not contain exactly what we want.  It
+ * has to first be combined with the dispatch (or vector) mask to
+ * mask off those channels which were never dispatched by the
+ * hardware.
+ */
+brw_SHR(p, vec1(dst), mask, brw_imm_ud(qtr_control * 8));
 
- /* Quarter control has the effect of magically shifting the value of
-  * this register so you'll get the first active channel relative to
-  * the specified quarter control as result.
-  */
+/* Quarter control has the effect of magically shifting the value 
of
+ * ec0 so you'll get the first active channel relative to the
+ * specified quarter control as result.
+ */
+brw_AND(p, vec1(dst), exec_mask, vec1(dst));
+
+exec_mask = vec1(dst);
+ }
+
+ inst = brw_FBL(p, vec1(dst), exec_mask);
   } else {
  const struct brw_reg flag = brw_flag_reg(1, 0);
 
diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index 2f4ba7b..52f5308 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -2040,9 +2040,12 @@ fs_generator::generate_code(const cfg_t *cfg, int 
dispatch_width)
  generate_set_simd4x2_offset(inst, dst, src[0]);
  break;
 
-  case SHADER_OPCODE_FIND_LIVE_CHANNEL:
- brw_find_live_channel(p, dst);
+  case SHADER_OPCODE_FIND_LIVE_CHANNEL: {
+ const struct brw_reg mask =
+stage == MESA_SHADER_FRAGMENT ? brw_vmask_reg() : brw_dmask_reg();
+ brw_find_live_channel(p, dst, mask);
  break;
+  }
 
   case SHADER_OPCODE_BROADCAST:
  assert(inst->force_writemask_all);
diff --git a/src/mesa/drivers/dri/i965/brw_reg.h 
b/src/mesa/drivers/dri/i965/brw_reg.h
index 531d24e..b77deee 100644
--- a/src/mesa/drivers/dri/i965/brw_reg.h

[Mesa-dev] [PATCH 3/3] i965/vec4: Always use the predicated MOV path for FIND_LIVE_CHANNEL

2016-09-14 Thread Jason Ekstrand
The old method of looking at ce0 assumed that the enabled channels are
tightly packed.  There is no documentation that guarantees this, however,
so it's a bit dangerous to assume it.  In any case, the predicated MOV
implementation was only 2 instructions so it's not that bad to just use
that all the time and we know for 100% sure it's correct.

Signed-off-by: Jason Ekstrand 
---
 src/mesa/drivers/dri/i965/brw_eu_emit.c | 32 +++-
 1 file changed, 11 insertions(+), 21 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c 
b/src/mesa/drivers/dri/i965/brw_eu_emit.c
index 04fca74..836d555 100644
--- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
@@ -3437,28 +3437,18 @@ brw_find_live_channel(struct brw_codegen *p, struct 
brw_reg dst,
} else {
   brw_set_default_mask_control(p, BRW_MASK_DISABLE);
 
-  if (devinfo->gen >= 8) {
- /* In SIMD4x2 mode the first active channel index is just the
-  * negation of the first bit of the mask register.
-  */
- inst = brw_AND(p, brw_writemask(dst, WRITEMASK_X),
-negate(retype(brw_mask_reg(0), BRW_REGISTER_TYPE_UD)),
-brw_imm_ud(1));
-
-  } else {
- /* Overwrite the destination without and with execution masking to
-  * find out which of the channels is active.
-  */
- brw_push_insn_state(p);
- brw_set_default_exec_size(p, BRW_EXECUTE_4);
- brw_MOV(p, brw_writemask(vec4(dst), WRITEMASK_X),
- brw_imm_ud(1));
+  /* Overwrite the destination without and with execution masking to
+   * find out which of the channels is active.
+   */
+  brw_push_insn_state(p);
+  brw_set_default_exec_size(p, BRW_EXECUTE_4);
+  brw_MOV(p, brw_writemask(vec4(dst), WRITEMASK_X),
+  brw_imm_ud(1));
 
- inst = brw_MOV(p, brw_writemask(vec4(dst), WRITEMASK_X),
-brw_imm_ud(0));
- brw_pop_insn_state(p);
- brw_inst_set_mask_control(devinfo, inst, BRW_MASK_ENABLE);
-  }
+  inst = brw_MOV(p, brw_writemask(vec4(dst), WRITEMASK_X),
+ brw_imm_ud(0));
+  brw_pop_insn_state(p);
+  brw_inst_set_mask_control(devinfo, inst, BRW_MASK_ENABLE);
}
 
brw_pop_insn_state(p);
-- 
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/4] i965/fs: Use sample interpolation for interpolateAtCentroid in persample mode

2016-09-14 Thread Anuj Phogat
On Wed, Sep 14, 2016 at 1:31 PM, Jason Ekstrand  wrote:
>
>
> On Wed, Sep 14, 2016 at 1:29 PM, Anuj Phogat  wrote:
>>
>> On Wed, Sep 14, 2016 at 10:45 AM, Jason Ekstrand 
>> wrote:
>> > From the ARB_gpu_shader5 spec:
>> >
>> >The built-in functions interpolateAtCentroid() and
>> > interpolateAtSample()
>> >will sample variables as though they were declared with the
>> > "centroid"
>> >or "sample" qualifiers, respectively.
>> >
>> > When running with persample dispatch forced by the API, we interpolate
>> > anything that isn't flat as if it's qualified by "sample".  In order to
>> > keep interpolateAtCentroid() consistent with the "centroid" qualifier,
>> > we
>> > need to make interpolateAtCentroid() do sample interpolation instead.
>> > Nothing in the GLSL spec guarantees that the result of
>> > interpolateAtCentroid is uniform across samples in any way, so this is a
>> > perfectly fine thing to do.
>> >
>> This explanation sounds good to me. To be consistent with what
>> we do in case of per sample interpolation, shouldn't we do sample
>> interpolation in case of InterpolateAtOffset() too? This series
>> doesn't seem to include it.
>
>
> No.  interpolateAtOffset ask that the input be interpolated at a particular
> offset relative to the pixel center.  I believe we have to respect that.
>
Series is: Reviewed-by: Anuj Phogat 
>>
>> > Fixes 8 of the new dEQP-VK.pipeline.multisample_interpolation.* Vulkan
>> > CTS
>> > tests that specifically validate consistency between the "sample"
>> > qualifier
>> > and interpolateAtSample()
>> >
>> > Signed-off-by: Jason Ekstrand 
>> > ---
>> >  src/mesa/drivers/dri/i965/brw_fs.cpp | 26 ++
>> >  1 file changed, 26 insertions(+)
>> >
>> > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp
>> > b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> > index 75642d3..9dbb699 100644
>> > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
>> > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> > @@ -6497,6 +6497,32 @@ brw_nir_set_default_interpolation(const struct
>> > gen_device_info *devinfo,
>> >   var->data.sample = false;
>> >}
>> > }
>> > +
>> > +   if (per_sample_interpolation) {
>> > +  nir_foreach_block(block, nir_shader_get_entrypoint(nir)) {
>> > + nir_foreach_instr(instr, block) {
>> > +if (instr->type != nir_instr_type_intrinsic)
>> > +   continue;
>> > +
>> > +nir_intrinsic_instr *intrin =
>> > nir_instr_as_intrinsic(instr);
>> > +if (intrin->intrinsic !=
>> > nir_intrinsic_interp_var_at_centroid)
>> > +   continue;
>> > +
>> > +nir_variable *var = intrin->variables[0]->var;
>> > +if (var->data.interpolation == INTERP_MODE_FLAT)
>> > +   continue;
>> > +
>> > +/* The description of the interpolateAtCentroid intrinsic
>> > is that
>> > + * it interpolates the variable as if it had the "centroid"
>> > + * qualifier.  When executing with
>> > per_sample_interpolation, this
>> > + * is equivalent to having the "sample" qualifier.  Just
>> > convert
>> > + * it to a load_var instead.
>> > + */
>> > +assert(var->data.sample);
>> > +intrin->intrinsic = nir_intrinsic_load_var;
>> > + }
>> > +  }
>> > +   }
>> >  }
>> >
>> >  /**
>> > --
>> > 2.5.0.400.gff86faf
>> >
>> > ___
>> > mesa-dev mailing list
>> > mesa-dev@lists.freedesktop.org
>> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/4] i965/fs: Use sample interpolation for interpolateAtCentroid in persample mode

2016-09-14 Thread Jason Ekstrand
On Wed, Sep 14, 2016 at 1:29 PM, Anuj Phogat  wrote:

> On Wed, Sep 14, 2016 at 10:45 AM, Jason Ekstrand 
> wrote:
> > From the ARB_gpu_shader5 spec:
> >
> >The built-in functions interpolateAtCentroid() and
> interpolateAtSample()
> >will sample variables as though they were declared with the "centroid"
> >or "sample" qualifiers, respectively.
> >
> > When running with persample dispatch forced by the API, we interpolate
> > anything that isn't flat as if it's qualified by "sample".  In order to
> > keep interpolateAtCentroid() consistent with the "centroid" qualifier, we
> > need to make interpolateAtCentroid() do sample interpolation instead.
> > Nothing in the GLSL spec guarantees that the result of
> > interpolateAtCentroid is uniform across samples in any way, so this is a
> > perfectly fine thing to do.
> >
> This explanation sounds good to me. To be consistent with what
> we do in case of per sample interpolation, shouldn't we do sample
> interpolation in case of InterpolateAtOffset() too? This series
> doesn't seem to include it.
>

No.  interpolateAtOffset ask that the input be interpolated at a particular
offset relative to the pixel center.  I believe we have to respect that.


> > Fixes 8 of the new dEQP-VK.pipeline.multisample_interpolation.* Vulkan
> CTS
> > tests that specifically validate consistency between the "sample"
> qualifier
> > and interpolateAtSample()
> >
> > Signed-off-by: Jason Ekstrand 
> > ---
> >  src/mesa/drivers/dri/i965/brw_fs.cpp | 26 ++
> >  1 file changed, 26 insertions(+)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp
> b/src/mesa/drivers/dri/i965/brw_fs.cpp
> > index 75642d3..9dbb699 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> > @@ -6497,6 +6497,32 @@ brw_nir_set_default_interpolation(const struct
> gen_device_info *devinfo,
> >   var->data.sample = false;
> >}
> > }
> > +
> > +   if (per_sample_interpolation) {
> > +  nir_foreach_block(block, nir_shader_get_entrypoint(nir)) {
> > + nir_foreach_instr(instr, block) {
> > +if (instr->type != nir_instr_type_intrinsic)
> > +   continue;
> > +
> > +nir_intrinsic_instr *intrin = nir_instr_as_intrinsic(instr);
> > +if (intrin->intrinsic != nir_intrinsic_interp_var_at_
> centroid)
> > +   continue;
> > +
> > +nir_variable *var = intrin->variables[0]->var;
> > +if (var->data.interpolation == INTERP_MODE_FLAT)
> > +   continue;
> > +
> > +/* The description of the interpolateAtCentroid intrinsic
> is that
> > + * it interpolates the variable as if it had the "centroid"
> > + * qualifier.  When executing with
> per_sample_interpolation, this
> > + * is equivalent to having the "sample" qualifier.  Just
> convert
> > + * it to a load_var instead.
> > + */
> > +assert(var->data.sample);
> > +intrin->intrinsic = nir_intrinsic_load_var;
> > + }
> > +  }
> > +   }
> >  }
> >
> >  /**
> > --
> > 2.5.0.400.gff86faf
> >
> > ___
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/4] i965/fs: Use sample interpolation for interpolateAtCentroid in persample mode

2016-09-14 Thread Anuj Phogat
On Wed, Sep 14, 2016 at 10:45 AM, Jason Ekstrand  wrote:
> From the ARB_gpu_shader5 spec:
>
>The built-in functions interpolateAtCentroid() and interpolateAtSample()
>will sample variables as though they were declared with the "centroid"
>or "sample" qualifiers, respectively.
>
> When running with persample dispatch forced by the API, we interpolate
> anything that isn't flat as if it's qualified by "sample".  In order to
> keep interpolateAtCentroid() consistent with the "centroid" qualifier, we
> need to make interpolateAtCentroid() do sample interpolation instead.
> Nothing in the GLSL spec guarantees that the result of
> interpolateAtCentroid is uniform across samples in any way, so this is a
> perfectly fine thing to do.
>
This explanation sounds good to me. To be consistent with what
we do in case of per sample interpolation, shouldn't we do sample
interpolation in case of InterpolateAtOffset() too? This series
doesn't seem to include it.

> Fixes 8 of the new dEQP-VK.pipeline.multisample_interpolation.* Vulkan CTS
> tests that specifically validate consistency between the "sample" qualifier
> and interpolateAtSample()
>
> Signed-off-by: Jason Ekstrand 
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 26 ++
>  1 file changed, 26 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index 75642d3..9dbb699 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -6497,6 +6497,32 @@ brw_nir_set_default_interpolation(const struct 
> gen_device_info *devinfo,
>   var->data.sample = false;
>}
> }
> +
> +   if (per_sample_interpolation) {
> +  nir_foreach_block(block, nir_shader_get_entrypoint(nir)) {
> + nir_foreach_instr(instr, block) {
> +if (instr->type != nir_instr_type_intrinsic)
> +   continue;
> +
> +nir_intrinsic_instr *intrin = nir_instr_as_intrinsic(instr);
> +if (intrin->intrinsic != nir_intrinsic_interp_var_at_centroid)
> +   continue;
> +
> +nir_variable *var = intrin->variables[0]->var;
> +if (var->data.interpolation == INTERP_MODE_FLAT)
> +   continue;
> +
> +/* The description of the interpolateAtCentroid intrinsic is that
> + * it interpolates the variable as if it had the "centroid"
> + * qualifier.  When executing with per_sample_interpolation, this
> + * is equivalent to having the "sample" qualifier.  Just convert
> + * it to a load_var instead.
> + */
> +assert(var->data.sample);
> +intrin->intrinsic = nir_intrinsic_load_var;
> + }
> +  }
> +   }
>  }
>
>  /**
> --
> 2.5.0.400.gff86faf
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v3 3/7] intel/isl: Add support for 1-D compressed textures

2016-09-14 Thread Jason Ekstrand
Compressed 1-D textures are a well-defined thing in both GL and Vulkan.

v2: Fix some asserts (Nanley)

Signed-off-by: Jason Ekstrand 
---
 src/intel/isl/isl.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/src/intel/isl/isl.c b/src/intel/isl/isl.c
index a75fddf..710c990 100644
--- a/src/intel/isl/isl.c
+++ b/src/intel/isl/isl.c
@@ -518,7 +518,6 @@ isl_calc_phys_level0_extent_sa(const struct isl_device *dev,
   assert(info->height == 1);
   assert(info->depth == 1);
   assert(info->samples == 1);
-  assert(!isl_format_is_compressed(info->format));
 
   switch (dim_layout) {
   case ISL_DIM_LAYOUT_GEN4_3D:
@@ -527,8 +526,8 @@ isl_calc_phys_level0_extent_sa(const struct isl_device *dev,
   case ISL_DIM_LAYOUT_GEN9_1D:
   case ISL_DIM_LAYOUT_GEN4_2D:
  *phys_level0_sa = (struct isl_extent4d) {
-.w = info->width,
-.h = 1,
+.w = isl_align_npot(info->width, fmtl->bw),
+.h = fmtl->bh,
 .d = 1,
 .a = info->array_len,
  };
@@ -757,7 +756,7 @@ isl_calc_phys_slice0_extent_sa_gen9_1d(
 {
MAYBE_UNUSED const struct isl_format_layout *fmtl = 
isl_format_get_layout(info->format);
 
-   assert(phys_level0_sa->height == 1);
+   assert(phys_level0_sa->height == fmtl->bh);
assert(phys_level0_sa->depth == 1);
assert(info->samples == 1);
assert(image_align_sa->w >= fmtl->bw);
@@ -1567,9 +1566,12 @@ get_image_offset_sa_gen9_1d(const struct isl_surf *surf,
 uint32_t *x_offset_sa,
 uint32_t *y_offset_sa)
 {
+   MAYBE_UNUSED const struct isl_format_layout *fmtl =
+  isl_format_get_layout(surf->format);
+
assert(level < surf->levels);
assert(layer < surf->phys_level0_sa.array_len);
-   assert(surf->phys_level0_sa.height == 1);
+   assert(surf->phys_level0_sa.height == fmtl->bh);
assert(surf->phys_level0_sa.depth == 1);
assert(surf->samples == 1);
 
-- 
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 3/7] intel/isl: Add support for 1-D compressed textures

2016-09-14 Thread Jason Ekstrand
On Wed, Sep 14, 2016 at 11:10 AM, Nanley Chery 
wrote:

> On Mon, Sep 12, 2016 at 05:58:20PM -0700, Jason Ekstrand wrote:
> > Compressed 1-D textures are a well-defined thing in both GL and Vulkan.
> > ---
> >  src/intel/isl/isl.c | 5 ++---
> >  1 file changed, 2 insertions(+), 3 deletions(-)
> >
> > diff --git a/src/intel/isl/isl.c b/src/intel/isl/isl.c
> > index a75fddf..185984d 100644
> > --- a/src/intel/isl/isl.c
> > +++ b/src/intel/isl/isl.c
> > @@ -518,7 +518,6 @@ isl_calc_phys_level0_extent_sa(const struct
> isl_device *dev,
> >assert(info->height == 1);
> >assert(info->depth == 1);
> >assert(info->samples == 1);
> > -  assert(!isl_format_is_compressed(info->format));
> >
> >switch (dim_layout) {
> >case ISL_DIM_LAYOUT_GEN4_3D:
> > @@ -527,8 +526,8 @@ isl_calc_phys_level0_extent_sa(const struct
> isl_device *dev,
> >case ISL_DIM_LAYOUT_GEN9_1D:
> >case ISL_DIM_LAYOUT_GEN4_2D:
> >   *phys_level0_sa = (struct isl_extent4d) {
> > -.w = info->width,
> > -.h = 1,
> > +.w = isl_align_npot(info->width, fmtl->bw),
> > +.h = fmtl->bh,
> >  .d = 1,
> >  .a = info->array_len,
>
> Mustn't the height assertion in get_image_offset_sa_gen9_1d() be
> removed with this change?
>

Good catch!  I'll get that fixed and send a v2

--Jason


> >   };
> > --
> > 2.5.0.400.gff86faf
> >
> > ___
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] nv50/ir: drop unused NVISA_XXX_CHIPSET constants

2016-09-14 Thread Samuel Pitoiset
Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h
index 58a5d38..e85b5fa 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h
@@ -75,8 +75,6 @@ struct nv50_ir_prog_symbol
uint32_t offset;
 };
 
-#define NVISA_GF100_CHIPSET_C0 0xc0
-#define NVISA_GF100_CHIPSET_D0 0xd0
 #define NVISA_GK104_CHIPSET0xe0
 #define NVISA_GK20A_CHIPSET0xea
 #define NVISA_GM107_CHIPSET0x110
-- 
2.8.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] nvc0: allow to force compiling programs in debug build

2016-09-14 Thread Samuel Pitoiset
This adds a new envvar called NOUVEAU_FORCE_CHIPSET which allows
to compile shaders with a different target, especially useful for
shader-db.

Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/nouveau/nvc0/nvc0_program.c | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_program.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_program.c
index 9f29b29..428a010 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_program.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_program.c
@@ -562,6 +562,14 @@ nvc0_program_translate(struct nvc0_program *prog, uint16_t 
chipset,
info->bin.sourceRep = NV50_PROGRAM_IR_TGSI;
info->bin.source = (void *)prog->pipe.tokens;
 
+#ifdef DEBUG
+   info->target = debug_get_num_option("NOUVEAU_FORCE_CHIPSET", chipset);
+   info->optLevel = debug_get_num_option("NV50_PROG_OPTIMIZE", 3);
+   info->dbgFlags = debug_get_num_option("NV50_PROG_DEBUG", 0);
+#else
+   info->optLevel = 3;
+#endif
+
info->io.genUserClip = prog->vp.num_ucps;
info->io.auxCBSlot = 15;
info->io.msInfoCBSlot = 15;
@@ -570,12 +578,12 @@ nvc0_program_translate(struct nvc0_program *prog, 
uint16_t chipset,
info->io.msInfoBase = NVC0_CB_AUX_MS_INFO;
info->io.bufInfoBase = NVC0_CB_AUX_BUF_INFO(0);
info->io.suInfoBase = NVC0_CB_AUX_SU_INFO(0);
-   if (chipset >= NVISA_GK104_CHIPSET) {
+   if (info->target >= NVISA_GK104_CHIPSET) {
   info->io.texBindBase = NVC0_CB_AUX_TEX_INFO(0);
}
 
if (prog->type == PIPE_SHADER_COMPUTE) {
-  if (chipset >= NVISA_GK104_CHIPSET) {
+  if (info->target >= NVISA_GK104_CHIPSET) {
  info->io.auxCBSlot = 7;
  info->io.msInfoCBSlot = 7;
  info->io.uboInfoBase = NVC0_CB_AUX_UBO_INFO(0);
@@ -587,13 +595,6 @@ nvc0_program_translate(struct nvc0_program *prog, uint16_t 
chipset,
 
info->assignSlots = nvc0_program_assign_varying_slots;
 
-#ifdef DEBUG
-   info->optLevel = debug_get_num_option("NV50_PROG_OPTIMIZE", 3);
-   info->dbgFlags = debug_get_num_option("NV50_PROG_DEBUG", 0);
-#else
-   info->optLevel = 3;
-#endif
-
ret = nv50_ir_generate_code(info);
if (ret) {
   NOUVEAU_ERR("shader translation failed: %i\n", ret);
-- 
2.8.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] gallium/util: make use of strtol() in debug_get_num_option()

2016-09-14 Thread Samuel Pitoiset
This allows to use hexadecimal numbers which are automatically
detected by strtol() when the base is 0.

Signed-off-by: Samuel Pitoiset 
---
 src/gallium/auxiliary/util/u_debug.c | 25 -
 1 file changed, 8 insertions(+), 17 deletions(-)

diff --git a/src/gallium/auxiliary/util/u_debug.c 
b/src/gallium/auxiliary/util/u_debug.c
index 4619526..dd3e167 100644
--- a/src/gallium/auxiliary/util/u_debug.c
+++ b/src/gallium/auxiliary/util/u_debug.c
@@ -203,25 +203,16 @@ debug_get_num_option(const char *name, long dfault)
const char *str;
 
str = os_get_option(name);
-   if (!str)
+   if (!str) {
   result = dfault;
-   else {
-  long sign;
-  char c;
-  c = *str++;
-  if (c == '-') {
-sign = -1;
-c = *str++;
-  }
-  else {
-sign = 1;
-  }
-  result = 0;
-  while ('0' <= c && c <= '9') {
-result = result*10 + (c - '0');
-c = *str++;
+   } else {
+  char *endptr;
+
+  result = strtol(str, , 0);
+  if (str == endptr) {
+ /* Restore the default value when no digits were found. */
+ result = dfault;
   }
-  result *= sign;
}
 
if (debug_get_option_should_print())
-- 
2.8.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] intel/aubinator: Properly handle batch buffer chaining

2016-09-14 Thread Gandikota, Sirisha
aubinator.c: In function ‘parse_commands’:
aubinator.c:768:19: warning: suggest parentheses around comparison in operand 
of ‘&’ [-Wparentheses]
  if (p[0] & (1 << 22) == 0)

-Original Message-
From: mesa-dev [mailto:mesa-dev-boun...@lists.freedesktop.org] On Behalf Of 
Jason Ekstrand
Sent: Thursday, September 08, 2016 9:12 PM
To: mesa-dev@lists.freedesktop.org
Cc: Ekstrand, Jason 
Subject: [Mesa-dev] [PATCH] intel/aubinator: Properly handle batch buffer 
chaining

The original aubinator that Kristian wrote had a bug in the handling of 
MI_BATCH_BUFFER_START that propagated into the version in upstream mesa.
Say you have two batch buffers A and B where A calls MI_BATCH_BUFFER_START to 
jump to B.  Now suppose that A and B are placed consecutively in the address 
space with A before B.  What can happen is that aubinator will process A, and 
start processing B when it should.  When it gets done with B, it returns and 
continues to process A.  Because A doesn't have any more data after the 
MI_BATCH_BUFFER_START, it will just process a bunch of NOPs until it gets to 
the next buffer in memory which is B again.  In this scenario B gets processed 
twice which can be very confusing.  If you place things in memory just right, 
you can also end up with infinite loops which are all sorts of fun.

The root problem here is that it continues to process commands even after an 
MI_BATCH_BUFFER_START.  By simply checking the 2nd level we can detect whether 
or not the command buffer we are jumping to will return here and stop 
processing commands if it won't.

Signed-off-by: Jason Ekstrand 
>
---
 src/intel/tools/aubinator.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/src/intel/tools/aubinator.c b/src/intel/tools/aubinator.c index 
fe1f369..73a7f21 100644
--- a/src/intel/tools/aubinator.c
+++ b/src/intel/tools/aubinator.c
@@ -766,6 +766,13 @@ parse_commands(struct gen_spec *spec, uint32_t *cmds, int 
size, int engine)
 start = p[1];

  parse_commands(spec, gtt + start, 1 << 20, engine);
+
+ /* MI_BATCH_BUFFER_START with "2nd Level Batch Buffer" unset acts
+  * like a goto.  No commands after such a MI_BATCH_BUFFER_START will
+  * get processed so we should bail as well.
+  */
+ if (p[0] & (1 << 22) == 0)

[SG]: The above line might need extra pair of parenthesis around comparison to 
get rid of the compile time warning I was seeing as below. Otherwise, Patch 
looks good to me.
aubinator.c: In function ‘parse_commands’:
aubinator.c:768:19: warning: suggest parentheses around comparison in operand 
of ‘&’ [-Wparentheses]
  if (p[0] & (1 << 22) == 0)

+break;
   } else if ((p[0] & 0x) == AUB_MI_BATCH_BUFFER_END) {
  break;
   }
--
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] aubinator: add a custom handler for immediate register load

2016-09-14 Thread Gandikota, Sirisha


-Original Message-
From: mesa-dev [mailto:mesa-dev-boun...@lists.freedesktop.org] On Behalf Of 
Lionel Landwerlin
Sent: Friday, September 09, 2016 3:26 AM
To: mesa-dev@lists.freedesktop.org
Cc: Landwerlin, Lionel G 
Subject: [Mesa-dev] [PATCH] aubinator: add a custom handler for immediate 
register load

Transforming this :

0x00c77084:  0x1101:  MI_LOAD_REGISTER_IMM
0x00c77088:  0xb020 : Dword 1
Register Offset: 0xb020
0x00c7708c:  0x00880038 : Dword 2
Data DWord: 8912952

Into this:

0x00c77084:  0x1101:  MI_LOAD_REGISTER_IMM
0x00c77088:  0xb020 : Dword 1
Register Offset: 0xb020
0x00c7708c:  0x00880038 : Dword 2
Data DWord: 8912952
SLM Enable: 0
URB Allocation: 28
URB Low Bandwidth: 0
RO Allocation: 32
RO Low Bandwidth: 0
DC Allocation: 4
DC Low Bandwidth: 0

Signed-off-by: Lionel Landwerlin 
>
---
 src/intel/tools/aubinator.c | 14 +-
 src/intel/tools/decoder.c   | 30 --
 src/intel/tools/decoder.h   |  4 
 3 files changed, 45 insertions(+), 3 deletions(-)

diff --git a/src/intel/tools/aubinator.c b/src/intel/tools/aubinator.c index 
fe1f369..74229a9 100644
--- a/src/intel/tools/aubinator.c
+++ b/src/intel/tools/aubinator.c
@@ -621,6 +621,15 @@ handle_3dstate_scissor_state_pointers(struct gen_spec 
*spec, uint32_t *p)
decode_structure(spec, scissor_rect, gtt + start);  }

+static void
+handle_load_register_imm(struct gen_spec *spec, uint32_t *p) {
+   struct gen_group *reg = gen_spec_find_register(spec, p[1]);
+
+   if (reg != NULL)
+  decode_structure(spec, reg, [2]); }
+
 #define ARRAY_LENGTH(a) (sizeof (a) / sizeof (a)[0])

 #define STATE_BASE_ADDRESS  0x6101
@@ -659,6 +668,8 @@ handle_3dstate_scissor_state_pointers(struct gen_spec 
*spec, uint32_t *p)
 #define _3DSTATE_CC_STATE_POINTERS  0x780e
 #define _3DSTATE_SCISSOR_STATE_POINTERS 0x780f

+#define _MI_LOAD_REGISTER_IMM   0x1100
+
 struct custom_handler {
uint32_t opcode;
void (*handle)(struct gen_spec *spec, uint32_t *p); @@ -692,7 +703,8 @@ 
struct custom_handler {
{ _3DSTATE_VIEWPORT_STATE_POINTERS_SF_CLIP, 
handle_3dstate_viewport_state_pointers_sf_clip },
{ _3DSTATE_BLEND_STATE_POINTERS, handle_3dstate_blend_state_pointers },
{ _3DSTATE_CC_STATE_POINTERS, handle_3dstate_cc_state_pointers },
-   { _3DSTATE_SCISSOR_STATE_POINTERS, handle_3dstate_scissor_state_pointers }
+   { _3DSTATE_SCISSOR_STATE_POINTERS, handle_3dstate_scissor_state_pointers },
+   { _MI_LOAD_REGISTER_IMM, handle_load_register_imm }
 };

 static void
diff --git a/src/intel/tools/decoder.c b/src/intel/tools/decoder.c index 
f080437..d7e6a3f 100644
--- a/src/intel/tools/decoder.c
+++ b/src/intel/tools/decoder.c
@@ -88,6 +88,16 @@ gen_spec_find_struct(struct gen_spec *spec, const char *name)
return NULL;
 }

+struct gen_group *
+gen_spec_find_register(struct gen_spec *spec, uint32_t offset) {
+   for (int i = 0; i < spec->nregisters; i++)
+  if (spec->registers[i]->register_offset == offset)
+ return spec->registers[i];
+
+   return NULL;
+}
+
 uint32_t
 gen_spec_get_gen(struct gen_spec *spec)  { @@ -168,6 +178,20 @@ 
get_group_offset_count(struct parser_context *ctx, const char *name,
return;
 }

+static void
+get_register_offset(struct parser_context *ctx, const char *name,
+const char **atts, uint32_t *offset) {
+   char *p;
+   int i;
+
+   for (i = 0; atts[i]; i += 2) {
+  if (strcmp(atts[i], "num") == 0)
+ *offset = strtoul(atts[i + 1], , 0);
+   }
+   return;
+}
+

[SG]:  "ctx" , "name" arguments seems unnecessary when you are not using it. I 
would get rid of these.
Except for this, patch works for me.


 static inline uint64_t
 mask(int start, int end)
 {
@@ -288,9 +312,11 @@ start_element(void *data, const char *element_name, const 
char **atts)

   ctx->spec->gen = MAKE_GEN(major, minor);
} else if (strcmp(element_name, "instruction") == 0 ||
-  strcmp(element_name, "struct") == 0 ||
-  strcmp(element_name, "register") == 0) {
+  strcmp(element_name, "struct") == 0) {
+  ctx->group = create_group(ctx, name, atts);
+   } else if (strcmp(element_name, "register") == 0) {
   ctx->group = create_group(ctx, name, atts);
+  get_register_offset(ctx, name, atts,
+ >group->register_offset);

[SG]: same as above comment... "ctx", "name" arguments unnecessary
If you fix the above comments, patch looks okay to me.

} else if (strcmp(element_name, "group") == 0) {
   get_group_offset_count(ctx, name, atts, >group->group_offset,
  >group->group_count); diff --git 
a/src/intel/tools/decoder.h b/src/intel/tools/decoder.h index b46e451..9b74cb4 
100644
--- a/src/intel/tools/decoder.h
+++ b/src/intel/tools/decoder.h
@@ -39,6 +39,7 @@ 

Re: [Mesa-dev] [PATCH v2 3/7] intel/isl: Add support for 1-D compressed textures

2016-09-14 Thread Nanley Chery
On Mon, Sep 12, 2016 at 05:58:20PM -0700, Jason Ekstrand wrote:
> Compressed 1-D textures are a well-defined thing in both GL and Vulkan.
> ---
>  src/intel/isl/isl.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/src/intel/isl/isl.c b/src/intel/isl/isl.c
> index a75fddf..185984d 100644
> --- a/src/intel/isl/isl.c
> +++ b/src/intel/isl/isl.c
> @@ -518,7 +518,6 @@ isl_calc_phys_level0_extent_sa(const struct isl_device 
> *dev,
>assert(info->height == 1);
>assert(info->depth == 1);
>assert(info->samples == 1);
> -  assert(!isl_format_is_compressed(info->format));
>  
>switch (dim_layout) {
>case ISL_DIM_LAYOUT_GEN4_3D:
> @@ -527,8 +526,8 @@ isl_calc_phys_level0_extent_sa(const struct isl_device 
> *dev,
>case ISL_DIM_LAYOUT_GEN9_1D:
>case ISL_DIM_LAYOUT_GEN4_2D:
>   *phys_level0_sa = (struct isl_extent4d) {
> -.w = info->width,
> -.h = 1,
> +.w = isl_align_npot(info->width, fmtl->bw),
> +.h = fmtl->bh,
>  .d = 1,
>  .a = info->array_len,

Mustn't the height assertion in get_image_offset_sa_gen9_1d() be
removed with this change?

>   };
> -- 
> 2.5.0.400.gff86faf
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/3] glx/glvnd: Use bsearch() in FindGLXFunction instead of open-coding it

2016-09-14 Thread Adam Jackson
Signed-off-by: Adam Jackson 
---
 src/glx/glxglvnd.c | 30 ++
 1 file changed, 14 insertions(+), 16 deletions(-)

diff --git a/src/glx/glxglvnd.c b/src/glx/glxglvnd.c
index 2fc9b00..b6b4151 100644
--- a/src/glx/glxglvnd.c
+++ b/src/glx/glxglvnd.c
@@ -1,11 +1,11 @@
 #include 
+#include 
 #include 
 
 #include "glvnd/libglxabi.h"
 
 #include "glxglvnd.h"
 
-
 static Bool __glXGLVNDIsScreenSupported(Display *dpy, int screen)
 {
 /* TODO: Think of a better heuristic... */
@@ -17,26 +17,24 @@ static void *__glXGLVNDGetProcAddress(const GLubyte 
*procName)
 return glXGetProcAddressARB(procName);
 }
 
+static int
+compare(const void *l, const void *r)
+{
+const char *s = *(const char **)r;
+return strcmp(l, s);
+}
+
 static unsigned FindGLXFunction(const GLubyte *name)
 {
-int first = 0;
-int last = DI_FUNCTION_COUNT - 1;
+const char **match;
 
-while (first <= last) {
-int middle = (first + last) / 2;
-int comp = strcmp(__glXDispatchTableStrings[middle],
-  (const char *) name);
+match = bsearch(name, __glXDispatchTableStrings, DI_FUNCTION_COUNT,
+sizeof(const char *), compare);
 
-if (comp < 0)
-first = middle + 1;
-else if (comp > 0)
-last = middle - 1;
-else
-return middle;
-}
+if (match == NULL)
+return DI_FUNCTION_COUNT;
 
-/* Just point to the dummy entry at the end of the respective table */
-return DI_FUNCTION_COUNT;
+return match - __glXDispatchTableStrings;
 }
 
 static void *__glXGLVNDGetDispatchAddress(const GLubyte *procName)
-- 
2.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/3] glx/glvnd: Fix dispatch function names and indices

2016-09-14 Thread Adam Jackson
As this array was not actually sorted, FindGLXFunction's binary search
would only sometimes work.

Signed-off-by: Adam Jackson 
---
 src/glx/g_glxglvnddispatchfuncs.c   | 254 ++--
 src/glx/g_glxglvnddispatchindices.h |  36 ++---
 2 files changed, 144 insertions(+), 146 deletions(-)

diff --git a/src/glx/g_glxglvnddispatchfuncs.c 
b/src/glx/g_glxglvnddispatchfuncs.c
index e6b9c0b..b5e3398 100644
--- a/src/glx/g_glxglvnddispatchfuncs.c
+++ b/src/glx/g_glxglvnddispatchfuncs.c
@@ -17,16 +17,19 @@ const char * const __glXDispatchTableStrings[DI_LAST_INDEX] 
= {
 #define __ATTRIB(field) \
 [DI_##field] = "glX"#field
 
+__ATTRIB(BindSwapBarrierSGIX),
 __ATTRIB(BindTexImageEXT),
 // glXChooseFBConfig implemented by libglvnd
 __ATTRIB(ChooseFBConfigSGIX),
 // glXChooseVisual implemented by libglvnd
 // glXCopyContext implemented by libglvnd
+__ATTRIB(CopySubBufferMESA),
 // glXCreateContext implemented by libglvnd
 __ATTRIB(CreateContextAttribsARB),
 __ATTRIB(CreateContextWithConfigSGIX),
 __ATTRIB(CreateGLXPbufferSGIX),
 // glXCreateGLXPixmap implemented by libglvnd
+__ATTRIB(CreateGLXPixmapMESA),
 __ATTRIB(CreateGLXPixmapWithConfigSGIX),
 // glXCreateNewContext implemented by libglvnd
 // glXCreatePbuffer implemented by libglvnd
@@ -51,54 +54,50 @@ const char * const __glXDispatchTableStrings[DI_LAST_INDEX] 
= {
 __ATTRIB(GetFBConfigAttribSGIX),
 __ATTRIB(GetFBConfigFromVisualSGIX),
 // glXGetFBConfigs implemented by libglvnd
+__ATTRIB(GetMscRateOML),
 // glXGetProcAddress implemented by libglvnd
 // glXGetProcAddressARB implemented by libglvnd
+__ATTRIB(GetScreenDriver),
 // glXGetSelectedEvent implemented by libglvnd
 __ATTRIB(GetSelectedEventSGIX),
+__ATTRIB(GetSwapIntervalMESA),
+__ATTRIB(GetSyncValuesOML),
 __ATTRIB(GetVideoSyncSGI),
 // glXGetVisualFromFBConfig implemented by libglvnd
 __ATTRIB(GetVisualFromFBConfigSGIX),
 // glXImportContextEXT implemented by libglvnd
 // glXIsDirect implemented by libglvnd
+__ATTRIB(JoinSwapGroupSGIX),
 // glXMakeContextCurrent implemented by libglvnd
 // glXMakeCurrent implemented by libglvnd
 // glXQueryContext implemented by libglvnd
 __ATTRIB(QueryContextInfoEXT),
+__ATTRIB(QueryCurrentRendererIntegerMESA),
+__ATTRIB(QueryCurrentRendererStringMESA),
 // glXQueryDrawable implemented by libglvnd
 // glXQueryExtension implemented by libglvnd
 // glXQueryExtensionsString implemented by libglvnd
 __ATTRIB(QueryGLXPbufferSGIX),
+__ATTRIB(QueryMaxSwapBarriersSGIX),
+__ATTRIB(QueryRendererIntegerMESA),
+__ATTRIB(QueryRendererStringMESA),
 // glXQueryServerString implemented by libglvnd
 // glXQueryVersion implemented by libglvnd
+__ATTRIB(ReleaseBuffersMESA),
 __ATTRIB(ReleaseTexImageEXT),
 // glXSelectEvent implemented by libglvnd
 __ATTRIB(SelectEventSGIX),
 // glXSwapBuffers implemented by libglvnd
+__ATTRIB(SwapBuffersMscOML),
+__ATTRIB(SwapIntervalMESA),
 __ATTRIB(SwapIntervalSGI),
 // glXUseXFont implemented by libglvnd
+__ATTRIB(WaitForMscOML),
+__ATTRIB(WaitForSbcOML),
 // glXWaitGL implemented by libglvnd
 __ATTRIB(WaitVideoSyncSGI),
 // glXWaitX implemented by libglvnd
 
-__ATTRIB(glXBindSwapBarrierSGIX),
-__ATTRIB(glXCopySubBufferMESA),
-__ATTRIB(glXCreateGLXPixmapMESA),
-__ATTRIB(glXGetMscRateOML),
-__ATTRIB(glXGetScreenDriver),
-__ATTRIB(glXGetSwapIntervalMESA),
-__ATTRIB(glXGetSyncValuesOML),
-__ATTRIB(glXJoinSwapGroupSGIX),
-__ATTRIB(glXQueryCurrentRendererIntegerMESA),
-__ATTRIB(glXQueryCurrentRendererStringMESA),
-__ATTRIB(glXQueryMaxSwapBarriersSGIX),
-__ATTRIB(glXQueryRendererIntegerMESA),
-__ATTRIB(glXQueryRendererStringMESA),
-__ATTRIB(glXReleaseBuffersMESA),
-__ATTRIB(glXSwapBuffersMscOML),
-__ATTRIB(glXSwapIntervalMESA),
-__ATTRIB(glXWaitForMscOML),
-__ATTRIB(glXWaitForSbcOML),
-
 #undef __ATTRIB
 };
 
@@ -557,49 +556,49 @@ static int dispatch_WaitVideoSyncSGI(int divisor, int 
remainder,
 
 
 
-static void dispatch_glXBindSwapBarrierSGIX(Display *dpy, GLXDrawable drawable,
+static void dispatch_BindSwapBarrierSGIX(Display *dpy, GLXDrawable drawable,
 int barrier)
 {
-PFNGLXBINDSWAPBARRIERSGIXPROC pglXBindSwapBarrierSGIX;
+PFNGLXBINDSWAPBARRIERSGIXPROC pBindSwapBarrierSGIX;
 __GLXvendorInfo *dd;
 
 dd = GetDispatchFromDrawable(dpy, drawable);
 if (dd == NULL)
 return;
 
-__FETCH_FUNCTION_PTR(glXBindSwapBarrierSGIX);
-if (pglXBindSwapBarrierSGIX == NULL)
+__FETCH_FUNCTION_PTR(BindSwapBarrierSGIX);
+if (pBindSwapBarrierSGIX == NULL)
 return;
 
-(*pglXBindSwapBarrierSGIX)(dpy, drawable, barrier);
+(*pBindSwapBarrierSGIX)(dpy, drawable, barrier);
 }
 
 
 
-static void 

Re: [Mesa-dev] [PATCH] glvnd: Fix dynamic GLX entrypoint lookup

2016-09-14 Thread Adam Jackson
On Mon, 2016-09-05 at 11:23 +0100, Eric Engestrom wrote:

> > +static int
> > +compare(const void *l, const void *r)
> > +{
> > +const char *s = *(const char **)r;
>
> Shouldn't we do the same with `l`?

No. 'l' is the key we're looking for (the function name), r is the
current element of the array of char* we're searching. We want to
dereference r once to get the string we're comparing to, but *l is just
'g'.

Split series sent.

- ajax
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/3] glx/glvnd: Don't modify the dummy slot in the dispatch table

2016-09-14 Thread Adam Jackson
Signed-off-by: Adam Jackson 
---
 src/glx/glxglvnd.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/glx/glxglvnd.c b/src/glx/glxglvnd.c
index 098304d..2fc9b00 100644
--- a/src/glx/glxglvnd.c
+++ b/src/glx/glxglvnd.c
@@ -50,6 +50,9 @@ static void __glXGLVNDSetDispatchIndex(const GLubyte 
*procName, int index)
 {
 unsigned internalIndex = FindGLXFunction(procName);
 
+if (internalIndex == DI_FUNCTION_COUNT)
+return; /* unknown or static dispatch */
+
 __glXDispatchTableIndices[internalIndex] = index;
 }
 
-- 
2.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/4] i965/fs: Use NIR for handling forced per-sample interpolation

2016-09-14 Thread Jason Ekstrand
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 40 +++-
 src/mesa/drivers/dri/i965/brw_nir.c  |  9 ++--
 src/mesa/drivers/dri/i965/brw_nir.h  |  3 ++-
 3 files changed, 12 insertions(+), 40 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 9dbb699..a3fc839 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -6461,8 +6461,7 @@ move_interpolation_to_top(nir_shader *nir)
 static void
 brw_nir_set_default_interpolation(const struct gen_device_info *devinfo,
   struct nir_shader *nir,
-  bool api_flat_shade,
-  bool per_sample_interpolation)
+  bool api_flat_shade)
 {
assert(nir->stage == MESA_SHADER_FRAGMENT);
 
@@ -6481,13 +6480,6 @@ brw_nir_set_default_interpolation(const struct 
gen_device_info *devinfo,
 : INTERP_MODE_SMOOTH;
   }
 
-  /* Apply 'sample' if necessary for API state. */
-  if (per_sample_interpolation &&
-  var->data.interpolation != INTERP_MODE_FLAT) {
- var->data.centroid = false;
- var->data.sample = true;
-  }
-
   /* On Ironlake and below, there is only one interpolation mode.
* Centroid interpolation doesn't mean anything on this hardware --
* there is no multisampling.
@@ -6497,32 +6489,6 @@ brw_nir_set_default_interpolation(const struct 
gen_device_info *devinfo,
  var->data.sample = false;
   }
}
-
-   if (per_sample_interpolation) {
-  nir_foreach_block(block, nir_shader_get_entrypoint(nir)) {
- nir_foreach_instr(instr, block) {
-if (instr->type != nir_instr_type_intrinsic)
-   continue;
-
-nir_intrinsic_instr *intrin = nir_instr_as_intrinsic(instr);
-if (intrin->intrinsic != nir_intrinsic_interp_var_at_centroid)
-   continue;
-
-nir_variable *var = intrin->variables[0]->var;
-if (var->data.interpolation == INTERP_MODE_FLAT)
-   continue;
-
-/* The description of the interpolateAtCentroid intrinsic is that
- * it interpolates the variable as if it had the "centroid"
- * qualifier.  When executing with per_sample_interpolation, this
- * is equivalent to having the "sample" qualifier.  Just convert
- * it to a load_var instead.
- */
-assert(var->data.sample);
-intrin->intrinsic = nir_intrinsic_load_var;
- }
-  }
-   }
 }
 
 /**
@@ -6583,8 +6549,8 @@ brw_compile_fs(const struct brw_compiler *compiler, void 
*log_data,
shader = brw_nir_apply_sampler_key(shader, compiler->devinfo, >tex,
   true);
brw_nir_set_default_interpolation(compiler->devinfo, shader,
- key->flat_shade, key->persample_interp);
-   brw_nir_lower_fs_inputs(shader);
+ key->flat_shade);
+   brw_nir_lower_fs_inputs(shader, key);
brw_nir_lower_fs_outputs(shader);
if (!key->multisample_fbo)
   NIR_PASS_V(shader, demote_sample_qualifiers);
diff --git a/src/mesa/drivers/dri/i965/brw_nir.c 
b/src/mesa/drivers/dri/i965/brw_nir.c
index 2273299..1a9ca8a 100644
--- a/src/mesa/drivers/dri/i965/brw_nir.c
+++ b/src/mesa/drivers/dri/i965/brw_nir.c
@@ -281,13 +281,18 @@ brw_nir_lower_tes_inputs(nir_shader *nir, const struct 
brw_vue_map *vue_map)
 }
 
 void
-brw_nir_lower_fs_inputs(nir_shader *nir)
+brw_nir_lower_fs_inputs(nir_shader *nir,
+const struct brw_wm_prog_key *key)
 {
foreach_list_typed(nir_variable, var, node, >inputs) {
   var->data.driver_location = var->data.location;
}
 
-   nir_lower_io(nir, nir_var_shader_in, type_size_vec4, 0);
+   nir_lower_io_options lower_io_options = 0;
+   if (key->persample_interp)
+  lower_io_options |= nir_lower_io_force_sample_interpolation;
+
+   nir_lower_io(nir, nir_var_shader_in, type_size_vec4, lower_io_options);
 
/* This pass needs actual constants */
nir_opt_constant_folding(nir);
diff --git a/src/mesa/drivers/dri/i965/brw_nir.h 
b/src/mesa/drivers/dri/i965/brw_nir.h
index b025d55..51d2f63 100644
--- a/src/mesa/drivers/dri/i965/brw_nir.h
+++ b/src/mesa/drivers/dri/i965/brw_nir.h
@@ -106,7 +106,8 @@ void brw_nir_lower_vs_inputs(nir_shader *nir,
 void brw_nir_lower_vue_inputs(nir_shader *nir, bool is_scalar,
   const struct brw_vue_map *vue_map);
 void brw_nir_lower_tes_inputs(nir_shader *nir, const struct brw_vue_map *vue);
-void brw_nir_lower_fs_inputs(nir_shader *nir);
+void brw_nir_lower_fs_inputs(nir_shader *nir,
+ const struct brw_wm_prog_key *key);
 void brw_nir_lower_vue_outputs(nir_shader *nir, bool is_scalar);
 void brw_nir_lower_tcs_outputs(nir_shader *nir, 

[Mesa-dev] [PATCH 4/4] i965/nir: Roll set_default_interpolation into lower_fs_inputs

2016-09-14 Thread Jason Ekstrand
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 40 +---
 src/mesa/drivers/dri/i965/brw_nir.c  | 24 ++
 src/mesa/drivers/dri/i965/brw_nir.h  |  1 +
 3 files changed, 26 insertions(+), 39 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index a3fc839..213e42b 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -6456,42 +6456,6 @@ move_interpolation_to_top(nir_shader *nir)
 }
 
 /**
- * Apply default interpolation settings to FS inputs which don't specify any.
- */
-static void
-brw_nir_set_default_interpolation(const struct gen_device_info *devinfo,
-  struct nir_shader *nir,
-  bool api_flat_shade)
-{
-   assert(nir->stage == MESA_SHADER_FRAGMENT);
-
-   nir_foreach_variable(var, >inputs) {
-  /* Apply default interpolation mode.
-   *
-   * Everything defaults to smooth except for the legacy GL color
-   * built-in variables, which might be flat depending on API state.
-   */
-  if (var->data.interpolation == INTERP_MODE_NONE) {
- const bool flat = api_flat_shade &&
-(var->data.location == VARYING_SLOT_COL0 ||
- var->data.location == VARYING_SLOT_COL1);
-
- var->data.interpolation = flat ? INTERP_MODE_FLAT
-: INTERP_MODE_SMOOTH;
-  }
-
-  /* On Ironlake and below, there is only one interpolation mode.
-   * Centroid interpolation doesn't mean anything on this hardware --
-   * there is no multisampling.
-   */
-  if (devinfo->gen < 6) {
- var->data.centroid = false;
- var->data.sample = false;
-  }
-   }
-}
-
-/**
  * Demote per-sample barycentric intrinsics to centroid.
  *
  * Useful when rendering to a non-multisampled buffer.
@@ -6548,9 +6512,7 @@ brw_compile_fs(const struct brw_compiler *compiler, void 
*log_data,
nir_shader *shader = nir_shader_clone(mem_ctx, src_shader);
shader = brw_nir_apply_sampler_key(shader, compiler->devinfo, >tex,
   true);
-   brw_nir_set_default_interpolation(compiler->devinfo, shader,
- key->flat_shade);
-   brw_nir_lower_fs_inputs(shader, key);
+   brw_nir_lower_fs_inputs(shader, compiler->devinfo, key);
brw_nir_lower_fs_outputs(shader);
if (!key->multisample_fbo)
   NIR_PASS_V(shader, demote_sample_qualifiers);
diff --git a/src/mesa/drivers/dri/i965/brw_nir.c 
b/src/mesa/drivers/dri/i965/brw_nir.c
index 1a9ca8a..05d0730 100644
--- a/src/mesa/drivers/dri/i965/brw_nir.c
+++ b/src/mesa/drivers/dri/i965/brw_nir.c
@@ -282,10 +282,34 @@ brw_nir_lower_tes_inputs(nir_shader *nir, const struct 
brw_vue_map *vue_map)
 
 void
 brw_nir_lower_fs_inputs(nir_shader *nir,
+const struct gen_device_info *devinfo,
 const struct brw_wm_prog_key *key)
 {
foreach_list_typed(nir_variable, var, node, >inputs) {
   var->data.driver_location = var->data.location;
+
+  /* Apply default interpolation mode.
+   *
+   * Everything defaults to smooth except for the legacy GL color
+   * built-in variables, which might be flat depending on API state.
+   */
+  if (var->data.interpolation == INTERP_MODE_NONE) {
+ const bool flat = key->flat_shade &&
+(var->data.location == VARYING_SLOT_COL0 ||
+ var->data.location == VARYING_SLOT_COL1);
+
+ var->data.interpolation = flat ? INTERP_MODE_FLAT
+: INTERP_MODE_SMOOTH;
+  }
+
+  /* On Ironlake and below, there is only one interpolation mode.
+   * Centroid interpolation doesn't mean anything on this hardware --
+   * there is no multisampling.
+   */
+  if (devinfo->gen < 6) {
+ var->data.centroid = false;
+ var->data.sample = false;
+  }
}
 
nir_lower_io_options lower_io_options = 0;
diff --git a/src/mesa/drivers/dri/i965/brw_nir.h 
b/src/mesa/drivers/dri/i965/brw_nir.h
index 51d2f63..425d6ce 100644
--- a/src/mesa/drivers/dri/i965/brw_nir.h
+++ b/src/mesa/drivers/dri/i965/brw_nir.h
@@ -107,6 +107,7 @@ void brw_nir_lower_vue_inputs(nir_shader *nir, bool 
is_scalar,
   const struct brw_vue_map *vue_map);
 void brw_nir_lower_tes_inputs(nir_shader *nir, const struct brw_vue_map *vue);
 void brw_nir_lower_fs_inputs(nir_shader *nir,
+ const struct gen_device_info *devinfo,
  const struct brw_wm_prog_key *key);
 void brw_nir_lower_vue_outputs(nir_shader *nir, bool is_scalar);
 void brw_nir_lower_tcs_outputs(nir_shader *nir, const struct brw_vue_map *vue);
-- 
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 2/4] nir: Add a flag to lower_io to force "sample" interpolation

2016-09-14 Thread Jason Ekstrand
---
 src/compiler/nir/nir.h  | 10 +-
 src/compiler/nir/nir_lower_io.c | 20 ++--
 src/gallium/drivers/freedreno/ir3/ir3_cmdline.c |  2 +-
 src/intel/blorp/blorp.c |  2 +-
 src/mesa/drivers/dri/i965/brw_nir.c | 18 +-
 src/mesa/drivers/dri/i965/brw_program.c |  4 ++--
 src/mesa/state_tracker/st_glsl_to_nir.cpp   |  3 ++-
 7 files changed, 38 insertions(+), 21 deletions(-)

diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index ff7c422..6f10477 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -2395,9 +2395,17 @@ void nir_assign_var_locations(struct exec_list 
*var_list, unsigned *size,
   unsigned base_offset,
   int (*type_size)(const struct glsl_type *));
 
+typedef enum {
+   /* If set, this forces all non-flat fragment shader inputs to be
+* interpolated as if with the "sample" qualifier.  This requires
+* nir_shader_compiler_options::use_interpolated_input_intrinsics.
+*/
+   nir_lower_io_force_sample_interpolation = (1 << 1),
+} nir_lower_io_options;
 void nir_lower_io(nir_shader *shader,
   nir_variable_mode modes,
-  int (*type_size)(const struct glsl_type *));
+  int (*type_size)(const struct glsl_type *),
+  nir_lower_io_options);
 nir_src *nir_get_io_offset_src(nir_intrinsic_instr *instr);
 nir_src *nir_get_io_vertex_index_src(nir_intrinsic_instr *instr);
 
diff --git a/src/compiler/nir/nir_lower_io.c b/src/compiler/nir/nir_lower_io.c
index b36836f..742bc1f2 100644
--- a/src/compiler/nir/nir_lower_io.c
+++ b/src/compiler/nir/nir_lower_io.c
@@ -39,6 +39,7 @@ struct lower_io_state {
void *mem_ctx;
int (*type_size)(const struct glsl_type *type);
nir_variable_mode modes;
+   nir_lower_io_options options;
 };
 
 void
@@ -205,7 +206,8 @@ lower_load(nir_intrinsic_instr *intrin, struct 
lower_io_state *state,
  assert(vertex_index == NULL);
 
  nir_intrinsic_op bary_op;
- if (var->data.sample)
+ if (var->data.sample ||
+ (state->options & nir_lower_io_force_sample_interpolation))
 bary_op = nir_intrinsic_load_barycentric_sample;
  else if (var->data.centroid)
 bary_op = nir_intrinsic_load_barycentric_centroid;
@@ -347,7 +349,9 @@ lower_interpolate_at(nir_intrinsic_instr *intrin, struct 
lower_io_state *state,
nir_intrinsic_op bary_op;
switch (intrin->intrinsic) {
case nir_intrinsic_interp_var_at_centroid:
-  bary_op = nir_intrinsic_load_barycentric_centroid;
+  bary_op = (state->options & nir_lower_io_force_sample_interpolation) ?
+nir_intrinsic_load_barycentric_sample :
+nir_intrinsic_load_barycentric_centroid;
   break;
case nir_intrinsic_interp_var_at_sample:
   bary_op = nir_intrinsic_load_barycentric_at_sample;
@@ -505,7 +509,8 @@ nir_lower_io_block(nir_block *block,
 static void
 nir_lower_io_impl(nir_function_impl *impl,
   nir_variable_mode modes,
-  int (*type_size)(const struct glsl_type *))
+  int (*type_size)(const struct glsl_type *),
+  nir_lower_io_options options)
 {
struct lower_io_state state;
 
@@ -513,6 +518,7 @@ nir_lower_io_impl(nir_function_impl *impl,
state.mem_ctx = ralloc_parent(impl);
state.modes = modes;
state.type_size = type_size;
+   state.options;
 
nir_foreach_block(block, impl) {
   nir_lower_io_block(block, );
@@ -524,11 +530,13 @@ nir_lower_io_impl(nir_function_impl *impl,
 
 void
 nir_lower_io(nir_shader *shader, nir_variable_mode modes,
- int (*type_size)(const struct glsl_type *))
+ int (*type_size)(const struct glsl_type *),
+ nir_lower_io_options options)
 {
nir_foreach_function(function, shader) {
-  if (function->impl)
- nir_lower_io_impl(function->impl, modes, type_size);
+  if (function->impl) {
+ nir_lower_io_impl(function->impl, modes, type_size, options);
+  }
}
 }
 
diff --git a/src/gallium/drivers/freedreno/ir3/ir3_cmdline.c 
b/src/gallium/drivers/freedreno/ir3/ir3_cmdline.c
index 41532fc..d749bfa 100644
--- a/src/gallium/drivers/freedreno/ir3/ir3_cmdline.c
+++ b/src/gallium/drivers/freedreno/ir3/ir3_cmdline.c
@@ -93,7 +93,7 @@ load_glsl(unsigned num_files, char* const* files, 
gl_shader_stage stage)
// TODO nir_assign_var_locations??
 
NIR_PASS_V(nir, nir_lower_system_values);
-   NIR_PASS_V(nir, nir_lower_io, nir_var_all, st_glsl_type_size);
+   NIR_PASS_V(nir, nir_lower_io, nir_var_all, st_glsl_type_size, 0);
NIR_PASS_V(nir, nir_lower_samplers, prog);
 
return nir;
diff --git a/src/intel/blorp/blorp.c b/src/intel/blorp/blorp.c
index 955e543..1c8fd55 100644
--- a/src/intel/blorp/blorp.c
+++ b/src/intel/blorp/blorp.c
@@ 

[Mesa-dev] [PATCH 1/4] i965/fs: Use sample interpolation for interpolateAtCentroid in persample mode

2016-09-14 Thread Jason Ekstrand
From the ARB_gpu_shader5 spec:

   The built-in functions interpolateAtCentroid() and interpolateAtSample()
   will sample variables as though they were declared with the "centroid"
   or "sample" qualifiers, respectively.

When running with persample dispatch forced by the API, we interpolate
anything that isn't flat as if it's qualified by "sample".  In order to
keep interpolateAtCentroid() consistent with the "centroid" qualifier, we
need to make interpolateAtCentroid() do sample interpolation instead.
Nothing in the GLSL spec guarantees that the result of
interpolateAtCentroid is uniform across samples in any way, so this is a
perfectly fine thing to do.

Fixes 8 of the new dEQP-VK.pipeline.multisample_interpolation.* Vulkan CTS
tests that specifically validate consistency between the "sample" qualifier
and interpolateAtSample()

Signed-off-by: Jason Ekstrand 
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 75642d3..9dbb699 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -6497,6 +6497,32 @@ brw_nir_set_default_interpolation(const struct 
gen_device_info *devinfo,
  var->data.sample = false;
   }
}
+
+   if (per_sample_interpolation) {
+  nir_foreach_block(block, nir_shader_get_entrypoint(nir)) {
+ nir_foreach_instr(instr, block) {
+if (instr->type != nir_instr_type_intrinsic)
+   continue;
+
+nir_intrinsic_instr *intrin = nir_instr_as_intrinsic(instr);
+if (intrin->intrinsic != nir_intrinsic_interp_var_at_centroid)
+   continue;
+
+nir_variable *var = intrin->variables[0]->var;
+if (var->data.interpolation == INTERP_MODE_FLAT)
+   continue;
+
+/* The description of the interpolateAtCentroid intrinsic is that
+ * it interpolates the variable as if it had the "centroid"
+ * qualifier.  When executing with per_sample_interpolation, this
+ * is equivalent to having the "sample" qualifier.  Just convert
+ * it to a load_var instead.
+ */
+assert(var->data.sample);
+intrin->intrinsic = nir_intrinsic_load_var;
+ }
+  }
+   }
 }
 
 /**
-- 
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/2] gbm/dri2: propagate errors when creating a DMA-BUF fd

2016-09-14 Thread Nicholas Bishop
Thanks for review. Could someone with commit access push this for me?

On Wed, Sep 14, 2016 at 8:48 AM, Eric Engestrom
 wrote:
> On Thu, Sep 08, 2016 at 03:55:02PM -0400, Nicholas Bishop wrote:
>> Changed dri2_query_image to check the return value of
>> resource_get_handle and return GL_FALSE if an error occurs. Similarly
>> changed gbm_dri_bo_get_fd to check the return value of queryImage and
>> return -1 (an invalid file descriptor) if an error occurs.
>>
>> Updated the comment for gbm_bo_get_fd to say that -1 is returned if
>> an error occurs.
>>
>> For reference this is an example callstack that should propagate the
>> error back to the user:
>>
>> i915_drm_buffer_get_handle
>> i915_texture_get_handle
>> u_resource_get_handle_vtbl
>> dri2_query_image
>> gbm_dri_bo_get_fd
>> gbm_bo_get_fd
>>
>> Signed-off-by: Nicholas Bishop 
>
> Looks good to me
> Reviewed-by: Eric Engestrom 
>
>> ---
>>  src/gallium/state_trackers/dri/dri2.c | 11 +++
>>  src/gbm/backends/dri/gbm_dri.c|  8 +---
>>  src/gbm/main/gbm.c|  3 ++-
>>  3 files changed, 14 insertions(+), 8 deletions(-)
>>
>> diff --git a/src/gallium/state_trackers/dri/dri2.c 
>> b/src/gallium/state_trackers/dri/dri2.c
>> index 28f8078..c6260ba 100644
>> --- a/src/gallium/state_trackers/dri/dri2.c
>> +++ b/src/gallium/state_trackers/dri/dri2.c
>> @@ -979,10 +979,13 @@ dri2_query_image(__DRIimage *image, int attrib, int 
>> *value)
>>return GL_TRUE;
>> case __DRI_IMAGE_ATTRIB_FD:
>>whandle.type= DRM_API_HANDLE_TYPE_FD;
>> -  image->texture->screen->resource_get_handle(image->texture->screen,
>> - image->texture, , usage);
>> -  *value = whandle.handle;
>> -  return GL_TRUE;
>> +  if 
>> (image->texture->screen->resource_get_handle(image->texture->screen,
>> + image->texture, , usage)) {
>> + *value = whandle.handle;
>> + return GL_TRUE;
>> +  } else {
>> + return GL_FALSE;
>> +  }
>> case __DRI_IMAGE_ATTRIB_FORMAT:
>>*value = image->dri_format;
>>return GL_TRUE;
>> diff --git a/src/gbm/backends/dri/gbm_dri.c b/src/gbm/backends/dri/gbm_dri.c
>> index c3626e3..54b293a 100644
>> --- a/src/gbm/backends/dri/gbm_dri.c
>> +++ b/src/gbm/backends/dri/gbm_dri.c
>> @@ -589,9 +589,11 @@ gbm_dri_bo_get_fd(struct gbm_bo *_bo)
>> if (bo->image == NULL)
>>return -1;
>>
>> -   dri->image->queryImage(bo->image, __DRI_IMAGE_ATTRIB_FD, );
>> -
>> -   return fd;
>> +   if (dri->image->queryImage(bo->image, __DRI_IMAGE_ATTRIB_FD, )) {
>> +  return fd;
>> +   } else {
>> +  return -1;
>> +   }
>>  }
>>
>>  static void
>> diff --git a/src/gbm/main/gbm.c b/src/gbm/main/gbm.c
>> index 95b4c2c..c3a2ec33 100644
>> --- a/src/gbm/main/gbm.c
>> +++ b/src/gbm/main/gbm.c
>> @@ -242,7 +242,8 @@ gbm_bo_get_handle(struct gbm_bo *bo)
>>   * descriptor.
>>
>>   * \param bo The buffer object
>> - * \return Returns a file descriptor referring  to the underlying buffer
>> + * \return Returns a file descriptor referring to the underlying buffer or 
>> -1
>> + * if an error occurs.
>>   */
>>  GBM_EXPORT int
>>  gbm_bo_get_fd(struct gbm_bo *bo)
>> --
>> 2.7.4
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 97808] "tgsi/scan: don't set interp flags for inputs only used by INTERP instructions" causes glitches in wine with gallium nine

2016-09-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=97808

--- Comment #5 from raffa...@zoho.com ---
Created attachment 126522
  --> https://bugs.freedesktop.org/attachment.cgi?id=126522=edit
LoL 32 bit - decoration + shadows (radeonsi gallium nine)

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 00/14] EGL_KHR_debug v3

2016-09-14 Thread Adam Jackson
On Tue, 2016-09-13 at 17:30 +0100, Emil Velikov wrote:

> Everything else (1-12 incl.) is
> Reviewed-by: Emil Velikov 

Merged these, thanks.

- ajax
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 97808] "tgsi/scan: don't set interp flags for inputs only used by INTERP instructions" causes glitches in wine with gallium nine

2016-09-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=97808

--- Comment #4 from raffa...@zoho.com ---
Created attachment 126521
  --> https://bugs.freedesktop.org/attachment.cgi?id=126521=edit
LoL 32 bit - bush (radeonsi gallium nine)

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 97808] "tgsi/scan: don't set interp flags for inputs only used by INTERP instructions" causes glitches in wine with gallium nine

2016-09-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=97808

--- Comment #3 from Christoph Haag  ---
Also artifacts in native csgo: https://www.youtube.com/watch?v=pMBz2grTAQg

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 97808] "tgsi/scan: don't set interp flags for inputs only used by INTERP instructions" causes glitches in wine with gallium nine

2016-09-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=97808

--- Comment #2 from raffa...@zoho.com ---
(In reply to Ilia Mirkin from comment #1)
> Any particular driver?

Gallium Nine on radeonsi

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 97808] "tgsi/scan: don't set interp flags for inputs only used by INTERP instructions" causes glitches in wine with gallium nine

2016-09-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=97808

--- Comment #1 from Ilia Mirkin  ---
Any particular driver?

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 97808] "tgsi/scan: don't set interp flags for inputs only used by INTERP instructions" causes glitches in wine with gallium nine

2016-09-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=97808

Bug ID: 97808
   Summary: "tgsi/scan: don't set interp flags for inputs only
used by INTERP instructions" causes glitches in wine
with gallium nine
   Product: Mesa
   Version: git
  Hardware: Other
OS: All
Status: NEW
  Severity: normal
  Priority: medium
 Component: Mesa core
  Assignee: mesa-dev@lists.freedesktop.org
  Reporter: raffa...@zoho.com
QA Contact: mesa-dev@lists.freedesktop.org

Commit 524fd55d2d973f50a5d8bc2255684610f5faae32 "tgsi/scan: don't set interp
flags for inputs only used by INTERP instructions" caused regression in gallium
nine.

Observed glitches in:
-terrain on WoW 32 bit wine, only with multisampling;
-grass on LoL 32 bit wine;
-terrain on LoL 32 bit wine, only with shadows.

The glitch looks like noise/corruption on texture surfaces.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 13/14] egl: Track EGL_KHR_debug state when going through EGL API calls (v3)

2016-09-14 Thread Kyle Brenneman

This looks right to me.

-Kyle

On 09/14/2016 07:59 AM, Adam Jackson wrote:

From: Kyle Brenneman 

This decorates every EGL entrypoint with _EGL_FUNC_START, which records
the function name and primary dispatch object label in the current
thread state. It also adds debug report functions and calls them when
appropriate.

This would be useful enough for debugging on its own, if the user set a
breakpoint when the report function was called. We will also need this
state tracked in order to expose EGL_KHR_debug.

v2:
- Clear the object label in more cases in _eglSetFuncName
- Pass draw surface (if any) to _EGL_FUNC_START in eglSwapInterval

v3:
- Set dummy thread's CurrentAPI to EGL_OPENGL_ES_API not zero
- Less ?: in _eglSetFuncName
---
  src/egl/main/eglapi.c | 153 +++---
  src/egl/main/eglcurrent.c |  91 ++-
  src/egl/main/eglcurrent.h |  22 +++
  src/egl/main/eglglobals.h |   5 ++
  4 files changed, 258 insertions(+), 13 deletions(-)

diff --git a/src/egl/main/eglapi.c b/src/egl/main/eglapi.c
index 1c62a80..cbc3841 100644
--- a/src/egl/main/eglapi.c
+++ b/src/egl/main/eglapi.c
@@ -250,6 +250,37 @@ _eglUnlockDisplay(_EGLDisplay *dpy)
 mtx_unlock(>Mutex);
  }
  
+static EGLBoolean

+_eglSetFuncName(const char *funcName, _EGLDisplay *disp, EGLenum objectType, 
_EGLResource *object)
+{
+   _EGLThreadInfo *thr = _eglGetCurrentThread();
+   if (!_eglIsCurrentThreadDummy()) {
+  thr->CurrentFuncName = funcName;
+  thr->CurrentObjectLabel = NULL;
+
+  if (objectType == EGL_OBJECT_THREAD_KHR)
+ thr->CurrentObjectLabel = thr->Label;
+  else if (objectType == EGL_OBJECT_DISPLAY_KHR && disp)
+ thr->CurrentObjectLabel = disp->Label;
+  else if (object)
+ thr->CurrentObjectLabel = object->Label;
+
+  return EGL_TRUE;
+   }
+
+   _eglDebugReportFull(EGL_BAD_ALLOC, funcName, funcName,
+  EGL_DEBUG_MSG_CRITICAL_KHR, NULL, NULL);
+   return EGL_FALSE;
+}
+
+#define _EGL_FUNC_START(disp, objectType, object, ret) \
+   do { \
+  if (!_eglSetFuncName(__func__, disp, objectType, (_EGLResource *) 
object)) { \
+ if (disp) \
+_eglUnlockDisplay(disp);   \
+ return ret; \
+  } \
+   } while(0)
  
  static EGLint *

  _eglConvertAttribsToInt(const EGLAttrib *attr_list)
@@ -287,6 +318,8 @@ eglGetDisplay(EGLNativeDisplayType nativeDisplay)
 _EGLDisplay *dpy;
 void *native_display_ptr;
  
+   _EGL_FUNC_START(NULL, EGL_OBJECT_THREAD_KHR, NULL, EGL_NO_DISPLAY);

+
 STATIC_ASSERT(sizeof(void*) == sizeof(nativeDisplay));
 native_display_ptr = (void*) nativeDisplay;
  
@@ -330,6 +363,7 @@ static EGLDisplay EGLAPIENTRY

  eglGetPlatformDisplayEXT(EGLenum platform, void *native_display,
   const EGLint *attrib_list)
  {
+   _EGL_FUNC_START(NULL, EGL_OBJECT_THREAD_KHR, NULL, EGL_NO_DISPLAY);
 return _eglGetPlatformDisplayCommon(platform, native_display, attrib_list);
  }
  
@@ -340,6 +374,8 @@ eglGetPlatformDisplay(EGLenum platform, void *native_display,

 EGLDisplay display;
 EGLint *int_attribs;
  
+   _EGL_FUNC_START(NULL, EGL_OBJECT_THREAD_KHR, NULL, EGL_NO_DISPLAY);

+
 int_attribs = _eglConvertAttribsToInt(attrib_list);
 if (attrib_list && !int_attribs)
RETURN_EGL_ERROR(NULL, EGL_BAD_ALLOC, NULL);
@@ -483,6 +519,8 @@ eglInitialize(EGLDisplay dpy, EGLint *major, EGLint *minor)
  {
 _EGLDisplay *disp = _eglLockDisplay(dpy);
  
+   _EGL_FUNC_START(disp, EGL_OBJECT_DISPLAY_KHR, NULL, EGL_FALSE);

+
 if (!disp)
RETURN_EGL_ERROR(NULL, EGL_BAD_DISPLAY, EGL_FALSE);
  
@@ -533,6 +571,8 @@ eglTerminate(EGLDisplay dpy)

  {
 _EGLDisplay *disp = _eglLockDisplay(dpy);
  
+   _EGL_FUNC_START(disp, EGL_OBJECT_DISPLAY_KHR, NULL, EGL_FALSE);

+
 if (!disp)
RETURN_EGL_ERROR(NULL, EGL_BAD_DISPLAY, EGL_FALSE);
  
@@ -560,6 +600,7 @@ eglQueryString(EGLDisplay dpy, EGLint name)

 }
  
 disp = _eglLockDisplay(dpy);

+   _EGL_FUNC_START(disp, EGL_OBJECT_DISPLAY_KHR, NULL, NULL);
 _EGL_CHECK_DISPLAY(disp, NULL, drv);
  
 switch (name) {

@@ -585,6 +626,8 @@ eglGetConfigs(EGLDisplay dpy, EGLConfig *configs,
 _EGLDriver *drv;
 EGLBoolean ret;
  
+   _EGL_FUNC_START(disp, EGL_OBJECT_DISPLAY_KHR, NULL, EGL_FALSE);

+
 _EGL_CHECK_DISPLAY(disp, EGL_FALSE, drv);
 ret = drv->API.GetConfigs(drv, disp, configs, config_size, num_config);
  
@@ -600,6 +643,8 @@ eglChooseConfig(EGLDisplay dpy, const EGLint *attrib_list, EGLConfig *configs,

 _EGLDriver *drv;
 EGLBoolean ret;
  
+   _EGL_FUNC_START(disp, EGL_OBJECT_DISPLAY_KHR, NULL, EGL_FALSE);

+
 _EGL_CHECK_DISPLAY(disp, EGL_FALSE, drv);
 ret = drv->API.ChooseConfig(drv, disp, attrib_list, configs,
  config_size, num_config);
@@ -617,6 +662,8 @@ eglGetConfigAttrib(EGLDisplay dpy, EGLConfig config,
 

Re: [Mesa-dev] [PATCH 13/14] egl: Track EGL_KHR_debug state when going through EGL API calls

2016-09-14 Thread Kyle Brenneman
Note that the primary object can still be meaningful even on a function 
that's defined to never throw an error. Those functions could still send 
a WARN or INFO level message if they had reason to, just not a CRITICAL 
or ERROR level. Until any of those are added to Mesa, though, it's an 
academic distinction.


-Kyle

On 09/14/2016 08:00 AM, Adam Jackson wrote:

On Wed, 2016-09-14 at 12:08 +0100, Emil Velikov wrote:


Thanks for reminding me - eglQueryAPI should never throw an error,
indeed. Since EGL_KHR_debug is applicable for functions_do_ throw an
error, one should leave the API out of the spec text shouldn't they ?

I mean, sure, but this patch is against Mesa, not the spec.


This is precisely what I'm talking about - one cannot relate the error
label to a {surface,context,display} object that is yet to be found.
As such the object label (and friends) should be related to the
current thread.

I see your point, I'm just not sure what you want done about it. My
reading of the spec is that there are two ways an implementation can
handle this:

a) "The primary object should be the object the function operates on,
see table 13.2 which provides the recommended mapping between functions
and their primary object."

Note "recommended", which suggests the primary object could be
something else.

b) " will contain the label attached to the primary object
of the message; Labels will be NULL if not set by the application.
[...] This  may be NULL even though the application
labeled the object. This is because it is possible an error was raised
while executing the command before the primary object was validated,
therefore its label can not be included in the callback."

This suggests that if the primary object can't be validated, then a
NULL label will be used.

Now to me, option b seems more conservative. Debug callbacks need to be
prepared for null object labels due to mandatory spec language. They
need to be prepared for unexpected primary objects only due to optional
spec language. And option b is the approach this patch takes,
entrypoints that error before the primary object is validated will
invoke the callback with a null object label.

If we want to amend the spec language, great, that's a fine idea, and
Khronos bugzilla is → that way. But even if we did, I think the
implementation in this patch (well, v3 of it) can be said to conform to
the spec as it currently exists, and that such amendment should not
invalidate existing implementations.

- ajax
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH mesa] glsl: grammar fix

2016-09-14 Thread Eric Engestrom
From: Eero Tamminen 

Signed-off-by: Eero Tamminen 
Reviewed-by: Eric Engestrom 
---

Eero, this is the format a patch should have when sent to mesa-dev.
You can find more info on our website [1], but the gist of it is:
- Use `git send-email`, don't attach patches to emails
- Describe the change done by the patch. This consists of:
  - Title: starting with a prefix describing what it components affects,
and followed by a short sentence explaining the change.
  - Body: if the change is non-trivial, explain it.
- Signature, in the form "Signed-off-by: Name ". This isn't required,
  but recommended. It indicates who authored the change (can be more than
  one person, unlike the git author).

Additionally, anything after the first `---` doesn't appear in the commit
message, which is useful for email-only comments like this one.

Cheers,
  Eric

[1] http://www.mesa3d.org/devinfo.html#submitting

---
 src/compiler/glsl/linker.cpp | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/compiler/glsl/linker.cpp b/src/compiler/glsl/linker.cpp
index 4440c03..f3eece2 100644
--- a/src/compiler/glsl/linker.cpp
+++ b/src/compiler/glsl/linker.cpp
@@ -750,8 +750,8 @@ validate_vertex_shader_executable(struct gl_shader_program 
*prog,
   if (!find.variable_found()) {
 if (prog->IsES) {
   linker_warning(prog,
- "vertex shader does not write to `gl_Position'."
- "It's value is undefined. \n");
+ "vertex shader does not write to `gl_Position'. "
+ "Its value is undefined. \n");
 } else {
   linker_error(prog,
"vertex shader does not write to `gl_Position'. \n");
-- 
Cheers,
  Eric

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 09/14] egl: Lock the display in _eglCreateSync's callers

2016-09-14 Thread Adam Jackson
On Wed, 2016-09-14 at 14:29 +0100, Emil Velikov wrote:

> It's surprising that you haven't heard about this, considering it's
> been in use for more than three years. Guess you simply forgot ?

Most of my own patches to Mesa have been so far from being "features"
that I've never had to care.

- ajax
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 13/14] egl: Track EGL_KHR_debug state when going through EGL API calls

2016-09-14 Thread Emil Velikov
On 14 September 2016 at 15:00, Adam Jackson  wrote:
> On Wed, 2016-09-14 at 12:08 +0100, Emil Velikov wrote:
>
>> Thanks for reminding me - eglQueryAPI should never throw an error,
>> indeed. Since EGL_KHR_debug is applicable for functions_do_ throw an
>> error, one should leave the API out of the spec text shouldn't they ?
>
> I mean, sure, but this patch is against Mesa, not the spec.
>
Fully agree - this is not something we need to address in mesa.

>> This is precisely what I'm talking about - one cannot relate the error
>> label to a {surface,context,display} object that is yet to be found.
>> As such the object label (and friends) should be related to the
>> current thread.
>
> I see your point, I'm just not sure what you want done about it. My
> reading of the spec is that there are two ways an implementation can
> handle this:
>
> a) "The primary object should be the object the function operates on,
> see table 13.2 which provides the recommended mapping between functions
> and their primary object."
>
> Note "recommended", which suggests the primary object could be
> something else.
>
> b) " will contain the label attached to the primary object
> of the message; Labels will be NULL if not set by the application.
> [...] This  may be NULL even though the application
> labeled the object. This is because it is possible an error was raised
> while executing the command before the primary object was validated,
> therefore its label can not be included in the callback."
>
> This suggests that if the primary object can't be validated, then a
> NULL label will be used.
>
> Now to me, option b seems more conservative. Debug callbacks need to be
> prepared for null object labels due to mandatory spec language. They
> need to be prepared for unexpected primary objects only due to optional
> spec language. And option b is the approach this patch takes,
> entrypoints that error before the primary object is validated will
> invoke the callback with a null object label.
>
> If we want to amend the spec language, great, that's a fine idea, and
> Khronos bugzilla is → that way. But even if we did, I think the
> implementation in this patch (well, v3 of it) can be said to conform to
> the spec as it currently exists, and that such amendment should not
> invalidate existing implementations.
>
Again, fully agree - it's not something we should address in mesa.
Just checking that our understanding of the spec aligns and it [the
spec] leaves an open question. Then again... seems like I've missed
"recommended" which effectively gives implementations flexibility to
answer do things in a way they seem fit.

Thanks !
Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] gallium/winsys/kms: Close drm device filedescriptor on kms_dri_sw_winsys release

2016-09-14 Thread Emil Velikov
Hi Lukasz,

On 5 September 2016 at 17:48, Lukasz Spintzyk  wrote:
> This closes filedescriptor owned by kms_dri_sw_winsys struct. It fixes issue
> where removal of udl or evdi module used by DisplayLink devices was impossible
> due to not closed filedescriptors.
>
> When this file descriptor was not closed then command
> rmmod udl was returning error "Module udl is in use".
> By this fix xserver does not prevent module removal when usb device
> is unplugged.
>
> Signed-off-by: Lukasz Spintzyk 
> ---
>  src/gallium/winsys/sw/kms-dri/kms_dri_sw_winsys.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/src/gallium/winsys/sw/kms-dri/kms_dri_sw_winsys.c 
> b/src/gallium/winsys/sw/kms-dri/kms_dri_sw_winsys.c
> index 07eca99..f06ccef 100644
> --- a/src/gallium/winsys/sw/kms-dri/kms_dri_sw_winsys.c
> +++ b/src/gallium/winsys/sw/kms-dri/kms_dri_sw_winsys.c
> @@ -371,6 +371,9 @@ kms_sw_displaytarget_display(struct sw_winsys *ws,
>  static void
>  kms_destroy_sw_winsys(struct sw_winsys *winsys)
>  {
> +   struct kms_sw_winsys *kms_sw = kms_sw_winsys(winsys);
> +
> +   close(kms_sw->fd);
AFAICT not even a single driver/winsys takes ownership of the fd. As
such they should not close() it.

From a quick skim - on the st/dri side, we explicitly provide new fd
(we dup the one passed from the upper layers) to the driver. Similarly
in vl/dri3 we open the device, yet neither of these closes the fd.

Imho it might be worth going through the code paths adding comments
about the fd ownership at different stages while fixing any leaks that
you/others come across.

Note: Multiple displays/Xinerama and GL-VDPAU interop are sensitive on
the area, so I'd suggest testing things carefully and poking the last
dev who was working in the area (git log/blame are your friend).

-Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 13/14] egl: Track EGL_KHR_debug state when going through EGL API calls

2016-09-14 Thread Adam Jackson
On Wed, 2016-09-14 at 12:08 +0100, Emil Velikov wrote:

> Thanks for reminding me - eglQueryAPI should never throw an error,
> indeed. Since EGL_KHR_debug is applicable for functions_do_ throw an
> error, one should leave the API out of the spec text shouldn't they ?

I mean, sure, but this patch is against Mesa, not the spec.

> This is precisely what I'm talking about - one cannot relate the error
> label to a {surface,context,display} object that is yet to be found.
> As such the object label (and friends) should be related to the
> current thread.

I see your point, I'm just not sure what you want done about it. My
reading of the spec is that there are two ways an implementation can
handle this:

a) "The primary object should be the object the function operates on,
see table 13.2 which provides the recommended mapping between functions
and their primary object."

Note "recommended", which suggests the primary object could be
something else.

b) " will contain the label attached to the primary object
of the message; Labels will be NULL if not set by the application.
[...] This  may be NULL even though the application
labeled the object. This is because it is possible an error was raised
while executing the command before the primary object was validated,
therefore its label can not be included in the callback."

This suggests that if the primary object can't be validated, then a
NULL label will be used.

Now to me, option b seems more conservative. Debug callbacks need to be
prepared for null object labels due to mandatory spec language. They
need to be prepared for unexpected primary objects only due to optional
spec language. And option b is the approach this patch takes,
entrypoints that error before the primary object is validated will
invoke the callback with a null object label.

If we want to amend the spec language, great, that's a fine idea, and
Khronos bugzilla is → that way. But even if we did, I think the
implementation in this patch (well, v3 of it) can be said to conform to
the spec as it currently exists, and that such amendment should not
invalidate existing implementations.

- ajax
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 13/14] egl: Track EGL_KHR_debug state when going through EGL API calls (v3)

2016-09-14 Thread Adam Jackson
From: Kyle Brenneman 

This decorates every EGL entrypoint with _EGL_FUNC_START, which records
the function name and primary dispatch object label in the current
thread state. It also adds debug report functions and calls them when
appropriate.

This would be useful enough for debugging on its own, if the user set a
breakpoint when the report function was called. We will also need this
state tracked in order to expose EGL_KHR_debug.

v2:
- Clear the object label in more cases in _eglSetFuncName
- Pass draw surface (if any) to _EGL_FUNC_START in eglSwapInterval

v3:
- Set dummy thread's CurrentAPI to EGL_OPENGL_ES_API not zero
- Less ?: in _eglSetFuncName
---
 src/egl/main/eglapi.c | 153 +++---
 src/egl/main/eglcurrent.c |  91 ++-
 src/egl/main/eglcurrent.h |  22 +++
 src/egl/main/eglglobals.h |   5 ++
 4 files changed, 258 insertions(+), 13 deletions(-)

diff --git a/src/egl/main/eglapi.c b/src/egl/main/eglapi.c
index 1c62a80..cbc3841 100644
--- a/src/egl/main/eglapi.c
+++ b/src/egl/main/eglapi.c
@@ -250,6 +250,37 @@ _eglUnlockDisplay(_EGLDisplay *dpy)
mtx_unlock(>Mutex);
 }
 
+static EGLBoolean
+_eglSetFuncName(const char *funcName, _EGLDisplay *disp, EGLenum objectType, 
_EGLResource *object)
+{
+   _EGLThreadInfo *thr = _eglGetCurrentThread();
+   if (!_eglIsCurrentThreadDummy()) {
+  thr->CurrentFuncName = funcName;
+  thr->CurrentObjectLabel = NULL;
+
+  if (objectType == EGL_OBJECT_THREAD_KHR)
+ thr->CurrentObjectLabel = thr->Label;
+  else if (objectType == EGL_OBJECT_DISPLAY_KHR && disp)
+ thr->CurrentObjectLabel = disp->Label;
+  else if (object)
+ thr->CurrentObjectLabel = object->Label;
+
+  return EGL_TRUE;
+   }
+
+   _eglDebugReportFull(EGL_BAD_ALLOC, funcName, funcName,
+  EGL_DEBUG_MSG_CRITICAL_KHR, NULL, NULL);
+   return EGL_FALSE;
+}
+
+#define _EGL_FUNC_START(disp, objectType, object, ret) \
+   do { \
+  if (!_eglSetFuncName(__func__, disp, objectType, (_EGLResource *) 
object)) { \
+ if (disp) \
+_eglUnlockDisplay(disp);   \
+ return ret; \
+  } \
+   } while(0)
 
 static EGLint *
 _eglConvertAttribsToInt(const EGLAttrib *attr_list)
@@ -287,6 +318,8 @@ eglGetDisplay(EGLNativeDisplayType nativeDisplay)
_EGLDisplay *dpy;
void *native_display_ptr;
 
+   _EGL_FUNC_START(NULL, EGL_OBJECT_THREAD_KHR, NULL, EGL_NO_DISPLAY);
+
STATIC_ASSERT(sizeof(void*) == sizeof(nativeDisplay));
native_display_ptr = (void*) nativeDisplay;
 
@@ -330,6 +363,7 @@ static EGLDisplay EGLAPIENTRY
 eglGetPlatformDisplayEXT(EGLenum platform, void *native_display,
  const EGLint *attrib_list)
 {
+   _EGL_FUNC_START(NULL, EGL_OBJECT_THREAD_KHR, NULL, EGL_NO_DISPLAY);
return _eglGetPlatformDisplayCommon(platform, native_display, attrib_list);
 }
 
@@ -340,6 +374,8 @@ eglGetPlatformDisplay(EGLenum platform, void 
*native_display,
EGLDisplay display;
EGLint *int_attribs;
 
+   _EGL_FUNC_START(NULL, EGL_OBJECT_THREAD_KHR, NULL, EGL_NO_DISPLAY);
+
int_attribs = _eglConvertAttribsToInt(attrib_list);
if (attrib_list && !int_attribs)
   RETURN_EGL_ERROR(NULL, EGL_BAD_ALLOC, NULL);
@@ -483,6 +519,8 @@ eglInitialize(EGLDisplay dpy, EGLint *major, EGLint *minor)
 {
_EGLDisplay *disp = _eglLockDisplay(dpy);
 
+   _EGL_FUNC_START(disp, EGL_OBJECT_DISPLAY_KHR, NULL, EGL_FALSE);
+
if (!disp)
   RETURN_EGL_ERROR(NULL, EGL_BAD_DISPLAY, EGL_FALSE);
 
@@ -533,6 +571,8 @@ eglTerminate(EGLDisplay dpy)
 {
_EGLDisplay *disp = _eglLockDisplay(dpy);
 
+   _EGL_FUNC_START(disp, EGL_OBJECT_DISPLAY_KHR, NULL, EGL_FALSE);
+
if (!disp)
   RETURN_EGL_ERROR(NULL, EGL_BAD_DISPLAY, EGL_FALSE);
 
@@ -560,6 +600,7 @@ eglQueryString(EGLDisplay dpy, EGLint name)
}
 
disp = _eglLockDisplay(dpy);
+   _EGL_FUNC_START(disp, EGL_OBJECT_DISPLAY_KHR, NULL, NULL);
_EGL_CHECK_DISPLAY(disp, NULL, drv);
 
switch (name) {
@@ -585,6 +626,8 @@ eglGetConfigs(EGLDisplay dpy, EGLConfig *configs,
_EGLDriver *drv;
EGLBoolean ret;
 
+   _EGL_FUNC_START(disp, EGL_OBJECT_DISPLAY_KHR, NULL, EGL_FALSE);
+
_EGL_CHECK_DISPLAY(disp, EGL_FALSE, drv);
ret = drv->API.GetConfigs(drv, disp, configs, config_size, num_config);
 
@@ -600,6 +643,8 @@ eglChooseConfig(EGLDisplay dpy, const EGLint *attrib_list, 
EGLConfig *configs,
_EGLDriver *drv;
EGLBoolean ret;
 
+   _EGL_FUNC_START(disp, EGL_OBJECT_DISPLAY_KHR, NULL, EGL_FALSE);
+
_EGL_CHECK_DISPLAY(disp, EGL_FALSE, drv);
ret = drv->API.ChooseConfig(drv, disp, attrib_list, configs,
 config_size, num_config);
@@ -617,6 +662,8 @@ eglGetConfigAttrib(EGLDisplay dpy, EGLConfig config,
_EGLDriver *drv;
EGLBoolean ret;
 
+   _EGL_FUNC_START(disp, EGL_OBJECT_DISPLAY_KHR, NULL, EGL_FALSE);
+
_EGL_CHECK_CONFIG(disp, conf, 

Re: [Mesa-dev] [PATCH v2] gallium: fix return value check

2016-09-14 Thread Emil Velikov
On 14 September 2016 at 13:47, Eric Engestrom  wrote:
> On Thu, Sep 08, 2016 at 03:12:42PM +0300, Martina Kollarova wrote:
>> A possible error (-1) was being lost because it was first converted to an
>> unsigned int and only then checked.
>>
>> Reviewed-by: Nicolai Hähnle 
>> Signed-off-by: Martina Kollarova 
>
> Good cleanup :)
> Reviewed-by: Eric Engestrom 
>
> By the way, I noticed this hasn't landed, after almost a week.
> Would anyone care to push it? Nicolai?
>
Just did. Thanks for the reminder !

Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH mesa] configure.ac: fix the name of the Wayland Scanner pc file

2016-09-14 Thread Emil Velikov
On 13 September 2016 at 17:31, Eric Engestrom  wrote:
> From: Brendan King 
>
> The Wayland Scanner pkg-config file is called wayland-scanner.pc.
>
Nice one, thanks.

Added the stable tag and pushed alongside the GBM cleanup patch.
Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 09/14] egl: Lock the display in _eglCreateSync's callers

2016-09-14 Thread Emil Velikov
On 14 September 2016 at 13:36, Adam Jackson  wrote:
> On Wed, 2016-09-14 at 11:15 +0100, Emil Velikov wrote:
>
>> Nice one... I wonder if your view will be the same if you were never
>> involved in distribution packaging? Guess we'll never know :-\
>> In case you've forgotten things have been like that for a long time -
>> long before I jumped in.
>
> I wasn't accusing you of anything. I said _I_ am not the one making the
> decision, that's all.
>
It's surprising that you haven't heard about this, considering it's
been in use for more than three years. Guess you simply forgot ?

> Obviously I can't make definite assertions about counterfactuals about
> my work history, but

> I think considering all "new features" equally
> destabilizing is wrong.
Fully agree. The point is that defining which new things are "more
destabilizing" than others is a never ending topic. There are a few
major influences:
 - humans always have a subjective view on things, always
 - due to ^^ the feasibility and impact of backporting is related to
the personal interest in the feature

Thus, such topics are better left to distributions to lobby, discuss,
vote and/or other, as they seem fit.

> Why have an extension model if you're not going
> to use it to make assertions about the orthogonality of feature sets?
> Why refuse to reason about the code, unless you don't have any
> confidence that it's something that can be reasoned about?
>
> Yes, we do backport features, it works pretty well. If one does so
> enough times, a sense develops of how "big" of a feature it's possible
> to backport reasonably. I have my own opinion about this one, and I was
> asking what the rule was for mesa stable. Since the rule seems to be
> "no", fine, not for stable.
>
With the above said:
Yes, we are aware of (at least some) the backports that you do. Even
though I don't use such packages, I think it's quite reasonable thing
to do. Yes, your view about the severity/feasibility of such backports
is (in all likelihood) going to be spot on, yet there is the
subjective element.

Hope the above provides some clarity on the topic.

Thanks
Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/2] gbm/dri2: propagate errors when creating a DMA-BUF fd

2016-09-14 Thread Eric Engestrom
On Thu, Sep 08, 2016 at 03:55:02PM -0400, Nicholas Bishop wrote:
> Changed dri2_query_image to check the return value of
> resource_get_handle and return GL_FALSE if an error occurs. Similarly
> changed gbm_dri_bo_get_fd to check the return value of queryImage and
> return -1 (an invalid file descriptor) if an error occurs.
> 
> Updated the comment for gbm_bo_get_fd to say that -1 is returned if
> an error occurs.
> 
> For reference this is an example callstack that should propagate the
> error back to the user:
> 
> i915_drm_buffer_get_handle
> i915_texture_get_handle
> u_resource_get_handle_vtbl
> dri2_query_image
> gbm_dri_bo_get_fd
> gbm_bo_get_fd
> 
> Signed-off-by: Nicholas Bishop 

Looks good to me
Reviewed-by: Eric Engestrom 

> ---
>  src/gallium/state_trackers/dri/dri2.c | 11 +++
>  src/gbm/backends/dri/gbm_dri.c|  8 +---
>  src/gbm/main/gbm.c|  3 ++-
>  3 files changed, 14 insertions(+), 8 deletions(-)
> 
> diff --git a/src/gallium/state_trackers/dri/dri2.c 
> b/src/gallium/state_trackers/dri/dri2.c
> index 28f8078..c6260ba 100644
> --- a/src/gallium/state_trackers/dri/dri2.c
> +++ b/src/gallium/state_trackers/dri/dri2.c
> @@ -979,10 +979,13 @@ dri2_query_image(__DRIimage *image, int attrib, int 
> *value)
>return GL_TRUE;
> case __DRI_IMAGE_ATTRIB_FD:
>whandle.type= DRM_API_HANDLE_TYPE_FD;
> -  image->texture->screen->resource_get_handle(image->texture->screen,
> - image->texture, , usage);
> -  *value = whandle.handle;
> -  return GL_TRUE;
> +  if (image->texture->screen->resource_get_handle(image->texture->screen,
> + image->texture, , usage)) {
> + *value = whandle.handle;
> + return GL_TRUE;
> +  } else {
> + return GL_FALSE;
> +  }
> case __DRI_IMAGE_ATTRIB_FORMAT:
>*value = image->dri_format;
>return GL_TRUE;
> diff --git a/src/gbm/backends/dri/gbm_dri.c b/src/gbm/backends/dri/gbm_dri.c
> index c3626e3..54b293a 100644
> --- a/src/gbm/backends/dri/gbm_dri.c
> +++ b/src/gbm/backends/dri/gbm_dri.c
> @@ -589,9 +589,11 @@ gbm_dri_bo_get_fd(struct gbm_bo *_bo)
> if (bo->image == NULL)
>return -1;
>  
> -   dri->image->queryImage(bo->image, __DRI_IMAGE_ATTRIB_FD, );
> -
> -   return fd;
> +   if (dri->image->queryImage(bo->image, __DRI_IMAGE_ATTRIB_FD, )) {
> +  return fd;
> +   } else {
> +  return -1;
> +   }
>  }
>  
>  static void
> diff --git a/src/gbm/main/gbm.c b/src/gbm/main/gbm.c
> index 95b4c2c..c3a2ec33 100644
> --- a/src/gbm/main/gbm.c
> +++ b/src/gbm/main/gbm.c
> @@ -242,7 +242,8 @@ gbm_bo_get_handle(struct gbm_bo *bo)
>   * descriptor.
>  
>   * \param bo The buffer object
> - * \return Returns a file descriptor referring  to the underlying buffer
> + * \return Returns a file descriptor referring to the underlying buffer or -1
> + * if an error occurs.
>   */
>  GBM_EXPORT int
>  gbm_bo_get_fd(struct gbm_bo *bo)
> -- 
> 2.7.4
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2] gallium: fix return value check

2016-09-14 Thread Eric Engestrom
On Thu, Sep 08, 2016 at 03:12:42PM +0300, Martina Kollarova wrote:
> A possible error (-1) was being lost because it was first converted to an
> unsigned int and only then checked.
> 
> Reviewed-by: Nicolai Hähnle 
> Signed-off-by: Martina Kollarova 

Good cleanup :)
Reviewed-by: Eric Engestrom 

By the way, I noticed this hasn't landed, after almost a week.
Would anyone care to push it? Nicolai?

Cheers,
  Eric

> ---
>  src/gallium/winsys/sw/kms-dri/kms_dri_sw_winsys.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/src/gallium/winsys/sw/kms-dri/kms_dri_sw_winsys.c 
> b/src/gallium/winsys/sw/kms-dri/kms_dri_sw_winsys.c
> index 07eca99..22e1c93 100644
> --- a/src/gallium/winsys/sw/kms-dri/kms_dri_sw_winsys.c
> +++ b/src/gallium/winsys/sw/kms-dri/kms_dri_sw_winsys.c
> @@ -252,18 +252,18 @@ kms_sw_displaytarget_add_from_prime(struct 
> kms_sw_winsys *kms_sw, int fd,
> if (!kms_sw_dt)
>return NULL;
>  
> +   off_t lseek_ret = lseek(fd, 0, SEEK_END);
> +   if (lseek_ret == -1) {
> +  FREE(kms_sw_dt);
> +  return NULL;
> +   }
> +   kms_sw_dt->size = lseek_ret;
> kms_sw_dt->ref_count = 1;
> kms_sw_dt->handle = handle;
> -   kms_sw_dt->size = lseek(fd, 0, SEEK_END);
> kms_sw_dt->width = width;
> kms_sw_dt->height = height;
> kms_sw_dt->stride = stride;
>  
> -   if (kms_sw_dt->size == (off_t)-1) {
> -  FREE(kms_sw_dt);
> -  return NULL;
> -   }
> -
> lseek(fd, 0, SEEK_SET);
>  
> list_add(_sw_dt->link, _sw->bo_list);
> -- 
> 1.9.1
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 09/14] egl: Lock the display in _eglCreateSync's callers

2016-09-14 Thread Adam Jackson
On Wed, 2016-09-14 at 11:15 +0100, Emil Velikov wrote:

> Nice one... I wonder if your view will be the same if you were never
> involved in distribution packaging? Guess we'll never know :-\
> In case you've forgotten things have been like that for a long time -
> long before I jumped in.

I wasn't accusing you of anything. I said _I_ am not the one making the
decision, that's all.

Obviously I can't make definite assertions about counterfactuals about
my work history, but I think considering all "new features" equally
destabilizing is wrong. Why have an extension model if you're not going
to use it to make assertions about the orthogonality of feature sets?
Why refuse to reason about the code, unless you don't have any
confidence that it's something that can be reasoned about?

Yes, we do backport features, it works pretty well. If one does so
enough times, a sense develops of how "big" of a feature it's possible
to backport reasonably. I have my own opinion about this one, and I was
asking what the rule was for mesa stable. Since the rule seems to be
"no", fine, not for stable.

- ajax
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH mesa] gbm: remove left-over array

2016-09-14 Thread Eric Engestrom
e7c8c85785b3a8f29e3f ("gbm: Removed unused function.") forgot to remove
the global array used only by that function.

Signed-off-by: Eric Engestrom 
---
 src/gbm/main/gbm.c | 12 
 1 file changed, 12 deletions(-)

diff --git a/src/gbm/main/gbm.c b/src/gbm/main/gbm.c
index 1acbbcd..5a8e8b7 100644
--- a/src/gbm/main/gbm.c
+++ b/src/gbm/main/gbm.c
@@ -45,12 +45,6 @@
 #include "gbmint.h"
 #include "backend.h"
 
-#define ARRAY_SIZE(a) (sizeof(a)/sizeof((a)[0]))
-
-static struct gbm_device *devices[16];
-
-static int device_num = 0;
-
 /** Returns the file description for the gbm device
  *
  * \return The fd that the struct gbm_device was created with
@@ -127,9 +121,6 @@ gbm_create_device(int fd)
   return NULL;
}
 
-   if (device_num == 0)
-  memset(devices, 0, sizeof devices);
-
gbm = _gbm_create_device(fd);
if (gbm == NULL)
   return NULL;
@@ -138,9 +129,6 @@ gbm_create_device(int fd)
gbm->stat = buf;
gbm->refcount = 1;
 
-   if (device_num < ARRAY_SIZE(devices)-1)
-  devices[device_num++] = gbm;
-
return gbm;
 }
 
-- 
Cheers,
  Eric

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH mesa] configure.ac: fix the name of the Wayland Scanner pc file

2016-09-14 Thread Eric Engestrom
From: Brendan King 

The Wayland Scanner pkg-config file is called wayland-scanner.pc.

Fixes: 153539bd9d4445b50411 ("configure: rework wayland_scanner
   handling (fix make distcheck)")
CC: Emil Velikov 
Reviewed-by: Eric Engestrom 
Tested-by: Eric Engestrom 
Signed-off-by: Brendan King 
---
 configure.ac | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/configure.ac b/configure.ac
index a413a3a..b81171b 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2005,8 +2005,8 @@ if test "x$with_egl_platforms" != "x" -a "x$enable_egl" 
!= xyes; then
 AC_MSG_ERROR([cannot build egl state tracker without EGL library])
 fi
 
-PKG_CHECK_MODULES([WAYLAND_SCANNER], [wayland_scanner],
-WAYLAND_SCANNER=`$PKG_CONFIG --variable=wayland_scanner 
wayland_scanner`,
+PKG_CHECK_MODULES([WAYLAND_SCANNER], [wayland-scanner],
+WAYLAND_SCANNER=`$PKG_CONFIG --variable=wayland_scanner 
wayland-scanner`,
 WAYLAND_SCANNER='')
 if test "x$WAYLAND_SCANNER" = x; then
 AC_PATH_PROG([WAYLAND_SCANNER], [wayland-scanner])
-- 
Cheers,
  Eric

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] Remove GL_GLEXT_PROTOTYPES guards from non-ext headers.

2016-09-14 Thread Emil Velikov
On 13 September 2016 at 21:05, Eric Anholt  wrote:
> Ilia Mirkin  writes:
>
>> On Mon, Sep 12, 2016 at 11:55 AM, Emil Velikov  
>> wrote:
>>> On 12 September 2016 at 15:35, Ilia Mirkin  wrote:
 On Mon, Sep 12, 2016 at 10:10 AM, Emil Velikov  
 wrote:
> Keeping diff/patches in git always felt like a hack, imho. Plus
> most/all(?) distros rely on the Mesa headers, so I'm not sure how that
> is going to work.

 The alternatives are considerably more painful for just a handful of
 files with a small number of diffs. This would be as a tool for
 developers like us who update the mesa versions by importing new KHR
 versions, which will not have our local changes applied. The patch
 would not be used as part of the build process or anything else.

>>> The goal being to have the patches alongside the patched headers.
>>> This way one can use them as reference ? Sure sounds great imho.
>>
>> Exactly. So that when I download new KHR headers, I just apply the
>> patch to them (and hope it applies), and if not, look at what was
>> being done and try to repeat the process. Then I regenerate the patch
>> against the (new) originals and check the whole thing in.
>
> Or you could just use git like normal: You have a public branch of the
> unchanged headers.  You make your own changes to the headers on master.
> When you want to update to new upstream headers, you check out the
> unchanged-headers branch, commit new unchanged upstreams there, check
> out master, and git merge.
I'd imagine that our (people/companies who are Khronos members) time
would be better spent on upstreaming things, rather than finding ways
how to manage the diff.

Or is it just me ?

-Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 97804] Later precision statement isn't overriding earlier one

2016-09-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=97804

Bug ID: 97804
   Summary: Later precision statement isn't overriding earlier one
   Product: Mesa
   Version: git
  Hardware: Other
OS: All
Status: NEW
  Severity: normal
  Priority: medium
 Component: glsl-compiler
  Assignee: mesa-dev@lists.freedesktop.org
  Reporter: eero.t.tammi...@intel.com
QA Contact: intel-3d-b...@lists.freedesktop.org

Created attachment 126514
  --> https://bugs.freedesktop.org/attachment.cgi?id=126514=edit
Glmark2 shader triggering the bug

GLES 3.2 spec:
https://www.khronos.org/registry/gles/specs/3.2/GLSL_ES_Specification_3.20.pdf

States following in "4.7.4 Default Precision Qualifiers":

Non-precision qualified declarations will use the precision qualifier specified
in the most recent precision statement that is still in scope. The precision
statement has the same scoping rules as variable declarations.
...
*Multiple precision statements for the same basic type can appear inside the
same scope, with later statements overriding earlier statements within that
scope.*
...
"
All languages except for the fragment language have the following predeclared
globally scoped default precision statements:
   precision highp float;
...
The fragment language has no default precision qualifier for floating point
types. Hence for float, floating point vector and matrix variable declarations,
either the declaration must include a precision qualifier or the default float
precision must have been previously declared.


Check for latter was added with bug 60737, but precision overriding doesn't
seem to work.

Attached Glmark2 shader test fails to Mesa bug 97532.  Vertex shader float
uniform defaults to highp, and same should happen in fragment shader, however,
that happens only if the first precision statement in shader is removed, as
only then it will pass the linker check.  I.e.  precision overriding doesn't
work.

-- 
You are receiving this mail because:
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 13/14] egl: Track EGL_KHR_debug state when going through EGL API calls

2016-09-14 Thread Emil Velikov
On 13 September 2016 at 17:46, Adam Jackson  wrote:
> On Tue, 2016-09-13 at 16:54 +0100, Emil Velikov wrote:
>> Going through table 13.2 and the below there are some discrepancies.
>>
>> AFAICT some of those can be seen as spec bugs (B), while others seem
>> to be missing (M)
>>  - thread - eglBindAPI (M)
>
> Not really missing, but tricky. _EGL_FUNC_START calls _eglSetFuncName
> which initializes thr->CurrentObjectLabel, so if there _is_ a non-dummy
> thread then the callback will get the correct label for the thread
> object.
>
> However, if we're on the dummy thread, then we'll hit the call
> to _eglDebugReportFull(objectLabel=NULL), which will correctly call the
> callback with both labels NULL. Arguably _eglSetFuncName should also
> clear ->CurrentObjectLabel in this case.
>
Staring at the list for a while and yet I've missed the
_EGL_FUNC_START line in eglBindAPI.

>>  - display - eglGetCurrentDisplay (B)
>
> It's somewhat irrelevant since our implementation never throws an error
> on this path (and it's not clear that any implementation ever would),
> but: what do you mean by "spec bug" here?
>
From the spec

 will contain the label attached to the primary object
of the message; Labels will be NULL if not set by the application.
The primary object should be the object the function operates on, see
table 13.2 which provides the recommended mapping between functions and
their primary object.

It tells us the relation between the label and the (primary) object
which we implement by attaching the label to the corresponding
primitive object in _eglSetFuncName.

In this particular case if one cannot derive the current display, how
can they cannot attach the label to the display object ? In a similar
way we have the eglCreate entry points, which relate to the dpy since
one cannot relate (attach in our case) the label to the non-existent
primitive that one is trying to create.

NB:
The fact that mesa/foo does not throw an error is implementation
detail, which should not be relied upon.

>>  - context - eglQueryAPI (M),
>
> eglQueryAPI is _defined_ as never throwing an error, so I'm not sure
> this is really "missing". However, the dummy thread's ->CurrentAPI is
> initialized to 0, but "no API" is EGL_NONE which is not zero but
> 0x3038, so that really is a bug; I'll fix that up.
>
Thanks for reminding me - eglQueryAPI should never throw an error,
indeed. Since EGL_KHR_debug is applicable for functions_do_ throw an
error, one should leave the API out of the spec text shouldn't they ?
The only text that would be applicable i one that reminds us about
that. Something vaguely like "Since eglQueryAPI never throws an error,
calling the function should not have any effect on the object label,
(others) already associated with the context/thread/..."

>>  eglGetCurrentContext (B)
>
> Again, this is defined as not throwing an error, so as long as we never
> trigger the debug callback there's no problem here.
>
The above description for dpy still applies here. Just replace
s/display/context/.

>>  - surface - eglSwapInterval (B)
>
> Again, not sure what you mean by "spec bug" here. But there is an
> implementation bug, we should pass ctx->DrawSurface as the active
> object to _EGL_FUNC_START since we're already locked; if it's NULL and
> we have a live thread then _eglSetFuncName will clear the current
> object label correctly. I'll fix that up.
>
Yes there is the implementation bug that you've mentioned. But there's
more to it imho.

The validation between current context (and thus draw surface) and
user provided dpy sounds a bit iffy.
I'm also leaning that the function operate/relates closer to the
(current) thread than the actual drawable, no ?

>>  eglGetCurrentSurface(B)
>
> The weird part about this one is that we might need to throw an error
> before we've found a valid surface to operate with.
This is precisely what I'm talking about - one cannot relate the error
label to a {surface,context,display} object that is yet to be found.
As such the object label (and friends) should be related to the
current thread.

Regards,
Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] Linker warning typo fix

2016-09-14 Thread Eero Tamminen

Hi,

Attached is fix to linker typo I noticed.

(It's so trivial that I'm not going to send updates to it for process 
reasons / attributions.  Apply it as you wish.)



- Eero
diff --git a/src/compiler/glsl/linker.cpp b/src/compiler/glsl/linker.cpp
index c95edf3..f008b4f 100644
--- a/src/compiler/glsl/linker.cpp
+++ b/src/compiler/glsl/linker.cpp
@@ -750,8 +750,8 @@ validate_vertex_shader_executable(struct gl_shader_program *prog,
   if (!find.variable_found()) {
 if (prog->IsES) {
   linker_warning(prog,
- "vertex shader does not write to `gl_Position'."
- "It's value is undefined. \n");
+ "vertex shader does not write to `gl_Position'. "
+ "Its value is undefined. \n");
 } else {
   linker_error(prog,
"vertex shader does not write to `gl_Position'. \n");
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Problem with RX 480 on Alien: Isolation and Dota 2

2016-09-14 Thread Marek Olšák
On Wed, Sep 14, 2016 at 5:26 AM, Michel Dänzer  wrote:
> On 14/09/16 02:53 AM, Marek Olšák wrote:
>>
>> cmake .. -G Ninja -DCMAKE_INSTALL_PREFIX=/usr/llvm/x86_64-linux-gnu
>> -DLLVM_TARGETS_TO_BUILD="X86;AMDGPU" -DLLVM_ENABLE_ASSERTIONS=O
>>   -DCMAKE_BUILD_TYPE=RelWithDebInfo
>> -DLLVM_BUILD_LLVM_DYLIB=ON -DLLVM_LINK_LLVM_DYLIB=ON \
>>   -DCMAKE_C_FLAGS_RELWITHDEBINFO="-O2 -g -DNDEBUG
>> -fno-omit-frame-pointer" \
>>   -DCMAKE_CXX_FLAGS_RELWITHDEBINFO="-O2 -g -DNDEBUG
>> -fno-omit-frame-pointer".
>
> FWIW, I recommend enabling assertions, i.e. setting
> -DLLVM_ENABLE_ASSERTIONS=1 and removing -DNDEBUG.

That should have been:

-DLLVM_ENABLE_ASSERTIONS=ON \

It was cut when I was copy-pasting it.

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 09/14] egl: Lock the display in _eglCreateSync's callers

2016-09-14 Thread Emil Velikov
On 13 September 2016 at 19:22, Adam Jackson  wrote:
> On Tue, 2016-09-13 at 19:18 +0100, Emil Velikov wrote:
>
>> For the series as a whole ?
>> Two words which contradict any software's stable scheme - new feature.
>
> Disagree, but I'm not the one running Mesa's stable branch, so my
> opinion doesn't count here.
>
Nice one... I wonder if your view will be the same if you were never
involved in distribution packaging? Guess we'll never know :-\
In case you've forgotten things have been like that for a long time -
long before I jumped in.

-Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Problem with RX 480 on Alien: Isolation and Dota 2

2016-09-14 Thread Evgenii Shatokhin

On 13.09.2016 21:27, Romain Failliot wrote:

Thanks a lot! I'll try that tonight!

I have a 64-bit distrib, I don't think so but do I need to compile the 32-bit
version of llvm as well (is it because Steam is using 32-bit libraries?).


Yes, you likely need both 64-bit and 32-bit LLVM and Mesa. 32-bit 
libraries are needed for Steam itself but some games launched from Steam 
seem to use 64-bit ones.


For this reason, I am now actually building both 64-bit and 32-bit 
versions for our Linux distro, ROSA.


I do not know whether it is possible to get away with only 64-bit 
versions somehow.


By the way, many thanks to Marek Olšák for the build instructions!

Regards,
Evgenii



2016-09-13 13:53 GMT-04:00 Marek Olšák :

LLVM 64-bit:

mkdir -p build
cd build
cmake .. -G Ninja -DCMAKE_INSTALL_PREFIX=/usr/llvm/x86_64-linux-gnu
-DLLVM_TARGETS_TO_BUILD="X86;AMDGPU" -DLLVM_ENABLE_ASSERTIONS=O
   -DCMAKE_BUILD_TYPE=RelWithDebInfo
-DLLVM_BUILD_LLVM_DYLIB=ON -DLLVM_LINK_LLVM_DYLIB=ON \
   -DCMAKE_C_FLAGS_RELWITHDEBINFO="-O2 -g -DNDEBUG
-fno-omit-frame-pointer" \
   -DCMAKE_CXX_FLAGS_RELWITHDEBINFO="-O2 -g -DNDEBUG
-fno-omit-frame-pointer".
ninja
sudo ninja install


LLVM 32-bit:

mkdir -p build32
cd build32
cmake .. -G Ninja -DCMAKE_INSTALL_PREFIX=/usr/llvm/i386-linux-gnu
-DLLVM_TARGETS_TO_BUILD="X86;AMDGPU" -DLLVM_ENABLE_ASSERTIONS=ON
   -DCMAKE_BUILD_TYPE=RelWithDebInfo
-DLLVM_BUILD_LLVM_DYLIB=ON -DLLVM_LINK_LLVM_DYLIB=ON \
   -DCMAKE_C_FLAGS_RELWITHDEBINFO="-O2 -g -DNDEBUG
-fno-omit-frame-pointer" \
   -DCMAKE_CXX_FLAGS_RELWITHDEBINFO="-O2 -g -DNDEBUG
-fno-omit-frame-pointer" \
   -DLLVM_BUILD_32_BITS=ON
ninja
sudo ninja install
# then add /usr/llvm/x86_64-linux-gnu and /usr/llvm/i386-linux-gnu to
ld.conf


Mesa configure helper script, it will overwrite the /usr/lib/ files on
Ubuntu (run as-is for 64-bit, or use "-32" for 32-bit):

if test x$1 = x-32; then
 dir=i386-linux-gnu
 build=i686-linux-gnu
 export CFLAGS="-m32 -O2 -g"
 export CXXFLAGS="$CFLAGS"
 export LDFLAGS="-L/usr/lib/$dir"
 export PKG_CONFIG_PATH="/usr/lib/$dir/pkgconfig"
else
 dir=x86_64-linux-gnu
 build=$dir
fi

./autogen.sh \
  --build=$build --prefix=/usr --libdir=/usr/lib/$dir
--with-llvm-prefix=/usr/llvm/$dir \
  --enable-glx-tls --enable-texture-float --enable-debug --enable-vdpau \
  --disable-xvmc --disable-va --enable-nine --with-sha1=libnettle \
  --with-gallium-drivers=radeonsi,r600,swrast --with-dri-drivers= \
  --with-egl-platforms=x11,drm --enable-gles1 --enable-gles2

make -j4
sudo make install

You'll probably want to delete /usr/lib/$dir/*mesa*/*. That's Ubuntu's
invention that will prevent you from using installed libGL and libEGL.

It's all kind of a mess, but I don't know of a better way.

Marek



On Tue, Sep 13, 2016 at 7:33 PM, Romain Failliot
 wrote:

2016-09-13 12:41 GMT-04:00 Marek Olšák :


BTW, If you update LLVM to a newer version, you also have to re-build
Mesa, because the LLVM version used by Mesa is determined while Mesa
is being built.

Also, the chance to rage-quit while building LLVM+Mesa is pretty high
if you've never done it before.


I see, is there a tutorial somewhere maybe on how to do that?
I know how to compile projects, that's not a problem. It's more about the
little details to make everything work once it's compiled.



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] i965/fs: Take Dispatch/Vector mask into account in FIND_LIVE_CHANNEL

2016-09-14 Thread Jason Ekstrand
On at least Sky Lake, ce0 does not contain the full story as far as enabled
channels goes.  It is possible to have completely disabled channels where
the corresponding bits in ce0 are 1.  In order to get the correct execution
mask, you have to mask off those channels which were disabled from the
beginning by taking the AND of ce0 with either sr0.2 or sr0.3 depending on
the shader stage.  Failure to do so can result in FIND_LIVE_CHANNEL
returning a completely dead channel.

Signed-off-by: Jason Ekstrand 
Cc: Francisco Jerez 
---
 src/mesa/drivers/dri/i965/brw_eu.h   |  3 ++-
 src/mesa/drivers/dri/i965/brw_eu_emit.c  | 25 +---
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp   |  2 +-
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp |  2 +-
 4 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_eu.h 
b/src/mesa/drivers/dri/i965/brw_eu.h
index 3e52764..109ff8a 100644
--- a/src/mesa/drivers/dri/i965/brw_eu.h
+++ b/src/mesa/drivers/dri/i965/brw_eu.h
@@ -488,7 +488,8 @@ brw_pixel_interpolator_query(struct brw_codegen *p,
 
 void
 brw_find_live_channel(struct brw_codegen *p,
-  struct brw_reg dst);
+  struct brw_reg dst,
+  bool vector_mask_enable);
 
 void
 brw_broadcast(struct brw_codegen *p,
diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c 
b/src/mesa/drivers/dri/i965/brw_eu_emit.c
index 3b12030..a760b30 100644
--- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
@@ -3361,7 +3361,8 @@ brw_pixel_interpolator_query(struct brw_codegen *p,
 }
 
 void
-brw_find_live_channel(struct brw_codegen *p, struct brw_reg dst)
+brw_find_live_channel(struct brw_codegen *p, struct brw_reg dst,
+  bool vector_mask_enable)
 {
const struct gen_device_info *devinfo = p->devinfo;
const unsigned exec_size = 1 << brw_inst_exec_size(devinfo, p->current);
@@ -3377,13 +3378,23 @@ brw_find_live_channel(struct brw_codegen *p, struct 
brw_reg dst)
 
   if (devinfo->gen >= 8) {
  /* Getting the first active channel index is easy on Gen8: Just find
-  * the first bit set in the mask register.  The same register exists
-  * on HSW already but it reads back as all ones when the current
-  * instruction has execution masking disabled, so it's kind of
-  * useless.
+  * the first bit set in the execution mask.  The only tricky part is
+  * that ce0 needs to be combined with dispatch mask from sr0.2 in
+  * order to get the actual set of channel enables.  Both registers
+  * exist on HSW already but it reads back ec0 as all ones when the
+  * current instruction has execution masking disabled, so it's kind
+  * of useless.
   */
- inst = brw_FBL(p, vec1(dst),
-retype(brw_mask_reg(0), BRW_REGISTER_TYPE_UD));
+ if (vector_mask_enable) {
+brw_MOV(p, vec1(dst), get_element_ud(brw_sr0_reg(), 3));
+ } else {
+brw_MOV(p, vec1(dst), get_element_ud(brw_sr0_reg(), 2));
+ }
+ brw_AND(p, vec1(dst),
+ retype(brw_mask_reg(0), BRW_REGISTER_TYPE_UD),
+ vec1(dst));
+
+ inst = brw_FBL(p, vec1(dst), vec1(dst));
 
  /* Quarter control has the effect of magically shifting the value of
   * this register so you'll get the first active channel relative to
diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index 2f4ba7b..f2c49da 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -2041,7 +2041,7 @@ fs_generator::generate_code(const cfg_t *cfg, int 
dispatch_width)
  break;
 
   case SHADER_OPCODE_FIND_LIVE_CHANNEL:
- brw_find_live_channel(p, dst);
+ brw_find_live_channel(p, dst, stage == MESA_SHADER_FRAGMENT);
  break;
 
   case SHADER_OPCODE_BROADCAST:
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
index 256abae..5ff9a3a 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
@@ -1863,7 +1863,7 @@ generate_code(struct brw_codegen *p,
  break;
 
   case SHADER_OPCODE_FIND_LIVE_CHANNEL:
- brw_find_live_channel(p, dst);
+ brw_find_live_channel(p, dst, false);
  break;
 
   case SHADER_OPCODE_BROADCAST:
-- 
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 62/95] i965/vec4: Add a shuffle_64bit_data helper

2016-09-14 Thread Iago Toral
On Tue, 2016-09-13 at 22:24 -0700, Francisco Jerez wrote:
> Iago Toral  writes:
> 
> > 
> > On Mon, 2016-09-12 at 14:19 -0700, Francisco Jerez wrote:
> > > 
> > > Iago Toral Quiroga  writes:
> > > 
> > > > 
> > > > 
> > > > SIMD4x2 64bit data is stored in register space like this:
> > > > 
> > > > r0.0:DF  x0 y0 z0 w0
> > > > r0.1:DF  x1 y1 z1 w1
> > > > 
> > > > When we need to write data such as this to memory using 32-bit
> > > > write
> > > > messages we need to shuffle it in this fashion:
> > > > 
> > > > r0.0:DF  x0 y0 x1 y1
> > > > r0.1:DF  z0 w0 z1 w1
> > > > 
> > > > and emit two 32-bit write messages, one for r0.0 at base_offset
> > > > and another one for r0.1 at base_offset+16.
> > > > 
> > > > We also need to do the inverse operation when we read using 32-
> > > > bit
> > > > messages
> > > > to produce valid SIMD4x2 64bit data from the data read. We can
> > > > achieve this
> > > > by aplying the exact same shuffling to the data read, although
> > > > we
> > > > need to
> > > > apply different channel enables since the layout of the data is
> > > > reversed.
> > > > 
> > > > This helper implements the data shuffling logic and we will use
> > > > it
> > > > in
> > > > various places where we read and write 64bit data from/to
> > > > memory.
> > > > ---
> > > >  src/mesa/drivers/dri/i965/brw_vec4.h   |  5 ++
> > > >  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 96
> > > > ++
> > > >  2 files changed, 101 insertions(+)
> > > > 
> > > > diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h
> > > > b/src/mesa/drivers/dri/i965/brw_vec4.h
> > > > index 26228d0..3337fc0 100644
> > > > --- a/src/mesa/drivers/dri/i965/brw_vec4.h
> > > > +++ b/src/mesa/drivers/dri/i965/brw_vec4.h
> > > > @@ -327,6 +327,11 @@ public:
> > > >  
> > > > src_reg setup_imm_df(double v);
> > > >  
> > > > +   vec4_instruction *shuffle_64bit_data(dst_reg dst, src_reg
> > > > src,
> > > > +bool for_write,
> > > > +bblock_t *block =
> > > > NULL,
> > > > +vec4_instruction *ref
> > > > =
> > > > NULL);
> > > > +
> > > > virtual void emit_nir_code();
> > > > virtual void nir_setup_uniforms();
> > > > virtual void
> > > > nir_setup_system_value_intrinsic(nir_intrinsic_instr *instr);
> > > > diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> > > > b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> > > > index 450db92..346e822 100644
> > > > --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> > > > +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> > > > @@ -2145,4 +2145,100 @@
> > > > vec4_visitor::nir_emit_undef(nir_ssa_undef_instr *instr)
> > > >    dst_reg(VGRF, alloc.allocate(instr->def.bit_size / 32));
> > > >  }
> > > >  
> > > > +/* SIMD4x2 64bit data is stored in register space like this:
> > > > + *
> > > > + * r0.0:DF  x0 y0 z0 w0
> > > > + * r0.1:DF  x1 y1 z1 w1
> > > > + *
> > > > + * When we need to write data such as this to memory using 32-
> > > > bit
> > > > write
> > > > + * messages we need to shuffle it in this fashion:
> > > > + *
> > > > + * r0.0:DF  x0 y0 x1 y1 (to be written at base offset)
> > > > + * r0.0:DF  z0 w0 z1 w1 (to be written at base offset + 16)
> > > > + *
> > > > + * We need to do the inverse operation when we read using 32-
> > > > bit
> > > > messages,
> > > > + * which we can do by applying the same exact shuffling on the
> > > > 64-
> > > > bit data
> > > > + * read, only that because the data for each vertex is
> > > > positioned
> > > > differently
> > > > + * we need to apply different channel enables.
> > > > + *
> > > > + * This function takes 64bit data and shuffles it as explained
> > > > above.
> > > > + *
> > > > + * The @for_write parameter is used to specify if the
> > > > shuffling is
> > > > being done
> > > > + * for proper SIMD4x2 64-bit data that needs to be shuffled
> > > > prior
> > > > to a 32-bit
> > > > + * write message (for_write = true), or instead we are doing
> > > > the
> > > > inverse
> > > > + * opperation and we have just read 64-bit data using a 32-bit
> > > > messages that we
> > > > + * need to shuffle to create valid SIMD4x2 64-bit data
> > > > (for_write
> > > > = false).
> > > > + *
> > > > + * If @block and @ref are non-NULL, then the shuffling is done
> > > > after @ref,
> > > > + * otherwise the instructions are emitted normally at the end.
> > > > The
> > > > function
> > > > + * returns the last instruction inserted.
> > > > + *
> > > > + * Notice that @src and @dst cannot be the same register.
> > > > + */
> > > > +vec4_instruction *
> > > > +vec4_visitor::shuffle_64bit_data(dst_reg dst, src_reg src,
> > > > bool
> > > > for_write,
> > > > + bblock_t *block,
> > > > vec4_instruction
> > > > *ref)
> > > > +{
> > > > +   assert(type_sz(src.type) == 8);
> > > > +   assert(type_sz(dst.type) == 8);
> > > > +   

Re: [Mesa-dev] [PATCH] i965/fs: Take the sample mask into account in FIND_LIVE_CHANNEL

2016-09-14 Thread Jason Ekstrand
On Wed, Sep 14, 2016 at 12:02 AM, Francisco Jerez 
wrote:

> Jason Ekstrand  writes:
>
> > Just looking at the channel enables is not sufficient, at least not on
> Sky
> > Lake.  Channels that are disabled by the sample_mask may show up in the
> > channel enable register as being enabled even if they are not executing.
> > This can cause FIND_LIVE_CHANNEL to return a channel that isn't actually
> > executing.  In our handling of interpolateAtSample we do a clever trick
> > with emit_uniformize to call the interpolator once for each unique sample
> > id.  Thanks to FIND_LIVE_CHANNEL returning a dead channel, we can get an
> > infinite loop which hangs the GPU.
> >
> > Signed-off-by: Jason Ekstrand 
>
> NAK, FIND_LIVE_CHANNEL returns channels from the EU execution mask by
> design (see the doxygen comment in brw_defines.h), which is necessary
> for the instruction to return a well-defined result when only helper
> invocations are enabled in the execution mask.  Several users of the
> instruction are likely to be relying on this.
>

Perhaps I need to be a bit more specific about what problem is being fixed
here.  It's not just that we need to take sample mask into account.  It's
actually a far more subtle problem.  It can happen (Yes, I've seen this in
practice) that a group of channels is disabled (i.e., doesn't execute, not
just helper invocations) but the corresponding bits in ec0 are set to 1.
Maybe this means that ec0 is still broken on Sky Lake.  Maybe it just means
that ec0 doesn't mean what it looks like it means.  I'm not sure.  What I
do know is that FIND_LIVE_CHANNEL is returning a 100% dead channel.

I'm fine if this is the wrong fix.  Maybe we need to just do the gen7 thing
everywhere.


> And isn't interpolateAtSample supposed to give a well-defined result too
> when it's run from a helper invocation?
>

Probably?


> > ---
> >  src/mesa/drivers/dri/i965/brw_eu.h   |  3 ++-
> >  src/mesa/drivers/dri/i965/brw_eu_emit.c  | 22
> +++---
> >  src/mesa/drivers/dri/i965/brw_fs_builder.h   |  3 ++-
> >  src/mesa/drivers/dri/i965/brw_fs_generator.cpp   |  2 +-
> >  src/mesa/drivers/dri/i965/brw_vec4_generator.cpp |  2 +-
> >  5 files changed, 21 insertions(+), 11 deletions(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_eu.h
> b/src/mesa/drivers/dri/i965/brw_eu.h
> > index 3e52764..9aaab78 100644
> > --- a/src/mesa/drivers/dri/i965/brw_eu.h
> > +++ b/src/mesa/drivers/dri/i965/brw_eu.h
> > @@ -488,7 +488,8 @@ brw_pixel_interpolator_query(struct brw_codegen *p,
> >
> >  void
> >  brw_find_live_channel(struct brw_codegen *p,
> > -  struct brw_reg dst);
> > +  struct brw_reg dst,
> > +  struct brw_reg sample_mask);
> >
> >  void
> >  brw_broadcast(struct brw_codegen *p,
> > diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c
> b/src/mesa/drivers/dri/i965/brw_eu_emit.c
> > index 3b12030..f593a8d 100644
> > --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
> > +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
> > @@ -3361,7 +3361,8 @@ brw_pixel_interpolator_query(struct brw_codegen
> *p,
> >  }
> >
> >  void
> > -brw_find_live_channel(struct brw_codegen *p, struct brw_reg dst)
> > +brw_find_live_channel(struct brw_codegen *p, struct brw_reg dst,
> > +  struct brw_reg sample_mask)
> >  {
> > const struct gen_device_info *devinfo = p->devinfo;
> > const unsigned exec_size = 1 << brw_inst_exec_size(devinfo,
> p->current);
> > @@ -3377,13 +3378,20 @@ brw_find_live_channel(struct brw_codegen *p,
> struct brw_reg dst)
> >
> >if (devinfo->gen >= 8) {
> >   /* Getting the first active channel index is easy on Gen8:
> Just find
> > -  * the first bit set in the mask register.  The same register
> exists
> > -  * on HSW already but it reads back as all ones when the
> current
> > -  * instruction has execution masking disabled, so it's kind of
> > -  * useless.
> > +  * the first bit set in the mask register AND the sample
> mask.  The
> > +  * same register exists on HSW already but it reads back as
> all ones
> > +  * when the current instruction has execution masking
> disabled, so
> > +  * it's kind of useless.
> >*/
> > - inst = brw_FBL(p, vec1(dst),
> > -retype(brw_mask_reg(0), BRW_REGISTER_TYPE_UD));
> > + struct brw_reg mask_reg = retype(brw_mask_reg(0),
> > +  BRW_REGISTER_TYPE_UD);
> > + if (sample_mask.file != BRW_IMMEDIATE_VALUE ||
> > + sample_mask.ud != 0x) {
> > +brw_AND(p, vec1(dst), mask_reg, sample_mask);
> > +mask_reg = vec1(dst);
> > + }
> > +
> > + inst = brw_FBL(p, vec1(dst), mask_reg);
> >
> >   /* Quarter control has the effect of magically shifting the
> value 

Re: [Mesa-dev] [PATCH] radeonsi: reload PS inputs with direct indexing at each use (v2)

2016-09-14 Thread Nicolai Hähnle

Reviewed-by: Nicolai Hähnle 

On 13.09.2016 22:20, Marek Olšák wrote:

From: Marek Olšák 

The LLVM compiler can CSE interp intrinsics thanks to
LLVMReadNoneAttribute.

26011 shaders in 14651 tests
Totals:
SGPRS: 1146340 -> 1132676 (-1.19 %)
VGPRS: 727371 -> 711730 (-2.15 %)
Spilled SGPRs: 2218 -> 2078 (-6.31 %)
Spilled VGPRs: 369 -> 369 (0.00 %)
Scratch VGPRs: 1344 -> 1344 (0.00 %) dwords per thread
Code Size: 35841268 -> 36009732 (0.47 %) bytes
LDS: 767 -> 767 (0.00 %) blocks
Max Waves: 222559 -> 224779 (1.00 %)
Wait states: 0 -> 0 (0.00 %)

v2: don't call load_input for fragment shaders in emit_declaration
---
 src/gallium/drivers/radeon/radeon_llvm.h   |  6 -
 .../drivers/radeon/radeon_setup_tgsi_llvm.c| 30 ++
 src/gallium/drivers/radeonsi/si_shader.c   | 27 ---
 3 files changed, 41 insertions(+), 22 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_llvm.h 
b/src/gallium/drivers/radeon/radeon_llvm.h
index da5b7f5..f508d32 100644
--- a/src/gallium/drivers/radeon/radeon_llvm.h
+++ b/src/gallium/drivers/radeon/radeon_llvm.h
@@ -23,21 +23,23 @@
  * Authors: Tom Stellard 
  *
  */

 #ifndef RADEON_LLVM_H
 #define RADEON_LLVM_H

 #include 
 #include "gallivm/lp_bld_init.h"
 #include "gallivm/lp_bld_tgsi.h"
+#include "tgsi/tgsi_parse.h"

+#define RADEON_LLVM_MAX_INPUT_SLOTS 32
 #define RADEON_LLVM_MAX_INPUTS 32 * 4
 #define RADEON_LLVM_MAX_OUTPUTS 32 * 4

 #define RADEON_LLVM_INITIAL_CF_DEPTH 4

 #define RADEON_LLVM_MAX_SYSTEM_VALUES 4

 struct radeon_llvm_branch {
LLVMBasicBlockRef endif_block;
LLVMBasicBlockRef if_block;
@@ -55,33 +57,35 @@ struct radeon_llvm_context {

/*=== Front end configuration ===*/

/* Instructions that are not described by any of the TGSI opcodes. */

/** This function is responsible for initilizing the inputs array and 
will be
  * called once for each input declared in the TGSI shader.
  */
void (*load_input)(struct radeon_llvm_context *,
   unsigned input_index,
-  const struct tgsi_full_declaration *decl);
+  const struct tgsi_full_declaration *decl,
+  LLVMValueRef out[4]);

void (*load_system_value)(struct radeon_llvm_context *,
  unsigned index,
  const struct tgsi_full_declaration *decl);

void (*declare_memory_region)(struct radeon_llvm_context *,
  const struct tgsi_full_declaration *decl);

/** This array contains the input values for the shader.  Typically 
these
  * values will be in the form of a target intrinsic that will inform 
the
  * backend how to load the actual inputs to the shader.
  */
+   struct tgsi_full_declaration input_decls[RADEON_LLVM_MAX_INPUT_SLOTS];
LLVMValueRef inputs[RADEON_LLVM_MAX_INPUTS];
LLVMValueRef outputs[RADEON_LLVM_MAX_OUTPUTS][TGSI_NUM_CHANNELS];

/** This pointer is used to contain the temporary values.
  * The amount of temporary used in tgsi can't be bound to a max value 
and
  * thus we must allocate this array at runtime.
  */
LLVMValueRef *temps;
unsigned temps_count;
LLVMValueRef system_values[RADEON_LLVM_MAX_SYSTEM_VALUES];
diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c 
b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
index 4643e6d..4fa43cd 100644
--- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
+++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
@@ -439,28 +439,43 @@ LLVMValueRef radeon_llvm_emit_fetch(struct 
lp_build_tgsi_context *bld_base,
bld_base->int_bld.zero);
result = LLVMConstInsertElement(result,

bld->immediates[reg->Register.Index][swizzle + 1],
bld_base->int_bld.one);
return LLVMConstBitCast(result, ctype);
} else {
return 
LLVMConstBitCast(bld->immediates[reg->Register.Index][swizzle], ctype);
}
}

-   case TGSI_FILE_INPUT:
-   result = 
ctx->inputs[radeon_llvm_reg_index_soa(reg->Register.Index, swizzle)];
+   case TGSI_FILE_INPUT: {
+   unsigned index = reg->Register.Index;
+   LLVMValueRef input[4];
+
+   /* I don't think doing this for vertex shaders is beneficial.
+* For those, we want to make sure the VMEM loads are executed
+* only once. Fragment shaders don't care much, because
+* v_interp instructions are much cheaper than VMEM loads.
+*/
+   

Re: [Mesa-dev] [PATCH] radeonsi: get rid of img/buf/sampler descriptor preloading (v2)

2016-09-14 Thread Nicolai Hähnle

On 13.09.2016 22:20, Marek Olšák wrote:

From: Marek Olšák 

26011 shaders in 14651 tests
Totals:
SGPRS: 1251920 -> 1152636 (-7.93 %)
VGPRS: 728421 -> 728198 (-0.03 %)
Spilled SGPRs: 16644 -> 3776 (-77.31 %)
Spilled VGPRs: 369 -> 369 (0.00 %)
Scratch VGPRs: 1344 -> 1344 (0.00 %) dwords per thread
Code Size: 36001064 -> 35835152 (-0.46 %) bytes
LDS: 767 -> 767 (0.00 %) blocks
Max Waves: 21 -> 222372 (0.07 %)
Wait states: 0 -> 0 (0.00 %)

v2: merge codepaths where possible
---
 src/gallium/drivers/radeonsi/si_shader.c | 173 ---
 1 file changed, 41 insertions(+), 132 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index 84cbfd7..6f9c45f 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -100,25 +100,20 @@ struct si_shader_context

LLVMTargetMachineRef tm;

unsigned invariant_load_md_kind;
unsigned range_md_kind;
unsigned uniform_md_kind;
LLVMValueRef empty_md;

/* Preloaded descriptors. */
LLVMValueRef const_buffers[SI_NUM_CONST_BUFFERS];
-   LLVMValueRef shader_buffers[SI_NUM_SHADER_BUFFERS];
-   LLVMValueRef sampler_views[SI_NUM_SAMPLERS];
-   LLVMValueRef sampler_states[SI_NUM_SAMPLERS];
-   LLVMValueRef fmasks[SI_NUM_SAMPLERS];
-   LLVMValueRef images[SI_NUM_IMAGES];
LLVMValueRef esgs_ring;
LLVMValueRef gsvs_ring[4];

LLVMValueRef lds;
LLVMValueRef gs_next_vertex[4];
LLVMValueRef return_value;

LLVMTypeRef voidt;
LLVMTypeRef i1;
LLVMTypeRef i8;
@@ -3399,32 +3394,32 @@ static void membar_emit(
 {
struct si_shader_context *ctx = si_shader_context(bld_base);

emit_waitcnt(ctx);
 }

 static LLVMValueRef
 shader_buffer_fetch_rsrc(struct si_shader_context *ctx,
 const struct tgsi_full_src_register *reg)
 {
-   LLVMValueRef ind_index;
-   LLVMValueRef rsrc_ptr;
+   LLVMValueRef index;
+   LLVMValueRef rsrc_ptr = LLVMGetParam(ctx->radeon_bld.main_fn,
+SI_PARAM_SHADER_BUFFERS);

if (!reg->Register.Indirect)
-   return ctx->shader_buffers[reg->Register.Index];
-
-   ind_index = get_bounded_indirect_index(ctx, >Indirect,
-  reg->Register.Index,
-  SI_NUM_SHADER_BUFFERS);
+   index = LLVMConstInt(ctx->i32, reg->Register.Index, 0);
+   else
+   index = get_bounded_indirect_index(ctx, >Indirect,
+  reg->Register.Index,
+  SI_NUM_SHADER_BUFFERS);

-   rsrc_ptr = LLVMGetParam(ctx->radeon_bld.main_fn, 
SI_PARAM_SHADER_BUFFERS);
-   return build_indexed_load_const(ctx, rsrc_ptr, ind_index);
+   return build_indexed_load_const(ctx, rsrc_ptr, index);
 }

 static bool tgsi_is_array_sampler(unsigned target)
 {
return target == TGSI_TEXTURE_1D_ARRAY ||
   target == TGSI_TEXTURE_SHADOW1D_ARRAY ||
   target == TGSI_TEXTURE_2D_ARRAY ||
   target == TGSI_TEXTURE_SHADOW2D_ARRAY ||
   target == TGSI_TEXTURE_CUBE_ARRAY ||
   target == TGSI_TEXTURE_SHADOWCUBE_ARRAY ||
@@ -3473,51 +3468,47 @@ static LLVMValueRef force_dcc_off(struct 
si_shader_context *ctx,
  * Load the resource descriptor for \p image.
  */
 static void
 image_fetch_rsrc(
struct lp_build_tgsi_context *bld_base,
const struct tgsi_full_src_register *image,
bool dcc_off,
LLVMValueRef *rsrc)
 {
struct si_shader_context *ctx = si_shader_context(bld_base);
+   LLVMValueRef rsrc_ptr = LLVMGetParam(ctx->radeon_bld.main_fn,
+SI_PARAM_IMAGES);
+   LLVMValueRef index, tmp;

assert(image->Register.File == TGSI_FILE_IMAGE);

if (!image->Register.Indirect) {
-   /* Fast path: use preloaded resources */
-   *rsrc = ctx->images[image->Register.Index];
+   index = LLVMConstInt(ctx->i32, image->Register.Index, 0);


I think it would be beneficial to put

if (info->images_writemask & (1 << image->Register.Index) &&
!(info->images_buffers & (1 << image->Register.Index)))
dcc_off = true;

here, so that CSE can work better when an image is both read from and 
written to.


Apart from that, the patch is

Reviewed-by: Nicolai Hähnle 


} else {
-   /* Indexing and manual load */
-   LLVMValueRef ind_index;
-   LLVMValueRef rsrc_ptr;
-   LLVMValueRef tmp;
-
/* From the GL_ARB_shader_image_load_store extension spec:
 *
 *If a shader performs an image load, store, or atomic
 

Re: [Mesa-dev] [PATCH] radeonsi: load streamout buffer descriptors before use (v2)

2016-09-14 Thread Nicolai Hähnle

Reviewed-by: Nicolai Hähnle 

On 13.09.2016 22:20, Marek Olšák wrote:

From: Marek Olšák 

v2: inline the code and remove the conditional that's a no-op now
---
 src/gallium/drivers/radeonsi/si_shader.c | 47 ++--
 1 file changed, 14 insertions(+), 33 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index be6fae7..d61f4ff 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -105,21 +105,20 @@ struct si_shader_context
unsigned uniform_md_kind;
LLVMValueRef empty_md;

LLVMValueRef const_buffers[SI_NUM_CONST_BUFFERS];
LLVMValueRef lds;
LLVMValueRef shader_buffers[SI_NUM_SHADER_BUFFERS];
LLVMValueRef sampler_views[SI_NUM_SAMPLERS];
LLVMValueRef sampler_states[SI_NUM_SAMPLERS];
LLVMValueRef fmasks[SI_NUM_SAMPLERS];
LLVMValueRef images[SI_NUM_IMAGES];
-   LLVMValueRef so_buffers[4];
LLVMValueRef esgs_ring;
LLVMValueRef gsvs_ring[4];
LLVMValueRef gs_next_vertex[4];
LLVMValueRef return_value;

LLVMTypeRef voidt;
LLVMTypeRef i1;
LLVMTypeRef i8;
LLVMTypeRef i32;
LLVMTypeRef i64;
@@ -2264,20 +2263,33 @@ static void si_dump_streamout(struct 
pipe_stream_output_info *so)
  * to buffers. */
 static void si_llvm_emit_streamout(struct si_shader_context *ctx,
   struct si_shader_output_values *outputs,
   unsigned noutput)
 {
struct pipe_stream_output_info *so = >shader->selector->so;
struct gallivm_state *gallivm = >radeon_bld.gallivm;
LLVMBuilderRef builder = gallivm->builder;
int i, j;
struct lp_build_if_state if_ctx;
+   LLVMValueRef so_buffers[4];
+   LLVMValueRef buf_ptr = LLVMGetParam(ctx->radeon_bld.main_fn,
+   SI_PARAM_RW_BUFFERS);
+
+   /* Load the descriptors. */
+   for (i = 0; i < 4; ++i) {
+   if (ctx->shader->selector->so.stride[i]) {
+   LLVMValueRef offset = lp_build_const_int32(gallivm,
+  
SI_VS_STREAMOUT_BUF0 + i);
+
+   so_buffers[i] = build_indexed_load_const(ctx, buf_ptr, 
offset);
+   }
+   }

/* Get bits [22:16], i.e. (so_param >> 16) & 127; */
LLVMValueRef so_vtx_count =
unpack_param(ctx, ctx->param_streamout_config, 16, 7);

LLVMValueRef tid = get_thread_id(ctx);

/* can_emit = tid < so_vtx_count; */
LLVMValueRef can_emit =
LLVMBuildICmp(builder, LLVMIntULT, tid, so_vtx_count, "");
@@ -2359,21 +2371,21 @@ static void si_llvm_emit_streamout(struct 
si_shader_context *ctx,
}
break;
}

LLVMValueRef can_emit_stream =
LLVMBuildICmp(builder, LLVMIntEQ,
  stream_id,
  lp_build_const_int32(gallivm, stream), 
"");

lp_build_if(_ctx_stream, gallivm, can_emit_stream);
-   build_tbuffer_store_dwords(ctx, 
ctx->so_buffers[buf_idx],
+   build_tbuffer_store_dwords(ctx, so_buffers[buf_idx],
   vdata, num_comps,
   so_write_offset[buf_idx],
   LLVMConstInt(ctx->i32, 0, 0),
   so->output[i].dst_offset*4);
lp_build_endif(_ctx_stream);
}
}
lp_build_endif(_ctx);
 }

@@ -5917,49 +5929,20 @@ static void preload_images(struct si_shader_context 
*ctx)
 lp_build_const_int32(gallivm, 
i));

if (info->images_writemask & (1 << i) &&
!(info->images_buffers & (1 << i)))
rsrc = force_dcc_off(ctx, rsrc);

ctx->images[i] = rsrc;
}
 }

-static void preload_streamout_buffers(struct si_shader_context *ctx)
-{
-   struct lp_build_tgsi_context *bld_base = >radeon_bld.soa.bld_base;
-   struct gallivm_state *gallivm = bld_base->base.gallivm;
-   unsigned i;
-
-   /* Streamout can only be used if the shader is compiled as VS. */
-   if (!ctx->shader->selector->so.num_outputs ||
-   (ctx->type == PIPE_SHADER_VERTEX &&
-(ctx->shader->key.vs.as_es ||
- ctx->shader->key.vs.as_ls)) ||
-   (ctx->type == PIPE_SHADER_TESS_EVAL &&
-ctx->shader->key.tes.as_es))
-   return;
-
-   LLVMValueRef buf_ptr = 

[Mesa-dev] [Bug 94627] Game Risen on wine black grass

2016-09-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=94627

Nicolai Hähnle  changed:

   What|Removed |Added

 Status|REOPENED|RESOLVED
 Resolution|--- |FIXED

--- Comment #10 from Nicolai Hähnle  ---
Closing as per comment #9. Thanks for pointing this out.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] radeonsi/compute: Use the HSA abi for non-TGSI compute shaders v2

2016-09-14 Thread Nicolai Hähnle

On 13.09.2016 19:16, Tom Stellard wrote:

This patch switches non-TGSI compute shaders over to using the HSA
ABI described here:

https://github.com/RadeonOpenCompute/ROCm-Docs/blob/master/AMDGPU-ABI.md

The HSA ABI provides a much cleaner interface for compute shaders and allows
us to share more code in the compiler with the HSA stack.

The main changes in this patch are:
  - We now pass the scratch buffer resource into the shader via user sgprs
rather than using relocations.
  - Grid/Block sizes are now passed to the shader via the dispatch packet
rather than at the beginning of the kernel arguments.

Typically for HSA, the CP firmware will create the dispatch packet and set
up the user sgprs automatically.  However, in Mesa we let the driver do
this work.  The main reason for this is that I haven't researched how to
get the CP to do all these things, and I'm not sure if it is supported
for all GPUs.

v2:
  - Add comments explaining why we are setting certian bits of the scratch
resource descriptor.


Spelling: certain :)


---
 src/gallium/drivers/radeon/r600_pipe_common.c|   6 +-
 src/gallium/drivers/radeonsi/amd_kernel_code_t.h | 534 +++
 src/gallium/drivers/radeonsi/si_compute.c| 236 +-
 3 files changed, 758 insertions(+), 18 deletions(-)
 create mode 100644 src/gallium/drivers/radeonsi/amd_kernel_code_t.h

diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c 
b/src/gallium/drivers/radeon/r600_pipe_common.c
index 6d7cc1b..8f17f36 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.c
+++ b/src/gallium/drivers/radeon/r600_pipe_common.c
@@ -822,7 +822,11 @@ static int r600_get_compute_param(struct pipe_screen 
*screen,
if (rscreen->family <= CHIP_ARUBA) {
triple = "r600--";
} else {
-   triple = "amdgcn--";
+   if (HAVE_LLVM < 0x0400) {
+   triple = "amdgcn--";
+   } else {
+   triple = "amdgcn--mesa3d";
+   }
}
switch(rscreen->family) {
/* Clang < 3.6 is missing Hainan in its list of
diff --git a/src/gallium/drivers/radeonsi/amd_kernel_code_t.h 
b/src/gallium/drivers/radeonsi/amd_kernel_code_t.h
new file mode 100644
index 000..d0d7809
--- /dev/null
+++ b/src/gallium/drivers/radeonsi/amd_kernel_code_t.h


This could go into src/amd/common, though I admittedly don't feel too 
strongly about it.



[snip]

diff --git a/src/gallium/drivers/radeonsi/si_compute.c 
b/src/gallium/drivers/radeonsi/si_compute.c
index a79c224..0603553 100644
--- a/src/gallium/drivers/radeonsi/si_compute.c
+++ b/src/gallium/drivers/radeonsi/si_compute.c
@@ -28,6 +28,7 @@
 #include "radeon/r600_pipe_common.h"
 #include "radeon/radeon_elf_util.h"

+#include "amd_kernel_code_t.h"
 #include "radeon/r600_cs.h"
 #include "si_pipe.h"
 #include "si_shader.h"
@@ -43,8 +44,52 @@ struct si_compute {
struct si_shader shader;

struct pipe_resource *global_buffers[MAX_GLOBAL_BUFFERS];
+   bool use_code_object_v2;
 };

+struct dispatch_packet {
+   uint16_t header;
+   uint16_t setup;
+   uint16_t workgroup_size_x;
+   uint16_t workgroup_size_y;
+   uint16_t workgroup_size_z;
+   uint16_t reserved0;
+   uint32_t grid_size_x;
+   uint32_t grid_size_y;
+   uint32_t grid_size_z;
+   uint32_t private_segment_size;
+   uint32_t group_segment_size;
+   uint64_t kernel_object;
+   uint64_t kernarg_address;
+   uint64_t reserved2;
+};
+
+static const amd_kernel_code_t *si_compute_get_code_object(
+   const struct si_compute *program,
+   uint64_t symbol_offset)
+{
+   if (!program->use_code_object_v2) {
+   return NULL;
+   }
+   return (const amd_kernel_code_t*)
+   (program->shader.binary.code + symbol_offset);
+}
+
+static void code_object_to_config(const amd_kernel_code_t *code_object,
+ struct si_shader_config *out_config) {
+
+   uint32_t rsrc1 = code_object->compute_pgm_resource_registers;
+   uint32_t rsrc2 = code_object->compute_pgm_resource_registers >> 32;
+   out_config->num_sgprs = code_object->wavefront_sgpr_count;
+   out_config->num_vgprs = code_object->workitem_vgpr_count;
+   out_config->float_mode = G_00B028_FLOAT_MODE(rsrc1);
+   out_config->rsrc1 = rsrc1;
+   out_config->lds_size = MAX2(out_config->lds_size, 
G_00B84C_LDS_SIZE(rsrc2));
+   out_config->rsrc2 = rsrc2;
+   out_config->scratch_bytes_per_wave =
+   align(code_object->workitem_private_segment_byte_size * 64, 
1024);
+}
+
 static void *si_create_compute_state(
struct pipe_context *ctx,
const struct pipe_compute_state *cso)
@@ -59,6 +104,8 @@ static void *si_create_compute_state(
program->local_size = cso->req_local_mem;
program->private_size 

Re: [Mesa-dev] [PATCH] egl/wayland-egl: Fix for segfault in dri2_wl_destroy_surface.

2016-09-14 Thread Pekka Paalanen
On Wed, 24 Aug 2016 10:23:11 +0100
Emil Velikov  wrote:

> On 24 August 2016 at 08:48, Stencel, Joanna  wrote:
> > I couldn't find any clear requirement about order of destroys in EGL (or 
> > wayland) specification.
> > Also, in EGL spec (3.7.3, about eglMakeCurrent) one can find:
> >
> > " If a native window underlying either draw or read is no longer valid, an
> > EGL_BAD_NATIVE_WINDOW error is generated."
> > "If a native window or pixmap underlying the draw or read surfaces is
> > destroyed, rendering and readback are handled as above."
> >
> > So it seems that in general destroying native window underlying existing 
> > surface can be a case
> > and should be handled.
> > I agree that it's reasonable to call eglDestroySurface() first and probably 
> > most people do this.
> > However, I think that different user's usage shouldn't cause a crash (or 
> > memory issues).
> >  
> Completely agree - crashing (esp in the driver) isn't what we want.
> Seems like we don't handle the case you mentioned. Care to spin a
> patch ?
> 
> > Could you explain what you call a memory leak here? Pointer which is 
> > nullified points to already
> > free'd structure of wayland window.
> >  
> Hmm you're right - it's the EGL implementation's back pointer that
> gets nullified and not the original one used by the wayland-egl (as I
> originally thought).
> 
> With that said, I've rebased the patch on top of master added the tags
> and pushed to master.

Hi,

I was about to scream that the patch changes the public stable ABI, but
no, it does not and all seems fine. The reason for my mistake is that
there are actually two different "native window" types at play here:
wl_surface and wl_egl_surface. Of course, strictly from EGL point of
view, there is only wl_egl_surface. Here is some background information
in case you are interested for the other case.

You are handling the premature destruction of wl_egl_surface which is
very nice.

Handling the premature destruction of wl_surface is practically
impossible. The story for that can be found in the thread starting at:
https://lists.freedesktop.org/archives/wayland-devel/2016-May/029134.html
and continues in:
https://lists.freedesktop.org/archives/wayland-devel/2016-June/029339.html


Thanks,
pq


pgphdkXhXjciR.pgp
Description: OpenPGP digital signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/fs: Take the sample mask into account in FIND_LIVE_CHANNEL

2016-09-14 Thread Francisco Jerez
Jason Ekstrand  writes:

> Just looking at the channel enables is not sufficient, at least not on Sky
> Lake.  Channels that are disabled by the sample_mask may show up in the
> channel enable register as being enabled even if they are not executing.
> This can cause FIND_LIVE_CHANNEL to return a channel that isn't actually
> executing.  In our handling of interpolateAtSample we do a clever trick
> with emit_uniformize to call the interpolator once for each unique sample
> id.  Thanks to FIND_LIVE_CHANNEL returning a dead channel, we can get an
> infinite loop which hangs the GPU.
>
> Signed-off-by: Jason Ekstrand 

NAK, FIND_LIVE_CHANNEL returns channels from the EU execution mask by
design (see the doxygen comment in brw_defines.h), which is necessary
for the instruction to return a well-defined result when only helper
invocations are enabled in the execution mask.  Several users of the
instruction are likely to be relying on this.

And isn't interpolateAtSample supposed to give a well-defined result too
when it's run from a helper invocation?

> ---
>  src/mesa/drivers/dri/i965/brw_eu.h   |  3 ++-
>  src/mesa/drivers/dri/i965/brw_eu_emit.c  | 22 +++---
>  src/mesa/drivers/dri/i965/brw_fs_builder.h   |  3 ++-
>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp   |  2 +-
>  src/mesa/drivers/dri/i965/brw_vec4_generator.cpp |  2 +-
>  5 files changed, 21 insertions(+), 11 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_eu.h 
> b/src/mesa/drivers/dri/i965/brw_eu.h
> index 3e52764..9aaab78 100644
> --- a/src/mesa/drivers/dri/i965/brw_eu.h
> +++ b/src/mesa/drivers/dri/i965/brw_eu.h
> @@ -488,7 +488,8 @@ brw_pixel_interpolator_query(struct brw_codegen *p,
>  
>  void
>  brw_find_live_channel(struct brw_codegen *p,
> -  struct brw_reg dst);
> +  struct brw_reg dst,
> +  struct brw_reg sample_mask);
>  
>  void
>  brw_broadcast(struct brw_codegen *p,
> diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c 
> b/src/mesa/drivers/dri/i965/brw_eu_emit.c
> index 3b12030..f593a8d 100644
> --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
> +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
> @@ -3361,7 +3361,8 @@ brw_pixel_interpolator_query(struct brw_codegen *p,
>  }
>  
>  void
> -brw_find_live_channel(struct brw_codegen *p, struct brw_reg dst)
> +brw_find_live_channel(struct brw_codegen *p, struct brw_reg dst,
> +  struct brw_reg sample_mask)
>  {
> const struct gen_device_info *devinfo = p->devinfo;
> const unsigned exec_size = 1 << brw_inst_exec_size(devinfo, p->current);
> @@ -3377,13 +3378,20 @@ brw_find_live_channel(struct brw_codegen *p, struct 
> brw_reg dst)
>  
>if (devinfo->gen >= 8) {
>   /* Getting the first active channel index is easy on Gen8: Just find
> -  * the first bit set in the mask register.  The same register exists
> -  * on HSW already but it reads back as all ones when the current
> -  * instruction has execution masking disabled, so it's kind of
> -  * useless.
> +  * the first bit set in the mask register AND the sample mask.  The
> +  * same register exists on HSW already but it reads back as all ones
> +  * when the current instruction has execution masking disabled, so
> +  * it's kind of useless.
>*/
> - inst = brw_FBL(p, vec1(dst),
> -retype(brw_mask_reg(0), BRW_REGISTER_TYPE_UD));
> + struct brw_reg mask_reg = retype(brw_mask_reg(0),
> +  BRW_REGISTER_TYPE_UD);
> + if (sample_mask.file != BRW_IMMEDIATE_VALUE ||
> + sample_mask.ud != 0x) {
> +brw_AND(p, vec1(dst), mask_reg, sample_mask);
> +mask_reg = vec1(dst);
> + }
> +
> + inst = brw_FBL(p, vec1(dst), mask_reg);
>  
>   /* Quarter control has the effect of magically shifting the value of
>* this register so you'll get the first active channel relative to
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_builder.h 
> b/src/mesa/drivers/dri/i965/brw_fs_builder.h
> index 483672f..45b5f88 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_builder.h
> +++ b/src/mesa/drivers/dri/i965/brw_fs_builder.h
> @@ -407,7 +407,8 @@ namespace brw {
>   const dst_reg chan_index = vgrf(BRW_REGISTER_TYPE_UD);
>   const dst_reg dst = vgrf(src.type);
>  
> - ubld.emit(SHADER_OPCODE_FIND_LIVE_CHANNEL, chan_index);
> + ubld.emit(SHADER_OPCODE_FIND_LIVE_CHANNEL, chan_index,
> +   sample_mask_reg());
>   ubld.emit(SHADER_OPCODE_BROADCAST, dst, src, component(chan_index, 
> 0));
>  
>   return src_reg(component(dst, 0));
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> index 2f4ba7b..d923b0b 100644
> --- 

Re: [Mesa-dev] [PATCH 48/95] i965/vec4: add a force_vstride0 flag to src_reg

2016-09-14 Thread Iago Toral
On Tue, 2016-09-13 at 22:12 -0700, Francisco Jerez wrote:
> Iago Toral  writes:
> 
> > 
> > On Mon, 2016-09-12 at 14:05 -0700, Francisco Jerez wrote:
> > > 
> > > Iago Toral Quiroga  writes:
> > > 
> > > > 
> > > > 
> > > > We will use this in cases where we want to force the vstride of
> > > > a
> > > > src_reg
> > > > to 0 to exploit a particular behavior of the hardware. It will
> > > > come
> > > > in
> > > > handy to implement access to components Z/W.
> > > > ---
> > > >  src/mesa/drivers/dri/i965/brw_ir_vec4.h | 1 +
> > > >  src/mesa/drivers/dri/i965/brw_vec4.cpp  | 2 ++
> > > >  2 files changed, 3 insertions(+)
> > > > 
> > > > diff --git a/src/mesa/drivers/dri/i965/brw_ir_vec4.h
> > > > b/src/mesa/drivers/dri/i965/brw_ir_vec4.h
> > > > index f66c093..f3cce4b 100644
> > > > --- a/src/mesa/drivers/dri/i965/brw_ir_vec4.h
> > > > +++ b/src/mesa/drivers/dri/i965/brw_ir_vec4.h
> > > > @@ -51,6 +51,7 @@ public:
> > > > explicit src_reg(const dst_reg );
> > > >  
> > > > src_reg *reladdr;
> > > > +   bool force_vstride0;
> > > I was wondering whether it would make more sense to unify this
> > > with
> > > the
> > > FS back-end's fs_reg::stride (a numeric stride field is also
> > > likely
> > > more
> > > convenient to do arithmetic on than a boolean) and promote it to
> > > backend_reg?  It could be defined as the number of components to
> > > jump
> > > over for each logical channel of the register, which is just the
> > > vstride
> > > in single-precision SIMD4x2 and the hstride in scalar mode.
> > We could do that, but I thought it would be a good idea to make it
> > clear that here we are using the vstride=0 with a very specific
> > intention and we don't expect the hardware to do what it would be
> > expected (we are trying to exploit a hardware bug after all). If we
> > were to use a normal stride field for this I think we would make
> > this
> > intention much less obvious and other people reading the code would
> > have a much harder time understanding what is really going on.
> > Since we
> > are being tricky here I think the extra field to signal that we are
> > trying to do something "special" might be worth it: people can
> > track
> > where we read and write that field and see exactly where it is
> > being
> > used for the purpose of exploiting this particular hardware
> > behavior.
> > 
> Yes, I agree that the hardware's behavior on Gen7 with non-identity
> vstride is tricky and special -- Special enough that *none* of the
> VEC4
> optimization passes and IR-handling code need to be aware of it,
> because
> the field is only going to be used as internal book-keeping data
> structure in convert_to_hw_regs() and immediately discarded.  IOW
> you're
> storing an internal data structure of convert_to_hw_regs() as part of
> the shared IR data structure, with no well-defined semantics and
> which
> no back-end code (not even convert_to_hw_regs()) is going to be able
> to
> honor.
> 
> So if your argument for making the representation of vstride
> unnecessarily non-orthogonal is that you want to discourage people
> from
> using it at the IR level (which is fair because it won't work at
> all!),
> I would argue that it doesn't belong in the IR data structures in the
> first place, because you could just keep convert_to_hw_regs' internal
> data structures internal to convert_to_hw_regs.  (I don't actually
> think
> you need the data structure, neither internal nor external, but more
> on
> that later)

Yes, that makes sense.

> > 
> > > 
> > > But thinking about it some more, I wonder if it's really
> > > necessary to
> > > expose vertical strides at the IR level?  Aren't you planing to
> > > use
> > > this
> > > during the conversion to HW registers exclusively?  Why don't you
> > > set
> > > the vstride field directly in that case?
> > Yes, this is used exclusively at that time. The conversion to
> > hardware
> > registers in convert_to_hw_regs() happens in two stages now:
> > 
> > We call our 'expand_64bit_swizzle_to_32bit()' helper first. This
> > one
> > takes care of checking the regioning on DF instructions, translate
> > swizzles and set force_vstrid0 to true when needed (which is also
> > the
> > only place that would set this to true). Then the rest of the code
> > in
> > convert_to_hw_regs() just operates as usual, only that it will
> > check
> > the force_vstride0 setting to decide the vstride to use for DF
> > regions.
> > 
> > I did it like this because it allows us to keep the DF swizzle
> > translation and regioning checking logic separated from the
> > conversion
> > to hardware registers, but this separation means that we need to
> > tell
> > the latter when it has to set the vstride to 0, thus the addition
> > of
> > the forcE_vstride0 field. I think having these two things separated
> > makes sense and makes the code easier to read. We can keep both
> > things
> > separate and still avoid the force_vstride0 field by using a stride
> > 

[Mesa-dev] [PATCH] i965/fs: Take the sample mask into account in FIND_LIVE_CHANNEL

2016-09-14 Thread Jason Ekstrand
Just looking at the channel enables is not sufficient, at least not on Sky
Lake.  Channels that are disabled by the sample_mask may show up in the
channel enable register as being enabled even if they are not executing.
This can cause FIND_LIVE_CHANNEL to return a channel that isn't actually
executing.  In our handling of interpolateAtSample we do a clever trick
with emit_uniformize to call the interpolator once for each unique sample
id.  Thanks to FIND_LIVE_CHANNEL returning a dead channel, we can get an
infinite loop which hangs the GPU.

Signed-off-by: Jason Ekstrand 
---
 src/mesa/drivers/dri/i965/brw_eu.h   |  3 ++-
 src/mesa/drivers/dri/i965/brw_eu_emit.c  | 22 +++---
 src/mesa/drivers/dri/i965/brw_fs_builder.h   |  3 ++-
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp   |  2 +-
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp |  2 +-
 5 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_eu.h 
b/src/mesa/drivers/dri/i965/brw_eu.h
index 3e52764..9aaab78 100644
--- a/src/mesa/drivers/dri/i965/brw_eu.h
+++ b/src/mesa/drivers/dri/i965/brw_eu.h
@@ -488,7 +488,8 @@ brw_pixel_interpolator_query(struct brw_codegen *p,
 
 void
 brw_find_live_channel(struct brw_codegen *p,
-  struct brw_reg dst);
+  struct brw_reg dst,
+  struct brw_reg sample_mask);
 
 void
 brw_broadcast(struct brw_codegen *p,
diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c 
b/src/mesa/drivers/dri/i965/brw_eu_emit.c
index 3b12030..f593a8d 100644
--- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
@@ -3361,7 +3361,8 @@ brw_pixel_interpolator_query(struct brw_codegen *p,
 }
 
 void
-brw_find_live_channel(struct brw_codegen *p, struct brw_reg dst)
+brw_find_live_channel(struct brw_codegen *p, struct brw_reg dst,
+  struct brw_reg sample_mask)
 {
const struct gen_device_info *devinfo = p->devinfo;
const unsigned exec_size = 1 << brw_inst_exec_size(devinfo, p->current);
@@ -3377,13 +3378,20 @@ brw_find_live_channel(struct brw_codegen *p, struct 
brw_reg dst)
 
   if (devinfo->gen >= 8) {
  /* Getting the first active channel index is easy on Gen8: Just find
-  * the first bit set in the mask register.  The same register exists
-  * on HSW already but it reads back as all ones when the current
-  * instruction has execution masking disabled, so it's kind of
-  * useless.
+  * the first bit set in the mask register AND the sample mask.  The
+  * same register exists on HSW already but it reads back as all ones
+  * when the current instruction has execution masking disabled, so
+  * it's kind of useless.
   */
- inst = brw_FBL(p, vec1(dst),
-retype(brw_mask_reg(0), BRW_REGISTER_TYPE_UD));
+ struct brw_reg mask_reg = retype(brw_mask_reg(0),
+  BRW_REGISTER_TYPE_UD);
+ if (sample_mask.file != BRW_IMMEDIATE_VALUE ||
+ sample_mask.ud != 0x) {
+brw_AND(p, vec1(dst), mask_reg, sample_mask);
+mask_reg = vec1(dst);
+ }
+
+ inst = brw_FBL(p, vec1(dst), mask_reg);
 
  /* Quarter control has the effect of magically shifting the value of
   * this register so you'll get the first active channel relative to
diff --git a/src/mesa/drivers/dri/i965/brw_fs_builder.h 
b/src/mesa/drivers/dri/i965/brw_fs_builder.h
index 483672f..45b5f88 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_builder.h
+++ b/src/mesa/drivers/dri/i965/brw_fs_builder.h
@@ -407,7 +407,8 @@ namespace brw {
  const dst_reg chan_index = vgrf(BRW_REGISTER_TYPE_UD);
  const dst_reg dst = vgrf(src.type);
 
- ubld.emit(SHADER_OPCODE_FIND_LIVE_CHANNEL, chan_index);
+ ubld.emit(SHADER_OPCODE_FIND_LIVE_CHANNEL, chan_index,
+   sample_mask_reg());
  ubld.emit(SHADER_OPCODE_BROADCAST, dst, src, component(chan_index, 
0));
 
  return src_reg(component(dst, 0));
diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index 2f4ba7b..d923b0b 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -2041,7 +2041,7 @@ fs_generator::generate_code(const cfg_t *cfg, int 
dispatch_width)
  break;
 
   case SHADER_OPCODE_FIND_LIVE_CHANNEL:
- brw_find_live_channel(p, dst);
+ brw_find_live_channel(p, dst, src[0]);
  break;
 
   case SHADER_OPCODE_BROADCAST:
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
index 256abae..63fca6f 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
@@ -1863,7 +1863,7 @@