Re: [Mesa-dev] [RFC] r600/evergreen compute shader + glsl 4.30 support

2017-11-29 Thread Dave Airlie
On 30 November 2017 at 17:20, Gert Wollny  wrote:
> On Thursday, 30.11.2017 at 09:30 +1000, Dave Airlie wrote:
>> On 29 November 2017 at 22:46, Gert Wollny 
>> wrote:
>> >
>> >
>> > I run the arb_compute_shader piglits on BARTS, the piglits
>> >
>> >basic-texelfetch
>> >border-color
>> >multiple-workgroups
>> >basic-uniform-access
>> >multiple-texture-reading
>> >simple-barrier
>> >
>> > result in GPU lockups and, consequently, fail. The other 20 tests
>> > pass.
>>
>> Does the attached patch help with the lockups at all?
> no, no changes with the arb_compute_shader tests,

Could you give:
https://cgit.freedesktop.org/~airlied/mesa/log/?h=r600-wip-cs a spin?

I'm guessing the WIP hacks might fix it, but I really want to avoid flushing.
It might be necessary to re-emit a bunch of graphics state.

Dave.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC] r600/evergreen compute shader + glsl 4.30 support

2017-11-29 Thread Gert Wollny
On Thursday, 30.11.2017 at 09:30 +1000, Dave Airlie wrote:
> On 29 November 2017 at 22:46, Gert Wollny 
> wrote:
> > 
> > 
> > I run the arb_compute_shader piglits on BARTS, the piglits
> > 
> >    basic-texelfetch
> >    border-color
> >    multiple-workgroups
> >    basic-uniform-access
> >    multiple-texture-reading
> >    simple-barrier
> > 
> > result in GPU lockups and, consequently, fail. The other 20 tests
> > pass.
> 
> Does the attached patch help with the lockups at all?
no, no changes with the arb_compute_shader tests, 

Best,
Gert 



Re: [Mesa-dev] [PATCH] intel/isl: Declare private array as static const

2017-11-29 Thread Tapani Pälli

Reviewed-by: Tapani Pälli 

On 11/29/2017 09:10 PM, Chad Versace wrote:

The array in question is isl_drm.c:modifier_info[].
---
  src/intel/isl/isl_drm.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/intel/isl/isl_drm.c b/src/intel/isl/isl_drm.c
index eb3c6f59138..31895e15e16 100644
--- a/src/intel/isl/isl_drm.c
+++ b/src/intel/isl/isl_drm.c
@@ -71,7 +71,7 @@ isl_tiling_from_i915_tiling(uint32_t tiling)
 unreachable("Invalid i915 tiling");
  }
  
-struct isl_drm_modifier_info modifier_info[] = {
+static const struct isl_drm_modifier_info modifier_info[] = {
 {
.modifier = DRM_FORMAT_MOD_NONE,
.name = "DRM_FORMAT_MOD_NONE",
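
As a rough illustration of why a file-private lookup table is usually declared this way, here is a minimal sketch. The struct, table contents and function name are hypothetical stand-ins, not the real isl definitions: `static` keeps the symbol local to the translation unit, and `const` prevents the entries from being modified after initialization.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct isl_drm_modifier_info. */
struct example_modifier_info {
   uint64_t modifier;
   const char *name;
};

/* static: the symbol is private to this translation unit.
 * const:  entries cannot be modified after initialization.
 * The values below are placeholders, not real DRM modifiers. */
static const struct example_modifier_info example_modifier_table[] = {
   { .modifier = 0, .name = "EXAMPLE_MOD_NONE" },
   { .modifier = 1, .name = "EXAMPLE_MOD_TILED" },
};

/* Look up a table entry by modifier; returns NULL if it is not listed. */
const struct example_modifier_info *
example_get_modifier_info(uint64_t modifier)
{
   const size_t count =
      sizeof(example_modifier_table) / sizeof(example_modifier_table[0]);

   for (size_t i = 0; i < count; i++) {
      if (example_modifier_table[i].modifier == modifier)
         return &example_modifier_table[i];
   }
   return NULL;
}
```

Callers only ever see a pointer to const data, so nothing outside the file can name or alter the table.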




Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals

2017-11-29 Thread James Jones

On 11/29/2017 01:10 PM, Rob Clark wrote:

On Wed, Nov 29, 2017 at 12:33 PM, Jason Ekstrand  wrote:

On Sat, Nov 25, 2017 at 1:20 PM, Rob Clark  wrote:


On Sat, Nov 25, 2017 at 12:46 PM, Jason Ekstrand 
wrote:

On November 24, 2017 09:29:43 Rob Clark  wrote:



On Mon, Nov 20, 2017 at 8:11 PM, James Jones 
wrote:


As many here know at this point, I've been working on solving issues
related to DMA-capable memory allocation for various devices for some
time now. I'd like to take this opportunity to apologize for the way I
handled the EGL stream proposals.  I understand now that the development
process followed there was unacceptable to the community and likely
offended many great engineers.

Moving forward, I attempted to reboot talks in a more constructive
manner with the generic allocator library proposals & discussion forum
at XDC 2016. Some great design ideas came out of that, and I've since
been prototyping some code to prove them out before bringing them back
as official proposals. Again, I understand some people are growing
concerned that I've been doing this off on the side in a github project
that has primarily NVIDIA contributors.  My goal was only to avoid
wasting everyone's time with unproven ideas.  The intent was never to
dump the prototype code as-is on the community and presume acceptance.
It's just a public research project.

Now the prototyping is nearing completion, and I'd like to renew
discussion on whether and how the new mechanisms can be integrated with
the Linux graphics stack.

I'd be interested to know if more work is needed to demonstrate the
usefulness of the new mechanisms, or whether people think they have
value at this point.

After talking with people on the hallway track at XDC this year, I've
heard several proposals for incorporating the new mechanisms:

-Include ideas from the generic allocator design into GBM.  This could
take the form of designing a "GBM 2.0" API, or incrementally adding to
the existing GBM API.

-Develop a library to replace GBM.  The allocator prototype code could
be massaged into something production worthy to jump start this
process.

-Develop a library that sits beside or on top of GBM, using GBM for
low-level graphics buffer allocation, while supporting non-graphics
kernel APIs directly.  The additional cross-device negotiation and
sorting of capabilities would be handled in this slightly higher-level
API before handing off to GBM and other APIs for actual allocation
somehow.



tbh, I kinda see GBM and $new_thing sitting side by side.. GBM is
still the "winsys" for running on "bare metal" (ie. kms).  And we
don't want to saddle $new_thing with aspects of that, but rather have
it focus on being the thing that in multiple-"device"[1] scenarios
figures out what sort of buffer can be allocated by who for sharing.
Ie $new_thing should really not care about winsys level things like
cursors or surfaces.. only buffers.

The mesa implementation of $new_thing could sit on top of GBM,
although it could also just sit on top of the same internal APIs that
GBM sits on top of.  That is an implementation detail.  It could be
that GBM grows an API to return an instance of $new_thing for
use-cases that involve sharing a buffer with the GPU.  Or perhaps that
is exposed via some sort of EGL extension.  (We probably also need a
way to get an instance from libdrm (?) for display-only KMS drivers,
to cover cases like etnaviv sharing a buffer with a separate display
driver.)

[1] where "devices" could be multiple GPUs or multiple APIs for one or
more GPUs, but also includes non-GPU devices like camera, video
decoder, "image processor" (which may or may not be part of camera),
etc, etc



I'm not quite sure what I think about this.  I think I would like to
see $new_thing at least replace the guts of GBM. Whether GBM becomes a
wrapper around $new_thing or $new_thing implements the GBM API, I'm not
sure.  What I don't think I want is to see GBM development continuing on
its own so we have two competing solutions.


I don't really view them as competing.. there is *some* overlap, ie.
allocating a buffer.. but even if you are using GBM w/out $new_thing
you could allocate a buffer externally and import it.  I don't see
$new_thing as that much different from GBM PoV.

But things like surfaces (aka swap chains) seem a bit out of place
when you are thinking about implementing $new_thing for non-gpu
devices.  Plus EGL<->GBM tie-ins that seem out of place when talking
about a (for ex.) camera.  I kinda don't want to throw out the baby
with the bathwater here.



Agreed.  GBM is very EGLish and we don't want the new allocator to be that.



*maybe* GBM could be partially implemented on top of $new_thing.  I
don't quite see how that would work.  Possibly we could deprecate
parts of GBM that are no longer needed?  idk..  Either way, I fully
expect that GBM and mesa's 

Re: [Mesa-dev] [PATCH] egl/android: Partially handle HAL_PIXEL_FORMAT_IMPLEMENTATION_DEFINED

2017-11-29 Thread Tapani Pälli



On 11/30/2017 06:13 AM, Tomasz Figa wrote:

On Thu, Nov 30, 2017 at 3:43 AM, Robert Foss  wrote:

Hey,

On Tue, 2017-11-28 at 11:49 +, Emil Velikov wrote:

On 28 November 2017 at 10:45, Tapani Pälli 
wrote:

Hi;


On 11/27/2017 04:14 PM, Robert Foss wrote:


From: Tomasz Figa 

There is no API available to properly query the
IMPLEMENTATION_DEFINED
format. As a workaround we rely here on gralloc allocating either
an arbitrary YCbCr 4:2:0 or RGBX_, with the latter being
recognized
by lock_ycbcr failing.

Reviewed-on: https://chromium-review.googlesource.com/566793

Signed-off-by: Tomasz Figa 
Reviewed-by: Chad Versace 
Signed-off-by: Robert Foss 
---
   src/egl/drivers/dri2/platform_android.c | 39
+++--
   1 file changed, 37 insertions(+), 2 deletions(-)

diff --git a/src/egl/drivers/dri2/platform_android.c
b/src/egl/drivers/dri2/platform_android.c
index 63223e9a69..ae914d79c1 100644
--- a/src/egl/drivers/dri2/platform_android.c
+++ b/src/egl/drivers/dri2/platform_android.c
@@ -59,6 +59,10 @@ static const struct droid_yuv_format
droid_yuv_formats[] = {
  { HAL_PIXEL_FORMAT_YCbCr_420_888,   0, 1,
__DRI_IMAGE_FOURCC_YUV420
},
  { HAL_PIXEL_FORMAT_YCbCr_420_888,   1, 1,
__DRI_IMAGE_FOURCC_YVU420
},
  { HAL_PIXEL_FORMAT_YV12,1, 1,
__DRI_IMAGE_FOURCC_YVU420
},
+   /* HACK: See droid_create_image_from_prime_fd() and
b/32077885. */
+   { HAL_PIXEL_FORMAT_IMPLEMENTATION_DEFINED,   0, 2,
__DRI_IMAGE_FOURCC_NV12 },
+   { HAL_PIXEL_FORMAT_IMPLEMENTATION_DEFINED,   0, 1,
__DRI_IMAGE_FOURCC_YUV420 },
+   { HAL_PIXEL_FORMAT_IMPLEMENTATION_DEFINED,   1, 1,
__DRI_IMAGE_FOURCC_YVU420 },
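
To make the table-driven part of the workaround easier to follow, here is a minimal sketch of how such a table could be searched. It assumes the second and third columns are a chroma-order flag and a chroma step, which the hunk itself does not state; all names and numeric values below are hypothetical placeholders, not the real HAL_PIXEL_FORMAT_* or __DRI_IMAGE_FOURCC_* constants.

```c
#include <stddef.h>

/* Placeholder values; the real constants come from Android and Mesa
 * headers and are not reproduced here. */
enum {
   EX_FORMAT_YCBCR_420_888 = 1,
   EX_FORMAT_IMPLEMENTATION_DEFINED = 2,
};
enum {
   EX_FOURCC_NONE = -1,
   EX_FOURCC_YUV420 = 10,
   EX_FOURCC_YVU420 = 11,
   EX_FOURCC_NV12 = 12,
};

struct ex_yuv_format {
   int native;       /* lookup key: HAL pixel format (assumed meaning) */
   int is_ycrcb;     /* lookup key: 1 if Cr precedes Cb (assumed meaning) */
   int chroma_step;  /* lookup key: bytes between chroma samples (assumed) */
   int fourcc;       /* result */
};

static const struct ex_yuv_format ex_yuv_formats[] = {
   { EX_FORMAT_YCBCR_420_888,          0, 1, EX_FOURCC_YUV420 },
   { EX_FORMAT_YCBCR_420_888,          1, 1, EX_FOURCC_YVU420 },
   /* Entries mirroring the ones added by the patch. */
   { EX_FORMAT_IMPLEMENTATION_DEFINED, 0, 2, EX_FOURCC_NV12 },
   { EX_FORMAT_IMPLEMENTATION_DEFINED, 0, 1, EX_FOURCC_YUV420 },
   { EX_FORMAT_IMPLEMENTATION_DEFINED, 1, 1, EX_FOURCC_YVU420 },
};

/* Return the fourcc matching all three keys, or EX_FOURCC_NONE. */
int ex_get_fourcc_yuv(int native, int is_ycrcb, int chroma_step)
{
   const size_t count = sizeof(ex_yuv_formats) / sizeof(ex_yuv_formats[0]);

   for (size_t i = 0; i < count; i++) {
      if (ex_yuv_formats[i].native == native &&
          ex_yuv_formats[i].is_ycrcb == is_ycrcb &&
          ex_yuv_formats[i].chroma_step == chroma_step)
         return ex_yuv_formats[i].fourcc;
   }
   return EX_FOURCC_NONE;
}
```

With a lookup of this shape, a buffer whose layout matches none of the rows simply yields no fourcc, which is where a fallback such as the RGBX path described in the commit message would take over.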



One alternative way would be to ask gralloc about these formats. On
gralloc0
this would need a perform() hook and gralloc1 has getFormat(). This
is how
it is done currently on Android-IA, see following commits:

https://github.com/intel/external-mesa/commit/deb323eafa321c725805a
702ed19cb4983346b60

https://github.com/intel/external-mesa/commit/7cc01beaf540e29862853
561ef93c6c4e86c4c1a

Do you think this approach would work with Chromium as well?



I think the Android-IA approach looks good, although it depends on
local gralloc0 changes. With gralloc1 on the horizon, I don't know
how
much sense it makes to extend the predecessor.
AFAICT the patch should not cause any issues and it's nicely
documented.


I had a look at the chromiumos/minigbm implementation, and it does not
contain a gralloc1 implementation as far as I can see. I assume that it
is available somewhere, but maybe not on a public branch.

Would it be possible to make the minigbm gralloc1 impl. public? That
way I could submit a patch mirroring what intel/minigbm does.

If you fine folks at Google prefer to roll it yourselves, just give
me a poke.


There is no gralloc1 implementation for ChromiumOS minigbm and AFAIK
we don't have any plans of adding one. AFAICT there is nothing we
would gain with it over gralloc0.



Those are the two options I'm seeing.

As for gralloc0 support, would it be needed?



Perhaps someone from the Google/CrOS team can assist in making the
bug
public, although even then it might be better to focus on a 'perfect'
gralloc1?

IMHO the patch looks perfectly reasonable and we could merge it even,
if we were to switch to gralloc1 in the not too distant future ;-)


Maybe doing both is reasonable.


I believe there isn't much adoption of gralloc1 in the wild.
Android-IA is the first I saw (might have missed something, though).
Tapani, what was the reason for switching to gralloc1?


The main reason was that we thought this was where Android would be moving 
(deprecating gralloc0). But if it's gone now, it does not make 
sense to support it.



Could we just support gralloc0 for now in Mesa, make sure the next
generation IAllocator/IMapper stuff suits our needs and switch to it
later when it happens?


Yes, this sounds good to me.


(As a side note, I had an idea to create a new interface, standardized
by Mesa, let's say libdri_android, completely free of any
gralloc-internals. It would have to be exposed additionally by any
Android that intends to run Mesa. Given the need to deal with 3
different gralloc versions already, it could be something easier to
manage.)


Makes sense; it is a bit messy, and we have a few too many patches in our 
tree because of these differences.


// Tapani


Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals

2017-11-29 Thread James Jones

On 11/29/2017 04:09 PM, Miguel Angel Vico wrote:



On Wed, 29 Nov 2017 16:28:15 -0500
Rob Clark  wrote:


On Wed, Nov 29, 2017 at 2:41 PM, Miguel Angel Vico  wrote:

Many of you may already know, but James is going to be out for a few
weeks and I'll be taking over this in the meantime.


Sorry for the unfortunate timing.  I am indeed on paternity leave at the 
moment.  Some quick comments below.  I'll be trying to follow the 
discussion as time allows while I'm out.



See inline for comments.

On Wed, 29 Nov 2017 09:33:29 -0800
Jason Ekstrand  wrote:
  

On Sat, Nov 25, 2017 at 1:20 PM, Rob Clark  wrote:
  

On Sat, Nov 25, 2017 at 12:46 PM, Jason Ekstrand 
wrote:

On November 24, 2017 09:29:43 Rob Clark  wrote:



On Mon, Nov 20, 2017 at 8:11 PM, James Jones 

wrote:


As many here know at this point, I've been working on solving issues
related to DMA-capable memory allocation for various devices for some
time now. I'd like to take this opportunity to apologize for the way I
handled the EGL stream proposals.  I understand now that the development
process followed there was unacceptable to the community and likely
offended many great engineers.

Moving forward, I attempted to reboot talks in a more constructive
manner with the generic allocator library proposals & discussion forum
at XDC 2016. Some great design ideas came out of that, and I've since
been prototyping some code to prove them out before bringing them back
as official proposals. Again, I understand some people are growing
concerned that I've been doing this off on the side in a github project
that has primarily NVIDIA contributors.  My goal was only to avoid
wasting everyone's time with unproven ideas.  The intent was never to
dump the prototype code as-is on the community and presume acceptance.
It's just a public research project.

Now the prototyping is nearing completion, and I'd like to renew
discussion on whether and how the new mechanisms can be integrated with
the Linux graphics stack.

I'd be interested to know if more work is needed to demonstrate the
usefulness of the new mechanisms, or whether people think they have
value at this point.

After talking with people on the hallway track at XDC this year, I've
heard several proposals for incorporating the new mechanisms:

-Include ideas from the generic allocator design into GBM.  This could
take the form of designing a "GBM 2.0" API, or incrementally adding to
the existing GBM API.

-Develop a library to replace GBM.  The allocator prototype code could
be massaged into something production worthy to jump start this
process.

-Develop a library that sits beside or on top of GBM, using GBM for
low-level graphics buffer allocation, while supporting non-graphics
kernel APIs directly.  The additional cross-device negotiation and
sorting of capabilities would be handled in this slightly higher-level
API before handing off to GBM and other APIs for actual allocation
somehow.



tbh, I kinda see GBM and $new_thing sitting side by side.. GBM is
still the "winsys" for running on "bare metal" (ie. kms).  And we
don't want to saddle $new_thing with aspects of that, but rather have
it focus on being the thing that in multiple-"device"[1] scenarios
figures out what sort of buffer can be allocated by who for sharing.
Ie $new_thing should really not care about winsys level things like
cursors or surfaces.. only buffers.

The mesa implementation of $new_thing could sit on top of GBM,
although it could also just sit on top of the same internal APIs that
GBM sits on top of.  That is an implementation detail.  It could be
that GBM grows an API to return an instance of $new_thing for
use-cases that involve sharing a buffer with the GPU.  Or perhaps that
is exposed via some sort of EGL extension.  (We probably also need a
way to get an instance from libdrm (?) for display-only KMS drivers,
to cover cases like etnaviv sharing a buffer with a separate display
driver.)

[1] where "devices" could be multiple GPUs or multiple APIs for one or
more GPUs, but also includes non-GPU devices like camera, video
decoder, "image processor" (which may or may not be part of camera),
etc, etc



I'm not quite sure what I think about this.  I think I would like to
see $new_thing at least replace the guts of GBM. Whether GBM becomes a
wrapper around $new_thing or $new_thing implements the GBM API, I'm not
sure.  What I don't think I want is to see GBM development continuing on
its own so we have two competing solutions.


I don't really view them as competing.. there is *some* overlap, ie.
allocating a buffer.. but even if you are using GBM w/out $new_thing
you could allocate a buffer externally and import it.  I don't see
$new_thing as that much different from GBM PoV.

But things like surfaces (aka swap chains) seem a bit out of place
when you are 

Re: [Mesa-dev] V2 Initial GS NIR support for radeonsi

2017-11-29 Thread Timothy Arceri

On 30/11/17 14:00, Dieter Nützel wrote:

Hello Timo,

do you have a V3 handy...? ;-)


I haven't run piglit yet after rebasing so run at your own risk.

https://github.com/tarceri/Mesa.git radeonsi_nir_final




Greetings,
Dieter

On 23.11.2017 at 06:31, Timothy Arceri wrote:

On 23/11/17 15:09, Dieter Nützel wrote:

On 22.11.2017 at 10:29, Timothy Arceri wrote:

This series depends on [1] and [2].

V2
 - use driver_location as per Nicolai's suggestion
 - tidy ups as per Marek's suggestions
 - bug fixes (many more piglit tests now passing)

[1] https://patchwork.freedesktop.org/series/34131/
[2] https://patchwork.freedesktop.org/series/34132/


Hello Timothy,

I could run Unigine_Heaven-4.0 (with tess disabled of course) and 
Unigine_Valley-1.0 with all 3 together on my RX580.
If I try to switch to wireframe, the 'game' window disappears (as 
expected, too).


SOURCE/Unigine_Valley-1.0> echo $R600_DEBUG
nir

So here is my

Tested-by: Dieter Nützel 

on all _3_ series.


Cool. Thanks for testing.



GREAT work!
Dieter



Re: [Mesa-dev] [PATCH] radv: do not allocate CMASK or DCC for small surfaces

2017-11-29 Thread Dieter Nützel

Tested-by: Dieter Nützel 

on RX580

with F1 2017

Dieter

On 29.11.2017 at 14:48, Samuel Pitoiset wrote:

The idea is ported from RadeonSI, but using 512x512 instead of
256x256 seems slightly better. This improves dota2 performance
by +2%.

Signed-off-by: Samuel Pitoiset 
---
 src/amd/vulkan/radv_image.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/src/amd/vulkan/radv_image.c b/src/amd/vulkan/radv_image.c
index c241e369b9..1bf2fa12ed 100644
--- a/src/amd/vulkan/radv_image.c
+++ b/src/amd/vulkan/radv_image.c
@@ -805,6 +805,16 @@ radv_image_alloc_htile(struct radv_image *image)
 static inline bool
 radv_image_can_enable_dcc_or_cmask(struct radv_image *image)
 {
+   if (image->info.samples <= 1 &&
+   image->info.width <= 512 && image->info.height <= 512) {
+   /* Do not enable CMASK or DCC for small surfaces where the cost
+* of the eliminate pass can be higher than the benefit of fast
+* clear. RadeonSI does this, but the image threshold is
+* different.
+*/
+   return false;
+   }
+
return image->usage & VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT &&
   (image->exclusive || image->queue_family_mask == 1);
 }
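
The early-out added by the patch can be read in isolation as a small predicate. The sketch below restates it with hypothetical, reduced types: the two boolean fields stand in for the usage-flag and queue-family checks on the original return line, and the 512 threshold is the one quoted in the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for the fields the patch reads. */
struct ex_image {
   uint32_t width;
   uint32_t height;
   uint32_t samples;
   bool is_color_attachment;  /* stands in for the usage-flag test */
   bool single_queue_family;  /* stands in for exclusive / mask == 1 */
};

/* Threshold from the commit message (512x512 rather than 256x256). */
#define EX_SMALL_SURFACE_DIM 512u

bool ex_can_enable_dcc_or_cmask(const struct ex_image *image)
{
   /* Small single-sample surfaces: skip CMASK/DCC, since the eliminate
    * pass may cost more than the fast clear saves. */
   if (image->samples <= 1 &&
       image->width <= EX_SMALL_SURFACE_DIM &&
       image->height <= EX_SMALL_SURFACE_DIM)
      return false;

   return image->is_color_attachment && image->single_queue_family;
}
```

Note that the size check only applies to single-sample images, and that both dimensions must be at or below the threshold; a 513x512 image is still eligible.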



Re: [Mesa-dev] [RFC] r600/evergreen compute shader + glsl 4.30 support

2017-11-29 Thread Dave Airlie
On 29 November 2017 at 14:36, Dave Airlie  wrote:
> This set of patches enables compute shaders on r600 and exposes GLSL 4.30
> support. They are pretty alpha level, but I'd like to land some of them
> (maybe disabled) so I can avoid the rebasing fun with the more intrusive
> ones.
>
> It is based on the previous ssbo support patch.
>
> It may not be stable, I have a few patches sitting on top locally
> for flushing various things I want to figure out if they are required or
> if I can fix things properly.
>
> It for some reason fails to launch compute on cayman and hangs instead,
> I've got some traces from fglrx, just need to take time to work out what
> crashes, I've tested it on CAICOS mostly.

FYI, I got cayman shaders running today, but it appears cayman uses GDS
for atomics not the append/consume ctrs, at least the current code doesn't
work and tracing fglrx shows it is using GDS.

I've written the code but haven't debugged it into working yet.

Dave.


Re: [Mesa-dev] [PATCH] egl/android: Partially handle HAL_PIXEL_FORMAT_IMPLEMENTATION_DEFINED

2017-11-29 Thread Tomasz Figa
On Thu, Nov 30, 2017 at 3:43 AM, Robert Foss  wrote:
> Hey,
>
> On Tue, 2017-11-28 at 11:49 +, Emil Velikov wrote:
>> On 28 November 2017 at 10:45, Tapani Pälli 
>> wrote:
>> > Hi;
>> >
>> >
>> > On 11/27/2017 04:14 PM, Robert Foss wrote:
>> > >
>> > > From: Tomasz Figa 
>> > >
>> > > There is no API available to properly query the
>> > > IMPLEMENTATION_DEFINED
>> > > format. As a workaround we rely here on gralloc allocating either
>> > > an arbitrary YCbCr 4:2:0 or RGBX_, with the latter being
>> > > recognized
>> > > by lock_ycbcr failing.
>> > >
>> > > Reviewed-on: https://chromium-review.googlesource.com/566793
>> > >
>> > > Signed-off-by: Tomasz Figa 
>> > > Reviewed-by: Chad Versace 
>> > > Signed-off-by: Robert Foss 
>> > > ---
>> > >   src/egl/drivers/dri2/platform_android.c | 39
>> > > +++--
>> > >   1 file changed, 37 insertions(+), 2 deletions(-)
>> > >
>> > > diff --git a/src/egl/drivers/dri2/platform_android.c
>> > > b/src/egl/drivers/dri2/platform_android.c
>> > > index 63223e9a69..ae914d79c1 100644
>> > > --- a/src/egl/drivers/dri2/platform_android.c
>> > > +++ b/src/egl/drivers/dri2/platform_android.c
>> > > @@ -59,6 +59,10 @@ static const struct droid_yuv_format
>> > > droid_yuv_formats[] = {
>> > >  { HAL_PIXEL_FORMAT_YCbCr_420_888,   0, 1,
>> > > __DRI_IMAGE_FOURCC_YUV420
>> > > },
>> > >  { HAL_PIXEL_FORMAT_YCbCr_420_888,   1, 1,
>> > > __DRI_IMAGE_FOURCC_YVU420
>> > > },
>> > >  { HAL_PIXEL_FORMAT_YV12,1, 1,
>> > > __DRI_IMAGE_FOURCC_YVU420
>> > > },
>> > > +   /* HACK: See droid_create_image_from_prime_fd() and
>> > > b/32077885. */
>> > > +   { HAL_PIXEL_FORMAT_IMPLEMENTATION_DEFINED,   0, 2,
>> > > __DRI_IMAGE_FOURCC_NV12 },
>> > > +   { HAL_PIXEL_FORMAT_IMPLEMENTATION_DEFINED,   0, 1,
>> > > __DRI_IMAGE_FOURCC_YUV420 },
>> > > +   { HAL_PIXEL_FORMAT_IMPLEMENTATION_DEFINED,   1, 1,
>> > > __DRI_IMAGE_FOURCC_YVU420 },
>> >
>> >
>> > One alternative way would be to ask gralloc about these formats. On
>> > gralloc0
>> > this would need a perform() hook and gralloc1 has getFormat(). This
>> > is how
>> > it is done currently on Android-IA, see following commits:
>> >
>> > https://github.com/intel/external-mesa/commit/deb323eafa321c725805a
>> > 702ed19cb4983346b60
>> >
>> > https://github.com/intel/external-mesa/commit/7cc01beaf540e29862853
>> > 561ef93c6c4e86c4c1a
>> >
>> > Do you think this approach would work with Chromium as well?
>> >
>>
>> I think the Android-IA approach looks good, although it depends on
>> local gralloc0 changes. With gralloc1 on the horizon, I don't know
>> how
>> much sense it makes to extend the predecessor.
>> AFAICT the patch should not cause any issues and it's nicely
>> documented.
>
> I had a look at the chromiumos/minigbm implementation, and it does not
> contain a gralloc1 implementation as far as I can see. I assume that it
> is available somewhere, but maybe not on a public branch.
>
> Would it be possible to make the minigbm gralloc1 impl. public? That
> way I could submit a patch mirroring what intel/minigbm does.
>
> If you fine folks at Google prefer to roll it yourselves, just give
> me a poke.

There is no gralloc1 implementation for ChromiumOS minigbm and AFAIK
we don't have any plans of adding one. AFAICT there is nothing we
would gain with it over gralloc0.

>
> Those are the two options I'm seeing.
>
> As for gralloc0 support, would it be needed?
>
>>
>> Perhaps someone from the Google/CrOS team can assist in making the
>> bug
>> public, although even then it might be better to focus on a 'perfect'
>> gralloc1?
>>
>> IMHO the patch looks perfectly reasonable and we could merge it even,
>> if we were to switch to gralloc1 in the not too distant future ;-)
>
> Maybe doing both is reasonable.

I believe there isn't much adoption of gralloc1 in the wild.
Android-IA is the first I saw (might have missed something, though).
Tapani, what was the reason for switching to gralloc1?

Could we just support gralloc0 for now in Mesa, make sure the next
generation IAllocator/IMapper stuff suits our needs and switch to it
later when it happens?

(As a side note, I had an idea to create a new interface, standardized
by Mesa, let's say libdri_android, completely free of any
gralloc-internals. It would have to be exposed additionally by any
Android that intends to run Mesa. Given the need to deal with 3
different gralloc versions already, it could be something easier to
manage.)

Best regards,
Tomasz


Re: [Mesa-dev] V2 Initial GS NIR support for radeonsi

2017-11-29 Thread Dieter Nützel

Hello Timo,

do you have a V3 handy...? ;-)

Greetings,
Dieter

On 23.11.2017 at 06:31, Timothy Arceri wrote:

On 23/11/17 15:09, Dieter Nützel wrote:

On 22.11.2017 at 10:29, Timothy Arceri wrote:

This series depends on [1] and [2].

V2
 - use driver_location as per Nicolai's suggestion
 - tidy ups as per Marek's suggestions
 - bug fixes (many more piglit tests now passing)

[1] https://patchwork.freedesktop.org/series/34131/
[2] https://patchwork.freedesktop.org/series/34132/


Hello Timothy,

I could run Unigine_Heaven-4.0 (with tess disabled of course) and 
Unigine_Valley-1.0 with all 3 together on my RX580.
If I try to switch to wireframe, the 'game' window disappears (as 
expected, too).


SOURCE/Unigine_Valley-1.0> echo $R600_DEBUG
nir

So here is my

Tested-by: Dieter Nützel 

on all _3_ series.


Cool. Thanks for testing.



GREAT work!
Dieter



[Mesa-dev] [PATCH v4 41/44] i965/fs: Use half_precision data_format on 16-bit fb writes

2017-11-29 Thread Jose Maria Casanova Crespo
From: Alejandro Piñeiro 

---
 src/intel/compiler/brw_fs_visitor.cpp | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/intel/compiler/brw_fs_visitor.cpp b/src/intel/compiler/brw_fs_visitor.cpp
index 481d9c51e7..01e75ff7fc 100644
--- a/src/intel/compiler/brw_fs_visitor.cpp
+++ b/src/intel/compiler/brw_fs_visitor.cpp
@@ -439,6 +439,12 @@ fs_visitor::emit_fb_writes()
   inst = emit_single_fb_write(abld, this->outputs[target],
   this->dual_src_output, src0_alpha, 4);
   inst->target = target;
+
+  /* Enables half-precision data_format for 16-bit outputs on
+   * Render Target Write Messages. Supported since cherry-view and
+   * Skylake.
+   */
+  inst->data_format = type_sz(this->outputs[target].type) == 2;
}
 
prog_data->dual_src_blend = (this->dual_src_output.file != BAD_FILE);
-- 
2.14.3



[Mesa-dev] [PATCH v4 44/44] anv: Enable VK_KHR_16bit_storage for push_constant

2017-11-29 Thread Jose Maria Casanova Crespo
Enables storagePushConstant16 feature of VK_KHR_16bit_storage
for Gen8+.
---
 src/intel/vulkan/anv_device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index 26c0ace1ca..5b6032d794 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -733,7 +733,7 @@ void anv_GetPhysicalDeviceFeatures2KHR(
 
  features->storageBuffer16BitAccess = pdevice->info.gen >= 8;
  features->uniformAndStorageBuffer16BitAccess = pdevice->info.gen >= 8;
- features->storagePushConstant16 = false;
+ features->storagePushConstant16 = pdevice->info.gen >= 8;
  features->storageInputOutput16 = pdevice->info.gen >= 8;
  break;
   }
-- 
2.14.3



[Mesa-dev] [PATCH v4 42/44] i965/fs: Enable 16-bit render target write on SKL and CHV

2017-11-29 Thread Jose Maria Casanova Crespo
Once the infrastructure to support Render Target Messages with 16-bit
payload is available, this patch enables it on SKL and CHV platforms.

Enabling it allows a 16-bit payload that uses half of the register on
SIMD16 and avoids the spurious conversion from 16-bit to 32-bit needed
on BDW, just to be converted again to 16-bit.

In the case of CHV there is no support for UINT so in this case the
half precision data format is not enabled and the fallback of the
32-bit payload is used.

From PRM CHV, vol 07, section "Pixel Data Port" page 260:

"Half Precision Render Target Write messages do not support UNIT
formats." where UNIT is a typo for UINT.

v2: Removed use of stride = 2 on sources (Jason Ekstrand)

Signed-off-by: Jose Maria Casanova Crespo 
Signed-off-by: Eduardo Lima 
---
 src/intel/compiler/brw_fs_nir.cpp | 46 +++
 1 file changed, 32 insertions(+), 14 deletions(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp b/src/intel/compiler/brw_fs_nir.cpp
index 04d1e3bbf7..f4a1dd644b 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -54,19 +54,24 @@ fs_visitor::nir_setup_outputs()
   return;
 
if (stage == MESA_SHADER_FRAGMENT) {
-  /*
+  /* On HW that doesn't support half-precision render-target-write
+   * messages (e.g, some gen8 HW like Broadwell), we need a workaround
+   * to support 16-bit outputs from pixel shaders.
+   *
* The following code uses the outputs map to save the variable's
* original output type, so later we can retrieve it and retype
* the output accordingly while emitting the FS 16-bit outputs.
*/
-  nir_foreach_variable(var, &nir->outputs) {
- const enum glsl_base_type base_type =
-glsl_get_base_type(var->type->without_array());
-
- if (glsl_base_type_is_16bit(base_type)) {
-outputs[var->data.driver_location] =
-   retype(outputs[var->data.driver_location],
-  brw_type_for_base_type(var->type));
+  if (devinfo->gen == 8) {
+ nir_foreach_variable(var, &nir->outputs) {
+const enum glsl_base_type base_type =
+   glsl_get_base_type(var->type->without_array());
+
+if (glsl_base_type_is_16bit(base_type)) {
+   outputs[var->data.driver_location] =
+  retype(outputs[var->data.driver_location],
+ brw_type_for_base_type(var->type));
+}
  }
   }
   return;
@@ -3341,14 +3346,27 @@ fs_visitor::nir_emit_fs_intrinsic(const fs_builder &bld,
   const unsigned location = nir_intrinsic_base(instr) +
  SET_FIELD(const_offset->u32[0], BRW_NIR_FRAG_OUTPUT_LOCATION);
 
+  /* This flag discriminates HW where we have support for half-precision
+   * render target write messages (aka, the data-format bit), so 16-bit
+   * render target payloads can be used. It is available since skylake
+   * and cherryview. In the case of cherryview there is no support for
+   * UINT formats.
+   */
+  bool enable_hp_rtw = is_16bit &&
+ (devinfo->gen >= 9 || (devinfo->is_cherryview &&
+outputs[location].type != BRW_REGISTER_TYPE_UW));
+
   if (is_16bit) {
- /* The outputs[location] should already have the original output type
-  * stored from nir_setup_outputs.
+ /* outputs[location] should already have the original output type
+  * stored from nir_setup_outputs, in case the HW doesn't support
+  * half-precision RTW messages.
+  * If HP RTW is enabled we just use HF to copy 16-bit values.
   */
- src = retype(src, outputs[location].type);
+ src = retype(src, enable_hp_rtw ?
+  BRW_REGISTER_TYPE_HF : outputs[location].type);
   }
 
-  fs_reg new_dest = retype(alloc_frag_output(this, location, false),
+  fs_reg new_dest = retype(alloc_frag_output(this, location, enable_hp_rtw),
src.type);
 
   /* This is a workaround to support 16-bits outputs on HW that doesn't
@@ -3358,7 +3376,7 @@ fs_visitor::nir_emit_fs_intrinsic(const fs_builder ,
* render target with a 16-bit surface format will force the correct
* conversion of the 32-bit output values to 16-bit.
*/
-  if (is_16bit) {
+  if (is_16bit && !enable_hp_rtw) {
  new_dest.type = brw_reg_type_from_bit_size(32, src.type);
   }
   for (unsigned j = 0; j < instr->num_components; j++)
-- 
2.14.3



[Mesa-dev] [PATCH v4 43/44] i965/fs: Support push constants of 16-bit types

2017-11-29 Thread Jose Maria Casanova Crespo
We enable the use of 16-bit values in push constants by modifying the
assign_constant_locations function to work with 16-bit types.

The Vulkan API for accessing buffers uses multiples of 4 bytes for
offsets and sizes. The current accounting of uniforms, based on 4-byte
slots, works for 16-bit values as long as they are allowed to use 32-bit
slots. To that end, we replace the division by 4 with DIV_ROUND_UP, so
2-byte elements use 1 slot instead of 0.
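
The slot arithmetic can be sketched in isolation as follows. This is
only an illustration: slots_read and slots_read_truncating are
hypothetical helpers, not functions from the patch, and the macro is
written out here so the snippet stands alone.

```c
#include <assert.h>

/* Integer division rounded up instead of truncated (written out here so
 * the sketch is self-contained). */
#define DIV_ROUND_UP(a, b) (((a) + (b) - 1) / (b))

/* Hypothetical helper: number of 4-byte uniform slots touched by an access
 * of `components` elements that are `type_size` bytes each. */
static unsigned
slots_read(unsigned components, unsigned type_size)
{
   assert(type_size == 2 || type_size == 4 || type_size == 8);
   return DIV_ROUND_UP(components * type_size, 4);
}

/* The truncating form, kept only for comparison: a single 2-byte element
 * would be counted as occupying 0 slots. */
static unsigned
slots_read_truncating(unsigned components, unsigned type_size)
{
   return components * type_size / 4;
}
```

With the rounded-up form a single 16-bit scalar is charged one slot,
while 32-bit and 64-bit accesses are counted exactly as before.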

We align the 16-bit locations after assigning the 32-bit
ones.

v2: Minor changes after rebase against recent master
(José María Casanova)

v3: Rebase needs compiler->supports_pull_constants at
set_push_pull_constant_loc call. (José María Casanova)
---
 src/intel/compiler/brw_fs.cpp | 31 ---
 1 file changed, 24 insertions(+), 7 deletions(-)

diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index b1e548fd93..650ddff09e 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -1948,8 +1948,9 @@ set_push_pull_constant_loc(unsigned uniform, int *chunk_start,
if (!contiguous) {
   /* If bitsize doesn't match the target one, skip it */
   if (*max_chunk_bitsize != target_bitsize) {
- /* FIXME: right now we only support 32 and 64-bit accesses */
- assert(*max_chunk_bitsize == 4 || *max_chunk_bitsize == 8);
+ assert(*max_chunk_bitsize == 4 ||
+*max_chunk_bitsize == 8 ||
+*max_chunk_bitsize == 2);
  *max_chunk_bitsize = 0;
  *chunk_start = -1;
  return;
@@ -2038,8 +2039,9 @@ fs_visitor::assign_constant_locations()
  int constant_nr = inst->src[i].nr + inst->src[i].offset / 4;
 
  if (inst->opcode == SHADER_OPCODE_MOV_INDIRECT && i == 0) {
-assert(inst->src[2].ud % 4 == 0);
-unsigned last = constant_nr + (inst->src[2].ud / 4) - 1;
+assert(type_sz(inst->src[i].type) == 2 ?
+   (inst->src[2].ud % 2 == 0) : (inst->src[2].ud % 4 == 0));
+unsigned last = constant_nr + DIV_ROUND_UP(inst->src[2].ud, 4) - 1;
 assert(last < uniforms);
 
 for (unsigned j = constant_nr; j < last; j++) {
@@ -2051,8 +2053,8 @@ fs_visitor::assign_constant_locations()
 bitsize_access[last] = MAX2(bitsize_access[last], type_sz(inst->src[i].type));
  } else {
 if (constant_nr >= 0 && constant_nr < (int) uniforms) {
-   int regs_read = inst->components_read(i) *
-  type_sz(inst->src[i].type) / 4;
+   int regs_read = DIV_ROUND_UP(inst->components_read(i) *
+type_sz(inst->src[i].type), 4);
assert(regs_read <= 2);
if (regs_read == 2)
   contiguous[constant_nr] = true;
@@ -2116,7 +2118,7 @@ fs_visitor::assign_constant_locations()
 
}
 
-   /* Then push the rest of uniforms */
+   /* Then push the 32-bit uniforms */
const unsigned uniform_32_bit_size = type_sz(BRW_REGISTER_TYPE_F);
for (unsigned u = 0; u < uniforms; u++) {
   if (!is_live[u])
@@ -2136,6 +2138,21 @@ fs_visitor::assign_constant_locations()
  stage_prog_data);
}
 
+   const unsigned uniform_16_bit_size = type_sz(BRW_REGISTER_TYPE_HF);
+   for (unsigned u = 0; u < uniforms; u++) {
+  if (!is_live[u])
+ continue;
+
+  set_push_pull_constant_loc(u, _start, _chunk_bitsize,
+ contiguous[u], bitsize_access[u],
+ uniform_16_bit_size,
+ push_constant_loc, pull_constant_loc,
+ _push_constants, _pull_constants,
+ max_push_components, max_chunk_size,
+ compiler->supports_pull_constants,
+ stage_prog_data);
+   }
+
/* Add the CS local thread ID uniform at the end of the push constants */
if (subgroup_id_index >= 0)
   push_constant_loc[subgroup_id_index] = num_push_constants++;
-- 
2.14.3



[Mesa-dev] [PATCH v4 39/44] i965/fs: Mark 16-bit outputs on FS store_output

2017-11-29 Thread Jose Maria Casanova Crespo
On SKL, render target write operations allow 16-bit format output.
This marks output registers as 16-bit by using BRW_REGISTER_TYPE_HF on
the corresponding output targets.

This makes it possible to recognise when the 16-bit data_format should
be enabled on render_target_write messages.

Signed-off-by: Jose Maria Casanova Crespo 
Signed-off-by: Eduardo Lima 
---
 src/intel/compiler/brw_fs_nir.cpp | 25 ++---
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp b/src/intel/compiler/brw_fs_nir.cpp
index fb138de76a..04d1e3bbf7 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -3238,13 +3238,16 @@ emit_coherent_fb_read(const fs_builder , const fs_reg , unsigned target)
 }
 
 static fs_reg
-alloc_temporary(const fs_builder , unsigned size, fs_reg *regs, unsigned n)
+alloc_temporary(const fs_builder , unsigned size, fs_reg *regs, unsigned n,
+bool is_16bit)
 {
if (n && regs[0].file != BAD_FILE) {
   return regs[0];
 
} else {
-  const fs_reg tmp = bld.vgrf(BRW_REGISTER_TYPE_F, size);
+  const brw_reg_type type =
+ is_16bit ? BRW_REGISTER_TYPE_HF : BRW_REGISTER_TYPE_F;
+  const fs_reg tmp = bld.vgrf(type, size);
 
   for (unsigned i = 0; i < n; i++)
  regs[i] = tmp;
@@ -3254,7 +3257,7 @@ alloc_temporary(const fs_builder , unsigned size, fs_reg *regs, unsigned n)
 }
 
 static fs_reg
-alloc_frag_output(fs_visitor *v, unsigned location)
+alloc_frag_output(fs_visitor *v, unsigned location, bool is_16bit)
 {
assert(v->stage == MESA_SHADER_FRAGMENT);
const brw_wm_prog_key *const key =
@@ -3263,26 +3266,26 @@ alloc_frag_output(fs_visitor *v, unsigned location)
const unsigned i = GET_FIELD(location, BRW_NIR_FRAG_OUTPUT_INDEX);
 
if (i > 0 || (key->force_dual_color_blend && l == FRAG_RESULT_DATA1))
-  return alloc_temporary(v->bld, 4, >dual_src_output, 1);
+  return alloc_temporary(v->bld, 4, >dual_src_output, 1, is_16bit);
 
else if (l == FRAG_RESULT_COLOR)
   return alloc_temporary(v->bld, 4, v->outputs,
- MAX2(key->nr_color_regions, 1));
+ MAX2(key->nr_color_regions, 1),
+ is_16bit);
 
else if (l == FRAG_RESULT_DEPTH)
-  return alloc_temporary(v->bld, 1, >frag_depth, 1);
+  return alloc_temporary(v->bld, 1, >frag_depth, 1, is_16bit);
 
else if (l == FRAG_RESULT_STENCIL)
-  return alloc_temporary(v->bld, 1, >frag_stencil, 1);
+  return alloc_temporary(v->bld, 1, >frag_stencil, 1, is_16bit);
 
else if (l == FRAG_RESULT_SAMPLE_MASK)
-  return alloc_temporary(v->bld, 1, >sample_mask, 1);
+  return alloc_temporary(v->bld, 1, >sample_mask, 1, is_16bit);
 
else if (l >= FRAG_RESULT_DATA0 &&
 l < FRAG_RESULT_DATA0 + BRW_MAX_DRAW_BUFFERS)
   return alloc_temporary(v->bld, 4,
- >outputs[l - FRAG_RESULT_DATA0], 1);
-
+ >outputs[l - FRAG_RESULT_DATA0], 1, is_16bit);
else
   unreachable("Invalid location");
 }
@@ -3345,7 +3348,7 @@ fs_visitor::nir_emit_fs_intrinsic(const fs_builder ,
  src = retype(src, outputs[location].type);
   }
 
-  fs_reg new_dest = retype(alloc_frag_output(this, location),
+  fs_reg new_dest = retype(alloc_frag_output(this, location, false),
src.type);
 
   /* This is a workaround to support 16-bits outputs on HW that doesn't
-- 
2.14.3



[Mesa-dev] [PATCH v4 40/44] i965/fs: 16-bit source payloads always use 1 register

2017-11-29 Thread Jose Maria Casanova Crespo
Render Target Message payloads for 16-bit values fit in a single
register.

From the Intel PRM, vol07, page 249, "Render Target Messages" /
"Message Data Payloads":

   "The half precision Render Target Write messages have data payloads
that can pack a full SIMD16 payload into 1 register instead of
two. The half-precision packed format is used for RGBA and Source
0 Alpha, but Source Depth data payload is always supplied in full
precision."

So when 16-bit data is written to the payload, it uses 1 register
regardless of whether the dispatch is SIMD16 or SIMD8.
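
As a rough sketch of that rule, assuming 32-byte registers (REG_SIZE is
defined locally here and payload_regs is a hypothetical helper, not part
of the patch):

```c
#include <assert.h>

#define REG_SIZE 32  /* assumed size in bytes of one register */

/* Hypothetical helper: how many registers one payload source occupies.
 * 16-bit sources always advance the destination by exactly one register;
 * wider sources use as many registers as their data needs. */
static unsigned
payload_regs(unsigned exec_size, unsigned type_size)
{
   assert(exec_size == 8 || exec_size == 16);
   if (type_size == 2)
      return 1;
   return (exec_size * type_size + REG_SIZE - 1) / REG_SIZE;
}
```

In SIMD8 a 16-bit source only fills half of the register, but the next
source still starts at the following register boundary.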

This change implies that we need to replicate the approach in the
copy propagation of the load_payload operations.

v2: By default 16-bit sources should be packed (Jason Ekstrand)
Include changes in copy_propagation of load_payload (Chema Casanova)
---
 src/intel/compiler/brw_fs.cpp  | 5 -
 src/intel/compiler/brw_fs_copy_propagation.cpp | 4 ++--
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index b695508823..b1e548fd93 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -3485,7 +3485,10 @@ fs_visitor::lower_load_payload()
   for (uint8_t i = inst->header_size; i < inst->sources; i++) {
  if (inst->src[i].file != BAD_FILE)
 ibld.MOV(retype(dst, inst->src[i].type), inst->src[i]);
- dst = offset(dst, ibld, 1);
+ if (type_sz(inst->src[i].type) == 2)
+dst = byte_offset(dst, REG_SIZE);
+ else
+dst = offset(dst, ibld, 1);
   }
 
   inst->remove(block);
diff --git a/src/intel/compiler/brw_fs_copy_propagation.cpp b/src/intel/compiler/brw_fs_copy_propagation.cpp
index d4d01d783c..470eaeec4f 100644
--- a/src/intel/compiler/brw_fs_copy_propagation.cpp
+++ b/src/intel/compiler/brw_fs_copy_propagation.cpp
@@ -800,7 +800,7 @@ fs_visitor::opt_copy_propagation_local(void *copy_prop_ctx, bblock_t *block,
  int offset = 0;
  for (int i = 0; i < inst->sources; i++) {
 int effective_width = i < inst->header_size ? 8 : inst->exec_size;
-assert(effective_width * type_sz(inst->src[i].type) % REG_SIZE == 0);
+assert(effective_width * MAX2(4, type_sz(inst->src[i].type)) % REG_SIZE == 0);
 const unsigned size_written = effective_width *
   type_sz(inst->src[i].type);
 if (inst->src[i].file == VGRF) {
@@ -816,7 +816,7 @@ fs_visitor::opt_copy_propagation_local(void *copy_prop_ctx, bblock_t *block,
   ralloc_free(entry);
}
 }
-offset += size_written;
+offset += type_sz(inst->src[i].type) == 2 ? REG_SIZE : size_written;
  }
   }
}
-- 
2.14.3



[Mesa-dev] [PATCH v4 38/44] i965/disasm: Show half-precision data_format on rt_writes

2017-11-29 Thread Jose Maria Casanova Crespo
---
 src/intel/compiler/brw_disasm.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/src/intel/compiler/brw_disasm.c b/src/intel/compiler/brw_disasm.c
index 1a94ed3954..c752e15331 100644
--- a/src/intel/compiler/brw_disasm.c
+++ b/src/intel/compiler/brw_disasm.c
@@ -1676,6 +1676,10 @@ brw_disassemble_inst(FILE *file, const struct gen_device_info *devinfo,
   brw_inst_rt_message_type(devinfo, inst), );
if (devinfo->gen >= 6 && brw_inst_rt_slot_group(devinfo, inst))
   string(file, " Hi");
+   if ((devinfo->gen >= 9 || devinfo->is_cherryview) &&
+   brw_inst_data_format(devinfo, inst)) {
+  string(file, " HP");
+   }
if (brw_inst_rt_last(devinfo, inst))
   string(file, " LastRT");
if (devinfo->gen < 7 && brw_inst_dp_write_commit(devinfo, inst))
-- 
2.14.3



[Mesa-dev] [PATCH v4 33/44] i965/fs: Unpack 16-bit from 32-bit components in VS load_input

2017-11-29 Thread Jose Maria Casanova Crespo
The VS load input for 16-bit values receives pairs of 16-bit values
packed in 32-bit values, because of the adjusted format introduced in:

 anv/pipeline: Use 32-bit surface formats for 16-bit formats
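
For a single channel, the unpacking amounts to splitting one dword into
two 16-bit halves. The sketch below is a scalar model only; it assumes
the first component of each pair sits in the low 16 bits, and
unpack_16bit_pair is a hypothetical name, not the helper used by the
patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model: split one packed 32-bit value into the two
 * 16-bit components it carries. Assumes the first component is stored in
 * the low half of the dword. */
static void
unpack_16bit_pair(uint32_t packed, uint16_t *first, uint16_t *second)
{
   assert(first && second);
   *first = (uint16_t)(packed & 0xffff);
   *second = (uint16_t)(packed >> 16);
}
```

The compiler performs the equivalent operation across all SIMD channels
at once, so that each component ends up in its own register region.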

v2: Removed use of stride = 2 on 16-bit sources (Jason Ekstrand)
v3: Fix coding style and typo (Topi Pohjolainen)
Simplify unshuffle 32-bit to 16-bit using helper function
(Jason Ekstrand)
---
 src/intel/compiler/brw_fs_nir.cpp | 22 --
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp b/src/intel/compiler/brw_fs_nir.cpp
index 57e79853ef..0f1a428242 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -2430,8 +2430,26 @@ fs_visitor::nir_emit_vs_intrinsic(const fs_builder ,
   assert(const_offset && "Indirect input loads not allowed");
   src = offset(src, bld, const_offset->u32[0]);
 
-  for (unsigned j = 0; j < num_components; j++) {
- bld.MOV(offset(dest, bld, j), offset(src, bld, j + first_component));
+  if (type_sz(type) == 2) {
+ /* The VS load input for 16-bit values receives pairs of 16-bit
+  * values packed in 32-bit values. This is an example on SIMD8:
+  *
+  * xy xy xy xy xy xy xy xy
+  * zw zw zw zw zw zw zw zw
+  *
+  * We need to format it to something like:
+  *
+  * xx xx xx xx yy yy yy yy
+  * zz zz zz zz ww ww ww ww
+  */
+
+ shuffle_32bit_load_result_to_16bit_data(bld,
+ dest,
+ retype(src, BRW_REGISTER_TYPE_F),
+ num_components);
+  } else {
+ for (unsigned j = 0; j < num_components; j++)
+bld.MOV(offset(dest, bld, j), offset(src, bld, j + first_component));
   }
 
   if (type == BRW_REGISTER_TYPE_DF) {
-- 
2.14.3



[Mesa-dev] [PATCH v4 30/44] i965/compiler: includes 16-bit vertex input

2017-11-29 Thread Jose Maria Casanova Crespo
Include the information about 16-bit vertex inputs coming from NIR in
the brw VS prog data, as we already do for 64-bit vertex inputs.

v2: Renamed half_inputs_read to inputs_read_16bit (Jason Ekstrand)
---
 src/intel/compiler/brw_compiler.h | 1 +
 src/intel/compiler/brw_vec4.cpp   | 1 +
 2 files changed, 2 insertions(+)

diff --git a/src/intel/compiler/brw_compiler.h b/src/intel/compiler/brw_compiler.h
index 28aed83324..191dc8bd1d 100644
--- a/src/intel/compiler/brw_compiler.h
+++ b/src/intel/compiler/brw_compiler.h
@@ -961,6 +961,7 @@ struct brw_vs_prog_data {
 
GLbitfield64 inputs_read;
GLbitfield64 double_inputs_read;
+   GLbitfield64 inputs_read_16bit;
 
unsigned nr_attribute_slots;
 
diff --git a/src/intel/compiler/brw_vec4.cpp b/src/intel/compiler/brw_vec4.cpp
index 73c40ad600..d32b1e3302 100644
--- a/src/intel/compiler/brw_vec4.cpp
+++ b/src/intel/compiler/brw_vec4.cpp
@@ -2771,6 +2771,7 @@ brw_compile_vs(const struct brw_compiler *compiler, void *log_data,
 
prog_data->inputs_read = shader->info.inputs_read;
prog_data->double_inputs_read = shader->info.double_inputs_read;
+   prog_data->inputs_read_16bit = shader->info.inputs_read_16bit;
 
brw_nir_lower_vs_inputs(shader, use_legacy_snorm_formula,
key->gl_attrib_wa_flags);
-- 
2.14.3



[Mesa-dev] [PATCH v4 37/44] i965/fs: Include support for SEND data_format bit for Render Targets

2017-11-29 Thread Jose Maria Casanova Crespo
From the Intel Skylake PRM, vol 07, section "EU Overview", subsection
"Send Message" (page 905):

   "Bit 30: Data format. This field specifies the width of data read
from sampler or written to render target. Format = U1 0
Single Precision (32b), 1 Half Precision (16b)"

It is also described in vol 02d, "Message Descriptor - Render Target
Write" (page 326).

It is worth noting that this bit is also present on
Cherryview/Braswell but not on Broadwell, both of which are Gen8, so we
can't check for the presence of that bit based on the gen alone (for
example, in brw_inst.h).
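
A minimal sketch of that check, assuming the bit is set in a plain
32-bit descriptor word. RT_WRITE_DATA_FORMAT_BIT, hw_has_data_format and
set_half_precision are hypothetical names used only for illustration;
the hardware condition mirrors the one used in the disassembler patch of
this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name; the bit position comes from the PRM text quoted
 * above ("Bit 30: Data format"). */
#define RT_WRITE_DATA_FORMAT_BIT 30

/* The gen number alone is not enough: Cherryview (Gen8) has the bit,
 * Broadwell (also Gen8) does not. */
static bool
hw_has_data_format(unsigned gen, bool is_cherryview)
{
   return gen >= 9 || is_cherryview;
}

/* Mark a descriptor word as carrying half-precision (16-bit) data. */
static uint32_t
set_half_precision(uint32_t desc, unsigned gen, bool is_cherryview)
{
   assert(hw_has_data_format(gen, is_cherryview));
   return desc | (UINT32_C(1) << RT_WRITE_DATA_FORMAT_BIT);
}
```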

Signed-off-by: Jose Maria Casanova Crespo 
Signed-off-by: Eduardo Lima 
Signed-off-by: Alejandro Piñeiro 
---
 src/intel/compiler/brw_eu.h   |  6 --
 src/intel/compiler/brw_eu_emit.c  | 25 -
 src/intel/compiler/brw_fs.cpp |  1 +
 src/intel/compiler/brw_fs_generator.cpp   |  3 ++-
 src/intel/compiler/brw_fs_surface_builder.cpp |  3 ++-
 src/intel/compiler/brw_inst.h |  1 +
 src/intel/compiler/brw_shader.h   |  7 +++
 src/intel/compiler/brw_vec4_generator.cpp |  3 ++-
 8 files changed, 39 insertions(+), 10 deletions(-)

diff --git a/src/intel/compiler/brw_eu.h b/src/intel/compiler/brw_eu.h
index 2d0f56f793..35adb47684 100644
--- a/src/intel/compiler/brw_eu.h
+++ b/src/intel/compiler/brw_eu.h
@@ -251,7 +251,8 @@ void brw_set_dp_write_message(struct brw_codegen *p,
  unsigned last_render_target,
  unsigned response_length,
  unsigned end_of_thread,
- unsigned send_commit_msg);
+ unsigned send_commit_msg,
+ unsigned data_format);
 
 void brw_urb_WRITE(struct brw_codegen *p,
   struct brw_reg dest,
@@ -303,7 +304,8 @@ void brw_fb_WRITE(struct brw_codegen *p,
   unsigned response_length,
   bool eot,
   bool last_render_target,
-  bool header_present);
+  bool header_present,
+  unsigned data_format);
 
 brw_inst *gen9_fb_READ(struct brw_codegen *p,
struct brw_reg dst,
diff --git a/src/intel/compiler/brw_eu_emit.c b/src/intel/compiler/brw_eu_emit.c
index bdc516848a..70d735d3fd 100644
--- a/src/intel/compiler/brw_eu_emit.c
+++ b/src/intel/compiler/brw_eu_emit.c
@@ -520,7 +520,8 @@ brw_set_dp_write_message(struct brw_codegen *p,
 unsigned last_render_target,
 unsigned response_length,
 unsigned end_of_thread,
-unsigned send_commit_msg)
+unsigned send_commit_msg,
+unsigned data_format)
 {
const struct gen_device_info *devinfo = p->devinfo;
const unsigned sfid = (devinfo->gen >= 6 ? target_cache :
@@ -532,6 +533,16 @@ brw_set_dp_write_message(struct brw_codegen *p,
brw_inst_set_binding_table_index(devinfo, insn, binding_table_index);
brw_inst_set_dp_write_msg_type(devinfo, insn, msg_type);
brw_inst_set_dp_write_msg_control(devinfo, insn, msg_control);
+   if (data_format) {
+  /* data_format is supported since CherryView. So we can't just set the
+   * any data_format value, because it would trigger an assertion on
+   * brw_inst_set_data_format for previous hw if they try to set it to
+   * zero. And we don't add an generation assert because as mentioned,
+   * brw_inst_set_data_format already does that.
+   */
+  brw_inst_set_data_format(devinfo, insn, data_format);
+   }
+
brw_inst_set_rt_last(devinfo, insn, last_render_target);
if (devinfo->gen < 7) {
   brw_inst_set_dp_write_commit(devinfo, insn, send_commit_msg);
@@ -2050,7 +2061,8 @@ void brw_oword_block_write_scratch(struct brw_codegen *p,
   0, /* not a render target */
   send_commit_msg, /* response_length */
   0, /* eot */
-  send_commit_msg);
+  send_commit_msg,
+  0 /* data_format */);
}
 }
 
@@ -2244,7 +2256,8 @@ void brw_fb_WRITE(struct brw_codegen *p,
   unsigned response_length,
   bool eot,
   bool last_render_target,
-  bool header_present)
+  bool header_present,
+  unsigned data_format)
 {
const struct gen_device_info *devinfo = p->devinfo;
const unsigned target_cache =
@@ -2292,7 +2305,8 @@ void brw_fb_WRITE(struct brw_codegen *p,
last_render_target,
response_length,
eot,
-   0 /* send_commit_msg */);
+   0, 

[Mesa-dev] [PATCH v4 36/44] anv: Enable VK_KHR_16bit_storage for input/output

2017-11-29 Thread Jose Maria Casanova Crespo
Enable the storageInputOutput16 feature of VK_KHR_16bit_storage
for Gen8+.
---
 src/intel/vulkan/anv_device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index 2e5b914480..26c0ace1ca 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -734,7 +734,7 @@ void anv_GetPhysicalDeviceFeatures2KHR(
  features->storageBuffer16BitAccess = pdevice->info.gen >= 8;
  features->uniformAndStorageBuffer16BitAccess = pdevice->info.gen >= 8;
  features->storagePushConstant16 = false;
- features->storageInputOutput16 = false;
+ features->storageInputOutput16 = pdevice->info.gen >= 8;
  break;
   }
 
-- 
2.14.3



[Mesa-dev] [PATCH v4 32/44] anv/cmd_buffer: Add a padding to the vertex buffer

2017-11-29 Thread Jose Maria Casanova Crespo
From: Alejandro Piñeiro 

As we are using 32-bit surface formats with 16-bit elements, we can end
up in a situation where a vertex element pokes past the end of the
buffer by 2 bytes. To avoid that, we add padding when flushing the state.
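
The 2-byte figure can be illustrated with a small calculation. This is a
sketch under the assumption that an attribute with n 16-bit components
is fetched as whole 32-bit components; both helpers are hypothetical and
do not appear in the patch.

```c
#include <assert.h>

/* Hypothetical helper: bytes fetched when `n16` 16-bit components are
 * read through a 32-bit surface format, i.e. rounded up to whole dwords. */
static unsigned
bytes_read_as_32bit(unsigned n16)
{
   assert(n16 >= 1 && n16 <= 4);
   return ((n16 + 1) / 2) * 4;
}

/* Bytes read beyond the data that the 16-bit attribute really owns. */
static unsigned
overrun_bytes(unsigned n16)
{
   return bytes_read_as_32bit(n16) - n16 * 2;
}
```

Only odd component counts overrun, and never by more than 2 bytes, which
matches the padding size used here.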

This is similar to what the i965 drivers prior to Haswell do: they use
4-component formats to fake 3-component formats, and add padding there
too. See commit:
   7c8dfa78b98a12c1c5f74d11433c8554d4c90657

v2: (Jason Ekstrand)
Increase by 2 the size returned by GetBufferMemoryRequirements
when robust buffer access is enabled in a vertex buffer.
Renamed half_inputs_read to inputs_read_16bit.

Signed-off-by: Jose Maria Casanova Crespo 
Signed-off-by: Alejandro Piñeiro 
---
 src/intel/vulkan/anv_device.c  | 10 ++
 src/intel/vulkan/genX_cmd_buffer.c | 20 ++--
 2 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index 69a1f5a5f6..2e5b914480 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -1847,6 +1847,16 @@ void anv_GetBufferMemoryRequirements(
}
 
pMemoryRequirements->size = buffer->size;
+
+   /* Vertex buffers with 16-bit values need a 2 bytes padding in some cases
+* because they are read as 32-bit components. By adding 2 bytes to memory
+* requirements size when robust buffer accesss is enabled the paddings we
+* read would be outside of the VkBuffer but would not be outside "the
+* memory range(s) bound to the buffer".
+*/
+   if (device->robust_buffer_access && (buffer->usage & VK_BUFFER_USAGE_VERTEX_BUFFER_BIT))
+  pMemoryRequirements->size += 2;
+
pMemoryRequirements->alignment = 16;
pMemoryRequirements->memoryTypeBits = memory_types;
 }
diff --git a/src/intel/vulkan/genX_cmd_buffer.c b/src/intel/vulkan/genX_cmd_buffer.c
index ab5590d7ce..c019ab5259 100644
--- a/src/intel/vulkan/genX_cmd_buffer.c
+++ b/src/intel/vulkan/genX_cmd_buffer.c
@@ -1868,6 +1868,11 @@ genX(cmd_buffer_flush_state)(struct anv_cmd_buffer *cmd_buffer)
 {
struct anv_pipeline *pipeline = cmd_buffer->state.pipeline;
uint32_t *p;
+#if GEN_GEN >= 8
+   const struct brw_vs_prog_data *vs_prog_data = get_vs_prog_data(pipeline);
+   const uint64_t inputs_read_16bit = vs_prog_data->inputs_read_16bit;
+   const uint32_t elements_16bit = inputs_read_16bit >> VERT_ATTRIB_GENERIC0;
+#endif
 
uint32_t vb_emit = cmd_buffer->state.vb_dirty & pipeline->vb_used;
 
@@ -1880,6 +1885,17 @@ genX(cmd_buffer_flush_state)(struct anv_cmd_buffer *cmd_buffer)
if (vb_emit) {
   const uint32_t num_buffers = __builtin_popcount(vb_emit);
   const uint32_t num_dwords = 1 + num_buffers * 4;
+  /* ISL 16-bit formats do a 16-bit to 32-bit float conversion, so we need
+   * to use ISL 32-bit formats to avoid such conversion in order to support
+   * properly 16-bit formats. This means that the vertex element may poke
+   * over the end of the buffer by 2 bytes.
+   */
+  const unsigned padding =
+#if GEN_GEN >= 8
+ (elements_16bit > 0) * 2;
+#else
+  0;
+#endif
 
   p = anv_batch_emitn(_buffer->batch, num_dwords,
   GENX(3DSTATE_VERTEX_BUFFERS));
@@ -1909,9 +1925,9 @@ genX(cmd_buffer_flush_state)(struct anv_cmd_buffer *cmd_buffer)
 .BufferStartingAddress = { buffer->bo, buffer->offset + offset },
 
 #if GEN_GEN >= 8
-.BufferSize = buffer->size - offset
+.BufferSize = buffer->size - offset + padding,
 #else
-.EndAddress = { buffer->bo, buffer->offset + buffer->size - 1},
+.EndAddress = { buffer->bo, buffer->offset + buffer->size + padding - 1},
 #endif
  };
 
-- 
2.14.3



[Mesa-dev] [PATCH v4 28/44] i965/fs: Use untyped_surface_read for 16-bit load_ssbo

2017-11-29 Thread Jose Maria Casanova Crespo
SSBO loads were using byte_scattered read messages because they allow
reading 16-bit components. byte_scattered messages can only operate on
one component at a time, so we needed to emit as many messages as
components.

But a 16-bit vec2 or vec4 is a multiple of 32 bits in size, so we can
use the untyped_surface_read message to read pairs of 16-bit components
with a single message. Once the pairs are read, they are unshuffled to
return the proper 16-bit components.

For a 16-bit scalar or a 16-bit vec3, the unpaired component is read
with a single byte_scattered_read message.
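
The split between the two message types can be sketched as follows. The
struct and function names are hypothetical and only model the
bookkeeping; the offsets assume 2-byte components packed back to back.

```c
#include <assert.h>

/* Hypothetical summary of the messages needed for one 16-bit SSBO load. */
struct ssbo_read_plan {
   unsigned untyped_read_dwords;   /* 32-bit components read by the untyped read */
   unsigned byte_scattered_reads;  /* 0 or 1, for the unpaired last component */
   unsigned tail_offset_bytes;     /* byte offset of that last component */
};

static struct ssbo_read_plan
plan_16bit_ssbo_read(unsigned num_components)
{
   assert(num_components >= 1 && num_components <= 4);

   struct ssbo_read_plan plan;
   /* Every pair of 16-bit components is fetched as one 32-bit component. */
   plan.untyped_read_dwords = num_components / 2;
   /* A scalar or the last component of a vec3 has no partner. */
   plan.byte_scattered_reads = num_components % 2;
   plan.tail_offset_bytes =
      plan.byte_scattered_reads ? (num_components - 1) * 2 : 0;
   return plan;
}
```

A vec3 therefore needs one untyped read for its first two components
plus one byte scattered read at offset 4, instead of three byte
scattered reads.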

v2: Removed use of stride = 2 on sources (Jason Ekstrand)
Rework optimization using unshuffle 16 reads (Chema Casanova)
---
 src/intel/compiler/brw_fs_nir.cpp | 43 ++-
 1 file changed, 33 insertions(+), 10 deletions(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp b/src/intel/compiler/brw_fs_nir.cpp
index fa7aa9c247..57e79853ef 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -2354,16 +2354,39 @@ do_untyped_vector_read(const fs_builder ,
  bld.ADD(read_offset, read_offset, brw_imm_ud(16));
   }
} else if (type_sz(dest.type) == 2) {
-  fs_reg read_offset = bld.vgrf(BRW_REGISTER_TYPE_UD);
-  bld.MOV(read_offset, offset_reg);
-  for (unsigned i = 0; i < num_components; i++) {
- fs_reg read_reg = emit_byte_scattered_read(bld, surf_index, read_offset,
-1 /* dims */,
-1,
-16 /*bit_size */,
-BRW_PREDICATE_NONE);
- bld.MOV(offset(dest,bld,i), subscript(read_reg, dest.type, 0));
- bld.ADD(read_offset, read_offset, brw_imm_ud(type_sz(dest.type)));
+  assert(dest.stride == 1);
+
+  int component_pairs = num_components / 2;
+  /* Pairs of 16-bit components can be read with untyped read */
+  if (component_pairs > 0) {
+ fs_reg read_result = emit_untyped_read(bld, surf_index,
+offset_reg,
+1 /* dims */,
+component_pairs,
+BRW_PREDICATE_NONE);
+ shuffle_32bit_load_result_to_16bit_data(bld,
+   retype(dest, BRW_REGISTER_TYPE_HF),
+   retype(read_result, BRW_REGISTER_TYPE_F),
+   component_pairs * 2);
+  }
+  /* Last component of vec3 and scalar 16-bit read needs to be read
+   * using one byte_scattered_read message
+   */
+  if (num_components % 2) {
+ fs_reg read_offset = bld.vgrf(BRW_REGISTER_TYPE_UD);
+ bld.ADD(read_offset,
+ offset_reg,
+ brw_imm_ud((num_components - 1) * type_sz(dest.type)));
+ fs_reg read_result = emit_byte_scattered_read(bld, surf_index,
+   read_offset,
+   1 /* dims */,
+   1,
+   16 /* bit_size */,
+   BRW_PREDICATE_NONE);
+ read_result.type = dest.type;
+ read_result.stride = 2;
+
+ bld.MOV(offset(dest, bld, num_components - 1), read_result);
   }
} else {
   unreachable("Unsupported type");
-- 
2.14.3



[Mesa-dev] [PATCH v4 27/44] i965/fs: Predicate byte scattered writes if needed

2017-11-29 Thread Jose Maria Casanova Crespo
From: Alejandro Piñeiro 

For Untyped Surface messages, the bits of the execution mask are ANDed
with the corresponding bits of the Pixel/Sample Mask, but that is not
the case for byte scattered writes. The masking is needed to prevent
SSBO stores from writing on helper invocations. So when helper
invocations can be affected, we load the sample mask and predicate the
send message.

Note: the need for this patch was verified with a custom test. Right now
the 16-bit storage CTS tests don't need this path in order to get a
full pass.
---
 src/intel/compiler/brw_fs_nir.cpp | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp b/src/intel/compiler/brw_fs_nir.cpp
index 2c344ec7df..fa7aa9c247 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -4167,12 +4167,24 @@ fs_visitor::nir_emit_intrinsic(const fs_builder , nir_intrinsic_instr *instr
  * to rely on byte scattered in order to write 16-bit elements.
  * The byte_scattered_write message needs that every written 16-bit
  * type to be aligned 32-bits (stride=2).
+ * Additionally, while on Untyped Surface messages the
+ * bits of the execution mask are ANDed with the corresponding
+ * bits of the Pixel/Sample Mask, that is not the case for byte
+ * scattered writes. That is needed to avoid ssbo stores writing
+ * on helper invocations. So when that can affect, we load the
+ * sample mask, and predicate the send message.
  */
+brw_predicate pred = BRW_PREDICATE_NONE;
+
+if (stage == MESA_SHADER_FRAGMENT) {
+   bld.emit(FS_OPCODE_MOV_DISPATCH_TO_FLAGS);
+   pred = BRW_PREDICATE_NORMAL;
+}
 emit_byte_scattered_write(bld, surf_index, offset_reg,
   current_val_reg,
   1 /* dims */, 1,
   bit_size,
-  BRW_PREDICATE_NONE);
+  pred);
  } else {
 unsigned write_size = (length * type_size) / 4;
 assert (write_size > 0);
-- 
2.14.3



[Mesa-dev] [PATCH v4 35/44] i965/fs: Enable Render Target Write for 16-bit outputs

2017-11-29 Thread Jose Maria Casanova Crespo
Broadwell doesn't support half-precision data formats on render
target write (RTW) messages. So the solution for writing 16-bit outputs
is to rely on the 32-bit to 16-bit conversion that happens when writing
32-bit values to a surface with a 16-bit format such as R16_FLOAT.

16-bit outputs are converted from HF->F, W->D and UW->UD. This
requires knowing the GLSL base type used in NIR to define the shader
output. We store the 16-bit register types at nir_setup_outputs.
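
The widening rule can be written as a simple mapping. The enum below is
a hypothetical stand-in for the real register type enum and only lists
the types mentioned in this message.

```c
#include <assert.h>

/* Hypothetical stand-in for the compiler's register types. */
enum reg_type {
   TYPE_HF,  /* 16-bit float */
   TYPE_W,   /* 16-bit signed integer */
   TYPE_UW,  /* 16-bit unsigned integer */
   TYPE_F,   /* 32-bit float */
   TYPE_D,   /* 32-bit signed integer */
   TYPE_UD   /* 32-bit unsigned integer */
};

/* Map a 16-bit type to the 32-bit type of the same kind (float, signed or
 * unsigned). Types that are already 32-bit are returned unchanged. */
static enum reg_type
widen_16bit_type(enum reg_type type)
{
   assert(type >= TYPE_HF && type <= TYPE_UD);
   switch (type) {
   case TYPE_HF: return TYPE_F;
   case TYPE_W:  return TYPE_D;
   case TYPE_UW: return TYPE_UD;
   default:      return type;
   }
}
```

The original type has to be remembered per output because the signedness
and float/integer distinction cannot be recovered from the bit size
alone.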

This conversion is used for all 16-bit types on BDW and, on Cherryview,
for UINT16, which has no RTW support with half-precision data
formats.

Note that in these cases the payload uses a 32-bit format, unlike the
one used once the half-precision data format is enabled on SKL and
Cherryview by the following patches.

v2: By default 16-bit sources should be packed (Jason Ekstrand)
Remove unnecessary alignment operation for 16-bit to
32-bit conversion (Chema Casanova)

Signed-off-by: Jose Maria Casanova Crespo 
Signed-off-by: Eduardo Lima 
---
 src/intel/compiler/brw_fs_nir.cpp | 48 +++
 1 file changed, 44 insertions(+), 4 deletions(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp b/src/intel/compiler/brw_fs_nir.cpp
index 66299860c9..fb138de76a 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -50,9 +50,28 @@ fs_visitor::emit_nir_code()
 void
 fs_visitor::nir_setup_outputs()
 {
-   if (stage == MESA_SHADER_TESS_CTRL || stage == MESA_SHADER_FRAGMENT)
+   if (stage == MESA_SHADER_TESS_CTRL)
   return;
 
+   if (stage == MESA_SHADER_FRAGMENT) {
+  /*
+   * The following code uses the outputs map to save the variable's
+   * original output type, so later we can retrieve it and retype
+   * the output accordingly while emitting the FS 16-bit outputs.
+   */
+  nir_foreach_variable(var, >outputs) {
+ const enum glsl_base_type base_type =
+glsl_get_base_type(var->type->without_array());
+
+ if (glsl_base_type_is_16bit(base_type)) {
+outputs[var->data.driver_location] =
+   retype(outputs[var->data.driver_location],
+  brw_type_for_base_type(var->type));
+ }
+  }
+  return;
+   }
+
unsigned vec4s[VARYING_SLOT_TESS_MAX] = { 0, };
 
/* Calculate the size of output registers in a separate pass, before
@@ -3310,14 +3329,35 @@ fs_visitor::nir_emit_fs_intrinsic(const fs_builder ,
}
 
case nir_intrinsic_store_output: {
-  const fs_reg src = get_nir_src(instr->src[0]);
+  fs_reg src = get_nir_src(instr->src[0]);
+  bool is_16bit = (instr->src[0].is_ssa ?
+ instr->src[0].ssa->bit_size : instr->src[0].reg.reg->bit_size) == 16;
+
   const nir_const_value *const_offset = nir_src_as_const_value(instr->src[1]);
   assert(const_offset && "Indirect output stores not allowed");
   const unsigned location = nir_intrinsic_base(instr) +
  SET_FIELD(const_offset->u32[0], BRW_NIR_FRAG_OUTPUT_LOCATION);
-  const fs_reg new_dest = retype(alloc_frag_output(this, location),
- src.type);
 
+  if (is_16bit) {
+ /* The outputs[location] should already have the original output type
+  * stored from nir_setup_outputs.
+  */
+ src = retype(src, outputs[location].type);
+  }
+
+  fs_reg new_dest = retype(alloc_frag_output(this, location),
+   src.type);
+
+  /* This is a workaround to support 16-bits outputs on HW that doesn't
+   * support half-precision render-target-write (RTW) messages. In these
+   * cases, we construct a 32-bit payload with the result of the
+   * conversion of the output values from 16-bit to 32-bit. Later on, a
+   * render target with a 16-bit surface format will force the correct
+   * conversion of the 32-bit output values to 16-bit.
+   */
+  if (is_16bit) {
+ new_dest.type = brw_reg_type_from_bit_size(32, src.type);
+  }
   for (unsigned j = 0; j < instr->num_components; j++)
  bld.MOV(offset(new_dest, bld, nir_intrinsic_component(instr) + j),
  offset(src, bld, j));
-- 
2.14.3



[Mesa-dev] [PATCH v4 29/44] compiler: Mark when input/output attribute at VS uses 16-bit

2017-11-29 Thread Jose Maria Casanova Crespo
Add a new shader attribute that marks when a location holds a 16-bit
value. This patch includes support in Mesa GLSL and NIR.
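
For illustration only (not part of the patch): a minimal sketch of how
such a per-location marker can be kept as a 64-bit mask, one bit per
attribute slot. The struct and function names here are hypothetical;
only the field name inputs_read_16bit comes from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the inputs_read_16bit field of shader_info. */
struct input_masks {
   uint64_t inputs_read_16bit;
};

/* Set one bit per attribute slot occupied by a 16-bit input. */
static void
mark_16bit_input(struct input_masks *masks, unsigned location, unsigned slots)
{
   assert(location + slots <= 64);
   for (unsigned i = 0; i < slots; i++)
      masks->inputs_read_16bit |= UINT64_C(1) << (location + i);
}

/* Query whether a given slot was marked as 16-bit. */
static int
slot_is_16bit(const struct input_masks *masks, unsigned location)
{
   assert(location < 64);
   return (int)((masks->inputs_read_16bit >> location) & 1);
}
```

A driver can then test the bit for a vertex attribute location to decide
whether that attribute needs special handling.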

v2: Remove use of is_half_slot as it is a duplicate of is_16bit
(Topi Pohjolainen)
Renamed half_inputs_read to inputs_read_16bit (Jason Ekstrand)
---
 src/compiler/glsl_types.h  | 15 +++
 src/compiler/nir/nir_gather_info.c | 17 -
 src/compiler/shader_info.h |  2 ++
 3 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/src/compiler/glsl_types.h b/src/compiler/glsl_types.h
index ee8aa71c75..4e1c4325f7 100644
--- a/src/compiler/glsl_types.h
+++ b/src/compiler/glsl_types.h
@@ -100,6 +100,13 @@ static inline bool glsl_base_type_is_integer(enum glsl_base_type type)
   type == GLSL_TYPE_IMAGE;
 }
 
+static inline bool glsl_base_type_is_16bit(enum glsl_base_type type)
+{
+   return type == GLSL_TYPE_FLOAT16 ||
+  type == GLSL_TYPE_UINT16 ||
+  type == GLSL_TYPE_INT16;
+}
+
 enum glsl_sampler_dim {
GLSL_SAMPLER_DIM_1D = 0,
GLSL_SAMPLER_DIM_2D,
@@ -574,6 +581,14 @@ public:
   return glsl_base_type_is_64bit(base_type);
}
 
+   /**
+* Query whether or not a type is 16-bit
+*/
+   bool is_16bit() const
+   {
+  return glsl_base_type_is_16bit(base_type);
+   }
+
/**
 * Query whether or not a type is a non-array boolean type
 */
diff --git a/src/compiler/nir/nir_gather_info.c b/src/compiler/nir/nir_gather_info.c
index 946939657e..e8724313bd 100644
--- a/src/compiler/nir/nir_gather_info.c
+++ b/src/compiler/nir/nir_gather_info.c
@@ -230,11 +230,17 @@ gather_intrinsic_info(nir_intrinsic_instr *instr, nir_shader *shader)
  /* We need to track which input_reads bits correspond to a
   * dvec3/dvec4 input attribute */
  if (shader->info.stage == MESA_SHADER_VERTEX &&
- var->data.mode == nir_var_shader_in &&
- glsl_type_is_dual_slot(glsl_without_array(var->type))) {
-for (uint i = 0; i < glsl_count_attribute_slots(var->type, false); i++) {
-   int idx = var->data.location + i;
-   shader->info.double_inputs_read |= BITFIELD64_BIT(idx);
+ var->data.mode == nir_var_shader_in) {
+if (glsl_type_is_dual_slot(glsl_without_array(var->type))) {
+   for (uint i = 0; i < glsl_count_attribute_slots(var->type, false); i++) {
+  int idx = var->data.location + i;
+  shader->info.double_inputs_read |= BITFIELD64_BIT(idx);
+   }
+} else if (glsl_get_bit_size(glsl_without_array(var->type)) == 16) {
+   for (uint i = 0; i < glsl_count_attribute_slots(var->type, false); i++) {
+  int idx = var->data.location + i;
+  shader->info.inputs_read_16bit |= BITFIELD64_BIT(idx);
+   }
 }
  }
   }
@@ -357,6 +363,7 @@ nir_shader_gather_info(nir_shader *shader, nir_function_impl *entrypoint)
shader->info.outputs_read = 0;
shader->info.patch_outputs_read = 0;
shader->info.double_inputs_read = 0;
+   shader->info.inputs_read_16bit = 0;
shader->info.patch_inputs_read = 0;
shader->info.patch_outputs_written = 0;
shader->info.system_values_read = 0;
diff --git a/src/compiler/shader_info.h b/src/compiler/shader_info.h
index bcb3f0fffa..016751de8d 100644
--- a/src/compiler/shader_info.h
+++ b/src/compiler/shader_info.h
@@ -55,6 +55,8 @@ typedef struct shader_info {
uint64_t inputs_read;
/* Which inputs are actually read and are double */
uint64_t double_inputs_read;
+   /* Which inputs are actually read and are 16-bit type */
+   uint64_t inputs_read_16bit;
/* Which outputs are actually written */
uint64_t outputs_written;
/* Which outputs are actually read */
-- 
2.14.3



[Mesa-dev] [PATCH v4 34/44] i965/fs: Support 16-bit types at load_input and store_output

2017-11-29 Thread Jose Maria Casanova Crespo
Enable support for 16-bit types in the load_input and store_output
intrinsics used between shader stages.

The approach re-uses the 32-bit URB reads and writes between stages,
shuffling pairs of 16-bit values into 32-bit values at the store_output
intrinsic and un-shuffling them at load_input.
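
For illustration only (not part of the patch): a scalar model of the
shuffle/un-shuffle idea. The real helpers operate on SIMD registers via
fs_builder; the function names below are hypothetical, and placing the
even-indexed component in the low half of each 32-bit word is an
assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of 32-bit slots needed to carry n 16-bit components. */
static size_t
dwords_for_16bit(size_t n)
{
   return (n + 1) / 2;
}

/* Pack pairs of 16-bit values into 32-bit words (even index in low half).
 * An unpaired trailing component leaves the high half as zero.
 */
static void
pack_16bit_pairs(const uint16_t *src, size_t n, uint32_t *dst)
{
   assert(src != NULL && dst != NULL);
   for (size_t i = 0; i < dwords_for_16bit(n); i++) {
      uint32_t lo = src[2 * i];
      uint32_t hi = (2 * i + 1 < n) ? src[2 * i + 1] : 0;
      dst[i] = lo | (hi << 16);
   }
}

/* Recover n 16-bit values from the packed 32-bit words. */
static void
unpack_16bit_pairs(const uint32_t *src, size_t n, uint16_t *dst)
{
   assert(src != NULL && dst != NULL);
   for (size_t i = 0; i < n; i++)
      dst[i] = (uint16_t)(src[i / 2] >> (16 * (i % 2)));
}
```

Packing on the writing stage and unpacking on the reading stage
round-trips the values, including for an odd component count.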

v2: Minor changes after rebase against recent master (Jose Maria
Casanova)

v3: - Remove unnecessary retypes (Topi Pohjolainen)
- Rebase-required changes: get_nir_src no longer returns a 32-bit
  type; it returns a bit-sized integer. The previous implementation of
  this patch assumed a 32-bit type for get_nir_src. (Jose María Casanova)
- Move 32-16 shuffle-unshuffle helpers to independent patch.
  (Jose María Casanova)
---
 src/intel/compiler/brw_fs_nir.cpp | 69 +--
 1 file changed, 67 insertions(+), 2 deletions(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp b/src/intel/compiler/brw_fs_nir.cpp
index 0f1a428242..66299860c9 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -2209,12 +2209,17 @@ fs_visitor::emit_gs_input_load(const fs_reg ,
   first_component = first_component / 2;
}
 
+   if (type_sz(dst.type) == 2) {
+  num_components = DIV_ROUND_UP(num_components, 2);
+  tmp_dst = bld.vgrf(BRW_REGISTER_TYPE_F, num_components);
+   }
+
for (unsigned iter = 0; iter < num_iterations; iter++) {
   if (offset_const) {
  /* Constant indexing - use global offset. */
  if (first_component != 0) {
 unsigned read_components = num_components + first_component;
-fs_reg tmp = bld.vgrf(dst.type, read_components);
+fs_reg tmp = bld.vgrf(tmp_dst.type, read_components);
 inst = bld.emit(SHADER_OPCODE_URB_READ_SIMD8, tmp, icp_handle);
 inst->size_written = read_components *
  tmp.component_size(inst->exec_size);
@@ -2264,6 +2269,11 @@ fs_visitor::emit_gs_input_load(const fs_reg ,
 bld.MOV(offset(dst, bld, iter * 2 + c), offset(tmp_dst, bld, c));
   }
 
+  if (type_sz(dst.type) == 2) {
+ shuffle_32bit_load_result_to_16bit_data(bld, dst, tmp_dst,
+ orig_num_components);
+  }
+
   if (num_iterations > 1) {
  num_components = orig_num_components - 2;
  if(offset_const) {
@@ -2593,6 +2603,11 @@ fs_visitor::nir_emit_tcs_intrinsic(const fs_builder ,
  dst = tmp;
   }
 
+  if (type_sz(dst.type) == 2) {
+ num_components = DIV_ROUND_UP(num_components, 2);
+ dst = bld.vgrf(BRW_REGISTER_TYPE_F, num_components);
+  }
+
   for (unsigned iter = 0; iter < num_iterations; iter++) {
  if (indirect_offset.file == BAD_FILE) {
 /* Constant indexing - use global offset. */
@@ -2648,6 +2663,11 @@ fs_visitor::nir_emit_tcs_intrinsic(const fs_builder ,
 }
  }
 
+ if (type_sz(orig_dst.type) == 2) {
+shuffle_32bit_load_result_to_16bit_data(
+   bld, orig_dst, dst, instr->num_components);
+ }
+
  /* Copy the temporary to the destination to deal with writemasking.
   *
   * Also attempt to deal with gl_PointSize being in the .w component.
@@ -2738,6 +2758,8 @@ fs_visitor::nir_emit_tcs_intrinsic(const fs_builder ,
   fs_reg value = get_nir_src(instr->src[0]);
   bool is_64bit = (instr->src[0].is_ssa ?
  instr->src[0].ssa->bit_size : instr->src[0].reg.reg->bit_size) == 64;
+  bool is_16bit = (instr->src[0].is_ssa ?
+ instr->src[0].ssa->bit_size : instr->src[0].reg.reg->bit_size) == 16;
   fs_reg indirect_offset = get_indirect_offset(instr);
   unsigned imm_offset = instr->const_index[0];
   unsigned mask = instr->const_index[1];
@@ -2767,6 +2789,11 @@ fs_visitor::nir_emit_tcs_intrinsic(const fs_builder ,
 num_iterations = 2;
 iter_components = 2;
  }
+  } else {
+ if (is_16bit) {
+iter_components = DIV_ROUND_UP(num_components, 2);
+value = retype (value, BRW_REGISTER_TYPE_D);
+ }
   }
 
   mask = mask << first_component;
@@ -2812,6 +2839,13 @@ fs_visitor::nir_emit_tcs_intrinsic(const fs_builder ,
continue;
 
 if (!is_64bit) {
+   if (is_16bit) {
+  shuffle_16bit_data_for_32bit_write(bld,
+ retype(offset(value,bld, i), BRW_REGISTER_TYPE_F),
+ retype(offset(value,bld, i), BRW_REGISTER_TYPE_HF),
+ 2);
+  value = retype (value, BRW_REGISTER_TYPE_D);
+   }
srcs[header_regs + i + first_component] = offset(value, bld, i);
 } else {
/* We need to shuffle the 64-bit data to match the layout
@@ -2955,6 +2989,11 @@ fs_visitor::nir_emit_tes_intrinsic(const fs_builder ,
 dest = tmp;
 

[Mesa-dev] [PATCH v4 31/44] anv/pipeline: Use 32-bit surface formats for 16-bit formats

2017-11-29 Thread Jose Maria Casanova Crespo
From: Alejandro Piñeiro 

From Vulkan 1.0.50 spec, Section 3.30.1. Format Definition:
VK_FORMAT_R16G16_SFLOAT

A two-component, 32-bit signed floating-point format that has a
16-bit R component in bytes 0..1, and a 16-bit G component in
bytes 2..3.

Vertex data and other inputs have always been expected to be up-converted
from 16-bit to 32-bit. But when the shader uses a 16-bit input without
any conversion, we use a 32-bit uint format instead. (This also applies
to inputs with 2/3/4 components.)

In the SKL PRM, vol 07, section FormatConversion, page 445, there is
a table showing that *16*FLOAT formats are converted to FLOAT, which
in that context is a 32-bit float. This is similar to the
*64*FLOAT formats, which convert 64-bit floats to 32-bit floats.

Unfortunately, while 64-bit floats have the alternative of the
*64*PASSTHRU formats, no such alternative exists for 16-bit.

The same issue affects 16-bit int surface formats.

As a workaround, if the shader uses a 16-bit location, we use
32-bit uint formats to avoid the conversion, and extract the
proper content later. Note that because we are using 32-bit formats, we
can use formats with fewer components (example: use *R32* for *R16G16*).
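
For illustration only (not part of the patch): a sketch of the format
substitution rule described above. The descriptor struct and enum are
hypothetical simplifications, not the real isl_format enum; the rule
itself (16-bit formats become UINT formats with half as many 32-bit
channels, rounded up) follows the workaround described in this message.

```c
#include <assert.h>

/* Hypothetical, simplified format descriptor; not the real isl_format. */
enum fmt_kind { FMT_UINT, FMT_SINT, FMT_FLOAT };

struct vertex_format {
   enum fmt_kind kind;
   unsigned bits;        /* bits per channel */
   unsigned channels;    /* 1..4 */
};

/* Replace any 16-bit format by a 32-bit UINT format that covers the same
 * bytes: two 16-bit channels fit in one 32-bit channel. Other formats are
 * returned unchanged.
 */
static struct vertex_format
adjust_16bit(struct vertex_format f)
{
   assert(f.channels >= 1 && f.channels <= 4);
   if (f.bits != 16)
      return f;
   struct vertex_format out = { FMT_UINT, 32, (f.channels + 1) / 2 };
   return out;
}
```

With this rule R16 and R16G16 both map to a one-channel 32-bit UINT
format, and the three- and four-channel variants map to a two-channel one.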

v2: Always use UINT surface format variants. (Topi Pohjolainen)
Renamed half_inputs_read to inputs_read_16bit (Jason Ekstrand)
Reword commit log (Jason Ekstrand)

Signed-off-by: Jose Maria Casanova Crespo 
Signed-off-by: Alejandro Piñeiro 
---
 src/intel/vulkan/genX_pipeline.c | 34 ++
 1 file changed, 34 insertions(+)

diff --git a/src/intel/vulkan/genX_pipeline.c b/src/intel/vulkan/genX_pipeline.c
index 7e3a785c58..6ab3b6f7c2 100644
--- a/src/intel/vulkan/genX_pipeline.c
+++ b/src/intel/vulkan/genX_pipeline.c
@@ -83,6 +83,31 @@ vertex_element_comp_control(enum isl_format format, unsigned comp)
}
 }
 
+#if GEN_GEN >= 8
+static enum isl_format
+adjust_16bit_format(enum isl_format format)
+{
+   switch(format) {
+   case ISL_FORMAT_R16_UINT:
+   case ISL_FORMAT_R16_SINT:
+   case ISL_FORMAT_R16_FLOAT:
+   case ISL_FORMAT_R16G16_UINT:
+   case ISL_FORMAT_R16G16_SINT:
+   case ISL_FORMAT_R16G16_FLOAT:
+  return ISL_FORMAT_R32_UINT;
+   case ISL_FORMAT_R16G16B16_UINT:
+   case ISL_FORMAT_R16G16B16_SINT:
+   case ISL_FORMAT_R16G16B16_FLOAT:
+   case ISL_FORMAT_R16G16B16A16_UINT:
+   case ISL_FORMAT_R16G16B16A16_SINT:
+   case ISL_FORMAT_R16G16B16A16_FLOAT:
+  return ISL_FORMAT_R32G32_UINT;
+   default:
+  return format;
+   }
+}
+#endif
+
 static void
 emit_vertex_input(struct anv_pipeline *pipeline,
   const VkPipelineVertexInputStateCreateInfo *info)
@@ -95,6 +120,10 @@ emit_vertex_input(struct anv_pipeline *pipeline,
assert((inputs_read & ((1 << VERT_ATTRIB_GENERIC0) - 1)) == 0);
const uint32_t elements = inputs_read >> VERT_ATTRIB_GENERIC0;
const uint32_t elements_double = double_inputs_read >> VERT_ATTRIB_GENERIC0;
+#if GEN_GEN >= 8
+   const uint64_t inputs_read_16bit = vs_prog_data->inputs_read_16bit;
+   const uint32_t elements_16bit = inputs_read_16bit >> VERT_ATTRIB_GENERIC0;
+#endif
const bool needs_svgs_elem = vs_prog_data->uses_vertexid ||
 vs_prog_data->uses_instanceid ||
 vs_prog_data->uses_basevertex ||
@@ -125,6 +154,11 @@ emit_vertex_input(struct anv_pipeline *pipeline,
   VK_IMAGE_ASPECT_COLOR_BIT,
   VK_IMAGE_TILING_LINEAR);
 
+#if GEN_GEN >= 8
+  if ((elements_16bit & (1 << desc->location)) != 0) {
+ format = adjust_16bit_format(format);
+  }
+#endif
   assert(desc->binding < MAX_VBS);
 
   if ((elements & (1 << desc->location)) == 0)
-- 
2.14.3



[Mesa-dev] [PATCH v4 25/44] anv: Enable VK_KHR_16bit_storage for SSBO and UBO

2017-11-29 Thread Jose Maria Casanova Crespo
From: Alejandro Piñeiro 

It uses VK_KHR_get_physical_device_properties2 functionality to expose
whether the extension is supported.

v2: update due to rebase against master (Alejandro)

v3: (Jason Ekstrand)
- Move this patch up in VK_KHR_16bit_storage series enabling only
  storageBuffer16BitAccess and uniformAndStorageBuffer16BitAccess.
- Only expose VK_KHR_16bit_storage on Gen8+

Signed-off-by: Jose Maria Casanova Crespo 
Signed-off-by: Alejandro Piñeiro 
---
 src/intel/vulkan/anv_device.c  | 13 +
 src/intel/vulkan/anv_extensions.py |  1 +
 2 files changed, 14 insertions(+)

diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index b5577ee61d..69a1f5a5f6 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -725,6 +725,19 @@ void anv_GetPhysicalDeviceFeatures2KHR(
  break;
   }
 
+  case VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_16BIT_STORAGE_FEATURES_KHR: {
+ ANV_FROM_HANDLE(anv_physical_device, pdevice, physicalDevice);
+
+ VkPhysicalDevice16BitStorageFeaturesKHR *features =
+(VkPhysicalDevice16BitStorageFeaturesKHR *)ext;
+
+ features->storageBuffer16BitAccess = pdevice->info.gen >= 8;
+ features->uniformAndStorageBuffer16BitAccess = pdevice->info.gen >= 8;
+ features->storagePushConstant16 = false;
+ features->storageInputOutput16 = false;
+ break;
+  }
+
   default:
  anv_debug_ignored_stype(ext->sType);
  break;
diff --git a/src/intel/vulkan/anv_extensions.py b/src/intel/vulkan/anv_extensions.py
index b1e984b8cd..c49718dfd4 100644
--- a/src/intel/vulkan/anv_extensions.py
+++ b/src/intel/vulkan/anv_extensions.py
@@ -51,6 +51,7 @@ class Extension:
 # and dEQP-VK.api.info.device fail due to the duplicated strings.
 EXTENSIONS = [
 Extension('VK_ANDROID_native_buffer', 5, 'ANDROID'),
+Extension('VK_KHR_16bit_storage', 1, 'device->info.gen >= 8'),
 Extension('VK_KHR_bind_memory2',  1, True),
 Extension('VK_KHR_dedicated_allocation',  1, True),
 Extension('VK_KHR_descriptor_update_template',1, True),
-- 
2.14.3



[Mesa-dev] [PATCH v4 26/44] i965/fs: Optimize 16-bit SSBO stores by packing two into a 32-bit reg

2017-11-29 Thread Jose Maria Casanova Crespo
From: Eduardo Lima Mitev 

Currently, we use byte-scattered write messages for storing 16-bit
values into an SSBO. This is because untyped surface messages have a
fixed 32-bit size.

This patch optimizes these 16-bit writes by combining 2 values (e.g.,
two consecutive components aligned to 32 bits) into a 32-bit register,
packing the two 16-bit words.

16-bit single-component values will continue to use byte-scattered
write messages. The same happens when the first consecutive
component is not 32-bit aligned.

This optimization reduces the number of SEND messages used for storing
16-bit values, potentially by a factor of 2 or 4, which cuts down
execution time significantly because byte-scattered writes are an
expensive operation as they only write one component per message.
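
For illustration only (not part of the patch): a scalar sketch of how a
16-bit writemask splits into untyped writes and byte-scattered writes
under the rules above. The run-length logic mirrors the first_component
and length computation in the diff; the function and struct names are
hypothetical, and the sketch only counts messages rather than emitting
them.

```c
#include <assert.h>

/* 1-based index of the lowest set bit, or 0 if none (like POSIX ffs). */
static unsigned
first_set(unsigned v)
{
   unsigned i = 1;
   if (v == 0)
      return 0;
   while ((v & 1u) == 0) {
      v >>= 1;
      i++;
   }
   return i;
}

struct store_plan {
   unsigned untyped_writes;        /* packed 32-bit untyped surface writes */
   unsigned byte_scattered_writes; /* one 16-bit component each */
};

/* Count the messages needed to store the enabled 16-bit channels. */
static struct store_plan
plan_16bit_store(unsigned writemask)
{
   struct store_plan plan = { 0, 0 };
   assert(writemask <= 0xf);

   while (writemask != 0) {
      unsigned first = first_set(writemask) - 1;
      unsigned length = first_set(~(writemask >> first)) - 1;

      if (first % 2)
         length = 1;                 /* not 32-bit aligned */
      else if (length > 2 && (length % 2))
         length--;                   /* leave the unpaired tail for later */

      if (length == 1)
         plan.byte_scattered_writes++;
      else
         plan.untyped_writes++;

      /* Clear the channels handled by this message. */
      writemask &= ~(((1u << length) - 1) << first);
   }
   return plan;
}
```

For example, an .xyz mask yields one untyped write plus one
byte-scattered write, a .yz mask yields two byte-scattered writes, and a
full .xyzw mask yields a single untyped write.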

v2: Removed use of stride = 2 on sources (Jason Ekstrand)
Rework optimization using shuffle 16 write and enable writes
of 16bit vec4 with only one message of 32-bits. (Chema Casanova)
v3: - Fix coding style (Eduardo Lima)
- Reorganize code to avoid duplication. (Jason Ekstrand)
- Include new comments to explain the length calculations to
  fix alignment issues of components. (Jason Ekstrand)
- Fix issues with writemask yz with 16-bit writes. (Jason Ekstrand)

Signed-off-by: Jose Maria Casanova Crespo 
Signed-off-by: Eduardo Lima 
---
 src/intel/compiler/brw_fs_nir.cpp | 61 +--
 1 file changed, 46 insertions(+), 15 deletions(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp b/src/intel/compiler/brw_fs_nir.cpp
index c091241132..2c344ec7df 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -4088,14 +4088,14 @@ fs_visitor::nir_emit_intrinsic(const fs_builder , 
nir_intrinsic_instr *instr
*/
   unsigned bit_size = nir_src_bit_size(instr->src[0]);
   unsigned type_size = bit_size / 8;
+  unsigned slots_per_component = 1;
+
   if (bit_size == 64) {
  val_reg = shuffle_64bit_data_for_32bit_write(bld,
 val_reg, instr->num_components);
+ slots_per_component = 2;
   }
 
-  /* 16-bit types would use a minimum of 1 slot */
-  unsigned type_slots = MAX2(type_size / 4, 1);
-
   /* Combine groups of consecutive enabled channels in one write
* message. We use ffs to find the first enabled channel and then ffs on
* the bit-inverse, down-shifted writemask to determine the length of
@@ -4105,18 +4105,48 @@ fs_visitor::nir_emit_intrinsic(const fs_builder , 
nir_intrinsic_instr *instr
  unsigned first_component = ffs(writemask) - 1;
  unsigned length = ffs(~(writemask >> first_component)) - 1;
 
+ fs_reg current_val_reg =
+offset(val_reg, bld, first_component * slots_per_component);
+
  if (type_size > 4) {
 /* We can't write more than 2 64-bit components at once. Limit
  * the length of the write to what we can do and let the next
  * iteration handle the rest.
  */
 length = MIN2(2, length);
- } else if (type_size == 2) {
-/* For 16-bit types we are using byte scattered writes, that can
- * only write one component per call. So we limit the length, and
- * let the write happening in several iterations.
+ } else if (type_size < 4) {
+assert(type_size == 2);
+/* For 16-bit types we pack two consecutive values into a 32-bit
+ * word and use an untyped write message. For single values or values
+ * that are not 32-bit aligned we need to use byte-scattered writes because
+ * untyped writes work with 32-bit components with 32-bit
+ * alignment. byte_scattered_write messages only support one
+ * 16-bit component at a time.
+ *
+ * For example, if there is a 3-component vector we submit one
+ * untyped-write message of 32-bit (first two components), and one
+ * byte-scattered write message (the last component).
  */
-length = 1;
+
+if (first_component % 2) {
+   /* If we use a .yz writemask we also need to emit 2
+* byte-scattered write messages because of y-component not
+* being aligned to 32-bit.
+*/
+   length = 1;
+} else if (length > 2 && (length % 2)) {
+   /* If there is an odd number of consecutive components we leave
+* the unpaired component for a following emit of length == 1
+* with byte_scattered_write.
+*/
+   length --;
+}
+
+fs_reg tmp = bld.vgrf(BRW_REGISTER_TYPE_D,
+  DIV_ROUND_UP(length, 2));
+shuffle_16bit_data_for_32bit_write(bld, tmp, current_val_reg,
+ 

Re: [Mesa-dev] [PATCH v3] nir: add varying array splitting pass

2017-11-29 Thread Dieter Nützel

Tested-by: Dieter Nützel 

Dieter

On 30.11.2017 01:19, Timothy Arceri wrote:

V2:
 - fix matrix support, non-array matrices were being skipped in v1

v3:
 - handle lowering of tcs output loads correctly
 - correctly mark indirect locations for either in or out not both
   when processing a stage.
 - use nir_src_copy() when lowering stores.
---
 src/compiler/Makefile.sources  |   1 +
 src/compiler/nir/meson.build   |   1 +
 src/compiler/nir/nir.h |   1 +
 src/compiler/nir/nir_lower_io_arrays_to_elements.c | 383 +
 4 files changed, 386 insertions(+)
 create mode 100644 src/compiler/nir/nir_lower_io_arrays_to_elements.c

diff --git a/src/compiler/Makefile.sources 
b/src/compiler/Makefile.sources

index 2ab8e163a26..c5094b7f198 100644
--- a/src/compiler/Makefile.sources
+++ b/src/compiler/Makefile.sources
@@ -219,20 +219,21 @@ NIR_FILES = \
nir/nir_lower_double_ops.c \
nir/nir_lower_drawpixels.c \
nir/nir_lower_global_vars_to_local.c \
nir/nir_lower_gs_intrinsics.c \
nir/nir_lower_load_const_to_scalar.c \
nir/nir_lower_locals_to_regs.c \
nir/nir_lower_idiv.c \
nir/nir_lower_indirect_derefs.c \
nir/nir_lower_int64.c \
nir/nir_lower_io.c \
+   nir/nir_lower_io_arrays_to_elements.c \
nir/nir_lower_io_to_temporaries.c \
nir/nir_lower_io_to_scalar.c \
nir/nir_lower_io_types.c \
nir/nir_lower_passthrough_edgeflags.c \
nir/nir_lower_patch_vertices.c \
nir/nir_lower_phis_to_scalar.c \
nir/nir_lower_regs_to_ssa.c \
nir/nir_lower_returns.c \
nir/nir_lower_samplers.c \
nir/nir_lower_samplers_as_deref.c \
diff --git a/src/compiler/nir/meson.build 
b/src/compiler/nir/meson.build

index e5c8326aa06..b61a07773d3 100644
--- a/src/compiler/nir/meson.build
+++ b/src/compiler/nir/meson.build
@@ -107,20 +107,21 @@ files_libnir = files(
   'nir_lower_double_ops.c',
   'nir_lower_drawpixels.c',
   'nir_lower_global_vars_to_local.c',
   'nir_lower_gs_intrinsics.c',
   'nir_lower_load_const_to_scalar.c',
   'nir_lower_locals_to_regs.c',
   'nir_lower_idiv.c',
   'nir_lower_indirect_derefs.c',
   'nir_lower_int64.c',
   'nir_lower_io.c',
+  'nir_lower_io_arrays_to_elements.c',
   'nir_lower_io_to_temporaries.c',
   'nir_lower_io_to_scalar.c',
   'nir_lower_io_types.c',
   'nir_lower_passthrough_edgeflags.c',
   'nir_lower_patch_vertices.c',
   'nir_lower_phis_to_scalar.c',
   'nir_lower_regs_to_ssa.c',
   'nir_lower_returns.c',
   'nir_lower_samplers.c',
   'nir_lower_samplers_as_deref.c',
diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index c01fa6707a4..4c5d976a60d 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -2486,20 +2486,21 @@ bool nir_lower_constant_initializers(nir_shader 
*shader,

  nir_variable_mode modes);

 bool nir_move_vec_src_uses_to_dest(nir_shader *shader);
 bool nir_lower_vec_to_movs(nir_shader *shader);
 void nir_lower_alpha_test(nir_shader *shader, enum compare_func func,
   bool alpha_to_one);
 bool nir_lower_alu_to_scalar(nir_shader *shader);
 bool nir_lower_load_const_to_scalar(nir_shader *shader);
 bool nir_lower_read_invocation_to_scalar(nir_shader *shader);
 bool nir_lower_phis_to_scalar(nir_shader *shader);
+void nir_lower_io_arrays_to_elements(nir_shader *producer, nir_shader
*consumer);
 void nir_lower_io_to_scalar(nir_shader *shader, nir_variable_mode 
mask);
 void nir_lower_io_to_scalar_early(nir_shader *shader, 
nir_variable_mode mask);


 bool nir_lower_samplers(nir_shader *shader,
 const struct gl_shader_program 
*shader_program);

 bool nir_lower_samplers_as_deref(nir_shader *shader,
  const struct gl_shader_program
*shader_program);

 typedef struct nir_lower_subgroups_options {
uint8_t subgroup_size;
diff --git a/src/compiler/nir/nir_lower_io_arrays_to_elements.c
b/src/compiler/nir/nir_lower_io_arrays_to_elements.c
new file mode 100644
index 000..94b93e3ec91
--- /dev/null
+++ b/src/compiler/nir/nir_lower_io_arrays_to_elements.c
@@ -0,0 +1,383 @@
+/*
+ * Copyright © 2017 Timothy Arceri
+ *
+ * Permission is hereby granted, free of charge, to any person 
obtaining a
+ * copy of this software and associated documentation files (the 
"Software"),
+ * to deal in the Software without restriction, including without 
limitation
+ * the rights to use, copy, modify, merge, publish, distribute, 
sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom 
the
+ * Software is furnished to do so, subject to the following 
conditions:

+ *
+ * The above copyright notice and this permission notice (including 
the next
+ * paragraph) shall be included in all copies or substantial portions 
of the

+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", 

Re: [Mesa-dev] [PATCH v4] nir: add varying component packing helpers

2017-11-29 Thread Dieter Nützel

Tested-by: Dieter Nützel 

Dieter

On 30.11.2017 01:20, Timothy Arceri wrote:

v2: update shader info input/output masks when pack components
v3: make sure interpolation loc matches, this is required for the
radeonsi NIR backend.
v4: 33dca36f4f28 fixed nir_gather_info to update outputs_read
correctly; make sure we also adjust this correctly when
packing components.

Reviewed-by: Bas Nieuwenhuizen  (v1)
Reviewed-by: Nicolai Hähnle  (v3)
---
 src/compiler/nir/nir.h |   2 +
 src/compiler/nir/nir_linking_helpers.c | 330 +
 2 files changed, 332 insertions(+)

diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index 4c5d976a60d..83858afe148 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -2452,20 +2452,22 @@ void nir_lower_io_to_temporaries(nir_shader 
*shader,

  nir_function_impl *entrypoint,
  bool outputs, bool inputs);

 void nir_shader_gather_info(nir_shader *shader, nir_function_impl 
*entrypoint);


 void nir_assign_var_locations(struct exec_list *var_list, unsigned 
*size,
   int (*type_size)(const struct glsl_type 
*));


 /* Some helpers to do very simple linking */
 bool nir_remove_unused_varyings(nir_shader *producer, nir_shader 
*consumer);

+void nir_compact_varyings(nir_shader *producer, nir_shader *consumer,
+  bool default_to_smooth_interp);

 typedef enum {
/* If set, this forces all non-flat fragment shader inputs to be
 * interpolated as if with the "sample" qualifier.  This requires
 * nir_shader_compiler_options::use_interpolated_input_intrinsics.
 */
nir_lower_io_force_sample_interpolation = (1 << 1),
 } nir_lower_io_options;
 bool nir_lower_io(nir_shader *shader,
   nir_variable_mode modes,
diff --git a/src/compiler/nir/nir_linking_helpers.c
b/src/compiler/nir/nir_linking_helpers.c
index 4d709c1b3c5..9f0122d4519 100644
--- a/src/compiler/nir/nir_linking_helpers.c
+++ b/src/compiler/nir/nir_linking_helpers.c
@@ -166,10 +166,340 @@ nir_remove_unused_varyings(nir_shader
*producer, nir_shader *consumer)

bool progress = false;
progress = remove_unused_io_vars(producer, >outputs, 
read,

 patches_read);

progress = remove_unused_io_vars(consumer, >inputs, 
written,

 patches_written) || progress;

return progress;
 }
+
+static uint8_t
+get_interp_type(nir_variable *var, bool default_to_smooth_interp)
+{
+   if (var->data.interpolation != INTERP_MODE_NONE)
+  return var->data.interpolation;
+   else if (default_to_smooth_interp)
+  return INTERP_MODE_SMOOTH;
+   else
+  return INTERP_MODE_NONE;
+}
+
+#define INTERPOLATE_LOC_SAMPLE 0
+#define INTERPOLATE_LOC_CENTROID 1
+#define INTERPOLATE_LOC_CENTER 2
+
+static uint8_t
+get_interp_loc(nir_variable *var)
+{
+   if (var->data.sample)
+  return INTERPOLATE_LOC_SAMPLE;
+   else if (var->data.centroid)
+  return INTERPOLATE_LOC_CENTROID;
+   else
+  return INTERPOLATE_LOC_CENTER;
+}
+
+static void
+get_slot_component_masks_and_interp_types(struct exec_list *var_list,
+  uint8_t *comps,
+  uint8_t *interp_type,
+  uint8_t *interp_loc,
+  gl_shader_stage stage,
+  bool 
default_to_smooth_interp)

+{
+   nir_foreach_variable_safe(var, var_list) {
+  assert(var->data.location >= 0);
+
+  /* Only remap things that aren't built-ins.
+   * TODO: add TES patch support.
+   */
+  if (var->data.location >= VARYING_SLOT_VAR0 &&
+  var->data.location - VARYING_SLOT_VAR0 < 32) {
+
+ const struct glsl_type *type = var->type;
+ if (nir_is_per_vertex_io(var, stage)) {
+assert(glsl_type_is_array(type));
+type = glsl_get_array_element(type);
+ }
+
+ unsigned location = var->data.location - VARYING_SLOT_VAR0;
+ unsigned elements =
+glsl_get_vector_elements(glsl_without_array(type));
+
+ bool dual_slot = 
glsl_type_is_dual_slot(glsl_without_array(type));

+ unsigned slots = glsl_count_attribute_slots(type, false);
+ unsigned comps_slot2 = 0;
+ for (unsigned i = 0; i < slots; i++) {
+interp_type[location + i] =
+   get_interp_type(var, default_to_smooth_interp);
+interp_loc[location + i] = get_interp_loc(var);
+
+if (dual_slot) {
+   if (i & 1) {
+  comps[location + i] |= ((1 << comps_slot2) - 1);
+   } else {
+  unsigned num_comps = 4 - var->data.location_frac;
+  comps_slot2 = (elements * 2) - 

[Mesa-dev] [PATCH v4 23/44] i965/fs: Enables 16-bit load_ubo with sampler

2017-11-29 Thread Jose Maria Casanova Crespo
load_ubo uses 32-bit loads because uniform surfaces have a 32-bit
surface format defined. So when reading 16-bit components with the
sampler we need to unshuffle two 16-bit components from each 32-bit
component.

Using the sampler avoids the use of the byte_scattered_read message,
which needs one message for each component and is expected to be
slower.

On SKL+ we take advantage of a hardware feature that automatically
defines a channel mask based on the rlen value, so on SKL+ we only
use half of the registers without using a header in the payload.
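
For illustration only (not part of the patch): the offset arithmetic the
diff applies for 16-bit destinations, written out as a scalar helper. It
maps a constant byte offset to the 32-bit component holding the value and
to the byte offset of the 16-bit half inside it. The struct and function
names are hypothetical, and the 2-byte alignment check is an assumption
of this sketch.

```c
#include <assert.h>

struct half_location {
   unsigned dword;        /* which 32-bit component of the load result */
   unsigned byte_offset;  /* 0 for the low half, 2 for the high half */
};

/* Locate a 16-bit value inside the 32-bit components that were loaded. */
static struct half_location
locate_16bit(unsigned const_offset)
{
   struct half_location loc;
   assert(const_offset % 2 == 0);   /* assumed: 16-bit values are 2-byte aligned */
   loc.dword = (const_offset & 0x7) / 4;
   loc.byte_offset = (const_offset & 0x7) / 2 % 2 * 2;
   return loc;
}
```

For instance, a constant offset of 6 selects the high half of the second
32-bit component, while an offset of 10 wraps to the high half of the
first one.
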
---
 src/intel/compiler/brw_fs.cpp   | 31 +++
 src/intel/compiler/brw_fs_generator.cpp | 10 --
 2 files changed, 35 insertions(+), 6 deletions(-)

diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index 1ca4d416b2..9c543496ba 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -184,9 +184,17 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const fs_builder 
,
 * a double this means we are only loading 2 elements worth of data.
 * We also want to use a 32-bit data type for the dst of the load operation
 * so other parts of the driver don't get confused about the size of the
-* result.
+* result. In the case of 16-bit data we only need half of the 32-bit
+* components on SKL+ as we take advantage of using the message return
+* size to define an xy channel mask.
 */
-   fs_reg vec4_result = bld.vgrf(BRW_REGISTER_TYPE_F, 4);
+   fs_reg vec4_result;
+   if (type_sz(dst.type) == 2 && (devinfo->gen >= 9)) {
+  vec4_result = bld.vgrf(BRW_REGISTER_TYPE_F, 2);
+  vec4_result = retype(vec4_result, BRW_REGISTER_TYPE_HF);
+   } else {
+  vec4_result = bld.vgrf(BRW_REGISTER_TYPE_F, 4);
+   }
fs_inst *inst = bld.emit(FS_OPCODE_VARYING_PULL_CONSTANT_LOAD_LOGICAL,
 vec4_result, surf_index, vec4_offset);
inst->size_written = 4 * vec4_result.component_size(inst->exec_size);
@@ -197,8 +205,23 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const fs_builder 
,
}
 
vec4_result.type = dst.type;
-   bld.MOV(dst, offset(vec4_result, bld,
-   (const_offset & 0xf) / type_sz(vec4_result.type)));
+
+   if (type_sz(dst.type) == 2) {
+  /* 16-bit types need to be unshuffled as each pair of 16-bit components
+   * is packed in a 32-bit component because we are using a 32-bit format
+   * in the surface of the uniform that is read by the sampler.
+   * TODO: On BDW+ mark when a uniform has 16-bit type so we could setup a
+   * surface format of 16-bit and use the 16-bit return format at the
+   * sampler.
+   */
+  vec4_result.stride = 2;
+  bld.MOV(dst, byte_offset(offset(vec4_result, bld,
+  (const_offset & 0x7) / 4),
+   (const_offset & 0x7) / 2 % 2 * 2));
+   } else {
+  bld.MOV(dst, offset(vec4_result, bld,
+  (const_offset & 0xf) / type_sz(vec4_result.type)));
+   }
 }
 
 /**
diff --git a/src/intel/compiler/brw_fs_generator.cpp b/src/intel/compiler/brw_fs_generator.cpp
index a3861cd68e..00a4e29147 100644
--- a/src/intel/compiler/brw_fs_generator.cpp
+++ b/src/intel/compiler/brw_fs_generator.cpp
@@ -1381,12 +1381,18 @@ 
fs_generator::generate_varying_pull_constant_load_gen7(fs_inst *inst,
uint32_t simd_mode, rlen, mlen;
if (inst->exec_size == 16) {
   mlen = 2;
-  rlen = 8;
+  if (type_sz(dst.type) == 2 && (devinfo->gen >= 9))
+ rlen = 4;
+  else
+ rlen = 8;
   simd_mode = BRW_SAMPLER_SIMD_MODE_SIMD16;
} else {
   assert(inst->exec_size == 8);
   mlen = 1;
-  rlen = 4;
+  if (type_sz(dst.type) == 2 && (devinfo->gen >= 9))
+ rlen = 2;
+  else
+ rlen = 4;
   simd_mode = BRW_SAMPLER_SIMD_MODE_SIMD8;
}
 
-- 
2.14.3



[Mesa-dev] [PATCH v4 22/44] i965/fs: Helpers for un/shuffle 16-bit pairs in 32-bit components

2017-11-29 Thread Jose Maria Casanova Crespo
These helpers are used to load/store 16-bit types from/to 32-bit
components.

The functions shuffle_32bit_load_result_to_16bit_data and
shuffle_16bit_data_for_32bit_write are implemented in a similar
way to the analogous functions for handling 64-bit types.
---
 src/intel/compiler/brw_fs.h   | 11 +
 src/intel/compiler/brw_fs_nir.cpp | 51 +++
 2 files changed, 62 insertions(+)
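
Note for readers: the packing performed by the two helpers can be sketched in
plain C++ as below. This is only a scalar model of a single channel, under the
assumption that component 2k lands in the low half of 32-bit word k and
component 2k+1 in the high half. The names pack_16bit_pairs and
unpack_16bit_pairs are hypothetical and are not part of the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar model of shuffle_16bit_data_for_32bit_write:
// two consecutive 16-bit components share one 32-bit word.
std::vector<uint32_t> pack_16bit_pairs(const std::vector<uint16_t> &src)
{
   std::vector<uint32_t> dst((src.size() + 1) / 2, 0);
   for (std::size_t i = 0; i < src.size(); i++) {
      // i % 2 selects the low (0) or high (1) half of word i / 2.
      dst[i / 2] |= static_cast<uint32_t>(src[i]) << (16 * (i % 2));
   }
   return dst;
}

// Hypothetical scalar model of shuffle_32bit_load_result_to_16bit_data:
// read each 16-bit half back out of the packed 32-bit words.
std::vector<uint16_t> unpack_16bit_pairs(const std::vector<uint32_t> &src,
                                         std::size_t components)
{
   assert(src.size() * 2 >= components);
   std::vector<uint16_t> dst(components);
   for (std::size_t i = 0; i < components; i++)
      dst[i] = static_cast<uint16_t>(src[i / 2] >> (16 * (i % 2)));
   return dst;
}
```

With an odd component count the sketch zeroes the unused high half of the last
word; the patch itself leaves that half unspecified.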

diff --git a/src/intel/compiler/brw_fs.h b/src/intel/compiler/brw_fs.h
index 19b897e7a9..30557324d5 100644
--- a/src/intel/compiler/brw_fs.h
+++ b/src/intel/compiler/brw_fs.h
@@ -497,6 +497,17 @@ void shuffle_32bit_load_result_to_64bit_data(const brw::fs_builder &bld,
 fs_reg shuffle_64bit_data_for_32bit_write(const brw::fs_builder &bld,
                                           const fs_reg &src,
                                           uint32_t components);
+
+void shuffle_32bit_load_result_to_16bit_data(const brw::fs_builder &bld,
+                                             const fs_reg &dst,
+                                             const fs_reg &src,
+                                             uint32_t components);
+
+void shuffle_16bit_data_for_32bit_write(const brw::fs_builder &bld,
+                                        const fs_reg &dst,
+                                        const fs_reg &src,
+                                        uint32_t components);
+
 fs_reg setup_imm_df(const brw::fs_builder &bld,
                     double v);
 
diff --git a/src/intel/compiler/brw_fs_nir.cpp b/src/intel/compiler/brw_fs_nir.cpp
index 726b2fcee7..c091241132 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -4828,6 +4828,33 @@ shuffle_32bit_load_result_to_64bit_data(const fs_builder &bld,
}
 }
 
+void
+shuffle_32bit_load_result_to_16bit_data(const fs_builder &bld,
+                                        const fs_reg &dst,
+                                        const fs_reg &src,
+                                        uint32_t components)
+{
+   assert(type_sz(src.type) == 4);
+   assert(type_sz(dst.type) == 2);
+
+   fs_reg tmp = retype(bld.vgrf(src.type), dst.type);
+
+   for (unsigned i = 0; i < components; i++) {
+  const fs_reg component_i = subscript(offset(src, bld, i / 2), dst.type, i % 2);
+
+  bld.MOV(offset(tmp, bld, i % 2), component_i);
+
+  if (i % 2) {
+ bld.MOV(offset(dst, bld, i - 1), offset(tmp, bld, 0));
+ bld.MOV(offset(dst, bld, i), offset(tmp, bld, 1));
+  }
+   }
+   if (components % 2) {
+  bld.MOV(offset(dst, bld, components - 1), tmp);
+   }
+}
+
+
 /**
  * This helper does the inverse operation of
  * SHUFFLE_32BIT_LOAD_RESULT_TO_64BIT_DATA.
@@ -4860,6 +4887,30 @@ shuffle_64bit_data_for_32bit_write(const fs_builder &bld,
return dst;
 }
 
+void
+shuffle_16bit_data_for_32bit_write(const fs_builder &bld,
+                                   const fs_reg &dst,
+                                   const fs_reg &src,
+                                   uint32_t components)
+{
+   assert(type_sz(src.type) == 2);
+   assert(type_sz(dst.type) == 4);
+
+   fs_reg tmp = bld.vgrf(dst.type);
+
+   for (unsigned i = 0; i < components; i++) {
+  const fs_reg component_i = offset(src, bld, i);
+  bld.MOV(subscript(tmp, src.type, i % 2), component_i);
+  if (i % 2) {
+ bld.MOV(offset(dst, bld, i / 2), tmp);
+  }
+   }
+   if (components % 2) {
+  bld.MOV(offset(dst, bld, components / 2), tmp);
+   }
+}
+
+
 fs_reg
 setup_imm_df(const fs_builder &bld, double v)
 {
-- 
2.14.3



[Mesa-dev] [PATCH v4 24/44] anv: Enable SPV_KHR_16bit_storage on gen 8+

2017-11-29 Thread Jose Maria Casanova Crespo
From: Eduardo Lima Mitev 

v2: minor changes after rebase against recent master (Alejandro)
---
 src/intel/vulkan/anv_pipeline.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/intel/vulkan/anv_pipeline.c b/src/intel/vulkan/anv_pipeline.c
index 907b24a758..c58bd2f9a1 100644
--- a/src/intel/vulkan/anv_pipeline.c
+++ b/src/intel/vulkan/anv_pipeline.c
@@ -140,6 +140,7 @@ anv_shader_compile_to_nir(struct anv_pipeline *pipeline,
   .image_write_without_format = true,
   .multiview = true,
   .variable_pointers = true,
+  .storage_16bit = device->instance->physicalDevice.info.gen >= 8,
};
 
nir_function *entry_point =
-- 
2.14.3



[Mesa-dev] [PATCH v4 21/44] i965/fs: Use byte scattered read for 16-bit load_ssbo

2017-11-29 Thread Jose Maria Casanova Crespo
Enable 16-bit reads in do_untyped_vector_read, which is used by
the following intrinsics:

   * nir_intrinsic_load_shared
   * nir_intrinsic_load_ssbo

v2: Removed use of stride = 2 on 16-bit sources (Jason Ekstrand)

v3: - Add bitsize to scattered read operation (Jason Ekstrand)
- Remove implementation of 16-bit UBO read from this patch.
- Avoid assertion at opt_algebraic caused by ADD of two IMM with
  offset with BRW_REGISTER_TYPE_UD type found on matrix tests.
  (Jose Maria Casanova)
---
 src/intel/compiler/brw_fs_nir.cpp | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)
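
Note for readers: the per-component read loop added here can be modelled in
plain C++ roughly as follows. This is a sketch under stated assumptions, not
the driver code: the buffer is a little-endian byte array, each scattered read
is modelled as returning a 32-bit value whose low 16 bits hold the payload, and
load_16bit_components is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical scalar model of the 16-bit path in do_untyped_vector_read:
// one byte scattered read per component, advancing the offset by the
// size of the 16-bit type after each read.
std::vector<uint16_t> load_16bit_components(const std::vector<uint8_t> &buffer,
                                            uint32_t offset,
                                            unsigned num_components)
{
   std::vector<uint16_t> dest;
   uint32_t read_offset = offset;
   for (unsigned i = 0; i < num_components; i++) {
      // Model of one 16-bit scattered read: the payload sits in the low
      // half of a 32-bit register (subscript 0 in the patch).
      uint32_t read_reg =
         static_cast<uint32_t>(buffer.at(read_offset)) |
         (static_cast<uint32_t>(buffer.at(read_offset + 1)) << 8);
      dest.push_back(static_cast<uint16_t>(read_reg & 0xffffu));
      read_offset += sizeof(uint16_t);
   }
   return dest;
}
```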

diff --git a/src/intel/compiler/brw_fs_nir.cpp b/src/intel/compiler/brw_fs_nir.cpp
index ff04e2468b..726b2fcee7 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -2353,6 +2353,18 @@ do_untyped_vector_read(const fs_builder &bld,
 
  bld.ADD(read_offset, read_offset, brw_imm_ud(16));
   }
+   } else if (type_sz(dest.type) == 2) {
+  fs_reg read_offset = bld.vgrf(BRW_REGISTER_TYPE_UD);
+  bld.MOV(read_offset, offset_reg);
+  for (unsigned i = 0; i < num_components; i++) {
+ fs_reg read_reg = emit_byte_scattered_read(bld, surf_index, read_offset,
+1 /* dims */,
+1,
+16 /* bit_size */,
+BRW_PREDICATE_NONE);
+ bld.MOV(offset(dest, bld, i), subscript(read_reg, dest.type, 0));
+ bld.ADD(read_offset, read_offset, brw_imm_ud(type_sz(dest.type)));
+  }
} else {
   unreachable("Unsupported type");
}
@@ -3929,7 +3941,6 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr
   if (const_offset == NULL) {
  fs_reg base_offset = retype(get_nir_src(instr->src[1]),
  BRW_REGISTER_TYPE_UD);
-
  for (int i = 0; i < instr->num_components; i++)
 VARYING_PULL_CONSTANT_LOAD(bld, offset(dest, bld, i), surf_index,
base_offset, i * type_sz(dest.type));
-- 
2.14.3



[Mesa-dev] [PATCH v4 20/44] i965/fs: Add byte scattered read message and fs support

2017-11-29 Thread Jose Maria Casanova Crespo
v2: Fix alignment style (Topi Pohjolainen)
(Jason Ekstrand)
- Enable bit_size parameter to scattered messages to enable different
  bitsizes byte/word/dword.
- Remove use of brw_send_indirect_scattered_message in favor of
  brw_send_indirect_surface_message.
- Move scattered messages to surface messages namespace.
- Assert align1 for scattered messages and assume Gen8+.
- Inline brw_set_dp_byte_scattered_read.
---
 src/intel/compiler/brw_eu.h|  8 +++
 src/intel/compiler/brw_eu_defines.h|  2 ++
 src/intel/compiler/brw_eu_emit.c   | 30 ++
 src/intel/compiler/brw_fs.cpp  | 19 
 src/intel/compiler/brw_fs_copy_propagation.cpp |  2 ++
 src/intel/compiler/brw_fs_generator.cpp|  6 ++
 src/intel/compiler/brw_fs_surface_builder.cpp  | 11 +-
 src/intel/compiler/brw_fs_surface_builder.h|  7 ++
 src/intel/compiler/brw_shader.cpp  |  6 ++
 9 files changed, 90 insertions(+), 1 deletion(-)
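
Note for readers: the message control word built in brw_byte_scattered_read
can be sketched as below. The numeric data-size encodings (byte = 0, word = 1,
dword = 2) are an assumption made for this illustration and are not shown in
this patch; the function names are hypothetical.

```cpp
#include <cassert>
#include <stdexcept>

// Assumed encodings for the data element size field (illustrative only).
unsigned data_size_from_bit_size(unsigned bit_size)
{
   switch (bit_size) {
   case 8:  return 0; // byte
   case 16: return 1; // word
   case 32: return 2; // dword
   default: throw std::invalid_argument("Unsupported bit_size");
   }
}

// Mirrors the patch: the data size goes in bits 2 and up, and bit 0
// selects SIMD16 (1) or SIMD8 (0).
unsigned byte_scattered_msg_control(unsigned bit_size, unsigned exec_size)
{
   unsigned msg_control = data_size_from_bit_size(bit_size) << 2;
   if (exec_size == 16)
      msg_control |= 1; // SIMD16 mode
   return msg_control;
}
```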

diff --git a/src/intel/compiler/brw_eu.h b/src/intel/compiler/brw_eu.h
index 3ac3b4342a..2d0f56f793 100644
--- a/src/intel/compiler/brw_eu.h
+++ b/src/intel/compiler/brw_eu.h
@@ -485,6 +485,14 @@ brw_typed_surface_write(struct brw_codegen *p,
 unsigned msg_length,
 unsigned num_channels);
 
+void
+brw_byte_scattered_read(struct brw_codegen *p,
+struct brw_reg dst,
+struct brw_reg payload,
+struct brw_reg surface,
+unsigned msg_length,
+unsigned bit_size);
+
 void
 brw_byte_scattered_write(struct brw_codegen *p,
  struct brw_reg payload,
diff --git a/src/intel/compiler/brw_eu_defines.h b/src/intel/compiler/brw_eu_defines.h
index de6330ee54..aa510ebfa4 100644
--- a/src/intel/compiler/brw_eu_defines.h
+++ b/src/intel/compiler/brw_eu_defines.h
@@ -409,6 +409,8 @@ enum opcode {
 * opcode, but instead of taking a single payload blog they expect their
 * arguments separately as individual sources, like untyped write/read.
 */
+   SHADER_OPCODE_BYTE_SCATTERED_READ,
+   SHADER_OPCODE_BYTE_SCATTERED_READ_LOGICAL,
SHADER_OPCODE_BYTE_SCATTERED_WRITE,
SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL,
 
diff --git a/src/intel/compiler/brw_eu_emit.c b/src/intel/compiler/brw_eu_emit.c
index ded7e228cf..bdc516848a 100644
--- a/src/intel/compiler/brw_eu_emit.c
+++ b/src/intel/compiler/brw_eu_emit.c
@@ -2998,6 +2998,36 @@ static enum brw_data_size brw_data_size_from_bit_size(unsigned bit_size)
}
 }
 
+
+void
+brw_byte_scattered_read(struct brw_codegen *p,
+struct brw_reg dst,
+struct brw_reg payload,
+struct brw_reg surface,
+unsigned msg_length,
+unsigned bit_size)
+{
+   assert(brw_inst_access_mode(p->devinfo, p->current) == BRW_ALIGN_1);
+   const struct gen_device_info *devinfo = p->devinfo;
+   const unsigned sfid =  GEN7_SFID_DATAPORT_DATA_CACHE;
+
+   struct brw_inst *insn = brw_send_indirect_surface_message(
+  p, sfid, dst, payload, surface, msg_length,
+  brw_surface_payload_size(p, 1, true, true),
+  false);
+
+   unsigned msg_control = brw_data_size_from_bit_size(bit_size) << 2;
+
+   if (brw_inst_exec_size(devinfo, p->current) == BRW_EXECUTE_16)
+  msg_control |= 1; /* SIMD16 mode */
+   else
+  msg_control |= 0; /* SIMD8 mode */
+
+   brw_inst_set_dp_msg_type(devinfo, insn,
+HSW_DATAPORT_DC_PORT0_BYTE_SCATTERED_READ);
+   brw_inst_set_dp_msg_control(devinfo, insn, msg_control);
+}
+
 void
 brw_byte_scattered_write(struct brw_codegen *p,
  struct brw_reg payload,
diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index 32f1d757f0..1ca4d416b2 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -251,6 +251,7 @@ fs_inst::is_send_from_grf() const
case SHADER_OPCODE_UNTYPED_SURFACE_READ:
case SHADER_OPCODE_UNTYPED_SURFACE_WRITE:
case SHADER_OPCODE_BYTE_SCATTERED_WRITE:
+   case SHADER_OPCODE_BYTE_SCATTERED_READ:
case SHADER_OPCODE_TYPED_ATOMIC:
case SHADER_OPCODE_TYPED_SURFACE_READ:
case SHADER_OPCODE_TYPED_SURFACE_WRITE:
@@ -750,6 +751,16 @@ fs_inst::components_read(unsigned i) const
   else
  return 1;
 
+   case SHADER_OPCODE_BYTE_SCATTERED_READ_LOGICAL:
+  assert(src[3].file == IMM &&
+ src[4].file == IMM);
+  if (i == 0)
+ return 1;
+  else if (i == 1)
+ return 0;
+  else
+ return 1;
+
case SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL:
   assert(src[3].file == IMM &&
  src[4].file == IMM);
@@ -798,6 +809,7 @@ fs_inst::size_read(int arg) const
case SHADER_OPCODE_TYPED_SURFACE_WRITE:
case 

[Mesa-dev] [PATCH v4 19/44] i965/fs: Use byte_scattered_write on 16-bit store_ssbo

2017-11-29 Thread Jose Maria Casanova Crespo
From: Alejandro Piñeiro 

We need to rely on byte scattered writes because untyped writes have a
fixed 32-bit size. We could try to keep using 32-bit messages when we have
two or four 16-bit elements, but for simplicity's sake we use the same
message for any number of components. We revisit this approach in the
following patches.

v2: Removed use of stride = 2 on 16-bit sources (Jason Ekstrand)

v3: (Jason Ekstrand)
- Include bit_size in the scattered write message and remove the
  namespace specific to scattered messages.
- Move comment to proper place.
- Squashed with i965/fs: Adjust type_size/type_slots on store_ssbo.
(Jose Maria Casanova)
- Take into account that get_nir_src returns now WORD types for
  16-bit sources instead of DWORD.

Signed-off-by: Jose Maria Casanova Crespo 
Signed-off-by: Alejandro Piñeiro 
---
 src/intel/compiler/brw_fs_nir.cpp | 51 ---
 1 file changed, 37 insertions(+), 14 deletions(-)
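
Note for readers: the way the writemask is split into write messages can be
sketched in plain C++ as below. It is a simplified model of the loop in
nir_emit_intrinsic, assuming the written bits are cleared from the writemask
after each message. WriteChunk, find_first_set and split_writes are
hypothetical names used only for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct WriteChunk {
   unsigned first_component;
   unsigned length;
};

// 1-based index of the lowest set bit, 0 if none (same contract as ffs()).
static unsigned find_first_set(unsigned v)
{
   if (v == 0)
      return 0;
   unsigned pos = 1;
   while ((v & 1u) == 0) {
      v >>= 1;
      pos++;
   }
   return pos;
}

// Split a component writemask into runs of consecutive enabled channels.
// 64-bit types are limited to 2 components per message, and 16-bit types
// to 1, since a byte scattered write handles one component per call.
std::vector<WriteChunk> split_writes(unsigned writemask, unsigned type_size)
{
   std::vector<WriteChunk> chunks;
   while (writemask) {
      unsigned first = find_first_set(writemask) - 1;
      unsigned length = find_first_set(~(writemask >> first)) - 1;
      if (type_size > 4)
         length = std::min(2u, length);
      else if (type_size == 2)
         length = 1;
      chunks.push_back({first, length});
      // Clear the bits that were just written.
      writemask &= ~(((1u << length) - 1) << first);
   }
   return chunks;
}
```

For a writemask of 0b1011, 32-bit data yields two messages (components 0-1,
then 3), while 16-bit data yields three single-component messages.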

diff --git a/src/intel/compiler/brw_fs_nir.cpp b/src/intel/compiler/brw_fs_nir.cpp
index d6ab286147..ff04e2468b 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -4075,14 +4075,15 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr
* Also, we have to suffle 64-bit data to be in the appropriate layout
* expected by our 32-bit write messages.
*/
-  unsigned type_size = 4;
-  if (nir_src_bit_size(instr->src[0]) == 64) {
- type_size = 8;
+  unsigned bit_size = nir_src_bit_size(instr->src[0]);
+  unsigned type_size = bit_size / 8;
+  if (bit_size == 64) {
  val_reg = shuffle_64bit_data_for_32bit_write(bld,
 val_reg, instr->num_components);
   }
 
-  unsigned type_slots = type_size / 4;
+  /* 16-bit types would use a minimum of 1 slot */
+  unsigned type_slots = MAX2(type_size / 4, 1);
 
   /* Combine groups of consecutive enabled channels in one write
* message. We use ffs to find the first enabled channel and then ffs on
@@ -4093,12 +4094,19 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr
  unsigned first_component = ffs(writemask) - 1;
  unsigned length = ffs(~(writemask >> first_component)) - 1;
 
- /* We can't write more than 2 64-bit components at once. Limit the
-  * length of the write to what we can do and let the next iteration
-  * handle the rest
-  */
- if (type_size > 4)
+ if (type_size > 4) {
+/* We can't write more than 2 64-bit components at once. Limit
+ * the length of the write to what we can do and let the next
+ * iteration handle the rest.
+ */
 length = MIN2(2, length);
+ } else if (type_size == 2) {
+/* For 16-bit types we are using byte scattered writes, which can
+ * only write one component per call. So we limit the length and
+ * let the write happen over several iterations.
+ */
+length = 1;
+ }
 
  fs_reg offset_reg;
  nir_const_value *const_offset = nir_src_as_const_value(instr->src[2]);
@@ -4112,11 +4120,26 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr
 brw_imm_ud(type_size * first_component));
  }
 
-
- emit_untyped_write(bld, surf_index, offset_reg,
-offset(val_reg, bld, first_component * type_slots),
-1 /* dims */, length * type_slots,
-BRW_PREDICATE_NONE);
+ if (type_size == 2) {
+/* Untyped Surface messages have a fixed 32-bit size, so we need
+ * to rely on byte scattered writes in order to write 16-bit elements.
+ * The byte_scattered_write message needs every written 16-bit
+ * value to be aligned to 32 bits (stride=2).
+ */
+fs_reg tmp = bld.vgrf(BRW_REGISTER_TYPE_D);
+bld.MOV(subscript(tmp, BRW_REGISTER_TYPE_W, 0),
+ offset(val_reg, bld, first_component));
+emit_byte_scattered_write(bld, surf_index, offset_reg,
+  tmp,
+  1 /* dims */, 1,
+  bit_size,
+  BRW_PREDICATE_NONE);
+ } else {
+emit_untyped_write(bld, surf_index, offset_reg,
+   offset(val_reg, bld, first_component * type_slots),
+   1 /* dims */, length * type_slots,
+   BRW_PREDICATE_NONE);
+ }
 
  /* Clear the bits in the writemask that we just wrote, then try
   * again to see if more channels are left.
-- 
2.14.3


[Mesa-dev] [PATCH v4 11/44] i965: Support for 16-bit base types in helper functions

2017-11-29 Thread Jose Maria Casanova Crespo
v2: Fixed calculation of scalar size for 16-bit types. (Jason Ekstrand)

Signed-off-by: Jose Maria Casanova Crespo 
Signed-off-by: Eduardo Lima 
Reviewed-by: Jason Ekstrand 
---
 src/intel/compiler/brw_fs.cpp |  4 
 src/intel/compiler/brw_nir.c  | 16 
 src/intel/compiler/brw_shader.cpp |  6 ++
 3 files changed, 26 insertions(+)
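
Note for readers: the scalar size calculation for 16-bit types rounds up to
whole 32-bit slots, since two 16-bit components share one slot. A minimal
sketch, assuming only the 16-bit and 32-bit cases shown in this patch
(scalar_slots and div_round_up are hypothetical names):

```cpp
#include <cassert>

constexpr unsigned div_round_up(unsigned n, unsigned d)
{
   return (n + d - 1) / d;
}

// Number of 32-bit slots used by a vector with the given component count.
// Only the 16-bit and 32-bit cases are modelled; anything else returns 0.
constexpr unsigned scalar_slots(unsigned components, unsigned bit_size)
{
   return bit_size == 16 ? div_round_up(components, 2)
        : bit_size == 32 ? components
        : 0;
}
```

So a 16-bit vec3 occupies two slots, the same as a 16-bit vec4.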

diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index 6772c0d5a5..6cdd2bd9f3 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -454,6 +454,10 @@ type_size_scalar(const struct glsl_type *type)
case GLSL_TYPE_FLOAT:
case GLSL_TYPE_BOOL:
   return type->components();
+   case GLSL_TYPE_UINT16:
+   case GLSL_TYPE_INT16:
+   case GLSL_TYPE_FLOAT16:
+  return DIV_ROUND_UP(type->components(), 2);
case GLSL_TYPE_DOUBLE:
case GLSL_TYPE_UINT64:
case GLSL_TYPE_INT64:
diff --git a/src/intel/compiler/brw_nir.c b/src/intel/compiler/brw_nir.c
index 8f3f77f89a..cca4b45ae6 100644
--- a/src/intel/compiler/brw_nir.c
+++ b/src/intel/compiler/brw_nir.c
@@ -843,12 +843,18 @@ brw_type_for_nir_type(const struct gen_device_info *devinfo, nir_alu_type type)
case nir_type_float:
case nir_type_float32:
   return BRW_REGISTER_TYPE_F;
+   case nir_type_float16:
+  return BRW_REGISTER_TYPE_HF;
case nir_type_float64:
   return BRW_REGISTER_TYPE_DF;
case nir_type_int64:
   return devinfo->gen < 8 ? BRW_REGISTER_TYPE_DF : BRW_REGISTER_TYPE_Q;
case nir_type_uint64:
   return devinfo->gen < 8 ? BRW_REGISTER_TYPE_DF : BRW_REGISTER_TYPE_UQ;
+   case nir_type_int16:
+  return BRW_REGISTER_TYPE_W;
+   case nir_type_uint16:
+  return BRW_REGISTER_TYPE_UW;
default:
   unreachable("unknown type");
}
@@ -867,6 +873,9 @@ brw_glsl_base_type_for_nir_type(nir_alu_type type)
case nir_type_float32:
   return GLSL_TYPE_FLOAT;
 
+   case nir_type_float16:
+  return GLSL_TYPE_FLOAT16;
+
case nir_type_float64:
   return GLSL_TYPE_DOUBLE;
 
@@ -878,6 +887,13 @@ brw_glsl_base_type_for_nir_type(nir_alu_type type)
case nir_type_uint32:
   return GLSL_TYPE_UINT;
 
+   case nir_type_int16:
+  return GLSL_TYPE_INT16;
+
+   case nir_type_uint16:
+  return GLSL_TYPE_UINT16;
+
+
default:
   unreachable("bad type");
}
diff --git a/src/intel/compiler/brw_shader.cpp b/src/intel/compiler/brw_shader.cpp
index ba61481a0a..aa9e5f3d28 100644
--- a/src/intel/compiler/brw_shader.cpp
+++ b/src/intel/compiler/brw_shader.cpp
@@ -34,14 +34,20 @@ enum brw_reg_type
 brw_type_for_base_type(const struct glsl_type *type)
 {
switch (type->base_type) {
+   case GLSL_TYPE_FLOAT16:
+  return BRW_REGISTER_TYPE_HF;
case GLSL_TYPE_FLOAT:
   return BRW_REGISTER_TYPE_F;
case GLSL_TYPE_INT:
case GLSL_TYPE_BOOL:
case GLSL_TYPE_SUBROUTINE:
   return BRW_REGISTER_TYPE_D;
+   case GLSL_TYPE_INT16:
+  return BRW_REGISTER_TYPE_W;
case GLSL_TYPE_UINT:
   return BRW_REGISTER_TYPE_UD;
+   case GLSL_TYPE_UINT16:
+  return BRW_REGISTER_TYPE_UW;
case GLSL_TYPE_ARRAY:
   return brw_type_for_base_type(type->fields.array);
case GLSL_TYPE_STRUCT:
-- 
2.14.3



[Mesa-dev] [PATCH v4 09/44] spirv/nir: Add support for SPV_KHR_16bit_storage

2017-11-29 Thread Jose Maria Casanova Crespo
From: Eduardo Lima Mitev 

v2: Minor changes after rebase against recent master (Alejandro
Piñeiro)

Reviewed-by: Jason Ekstrand 
---
 src/compiler/spirv/nir_spirv.h| 1 +
 src/compiler/spirv/spirv_to_nir.c | 7 +++
 2 files changed, 8 insertions(+)

diff --git a/src/compiler/spirv/nir_spirv.h b/src/compiler/spirv/nir_spirv.h
index 83577fb5d2..be7f536fe4 100644
--- a/src/compiler/spirv/nir_spirv.h
+++ b/src/compiler/spirv/nir_spirv.h
@@ -52,6 +52,7 @@ struct nir_spirv_supported_extensions {
bool int64;
bool multiview;
bool variable_pointers;
+   bool storage_16bit;
 };
 
 nir_function *spirv_to_nir(const uint32_t *words, size_t word_count,
diff --git a/src/compiler/spirv/spirv_to_nir.c b/src/compiler/spirv/spirv_to_nir.c
index f745373473..d9b1400778 100644
--- a/src/compiler/spirv/spirv_to_nir.c
+++ b/src/compiler/spirv/spirv_to_nir.c
@@ -2865,6 +2865,13 @@ vtn_handle_preamble_instruction(struct vtn_builder *b, SpvOp opcode,
  spv_check_supported(variable_pointers, cap);
  break;
 
+  case SpvCapabilityStorageUniformBufferBlock16:
+  case SpvCapabilityStorageUniform16:
+  case SpvCapabilityStoragePushConstant16:
+  case SpvCapabilityStorageInputOutput16:
+ spv_check_supported(storage_16bit, cap);
+ break;
+
   default:
  unreachable("Unhandled capability");
   }
-- 
2.14.3



[Mesa-dev] [PATCH v4 07/44] spirv/nir: Handle 16-bit types

2017-11-29 Thread Jose Maria Casanova Crespo
From: Eduardo Lima Mitev 

v2: Added more missing implementations of 16-bit types. (Jason Ekstrand)

v3: Store values in values[0].u16[i] (Jason Ekstrand)
Include switches based on bitsize for 16-bit types
(Chema Casanova)

Signed-off-by: Jose Maria Casanova Crespo 
Signed-off-by: Eduardo Lima 
---
 src/compiler/spirv/spirv_to_nir.c  | 111 +++--
 src/compiler/spirv/vtn_variables.c |  21 +++
 2 files changed, 115 insertions(+), 17 deletions(-)

diff --git a/src/compiler/spirv/spirv_to_nir.c b/src/compiler/spirv/spirv_to_nir.c
index 027efab88d..f745373473 100644
--- a/src/compiler/spirv/spirv_to_nir.c
+++ b/src/compiler/spirv/spirv_to_nir.c
@@ -104,10 +104,13 @@ vtn_const_ssa_value(struct vtn_builder *b, nir_constant *constant,
switch (glsl_get_base_type(type)) {
case GLSL_TYPE_INT:
case GLSL_TYPE_UINT:
+   case GLSL_TYPE_INT16:
+   case GLSL_TYPE_UINT16:
case GLSL_TYPE_INT64:
case GLSL_TYPE_UINT64:
case GLSL_TYPE_BOOL:
case GLSL_TYPE_FLOAT:
+   case GLSL_TYPE_FLOAT16:
case GLSL_TYPE_DOUBLE: {
   int bit_size = glsl_get_bit_size(type);
   if (glsl_type_is_vector_or_scalar(type)) {
@@ -751,16 +754,38 @@ vtn_handle_type(struct vtn_builder *b, SpvOp opcode,
   int bit_size = w[2];
   const bool signedness = w[3];
   val->type->base_type = vtn_base_type_scalar;
-  if (bit_size == 64)
+  switch (bit_size) {
+  case 64:
  val->type->type = (signedness ? glsl_int64_t_type() : glsl_uint64_t_type());
-  else
+ break;
+  case 32:
  val->type->type = (signedness ? glsl_int_type() : glsl_uint_type());
+ break;
+  case 16:
+ val->type->type = (signedness ? glsl_int16_t_type() : glsl_uint16_t_type());
+ break;
+  default:
+ unreachable("Invalid int bit size");
+  }
   break;
}
+
case SpvOpTypeFloat: {
   int bit_size = w[2];
   val->type->base_type = vtn_base_type_scalar;
-  val->type->type = bit_size == 64 ? glsl_double_type() : glsl_float_type();
+  switch (bit_size) {
+  case 16:
+ val->type->type = glsl_float16_t_type();
+ break;
+  case 32:
+ val->type->type = glsl_float_type();
+ break;
+  case 64:
+ val->type->type = glsl_double_type();
+ break;
+  default:
+ assert(!"Invalid float bit size");
+  }
   break;
}
 
@@ -980,10 +1005,13 @@ vtn_null_constant(struct vtn_builder *b, const struct glsl_type *type)
switch (glsl_get_base_type(type)) {
case GLSL_TYPE_INT:
case GLSL_TYPE_UINT:
+   case GLSL_TYPE_INT16:
+   case GLSL_TYPE_UINT16:
case GLSL_TYPE_INT64:
case GLSL_TYPE_UINT64:
case GLSL_TYPE_BOOL:
case GLSL_TYPE_FLOAT:
+   case GLSL_TYPE_FLOAT16:
case GLSL_TYPE_DOUBLE:
   /* Nothing to do here.  It's already initialized to zero */
   break;
@@ -1106,12 +1134,20 @@ vtn_handle_constant(struct vtn_builder *b, SpvOp opcode,
case SpvOpConstant: {
   assert(glsl_type_is_scalar(val->const_type));
   int bit_size = glsl_get_bit_size(val->const_type);
-  if (bit_size == 64) {
+  switch (bit_size) {
+  case 64: {
  val->constant->values->u32[0] = w[3];
  val->constant->values->u32[1] = w[4];
-  } else {
- assert(bit_size == 32);
+ break;
+  }
+  case 32:
  val->constant->values->u32[0] = w[3];
+ break;
+  case 16:
+ val->constant->values->u16[0] = w[3];
+ break;
+  default:
+ unreachable("Unsupported SpvOpConstant bit size");
   }
   break;
}
@@ -1119,11 +1155,21 @@ vtn_handle_constant(struct vtn_builder *b, SpvOp opcode,
   assert(glsl_type_is_scalar(val->const_type));
   val->constant->values[0].u32[0] = get_specialization(b, val, w[3]);
   int bit_size = glsl_get_bit_size(val->const_type);
-  if (bit_size == 64)
+  switch (bit_size) {
+  case 64:{
  val->constant->values[0].u64[0] =
 get_specialization64(b, val, vtn_u64_literal(&w[3]));
-  else
+ break;
+  }
+  case 32:
  val->constant->values[0].u32[0] = get_specialization(b, val, w[3]);
+ break;
+  case 16:
+ val->constant->values[0].u16[0] = get_specialization(b, val, w[3]);
+ break;
+  default:
+ unreachable("Unsupported SpvOpSpecConstant bit size");
+  }
   break;
}
case SpvOpSpecConstantComposite:
@@ -1136,9 +1182,12 @@ vtn_handle_constant(struct vtn_builder *b, SpvOp opcode,
   switch (glsl_get_base_type(val->const_type)) {
   case GLSL_TYPE_UINT:
   case GLSL_TYPE_INT:
+  case GLSL_TYPE_UINT16:
+  case GLSL_TYPE_INT16:
   case GLSL_TYPE_UINT64:
   case GLSL_TYPE_INT64:
   case GLSL_TYPE_FLOAT:
+  case GLSL_TYPE_FLOAT16:
   case GLSL_TYPE_BOOL:
   case GLSL_TYPE_DOUBLE: {

[Mesa-dev] [PATCH v4 13/44] i965/fs: Handle 32-bit to 16-bit conversions

2017-11-29 Thread Jose Maria Casanova Crespo
From: Alejandro Piñeiro 

Conversions to 16-bit need the 16-bit and 32-bit types to be aligned.
So the conversion first writes the 16-bit result with stride=2, matching
the 32-bit source layout, and a second MOV then packs it into the
destination.

v2 (Jason Ekstrand):
  - Avoid the general use of stride=2 for 16-bit register types.

v3 (Topi Pohjolainen)
  - Code style fix
   (Jason Ekstrand)
  - Now nir_op_f2f16 was renamed to nir_op_f2f16_undef
because conversion to f16 with undefined rounding is explicit

Signed-off-by: Eduardo Lima 
Signed-off-by: Alejandro Piñeiro 
Signed-off-by: Jose Maria Casanova Crespo 
Reviewed-by: Jason Ekstrand 
---
 src/intel/compiler/brw_fs_nir.cpp | 25 +
 1 file changed, 25 insertions(+)
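
Note for readers: the two-MOV sequence can be pictured with the scalar model
below, using an integer conversion (as in u2u16) so that the result is easy to
check. It is a sketch only; convert_u32_to_u16 is a hypothetical name, and each
vector element stands for one SIMD channel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-channel model of the conversion in nir_emit_alu.
std::vector<uint16_t> convert_u32_to_u16(const std::vector<uint32_t> &src)
{
   // First MOV: the temporary keeps one 32-bit slot per channel, and the
   // converted value is written to the low 16 bits of each slot, which
   // is a stride of 2 when counted in 16-bit units.
   std::vector<uint32_t> tmp(src.size(), 0);
   for (std::size_t i = 0; i < src.size(); i++)
      tmp[i] = static_cast<uint16_t>(src[i]);

   // Second MOV: read the strided 16-bit values and pack them tightly
   // into the destination.
   std::vector<uint16_t> result(src.size());
   for (std::size_t i = 0; i < src.size(); i++)
      result[i] = static_cast<uint16_t>(tmp[i] & 0xffffu);
   return result;
}
```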

diff --git a/src/intel/compiler/brw_fs_nir.cpp b/src/intel/compiler/brw_fs_nir.cpp
index bed1cd3b49..ddc0c6d105 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -724,6 +724,31 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, nir_alu_instr *instr)
   inst->saturate = instr->dest.saturate;
   break;
 
+  /* In theory, it would be better to use BRW_OPCODE_F32TO16. Depending
+   * on the HW gen, it is a special hw opcode or just a MOV, and
+   * brw_F32TO16 (at brw_eu_emit) would do the work to choose.
+   *
+   * But if we want to use that opcode, we need to provide support on
+   * different optimizations and lowerings. As right now HF support is
+   * only for gen8+, it will be better to use the MOV directly, and use
+   * BRW_OPCODE_F32TO16 when/if we work on HF support for gen7.
+   */
+
+   case nir_op_f2f16_undef:
+   case nir_op_i2i16:
+   case nir_op_u2u16: {
+  /* TODO: Fixing alignment rules for conversions from 32-bit to
+   * 16-bit types should be moved to lower_conversions
+   */
+  fs_reg tmp = bld.vgrf(op[0].type, 1);
+  tmp = subscript(tmp, result.type, 0);
+  inst = bld.MOV(tmp, op[0]);
+  inst->saturate = instr->dest.saturate;
+  inst = bld.MOV(result, tmp);
+  inst->saturate = instr->dest.saturate;
+  break;
+   }
+
case nir_op_f2f64:
case nir_op_f2i64:
case nir_op_f2u64:
-- 
2.14.3



[Mesa-dev] [PATCH v4 08/44] spirv: Enable FPRoundingMode decorator to nir operations

2017-11-29 Thread Jose Maria Casanova Crespo
SpvOpFConvert now handles the FPRoundingMode decoration on the result
value, passing the corresponding nir_rounding_mode to the conversion
operation to fp16 values.

v2: Fixed breaking of specialization constants. (Jason Ekstrand)

v3: Avoid nir_rounding_mode * casting. (Jason Ekstrand)

Reviewed-by: Jason Ekstrand 
---
 src/compiler/spirv/vtn_alu.c | 33 +
 1 file changed, 33 insertions(+)
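
Note for readers: the decoration handling amounts to a small mapping, sketched
below with local stand-in types. The enums and the Decoration struct are
hypothetical simplifications for this illustration and are not the SPIR-V or
NIR definitions; the sketch throws where the patch uses unreachable().

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Local stand-ins, not the real SPIR-V or NIR types.
enum class DecorationKind { FPRoundingMode, Other };
enum class SpvRounding : uint32_t { RTE = 0, RTZ = 1, RTP = 2, RTN = 3 };
enum class NirRounding { Undef, Rtne, Rtz };

struct Decoration {
   DecorationKind kind;
   SpvRounding literal;
};

// Walk the decorations of a value and pick the rounding mode for the
// conversion. Without an FPRoundingMode decoration the mode stays
// undefined, as in the patch.
NirRounding rounding_mode_for(const std::vector<Decoration> &decorations)
{
   NirRounding mode = NirRounding::Undef;
   for (const Decoration &dec : decorations) {
      if (dec.kind != DecorationKind::FPRoundingMode)
         continue;
      switch (dec.literal) {
      case SpvRounding::RTE:
         mode = NirRounding::Rtne;
         break;
      case SpvRounding::RTZ:
         mode = NirRounding::Rtz;
         break;
      default:
         throw std::invalid_argument("Not supported rounding mode");
      }
   }
   return mode;
}
```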

diff --git a/src/compiler/spirv/vtn_alu.c b/src/compiler/spirv/vtn_alu.c
index 7ec30b8a63..8e5af348a6 100644
--- a/src/compiler/spirv/vtn_alu.c
+++ b/src/compiler/spirv/vtn_alu.c
@@ -381,6 +381,27 @@ handle_no_contraction(struct vtn_builder *b, struct vtn_value *val, int member,
b->nb.exact = true;
 }
 
+static void
+handle_rounding_mode(struct vtn_builder *b, struct vtn_value *val, int member,
+ const struct vtn_decoration *dec, void *_out_rounding_mode)
+{
+   nir_rounding_mode *out_rounding_mode = _out_rounding_mode;
+   assert(dec->scope == VTN_DEC_DECORATION);
+   if (dec->decoration != SpvDecorationFPRoundingMode)
+  return;
+   switch (dec->literals[0]) {
+   case SpvFPRoundingModeRTE:
+  *out_rounding_mode = nir_rounding_mode_rtne;
+  break;
+   case SpvFPRoundingModeRTZ:
+  *out_rounding_mode = nir_rounding_mode_rtz;
+  break;
+   default:
+  unreachable("Not supported rounding mode");
+  break;
+   }
+}
+
 void
 vtn_handle_alu(struct vtn_builder *b, SpvOp opcode,
const uint32_t *w, unsigned count)
@@ -568,6 +589,18 @@ vtn_handle_alu(struct vtn_builder *b, SpvOp opcode,
   vtn_handle_bitcast(b, val->ssa, src[0]);
   break;
 
+   case SpvOpFConvert: {
+  nir_alu_type src_alu_type = nir_get_nir_type_for_glsl_type(vtn_src[0]->type);
+  nir_alu_type dst_alu_type = nir_get_nir_type_for_glsl_type(type);
+  nir_rounding_mode rounding_mode = nir_rounding_mode_undef;
+
+  vtn_foreach_decoration(b, val, handle_rounding_mode, &rounding_mode);
+  nir_op op = nir_type_conversion_op(src_alu_type, dst_alu_type, rounding_mode);
+
+  val->ssa->def = nir_build_alu(&b->nb, op, src[0], src[1], NULL, NULL);
+  break;
+   }
+
default: {
   bool swap;
   nir_alu_type src_alu_type = nir_get_nir_type_for_glsl_type(vtn_src[0]->type);
-- 
2.14.3



[Mesa-dev] [PATCH v4 17/44] i965/fs: Add remove_extra_rounding_modes optimization

2017-11-29 Thread Jose Maria Casanova Crespo
From: Alejandro Piñeiro 

Although from the SPIR-V point of view rounding modes are attached to the
operation/destination, on i965 the rounding mode is a state, so we don't
need to explicitly set the rounding mode if the one we want is already set.

Taking into account that the default mode is RTE, one possible
optimization would be to optimize out the first RTE set for each
block. For that to work, we would need to take into account block
interrelationships. At this point, it is not worth complicating the
optimization for such a small gain.

v2: Use a single SHADER_OPCODE_RND_MODE opcode taking an immediate
with the rounding mode (Curro)
v3: Reset optimization for every block. (Jason Ekstrand)

Signed-off-by: Jose Maria Casanova Crespo 
Signed-off-by: Alejandro Piñeiro 
Reviewed-by: Jason Ekstrand 
---
 src/intel/compiler/brw_eu_defines.h |  1 +
 src/intel/compiler/brw_fs.cpp   | 37 +
 src/intel/compiler/brw_fs.h |  1 +
 3 files changed, 39 insertions(+)
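
Note for readers: the pass can be sketched on a toy IR as below. The Opcode,
Inst and Block types are hypothetical stand-ins for the real fs_inst and CFG
classes, and kUnspecified plays the role of BRW_RND_MODE_UNSPECIFIED. As in the
patch, the tracked mode is reset at the start of every block.

```cpp
#include <cassert>
#include <vector>

// Toy IR, for illustration only.
enum class Opcode { RndMode, Mov };

struct Inst {
   Opcode opcode;
   int imm; // rounding mode for RndMode, unused otherwise
};

using Block = std::vector<Inst>;

constexpr int kUnspecified = -1;

// Remove rounding mode sets that repeat the mode already in effect
// within the same block. Returns true if anything was removed.
bool remove_extra_rounding_modes(std::vector<Block> &cfg)
{
   bool progress = false;
   for (Block &block : cfg) {
      int prev_mode = kUnspecified;
      for (auto it = block.begin(); it != block.end();) {
         if (it->opcode == Opcode::RndMode) {
            if (it->imm == prev_mode) {
               it = block.erase(it);
               progress = true;
               continue;
            }
            prev_mode = it->imm;
         }
         ++it;
      }
   }
   return progress;
}
```

Because the state is not carried across blocks, the first rounding mode set in
each block is always kept, even when a predecessor already set the same mode.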

diff --git a/src/intel/compiler/brw_eu_defines.h b/src/intel/compiler/brw_eu_defines.h
index 8a8f36cbc1..9d5cf05c86 100644
--- a/src/intel/compiler/brw_eu_defines.h
+++ b/src/intel/compiler/brw_eu_defines.h
@@ -1252,6 +1252,7 @@ enum PACKED brw_rnd_mode {
BRW_RND_MODE_RU = 1,/* Round Up, toward +inf */
BRW_RND_MODE_RD = 2,/* Round Down, toward -inf */
BRW_RND_MODE_RTZ = 3,   /* Round Toward Zero */
+   BRW_RND_MODE_UNSPECIFIED,  /* Unspecified rounding mode */
 };
 
 #endif /* BRW_EU_DEFINES_H */
diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index 6cdd2bd9f3..36fb337c62 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -3092,6 +3092,42 @@ fs_visitor::remove_duplicate_mrf_writes()
return progress;
 }
 
+/**
+ * Rounding modes for conversion instructions are included for each
+ * conversion, but on the hardware the rounding mode is a state. So once
+ * it is set, we don't need to set it again for subsequent conversions
+ * that use the same mode.
+ *
+ * This is useful for vector/matrix conversions, as setting the
+ * mode once is enough for the full vector/matrix.
+ */
+bool
+fs_visitor::remove_extra_rounding_modes()
+{
+   bool progress = false;
+
+   foreach_block (block, cfg) {
+  brw_rnd_mode prev_mode = BRW_RND_MODE_UNSPECIFIED;
+
+  foreach_inst_in_block_safe (fs_inst, inst, block) {
+ if (inst->opcode == SHADER_OPCODE_RND_MODE) {
+assert(inst->src[0].file == BRW_IMMEDIATE_VALUE);
+const brw_rnd_mode mode = (brw_rnd_mode) inst->src[0].d;
+if (mode == prev_mode) {
+   inst->remove(block);
+   progress = true;
+} else {
+   prev_mode = mode;
+}
+ }
+  }
+   }
+
+   if (progress)
+  invalidate_live_intervals();
+
+   return progress;
+}
+
 static void
 clear_deps_for_inst_src(fs_inst *inst, bool *deps, int first_grf, int grf_len)
 {
@@ -5808,6 +5844,7 @@ fs_visitor::optimize()
int pass_num = 0;
 
OPT(opt_drop_redundant_mov_to_flags);
+   OPT(remove_extra_rounding_modes);
 
do {
   progress = false;
diff --git a/src/intel/compiler/brw_fs.h b/src/intel/compiler/brw_fs.h
index 40dd83f45e..19b897e7a9 100644
--- a/src/intel/compiler/brw_fs.h
+++ b/src/intel/compiler/brw_fs.h
@@ -150,6 +150,7 @@ public:
bool eliminate_find_live_channel();
bool dead_code_eliminate();
bool remove_duplicate_mrf_writes();
+   bool remove_extra_rounding_modes();
 
bool opt_sampler_eot();
bool virtual_grf_interferes(int a, int b);
-- 
2.14.3



[Mesa-dev] [PATCH v4 12/44] i965/fs: Remove BRW_REGISTER_TYPE_HF assert at get_exec_type

2017-11-29 Thread Jose Maria Casanova Crespo
From: Alejandro Piñeiro 

Note that we don't remove the assert at i965/vec4. At this point half
float support is only for the scalar backend.

Reviewed-by: Jason Ekstrand 
---
 src/intel/compiler/brw_ir_fs.h | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/src/intel/compiler/brw_ir_fs.h b/src/intel/compiler/brw_ir_fs.h
index 0e7c55bcc0..cd603630a4 100644
--- a/src/intel/compiler/brw_ir_fs.h
+++ b/src/intel/compiler/brw_ir_fs.h
@@ -465,9 +465,6 @@ get_exec_type(const fs_inst *inst)
if (exec_type == BRW_REGISTER_TYPE_B)
   exec_type = inst->dst.type;
 
-   /* TODO: We need to handle half-float conversions. */
-   assert(exec_type != BRW_REGISTER_TYPE_HF ||
-  inst->dst.type == BRW_REGISTER_TYPE_HF);
assert(exec_type != BRW_REGISTER_TYPE_B);
 
return exec_type;
-- 
2.14.3



[Mesa-dev] [PATCH v4 15/44] i965/fs: Define new shader opcode to set rounding modes

2017-11-29 Thread Jose Maria Casanova Crespo
From: Alejandro Piñeiro 

Although it is possible to emit them directly as AND/OR on brw_fs_nir,
having a specific opcode makes it easier to remove duplicate settings
later.

v2: (Curro)
  - Set thread control to 'switch' when using the control register
  - Use a single SHADER_OPCODE_RND_MODE opcode taking an immediate
with the rounding mode.
  - Avoid magic numbers setting rounding mode field at control register.
v3: (Curro)
  - Remove redundant and add missing whitespace lines.
  - Match printing instruction to IR opcode "rnd_mode"

v4: (Topi Pohjolainen)
  - Fix code style.

Signed-off-by:  Alejandro Piñeiro 
Signed-off-by:  Jose Maria Casanova Crespo 
Reviewed-by: Francisco Jerez 
Reviewed-by: Jason Ekstrand 
---
 src/intel/compiler/brw_eu.h |  4 
 src/intel/compiler/brw_eu_defines.h | 16 
 src/intel/compiler/brw_eu_emit.c| 33 +
 src/intel/compiler/brw_fs_generator.cpp |  5 +
 src/intel/compiler/brw_shader.cpp   |  4 
 5 files changed, 62 insertions(+)

diff --git a/src/intel/compiler/brw_eu.h b/src/intel/compiler/brw_eu.h
index b5a206b3f1..343dcd867d 100644
--- a/src/intel/compiler/brw_eu.h
+++ b/src/intel/compiler/brw_eu.h
@@ -510,6 +510,10 @@ brw_broadcast(struct brw_codegen *p,
   struct brw_reg src,
   struct brw_reg idx);
 
+void
+brw_rounding_mode(struct brw_codegen *p,
+  enum brw_rnd_mode mode);
+
 /***
  * brw_eu_util.c:
  */
diff --git a/src/intel/compiler/brw_eu_defines.h b/src/intel/compiler/brw_eu_defines.h
index 291dd361a2..8a8f36cbc1 100644
--- a/src/intel/compiler/brw_eu_defines.h
+++ b/src/intel/compiler/brw_eu_defines.h
@@ -400,6 +400,8 @@ enum opcode {
SHADER_OPCODE_TYPED_SURFACE_WRITE,
SHADER_OPCODE_TYPED_SURFACE_WRITE_LOGICAL,
 
+   SHADER_OPCODE_RND_MODE,
+
SHADER_OPCODE_MEMORY_FENCE,
 
SHADER_OPCODE_GEN4_SCRATCH_READ,
@@ -1238,4 +1240,18 @@ enum brw_message_target {
 /* R0 */
 # define GEN7_GS_PAYLOAD_INSTANCE_ID_SHIFT 27
 
+/* CR0.0[5:4] Floating-Point Rounding Modes
+ *  Skylake PRM, Volume 7 Part 1, "Control Register", page 756
+ */
+
+#define BRW_CR0_RND_MODE_MASK 0x30
+#define BRW_CR0_RND_MODE_SHIFT 4
+
+enum PACKED brw_rnd_mode {
+   BRW_RND_MODE_RTNE = 0,  /* Round to Nearest or Even */
+   BRW_RND_MODE_RU = 1,    /* Round Up, toward +inf */
+   BRW_RND_MODE_RD = 2,    /* Round Down, toward -inf */
+   BRW_RND_MODE_RTZ = 3,   /* Round Toward Zero */
+};
+
 #endif /* BRW_EU_DEFINES_H */
diff --git a/src/intel/compiler/brw_eu_emit.c b/src/intel/compiler/brw_eu_emit.c
index dc14023b48..ca97ff7325 100644
--- a/src/intel/compiler/brw_eu_emit.c
+++ b/src/intel/compiler/brw_eu_emit.c
@@ -3589,3 +3589,36 @@ brw_WAIT(struct brw_codegen *p)
brw_inst_set_exec_size(devinfo, insn, BRW_EXECUTE_1);
brw_inst_set_mask_control(devinfo, insn, BRW_MASK_DISABLE);
 }
+
+/**
+ * Changes the floating-point rounding mode by updating the control register
+ * field at bits cr0.0[5:4]. This function supports changing to
+ * RTNE (00), RU (01), RD (10) and RTZ (11) rounding using bitwise operations.
+ * Only RTNE and RTZ rounding are enabled in NIR.
+ */
+void
+brw_rounding_mode(struct brw_codegen *p,
+  enum brw_rnd_mode mode)
+{
+   const unsigned bits = mode << BRW_CR0_RND_MODE_SHIFT;
+
+   if (bits != BRW_CR0_RND_MODE_MASK) {
+      brw_inst *inst = brw_AND(p, brw_cr0_reg(0), brw_cr0_reg(0),
+                               brw_imm_ud(~BRW_CR0_RND_MODE_MASK));
+
+      /* From the Skylake PRM, Volume 7, page 760:
+       *  "Implementation Restriction on Register Access: When the control
+       *   register is used as an explicit source and/or destination, hardware
+       *   does not ensure execution pipeline coherency. Software must set the
+       *   thread control field to ‘switch’ for an instruction that uses
+       *   control register as an explicit operand."
+       */
+      brw_inst_set_thread_control(p->devinfo, inst, BRW_THREAD_SWITCH);
+   }
+
+   if (bits) {
+      brw_inst *inst = brw_OR(p, brw_cr0_reg(0), brw_cr0_reg(0),
+                              brw_imm_ud(bits));
+      brw_inst_set_thread_control(p->devinfo, inst, BRW_THREAD_SWITCH);
+   }
+}
diff --git a/src/intel/compiler/brw_fs_generator.cpp b/src/intel/compiler/brw_fs_generator.cpp
index 28790c86a6..1835c4bf72 100644
--- a/src/intel/compiler/brw_fs_generator.cpp
+++ b/src/intel/compiler/brw_fs_generator.cpp
@@ -2163,6 +2163,11 @@ fs_generator::generate_code(const cfg_t *cfg, int dispatch_width)
  brw_DIM(p, dst, retype(src[0], BRW_REGISTER_TYPE_F));
  break;
 
+  case SHADER_OPCODE_RND_MODE:
+ assert(src[0].file == BRW_IMMEDIATE_VALUE);
+ brw_rounding_mode(p, (brw_rnd_mode) src[0].d);
+ break;
+
 

[Mesa-dev] [PATCH v4 16/44] i965/fs: Enable rounding mode on f2f16 ops

2017-11-29 Thread Jose Maria Casanova Crespo
From: Alejandro Piñeiro 

By default we don't set the rounding mode. We only set
round-to-nearest-even or round-to-zero mode if explicitly set from nir.

v2: Use a single SHADER_OPCODE_RND_MODE opcode taking an immediate
with the rounding mode (Curro)

v3: Use new helper brw_rnd_mode_from_nir_op  (Jason Ekstrand)

Signed-off-by: Jose Maria Casanova Crespo 
Signed-off-by: Alejandro Piñeiro 
Reviewed-by: Jason Ekstrand 
---
 src/intel/compiler/brw_fs_nir.cpp | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/src/intel/compiler/brw_fs_nir.cpp b/src/intel/compiler/brw_fs_nir.cpp
index ddc0c6d105..d6ab286147 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -621,6 +621,18 @@ emit_find_msb_using_lzd(const fs_builder &bld,
inst->src[0].negate = true;
 }
 
+static brw_rnd_mode
+brw_rnd_mode_from_nir_op (const nir_op op) {
+   switch (op) {
+   case nir_op_f2f16_rtz:
+  return BRW_RND_MODE_RTZ;
+   case nir_op_f2f16_rtne:
+  return BRW_RND_MODE_RTNE;
+   default:
+  unreachable("Operation doesn't support rounding mode");
+   }
+}
+
 void
 fs_visitor::nir_emit_alu(const fs_builder &bld, nir_alu_instr *instr)
 {
@@ -724,6 +736,12 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, nir_alu_instr *instr)
   inst->saturate = instr->dest.saturate;
   break;
 
+   case nir_op_f2f16_rtne:
+   case nir_op_f2f16_rtz:
+  bld.emit(SHADER_OPCODE_RND_MODE, bld.null_reg_ud(),
+   brw_imm_d(brw_rnd_mode_from_nir_op(instr->op)));
+  /* fallthrough */
+
   /* In theory, it would be better to use BRW_OPCODE_F32TO16. Depending
* on the HW gen, it is a special hw opcode or just a MOV, and
* brw_F32TO16 (at brw_eu_emit) would do the work to chose.
-- 
2.14.3
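
As a rough illustration of the emission order in this patch (a Python sketch, not the i965 backend): a rounding-mode instruction is emitted only for the _rtne and _rtz variants, and the conversion itself always follows. The tuple encoding and the "convert_to_f16" placeholder are assumptions made for the example; the mode values 0 and 3 are those of BRW_RND_MODE_RTNE and BRW_RND_MODE_RTZ in this series.

```python
from typing import List, Optional, Tuple

RTNE = 0  # value of BRW_RND_MODE_RTNE in the series
RTZ = 3   # value of BRW_RND_MODE_RTZ in the series

# NIR opcode names (without the nir_op_ prefix) that carry a rounding mode.
_ROUNDING_OPS = {"f2f16_rtne": RTNE, "f2f16_rtz": RTZ}


def rnd_mode_from_nir_op(op: str) -> int:
    """Models brw_rnd_mode_from_nir_op(): only the two rounded variants map."""
    try:
        return _ROUNDING_OPS[op]
    except KeyError:
        raise ValueError("Operation doesn't support rounding mode") from None


def emit_f2f16(op: str) -> List[Tuple[str, Optional[int]]]:
    """Return the hypothetical instruction list for a float-to-half conversion."""
    insts: List[Tuple[str, Optional[int]]] = []
    if op in _ROUNDING_OPS:
        # Explicit rounding requested: set the mode first.
        insts.append(("rnd_mode", rnd_mode_from_nir_op(op)))
    # Fall through to the conversion in every case.
    insts.append(("convert_to_f16", None))
    return insts
```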



[Mesa-dev] [PATCH v4 18/44] i965/fs: Add byte scattered write message and fs support

2017-11-29 Thread Jose Maria Casanova Crespo
v2: (Jason Ekstrand)
- Enable bit_size parameter to scattered messages to enable different
  bitsizes byte/word/dword.
- Remove use of brw_send_indirect_scattered_message in favor of
  brw_send_indirect_surface_message.
- Move scattered messages to surface messages namespace.
- Assert align1 for scattered messages and assume Gen8+.
- Inline brw_set_dp_byte_scattered_write.

Signed-off-by: Jose Maria Casanova Crespo 
Signed-off-by: Alejandro Piñeiro 
---
 src/intel/compiler/brw_eu.h|  7 +
 src/intel/compiler/brw_eu_defines.h| 17 +++
 src/intel/compiler/brw_eu_emit.c   | 42 ++
 src/intel/compiler/brw_fs.cpp  | 14 +
 src/intel/compiler/brw_fs_copy_propagation.cpp |  2 ++
 src/intel/compiler/brw_fs_generator.cpp|  6 
 src/intel/compiler/brw_fs_surface_builder.cpp  | 11 +++
 src/intel/compiler/brw_fs_surface_builder.h|  7 +
 src/intel/compiler/brw_shader.cpp  |  7 +
 9 files changed, 113 insertions(+)

diff --git a/src/intel/compiler/brw_eu.h b/src/intel/compiler/brw_eu.h
index 343dcd867d..3ac3b4342a 100644
--- a/src/intel/compiler/brw_eu.h
+++ b/src/intel/compiler/brw_eu.h
@@ -485,6 +485,13 @@ brw_typed_surface_write(struct brw_codegen *p,
 unsigned msg_length,
 unsigned num_channels);
 
+void
+brw_byte_scattered_write(struct brw_codegen *p,
+ struct brw_reg payload,
+ struct brw_reg surface,
+ unsigned msg_length,
+ unsigned bit_size);
+
 void
 brw_memory_fence(struct brw_codegen *p,
  struct brw_reg dst);
diff --git a/src/intel/compiler/brw_eu_defines.h b/src/intel/compiler/brw_eu_defines.h
index 9d5cf05c86..de6330ee54 100644
--- a/src/intel/compiler/brw_eu_defines.h
+++ b/src/intel/compiler/brw_eu_defines.h
@@ -402,6 +402,16 @@ enum opcode {
 
SHADER_OPCODE_RND_MODE,
 
+   /**
+* Byte scattered write/read opcodes.
+*
+* LOGICAL opcodes are eventually translated to the matching non-LOGICAL
+* opcode, but instead of taking a single payload blob they expect their
+* arguments separately as individual sources, like untyped write/read.
+*/
+   SHADER_OPCODE_BYTE_SCATTERED_WRITE,
+   SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL,
+
SHADER_OPCODE_MEMORY_FENCE,
 
SHADER_OPCODE_GEN4_SCRATCH_READ,
@@ -1255,4 +1265,11 @@ enum PACKED brw_rnd_mode {
BRW_RND_MODE_UNSPECIFIED,  /* Unspecified rounding mode */
 };
 
+/* MDC_DS - Data Size Message Descriptor Control Field */
+enum PACKED brw_data_size {
+   GEN7_BYTE_SCATTERED_DATA_SIZE_BYTE = 0,
+   GEN7_BYTE_SCATTERED_DATA_SIZE_WORD = 1,
+   GEN7_BYTE_SCATTERED_DATA_SIZE_DWORD = 2
+};
+
 #endif /* BRW_EU_DEFINES_H */
diff --git a/src/intel/compiler/brw_eu_emit.c b/src/intel/compiler/brw_eu_emit.c
index ca97ff7325..ded7e228cf 100644
--- a/src/intel/compiler/brw_eu_emit.c
+++ b/src/intel/compiler/brw_eu_emit.c
@@ -2580,6 +2580,7 @@ brw_send_indirect_surface_message(struct brw_codegen *p,
return insn;
 }
 
+
 static bool
 while_jumps_before_offset(const struct gen_device_info *devinfo,
   brw_inst *insn, int while_offset, int start_offset)
@@ -2983,6 +2984,47 @@ brw_untyped_surface_write(struct brw_codegen *p,
   p, insn, num_channels);
 }
 
+static enum brw_data_size brw_data_size_from_bit_size(unsigned bit_size)
+{
+   switch (bit_size) {
+   case 8:
+  return GEN7_BYTE_SCATTERED_DATA_SIZE_BYTE;
+   case 16:
+  return GEN7_BYTE_SCATTERED_DATA_SIZE_WORD;
+   case 32:
+  return GEN7_BYTE_SCATTERED_DATA_SIZE_DWORD;
+   default:
+  unreachable("Unsupported bit_size for byte scattered messages");
+   }
+}
+
+void
+brw_byte_scattered_write(struct brw_codegen *p,
+ struct brw_reg payload,
+ struct brw_reg surface,
+ unsigned msg_length,
+ unsigned bit_size)
+{
+   assert(brw_inst_access_mode(p->devinfo, p->current) == BRW_ALIGN_1);
+   const struct gen_device_info *devinfo = p->devinfo;
+   const unsigned sfid = GEN7_SFID_DATAPORT_DATA_CACHE;
+
+   struct brw_inst *insn = brw_send_indirect_surface_message(
+  p, sfid, brw_writemask(brw_null_reg(), WRITEMASK_XYZW),
+  payload, surface, msg_length, 0, true);
+
+   unsigned msg_control = brw_data_size_from_bit_size(bit_size) << 2;
+
+   if (brw_inst_exec_size(devinfo, p->current) == BRW_EXECUTE_16)
+  msg_control |= 1;
+   else
+  msg_control |= 0;
+
+   brw_inst_set_dp_msg_type(devinfo, insn,
+HSW_DATAPORT_DC_PORT0_BYTE_SCATTERED_WRITE);
+   brw_inst_set_dp_msg_control(devinfo, insn, msg_control);
+}
+
 static void
 brw_set_dp_typed_atomic_message(struct brw_codegen *p,
 struct brw_inst *insn,
diff --git 
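
The message-control value built in brw_byte_scattered_write() above can be checked with a small stand-alone model. This is a Python sketch under the assumption that only the two fields visible in the patch matter: the data size shifted left by 2, and bit 0 set for SIMD16 execution. The function name is hypothetical.

```python
# Data-size encodings from enum brw_data_size in the patch.
DATA_SIZE_FROM_BIT_SIZE = {8: 0, 16: 1, 32: 2}


def byte_scattered_msg_control(bit_size: int, exec_size: int) -> int:
    """Model of the msg_control computation for byte scattered writes."""
    if bit_size not in DATA_SIZE_FROM_BIT_SIZE:
        raise ValueError("Unsupported bit_size for byte scattered messages")
    msg_control = DATA_SIZE_FROM_BIT_SIZE[bit_size] << 2
    if exec_size == 16:
        msg_control |= 1  # SIMD16; SIMD8 leaves bit 0 clear
    return msg_control
```

For example, a 16-bit write in SIMD16 gives (1 << 2) | 1 = 5, while a 32-bit write in SIMD8 gives 2 << 2 = 8.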

[Mesa-dev] [PATCH v4 14/44] i965: Add support for control register

2017-11-29 Thread Jose Maria Casanova Crespo
Control register cr0 in i965 can be used to change the rounding modes
in 32-bit to 16-bit floating-point conversions.

From the Intel Skylake PRM, vol 07, section "Register and Register Regions",
subsection "Control Register" (page 754):

"Subregister cr0.0:ud contains normal operation control fields such as the
 floating-point mode ... "

The floating-point rounding mode is changed at bits 5:4 of cr0.0:

"Rounding Mode. This field specifies the FPU rounding mode. It is
 initialized by Thread Dispatch.
  00b = Round to Nearest or Even (RTNE)
  01b = Round Up, toward +inf (RU)
  10b = Round Down, toward -inf (RD)
  11b = Round Toward Zero (RTZ)"

Reviewed-by: Jason Ekstrand 
---
 src/intel/compiler/brw_reg.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/intel/compiler/brw_reg.h b/src/intel/compiler/brw_reg.h
index ec1045b612..9a8e6389bb 100644
--- a/src/intel/compiler/brw_reg.h
+++ b/src/intel/compiler/brw_reg.h
@@ -810,6 +810,12 @@ brw_notification_reg(void)
   WRITEMASK_X);
 }
 
+static inline struct brw_reg
+brw_cr0_reg(unsigned subnr)
+{
+   return brw_ud1_reg(BRW_ARCHITECTURE_REGISTER_FILE, BRW_ARF_CONTROL, subnr);
+}
+
 static inline struct brw_reg
 brw_sr0_reg(unsigned subnr)
 {
-- 
2.14.3
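
To make the bit layout concrete, here is a small Python model of updating the rounding-mode field of cr0.0. It is a sketch, not driver code: the register is just a 32-bit integer, and the helper names are made up for the example. It follows the AND/OR sequence used by brw_rounding_mode() in patch 15 of this series, which skips the AND when both bits end up set and skips the OR when both bits end up clear.

```python
CR0_RND_MODE_MASK = 0x30   # bits 5:4 of cr0.0
CR0_RND_MODE_SHIFT = 4

RTNE, RU, RD, RTZ = 0, 1, 2, 3  # field encodings from the PRM quote


def set_rounding_mode(cr0: int, mode: int) -> int:
    """Return cr0 with bits 5:4 replaced by mode, other bits untouched."""
    bits = mode << CR0_RND_MODE_SHIFT
    if bits != CR0_RND_MODE_MASK:
        # Clear the field first (not needed for RTZ, which sets both bits).
        cr0 &= ~CR0_RND_MODE_MASK & 0xFFFFFFFF
    if bits:
        # Set the requested bits (not needed for RTNE, which is all zeros).
        cr0 |= bits
    return cr0


def get_rounding_mode(cr0: int) -> int:
    """Extract the rounding-mode field from a cr0.0 value."""
    return (cr0 & CR0_RND_MODE_MASK) >> CR0_RND_MODE_SHIFT
```

The result is the same as (cr0 & ~mask) | bits; the two conditionals only avoid emitting an instruction that would have no effect.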



[Mesa-dev] [PATCH v4 10/44] i965/vec4: Handle 16-bit types at type_size_xvec4

2017-11-29 Thread Jose Maria Casanova Crespo
From: Alejandro Piñeiro 

These types have similar vec4 sizes to their 32-bit counterparts.

The vec4 backend doesn't support 16-bit types and probably never will,
but this method is called by the scalar backend at
fs_visitor::nir_setup_outputs(), so we still need to provide valid vec4
sizes for 16-bit types. In the future, something different should be
implemented to avoid this dependency.

Reviewed-by: Jason Ekstrand 
---
 src/intel/compiler/brw_vec4_visitor.cpp | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/intel/compiler/brw_vec4_visitor.cpp b/src/intel/compiler/brw_vec4_visitor.cpp
index a845a8dc63..53f6a5ed54 100644
--- a/src/intel/compiler/brw_vec4_visitor.cpp
+++ b/src/intel/compiler/brw_vec4_visitor.cpp
@@ -584,8 +584,11 @@ type_size_xvec4(const struct glsl_type *type, bool as_vec4)
case GLSL_TYPE_UINT:
case GLSL_TYPE_INT:
case GLSL_TYPE_FLOAT:
+   case GLSL_TYPE_FLOAT16:
case GLSL_TYPE_BOOL:
case GLSL_TYPE_DOUBLE:
+   case GLSL_TYPE_UINT16:
+   case GLSL_TYPE_INT16:
case GLSL_TYPE_UINT64:
case GLSL_TYPE_INT64:
   if (type->is_matrix()) {
-- 
2.14.3



[Mesa-dev] [PATCH v4 01/44] glsl: Add 16-bit types

2017-11-29 Thread Jose Maria Casanova Crespo
From: Eduardo Lima Mitev 

Adds new INT16, UINT16 and FLOAT16 base types.

The corresponding GL types for half floats were reused from the
AMD_gpu_shader_half_float extension. The int16 and uint16 types come from
the NV_gpu_shader5 extension.

This adds the builtins and the lexer support.

To avoid a bunch of warnings due to cases not handled in switch statements,
the new types have been added to a few places using the same behavior as
their 32-bit counterparts, except for a few trivial cases where they are
already handled properly. Subsequent patches in this set will provide
correct 16-bit implementations when needed.

v2: * Use FLOAT16 instead of HALF_FLOAT as name of the base type.
* Removed float16_t from builtin types.
* Don't copy 16-bit types as if they were 32-bit values in
  copy_constant_to_storage().
* Use get_scalar_type() instead of adding a new custom switch
  statement.
(Jason Ekstrand)
v3: Use GL_FLOAT16_NV instead of GL_HALF_FLOAT for consistency
(Ilia Mirkin)
v4: Add missing 16-bit base types support in glsl_to_nir (Eduardo Lima).

Signed-off-by: Jose Maria Casanova Crespo 
Signed-off-by: Eduardo Lima 
Signed-off-by: Alejandro Piñeiro 
Reviewed-by: Jason Ekstrand 
Reviewed-by: Nicolai Hähnle 
---
 src/compiler/builtin_type_macros.h  | 26 +++
 src/compiler/glsl/ast_to_hir.cpp|  3 +
 src/compiler/glsl/glsl_to_nir.cpp   |  6 +-
 src/compiler/glsl/ir_clone.cpp  |  3 +
 src/compiler/glsl/link_uniform_initializers.cpp |  3 +
 src/compiler/glsl/lower_buffer_access.cpp   |  3 +-
 src/compiler/glsl_types.cpp | 93 -
 src/compiler/glsl_types.h   | 10 ++-
 src/mesa/program/ir_to_mesa.cpp |  6 ++
 9 files changed, 145 insertions(+), 8 deletions(-)

diff --git a/src/compiler/builtin_type_macros.h b/src/compiler/builtin_type_macros.h
index a275617b34..e3a1cd29c8 100644
--- a/src/compiler/builtin_type_macros.h
+++ b/src/compiler/builtin_type_macros.h
@@ -62,6 +62,22 @@ DECL_TYPE(mat3x4, GL_FLOAT_MAT3x4, GLSL_TYPE_FLOAT, 4, 3)
 DECL_TYPE(mat4x2, GL_FLOAT_MAT4x2, GLSL_TYPE_FLOAT, 2, 4)
 DECL_TYPE(mat4x3, GL_FLOAT_MAT4x3, GLSL_TYPE_FLOAT, 3, 4)
 
+DECL_TYPE(float16_t, GL_FLOAT16_NV,        GLSL_TYPE_FLOAT16, 1, 1)
+DECL_TYPE(f16vec2,   GL_FLOAT16_VEC2_NV,   GLSL_TYPE_FLOAT16, 2, 1)
+DECL_TYPE(f16vec3,   GL_FLOAT16_VEC3_NV,   GLSL_TYPE_FLOAT16, 3, 1)
+DECL_TYPE(f16vec4,   GL_FLOAT16_VEC4_NV,   GLSL_TYPE_FLOAT16, 4, 1)
+
+DECL_TYPE(f16mat2,   GL_FLOAT16_MAT2_AMD,   GLSL_TYPE_FLOAT16, 2, 2)
+DECL_TYPE(f16mat3,   GL_FLOAT16_MAT3_AMD,   GLSL_TYPE_FLOAT16, 3, 3)
+DECL_TYPE(f16mat4,   GL_FLOAT16_MAT4_AMD,   GLSL_TYPE_FLOAT16, 4, 4)
+
+DECL_TYPE(f16mat2x3, GL_FLOAT16_MAT2x3_AMD, GLSL_TYPE_FLOAT16, 3, 2)
+DECL_TYPE(f16mat2x4, GL_FLOAT16_MAT2x4_AMD, GLSL_TYPE_FLOAT16, 4, 2)
+DECL_TYPE(f16mat3x2, GL_FLOAT16_MAT3x2_AMD, GLSL_TYPE_FLOAT16, 2, 3)
+DECL_TYPE(f16mat3x4, GL_FLOAT16_MAT3x4_AMD, GLSL_TYPE_FLOAT16, 4, 3)
+DECL_TYPE(f16mat4x2, GL_FLOAT16_MAT4x2_AMD, GLSL_TYPE_FLOAT16, 2, 4)
+DECL_TYPE(f16mat4x3, GL_FLOAT16_MAT4x3_AMD, GLSL_TYPE_FLOAT16, 3, 4)
+
 DECL_TYPE(double,  GL_DOUBLE,GLSL_TYPE_DOUBLE, 1, 1)
 DECL_TYPE(dvec2,   GL_DOUBLE_VEC2,   GLSL_TYPE_DOUBLE, 2, 1)
 DECL_TYPE(dvec3,   GL_DOUBLE_VEC3,   GLSL_TYPE_DOUBLE, 3, 1)
@@ -88,6 +104,16 @@ DECL_TYPE(u64vec2,  GL_UNSIGNED_INT64_VEC2_ARB, GLSL_TYPE_UINT64, 2, 1)
 DECL_TYPE(u64vec3,  GL_UNSIGNED_INT64_VEC3_ARB, GLSL_TYPE_UINT64, 3, 1)
 DECL_TYPE(u64vec4,  GL_UNSIGNED_INT64_VEC4_ARB, GLSL_TYPE_UINT64, 4, 1)
 
+DECL_TYPE(int16_t,  GL_INT16_NV,  GLSL_TYPE_INT16, 1, 1)
+DECL_TYPE(i16vec2,  GL_INT16_VEC2_NV, GLSL_TYPE_INT16, 2, 1)
+DECL_TYPE(i16vec3,  GL_INT16_VEC3_NV, GLSL_TYPE_INT16, 3, 1)
+DECL_TYPE(i16vec4,  GL_INT16_VEC4_NV, GLSL_TYPE_INT16, 4, 1)
+
+DECL_TYPE(uint16_t, GL_UNSIGNED_INT16_NV,  GLSL_TYPE_UINT16, 1, 1)
+DECL_TYPE(u16vec2,  GL_UNSIGNED_INT16_VEC2_NV, GLSL_TYPE_UINT16, 2, 1)
+DECL_TYPE(u16vec3,  GL_UNSIGNED_INT16_VEC3_NV, GLSL_TYPE_UINT16, 3, 1)
+DECL_TYPE(u16vec4,  GL_UNSIGNED_INT16_VEC4_NV, GLSL_TYPE_UINT16, 4, 1)
+
 DECL_TYPE(sampler,   GL_SAMPLER_1D,   GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_1D,   0, 0, GLSL_TYPE_VOID)
 DECL_TYPE(sampler1D, GL_SAMPLER_1D,   GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_1D,   0, 0, GLSL_TYPE_FLOAT)
 DECL_TYPE(sampler2D, GL_SAMPLER_2D,   GLSL_TYPE_SAMPLER, GLSL_SAMPLER_DIM_2D,   0, 0, GLSL_TYPE_FLOAT)
diff --git a/src/compiler/glsl/ast_to_hir.cpp b/src/compiler/glsl/ast_to_hir.cpp
index 5cdeb94720..7abb8199e1 100644
--- a/src/compiler/glsl/ast_to_hir.cpp
+++ b/src/compiler/glsl/ast_to_hir.cpp
@@ -1108,12 +1108,15 @@ do_comparison(void *mem_ctx, int operation, ir_rvalue *op0, ir_rvalue *op1)
 
switch (op0->type->base_type) {
case GLSL_TYPE_FLOAT:
+   case 

[Mesa-dev] [PATCH v4 04/44] nir: Add rounding modes enum

2017-11-29 Thread Jose Maria Casanova Crespo
v2: Added comments describing each of the rounding modes. (Jason
Ekstrand)

Reviewed-by: Jason Ekstrand 
---
 src/compiler/nir/nir.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index d50e81b46d..883f371d1f 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -106,6 +106,16 @@ typedef enum {
nir_var_all = ~0,
 } nir_variable_mode;
 
+/**
+ * Rounding modes.
+ */
+typedef enum {
+   nir_rounding_mode_undef = 0,
+   nir_rounding_mode_rtne  = 1, /* round to nearest even */
+   nir_rounding_mode_ru    = 2, /* round up */
+   nir_rounding_mode_rd    = 3, /* round down */
+   nir_rounding_mode_rtz   = 4, /* round towards zero */
+} nir_rounding_mode;
 
 typedef union {
float f32[4];
-- 
2.14.3



[Mesa-dev] [PATCH v4 06/44] nir: Handle fp16 rounding modes at nir_type_conversion_op

2017-11-29 Thread Jose Maria Casanova Crespo
nir_type_conversion_op now takes a rounding mode so that conversions to
fp16 values can select one. Two new opcodes are enabled: nir_op_f2f16_rtne
and nir_op_f2f16_rtz.

The undefined rounding mode doesn't have any effect and uses the original
nir_op_f2f16 operation.

v2: Indentation fixed (Jason Ekstrand)

v3: Use explicit case for undefined rounding and assert if
rounding mode is used for non 16-bit float conversions
(Jason Ekstrand)
---
 src/compiler/glsl/glsl_to_nir.cpp |  3 ++-
 src/compiler/nir/nir.h|  3 ++-
 src/compiler/nir/nir_opcodes.py   | 11 +--
 src/compiler/nir/nir_opcodes_c.py | 15 ++-
 src/compiler/spirv/vtn_alu.c  |  2 +-
 5 files changed, 28 insertions(+), 6 deletions(-)

diff --git a/src/compiler/glsl/glsl_to_nir.cpp b/src/compiler/glsl/glsl_to_nir.cpp
index 33ebc13edb..1e636225c1 100644
--- a/src/compiler/glsl/glsl_to_nir.cpp
+++ b/src/compiler/glsl/glsl_to_nir.cpp
@@ -1575,7 +1575,8 @@ nir_visitor::visit(ir_expression *ir)
case ir_unop_u642i64: {
   nir_alu_type src_type = nir_get_nir_type_for_glsl_base_type(types[0]);
   nir_alu_type dst_type = nir_get_nir_type_for_glsl_base_type(out_type);
-  result = nir_build_alu(, nir_type_conversion_op(src_type, dst_type),
+  result = nir_build_alu(, nir_type_conversion_op(src_type, dst_type,
+ nir_rounding_mode_undef),
  srcs[0], NULL, NULL, NULL);
   /* b2i and b2f don't have fixed bit-size versions so the builder will
* just assume 32 and we have to fix it up here.
diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index 883f371d1f..c70c0e0220 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -761,7 +761,8 @@ nir_get_nir_type_for_glsl_type(const struct glsl_type *type)
return nir_get_nir_type_for_glsl_base_type(glsl_get_base_type(type));
 }
 
-nir_op nir_type_conversion_op(nir_alu_type src, nir_alu_type dst);
+nir_op nir_type_conversion_op(nir_alu_type src, nir_alu_type dst,
+  nir_rounding_mode rnd);
 
 typedef enum {
NIR_OP_IS_COMMUTATIVE = (1 << 0),
diff --git a/src/compiler/nir/nir_opcodes.py b/src/compiler/nir/nir_opcodes.py
index 28a0467228..ac7333fe78 100644
--- a/src/compiler/nir/nir_opcodes.py
+++ b/src/compiler/nir/nir_opcodes.py
@@ -179,8 +179,15 @@ for src_t in [tint, tuint, tfloat]:
   else:
  bit_sizes = [8, 16, 32, 64]
   for bit_size in bit_sizes:
- unop_convert("{0}2{1}{2}".format(src_t[0], dst_t[0], bit_size),
-  dst_t + str(bit_size), src_t, "src0")
+  if bit_size == 16 and dst_t == tfloat and src_t == tfloat:
+  rnd_modes = ['rtne', 'rtz', 'undef']
+  for rnd_mode in rnd_modes:
+  unop_convert("{0}2{1}{2}_{3}".format(src_t[0], dst_t[0],
+   bit_size, rnd_mode),
+   dst_t + str(bit_size), src_t, "src0")
+  else:
+  unop_convert("{0}2{1}{2}".format(src_t[0], dst_t[0], bit_size),
+   dst_t + str(bit_size), src_t, "src0")
 
 # We'll hand-code the to/from bool conversion opcodes.  Because bool doesn't
 # have multiple bit-sizes, we can always infer the size from the other type.
diff --git a/src/compiler/nir/nir_opcodes_c.py b/src/compiler/nir/nir_opcodes_c.py
index 02bb4738ed..c19185534a 100644
--- a/src/compiler/nir/nir_opcodes_c.py
+++ b/src/compiler/nir/nir_opcodes_c.py
@@ -30,7 +30,7 @@ template = Template("""
 #include "nir.h"
 
 nir_op
-nir_type_conversion_op(nir_alu_type src, nir_alu_type dst)
+nir_type_conversion_op(nir_alu_type src, nir_alu_type dst, nir_rounding_mode rnd)
 {
nir_alu_type src_base = (nir_alu_type) nir_alu_type_get_base_type(src);
nir_alu_type dst_base = (nir_alu_type) nir_alu_type_get_base_type(dst);
@@ -64,7 +64,20 @@ nir_type_conversion_op(nir_alu_type src, nir_alu_type dst)
switch (dst_bit_size) {
 % for dst_bits in [16, 32, 64]:
   case ${dst_bits}:
+%if src_t == 'float' and dst_t == 'float' and dst_bits == 16:
+ switch(rnd) {
+%   for rnd_t in ['rtne', 'rtz', 'undef']:
+case nir_rounding_mode_${rnd_t}:
+   return ${'nir_op_{0}2{1}{2}_{3}'.format(src_t[0], dst_t[0],
+   dst_bits, rnd_t)};
+%   endfor
+default:
+   unreachable("Invalid 16-bit nir rounding mode");
+ }
+%else:
+ assert(rnd == nir_rounding_mode_undef);
          return ${'nir_op_{0}2{1}{2}'.format(src_t[0], dst_t[0], dst_bits)};
+%endif
 % endfor
   default:
  unreachable("Invalid nir alu bit 

[Mesa-dev] [PATCH v4 02/44] mesa/st: Handle 16-bit types at st_glsl_storage_type_size()

2017-11-29 Thread Jose Maria Casanova Crespo
From: Eduardo Lima Mitev 

This is basically to avoid "not handled in switch" warnings.

v2: Let the new types hit the assertion instead. (Marek Olšák
and Jason Ekstrand)

Reviewed-by: Marek Olšák 
Reviewed-by: Nicolai Hähnle 
Reviewed-by: Jason Ekstrand 
---
 src/mesa/state_tracker/st_glsl_types.cpp | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/mesa/state_tracker/st_glsl_types.cpp b/src/mesa/state_tracker/st_glsl_types.cpp
index 50936025d9..e57fbc8f31 100644
--- a/src/mesa/state_tracker/st_glsl_types.cpp
+++ b/src/mesa/state_tracker/st_glsl_types.cpp
@@ -98,6 +98,9 @@ st_glsl_storage_type_size(const struct glsl_type *type, bool is_bindless)
case GLSL_TYPE_VOID:
case GLSL_TYPE_ERROR:
case GLSL_TYPE_FUNCTION:
+   case GLSL_TYPE_FLOAT16:
+   case GLSL_TYPE_UINT16:
+   case GLSL_TYPE_INT16:
   assert(!"Invalid type in type_size");
   break;
}
-- 
2.14.3



[Mesa-dev] [PATCH v4 03/44] nir: Add support for 16-bit types (half float, int16 and uint16)

2017-11-29 Thread Jose Maria Casanova Crespo
From: Eduardo Lima Mitev 

v2: Renamed glsl_half_float_type() to glsl_float16_t_type().
(Jason Ekstrand)

Signed-off-by: Jose Maria Casanova Crespo 
Signed-off-by: Eduardo Lima 
Reviewed-by: Jason Ekstrand 
---
 src/compiler/nir/nir.c  |  6 ++
 src/compiler/nir/nir.h  |  9 +
 src/compiler/nir/nir_split_var_copies.c |  6 ++
 src/compiler/nir_types.cpp  | 18 ++
 src/compiler/nir_types.h|  8 
 5 files changed, 47 insertions(+)

diff --git a/src/compiler/nir/nir.c b/src/compiler/nir/nir.c
index 7380bf436a..688f2b1ae3 100644
--- a/src/compiler/nir/nir.c
+++ b/src/compiler/nir/nir.c
@@ -726,10 +726,13 @@ deref_foreach_leaf_build_recur(nir_deref_var *deref, nir_deref *tail,
assert(tail->child == NULL);
switch (glsl_get_base_type(tail->type)) {
case GLSL_TYPE_UINT:
+   case GLSL_TYPE_UINT16:
case GLSL_TYPE_UINT64:
case GLSL_TYPE_INT:
+   case GLSL_TYPE_INT16:
case GLSL_TYPE_INT64:
case GLSL_TYPE_FLOAT:
+   case GLSL_TYPE_FLOAT16:
case GLSL_TYPE_DOUBLE:
case GLSL_TYPE_BOOL:
   if (glsl_type_is_vector_or_scalar(tail->type))
@@ -874,7 +877,10 @@ nir_deref_get_const_initializer_load(nir_shader *shader, nir_deref_var *deref)
case GLSL_TYPE_FLOAT:
case GLSL_TYPE_INT:
case GLSL_TYPE_UINT:
+   case GLSL_TYPE_FLOAT16:
case GLSL_TYPE_DOUBLE:
+   case GLSL_TYPE_INT16:
+   case GLSL_TYPE_UINT16:
case GLSL_TYPE_UINT64:
case GLSL_TYPE_INT64:
case GLSL_TYPE_BOOL:
diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index c01fa6707a..d50e81b46d 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -719,6 +719,12 @@ nir_get_nir_type_for_glsl_base_type(enum glsl_base_type base_type)
case GLSL_TYPE_INT:
   return nir_type_int32;
   break;
+   case GLSL_TYPE_UINT16:
+  return nir_type_uint16;
+  break;
+   case GLSL_TYPE_INT16:
+  return nir_type_int16;
+  break;
case GLSL_TYPE_UINT64:
   return nir_type_uint64;
   break;
@@ -728,6 +734,9 @@ nir_get_nir_type_for_glsl_base_type(enum glsl_base_type base_type)
case GLSL_TYPE_FLOAT:
   return nir_type_float32;
   break;
+   case GLSL_TYPE_FLOAT16:
+  return nir_type_float16;
+  break;
case GLSL_TYPE_DOUBLE:
   return nir_type_float64;
   break;
diff --git a/src/compiler/nir/nir_split_var_copies.c b/src/compiler/nir/nir_split_var_copies.c
index 15a185ec8d..bc3ceedbdb 100644
--- a/src/compiler/nir/nir_split_var_copies.c
+++ b/src/compiler/nir/nir_split_var_copies.c
@@ -147,10 +147,13 @@ split_var_copy_instr(nir_intrinsic_instr *old_copy,
   break;
 
case GLSL_TYPE_UINT:
+   case GLSL_TYPE_UINT16:
case GLSL_TYPE_UINT64:
case GLSL_TYPE_INT:
+   case GLSL_TYPE_INT16:
case GLSL_TYPE_INT64:
case GLSL_TYPE_FLOAT:
+   case GLSL_TYPE_FLOAT16:
case GLSL_TYPE_DOUBLE:
case GLSL_TYPE_BOOL:
   if (glsl_type_is_matrix(src_tail->type)) {
@@ -229,6 +232,7 @@ split_var_copies_block(nir_block *block, struct split_var_copies_state *state)
  ralloc_steal(state->dead_ctx, instr);
  break;
   case GLSL_TYPE_FLOAT:
+  case GLSL_TYPE_FLOAT16:
   case GLSL_TYPE_DOUBLE:
  if (glsl_type_is_matrix(src_tail->type)) {
 split_var_copy_instr(intrinsic, dest_head, src_head,
@@ -239,6 +243,8 @@ split_var_copies_block(nir_block *block, struct split_var_copies_state *state)
  break;
   case GLSL_TYPE_INT:
   case GLSL_TYPE_UINT:
+  case GLSL_TYPE_INT16:
+  case GLSL_TYPE_UINT16:
   case GLSL_TYPE_INT64:
   case GLSL_TYPE_UINT64:
   case GLSL_TYPE_BOOL:
diff --git a/src/compiler/nir_types.cpp b/src/compiler/nir_types.cpp
index c66cfff8be..377de0c9c7 100644
--- a/src/compiler/nir_types.cpp
+++ b/src/compiler/nir_types.cpp
@@ -273,6 +273,12 @@ glsl_double_type(void)
return glsl_type::double_type;
 }
 
+const glsl_type *
+glsl_float16_t_type(void)
+{
+   return glsl_type::float16_t_type;
+}
+
 const glsl_type *
 glsl_vec_type(unsigned n)
 {
@@ -315,6 +321,18 @@ glsl_uint64_t_type(void)
return glsl_type::uint64_t_type;
 }
 
+const glsl_type *
+glsl_int16_t_type(void)
+{
+   return glsl_type::int16_t_type;
+}
+
+const glsl_type *
+glsl_uint16_t_type(void)
+{
+   return glsl_type::uint16_t_type;
+}
+
 const glsl_type *
 glsl_bool_type(void)
 {
diff --git a/src/compiler/nir_types.h b/src/compiler/nir_types.h
index 9f398b9278..daff973250 100644
--- a/src/compiler/nir_types.h
+++ b/src/compiler/nir_types.h
@@ -94,6 +94,11 @@ glsl_get_bit_size(const struct glsl_type *type)
case GLSL_TYPE_SUBROUTINE:
   return 32;
 
+   case GLSL_TYPE_FLOAT16:
+   case GLSL_TYPE_UINT16:
+   case GLSL_TYPE_INT16:
+  return 16;
+
case GLSL_TYPE_DOUBLE:
case GLSL_TYPE_INT64:
case GLSL_TYPE_UINT64:
@@ -126,6 +131,7 @@ bool glsl_sampler_type_is_array(const 

[Mesa-dev] [PATCH v4 05/44] nir: Populate conversion opcodes to 16-bit types

2017-11-29 Thread Jose Maria Casanova Crespo
From: Eduardo Lima Mitev 

This will include the following NIR ALU opcodes:
 * nir_op_i2i16
 * nir_op_i2f16
 * nir_op_u2u16
 * nir_op_u2f16
 * nir_op_f2i16
 * nir_op_f2u16
 * nir_op_f2f16

v2: Remove "from" 16-bit in commit subject (Topi Pohjolainen)

Reviewed-by: Jason Ekstrand 
---
 src/compiler/nir/nir_opcodes_c.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/compiler/nir/nir_opcodes_c.py b/src/compiler/nir/nir_opcodes_c.py
index a1db54f05a..02bb4738ed 100644
--- a/src/compiler/nir/nir_opcodes_c.py
+++ b/src/compiler/nir/nir_opcodes_c.py
@@ -62,7 +62,7 @@ nir_type_conversion_op(nir_alu_type src, nir_alu_type dst)
 % endif
 %  endif
switch (dst_bit_size) {
-% for dst_bits in [32, 64]:
+% for dst_bits in [16, 32, 64]:
   case ${dst_bits}:
          return ${'nir_op_{0}2{1}{2}'.format(src_t[0], dst_t[0], dst_bits)};
 % endfor
-- 
2.14.3
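
The opcode names listed in this patch follow the "{src}2{dst}{bits}" pattern used by the template. The Python sketch below regenerates them for a given destination size. It assumes that integer sources convert only to float or to an integer of the same signedness, which is what the list of seven opcodes in the commit message implies; the helper name is made up for the example.

```python
from typing import List


def conversion_opcodes(dst_bits: int) -> List[str]:
    """List conversion opcode names for one destination bit size."""
    names = []
    for src in ("int", "uint", "float"):
        if src == "float":
            dsts = ("int", "uint", "float")
        else:
            # Assumption: no signed<->unsigned conversion opcodes.
            dsts = ("float", src)
        for dst in dsts:
            # First letter of each type, e.g. "i2f16" for int -> float16.
            names.append("nir_op_{0}2{1}{2}".format(src[0], dst[0], dst_bits))
    return names
```

Calling conversion_opcodes(16) yields the seven 16-bit opcodes named in the commit message, in a different order.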



[Mesa-dev] [PATCH v4 00/44] anv: SPV_KHR_16bit_storage/VK_KHR_16bit_storage for gen8+

2017-11-29 Thread Jose Maria Casanova Crespo
Hello,

this is the V4 series for the implementation of the SPV_KHR_16bit_storage
and VK_KHR_16bit_storage extensions on the anv vulkan driver, in addition
to the GLSL and NIR support needed.

The original series can be found here [1], the following v2 [2]
and v3 [3].

In short V4 includes the following:

 * Reorder the series to enable features as they are implemented, the series
   now enables first UBO and SSBO support, and then inputs/outputs and
   finally push constants.
 * Support the byte scattered read/write messages with different bit sizes
   byte/word/dword.
 * Refactor of the store_ssbo code and also fix stores when writemask was .yz
 * Uses the sampler for load_ubo avoiding the initial implementation of
   the series using byte_scattered_read.
 * Addressed all the feedback provided by Jason and Topi on v3 review.

This series is also available at:

https://github.com/Igalia/mesa/tree/wip/VK_KHR_16bit_storage-rc4

The objective is to start landing part of this series; all feedback has been
addressed for SSBO and UBO. The input/output features will probably
need another iteration, as they were not completely reviewed. We also need
to define the approach for the push constant issues, before or after landing
the support with this implementation.

Patches 1-5 and 8-17 have already been reviewed. Patch 7 was already
reviewed, but as it has changed a lot I would appreciate another
review. Once the patches up to 25 or 28 are reviewed, we could land UBO and
SSBO support.

Finally an updated overview of the patches:

Patches 1-2 add 16-bit float, int and uint types to GLSL. This is
needed because NIR uses GLSL types internally. We use the enums
already defined at AMD_gpu_shader_half_float and NV_gpu_shader
extensions. Patch 2 updates mesa/st, in order to avoid warnings for
types not handled on a switch.

Patches 3-6 add NIR support for those new GLSL 16-bit types,
conversion opcodes, and rounding modes for float to half-float
conversions.

Patches 7-9 add the SPIR-V (SPV_KHR_16bit_storage) to NIR support.

Patches 10-12 add general 16-bit support for i965. This includes
handling the new types in several general-purpose methods and
updating or removing some asserts.

Patches 14-17 add support for 32 to 16-bit conversions for i965,
including rounding mode opcodes (needed for float to half-float
conversions), and an optimization that removes superfluous rounding
mode sets.
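
As a rough illustration of why the rounding mode matters for float to
half-float conversions, here is a sketch in C. It assumes that
round-toward-zero and round-to-nearest-even are the modes of interest, and
it only handles inputs whose result is a normal half-float. The names are
hypothetical and this is not the i965 implementation.

```c
#include <stdint.h>
#include <string.h>

enum round_mode { ROUND_RTZ, ROUND_RTE };

/* Sketch only: converts a float32 to float16 bits. The caller must make
 * sure the value maps to a normal half-float (no zero, denormal, infinity
 * or NaN handling here).
 */
static uint16_t
f32_to_f16_normal(float f, enum round_mode mode)
{
   uint32_t bits;
   memcpy(&bits, &f, sizeof(bits));

   uint32_t sign = (bits >> 16) & 0x8000u;
   /* Rebias the exponent from 127 (float32) to 15 (float16). */
   int32_t exp = (int32_t)((bits >> 23) & 0xff) - 127 + 15;
   uint32_t mant = bits & 0x7fffffu;

   /* Keep the top 10 mantissa bits; the low 13 bits are the remainder. */
   uint32_t half = ((uint32_t)exp << 10) | (mant >> 13);
   uint32_t rem = mant & 0x1fffu;

   if (mode == ROUND_RTE) {
      /* Round up above the halfway point, or at it when the result is odd.
       * A carry out of the mantissa bumps the exponent, as intended.
       */
      if (rem > 0x1000u || (rem == 0x1000u && (half & 1)))
         half++;
   }
   /* ROUND_RTZ simply drops the remainder. */

   return (uint16_t)(sign | half);
}
```

A value exactly halfway between two half-floats, such as
1.0f + 3.0f / 2048.0f, gives different bit patterns under the two modes,
which is why the mode has to be set explicitly before the conversion.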

Patches 18-21 add and use two new messages: byte scattered read and
write. Those were needed because untyped surface message has a fixed
32-bit write size. Those messages are used on the 16-bit support of
store SSBO, load SSBO and load shared.

Patch 22 adds helpers to shuffle and unshuffle 16-bit components in 32-bit
ones. These will be needed for the following optimizations on load/store
SSBO. This hunk was originally in the input/output support, but had to be
relocated because of the new order of the series.
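
The idea of shuffling 16-bit components into 32-bit ones can be sketched
in plain C as follows. The function names and the choice of putting the
first component in the low half are assumptions made for this example;
the actual helpers operate on registers in the i965 backend.

```c
#include <stdint.h>

/* Sketch: pack two 16-bit components into one 32-bit component.
 * Assumption: the first component goes in the low 16 bits.
 */
static uint32_t
shuffle_16bit_pair(uint16_t lo, uint16_t hi)
{
   return (uint32_t)lo | ((uint32_t)hi << 16);
}

/* Sketch: recover the two 16-bit components from a 32-bit one. */
static void
unshuffle_16bit_pair(uint32_t packed, uint16_t *lo, uint16_t *hi)
{
   *lo = (uint16_t)(packed & 0xffffu);
   *hi = (uint16_t)(packed >> 16);
}
```

Packing pairs this way lets a single 32-bit read or write carry two
16-bit components instead of issuing one message per component.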

Patch 23 enables load_ubo support for 16-bit types using the sampler,
unshuffling pairs of 16-bit components from 32-bit ones.

Patches 24-25 enable SPV_KHR_16bit_storage and VK_KHR_16bit_storage but only
the support for SSBO and UBO on anv vulkan driver.

Patches 26-28 are the patches added in V2 to improve performance by
reducing the use of multiple scattered messages for untyped read/write
operations. The 16-bit CTS tests pass without them. One of them (patch 27)
fixes a real problem with predication, but unfortunately no CTS test
catches it yet.

Patches 29-33 implement 16-bit vertex attribute inputs support on
i965. These include changes on anv. This was needed because 16-bit
surface formats do implicit conversion to 32-bit. To work around this,
we override the 16-bit surface format, and use 32-bit ones. Issues related
to robust buffer access have been addressed.

Patch 34 implements load input and load store for all intra stage. This
patch could have the problems pointed out by Jason related to how TCS
outputs are implemented, which need more work.

Patches 35-42 implement 16-bit store output support for fragment
shaders on i965. The last patch enables VK_KHR_16bit for input/outputs.

Patches 43-44 add 16-bit support for push constants and enable the feature.
Work is still pending on a general solution for push constants and
the mixture of different bit_sizes.

[1] https://lists.freedesktop.org/archives/mesa-dev/2017-July/162791.html
[2] https://lists.freedesktop.org/archives/mesa-dev/2017-August/167455.html
[3] https://lists.freedesktop.org/archives/mesa-dev/2017-October/172557.html

CC: Jason Ekstrand 
CC: Topi Pohjolainen 
CC: Matt Turner 

Alejandro Piñeiro (12):
  i965/vec4: Handle 16-bit types at type_size_xvec4
  i965/fs: Remove BRW_REGISTER_TYPE_HF assert at get_exec_type
  i965/fs: Handle 32-bit to 16-bit conversions
  i965/fs: Define new shader opcode to set rounding modes
  i965/fs: Enable rounding mode on f2f16 ops
  i965/fs: Add remove_extra_rounding_modes optimization
  

Re: [Mesa-dev] [PATCH] i965: implement (un)mapImage

2017-11-29 Thread Jason Ekstrand
Julien,

Mind if I ask what your use-case is?  We've been talking about trying to
remove tiled mapping from the driver and using blits instead.  I don't want
to suddenly drop someone off a performance cliff.

Thanks,

--Jason

On Tue, Nov 14, 2017 at 3:05 AM, Julien Isorce 
wrote:

> Already implemented for Gallium drivers.
>
> Useful for gbm_bo_(un)map.
>
> Tests:
>   By porting wayland/weston/clients/simple-dmabuf-drm.c to GBM.
>   kmscube --mode=rgba
>   kmscube --mode=nv12-1img
>   kmscube --mode=nv12-2img
>   piglit ext_image_dma_buf_import-refcount -auto
>   piglit ext_image_dma_buf_import-transcode-nv12-as-r8-gr88 -auto
>   piglit ext_image_dma_buf_import-sample_rgb -fmt=XR24 -alpha-one -auto
>   piglit ext_image_dma_buf_import-sample_rgb -fmt=AR24 -auto
>   piglit ext_image_dma_buf_import-sample_yuv -fmt=NV12 -auto
>   piglit ext_image_dma_buf_import-sample_yuv -fmt=YU12 -auto
>   piglit ext_image_dma_buf_import-sample_yuv -fmt=YV12 -auto
>
> v2: add early return if (flag & MAP_INTERNAL_MASK)
> v3: take input rect into account and test with kmscube and piglit.
> v4: handle wraparound and bo reference.
> v5: indent, exclude 0 width and height on the boundary, map bo
> independently of the image.
>
> Signed-off-by: Julien Isorce 
> ---
>  src/mesa/drivers/dri/i965/intel_screen.c | 65
> +++-
>  1 file changed, 63 insertions(+), 2 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/intel_screen.c
> b/src/mesa/drivers/dri/i965/intel_screen.c
> index cdc36ad..88bd982 100644
> --- a/src/mesa/drivers/dri/i965/intel_screen.c
> +++ b/src/mesa/drivers/dri/i965/intel_screen.c
> @@ -755,6 +755,67 @@ intel_create_image(__DRIscreen *dri_screen,
> loaderPrivate);
>  }
>
> +static void *
> +intel_map_image(__DRIcontext *context, __DRIimage *image,
> +int x0, int y0, int width, int height,
> +unsigned int flags, int *stride, void **map_info)
> +{
> +   struct brw_context *brw = NULL;
> +   struct brw_bo *bo = NULL;
> +   void *raw_data = NULL;
> +   GLuint pix_w = 1;
> +   GLuint pix_h = 1;
> +   GLint pix_bytes = 1;
> +
> +   if (!context || !image || !stride || !map_info || *map_info)
> +  return NULL;
> +
> +   if (x0 < 0 || x0 >= image->width || width > image->width - x0)
> +  return NULL;
> +
> +   if (y0 < 0 || y0 >= image->height || height > image->height - y0)
> +return NULL;
> +
> +   if (flags & MAP_INTERNAL_MASK)
> +  return NULL;
> +
> +   brw = context->driverPrivate;
> +   bo = image->bo;
> +
> +   assert(brw);
> +   assert(bo);
> +
> +   /* DRI flags and GL_MAP.*_BIT flags are the same, so just pass them
> on. */
> +   raw_data = brw_bo_map(brw, bo, flags);
> +   if (!raw_data)
> +  return NULL;
> +
> +   _mesa_get_format_block_size(image->format, &pix_w, &pix_h);
> +   pix_bytes = _mesa_get_format_bytes(image->format);
> +
> +   assert(pix_w);
> +   assert(pix_h);
> +   assert(pix_bytes > 0);
> +
> +   raw_data += ((x0 / pix_w) * pix_bytes) + (y0 / pix_h) * image->pitch;
> +
> +   brw_bo_reference(bo);
> +
> +   *stride = image->pitch;
> +   *map_info = bo;
> +
> +   return raw_data;
> +}
> +
> +static void
> +intel_unmap_image(__DRIcontext *context, __DRIimage *image, void
> *map_info)
> +{
> +   struct brw_bo *bo = map_info;
> +
> +   brw_bo_unmap(bo);
> +   brw_bo_unreference(bo);
> +}
> +
>  static __DRIimage *
>  intel_create_image_with_modifiers(__DRIscreen *dri_screen,
>int width, int height, int format,
> @@ -1305,8 +1366,8 @@ static const __DRIimageExtension intelImageExtension
> = {
>  .createImageFromDmaBufs = intel_create_image_from_dma_
> bufs,
>  .blitImage  = NULL,
>  .getCapabilities= NULL,
> -.mapImage   = NULL,
> -.unmapImage = NULL,
> +.mapImage   = intel_map_image,
> +.unmapImage = intel_unmap_image,
>  .createImageWithModifiers   = intel_create_image_with_
> modifiers,
>  .createImageFromDmaBufs2= intel_create_image_from_dma_
> bufs2,
>  .queryDmaBufFormats = intel_query_dma_buf_formats,
> --
> 2.7.4
>
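
The rectangle validation and offset computation in the quoted patch can be
sketched standalone as below. This is only a sketch: struct image_desc and
map_rect_offset() are hypothetical stand-ins for __DRIimage and the body of
intel_map_image(), and the rejection of non-positive width and height is an
assumption added for this example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the fields the patch reads from __DRIimage
 * and from the format helpers.
 */
struct image_desc {
   int width, height, pitch;
   unsigned block_w, block_h, block_bytes;
};

/* Sketch: validate the requested rectangle and compute the byte offset of
 * its first pixel block inside the mapping. Subtracting x0 from the image
 * width (rather than adding x0 and width) avoids integer wraparound.
 */
static bool
map_rect_offset(const struct image_desc *img,
                int x0, int y0, int width, int height, size_t *offset)
{
   if (x0 < 0 || x0 >= img->width || width <= 0 || width > img->width - x0)
      return false;

   if (y0 < 0 || y0 >= img->height || height <= 0 ||
       height > img->height - y0)
      return false;

   *offset = ((size_t)x0 / img->block_w) * img->block_bytes +
             ((size_t)y0 / img->block_h) * (size_t)img->pitch;
   return true;
}
```

For a 64x64 image with 4 bytes per pixel and a pitch of 256, mapping at
(2, 3) would start 2 * 4 + 3 * 256 bytes into the buffer.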


Re: [Mesa-dev] [PATCH 00/12] anv: Add support for the variablePointers feature

2017-11-29 Thread Chad Versace
On Mon 06 Nov 2017, Jason Ekstrand wrote:
> On Mon, Nov 6, 2017 at 7:26 PM, Jason Ekstrand <[1]ja...@jlekstrand.net> 
> wrote:

> The tests are fixed in CL #1915.  I feel like a dork now...

The CL is still languishing in Gerrit. fyi, I've pushed a branch with
the CL.

http://git.kiwitree.net/cgit/~chadv/vk-gl-cts/log/?h=pu

fwiw, this series is
Tested-by: Chad Versace 

btw, if you'd like to see the branch where I'm integrating this into
ARC++, it's


http://git.kiwitree.net/cgit/~chadv/mesa/log/?h=wip/arc-17.3-anv-variable-pointers


[Mesa-dev] [PATCH 2/2] ac/surface: always compute DCC info when DCC is possible on GFX9

2017-11-29 Thread Marek Olšák
From: Marek Olšák 

The same code for VI doesn't check for scanout either.
---
 src/amd/common/ac_surface.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/src/amd/common/ac_surface.c b/src/amd/common/ac_surface.c
index 8347c45..590920e 100644
--- a/src/amd/common/ac_surface.c
+++ b/src/amd/common/ac_surface.c
@@ -918,21 +918,20 @@ static int gfx9_compute_miptree(ADDR_HANDLE addrlib,
return ret;
 
surf->u.gfx9.htile.rb_aligned = hin.hTileFlags.rbAligned;
surf->u.gfx9.htile.pipe_aligned = hin.hTileFlags.pipeAligned;
surf->htile_size = hout.htileBytes;
surf->htile_slice_size = hout.sliceSize;
surf->htile_alignment = hout.baseAlign;
} else {
/* DCC */
if (!(surf->flags & RADEON_SURF_DISABLE_DCC) &&
-   !(surf->flags & RADEON_SURF_SCANOUT) &&
!compressed &&
in->swizzleMode != ADDR_SW_LINEAR) {
ADDR2_COMPUTE_DCCINFO_INPUT din = {0};
ADDR2_COMPUTE_DCCINFO_OUTPUT dout = {0};
ADDR2_META_MIP_INFO 
meta_mip_info[RADEON_SURF_MAX_LEVELS] = {};
 
din.size = sizeof(ADDR2_COMPUTE_DCCINFO_INPUT);
dout.size = sizeof(ADDR2_COMPUTE_DCCINFO_OUTPUT);
dout.pMipInfo = meta_mip_info;
 
-- 
2.7.4



[Mesa-dev] [PATCH 1/2] radeonsi/gfx9: fix importing shared textures with DCC

2017-11-29 Thread Marek Olšák
From: Marek Olšák 

VI has 11 dwords at least. GFX9 has 10 dwords.

Cc: 17.2 17.3 
---
 src/gallium/drivers/radeon/r600_texture.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeon/r600_texture.c 
b/src/gallium/drivers/radeon/r600_texture.c
index 1a0503b..86a2e1b 100644
--- a/src/gallium/drivers/radeon/r600_texture.c
+++ b/src/gallium/drivers/radeon/r600_texture.c
@@ -639,21 +639,21 @@ static void si_apply_opaque_metadata(struct si_screen 
*sscreen,
 struct radeon_bo_metadata *md)
 {
uint32_t *desc = &md->metadata[2];
 
if (sscreen->info.chip_class < VI)
return;
 
/* Return if DCC is enabled. The texture should be set up with it
 * already.
 */
-   if (md->size_metadata >= 11 * 4 &&
+   if (md->size_metadata >= 10 * 4 && /* at least 2(header) + 8(desc) 
dwords */
md->metadata[0] != 0 &&
md->metadata[1] == si_get_bo_metadata_word1(sscreen) &&
G_008F28_COMPRESSION_EN(desc[6])) {
rtex->dcc_offset = (uint64_t)desc[7] << 8;
return;
}
 
/* Disable DCC. These are always set by texture_from_handle and must
 * be cleared here.
 */
-- 
2.7.4
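
The size check in this patch boils down to simple dword arithmetic, which
can be sketched as follows. The macro and function names are hypothetical
and exist only for this example; the 2 + 8 split comes from the comment in
the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch only: names invented for illustration. */
#define METADATA_HEADER_DWORDS 2
#define METADATA_DESC_DWORDS   8

/* True when the opaque metadata is large enough to hold the two header
 * dwords plus the 8-dword descriptor, i.e. at least 10 * 4 = 40 bytes.
 */
static bool
metadata_has_descriptor(uint32_t size_metadata_bytes)
{
   return size_metadata_bytes >=
          (METADATA_HEADER_DWORDS + METADATA_DESC_DWORDS) * 4;
}

/* The DCC offset is stored in descriptor dword 7 in units of 256 bytes. */
static uint64_t
dcc_offset_from_desc(const uint32_t *desc)
{
   return (uint64_t)desc[7] << 8;
}
```

With the old threshold of 11 dwords, a 10-dword GFX9 blob (40 bytes) would
have failed the check even though the descriptor was present.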



Re: [Mesa-dev] [PATCH 2/4] i965/tex_image: Reference the renderbuffer miptree in setTexBuffer2

2017-11-29 Thread Jason Ekstrand
On Wed, Nov 29, 2017 at 4:42 PM, Chad Versace 
wrote:

> On Tue 28 Nov 2017, Jason Ekstrand wrote:
> >
> >
> > On Tue, Nov 21, 2017 at 3:05 PM, Chad Versace <[1]
> chadvers...@chromium.org>
> > wrote:
>
> > > @@ -442,7 +443,6 @@ intelSetTexBuffer2(__DRIcontext *pDRICtx,
> GLint
> > target,
> > > struct gl_texture_object *texObj;
> > > struct gl_texture_image *texImage;
> > > mesa_format texFormat = MESA_FORMAT_NONE;
> > > -   struct intel_mipmap_tree *mt;
> > > GLenum internal_format = 0;
> > >
> > > texObj = _mesa_get_current_tex_object(ctx, target);
> > > @@ -464,31 +464,24 @@ intelSetTexBuffer2(__DRIcontext *pDRICtx,
> GLint
> > target,
> > > if (rb->mt->cpp == 4) {
> > >if (texture_format == __DRI_TEXTURE_FORMAT_RGB) {
> > >   internal_format = GL_RGB;
> > > - texFormat = MESA_FORMAT_B8G8R8X8_UNORM;
> > > + texFormat = MESA_FORMAT_B8G8R8A8_UNORM;
> >
> > Why replace rgbx with rgba? I suspect the replace is due to the same
> > reasons explained in intel_miptree_create_for_dri_image(). Whatever
> the
> > reasons are, they're subtle and deserve a comment.
> >
> >
> > I believe your fears go away if you re-order things and put 3 before 2.
> Why
> > RGBA instead of RGBX?  Because the underlying miptree of the
> renderbuffer is
> > likely to have that format.  That said, it's not actually guaranteed so
> making
> > that change in this patch is a bit bogus.  If we just make the change in
> 2
> > instead, I believe all bogosity is gone.
>
> If I conceptually place patch 3 before patch 2, I see the correctness of
> everything. That makes this patch (patch 2)
>
> Reviewed-by: Chad Versace 
>
> If you choose to fidget the code a little in this patch due to my
> complaint, my rb still stands.
>

I sent 6 patches yesterday with some fidgeting.  No, I did not use a
spinner.

--Jason


Re: [Mesa-dev] [PATCH 2/4] i965/tex_image: Reference the renderbuffer miptree in setTexBuffer2

2017-11-29 Thread Chad Versace
On Tue 28 Nov 2017, Jason Ekstrand wrote:
> 
> 
> On Tue, Nov 21, 2017 at 3:05 PM, Chad Versace <[1]chadvers...@chromium.org>
> wrote:

> > @@ -442,7 +443,6 @@ intelSetTexBuffer2(__DRIcontext *pDRICtx, GLint
> target,
> >     struct gl_texture_object *texObj;
> >     struct gl_texture_image *texImage;
> >     mesa_format texFormat = MESA_FORMAT_NONE;
> > -   struct intel_mipmap_tree *mt;
> >     GLenum internal_format = 0;
> >
> >     texObj = _mesa_get_current_tex_object(ctx, target);
> > @@ -464,31 +464,24 @@ intelSetTexBuffer2(__DRIcontext *pDRICtx, GLint
> target,
> >     if (rb->mt->cpp == 4) {
> >        if (texture_format == __DRI_TEXTURE_FORMAT_RGB) {
> >           internal_format = GL_RGB;
> > -         texFormat = MESA_FORMAT_B8G8R8X8_UNORM;
> > +         texFormat = MESA_FORMAT_B8G8R8A8_UNORM;
> 
> Why replace rgbx with rgba? I suspect the replace is due to the same
> reasons explained in intel_miptree_create_for_dri_image(). Whatever the
> reasons are, they're subtle and deserve a comment.
> 
> 
> I believe your fears go away if you re-order things and put 3 before 2.  Why
> RGBA instead of RGBX?  Because the underlying miptree of the renderbuffer is
> likely to have that format.  That said, it's not actually guaranteed so making
> that change in this patch is a bit bogus.  If we just make the change in 2
> instead, I believe all bogosity is gone.

If I conceptually place patch 3 before patch 2, I see the correctness of
everything. That makes this patch (patch 2)

Reviewed-by: Chad Versace 

If you choose to fidget the code a little in this patch due to my complaint,
my rb still stands.


Re: [Mesa-dev] Refactored st/omx/tizonia commits

2017-11-29 Thread Dylan Baker
Quoting Eric Engestrom (2017-11-29 07:19:02)
> On Wednesday, 2017-11-29 09:32:09 +0530, Gurkirpal Singh wrote:
> > These are the refactored commits related to the GSoC project involving
> > adding a st/omx state tracker using tizonia.
> > There are still some parts of code that I didn't refactor yet, as
> > explained below:
> > 1) I wasn't sure if it's okay to use #if-#else directives for function
> > declarations. For example, one function accepts omx_base_PortType and the
> > other one vid_dec_PrivateType.
> > 2) Because of the argument type differences, an excessive number of
> > #if-#else pairs would be needed.
> > So I decided to wait for review before making those changes.
> 
> I notice you left the meson build system out; could you give it a stab?
> Feel free to ask me or Dylan for help if you get stuck :)

Do note that the meson omx code hasn't landed yet (hopefully that will happen
in the next day or two though).

One thing I'm not sure about there is how to handle the command line option to
enable the build. Currently it's `gallium-omx`, and accepts, 'true', 'false',
and 'auto'. It might make sense to make it more like the glx option, and accept
'auto', 'bellagio', 'disabled', and when this lands 'tizonia'.

Dylan




[Mesa-dev] [PATCH v4] nir: add varying component packing helpers

2017-11-29 Thread Timothy Arceri
v2: update shader info input/output masks when packing components
v3: make sure interpolation loc matches, this is required for the
radeonsi NIR backend.
v4: 33dca36f4f28 fixed nir_gather_info to update outputs_read
correctly, make sure we also adjust this correctly when
packing components.

Reviewed-by: Bas Nieuwenhuizen  (v1)
Reviewed-by: Nicolai Hähnle  (v3)
---
 src/compiler/nir/nir.h |   2 +
 src/compiler/nir/nir_linking_helpers.c | 330 +
 2 files changed, 332 insertions(+)

diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index 4c5d976a60d..83858afe148 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -2452,20 +2452,22 @@ void nir_lower_io_to_temporaries(nir_shader *shader,
  nir_function_impl *entrypoint,
  bool outputs, bool inputs);
 
 void nir_shader_gather_info(nir_shader *shader, nir_function_impl *entrypoint);
 
 void nir_assign_var_locations(struct exec_list *var_list, unsigned *size,
   int (*type_size)(const struct glsl_type *));
 
 /* Some helpers to do very simple linking */
 bool nir_remove_unused_varyings(nir_shader *producer, nir_shader *consumer);
+void nir_compact_varyings(nir_shader *producer, nir_shader *consumer,
+  bool default_to_smooth_interp);
 
 typedef enum {
/* If set, this forces all non-flat fragment shader inputs to be
 * interpolated as if with the "sample" qualifier.  This requires
 * nir_shader_compiler_options::use_interpolated_input_intrinsics.
 */
nir_lower_io_force_sample_interpolation = (1 << 1),
 } nir_lower_io_options;
 bool nir_lower_io(nir_shader *shader,
   nir_variable_mode modes,
diff --git a/src/compiler/nir/nir_linking_helpers.c 
b/src/compiler/nir/nir_linking_helpers.c
index 4d709c1b3c5..9f0122d4519 100644
--- a/src/compiler/nir/nir_linking_helpers.c
+++ b/src/compiler/nir/nir_linking_helpers.c
@@ -166,10 +166,340 @@ nir_remove_unused_varyings(nir_shader *producer, 
nir_shader *consumer)
 
bool progress = false;
progress = remove_unused_io_vars(producer, &producer->outputs, read,
 patches_read);
 
progress = remove_unused_io_vars(consumer, &consumer->inputs, written,
 patches_written) || progress;
 
return progress;
 }
+
+static uint8_t
+get_interp_type(nir_variable *var, bool default_to_smooth_interp)
+{
+   if (var->data.interpolation != INTERP_MODE_NONE)
+  return var->data.interpolation;
+   else if (default_to_smooth_interp)
+  return INTERP_MODE_SMOOTH;
+   else
+  return INTERP_MODE_NONE;
+}
+
+#define INTERPOLATE_LOC_SAMPLE 0
+#define INTERPOLATE_LOC_CENTROID 1
+#define INTERPOLATE_LOC_CENTER 2
+
+static uint8_t
+get_interp_loc(nir_variable *var)
+{
+   if (var->data.sample)
+  return INTERPOLATE_LOC_SAMPLE;
+   else if (var->data.centroid)
+  return INTERPOLATE_LOC_CENTROID;
+   else
+  return INTERPOLATE_LOC_CENTER;
+}
+
+static void
+get_slot_component_masks_and_interp_types(struct exec_list *var_list,
+  uint8_t *comps,
+  uint8_t *interp_type,
+  uint8_t *interp_loc,
+  gl_shader_stage stage,
+  bool default_to_smooth_interp)
+{
+   nir_foreach_variable_safe(var, var_list) {
+  assert(var->data.location >= 0);
+
+  /* Only remap things that aren't built-ins.
+   * TODO: add TES patch support.
+   */
+  if (var->data.location >= VARYING_SLOT_VAR0 &&
+  var->data.location - VARYING_SLOT_VAR0 < 32) {
+
+ const struct glsl_type *type = var->type;
+ if (nir_is_per_vertex_io(var, stage)) {
+assert(glsl_type_is_array(type));
+type = glsl_get_array_element(type);
+ }
+
+ unsigned location = var->data.location - VARYING_SLOT_VAR0;
+ unsigned elements =
+glsl_get_vector_elements(glsl_without_array(type));
+
+ bool dual_slot = glsl_type_is_dual_slot(glsl_without_array(type));
+ unsigned slots = glsl_count_attribute_slots(type, false);
+ unsigned comps_slot2 = 0;
+ for (unsigned i = 0; i < slots; i++) {
+interp_type[location + i] =
+   get_interp_type(var, default_to_smooth_interp);
+interp_loc[location + i] = get_interp_loc(var);
+
+if (dual_slot) {
+   if (i & 1) {
+  comps[location + i] |= ((1 << comps_slot2) - 1);
+   } else {
+  unsigned num_comps = 4 - var->data.location_frac;
+  comps_slot2 = (elements * 2) - num_comps;
+
+  /* Assume ARB_enhanced_layouts packing rules for doubles */
+  
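
The hunk is cut off here in the archive. As a rough sketch of the
dual-slot arithmetic visible above, the component masks for a 64-bit
varying that spans two slots could be computed as below. The function name
is hypothetical, and the first-slot mask is an assumption, since that part
of the patch is not shown.

```c
#include <stdint.h>

/* Sketch only. `elements` is the vector width of a dual-slot type
 * (3 or 4 for dvec3/dvec4), each element using two 32-bit components.
 * Precondition: elements * 2 >= 4 - location_frac.
 */
static void
dual_slot_masks(unsigned elements, unsigned location_frac,
                uint8_t *slot1, uint8_t *slot2)
{
   /* 32-bit components still free in the first slot. */
   unsigned num_comps = 4 - location_frac;
   /* Whatever does not fit spills into the second slot. */
   unsigned comps_slot2 = (elements * 2) - num_comps;

   /* Assumption: the first slot is filled from location_frac upwards. */
   *slot1 = (uint8_t)(((1u << num_comps) - 1) << location_frac);
   /* Matches the `(1 << comps_slot2) - 1` mask in the patch. */
   *slot2 = (uint8_t)((1u << comps_slot2) - 1);
}
```

For a dvec3 at component 0 this gives a full first slot (0xf) and two
components in the second slot (0x3).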

[Mesa-dev] [PATCH v3] nir: add varying array splitting pass

2017-11-29 Thread Timothy Arceri
V2:
 - fix matrix support, non-array matrices were being skipped in v1

v3:
 - handle lowering of tcs output loads correctly
 - correctly mark indirect locations for either in or out not both
   when processing a stage.
 - use nir_src_copy() when lowering stores.
---
 src/compiler/Makefile.sources  |   1 +
 src/compiler/nir/meson.build   |   1 +
 src/compiler/nir/nir.h |   1 +
 src/compiler/nir/nir_lower_io_arrays_to_elements.c | 383 +
 4 files changed, 386 insertions(+)
 create mode 100644 src/compiler/nir/nir_lower_io_arrays_to_elements.c

diff --git a/src/compiler/Makefile.sources b/src/compiler/Makefile.sources
index 2ab8e163a26..c5094b7f198 100644
--- a/src/compiler/Makefile.sources
+++ b/src/compiler/Makefile.sources
@@ -219,20 +219,21 @@ NIR_FILES = \
nir/nir_lower_double_ops.c \
nir/nir_lower_drawpixels.c \
nir/nir_lower_global_vars_to_local.c \
nir/nir_lower_gs_intrinsics.c \
nir/nir_lower_load_const_to_scalar.c \
nir/nir_lower_locals_to_regs.c \
nir/nir_lower_idiv.c \
nir/nir_lower_indirect_derefs.c \
nir/nir_lower_int64.c \
nir/nir_lower_io.c \
+   nir/nir_lower_io_arrays_to_elements.c \
nir/nir_lower_io_to_temporaries.c \
nir/nir_lower_io_to_scalar.c \
nir/nir_lower_io_types.c \
nir/nir_lower_passthrough_edgeflags.c \
nir/nir_lower_patch_vertices.c \
nir/nir_lower_phis_to_scalar.c \
nir/nir_lower_regs_to_ssa.c \
nir/nir_lower_returns.c \
nir/nir_lower_samplers.c \
nir/nir_lower_samplers_as_deref.c \
diff --git a/src/compiler/nir/meson.build b/src/compiler/nir/meson.build
index e5c8326aa06..b61a07773d3 100644
--- a/src/compiler/nir/meson.build
+++ b/src/compiler/nir/meson.build
@@ -107,20 +107,21 @@ files_libnir = files(
   'nir_lower_double_ops.c',
   'nir_lower_drawpixels.c',
   'nir_lower_global_vars_to_local.c',
   'nir_lower_gs_intrinsics.c',
   'nir_lower_load_const_to_scalar.c',
   'nir_lower_locals_to_regs.c',
   'nir_lower_idiv.c',
   'nir_lower_indirect_derefs.c',
   'nir_lower_int64.c',
   'nir_lower_io.c',
+  'nir_lower_io_arrays_to_elements.c',
   'nir_lower_io_to_temporaries.c',
   'nir_lower_io_to_scalar.c',
   'nir_lower_io_types.c',
   'nir_lower_passthrough_edgeflags.c',
   'nir_lower_patch_vertices.c',
   'nir_lower_phis_to_scalar.c',
   'nir_lower_regs_to_ssa.c',
   'nir_lower_returns.c',
   'nir_lower_samplers.c',
   'nir_lower_samplers_as_deref.c',
diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index c01fa6707a4..4c5d976a60d 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -2486,20 +2486,21 @@ bool nir_lower_constant_initializers(nir_shader *shader,
  nir_variable_mode modes);
 
 bool nir_move_vec_src_uses_to_dest(nir_shader *shader);
 bool nir_lower_vec_to_movs(nir_shader *shader);
 void nir_lower_alpha_test(nir_shader *shader, enum compare_func func,
   bool alpha_to_one);
 bool nir_lower_alu_to_scalar(nir_shader *shader);
 bool nir_lower_load_const_to_scalar(nir_shader *shader);
 bool nir_lower_read_invocation_to_scalar(nir_shader *shader);
 bool nir_lower_phis_to_scalar(nir_shader *shader);
+void nir_lower_io_arrays_to_elements(nir_shader *producer, nir_shader 
*consumer);
 void nir_lower_io_to_scalar(nir_shader *shader, nir_variable_mode mask);
 void nir_lower_io_to_scalar_early(nir_shader *shader, nir_variable_mode mask);
 
 bool nir_lower_samplers(nir_shader *shader,
 const struct gl_shader_program *shader_program);
 bool nir_lower_samplers_as_deref(nir_shader *shader,
  const struct gl_shader_program 
*shader_program);
 
 typedef struct nir_lower_subgroups_options {
uint8_t subgroup_size;
diff --git a/src/compiler/nir/nir_lower_io_arrays_to_elements.c 
b/src/compiler/nir/nir_lower_io_arrays_to_elements.c
new file mode 100644
index 000..94b93e3ec91
--- /dev/null
+++ b/src/compiler/nir/nir_lower_io_arrays_to_elements.c
@@ -0,0 +1,383 @@
+/*
+ * Copyright © 2017 Timothy Arceri
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * 

Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals

2017-11-29 Thread Miguel Angel Vico


On Wed, 29 Nov 2017 16:28:15 -0500
Rob Clark  wrote:

> On Wed, Nov 29, 2017 at 2:41 PM, Miguel Angel Vico  
> wrote:
> > Many of you may already know, but James is going to be out for a few
> > weeks and I'll be taking over this in the meantime.
> >
> > See inline for comments.
> >
> > On Wed, 29 Nov 2017 09:33:29 -0800
> > Jason Ekstrand  wrote:
> >  
> >> On Sat, Nov 25, 2017 at 1:20 PM, Rob Clark  wrote:
> >>  
> >> > On Sat, Nov 25, 2017 at 12:46 PM, Jason Ekstrand 
> >> > wrote:  
> >> > > On November 24, 2017 09:29:43 Rob Clark  wrote:  
> >> > >>
> >> > >>
> >> > >> On Mon, Nov 20, 2017 at 8:11 PM, James Jones   
> >> > wrote:  
> >> > >>>
> >> > >>> As many here know at this point, I've been working on solving issues
> >> > >>> related
> >> > >>> to DMA-capable memory allocation for various devices for some time 
> >> > >>> now.
> >> > >>> I'd
> >> > >>> like to take this opportunity to apologize for the way I handled the 
> >> > >>>  
> >> > EGL  
> >> > >>> stream proposals.  I understand now that the development process  
> >> > followed  
> >> > >>> there was unacceptable to the community and likely offended many 
> >> > >>> great
> >> > >>> engineers.
> >> > >>>
> >> > >>> Moving forward, I attempted to reboot talks in a more constructive  
> >> > manner  
> >> > >>> with the generic allocator library proposals & discussion forum at 
> >> > >>> XDC
> >> > >>> 2016.
> >> > >>> Some great design ideas came out of that, and I've since been  
> >> > prototyping  
> >> > >>> some code to prove them out before bringing them back as official
> >> > >>> proposals.
> >> > >>> Again, I understand some people are growing concerned that I've been
> >> > >>> doing
> >> > >>> this off on the side in a github project that has primarily NVIDIA
> >> > >>> contributors.  My goal was only to avoid wasting everyone's time with
> >> > >>> unproven ideas.  The intent was never to dump the prototype code 
> >> > >>> as-is  
> >> > on  
> >> > >>> the community and presume acceptance. It's just a public research
> >> > >>> project.
> >> > >>>
> >> > >>> Now the prototyping is nearing completion, and I'd like to renew
> >> > >>> discussion
> >> > >>> on whether and how the new mechanisms can be integrated with the 
> >> > >>> Linux
> >> > >>> graphics stack.
> >> > >>>
> >> > >>> I'd be interested to know if more work is needed to demonstrate the
> >> > >>> usefulness of the new mechanisms, or whether people think they have  
> >> > value  
> >> > >>> at
> >> > >>> this point.
> >> > >>>
> >> > >>> After talking with people on the hallway track at XDC this year, I've
> >> > >>> heard
> >> > >>> several proposals for incorporating the new mechanisms:
> >> > >>>
> >> > >>> -Include ideas from the generic allocator design into GBM.  This 
> >> > >>> could
> >> > >>> take
> >> > >>> the form of designing a "GBM 2.0" API, or incrementally adding to the
> >> > >>> existing GBM API.
> >> > >>>
> >> > >>> -Develop a library to replace GBM.  The allocator prototype code 
> >> > >>> could  
> >> > be  
> >> > >>> massaged into something production worthy to jump start this process.
> >> > >>>
> >> > >>> -Develop a library that sits beside or on top of GBM, using GBM for
> >> > >>> low-level graphics buffer allocation, while supporting non-graphics
> >> > >>> kernel
> >> > >>> APIs directly.  The additional cross-device negotiation and sorting 
> >> > >>> of
> >> > >>> capabilities would be handled in this slightly higher-level API 
> >> > >>> before
> >> > >>> handing off to GBM and other APIs for actual allocation somehow.  
> >> > >>
> >> > >>
> >> > >> tbh, I kinda see GBM and $new_thing sitting side by side.. GBM is
> >> > >> still the "winsys" for running on "bare metal" (ie. kms).  And we
> >> > >> don't want to saddle $new_thing with aspects of that, but rather have
> >> > >> it focus on being the thing that in multiple-"device"[1] scenarious
> >> > >> figures out what sort of buffer can be allocated by who for sharing.
> >> > >> Ie $new_thing should really not care about winsys level things like
> >> > >> cursors or surfaces.. only buffers.
> >> > >>
> >> > >> The mesa implementation of $new_thing could sit on top of GBM,
> >> > >> although it could also just sit on top of the same internal APIs that
> >> > >> GBM sits on top of.  That is an implementation detail.  It could be
> >> > >> that GBM grows an API to return an instance of $new_thing for
> >> > >> use-cases that involve sharing a buffer with the GPU.  Or perhaps that
> >> > >> is exposed via some sort of EGL extension.  (We probably also need a
> >> > >> way to get an instance from libdrm (?) for display-only KMS drivers,
> >> > >> to cover cases like etnaviv sharing a buffer with a separate display
> >> > >> driver.)
> >> > >>
> >> > >> [1] where "devices" could be multiple GPUs or multiple APIs for one or
> 

Re: [Mesa-dev] [PATCH] radv: do not allocate CMASK or DCC for small surfaces

2017-11-29 Thread Dave Airlie
On 29 November 2017 at 23:48, Samuel Pitoiset  wrote:
> The idea is ported from RadeonSI, but using 512x512 instead of
> 256x256 seems slightly better. This improves dota2 performance
> by +2%.

I wonder if the threshold is different on some sort of GPU basis (mem bw).

But this seems like the best place to start.

Reviewed-by: Dave Airlie 
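
For readers skimming the thread, the heuristic in the quoted patch below can be read as the following standalone sketch. The struct and function names are illustrative stand-ins rather than the radv types, and the 512 cutoff is simply the value from the patch; whether it should vary per GPU is the open question raised above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the few image fields the check reads. */
struct image_info {
	uint32_t width;
	uint32_t height;
	uint32_t samples;
};

/* Cutoff taken from the patch under review (an assumption, not a tuned value). */
#define SMALL_SURFACE_DIM 512

/* True when a single-sample surface is small enough that the eliminate
 * pass may cost more than a fast clear saves. */
static bool
surface_is_too_small_for_fast_clear(const struct image_info *info)
{
	return info->samples <= 1 &&
	       info->width <= SMALL_SURFACE_DIM &&
	       info->height <= SMALL_SURFACE_DIM;
}
```

Making the cutoff a per-device value would only mean replacing the constant with a field lookup; the shape of the check would stay the same.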

>
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/amd/vulkan/radv_image.c | 10 ++
>  1 file changed, 10 insertions(+)
>
> diff --git a/src/amd/vulkan/radv_image.c b/src/amd/vulkan/radv_image.c
> index c241e369b9..1bf2fa12ed 100644
> --- a/src/amd/vulkan/radv_image.c
> +++ b/src/amd/vulkan/radv_image.c
> @@ -805,6 +805,16 @@ radv_image_alloc_htile(struct radv_image *image)
>  static inline bool
>  radv_image_can_enable_dcc_or_cmask(struct radv_image *image)
>  {
> +   if (image->info.samples <= 1 &&
> +   image->info.width <= 512 && image->info.height <= 512) {
> +   /* Do not enable CMASK or DCC for small surfaces where the cost
> +* of the eliminate pass can be higher than the benefit of fast
> +* clear. RadeonSI does this, but the image threshold is
> +* different.
> +*/
> +   return false;
> +   }
> +
> return image->usage & VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT &&
>(image->exclusive || image->queue_family_mask == 1);
>  }
> --
> 2.15.0
>


Re: [Mesa-dev] [RFC] r600/evergreen compute shader + glsl 4.30 support

2017-11-29 Thread Dave Airlie
On 29 November 2017 at 22:46, Gert Wollny  wrote:
> On Wednesday, 29.11.2017 at 14:36 +1000, Dave Airlie wrote:
>> This set of patches enables compute shaders on r600 and exposes GLSL
>> 4.30 support. They are pretty alpha level, but I'd like to land some
>> of them (maybe disabled) so I can avoid the rebasing fun with the
>> more intrusive ones.
>>
>> It is based on the previous ssbo support patch.
>>
>> It may not be stable, I have a few patches sitting on top locally
>> for flushing various things I want to figure out if they are required
>> or if I can fix things properly.
>>
>
> I ran the arb_compute_shader piglits on BARTS; the piglits
>
>basic-texelfetch
>border-color
>multiple-workgroups
>basic-uniform-access
>multiple-texture-reading
>simple-barrier
>
> result in GPU lockups and, consequently, fail. The other 20 tests pass.

Does the attached patch help with the lockups at all?

Dave.
From 3aa8b83628190d452639810d2dbaea8aae8a104d Mon Sep 17 00:00:00 2001
From: Dave Airlie 
Date: Fri, 3 Nov 2017 15:44:51 +1000
Subject: [PATCH] r600/cs more flushes

add some missing flushes
---
 src/gallium/drivers/r600/evergreen_compute.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/evergreen_compute.c b/src/gallium/drivers/r600/evergreen_compute.c
index 4c888a2..ec77bb0 100644
--- a/src/gallium/drivers/r600/evergreen_compute.c
+++ b/src/gallium/drivers/r600/evergreen_compute.c
@@ -799,13 +799,21 @@ static void compute_emit_cs(struct r600_context *rctx,
 	/* Emit dispatch state and dispatch packet */
 	evergreen_emit_dispatch(rctx, info);
 
+	radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 0, 0));
+	radeon_emit(cs, EVENT_TYPE(0x6));
+	radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 0, 0));
+	radeon_emit(cs, EVENT_TYPE(EVENT_TYPE_PS_PARTIAL_FLUSH) | EVENT_INDEX(4));
+	radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 0, 0));
+	radeon_emit(cs, EVENT_TYPE(EVENT_TYPE_CS_PARTIAL_FLUSH) | EVENT_INDEX(4));
 	if (rctx->cs_shader_state.shader->ir_type == PIPE_SHADER_IR_TGSI)
 		evergreen_emit_atomic_buffer_save(rctx, true, combined_atomics, &atomic_used_mask);
 	/* XXX evergreen_flush_emit() hardcodes the CP_COHER_SIZE to 0x
 	 */
 	rctx->b.flags |= R600_CONTEXT_INV_CONST_CACHE |
 		  R600_CONTEXT_INV_VERTEX_CACHE |
-	  R600_CONTEXT_INV_TEX_CACHE;
+	  R600_CONTEXT_INV_TEX_CACHE |
+		  R600_CONTEXT_PS_PARTIAL_FLUSH |
+		  R600_CONTEXT_FLUSH_AND_INV_CB;
 	r600_flush_emit(rctx);
 	rctx->b.flags = 0;
 
-- 
2.9.5



Re: [Mesa-dev] [PATCH 2/2] glsl: Fix gl_NormalScale.

2017-11-29 Thread Brian Paul


Reviewed-by: Brian Paul 


On 11/23/2017 01:48 PM, Fabian Bieler wrote:

GLSL shaders can access the normal scale factor with the built-in
gl_NormalScale.  Mesa's modelspace lighting optimization uses a different
normal scale factor than defined in the spec.  We have to take care not
to use this factor for gl_NormalScale.

Mesa already defines two separate states: state.normalScale and
state.internal.normalScale.  The first is used by the glsl compiler while the
latter is used by the fixed function T&L pipeline.  Previously the only
difference was some component swizzling.  With this commit state.normalScale
always uses the normal scale factor for eyespace lighting.
---
  src/mesa/main/light.c | 3 +++
  src/mesa/main/mtypes.h| 3 ++-
  src/mesa/program/prog_statevars.c | 2 +-
  3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/src/mesa/main/light.c b/src/mesa/main/light.c
index f52ed8e..67faf8a 100644
--- a/src/mesa/main/light.c
+++ b/src/mesa/main/light.c
@@ -1032,6 +1032,7 @@ static void
  update_modelview_scale( struct gl_context *ctx )
  {
 ctx->_ModelViewInvScale = 1.0F;
+   ctx->_ModelViewInvScaleEyespace = 1.0F;
 if (!_math_matrix_is_length_preserving(ctx->ModelviewMatrixStack.Top)) {
const GLfloat *m = ctx->ModelviewMatrixStack.Top->inv;
GLfloat f = m[2] * m[2] + m[6] * m[6] + m[10] * m[10];
@@ -1040,6 +1041,7 @@ update_modelview_scale( struct gl_context *ctx )
 ctx->_ModelViewInvScale = 1.0f / sqrtf(f);
else
 ctx->_ModelViewInvScale = sqrtf(f);
+  ctx->_ModelViewInvScaleEyespace = 1.0f / sqrtf(f);
 }
  }

@@ -1216,4 +1218,5 @@ _mesa_init_lighting( struct gl_context *ctx )
 ctx->_NeedEyeCoords = GL_FALSE;
 ctx->_ForceEyeCoords = GL_FALSE;
 ctx->_ModelViewInvScale = 1.0;
+   ctx->_ModelViewInvScaleEyespace = 1.0;
  }
diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index 4682e02..1bd0d2a 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -4953,7 +4953,8 @@ struct gl_context
 /** \name Derived state */
 GLbitfield _ImageTransferState;/**< bitwise-or of IMAGE_*_BIT flags */
 GLfloat _EyeZDir[3];
-   GLfloat _ModelViewInvScale;
+   GLfloat _ModelViewInvScale; /* may be for model- or eyespace lighting */
+   GLfloat _ModelViewInvScaleEyespace; /* always factor defined in spec */
 GLboolean _NeedEyeCoords;
 GLboolean _ForceEyeCoords;

diff --git a/src/mesa/program/prog_statevars.c 
b/src/mesa/program/prog_statevars.c
index 91178e3..b69895c 100644
--- a/src/mesa/program/prog_statevars.c
+++ b/src/mesa/program/prog_statevars.c
@@ -422,7 +422,7 @@ _mesa_fetch_state(struct gl_context *ctx, const 
gl_state_index state[],
return;

 case STATE_NORMAL_SCALE:
-  ASSIGN_4V(value, ctx->_ModelViewInvScale, 0, 0, 1);
+  ASSIGN_4V(value, ctx->_ModelViewInvScaleEyespace, 0, 0, 1);
return;

 case STATE_INTERNAL:





Re: [Mesa-dev] [PATCH 1/2] glsl: Match order of gl_LightSourceParameters elements.

2017-11-29 Thread Brian Paul

Reviewed-by: Brian Paul 

I think you could tag both of your patches for the stable branch.

On 11/23/2017 01:48 PM, Fabian Bieler wrote:

spotExponent and spotCosCutoff were swapped in the gl_builtin_uniform_element
struct.
Now the order matches across gl_builtin_uniform_element, glsl_struct_field and
the spec.
---
  src/compiler/glsl/builtin_variables.cpp | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/compiler/glsl/builtin_variables.cpp 
b/src/compiler/glsl/builtin_variables.cpp
index 00bc99d..a885f32 100644
--- a/src/compiler/glsl/builtin_variables.cpp
+++ b/src/compiler/glsl/builtin_variables.cpp
@@ -90,9 +90,9 @@ static const struct gl_builtin_uniform_element 
gl_LightSource_elements[] = {
  SWIZZLE_Y,
  SWIZZLE_Z,
  SWIZZLE_Z)},
-   {"spotCosCutoff", {STATE_LIGHT, 0, STATE_SPOT_DIRECTION}, SWIZZLE_},
-   {"spotCutoff", {STATE_LIGHT, 0, STATE_SPOT_CUTOFF}, SWIZZLE_},
 {"spotExponent", {STATE_LIGHT, 0, STATE_ATTENUATION}, SWIZZLE_},
+   {"spotCutoff", {STATE_LIGHT, 0, STATE_SPOT_CUTOFF}, SWIZZLE_},
+   {"spotCosCutoff", {STATE_LIGHT, 0, STATE_SPOT_DIRECTION}, SWIZZLE_},
 {"constantAttenuation", {STATE_LIGHT, 0, STATE_ATTENUATION}, SWIZZLE_},
 {"linearAttenuation", {STATE_LIGHT, 0, STATE_ATTENUATION}, SWIZZLE_},
 {"quadraticAttenuation", {STATE_LIGHT, 0, STATE_ATTENUATION}, 
SWIZZLE_},





Re: [Mesa-dev] [PATCH v2 09/32] anv: Require a dedicated allocation for modified images

2017-11-29 Thread Jason Ekstrand
On Wed, Nov 29, 2017 at 2:24 PM, Chad Versace 
wrote:

> On Tue 28 Nov 2017, Jason Ekstrand wrote:
> > This lets us set the BO tiling when we allocate the memory.  This is
> > required for GL to work properly.
> > ---
> >  src/intel/vulkan/anv_device.c | 53 ++
> +
> >  1 file changed, 49 insertions(+), 4 deletions(-)
> >
> > diff --git a/src/intel/vulkan/anv_device.c
> b/src/intel/vulkan/anv_device.c
> > index df929e4..d82d1f7 100644
> > --- a/src/intel/vulkan/anv_device.c
> > +++ b/src/intel/vulkan/anv_device.c
> > @@ -29,6 +29,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >
> >  #include "anv_private.h"
> >  #include "util/strtod.h"
> > @@ -1612,6 +1613,40 @@ VkResult anv_AllocateMemory(
> >                                    &mem->bo);
> >if (result != VK_SUCCESS)
> >   goto fail;
> > +
> > +  const VkMemoryDedicatedAllocateInfoKHR *dedicated_info =
> > + vk_find_struct_const(pAllocateInfo->pNext,
> MEMORY_DEDICATED_ALLOCATE_INFO_KHR);
> > +  if (dedicated_info && dedicated_info->image != VK_NULL_HANDLE) {
> > + ANV_FROM_HANDLE(anv_image, image, dedicated_info->image);
> > +
> > + /* For images using modifiers, we require a dedicated
> allocation
> > +  * and we set the BO tiling to match the tiling of the
> underlying
> > +  * modifier.  This is a bit unfortunate as this is completely
> > +  * pointless for Vulkan.  However, GL needs to be able to map
> things
> > +  * so it needs the tiling to be set.  The only way to do this
> in a
> > +  * non-racy way is to set the tiling in the creator of the BO
> so that
> > +  * makes it our job.
> > +  *
> > +  * One of these days, once the GL driver learns to not map
> things
> > +  * through the GTT in random places, we can drop this and start
> > +  * allowing multiple modified images in the same BO.
> > +  */
> > + if (image->drm_format_mod != DRM_FORMAT_MOD_INVALID) {
> > +assert(isl_drm_modifier_get_info(image->drm_format_mod)->tiling
> ==
> > +   image->planes[0].surface.isl.tiling);
> > +const uint32_t i915_tiling =
> > +   isl_tiling_to_i915_tiling(image->planes[0].surface.isl.
> tiling);
> > +int ret = anv_gem_set_tiling(device, mem->bo->gem_handle,
> > + image->planes[0].surface.isl.
> row_pitch,
> > + i915_tiling);
> > +if (ret) {
> > +   anv_bo_cache_release(device, &device->bo_cache, mem->bo);
> > +   return vk_errorf(device->instance, NULL,
> > +VK_ERROR_OUT_OF_DEVICE_MEMORY,
> > +"failed to set BO tiling: %m");
> > +}
> > + }
> > +  }
> > }
>
> This hunk of code is run only for non-imported memory. It's in the
> 'else' branch of `if (fd_info && fd_info->handleTypes)`. I believe it
> needs to be run in both branches. Otherwise, if an app creates
> a dma_buf, imports it into Vulkan, binds the image to the dma_buf, and
> afterwards imports the same dma_buf into GL as an EGLImage, then
> intel_create_image_from_fds_common() will fail to discover the bo's
> tiling.
>

Yes, that was a bit intentional because setting it any time other than
image creation is racy.  If we were going to set it on import as well, we
may as well just do that in the GL driver and call it a day.


> Everything else in this patch looks correct to me, even though it makes
> me sad.
>

Agreed. :(

--Jason


Re: [Mesa-dev] [PATCH] glx: add support for GLX_ARB_create_context_no_error (v2)

2017-11-29 Thread Brian Paul

On 11/29/2017 02:30 PM, Adam Jackson wrote:

On Wed, 2017-11-29 at 16:20 -0500, Adam Jackson wrote:

From: Grigori Goronzy 

v2: Only reject no-error contexts for too-old GL if we're actually
trying to create a no-error context (Adam Jackson)

Reviewed-by: Adam Jackson 


Note that the original series for this also updated st/glx. That patch
looks correct to me as far as it goes, but it does change
XMesaCreateContext's function signature, which seems rude. Someone who
actually cares about the st/glx code should make that decision.


I don't know if any apps directly use the XMesa interface (I hope not).
The alternatives to adding a new parameter to XMesaCreateContext() are
to add a new entrypoint, or just "poke" the new state into the
XMesaContext object from the call site.


-Brian



Re: [Mesa-dev] [PATCH 3/6] radeonsi: clear PIPE_IMAGE_ACCESS_WRITE when it's invalid to be on the safe side

2017-11-29 Thread Nicolai Hähnle

On 29.11.2017 15:29, Marek Olšák wrote:

On Wed, Nov 29, 2017 at 10:59 AM, Nicolai Hähnle  wrote:

On 28.11.2017 22:17, Marek Olšák wrote:


From: Marek Olšák 

---
   src/gallium/drivers/radeonsi/si_descriptors.c | 8 
   1 file changed, 8 insertions(+)

diff --git a/src/gallium/drivers/radeonsi/si_descriptors.c
b/src/gallium/drivers/radeonsi/si_descriptors.c
index 69371ea..471c93a 100644
--- a/src/gallium/drivers/radeonsi/si_descriptors.c
+++ b/src/gallium/drivers/radeonsi/si_descriptors.c
@@ -681,20 +681,28 @@ static void si_set_shader_image_desc(struct
si_context *ctx,
   view->format,
   view->u.buf.offset,
   view->u.buf.size, desc);
 si_set_buf_desc_address(res, view->u.buf.offset, desc +
4);
 } else {
 static const unsigned char swizzle[4] = { 0, 1, 2, 3 };
 struct r600_texture *tex = (struct r600_texture *)res;
 unsigned level = view->u.tex.level;
 unsigned width, height, depth, hw_level;
 bool uses_dcc = vi_dcc_enabled(tex, level);
+   unsigned access = view->access;
+
+   /* Clear the write flag when writes can't occur.
+* Note that DCC_DECOMPRESS for MSAA doesn't work in some
cases,
+* so we don't wanna trigger it.
+*/
+   if (tex->is_depth || tex->resource.b.b.nr_samples >= 2)
+   access &= ~PIPE_IMAGE_ACCESS_WRITE;



Shouldn't this rather be an assert()? Just removing the bit here won't stop
the application from attempting to write to it from a shader anyway.

We shouldn't be hitting that assert anyway, since we don't currently support
MSAA images.


Better safe than sorry. I think it's better to recover from a bad
scenario than to fail an assertion, considering that assertions aren't
even enabled in release builds. I'll add an assertion into the
conditional block.


Sounds good, thanks :)
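
One possible reading of the agreed approach (recover by clearing the bit, plus an assertion inside the conditional block) is sketched below. This is only an illustration: the flag values and the `tex_state` struct are local stand-ins, not the gallium or radeonsi definitions, and the actual follow-up patch may place the assertion differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for the gallium access flags; the values are assumptions. */
#define IMAGE_ACCESS_READ  (1u << 0)
#define IMAGE_ACCESS_WRITE (1u << 1)

/* Hypothetical subset of the texture state the check looks at. */
struct tex_state {
	bool is_depth;
	unsigned nr_samples;
};

static unsigned
sanitize_image_access(const struct tex_state *tex, unsigned access)
{
	if (tex->is_depth || tex->nr_samples >= 2) {
		/* Writes are not expected here; catch them in debug builds ... */
		assert(!(access & IMAGE_ACCESS_WRITE));
		/* ... and still recover when assertions are compiled out. */
		access &= ~IMAGE_ACCESS_WRITE;
	}
	return access;
}
```

With NDEBUG defined the assertion disappears and only the masking remains, which is the "recover from a bad scenario" behaviour argued for above.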




Marek




--
Learn how the world really is,
But never forget how it ought to be.


Re: [Mesa-dev] [PATCH v2 09/32] anv: Require a dedicated allocation for modified images

2017-11-29 Thread Chad Versace
On Tue 28 Nov 2017, Jason Ekstrand wrote:
> This lets us set the BO tiling when we allocate the memory.  This is
> required for GL to work properly.
> ---
>  src/intel/vulkan/anv_device.c | 53 
> +++
>  1 file changed, 49 insertions(+), 4 deletions(-)
> 
> diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
> index df929e4..d82d1f7 100644
> --- a/src/intel/vulkan/anv_device.c
> +++ b/src/intel/vulkan/anv_device.c
> @@ -29,6 +29,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "anv_private.h"
>  #include "util/strtod.h"
> @@ -1612,6 +1613,40 @@ VkResult anv_AllocateMemory(
>                                    &mem->bo);
>if (result != VK_SUCCESS)
>   goto fail;
> +
> +  const VkMemoryDedicatedAllocateInfoKHR *dedicated_info =
> + vk_find_struct_const(pAllocateInfo->pNext, 
> MEMORY_DEDICATED_ALLOCATE_INFO_KHR);
> +  if (dedicated_info && dedicated_info->image != VK_NULL_HANDLE) {
> + ANV_FROM_HANDLE(anv_image, image, dedicated_info->image);
> +
> + /* For images using modifiers, we require a dedicated allocation
> +  * and we set the BO tiling to match the tiling of the underlying
> +  * modifier.  This is a bit unfortunate as this is completely
> +  * pointless for Vulkan.  However, GL needs to be able to map things
> +  * so it needs the tiling to be set.  The only way to do this in a
> +  * non-racy way is to set the tiling in the creator of the BO so 
> that
> +  * makes it our job.
> +  *
> +  * One of these days, once the GL driver learns to not map things
> +  * through the GTT in random places, we can drop this and start
> +  * allowing multiple modified images in the same BO.
> +  */
> + if (image->drm_format_mod != DRM_FORMAT_MOD_INVALID) {
> +assert(isl_drm_modifier_get_info(image->drm_format_mod)->tiling 
> ==
> +   image->planes[0].surface.isl.tiling);
> +const uint32_t i915_tiling =
> +   
> isl_tiling_to_i915_tiling(image->planes[0].surface.isl.tiling);
> +int ret = anv_gem_set_tiling(device, mem->bo->gem_handle,
> + 
> image->planes[0].surface.isl.row_pitch,
> + i915_tiling);
> +if (ret) {
> +   anv_bo_cache_release(device, &device->bo_cache, mem->bo);
> +   return vk_errorf(device->instance, NULL,
> +VK_ERROR_OUT_OF_DEVICE_MEMORY,
> +"failed to set BO tiling: %m");
> +}
> + }
> +  }
> }

This hunk of code is run only for non-imported memory. It's in the
'else' branch of `if (fd_info && fd_info->handleTypes)`. I believe it
needs to be run in both branches. Otherwise, if an app creates
a dma_buf, imports it into Vulkan, binds the image to the dma_buf, and
afterwards imports the same dma_buf into GL as an EGLImage, then
intel_create_image_from_fds_common() will fail to discover the bo's
tiling.

Everything else in this patch looks correct to me, even though it makes
me sad.
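
The hunk under review starts by looking for a dedicated-allocation struct in the pNext chain. A minimal sketch of that kind of lookup follows, assuming a simplified chain header; `base_in` and `find_struct_const` are illustrative names and not Mesa's `vk_find_struct_const` macro.

```c
#include <stddef.h>

/* Hypothetical minimal mirror of a Vulkan-style chained-struct header. */
struct base_in {
	int s_type;
	const struct base_in *p_next;
};

/* Walk the chain and return the first struct with a matching type,
 * or NULL if the chain does not contain one. */
static const void *
find_struct_const(const void *start, int s_type)
{
	for (const struct base_in *s = start; s != NULL; s = s->p_next) {
		if (s->s_type == s_type)
			return s;
	}
	return NULL;
}
```

Because the lookup only depends on the chain, it could in principle be run in both the import and the non-import branch; whether the tiling should be set in both is the question discussed above.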


Re: [Mesa-dev] [PATCH v2 06/32] anv: Implement VK_EXT_external_memory_dma_buf

2017-11-29 Thread Jason Ekstrand
On Wed, Nov 29, 2017 at 1:44 PM, Chad Versace 
wrote:

> On Tue 28 Nov 2017, Jason Ekstrand wrote:
> > This is a modified version of the patch originally sent by Chad Versace.
> > The primary difference is that this version claims that OPAQUE_FD and
> > DMA_BUF are compatible handle types.
> > ---
> >  src/intel/vulkan/anv_device.c  | 13 ++---
> >  src/intel/vulkan/anv_extensions.py |  1 +
> >  src/intel/vulkan/anv_formats.c |  8 ++--
> >  3 files changed, 13 insertions(+), 9 deletions(-)
>
>
> > @@ -1641,9 +1641,8 @@ VkResult anv_GetMemoryFdKHR(
> >
> > assert(pGetFdInfo->sType == VK_STRUCTURE_TYPE_MEMORY_GET_
> FD_INFO_KHR);
> >
> > -   /* We support only one handle type. */
> > -   assert(pGetFdInfo->handleType ==
> > -  VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT_KHR);
> > +   assert(pGetFdInfo->handleType == 
> > VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT_KHR
> ||
> > +  pGetFdInfo->handleType == VK_EXTERNAL_MEMORY_HANDLE_
> TYPE_DMA_BUF_BIT_EXT);
> >
> > return anv_bo_cache_export(dev, &dev->bo_cache, mem->bo, pFd);
> >  }
>
> Same as in the radv patch, anv_GetMemoryFdPropertiesKHR must be updated
> for dma-buf.
>

Done.  See wip/vulkan-wsi-prime


Re: [Mesa-dev] [PATCH 1/8] nir: add varying array splitting pass

2017-11-29 Thread Timothy Arceri

On 29/11/17 21:47, Nicolai Hähnle wrote:

On 21.11.2017 04:28, Timothy Arceri wrote:

V2: fix matrix support, non-array matrices were being skipped in v1
---
  src/compiler/Makefile.sources  |   1 +
  src/compiler/nir/meson.build   |   1 +
  src/compiler/nir/nir.h |   1 +
  src/compiler/nir/nir_lower_io_arrays_to_elements.c | 373 
+

  4 files changed, 376 insertions(+)
  create mode 100644 src/compiler/nir/nir_lower_io_arrays_to_elements.c

diff --git a/src/compiler/Makefile.sources 
b/src/compiler/Makefile.sources

index 2ab8e163a2..c5094b7f19 100644
--- a/src/compiler/Makefile.sources
+++ b/src/compiler/Makefile.sources
@@ -219,20 +219,21 @@ NIR_FILES = \
  nir/nir_lower_double_ops.c \
  nir/nir_lower_drawpixels.c \
  nir/nir_lower_global_vars_to_local.c \
  nir/nir_lower_gs_intrinsics.c \
  nir/nir_lower_load_const_to_scalar.c \
  nir/nir_lower_locals_to_regs.c \
  nir/nir_lower_idiv.c \
  nir/nir_lower_indirect_derefs.c \
  nir/nir_lower_int64.c \
  nir/nir_lower_io.c \
+    nir/nir_lower_io_arrays_to_elements.c \
  nir/nir_lower_io_to_temporaries.c \
  nir/nir_lower_io_to_scalar.c \
  nir/nir_lower_io_types.c \
  nir/nir_lower_passthrough_edgeflags.c \
  nir/nir_lower_patch_vertices.c \
  nir/nir_lower_phis_to_scalar.c \
  nir/nir_lower_regs_to_ssa.c \
  nir/nir_lower_returns.c \
  nir/nir_lower_samplers.c \
  nir/nir_lower_samplers_as_deref.c \
diff --git a/src/compiler/nir/meson.build b/src/compiler/nir/meson.build
index e5c8326aa0..b61a07773d 100644
--- a/src/compiler/nir/meson.build
+++ b/src/compiler/nir/meson.build
@@ -107,20 +107,21 @@ files_libnir = files(
    'nir_lower_double_ops.c',
    'nir_lower_drawpixels.c',
    'nir_lower_global_vars_to_local.c',
    'nir_lower_gs_intrinsics.c',
    'nir_lower_load_const_to_scalar.c',
    'nir_lower_locals_to_regs.c',
    'nir_lower_idiv.c',
    'nir_lower_indirect_derefs.c',
    'nir_lower_int64.c',
    'nir_lower_io.c',
+  'nir_lower_io_arrays_to_elements.c',
    'nir_lower_io_to_temporaries.c',
    'nir_lower_io_to_scalar.c',
    'nir_lower_io_types.c',
    'nir_lower_passthrough_edgeflags.c',
    'nir_lower_patch_vertices.c',
    'nir_lower_phis_to_scalar.c',
    'nir_lower_regs_to_ssa.c',
    'nir_lower_returns.c',
    'nir_lower_samplers.c',
    'nir_lower_samplers_as_deref.c',
diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index f46f614711..c62af4afb9 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -2475,20 +2475,21 @@ bool 
nir_lower_constant_initializers(nir_shader *shader,

   nir_variable_mode modes);
  bool nir_move_vec_src_uses_to_dest(nir_shader *shader);
  bool nir_lower_vec_to_movs(nir_shader *shader);
  void nir_lower_alpha_test(nir_shader *shader, enum compare_func func,
    bool alpha_to_one);
  bool nir_lower_alu_to_scalar(nir_shader *shader);
  bool nir_lower_load_const_to_scalar(nir_shader *shader);
  bool nir_lower_read_invocation_to_scalar(nir_shader *shader);
  bool nir_lower_phis_to_scalar(nir_shader *shader);
+void nir_lower_io_arrays_to_elements(nir_shader *producer, nir_shader 
*consumer);
  void nir_lower_io_to_scalar(nir_shader *shader, nir_variable_mode 
mask);
  void nir_lower_io_to_scalar_early(nir_shader *shader, 
nir_variable_mode mask);

  bool nir_lower_samplers(nir_shader *shader,
  const struct gl_shader_program 
*shader_program);

  bool nir_lower_samplers_as_deref(nir_shader *shader,
   const struct gl_shader_program 
*shader_program);

  typedef struct nir_lower_subgroups_options {
 uint8_t subgroup_size;
diff --git a/src/compiler/nir/nir_lower_io_arrays_to_elements.c 
b/src/compiler/nir/nir_lower_io_arrays_to_elements.c

new file mode 100644
index 00..c41f300edb
--- /dev/null
+++ b/src/compiler/nir/nir_lower_io_arrays_to_elements.c
@@ -0,0 +1,373 @@
+/*
+ * Copyright © 2017 Timothy Arceri
+ *
+ * Permission is hereby granted, free of charge, to any person 
obtaining a
+ * copy of this software and associated documentation files (the 
"Software"),
+ * to deal in the Software without restriction, including without 
limitation
+ * the rights to use, copy, modify, merge, publish, distribute, 
sublicense,

+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including 
the next
+ * paragraph) shall be included in all copies or substantial portions 
of the

+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 
EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 
MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT 
SHALL
+ * THE AUTHORS OR COPYRIGHT 

Re: [Mesa-dev] [PATCH] glx: add support for GLX_ARB_create_context_no_error (v2)

2017-11-29 Thread Adam Jackson
On Wed, 2017-11-29 at 16:20 -0500, Adam Jackson wrote:
> From: Grigori Goronzy 
> 
> v2: Only reject no-error contexts for too-old GL if we're actually
> trying to create a no-error context (Adam Jackson)

D'oh, this is still busted, sorry for the noise. We're not saving the
no-error state in the context, so the share-context check will get it
wrong (no-error contexts would never be able to share). v3 to follow
once I figure out why vim isn't honoring editorconfig.

- ajax
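
To illustrate why the missing state matters, here is a rough sketch of a share-context check, assuming the rule is that both contexts must agree on the no-error setting. The `ctx` struct and function name are hypothetical and do not mirror the actual GLX context code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical context record; the real GLX context struct differs. */
struct ctx {
	bool no_error;
};

/* A new context may share with an existing one only if both agree
 * on the no-error setting. */
static bool
share_context_compatible(const struct ctx *share, bool want_no_error)
{
	if (share == NULL)
		return true; /* nothing to share with */
	return share->no_error == want_no_error;
}
```

If the flag is never stored at creation time, every existing context looks like `no_error == false`, so a request for a no-error context that shares with it is always rejected.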


Re: [Mesa-dev] [PATCH v2 07/32] radv: Implement VK_EXT_external_memory_dma_buf

2017-11-29 Thread Jason Ekstrand
On Wed, Nov 29, 2017 at 1:41 PM, Chad Versace 
wrote:

> On Tue 28 Nov 2017, Jason Ekstrand wrote:
> > ---
> >  src/amd/vulkan/radv_device.c  | 8 ++--
> >  src/amd/vulkan/radv_extensions.py | 1 +
> >  src/amd/vulkan/radv_formats.c | 8 ++--
> >  3 files changed, 13 insertions(+), 4 deletions(-)
> >
> > diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
> > index 8e5ae0b..4517227 100644
> > --- a/src/amd/vulkan/radv_device.c
> > +++ b/src/amd/vulkan/radv_device.c
> > @@ -2166,7 +2166,9 @@ VkResult radv_alloc_memory(VkDevice
> _device,
> >
> >   if (import_info) {
> >   assert(import_info->handleType ==
> > -VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT_KHR);
> > +VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT_KHR ||
> > +import_info->handleType ==
> > +VK_EXTERNAL_MEMORY_HANDLE_TYPE_DMA_BUF_BIT_EXT);
> >   mem->bo = device->ws->buffer_from_fd(device->ws,
> import_info->fd,
> >NULL, NULL);
> >   if (!mem->bo) {
> > @@ -3540,7 +3542,9 @@ VkResult radv_GetMemoryFdKHR(VkDevice _device,
> >
> >   /* We support only one handle type. */
>
> This comment needs updating. I suggest copy-pasting the comment from
> anvil.
>

Done.


> >   assert(pGetFdInfo->handleType ==
> > -VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT_KHR);
> > +VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT_KHR ||
> > +pGetFdInfo->handleType ==
> > +VK_EXTERNAL_MEMORY_HANDLE_TYPE_DMA_BUF_BIT_EXT);
> >
> >   bool ret = radv_get_memory_fd(device, memory, pFD);
> >   if (ret == false)
>
> Also, radv_GetMemoryFdPropertiesKHR() (the next function in this file)
> needs updating to support dma-buf.  Today it unconditionally fails.
>

This was missing from anv as well.  I've added it and will re-push the
branch in a minute.

For radv, I really don't know what to do here.  I'm sure there are some FD
properties I should probably query and then walk a list of types and fill
out some bits.  I have no idea how their memory handle interfaces work.
Dave?

--Jason


Re: [Mesa-dev] [PATCH v2 08/32] anv/image: Add a drm_format_mod field

2017-11-29 Thread Chad Versace
On Tue 28 Nov 2017, Jason Ekstrand wrote:
> At the moment, this is always initialized to DRM_FORMAT_MOD_INVALID.
> ---
>  src/intel/vulkan/anv_image.c   | 2 ++
>  src/intel/vulkan/anv_private.h | 5 +
>  2 files changed, 7 insertions(+)

This patch is
Reviewed-by: Chad Versace 


Re: [Mesa-dev] [PATCH v2 06/32] anv: Implement VK_EXT_external_memory_dma_buf

2017-11-29 Thread Chad Versace
On Tue 28 Nov 2017, Jason Ekstrand wrote:
> This is a modified version of the patch originally sent by Chad Versace.
> The primary difference is that this version claims that OPAQUE_FD and
> DMA_BUF are compatible handle types.
> ---
>  src/intel/vulkan/anv_device.c  | 13 ++---
>  src/intel/vulkan/anv_extensions.py |  1 +
>  src/intel/vulkan/anv_formats.c |  8 ++--
>  3 files changed, 13 insertions(+), 9 deletions(-)


> @@ -1641,9 +1641,8 @@ VkResult anv_GetMemoryFdKHR(
>  
> assert(pGetFdInfo->sType == VK_STRUCTURE_TYPE_MEMORY_GET_FD_INFO_KHR);
>  
> -   /* We support only one handle type. */
> -   assert(pGetFdInfo->handleType ==
> -  VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT_KHR);
> +   assert(pGetFdInfo->handleType == 
> VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT_KHR ||
> +  pGetFdInfo->handleType == 
> VK_EXTERNAL_MEMORY_HANDLE_TYPE_DMA_BUF_BIT_EXT);
>  
> return anv_bo_cache_export(dev, &dev->bo_cache, mem->bo, pFd);
>  }

Same as in the radv patch, anv_GetMemoryFdPropertiesKHR must be updated
for dma-buf.


Re: [Mesa-dev] [PATCH v2 07/32] radv: Implement VK_EXT_external_memory_dma_buf

2017-11-29 Thread Chad Versace
On Tue 28 Nov 2017, Jason Ekstrand wrote:
> ---
>  src/amd/vulkan/radv_device.c  | 8 ++--
>  src/amd/vulkan/radv_extensions.py | 1 +
>  src/amd/vulkan/radv_formats.c | 8 ++--
>  3 files changed, 13 insertions(+), 4 deletions(-)
> 
> diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
> index 8e5ae0b..4517227 100644
> --- a/src/amd/vulkan/radv_device.c
> +++ b/src/amd/vulkan/radv_device.c
> @@ -2166,7 +2166,9 @@ VkResult radv_alloc_memory(VkDevice 
>_device,
>  
>   if (import_info) {
>   assert(import_info->handleType ==
> -VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT_KHR);
> +VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT_KHR ||
> +import_info->handleType ==
> +VK_EXTERNAL_MEMORY_HANDLE_TYPE_DMA_BUF_BIT_EXT);
>   mem->bo = device->ws->buffer_from_fd(device->ws, 
> import_info->fd,
>NULL, NULL);
>   if (!mem->bo) {
> @@ -3540,7 +3542,9 @@ VkResult radv_GetMemoryFdKHR(VkDevice _device,
>  
>   /* We support only one handle type. */

This comment needs updating. I suggest copy-pasting the comment from
anvil.

>   assert(pGetFdInfo->handleType ==
> -VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT_KHR);
> +VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT_KHR ||
> +pGetFdInfo->handleType ==
> +VK_EXTERNAL_MEMORY_HANDLE_TYPE_DMA_BUF_BIT_EXT);
>  
>   bool ret = radv_get_memory_fd(device, memory, pFD);
>   if (ret == false)

Also, radv_GetMemoryFdPropertiesKHR() (the next function in this file)
needs updating to support dma-buf.  Today it unconditionally fails.
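
As a rough idea of what a non-failing properties query could look like, here is a self-contained sketch. Everything in it is an assumption made for illustration: the names are local stand-ins, the handle-type bits are copied by hand so no Vulkan headers are needed, and it assumes a dma-buf may be imported into any memory type the device exposes, which may not hold for radv.

```c
#include <stdint.h>

/* Local copies of two Vulkan handle-type bits (hand-copied, see lead-in). */
#define HANDLE_TYPE_OPAQUE_FD 0x00000001u
#define HANDLE_TYPE_DMA_BUF   0x00000200u

enum result {
	RESULT_SUCCESS = 0,
	RESULT_INVALID_EXTERNAL_HANDLE = -1
};

/* Hypothetical properties query: report memory types only for dma-buf fds
 * and reject every other handle type. */
static enum result
get_memory_fd_properties(uint32_t handle_type, uint32_t all_memory_types,
			 uint32_t *memory_type_bits)
{
	if (handle_type != HANDLE_TYPE_DMA_BUF)
		return RESULT_INVALID_EXTERNAL_HANDLE;

	*memory_type_bits = all_memory_types;
	return RESULT_SUCCESS;
}
```

A real implementation would presumably have to ask the winsys which memory types the imported buffer can actually live in rather than returning the full mask.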


Re: [Mesa-dev] [PATCH v2] configure: avoid testing for negative compiler options

2017-11-29 Thread Dylan Baker
Reviewed-by: Dylan Baker 

Quoting Marc Dietrich (2017-11-29 13:25:05)
> gcc seems to always accept unsupported negative compiler warning options:
> 
> echo "int i;" | gcc -c -xc -Wno-bob - # no error
> echo "int i;" | gcc -c -xc -Walice -  # unsupported compiler option
> 
> Inverting the options fixes the tests.
> 
> V2: fix options in meson build
> 
> Reviewed-by: Matt Turner 
> Signed-off-by: Marc Dietrich 
> ---
>  configure.ac |  6 --
>  meson.build  | 23 +++
>  2 files changed, 19 insertions(+), 10 deletions(-)
> 
> diff --git a/configure.ac b/configure.ac
> index 1344c12884..3f9a5c85b1 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -394,8 +394,10 @@ esac
>  AC_SUBST([VISIBILITY_CFLAGS])
>  AC_SUBST([VISIBILITY_CXXFLAGS])
>  
> -AX_CHECK_COMPILE_FLAG([-Wno-override-init],
> [WNO_OVERRIDE_INIT="$WNO_OVERRIDE_INIT -Wno-override-init"]) # gcc
> -AX_CHECK_COMPILE_FLAG([-Wno-initializer-overrides],
> [WNO_OVERRIDE_INIT="$WNO_OVERRIDE_INIT -Wno-initializer-overrides"]) # clang
> +dnl For some reason, the test for -Wno-foo always succeeds with gcc, even if 
> the
> +dnl option is not supported. Hence, check for -Wfoo instead.
> +AX_CHECK_COMPILE_FLAG([-Woverride-init],
> [WNO_OVERRIDE_INIT="$WNO_OVERRIDE_INIT -Wno-override-init"]) # gcc
> +AX_CHECK_COMPILE_FLAG([-Winitializer-overrides],
> [WNO_OVERRIDE_INIT="$WNO_OVERRIDE_INIT -Wno-initializer-overrides"]) # clang
>  AC_SUBST([WNO_OVERRIDE_INIT])
>  
>  dnl
> diff --git a/meson.build b/meson.build
> index 919f1c2d41..a55d5ed391 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -683,11 +683,25 @@ endif
>  cpp = meson.get_compiler('cpp')
>  cpp_args = []
>  foreach a : ['-Wall', '-fno-math-errno', '-fno-trapping-math',
> - '-Qunused-arguments', '-Wno-non-virtual-dtor']
> + '-Qunused-arguments']
>if cpp.has_argument(a)
>  cpp_args += a
>endif
>  endforeach
> +
> +# For some reason, the test for -Wno-foo always succeeds with gcc, even if the
> +# option is not supported. Hence, check for -Wfoo instead.
> +if cpp.has_argument('-Wnon-virtual-dtor')
> +  cpp_args += '-Wno-non-virtual-dtor'
> +endif
> +
> +no_override_init_args = []
> +foreach a : ['override-init', 'initializer-overrides']
> +  if cc.has_argument('-W' + a)
> +no_override_init_args += '-Wno-' + a
> +  endif
> +endforeach
> +
>  cpp_vis_args = []
>  if cpp.has_argument('-fvisibility=hidden')
>cpp_vis_args += '-fvisibility=hidden'
> @@ -707,13 +721,6 @@ foreach a : ['-Werror=pointer-arith', '-Werror=vla']
>endif
>  endforeach
>  
> -no_override_init_args = []
> -foreach a : ['-Wno-override-init', '-Wno-initializer-overrides']
> -  if cc.has_argument(a)
> -no_override_init_args += a
> -  endif
> -endforeach
> -
>  if host_machine.cpu_family().startswith('x86')
>pre_args += '-DHAVE_SSE41'
>with_sse41 = true
> -- 
> 2.15.0
> 


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] glx: add support for GLX_ARB_create_context_no_error (v2)

2017-11-29 Thread Adam Jackson
On Wed, 2017-11-29 at 16:20 -0500, Adam Jackson wrote:
> From: Grigori Goronzy 
> 
> v2: Only reject no-error contexts for too-old GL if we're actually
> trying to create a no-error context (Adam Jackson)
> 
> Reviewed-by: Adam Jackson 

Note that the original series for this also updated st/glx. That patch
looks correct to me as far as it goes, but it does change
XMesaCreateContext's function signature, which seems rude. Someone who
actually cares about the st/glx code should make that decision.

- ajax
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals

2017-11-29 Thread Rob Clark
On Wed, Nov 29, 2017 at 2:41 PM, Miguel Angel Vico  wrote:
> Many of you may already know, but James is going to be out for a few
> weeks and I'll be taking over this in the meantime.
>
> See inline for comments.
>
> On Wed, 29 Nov 2017 09:33:29 -0800
> Jason Ekstrand  wrote:
>
>> On Sat, Nov 25, 2017 at 1:20 PM, Rob Clark  wrote:
>>
>> > On Sat, Nov 25, 2017 at 12:46 PM, Jason Ekstrand 
>> > wrote:
>> > > On November 24, 2017 09:29:43 Rob Clark  wrote:
>> > >>
>> > >>
>> > >> On Mon, Nov 20, 2017 at 8:11 PM, James Jones 
>> > wrote:
>> > >>>
>> > >>> As many here know at this point, I've been working on solving issues
>> > >>> related
>> > >>> to DMA-capable memory allocation for various devices for some time now.
>> > >>> I'd
>> > >>> like to take this opportunity to apologize for the way I handled the
>> > EGL
>> > >>> stream proposals.  I understand now that the development process
>> > followed
>> > >>> there was unacceptable to the community and likely offended many great
>> > >>> engineers.
>> > >>>
>> > >>> Moving forward, I attempted to reboot talks in a more constructive
>> > manner
>> > >>> with the generic allocator library proposals & discussion forum at XDC
>> > >>> 2016.
>> > >>> Some great design ideas came out of that, and I've since been
>> > prototyping
>> > >>> some code to prove them out before bringing them back as official
>> > >>> proposals.
>> > >>> Again, I understand some people are growing concerned that I've been
>> > >>> doing
>> > >>> this off on the side in a github project that has primarily NVIDIA
>> > >>> contributors.  My goal was only to avoid wasting everyone's time with
>> > >>> unproven ideas.  The intent was never to dump the prototype code as-is
>> > on
>> > >>> the community and presume acceptance. It's just a public research
>> > >>> project.
>> > >>>
>> > >>> Now the prototyping is nearing completion, and I'd like to renew
>> > >>> discussion
>> > >>> on whether and how the new mechanisms can be integrated with the Linux
>> > >>> graphics stack.
>> > >>>
>> > >>> I'd be interested to know if more work is needed to demonstrate the
>> > >>> usefulness of the new mechanisms, or whether people think they have
>> > value
>> > >>> at
>> > >>> this point.
>> > >>>
>> > >>> After talking with people on the hallway track at XDC this year, I've
>> > >>> heard
>> > >>> several proposals for incorporating the new mechanisms:
>> > >>>
>> > >>> -Include ideas from the generic allocator design into GBM.  This could
>> > >>> take
>> > >>> the form of designing a "GBM 2.0" API, or incrementally adding to the
>> > >>> existing GBM API.
>> > >>>
>> > >>> -Develop a library to replace GBM.  The allocator prototype code could
>> > be
>> > >>> massaged into something production worthy to jump start this process.
>> > >>>
>> > >>> -Develop a library that sits beside or on top of GBM, using GBM for
>> > >>> low-level graphics buffer allocation, while supporting non-graphics
>> > >>> kernel
>> > >>> APIs directly.  The additional cross-device negotiation and sorting of
>> > >>> capabilities would be handled in this slightly higher-level API before
>> > >>> handing off to GBM and other APIs for actual allocation somehow.
>> > >>
>> > >>
>> > >> tbh, I kinda see GBM and $new_thing sitting side by side.. GBM is
>> > >> still the "winsys" for running on "bare metal" (ie. kms).  And we
>> > >> don't want to saddle $new_thing with aspects of that, but rather have
>> > >> it focus on being the thing that in multiple-"device"[1] scenarios
>> > >> figures out what sort of buffer can be allocated by who for sharing.
>> > >> Ie $new_thing should really not care about winsys level things like
>> > >> cursors or surfaces.. only buffers.
>> > >>
>> > >> The mesa implementation of $new_thing could sit on top of GBM,
>> > >> although it could also just sit on top of the same internal APIs that
>> > >> GBM sits on top of.  That is an implementation detail.  It could be
>> > >> that GBM grows an API to return an instance of $new_thing for
>> > >> use-cases that involve sharing a buffer with the GPU.  Or perhaps that
>> > >> is exposed via some sort of EGL extension.  (We probably also need a
>> > >> way to get an instance from libdrm (?) for display-only KMS drivers,
>> > >> to cover cases like etnaviv sharing a buffer with a separate display
>> > >> driver.)
>> > >>
>> > >> [1] where "devices" could be multiple GPUs or multiple APIs for one or
>> > >> more GPUs, but also includes non-GPU devices like camera, video
>> > >> decoder, "image processor" (which may or may not be part of camera),
>> > >> etc, etc
>> > >
>> > >
>> > > I'm not quite sure what I think about this.  I think I would like to
>> > > see $new_thing at least replace the guts of GBM. Whether GBM becomes a
>> > > wrapper around $new_thing or $new_thing implements the GBM API, I'm not
>> > > sure.  What I 

[Mesa-dev] [PATCH v2] configure: avoid testing for negative compiler options

2017-11-29 Thread Marc Dietrich
gcc seems to always accept unsupported negative compiler warning options:

echo "int i;" | gcc -c -xc -Wno-bob - # no error
echo "int i;" | gcc -c -xc -Walice -  # unsupported compiler option

Inverting the options fixes the tests.

V2: fix options in meson build

Reviewed-by: Matt Turner 
Signed-off-by: Marc Dietrich 
---
 configure.ac |  6 --
 meson.build  | 23 +++
 2 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/configure.ac b/configure.ac
index 1344c12884..3f9a5c85b1 100644
--- a/configure.ac
+++ b/configure.ac
@@ -394,8 +394,10 @@ esac
 AC_SUBST([VISIBILITY_CFLAGS])
 AC_SUBST([VISIBILITY_CXXFLAGS])
 
-AX_CHECK_COMPILE_FLAG([-Wno-override-init], [WNO_OVERRIDE_INIT="$WNO_OVERRIDE_INIT -Wno-override-init"]) # gcc
-AX_CHECK_COMPILE_FLAG([-Wno-initializer-overrides], [WNO_OVERRIDE_INIT="$WNO_OVERRIDE_INIT -Wno-initializer-overrides"]) # clang
+dnl For some reason, the test for -Wno-foo always succeeds with gcc, even if the
+dnl option is not supported. Hence, check for -Wfoo instead.
+AX_CHECK_COMPILE_FLAG([-Woverride-init], [WNO_OVERRIDE_INIT="$WNO_OVERRIDE_INIT -Wno-override-init"]) # gcc
+AX_CHECK_COMPILE_FLAG([-Winitializer-overrides], [WNO_OVERRIDE_INIT="$WNO_OVERRIDE_INIT -Wno-initializer-overrides"]) # clang
 AC_SUBST([WNO_OVERRIDE_INIT])
 
 dnl
diff --git a/meson.build b/meson.build
index 919f1c2d41..a55d5ed391 100644
--- a/meson.build
+++ b/meson.build
@@ -683,11 +683,25 @@ endif
 cpp = meson.get_compiler('cpp')
 cpp_args = []
 foreach a : ['-Wall', '-fno-math-errno', '-fno-trapping-math',
- '-Qunused-arguments', '-Wno-non-virtual-dtor']
+ '-Qunused-arguments']
   if cpp.has_argument(a)
 cpp_args += a
   endif
 endforeach
+
+# For some reason, the test for -Wno-foo always succeeds with gcc, even if the
+# option is not supported. Hence, check for -Wfoo instead.
+if cpp.has_argument('-Wnon-virtual-dtor')
+  cpp_args += '-Wno-non-virtual-dtor'
+endif
+
+no_override_init_args = []
+foreach a : ['override-init', 'initializer-overrides']
+  if cc.has_argument('-W' + a)
+no_override_init_args += '-Wno-' + a
+  endif
+endforeach
+
 cpp_vis_args = []
 if cpp.has_argument('-fvisibility=hidden')
   cpp_vis_args += '-fvisibility=hidden'
@@ -707,13 +721,6 @@ foreach a : ['-Werror=pointer-arith', '-Werror=vla']
   endif
 endforeach
 
-no_override_init_args = []
-foreach a : ['-Wno-override-init', '-Wno-initializer-overrides']
-  if cc.has_argument(a)
-no_override_init_args += a
-  endif
-endforeach
-
 if host_machine.cpu_family().startswith('x86')
   pre_args += '-DHAVE_SSE41'
   with_sse41 = true
-- 
2.15.0
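
The inversion described in the commit message can be sketched in C as a small
string helper: given the negative flag the build wants to pass, derive the
positive form to hand to the compiler probe. This is only an illustration of
the idea under that assumption; positive_probe_flag() is a hypothetical name
and is not part of the patch, autoconf, or meson.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Given a negative warning flag such as "-Wno-foo", write the positive
 * form "-Wfoo" into out. The positive form is the one to probe, since
 * gcc accepts unknown -Wno-* options without complaint.
 *
 * Returns 0 on success, -1 if flag is not of the form -Wno-<name> or if
 * out is too small.
 */
static int
positive_probe_flag(const char *flag, char *out, size_t out_size)
{
    static const char prefix[] = "-Wno-";
    const size_t prefix_len = sizeof(prefix) - 1;
    int n;

    assert(flag != NULL && out != NULL);

    if (strncmp(flag, prefix, prefix_len) != 0 || flag[prefix_len] == '\0')
        return -1;

    n = snprintf(out, out_size, "-W%s", flag + prefix_len);
    if (n < 0 || (size_t)n >= out_size)
        return -1;

    return 0;
}
```

A build script would probe the returned flag and, only if the compiler
accepts it, add the original -Wno-* flag to its argument list.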

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] glx: add support for GLX_ARB_create_context_no_error (v2)

2017-11-29 Thread Adam Jackson
From: Grigori Goronzy 

v2: Only reject no-error contexts for too-old GL if we're actually
trying to create a no-error context (Adam Jackson)

Reviewed-by: Adam Jackson 
---
 src/glx/dri2_glx.c  | 12 
 src/glx/dri3_glx.c  |  8 
 src/glx/dri_common.c| 52 -
 src/glx/dri_common.h|  5 +
 src/glx/drisw_glx.c |  3 +++
 src/glx/glxclient.h |  6 ++
 src/glx/glxextensions.c |  1 +
 src/glx/glxextensions.h |  1 +
 8 files changed, 87 insertions(+), 1 deletion(-)

diff --git a/src/glx/dri2_glx.c b/src/glx/dri2_glx.c
index 0f44635725..148e294202 100644
--- a/src/glx/dri2_glx.c
+++ b/src/glx/dri2_glx.c
@@ -263,6 +263,10 @@ dri2_create_context_attribs(struct glx_screen *base,
  , , , error))
   goto error_exit;
 
+   if (!dri2_check_no_error(flags, shareList, major_ver, error)) {
+  goto error_exit;
+   }
+
/* Check the renderType value */
if (!validate_renderType_against_config(config_base, renderType))
goto error_exit;
@@ -1165,6 +1169,14 @@ dri2BindExtensions(struct dri2_screen *psc, struct glx_display * priv,
  __glXEnableDirectExtension(>base,
 "GLX_ARB_create_context_robustness");
 
+  /* DRI2 version 3 is also required because
+   * GLX_ARB_create_context_no_error requires GLX_ARB_create_context.
+   */
+  if (psc->dri2->base.version >= 3
+  && strcmp(extensions[i]->name, __DRI2_NO_ERROR) == 0)
+ __glXEnableDirectExtension(>base,
+"GLX_ARB_create_context_no_error");
+
   /* DRI2 version 3 is also required because GLX_MESA_query_renderer
* requires GLX_ARB_create_context_profile.
*/
diff --git a/src/glx/dri3_glx.c b/src/glx/dri3_glx.c
index a10306fe32..54220ccca7 100644
--- a/src/glx/dri3_glx.c
+++ b/src/glx/dri3_glx.c
@@ -248,6 +248,10 @@ dri3_create_context_attribs(struct glx_screen *base,
  , , error))
   goto error_exit;
 
+   if (!dri2_check_no_error(flags, shareList, major_ver, error)) {
+  goto error_exit;
+   }
+
/* Check the renderType value */
if (!validate_renderType_against_config(config_base, render_type))
goto error_exit;
@@ -756,6 +760,10 @@ dri3_bind_extensions(struct dri3_screen *psc, struct glx_display * priv,
  __glXEnableDirectExtension(>base,
 "GLX_ARB_create_context_robustness");
 
+  if (strcmp(extensions[i]->name, __DRI2_NO_ERROR) == 0)
+ __glXEnableDirectExtension(>base,
+"GLX_ARB_create_context_no_error");
+
   if (strcmp(extensions[i]->name, __DRI2_RENDERER_QUERY) == 0) {
  psc->rendererQuery = (__DRI2rendererQueryExtension *) extensions[i];
  __glXEnableDirectExtension(>base, "GLX_MESA_query_renderer");
diff --git a/src/glx/dri_common.c b/src/glx/dri_common.c
index 3b82309fa2..4f9beb22b1 100644
--- a/src/glx/dri_common.c
+++ b/src/glx/dri_common.c
@@ -479,6 +479,7 @@ dri2_convert_glx_attribs(unsigned num_attribs, const uint32_t *attribs,
 {
unsigned i;
bool got_profile = false;
+   int no_error = 0;
uint32_t profile;
 
*major_ver = 1;
@@ -511,6 +512,9 @@ dri2_convert_glx_attribs(unsigned num_attribs, const uint32_t *attribs,
   case GLX_CONTEXT_FLAGS_ARB:
 *flags = attribs[i * 2 + 1];
 break;
+  case GLX_CONTEXT_OPENGL_NO_ERROR_ARB:
+no_error = attribs[i * 2 + 1];
+break;
   case GLX_CONTEXT_PROFILE_MASK_ARB:
 profile = attribs[i * 2 + 1];
 got_profile = true;
@@ -552,6 +556,10 @@ dri2_convert_glx_attribs(unsigned num_attribs, const uint32_t *attribs,
   }
}
 
+   if (no_error) {
+  *flags |= __DRI_CTX_FLAG_NO_ERROR;
+   }
+
if (!got_profile) {
   if (*major_ver > 3 || (*major_ver == 3 && *minor_ver >= 2))
 *api = __DRI_API_OPENGL_CORE;
@@ -592,7 +600,8 @@ dri2_convert_glx_attribs(unsigned num_attribs, const uint32_t *attribs,
/* Unknown flag value.
 */
if (*flags & ~(__DRI_CTX_FLAG_DEBUG | __DRI_CTX_FLAG_FORWARD_COMPATIBLE
-  | __DRI_CTX_FLAG_ROBUST_BUFFER_ACCESS)) {
+  | __DRI_CTX_FLAG_ROBUST_BUFFER_ACCESS
+  | __DRI_CTX_FLAG_NO_ERROR)) {
   *error = __DRI_CTX_ERROR_UNKNOWN_FLAG;
   return false;
}
@@ -617,4 +626,45 @@ dri2_convert_glx_attribs(unsigned num_attribs, const uint32_t *attribs,
return true;
 }
 
+_X_HIDDEN bool
+dri2_check_no_error(uint32_t flags, struct glx_context *share_context,
+int major, unsigned *error)
+{
+   Bool noError = flags & __DRI_CTX_FLAG_NO_ERROR;
+
+   /* The KHR_no_error specs say:
+*
+*Requires OpenGL ES 2.0 or OpenGL 2.0.
+*/
+   if (noError && major < 2) {
+  *error = __DRI_CTX_ERROR_UNKNOWN_ATTRIBUTE;
+  return false;
+   }
+
+   /* The 

Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals

2017-11-29 Thread Rob Clark
On Wed, Nov 29, 2017 at 12:33 PM, Jason Ekstrand  wrote:
> On Sat, Nov 25, 2017 at 1:20 PM, Rob Clark  wrote:
>>
>> On Sat, Nov 25, 2017 at 12:46 PM, Jason Ekstrand 
>> wrote:
>> > On November 24, 2017 09:29:43 Rob Clark  wrote:
>> >>
>> >>
>> >> On Mon, Nov 20, 2017 at 8:11 PM, James Jones 
>> >> wrote:
>> >>>
>> >>> As many here know at this point, I've been working on solving issues
>> >>> related
>> >>> to DMA-capable memory allocation for various devices for some time
>> >>> now.
>> >>> I'd
>> >>> like to take this opportunity to apologize for the way I handled the
>> >>> EGL
>> >>> stream proposals.  I understand now that the development process
>> >>> followed
>> >>> there was unacceptable to the community and likely offended many great
>> >>> engineers.
>> >>>
>> >>> Moving forward, I attempted to reboot talks in a more constructive
>> >>> manner
>> >>> with the generic allocator library proposals & discussion forum at XDC
>> >>> 2016.
>> >>> Some great design ideas came out of that, and I've since been
>> >>> prototyping
>> >>> some code to prove them out before bringing them back as official
>> >>> proposals.
>> >>> Again, I understand some people are growing concerned that I've been
>> >>> doing
>> >>> this off on the side in a github project that has primarily NVIDIA
>> >>> contributors.  My goal was only to avoid wasting everyone's time with
>> >>> unproven ideas.  The intent was never to dump the prototype code as-is
>> >>> on
>> >>> the community and presume acceptance. It's just a public research
>> >>> project.
>> >>>
>> >>> Now the prototyping is nearing completion, and I'd like to renew
>> >>> discussion
>> >>> on whether and how the new mechanisms can be integrated with the Linux
>> >>> graphics stack.
>> >>>
>> >>> I'd be interested to know if more work is needed to demonstrate the
>> >>> usefulness of the new mechanisms, or whether people think they have
>> >>> value
>> >>> at
>> >>> this point.
>> >>>
>> >>> After talking with people on the hallway track at XDC this year, I've
>> >>> heard
>> >>> several proposals for incorporating the new mechanisms:
>> >>>
>> >>> -Include ideas from the generic allocator design into GBM.  This could
>> >>> take
>> >>> the form of designing a "GBM 2.0" API, or incrementally adding to the
>> >>> existing GBM API.
>> >>>
>> >>> -Develop a library to replace GBM.  The allocator prototype code could
>> >>> be
>> >>> massaged into something production worthy to jump start this process.
>> >>>
>> >>> -Develop a library that sits beside or on top of GBM, using GBM for
>> >>> low-level graphics buffer allocation, while supporting non-graphics
>> >>> kernel
>> >>> APIs directly.  The additional cross-device negotiation and sorting of
>> >>> capabilities would be handled in this slightly higher-level API before
>> >>> handing off to GBM and other APIs for actual allocation somehow.
>> >>
>> >>
>> >> tbh, I kinda see GBM and $new_thing sitting side by side.. GBM is
>> >> still the "winsys" for running on "bare metal" (ie. kms).  And we
>> >> don't want to saddle $new_thing with aspects of that, but rather have
>> >> it focus on being the thing that in multiple-"device"[1] scenarios
>> >> figures out what sort of buffer can be allocated by who for sharing.
>> >> Ie $new_thing should really not care about winsys level things like
>> >> cursors or surfaces.. only buffers.
>> >>
>> >> The mesa implementation of $new_thing could sit on top of GBM,
>> >> although it could also just sit on top of the same internal APIs that
>> >> GBM sits on top of.  That is an implementation detail.  It could be
>> >> that GBM grows an API to return an instance of $new_thing for
>> >> use-cases that involve sharing a buffer with the GPU.  Or perhaps that
>> >> is exposed via some sort of EGL extension.  (We probably also need a
>> >> way to get an instance from libdrm (?) for display-only KMS drivers,
>> >> to cover cases like etnaviv sharing a buffer with a separate display
>> >> driver.)
>> >>
>> >> [1] where "devices" could be multiple GPUs or multiple APIs for one or
>> >> more GPUs, but also includes non-GPU devices like camera, video
>> >> decoder, "image processor" (which may or may not be part of camera),
>> >> etc, etc
>> >
>> >
>> > I'm not quite sure what I think about this.  I think I would like to
>> > see $new_thing at least replace the guts of GBM. Whether GBM becomes a
>> > wrapper around $new_thing or $new_thing implements the GBM API, I'm not
>> > sure.  What I don't think I want is to see GBM development continuing on
>> > its own so we have two competing solutions.
>>
>> I don't really view them as competing.. there is *some* overlap, ie.
>> allocating a buffer.. but even if you are using GBM w/out $new_thing
>> you could allocate a buffer externally and import it.  I don't see
>> $new_thing as that much different from GBM PoV.
>>
>> But things 

Re: [Mesa-dev] [PATCH 1/5] xlib: remove empty GLX_NV_vertex_array_range stubs

2017-11-29 Thread Adam Jackson
On Wed, 2017-11-29 at 19:23 +, Emil Velikov wrote:
> From: Emil Velikov 
> 
> The extension was never implemented and seemingly never will.
> The DRI based libGL dropped support for it over 10 years ago.

Series is:

Reviewed-by: Adam Jackson 

There's some stubs in src/glx/glxcmds.c that can go in the bin too.

- ajax
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 103732] [swr] often gets stuck in piglit's glx-multi-context-single-window test

2017-11-29 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=103732

--- Comment #10 from Bruce Cherniak  ---
On Nov 29, 2017, at 7:50 AM, bugzilla-dae...@freedesktop.org wrote:

Comment # 9 on bug 103732 from Andrés Gómez García

(In reply to Bruce Cherniak from comment #8)
> The root cause to this bug was fixed in a post-17.2 patch (b9aa0fa7) "swr:
> Handle resource across context changes".  It's in mesa master and the
> forthcoming 17.3.
>
> The test still fails occasionally, but does not get stuck.

Wow! That was quick!

Thanks a lot, Bruce, should we mark as "ALREADYFIXED" or rename for the
occasional failure?

Also, should we pick b9aa0fa7 for the 17.2 stable queue? It seems to apply
clean ...

Yes, I do believe this is a good candidate for picking to the 17.2 stable
queue.  What do I need to do to enable that?

Thanks,
Bruce



-- 
You are receiving this mail because:
You are the QA Contact for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 06/32] anv: Implement VK_EXT_external_memory_dma_buf

2017-11-29 Thread Chad Versace
On Tue 28 Nov 2017, Jason Ekstrand wrote:
> This is a modified version of the patch originally sent by Chad Versace.
> The primary difference is that this version claims that OPAQUE_FD and
> DMA_BUF are compatible handle types.
> ---
>  src/intel/vulkan/anv_device.c  | 13 ++---
>  src/intel/vulkan/anv_extensions.py |  1 +
>  src/intel/vulkan/anv_formats.c |  8 ++--
>  3 files changed, 13 insertions(+), 9 deletions(-)



> @@ -923,6 +925,7 @@ VkResult anv_GetPhysicalDeviceImageFormatProperties2KHR(
> if (external_info && external_info->handleType != 0) {
>switch (external_info->handleType) {
>case VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT_KHR:
> +  case VK_EXTERNAL_MEMORY_HANDLE_TYPE_DMA_BUF_BIT_EXT:

I reflexively perceive this hunk as incorrect. My instinct screams "No!
We must allow binding VkImage to dma-buf-backed VkDeviceMemory *only*
when the VkImage is simple.". Of course, all the discussions around
VK_EXT_queue_family_foreign changed that requirement. It will take some
time to re-orient my intuition here.

>   if (external_props)
>  external_props->externalMemoryProperties = prime_fd_props;
>   break;
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/5] xlib: remove empty GLX_NV_vertex_array_range stubs

2017-11-29 Thread Brian Paul

Series looks OK to me.

Reviewed-by: Brian Paul 

On 11/29/2017 12:23 PM, Emil Velikov wrote:

From: Emil Velikov 

The extension was never implemented and seemingly never will.
The DRI based libGL dropped support for it over 10 years ago.

Cc: Brian Paul 
Cc: Ian Romanick 
Signed-off-by: Emil Velikov 
---
  src/mesa/drivers/x11/fakeglx.c | 26 --
  src/mesa/drivers/x11/glxapi.c  | 35 ---
  src/mesa/drivers/x11/glxapi.h  |  7 ---
  3 files changed, 68 deletions(-)

diff --git a/src/mesa/drivers/x11/fakeglx.c b/src/mesa/drivers/x11/fakeglx.c
index d2a099f9a20..8902a7cd667 100644
--- a/src/mesa/drivers/x11/fakeglx.c
+++ b/src/mesa/drivers/x11/fakeglx.c
@@ -2759,28 +2759,6 @@ Fake_glXSet3DfxModeMESA( int mode )



-/*** GLX_NV_vertex_array range ***/
-static void *
-Fake_glXAllocateMemoryNV( GLsizei size,
-  GLfloat readFrequency,
-  GLfloat writeFrequency,
-  GLfloat priority )
-{
-   (void) size;
-   (void) readFrequency;
-   (void) writeFrequency;
-   (void) priority;
-   return NULL;
-}
-
-
-static void
-Fake_glXFreeMemoryNV( GLvoid *pointer )
-{
-   (void) pointer;
-}
-
-
  /*** GLX_MESA_agp_offset ***/

  static GLuint
@@ -3009,10 +2987,6 @@ _mesa_GetGLXDispatchTable(void)
 /*** GLX_MESA_set_3dfx_mode ***/
 glx.Set3DfxModeMESA = Fake_glXSet3DfxModeMESA;

-   /*** GLX_NV_vertex_array_range ***/
-   glx.AllocateMemoryNV = Fake_glXAllocateMemoryNV;
-   glx.FreeMemoryNV = Fake_glXFreeMemoryNV;
-
 /*** GLX_MESA_agp_offset ***/
 glx.GetAGPOffsetMESA = Fake_glXGetAGPOffsetMESA;

diff --git a/src/mesa/drivers/x11/glxapi.c b/src/mesa/drivers/x11/glxapi.c
index 52e60265697..ff8b2b2ce16 100644
--- a/src/mesa/drivers/x11/glxapi.c
+++ b/src/mesa/drivers/x11/glxapi.c
@@ -1019,37 +1019,6 @@ glXSet3DfxModeMESA(int mode)



-/*** GLX_NV_vertex_array_range ***/
-
-void PUBLIC *
-glXAllocateMemoryNV( GLsizei size,
- GLfloat readFrequency,
- GLfloat writeFrequency,
- GLfloat priority )
-{
-   struct _glxapi_table *t;
-   Display *dpy = glXGetCurrentDisplay();
-   GET_DISPATCH(dpy, t);
-   if (!t)
-  return NULL;
-   return t->AllocateMemoryNV(size, readFrequency, writeFrequency, priority);
-}
-
-
-void PUBLIC
-glXFreeMemoryNV( GLvoid *pointer )
-{
-   struct _glxapi_table *t;
-   Display *dpy = glXGetCurrentDisplay();
-   GET_DISPATCH(dpy, t);
-   if (!t)
-  return;
-   t->FreeMemoryNV(pointer);
-}
-
-
-
-
  /*** GLX_MESA_agp_offset */

  GLuint PUBLIC
@@ -1288,10 +1257,6 @@ static struct name_address_pair GLX_functions[] = {
 /*** GLX_ARB_get_proc_address ***/
 { "glXGetProcAddressARB", (__GLXextFuncPtr) glXGetProcAddressARB },

-   /*** GLX_NV_vertex_array_range ***/
-   { "glXAllocateMemoryNV", (__GLXextFuncPtr) glXAllocateMemoryNV },
-   { "glXFreeMemoryNV", (__GLXextFuncPtr) glXFreeMemoryNV },
-
 /*** GLX_MESA_agp_offset ***/
 { "glXGetAGPOffsetMESA", (__GLXextFuncPtr) glXGetAGPOffsetMESA },

diff --git a/src/mesa/drivers/x11/glxapi.h b/src/mesa/drivers/x11/glxapi.h
index cc4f902925b..a4930b10dca 100644
--- a/src/mesa/drivers/x11/glxapi.h
+++ b/src/mesa/drivers/x11/glxapi.h
@@ -186,13 +186,6 @@ struct _glxapi_table {
 /*** GLX_MESA_set_3dfx_mode ***/
 Bool (*Set3DfxModeMESA)(int mode);

-   /*** GLX_NV_vertex_array_range ***/
-   void * (*AllocateMemoryNV)( GLsizei size,
-   GLfloat readFrequency,
-   GLfloat writeFrequency,
-   GLfloat priority );
-   void (*FreeMemoryNV)( GLvoid *pointer );
-
 /*** GLX_MESA_agp_offset ***/
 GLuint (*GetAGPOffsetMESA)( const GLvoid *pointer );




___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] configure: avoid testing for negative compiler options

2017-11-29 Thread Matt Turner
On Wed, Nov 29, 2017 at 5:47 AM, Marc Dietrich  wrote:
> gcc seems to always accept unsupported negative compiler warning options:
>
> echo "int i;" | gcc -c -xc -Wno-bob - # no error
> echo "int i;" | gcc -c -xc -Walice -  # unsupported compiler option
>
> Inverting the options fixes the tests.

Thanks for figuring this out.

Reviewed-by: Matt Turner 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/vec4: use a temp register to compute offsets for pull loads

2017-11-29 Thread Matt Turner
On Wed, Nov 29, 2017 at 2:49 AM, Iago Toral Quiroga  wrote:
> 64-bit pull loads are implemented by emitting 2 separate
> 32-bit pull load messages, where the second message loads from
> an offset at +16B.
>
> That addition of 16B to the original offset should not alter the
> original offset register used as source for the pull load instruction
> though, since the compiler might use that same offset register in other
> instructions (for example, for other pull loads in the shader code
> that take that same offset as reference).
>
> If the pull load is 32-bit then we only need to emit one message and
> we don't need to do offset calculations, but in that case the optimizer
> should be able to drop the redundant MOV.
>
> Fixes the following test on Haswell:
> KHR-GL45.gpu_shader_fp64.fp64.max_uniform_components
> ---
>  src/intel/compiler/brw_vec4_nir.cpp | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/src/intel/compiler/brw_vec4_nir.cpp b/src/intel/compiler/brw_vec4_nir.cpp
> index 0a1caa9fad..84f5b37a9d 100644
> --- a/src/intel/compiler/brw_vec4_nir.cpp
> +++ b/src/intel/compiler/brw_vec4_nir.cpp
> @@ -888,7 +888,9 @@ vec4_visitor::nir_emit_intrinsic(nir_intrinsic_instr *instr)
>if (const_offset) {
>   offset_reg = brw_imm_ud(const_offset->u32[0] & ~15);
>} else {
> - offset_reg = get_nir_src(instr->src[1], nir_type_uint32, 1);
> + offset_reg = src_reg(this, glsl_type::uint_type);
> + emit(MOV(dst_reg(offset_reg),
> +  get_nir_src(instr->src[1], nir_type_uint32, 1)));
>}
>

Nice find!

Reviewed-by: Matt Turner 
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103007
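
The hazard the patch fixes can be modelled on the CPU with a minimal C
sketch. This is not the vec4 backend; pull_load16() and pull_load_64bit()
are hypothetical names, and a plain byte buffer stands in for the uniform
buffer. The point is only that the +16B adjustment is applied to a
temporary copy, so the caller's offset value is still valid afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One "pull load message": fetch 16 bytes starting at offset. */
static void
pull_load16(const uint8_t *buf, size_t buf_size, uint32_t offset,
            uint8_t out[16])
{
    assert((size_t)offset + 16 <= buf_size);
    memcpy(out, buf + offset, 16);
}

/* A 64-bit load issued as two 16-byte messages, the second at +16B.
 * The offset is read through a const pointer and copied into tmp first
 * (the equivalent of the MOV added by the patch), so the caller's
 * offset "register" is never modified.
 */
static void
pull_load_64bit(const uint8_t *buf, size_t buf_size,
                const uint32_t *offset_reg, uint8_t out[32])
{
    uint32_t tmp = *offset_reg;

    pull_load16(buf, buf_size, tmp, out);
    tmp += 16;
    pull_load16(buf, buf_size, tmp, out + 16);
}
```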
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 08/29] anv/cmd_buffer: Recurse in transition_color_buffer instead of falling through

2017-11-29 Thread Jason Ekstrand
On Wed, Nov 29, 2017 at 12:01 PM, Jason Ekstrand 
wrote:

> On Wed, Nov 29, 2017 at 11:57 AM, Pohjolainen, Topi <
> topi.pohjolai...@gmail.com> wrote:
>
>> On Mon, Nov 27, 2017 at 07:05:58PM -0800, Jason Ekstrand wrote:
>> > ---
>> >  src/intel/vulkan/genX_cmd_buffer.c | 17 -
>> >  1 file changed, 8 insertions(+), 9 deletions(-)
>> >
>> > diff --git a/src/intel/vulkan/genX_cmd_buffer.c b/src/intel/vulkan/genX_cmd_buffer.c
>> > index 0c1ae83..be717eb 100644
>> > --- a/src/intel/vulkan/genX_cmd_buffer.c
>> > +++ b/src/intel/vulkan/genX_cmd_buffer.c
>> > @@ -719,20 +719,19 @@ transition_color_buffer(struct anv_cmd_buffer *cmd_buffer,
>> >if (image->samples == 1 &&
>> >image->planes[plane].aux_usage != ISL_AUX_USAGE_CCS_E &&
>> >final_layout != VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL) {
>> > - /* The CCS_D buffer may not be enabled in the final layout. Continue
>> > -  * executing this function to perform a resolve.
>> > + /* The CCS_D buffer may not be enabled in the final layout. Call this
>> > +  * function again with a initial layout of COLOR_ATTACHMENT_OPTIMAL
>> > +  * to perform a resolve.
>> >*/
>> >anv_perf_warn(cmd_buffer->device->instance, image,
>> >  "Performing an additional resolve for CCS_D layout "
>> >  "transition. Consider always leaving it on or "
>> >  "performing an ambiguation pass.");
>> > -  } else {
>> > - /* Writes in the final layout will be aware of the auxiliary buffer.
>> > -  * In addition, the clear buffer entries and the auxiliary buffers
>> > -  * have been populated with values that will result in correct
>> > -  * rendering.
>> > -  */
>> > - return;
>>
>> I must be missing something here. This now calls transition_color_buffer()
>> again also for the case that doesn't need resolves and after return goes
>> and falls thru and does resolves.
>>
>
> Yikes!  You're not missing anything.  I'm missing a return statement.
>

Upon further inspection, it appears to get added in the next patch.  I've
moved it to this one.


>> > + transition_color_buffer(cmd_buffer, image, aspect,
>> > + base_level, level_count,
>> > + base_layer, layer_count,
>> > + VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,
>> > + final_layout);
>> >}
>> > } else if (initial_layout != VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL) {
>> >/* Resolves are only necessary if the subresource may contain blocks
>> > --
>> > 2.5.0.400.gff86faf
>> >
>> > ___
>> > mesa-dev mailing list
>> > mesa-dev@lists.freedesktop.org
>> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>
>
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 08/29] anv/cmd_buffer: Recurse in transition_color_buffer instead of falling through

2017-11-29 Thread Jason Ekstrand
On Wed, Nov 29, 2017 at 11:57 AM, Pohjolainen, Topi <
topi.pohjolai...@gmail.com> wrote:

> On Mon, Nov 27, 2017 at 07:05:58PM -0800, Jason Ekstrand wrote:
> > ---
> >  src/intel/vulkan/genX_cmd_buffer.c | 17 -
> >  1 file changed, 8 insertions(+), 9 deletions(-)
> >
> > diff --git a/src/intel/vulkan/genX_cmd_buffer.c
> b/src/intel/vulkan/genX_cmd_buffer.c
> > index 0c1ae83..be717eb 100644
> > --- a/src/intel/vulkan/genX_cmd_buffer.c
> > +++ b/src/intel/vulkan/genX_cmd_buffer.c
> > @@ -719,20 +719,19 @@ transition_color_buffer(struct anv_cmd_buffer *cmd_buffer,
> >if (image->samples == 1 &&
> >image->planes[plane].aux_usage != ISL_AUX_USAGE_CCS_E &&
> >final_layout != VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL) {
> > - /* The CCS_D buffer may not be enabled in the final layout. Continue
> > -  * executing this function to perform a resolve.
> > + /* The CCS_D buffer may not be enabled in the final layout. Call this
> > +  * function again with a initial layout of COLOR_ATTACHMENT_OPTIMAL
> > +  * to perform a resolve.
> >*/
> >anv_perf_warn(cmd_buffer->device->instance, image,
> >  "Performing an additional resolve for CCS_D layout "
> >  "transition. Consider always leaving it on or "
> >  "performing an ambiguation pass.");
> > -  } else {
> > - /* Writes in the final layout will be aware of the auxiliary buffer.
> > -  * In addition, the clear buffer entries and the auxiliary buffers
> > -  * have been populated with values that will result in correct
> > -  * rendering.
> > -  */
> > - return;
>
> I must be missing something here. This now calls transition_color_buffer()
> again also for the case that doesn't need resolves and after return goes
> and falls thru and does resolves.
>

Yikes!  You're not missing anything.  I'm missing a return statement.


> > + transition_color_buffer(cmd_buffer, image, aspect,
> > + base_level, level_count,
> > + base_layer, layer_count,
> > + VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,
> > + final_layout);
> >}
> > } else if (initial_layout != VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL) {
> >/* Resolves are only necessary if the subresource may contain blocks
> > --
> > 2.5.0.400.gff86faf
> >
> > ___
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>


Re: [Mesa-dev] [PATCH 08/29] anv/cmd_buffer: Recurse in transition_color_buffer instead of falling through

2017-11-29 Thread Pohjolainen, Topi
On Mon, Nov 27, 2017 at 07:05:58PM -0800, Jason Ekstrand wrote:
> ---
>  src/intel/vulkan/genX_cmd_buffer.c | 17 -
>  1 file changed, 8 insertions(+), 9 deletions(-)
> 
> diff --git a/src/intel/vulkan/genX_cmd_buffer.c b/src/intel/vulkan/genX_cmd_buffer.c
> index 0c1ae83..be717eb 100644
> --- a/src/intel/vulkan/genX_cmd_buffer.c
> +++ b/src/intel/vulkan/genX_cmd_buffer.c
> @@ -719,20 +719,19 @@ transition_color_buffer(struct anv_cmd_buffer *cmd_buffer,
>if (image->samples == 1 &&
>image->planes[plane].aux_usage != ISL_AUX_USAGE_CCS_E &&
>final_layout != VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL) {
> - /* The CCS_D buffer may not be enabled in the final layout. Continue
> -  * executing this function to perform a resolve.
> + /* The CCS_D buffer may not be enabled in the final layout. Call this
> +  * function again with a initial layout of COLOR_ATTACHMENT_OPTIMAL
> +  * to perform a resolve.
>*/
>anv_perf_warn(cmd_buffer->device->instance, image,
>  "Performing an additional resolve for CCS_D layout "
>  "transition. Consider always leaving it on or "
>  "performing an ambiguation pass.");
> -  } else {
> - /* Writes in the final layout will be aware of the auxiliary buffer.
> -  * In addition, the clear buffer entries and the auxiliary buffers
> -  * have been populated with values that will result in correct
> -  * rendering.
> -  */
> - return;

I must be missing something here. This now calls transition_color_buffer()
again also for the case that doesn't need resolves and after return goes
and falls thru and does resolves.

> + transition_color_buffer(cmd_buffer, image, aspect,
> + base_level, level_count,
> + base_layer, layer_count,
> + VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,
> + final_layout);
>}
> } else if (initial_layout != VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL) {
>/* Resolves are only necessary if the subresource may contain blocks
> -- 
> 2.5.0.400.gff86faf
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] configure: avoid testing for negative compiler options

2017-11-29 Thread Dylan Baker
Quoting Marc Dietrich (2017-11-29 05:47:55)
> gcc seems to always accept unsupported negative compiler warning options:
> 
> echo "int i;" | gcc -c -xc -Wno-bob - # no error
> echo "int i;" | gcc -c -xc -Walice -  # unsupported compiler option
> 
> Inverting the options fixes the tests.
> 
> Signed-off-by: Marc Dietrich 
> ---
>  configure.ac | 4 ++--
>  meson.build  | 4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/configure.ac b/configure.ac
> index 1344c12884..c025d9c766 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -394,8 +394,8 @@ esac
>  AC_SUBST([VISIBILITY_CFLAGS])
>  AC_SUBST([VISIBILITY_CXXFLAGS])
>  
> -AX_CHECK_COMPILE_FLAG([-Wno-override-init], [WNO_OVERRIDE_INIT="$WNO_OVERRIDE_INIT -Wno-override-init"]) # gcc
> -AX_CHECK_COMPILE_FLAG([-Wno-initializer-overrides], [WNO_OVERRIDE_INIT="$WNO_OVERRIDE_INIT -Wno-initializer-overrides"]) # clang
> +AX_CHECK_COMPILE_FLAG([-Woverride-init], [WNO_OVERRIDE_INIT="$WNO_OVERRIDE_INIT -Wno-override-init"]) # gcc
> +AX_CHECK_COMPILE_FLAG([-Winitializer-overrides], [WNO_OVERRIDE_INIT="$WNO_OVERRIDE_INIT -Wno-initializer-overrides"]) # clang
>  AC_SUBST([WNO_OVERRIDE_INIT])

This is correct I think.

>  
>  dnl
> diff --git a/meson.build b/meson.build
> index bba9a292aa..e69ef6a14b 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -683,7 +683,7 @@ endif
>  cpp = meson.get_compiler('cpp')
>  cpp_args = []
>  foreach a : ['-Wall', '-fno-math-errno', '-fno-trapping-math',
> - '-Qunused-arguments', '-Wno-non-virtual-dtor']
> + '-Qunused-arguments', '-Wnon-virtual-dtor']
>if cpp.has_argument(a)
>  cpp_args += a
>endif

This inverts the meaning of the test: the loop would now add
-Wnon-virtual-dtor, which enables the warning instead of disabling it.
Instead, remove it from the list and do something like:

# gcc accepts unknown -Wno-* options, so probe the positive form
if cpp.has_argument('-Wnon-virtual-dtor')
  cpp_args += '-Wno-non-virtual-dtor'
endif

> @@ -708,7 +708,7 @@ foreach a : ['-Werror=pointer-arith', '-Werror=vla']
>  endforeach
>  
>  no_override_init_args = []
> -foreach a : ['-Wno-override-init', '-Wno-initializer-overrides']
> +foreach a : ['-Woverride-init', '-Winitializer-overrides']

Same here, but you can do something like:

# gcc accepts unknown -Wno-* options, so probe the positive form
foreach a : ['override-init', 'initializer-overrides']
  if cc.has_argument('-W@0@'.format(a))
    no_override_init_args += '-Wno-@0@'.format(a)
  endif
endforeach

Please be sure to add a comment explaining why this is necessary, preferably
in both places: in meson.build and in configure.ac.
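
To illustrate the idea outside of either build system, here is a minimal C sketch of the mapping: given the -Wno-foo flag you want to pass, derive the -Wfoo form to probe. The helper name positive_warning_flag is hypothetical and not part of Mesa, meson, or autoconf; it only shows the string transformation, not the compiler check itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: turn "-Wno-foo" into "-Wfoo" so the positive form
 * can be probed instead of the negative one. Returns false if the flag is
 * not a "-Wno-" option or does not fit into the output buffer.
 */
static bool
positive_warning_flag(const char *flag, char *out, size_t out_size)
{
   static const char prefix[] = "-Wno-";
   const size_t prefix_len = sizeof(prefix) - 1;

   assert(flag != NULL && out != NULL);

   /* Reject anything that is not "-Wno-<name>" with a non-empty name. */
   if (strncmp(flag, prefix, prefix_len) != 0 || flag[prefix_len] == '\0')
      return false;

   int written = snprintf(out, out_size, "-W%s", flag + prefix_len);
   return written > 0 && (size_t)written < out_size;
}
```

The build system would then test the derived flag and, only if the compiler accepts it, add the original -Wno- flag to the argument list.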

And thank you for looking into this. The warnings about those arguments were
annoying me to no end.

Dylan




Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals

2017-11-29 Thread Miguel Angel Vico
Many of you may already know, but James is going to be out for a few
weeks and I'll be taking over this in the meantime.

See inline for comments.

On Wed, 29 Nov 2017 09:33:29 -0800
Jason Ekstrand  wrote:

> On Sat, Nov 25, 2017 at 1:20 PM, Rob Clark wrote:
>
> > On Sat, Nov 25, 2017 at 12:46 PM, Jason Ekstrand wrote:
> > > On November 24, 2017 09:29:43 Rob Clark wrote:
> > >>
> > >>
> > >> On Mon, Nov 20, 2017 at 8:11 PM, James Jones wrote:
> > >>>
> > >>> As many here know at this point, I've been working on solving issues
> > >>> related to DMA-capable memory allocation for various devices for some
> > >>> time now. I'd like to take this opportunity to apologize for the way I
> > >>> handled the EGL stream proposals.  I understand now that the
> > >>> development process followed there was unacceptable to the community
> > >>> and likely offended many great engineers.
> > >>>
> > >>> Moving forward, I attempted to reboot talks in a more constructive
> > >>> manner with the generic allocator library proposals & discussion forum
> > >>> at XDC 2016. Some great design ideas came out of that, and I've since
> > >>> been prototyping some code to prove them out before bringing them back
> > >>> as official proposals. Again, I understand some people are growing
> > >>> concerned that I've been doing this off on the side in a github project
> > >>> that has primarily NVIDIA contributors.  My goal was only to avoid
> > >>> wasting everyone's time with unproven ideas.  The intent was never to
> > >>> dump the prototype code as-is on the community and presume acceptance.
> > >>> It's just a public research project.
> > >>>
> > >>> Now the prototyping is nearing completion, and I'd like to renew
> > >>> discussion on whether and how the new mechanisms can be integrated with
> > >>> the Linux graphics stack.
> > >>>
> > >>> I'd be interested to know if more work is needed to demonstrate the
> > >>> usefulness of the new mechanisms, or whether people think they have
> > >>> value at this point.
> > >>>
> > >>> After talking with people on the hallway track at XDC this year, I've
> > >>> heard several proposals for incorporating the new mechanisms:
> > >>>
> > >>> -Include ideas from the generic allocator design into GBM.  This could
> > >>> take the form of designing a "GBM 2.0" API, or incrementally adding to
> > >>> the existing GBM API.
> > >>>
> > >>> -Develop a library to replace GBM.  The allocator prototype code could
> > >>> be massaged into something production worthy to jump start this
> > >>> process.
> > >>>
> > >>> -Develop a library that sits beside or on top of GBM, using GBM for
> > >>> low-level graphics buffer allocation, while supporting non-graphics
> > >>> kernel APIs directly.  The additional cross-device negotiation and
> > >>> sorting of capabilities would be handled in this slightly higher-level
> > >>> API before handing off to GBM and other APIs for actual allocation
> > >>> somehow.
> > >>
> > >>
> > >> tbh, I kinda see GBM and $new_thing sitting side by side.. GBM is
> > >> still the "winsys" for running on "bare metal" (ie. kms).  And we
> > >> don't want to saddle $new_thing with aspects of that, but rather have
> > >> it focus on being the thing that in multiple-"device"[1] scenarios
> > >> figures out what sort of buffer can be allocated by who for sharing.
> > >> Ie $new_thing should really not care about winsys level things like
> > >> cursors or surfaces.. only buffers.
> > >>
> > >> The mesa implementation of $new_thing could sit on top of GBM,
> > >> although it could also just sit on top of the same internal APIs that
> > >> GBM sits on top of.  That is an implementation detail.  It could be
> > >> that GBM grows an API to return an instance of $new_thing for
> > >> use-cases that involve sharing a buffer with the GPU.  Or perhaps that
> > >> is exposed via some sort of EGL extension.  (We probably also need a
> > >> way to get an instance from libdrm (?) for display-only KMS drivers,
> > >> to cover cases like etnaviv sharing a buffer with a separate display
> > >> driver.)
> > >>
> > >> [1] where "devices" could be multiple GPUs or multiple APIs for one or
> > >> more GPUs, but also includes non-GPU devices like camera, video
> > >> decoder, "image processor" (which may or may not be part of camera),
> > >> etc, etc  
> > >
> > >
> > > I'm not quite sure what I think about this.  I think I would like to
> > > see $new_thing at least replace the guts of GBM. Whether GBM becomes a
> > > wrapper around $new_thing or $new_thing implements the GBM API, I'm not
> > > sure.  What I don't think I want is to see GBM development continuing on
> > > its own so we have two competing solutions.
> >
> > I don't really view them as 
