Re: [Mesa-dev] [PATCH] egl/wayland: break double/tripple buffering feedback loops

2019-01-15 Thread Pekka Paalanen
On Tue, 18 Dec 2018 18:59:10 +0100
Lucas Stach  wrote:

> On Tuesday, 18.12.2018 at 17:43 +, Emil Velikov wrote:
> > > On Tue, 18 Dec 2018 at 11:16, Lucas Stach  wrote:
> > > 
> > > Currently we dispose of any unneeded color buffers immediately if we detect that
> > > there are more unlocked buffers than we need. This can lead to feedback loops
> > > between the compositor and the application causing rapid toggling between
> > > double and triple buffering. Scenario: 2 buffers already queued to the
> > > compositor, egl/wayland allocates a new back buffer to avoid throttling,
> > > slowing down the frame, this allows the compositor to catch up and unlock
> > > both buffers, then EGL detects that there are more buffers than currently
> > > needed, freeing the buffer, restarting the loop shortly after.
> > > 
> > > To avoid wasting CPU time on rapidly freeing and reallocating color buffers
> > > break those feedback loops by letting the unneeded buffers sit around for a
> > > short while before disposing of them.
> > >   
> > > Signed-off-by: Lucas Stach
> > > ---
> > >  src/egl/drivers/dri2/platform_wayland.c | 13 +
> > >  1 file changed, 9 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/src/egl/drivers/dri2/platform_wayland.c b/src/egl/drivers/dri2/platform_wayland.c
> > > index 34e09d7ec167..3fa08c1639d1 100644
> > > --- a/src/egl/drivers/dri2/platform_wayland.c
> > > +++ b/src/egl/drivers/dri2/platform_wayland.c

...

> > >    if (dri2_surf->back)
> > > @@ -615,10 +617,13 @@ update_buffers(struct dri2_egl_surface *dri2_surf)
> > > 
> > > /* If we have an extra unlocked buffer at this point, we had to do triple
> > >  * buffering for a while, but now can go back to just double buffering.
> > > -* That means we can free any unlocked buffer now. */
> > > +* That means we can free any unlocked buffer now. To avoid toggling between
> > > +* going back to double buffering and needing to allocate third buffer too
> > > +* fast we let the unneeded buffer sit around for a short while. */
> > > for (int i = 0; i < ARRAY_SIZE(dri2_surf->color_buffers); i++) {
> > >    if (!dri2_surf->color_buffers[i].locked &&
> > > -  dri2_surf->color_buffers[i].wl_buffer) {
> > > +  dri2_surf->color_buffers[i].wl_buffer &&
> > > +  dri2_surf->color_buffers[i].age > 18) {  
> > 
> > The age check here seems strange - both number used and its relation
> > to double/triple buffering.
> > Have you considered tracking/checking how many buffers we have?  
> 
> A hysteresis value of 18 is just something that worked well in
> practice. It didn't appear to defer the buffer destruction for too long
>  while keeping the feedback loop well under control.

Hi,

it would be really nice if there was a code comment explaining where
the magic 18 comes from. There is no way to deduce it from the code
alone.


Thanks,
pq


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] egl/wayland: break double/tripple buffering feedback loops

2019-01-15 Thread Emil Velikov
On Tue, 18 Dec 2018 at 17:59, Lucas Stach  wrote:
>
> On Tuesday, 18.12.2018 at 17:43 +, Emil Velikov wrote:
> > > On Tue, 18 Dec 2018 at 11:16, Lucas Stach  wrote:
> > >
> > > > Currently we dispose of any unneeded color buffers immediately if we detect that
> > > > there are more unlocked buffers than we need. This can lead to feedback loops
> > > > between the compositor and the application causing rapid toggling between
> > > > double and triple buffering. Scenario: 2 buffers already queued to the
> > > > compositor, egl/wayland allocates a new back buffer to avoid throttling,
> > > > slowing down the frame, this allows the compositor to catch up and unlock
> > > > both buffers, then EGL detects that there are more buffers than currently
> > > > needed, freeing the buffer, restarting the loop shortly after.
> > > >
> > > > To avoid wasting CPU time on rapidly freeing and reallocating color buffers
> > > > break those feedback loops by letting the unneeded buffers sit around for a
> > > > short while before disposing of them.
> > >
> > > > Signed-off-by: Lucas Stach
> > > ---
> > >  src/egl/drivers/dri2/platform_wayland.c | 13 +
> > >  1 file changed, 9 insertions(+), 4 deletions(-)
> > >
> > > > diff --git a/src/egl/drivers/dri2/platform_wayland.c b/src/egl/drivers/dri2/platform_wayland.c
> > > index 34e09d7ec167..3fa08c1639d1 100644
> > > --- a/src/egl/drivers/dri2/platform_wayland.c
> > > +++ b/src/egl/drivers/dri2/platform_wayland.c
> > > @@ -474,15 +474,17 @@ get_back_bo(struct dri2_egl_surface *dri2_surf)
> > > > wl_display_dispatch_queue_pending(dri2_dpy->wl_dpy, dri2_surf->wl_queue);
> > >
> > > while (dri2_surf->back == NULL) {
> > > +  int age = 0;
> > >for (int i = 0; i < ARRAY_SIZE(dri2_surf->color_buffers); i++) {
> > >   /* Get an unlocked buffer, preferably one with a dri_buffer
> > >* already allocated. */
> > > - if (dri2_surf->color_buffers[i].locked)
> > > > + if (dri2_surf->color_buffers[i].locked || dri2_surf->color_buffers[i].age < age)
> > > >  continue;
> > > >   if (dri2_surf->back == NULL)
> > > >  dri2_surf->back = &dri2_surf->color_buffers[i];
> > > > - else if (dri2_surf->back->dri_image == NULL)
> > > > + else if (dri2_surf->back->dri_image == NULL && dri2_surf->color_buffers[i].dri_image)
> > > >  dri2_surf->back = &dri2_surf->color_buffers[i];
> > > + age = dri2_surf->back->age;
> > >}
> > >
> >
> > AFAICT this is the wayland equivalent of
> > 4f1d27a406478d405eac6f9894ccc46a80034adb
> > Where the exact same logic/commit message applies.
>
> No it isn't. It's exactly what it says in the commit log. It's keeping
> > the triple buffer around for a bit, even if we don't strictly need it
> when the client is currently doing double buffering.
>
> When things are on the edge between double buffering being enough and
> sometimes a third buffer being needed to avoid stalling we would
> otherwise bounce rapidly between allocating and disposing the third
> buffer.
>
> The DRM platform has no such optimization and just keeps the third
> buffer around forever. This patch keeps the optimization in the Wayland
> platform, but adds a bit of hysteresis before disposing the buffer when
> > going from triple to double buffering to see if things are settling on
> double buffering.
>
Indeed, I misread things. Thanks for the correction.

> > > Can you please split that up? I'd imagine this isn't enough for the
> > > use case you had in mind?
> >
> > >if (dri2_surf->back)
> > > @@ -615,10 +617,13 @@ update_buffers(struct dri2_egl_surface *dri2_surf)
> > >
> > > > /* If we have an extra unlocked buffer at this point, we had to do triple
> > > >  * buffering for a while, but now can go back to just double buffering.
> > > > -* That means we can free any unlocked buffer now. */
> > > > +* That means we can free any unlocked buffer now. To avoid toggling between
> > > > +* going back to double buffering and needing to allocate third buffer too
> > > > +* fast we let the unneeded buffer sit around for a short while. */
> > > for (int i = 0; i < ARRAY_SIZE(dri2_surf->color_buffers); i++) {
> > >if (!dri2_surf->color_buffers[i].locked &&
> > > -  dri2_surf->color_buffers[i].wl_buffer) {
> > > +  dri2_surf->color_buffers[i].wl_buffer &&
> > > +  dri2_surf->color_buffers[i].age > 18) {
> >
> > > The age check here seems strange - both number used and its relation
> > > to double/triple buffering.
> > Have you considered tracking/checking how many buffers we have?
>
> A hysteresis value of 18 is just something that worked well in
> practice. It didn't appear to defer the buffer destruction for too long
>  while keeping the feedback loop well under control.
>
As Pekka pointed out, please document how the number was chosen.
Otherwise any change could lead to a regression.

Thanks

Re: [Mesa-dev] [PATCH] egl/wayland: break double/tripple buffering feedback loops

2019-01-15 Thread Daniel Stone
Hi,

On Tue, 18 Dec 2018 at 17:59, Lucas Stach  wrote:
> On Tuesday, 18.12.2018 at 17:43 +, Emil Velikov wrote:
> > > On Tue, 18 Dec 2018 at 11:16, Lucas Stach  wrote:
> > >   if (dri2_surf->back == NULL)
> > > >  dri2_surf->back = &dri2_surf->color_buffers[i];
> > > > - else if (dri2_surf->back->dri_image == NULL)
> > > > + else if (dri2_surf->back->dri_image == NULL && dri2_surf->color_buffers[i].dri_image)
> > > >  dri2_surf->back = &dri2_surf->color_buffers[i];
> > > + age = dri2_surf->back->age;
> > >}
> > >
> >
> > AFAICT this is the wayland equivalent of
> > 4f1d27a406478d405eac6f9894ccc46a80034adb
> > Where the exact same logic/commit message applies.
>
> No it isn't. It's exactly what it says in the commit log. It's keeping
> > the triple buffer around for a bit, even if we don't strictly need it
> when the client is currently doing double buffering.

Right - the crucial part in Derek's GBM commit was removing the
'break' and adding the extra conditional on age.

Derek's patch stabilises the age of buffers handed back to the user,
by always returning the oldest available buffer. That slightly
pessimises rendering if there is a 'free' buffer in the queue: if four
buffers are allocated, then we will always return a buffer from three
flips ago, maybe meaning more rendering work. It means that, barring
the client holding on to one buffer for unexpectedly long, the age of
the oldest buffer in the queue will never be greater than the queue
depth.

This patch instead relies on unbalanced ages, where older buffers in
the queue are allowed to age far beyond the queue depth if not used
during normal rendering.
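
As a rough illustration of the "always return the oldest available buffer" policy described above, a minimal sketch could look like the following. The `struct slot` type and the `pick_oldest_unlocked()` helper are hypothetical names made up for this example; they are not Mesa's actual data structures, and the real GBM and Wayland code paths track more state than this.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one color buffer slot (not Mesa's struct). */
struct slot {
   bool locked; /* still held by the compositor */
   int age;     /* frames since this buffer was last rendered to */
};

/* Return the index of the oldest unlocked slot, or -1 if every slot is
 * locked. Scanning the whole array (no early break) is what lets the
 * oldest candidate win. */
static int
pick_oldest_unlocked(const struct slot *slots, size_t count)
{
   int best = -1;

   for (size_t i = 0; i < count; i++) {
      if (slots[i].locked)
         continue;
      if (best < 0 || slots[i].age > slots[best].age)
         best = (int)i;
   }
   return best;
}
```

With this policy every unlocked buffer is cycled through regularly, so ages stay bounded by the queue depth. The Wayland patch under review deliberately does not do this, which is why an unused buffer can age past the threshold and be released.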

> When things are on the edge between double buffering being enough and
> sometimes a third buffer being needed to avoid stalling we would
> otherwise bounce rapidly between allocating and disposing the third
> buffer.
>
> The DRM platform has no such optimization and just keeps the third
> buffer around forever. This patch keeps the optimization in the Wayland
> platform, but adds a bit of hysteresis before disposing the buffer when
> > going from triple to double buffering to see if things are settling on
> double buffering.

Ideally we'd have globally optimal behaviour for both platforms, but
that doesn't really seem doable for now. I think this is a good
balance though. There will only be one GBM user at a time, so having
that allocate excessive buffers doesn't seem too bad, and the penalty
for doing so is your entire system stuttering as the compositor
becomes blocked. Given the general stability of compositors, if they
need a larger queue depth at some point, they are likely to need it
again in the near future.

Conversely, there may be a great many Wayland clients, and these
clients may bounce between overlay and GPU composition. Given that, it
seems reasonable to opportunistically free up buffers, to make sure we
have enough memory available across the system.

> > > The age check here seems strange - both number used and its relation
> > > to double/triple buffering.
> > Have you considered tracking/checking how many buffers we have?
>
> A hysteresis value of 18 is just something that worked well in
> practice. It didn't appear to defer the buffer destruction for too long
>  while keeping the feedback loop well under control.

Yeah, having this #defined with a comment above it would be nice.
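
One possible shape for such a define is sketched below. The macro name `BUFFER_TRIM_AGE`, the `struct aged_buffer` type and the `should_release()` helper are all hypothetical and only serve this example; the only facts taken from the thread are the value 18 and that it was chosen empirically.

```c
#include <stdbool.h>

/* Hypothetical name. Number of frames an unlocked, unused buffer is kept
 * around before it is released. The thread states only that 18 "worked
 * well in practice": long enough to damp the double/triple buffering
 * feedback loop, short enough not to defer destruction for too long. */
#define BUFFER_TRIM_AGE 18

/* Hypothetical stand-in for a color buffer slot (not Mesa's struct). */
struct aged_buffer {
   bool locked;        /* still held by the compositor */
   bool has_wl_buffer; /* a wl_buffer is currently attached to the slot */
   int age;            /* frames since last use */
};

/* Mirrors the condition in the quoted hunk, with the magic number named. */
static bool
should_release(const struct aged_buffer *buf)
{
   return !buf->locked && buf->has_wl_buffer && buf->age > BUFFER_TRIM_AGE;
}
```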

With that, this patch is:
Reviewed-by: Daniel Stone 

Cheers,
Daniel


Re: [Mesa-dev] Thoughts after hitting 100 merge requests?

2019-01-15 Thread Rob Clark
On Tue, Jan 15, 2019 at 1:02 AM Tapani Pälli  wrote:
>
>
>
> On 1/14/19 2:36 PM, Daniel Stone wrote:
> > Hi,
> >
> > On Fri, 11 Jan 2019 at 17:05, Jason Ekstrand  wrote:
> >>   5. There's no way with gitlab for Reviewed-by tags to get automatically 
> >> applied as part of the merging process.  This makes merging a bit more 
> >> manual than it needs to be but is really no worse than it was before.
> >
> > I'm still on the side of not seeing the value in them. Most of the
> > time when I go to pursue someone who reviewed a commit, I'll go to see
> > what came up in review anyway. Maybe someone had the same comment
> > which was found to be not applicable or otherwise explained away.
> > Reviewed-by and Acked-by are also pretty lossy anyway, and freeform
> > text descriptors in a comment can much better capture the intent (e.g.
> > 'I'm strongly OK with the driver changes and weakly OK with the core
> > changes as it's not really my area of expertise').
> >
> > In other projects, we looked for ways to apply the tags and ended up
> > concluding that they didn't bring enough value to make it worthwhile.
> > I don't know if that holds for Mesa, but it would be better to start
> > with an actual problem statement - what value does R-b bring and how?
> > - then look at ways to solve that problem, rather than just very
> > directly finding a way to insert that literal text string into every
> > commit message.
>
> IMO it brings some 'shared responsibility' for correctness of the patch
> and quickly accessible information on who were looking at the change. So
> ideally later when filing bug against commit/series there would be more
> people than just the committer that should take a look at the possible
> regressions. At least in my experience people filing bugs tend to often
> also CC the reviewer.

+1 .. and also it is nice to see things like Reported-by/Reviewed-by
without having to go search somewhere else (ie. outside of git/tig)

(ofc it would be pretty awesome incentive to switch to gitlab issues
if gitlab could automate adding Reported-by tags for MR's associated
with an issue.. but I guess checkbox to add Reviewed-by tag would
already make my day)

BR,
-R

> > FWIW, if you go to
> > https://gitlab.freedesktop.org/mesa/mesa/commit/SHA1 then you get a
> > hyperlink from the web UI which points you to the MR. The API to do
> > this is pretty straightforward and amenable to piping through jq:
> > https://docs.gitlab.com/ce/api/commits.html#list-merge-requests-associated-with-a-commit
>
> I guess if we would move issue tracking to gitlab then we could possibly
> automate the CC list generation based on commit?
>
> // Tapani


Re: [Mesa-dev] Thoughts after hitting 100 merge requests?

2019-01-15 Thread Daniel Stone
Hi,

On Tue, 15 Jan 2019 at 12:21, Rob Clark  wrote:
> On Tue, Jan 15, 2019 at 1:02 AM Tapani Pälli  wrote:
> > On 1/14/19 2:36 PM, Daniel Stone wrote:
> > > On Fri, 11 Jan 2019 at 17:05, Jason Ekstrand  wrote:
> > > In other projects, we looked for ways to apply the tags and ended up
> > > concluding that they didn't bring enough value to make it worthwhile.
> > > I don't know if that holds for Mesa, but it would be better to start
> > > with an actual problem statement - what value does R-b bring and how?
> > > - then look at ways to solve that problem, rather than just very
> > > directly finding a way to insert that literal text string into every
> > > commit message.
> >
> > IMO it brings some 'shared responsibility' for correctness of the patch

Oh, no doubt - we certainly haven't abandoned thorough review! So far
we haven't seen that compromised by not having a name in the commit
message.

> > and quickly accessible information on who were looking at the change. So
> > ideally later when filing bug against commit/series there would be more
> > people than just the committer that should take a look at the possible
> > regressions. At least in my experience people filing bugs tend to often
> > also CC the reviewer.

Yeah, that's really helpful. So maybe a useful flow - assuming we
eventually switch to GitLab issues - would be the ability to associate
an issue with a commit, which could then automatically drag in people
who commented on the MR which landed that commit, as well as (at
least) the reporter of the issue(s) fixed by that MR. That would need
some kind of clever - probably at least semi-manual - filtering to
make sure it wasn't just spamming the world, but it's at least a
starting point.

> +1 .. and also it is nice to see things like Reported-by/Reviewed-by
> without having to go search somewhere else (ie. outside of git/tig)

My question would again be what value that brings you. Do you just
like seeing the name there, or do you go poke the people on IRC, or
follow up via email, or ... ? Again I personally go look through the
original review to see what came up during that first, but everyone's
different, so I'm just trying to understand what you actually do with
that information, so we can figure out if there's a better way to do
things for everyone rather than just blindly imitating what came
before.

> (ofc it would be pretty awesome incentive to switch to gitlab issues
> if gitlab could automate adding Reported-by tags for MR's associated
> with an issue.. but I guess checkbox to add Reviewed-by tag would
> already make my day)

I saw this the other day, which might be more incentive:
https://csoriano.pages.gitlab.gnome.org/csoriano-blog/post/2019-01-07-issue-handling-automation/

Cheers,
Daniel


Re: [Mesa-dev] Thoughts after hitting 100 merge requests?

2019-01-15 Thread apinheiro

On 15/1/19 7:01, Tapani Pälli wrote:
>
>
> On 1/14/19 2:36 PM, Daniel Stone wrote:
>> Hi,
>>
>> On Fri, 11 Jan 2019 at 17:05, Jason Ekstrand 
>> wrote:
>>>   5. There's no way with gitlab for Reviewed-by tags to get
>>> automatically applied as part of the merging process.  This makes
>>> merging a bit more manual than it needs to be but is really no worse
>>> than it was before.
>>
>> I'm still on the side of not seeing the value in them. Most of the
>> time when I go to pursue someone who reviewed a commit, I'll go to see
>> what came up in review anyway. Maybe someone had the same comment
>> which was found to be not applicable or otherwise explained away.
>> Reviewed-by and Acked-by are also pretty lossy anyway, and freeform
>> text descriptors in a comment can much better capture the intent (e.g.
>> 'I'm strongly OK with the driver changes and weakly OK with the core
>> changes as it's not really my area of expertise').
>>
>> In other projects, we looked for ways to apply the tags and ended up
>> concluding that they didn't bring enough value to make it worthwhile.
>> I don't know if that holds for Mesa, but it would be better to start
>> with an actual problem statement - what value does R-b bring and how?
>> - then look at ways to solve that problem, rather than just very
>> directly finding a way to insert that literal text string into every
>> commit message.
>
> IMO it brings some 'shared responsibility' for correctness of the
> patch and quickly accessible information on who were looking at the
> change. So ideally later when filing bug against commit/series there
> would be more people than just the committer that should take a look
> at the possible regressions. At least in my experience people filing
> bugs tend to often also CC the reviewer.


In addition to that, it is also useful for big series that are updated
several times, as it is a way to know which patches were already reviewed
and which not, so the reviewer can focus on the latter.


>
>> FWIW, if you go to
>> https://gitlab.freedesktop.org/mesa/mesa/commit/SHA1 then you get a
>> hyperlink from the web UI which points you to the MR. The API to do
>> this is pretty straightforward and amenable to piping through jq:
>> https://docs.gitlab.com/ce/api/commits.html#list-merge-requests-associated-with-a-commit
>>
>
> I guess if we would move issue tracking to gitlab then we could
> possibly automate the CC list generation based on commit?
>
> // Tapani


[Mesa-dev] [Bug 109339] Wolfenstein II The New Colossus - a lot of missing objects

2019-01-15 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=109339

Ahmed Elsayed  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #3 from Ahmed Elsayed  ---
I was using unofficial user repository for mesa-git, and I used mesa-git from
AUR repository instead, then deleted vulkan-intel manually, and that worked. I
hope it will last for a long time without any new problems.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.


[Mesa-dev] [PATCH v3 19/42] intel/compiler: don't compact 3-src instructions with Src1Type or Src2Type bits

2019-01-15 Thread Iago Toral Quiroga
We are now using these bits, so don't assert that they are not set, just
avoid compaction in that case.

Reviewed-by: Topi Pohjolainen 
---
 src/intel/compiler/brw_eu_compact.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/intel/compiler/brw_eu_compact.c b/src/intel/compiler/brw_eu_compact.c
index ae14ef10ec0..20fed254331 100644
--- a/src/intel/compiler/brw_eu_compact.c
+++ b/src/intel/compiler/brw_eu_compact.c
@@ -928,8 +928,11 @@ has_3src_unmapped_bits(const struct gen_device_info *devinfo,
   assert(!brw_inst_bits(src, 127, 126) &&
  !brw_inst_bits(src, 105, 105) &&
  !brw_inst_bits(src, 84, 84) &&
- !brw_inst_bits(src, 36, 35) &&
  !brw_inst_bits(src, 7,  7));
+
+  /* Src1Type and Src2Type, used for mixed-precision floating point */
+  if (brw_inst_bits(src, 36, 35))
+ return true;
}
 
return false;
-- 
2.17.1



[Mesa-dev] [PATCH v3 41/42] intel/compiler: fix combine constants for Align16 with half-float prior to gen9

2019-01-15 Thread Iago Toral Quiroga
There is a hardware restriction where <0,1,0>:HF in Align16 doesn't replicate
a single 16-bit channel, but instead it replicates a full 32-bit channel.
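
To make the restriction concrete, here is a small sketch of the workaround: when a half-float constant may feed an Align16 3-source instruction before gen9, both 16-bit halves of the 32-bit slot are filled with the same value, so the replicated DWord reads correctly. The helper names are hypothetical and the functions only model the idea; they are not the code in this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: how many HF slots to write for one constant.
 * Two slots are written when the 32-bit replication quirk can apply. */
static unsigned
hf_constant_slots(int gen, bool is_half_float, bool may_feed_3src)
{
   return (gen < 9 && is_half_float && may_feed_3src) ? 2 : 1;
}

/* Hypothetical helper: the 32-bit pattern with the same half-float bits
 * in both halves, which is what the <0,1,0>:HF region ends up reading. */
static uint32_t
replicate_hf_bits(uint16_t hf_bits)
{
   return ((uint32_t)hf_bits << 16) | hf_bits;
}
```

For example, the half-float encoding of 1.0 is 0x3C00, so the populated DWord would be 0x3C003C00.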
---
 .../compiler/brw_fs_combine_constants.cpp | 24 +--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/src/intel/compiler/brw_fs_combine_constants.cpp b/src/intel/compiler/brw_fs_combine_constants.cpp
index 54017e5668b..56e414d3f4e 100644
--- a/src/intel/compiler/brw_fs_combine_constants.cpp
+++ b/src/intel/compiler/brw_fs_combine_constants.cpp
@@ -301,7 +301,26 @@ fs_visitor::opt_combine_constants()
*/
   exec_node *n = (imm->inst ? imm->inst :
   imm->block->last_non_control_flow_inst()->next);
-  const fs_builder ibld = bld.at(imm->block, n).exec_all().group(1, 0);
+
+  /* Prior to gen9 we also have to deal with this restriction:
+   *
+   * "In Align16 mode, the channel selects and channel enables apply to a
+   *  pair of half-floats, because these parameters are defined for DWord
+   *  elements ONLY. This is applicable when both source and destination
+   *  are half-floats."
+   *
+   * This means that when we emit a 3-src instruction such as MAD or LRP,
+   * for which we use Align16, if we need to promote an HF constant to a
+   * register we need to be aware that the  <0,1,0>:HF region would still
+   * read 2 HF slots and not replicate the single one like we want.
+   * We fix this by populating both HF slots with the constant we need to
+   * read.
+   */
+  const uint32_t width =
+ devinfo->gen < 9 &&
+ imm->type == BRW_REGISTER_TYPE_HF &&
+ (!imm->inst || imm->inst->is_3src(devinfo)) ? 2 : 1;
+  const fs_builder ibld = bld.at(imm->block, n).exec_all().group(width, 0);
 
   reg = retype(reg, imm->type);
   if (imm->type == BRW_REGISTER_TYPE_F) {
@@ -314,7 +333,8 @@ fs_visitor::opt_combine_constants()
   imm->subreg_offset = reg.offset;
 
   /* Keep offsets 32-bit aligned since we are mixing 32-bit and 16-bit
-   * constants into the same register
+   * constants into the same register (and we are writing 32-bit slots
+   * prior to gen9 for HF constants anyway).
*
* TODO: try to pack pairs of HF constants into each 32-bit slot
*/
-- 
2.17.1



[Mesa-dev] [PATCH v3 38/42] intel/compiler: fix cmod propagation for non 32-bit types

2019-01-15 Thread Iago Toral Quiroga
v2:
 - Do not propagate if the bit-size changes
---
 src/intel/compiler/brw_fs_cmod_propagation.cpp | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/src/intel/compiler/brw_fs_cmod_propagation.cpp b/src/intel/compiler/brw_fs_cmod_propagation.cpp
index 7bb5c9afbc9..57d4e645c05 100644
--- a/src/intel/compiler/brw_fs_cmod_propagation.cpp
+++ b/src/intel/compiler/brw_fs_cmod_propagation.cpp
@@ -244,8 +244,7 @@ opt_cmod_propagation_local(const gen_device_info *devinfo,
 /* CMP's result is the same regardless of dest type. */
 if (inst->conditional_mod == BRW_CONDITIONAL_NZ &&
 scan_inst->opcode == BRW_OPCODE_CMP &&
-(inst->dst.type == BRW_REGISTER_TYPE_D ||
- inst->dst.type == BRW_REGISTER_TYPE_UD)) {
+brw_reg_type_is_integer(inst->dst.type)) {
inst->remove(block);
progress = true;
break;
@@ -258,9 +257,14 @@ opt_cmod_propagation_local(const gen_device_info *devinfo,
break;
 
 /* Comparisons operate differently for ints and floats */
-if (scan_inst->dst.type != inst->dst.type &&
-(scan_inst->dst.type == BRW_REGISTER_TYPE_F ||
- inst->dst.type == BRW_REGISTER_TYPE_F))
+if (brw_reg_type_is_floating_point(scan_inst->dst.type) !=
+brw_reg_type_is_floating_point(inst->dst.type))
+   break;
+
+/* Comparison result may be altered if the bit-size changes
+ * since that affects range, denorms, etc
+ */
+if (type_sz(scan_inst->dst.type) != type_sz(inst->dst.type))
break;
 
 /* If the instruction generating inst's source also wrote the
-- 
2.17.1



[Mesa-dev] [PATCH v3 20/42] intel/compiler: allow half-float on 3-source instructions since gen8

2019-01-15 Thread Iago Toral Quiroga
Reviewed-by: Topi Pohjolainen 
---
 src/intel/compiler/brw_eu_emit.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/intel/compiler/brw_eu_emit.c b/src/intel/compiler/brw_eu_emit.c
index e21df4624b3..a785f96b650 100644
--- a/src/intel/compiler/brw_eu_emit.c
+++ b/src/intel/compiler/brw_eu_emit.c
@@ -755,7 +755,8 @@ brw_alu3(struct brw_codegen *p, unsigned opcode, struct brw_reg dest,
   assert(dest.type == BRW_REGISTER_TYPE_F  ||
  dest.type == BRW_REGISTER_TYPE_DF ||
  dest.type == BRW_REGISTER_TYPE_D  ||
- dest.type == BRW_REGISTER_TYPE_UD);
+ dest.type == BRW_REGISTER_TYPE_UD ||
+ (dest.type == BRW_REGISTER_TYPE_HF && devinfo->gen >= 8));
   if (devinfo->gen == 6) {
  brw_inst_set_3src_a16_dst_reg_file(devinfo, inst,
 dest.file == BRW_MESSAGE_REGISTER_FILE);
-- 
2.17.1



[Mesa-dev] [PATCH v3 26/42] intel/compiler: split is_partial_write() into two variants

2019-01-15 Thread Iago Toral Quiroga
This function is used in two different scenarios that for 32-bit
instructions are the same, but for 16-bit instructions are not.

One scenario is that in which we are working at a SIMD8 register
level and we need to know if a register is fully defined or written.
This is useful, for example, in the context of liveness analysis or
register allocation, where we work with units of registers.

The other scenario is that in which we want to know if an instruction
is writing a full scalar component or just some subset of it. This is
useful, for example, in the context of some optimization passes
like copy propagation.

For 32-bit instructions (or larger), a SIMD8 dispatch will always write
at least a full SIMD8 register (32B) if the write is not partial. The
function is_partial_write() checks this to determine if we have a partial
write. However, when we deal with 16-bit instructions, that logic disables
some optimizations that should be safe. For example, a SIMD8 16-bit MOV will
only update half of a SIMD register, but it is still a complete write of the
variable for a SIMD8 dispatch, so we should not prevent copy propagation in
this scenario because we don't write all 32 bytes in the SIMD register
or because the write starts at offset 16B (where we pack components Y or
W of 16-bit vectors).

This is a problem for SIMD8 executions (VS, TCS, TES, GS) of 16-bit
instructions, which lose a number of optimizations because of this, most
important of which is copy-propagation.

This patch splits is_partial_write() into is_partial_reg_write(), which
represents the current is_partial_write(), useful for things like
liveness analysis, and is_partial_var_write(), which considers
the dispatch size to check if we are writing a full variable (rather
than a full register) to decide if the write is partial or not, which
is what we really want in many optimization passes.

Then the patch goes on and rewrites all uses of is_partial_write() to use
one or the other version. Specifically, we use is_partial_var_write()
in the following places: copy propagation, cmod propagation, common
subexpression elimination, saturate propagation and sel peephole.

Notice that the semantics of is_partial_var_write() exactly match the
current implementation of is_partial_write() for anything that is
32-bit or larger, so no changes are expected for 32-bit instructions.
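
The difference between the two predicates can be illustrated with a simplified model that keeps only the size and offset checks. This is a sketch under stated assumptions: `struct sketch_write` and both function names are hypothetical, and the predication and contiguity checks of the real functions are left out on purpose.

```c
#include <stdbool.h>
#include <stdint.h>

#define REG_SIZE 32u /* bytes in one register, as in the patch */

/* Hypothetical, reduced description of an instruction's destination write. */
struct sketch_write {
   uint32_t exec_size;  /* number of channels written */
   uint32_t type_size;  /* bytes per channel */
   uint32_t dst_offset; /* byte offset of the destination */
};

/* Register-level view: anything short of a full, aligned register is partial. */
static bool
sketch_partial_reg_write(const struct sketch_write *w)
{
   return w->exec_size * w->type_size < REG_SIZE ||
          w->dst_offset % REG_SIZE != 0;
}

/* Variable-level view: the bar is the size of one variable at this
 * dispatch width, capped at one register. */
static bool
sketch_partial_var_write(const struct sketch_write *w, uint32_t dispatch_width)
{
   const uint32_t full = dispatch_width * w->type_size;
   const uint32_t var_size = full < REG_SIZE ? full : REG_SIZE;

   return w->exec_size * w->type_size < var_size ||
          w->dst_offset % var_size != 0;
}
```

A SIMD8 16-bit MOV writing 16 bytes at offset 16B is partial at the register level but complete at the variable level, while a SIMD8 32-bit MOV gives the same answer under both views.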

Tested against ~5000 tests involving 16-bit instructions in CTS produced
the following changes in instruction counts:

       | Patched | Master  |    %    |
SIMD8  | 621,900 | 706,721 | -12.00% |
SIMD16 |  93,252 |  93,252 |   0.00% |

As expected, the change only affects SIMD8 dispatches.

Reviewed-by: Topi Pohjolainen 
---
 src/intel/compiler/brw_fs.cpp | 31 +++
 .../compiler/brw_fs_cmod_propagation.cpp  | 20 ++--
 .../compiler/brw_fs_copy_propagation.cpp  |  8 ++---
 src/intel/compiler/brw_fs_cse.cpp |  3 +-
 .../compiler/brw_fs_dead_code_eliminate.cpp   |  2 +-
 src/intel/compiler/brw_fs_live_variables.cpp  |  2 +-
 src/intel/compiler/brw_fs_reg_allocate.cpp|  2 +-
 .../compiler/brw_fs_register_coalesce.cpp |  2 +-
 .../compiler/brw_fs_saturate_propagation.cpp  |  7 +++--
 src/intel/compiler/brw_fs_sel_peephole.cpp|  4 +--
 src/intel/compiler/brw_ir_fs.h|  3 +-
 11 files changed, 54 insertions(+), 30 deletions(-)

diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index d6096cd667d..77c955ac435 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -716,14 +716,33 @@ fs_visitor::limit_dispatch_width(unsigned n, const char *msg)
  * it.
  */
 bool
-fs_inst::is_partial_write() const
+fs_inst::is_partial_reg_write() const
 {
return ((this->predicate && this->opcode != BRW_OPCODE_SEL) ||
-   (this->exec_size * type_sz(this->dst.type)) < 32 ||
!this->dst.is_contiguous() ||
+   (this->exec_size * type_sz(this->dst.type)) < REG_SIZE ||
this->dst.offset % REG_SIZE != 0);
 }
 
+/**
+ * Returns true if the instruction has a flag that means it won't
+ * update an entire variable for the given dispatch width.
+ *
+ * This is only different from is_partial_reg_write() for SIMD8
+ * dispatches of 16-bit (or smaller) instructions.
+ */
+bool
+fs_inst::is_partial_var_write(uint32_t dispatch_width) const
+{
+   const uint32_t type_size = type_sz(this->dst.type);
+   uint32_t var_size = MIN2(REG_SIZE, dispatch_width * type_size);
+
+   return ((this->predicate && this->opcode != BRW_OPCODE_SEL) ||
+   !this->dst.is_contiguous() ||
+   (this->exec_size * type_sz(this->dst.type)) < var_size ||
+   this->dst.offset % var_size != 0);
+}
+
 unsigned
 fs_inst::components_read(unsigned i) const
 {
@@ -2896,7 +2915,7 @@ 

[Mesa-dev] [PATCH v3 12/42] compiler/nir: add lowering for 16-bit flrp

2019-01-15 Thread Iago Toral Quiroga
Reviewed-by: Jason Ekstrand 
---
 src/compiler/nir/nir.h| 1 +
 src/compiler/nir/nir_opt_algebraic.py | 1 +
 2 files changed, 2 insertions(+)

diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index 19056e79206..adcc8e36cc9 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -2106,6 +2106,7 @@ typedef struct nir_shader_compiler_options {
bool lower_fdiv;
bool lower_ffma;
bool fuse_ffma;
+   bool lower_flrp16;
bool lower_flrp32;
/** Lowers flrp when it does not support doubles */
bool lower_flrp64;
diff --git a/src/compiler/nir/nir_opt_algebraic.py b/src/compiler/nir/nir_opt_algebraic.py
index cd969de1f88..40eb3de02c3 100644
--- a/src/compiler/nir/nir_opt_algebraic.py
+++ b/src/compiler/nir/nir_opt_algebraic.py
@@ -124,6 +124,7 @@ optimizations = [
(('~flrp', 0.0, a, b), ('fmul', a, b)),
(('~flrp', a, b, ('b2f', 'c@1')), ('bcsel', c, b, a), 'options->lower_flrp32'),
(('~flrp', a, 0.0, c), ('fadd', ('fmul', ('fneg', a), c), a)),
+   (('flrp@16', a, b, c), ('fadd', ('fmul', c, ('fsub', b, a)), a), 'options->lower_flrp16'),
(('flrp@32', a, b, c), ('fadd', ('fmul', c, ('fsub', b, a)), a), 'options->lower_flrp32'),
(('flrp@64', a, b, c), ('fadd', ('fmul', c, ('fsub', b, a)), a), 'options->lower_flrp64'),
(('ffloor', a), ('fsub', a, ('ffract', a)), 'options->lower_ffloor'),
-- 
2.17.1
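
The flrp@16 rule in the patch above rewrites flrp(a, b, c) as
a + c * (b - a). A minimal Python sketch of that expansion, assuming each
intermediate result is rounded to half precision; to_half and flrp16_lowered
are illustrative names, not NIR functions.

```python
import struct


def to_half(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack('<e', struct.pack('<e', x))[0]


def flrp16_lowered(a, b, c):
    """('fadd', ('fmul', c, ('fsub', b, a)), a), rounding after every op."""
    diff = to_half(b - a)        # fsub
    scaled = to_half(c * diff)   # fmul
    return to_half(scaled + a)   # fadd
```

For example, flrp16_lowered(1.0, 3.0, 0.5) gives 2.0, and c = 0.0 or c = 1.0
returns a or b respectively for these inputs.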

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v3 07/42] intel/compiler: lower 16-bit extended math to 32-bit prior to gen9

2019-01-15 Thread Iago Toral Quiroga
Extended math doesn't support half-float on these generations.
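
A minimal sketch of the decision made by the callback in this patch, written
in Python for brevity. The op sets list only the opcodes visible in this
series' hunks, and lower_bit_size is an illustrative name; a return value of
0 means the op is left at its original bit size.

```python
# 16-bit ops lowered to 32-bit on every generation (from the visible hunks).
ALWAYS_32BIT_OPS = {
    "idiv", "imod", "irem", "udiv", "umod",
    "fceil", "ffloor", "ffract", "fround_even", "ftrunc",
}

# Extended math ops, lowered only before gen9.
EXTENDED_MATH_OPS = {
    "frcp", "frsq", "fsqrt", "fpow", "fexp2", "flog2", "fsin", "fcos",
}


def lower_bit_size(op, dest_bit_size, gen):
    """Return the bit size to lower to, or 0 to keep the op as it is."""
    if dest_bit_size != 16:
        return 0
    if op in ALWAYS_32BIT_OPS:
        return 32
    if op in EXTENDED_MATH_OPS:
        return 32 if gen < 9 else 0
    return 0
```

So a 16-bit fsqrt is widened to 32 bits on gen8 but kept as half-float on
gen9 and later.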

Reviewed-by: Jason Ekstrand 
---
 src/intel/compiler/brw_nir.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/src/intel/compiler/brw_nir.c b/src/intel/compiler/brw_nir.c
index f0fe7f870c2..3b2909da33e 100644
--- a/src/intel/compiler/brw_nir.c
+++ b/src/intel/compiler/brw_nir.c
@@ -631,6 +631,8 @@ lower_bit_size_callback(const nir_alu_instr *alu, UNUSED void *data)
if (alu->dest.dest.ssa.bit_size != 16)
   return 0;
 
+   const struct brw_compiler *compiler = (const struct brw_compiler *) data;
+
switch (alu->op) {
case nir_op_idiv:
case nir_op_imod:
@@ -643,6 +645,15 @@ lower_bit_size_callback(const nir_alu_instr *alu, UNUSED void *data)
case nir_op_fround_even:
case nir_op_ftrunc:
   return 32;
+   case nir_op_frcp:
+   case nir_op_frsq:
+   case nir_op_fsqrt:
+   case nir_op_fpow:
+   case nir_op_fexp2:
+   case nir_op_flog2:
+   case nir_op_fsin:
+   case nir_op_fcos:
+  return compiler->devinfo->gen < 9 ? 32 : 0;
default:
   return 0;
}
@@ -770,7 +781,7 @@ brw_preprocess_nir(const struct brw_compiler *compiler, nir_shader *nir)
   OPT(nir_opt_large_constants, NULL, 32);
}
 
-   OPT(nir_lower_bit_size, lower_bit_size_callback, NULL);
+   OPT(nir_lower_bit_size, lower_bit_size_callback, (void *)compiler);
 
if (is_scalar) {
   OPT(nir_lower_load_const_to_scalar);
-- 
2.17.1


[Mesa-dev] [PATCH v3 10/42] compiler/nir: add lowering option for 16-bit fmod

2019-01-15 Thread Iago Toral Quiroga
Reviewed-by: Jason Ekstrand 
---
 src/compiler/nir/nir.h| 1 +
 src/compiler/nir/nir_opt_algebraic.py | 1 +
 2 files changed, 2 insertions(+)

diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index 3cb2d166cb3..19056e79206 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -2112,6 +2112,7 @@ typedef struct nir_shader_compiler_options {
bool lower_fpow;
bool lower_fsat;
bool lower_fsqrt;
+   bool lower_fmod16;
bool lower_fmod32;
bool lower_fmod64;
/** Lowers ibitfield_extract/ubitfield_extract to ibfe/ubfe. */
diff --git a/src/compiler/nir/nir_opt_algebraic.py b/src/compiler/nir/nir_opt_algebraic.py
index 75a3d2ad238..cd969de1f88 100644
--- a/src/compiler/nir/nir_opt_algebraic.py
+++ b/src/compiler/nir/nir_opt_algebraic.py
@@ -636,6 +636,7 @@ optimizations = [
(('bcsel', ('ine', a, -1), ('ifind_msb', a), -1), ('ifind_msb', a)),
 
# Misc. lowering
+   (('fmod@16', a, b), ('fsub', a, ('fmul', b, ('ffloor', ('fdiv', a, b)))), 'options->lower_fmod16'),
(('fmod@32', a, b), ('fsub', a, ('fmul', b, ('ffloor', ('fdiv', a, b)))), 'options->lower_fmod32'),
(('fmod@64', a, b), ('fsub', a, ('fmul', b, ('ffloor', ('fdiv', a, b)))), 'options->lower_fmod64'),
(('frem', a, b), ('fsub', a, ('fmul', b, ('ftrunc', ('fdiv', a, b)))), 'options->lower_fmod32'),
-- 
2.17.1
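
The lowering rules in the patch above expand fmod as a - b * floor(a / b) and
frem as a - b * trunc(a / b). A minimal Python sketch of both, using ordinary
Python floats (16-bit rounding is not modelled); the function names are
illustrative only.

```python
import math


def fmod_lowered(a, b):
    """('fsub', a, ('fmul', b, ('ffloor', ('fdiv', a, b))))"""
    return a - b * math.floor(a / b)


def frem_lowered(a, b):
    """('fsub', a, ('fmul', b, ('ftrunc', ('fdiv', a, b))))"""
    return a - b * math.trunc(a / b)
```

The two differ only for operands of opposite sign: fmod_lowered(-1.0, 3.0) is
2.0 (the result takes the sign of b), while frem_lowered(-1.0, 3.0) is -1.0
(the result takes the sign of a).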


[Mesa-dev] [PATCH v3 34/42] anv/pipeline: support Float16 and Int8 capabilities in gen8+

2019-01-15 Thread Iago Toral Quiroga
v2:
 - Merge Float16 and Int8 in a single patch (Jason)

Reviewed-by: Jason Ekstrand  (v1)
---
 src/intel/vulkan/anv_pipeline.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/intel/vulkan/anv_pipeline.c b/src/intel/vulkan/anv_pipeline.c
index 899160746d4..663d1c77fa5 100644
--- a/src/intel/vulkan/anv_pipeline.c
+++ b/src/intel/vulkan/anv_pipeline.c
@@ -136,8 +136,10 @@ anv_shader_compile_to_nir(struct anv_device *device,
   .caps = {
  .device_group = true,
  .draw_parameters = true,
+ .float16 = device->instance->physicalDevice.info.gen >= 8,
  .float64 = device->instance->physicalDevice.info.gen >= 8,
  .image_write_without_format = true,
+ .int8 = device->instance->physicalDevice.info.gen >= 8,
  .int16 = device->instance->physicalDevice.info.gen >= 8,
  .int64 = device->instance->physicalDevice.info.gen >= 8,
  .min_lod = true,
-- 
2.17.1


[Mesa-dev] [PATCH v3 28/42] intel/compiler: handle 64-bit float to 8-bit integer conversions

2019-01-15 Thread Iago Toral Quiroga
These are not directly supported in hardware, and brw_nir_lower_conversions
should have taken care of them before we get here. Also, while we are
at it, make sure 64-bit integer to 8-bit conversions are also properly split
by the same lowering pass, since they have the same hardware restrictions.
---
 src/intel/compiler/brw_fs_nir.cpp | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp b/src/intel/compiler/brw_fs_nir.cpp
index cf546b8ff09..e454578d99b 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -786,6 +786,10 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, nir_alu_instr *instr)
case nir_op_f2f16:
case nir_op_i2f16:
case nir_op_u2f16:
+   case nir_op_i2i8:
+   case nir_op_u2u8:
+   case nir_op_f2i8:
+   case nir_op_f2u8:
   assert(type_sz(op[0].type) < 8); /* brw_nir_lower_conversions */
   inst = bld.MOV(result, op[0]);
   inst->saturate = instr->dest.saturate;
@@ -824,8 +828,6 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, nir_alu_instr *instr)
case nir_op_u2u32:
case nir_op_i2i16:
case nir_op_u2u16:
-   case nir_op_i2i8:
-   case nir_op_u2u8:
   inst = bld.MOV(result, op[0]);
   inst->saturate = instr->dest.saturate;
   break;
-- 
2.17.1


[Mesa-dev] [PATCH v3 16/42] intel/compiler: add instruction setters for Src1Type and Src2Type.

2019-01-15 Thread Iago Toral Quiroga
The original SrcType is a 3-bit field that takes a subset of the types
supported by the hardware for 3-source instructions. Since gen8,
when the half-float type was added, 3-source floating-point operations
can use mixed-precision mode, where not all the operands have the
same floating-point precision. While the precision for the first operand
is taken from the type in SrcType, the bits in Src1Type (bit 36) and
Src2Type (bit 35) define the precision for the other operands
(0: normal precision, 1: half precision).
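
A minimal Python sketch of what setters for these two 1-bit fields do,
treating the instruction as a plain integer. The helper names are
illustrative and are not the macros from brw_inst.h; only the bit positions
(36 and 35) come from the description above.

```python
SRC1_TYPE_BIT = 36  # Src1Type: 0 = normal precision, 1 = half precision
SRC2_TYPE_BIT = 35  # Src2Type: same encoding


def set_bits(inst, high, low, value):
    """Return inst with bits [high:low] replaced by value."""
    width = high - low + 1
    assert 0 <= value < (1 << width)
    mask = ((1 << width) - 1) << low
    return (inst & ~mask) | (value << low)


def get_bits(inst, high, low):
    """Return the value stored in bits [high:low] of inst."""
    return (inst >> low) & ((1 << (high - low + 1)) - 1)


def set_3src_a16_src1_type(inst, is_half):
    return set_bits(inst, SRC1_TYPE_BIT, SRC1_TYPE_BIT, int(is_half))


def set_3src_a16_src2_type(inst, is_half):
    return set_bits(inst, SRC2_TYPE_BIT, SRC2_TYPE_BIT, int(is_half))
```

Setting both fields on an all-zero instruction word yields
(1 << 36) | (1 << 35); all other bits are left untouched.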

Reviewed-by: Topi Pohjolainen 
---
 src/intel/compiler/brw_inst.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/intel/compiler/brw_inst.h b/src/intel/compiler/brw_inst.h
index ce89bbba72f..c45697eaa3a 100644
--- a/src/intel/compiler/brw_inst.h
+++ b/src/intel/compiler/brw_inst.h
@@ -222,6 +222,8 @@ F8(3src_src1_negate,39, 39, 40, 40)
 F8(3src_src1_abs,   38, 38, 39, 39)
 F8(3src_src0_negate,37, 37, 38, 38)
 F8(3src_src0_abs,   36, 36, 37, 37)
+F8(3src_a16_src1_type,  -1, -1, 36, 36)
+F8(3src_a16_src2_type,  -1, -1, 35, 35)
 F8(3src_a16_flag_reg_nr,34, 34, 33, 33)
 F8(3src_a16_flag_subreg_nr, 33, 33, 32, 32)
 FF(3src_a16_dst_reg_file,
-- 
2.17.1


[Mesa-dev] [PATCH v3 35/42] anv/device: expose shaderFloat16 and shaderInt8 in gen8+

2019-01-15 Thread Iago Toral Quiroga
v2 (Jason):
 - Merge Float16 and Int8 into a single patch.
 - Merge extension enable.

Reviewed-by: Jason Ekstrand  (v1)
---
 src/intel/vulkan/anv_device.c  | 9 +
 src/intel/vulkan/anv_extensions.py | 1 +
 2 files changed, 10 insertions(+)

diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index 523f1483e29..d9931d339e5 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -966,6 +966,15 @@ void anv_GetPhysicalDeviceFeatures2(
  break;
   }
 
+  case VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_FLOAT16_INT8_FEATURES_KHR: {
+ VkPhysicalDeviceFloat16Int8FeaturesKHR *features = (void *)ext;
+ ANV_FROM_HANDLE(anv_physical_device, pdevice, physicalDevice);
+
+ features->shaderFloat16 = pdevice->info.gen >= 8;
+ features->shaderInt8 = pdevice->info.gen >= 8;
+ break;
+  }
+
   default:
  anv_debug_ignored_stype(ext->sType);
  break;
diff --git a/src/intel/vulkan/anv_extensions.py b/src/intel/vulkan/anv_extensions.py
index 388845003aa..0f579ced692 100644
--- a/src/intel/vulkan/anv_extensions.py
+++ b/src/intel/vulkan/anv_extensions.py
@@ -105,6 +105,7 @@ EXTENSIONS = [
 Extension('VK_KHR_sampler_mirror_clamp_to_edge',  1, True),
 Extension('VK_KHR_sampler_ycbcr_conversion',  1, True),
 Extension('VK_KHR_shader_draw_parameters',1, True),
+Extension('VK_KHR_shader_float16_int8',   1, 'device->info.gen >= 8'),
 Extension('VK_KHR_storage_buffer_storage_class',  1, True),
 Extension('VK_KHR_surface',  25, 'ANV_HAS_SURFACE'),
 Extension('VK_KHR_swapchain',68, 'ANV_HAS_SURFACE'),
-- 
2.17.1


[Mesa-dev] [PATCH v3 40/42] intel/compiler: support half-float in the combine constants pass

2019-01-15 Thread Iago Toral Quiroga
Reviewed-by: Topi Pohjolainen 
---
 .../compiler/brw_fs_combine_constants.cpp | 60 +++
 1 file changed, 49 insertions(+), 11 deletions(-)

diff --git a/src/intel/compiler/brw_fs_combine_constants.cpp b/src/intel/compiler/brw_fs_combine_constants.cpp
index 7343f77bb45..54017e5668b 100644
--- a/src/intel/compiler/brw_fs_combine_constants.cpp
+++ b/src/intel/compiler/brw_fs_combine_constants.cpp
@@ -36,6 +36,7 @@
 
 #include "brw_fs.h"
 #include "brw_cfg.h"
+#include "util/half_float.h"
 
 using namespace brw;
 
@@ -114,8 +115,9 @@ struct imm {
 */
exec_list *uses;
 
-   /** The immediate value.  We currently only handle floats. */
+   /** The immediate value.  We currently only handle float and half-float. */
float val;
+   brw_reg_type type;
 
/**
 * The GRF register and subregister number where we've decided to store the
@@ -145,10 +147,10 @@ struct table {
 };
 
 static struct imm *
-find_imm(struct table *table, float val)
+find_imm(struct table *table, float val, brw_reg_type type)
 {
for (int i = 0; i < table->len; i++) {
-  if (table->imm[i].val == val) {
+  if (table->imm[i].val == val && table->imm[i].type == type) {
  return &table->imm[i];
   }
}
@@ -190,6 +192,20 @@ compare(const void *_a, const void *_b)
return a->first_use_ip - b->first_use_ip;
 }
 
+static bool
+needs_negate(float reg_val, float imm_val, brw_reg_type type)
+{
+   /* reg_val represents the immediate value in the register in its original
+* bit-size, while imm_val is always a valid 32-bit float value.
+*/
+   if (type == BRW_REGISTER_TYPE_HF) {
+  uint32_t reg_val_ud = *((uint32_t *) &reg_val);
+  reg_val = _mesa_half_to_float(reg_val_ud & 0xffff);
+   }
+
+   return signbit(imm_val) != signbit(reg_val);
+}
+
 bool
 fs_visitor::opt_combine_constants()
 {
@@ -215,12 +231,20 @@ fs_visitor::opt_combine_constants()
 
   for (int i = 0; i < inst->sources; i++) {
  if (inst->src[i].file != IMM ||
- inst->src[i].type != BRW_REGISTER_TYPE_F)
+ (inst->src[i].type != BRW_REGISTER_TYPE_F &&
+  inst->src[i].type != BRW_REGISTER_TYPE_HF))
 continue;
 
- float val = !inst->can_do_source_mods(devinfo) ? inst->src[i].f :
- fabs(inst->src[i].f);
- struct imm *imm = find_imm(&table, val);
+ float val;
+ if (inst->src[i].type == BRW_REGISTER_TYPE_F) {
+val = !inst->can_do_source_mods(devinfo) ? inst->src[i].f :
+fabs(inst->src[i].f);
+ } else {
+val = !inst->can_do_source_mods(devinfo) ?
+   _mesa_half_to_float(inst->src[i].d & 0xffff) :
+   fabs(_mesa_half_to_float(inst->src[i].d & 0xffff));
+ }
+ struct imm *imm = find_imm(&table, val, inst->src[i].type);
 
  if (imm) {
 bblock_t *intersection = cfg_t::intersect(block, imm->block);
@@ -238,6 +262,7 @@ fs_visitor::opt_combine_constants()
 imm->uses = new(const_ctx) exec_list();
 imm->uses->push_tail(link(const_ctx, &inst->src[i]));
 imm->val = val;
+imm->type = inst->src[i].type;
 imm->uses_by_coissue = could_coissue(devinfo, inst);
 imm->must_promote = must_promote_imm(devinfo, inst);
 imm->first_use_ip = ip;
@@ -278,12 +303,23 @@ fs_visitor::opt_combine_constants()
   imm->block->last_non_control_flow_inst()->next);
   const fs_builder ibld = bld.at(imm->block, n).exec_all().group(1, 0);
 
-  ibld.MOV(reg, brw_imm_f(imm->val));
+  reg = retype(reg, imm->type);
+  if (imm->type == BRW_REGISTER_TYPE_F) {
+ ibld.MOV(reg, brw_imm_f(imm->val));
+  } else {
+ const uint16_t val_hf = _mesa_float_to_half(imm->val);
+ ibld.MOV(reg, retype(brw_imm_uw(val_hf), BRW_REGISTER_TYPE_HF));
+  }
   imm->nr = reg.nr;
   imm->subreg_offset = reg.offset;
 
+  /* Keep offsets 32-bit aligned since we are mixing 32-bit and 16-bit
+   * constants into the same register
+   *
+   * TODO: try to pack pairs of HF constants into each 32-bit slot
+   */
   reg.offset += sizeof(float);
-  if (reg.offset == 8 * sizeof(float)) {
+  if (reg.offset == REG_SIZE) {
  reg.nr = alloc.allocate(1);
  reg.offset = 0;
   }
@@ -295,12 +331,14 @@ fs_visitor::opt_combine_constants()
   foreach_list_typed(reg_link, link, link, table.imm[i].uses) {
  fs_reg *reg = link->reg;
  assert((isnan(reg->f) && isnan(table.imm[i].val)) ||
-fabsf(reg->f) == fabs(table.imm[i].val));
+fabsf(reg->f) == fabs(table.imm[i].val) ||
+table.imm[i].type == BRW_REGISTER_TYPE_HF);
 
  reg->file = VGRF;
+ reg->type = table.imm[i].type;
  reg->offset = table.imm[i].subreg_offset;
  reg->stride = 0;
- reg->negate = signbit(reg->f) != signbit(table.imm[i].val);
+ 
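
The needs_negate() helper in the patch above compares the sign of the value
stored in the constant table with the sign of the original immediate. A
minimal Python sketch of that check, assuming a half-float immediate lives in
the low 16 bits of the 32-bit immediate payload; all function names and the
is_half flag are illustrative, not Mesa API.

```python
import math
import struct


def half_bits(x):
    """Encode x as an IEEE 754 half float and return its 16-bit pattern."""
    return struct.unpack('<H', struct.pack('<e', x))[0]


def half_from_bits(bits):
    """Decode the low 16 bits of bits as a half float."""
    return struct.unpack('<e', struct.pack('<H', bits & 0xffff))[0]


def float_from_bits(bits):
    """Decode a 32-bit pattern as a single-precision float."""
    return struct.unpack('<f', struct.pack('<I', bits & 0xffffffff))[0]


def needs_negate(reg_bits, imm_val, is_half):
    """True if the immediate's sign differs from the table value's sign."""
    reg_val = half_from_bits(reg_bits) if is_half else float_from_bits(reg_bits)
    return math.copysign(1.0, reg_val) != math.copysign(1.0, imm_val)
```

For example, a half-float immediate of -2.5 whose table entry holds 2.5 needs
a negate modifier, and any bits above the low 16 are ignored for half-float
sources.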

[Mesa-dev] [PATCH v3 13/42] intel/compiler: lower 16-bit flrp

2019-01-15 Thread Iago Toral Quiroga
Reviewed-by: Jason Ekstrand 
---
 src/intel/compiler/brw_compiler.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/intel/compiler/brw_compiler.c b/src/intel/compiler/brw_compiler.c
index f885e79c3e6..04a1a7cac4e 100644
--- a/src/intel/compiler/brw_compiler.c
+++ b/src/intel/compiler/brw_compiler.c
@@ -33,6 +33,7 @@
.lower_sub = true, \
.lower_fdiv = true,\
.lower_scmp = true,\
+   .lower_flrp16 = true,  \
.lower_fmod16 = true,  \
.lower_fmod32 = true,  \
.lower_fmod64 = false, \
-- 
2.17.1


[Mesa-dev] [PATCH v3 29/42] intel/compiler: handle conversions between int and half-float on atom

2019-01-15 Thread Iago Toral Quiroga
Reviewed-by: Topi Pohjolainen 
---
 src/intel/compiler/brw_fs_nir.cpp | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp b/src/intel/compiler/brw_fs_nir.cpp
index e454578d99b..a739562c3ab 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -784,13 +784,20 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, nir_alu_instr *instr)
*/
 
case nir_op_f2f16:
-   case nir_op_i2f16:
-   case nir_op_u2f16:
case nir_op_i2i8:
case nir_op_u2u8:
+  assert(type_sz(op[0].type) < 8); /* brw_nir_lower_conversions */
+  inst = bld.MOV(result, op[0]);
+  inst->saturate = instr->dest.saturate;
+  break;
+
+   case nir_op_i2f16:
+   case nir_op_u2f16:
case nir_op_f2i8:
case nir_op_f2u8:
   assert(type_sz(op[0].type) < 8); /* brw_nir_lower_conversions */
+   case nir_op_f2i16:
+   case nir_op_f2u16:
   inst = bld.MOV(result, op[0]);
   inst->saturate = instr->dest.saturate;
   break;
@@ -822,8 +829,6 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, nir_alu_instr *instr)
case nir_op_f2f32:
case nir_op_f2i32:
case nir_op_f2u32:
-   case nir_op_f2i16:
-   case nir_op_f2u16:
case nir_op_i2i32:
case nir_op_u2u32:
case nir_op_i2i16:
-- 
2.17.1


[Mesa-dev] [PATCH v3 21/42] intel/compiler: set correct precision fields for 3-source float instructions

2019-01-15 Thread Iago Toral Quiroga
Source0 and Destination extract the floating-point precision automatically
from the SrcType and DstType instruction fields respectively when they are
set to types :F or :HF. For Source1 and Source2 operands, we use the new
1-bit fields Src1Type and Src2Type, where 0 means normal precision and 1
means half-precision. Since we always use the type of the destination for
all operands when we emit 3-source instructions, we only need to set Src1Type
and Src2Type to 1 when we are emitting a half-precision instruction.

v2:
 - Set the bit separately for each source based on its type so we can
   do mixed floating-point mode in the future (Topi).

Reviewed-by: Topi Pohjolainen 
---
 src/intel/compiler/brw_eu_emit.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/src/intel/compiler/brw_eu_emit.c b/src/intel/compiler/brw_eu_emit.c
index a785f96b650..2fa89f8a2a3 100644
--- a/src/intel/compiler/brw_eu_emit.c
+++ b/src/intel/compiler/brw_eu_emit.c
@@ -801,6 +801,22 @@ brw_alu3(struct brw_codegen *p, unsigned opcode, struct brw_reg dest,
   */
  brw_inst_set_3src_a16_src_type(devinfo, inst, dest.type);
  brw_inst_set_3src_a16_dst_type(devinfo, inst, dest.type);
+
+ /* From the Bspec: Instruction types
+  *
+  * Three source instructions can use operands with mixed-mode
+  * precision. When SrcType field is set to :f or :hf it defines
+  * precision for source 0 only, and fields Src1Type and Src2Type
+  * define precision for other source operands:
+  *
+  *   0b = :f. Single precision Float (32-bit).
+  *   1b = :hf. Half precision Float (16-bit).
+  */
+ if (src1.type == BRW_REGISTER_TYPE_HF)
+brw_inst_set_3src_a16_src1_type(devinfo, inst, 1);
+
+ if (src2.type == BRW_REGISTER_TYPE_HF)
+brw_inst_set_3src_a16_src2_type(devinfo, inst, 1);
   }
}
 
-- 
2.17.1


[Mesa-dev] [PATCH v3 31/42] intel/compiler: ask for an integer type if requesting an 8-bit type

2019-01-15 Thread Iago Toral Quiroga
---
 src/intel/compiler/brw_fs_nir.cpp | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp b/src/intel/compiler/brw_fs_nir.cpp
index a3d193b8a44..ccf1891b925 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -346,7 +346,9 @@ fs_visitor::nir_emit_impl(nir_function_impl *impl)
  reg->num_array_elems == 0 ? 1 : reg->num_array_elems;
   unsigned size = array_elems * reg->num_components;
   const brw_reg_type reg_type =
- brw_reg_type_from_bit_size(reg->bit_size, BRW_REGISTER_TYPE_F);
+ brw_reg_type_from_bit_size(reg->bit_size,
+reg->bit_size == 8 ? BRW_REGISTER_TYPE_D :
+ BRW_REGISTER_TYPE_F);
   nir_locals[reg->index] = bld.vgrf(reg_type, size);
}
 
@@ -4281,7 +4283,10 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr
   fs_reg value = get_nir_src(instr->src[0]);
   if (instr->intrinsic == nir_intrinsic_vote_feq) {
  const unsigned bit_size = nir_src_bit_size(instr->src[0]);
- value.type = brw_reg_type_from_bit_size(bit_size, BRW_REGISTER_TYPE_F);
+ value.type =
+brw_reg_type_from_bit_size(bit_size,
+   bit_size == 8 ? BRW_REGISTER_TYPE_D :
+   BRW_REGISTER_TYPE_F);
   }
 
   fs_reg uniformized = bld.emit_uniformize(value);
-- 
2.17.1
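
The patch above passes an integer reference type whenever the requested size
is 8 bits, since there is no 8-bit float type to map to. A minimal Python
sketch of that selection. The lookup tables are an assumption about how a
helper like brw_reg_type_from_bit_size() maps sizes to types, and the names
are shortened for illustration.

```python
# Assumed size-to-type mappings for the two reference types used here.
FLOAT_TYPES = {16: "HF", 32: "F", 64: "DF"}
INT_TYPES = {8: "B", 16: "W", 32: "D", 64: "Q"}


def reg_type_from_bit_size(bit_size, reference):
    """Pick the type of the given size in the same family as reference."""
    table = FLOAT_TYPES if reference == "F" else INT_TYPES
    if bit_size not in table:
        raise ValueError("no %d-bit type for reference %s" % (bit_size, reference))
    return table[bit_size]


def local_reg_type(bit_size):
    """Mirror the patch: ask for an integer type only when bit_size is 8."""
    reference = "D" if bit_size == 8 else "F"
    return reg_type_from_bit_size(bit_size, reference)
```

With this model local_reg_type(8) returns "B", while 16, 32 and 64 bits still
map to "HF", "F" and "DF" as before.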


[Mesa-dev] [PATCH v3 06/42] intel/compiler: lower some 16-bit float operations to 32-bit

2019-01-15 Thread Iago Toral Quiroga
The hardware doesn't support half-float for these.

Reviewed-by: Topi Pohjolainen 
Reviewed-by: Jason Ekstrand 
---
 src/intel/compiler/brw_nir.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/intel/compiler/brw_nir.c b/src/intel/compiler/brw_nir.c
index 572ab824a94..f0fe7f870c2 100644
--- a/src/intel/compiler/brw_nir.c
+++ b/src/intel/compiler/brw_nir.c
@@ -637,6 +637,11 @@ lower_bit_size_callback(const nir_alu_instr *alu, UNUSED void *data)
case nir_op_irem:
case nir_op_udiv:
case nir_op_umod:
+   case nir_op_fceil:
+   case nir_op_ffloor:
+   case nir_op_ffract:
+   case nir_op_fround_even:
+   case nir_op_ftrunc:
   return 32;
default:
   return 0;
-- 
2.17.1


[Mesa-dev] [PATCH v3 23/42] intel/compiler: fix ddx and ddy for 16-bit float

2019-01-15 Thread Iago Toral Quiroga
We were assuming 32-bit elements. Also, in SIMD8 we pack 2 vector components
in a single SIMD register, so, for example, component Y of a 16-bit vec2
starts at byte offset 16B. This means that when we compute the offset of
the elements to be differentiated we should not stomp whatever base offset we
have, but instead add to it.
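
A minimal Python sketch of the difference for the ddx source offset, assuming
sub-register offsets are plain byte counts and ignoring any carry into the
next register. The function names are illustrative only.

```python
FLOAT_SIZE = 4  # sizeof(float), which the old code hard-coded


def ddx_src0_subnr_old(base_subnr):
    """Old behaviour: overwrite the sub-register offset with sizeof(float)."""
    return FLOAT_SIZE


def ddx_src0_subnr_new(base_subnr, type_size):
    """New behaviour: step one element past whatever base offset we had."""
    return base_subnr + type_size
```

For component Y of a 16-bit vec2 in SIMD8 the base offset is 16 bytes, so the
new code reads from byte 18, where the old code would have read from byte 4
(inside component X). For 32-bit sources at offset 0 both give 4.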

v2
 - Use byte_offset() helper (Jason)
 - Merge the fix for SIMD8: using byte_offset() fixes that too.

Reviewed-by: Jason Ekstrand  (v1)
---
 src/intel/compiler/brw_fs_generator.cpp | 37 -
 1 file changed, 18 insertions(+), 19 deletions(-)

diff --git a/src/intel/compiler/brw_fs_generator.cpp b/src/intel/compiler/brw_fs_generator.cpp
index 5fc6cf5f8cc..d0cc4a6d231 100644
--- a/src/intel/compiler/brw_fs_generator.cpp
+++ b/src/intel/compiler/brw_fs_generator.cpp
@@ -1315,10 +1315,9 @@ fs_generator::generate_ddx(const fs_inst *inst,
   width = BRW_WIDTH_4;
}
 
-   struct brw_reg src0 = src;
+   struct brw_reg src0 = byte_offset(src, type_sz(src.type));
struct brw_reg src1 = src;
 
-   src0.subnr   = sizeof(float);
src0.vstride = vstride;
src0.width   = width;
src0.hstride = BRW_HORIZONTAL_STRIDE_0;
@@ -1337,23 +1336,25 @@ void
 fs_generator::generate_ddy(const fs_inst *inst,
struct brw_reg dst, struct brw_reg src)
 {
+   const uint32_t type_size = type_sz(src.type);
+
if (inst->opcode == FS_OPCODE_DDY_FINE) {
   /* produce accurate derivatives */
   if (devinfo->gen >= 11) {
  src = stride(src, 0, 2, 1);
- struct brw_reg src_0  = byte_offset(src,  0 * sizeof(float));
- struct brw_reg src_2  = byte_offset(src,  2 * sizeof(float));
- struct brw_reg src_4  = byte_offset(src,  4 * sizeof(float));
- struct brw_reg src_6  = byte_offset(src,  6 * sizeof(float));
- struct brw_reg src_8  = byte_offset(src,  8 * sizeof(float));
- struct brw_reg src_10 = byte_offset(src, 10 * sizeof(float));
- struct brw_reg src_12 = byte_offset(src, 12 * sizeof(float));
- struct brw_reg src_14 = byte_offset(src, 14 * sizeof(float));
-
- struct brw_reg dst_0  = byte_offset(dst,  0 * sizeof(float));
- struct brw_reg dst_4  = byte_offset(dst,  4 * sizeof(float));
- struct brw_reg dst_8  = byte_offset(dst,  8 * sizeof(float));
- struct brw_reg dst_12 = byte_offset(dst, 12 * sizeof(float));
+ struct brw_reg src_0  = byte_offset(src,  0 * type_size);
+ struct brw_reg src_2  = byte_offset(src,  2 * type_size);
+ struct brw_reg src_4  = byte_offset(src,  4 * type_size);
+ struct brw_reg src_6  = byte_offset(src,  6 * type_size);
+ struct brw_reg src_8  = byte_offset(src,  8 * type_size);
+ struct brw_reg src_10 = byte_offset(src, 10 * type_size);
+ struct brw_reg src_12 = byte_offset(src, 12 * type_size);
+ struct brw_reg src_14 = byte_offset(src, 14 * type_size);
+
+ struct brw_reg dst_0  = byte_offset(dst,  0 * type_size);
+ struct brw_reg dst_4  = byte_offset(dst,  4 * type_size);
+ struct brw_reg dst_8  = byte_offset(dst,  8 * type_size);
+ struct brw_reg dst_12 = byte_offset(dst, 12 * type_size);
 
  brw_push_insn_state(p);
  brw_set_default_exec_size(p, BRW_EXECUTE_4);
@@ -1380,10 +1381,8 @@ fs_generator::generate_ddy(const fs_inst *inst,
   }
} else {
   /* replicate the derivative at the top-left pixel to other pixels */
-  struct brw_reg src0 = stride(src, 4, 4, 0);
-  struct brw_reg src1 = stride(src, 4, 4, 0);
-  src0.subnr = 0 * sizeof(float);
-  src1.subnr = 2 * sizeof(float);
+  struct brw_reg src0 = byte_offset(stride(src, 4, 4, 0), 0 * type_size);
+  struct brw_reg src1 = byte_offset(stride(src, 4, 4, 0), 2 * type_size);
 
   brw_ADD(p, dst, negate(src0), src1);
}
-- 
2.17.1


[Mesa-dev] [PATCH v3 37/42] intel/compiler: add a brw_reg_type_is_integer helper

2019-01-15 Thread Iago Toral Quiroga
Reviewed-by: Jason Ekstrand 
---
 src/intel/compiler/brw_reg_type.h | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/src/intel/compiler/brw_reg_type.h b/src/intel/compiler/brw_reg_type.h
index ffbec90d3fe..a3365b7e34c 100644
--- a/src/intel/compiler/brw_reg_type.h
+++ b/src/intel/compiler/brw_reg_type.h
@@ -82,6 +82,24 @@ brw_reg_type_is_floating_point(enum brw_reg_type type)
}
 }
 
+static inline bool
+brw_reg_type_is_integer(enum brw_reg_type type)
+{
+   switch (type) {
+   case BRW_REGISTER_TYPE_Q:
+   case BRW_REGISTER_TYPE_UQ:
+   case BRW_REGISTER_TYPE_D:
+   case BRW_REGISTER_TYPE_UD:
+   case BRW_REGISTER_TYPE_W:
+   case BRW_REGISTER_TYPE_UW:
+   case BRW_REGISTER_TYPE_B:
+   case BRW_REGISTER_TYPE_UV:
+  return true;
+   default:
+  return false;
+   }
+}
+
 unsigned
 brw_reg_type_to_hw_type(const struct gen_device_info *devinfo,
 enum brw_reg_file file, enum brw_reg_type type);
-- 
2.17.1


[Mesa-dev] [PATCH v3 25/42] intel/compiler: workaround for SIMD8 half-float MAD in gen8

2019-01-15 Thread Iago Toral Quiroga
Broadwell hardware has a bug that manifests in SIMD8 executions of
16-bit MAD instructions when any of the sources is a Y or W component.
We pack these components in the same SIMD register as components X and
Z respectively, but starting at offset 16B (so they live in the second
half of the register). The problem does not exist in SKL or later.

We work around this issue by moving any such sources to a temporary
starting at offset 0B. We want to do this after the main optimization loop
to prevent copy-propagation and friends from undoing the fix.
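
A minimal Python sketch of the packing described above and of the check the
workaround applies. It assumes 32-byte registers and tightly packed
components; the function names are illustrative, and the offset test simply
mirrors the "offset > 0" condition used in the patch.

```python
REG_SIZE = 32  # bytes per GRF register


def component_location(component, exec_size, type_size):
    """Return (register index, byte offset) where a component starts."""
    byte = component * exec_size * type_size
    return byte // REG_SIZE, byte % REG_SIZE


def sources_needing_copy(src_offsets):
    """Indices of MAD sources that must be moved to a temporary."""
    return [i for i, offset in enumerate(src_offsets) if offset > 0]
```

For SIMD8 half-float data, components X, Y, Z and W land at (0, 0), (0, 16),
(1, 0) and (1, 16), so Y and W start in the second half of a register. With
32-bit data each component starts at offset 0 of its own register.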

Reviewed-by: Topi Pohjolainen 
---
 src/intel/compiler/brw_fs.cpp | 48 +++
 src/intel/compiler/brw_fs.h   |  1 +
 2 files changed, 49 insertions(+)

diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index 0b3ec94e2d2..d6096cd667d 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -6540,6 +6540,48 @@ fs_visitor::optimize()
validate();
 }
 
+/**
+ * Broadwell hardware has a bug that manifests in SIMD8 executions of 16-bit
+ * MAD instructions when any of the sources is a Y or W component. We pack
+ * these components in the same SIMD register as components X and Z
+ * respectively, but starting at offset 16B (so they live in the second half
+ * of the register).
+ *
+ * We work around this issue by moving any such sources to a temporary
+ * starting at offset 0B. We want to do this after the main optimization loop
+ * to prevent copy-propagation and friends from undoing the fix.
+ */
+void
+fs_visitor::fixup_hf_mad()
+{
+   if (devinfo->gen != 8)
+  return;
+
+   bool progress = false;
+
+   foreach_block_and_inst_safe (block, fs_inst, inst, cfg) {
+  if (inst->opcode != BRW_OPCODE_MAD ||
+  inst->dst.type != BRW_REGISTER_TYPE_HF ||
+  inst->exec_size > 8)
+ continue;
+
+  for (int i = 0; i < 3; i++) {
+ if (inst->src[i].offset > 0) {
+assert(inst->src[i].type == BRW_REGISTER_TYPE_HF);
+const fs_builder ibld =
+   bld.at(block, inst).exec_all().group(inst->exec_size, 0);
+fs_reg tmp = ibld.vgrf(inst->src[i].type);
+ibld.MOV(tmp, inst->src[i]);
+inst->src[i] = tmp;
+progress = true;
+ }
+  }
+   }
+
+   if (progress)
+  invalidate_live_intervals();
+}
+
 /**
  * Three source instruction must have a GRF/MRF destination register.
  * ARF NULL is not allowed.  Fix that up by allocating a temporary GRF.
@@ -6698,6 +6740,7 @@ fs_visitor::run_vs()
assign_curb_setup();
assign_vs_urb_setup();
 
+   fixup_hf_mad();
fixup_3src_null_dest();
allocate_registers(8, true);
 
@@ -6782,6 +6825,7 @@ fs_visitor::run_tcs_single_patch()
assign_curb_setup();
assign_tcs_single_patch_urb_setup();
 
+   fixup_hf_mad();
fixup_3src_null_dest();
allocate_registers(8, true);
 
@@ -6816,6 +6860,7 @@ fs_visitor::run_tes()
assign_curb_setup();
assign_tes_urb_setup();
 
+   fixup_hf_mad();
fixup_3src_null_dest();
allocate_registers(8, true);
 
@@ -6865,6 +6910,7 @@ fs_visitor::run_gs()
assign_curb_setup();
assign_gs_urb_setup();
 
+   fixup_hf_mad();
fixup_3src_null_dest();
allocate_registers(8, true);
 
@@ -6965,6 +7011,7 @@ fs_visitor::run_fs(bool allow_spilling, bool do_rep_send)
 
   assign_urb_setup();
 
+  fixup_hf_mad();
   fixup_3src_null_dest();
   allocate_registers(8, allow_spilling);
 
@@ -7009,6 +7056,7 @@ fs_visitor::run_cs(unsigned min_dispatch_width)
 
assign_curb_setup();
 
+   fixup_hf_mad();
fixup_3src_null_dest();
allocate_registers(min_dispatch_width, true);
 
diff --git a/src/intel/compiler/brw_fs.h b/src/intel/compiler/brw_fs.h
index 68287bcdcea..1879d4bc7f7 100644
--- a/src/intel/compiler/brw_fs.h
+++ b/src/intel/compiler/brw_fs.h
@@ -103,6 +103,7 @@ public:
void setup_vs_payload();
void setup_gs_payload();
void setup_cs_payload();
+   void fixup_hf_mad();
void fixup_3src_null_dest();
void assign_curb_setup();
void calculate_urb_setup();
-- 
2.17.1


[Mesa-dev] [PATCH v3 33/42] compiler/spirv: add support for Float16 and Int8 capabilities

2019-01-15 Thread Iago Toral Quiroga
v2:
 - Merge Float16 and Int8 capabilities into a single patch (Jason)

Reviewed-by: Jason Ekstrand  (v1)
---
 src/compiler/shader_info.h| 2 ++
 src/compiler/spirv/spirv_to_nir.c | 8 ++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/src/compiler/shader_info.h b/src/compiler/shader_info.h
index 87a2c805d37..1d45433312a 100644
--- a/src/compiler/shader_info.h
+++ b/src/compiler/shader_info.h
@@ -37,12 +37,14 @@ struct spirv_supported_capabilities {
bool descriptor_array_dynamic_indexing;
bool device_group;
bool draw_parameters;
+   bool float16;
bool float64;
bool geometry_streams;
bool gcn_shader;
bool image_ms_array;
bool image_read_without_format;
bool image_write_without_format;
+   bool int8;
bool int16;
bool int64;
bool int64_atomics;
diff --git a/src/compiler/spirv/spirv_to_nir.c b/src/compiler/spirv/spirv_to_nir.c
index 76a997ee341..731b1cbea5b 100644
--- a/src/compiler/spirv/spirv_to_nir.c
+++ b/src/compiler/spirv/spirv_to_nir.c
@@ -3518,8 +3518,6 @@ vtn_handle_preamble_instruction(struct vtn_builder *b, 
SpvOp opcode,
   case SpvCapabilityLinkage:
   case SpvCapabilityVector16:
   case SpvCapabilityFloat16Buffer:
-  case SpvCapabilityFloat16:
-  case SpvCapabilityInt8:
   case SpvCapabilitySparseResidency:
  vtn_warn("Unsupported SPIR-V capability: %s",
   spirv_capability_to_string(cap));
@@ -3536,12 +3534,18 @@ vtn_handle_preamble_instruction(struct vtn_builder *b, 
SpvOp opcode,
   case SpvCapabilityFloat64:
  spv_check_supported(float64, cap);
  break;
+  case SpvCapabilityFloat16:
+ spv_check_supported(float16, cap);
+ break;
   case SpvCapabilityInt64:
  spv_check_supported(int64, cap);
  break;
   case SpvCapabilityInt16:
  spv_check_supported(int16, cap);
  break;
+  case SpvCapabilityInt8:
+ spv_check_supported(int8, cap);
+ break;
 
   case SpvCapabilityTransformFeedback:
  spv_check_supported(transform_feedback, cap);
-- 
2.17.1



[Mesa-dev] [PATCH v3 14/42] compiler/nir: add lowering for 16-bit ldexp

2019-01-15 Thread Iago Toral Quiroga
v2 (Topi):
 - Make bit-size handling order be 16-bit, 32-bit, 64-bit
 - Clamp lower exponent range at -28 instead of -30.

Reviewed-by: Topi Pohjolainen 
Reviewed-by: Jason Ekstrand 
---
 src/compiler/nir/nir_opt_algebraic.py | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/src/compiler/nir/nir_opt_algebraic.py 
b/src/compiler/nir/nir_opt_algebraic.py
index 40eb3de02c3..71c626e1b3f 100644
--- a/src/compiler/nir/nir_opt_algebraic.py
+++ b/src/compiler/nir/nir_opt_algebraic.py
@@ -790,7 +790,9 @@ for x, y in itertools.product(['f', 'u', 'i'], ['f', 'u', 
'i']):
 
 def fexp2i(exp, bits):
# We assume that exp is already in the right range.
-   if bits == 32:
+   if bits == 16:
+  return ('i2i16', ('ishl', ('iadd', exp, 15), 10))
+   elif bits == 32:
   return ('ishl', ('iadd', exp, 127), 23)
elif bits == 64:
   return ('pack_64_2x32_split', 0, ('ishl', ('iadd', exp, 1023), 20))
@@ -808,7 +810,9 @@ def ldexp(f, exp, bits):
# handles a range on exp of [-252, 254] which allows you to create any
# value (including denorms if the hardware supports it) and to adjust the
# exponent of any normal value to anything you want.
-   if bits == 32:
+   if bits == 16:
+  exp = ('imin', ('imax', exp, -28), 30)
+   elif bits == 32:
   exp = ('imin', ('imax', exp, -252), 254)
elif bits == 64:
   exp = ('imin', ('imax', exp, -2044), 2046)
@@ -828,6 +832,7 @@ def ldexp(f, exp, bits):
return ('fmul', ('fmul', f, pow2_1), pow2_2)
 
 optimizations += [
+   (('ldexp@16', 'x', 'exp'), ldexp('x', 'exp', 16), 'options->lower_ldexp'),
(('ldexp@32', 'x', 'exp'), ldexp('x', 'exp', 32), 'options->lower_ldexp'),
(('ldexp@64', 'x', 'exp'), ldexp('x', 'exp', 64), 'options->lower_ldexp'),
 ]
-- 
2.17.1
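
For readers who want to check the 16-bit constants, here is a rough Python sketch of what the new `fexp2i` and clamp cases compute. It assumes the exponent is split as `exp >> 1` and `exp - (exp >> 1)` before the two multiplies (that part of `ldexp` is outside the hunks above), and the helper names are made up for illustration. The multiplies run in Python double precision, so half-float rounding is not modelled.

```python
import struct

def fexp2i16(exp):
    """Bit pattern of the half-float 2**exp, for exp in the normal range [-14, 15]."""
    # Half-float layout: 1 sign bit, 5 exponent bits (bias 15), 10 mantissa bits.
    return (exp + 15) << 10

def half_from_bits(bits):
    """Decode a 16-bit pattern as an IEEE half-float."""
    return struct.unpack('<e', struct.pack('<H', bits))[0]

def clamp_exp16(exp):
    # Same clamp as the 16-bit case in the patch.
    return min(max(exp, -28), 30)

def ldexp16(f, exp):
    """Sketch of the lowered ldexp: two multiplies by powers of two.

    Assumption: the split into exp >> 1 and exp - (exp >> 1) mirrors the
    existing 32-bit and 64-bit paths.
    """
    exp = clamp_exp16(exp)
    e1 = exp >> 1
    e2 = exp - e1
    return f * half_from_bits(fexp2i16(e1)) * half_from_bits(fexp2i16(e2))
```

With the clamp at [-28, 30], both halves stay within [-14, 15], which is exactly the range where `(exp + 15) << 10` yields a normal half-float.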



[Mesa-dev] [PATCH v3 42/42] intel/compiler: allow propagating HF immediates to MAD/LRP

2019-01-15 Thread Iago Toral Quiroga
Even if we don't do 3-src algebraic optimizations for MAD and LRP in
the backend any more, the combine constants pass can still do a fine
job of grouping these constants into single registers for better
register pressure.

v2:
 - updated comment to reference register pressure benefits rather than
   algebraic optimizations.
---
 src/intel/compiler/brw_fs_copy_propagation.cpp | 16 +++-
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/src/intel/compiler/brw_fs_copy_propagation.cpp 
b/src/intel/compiler/brw_fs_copy_propagation.cpp
index 4e20ddb683a..5695678b766 100644
--- a/src/intel/compiler/brw_fs_copy_propagation.cpp
+++ b/src/intel/compiler/brw_fs_copy_propagation.cpp
@@ -772,16 +772,14 @@ fs_visitor::try_constant_propagate(fs_inst *inst, 
acp_entry *entry)
 
   case BRW_OPCODE_MAD:
   case BRW_OPCODE_LRP:
- /* 3-src instructions can't take IMM registers, however, for 32-bit
-  * floating instructions we rely on the combine constants pass to fix
-  * it up. For anything else, we shouldn't be promoting immediates
-  * until we can make the pass capable of combining constants of
-  * different sizes.
+ /* 3-src instructions can't take IMM registers, but we allow this
+  * here anyway and rely on the combine constants pass to fix it up
+  * later, hopefully leading to better register pressure.
   */
- if (val.type == BRW_REGISTER_TYPE_F) {
-inst->src[i] = val;
-progress = true;
- }
+ assert(val.type == BRW_REGISTER_TYPE_F ||
+val.type == BRW_REGISTER_TYPE_HF);
+ inst->src[i] = val;
+ progress = true;
  break;
 
   default:
-- 
2.17.1



[Mesa-dev] [PATCH v3 30/42] intel/compiler: implement isign for int8

2019-01-15 Thread Iago Toral Quiroga
Reviewed-by: Jason Ekstrand 
---
 src/intel/compiler/brw_fs_nir.cpp | 25 +
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index a739562c3ab..a3d193b8a44 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -912,11 +912,28 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, 
nir_alu_instr *instr)
*  Predicated OR sets 1 if val is positive.
*/
   uint32_t bit_size = nir_dest_bit_size(instr->dest.dest);
-  assert(bit_size == 32 || bit_size == 16);
 
-  fs_reg zero = bit_size == 32 ? brw_imm_d(0) : brw_imm_w(0);
-  fs_reg one = bit_size == 32 ? brw_imm_d(1) : brw_imm_w(1);
-  fs_reg shift = bit_size == 32 ? brw_imm_d(31) : brw_imm_w(15);
+  fs_reg zero, one, shift;
+  switch (bit_size) {
+  case 32:
+ zero = brw_imm_d(0);
+ one = brw_imm_d(1);
+ shift = brw_imm_d(31);
+ break;
+  case 16:
+ zero = brw_imm_w(0);
+ one = brw_imm_w(1);
+ shift = brw_imm_w(15);
+ break;
+  case 8: {
+ zero = setup_imm_b(bld, 0);
+ one = setup_imm_b(bld, 1);
+ shift = setup_imm_b(bld, 7);
+ break;
+  }
+  default:
+ unreachable("unsupported bit-size");
+  };
 
   bld.CMP(bld.null_reg_d(), op[0], zero, BRW_CONDITIONAL_G);
   bld.ASR(result, op[0], shift);
-- 
2.17.1
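
As a sanity check on the per-size constants, the following Python sketch models the sequence for one channel: a compare against zero, an arithmetic shift right by `bit_size - 1`, and the predicated OR with 1 that the surrounding comment describes. The function name and the use of a plain Python integer for the channel value are illustrative assumptions, not code from the backend.

```python
def isign(val, bit_size):
    """Model of the isign lowering for one signed channel of the given width."""
    assert bit_size in (8, 16, 32)
    lo = -(1 << (bit_size - 1))
    hi = (1 << (bit_size - 1)) - 1
    assert lo <= val <= hi

    # CMP with BRW_CONDITIONAL_G against zero sets the predicate.
    positive = val > 0
    # ASR by (bit_size - 1): 0 for non-negative values, -1 for negative ones.
    result = val >> (bit_size - 1)
    # Predicated OR with 1 only touches channels where val > 0.
    if positive:
        result |= 1
    return result
```

The shift amounts 31, 15 and 7 in the patch are simply `bit_size - 1` for each supported width.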



[Mesa-dev] [PATCH v3 36/42] intel/compiler: implement is_zero, is_one, is_negative_one for 8-bit/16-bit

2019-01-15 Thread Iago Toral Quiroga
There are no 8-bit immediates, so assert in that case.
16-bit immediates are replicated in each word of a 32-bit immediate, so
we only need to check the lower 16 bits.

v2:
 - Fix is_zero with half-float to consider -0 as well (Jason).
 - Fix is_negative_one for word type.
---
 src/intel/compiler/brw_shader.cpp | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/src/intel/compiler/brw_shader.cpp 
b/src/intel/compiler/brw_shader.cpp
index 97966c951a1..3c636c9d3a4 100644
--- a/src/intel/compiler/brw_shader.cpp
+++ b/src/intel/compiler/brw_shader.cpp
@@ -704,11 +704,20 @@ backend_reg::is_zero() const
if (file != IMM)
   return false;
 
+   assert(type_sz(type) > 1);
+
switch (type) {
+   case BRW_REGISTER_TYPE_HF:
+  assert((d & 0xffff) == ((d >> 16) & 0xffff));
+  return (d & 0xffff) == 0 || (d & 0xffff) == 0x8000;
case BRW_REGISTER_TYPE_F:
   return f == 0;
case BRW_REGISTER_TYPE_DF:
   return df == 0;
+   case BRW_REGISTER_TYPE_W:
+   case BRW_REGISTER_TYPE_UW:
+  assert((d & 0xffff) == ((d >> 16) & 0xffff));
+  return (d & 0xffff) == 0;
case BRW_REGISTER_TYPE_D:
case BRW_REGISTER_TYPE_UD:
   return d == 0;
@@ -726,11 +735,20 @@ backend_reg::is_one() const
if (file != IMM)
   return false;
 
+   assert(type_sz(type) > 1);
+
switch (type) {
+   case BRW_REGISTER_TYPE_HF:
+  assert((d & 0xffff) == ((d >> 16) & 0xffff));
+  return (d & 0xffff) == 0x3c00;
case BRW_REGISTER_TYPE_F:
   return f == 1.0f;
case BRW_REGISTER_TYPE_DF:
   return df == 1.0;
+   case BRW_REGISTER_TYPE_W:
+   case BRW_REGISTER_TYPE_UW:
+  assert((d & 0xffff) == ((d >> 16) & 0xffff));
+  return (d & 0xffff) == 1;
case BRW_REGISTER_TYPE_D:
case BRW_REGISTER_TYPE_UD:
   return d == 1;
@@ -748,11 +766,19 @@ backend_reg::is_negative_one() const
if (file != IMM)
   return false;
 
+   assert(type_sz(type) > 1);
+
switch (type) {
+   case BRW_REGISTER_TYPE_HF:
+  assert((d & 0xffff) == ((d >> 16) & 0xffff));
+  return (d & 0xffff) == 0xbc00;
case BRW_REGISTER_TYPE_F:
   return f == -1.0;
case BRW_REGISTER_TYPE_DF:
   return df == -1.0;
+   case BRW_REGISTER_TYPE_W:
+  assert((d & 0xffff) == ((d >> 16) & 0xffff));
+  return (d & 0xffff) == 0xffff;
case BRW_REGISTER_TYPE_D:
   return d == -1;
case BRW_REGISTER_TYPE_Q:
-- 
2.17.1
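
The half-float bit patterns used in these checks can be verified with a few lines of Python. This is only a sketch: the helper names are invented here, and the immediate is treated as an unsigned 32-bit value rather than the signed `d` field used in the backend.

```python
import struct

def hf_bits(value):
    """16-bit IEEE half-float pattern of value."""
    return struct.unpack('<H', struct.pack('<e', value))[0]

def replicate16(bits):
    """A 16-bit immediate as stored: the same word in both halves of a dword."""
    return (bits << 16) | bits

def hf_imm_is_zero(d):
    # Both words must match, so looking at the low word is enough.
    assert (d & 0xffff) == ((d >> 16) & 0xffff)
    # +0.0 is 0x0000 and -0.0 is 0x8000; both count as zero.
    return (d & 0xffff) in (0x0000, 0x8000)

def hf_imm_is_one(d):
    assert (d & 0xffff) == ((d >> 16) & 0xffff)
    return (d & 0xffff) == 0x3c00

def hf_imm_is_negative_one(d):
    assert (d & 0xffff) == ((d >> 16) & 0xffff)
    return (d & 0xffff) == 0xbc00
```

For example, `hf_imm_is_zero(replicate16(hf_bits(-0.0)))` holds, which is the -0 case added in v2.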



[Mesa-dev] [PATCH v3 03/42] intel/compiler: split float to 64-bit opcodes from int to 64-bit

2019-01-15 Thread Iago Toral Quiroga
Going forward, having these split is a bit more convenient since these two
groups have different restrictions.

v2:
 - Rebased on top of new regioning lowering pass.

Reviewed-by: Topi Pohjolainen  (v1)
---
 src/intel/compiler/brw_fs_nir.cpp | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index bdc883e5364..a59debf2b78 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -801,10 +801,17 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, 
nir_alu_instr *instr)
case nir_op_f2f64:
case nir_op_f2i64:
case nir_op_f2u64:
+  assert(type_sz(op[0].type) > 2); /* brw_nir_lower_conversions */
+  inst = bld.MOV(result, op[0]);
+  inst->saturate = instr->dest.saturate;
+  break;
+
case nir_op_i2f64:
case nir_op_i2i64:
case nir_op_u2f64:
case nir_op_u2u64:
+  assert(type_sz(op[0].type) > 1); /* brw_nir_lower_conversions */
+  /* fallthrough */
case nir_op_f2f32:
case nir_op_f2i32:
case nir_op_f2u32:
-- 
2.17.1



[Mesa-dev] [PATCH v3 27/42] intel/compiler: activate 16-bit bit-size lowerings also for 8-bit

2019-01-15 Thread Iago Toral Quiroga
In particular, we need the same lowerings that we use for 16-bit
integers.

Reviewed-by: Jason Ekstrand 
---
 src/intel/compiler/brw_nir.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/intel/compiler/brw_nir.c b/src/intel/compiler/brw_nir.c
index 3b2909da33e..2dfbf8824dc 100644
--- a/src/intel/compiler/brw_nir.c
+++ b/src/intel/compiler/brw_nir.c
@@ -628,7 +628,7 @@ static unsigned
 lower_bit_size_callback(const nir_alu_instr *alu, UNUSED void *data)
 {
assert(alu->dest.dest.is_ssa);
-   if (alu->dest.dest.ssa.bit_size != 16)
+   if (alu->dest.dest.ssa.bit_size >= 32)
   return 0;
 
const struct brw_compiler *compiler = (const struct brw_compiler *) data;
-- 
2.17.1



[Mesa-dev] [PATCH v3 32/42] intel/eu: force stride of 2 on NULL register for Byte instructions

2019-01-15 Thread Iago Toral Quiroga
The hardware only allows a stride of 1 on a Byte destination for raw
byte MOV instructions. This is required even when the destination
is the NULL register.

Rather than making sure that we emit a proper NULL:B destination
every time we need one, just fix it at emission time.

Reviewed-by: Jason Ekstrand 
---
 src/intel/compiler/brw_eu_emit.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/src/intel/compiler/brw_eu_emit.c b/src/intel/compiler/brw_eu_emit.c
index 2fa89f8a2a3..4e1672408ea 100644
--- a/src/intel/compiler/brw_eu_emit.c
+++ b/src/intel/compiler/brw_eu_emit.c
@@ -94,6 +94,17 @@ brw_set_dest(struct brw_codegen *p, brw_inst *inst, struct 
brw_reg dest)
else if (dest.file == BRW_GENERAL_REGISTER_FILE)
   assert(dest.nr < 128);
 
+   /* The hardware has a restriction where if the destination is Byte,
+* the instruction needs to have a stride of 2 (except for packed byte
+* MOV). This seems to be required even if the destination is the NULL
+* register.
+*/
+   if (dest.file == BRW_ARCHITECTURE_REGISTER_FILE &&
+   dest.nr == BRW_ARF_NULL &&
+   type_sz(dest.type) == 1) {
+  dest.hstride = BRW_HORIZONTAL_STRIDE_2;
+   }
+
gen7_convert_mrf_to_grf(p, &dest);
 
brw_inst_set_dst_file_type(devinfo, inst, dest.file, dest.type);
-- 
2.17.1



[Mesa-dev] [PATCH v3 39/42] intel/compiler: remove MAD/LRP algebraic optimizations from the backend

2019-01-15 Thread Iago Toral Quiroga
NIR already has these so they are redundant. A run of shader-db confirms
that the only cases where these backend optimizations are activated
are some Tomb Raider shaders where the affected variables are qualified
as "precise", which is why NIR won't apply them and why the backend
shouldn't either (so it is actually a bug).

Suggested-by: Jason Ekstrand 
---
 src/intel/compiler/brw_fs.cpp | 37 ---
 1 file changed, 37 deletions(-)

diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index 77c955ac435..e7f5a8822a3 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -2568,16 +2568,6 @@ fs_visitor::opt_algebraic()
 break;
  }
  break;
-  case BRW_OPCODE_LRP:
- if (inst->src[1].equals(inst->src[2])) {
-inst->opcode = BRW_OPCODE_MOV;
-inst->src[0] = inst->src[1];
-inst->src[1] = reg_undef;
-inst->src[2] = reg_undef;
-progress = true;
-break;
- }
- break;
   case BRW_OPCODE_CMP:
  if ((inst->conditional_mod == BRW_CONDITIONAL_Z ||
   inst->conditional_mod == BRW_CONDITIONAL_NZ) &&
@@ -2654,33 +2644,6 @@ fs_visitor::opt_algebraic()
 }
  }
  break;
-  case BRW_OPCODE_MAD:
- if (inst->src[1].is_zero() || inst->src[2].is_zero()) {
-inst->opcode = BRW_OPCODE_MOV;
-inst->src[1] = reg_undef;
-inst->src[2] = reg_undef;
-progress = true;
- } else if (inst->src[0].is_zero()) {
-inst->opcode = BRW_OPCODE_MUL;
-inst->src[0] = inst->src[2];
-inst->src[2] = reg_undef;
-progress = true;
- } else if (inst->src[1].is_one()) {
-inst->opcode = BRW_OPCODE_ADD;
-inst->src[1] = inst->src[2];
-inst->src[2] = reg_undef;
-progress = true;
- } else if (inst->src[2].is_one()) {
-inst->opcode = BRW_OPCODE_ADD;
-inst->src[2] = reg_undef;
-progress = true;
- } else if (inst->src[1].file == IMM && inst->src[2].file == IMM) {
-inst->opcode = BRW_OPCODE_ADD;
-inst->src[1].f *= inst->src[2].f;
-inst->src[2] = reg_undef;
-progress = true;
- }
- break;
   case SHADER_OPCODE_BROADCAST:
  if (is_uniform(inst->src[0])) {
 inst->opcode = BRW_OPCODE_MOV;
-- 
2.17.1



[Mesa-dev] [PATCH v3 17/42] intel/compiler: add new half-float register type for 3-src instructions

2019-01-15 Thread Iago Toral Quiroga
This is available since gen8.

v2: restore previously existing assertion.

Reviewed-by: Topi Pohjolainen  (v1)
---
 src/intel/compiler/brw_reg_type.c | 36 +++
 1 file changed, 32 insertions(+), 4 deletions(-)

diff --git a/src/intel/compiler/brw_reg_type.c 
b/src/intel/compiler/brw_reg_type.c
index 60240ba1513..09b3ea61d4c 100644
--- a/src/intel/compiler/brw_reg_type.c
+++ b/src/intel/compiler/brw_reg_type.c
@@ -138,6 +138,7 @@ enum hw_3src_reg_type {
GEN7_3SRC_TYPE_D  = 1,
GEN7_3SRC_TYPE_UD = 2,
GEN7_3SRC_TYPE_DF = 3,
+   GEN8_3SRC_TYPE_HF = 4,
 
/** When ExecutionDatatype is 1: @{ */
GEN10_ALIGN1_3SRC_REG_TYPE_HF = 0b000,
@@ -166,6 +167,14 @@ static const struct hw_3src_type {
[BRW_REGISTER_TYPE_D]  = { GEN7_3SRC_TYPE_D  },
[BRW_REGISTER_TYPE_UD] = { GEN7_3SRC_TYPE_UD },
[BRW_REGISTER_TYPE_DF] = { GEN7_3SRC_TYPE_DF },
+}, gen8_hw_3src_type[] = {
+   [0 ... BRW_REGISTER_TYPE_LAST] = { INVALID },
+
+   [BRW_REGISTER_TYPE_F]  = { GEN7_3SRC_TYPE_F  },
+   [BRW_REGISTER_TYPE_D]  = { GEN7_3SRC_TYPE_D  },
+   [BRW_REGISTER_TYPE_UD] = { GEN7_3SRC_TYPE_UD },
+   [BRW_REGISTER_TYPE_DF] = { GEN7_3SRC_TYPE_DF },
+   [BRW_REGISTER_TYPE_HF] = { GEN8_3SRC_TYPE_HF },
 }, gen10_hw_3src_align1_type[] = {
 #define E(x) BRW_ALIGN1_3SRC_EXEC_TYPE_##x
[0 ... BRW_REGISTER_TYPE_LAST] = { INVALID },
@@ -249,6 +258,20 @@ brw_hw_type_to_reg_type(const struct gen_device_info 
*devinfo,
unreachable("not reached");
 }
 
+static inline const struct hw_3src_type *
+get_hw_3src_type_map(const struct gen_device_info *devinfo, uint32_t *size)
+{
+   if (devinfo->gen < 8) {
+  if (size)
+ *size = ARRAY_SIZE(gen7_hw_3src_type);
+  return gen7_hw_3src_type;
+   } else {
+  if (size)
+ *size = ARRAY_SIZE(gen8_hw_3src_type);
+  return gen8_hw_3src_type;
+   }
+}
+
 /**
  * Convert a brw_reg_type enumeration value into the hardware representation
  * for a 3-src align16 instruction
@@ -257,9 +280,12 @@ unsigned
 brw_reg_type_to_a16_hw_3src_type(const struct gen_device_info *devinfo,
  enum brw_reg_type type)
 {
-   assert(type < ARRAY_SIZE(gen7_hw_3src_type));
-   assert(gen7_hw_3src_type[type].reg_type != (enum hw_3src_reg_type)INVALID);
-   return gen7_hw_3src_type[type].reg_type;
+   uint32_t map_size;
+   const struct hw_3src_type *hw_3src_type_map =
+  get_hw_3src_type_map(devinfo, &map_size);
+   assert(type < map_size);
+   assert(hw_3src_type_map[type].reg_type != (enum hw_3src_reg_type)INVALID);
+   return hw_3src_type_map[type].reg_type;
 }
 
 /**
@@ -283,8 +309,10 @@ enum brw_reg_type
 brw_a16_hw_3src_type_to_reg_type(const struct gen_device_info *devinfo,
  unsigned hw_type)
 {
+   const struct hw_3src_type *hw_3src_type_map =
+  get_hw_3src_type_map(devinfo, NULL);
for (enum brw_reg_type i = 0; i <= BRW_REGISTER_TYPE_LAST; i++) {
-  if (gen7_hw_3src_type[i].reg_type == hw_type) {
+  if (hw_3src_type_map[i].reg_type == hw_type) {
  return i;
   }
}
-- 
2.17.1
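
The table selection can be pictured with a small Python model. This is a sketch only: dictionaries stand in for the C designated-initializer arrays, register types are plain strings, and the encoding `F = 0` is an assumption because that enum entry lies outside the hunk; the other encodings are the ones shown in the patch.

```python
INVALID = -1

# Hardware encodings for Align16 3-src instructions.
# 'F': 0 is assumed; D, UD, DF and HF come from the enum in the patch.
GEN7_HW_3SRC_TYPE = {'F': 0, 'D': 1, 'UD': 2, 'DF': 3}
GEN8_HW_3SRC_TYPE = dict(GEN7_HW_3SRC_TYPE, HF=4)

def get_hw_3src_type_map(gen):
    """Pick the table by hardware generation, as the new helper does."""
    return GEN7_HW_3SRC_TYPE if gen < 8 else GEN8_HW_3SRC_TYPE

def reg_type_to_a16_hw_3src_type(gen, reg_type):
    hw = get_hw_3src_type_map(gen).get(reg_type, INVALID)
    assert hw != INVALID, "type not encodable on this generation"
    return hw

def a16_hw_3src_type_to_reg_type(gen, hw_type):
    # Reverse lookup: scan the selected table for the matching encoding.
    for reg_type, hw in get_hw_3src_type_map(gen).items():
        if hw == hw_type:
            return reg_type
    return None
```

In this model, encoding 4 resolves to `'HF'` on gen8 and later, while on gen7 the reverse lookup finds nothing.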



[Mesa-dev] [PATCH v3 18/42] intel/compiler: add a helper function to query hardware type table

2019-01-15 Thread Iago Toral Quiroga
We open coded this in a couple of places, so a helper function is probably
sensible. Plus it makes it more consistent with the 3src hardware type case.

Suggested-by: Topi Pohjolainen 
---
 src/intel/compiler/brw_reg_type.c | 34 ---
 1 file changed, 18 insertions(+), 16 deletions(-)

diff --git a/src/intel/compiler/brw_reg_type.c 
b/src/intel/compiler/brw_reg_type.c
index 09b3ea61d4c..0c9f522eca0 100644
--- a/src/intel/compiler/brw_reg_type.c
+++ b/src/intel/compiler/brw_reg_type.c
@@ -193,6 +193,20 @@ static const struct hw_3src_type {
 #undef E
 };
 
+static inline const struct hw_type *
+get_hw_type_map(const struct gen_device_info *devinfo, uint32_t *size)
+{
+   if (devinfo->gen >= 11) {
+  if (size)
+ *size = ARRAY_SIZE(gen11_hw_type);
+  return gen11_hw_type;
+   } else {
+  if (size)
+ *size = ARRAY_SIZE(gen4_hw_type);
+  return gen4_hw_type;
+   }
+}
+
 /**
  * Convert a brw_reg_type enumeration value into the hardware representation.
  *
@@ -203,16 +217,10 @@ brw_reg_type_to_hw_type(const struct gen_device_info 
*devinfo,
 enum brw_reg_file file,
 enum brw_reg_type type)
 {
-   const struct hw_type *table;
-
-   if (devinfo->gen >= 11) {
-  assert(type < ARRAY_SIZE(gen11_hw_type));
-  table = gen11_hw_type;
-   } else {
-  assert(type < ARRAY_SIZE(gen4_hw_type));
-  table = gen4_hw_type;
-   }
+   uint32_t table_size;
+   const struct hw_type *table = get_hw_type_map(devinfo, &table_size);
 
+   assert(type < table_size);
assert(devinfo->has_64bit_types || brw_reg_type_to_size(type) < 8 ||
   type == BRW_REGISTER_TYPE_NF);
 
@@ -234,13 +242,7 @@ enum brw_reg_type
 brw_hw_type_to_reg_type(const struct gen_device_info *devinfo,
 enum brw_reg_file file, unsigned hw_type)
 {
-   const struct hw_type *table;
-
-   if (devinfo->gen >= 11) {
-  table = gen11_hw_type;
-   } else {
-  table = gen4_hw_type;
-   }
+   const struct hw_type *table = get_hw_type_map(devinfo, NULL);
 
if (file == BRW_IMMEDIATE_VALUE) {
   for (enum brw_reg_type i = 0; i <= BRW_REGISTER_TYPE_LAST; i++) {
-- 
2.17.1



Re: [Mesa-dev] [PATCH v1] nir: Length of boolean vtn_value now is 1

2019-01-15 Thread apinheiro

Just tested it with the ARB_gl_spirv tests where I found this:

Tested-by: Alejandro Piñeiro 


On 15/1/19 12:08, Sergii Romantsov wrote:

During conversion the type length was lost to integer math: the bit size
of a 1-bit Boolean divided by 8 is 0.

CC: Jason Ekstrand 
Fixes: 44227453ec03 (nir: Switch to using 1-bit Booleans for almost everything)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109353
Signed-off-by: Sergii Romantsov 
---
  src/compiler/spirv/spirv_to_nir.c | 9 ++---
  1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/src/compiler/spirv/spirv_to_nir.c 
b/src/compiler/spirv/spirv_to_nir.c
index e3dc619..faad771 100644
--- a/src/compiler/spirv/spirv_to_nir.c
+++ b/src/compiler/spirv/spirv_to_nir.c
@@ -1042,14 +1042,16 @@ vtn_type_layout_std430(struct vtn_builder *b, struct 
vtn_type *type,
  {
 switch (type->base_type) {
 case vtn_base_type_scalar: {
-  uint32_t comp_size = glsl_get_bit_size(type->type) / 8;
+  uint32_t comp_size = glsl_type_is_boolean(type->type)
+ ? 1 : glsl_get_bit_size(type->type) / 8;
*size_out = comp_size;
*align_out = comp_size;
return type;
 }
  
 case vtn_base_type_vector: {

-  uint32_t comp_size = glsl_get_bit_size(type->type) / 8;
+  uint32_t comp_size = glsl_type_is_boolean(type->type)
+ ? 1 : glsl_get_bit_size(type->type) / 8;
unsigned align_comps = type->length == 3 ? 4 : type->length;
*size_out = comp_size * type->length,
*align_out = comp_size * align_comps;
@@ -1168,7 +1170,8 @@ vtn_handle_type(struct vtn_builder *b, SpvOp opcode,
val->type->base_type = vtn_base_type_vector;
val->type->type = glsl_vector_type(glsl_get_base_type(base->type), 
elems);
val->type->length = elems;
-  val->type->stride = glsl_get_bit_size(base->type) / 8;
+  val->type->stride = glsl_type_is_boolean(val->type->type)
+ ? 1 : glsl_get_bit_size(base->type) / 8;
val->type->array_element = base;
break;
 }
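
To make the arithmetic concrete, here is a short Python sketch of the component-size computation before and after the fix, together with the std430 vector size and alignment rule visible in the diff. The function names and the plain integer and boolean parameters are illustrative stand-ins for the `glsl_type` queries.

```python
def comp_size_old(bit_size):
    # Integer division: a 1-bit Boolean yields 0 bytes.
    return bit_size // 8

def comp_size_new(bit_size, is_boolean):
    # The fix: Booleans are given a length of 1.
    return 1 if is_boolean else bit_size // 8

def std430_vector_layout(bit_size, is_boolean, length):
    """Return (size, align) for a vector, following the diff's formula."""
    comp = comp_size_new(bit_size, is_boolean)
    # A 3-component vector is aligned like a 4-component one.
    align_comps = 4 if length == 3 else length
    return comp * length, comp * align_comps
```

With the old computation a Boolean vector ended up with size and alignment 0, which matches the loss described in the commit message.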



[Mesa-dev] [PATCH v3 08/42] intel/compiler: implement 16-bit fsign

2019-01-15 Thread Iago Toral Quiroga
v2:
 - make 16-bit be its own separate case (Jason)

Reviewed-by: Topi Pohjolainen 
---
 src/intel/compiler/brw_fs_nir.cpp | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index d742f55a957..cf546b8ff09 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -844,7 +844,22 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, 
nir_alu_instr *instr)
 : bld.MOV(result, brw_imm_f(1.0f));
 
  set_predicate(BRW_PREDICATE_NORMAL, inst);
-  } else if (type_sz(op[0].type) < 8) {
+  } else if (type_sz(op[0].type) == 2) {
+ /* AND(val, 0x8000) gives the sign bit.
+  *
+  * Predicated OR ORs 1.0 (0x3c00) with the sign bit if val is not 
zero.
+  */
+ fs_reg zero = retype(brw_imm_uw(0), BRW_REGISTER_TYPE_HF);
+ bld.CMP(bld.null_reg_f(), op[0], zero, BRW_CONDITIONAL_NZ);
+
+ fs_reg result_int = retype(result, BRW_REGISTER_TYPE_UW);
+ op[0].type = BRW_REGISTER_TYPE_UW;
+ result.type = BRW_REGISTER_TYPE_UW;
+ bld.AND(result_int, op[0], brw_imm_uw(0x8000u));
+
+ inst = bld.OR(result_int, result_int, brw_imm_uw(0x3c00u));
+ inst->predicate = BRW_PREDICATE_NORMAL;
+  } else if (type_sz(op[0].type) == 4) {
  /* AND(val, 0x80000000) gives the sign bit.
   *
   * Predicated OR ORs 1.0 (0x3f800000) with the sign bit if val is not
@@ -866,6 +881,7 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, 
nir_alu_instr *instr)
   * - The sign is encoded in the high 32-bit of each DF
   * - We need to produce a DF result.
   */
+ assert(type_sz(op[0].type) == 8);
 
  fs_reg zero = vgrf(glsl_type::double_type);
  bld.MOV(zero, setup_imm_df(bld, 0.0));
-- 
2.17.1
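
The 16-bit path can be modelled on raw bit patterns in a few lines of Python. This is a sketch under assumptions: the function works on one channel, the name is invented, and NaN inputs are not treated specially beyond the bit test shown.

```python
import struct

def hf_bits(value):
    """16-bit IEEE half-float pattern of value."""
    return struct.unpack('<H', struct.pack('<e', value))[0]

def fsign16_bits(bits):
    """Model of the 16-bit fsign sequence on one half-float bit pattern."""
    # CMP.NZ against 0.0: both +0.0 (0x0000) and -0.0 (0x8000) compare equal
    # to zero, so only the non-sign bits decide the predicate.
    nonzero = (bits & 0x7fff) != 0
    # AND with 0x8000 keeps just the sign bit.
    result = bits & 0x8000
    # Predicated OR with 1.0 (0x3c00) turns the sign bit into +/-1.0.
    if nonzero:
        result |= 0x3c00
    return result
```

Zero inputs keep only their sign bit, so the result for them is a signed zero rather than 1.0.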



[Mesa-dev] [PATCH v3 22/42] intel/compiler: don't propagate HF immediates to 3-src instructions

2019-01-15 Thread Iago Toral Quiroga
3-src instructions don't support immediates, but since 36bc5f06dd22,
we allow them on MAD and LRP relying on the combine constants pass to
fix it up later. However, that pass is specialized for 32-bit float
immediates and can't handle HF constants at present, so this patch
ensures that copy-propagation only does this for 32-bit constants.

Reviewed-by: Topi Pohjolainen 
---
 src/intel/compiler/brw_fs_copy_propagation.cpp | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/src/intel/compiler/brw_fs_copy_propagation.cpp 
b/src/intel/compiler/brw_fs_copy_propagation.cpp
index c23ce1ef426..77f2749ba04 100644
--- a/src/intel/compiler/brw_fs_copy_propagation.cpp
+++ b/src/intel/compiler/brw_fs_copy_propagation.cpp
@@ -772,8 +772,16 @@ fs_visitor::try_constant_propagate(fs_inst *inst, 
acp_entry *entry)
 
   case BRW_OPCODE_MAD:
   case BRW_OPCODE_LRP:
- inst->src[i] = val;
- progress = true;
+ /* 3-src instructions can't take IMM registers, however, for 32-bit
+  * floating instructions we rely on the combine constants pass to fix
+  * it up. For anything else, we shouldn't be promoting immediates
+  * until we can make the pass capable of combining constants of
+  * different sizes.
+  */
+ if (val.type == BRW_REGISTER_TYPE_F) {
+inst->src[i] = val;
+progress = true;
+ }
  break;
 
   default:
-- 
2.17.1



[Mesa-dev] [PATCH v3 05/42] intel/compiler: assert restrictions on conversions to half-float

2019-01-15 Thread Iago Toral Quiroga
There are some hardware restrictions that brw_nir_lower_conversions should
have taken care of before we get here.

v2:
 - rebased on top of regioning lowering pass

Reviewed-by: Topi Pohjolainen  (v1)
---
 src/intel/compiler/brw_fs_nir.cpp | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index e1d0e318b35..d742f55a957 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -784,6 +784,9 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, 
nir_alu_instr *instr)
*/
 
case nir_op_f2f16:
+   case nir_op_i2f16:
+   case nir_op_u2f16:
+  assert(type_sz(op[0].type) < 8); /* brw_nir_lower_conversions */
   inst = bld.MOV(result, op[0]);
   inst->saturate = instr->dest.saturate;
   break;
@@ -821,8 +824,6 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, 
nir_alu_instr *instr)
case nir_op_u2u32:
case nir_op_i2i16:
case nir_op_u2u16:
-   case nir_op_i2f16:
-   case nir_op_u2f16:
case nir_op_i2i8:
case nir_op_u2u8:
   inst = bld.MOV(result, op[0]);
-- 
2.17.1



[Mesa-dev] [PATCH v3 11/42] intel/compiler: lower 16-bit fmod

2019-01-15 Thread Iago Toral Quiroga
Reviewed-by: Jason Ekstrand 
---
 src/intel/compiler/brw_compiler.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/intel/compiler/brw_compiler.c 
b/src/intel/compiler/brw_compiler.c
index fe632c5badc..f885e79c3e6 100644
--- a/src/intel/compiler/brw_compiler.c
+++ b/src/intel/compiler/brw_compiler.c
@@ -33,6 +33,7 @@
.lower_sub = true, \
.lower_fdiv = true,\
.lower_scmp = true,\
+   .lower_fmod16 = true,  \
.lower_fmod32 = true,  \
.lower_fmod64 = false, \
.lower_bitfield_extract = true,\
-- 
2.17.1



[Mesa-dev] [PATCH v3 04/42] intel/compiler: handle b2i/b2f with other integer conversion opcodes

2019-01-15 Thread Iago Toral Quiroga
Since we handle booleans as integers this makes more sense.

v2:
 - rebased to incorporate new boolean conversion opcodes

v3:
 - rebased on top of the regioning lowering pass

Reviewed-by: Jason Ekstrand  (v1)
Reviewed-by: Topi Pohjolainen  (v2)
---
 src/intel/compiler/brw_fs_nir.cpp | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index a59debf2b78..e1d0e318b35 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -788,6 +788,14 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, 
nir_alu_instr *instr)
   inst->saturate = instr->dest.saturate;
   break;
 
+   case nir_op_f2f64:
+   case nir_op_f2i64:
+   case nir_op_f2u64:
+  assert(type_sz(op[0].type) > 2); /* brw_nir_lower_conversions */
+  inst = bld.MOV(result, op[0]);
+  inst->saturate = instr->dest.saturate;
+  break;
+
case nir_op_b2i8:
case nir_op_b2i16:
case nir_op_b2i32:
@@ -798,14 +806,6 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, 
nir_alu_instr *instr)
   op[0].type = BRW_REGISTER_TYPE_D;
   op[0].negate = !op[0].negate;
   /* fallthrough */
-   case nir_op_f2f64:
-   case nir_op_f2i64:
-   case nir_op_f2u64:
-  assert(type_sz(op[0].type) > 2); /* brw_nir_lower_conversions */
-  inst = bld.MOV(result, op[0]);
-  inst->saturate = instr->dest.saturate;
-  break;
-
case nir_op_i2f64:
case nir_op_i2i64:
case nir_op_u2f64:
-- 
2.17.1



[Mesa-dev] [PATCH v3 15/42] intel/compiler: Extended Math is limited to SIMD8 on half-float

2019-01-15 Thread Iago Toral Quiroga
From the Skylake PRM, Extended Math Function:

  "The execution size must be no more than 8 when half-floats
   are used in source or destination operand."

Earlier generations do not support Extended Math with half-float.

v2:
 - Rewrite the code to make it more readable (Jason).

v3:
 - Use if-ladders or just if+return exclusively (Topi).

Reviewed-by: Topi Pohjolainen  (v1)
---
 src/intel/compiler/brw_fs.cpp | 27 ++-
 1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index 0359eb079f7..0b3ec94e2d2 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -5493,18 +5493,27 @@ get_lowered_simd_width(const struct gen_device_info 
*devinfo,
case SHADER_OPCODE_EXP2:
case SHADER_OPCODE_LOG2:
case SHADER_OPCODE_SIN:
-   case SHADER_OPCODE_COS:
+   case SHADER_OPCODE_COS: {
   /* Unary extended math instructions are limited to SIMD8 on Gen4 and
-   * Gen6.
+   * Gen6. Extended Math Function is limited to SIMD8 with half-float.
*/
-  return (devinfo->gen >= 7 ? MIN2(16, inst->exec_size) :
-  devinfo->gen == 5 || devinfo->is_g4x ? MIN2(16, inst->exec_size) 
:
-  MIN2(8, inst->exec_size));
+  if (devinfo->gen == 6 || (devinfo->gen == 4 && !devinfo->is_g4x))
+ return MIN2(8, inst->exec_size);
+  if (inst->dst.type == BRW_REGISTER_TYPE_HF)
+ return MIN2(8, inst->exec_size);
+  return MIN2(16, inst->exec_size);
+   }
 
-   case SHADER_OPCODE_POW:
-  /* SIMD16 is only allowed on Gen7+. */
-  return (devinfo->gen >= 7 ? MIN2(16, inst->exec_size) :
-  MIN2(8, inst->exec_size));
+   case SHADER_OPCODE_POW: {
+  /* SIMD16 is only allowed on Gen7+. Extended Math Function is limited
+   * to SIMD8 with half-float
+   */
+  if (devinfo->gen < 7)
+ return MIN2(8, inst->exec_size);
+  if (inst->dst.type == BRW_REGISTER_TYPE_HF)
+ return MIN2(8, inst->exec_size);
+  return MIN2(16, inst->exec_size);
+   }
 
case SHADER_OPCODE_INT_QUOTIENT:
case SHADER_OPCODE_INT_REMAINDER:
-- 
2.17.1
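
The resulting width rules can be restated as a small Python sketch. The parameters `gen`, `is_g4x`, `dst_is_hf` and `exec_size` are simplified stand-ins for `devinfo->gen`, `devinfo->is_g4x`, the destination type check and `inst->exec_size`; the function names are made up for illustration.

```python
def unary_math_simd_width(gen, is_g4x, dst_is_hf, exec_size):
    """Lowered SIMD width for RCP/RSQ/SQRT/EXP2/LOG2/SIN/COS style opcodes."""
    # Gen6 and original Gen4 (not G4x) are limited to SIMD8.
    if gen == 6 or (gen == 4 and not is_g4x):
        return min(8, exec_size)
    # Half-float extended math is limited to SIMD8.
    if dst_is_hf:
        return min(8, exec_size)
    return min(16, exec_size)

def pow_simd_width(gen, dst_is_hf, exec_size):
    """Lowered SIMD width for POW."""
    # SIMD16 POW is only allowed on Gen7+.
    if gen < 7:
        return min(8, exec_size)
    if dst_is_hf:
        return min(8, exec_size)
    return min(16, exec_size)
```

So a SIMD16 half-float `SIN` on gen9 is split into two SIMD8 instructions, while the 32-bit float version keeps its SIMD16 width.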



[Mesa-dev] [PATCH v3 24/42] intel/compiler: fix ddy for half-float in gen8

2019-01-15 Thread Iago Toral Quiroga
We use Align16 mode for this, since it is more convenient, but the PRM
for Broadwell states in Volume 3D Media GPGPU, Chapter 'Register region
restrictions', Section '1. Special Restrictions':

   "In Align16 mode, the channel selects and channel enables apply to a
pair of half-floats, because these parameters are defined for DWord
elements ONLY. This is applicable when both source and destination
are half-floats."

This means that we cannot select individual HF elements using swizzles
like we do with 32-bit floats so we can't implement the required
regioning for this.

Use the gen11 path for this instead, which uses Align1 mode.

The restriction is not present in gen9 or gen10, where the Align16
implementation seems to work just fine.

Reviewed-by: Jason Ekstrand 
---
 src/intel/compiler/brw_fs_generator.cpp | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/src/intel/compiler/brw_fs_generator.cpp 
b/src/intel/compiler/brw_fs_generator.cpp
index d0cc4a6d231..4310f0b7fdc 100644
--- a/src/intel/compiler/brw_fs_generator.cpp
+++ b/src/intel/compiler/brw_fs_generator.cpp
@@ -1339,8 +1339,14 @@ fs_generator::generate_ddy(const fs_inst *inst,
const uint32_t type_size = type_sz(src.type);
 
if (inst->opcode == FS_OPCODE_DDY_FINE) {
-  /* produce accurate derivatives */
-  if (devinfo->gen >= 11) {
+  /* produce accurate derivatives. We can do this easily in Align16
+   * but this is not supported in gen11+ and gen8 Align16 swizzles
+   * for Half-Float operands work in units of 32-bit and always
+   * select pairs of consecutive half-float elements, so we can't
+   * use it for this.
+   */
+  if (devinfo->gen >= 11 ||
+  (devinfo->gen == 8 && src.type == BRW_REGISTER_TYPE_HF)) {
  src = stride(src, 0, 2, 1);
  struct brw_reg src_0  = byte_offset(src,  0 * type_size);
  struct brw_reg src_2  = byte_offset(src,  2 * type_size);
-- 
2.17.1
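
To illustrate the PRM restriction quoted in the commit message, here is a deliberately simplified model of a swizzle that works on DWord channels while the data is half-float. This is a sketch, not a description of the hardware: `swizzle_hf_as_dwords` is a hypothetical helper, and the register is reduced to eight 16-bit elements.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model (an assumption for illustration): each of the four
 * swizzle selects picks one DWord channel, and with 16-bit elements every
 * DWord channel carries two consecutive half-floats. */
void
swizzle_hf_as_dwords(const uint16_t src[8], const unsigned swz[4],
                     uint16_t dst[8])
{
   for (unsigned chan = 0; chan < 4; chan++) {
      assert(swz[chan] < 4);
      /* Both halves of the selected DWord move together. */
      dst[2 * chan + 0] = src[2 * swz[chan] + 0];
      dst[2 * chan + 1] = src[2 * swz[chan] + 1];
   }
}
```

In this model a swizzle of {1, 0, 3, 2} applied to elements 0..7 yields 2, 3, 0, 1, 6, 7, 4, 5: the pairs are reordered, but no select can swap the two elements inside a pair, which is the kind of per-element selection the Align16 ddy implementation relies on.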



[Mesa-dev] [PATCH v3 01/42] intel/compiler: handle conversions between int and half-float on atom

2019-01-15 Thread Iago Toral Quiroga
v2: adapted to work with the new regioning lowering pass

Reviewed-by: Topi Pohjolainen  (v1)
---
 src/intel/compiler/brw_ir_fs.h | 33 ++---
 1 file changed, 26 insertions(+), 7 deletions(-)

diff --git a/src/intel/compiler/brw_ir_fs.h b/src/intel/compiler/brw_ir_fs.h
index 3c23fb375e4..ba4d6a95720 100644
--- a/src/intel/compiler/brw_ir_fs.h
+++ b/src/intel/compiler/brw_ir_fs.h
@@ -497,9 +497,10 @@ is_unordered(const fs_inst *inst)
 }
 
 /**
- * Return whether the following regioning restriction applies to the specified
- * instruction.  From the Cherryview PRM Vol 7. "Register Region
- * Restrictions":
+ * Return whether one of the following regioning restrictions applies to the
+ * specified instruction.
+ *
+ * From the Cherryview PRM Vol 7. "Register Region Restrictions":
  *
  * "When source or destination datatype is 64b or operation is integer DWord
  *  multiply, regioning in Align1 must follow these rules:
@@ -508,6 +509,14 @@ is_unordered(const fs_inst *inst)
  *  2. Regioning must ensure Src.Vstride = Src.Width * Src.Hstride.
  *  3. Source and Destination offset must be the same, except the case of
  * scalar source."
+ *
+ * From the Cherryview PRM Vol 7. "Register Region Restrictions":
+ *
+ *"Conversion between Integer and HF (Half Float) must be DWord
+ * aligned and strided by a DWord on the destination."
+ *
+ *The same restriction is listed for other hardware platforms, however,
+ *empirical testing suggests that only atom platforms are affected.
  */
 static inline bool
 has_dst_aligned_region_restriction(const gen_device_info *devinfo,
@@ -518,10 +527,20 @@ has_dst_aligned_region_restriction(const gen_device_info 
*devinfo,
  (inst->opcode == BRW_OPCODE_MUL || inst->opcode == BRW_OPCODE_MAD);
 
if (type_sz(inst->dst.type) > 4 || type_sz(exec_type) > 4 ||
-   (type_sz(exec_type) == 4 && is_int_multiply))
-  return devinfo->is_cherryview || gen_device_info_is_9lp(devinfo);
-   else
-  return false;
+   (type_sz(exec_type) == 4 && is_int_multiply)) {
+  if (devinfo->is_cherryview || gen_device_info_is_9lp(devinfo))
+ return true;
+   }
+
+   const bool dst_type_is_hf = inst->dst.type == BRW_REGISTER_TYPE_HF;
+   const bool exec_type_is_hf = exec_type == BRW_REGISTER_TYPE_HF;
+   if ((dst_type_is_hf && !brw_reg_type_is_floating_point(exec_type)) ||
+   (exec_type_is_hf && !brw_reg_type_is_floating_point(inst->dst.type))) {
+  if (devinfo->is_cherryview || gen_device_info_is_9lp(devinfo))
+ return true;
+   }
+
+   return false;
 }
 
 #endif
-- 
2.17.1
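
The new half-float check can be summarised as a small predicate, together with the destination stride the quoted PRM text implies. This is a sketch only: `struct conv_sketch`, `int_hf_conversion_is_restricted` and `dword_dst_stride` are hypothetical names, and `is_atom` stands in for the `is_cherryview || gen_device_info_is_9lp()` test used in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flattening of the fields the helper inspects. */
struct conv_sketch {
   bool is_atom;        /* stand-in for the Cherryview / Gen9 LP check */
   bool dst_is_hf;
   bool dst_is_float;   /* any floating-point destination, HF included */
   bool exec_is_hf;
   bool exec_is_float;  /* any floating-point execution type, HF included */
};

/* True when the instruction converts between an integer type and
 * half-float on an affected platform. */
bool
int_hf_conversion_is_restricted(const struct conv_sketch *c)
{
   assert(!c->dst_is_hf || c->dst_is_float);
   assert(!c->exec_is_hf || c->exec_is_float);
   const bool int_to_hf = c->dst_is_hf && !c->exec_is_float;
   const bool hf_to_int = c->exec_is_hf && !c->dst_is_float;
   return c->is_atom && (int_to_hf || hf_to_int);
}

/* Destination stride, in elements, that places each element on its own
 * DWord ("strided by a DWord on the destination"). */
unsigned
dword_dst_stride(unsigned dst_type_size)
{
   assert(dst_type_size == 1 || dst_type_size == 2 || dst_type_size == 4);
   return 4 / dst_type_size;
}
```

For an integer to half-float conversion the destination is 2 bytes wide, so under this reading it would be written with a stride of 2 elements.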



[Mesa-dev] [PATCH v3 09/42] intel/compiler: allow extended math functions with HF operands

2019-01-15 Thread Iago Toral Quiroga
The PRM states that half-float operands are supported since gen9.

Reviewed-by: Topi Pohjolainen 
---
 src/intel/compiler/brw_eu_emit.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/intel/compiler/brw_eu_emit.c b/src/intel/compiler/brw_eu_emit.c
index 45e2552783b..e21df4624b3 100644
--- a/src/intel/compiler/brw_eu_emit.c
+++ b/src/intel/compiler/brw_eu_emit.c
@@ -1874,8 +1874,10 @@ void gen6_math(struct brw_codegen *p,
   assert(src1.file == BRW_GENERAL_REGISTER_FILE ||
  (devinfo->gen >= 8 && src1.file == BRW_IMMEDIATE_VALUE));
} else {
-  assert(src0.type == BRW_REGISTER_TYPE_F);
-  assert(src1.type == BRW_REGISTER_TYPE_F);
+  assert(src0.type == BRW_REGISTER_TYPE_F ||
+ (src0.type == BRW_REGISTER_TYPE_HF && devinfo->gen >= 9));
+  assert(src1.type == BRW_REGISTER_TYPE_F ||
+ (src1.type == BRW_REGISTER_TYPE_HF && devinfo->gen >= 9));
}
 
/* Source modifiers are ignored for extended math instructions on Gen6. */
-- 
2.17.1



[Mesa-dev] [PATCH v3 02/42] intel/compiler: add a NIR pass to lower conversions

2019-01-15 Thread Iago Toral Quiroga
Some conversions are not directly supported in hardware and need to be
split into two conversion instructions going through an intermediary type.
Doing this at the NIR level reduces the complexity in the backend a bit.

v2:
 - Consider fp16 rounding conversion opcodes
 - Properly handle swizzles on conversion sources.

Reviewed-by: Topi Pohjolainen  (v1)
---
 src/intel/Makefile.sources|   1 +
 src/intel/compiler/brw_nir.c  |   1 +
 src/intel/compiler/brw_nir.h  |   2 +
 .../compiler/brw_nir_lower_conversions.c  | 158 ++
 src/intel/compiler/meson.build|   1 +
 5 files changed, 163 insertions(+)
 create mode 100644 src/intel/compiler/brw_nir_lower_conversions.c

diff --git a/src/intel/Makefile.sources b/src/intel/Makefile.sources
index 94a28d370e8..9975daa3ad1 100644
--- a/src/intel/Makefile.sources
+++ b/src/intel/Makefile.sources
@@ -83,6 +83,7 @@ COMPILER_FILES = \
compiler/brw_nir_analyze_boolean_resolves.c \
compiler/brw_nir_analyze_ubo_ranges.c \
compiler/brw_nir_attribute_workarounds.c \
+   compiler/brw_nir_lower_conversions.c \
compiler/brw_nir_lower_cs_intrinsics.c \
compiler/brw_nir_lower_image_load_store.c \
compiler/brw_nir_lower_mem_access_bit_sizes.c \
diff --git a/src/intel/compiler/brw_nir.c b/src/intel/compiler/brw_nir.c
index 92d7fe4bede..572ab824a94 100644
--- a/src/intel/compiler/brw_nir.c
+++ b/src/intel/compiler/brw_nir.c
@@ -882,6 +882,7 @@ brw_postprocess_nir(nir_shader *nir, const struct 
brw_compiler *compiler,
OPT(nir_opt_move_comparisons);
 
OPT(nir_lower_bool_to_int32);
+   OPT(brw_nir_lower_conversions);
 
OPT(nir_lower_locals_to_regs);
 
diff --git a/src/intel/compiler/brw_nir.h b/src/intel/compiler/brw_nir.h
index bc81950d47e..662b2627e95 100644
--- a/src/intel/compiler/brw_nir.h
+++ b/src/intel/compiler/brw_nir.h
@@ -114,6 +114,8 @@ void brw_nir_lower_tcs_outputs(nir_shader *nir, const 
struct brw_vue_map *vue,
GLenum tes_primitive_mode);
 void brw_nir_lower_fs_outputs(nir_shader *nir);
 
+bool brw_nir_lower_conversions(nir_shader *nir);
+
 bool brw_nir_lower_image_load_store(nir_shader *nir,
 const struct gen_device_info *devinfo);
 void brw_nir_rewrite_image_intrinsic(nir_intrinsic_instr *intrin,
diff --git a/src/intel/compiler/brw_nir_lower_conversions.c 
b/src/intel/compiler/brw_nir_lower_conversions.c
new file mode 100644
index 000..583167c7753
--- /dev/null
+++ b/src/intel/compiler/brw_nir_lower_conversions.c
@@ -0,0 +1,158 @@
+/*
+ * Copyright © 2018 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include "brw_nir.h"
+#include "compiler/nir/nir_builder.h"
+
+static nir_op
+get_conversion_op(nir_alu_type src_type,
+  unsigned src_bit_size,
+  nir_alu_type dst_type,
+  unsigned dst_bit_size,
+  nir_rounding_mode rounding_mode)
+{
+   nir_alu_type src_full_type = (nir_alu_type) (src_type | src_bit_size);
+   nir_alu_type dst_full_type = (nir_alu_type) (dst_type | dst_bit_size);
+
+   return nir_type_conversion_op(src_full_type, dst_full_type, rounding_mode);
+}
+
+static nir_rounding_mode
+get_opcode_rounding_mode(nir_op op)
+{
+   switch (op) {
+   case nir_op_f2f16_rtz:
+  return nir_rounding_mode_rtz;
+   case nir_op_f2f16_rtne:
+  return nir_rounding_mode_rtne;
+   default:
+  return nir_rounding_mode_undef;
+   }
+}
+
+static void
+split_conversion(nir_builder *b, nir_alu_instr *alu, nir_op op1, nir_op op2)
+{
+   b->cursor = nir_before_instr(&alu->instr);
+   assert(alu->dest.write_mask == 1);
+   nir_ssa_def *src = nir_ssa_for_alu_src(b, alu, 0);
+   nir_ssa_def *tmp = nir_build_alu(b, op1, src, NULL, NULL, NULL);
+   nir_ssa_def *res = nir_build_alu(b, op2, tmp, NULL, NULL, NULL);
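
The idea of splitting one conversion into two steps through an intermediary type can be sketched independently of NIR. The rule used below (only 16-bit to 64-bit float conversions, in either direction, go through 32 bits) is an assumption chosen for illustration; the patch is truncated in this archive, so the actual set of lowered conversions is not shown here. `struct conv_step`, `intermediate_bits` and `plan_float_conversion` are hypothetical names.

```c
#include <assert.h>

/* One conversion step, described by source and destination bit sizes. */
struct conv_step {
   unsigned src_bits;
   unsigned dst_bits;
};

/* Returns the bit size to convert through, or 0 when a single step is
 * enough. Assumed rule: only 16 <-> 64 needs an intermediate. */
static unsigned
intermediate_bits(unsigned src_bits, unsigned dst_bits)
{
   if ((src_bits == 16 && dst_bits == 64) ||
       (src_bits == 64 && dst_bits == 16))
      return 32;
   return 0;
}

/* Fills 'steps' with one or two conversions and returns how many are used. */
unsigned
plan_float_conversion(unsigned src_bits, unsigned dst_bits,
                      struct conv_step steps[2])
{
   assert(src_bits == 16 || src_bits == 32 || src_bits == 64);
   assert(dst_bits == 16 || dst_bits == 32 || dst_bits == 64);

   const unsigned mid = intermediate_bits(src_bits, dst_bits);
   if (mid == 0) {
      steps[0] = (struct conv_step){ src_bits, dst_bits };
      return 1;
   }
   steps[0] = (struct conv_step){ src_bits, mid };
   steps[1] = (struct conv_step){ mid, dst_bits };
   return 2;
}
```

Under this assumed rule, a 64-bit to 16-bit conversion becomes 64 to 32 followed by 32 to 16, which mirrors the two `nir_build_alu` calls with `op1` and `op2` in `split_conversion`.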

[Mesa-dev] [PATCH v3 00/42] intel: VK_KHR_shader_float16_int8 implementation

2019-01-15 Thread Iago Toral Quiroga
The changes in this version address review feedback to v2 and, most importantly,
rebase on top of relevant changes in master, specifically Curro's regioning
lowering pass. This new regioning pass simplifies some of the NIR translation
code (specifically the code for translating regioning restrictions on
conversions for atom platforms) making some of the previous work in this series
unnecessary. The regioning restrictions for conversions between integer and
half-float added with this series are now implemented as part of this
framework instead of doing it at NIR translation time. This version of the
series also dropped the SPIR-V compiler patches that have already been merged.

As always, a branch with these patches is available for testing in the
itoral/VK_KHR_shader_float16_int8 branch of the Igalia Mesa repository at
https://github.com/Igalia/mesa.

Iago Toral Quiroga (42):
  intel/compiler: handle conversions between int and half-float on atom
  intel/compiler: add a NIR pass to lower conversions
  intel/compiler: split float to 64-bit opcodes from int to 64-bit
  intel/compiler: handle b2i/b2f with other integer conversion opcodes
  intel/compiler: assert restrictions on conversions to half-float
  intel/compiler: lower some 16-bit float operations to 32-bit
  intel/compiler: lower 16-bit extended math to 32-bit prior to gen9
  intel/compiler: implement 16-bit fsign
  intel/compiler: allow extended math functions with HF operands
  compiler/nir: add lowering option for 16-bit fmod
  intel/compiler: lower 16-bit fmod
  compiler/nir: add lowering for 16-bit flrp
  intel/compiler: lower 16-bit flrp
  compiler/nir: add lowering for 16-bit ldexp
  intel/compiler: Extended Math is limited to SIMD8 on half-float
  intel/compiler: add instruction setters for Src1Type and Src2Type.
  intel/compiler: add new half-float register type for 3-src
instructions
  intel/compiler: add a helper function to query hardware type table
  intel/compiler: don't compact 3-src instructions with Src1Type or
Src2Type bits
  intel/compiler: allow half-float on 3-source instructions since gen8
  intel/compiler: set correct precision fields for 3-source float
instructions
  intel/compiler: don't propagate HF immediates to 3-src instructions
  intel/compiler: fix ddx and ddy for 16-bit float
  intel/compiler: fix ddy for half-float in gen8
  intel/compiler: workaround for SIMD8 half-float MAD in gen8
  intel/compiler: split is_partial_write() into two variants
  intel/compiler: activate 16-bit bit-size lowerings also for 8-bit
  intel/compiler: handle 64-bit float to 8-bit integer conversions
  intel/compiler: handle conversions between int and half-float on atom
  intel/compiler: implement isign for int8
  intel/compiler: ask for an integer type if requesting an 8-bit type
  intel/eu: force stride of 2 on NULL register for Byte instructions
  compiler/spirv: add support for Float16 and Int8 capabilities
  anv/pipeline: support Float16 and Int8 capabilities in gen8+
  anv/device: expose shaderFloat16 and shaderInt8 in gen8+
  intel/compiler: implement is_zero, is_one, is_negative_one for
8-bit/16-bit
  intel/compiler: add a brw_reg_type_is_integer helper
  intel/compiler: fix cmod propagation for non 32-bit types
  intel/compiler: remove MAD/LRP algebraic optimizations from the
backend
  intel/compiler: support half-float in the combine constants pass
  intel/compiler: fix combine constants for Align16 with half-float
prior to gen9
  intel/compiler: allow propagating HF immediates to MAD/LRP

 src/compiler/nir/nir.h|   2 +
 src/compiler/nir/nir_opt_algebraic.py |  11 +-
 src/compiler/shader_info.h|   2 +
 src/compiler/spirv/spirv_to_nir.c |   8 +-
 src/intel/Makefile.sources|   1 +
 src/intel/compiler/brw_compiler.c |   2 +
 src/intel/compiler/brw_eu_compact.c   |   5 +-
 src/intel/compiler/brw_eu_emit.c  |  36 +++-
 src/intel/compiler/brw_fs.cpp | 143 ++--
 src/intel/compiler/brw_fs.h   |   1 +
 .../compiler/brw_fs_cmod_propagation.cpp  |  34 ++--
 .../compiler/brw_fs_combine_constants.cpp |  82 +++--
 .../compiler/brw_fs_copy_propagation.cpp  |  14 +-
 src/intel/compiler/brw_fs_cse.cpp |   3 +-
 .../compiler/brw_fs_dead_code_eliminate.cpp   |   2 +-
 src/intel/compiler/brw_fs_generator.cpp   |  47 +++---
 src/intel/compiler/brw_fs_live_variables.cpp  |   2 +-
 src/intel/compiler/brw_fs_nir.cpp |  85 --
 src/intel/compiler/brw_fs_reg_allocate.cpp|   2 +-
 .../compiler/brw_fs_register_coalesce.cpp |   2 +-
 .../compiler/brw_fs_saturate_propagation.cpp  |   7 +-
 src/intel/compiler/brw_fs_sel_peephole.cpp|   4 +-
 src/intel/compiler/brw_inst.h |   2 +
 src/intel/compiler/brw_ir_fs.h|  36 +++-
 src/intel/compiler/brw_nir.c  |  21 ++-
 

[Mesa-dev] [ANNOUNCE] Mesa 18.3.2 release candidate

2019-01-15 Thread Emil Velikov
Hello list,

The release candidate for Mesa 18.3.2 is now available. Currently we have:
 - 78 queued
 - 3 nominated (outstanding)
 - and 0 rejected patches


With this release candidate we have added more PCI IDs for AMD Vega devices and
a number of fixes for the RADV Vulkan drivers.

On the Intel side we have a selection ranging from quad swizzles support for ICL
to compiler fixes.

The nine state tracker has also seen some love, as have the Broadcom drivers.

To top it all off, we have a healthy amount of build system fixes.


Take a look at section "Mesa stable queue" for more information.


Testing reports/general approval
--------------------------------

Any testing reports (or general approval of the state of the branch) will be
greatly appreciated.

The plan is to release 18.3.2 this Wednesday, 16 January 2019, around or
shortly after 12:00 GMT.

If you have any questions or suggestions - be that about the current patch
queue or otherwise, please go ahead.


Trivial merge conflicts
-----------------------

commit 083f5fccb9e3849d955034ff7455e3fb60f7984f
Author: Jason Ekstrand 

nir/constant_folding: Fix source bit size logic

(cherry picked from commit 3595a0abf43be3ce27d88f5939b257a74e90035b)


commit 80bea2ba6e4c5b1152e6623717d460a1c9e0e6ec
Author: Emil Velikov 

glx/test: meson: assorted include fixes

(cherry picked from commit f331419f262d3a0f270376cafbb9517b4627bb7a)


commit e5b1fde8c2e35938cda8373dbd502bfa608fa8d9
Author: Bas Nieuwenhuizen 

radv: Implement buffer stores with less than 4 components.

(cherry picked from commit 9a45a190ad22849a492506389413046948e0b093)



Cheers,
Emil


Mesa stable queue
-----------------

Nominated (3)
=============

Bas Nieuwenhuizen (1):
  f67dea5e19e radv: Fix multiview depth clears

Note: commit is not applicable in current form. Backport requested.

Jason Ekstrand (2):
  bfe31c5e461 nir/builder: Add nir_i2i and nir_u2u helpers which take a bit size
  abfe674c54b spirv: Handle arbitrary bit sizes for deref array indices

Note: commits are not applicable in current form. Backports requested.


Queued (78)
===========

Alex Deucher (3):
  pci_ids: add new vega10 pci ids
  pci_ids: add new vega20 pci id
  pci_ids: add new VegaM pci id

Alexander von Gluck IV (1):
  egl/haiku: Fix reference to disp vs dpy

Andres Gomez (2):
  glsl: correct typo in GLSL compilation error message
  glsl/linker: specify proper direction in location aliasing error

Axel Davy (3):
  st/nine: Fix volumetexture dtor on ctor failure
  st/nine: Bind src not dst in nine_context_box_upload
  st/nine: Add src reference to nine_context_range_upload

Bas Nieuwenhuizen (5):
  radv: Do a cache flush if needed before reading predicates.
  radv: Implement buffer stores with less than 4 components.
  anv/android: Do not reject storage images.
  radv: Fix rasterization precision bits.
  spirv: Fix matrix parameters in function calls.

Caio Marcelo de Oliveira Filho (3):
  nir: properly clear the entry sources in copy_prop_vars
  nir: properly find the entry to keep in copy_prop_vars
  nir: remove dead code from copy_prop_vars

Dave Airlie (2):
  radv/xfb: fix counter buffer bounds checks.
  virgl/vtest: fix front buffer flush with protocol version 0.

Dylan Baker (6):
  meson: Fix ppc64 little endian detection
  meson: Add support for gnu hurd
  meson: Add toggle for glx-direct
  meson: Override C++ standard to gnu++11 when building with altivec on ppc64
  meson: Error out if building nouveau and using LLVM without rtti
  autotools: Remove tegra vdpau driver

Emil Velikov (11):
  docs: add sha256 checksums for 18.3.1
  bin/get-pick-list.sh: rework handing of sha nominations
  bin/get-pick-list.sh: warn when commit lists invalid sha
  cherry-ignore: meson: libfreedreno depends upon libdrm (for fence support)
  glx: mandate xf86vidmode only for "drm" dri platforms
Squashed with
  glx: Fix compilation with GLX_USE_WINDOWSGL
  meson: don't require glx/egl/gbm with gallium drivers
  pipe-loader: meson: reference correct library
  TODO: glx: meson: build dri based glx tests, only with -Dglx=dri
  glx: meson: drop includes from a link-only library
  glx: meson: wire up the dispatch-index-check test
  glx/test: meson: assorted include fixes

Eric Anholt (6):
  v3d: Fix a leak of the transfer helper on screen destroy.
  vc4: Fix a leak of the transfer helper on screen destroy.
  v3d: Fix a leak of the disassembled instruction string during debug dumps.
  v3d: Make sure that a thrsw doesn't split a multop from its umul24.
  v3d: Add missing flagging of SYNCB as a TSY op.
  gallium/ttn: Fix setup 

[Mesa-dev] [PATCH v1] nir: Length of boolean vtn_value now is 1

2019-01-15 Thread Sergii Romantsov
During conversion, the type length of Booleans was lost due to integer math:
glsl_get_bit_size(type) / 8 evaluates to 0 for a 1-bit Boolean.

CC: Jason Ekstrand 
Fixes: 44227453ec03 (nir: Switch to using 1-bit Booleans for almost everything)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109353
Signed-off-by: Sergii Romantsov 
---
 src/compiler/spirv/spirv_to_nir.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/src/compiler/spirv/spirv_to_nir.c 
b/src/compiler/spirv/spirv_to_nir.c
index e3dc619..faad771 100644
--- a/src/compiler/spirv/spirv_to_nir.c
+++ b/src/compiler/spirv/spirv_to_nir.c
@@ -1042,14 +1042,16 @@ vtn_type_layout_std430(struct vtn_builder *b, struct 
vtn_type *type,
 {
switch (type->base_type) {
case vtn_base_type_scalar: {
-  uint32_t comp_size = glsl_get_bit_size(type->type) / 8;
+  uint32_t comp_size = glsl_type_is_boolean(type->type)
+ ? 1 : glsl_get_bit_size(type->type) / 8;
   *size_out = comp_size;
   *align_out = comp_size;
   return type;
}
 
case vtn_base_type_vector: {
-  uint32_t comp_size = glsl_get_bit_size(type->type) / 8;
+  uint32_t comp_size = glsl_type_is_boolean(type->type)
+ ? 1 : glsl_get_bit_size(type->type) / 8;
   unsigned align_comps = type->length == 3 ? 4 : type->length;
   *size_out = comp_size * type->length,
   *align_out = comp_size * align_comps;
@@ -1168,7 +1170,8 @@ vtn_handle_type(struct vtn_builder *b, SpvOp opcode,
   val->type->base_type = vtn_base_type_vector;
   val->type->type = glsl_vector_type(glsl_get_base_type(base->type), 
elems);
   val->type->length = elems;
-  val->type->stride = glsl_get_bit_size(base->type) / 8;
+  val->type->stride = glsl_type_is_boolean(val->type->type)
+ ? 1 : glsl_get_bit_size(base->type) / 8;
   val->type->array_element = base;
   break;
}
-- 
2.7.4
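
The arithmetic behind this fix is plain integer division. A minimal sketch follows, with `naive_comp_size` and `comp_size` as hypothetical stand-ins for the expressions in the patch (they take a bit size directly instead of a `glsl_type`):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Expression used before the patch: for a 1-bit Boolean, 1 / 8 truncates
 * to 0. */
uint32_t
naive_comp_size(unsigned bit_size)
{
   return bit_size / 8;
}

/* Expression used by the patch: Booleans are given a size of 1. */
uint32_t
comp_size(bool is_boolean, unsigned bit_size)
{
   assert(bit_size == 1 || bit_size % 8 == 0);
   return is_boolean ? 1 : bit_size / 8;
}
```

With a component size of 0, the size, alignment and stride values computed from it for a Boolean scalar or vector also come out as 0, which is the lost length the commit message refers to.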



Re: [Mesa-dev] [PATCH] egl/wayland: break double/tripple buffering feedback loops

2019-01-15 Thread Derek Foreman
On 1/15/19 8:02 AM, Daniel Stone wrote:
> Hi,
> 
> On Tue, 18 Dec 2018 at 17:59, Lucas Stach  wrote:
>> On Tuesday, 18.12.2018, 17:43 +, Emil Velikov wrote:
>>> On Tue, 18 Dec 2018 at 11:16, Lucas Stach  wrote:
>>>>    if (dri2_surf->back == NULL)
>>>>       dri2_surf->back = &dri2_surf->color_buffers[i];
>>>> -   else if (dri2_surf->back->dri_image == NULL)
>>>> +   else if (dri2_surf->back->dri_image == NULL && dri2_surf->color_buffers[i].dri_image)
>>>>       dri2_surf->back = &dri2_surf->color_buffers[i];
>>>> +   age = dri2_surf->back->age;
>>>> }

>>>
>>> AFAICT this is the wayland equivalent of
>>> 4f1d27a406478d405eac6f9894ccc46a80034adb
>>> Where the exact same logic/commit message applies.
>>
>> No it isn't. It's exactly what it says in the commit log. It's keeping
>> the triple buffer around for a bit, even if we don't strictly need it
>> when the client is currently doing double buffering.

I'm having a bit of a hard time following the logic in this first hunk
myself...

The dri2_surf->color_buffers[i].age < age check looks like it's intended
to skip buffers younger than the one currently in hand, i.e. pick the
oldest buffer.  But doing so would break the second hunk because we'd
never end up with a very old buffer to trim.  (It doesn't actually cause
the oldest buffer to be picked though, because of the other tests involved)

I would like to at least see a comment explaining what's going on,
because it looks kind of like a bug on a casual read.

I think I side with Emil that this is an independent functional change.
The first hunk should be able to stand on its own, and having a commit
log for it explaining what it does to the selection process would be
helpful.

> Right - the crucial part in Derek's GBM commit was removing the
> 'break' and adding the extra conditional on age.
> 
> Derek's patch stabilises the age of buffers handed back to the user,
> by always returning the oldest available buffer. That slightly
> pessimises rendering if there is a 'free' buffer in the queue: if four
> buffers are allocated, then we will always return a buffer from three
> flips ago, maybe meaning more rendering work. It means that, barring
> the client holding on to one buffer for unexpectedly long, the age of
> the oldest buffer in the queue will never be greater than the queue
> depth.
> 
> This patch instead relies on unbalanced ages, where older buffers in
> the queue are allowed to age far beyond the queue depth if not used
> during normal rendering.

Yes, it's a bit more annoying to track how long a buffer is "unneeded"
for when you always pick the oldest, since age becomes useless for that
purpose.

For this patch to work, we need a buffer to be unused until it ages
beyond a threshold - intuitively it seems to me this will just naturally
happen to the last buffer in the array without the first hunk at all?

>> When things are on the edge between double buffering being enough and
>> sometimes a third buffer being needed to avoid stalling we would
>> otherwise bounce rapidly between allocating and disposing the third
>> buffer.
>>
>> The DRM platform has no such optimization and just keeps the third
>> buffer around forever. This patch keeps the optimization in the Wayland
>> platform, but adds a bit of hysteresis before disposing the buffer when
>> going from triple to double buffering to see if things are settling on
>> double buffering.
> 
> Ideally we'd have globally optimal behaviour for both platforms, but
> that doesn't really seem doable for now. I think this is a good
> balance though. There will only be one GBM user at a time, so having
> that allocate excessive buffers doesn't seem too bad, and the penalty
> for doing so is your entire system stuttering as the compositor
> becomes blocked. Given the general stability of compositors, if they
> need a larger queue depth at some point, they are likely to need it
> again in the near future.
> 
> Conversely, there may be a great many Wayland clients, and these
> clients may bounce between overlay and GPU composition. Given that, it
> seems reasonable to opportunistically free up buffers, to make sure we
> have enough memory available across the system.

Right - to be clear, I think this is a really good idea. :)
I'm just having a little trouble with the details of the implementation.

>>> The age check here seems strange - both number used and it's relation
>>> to double/triple buffering.
>>> Have you considered tracking/checking how many buffers we have?
>>
>> A hysteresis value of 18 is just something that worked well in
>> practice. It didn't appear to defer the buffer destruction for too long
>>  while keeping the feedback loop well under control.
> 
> Yeah, having this #defined with a comment above it would be nice.
> 
> With that, this patch is:
> Reviewed-by: Daniel Stone 
> 
> Cheers,
> Daniel
> 
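
For readers following the thread, the hysteresis being discussed could look roughly like the sketch below, with the threshold pulled out into a commented #define as Daniel suggests. Everything here is an assumption made for illustration: `struct buffer_sketch`, `BUFFER_TRIM_AGE_HYSTERESIS` and `trim_stale_buffers` are hypothetical names, the meaning of `age` is simplified, and only the value 18 comes from the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Number of frames an unlocked buffer may sit unused before it is released.
 * 18 is the value mentioned in the thread as working well in practice. */
#define BUFFER_TRIM_AGE_HYSTERESIS 18

/* Hypothetical, reduced view of a color buffer slot. */
struct buffer_sketch {
   bool allocated;
   bool locked;   /* still held by the compositor */
   int age;       /* frames since the buffer was last rendered to (assumed) */
};

/* Releases every unlocked buffer whose age exceeds the threshold and
 * returns how many were released. */
size_t
trim_stale_buffers(struct buffer_sketch *bufs, size_t count)
{
   size_t released = 0;

   assert(bufs != NULL || count == 0);
   for (size_t i = 0; i < count; i++) {
      if (bufs[i].allocated && !bufs[i].locked &&
          bufs[i].age > BUFFER_TRIM_AGE_HYSTERESIS) {
         bufs[i].allocated = false;
         bufs[i].age = 0;
         released++;
      }
   }
   return released;
}
```

This only works if buffer selection lets an unneeded buffer actually grow old, which is the interaction between the two hunks questioned above: a policy that always picks the oldest buffer would keep every age below the threshold.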


[Mesa-dev] [PATCH] intel/fs: Do the grf127 hack on SIMD8 instructions in SIMD16 mode

2019-01-15 Thread Jason Ekstrand
Previously, we only applied the fix to shaders with a dispatch mode of
SIMD8 but the code it relies on for SIMD16 mode only applies to SIMD16
instructions.  If you have a SIMD8 instruction in a SIMD16 shader,
neither would trigger and the restriction could still be hit.

Cc: Jose Maria Casanova Crespo 
Fixes: 232ed8980217dd "i965/fs: Register allocator shoudn't use grf127..."
---
 src/intel/compiler/brw_fs_reg_allocate.cpp | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/src/intel/compiler/brw_fs_reg_allocate.cpp 
b/src/intel/compiler/brw_fs_reg_allocate.cpp
index 5db5242452e..ec743f9b5bf 100644
--- a/src/intel/compiler/brw_fs_reg_allocate.cpp
+++ b/src/intel/compiler/brw_fs_reg_allocate.cpp
@@ -667,15 +667,14 @@ fs_visitor::assign_regs(bool allow_spilling, bool 
spill_all)
* messages adding a node interference to the grf127_send_hack_node.
* This node has a fixed asignment to grf127.
*
-   * We don't apply it to SIMD16 because previous code avoids any register
-   * overlap between sources and destination.
+   * We don't apply it to SIMD16 instructions because previous code avoids
+   * any register overlap between sources and destination.
*/
   ra_set_node_reg(g, grf127_send_hack_node, 127);
-  if (dispatch_width == 8) {
- foreach_block_and_inst(block, fs_inst, inst, cfg) {
-if (inst->is_send_from_grf() && inst->dst.file == VGRF)
-   ra_add_node_interference(g, inst->dst.nr, 
grf127_send_hack_node);
- }
+  foreach_block_and_inst(block, fs_inst, inst, cfg) {
+ if (inst->exec_size < 16 && inst->is_send_from_grf() &&
+ inst->dst.file == VGRF)
+ra_add_node_interference(g, inst->dst.nr, grf127_send_hack_node);
   }
 
   if (spilled_any_registers) {
-- 
2.20.1
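
The case this patch covers can be shown by comparing the old and new conditions side by side. This is a sketch with hypothetical names (`struct inst_sketch`, `old_rule_applies`, `new_rule_applies`); the three fields stand in for `inst->exec_size`, `inst->is_send_from_grf()` and `inst->dst.file == VGRF`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced view of an instruction. */
struct inst_sketch {
   unsigned exec_size;
   bool is_send_from_grf;
   bool dst_is_vgrf;
};

/* Condition before the patch: keyed on the shader's dispatch width. */
bool
old_rule_applies(unsigned dispatch_width, const struct inst_sketch *inst)
{
   assert(inst->exec_size > 0);
   return dispatch_width == 8 && inst->is_send_from_grf && inst->dst_is_vgrf;
}

/* Condition after the patch: keyed on the instruction's own exec size. */
bool
new_rule_applies(const struct inst_sketch *inst)
{
   assert(inst->exec_size > 0);
   return inst->exec_size < 16 && inst->is_send_from_grf && inst->dst_is_vgrf;
}
```

For a SIMD8 send inside a SIMD16 shader, the old condition is false and the new one is true, so the interference with the grf127 node is now added for exactly the instruction described in the commit message.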



[Mesa-dev] [PATCH 2/2] gallium/util: add util_format_snorm8_to_sint8 (from radeonsi)

2019-01-15 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/auxiliary/util/u_format.c  | 40 ++
 src/gallium/auxiliary/util/u_format.h  |  3 ++
 src/gallium/drivers/radeonsi/si_blit.c | 32 ++---
 3 files changed, 45 insertions(+), 30 deletions(-)

diff --git a/src/gallium/auxiliary/util/u_format.c 
b/src/gallium/auxiliary/util/u_format.c
index 231e89017b4..862061a8ec2 100644
--- a/src/gallium/auxiliary/util/u_format.c
+++ b/src/gallium/auxiliary/util/u_format.c
@@ -879,10 +879,50 @@ void util_format_unswizzle_4f(float *dst, const float 
*src,
  break;
   case PIPE_SWIZZLE_Z:
  dst[2] = src[i];
  break;
   case PIPE_SWIZZLE_W:
  dst[3] = src[i];
  break;
   }
}
 }
+
+enum pipe_format
+util_format_snorm8_to_sint8(enum pipe_format format)
+{
+   switch (format) {
+   case PIPE_FORMAT_R8_SNORM:
+  return PIPE_FORMAT_R8_SINT;
+   case PIPE_FORMAT_R8G8_SNORM:
+  return PIPE_FORMAT_R8G8_SINT;
+   case PIPE_FORMAT_R8G8B8_SNORM:
+  return PIPE_FORMAT_R8G8B8_SINT;
+   case PIPE_FORMAT_R8G8B8A8_SNORM:
+  return PIPE_FORMAT_R8G8B8A8_SINT;
+
+   case PIPE_FORMAT_A8_SNORM:
+  return PIPE_FORMAT_A8_SINT;
+   case PIPE_FORMAT_L8_SNORM:
+  return PIPE_FORMAT_L8_SINT;
+   case PIPE_FORMAT_L8A8_SNORM:
+  return PIPE_FORMAT_L8A8_SINT;
+   case PIPE_FORMAT_I8_SNORM:
+  return PIPE_FORMAT_I8_SINT;
+
+   case PIPE_FORMAT_R8G8B8X8_SNORM:
+  return PIPE_FORMAT_R8G8B8X8_SINT;
+   case PIPE_FORMAT_R8A8_SNORM:
+  return PIPE_FORMAT_R8A8_SINT;
+   case PIPE_FORMAT_A8L8_SNORM:
+  return PIPE_FORMAT_A8L8_SINT;
+   case PIPE_FORMAT_G8R8_SNORM:
+  return PIPE_FORMAT_G8R8_SINT;
+   case PIPE_FORMAT_A8B8G8R8_SNORM:
+  return PIPE_FORMAT_A8B8G8R8_SINT;
+   case PIPE_FORMAT_X8B8G8R8_SNORM:
+  return PIPE_FORMAT_X8B8G8R8_SINT;
+
+   default:
+  return format;
+   }
+}
diff --git a/src/gallium/auxiliary/util/u_format.h 
b/src/gallium/auxiliary/util/u_format.h
index 8dcc438a4a1..0c0c505e391 100644
--- a/src/gallium/auxiliary/util/u_format.h
+++ b/src/gallium/auxiliary/util/u_format.h
@@ -1351,15 +1351,18 @@ void util_format_apply_color_swizzle(union 
pipe_color_union *dst,
  const union pipe_color_union *src,
  const unsigned char swz[4],
  const boolean is_integer);
 
 void pipe_swizzle_4f(float *dst, const float *src,
 const unsigned char swz[4]);
 
 void util_format_unswizzle_4f(float *dst, const float *src,
   const unsigned char swz[4]);
 
+enum pipe_format
+util_format_snorm8_to_sint8(enum pipe_format format);
+
 #ifdef __cplusplus
 } // extern "C" {
 #endif
 
 #endif /* ! U_FORMAT_H */
diff --git a/src/gallium/drivers/radeonsi/si_blit.c 
b/src/gallium/drivers/radeonsi/si_blit.c
index 69b1af02db0..16be11247e4 100644
--- a/src/gallium/drivers/radeonsi/si_blit.c
+++ b/src/gallium/drivers/radeonsi/si_blit.c
@@ -1005,50 +1005,22 @@ void si_resource_copy_region(struct pipe_context *ctx,
assert(0);
}
}
}
 
/* SNORM8 blitting has precision issues on some chips. Use the SINT
 * equivalent instead, which doesn't force DCC decompression.
 * Note that some chips avoid this issue by using SDMA.
 */
if (util_format_is_snorm8(dst_templ.format)) {
-   switch (dst_templ.format) {
-   case PIPE_FORMAT_R8_SNORM:
-   dst_templ.format = src_templ.format = 
PIPE_FORMAT_R8_SINT;
-   break;
-   case PIPE_FORMAT_R8G8_SNORM:
-   dst_templ.format = src_templ.format = 
PIPE_FORMAT_R8G8_SINT;
-   break;
-   case PIPE_FORMAT_R8G8B8X8_SNORM:
-   dst_templ.format = src_templ.format = 
PIPE_FORMAT_R8G8B8X8_SINT;
-   break;
-   case PIPE_FORMAT_R8G8B8A8_SNORM:
-   /* There are no SINT variants for ABGR and XBGR, so we have to 
use RGBA. */
-   case PIPE_FORMAT_A8B8G8R8_SNORM:
-   case PIPE_FORMAT_X8B8G8R8_SNORM:
-   dst_templ.format = src_templ.format = 
PIPE_FORMAT_R8G8B8A8_SINT;
-   break;
-   case PIPE_FORMAT_A8_SNORM:
-   dst_templ.format = src_templ.format = 
PIPE_FORMAT_A8_SINT;
-   break;
-   case PIPE_FORMAT_L8_SNORM:
-   dst_templ.format = src_templ.format = 
PIPE_FORMAT_L8_SINT;
-   break;
-   case PIPE_FORMAT_L8A8_SNORM:
-   dst_templ.format = src_templ.format = 
PIPE_FORMAT_L8A8_SINT;
-   break;
-   case PIPE_FORMAT_I8_SNORM:
-   dst_templ.format = src_templ.format = 
PIPE_FORMAT_I8_SINT;
-   
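
The comment in the hunk above says SNORM8 blits have precision issues on some chips and that the SINT equivalent is used instead. As a rough illustration only (a CPU-side sketch, not a description of what the hardware does), the C below shows one way a copy through a normalized format can fail to be bit-exact: in 8-bit SNORM both -128 and -127 decode to -1.0, so decoding and re-encoding cannot preserve -128, whereas an integer view passes the bits through unchanged. All function names here are made up for the example.

```c
#include <stdint.h>

/* Decode an 8-bit SNORM value to float. By the usual SNORM rule the
 * result is clamped, so both -128 and -127 decode to -1.0. */
static float snorm8_to_float(int8_t v)
{
    float f = (float)v / 127.0f;
    return f < -1.0f ? -1.0f : f;
}

/* Encode a float back to 8-bit SNORM: clamp to [-1, 1], scale by 127,
 * round to nearest (half away from zero in this sketch). */
static int8_t float_to_snorm8(float f)
{
    if (f < -1.0f) f = -1.0f;
    if (f > 1.0f) f = 1.0f;
    float scaled = f * 127.0f;
    return (int8_t)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
}

/* Copy via the normalized view: decode, then re-encode. */
static int8_t copy_as_snorm8(int8_t v)
{
    return float_to_snorm8(snorm8_to_float(v));
}

/* Copy via the integer view: the bit pattern is untouched. */
static int8_t copy_as_sint8(int8_t v)
{
    return v;
}
```

With these helpers, copy_as_snorm8(-128) yields -127 while copy_as_sint8(-128) yields -128, which is why reinterpreting the format as SINT is a safe choice for a pure copy.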

Re: [Mesa-dev] Meson configuration for bare-bones osmesa

2019-01-15 Thread Dylan Baker
Quoting Chuck Atkins (2019-01-15 11:17:43)
> I'm jumping into Meson land now and I'm trying to configure the most recent
> release, 18.3.1, to build a minimal OSMesa containing only softpipe.  So I'm
> trying to make sure everything is explicitly disabled and only turning on the
> few pieces I need:
> 
> 
> meson -Ddebug=false -Degl=false -Dgbm=false -Dopengl=true -Dgles1=false

BTW, meson is more like cmake, it has predefined build profiles (selected with
-Dbuildtype) so there is no debug option. We default to 'debugoptimized', which
is -O2 -g, there's also 'debug': -O0 -g, 'release': -O3, and 'plain': ''.
There's a separate toggle for asserts, -Db_ndebug.

Dylan

> -Dgles2=false -Dglvnd=false -Dgallium-nine=false -Dgallium-omx=disabled
> -Dgallium-opencl=disabled -Dgallium-va=false -Dgallium-vdpau=false
> -Dgallium-xa=false -Dgallium-xvmc=false -Dplatforms=surfaceless
> -Dgallium-drivers=swrast -Dosmesa=gallium -Dllvm=false -Dglx=disabled
> -Ddri-drivers= build
> 
> 
> But I end up with the following error:
> 
> meson.build:393:4: ERROR:  Problem encountered: building dri or gallium
> drivers require at least one window system
> 
> 
> Am I doing something wrong here?  It looks like the meson build isn't properly
> dealing with the dependencies of the software rasterizers since you should be
> able to build a libOSMesa.so with only software rasterizers, i.e. softpipe,
> llvmpipe, and swr, without requiring any windowing system.
> 
> --
> Chuck Atkins
> Staff R&D Engineer, Scientific Computing
> Kitware, Inc.
> (518) 881-1183


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Thoughts after hitting 100 merge requests?

2019-01-15 Thread Dylan Baker
Quoting Jason Ekstrand (2019-01-15 11:57:01)
> On Tue, Jan 15, 2019 at 12:52 PM Eric Anholt  wrote:
> 
> Daniel Stone  writes:
> 
> > Hi,
> >
> > On Tue, 15 Jan 2019 at 12:21, Rob Clark  wrote:
> >> On Tue, Jan 15, 2019 at 1:02 AM Tapani Pälli  wrote:
> >> > On 1/14/19 2:36 PM, Daniel Stone wrote:
> >> > > On Fri, 11 Jan 2019 at 17:05, Jason Ekstrand  wrote:
> >> > > In other projects, we looked for ways to apply the tags and ended up
> >> > > concluding that they didn't bring enough value to make it worthwhile.
> >> > > I don't know if that holds for Mesa, but it would be better to start
> >> > > with an actual problem statement - what value does R-b bring and how?
> >> > > - then look at ways to solve that problem, rather than just very
> >> > > directly finding a way to insert that literal text string into every
> >> > > commit message.
> >> >
> >> > IMO it brings some 'shared responsibility' for correctness of the patch
> >
> > Oh, no doubt - we certainly haven't abandoned thorough review! So far
> > we haven't seen that compromised by not having a name in the commit
> > message.
> >
> >> > and quickly accessible information on who were looking at the change. So
> >> > ideally later when filing bug against commit/series there would be more
> >> > people than just the committer that should take a look at the possible
> >> > regressions. At least in my experience people filing bugs tend to often
> >> > also CC the reviewer.
> >
> > Yeah, that's really helpful. So maybe a useful flow - assuming we
> > eventually switch to GitLab issues - would be the ability to associate
> > an issue with a commit, which could then automatically drag in people
> > who commented on the MR which landed that commit, as well as (at
> > least) the reporter of the issue(s) fixed by that MR. That would need
> > some kind of clever - probably at least semi-manual - filtering to
> > make sure it wasn't just spamming the world, but it's at least a
> > starting point.
> >
> >> +1 .. and also it is nice to see things like Reported-by/Reviewed-by
> >> without having to go search somewhere else (ie. outside of git/tig)
> >
> > My question would again be what value that brings you. Do you just
> > like seeing the name there, or do you go poke the people on IRC, or
> > follow up via email, or ... ? Again I personally go look through the
> > original review to see what came up during that first, but everyone's
> > different, so I'm just trying to understand what you actually do with
> > that information, so we can figure out if there's a better way to do
> > things for everyone rather than just blindly imitating what came
> > before.
> 
> I've participated in adding Reported-bys, but I've never seen the use.
> It felt like "we could record this information, so we should!" rather
> than solving a problem.
> 
> 
> To me, the Reported-by tag is more for giving credit than anything else.
> Maybe it doesn't matter but some people appreciate it when their
> contributions, even
> if it's just a good bug report, are recorded in the project's permanent
> record.  It's also a good way to make sure the reporter gets CCd on the patch
> so they can verify it fixes the bug for them.  That said, bugzilla is a
> permanent record and the information would be even more accessible if we just
> used GitLab MRs...

If we used gitlab issues with a Fixes #123 tag the reporter and all subscribers
would be notified and there would still be a permanent record.

Dylan

> I've found little use in ccing reviewers on followups, except for
> trivial stuff like compiler warnings.  I propose that the solution for
> compiler warnings should be CI that prevents you from merging new
> compiler warnings anyway.
> 
> Basically, I feel like the pain points in the MR process (amending and
> re-pushing before clicking "merge") are pre-existing pain points in our
> process, slightly amplified.
> 
> >> (ofc it would be pretty awesome incentive to switch to gitlab issues
> >> if gitlab could automate adding Reported-by tags for MR's associated
> >> with an issue.. but I guess checkbox to add Reviewed-by tag would
> >> already make my day)
> >
> > I saw this the other day, which might be more incentive:
> > https://csoriano.pages.gitlab.gnome.org/csoriano-blog/post/2019-01-07-issue-handling-automation/
> 
> Automatic needinfo closing?  Sign me up.

Yes please!




[Mesa-dev] [PATCH] radeonsi: use compute for resource_copy_region when possible

2019-01-15 Thread Marek Olšák
From: Sonny Jiang 

v2: marek: fix snorm8 blits

Signed-off-by: Sonny Jiang 
Signed-off-by: Marek Olšák 
---
 src/gallium/drivers/radeonsi/si_blit.c|  12 ++
 .../drivers/radeonsi/si_compute_blit.c| 108 ++
 src/gallium/drivers/radeonsi/si_pipe.c|   4 +
 src/gallium/drivers/radeonsi/si_pipe.h|  11 ++
 .../drivers/radeonsi/si_shaderlib_tgsi.c  |  77 +
 5 files changed, 212 insertions(+)

diff --git a/src/gallium/drivers/radeonsi/si_blit.c b/src/gallium/drivers/radeonsi/si_blit.c
index 16be11247e4..bb8d1cbd12d 100644
--- a/src/gallium/drivers/radeonsi/si_blit.c
+++ b/src/gallium/drivers/radeonsi/si_blit.c
@@ -895,32 +895,44 @@ struct texture_orig_info {
 void si_resource_copy_region(struct pipe_context *ctx,
 struct pipe_resource *dst,
 unsigned dst_level,
 unsigned dstx, unsigned dsty, unsigned dstz,
 struct pipe_resource *src,
 unsigned src_level,
 const struct pipe_box *src_box)
 {
struct si_context *sctx = (struct si_context *)ctx;
struct si_texture *ssrc = (struct si_texture*)src;
+   struct si_texture *sdst = (struct si_texture*)dst;
struct pipe_surface *dst_view, dst_templ;
struct pipe_sampler_view src_templ, *src_view;
unsigned dst_width, dst_height, src_width0, src_height0;
unsigned dst_width0, dst_height0, src_force_level = 0;
struct pipe_box sbox, dstbox;
 
/* Handle buffers first. */
if (dst->target == PIPE_BUFFER && src->target == PIPE_BUFFER) {
si_copy_buffer(sctx, dst, src, dstx, src_box->x, 
src_box->width);
return;
}
 
+   if (!util_format_is_compressed(src->format) &&
+   !util_format_is_compressed(dst->format) &&
+   !util_format_is_depth_or_stencil(src->format) &&
+   src->nr_samples <= 1 &&
+   !sdst->dcc_offset &&
+   !(dst->target != src->target &&
+ (src->target == PIPE_TEXTURE_1D_ARRAY || dst->target == PIPE_TEXTURE_1D_ARRAY))) {
+   si_compute_copy_image(sctx, dst, dst_level, src, src_level, dstx, dsty, dstz, src_box);
+   return;
+   }
+
assert(u_max_sample(dst) == u_max_sample(src));
 
/* The driver doesn't decompress resources automatically while
 * u_blitter is rendering. */
si_decompress_subresource(ctx, src, PIPE_MASK_RGBAZS, src_level,
  src_box->z, src_box->z + src_box->depth - 1);
 
dst_width = u_minify(dst->width0, dst_level);
dst_height = u_minify(dst->height0, dst_level);
dst_width0 = dst->width0;
diff --git a/src/gallium/drivers/radeonsi/si_compute_blit.c b/src/gallium/drivers/radeonsi/si_compute_blit.c
index dfa77a98804..c547d124507 100644
--- a/src/gallium/drivers/radeonsi/si_compute_blit.c
+++ b/src/gallium/drivers/radeonsi/si_compute_blit.c
@@ -17,20 +17,21 @@
  * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
  * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL
  * THE AUTHOR(S) AND/OR THEIR SUPPLIERS BE LIABLE FOR ANY CLAIM,
  * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
  * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
  * USE OR OTHER DEALINGS IN THE SOFTWARE.
  *
  */
 
 #include "si_pipe.h"
+#include "util/u_format.h"
 
 /* Note: Compute shaders always use SI_COMPUTE_DST_CACHE_POLICY for dst
  * and L2_STREAM for src.
  */
 static enum si_cache_policy get_cache_policy(struct si_context *sctx,
 enum si_coherency coher,
 uint64_t size)
 {
if ((sctx->chip_class >= GFX9 && (coher == SI_COHERENCY_CB_META ||
  coher == SI_COHERENCY_CP)) ||
@@ -285,14 +286,121 @@ void si_copy_buffer(struct si_context *sctx,
size > 32 * 1024 &&
dst_offset % 4 == 0 && src_offset % 4 == 0 && size % 4 == 0) {
si_compute_do_clear_or_copy(sctx, dst, dst_offset, src, 
src_offset,
size, NULL, 0, coher);
} else {
si_cp_dma_copy_buffer(sctx, dst, src, dst_offset, src_offset, 
size,
  0, coher, cache_policy);
}
 }
 
+void si_compute_copy_image(struct si_context *sctx,
+  struct pipe_resource *dst,
+  unsigned dst_level,
+  struct pipe_resource *src,
+  unsigned src_level,
+  unsigned dstx, unsigned dsty, unsigned dstz,
+  const struct pipe_box *src_box)
+{
+   struct pipe_context *ctx = &sctx->b;
+   unsigned width = src_box->width;
+   unsigned 
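
The new early-out in si_resource_copy_region above takes the compute path only when a set of conditions all hold. The sketch below restates that condition as a standalone C predicate, to make the individual terms easier to read. The struct, enum, and function names are hypothetical stand-ins invented for this example (they are not the gallium or radeonsi types), and has_dcc stands for the patch's non-zero dcc_offset check.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the real resource types. */
enum tex_target { TEX_1D, TEX_1D_ARRAY, TEX_2D, TEX_2D_ARRAY, TEX_3D };

struct image_desc {
    enum tex_target target;
    bool compressed;        /* block-compressed format */
    bool depth_or_stencil;  /* depth and/or stencil format */
    unsigned nr_samples;    /* 0 or 1 means not multisampled */
    bool has_dcc;           /* stands for a non-zero dcc_offset */
};

/* Mirrors the condition in the patch: no compressed formats on either
 * side, no depth/stencil source, no MSAA source, no DCC on the
 * destination, and no copy between a 1D array and a different target. */
static bool can_use_compute_copy(const struct image_desc *dst,
                                 const struct image_desc *src)
{
    bool mixed_1d_array =
        dst->target != src->target &&
        (src->target == TEX_1D_ARRAY || dst->target == TEX_1D_ARRAY);

    return !src->compressed &&
           !dst->compressed &&
           !src->depth_or_stencil &&
           src->nr_samples <= 1 &&
           !dst->has_dcc &&
           !mixed_1d_array;
}
```

Anything that fails the predicate falls through to the existing u_blitter path, which is the behaviour the hunk keeps for those cases.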

Re: [Mesa-dev] Thoughts after hitting 100 merge requests?

2019-01-15 Thread Rob Clark
On Tue, Jan 15, 2019 at 7:40 AM Daniel Stone  wrote:
>
> Hi,
>
> On Tue, 15 Jan 2019 at 12:21, Rob Clark  wrote:
> > On Tue, Jan 15, 2019 at 1:02 AM Tapani Pälli  wrote:
> > > On 1/14/19 2:36 PM, Daniel Stone wrote:
> > > > On Fri, 11 Jan 2019 at 17:05, Jason Ekstrand  
> > > > wrote:
> > > > In other projects, we looked for ways to apply the tags and ended up
> > > > concluding that they didn't bring enough value to make it worthwhile.
> > > > I don't know if that holds for Mesa, but it would be better to start
> > > > with an actual problem statement - what value does R-b bring and how?
> > > > - then look at ways to solve that problem, rather than just very
> > > > directly finding a way to insert that literal text string into every
> > > > commit message.
> > >
> > > IMO it brings some 'shared responsibility' for correctness of the patch
>
> Oh, no doubt - we certainly haven't abandoned thorough review! So far
> we haven't seen that compromised by not having a name in the commit
> message.
>
> > > and quickly accessible information on who were looking at the change. So
> > > ideally later when filing bug against commit/series there would be more
> > > people than just the committer that should take a look at the possible
> > > regressions. At least in my experience people filing bugs tend to often
> > > also CC the reviewer.
>
> Yeah, that's really helpful. So maybe a useful flow - assuming we
> eventually switch to GitLab issues - would be the ability to associate
> an issue with a commit, which could then automatically drag in people
> who commented on the MR which landed that commit, as well as (at
> least) the reporter of the issue(s) fixed by that MR. That would need
> some kind of clever - probably at least semi-manual - filtering to
> make sure it wasn't just spamming the world, but it's at least a
> starting point.
>
> > +1 .. and also it is nice to see things like Reported-by/Reviewed-by
> > without having to go search somewhere else (ie. outside of git/tig)
>
> My question would again be what value that brings you. Do you just
> like seeing the name there, or do you go poke the people on IRC, or
> follow up via email, or ... ? Again I personally go look through the
> original review to see what came up during that first, but everyone's
> different, so I'm just trying to understand what you actually do with
> that information, so we can figure out if there's a better way to do
> things for everyone rather than just blindly imitating what came
> before.

If I am curious or have some questions about why some code is the way
it is I frequently use tig-blame, which makes it easy to step into the
commit that made the change and see the commit msg and r-b tags..  I
guess the most important part if I need to ping someone on IRC w/
questions is the author, but it seems like having the other tags handy
without context-switching to browser/gitlab is useful.

I guess I don't as frequently dig into the history of the original
patchset and its review comments.. mostly because that isn't as easy
with the email based review process.  Making this easier would defn be
a win.  But in cases where I don't have to leave the comfort of tig,
it would be nice not to have to start doing so..

This is not an argument for sticking to email based process, just
defence of what I think would be a useful feature for gitlab to gain
;-)

(Also, I suppose preserving those artifacts of "the old process" is
probably useful for folks who run git statistics, although personally
that does not affect me.)

BR,
-R

> > (ofc it would be pretty awesome incentive to switch to gitlab issues
> > if gitlab could automate adding Reported-by tags for MR's associated
> > with an issue.. but I guess checkbox to add Reviewed-by tag would
> > already make my day)
>
> I saw this the other day, which might be more incentive:
> https://csoriano.pages.gitlab.gnome.org/csoriano-blog/post/2019-01-07-issue-handling-automation/
>
> Cheers,
> Daniel


Re: [Mesa-dev] [PATCH 1/7] swr/rast: Use gfxptr_t value in JitGatherVertices

2019-01-15 Thread Cherniak, Bruce
Reviewed-by: Bruce Cherniak  

> On Dec 17, 2018, at 8:36 AM, Alok Hota  wrote:
> 
> Use gfxptr_t type value for stream pointer uses in gather and similar
> calls
> ---
> .../swr/rasterizer/jitter/fetch_jit.cpp   | 34 +--
> 1 file changed, 16 insertions(+), 18 deletions(-)
> 
> diff --git a/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp 
> b/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp
> index 3ad0fabe81f..d294a67050c 100644
> --- a/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp
> +++ b/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp
> @@ -550,9 +550,6 @@ void FetchJit::JitGatherVertices(const 
> FETCH_COMPILE_STATE& fetchState,
> 
> Value* stream = LOAD(streams, {ied.StreamIndex, 
> SWR_VERTEX_BUFFER_STATE_xpData});
> 
> -// VGATHER* takes an *i8 src pointer
> -Value* pStreamBase = INT_TO_PTR(stream, PointerType::get(mInt8Ty, 
> 0));
> -
> Value* stride  = LOAD(streams, {ied.StreamIndex, 
> SWR_VERTEX_BUFFER_STATE_pitch});
> Value* vStride = VBROADCAST(stride);
> 
> @@ -620,7 +617,8 @@ void FetchJit::JitGatherVertices(const 
> FETCH_COMPILE_STATE& fetchState,
> 
> // calculate byte offset to the start of the VB
> Value* baseOffset = MUL(Z_EXT(startOffset, mInt64Ty), 
> Z_EXT(stride, mInt64Ty));
> -pStreamBase   = GEP(pStreamBase, baseOffset);
> +
> +// VGATHER* takes an *i8 src pointer so that's what stream is
> Value* pStreamBaseGFX = ADD(stream, baseOffset);
> 
> // if we have a start offset, subtract from max vertex. Used for OOB 
> check
> @@ -698,7 +696,7 @@ void FetchJit::JitGatherVertices(const 
> FETCH_COMPILE_STATE& fetchState,
> {
> Value* pResults[4];
> CreateGatherOddFormats(
> -(SWR_FORMAT)ied.Format, vGatherMask, pStreamBase, vOffsets, 
> pResults);
> +(SWR_FORMAT)ied.Format, vGatherMask, pStreamBaseGFX, 
> vOffsets, pResults);
> ConvertFormat((SWR_FORMAT)ied.Format, pResults);
> 
> for (uint32_t c = 0; c < 4; c += 1)
> @@ -733,7 +731,7 @@ void FetchJit::JitGatherVertices(const 
> FETCH_COMPILE_STATE& fetchState,
> // if we have at least one component out of x or y to fetch
> if (isComponentEnabled(compMask, 0) || 
> isComponentEnabled(compMask, 1))
> {
> -vGatherResult[0] = GATHERPS(gatherSrc, pStreamBase, 
> vOffsets, vGatherMask);
> +vGatherResult[0] = GATHERPS(gatherSrc, pStreamBaseGFX, 
> vOffsets, vGatherMask);
> // e.g. result of first 8x32bit integer gather for 16bit 
> components
> // 256i - 01234567
> //xyxy xyxy xyxy xyxy xyxy xyxy xyxy xyxy
> @@ -744,9 +742,9 @@ void FetchJit::JitGatherVertices(const 
> FETCH_COMPILE_STATE& fetchState,
> if (isComponentEnabled(compMask, 2) || 
> isComponentEnabled(compMask, 3))
> {
> // offset base to the next components(zw) in the vertex 
> to gather
> -pStreamBase = GEP(pStreamBase, C((char)4));
> +pStreamBaseGFX = ADD(pStreamBaseGFX, C((int64_t)4));
> 
> -vGatherResult[1] = GATHERPS(gatherSrc, pStreamBase, 
> vOffsets, vGatherMask);
> +vGatherResult[1] = GATHERPS(gatherSrc, pStreamBaseGFX, 
> vOffsets, vGatherMask);
> // e.g. result of second 8x32bit integer gather for 16bit 
> components
> // 256i - 01234567
> //zwzw zwzw zwzw zwzw zwzw zwzw zwzw zwzw
> @@ -811,7 +809,6 @@ void FetchJit::JitGatherVertices(const 
> FETCH_COMPILE_STATE& fetchState,
> }
> 
> // offset base to the next component in the vertex to 
> gather
> -pStreamBase= GEP(pStreamBase, C((char)4));
> pStreamBaseGFX = ADD(pStreamBaseGFX, C((int64_t)4));
> }
> }
> @@ -854,9 +851,9 @@ void FetchJit::JitGatherVertices(const 
> FETCH_COMPILE_STATE& fetchState,
> mVWidth / 2, 
> ConstantFP::get(IRB()->getDoubleTy(), 0.0f));
> 
> Value* pGatherLo =
> -GATHERPD(vZeroDouble, pStreamBase, 
> vOffsetsLo, vMaskLo);
> +GATHERPD(vZeroDouble, pStreamBaseGFX, 
> vOffsetsLo, vMaskLo);
> Value* pGatherHi =
> -GATHERPD(vZeroDouble, pStreamBase, 
> vOffsetsHi, vMaskHi);
> +GATHERPD(vZeroDouble, pStreamBaseGFX, 
> vOffsetsHi, vMaskHi);
> 
> pGatherLo = VCVTPD2PS(pGatherLo);
> pGatherHi = VCVTPD2PS(pGatherHi);
> @@ -880,7 +877,7 @@ void FetchJit::JitGatherVertices(const 
> 
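
The patch above replaces pointer GEPs with plain integer arithmetic on the stream address: the start offset and stride are zero-extended to 64 bits, multiplied, and added to the stream base. A scalar C sketch of that arithmetic follows. It assumes gfxptr_t is a 64-bit unsigned integer, and the function names are invented for this example.

```c
#include <stdint.h>

/* Assumption for this sketch: a graphics address is a 64-bit integer. */
typedef uint64_t gfxptr_t;

/* Byte address of the first vertex to fetch. Both operands are widened
 * before the multiply (the Z_EXT calls in the patch), so the product
 * cannot wrap at 32 bits. */
static gfxptr_t stream_base(gfxptr_t stream, uint32_t start_offset,
                            uint32_t stride)
{
    uint64_t base_offset = (uint64_t)start_offset * (uint64_t)stride;
    return stream + base_offset;
}

/* Step to the next 4-byte group of components, as the patch does with
 * ADD(pStreamBaseGFX, C((int64_t)4)). */
static gfxptr_t next_components(gfxptr_t base)
{
    return base + 4;
}
```

Widening before multiplying matters once start_offset * stride exceeds 4 GiB: a 32-bit multiply would silently drop the high bits.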

Re: [Mesa-dev] [PATCH 4/7] swr/rast: Unaligned and translations in gathers

2019-01-15 Thread Cherniak, Bruce
Reviewed-by: Bruce Cherniak  

> On Dec 17, 2018, at 8:36 AM, Alok Hota  wrote:
> 
> - added graphics address translation in odd gathers
> - added support for unaligned gathers in fetch shader
> - changed how 2+ GB offsets are handled to make them compatible with
> unaligned offsets
> ---
> .../swr/rasterizer/jitter/fetch_jit.cpp   | 56 ---
> 1 file changed, 35 insertions(+), 21 deletions(-)
> 
> diff --git a/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp 
> b/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp
> index d294a67050c..6feb1a76e63 100644
> --- a/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp
> +++ b/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp
> @@ -368,7 +368,7 @@ void FetchJit::UnpackComponents(SWR_FORMAT format, Value* 
> vInput, Value* result[
> // gather SIMD full pixels per lane then shift/mask to move each component to 
> their
> // own vector
> void FetchJit::CreateGatherOddFormats(
> -SWR_FORMAT format, Value* pMask, Value* pBase, Value* pOffsets, Value* 
> pResult[4])
> +SWR_FORMAT format, Value* pMask, Value* xpBase, Value* pOffsets, Value* 
> pResult[4])
> {
> const SWR_FORMAT_INFO& info = GetFormatInfo(format);
> 
> @@ -378,7 +378,7 @@ void FetchJit::CreateGatherOddFormats(
> Value* pGather;
> if (info.bpp == 32)
> {
> -pGather = GATHERDD(VIMMED1(0), pBase, pOffsets, pMask);
> +pGather = GATHERDD(VIMMED1(0), xpBase, pOffsets, pMask);
> }
> else
> {
> @@ -386,29 +386,40 @@ void FetchJit::CreateGatherOddFormats(
> Value* pMem = ALLOCA(mSimdInt32Ty);
> STORE(VIMMED1(0u), pMem);
> 
> -pBase  = BITCAST(pBase, PointerType::get(mInt8Ty, 0));
> -Value* pDstMem = BITCAST(pMem, mInt32PtrTy);
> +Value* pDstMem = POINTER_CAST(pMem, mInt32PtrTy);
> 
> for (uint32_t lane = 0; lane < mVWidth; ++lane)
> {
> // Get index
> Value* index = VEXTRACT(pOffsets, C(lane));
> Value* mask  = VEXTRACT(pMask, C(lane));
> +
> +// use branch around load based on mask
> +// Needed to avoid page-faults on unmasked lanes
> +BasicBlock* pCurrentBB = IRB()->GetInsertBlock();
> +BasicBlock* pMaskedLoadBlock =
> +BasicBlock::Create(JM()->mContext, "MaskedLaneLoad", 
> pCurrentBB->getParent());
> +BasicBlock* pEndLoadBB = BasicBlock::Create(JM()->mContext, 
> "AfterMaskedLoad", pCurrentBB->getParent());
> +
> +COND_BR(mask, pMaskedLoadBlock, pEndLoadBB);
> +
> +JM()->mBuilder.SetInsertPoint(pMaskedLoadBlock);
> +
> switch (info.bpp)
> {
> case 8:
> {
> Value* pDst = BITCAST(GEP(pDstMem, C(lane)), 
> PointerType::get(mInt8Ty, 0));
> -Value* pSrc = BITCAST(GEP(pBase, index), 
> PointerType::get(mInt8Ty, 0));
> -STORE(LOAD(SELECT(mask, pSrc, pDst)), pDst);
> +Value* xpSrc = ADD(xpBase, Z_EXT(index, xpBase->getType()));
> +STORE(LOAD(xpSrc, "", mInt8PtrTy, GFX_MEM_CLIENT_FETCH), 
> pDst);
> break;
> }
> 
> case 16:
> {
> Value* pDst = BITCAST(GEP(pDstMem, C(lane)), 
> PointerType::get(mInt16Ty, 0));
> -Value* pSrc = BITCAST(GEP(pBase, index), 
> PointerType::get(mInt16Ty, 0));
> -STORE(LOAD(SELECT(mask, pSrc, pDst)), pDst);
> +Value* xpSrc = ADD(xpBase, Z_EXT(index, xpBase->getType()));
> +STORE(LOAD(xpSrc, "", mInt16PtrTy, GFX_MEM_CLIENT_FETCH), 
> pDst);
> break;
> }
> break;
> @@ -417,13 +428,13 @@ void FetchJit::CreateGatherOddFormats(
> {
> // First 16-bits of data
> Value* pDst = BITCAST(GEP(pDstMem, C(lane)), 
> PointerType::get(mInt16Ty, 0));
> -Value* pSrc = BITCAST(GEP(pBase, index), 
> PointerType::get(mInt16Ty, 0));
> -STORE(LOAD(SELECT(mask, pSrc, pDst)), pDst);
> +Value* xpSrc = ADD(xpBase, Z_EXT(index, xpBase->getType()));
> +STORE(LOAD(xpSrc, "", mInt16PtrTy, GFX_MEM_CLIENT_FETCH), 
> pDst);
> 
> // Last 8-bits of data
> pDst = BITCAST(GEP(pDst, C(1)), PointerType::get(mInt8Ty, 0));
> -pSrc = BITCAST(GEP(pSrc, C(1)), PointerType::get(mInt8Ty, 
> 0));
> -STORE(LOAD(SELECT(mask, pSrc, pDst)), pDst);
> +xpSrc = ADD(xpSrc, C(2));
> +STORE(LOAD(xpSrc, "", mInt8PtrTy, GFX_MEM_CLIENT_FETCH), 
> pDst);
> break;
> }
> 
> @@ -431,6 +442,9 @@ void FetchJit::CreateGatherOddFormats(
> SWR_INVALID("Shouldn't have BPP = %d now", info.bpp);
> break;
> }
> +
> +BR(pEndLoadBB);
> +JM()->mBuilder.SetInsertPoint(pEndLoadBB);
> }
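
The hunks above drop the old SELECT(mask, pSrc, pDst) trick and instead branch around each lane's load, so a lane that is masked off never reads through its source address. A scalar C sketch of the 16-bit case is below. The lane count, function name, and byte-wise little-endian read are choices made for this example, not details taken from the patch.

```c
#include <stdint.h>

#define LANES 8  /* assumed SIMD width for this sketch */

/* Gather one 16-bit element per lane from base + offsets[lane].
 * Lanes with mask == 0 keep the value 0 and never touch memory, so an
 * invalid offset in a disabled lane cannot fault. Reading byte by byte
 * also keeps unaligned offsets safe. */
static void gather_odd_16(uint32_t dst[LANES], const uint8_t *base,
                          const uint32_t offsets[LANES],
                          const uint8_t mask[LANES])
{
    for (int lane = 0; lane < LANES; ++lane) {
        dst[lane] = 0;
        if (!mask[lane])
            continue;  /* branch around the load */

        const uint8_t *src = base + offsets[lane];
        dst[lane] = (uint32_t)src[0] | ((uint32_t)src[1] << 8);
    }
}
```

The trade-off is a branch per lane instead of a single select, in exchange for never forming a load from an address that might not be mapped.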

Re: [Mesa-dev] [PATCH 5/7] swr/rast: Scope MEM_CLIENT enum for mem usages

2019-01-15 Thread Cherniak, Bruce
Reviewed-by: Bruce Cherniak  

> On Dec 17, 2018, at 8:36 AM, Alok Hota  wrote:
> 
> Avoids confusion with other defaulted integer parameters
> 
> - fixed some unspecified usages
> - removed unnecessary includes
> - removed unnecessary protected access specifier in buckets framework
> ---
> .../drivers/swr/rasterizer/jitter/builder.h   |  1 -
> .../swr/rasterizer/jitter/builder_gfx_mem.cpp |  2 +-
> .../swr/rasterizer/jitter/builder_gfx_mem.h   | 20 ++---
> .../swr/rasterizer/jitter/builder_mem.cpp |  1 -
> .../swr/rasterizer/jitter/builder_mem.h   | 28 +--
> .../swr/rasterizer/jitter/fetch_jit.cpp   | 26 -
> 6 files changed, 38 insertions(+), 40 deletions(-)
> 
> diff --git a/src/gallium/drivers/swr/rasterizer/jitter/builder.h 
> b/src/gallium/drivers/swr/rasterizer/jitter/builder.h
> index a047f2a065f..0ce8d025b5c 100644
> --- a/src/gallium/drivers/swr/rasterizer/jitter/builder.h
> +++ b/src/gallium/drivers/swr/rasterizer/jitter/builder.h
> @@ -161,7 +161,6 @@ namespace SwrJit
> #include "builder_math.h"
> #include "builder_mem.h"
> 
> -protected:
> void SetPrivateContext(Value* pPrivateContext)
> {
> mpPrivateContext = pPrivateContext;
> diff --git a/src/gallium/drivers/swr/rasterizer/jitter/builder_gfx_mem.cpp 
> b/src/gallium/drivers/swr/rasterizer/jitter/builder_gfx_mem.cpp
> index c68f3b9a619..19eec7e99e0 100644
> --- a/src/gallium/drivers/swr/rasterizer/jitter/builder_gfx_mem.cpp
> +++ b/src/gallium/drivers/swr/rasterizer/jitter/builder_gfx_mem.cpp
> @@ -52,7 +52,7 @@ namespace SwrJit
> 
> void BuilderGfxMem::AssertGFXMemoryParams(Value* ptr, 
> Builder::JIT_MEM_CLIENT usage)
> {
> -SWR_ASSERT(!(ptr->getType() == mInt64Ty && usage == 
> MEM_CLIENT_INTERNAL),
> +SWR_ASSERT(!(ptr->getType() == mInt64Ty && usage == 
> JIT_MEM_CLIENT::MEM_CLIENT_INTERNAL),
>"Internal memory should not be gfxptr_t.");
> }
> 
> diff --git a/src/gallium/drivers/swr/rasterizer/jitter/builder_gfx_mem.h 
> b/src/gallium/drivers/swr/rasterizer/jitter/builder_gfx_mem.h
> index aefbbef9fba..4cf06253695 100644
> --- a/src/gallium/drivers/swr/rasterizer/jitter/builder_gfx_mem.h
> +++ b/src/gallium/drivers/swr/rasterizer/jitter/builder_gfx_mem.h
> @@ -51,21 +51,21 @@ namespace SwrJit
> virtual LoadInst* LOAD(Value* Ptr,
>const char*Name,
>Type*  Ty= nullptr,
> -   JIT_MEM_CLIENT usage = MEM_CLIENT_INTERNAL);
> +   JIT_MEM_CLIENT usage = 
> JIT_MEM_CLIENT::MEM_CLIENT_INTERNAL);
> virtual LoadInst* LOAD(Value* Ptr,
>const Twine&   Name  = "",
>Type*  Ty= nullptr,
> -   JIT_MEM_CLIENT usage = MEM_CLIENT_INTERNAL);
> +   JIT_MEM_CLIENT usage = 
> JIT_MEM_CLIENT::MEM_CLIENT_INTERNAL);
> virtual LoadInst* LOAD(Value* Ptr,
>bool   isVolatile,
>const Twine&   Name  = "",
>Type*  Ty= nullptr,
> -   JIT_MEM_CLIENT usage = MEM_CLIENT_INTERNAL);
> +   JIT_MEM_CLIENT usage = 
> JIT_MEM_CLIENT::MEM_CLIENT_INTERNAL);
> virtual LoadInst* LOAD(Value* BasePtr,
>const std::initializer_list<uint32_t>& offset,
>const llvm::Twine& Name  = 
> "",
>Type*  Ty= 
> nullptr,
> -   JIT_MEM_CLIENT usage 
> = MEM_CLIENT_INTERNAL);
> +   JIT_MEM_CLIENT usage 
> = JIT_MEM_CLIENT::MEM_CLIENT_INTERNAL);
> 
> 
> virtual CallInst* MASKED_LOAD(Value* Ptr,
> @@ -74,36 +74,36 @@ namespace SwrJit
>   Value* PassThru = nullptr,
>   const Twine&   Name = "",
>   Type*  Ty   = nullptr,
> -  JIT_MEM_CLIENT usage= 
> MEM_CLIENT_INTERNAL);
> +  JIT_MEM_CLIENT usage= 
> JIT_MEM_CLIENT::MEM_CLIENT_INTERNAL);
> 
> virtual Value* GATHERPS(Value* src,
> Value* pBase,
> Value* indices,
> Value* mask,
> uint8_tscale = 1,
> -JIT_MEM_CLIENT usage = MEM_CLIENT_INTERNAL);
> +JIT_MEM_CLIENT usage = 
> JIT_MEM_CLIENT::MEM_CLIENT_INTERNAL);
>  

Re: [Mesa-dev] [PATCH 6/7] swr/rast: New execution engine per JIT

2019-01-15 Thread Cherniak, Bruce
Reviewed-by: Bruce Cherniak  

> On Dec 17, 2018, at 8:36 AM, Alok Hota  wrote:
> 
> Fixes relocation errors with LLVM 7.0.0
> ---
> .../swr/rasterizer/jitter/JitManager.cpp  | 79 +++
> .../swr/rasterizer/jitter/JitManager.h| 28 +--
> 2 files changed, 65 insertions(+), 42 deletions(-)
> 
> diff --git a/src/gallium/drivers/swr/rasterizer/jitter/JitManager.cpp 
> b/src/gallium/drivers/swr/rasterizer/jitter/JitManager.cpp
> index 58d30d4e119..1b2b570318c 100644
> --- a/src/gallium/drivers/swr/rasterizer/jitter/JitManager.cpp
> +++ b/src/gallium/drivers/swr/rasterizer/jitter/JitManager.cpp
> @@ -63,39 +63,29 @@ JitManager::JitManager(uint32_t simdWidth, const char* 
> arch, const char* core) :
> mContext(), mBuilder(mContext), mIsModuleFinalized(true), mJitNumber(0), 
> mVWidth(simdWidth),
> mArch(arch)
> {
> +mpCurrentModule = nullptr;
> +mpExec = nullptr;
> +
> InitializeNativeTarget();
> InitializeNativeTargetAsmPrinter();
> InitializeNativeTargetDisassembler();
> 
> 
> -TargetOptions tOpts;
> -tOpts.AllowFPOpFusion = FPOpFusion::Fast;
> -tOpts.NoInfsFPMath= false;
> -tOpts.NoNaNsFPMath= false;
> -tOpts.UnsafeFPMath = false;
> -
> -// tOpts.PrintMachineCode= true;
> -
> -std::unique_ptr<Module> newModule(new Module("", mContext));
> -mpCurrentModule = newModule.get();
> -
> -StringRef hostCPUName;
> -
> // force JIT to use the same CPU arch as the rest of swr
> if (mArch.AVX512F())
> {
> #if USE_SIMD16_SHADERS
> if (mArch.AVX512ER())
> {
> -hostCPUName = StringRef("knl");
> +mHostCpuName = StringRef("knl");
> }
> else
> {
> -hostCPUName = StringRef("skylake-avx512");
> +mHostCpuName = StringRef("skylake-avx512");
> }
> mUsingAVX512 = true;
> #else
> -hostCPUName = StringRef("core-avx2");
> +mHostCpuName = StringRef("core-avx2");
> #endif
> if (mVWidth == 0)
> {
> @@ -104,7 +94,7 @@ JitManager::JitManager(uint32_t simdWidth, const char* 
> arch, const char* core) :
> }
> else if (mArch.AVX2())
> {
> -hostCPUName = StringRef("core-avx2");
> +mHostCpuName = StringRef("core-avx2");
> if (mVWidth == 0)
> {
> mVWidth = 8;
> @@ -114,11 +104,11 @@ JitManager::JitManager(uint32_t simdWidth, const char* 
> arch, const char* core) :
> {
> if (mArch.F16C())
> {
> -hostCPUName = StringRef("core-avx-i");
> +mHostCpuName = StringRef("core-avx-i");
> }
> else
> {
> -hostCPUName = StringRef("corei7-avx");
> +mHostCpuName = StringRef("corei7-avx");
> }
> if (mVWidth == 0)
> {
> @@ -131,31 +121,21 @@ JitManager::JitManager(uint32_t simdWidth, const char* 
> arch, const char* core) :
> }
> 
> 
> -auto optLevel = CodeGenOpt::Aggressive;
> +mOptLevel = CodeGenOpt::Aggressive;
> 
> if (KNOB_JIT_OPTIMIZATION_LEVEL >= CodeGenOpt::None &&
> KNOB_JIT_OPTIMIZATION_LEVEL <= CodeGenOpt::Aggressive)
> {
> -optLevel = CodeGenOpt::Level(KNOB_JIT_OPTIMIZATION_LEVEL);
> +mOptLevel = CodeGenOpt::Level(KNOB_JIT_OPTIMIZATION_LEVEL);
> }
> 
> -mpCurrentModule->setTargetTriple(sys::getProcessTriple());
> -mpExec = EngineBuilder(std::move(newModule))
> - .setTargetOptions(tOpts)
> - .setOptLevel(optLevel)
> - .setMCPU(hostCPUName)
> - .create();
> -
> if (KNOB_JIT_ENABLE_CACHE)
> {
> -mCache.Init(this, hostCPUName, optLevel);
> -mpExec->setObjectCache(&mCache);
> +mCache.Init(this, mHostCpuName, mOptLevel);
> }
> 
> -#if LLVM_USE_INTEL_JITEVENTS
> -JITEventListener* vTune = 
> JITEventListener::createIntelJITEventListener();
> -mpExec->RegisterJITEventListener(vTune);
> -#endif
> +SetupNewModule();
> +mIsModuleFinalized = true;
> 
> // fetch function signature
> #if USE_SIMD16_SHADERS
> @@ -198,6 +178,35 @@ JitManager::JitManager(uint32_t simdWidth, const char* 
> arch, const char* core) :
> #endif
> }
> 
> +void JitManager::CreateExecEngine(std::unique_ptr<Module> pModule)
> +{
> +TargetOptions tOpts;
> +tOpts.AllowFPOpFusion = FPOpFusion::Fast;
> +tOpts.NoInfsFPMath= false;
> +tOpts.NoNaNsFPMath= false;
> +tOpts.UnsafeFPMath = false;
> +
> +// tOpts.PrintMachineCode= true;
> +
> +mpExec = EngineBuilder(std::move(pModule))
> + .setTargetOptions(tOpts)
> + .setOptLevel(mOptLevel)
> + .setMCPU(mHostCpuName)
> + .create();
> +
> +if (KNOB_JIT_ENABLE_CACHE)
> +{
> +mpExec->setObjectCache();
> +}
> +
> +#if LLVM_USE_INTEL_JITEVENTS
> +JITEventListener* vTune = 
> JITEventListener::createIntelJITEventListener();
> +

Re: [Mesa-dev] [PATCH 3/7] swr/rast: partial support for Tiled Resources

2019-01-15 Thread Cherniak, Bruce
Reviewed-by: Bruce Cherniak  

> On Dec 17, 2018, at 8:36 AM, Alok Hota  wrote:
> 
> - updated sample from TRTT surfaces correctly
> - implemented mapped status return for TRTT surfaces
> - implemented per-sample instruction minLod clamp
> - updated bilinear filter weight calculation to be closer to D3D specs
> - implemented "ReducedTexcoordRange" operation from D3D specs to avoid
> loss of precision on high-value normalized coordinates
> ---
> .../swr/rasterizer/jitter/builder_misc.cpp| 142 ++
> .../swr/rasterizer/jitter/builder_misc.h  |  22 +++
> 2 files changed, 164 insertions(+)
> 
> diff --git a/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.cpp 
> b/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.cpp
> index 26d8688f5e9..65eec4e4c68 100644
> --- a/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.cpp
> +++ b/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.cpp
> @@ -764,6 +764,148 @@ namespace SwrJit
> /// @brief pop count on vector mask (e.g. <8 x i1>)
> Value* Builder::VPOPCNT(Value* a) { return POPCNT(VMOVMSK(a)); }
> 
> +
> //
> +/// @brief Float / Fixed-point conversions
> +
> //
> +Value* Builder::VCVT_F32_FIXED_SI(Value* vFloat,
> +  uint32_t   numIntBits,
> +  uint32_t   numFracBits,
> +  const llvm::Twine& name)
> +{
> +SWR_ASSERT((numIntBits + numFracBits) <= 32, "Can only handle 32-bit 
> fixed-point values");
> +Value* fixed = nullptr;
> +if constexpr (false) // This doesn't work for negative numbers!!
> +{
> +fixed = FP_TO_SI(VROUND(FMUL(vFloat, VIMMED1(float(1 << 
> numFracBits))),
> +C(_MM_FROUND_TO_NEAREST_INT)),
> + mSimdInt32Ty);
> +}
> +else
> +{
> +// Do round to nearest int on fractional bits first
> +// Not entirely perfect for negative numbers, but close enough
> +vFloat = VROUND(FMUL(vFloat, VIMMED1(float(1 << numFracBits))),
> +C(_MM_FROUND_TO_NEAREST_INT));
> +vFloat = FMUL(vFloat, VIMMED1(1.0f / float(1 << numFracBits)));
> +
> +// TODO: Handle INF, NAN, overflow / underflow, etc.
> +
> +Value* vSgn  = FCMP_OLT(vFloat, VIMMED1(0.0f));
> +Value* vFloatInt = BITCAST(vFloat, mSimdInt32Ty);
> +Value* vFixed= AND(vFloatInt, VIMMED1((1 << 23) - 1));
> +vFixed   = OR(vFixed, VIMMED1(1 << 23));
> +vFixed   = SELECT(vSgn, NEG(vFixed), vFixed);
> +
> +Value* vExp = LSHR(SHL(vFloatInt, VIMMED1(1)), VIMMED1(24));
> +vExp= SUB(vExp, VIMMED1(127));
> +
> +Value* vExtraBits = SUB(VIMMED1(23 - numFracBits), vExp);
> +
> +fixed = ASHR(vFixed, vExtraBits, name);
> +}
> +
> +return fixed;
> +}
> +
> +Value* Builder::VCVT_FIXED_SI_F32(Value* vFixed,
> +  uint32_t   numIntBits,
> +  uint32_t   numFracBits,
> +  const llvm::Twine& name)
> +{
> +SWR_ASSERT((numIntBits + numFracBits) <= 32, "Can only handle 32-bit 
> fixed-point values");
> +uint32_t extraBits = 32 - numIntBits - numFracBits;
> +if (numIntBits && extraBits)
> +{
> +// Sign extend
> +Value* shftAmt = VIMMED1(extraBits);
> +vFixed = ASHR(SHL(vFixed, shftAmt), shftAmt);
> +}
> +
> +Value* fVal  = VIMMED1(0.0f);
> +Value* fFrac = VIMMED1(0.0f);
> +if (numIntBits)
> +{
> +fVal = SI_TO_FP(ASHR(vFixed, VIMMED1(numFracBits)), mSimdFP32Ty, 
> name);
> +}
> +
> +if (numFracBits)
> +{
> +fFrac = UI_TO_FP(AND(vFixed, VIMMED1((1 << numFracBits) - 1)), 
> mSimdFP32Ty);
> +fFrac = FDIV(fFrac, VIMMED1(float(1 << numFracBits)), name);
> +}
> +
> +return FADD(fVal, fFrac, name);
> +}
> +
> +Value* Builder::VCVT_F32_FIXED_UI(Value* vFloat,
> +  uint32_t   numIntBits,
> +  uint32_t   numFracBits,
> +  const llvm::Twine& name)
> +{
> +SWR_ASSERT((numIntBits + numFracBits) <= 32, "Can only handle 32-bit 
> fixed-point values");
> +Value* fixed = nullptr;
> +if constexpr (true) // KNOB_SIM_FAST_MATH?  Below works correctly 
> from a precision
> +// standpoint...
> +{
> +fixed = 
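For readers following the conversion routines quoted above, here is a minimal scalar sketch of the same idea, assuming plain int32_t values rather than the SIMD builder types. FloatToFixedSI and FixedSIToFloat are hypothetical names used only for illustration; they are not part of the patch. Like the patch's own TODO, the sketch makes no attempt to handle INF, NaN or overflow.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical scalar helper (not from the patch): float -> signed fixed-point.
// Scales by 2^numFracBits and rounds to the nearest integer.
int32_t FloatToFixedSI(float value, uint32_t numIntBits, uint32_t numFracBits)
{
    assert(numIntBits + numFracBits <= 32);
    assert(numFracBits < 31); // keeps (1u << numFracBits) well defined
    const float scale = static_cast<float>(1u << numFracBits);
    // std::nearbyint rounds to nearest (ties to even) in the default rounding mode.
    return static_cast<int32_t>(std::nearbyint(value * scale));
}

// Hypothetical scalar helper (not from the patch): signed fixed-point -> float.
// Assumes 'fixed' is already sign-extended to 32 bits.
float FixedSIToFloat(int32_t fixed, uint32_t numIntBits, uint32_t numFracBits)
{
    assert(numIntBits + numFracBits <= 32);
    assert(numFracBits < 31);
    const uint32_t fracMask = (1u << numFracBits) - 1u;
    // Integer part: arithmetic right shift, i.e. floor(fixed / 2^numFracBits).
    // (Guaranteed arithmetic since C++20; mainstream compilers already did this.)
    const int32_t intPart = numIntBits ? (fixed >> numFracBits) : 0;
    // Fractional part: the low bits, treated as an unsigned fraction.
    const uint32_t fracPart = static_cast<uint32_t>(fixed) & fracMask;
    return static_cast<float>(intPart) +
           static_cast<float>(fracPart) / static_cast<float>(1u << numFracBits);
}
```

Splitting the value into a floored integer part plus an unsigned fraction is what makes negative inputs come out right: in 8.8 format, -384 splits into -2 and 128/256, which sums to -1.5.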

Re: [Mesa-dev] [PATCH 7/7] swr/rast: Store cached files in multiple subdirs

2019-01-15 Thread Cherniak, Bruce
Reviewed-by: Bruce Cherniak  

> On Dec 17, 2018, at 8:36 AM, Alok Hota  wrote:
> 
> This improves cache filesystem performance, especially during CI tests
> Also updated jitcache magic number due to codegen parameter changes
> Removed 2 `if constexpr` to prevent C++17 requirement
> ---
> .../swr/rasterizer/jitter/JitManager.cpp  | 51 ---
> .../swr/rasterizer/jitter/JitManager.h|  6 +++
> .../swr/rasterizer/jitter/builder_misc.cpp| 33 +---
> 3 files changed, 52 insertions(+), 38 deletions(-)
> 
> diff --git a/src/gallium/drivers/swr/rasterizer/jitter/JitManager.cpp 
> b/src/gallium/drivers/swr/rasterizer/jitter/JitManager.cpp
> index 1b2b570318c..a549721f147 100644
> --- a/src/gallium/drivers/swr/rasterizer/jitter/JitManager.cpp
> +++ b/src/gallium/drivers/swr/rasterizer/jitter/JitManager.cpp
> @@ -582,7 +582,7 @@ struct JitCacheFileHeader
> uint64_t GetObjectCRC() const { return m_objCRC; }
> 
> private:
> -static const uint64_t JC_MAGIC_NUMBER = 0xfedcba9876543211ULL + 4;
> +static const uint64_t JC_MAGIC_NUMBER = 0xfedcba9876543210ULL + 6;
> static const size_t   JC_STR_MAX_LEN  = 32;
> static const uint32_t JC_PLATFORM_KEY = (LLVM_VERSION_MAJOR << 24) |
> (LLVM_VERSION_MINOR << 16) | 
> (LLVM_VERSION_PATCH << 8) |
> @@ -634,6 +634,15 @@ JitCache::JitCache()
> {
> mCacheDir = KNOB_JIT_CACHE_DIR;
> }
> +
> +// Create cache dir at startup to allow jitter to write debug.ll files
> +// to that directory.
> +if (!llvm::sys::fs::exists(mCacheDir.str()) &&
> +llvm::sys::fs::create_directories(mCacheDir.str()))
> +{
> +SWR_INVALID("Unable to create directory: %s", mCacheDir.c_str());
> +}
> +
> }
> 
> int ExecUnhookedProcess(const std::string& CmdLine, std::string* pStdOut, 
> std::string* pStdErr)
> @@ -641,6 +650,26 @@ int ExecUnhookedProcess(const std::string& CmdLine, 
> std::string* pStdOut, std::s
> return ExecCmd(CmdLine, "", pStdOut, pStdErr);
> }
> 
> +/// Calculate actual directory where module will be cached.
> +/// This is always a subdirectory of mCacheDir.  Full absolute
> +/// path name will be stored in mCurrentModuleCacheDir
> +void JitCache::CalcModuleCacheDir()
> +{
> +mModuleCacheDir.clear();
> +
> +llvm::SmallString moduleDir = mCacheDir;
> +
> +// Create 4 levels of directory hierarchy based on CRC, 256 entries each
> +uint8_t* pCRC = (uint8_t*)
> +for (uint32_t i = 0; i < 4; ++i)
> +{
> +llvm::sys::path::append(moduleDir, std::to_string((int)pCRC[i]));
> +}
> +
> +mModuleCacheDir = moduleDir;
> +}
> +
> +
> /// notifyObjectCompiled - Provides a pointer to compiled code for Module M.
> void JitCache::notifyObjectCompiled(const llvm::Module* M, 
> llvm::MemoryBufferRef Obj)
> {
> @@ -650,16 +679,22 @@ void JitCache::notifyObjectCompiled(const llvm::Module* 
> M, llvm::MemoryBufferRef
> return;
> }
> 
> -if (!llvm::sys::fs::exists(mCacheDir.str()) &&
> -llvm::sys::fs::create_directories(mCacheDir.str()))
> +if (!mModuleCacheDir.size())
> {
> -SWR_INVALID("Unable to create directory: %s", mCacheDir.c_str());
> +SWR_INVALID("Unset module cache directory");
> +return;
> +}
> +
> +if (!llvm::sys::fs::exists(mModuleCacheDir.str()) &&
> +llvm::sys::fs::create_directories(mModuleCacheDir.str()))
> +{
> +SWR_INVALID("Unable to create directory: %s", 
> mModuleCacheDir.c_str());
> return;
> }
> 
> JitCacheFileHeader header;
> 
> -llvm::SmallString filePath = mCacheDir;
> +llvm::SmallString filePath = mModuleCacheDir;
> llvm::sys::path::append(filePath, moduleID);
> 
> llvm::SmallString objPath = filePath;
> @@ -699,12 +734,14 @@ std::unique_ptr 
> JitCache::getObject(const llvm::Module* M)
> return nullptr;
> }
> 
> -if (!llvm::sys::fs::exists(mCacheDir))
> +CalcModuleCacheDir();
> +
> +if (!llvm::sys::fs::exists(mModuleCacheDir))
> {
> return nullptr;
> }
> 
> -llvm::SmallString filePath = mCacheDir;
> +llvm::SmallString filePath = mModuleCacheDir;
> llvm::sys::path::append(filePath, moduleID);
> 
> llvm::SmallString objFilePath = filePath;
> diff --git a/src/gallium/drivers/swr/rasterizer/jitter/JitManager.h 
> b/src/gallium/drivers/swr/rasterizer/jitter/JitManager.h
> index 5659191525d..bb7ca8b4a3e 100644
> --- a/src/gallium/drivers/swr/rasterizer/jitter/JitManager.h
> +++ b/src/gallium/drivers/swr/rasterizer/jitter/JitManager.h
> @@ -113,9 +113,15 @@ public:
> private:
> std::string mCpu;
> llvm::SmallString mCacheDir;
> +llvm::SmallString mModuleCacheDir;
> uint32_tmCurrentModuleCRC = 0;
> JitManager* mpJitMgr  = nullptr;
> llvm::CodeGenOpt::Level mOptLevel = llvm::CodeGenOpt::None;
> +
> +/// Calculate actual directory where 
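The directory fan-out described in the patch above can be sketched without LLVM as follows. This is only an illustration under stated assumptions: ModuleCacheDir is a hypothetical free function, the path separator is hard-coded to '/', and the CRC bytes are extracted with shifts (lowest byte first, which is the order a byte pointer would see on a little-endian host) instead of the pointer cast used in the patch.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for the subdirectory calculation (not the patch's code).
// Builds <cacheDir>/<b0>/<b1>/<b2>/<b3>, where b0..b3 are the four bytes of the
// 32-bit CRC, lowest byte first. Each level has at most 256 entries.
std::string ModuleCacheDir(const std::string& cacheDir, uint32_t crc)
{
    std::string dir = cacheDir;
    for (uint32_t i = 0; i < 4; ++i)
    {
        const uint32_t byteVal = (crc >> (8 * i)) & 0xFFu;
        dir += '/';
        dir += std::to_string(byteVal);
    }
    return dir;
}
```

For example, ModuleCacheDir("/tmp/jitcache", 0x04030201u) yields "/tmp/jitcache/1/2/3/4". Spreading files over several small directories avoids one huge flat directory, which is presumably where the filesystem performance gain mentioned in the commit message comes from.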

Re: [Mesa-dev] [PATCH] gallium/swr: Fix multi-context sync fence deadlock.

2019-01-15 Thread Hota, Alok
Reviewed-by: Alok Hota 

-Alok

-Original Message-
From: mesa-dev [mailto:mesa-dev-boun...@lists.freedesktop.org] On Behalf Of 
Bruce Cherniak
Sent: Friday, January 4, 2019 2:54 PM
To: mesa-dev@lists.freedesktop.org
Cc: mesa-sta...@lists.freedesktop.org
Subject: [Mesa-dev] [PATCH] gallium/swr: Fix multi-context sync fence deadlock.

Various recreation scenarios lead to API thread getting stuck in 
swr_fence_finish().  This is a multi-context issue, whereby one context 
overwrites the fence read-value with a previous sync's lesser value.
The fence sync value is supposed to be always increasing.

In swr_fence_cb(), only update the "read" value if the new value is greater.

(This may seem like we're not waiting on the other context to finish, but had 
we needed for it to finish there would have been a wait prior to submitting a 
new sync.)

cc: mesa-sta...@lists.freedesktop.org
---
 src/gallium/drivers/swr/swr_fence.cpp | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/swr/swr_fence.cpp 
b/src/gallium/drivers/swr/swr_fence.cpp
index b05ac8cec0..074d82a3b4 100644
--- a/src/gallium/drivers/swr/swr_fence.cpp
+++ b/src/gallium/drivers/swr/swr_fence.cpp
@@ -50,7 +50,9 @@ swr_fence_cb(uint64_t userData, uint64_t userData2, uint64_t 
userData3)
swr_fence_do_work(fence);
 
/* Correct value is in SwrSync data, and not the fence write field. */
-   fence->read = userData2;
+   /* Contexts may not finish in order, but fence value always increases */
+   if (fence->read < userData2)
+  fence->read = userData2;
 }
 
 /*
--
2.17.1
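
The effect of the fix can be shown with a small standalone sketch. FenceSketch and FenceSignal are hypothetical names, and the sketch ignores threading entirely; it only illustrates that the "read" value never moves backwards when callbacks from different contexts arrive out of order.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-in for the fence state (not the driver struct).
struct FenceSketch
{
    uint64_t read = 0; // last sync value observed as completed
};

// Mirrors the patched callback logic: only accept a value that is newer than
// the one already recorded, so a late callback with a smaller sync value
// cannot overwrite a larger one.
void FenceSignal(FenceSketch& fence, uint64_t syncValue)
{
    if (fence.read < syncValue)
    {
        fence.read = syncValue;
    }
}
```

With the unconditional assignment from before the patch, signalling 5 and then 3 would leave read at 3, and anything waiting for the value to reach 5 could wait forever. With the comparison in place, read stays at 5.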

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/2] gallium: add SINT formats to have exact counterparts to SNORM formats

2019-01-15 Thread Marek Olšák
From: Marek Olšák 

for radeonsi
---
 src/gallium/auxiliary/util/u_format.csv | 4 
 src/gallium/include/pipe/p_format.h | 5 +
 2 files changed, 9 insertions(+)

diff --git a/src/gallium/auxiliary/util/u_format.csv 
b/src/gallium/auxiliary/util/u_format.csv
index 911ac07d32a..a7303c1ff73 100644
--- a/src/gallium/auxiliary/util/u_format.csv
+++ b/src/gallium/auxiliary/util/u_format.csv
@@ -427,20 +427,24 @@ PIPE_FORMAT_R8A8_SINT   , plain, 1, 1, sp8  , 
sp8  , , , x00
 PIPE_FORMAT_R16A16_UINT , plain, 1, 1, up16 , up16 , , , 
x00y, rgb
 PIPE_FORMAT_R16A16_SINT , plain, 1, 1, sp16 , sp16 , , , 
x00y, rgb
 PIPE_FORMAT_R32A32_UINT , plain, 1, 1, up32 , up32 , , , 
x00y, rgb
 PIPE_FORMAT_R32A32_SINT , plain, 1, 1, sp32 , sp32 , , , 
x00y, rgb
 PIPE_FORMAT_R10G10B10A2_UINT, plain, 1, 1, up10 , up10 , up10, up2 , 
xyzw, rgb, up2 , up10, up10, up10, wzyx
 
 PIPE_FORMAT_B5G6R5_SRGB , plain, 1, 1, un5 , un6 , un5 , , 
zyx1, srgb, un5 , un6 , un5 , , xyz1
 
 PIPE_FORMAT_A8L8_UNORM, plain, 1, 1, un8 , un8 , , , yyyx, rgb
 PIPE_FORMAT_A8L8_SNORM, plain, 1, 1, sn8 , sn8 , , , yyyx, rgb
+PIPE_FORMAT_A8L8_SINT , plain, 1, 1, sp8 , sp8 , , , yyyx, rgb
 PIPE_FORMAT_A8L8_SRGB , plain, 1, 1, un8 , un8 , , , yyyx, srgb
 PIPE_FORMAT_A16L16_UNORM  , plain, 1, 1, un16, un16, , , yyyx, rgb
 
 PIPE_FORMAT_G8R8_UNORM, plain, 1, 1, un8 , un8 , , , yx01, rgb
 PIPE_FORMAT_G8R8_SNORM, plain, 1, 1, sn8 , sn8 , , , yx01, rgb
+PIPE_FORMAT_G8R8_SINT , plain, 1, 1, sp8 , sp8 , , , yx01, rgb
 PIPE_FORMAT_G16R16_UNORM  , plain, 1, 1, un16, un16, , , yx01, rgb
 PIPE_FORMAT_G16R16_SNORM  , plain, 1, 1, sn16, sn16, , , yx01, rgb
 
 PIPE_FORMAT_A8B8G8R8_SNORM, plain, 1, 1, sn8 , sn8 , sn8 , sn8 , wzyx, 
rgb
+PIPE_FORMAT_A8B8G8R8_SINT , plain, 1, 1, sp8 , sp8 , sp8 , sp8 , wzyx, 
rgb
 PIPE_FORMAT_X8B8G8R8_SNORM, plain, 1, 1, x8,   sn8,  sn8,  sn8,  wzy1, 
rgb
+PIPE_FORMAT_X8B8G8R8_SINT , plain, 1, 1, x8,   sp8,  sp8,  sp8,  wzy1, 
rgb
diff --git a/src/gallium/include/pipe/p_format.h 
b/src/gallium/include/pipe/p_format.h
index 6fb91222f2b..c81fc67b8e8 100644
--- a/src/gallium/include/pipe/p_format.h
+++ b/src/gallium/include/pipe/p_format.h
@@ -391,20 +391,25 @@ enum pipe_format {
 
PIPE_FORMAT_P016= 307,
 
PIPE_FORMAT_R10G10B10X2_UNORM   = 308,
PIPE_FORMAT_A1B5G5R5_UNORM  = 309,
PIPE_FORMAT_X1B5G5R5_UNORM  = 310,
PIPE_FORMAT_A4B4G4R4_UNORM  = 311,
 
PIPE_FORMAT_R8_SRGB = 312,
 
+   PIPE_FORMAT_A8L8_SINT   = 313,
+   PIPE_FORMAT_G8R8_SINT   = 314,
+   PIPE_FORMAT_A8B8G8R8_SINT   = 315,
+   PIPE_FORMAT_X8B8G8R8_SINT   = 316,
+
PIPE_FORMAT_COUNT
 };
 
 #if defined(PIPE_ARCH_LITTLE_ENDIAN)
 #define PIPE_FORMAT_RGBA_UNORM PIPE_FORMAT_R8G8B8A8_UNORM
 #define PIPE_FORMAT_RGBX_UNORM PIPE_FORMAT_R8G8B8X8_UNORM
 #define PIPE_FORMAT_BGRA_UNORM PIPE_FORMAT_B8G8R8A8_UNORM
 #define PIPE_FORMAT_BGRX_UNORM PIPE_FORMAT_B8G8R8X8_UNORM
 #define PIPE_FORMAT_ARGB_UNORM PIPE_FORMAT_A8R8G8B8_UNORM
 #define PIPE_FORMAT_XRGB_UNORM PIPE_FORMAT_X8R8G8B8_UNORM
-- 
2.17.1



[Mesa-dev] [PATCH] radeonsi: enable dithered alpha-to-coverage for better quality

2019-01-15 Thread Marek Olšák
From: Marek Olšák 

same as AMDVLK.

GL_NV_alpha_to_coverage_dither_control allows controlling this behavior.
The default is implementation-dependent.
---
 src/gallium/drivers/radeonsi/si_state.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_state.c 
b/src/gallium/drivers/radeonsi/si_state.c
index d1c0e0371dc..2282dbf7017 100644
--- a/src/gallium/drivers/radeonsi/si_state.c
+++ b/src/gallium/drivers/radeonsi/si_state.c
@@ -467,24 +467,25 @@ static void *si_create_blend_state_mode(struct 
pipe_context *ctx,
blend->logicop_enable = state->logicop_enable;
 
if (state->logicop_enable) {
color_control |= S_028808_ROP3(state->logicop_func | 
(state->logicop_func << 4));
} else {
color_control |= S_028808_ROP3(0xcc);
}
 
si_pm4_set_reg(pm4, R_028B70_DB_ALPHA_TO_MASK,
   S_028B70_ALPHA_TO_MASK_ENABLE(state->alpha_to_coverage) |
-  S_028B70_ALPHA_TO_MASK_OFFSET0(2) |
-  S_028B70_ALPHA_TO_MASK_OFFSET1(2) |
-  S_028B70_ALPHA_TO_MASK_OFFSET2(2) |
-  S_028B70_ALPHA_TO_MASK_OFFSET3(2));
+  S_028B70_ALPHA_TO_MASK_OFFSET0(3) |
+  S_028B70_ALPHA_TO_MASK_OFFSET1(1) |
+  S_028B70_ALPHA_TO_MASK_OFFSET2(0) |
+  S_028B70_ALPHA_TO_MASK_OFFSET3(2) |
+  S_028B70_OFFSET_ROUND(1));
 
if (state->alpha_to_coverage)
blend->need_src_alpha_4bit |= 0xf;
 
blend->cb_target_mask = 0;
blend->cb_target_enabled_4bit = 0;
 
for (int i = 0; i < 8; i++) {
/* state->rt entries > 0 only written if independent blending */
const int j = state->independent_blend_enable ? i : 0;
-- 
2.17.1



[Mesa-dev] Meson configuration for bare-bones osmesa

2019-01-15 Thread Chuck Atkins
I'm jumping into Meson land now and I'm trying to configure the most recent
release, 18.3.1, to build a minimal OSMesa containing only softpipe.  So
I'm trying to make sure everything is explicitly disabled and only turning
on the few pieces I need:

meson -Ddebug=false -Degl=false -Dgbm=false -Dopengl=true -Dgles1=false
-Dgles2=false -Dglvnd=false -Dgallium-nine=false -Dgallium-omx=disabled
-Dgallium-opencl=disabled -Dgallium-va=false -Dgallium-vdpau=false
-Dgallium-xa=false -Dgallium-xvmc=false -Dplatforms=surfaceless
-Dgallium-drivers=swrast -Dosmesa=gallium -Dllvm=false -Dglx=disabled
-Ddri-drivers= build


But I end up with the following error:

meson.build:393:4: ERROR:  Problem encountered: building dri or gallium
drivers require at least one window system


Am I doing something wrong here?  It looks like the meson build isn't
properly dealing with the dependencies of the software rasterizers since
you should be able to build a libOSMesa.so with only software rasterizers,
i.e. softpipe, llvmpipe, and swr, without requiring any windowing system.

--
Chuck Atkins
Staff R&D Engineer, Scientific Computing
Kitware, Inc.
(518) 881-1183


Re: [Mesa-dev] Meson configuration for bare-bones osmesa

2019-01-15 Thread Dylan Baker
Quoting Chuck Atkins (2019-01-15 11:17:43)
> I'm jumping into Meson land now and I'm trying to configure the most recent
> release, 18.3.1, to build a minimal OSMesa containing only softpipe.  So I'm
> trying to make sure everything is explicitly disabled and only turning on the
> few pieces I need:
> 
> 
> meson -Ddebug=false -Degl=false -Dgbm=false -Dopengl=true -Dgles1=false
> -Dgles2=false -Dglvnd=false -Dgallium-nine=false -Dgallium-omx=disabled
> -Dgallium-opencl=disabled -Dgallium-va=false -Dgallium-vdpau=false
> -Dgallium-xa=false -Dgallium-xvmc=false -Dplatforms=surfaceless
> -Dgallium-drivers=swrast -Dosmesa=gallium -Dllvm=false -Dglx=disabled
> -Ddri-drivers= build
> 
> 
> But I end up with the following error:
> 
> meson.build:393:4: ERROR:  Problem encountered: building dri or gallium
> drivers require at least one window system
> 
> 
> Am I doing something wrong here?  It looks like the meson build isn't properly
> dealing with the dependencies of the software rasterizers since you should be
> able to build a libOSMesa.so with only software rasterizers, i.e. softpipe,
> llvmpipe, and swr, without requiring any windowing system.
> 
> --
> Chuck Atkins
> Staff R&D Engineer, Scientific Computing
> Kitware, Inc.
> (518) 881-1183

That looks more like an error check that is too greedy. More likely it should
be a check of something like (pseudocode ahead):
`if not (osmesa or any_window_system)`

I'll send a patch.

Dylan




Re: [Mesa-dev] Thoughts after hitting 100 merge requests?

2019-01-15 Thread Jason Ekstrand
On Tue, Jan 15, 2019 at 12:52 PM Eric Anholt  wrote:

> Daniel Stone  writes:
>
> > Hi,
> >
> > On Tue, 15 Jan 2019 at 12:21, Rob Clark  wrote:
> >> On Tue, Jan 15, 2019 at 1:02 AM Tapani Pälli wrote:
> >> > On 1/14/19 2:36 PM, Daniel Stone wrote:
> >> > > On Fri, 11 Jan 2019 at 17:05, Jason Ekstrand wrote:
> >> > > In other projects, we looked for ways to apply the tags and ended up
> >> > > concluding that they didn't bring enough value to make it worthwhile.
> >> > > I don't know if that holds for Mesa, but it would be better to start
> >> > > with an actual problem statement - what value does R-b bring and how?
> >> > > - then look at ways to solve that problem, rather than just very
> >> > > directly finding a way to insert that literal text string into every
> >> > > commit message.
> >> >
> >> > IMO it brings some 'shared responsibility' for correctness of the patch
> >
> > Oh, no doubt - we certainly haven't abandoned thorough review! So far
> > we haven't seen that compromised by not having a name in the commit
> > message.
> >
> >> > and quickly accessible information on who were looking at the change. So
> >> > ideally later when filing bug against commit/series there would be more
> >> > people than just the committer that should take a look at the possible
> >> > regressions. At least in my experience people filing bugs tend to often
> >> > also CC the reviewer.
> >
> > Yeah, that's really helpful. So maybe a useful flow - assuming we
> > eventually switch to GitLab issues - would be the ability to associate
> > an issue with a commit, which could then automatically drag in people
> > who commented on the MR which landed that commit, as well as (at
> > least) the reporter of the issue(s) fixed by that MR. That would need
> > some kind of clever - probably at least semi-manual - filtering to
> > make sure it wasn't just spamming the world, but it's at least a
> > starting point.
> >
> >> +1 .. and also it is nice to see things like Reported-by/Reviewed-by
> >> without having to go search somewhere else (ie. outside of git/tig)
> >
> > My question would again be what value that brings you. Do you just
> > like seeing the name there, or do you go poke the people on IRC, or
> > follow up via email, or ... ? Again I personally go look through the
> > original review to see what came up during that first, but everyone's
> > different, so I'm just trying to understand what you actually do with
> > that information, so we can figure out if there's a better way to do
> > things for everyone rather than just blindly imitating what came
> > before.
>
> I've participated in adding Reported-bys, but I've never seen the use.
> It felt like "we could record this information, so we should!" rather
> than solving a problem.
>

To me, the Reported-by tag is more for giving credit than anything else.
Maybe it doesn't matter but some people appreciate it when their
contributions, even if it's just a good bug report, are recorded in the
project's permanent record.  It's also a good way to make sure the reporter
gets CCd on the patch so they can verify it fixes the bug for them.  That
said, bugzilla is a permanent record and the information would be even more
accessible if we just used GitLab MRs...


> I've found little use in ccing reviewers on followups, except for
> trivial stuff like compiler warnings.  I propose that the solution for
> compiler warnings should be CI that prevents you from merging new
> compiler warnings anyway.
>
> Basically, I feel like the pain points in the MR process (amending and
> re-pushing before clicking "merge") are pre-existing pain points in our
> process, slightly amplified.
>
> >> (ofc it would be pretty awesome incentive to switch to gitlab issues
> >> if gitlab could automate adding Reported-by tags for MR's associated
> >> with an issue.. but I guess checkbox to add Reviewed-by tag would
> >> already make my day)
> >
> > I saw this the other day, which might be more incentive:
> > https://csoriano.pages.gitlab.gnome.org/csoriano-blog/post/2019-01-07-issue-handling-automation/
>
> Automatic needinfo closing?  Sign me up.
>


Re: [Mesa-dev] [PATCH 2/7] swr/rast: Add annotator to interleave isa text

2019-01-15 Thread Cherniak, Bruce
Reviewed-by: Bruce Cherniak  

> On Dec 17, 2018, at 8:36 AM, Alok Hota  wrote:
> 
> To make debugging simpler
> ---
> .../swr/rasterizer/jitter/JitManager.cpp  | 27 +--
> .../swr/rasterizer/jitter/JitManager.h| 12 -
> 2 files changed, 36 insertions(+), 3 deletions(-)
> 
> diff --git a/src/gallium/drivers/swr/rasterizer/jitter/JitManager.cpp 
> b/src/gallium/drivers/swr/rasterizer/jitter/JitManager.cpp
> index 0312fc47fb6..58d30d4e119 100644
> --- a/src/gallium/drivers/swr/rasterizer/jitter/JitManager.cpp
> +++ b/src/gallium/drivers/swr/rasterizer/jitter/JitManager.cpp
> @@ -443,7 +443,7 @@ std::string JitManager::GetOutputDir()
> 
> //
> /// @brief Dump function to file.
> -void JitManager::DumpToFile(Module* M, const char* fileName)
> +void JitManager::DumpToFile(Module* M, const char* fileName, 
> llvm::AssemblyAnnotationWriter* annotater)
> {
> if (KNOB_DUMP_SHADER_IR)
> {
> @@ -458,7 +458,7 @@ void JitManager::DumpToFile(Module* M, const char* 
> fileName)
> sprintf(fName, "%s.%s.ll", funcName, fileName);
> #endif
> raw_fd_ostream fd(fName, EC, llvm::sys::fs::F_None);
> -M->print(fd, nullptr);
> +M->print(fd, annotater);
> fd.flush();
> }
> }
> @@ -758,3 +758,26 @@ std::unique_ptr 
> JitCache::getObject(const llvm::Module* M)
> 
> return pBuf;
> }
> +
> +void InterleaveAssemblyAnnotater::emitInstructionAnnot(const 
> llvm::Instruction *pInst, llvm::formatted_raw_ostream )
> +{
> +auto dbgLoc = pInst->getDebugLoc();
> +if(dbgLoc)
> +{
> +unsigned int line = dbgLoc.getLine();
> +if(line != mCurrentLineNo)
> +{
> +if(line > 0 && line <= mAssembly.size())
> +{
> +// HACK: here we assume that OS is a 
> formatted_raw_ostream(ods())
> +// and modify the color accordingly. We can't do the color
> +// modification on OS because formatted_raw_ostream strips
> +// the color information. The only way to fix this behavior
> +// is to patch LLVM.
> +OS << "\n; " << line << ": " << mAssembly[line-1] << "\n";
> +}
> +mCurrentLineNo = line;
> +}
> +}
> +}
> +
> diff --git a/src/gallium/drivers/swr/rasterizer/jitter/JitManager.h 
> b/src/gallium/drivers/swr/rasterizer/jitter/JitManager.h
> index a5b6af91f06..2f479314c76 100644
> --- a/src/gallium/drivers/swr/rasterizer/jitter/JitManager.h
> +++ b/src/gallium/drivers/swr/rasterizer/jitter/JitManager.h
> @@ -31,6 +31,7 @@
> 
> #include "jit_pch.hpp"
> #include "common/isa.hpp"
> +#include 
> 
> 
> //
> @@ -151,7 +152,7 @@ struct JitManager
> 
> void   DumpAsm(llvm::Function* pFunction, const char* 
> fileName);
> static voidDumpToFile(llvm::Function* f, const char* fileName);
> -static voidDumpToFile(llvm::Module* M, const char* fileName);
> +static voidDumpToFile(llvm::Module* M, const char* fileName, 
> llvm::AssemblyAnnotationWriter* annotater = nullptr);
> static std::string GetOutputDir();
> 
> // Debugging support methods
> @@ -178,3 +179,12 @@ struct JitManager
>   uint32_t
>  lineNum,
>   const std::vector uint32_t>>& members);
> };
> +
> +class InterleaveAssemblyAnnotater : public llvm::AssemblyAnnotationWriter
> +{
> +public:
> +void emitInstructionAnnot(const llvm::Instruction *pInst, 
> llvm::formatted_raw_ostream ) override;
> +std::vector mAssembly;
> +private:
> +uint32_t mCurrentLineNo = 0;
> +};
> -- 
> 2.17.1
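
The interleaving logic in the annotator above can be tried out without LLVM using a small sketch like the following. InterleaveSketch is a hypothetical class: it takes the debug line number directly (with 0 standing in for "no debug location") and writes to any std::ostream, whereas the real class overrides llvm::AssemblyAnnotationWriter and reads the line from the instruction's debug location.

```cpp
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical LLVM-free model of the annotator (illustration only).
class InterleaveSketch
{
public:
    explicit InterleaveSketch(std::vector<std::string> sourceLines)
        : mAssembly(std::move(sourceLines))
    {
    }

    // Called once per instruction with its 1-based source line.
    // A line of 0 models an instruction without a debug location.
    void Annotate(uint32_t line, std::ostream& os)
    {
        if (line == 0)
        {
            return;
        }
        if (line != mCurrentLineNo)
        {
            // Emit the source text once, when the line number changes.
            if (line <= mAssembly.size())
            {
                os << "\n; " << line << ": " << mAssembly[line - 1] << "\n";
            }
            mCurrentLineNo = line;
        }
    }

private:
    std::vector<std::string> mAssembly;
    uint32_t                 mCurrentLineNo = 0;
};
```

Feeding it the line sequence 1, 1, 2 for a two-line source prints each source line exactly once, just before the first instruction that maps to it; line numbers past the end of the stored text are silently skipped, as in the patch.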
> 



[Mesa-dev] [Bug 109258] Weston drm-backend.so seems to fail with Mesa master and LIBGL_ALWAYS_SOFTWARE=1

2019-01-15 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=109258

--- Comment #2 from Eric Engestrom  ---
I can reproduce, I'll take a look. Thanks for the report :)

As for your bisect, I had messed up a rebase and accidentally dropped a line in
8cb84c8477a57ed05d70, which led to Marek reverting my commit a few weeks later
in 84f3afc2e122cb418573. About a year later, I remembered all this and
fixed up the issue, and the commit landed as cb0980e69aa921af7086.

This might help you bisect, as you now know that all the commits between
8cb84c8477a57ed05d70 and 84f3afc2e122cb418573 are broken :)

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.


Re: [Mesa-dev] Thoughts after hitting 100 merge requests?

2019-01-15 Thread Eric Anholt
Daniel Stone  writes:

> Hi,
>
> On Tue, 15 Jan 2019 at 12:21, Rob Clark  wrote:
>> On Tue, Jan 15, 2019 at 1:02 AM Tapani Pälli  wrote:
>> > On 1/14/19 2:36 PM, Daniel Stone wrote:
>> > > On Fri, 11 Jan 2019 at 17:05, Jason Ekstrand wrote:
>> > > In other projects, we looked for ways to apply the tags and ended up
>> > > concluding that they didn't bring enough value to make it worthwhile.
>> > > I don't know if that holds for Mesa, but it would be better to start
>> > > with an actual problem statement - what value does R-b bring and how?
>> > > - then look at ways to solve that problem, rather than just very
>> > > directly finding a way to insert that literal text string into every
>> > > commit message.
>> >
>> > IMO it brings some 'shared responsibility' for correctness of the patch
>
> Oh, no doubt - we certainly haven't abandoned thorough review! So far
> we haven't seen that compromised by not having a name in the commit
> message.
>
>> > and quickly accessible information on who were looking at the change. So
>> > ideally later when filing bug against commit/series there would be more
>> > people than just the committer that should take a look at the possible
>> > regressions. At least in my experience people filing bugs tend to often
>> > also CC the reviewer.
>
> Yeah, that's really helpful. So maybe a useful flow - assuming we
> eventually switch to GitLab issues - would be the ability to associate
> an issue with a commit, which could then automatically drag in people
> who commented on the MR which landed that commit, as well as (at
> least) the reporter of the issue(s) fixed by that MR. That would need
> some kind of clever - probably at least semi-manual - filtering to
> make sure it wasn't just spamming the world, but it's at least a
> starting point.
>
>> +1 .. and also it is nice to see things like Reported-by/Reviewed-by
>> without having to go search somewhere else (ie. outside of git/tig)
>
> My question would again be what value that brings you. Do you just
> like seeing the name there, or do you go poke the people on IRC, or
> follow up via email, or ... ? Again I personally go look through the
> original review to see what came up during that first, but everyone's
> different, so I'm just trying to understand what you actually do with
> that information, so we can figure out if there's a better way to do
> things for everyone rather than just blindly imitating what came
> before.

I've participated in adding Reported-bys, but I've never seen the use.
It felt like "we could record this information, so we should!" rather
than solving a problem.

I've found little use in ccing reviewers on followups, except for
trivial stuff like compiler warnings.  I propose that the solution for
compiler warnings should be CI that prevents you from merging new
compiler warnings anyway.

Basically, I feel like the pain points in the MR process (amending and
re-pushing before clicking "merge") are pre-existing pain points in our
process, slightly amplified.

>> (ofc it would be pretty awesome incentive to switch to gitlab issues
>> if gitlab could automate adding Reported-by tags for MR's associated
>> with an issue.. but I guess checkbox to add Reviewed-by tag would
>> already make my day)
>
> I saw this the other day, which might be more incentive:
> https://csoriano.pages.gitlab.gnome.org/csoriano-blog/post/2019-01-07-issue-handling-automation/

Automatic needinfo closing?  Sign me up.




Re: [Mesa-dev] Thoughts after hitting 100 merge requests?

2019-01-15 Thread Marek Olšák
I noticed that gitlab breaks formatting of . It
removes < and >, and converts the address to a hyperlink. I can preserve
the formatting by enclosing the comment in ` ... `.

Marek

On Tue, Jan 15, 2019 at 1:52 PM Eric Anholt  wrote:

> Daniel Stone  writes:
>
> > Hi,
> >
> > On Tue, 15 Jan 2019 at 12:21, Rob Clark  wrote:
> >> On Tue, Jan 15, 2019 at 1:02 AM Tapani Pälli wrote:
> >> > On 1/14/19 2:36 PM, Daniel Stone wrote:
> >> > > On Fri, 11 Jan 2019 at 17:05, Jason Ekstrand wrote:
> >> > > In other projects, we looked for ways to apply the tags and ended up
> >> > > concluding that they didn't bring enough value to make it worthwhile.
> >> > > I don't know if that holds for Mesa, but it would be better to start
> >> > > with an actual problem statement - what value does R-b bring and how?
> >> > > - then look at ways to solve that problem, rather than just very
> >> > > directly finding a way to insert that literal text string into every
> >> > > commit message.
> >> >
> >> > IMO it brings some 'shared responsibility' for correctness of the patch
> >
> > Oh, no doubt - we certainly haven't abandoned thorough review! So far
> > we haven't seen that compromised by not having a name in the commit
> > message.
> >
> >> > and quickly accessible information on who were looking at the change.
> So
> >> > ideally later when filing bug against commit/series there would be
> more
> >> > people than just the committer that should take a look at the possible
> >> > regressions. At least in my experience people filing bugs tend to
> often
> >> > also CC the reviewer.
> >
> > Yeah, that's really helpful. So maybe a useful flow - assuming we
> > eventually switch to GitLab issues - would be the ability to associate
> > an issue with a commit, which could then automatically drag in people
> > who commented on the MR which landed that commit, as well as (at
> > least) the reporter of the issue(s) fixed by that MR. That would need
> > some kind of clever - probably at least semi-manual - filtering to
> > make sure it wasn't just spamming the world, but it's at least a
> > starting point.
> >
> >> +1 .. and also it is nice to see things like Reported-by/Reviewed-by
> >> without having to go search somewhere else (ie. outside of git/tig)
> >
> > My question would again be what value that brings you. Do you just
> > like seeing the name there, or do you go poke the people on IRC, or
> > follow up via email, or ... ? Again I personally go look through the
> > original review to see what came up during that first, but everyone's
> > different, so I'm just trying to understand what you actually do with
> > that information, so we can figure out if there's a better way to do
> > things for everyone rather than just blindly imitating what came
> > before.
>
> I've participated in adding Reported-bys, but I've never seen the use.
> It felt like "we could record this information, so we should!" rather
> than solving a problem.
>
> I've found little use in ccing reviewers on followups, except for
> trivial stuff like compiler warnings.  I propose that the solution for
> compiler warnings should be CI that prevents you from merging new
> compiler warnings anyway.
>
> Basically, I feel like the pain points in the MR process (amending and
> re-pushing before clicking "merge") are pre-existing pain points in our
> process, slightly amplified.
>
> >> (ofc it would be pretty awesome incentive to switch to gitlab issues
> >> if gitlab could automate adding Reported-by tags for MR's associated
> >> with an issue.. but I guess checkbox to add Reviewed-by tag would
> >> already make my day)
> >
> > I saw this the other day, which might be more incentive:
> >
> https://csoriano.pages.gitlab.gnome.org/csoriano-blog/post/2019-01-07-issue-handling-automation/
>
> Automatic needinfo closing?  Sign me up.


Re: [Mesa-dev] Meson configuration for bare-bones osmesa

2019-01-15 Thread Dylan Baker
Quoting Chuck Atkins (2019-01-15 11:17:43)
> I'm jumping into Meson land now and I'm trying to configure the most recent
> release, 18.3.1, to build a minimal OSMesa containing only softpipe.  So I'm
> trying to make sure everything is explicitly disabled and only turning on the
> few pieces I need:
> 
> 
> meson -Ddebug=false -Degl=false -Dgbm=false -Dopengl=true -Dgles1=false
> -Dgles2=false -Dglvnd=false -Dgallium-nine=false -Dgallium-omx=disabled
> -Dgallium-opencl=disabled -Dgallium-va=false -Dgallium-vdpau=false
> -Dgallium-xa=false -Dgallium-xvmc=false -Dplatforms=surfaceless
> -Dgallium-drivers=swrast -Dosmesa=gallium -Dllvm=false -Dglx=disabled
> -Ddri-drivers= build
> 
> 
> But I end up with the following error:
> 
> meson.build:393:4: ERROR:  Problem encountered: building dri or gallium
> drivers require at least one window system
> 
> 
> Am I doing something wrong here?  It looks like the meson build isn't properly
> dealing with the dependencies of the software rasterizers since you should be
> able to build a libOSMesa.so with only software rasterizers, i.e. softpipe,
> llvmpipe, and swr, without requiring any windowing system.
> 
> --
> Chuck Atkins
> Staff R&D Engineer, Scientific Computing
> Kitware, Inc.
> (518) 881-1183

There is a patch Emil wrote that should be staged for the 18.3 branch that fixes
this for gallium osmesa, I have a trivial patch for classic osmesa as well.

PS: you probably want to add -Dvulkan-drivers= if you're building on Linux.

Dylan




[Mesa-dev] [MR] meson: allow dri based drivers with classic osmesa without glx or egl

2019-01-15 Thread Dylan Baker
Currently something like:
meson -Dgallium-drivers= -Dvulkan-drivers= -Ddri-drivers=swrast -Dosmesa=classic
-Degl=false -Dglx=disabled
will error, which is wrong.

https://gitlab.freedesktop.org/mesa/mesa/merge_requests/115

Dylan




Re: [Mesa-dev] [PATCH] radeonsi: use compute for resource_copy_region when possible

2019-01-15 Thread Marek Olšák
On Tue, Jan 15, 2019 at 3:54 PM Axel Davy  wrote:

> On 15/01/2019 18:50, Marek Olšák wrote:
> >
> > +void si_compute_copy_image(struct si_context *sctx,
> > +struct pipe_resource *dst,
> > +unsigned dst_level,
> > +struct pipe_resource *src,
> > +unsigned src_level,
> > +unsigned dstx, unsigned dsty, unsigned dstz,
> > +const struct pipe_box *src_box)
> > +{
> > + struct pipe_context *ctx = &sctx->b;
> > + unsigned width = src_box->width;
> > + unsigned height = src_box->height;
> > + unsigned depth = src_box->depth;
> > +
> > + unsigned data[] = {src_box->x, src_box->y, src_box->z, 0, dstx,
> dsty, dstz, 0};
> > +
> > + if (width == 0 || height == 0)
> > + return;
> > +
> > + sctx->flags |= SI_CONTEXT_CS_PARTIAL_FLUSH |
> > +si_get_flush_flags(sctx, SI_COHERENCY_SHADER,
> L2_STREAM);
> > + si_make_CB_shader_coherent(sctx, dst->nr_samples, true);
> > +
> > + struct pipe_constant_buffer saved_cb = {};
> > + si_get_pipe_constant_buffer(sctx, PIPE_SHADER_COMPUTE, 0, &saved_cb);
> > +
> > + struct si_images *images = &sctx->images[PIPE_SHADER_COMPUTE];
> > + struct pipe_image_view saved_image[2] = {0};
> > + util_copy_image_view(&saved_image[0], &images->views[0]);
> > + util_copy_image_view(&saved_image[1], &images->views[1]);
> > +
> > + void *saved_cs = sctx->cs_shader_state.program;
> > +
> > + struct pipe_constant_buffer cb = {};
> > + cb.buffer_size = sizeof(data);
> > + cb.user_buffer = data;
> > + ctx->set_constant_buffer(ctx, PIPE_SHADER_COMPUTE, 0, &cb);
> > +
> > + struct pipe_image_view image[2] = {0};
> > + image[0].resource = src;
> > + image[0].shader_access = image[0].access = PIPE_IMAGE_ACCESS_READ;
> > + image[0].format = util_format_linear(src->format);
> > + image[0].u.tex.level = src_level;
> > + image[0].u.tex.first_layer = 0;
> > + image[0].u.tex.last_layer =
> > + src->target == PIPE_TEXTURE_3D ? u_minify(src->depth0,
> src_level) - 1
> > + :
> (unsigned)(src->array_size - 1);
> > + image[1].resource = dst;
> > + image[1].shader_access = image[1].access = PIPE_IMAGE_ACCESS_WRITE;
> > + image[1].format = util_format_linear(dst->format);
> > + image[1].u.tex.level = dst_level;
> > + image[1].u.tex.first_layer = 0;
> > + image[1].u.tex.last_layer =
> > + dst->target == PIPE_TEXTURE_3D ? u_minify(dst->depth0,
> dst_level) - 1
> > + :
> (unsigned)(dst->array_size - 1);
> > +
> > + if (src->format == PIPE_FORMAT_R9G9B9E5_FLOAT)
> > + image[0].format = image[1].format = PIPE_FORMAT_R32_UINT;
> > +
> > + /* SNORM8 blitting has precision issues on some chips. Use the SINT
> > +  * equivalent instead, which doesn't force DCC decompression.
> > +  * Note that some chips avoid this issue by using SDMA.
> > +  */
> > + if (util_format_is_snorm8(dst->format)) {
> > + image[0].format = image[1].format =
> > + util_format_snorm8_to_sint8(dst->format);
> > + }
> > +
> > + ctx->set_shader_images(ctx, PIPE_SHADER_COMPUTE, 0, 2, image);
> > +
> > + struct pipe_grid_info info = {0};
> > +
> > + if (dst->target == PIPE_TEXTURE_1D_ARRAY && src->target ==
> PIPE_TEXTURE_1D_ARRAY) {
> > + if (!sctx->cs_copy_image_1d_array)
> > + sctx->cs_copy_image_1d_array =
> > +
>  si_create_copy_image_compute_shader_1d_array(ctx);
> > + ctx->bind_compute_state(ctx, sctx->cs_copy_image_1d_array);
> > + info.block[0] = 64;
> > + info.last_block[0] = width % 64;
> > + info.block[1] = 1;
> > + info.block[2] = 1;
> > + info.grid[0] = DIV_ROUND_UP(width, 64);
> > + info.grid[1] = depth;
> > + info.grid[2] = 1;
> > + } else {
> > + if (!sctx->cs_copy_image)
> > + sctx->cs_copy_image =
> si_create_copy_image_compute_shader(ctx);
> > + ctx->bind_compute_state(ctx, sctx->cs_copy_image);
> > + info.block[0] = 8;
> > + info.last_block[0] = width % 8;
> > + info.block[1] = 8;
> > + info.last_block[1] = height % 8;
> > + info.block[2] = 1;
> > + info.grid[0] = DIV_ROUND_UP(width, 8);
> > + info.grid[1] = DIV_ROUND_UP(height, 8);
> > + info.grid[2] = depth;
> > + }
> > +
> > + ctx->launch_grid(ctx, &info);
> > +
> > + sctx->flags |= SI_CONTEXT_CS_PARTIAL_FLUSH |
> > +(sctx->chip_class <= VI ?
> SI_CONTEXT_WRITEBACK_GLOBAL_L2 : 0) |
> > +si_get_flush_flags(sctx, SI_COHERENCY_SHADER,
> L2_STREAM);
> > + ctx->bind_compute_state(ctx, saved_cs);
> > + 
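
The grid sizing in the quoted patch rounds the block count up and records the remainder as the size of the last block. Below is a minimal sketch of that arithmetic only. The type and function names are hypothetical and not part of Mesa, and it assumes that a last-block value of 0 is treated as "the last block is full", which is how the patch appears to rely on it when the width is an exact multiple of the block size.

```c
#include <assert.h>

/* Hypothetical helper type, not part of Mesa: one dimension of a dispatch. */
struct dispatch_dim {
   unsigned grid;       /* number of thread blocks launched */
   unsigned last_block; /* threads kept in the final block; 0 = full block */
};

/* Split `size` work items into blocks of `block` threads. */
static struct dispatch_dim
split_dimension(unsigned size, unsigned block)
{
   assert(block > 0);

   struct dispatch_dim d;
   d.grid = (size + block - 1) / block; /* same result as DIV_ROUND_UP */
   d.last_block = size % block;         /* 0 when size divides evenly */
   return d;
}

/* Number of threads that actually run across all blocks of one dimension,
 * under the assumption stated above about last_block == 0. */
static unsigned
covered_threads(struct dispatch_dim d, unsigned block)
{
   if (d.grid == 0)
      return 0;

   unsigned last = d.last_block ? d.last_block : block;
   return (d.grid - 1) * block + last;
}
```

With a width of 100 and 8-wide blocks this gives 13 blocks with 4 threads kept in the last one, so exactly 100 threads do work.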

Re: [Mesa-dev] [PATCH] nir/lower_tex: Fix the channel ordering during conversion of AYUV images

2019-01-15 Thread Vivek Kasireddy
On Tue, 15 Jan 2019 02:34:08 +
Lionel Landwerlin  wrote:

> When writing this I used this page to figure the bytes' ordering : 
> https://docs.microsoft.com/en-us/windows/desktop/medfound/recommended-8-bit-yuv-formats-for-video-rendering#ayuv
> Of course endianness confuses everything :(
> 
> sunxi seems to support AYUV & VUYA : 
> https://github.com/allwinner-zh/linux-3.4-sunxi/blob/master/include/video/sunxi_display2.h#L40
> 
> Finally this patch (and its gstreamer comments) confuses me even
> more : https://patchwork.freedesktop.org/patch/255529/
> 
> I really don't know what's right or wrong here...
> 
> -
> Lionel
Hi Lionel,
I am in the same boat as you; however, I think you may be right. I was
looking at this page:
https://linuxtv.org/downloads/v4l-dvb-apis/uapi/v4l/pixfmt-packed-yuv.html

and assumed that the format it refers to as V4L2_PIX_FMT_YUV32 with
code YUV4 is the same as AYUV. I am guessing the best way to fix this
is to add a new AYUV format to V4L that reverses the channel ordering?

-Vivek

> 
> On 15/01/2019 00:49, Vivek Kasireddy wrote:
> > From: "Kasireddy, Vivek" 
> >
> > The channel ordering should be 1230 instead of 2103.
> >
> > While displaying the packed YUV buffers generated by the Vivid
> > (Virtual Video) driver on Weston, it was observed that AYUV images
> > were not displayed correctly. Changing the ordering to 1230 makes
> > AYUV buffers display as expected.
> >
> > CC: Lionel Landwerlin 
> > CC: Tapani Palli 
> > Signed-off-by: Vivek Kasireddy 
> > ---
> >   src/compiler/nir/nir_lower_tex.c | 6 +++---
> >   1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/src/compiler/nir/nir_lower_tex.c
> > b/src/compiler/nir/nir_lower_tex.c index a618b86b34c..7058c54f17c
> > 100644 --- a/src/compiler/nir/nir_lower_tex.c
> > +++ b/src/compiler/nir/nir_lower_tex.c
> > @@ -434,10 +434,10 @@ lower_ayuv_external(nir_builder *b,
> > nir_tex_instr *tex) nir_ssa_def *ayuv = sample_plane(b, tex, 0);
> >   
> > convert_yuv_to_rgb(b, tex,
> > - nir_channel(b, ayuv, 2),
> >nir_channel(b, ayuv, 1),
> > - nir_channel(b, ayuv, 0),
> > - nir_channel(b, ayuv, 3));
> > + nir_channel(b, ayuv, 2),
> > + nir_channel(b, ayuv, 3),
> > + nir_channel(b, ayuv, 0));
> >   }
> >   
> >   /*  
> 
> 
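
The endianness confusion mentioned above can be illustrated with a small sketch. It only shows the arithmetic: a format named after the order of channels in a 32-bit word, most significant byte first, appears reversed when the word is stored on a little-endian machine. The function names are hypothetical, and the sketch takes no position on which convention V4L2, DRM or the Microsoft page actually uses for AYUV, since that is what the thread is trying to establish.

```c
#include <assert.h>
#include <stdint.h>

/* Pack four 8-bit channels into one 32-bit word, with the first argument
 * in the most significant byte. */
static uint32_t
pack_msb_first(uint8_t c3, uint8_t c2, uint8_t c1, uint8_t c0)
{
   return ((uint32_t)c3 << 24) | ((uint32_t)c2 << 16) |
          ((uint32_t)c1 << 8) | (uint32_t)c0;
}

/* Write the word into `out` the way a little-endian machine would store it:
 * least significant byte at the lowest address. Shifts are used instead of
 * memcpy so the result is the same on any host. */
static void
store_little_endian(uint32_t word, uint8_t out[4])
{
   assert(out);

   for (unsigned i = 0; i < 4; i++)
      out[i] = (uint8_t)(word >> (8 * i));
}
```

Packing the channels as A, Y, U, V (word order) and storing the word little-endian yields the bytes V, U, Y, A in memory, so the same data can reasonably be called either "AYUV" or "VUYA" depending on the naming convention.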



Re: [Mesa-dev] Thoughts after hitting 100 merge requests?

2019-01-15 Thread Daniel Stone
Hey,

On Tue, 15 Jan 2019 at 20:22, Rob Clark  wrote:
> On Tue, Jan 15, 2019 at 7:40 AM Daniel Stone  wrote:
> > My question would again be what value that brings you. Do you just
> > like seeing the name there, or do you go poke the people on IRC, or
> > follow up via email, or ... ? Again I personally go look through the
> > original review to see what came up during that first, but everyone's
> > different, so I'm just trying to understand what you actually do with
> > that information, so we can figure out if there's a better way to do
> > things for everyone rather than just blindly imitating what came
> > before.
>
> If I am curious or have some questions about why some code is the way
> it is I frequently use tig-blame, which makes it easy to step into the
> commit that made the change and see the commit msg and r-b tags..  I
> guess the most important part if I need to ping someone on IRC w/
> questions is the author, but it seems like having the other tags handy
> without context-switching to browser/gitlab is useful.
>
> I guess I don't as frequently dig into the history of the original
> patchset and it's review comments.. mostly because that isn't as easy
> with the email based review process.  Making this easier would defn be
> a win.  But in cases where I don't have to leave the comfort of tig,
> it would be nice not to have to start doing so..
>
> This is not an argument for sticking to email based process, just
> defence of what I think would be a useful feature for gitlab to gain
> ;-)

Thanks, that helps. How about this? It technically even fits in one
line, though you might wish it didn't.

~/mesa/mesa master ← → * % export GITLAB_TOKEN=secret-api-token-you-get-from-web-UI
~/mesa/mesa master ← → * % export GITLAB_COMMIT=f967273fb442de8281f8248e8c8bff5b13ab89e4
~/mesa/mesa master ← → * % curl --silent --header "PRIVATE-TOKEN: $GITLAB_TOKEN" \
    https://gitlab.freedesktop.org/api/v4/projects/mesa%2Fmesa/merge_requests/$(curl --silent --header "PRIVATE-TOKEN: $GITLAB_TOKEN" \
        https://gitlab.freedesktop.org/api/v4/projects/mesa%2Fmesa/repository/commits/${GITLAB_COMMIT}/merge_requests \
        | jq -r '.[] | .iid')/participants \
    | jq -r '.[] | { username: .username, realname: .name }'
{
  "username": "sroland",
  "realname": "Roland Scheidegger"
}
{
  "username": "kwg",
  "realname": "Kenneth Graunke"
}
{
  "username": "mareko",
  "realname": "Marek Olšák"
}
{
  "username": "tpalli",
  "realname": "Tapani Pälli"
}

> (Also, I suppose preserving those artifacts of "the old process" is
> probably useful for folks who run git statistics, although personally
> that does not affect me.)

[mumbles something about GDPR]

Cheers,
Daniel


Re: [Mesa-dev] [PATCH] radeonsi: use compute for resource_copy_region when possible

2019-01-15 Thread Axel Davy

On 15/01/2019 18:50, Marek Olšák wrote:
  
+void si_compute_copy_image(struct si_context *sctx,

+  struct pipe_resource *dst,
+  unsigned dst_level,
+  struct pipe_resource *src,
+  unsigned src_level,
+  unsigned dstx, unsigned dsty, unsigned dstz,
+  const struct pipe_box *src_box)
+{
+   struct pipe_context *ctx = &sctx->b;
+   unsigned width = src_box->width;
+   unsigned height = src_box->height;
+   unsigned depth = src_box->depth;
+
+   unsigned data[] = {src_box->x, src_box->y, src_box->z, 0, dstx, dsty, 
dstz, 0};
+
+   if (width == 0 || height == 0)
+   return;
+
+   sctx->flags |= SI_CONTEXT_CS_PARTIAL_FLUSH |
+  si_get_flush_flags(sctx, SI_COHERENCY_SHADER, L2_STREAM);
+   si_make_CB_shader_coherent(sctx, dst->nr_samples, true);
+
+   struct pipe_constant_buffer saved_cb = {};
+   si_get_pipe_constant_buffer(sctx, PIPE_SHADER_COMPUTE, 0, &saved_cb);
+
+   struct si_images *images = &sctx->images[PIPE_SHADER_COMPUTE];
+   struct pipe_image_view saved_image[2] = {0};
+   util_copy_image_view(&saved_image[0], &images->views[0]);
+   util_copy_image_view(&saved_image[1], &images->views[1]);
+
+   void *saved_cs = sctx->cs_shader_state.program;
+
+   struct pipe_constant_buffer cb = {};
+   cb.buffer_size = sizeof(data);
+   cb.user_buffer = data;
+   ctx->set_constant_buffer(ctx, PIPE_SHADER_COMPUTE, 0, &cb);
+
+   struct pipe_image_view image[2] = {0};
+   image[0].resource = src;
+   image[0].shader_access = image[0].access = PIPE_IMAGE_ACCESS_READ;
+   image[0].format = util_format_linear(src->format);
+   image[0].u.tex.level = src_level;
+   image[0].u.tex.first_layer = 0;
+   image[0].u.tex.last_layer =
+   src->target == PIPE_TEXTURE_3D ? u_minify(src->depth0, 
src_level) - 1
+   : (unsigned)(src->array_size - 
1);
+   image[1].resource = dst;
+   image[1].shader_access = image[1].access = PIPE_IMAGE_ACCESS_WRITE;
+   image[1].format = util_format_linear(dst->format);
+   image[1].u.tex.level = dst_level;
+   image[1].u.tex.first_layer = 0;
+   image[1].u.tex.last_layer =
+   dst->target == PIPE_TEXTURE_3D ? u_minify(dst->depth0, 
dst_level) - 1
+   : (unsigned)(dst->array_size - 
1);
+
+   if (src->format == PIPE_FORMAT_R9G9B9E5_FLOAT)
+   image[0].format = image[1].format = PIPE_FORMAT_R32_UINT;
+
+   /* SNORM8 blitting has precision issues on some chips. Use the SINT
+* equivalent instead, which doesn't force DCC decompression.
+* Note that some chips avoid this issue by using SDMA.
+*/
+   if (util_format_is_snorm8(dst->format)) {
+   image[0].format = image[1].format =
+   util_format_snorm8_to_sint8(dst->format);
+   }
+
+   ctx->set_shader_images(ctx, PIPE_SHADER_COMPUTE, 0, 2, image);
+
+   struct pipe_grid_info info = {0};
+
+   if (dst->target == PIPE_TEXTURE_1D_ARRAY && src->target == 
PIPE_TEXTURE_1D_ARRAY) {
+   if (!sctx->cs_copy_image_1d_array)
+   sctx->cs_copy_image_1d_array =
+   
si_create_copy_image_compute_shader_1d_array(ctx);
+   ctx->bind_compute_state(ctx, sctx->cs_copy_image_1d_array);
+   info.block[0] = 64;
+   info.last_block[0] = width % 64;
+   info.block[1] = 1;
+   info.block[2] = 1;
+   info.grid[0] = DIV_ROUND_UP(width, 64);
+   info.grid[1] = depth;
+   info.grid[2] = 1;
+   } else {
+   if (!sctx->cs_copy_image)
+   sctx->cs_copy_image = 
si_create_copy_image_compute_shader(ctx);
+   ctx->bind_compute_state(ctx, sctx->cs_copy_image);
+   info.block[0] = 8;
+   info.last_block[0] = width % 8;
+   info.block[1] = 8;
+   info.last_block[1] = height % 8;
+   info.block[2] = 1;
+   info.grid[0] = DIV_ROUND_UP(width, 8);
+   info.grid[1] = DIV_ROUND_UP(height, 8);
+   info.grid[2] = depth;
+   }
+
+   ctx->launch_grid(ctx, &info);
+
+   sctx->flags |= SI_CONTEXT_CS_PARTIAL_FLUSH |
+  (sctx->chip_class <= VI ? SI_CONTEXT_WRITEBACK_GLOBAL_L2 
: 0) |
+  si_get_flush_flags(sctx, SI_COHERENCY_SHADER, L2_STREAM);
+   ctx->bind_compute_state(ctx, saved_cs);
+   ctx->set_shader_images(ctx, PIPE_SHADER_COMPUTE, 0, 2, saved_image);
+   ctx->set_constant_buffer(ctx, PIPE_SHADER_COMPUTE, 0, &saved_cb);
+}
+



+void *si_create_copy_image_compute_shader(struct pipe_context *ctx)
+{
+   static const char text[] =
+   "COMP\n"
+   "PROPERTY 

Re: [Mesa-dev] [PATCH] mesa/main: return GL_BGRA as the preferred color read format more often

2019-01-15 Thread Marek Olšák
Reviewed-by: Marek Olšák 

Marek

On Sun, Jan 13, 2019 at 1:57 PM Ilia Mirkin  wrote:

> Currently we were only returning it for BGRA8. But it makes sense to
> return it for all the BGR[AX] variants. This was discovered when
> figuring out why wlroots was sending a RGBX instead of BGRX image when
> screenshotting.
>
> Signed-off-by: Ilia Mirkin 
> ---
>
> Not sure that this can really matter for performance -- I guess in some
> cases it means we can memcpy a mapped RB. But we might also fall off
> some optimized imageBuffer pbo download path. On average, probably
> "meh", but seems more consistent.
>
>  src/mesa/main/framebuffer.c | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/src/mesa/main/framebuffer.c b/src/mesa/main/framebuffer.c
> index 10dd2fde446..09059a11cbd 100644
> --- a/src/mesa/main/framebuffer.c
> +++ b/src/mesa/main/framebuffer.c
> @@ -850,6 +850,11 @@ _mesa_get_color_read_format(struct gl_context *ctx,
>case MESA_FORMAT_RGBA_UINT8:
>   return GL_RGBA_INTEGER;
>case MESA_FORMAT_B8G8R8A8_UNORM:
> +  case MESA_FORMAT_B8G8R8X8_UNORM:
> +  case MESA_FORMAT_B5G5R5A1_UNORM:
> +  case MESA_FORMAT_B5G5R5X1_UNORM:
> +  case MESA_FORMAT_B4G4R4A4_UNORM:
> +  case MESA_FORMAT_B4G4R4X4_UNORM:
>   return GL_BGRA;
>case MESA_FORMAT_B5G6R5_UNORM:
>case MESA_FORMAT_R11G11B10_FLOAT:
> --
> 2.19.2
>


Re: [Mesa-dev] [PATCH] nir/lower_tex: Fix the channel ordering during conversion of AYUV images

2019-01-15 Thread Vivek Kasireddy
On Tue, 15 Jan 2019 09:45:52 +0200
Tapani Pälli  wrote:

> On 1/15/19 4:34 AM, Lionel Landwerlin wrote:
> > When writing this I used this page to figure the bytes' ordering : 
> > https://docs.microsoft.com/en-us/windows/desktop/medfound/recommended-8-bit-yuv-formats-for-video-rendering#ayuv
> >  
> > 
> > Of course endianness confuses everything :(
> > 
> > sunxi seems to support AYUV & VUYA : 
> > https://github.com/allwinner-zh/linux-3.4-sunxi/blob/master/include/video/sunxi_display2.h#L40
> >  
> > 
> > 
> > Finally this patch (and its gstreamer comments) confuses me even
> > more : https://patchwork.freedesktop.org/patch/255529/
> > 
> > I really don't know what's right or wrong here...  
> 
> IMO order 1230 seems wrong to me. Vivek, was the order chosen just 
> because vivid driver outputs that or is it based on anything else,
> like some specification or other information?
Hi Tapani,
I chose the order just by looking at the Vivid driver output. YUYV and
VYUY buffers generated by the Vivid driver were displayed correctly on
Weston and I believed that YUV4 buffers (which I assumed were AYUV)
should have displayed correctly as well. 

-Vivek

> 
> 
> > -
> > Lionel
> > 
> > On 15/01/2019 00:49, Vivek Kasireddy wrote:  
> >> From: "Kasireddy, Vivek" 
> >>
> >> The channel ordering should be 1230 instead of 2103.
> >>
> >> While displaying the packed YUV buffers generated by the Vivid
> >> (Virtual Video) driver on Weston, it was observed that AYUV images
> >> were not displayed correctly. Changing the ordering to 1230 makes
> >> AYUV buffers display as expected.
> >>
> >> CC: Lionel Landwerlin 
> >> CC: Tapani Palli 
> >> Signed-off-by: Vivek Kasireddy 
> >> ---
> >>   src/compiler/nir/nir_lower_tex.c | 6 +++---
> >>   1 file changed, 3 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/src/compiler/nir/nir_lower_tex.c 
> >> b/src/compiler/nir/nir_lower_tex.c
> >> index a618b86b34c..7058c54f17c 100644
> >> --- a/src/compiler/nir/nir_lower_tex.c
> >> +++ b/src/compiler/nir/nir_lower_tex.c
> >> @@ -434,10 +434,10 @@ lower_ayuv_external(nir_builder *b, 
> >> nir_tex_instr *tex)
> >>     nir_ssa_def *ayuv = sample_plane(b, tex, 0);
> >>     convert_yuv_to_rgb(b, tex,
> >> - nir_channel(b, ayuv, 2),
> >>    nir_channel(b, ayuv, 1),
> >> - nir_channel(b, ayuv, 0),
> >> - nir_channel(b, ayuv, 3));
> >> + nir_channel(b, ayuv, 2),
> >> + nir_channel(b, ayuv, 3),
> >> + nir_channel(b, ayuv, 0));
> >>   }
> >>   /*  
> > 
> >   



[Mesa-dev] [PATCH] radv: prevent dirtying of dynamic state when it does not change

2019-01-15 Thread Rhys Perry
DXVK often sets dynamic state without actually changing it.

Signed-off-by: Rhys Perry 
---
 src/amd/vulkan/radv_cmd_buffer.c | 92 ++--
 1 file changed, 76 insertions(+), 16 deletions(-)

diff --git a/src/amd/vulkan/radv_cmd_buffer.c b/src/amd/vulkan/radv_cmd_buffer.c
index 59903ab64d8..56b3c934c2e 100644
--- a/src/amd/vulkan/radv_cmd_buffer.c
+++ b/src/amd/vulkan/radv_cmd_buffer.c
@@ -2965,6 +2965,11 @@ void radv_CmdSetViewport(
assert(firstViewport < MAX_VIEWPORTS);
assert(total_count >= 1 && total_count <= MAX_VIEWPORTS);
 
+   if (!memcmp(state->dynamic.viewport.viewports + firstViewport,
+   pViewports, viewportCount * sizeof(*pViewports))) {
+   return;
+   }
+
memcpy(state->dynamic.viewport.viewports + firstViewport, pViewports,
   viewportCount * sizeof(*pViewports));
 
@@ -2984,6 +2989,11 @@ void radv_CmdSetScissor(
assert(firstScissor < MAX_SCISSORS);
assert(total_count >= 1 && total_count <= MAX_SCISSORS);
 
+   if (!memcmp(state->dynamic.scissor.scissors + firstScissor, pScissors,
+   scissorCount * sizeof(*pScissors))) {
+   return;
+   }
+
memcpy(state->dynamic.scissor.scissors + firstScissor, pScissors,
   scissorCount * sizeof(*pScissors));
 
@@ -2995,6 +3005,10 @@ void radv_CmdSetLineWidth(
float   lineWidth)
 {
RADV_FROM_HANDLE(radv_cmd_buffer, cmd_buffer, commandBuffer);
+
+   if (cmd_buffer->state.dynamic.line_width == lineWidth)
+   return;
+
cmd_buffer->state.dynamic.line_width = lineWidth;
cmd_buffer->state.dirty |= RADV_CMD_DIRTY_DYNAMIC_LINE_WIDTH;
 }
@@ -3006,12 +3020,19 @@ void radv_CmdSetDepthBias(
float   depthBiasSlopeFactor)
 {
RADV_FROM_HANDLE(radv_cmd_buffer, cmd_buffer, commandBuffer);
+   struct radv_cmd_state *state = &cmd_buffer->state;
 
-   cmd_buffer->state.dynamic.depth_bias.bias = depthBiasConstantFactor;
-   cmd_buffer->state.dynamic.depth_bias.clamp = depthBiasClamp;
-   cmd_buffer->state.dynamic.depth_bias.slope = depthBiasSlopeFactor;
+   if (state->dynamic.depth_bias.bias == depthBiasConstantFactor &&
+   state->dynamic.depth_bias.clamp == depthBiasClamp &&
+   state->dynamic.depth_bias.slope == depthBiasSlopeFactor) {
+   return;
+   }
 
-   cmd_buffer->state.dirty |= RADV_CMD_DIRTY_DYNAMIC_DEPTH_BIAS;
+   state->dynamic.depth_bias.bias = depthBiasConstantFactor;
+   state->dynamic.depth_bias.clamp = depthBiasClamp;
+   state->dynamic.depth_bias.slope = depthBiasSlopeFactor;
+
+   state->dirty |= RADV_CMD_DIRTY_DYNAMIC_DEPTH_BIAS;
 }
 
 void radv_CmdSetBlendConstants(
@@ -3019,11 +3040,14 @@ void radv_CmdSetBlendConstants(
const float blendConstants[4])
 {
RADV_FROM_HANDLE(radv_cmd_buffer, cmd_buffer, commandBuffer);
+   struct radv_cmd_state *state = &cmd_buffer->state;
 
-   memcpy(cmd_buffer->state.dynamic.blend_constants,
-  blendConstants, sizeof(float) * 4);
+   if (!memcmp(state->dynamic.blend_constants, blendConstants, 
sizeof(float) * 4))
+   return;
+
+   memcpy(state->dynamic.blend_constants, blendConstants, sizeof(float) * 
4);
 
-   cmd_buffer->state.dirty |= RADV_CMD_DIRTY_DYNAMIC_BLEND_CONSTANTS;
+   state->dirty |= RADV_CMD_DIRTY_DYNAMIC_BLEND_CONSTANTS;
 }
 
 void radv_CmdSetDepthBounds(
@@ -3032,11 +3056,17 @@ void radv_CmdSetDepthBounds(
float   maxDepthBounds)
 {
RADV_FROM_HANDLE(radv_cmd_buffer, cmd_buffer, commandBuffer);
+   struct radv_cmd_state *state = &cmd_buffer->state;
 
-   cmd_buffer->state.dynamic.depth_bounds.min = minDepthBounds;
-   cmd_buffer->state.dynamic.depth_bounds.max = maxDepthBounds;
+   if (state->dynamic.depth_bounds.min == minDepthBounds &&
+   state->dynamic.depth_bounds.max == maxDepthBounds) {
+   return;
+   }
+
+   state->dynamic.depth_bounds.min = minDepthBounds;
+   state->dynamic.depth_bounds.max = maxDepthBounds;
 
-   cmd_buffer->state.dirty |= RADV_CMD_DIRTY_DYNAMIC_DEPTH_BOUNDS;
+   state->dirty |= RADV_CMD_DIRTY_DYNAMIC_DEPTH_BOUNDS;
 }
 
 void radv_CmdSetStencilCompareMask(
@@ -3045,13 +3075,21 @@ void radv_CmdSetStencilCompareMask(
uint32_tcompareMask)
 {
RADV_FROM_HANDLE(radv_cmd_buffer, cmd_buffer, commandBuffer);
+   struct radv_cmd_state *state = &cmd_buffer->state;
+   bool front_same = state->dynamic.stencil_compare_mask.front == 
compareMask;
+   bool back_same = state->dynamic.stencil_compare_mask.back == 
compareMask;
+
+   if ((!(faceMask & VK_STENCIL_FACE_FRONT_BIT) || front_same) &&
+   (!(faceMask & VK_STENCIL_FACE_BACK_BIT) || back_same)) {
+   
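
The pattern used throughout the patch above (compare the incoming values with the stored ones and return early if nothing changed) can be sketched in isolation as follows. This is a toy model, not radv code: the struct, the limit and the dirty bit are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_ITEMS   16        /* hypothetical limit, for illustration only */
#define DIRTY_ITEMS (1u << 0) /* hypothetical dirty bit */

struct toy_state {
   float items[MAX_ITEMS];
   unsigned dirty;
};

/* Store `count` values starting at index `first`. Returns true and sets the
 * dirty bit only if the stored values actually changed. */
static bool
toy_set_items(struct toy_state *state, unsigned first, unsigned count,
              const float *values)
{
   assert(first + count <= MAX_ITEMS);

   /* Bitwise comparison: identical bytes mean there is nothing to do. */
   if (!memcmp(state->items + first, values, count * sizeof(*values)))
      return false;

   memcpy(state->items + first, values, count * sizeof(*values));
   state->dirty |= DIRTY_ITEMS;
   return true;
}
```

Note that memcmp compares bit patterns, so 0.0f and -0.0f count as different here. In this sketch that only costs a redundant dirty flag and never skips a real change.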

Re: [Mesa-dev] [PATCH v3 01/42] intel/compiler: handle conversions between int and half-float on atom

2019-01-15 Thread Francisco Jerez
Iago Toral Quiroga  writes:

> v2: adapted to work with the new regioning lowering pass
>
> Reviewed-by: Topi Pohjolainen  (v1)
> ---
>  src/intel/compiler/brw_ir_fs.h | 33 ++---
>  1 file changed, 26 insertions(+), 7 deletions(-)
>
> diff --git a/src/intel/compiler/brw_ir_fs.h b/src/intel/compiler/brw_ir_fs.h
> index 3c23fb375e4..ba4d6a95720 100644
> --- a/src/intel/compiler/brw_ir_fs.h
> +++ b/src/intel/compiler/brw_ir_fs.h
> @@ -497,9 +497,10 @@ is_unordered(const fs_inst *inst)
>  }
>  
>  /**
> - * Return whether the following regioning restriction applies to the 
> specified
> - * instruction.  From the Cherryview PRM Vol 7. "Register Region
> - * Restrictions":
> + * Return whether one of the following regioning restrictions apply to 
> the
> + * specified instruction.
> + *
> + * From the Cherryview PRM Vol 7. "Register Region Restrictions":
>   *
>   * "When source or destination datatype is 64b or operation is integer DWord
>   *  multiply, regioning in Align1 must follow these rules:
> @@ -508,6 +509,14 @@ is_unordered(const fs_inst *inst)
>   *  2. Regioning must ensure Src.Vstride = Src.Width * Src.Hstride.
>   *  3. Source and Destination offset must be the same, except the case of
>   * scalar source."
> + *
> + * From the Cherryview PRM Vol 7. "Register Region Restrictions":
> + *
> + *"Conversion between Integer and HF (Half Float) must be DWord
> + * aligned and strided by a DWord on the destination."
> + *
> + *The same restriction is listed for other hardware platforms, however,
> + *empirical testing suggests that only atom platforms are affected.
>   */
>  static inline bool
>  has_dst_aligned_region_restriction(const gen_device_info *devinfo,
> @@ -518,10 +527,20 @@ has_dst_aligned_region_restriction(const 
> gen_device_info *devinfo,
>   (inst->opcode == BRW_OPCODE_MUL || inst->opcode == BRW_OPCODE_MAD);
>  
> if (type_sz(inst->dst.type) > 4 || type_sz(exec_type) > 4 ||
> -   (type_sz(exec_type) == 4 && is_int_multiply))
> -  return devinfo->is_cherryview || gen_device_info_is_9lp(devinfo);
> -   else
> -  return false;
> +   (type_sz(exec_type) == 4 && is_int_multiply)) {
> +  if (devinfo->is_cherryview || gen_device_info_is_9lp(devinfo))
> + return true;
> +   }
> +
> +   const bool dst_type_is_hf = inst->dst.type == BRW_REGISTER_TYPE_HF;
> +   const bool exec_type_is_hf = exec_type == BRW_REGISTER_TYPE_HF;
> +   if ((dst_type_is_hf && !brw_reg_type_is_floating_point(exec_type)) ||
> +   (exec_type_is_hf && !brw_reg_type_is_floating_point(inst->dst.type))) 
> {
> +  if (devinfo->is_cherryview || gen_device_info_is_9lp(devinfo))
> + return true;
> +   }

While looking into this closely, I'm seeing substantial divergence
between the behavior of the simulator, the hardware docs, and the
restriction this is implementing...  The docs are certainly inconsistent
about how and where this should be handled.

I'm suspecting that this restriction is more similar in nature to the
one referred to in the regioning lowering pass as
"is_narrowing_conversion", rather than the one handled by
has_dst_aligned_region_restriction().  Probably we don't need to change
this function nor the regioning pass for it to be honored, because that
restriction is already implemented.  I have a feeling that the reason
for this may be that the 16-bit pipeline lacks the ability to handle
conversions from or to half-float, so the execution type is implicitly
promoted to the matching (integer or floating-point) 32-bit type where
any HF conversion would be needed.  And on those the usual alignment
restriction of the destination to a larger execution type applies.  From
the hardware docs for CHV *only*:

| When single precision and half precision floats are mixed between
| source operands or between source and destination operand. In such
| cases, single precision float is the execution datatype.

This would mean that an "add dst:f, src:hf, src:hf" is really computed
with single precision (!).

The restriction you're quoting seems to be the following:

| BDW+
|
| Conversion between Integer and HF (Half Float) must be DWord-aligned
| and strided by a DWord on the destination.
|
| // Example:
| add (8) r10.0<2>:hf r11.0<8;8,1>:w r12.0<8;8,1>:w
| // Destination stride must be 2.
| mov (8) r10.0<2>:w r11.0<8;8,1>:hf
| // Destination stride must be 2.
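
Read literally, the BDW+ rule quoted above constrains the byte offset and
byte stride of the destination. A minimal sketch of that check, assuming
the stride is counted in elements as in the <2> region syntax; the helper
name and its parameters are hypothetical, not anything in Mesa:

```c
#include <stdbool.h>

/* Hypothetical helper, not part of Mesa: returns true if a destination
 * region starts on a DWord boundary and advances one DWord per channel.
 *
 * subreg_offset_bytes: byte offset of the first element in the register
 * hstride_elems:       horizontal stride in elements (the N in r10.0<N>)
 * type_size_bytes:     element size in bytes, 2 for :hf and :w
 */
static bool
dst_is_dword_aligned_and_strided(unsigned subreg_offset_bytes,
                                 unsigned hstride_elems,
                                 unsigned type_size_bytes)
{
   return subreg_offset_bytes % 4 == 0 &&
          hstride_elems * type_size_bytes == 4;
}
```

With these inputs, the r10.0<2>:hf destination from the doc example maps
to (0, 2, 2) and passes, while a packed r10.0<1>:hf destination maps to
(0, 1, 2) and fails.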

However that restriction is apparently overridden on *most* projects
except for BDW (where you aren't applying any restriction at all) by the
following:

| Project:  CHV, SKL+
|
| There is a relaxed alignment rule for word destinations. When the
| destination type is word (UW, W, HF), destination data types can be
| aligned to either the lowest word or the second lowest word of the
| execution channel. This means the destination data words can be either
| all in the even word locations or all in the odd word locations.
| 
| // Example:
| add (8)  r10.0<2>:hf 

Re: [Mesa-dev] Thoughts after hitting 100 merge requests?

2019-01-15 Thread Matt Turner
On Mon, Jan 14, 2019 at 4:36 AM Daniel Stone  wrote:
>
> Hi,
>
> On Fri, 11 Jan 2019 at 17:05, Jason Ekstrand  wrote:
> >  5. There's no way with gitlab for Reviewed-by tags to get automatically 
> > applied as part of the merging process.  This makes merging a bit more 
> > manual than it needs to be but is really no worse than it was before.
>
> I'm still on the side of not seeing the value in them.

Reviewed-by tags are useful for measuring the quantity of patch review
people do (which is useful in a corporate environment...). It's often
a thankless task that's valued much lower than first order
contributions, so having a way to at least quantify patch reviews
shows that people are spending their time to help others contribute.

The number of R-b tags is not a 100% accurate picture of the
situation, but it gives at least a good overview of who is doing the
tedious work of patch review. For instance, in 2018 the top reviewers
are

620 Bas Nieuwenhuizen 
530 Marek Olšák 
505 Jason Ekstrand 
452 Kenneth Graunke 

If my name were in there, it would definitely be something I put on my
yearly review.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] intel/fs: Do the grf127 hack on SIMD8 instructions in SIMD16 mode

2019-01-15 Thread Matt Turner
On Tue, Jan 15, 2019 at 8:58 AM Jason Ekstrand  wrote:
>
> Previously, we only applied the fix to shaders with a dispatch mode of
> SIMD8 but the code it relies on for SIMD16 mode only applies to SIMD16
> instructions.  If you have a SIMD8 instruction in a SIMD16 shader,
> neither would trigger and the restriction could still be hit.
>
> Cc: Jose Maria Casanova Crespo 
> Fixes: 232ed8980217dd "i965/fs: Register allocator shoudn't use grf127..."
> ---
>  src/intel/compiler/brw_fs_reg_allocate.cpp | 13 ++---
>  1 file changed, 6 insertions(+), 7 deletions(-)
>
> diff --git a/src/intel/compiler/brw_fs_reg_allocate.cpp 
> b/src/intel/compiler/brw_fs_reg_allocate.cpp
> index 5db5242452e..ec743f9b5bf 100644
> --- a/src/intel/compiler/brw_fs_reg_allocate.cpp.
> +++ b/src/intel/compiler/brw_fs_reg_allocate.cpp
> @@ -667,15 +667,14 @@ fs_visitor::assign_regs(bool allow_spilling, bool 
> spill_all)
> * messages adding a node interference to the grf127_send_hack_node.
> * This node has a fixed asignment to grf127.
> *
> -   * We don't apply it to SIMD16 because previous code avoids any 
> register
> -   * overlap between sources and destination.
> +   * We don't apply it to SIMD16 instructions because previous code 
> avoids
> +   * any register overlap between sources and destination.
> */
>ra_set_node_reg(g, grf127_send_hack_node, 127);
> -  if (dispatch_width == 8) {
> - foreach_block_and_inst(block, fs_inst, inst, cfg) {
> -if (inst->is_send_from_grf() && inst->dst.file == VGRF)
> -   ra_add_node_interference(g, inst->dst.nr, 
> grf127_send_hack_node);
> - }
> +  foreach_block_and_inst(block, fs_inst, inst, cfg) {
> + if (inst->exec_size < 16 && inst->is_send_from_grf() &&
> + inst->dst.file == VGRF)
> +ra_add_node_interference(g, inst->dst.nr, grf127_send_hack_node);
>}
>

Did the code in brw_eu_validate.c catch the case you found?

In fact, that code looks wrong:

|  (brw_inst_dst_da_reg_nr(devinfo, inst) +
|   brw_inst_rlen(devinfo, inst) > 127) &&

I think > should be >=. And maybe we should have a separate case
earlier that checks that dst_nr+rlen actually fits in registers, and
then change > to just ==. FFS :(

Not sure what I was thinking letting that patch through without a unit
test. I'll do that.
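
The two-step check suggested above could be sketched roughly as follows.
This assumes a response of rlen registers starting at dst_nr covers the
half-open range [dst_nr, dst_nr + rlen) and that the register file is
r0..r127; the helper names are hypothetical and this is not the
brw_eu_validate.c code, so which constant the real comparison needs
depends on how the validator counts:

```c
#include <stdbool.h>

#define GRF_COUNT 128 /* assumption: registers r0..r127 */

/* Hypothetical helpers, not the brw_eu_validate.c code. */

/* True if a write of rlen registers starting at dst_nr stays inside
 * the register file. */
static bool
dst_fits_in_grf(unsigned dst_nr, unsigned rlen)
{
   return dst_nr + rlen <= GRF_COUNT;
}

/* True if the write is non-empty and its last register is r127. */
static bool
dst_ends_at_grf127(unsigned dst_nr, unsigned rlen)
{
   return rlen > 0 && dst_nr + rlen == GRF_COUNT;
}
```

Checking the fit first means the second test can be an equality, since
any write that passes it and touches r127 must end exactly there.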


[Mesa-dev] [PATCH 1/2] radeonsi: merge & rename texture BO metadata functions

2019-01-15 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/drivers/radeonsi/si_texture.c | 117 ++
 1 file changed, 53 insertions(+), 64 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_texture.c 
b/src/gallium/drivers/radeonsi/si_texture.c
index c169d4e443d..a56674b6000 100644
--- a/src/gallium/drivers/radeonsi/si_texture.c
+++ b/src/gallium/drivers/radeonsi/si_texture.c
@@ -328,51 +328,25 @@ static int si_init_surface(struct si_screen *sscreen,
((uint64_t)pitch * 
surface->u.legacy.level[0].nblk_y * bpe) / 4;
}
if (offset) {
for (i = 0; i < ARRAY_SIZE(surface->u.legacy.level); 
++i)
surface->u.legacy.level[i].offset += offset;
}
}
return 0;
 }
 
-static void si_texture_init_metadata(struct si_screen *sscreen,
-struct si_texture *tex,
-struct radeon_bo_metadata *metadata)
-{
-   struct radeon_surf *surface = &tex->surface;
-
-   memset(metadata, 0, sizeof(*metadata));
-
-   if (sscreen->info.chip_class >= GFX9) {
-   metadata->u.gfx9.swizzle_mode = 
surface->u.gfx9.surf.swizzle_mode;
-   } else {
-   metadata->u.legacy.microtile = surface->u.legacy.level[0].mode 
>= RADEON_SURF_MODE_1D ?
-  RADEON_LAYOUT_TILED : 
RADEON_LAYOUT_LINEAR;
-   metadata->u.legacy.macrotile = surface->u.legacy.level[0].mode 
>= RADEON_SURF_MODE_2D ?
-  RADEON_LAYOUT_TILED : 
RADEON_LAYOUT_LINEAR;
-   metadata->u.legacy.pipe_config = surface->u.legacy.pipe_config;
-   metadata->u.legacy.bankw = surface->u.legacy.bankw;
-   metadata->u.legacy.bankh = surface->u.legacy.bankh;
-   metadata->u.legacy.tile_split = surface->u.legacy.tile_split;
-   metadata->u.legacy.mtilea = surface->u.legacy.mtilea;
-   metadata->u.legacy.num_banks = surface->u.legacy.num_banks;
-   metadata->u.legacy.stride = surface->u.legacy.level[0].nblk_x * 
surface->bpe;
-   metadata->u.legacy.scanout = (surface->flags & 
RADEON_SURF_SCANOUT) != 0;
-   }
-}
-
-static void si_surface_import_metadata(struct si_screen *sscreen,
-  struct radeon_surf *surf,
-  struct radeon_bo_metadata *metadata,
-  enum radeon_surf_mode *array_mode,
-  bool *is_scanout)
+static void si_get_display_metadata(struct si_screen *sscreen,
+   struct radeon_surf *surf,
+   struct radeon_bo_metadata *metadata,
+   enum radeon_surf_mode *array_mode,
+   bool *is_scanout)
 {
if (sscreen->info.chip_class >= GFX9) {
if (metadata->u.gfx9.swizzle_mode > 0)
*array_mode = RADEON_SURF_MODE_2D;
else
*array_mode = RADEON_SURF_MODE_LINEAR_ALIGNED;
 
*is_scanout = metadata->u.gfx9.swizzle_mode == 0 ||
  metadata->u.gfx9.swizzle_mode % 4 == 2;
 
@@ -622,86 +596,106 @@ static void si_reallocate_texture_inplace(struct 
si_context *sctx,
si_texture_reference(&new_tex, NULL);
 
p_atomic_inc(&sctx->screen->dirty_tex_counter);
 }
 
 static uint32_t si_get_bo_metadata_word1(struct si_screen *sscreen)
 {
return (ATI_VENDOR_ID << 16) | sscreen->info.pci_id;
 }
 
-static void si_query_opaque_metadata(struct si_screen *sscreen,
-struct si_texture *tex,
-struct radeon_bo_metadata *md)
+static void si_set_tex_bo_metadata(struct si_screen *sscreen,
+  struct si_texture *tex)
 {
+   struct radeon_surf *surface = &tex->surface;
struct pipe_resource *res = &tex->buffer.b.b;
-   static const unsigned char swizzle[] = {
-   PIPE_SWIZZLE_X,
-   PIPE_SWIZZLE_Y,
-   PIPE_SWIZZLE_Z,
-   PIPE_SWIZZLE_W
-   };
-   uint32_t desc[8], i;
-   bool is_array = util_texture_is_array(res->target);
+   struct radeon_bo_metadata md;
 
-   if (!sscreen->info.has_bo_metadata)
-   return;
+   memset(&md, 0, sizeof(md));
+
+   if (sscreen->info.chip_class >= GFX9) {
+   md.u.gfx9.swizzle_mode = surface->u.gfx9.surf.swizzle_mode;
+   } else {
+   md.u.legacy.microtile = surface->u.legacy.level[0].mode >= 
RADEON_SURF_MODE_1D ?
+  RADEON_LAYOUT_TILED : 
RADEON_LAYOUT_LINEAR;
+   md.u.legacy.macrotile = surface->u.legacy.level[0].mode >= 
RADEON_SURF_MODE_2D ?
+  

Re: [Mesa-dev] [PATCH] radv: prevent dirtying of dynamic state when it does not change

2019-01-15 Thread Bas Nieuwenhuizen
On Tue, Jan 15, 2019 at 10:59 PM Rhys Perry  wrote:
>
> DXVK often sets dynamic state without actually changing it.
>
> Signed-off-by: Rhys Perry 
> ---
>  src/amd/vulkan/radv_cmd_buffer.c | 92 ++--
>  1 file changed, 76 insertions(+), 16 deletions(-)
>
> diff --git a/src/amd/vulkan/radv_cmd_buffer.c 
> b/src/amd/vulkan/radv_cmd_buffer.c
> index 59903ab64d8..56b3c934c2e 100644
> --- a/src/amd/vulkan/radv_cmd_buffer.c
> +++ b/src/amd/vulkan/radv_cmd_buffer.c
> @@ -2965,6 +2965,11 @@ void radv_CmdSetViewport(
> assert(firstViewport < MAX_VIEWPORTS);
> assert(total_count >= 1 && total_count <= MAX_VIEWPORTS);
>
> +   if (!memcmp(state->dynamic.viewport.viewports + firstViewport,
> +   pViewports, viewportCount * sizeof(*pViewports))) {
> +   return;
> +   }
> +
> memcpy(state->dynamic.viewport.viewports + firstViewport, pViewports,
>viewportCount * sizeof(*pViewports));
>
> @@ -2984,6 +2989,11 @@ void radv_CmdSetScissor(
> assert(firstScissor < MAX_SCISSORS);
> assert(total_count >= 1 && total_count <= MAX_SCISSORS);
>
> +   if (!memcmp(state->dynamic.scissor.scissors + firstScissor, pScissors,
> +   scissorCount * sizeof(*pScissors))) {
> +   return;
> +   }
> +
> memcpy(state->dynamic.scissor.scissors + firstScissor, pScissors,
>scissorCount * sizeof(*pScissors));
>
> @@ -2995,6 +3005,10 @@ void radv_CmdSetLineWidth(
> float   lineWidth)
>  {
> RADV_FROM_HANDLE(radv_cmd_buffer, cmd_buffer, commandBuffer);
> +
> +   if (cmd_buffer->state.dynamic.line_width == lineWidth)
> +   return;
> +
> cmd_buffer->state.dynamic.line_width = lineWidth;
> cmd_buffer->state.dirty |= RADV_CMD_DIRTY_DYNAMIC_LINE_WIDTH;
>  }
> @@ -3006,12 +3020,19 @@ void radv_CmdSetDepthBias(
> float   depthBiasSlopeFactor)
>  {
> RADV_FROM_HANDLE(radv_cmd_buffer, cmd_buffer, commandBuffer);
> +   struct radv_cmd_state *state = &cmd_buffer->state;
>
> -   cmd_buffer->state.dynamic.depth_bias.bias = depthBiasConstantFactor;
> -   cmd_buffer->state.dynamic.depth_bias.clamp = depthBiasClamp;
> -   cmd_buffer->state.dynamic.depth_bias.slope = depthBiasSlopeFactor;
> +   if (state->dynamic.depth_bias.bias == depthBiasConstantFactor &&
> +   state->dynamic.depth_bias.clamp == depthBiasClamp &&
> +   state->dynamic.depth_bias.slope == depthBiasSlopeFactor) {
> +   return;
> +   }
>
> -   cmd_buffer->state.dirty |= RADV_CMD_DIRTY_DYNAMIC_DEPTH_BIAS;
> +   state->dynamic.depth_bias.bias = depthBiasConstantFactor;
> +   state->dynamic.depth_bias.clamp = depthBiasClamp;
> +   state->dynamic.depth_bias.slope = depthBiasSlopeFactor;
> +
> +   state->dirty |= RADV_CMD_DIRTY_DYNAMIC_DEPTH_BIAS;
>  }
>
>  void radv_CmdSetBlendConstants(
> @@ -3019,11 +3040,14 @@ void radv_CmdSetBlendConstants(
> const float blendConstants[4])
>  {
> RADV_FROM_HANDLE(radv_cmd_buffer, cmd_buffer, commandBuffer);
> +   struct radv_cmd_state *state = &cmd_buffer->state;
>
> -   memcpy(cmd_buffer->state.dynamic.blend_constants,
> -  blendConstants, sizeof(float) * 4);
> +   if (!memcmp(state->dynamic.blend_constants, blendConstants, 
> sizeof(float) * 4))
> +   return;
> +
> +   memcpy(state->dynamic.blend_constants, blendConstants, sizeof(float) 
> * 4);
>
> -   cmd_buffer->state.dirty |= RADV_CMD_DIRTY_DYNAMIC_BLEND_CONSTANTS;
> +   state->dirty |= RADV_CMD_DIRTY_DYNAMIC_BLEND_CONSTANTS;
>  }
>
>  void radv_CmdSetDepthBounds(
> @@ -3032,11 +3056,17 @@ void radv_CmdSetDepthBounds(
> float   maxDepthBounds)
>  {
> RADV_FROM_HANDLE(radv_cmd_buffer, cmd_buffer, commandBuffer);
> +   struct radv_cmd_state *state = &cmd_buffer->state;
>
> -   cmd_buffer->state.dynamic.depth_bounds.min = minDepthBounds;
> -   cmd_buffer->state.dynamic.depth_bounds.max = maxDepthBounds;
> +   if (state->dynamic.depth_bounds.min == minDepthBounds &&
> +   state->dynamic.depth_bounds.max == maxDepthBounds) {
> +   return;
> +   }
> +
> +   state->dynamic.depth_bounds.min = minDepthBounds;
> +   state->dynamic.depth_bounds.max = maxDepthBounds;
>
> -   cmd_buffer->state.dirty |= RADV_CMD_DIRTY_DYNAMIC_DEPTH_BOUNDS;
> +   state->dirty |= RADV_CMD_DIRTY_DYNAMIC_DEPTH_BOUNDS;
>  }
>
>  void radv_CmdSetStencilCompareMask(
> @@ -3045,13 +3075,21 @@ void radv_CmdSetStencilCompareMask(
> uint32_tcompareMask)
>  {
> RADV_FROM_HANDLE(radv_cmd_buffer, cmd_buffer, commandBuffer);
> +   struct radv_cmd_state *state = &cmd_buffer->state;
> +   bool front_same = 

Re: [Mesa-dev] [PATCH] radv: prevent dirtying of dynamic state when it does not change

2019-01-15 Thread Rhys Perry
I misread some code and forgot to remove it.

It was always unrelated to this patch.

On Wed, 16 Jan 2019 at 00:22, Bas Nieuwenhuizen  
wrote:
>
> On Tue, Jan 15, 2019 at 10:59 PM Rhys Perry  wrote:
> >
> > DXVK often sets dynamic state without actually changing it.
> >
> > Signed-off-by: Rhys Perry 
> > ---
> >  src/amd/vulkan/radv_cmd_buffer.c | 92 ++--
> >  1 file changed, 76 insertions(+), 16 deletions(-)
> >
> > diff --git a/src/amd/vulkan/radv_cmd_buffer.c 
> > b/src/amd/vulkan/radv_cmd_buffer.c
> > index 59903ab64d8..56b3c934c2e 100644
> > --- a/src/amd/vulkan/radv_cmd_buffer.c
> > +++ b/src/amd/vulkan/radv_cmd_buffer.c
> > @@ -2965,6 +2965,11 @@ void radv_CmdSetViewport(
> > assert(firstViewport < MAX_VIEWPORTS);
> > assert(total_count >= 1 && total_count <= MAX_VIEWPORTS);
> >
> > +   if (!memcmp(state->dynamic.viewport.viewports + firstViewport,
> > +   pViewports, viewportCount * sizeof(*pViewports))) {
> > +   return;
> > +   }
> > +
> > memcpy(state->dynamic.viewport.viewports + firstViewport, 
> > pViewports,
> >viewportCount * sizeof(*pViewports));
> >
> > @@ -2984,6 +2989,11 @@ void radv_CmdSetScissor(
> > assert(firstScissor < MAX_SCISSORS);
> > assert(total_count >= 1 && total_count <= MAX_SCISSORS);
> >
> > +   if (!memcmp(state->dynamic.scissor.scissors + firstScissor, 
> > pScissors,
> > +   scissorCount * sizeof(*pScissors))) {
> > +   return;
> > +   }
> > +
> > memcpy(state->dynamic.scissor.scissors + firstScissor, pScissors,
> >scissorCount * sizeof(*pScissors));
> >
> > @@ -2995,6 +3005,10 @@ void radv_CmdSetLineWidth(
> > float   lineWidth)
> >  {
> > RADV_FROM_HANDLE(radv_cmd_buffer, cmd_buffer, commandBuffer);
> > +
> > +   if (cmd_buffer->state.dynamic.line_width == lineWidth)
> > +   return;
> > +
> > cmd_buffer->state.dynamic.line_width = lineWidth;
> > cmd_buffer->state.dirty |= RADV_CMD_DIRTY_DYNAMIC_LINE_WIDTH;
> >  }
> > @@ -3006,12 +3020,19 @@ void radv_CmdSetDepthBias(
> > float   depthBiasSlopeFactor)
> >  {
> > RADV_FROM_HANDLE(radv_cmd_buffer, cmd_buffer, commandBuffer);
> > +   struct radv_cmd_state *state = &cmd_buffer->state;
> >
> > -   cmd_buffer->state.dynamic.depth_bias.bias = depthBiasConstantFactor;
> > -   cmd_buffer->state.dynamic.depth_bias.clamp = depthBiasClamp;
> > -   cmd_buffer->state.dynamic.depth_bias.slope = depthBiasSlopeFactor;
> > +   if (state->dynamic.depth_bias.bias == depthBiasConstantFactor &&
> > +   state->dynamic.depth_bias.clamp == depthBiasClamp &&
> > +   state->dynamic.depth_bias.slope == depthBiasSlopeFactor) {
> > +   return;
> > +   }
> >
> > -   cmd_buffer->state.dirty |= RADV_CMD_DIRTY_DYNAMIC_DEPTH_BIAS;
> > +   state->dynamic.depth_bias.bias = depthBiasConstantFactor;
> > +   state->dynamic.depth_bias.clamp = depthBiasClamp;
> > +   state->dynamic.depth_bias.slope = depthBiasSlopeFactor;
> > +
> > +   state->dirty |= RADV_CMD_DIRTY_DYNAMIC_DEPTH_BIAS;
> >  }
> >
> >  void radv_CmdSetBlendConstants(
> > @@ -3019,11 +3040,14 @@ void radv_CmdSetBlendConstants(
> > const float blendConstants[4])
> >  {
> > RADV_FROM_HANDLE(radv_cmd_buffer, cmd_buffer, commandBuffer);
> > +   struct radv_cmd_state *state = &cmd_buffer->state;
> >
> > -   memcpy(cmd_buffer->state.dynamic.blend_constants,
> > -  blendConstants, sizeof(float) * 4);
> > +   if (!memcmp(state->dynamic.blend_constants, blendConstants, 
> > sizeof(float) * 4))
> > +   return;
> > +
> > +   memcpy(state->dynamic.blend_constants, blendConstants, 
> > sizeof(float) * 4);
> >
> > -   cmd_buffer->state.dirty |= RADV_CMD_DIRTY_DYNAMIC_BLEND_CONSTANTS;
> > +   state->dirty |= RADV_CMD_DIRTY_DYNAMIC_BLEND_CONSTANTS;
> >  }
> >
> >  void radv_CmdSetDepthBounds(
> > @@ -3032,11 +3056,17 @@ void radv_CmdSetDepthBounds(
> > float   maxDepthBounds)
> >  {
> > RADV_FROM_HANDLE(radv_cmd_buffer, cmd_buffer, commandBuffer);
> > +   struct radv_cmd_state *state = &cmd_buffer->state;
> >
> > -   cmd_buffer->state.dynamic.depth_bounds.min = minDepthBounds;
> > -   cmd_buffer->state.dynamic.depth_bounds.max = maxDepthBounds;
> > +   if (state->dynamic.depth_bounds.min == minDepthBounds &&
> > +   state->dynamic.depth_bounds.max == maxDepthBounds) {
> > +   return;
> > +   }
> > +
> > +   state->dynamic.depth_bounds.min = minDepthBounds;
> > +   state->dynamic.depth_bounds.max = maxDepthBounds;
> >
> > -   cmd_buffer->state.dirty |= RADV_CMD_DIRTY_DYNAMIC_DEPTH_BOUNDS;
> > +   state->dirty |= 

Re: [Mesa-dev] [PATCH] intel/eu: Stop overriding exec sizes in send_indirect_message

2019-01-15 Thread Anuj Phogat
On Sat, Jan 12, 2019 at 7:56 PM Jason Ekstrand  wrote:
>
> For a long time, we based exec sizes on destination register widths.
> We've not been doing that since 1ca3a9442760b6f7 but a few remnants
> accidentally remained.
> ---
>  src/intel/compiler/brw_eu_emit.c | 3 ---
>  1 file changed, 3 deletions(-)
>
> diff --git a/src/intel/compiler/brw_eu_emit.c 
> b/src/intel/compiler/brw_eu_emit.c
> index 45e2552783b..0b4cd4dc213 100644
> --- a/src/intel/compiler/brw_eu_emit.c
> +++ b/src/intel/compiler/brw_eu_emit.c
> @@ -2473,9 +2473,6 @@ brw_send_indirect_message(struct brw_codegen *p,
>brw_set_src1(p, send, addr);
> }
>
> -   if (dst.width < BRW_EXECUTE_8)
> -  brw_inst_set_exec_size(devinfo, send, dst.width);
> -
> brw_set_dest(p, send, dst);
> brw_set_src0(p, send, retype(payload, BRW_REGISTER_TYPE_UD));
> brw_inst_set_sfid(devinfo, send, sfid);
> --
Reviewed-by: Anuj Phogat 

> 2.20.1
>


Re: [Mesa-dev] [PATCH 2/4] intel/sanitize_gpu: add help/gdb options to wrapper

2019-01-15 Thread Matt Turner
On Mon, Oct 29, 2018 at 11:16 AM Lionel Landwerlin
 wrote:
>
> Signed-off-by: Lionel Landwerlin 
> ---
>  src/intel/tools/intel_sanitize_gpu.in | 55 ++-
>  1 file changed, 54 insertions(+), 1 deletion(-)
>
> diff --git a/src/intel/tools/intel_sanitize_gpu.in 
> b/src/intel/tools/intel_sanitize_gpu.in
> index 3dac954c408..7e4c96d8738 100755
> --- a/src/intel/tools/intel_sanitize_gpu.in
> +++ b/src/intel/tools/intel_sanitize_gpu.in
> @@ -1,4 +1,57 @@
>  #!/bin/bash
>  # -*- mode: sh -*-
>
> -LD_PRELOAD="@install_libexecdir@/libintel_sanitize_gpu.so${LD_PRELOAD:+:$LD_PRELOAD}"
>  exec "$@"
> +function show_help() {
> +cat <<EOF
> +Usage: intel_sanitize_gpu [OPTION]... [--] COMMAND ARGUMENTS
> +
> +Run COMMAND with ARGUMENTS and verify the GPU doesn't write outside its 
> memory
> +mapped buffers.
> +
> +  -g, --gdb  Launch GDB
> +
> +  --help Display this help message and exit
> +
> +EOF
> +
> +exit 0
> +}
> +
> +gdb=""
> +
> +while true; do
> +case "$1" in
> +--gdb)
> +gdb=1
> +shift
> +;;
> +-g)
> +gdb=1
> +shift
> +;;
> +--help)
> +show_help
> +;;
> +--)
> +shift
> +break
> +;;
> +-*)
> +echo "intel_aubdump: invalid option: $1"

No idea why this patch never landed, but

s/intel_aubdump/intel_sanitize_gpu/

(I just came across it when trying to figure out whether we ever moved
intel_aubdump from igt into Mesa.)


Re: [Mesa-dev] [PATCH] radv: avoid context rolls when binding graphics pipelines

2019-01-15 Thread Bas Nieuwenhuizen
On Mon, Jan 14, 2019 at 5:12 PM Rhys Perry  wrote:
>
> I did and found small improvements in Rise of the Tomb Raider. I
> measured framerates ~104.3% that of without the changes for the
> Geothermal Valley scene, ~101.2% for Spine of the Mountain and ~102.3%
> for Prophets Tomb.

My main question would be what the statistical significance is.  e.g.
did you do one run of each, did you do multiple, and what was your
test setup?

Just curious because I have tried the exact same thing before and
could not find anything more than noise.
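
For what it's worth, one common way to put a number on "more than noise"
is Welch's t statistic over several before/after runs. A minimal sketch,
assuming each run yields one average framerate; the struct and function
names are hypothetical, and the squared standard error is returned so the
caller can take the square root itself:

```c
#include <stddef.h>

/* Result of comparing two sets of benchmark runs (hypothetical type). */
struct run_comparison {
   double mean_diff;   /* mean(after) - mean(before) */
   double var_of_diff; /* squared standard error of mean_diff */
};

/* Arithmetic mean of n samples. */
static double
runs_mean(const double *x, size_t n)
{
   double sum = 0.0;
   for (size_t i = 0; i < n; i++)
      sum += x[i];
   return sum / (double)n;
}

/* Unbiased sample variance (n - 1 denominator); requires n >= 2. */
static double
runs_variance(const double *x, size_t n)
{
   const double m = runs_mean(x, n);
   double acc = 0.0;
   for (size_t i = 0; i < n; i++)
      acc += (x[i] - m) * (x[i] - m);
   return acc / (double)(n - 1);
}

/* Welch's comparison of two independent sets of runs:
 * t = mean_diff / sqrt(var_of_diff). */
static struct run_comparison
compare_runs(const double *before, size_t n_before,
             const double *after, size_t n_after)
{
   struct run_comparison r;
   r.mean_diff = runs_mean(after, n_after) - runs_mean(before, n_before);
   r.var_of_diff = runs_variance(before, n_before) / (double)n_before +
                   runs_variance(after, n_after) / (double)n_after;
   return r;
}
```

As a rough rule of thumb, a difference starts to stand out from noise
once mean_diff squared is several times var_of_diff; with a single run
per side the variance cannot be estimated at all.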

>
> I found no change with Dota 2 but I've heard it's cpu-bound.
>
> On Mon, 14 Jan 2019 at 16:05, Samuel Pitoiset  
> wrote:
> >
> > Did you benchmark?
> >
> > On 1/14/19 5:01 PM, Rhys Perry wrote:
> > > It's common in some applications to bind a new graphics pipeline without
> > > ending up changing any context registers.
> > >
> > > This makes a pipeline have two command buffers: one for setting context
> > > registers and one for everything else. The context register command buffer
> > > is only emitted if it differs from the previous pipeline's.
> > >
> > > Signed-off-by: Rhys Perry 
> > > ---
> > >   src/amd/vulkan/radv_cmd_buffer.c |  46 +--
> > >   src/amd/vulkan/radv_pipeline.c   | 217 ---
> > >   src/amd/vulkan/radv_private.h|   2 +
> > >   3 files changed, 150 insertions(+), 115 deletions(-)
> > >
> > > diff --git a/src/amd/vulkan/radv_cmd_buffer.c 
> > > b/src/amd/vulkan/radv_cmd_buffer.c
> > > index f41d6c0b3e7..59903ab64d8 100644
> > > --- a/src/amd/vulkan/radv_cmd_buffer.c
> > > +++ b/src/amd/vulkan/radv_cmd_buffer.c
> > > @@ -634,7 +634,7 @@ radv_emit_descriptor_pointers(struct radv_cmd_buffer 
> > > *cmd_buffer,
> > >   }
> > >   }
> > >
> > > -static void
> > > +static bool
> > >   radv_update_multisample_state(struct radv_cmd_buffer *cmd_buffer,
> > > struct radv_pipeline *pipeline)
> > >   {
> > > @@ -646,7 +646,7 @@ radv_update_multisample_state(struct radv_cmd_buffer 
> > > *cmd_buffer,
> > >   cmd_buffer->sample_positions_needed = true;
> > >
> > >   if (old_pipeline && num_samples == 
> > > old_pipeline->graphics.ms.num_samples)
> > > - return;
> > > + return false;
> > >
> > >   radeon_set_context_reg_seq(cmd_buffer->cs, 
> > > R_028BDC_PA_SC_LINE_CNTL, 2);
> > >   radeon_emit(cmd_buffer->cs, ms->pa_sc_line_cntl);
> > > @@ -661,6 +661,8 @@ radv_update_multisample_state(struct radv_cmd_buffer 
> > > *cmd_buffer,
> > >   radeon_emit(cmd_buffer->cs, PKT3(PKT3_EVENT_WRITE, 0, 0));
> > >   radeon_emit(cmd_buffer->cs, EVENT_TYPE(V_028A90_FLUSH_DFSM) 
> > > | EVENT_INDEX(0));
> > >   }
> > > +
> > > + return true;
> > >   }
> > >
> > >   static void
> > > @@ -863,15 +865,15 @@ radv_emit_rbplus_state(struct radv_cmd_buffer 
> > > *cmd_buffer)
> > >   radeon_emit(cmd_buffer->cs, sx_blend_opt_control);
> > >   }
> > >
> > > -static void
> > > +static bool
> > >   radv_emit_graphics_pipeline(struct radv_cmd_buffer *cmd_buffer)
> > >   {
> > >   struct radv_pipeline *pipeline = cmd_buffer->state.pipeline;
> > >
> > >   if (!pipeline || cmd_buffer->state.emitted_pipeline == pipeline)
> > > - return;
> > > + return false;
> > >
> > > - radv_update_multisample_state(cmd_buffer, pipeline);
> > > + bool context_roll = radv_update_multisample_state(cmd_buffer, 
> > > pipeline);
> > >
> > >   cmd_buffer->scratch_size_needed =
> > > MAX2(cmd_buffer->scratch_size_needed,
> > > @@ -884,6 +886,15 @@ radv_emit_graphics_pipeline(struct radv_cmd_buffer 
> > > *cmd_buffer)
> > >
> > >   radeon_emit_array(cmd_buffer->cs, pipeline->cs.buf, 
> > > pipeline->cs.cdw);
> > >
> > > + if (!cmd_buffer->state.emitted_pipeline ||
> > > + cmd_buffer->state.emitted_pipeline->ctx_cs.cdw != 
> > > pipeline->ctx_cs.cdw ||
> > > + cmd_buffer->state.emitted_pipeline->ctx_cs_hash != 
> > > pipeline->ctx_cs_hash ||
> > > + memcmp(cmd_buffer->state.emitted_pipeline->ctx_cs.buf,
> > > +pipeline->ctx_cs.buf, pipeline->ctx_cs.cdw * 4)) {
> > > + radeon_emit_array(cmd_buffer->cs, pipeline->ctx_cs.buf, 
> > > pipeline->ctx_cs.cdw);
> > > + context_roll = true;
> > > + }
> > > +
> > >   for (unsigned i = 0; i < MESA_SHADER_COMPUTE; i++) {
> > >   if (!pipeline->shaders[i])
> > >   continue;
> > > @@ -902,6 +913,8 @@ radv_emit_graphics_pipeline(struct radv_cmd_buffer 
> > > *cmd_buffer)
> > >   cmd_buffer->state.emitted_pipeline = pipeline;
> > >
> > >   cmd_buffer->state.dirty &= ~RADV_CMD_DIRTY_PIPELINE;
> > > +
> > > + return context_roll;
> > >   }
> > >
> > >   static void
> > > @@ -2859,6 +2872,8 @@ radv_emit_compute_pipeline(struct radv_cmd_buffer 
> > > *cmd_buffer)
> > >   if (!pipeline || pipeline == 
> > > cmd_buffer->state.emitted_compute_pipeline)

Re: [Mesa-dev] [PATCH] intel/fs: Do the grf127 hack on SIMD8 instructions in SIMD16 mode

2019-01-15 Thread Jason Ekstrand

On January 15, 2019 17:55:31 Matt Turner  wrote:


> On Tue, Jan 15, 2019 at 8:58 AM Jason Ekstrand  wrote:
> >
> > Previously, we only applied the fix to shaders with a dispatch mode of
> > SIMD8 but the code it relies on for SIMD16 mode only applies to SIMD16
> > instructions.  If you have a SIMD8 instruction in a SIMD16 shader,
> > neither would trigger and the restriction could still be hit.
> >
> > Cc: Jose Maria Casanova Crespo 
> > Fixes: 232ed8980217dd "i965/fs: Register allocator shoudn't use grf127..."
> > ---
> > src/intel/compiler/brw_fs_reg_allocate.cpp | 13 ++---
> > 1 file changed, 6 insertions(+), 7 deletions(-)
> >
> > diff --git a/src/intel/compiler/brw_fs_reg_allocate.cpp 
> > b/src/intel/compiler/brw_fs_reg_allocate.cpp
> > index 5db5242452e..ec743f9b5bf 100644
> > --- a/src/intel/compiler/brw_fs_reg_allocate.cpp.
> > +++ b/src/intel/compiler/brw_fs_reg_allocate.cpp
> > @@ -667,15 +667,14 @@ fs_visitor::assign_regs(bool allow_spilling, bool 
> > spill_all)
> > * messages adding a node interference to the grf127_send_hack_node.
> > * This node has a fixed asignment to grf127.
> > *
> > -   * We don't apply it to SIMD16 because previous code avoids any register
> > -   * overlap between sources and destination.
> > +   * We don't apply it to SIMD16 instructions because previous code avoids
> > +   * any register overlap between sources and destination.
> > */
> > ra_set_node_reg(g, grf127_send_hack_node, 127);
> > -  if (dispatch_width == 8) {
> > - foreach_block_and_inst(block, fs_inst, inst, cfg) {
> > -if (inst->is_send_from_grf() && inst->dst.file == VGRF)
> > -   ra_add_node_interference(g, inst->dst.nr, 
> > grf127_send_hack_node);
> > - }
> > +  foreach_block_and_inst(block, fs_inst, inst, cfg) {
> > + if (inst->exec_size < 16 && inst->is_send_from_grf() &&
> > + inst->dst.file == VGRF)
> > +ra_add_node_interference(g, inst->dst.nr, grf127_send_hack_node);
> > }
> >
>
> Did the code in brw_eu_validate.c catch the case you found?

Yes, it did. It was a fairly simple case; it just occurred as a SIMD8 
instruction in a SIMD16 program.

> In fact, that code looks wrong:
>
> |  (brw_inst_dst_da_reg_nr(devinfo, inst) +
> |   brw_inst_rlen(devinfo, inst) > 127) &&
>
> I think > should be >=. And maybe we should have a separate case
> earlier that checks that dst_nr+rlen actually fits in registers, and
> then change > to just ==. FFS :(
>
> Not sure what I was thinking letting that patch through without a unit
> test. I'll do that.






Re: [Mesa-dev] [PATCH] radv: avoid context rolls when binding graphics pipelines

2019-01-15 Thread Rhys Perry
I did a before/after comparison during development with multiple runs
but only 1 before and after run to produce the numbers I sent. They
seemed to match up well enough to the runs during development, so I
wasn't too concerned.

IIRC, the two runs were with a Vega 64 at 1080p with "High" settings.
The kernel/distro was 4.19.13 and Fedora 29. Also
"/sys/devices/system/cpu/cpu*/cpufreq/scaling_governor" was set to
"performance" and
"/sys/class/drm/card*/device/power_dpm_force_performance_level" was
set to "high" while running.

I'll do multiple runs of Rise of the Tomb Raider tomorrow and see if I
get anything too different.

On Wed, 16 Jan 2019 at 00:25, Bas Nieuwenhuizen  
wrote:
>
> On Mon, Jan 14, 2019 at 5:12 PM Rhys Perry  wrote:
> >
> > I did and found small improvements in Rise of the Tomb Raider. I
> > measured framerates ~104.3% that of without the changes for the
> > Geothermal Valley scene, ~101.2% for Spine of the Mountain and ~102.3%
> > for Prophets Tomb.
>
> My main question would be what the statistical significance is.  e.g.
> did you do one run of each, did you do multiple, and what was your
> test setup?
>
> Just curious because I have tried the exact same thing before and
> could not find anything more than noise.
>
> >
> > I found no change with Dota 2 but I've heard it's cpu-bound.
> >
> > On Mon, 14 Jan 2019 at 16:05, Samuel Pitoiset  
> > wrote:
> > >
> > > Did you benchmark?
> > >
> > > On 1/14/19 5:01 PM, Rhys Perry wrote:
> > > > It's common in some applications to bind a new graphics pipeline without
> > > > ending up changing any context registers.
> > > >
> > > > This makes a pipeline have two command buffers: one for setting context
> > > > registers and one for everything else. The context register command 
> > > > buffer
> > > > is only emitted if it differs from the previous pipeline's.
> > > >
> > > > Signed-off-by: Rhys Perry 
> > > > ---
> > > >   src/amd/vulkan/radv_cmd_buffer.c |  46 +--
> > > >   src/amd/vulkan/radv_pipeline.c   | 217 ---
> > > >   src/amd/vulkan/radv_private.h|   2 +
> > > >   3 files changed, 150 insertions(+), 115 deletions(-)
> > > >
> > > > diff --git a/src/amd/vulkan/radv_cmd_buffer.c 
> > > > b/src/amd/vulkan/radv_cmd_buffer.c
> > > > index f41d6c0b3e7..59903ab64d8 100644
> > > > --- a/src/amd/vulkan/radv_cmd_buffer.c
> > > > +++ b/src/amd/vulkan/radv_cmd_buffer.c
> > > > @@ -634,7 +634,7 @@ radv_emit_descriptor_pointers(struct 
> > > > radv_cmd_buffer *cmd_buffer,
> > > >   }
> > > >   }
> > > >
> > > > -static void
> > > > +static bool
> > > >   radv_update_multisample_state(struct radv_cmd_buffer *cmd_buffer,
> > > > struct radv_pipeline *pipeline)
> > > >   {
> > > > @@ -646,7 +646,7 @@ radv_update_multisample_state(struct 
> > > > radv_cmd_buffer *cmd_buffer,
> > > >   cmd_buffer->sample_positions_needed = true;
> > > >
> > > >   if (old_pipeline && num_samples == 
> > > > old_pipeline->graphics.ms.num_samples)
> > > > - return;
> > > > + return false;
> > > >
> > > >   radeon_set_context_reg_seq(cmd_buffer->cs, 
> > > > R_028BDC_PA_SC_LINE_CNTL, 2);
> > > >   radeon_emit(cmd_buffer->cs, ms->pa_sc_line_cntl);
> > > > @@ -661,6 +661,8 @@ radv_update_multisample_state(struct 
> > > > radv_cmd_buffer *cmd_buffer,
> > > >   radeon_emit(cmd_buffer->cs, PKT3(PKT3_EVENT_WRITE, 0, 0));
> > > >   radeon_emit(cmd_buffer->cs, 
> > > > EVENT_TYPE(V_028A90_FLUSH_DFSM) | EVENT_INDEX(0));
> > > >   }
> > > > +
> > > > + return true;
> > > >   }
> > > >
> > > >   static void
> > > > @@ -863,15 +865,15 @@ radv_emit_rbplus_state(struct radv_cmd_buffer 
> > > > *cmd_buffer)
> > > >   radeon_emit(cmd_buffer->cs, sx_blend_opt_control);
> > > >   }
> > > >
> > > > -static void
> > > > +static bool
> > > >   radv_emit_graphics_pipeline(struct radv_cmd_buffer *cmd_buffer)
> > > >   {
> > > >   struct radv_pipeline *pipeline = cmd_buffer->state.pipeline;
> > > >
> > > >   if (!pipeline || cmd_buffer->state.emitted_pipeline == pipeline)
> > > > - return;
> > > > + return false;
> > > >
> > > > - radv_update_multisample_state(cmd_buffer, pipeline);
> > > > + bool context_roll = radv_update_multisample_state(cmd_buffer, 
> > > > pipeline);
> > > >
> > > >   cmd_buffer->scratch_size_needed =
> > > > MAX2(cmd_buffer->scratch_size_needed,
> > > > @@ -884,6 +886,15 @@ radv_emit_graphics_pipeline(struct radv_cmd_buffer 
> > > > *cmd_buffer)
> > > >
> > > >   radeon_emit_array(cmd_buffer->cs, pipeline->cs.buf, 
> > > > pipeline->cs.cdw);
> > > >
> > > > + if (!cmd_buffer->state.emitted_pipeline ||
> > > > + cmd_buffer->state.emitted_pipeline->ctx_cs.cdw != 
> > > > pipeline->ctx_cs.cdw ||
> > > > + cmd_buffer->state.emitted_pipeline->ctx_cs_hash != 
> > > > pipeline->ctx_cs_hash ||
> > > > + 

[Mesa-dev] [Bug 109258] Weston drm-backend.so seems to fail with Mesa master and LIBGL_ALWAYS_SOFTWARE=1

2019-01-15 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=109258

--- Comment #3 from n3rdopolis  ---
Haha, after all that manual bisecting too :) However, I think something was
still broken, because I found a Mesa build I had from a January 2018 commit in
master that didn't work...


To verify, I tried to see if the 84f3afc2e122cb418573f1e9c61716520f9859c1
commit would work, and it actually did not.

It seems another commit caused a second problem. Thankfully there weren't
too many EGL commits in between, so I narrowed it down to two. I first tried
reverting commit 47273d7312cb5b5b6b0b9faa814d574bbbce1c01 and trying again,
and Weston ran!

I quickly skimmed the web git log, and this time I didn't see any mention of
47273d7312cb5b5b6b0b9faa814d574bbbce1c01 being reverted.


(My second guess would have been d7e769abec732fd23b93145a519065c82b2ccb2b.
Thankfully it wasn't, as that one conflicted during the revert.)

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/2] radeonsi: unify error paths in si_texture_create_object

2019-01-15 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/drivers/radeonsi/si_texture.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_texture.c 
b/src/gallium/drivers/radeonsi/si_texture.c
index a56674b6000..585f58c1e38 100644
--- a/src/gallium/drivers/radeonsi/si_texture.c
+++ b/src/gallium/drivers/radeonsi/si_texture.c
@@ -1104,21 +1104,21 @@ si_texture_create_object(struct pipe_screen *screen,
 const struct pipe_resource *base,
 struct pb_buffer *buf,
 struct radeon_surf *surface)
 {
struct si_texture *tex;
struct r600_resource *resource;
struct si_screen *sscreen = (struct si_screen*)screen;
 
tex = CALLOC_STRUCT(si_texture);
if (!tex)
-   return NULL;
+   goto error;
 
resource = &tex->buffer;
resource->b.b = *base;
resource->b.b.next = NULL;
resource->b.vtbl = &si_texture_vtbl;
pipe_reference_init(&resource->b.b.reference, 1);
resource->b.b.screen = screen;
 
/* don't include stencil-only formats which we don't support for 
rendering */
tex->is_depth = 
util_format_has_depth(util_format_description(tex->buffer.b.b.format));
@@ -1179,48 +1179,44 @@ si_texture_create_object(struct pipe_screen *screen,
tex->fmask_offset = align64(tex->size,
 
tex->surface.fmask_alignment);
tex->size = tex->fmask_offset + tex->surface.fmask_size;
 
/* Allocate CMASK. */
tex->cmask_offset = align64(tex->size, 
tex->surface.cmask_alignment);
tex->size = tex->cmask_offset + tex->surface.cmask_size;
tex->cb_color_info |= S_028C70_FAST_CLEAR(1);
tex->cmask_buffer = &tex->buffer;
 
-   if (!tex->surface.fmask_size || 
!tex->surface.cmask_size) {
-   FREE(tex);
-   return NULL;
-   }
+   if (!tex->surface.fmask_size || 
!tex->surface.cmask_size)
+   goto error;
}
 
/* Shared textures must always set up DCC here.
 * If it's not present, it will be disabled by
 * apply_opaque_metadata later.
 */
if (tex->surface.dcc_size &&
(buf || !(sscreen->debug_flags & DBG(NO_DCC))) &&
!(tex->surface.flags & RADEON_SURF_SCANOUT)) {
/* Reserve space for the DCC buffer. */
tex->dcc_offset = align64(tex->size, 
tex->surface.dcc_alignment);
tex->size = tex->dcc_offset + tex->surface.dcc_size;
}
}
 
/* Now create the backing buffer. */
if (!buf) {
si_init_resource_fields(sscreen, resource, tex->size,
  tex->surface.surf_alignment);
 
-   if (!si_alloc_resource(sscreen, resource)) {
-   FREE(tex);
-   return NULL;
-   }
+   if (!si_alloc_resource(sscreen, resource))
+   goto error;
} else {
resource->buf = buf;
resource->gpu_address = 
sscreen->ws->buffer_get_virtual_address(resource->buf);
resource->bo_size = buf->size;
resource->bo_alignment = buf->alignment;
resource->domains = 
sscreen->ws->buffer_get_initial_domain(resource->buf);
if (resource->domains & RADEON_DOMAIN_VRAM)
resource->vram_usage = buf->size;
else if (resource->domains & RADEON_DOMAIN_GTT)
resource->gart_usage = buf->size;
@@ -1268,20 +1264,24 @@ si_texture_create_object(struct pipe_screen *screen,
puts("Texture:");
struct u_log_context log;
u_log_context_init(&log);
si_print_texture_info(sscreen, tex, &log);
u_log_new_page_print(&log, stdout);
fflush(stdout);
u_log_context_destroy(&log);
}
 
return tex;
+
+error:
+   FREE(tex);
+   return NULL;
 }
 
 static enum radeon_surf_mode
 si_choose_tiling(struct si_screen *sscreen,
 const struct pipe_resource *templ, bool tc_compatible_htile)
 {
const struct util_format_description *desc = 
util_format_description(templ->format);
bool force_tiling = templ->flags & SI_RESOURCE_FLAG_FORCE_MSAA_TILING;
bool is_depth_stencil = util_format_is_depth_or_stencil(templ->format) 
&&
!(templ->flags & 
SI_RESOURCE_FLAG_FLUSHED_DEPTH);
-- 
2.17.1
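
For readers unfamiliar with the idiom, the single-exit cleanup that this patch
moves to can be sketched in isolation as below. This is only an illustration:
struct widget, widget_create and widget_destroy are hypothetical names and not
part of radeonsi, and plain calloc/free stand in for CALLOC_STRUCT/FREE.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical object with two owned allocations; not a Mesa type. */
struct widget {
    int *pixels;
    int *meta;
};

/* Creates a widget, or returns NULL on failure.
 * Every failure jumps to one label that releases whatever was set up. */
struct widget *widget_create(size_t npixels, size_t nmeta)
{
    struct widget *w = calloc(1, sizeof(*w));
    if (!w)
        goto error;

    /* Reject empty requests, loosely mirroring the size checks in the patch. */
    if (npixels == 0 || nmeta == 0)
        goto error;

    w->pixels = calloc(npixels, sizeof(*w->pixels));
    if (!w->pixels)
        goto error;

    w->meta = calloc(nmeta, sizeof(*w->meta));
    if (!w->meta)
        goto error;

    return w;

error:
    /* free(NULL) is a no-op, so this path is safe when w is NULL or only
     * partially initialised (calloc zeroed the member pointers). */
    if (w) {
        free(w->pixels);
        free(w->meta);
    }
    free(w);
    return NULL;
}

/* Releases a widget returned by widget_create; NULL is accepted. */
void widget_destroy(struct widget *w)
{
    if (!w)
        return;
    free(w->pixels);
    free(w->meta);
    free(w);
}
```

The benefit is the same as in the patch: a new failure case only needs a
"goto error", and the cleanup stays in one place.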


Re: [Mesa-dev] [RFC 1/6] dri: Support 64 bit rgba masks

2019-01-15 Thread Strasser, Kevin
Adam Jackson wrote:
> > On Fri, 2019-01-11 at 15:01 +, Emil Velikov wrote:
> > 
> > > > @@ -460,6 +464,14 @@ driGetConfigAttribIndex(const __DRIconfig
> > *config,
> > > > else
> > > > *value = 0;
> > > > break;
> > > > +case __DRI_ATTRIB_RED_MASK_HI:
> > > > +case __DRI_ATTRIB_GREEN_MASK_HI:
> > > > +case __DRI_ATTRIB_BLUE_MASK_HI:
> > > > +case __DRI_ATTRIB_ALPHA_MASK_HI:
> > > > +/* upper 32 bits of 64 bit fields */
> > > > +*value = *(unsigned int *)
> > > > +((char *) &config->modes + attribMap[index].offset +
> > > > + 4);
> > >
> > > Is the "+ 4" going to work on big endian systems?
> > 
> > No.
> > 
> > I think I'd prefer to just expand config attribute values to 64-bit across 
> > the board
> > internally, rather than have paired 32-bit attributes like this.

Emil is right, big endian wouldn't work with my patch as-is, but I suppose I
could incorporate some macros to make it work. I experimented with changing
getConfigAttrib/indexConfigAttrib to allow for 64-bit values, like you suggest,
but doing so without breaking ABI got pretty ugly: it required adding new
functions (getConfigAttrib2/indexConfigAttrib2) and checking the driver's
extension version level at every call site. However, given the issue with
endianness, I'm now more inclined to go in that direction.
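
To make the endianness problem concrete: a byte offset of "+ 4" into a 64-bit
field lands on the upper half only on little-endian hosts. One possible
endian-neutral alternative is to read the whole field and shift by value. The
sketch below is an assumption on my side, not what the patch does;
struct example_modes, mask_hi32 and mask_lo32 are made-up names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a config struct holding a 64-bit mask. */
struct example_modes {
    uint64_t red_mask;
};

/* Read a 64-bit field at a byte offset; memcpy avoids aliasing issues. */
static uint64_t read_u64_field(const void *base, size_t offset)
{
    uint64_t v;
    memcpy(&v, (const char *)base + offset, sizeof(v));
    return v;
}

/* Upper 32 bits, taken by value, so host byte order does not matter. */
uint32_t mask_hi32(const void *base, size_t offset)
{
    return (uint32_t)(read_u64_field(base, offset) >> 32);
}

/* Lower 32 bits, taken by value. */
uint32_t mask_lo32(const void *base, size_t offset)
{
    return (uint32_t)(read_u64_field(base, offset) & 0xffffffffu);
}
```

With this shape the _HI and plain mask attributes could share one offset entry
and differ only in which helper is called.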

Thanks,
Kevin


Re: [Mesa-dev] [RFC 1/6] dri: Support 64 bit rgba masks

2019-01-15 Thread Strasser, Kevin
Emil Velikov wrote:
> Hi Kevin,
> 
> Thanks for that massive undertaking in addressing this.

Sure thing!

> On 2019/01/04, Kevin Strasser wrote:
> > The dri core api was written with the assumption that all attribute
> > values would fit into 32 bits. This limitation means the config
> > handlers can't accept 64 bpp formats. Reserve 64 bits for rgba masks
> > and add new attributes that allow access to the upper 32 bits.
> >
> > Signed-off-by: Kevin Strasser 
> > ---
> >  include/GL/internal/dri_interface.h |  6 ++-
> >  src/egl/drivers/dri2/egl_dri2.c | 28 +---
> >  src/egl/drivers/dri2/egl_dri2.h |  6 +--
> >  src/egl/drivers/dri2/platform_android.c |  2 +-
> >  src/egl/drivers/dri2/platform_drm.c | 67 
> > ++---
> >  src/egl/drivers/dri2/platform_surfaceless.c |  2 +-
> >  src/egl/drivers/dri2/platform_wayland.c |  2 +-
> >  src/egl/drivers/dri2/platform_x11.c |  6 +--
> >  src/gbm/backends/dri/gbm_driint.h   |  8 ++--
> >  src/glx/glxconfig.h |  2 +-
> >  src/mesa/drivers/dri/common/utils.c | 16 ++-
> >  src/mesa/main/mtypes.h  |  2 +-
> >  12 files changed, 108 insertions(+), 39 deletions(-)
> >
> Please split this up a bit. I'm thinking of:
>  - dri_interface
>  - mesa
>  - egl
>  - gbm
>  - glx - seems sparse on updates, guessing you've followed up in later patches?

Sure, I can break it up a bit more. I didn't modify glx much as it doesn't read
the mask attributes directly, hence it can't handle configs with RGBA ordering.
I don't know the background of that limitation, but I assume there just haven't
been any use cases for those formats outside of Android, and so handling hasn't
been needed for glx... The intention of this series was to get fp16 working
first for egl. Can we leave the glx side for if/when someone needs it there?

> > diff --git a/include/GL/internal/dri_interface.h
> > b/include/GL/internal/dri_interface.h
> > index 072f379..c5761c4 100644
> > --- a/include/GL/internal/dri_interface.h
> > +++ b/include/GL/internal/dri_interface.h
> > @@ -747,7 +747,11 @@ struct __DRIuseInvalidateExtensionRec {
> >  #define __DRI_ATTRIB_YINVERTED 47
> >  #define __DRI_ATTRIB_FRAMEBUFFER_SRGB_CAPABLE  48
> >  #define __DRI_ATTRIB_MUTABLE_RENDER_BUFFER 49 /* 
> > EGL_MUTABLE_RENDER_BUFFER_BIT_KHR */
> > -#define __DRI_ATTRIB_MAX   50
> > +#define __DRI_ATTRIB_RED_MASK_HI   50
> > +#define __DRI_ATTRIB_GREEN_MASK_HI 51
> > +#define __DRI_ATTRIB_BLUE_MASK_HI  52
> > +#define __DRI_ATTRIB_ALPHA_MASK_HI 53
> > +#define __DRI_ATTRIB_MAX   54
> 
> Worth adding some defines as below for clarity/consistency sake and updating
> the existing code to use them?
> 
> #define __DRI_ATTRIB_RED_MASK_LO __DRI_ATTRIB_RED_MASK ...

Sounds like a good idea.

> >  /* __DRI_ATTRIB_RENDER_TYPE */
> >  #define __DRI_ATTRIB_RGBA_BIT  0x01
> > diff --git a/src/egl/drivers/dri2/egl_dri2.c
> > b/src/egl/drivers/dri2/egl_dri2.c index 0be9235..d19950d 100644
> > --- a/src/egl/drivers/dri2/egl_dri2.c
> > +++ b/src/egl/drivers/dri2/egl_dri2.c
> > @@ -179,7 +179,7 @@ dri2_match_config(const _EGLConfig *conf, const
> > _EGLConfig *criteria)  struct dri2_egl_config *
> > dri2_add_config(_EGLDisplay *disp, const __DRIconfig *dri_config, int id,
> >  EGLint surface_type, const EGLint *attr_list,
> > -const unsigned int *rgba_masks)
> > +const unsigned long long int *rgba_masks)
> I'm slightly inclined towards uint64_t since it's a little more explicit and
> clear. IIRC sizeof(long long) varies across platforms and/or compilers so
> uint64_t will avoid all the potential fun ;-)

Sure, I'm all for using explicit width types.
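
As a rough sketch of what that could look like, the hi/lo halves could be
joined through a small helper using fixed-width types. combine_mask is a
hypothetical name, not something in the series.

```c
#include <assert.h>
#include <stdint.h>

/* Join two 32-bit halves into one 64-bit mask.
 * The cast has to happen before the shift: shifting a 32-bit value
 * by 32 bits is undefined behaviour in C. */
uint64_t combine_mask(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}
```

Each of the red/green/blue/alpha blocks would then reduce to two
getConfigAttrib calls followed by one combine_mask(mask_hi, mask_lo).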

> > +  const __DRIconfig *config = dri2_dpy->driver_configs[i];
> > +  unsigned long long int red, green, blue, alpha;
> > +  unsigned int mask_hi = 0, mask_lo;
> > +
> > +  dri2_dpy->core->getConfigAttrib(config, __DRI_ATTRIB_RED_MASK_HI,
> > +  &mask_hi);
> > +  dri2_dpy->core->getConfigAttrib(config, __DRI_ATTRIB_RED_MASK,
> > +  &mask_lo);
> > +  red = (unsigned long long int)mask_hi << 32 | mask_lo;
> > +
> > +  dri2_dpy->core->getConfigAttrib(config, __DRI_ATTRIB_GREEN_MASK_HI,
> > +  &mask_hi);
> > +  dri2_dpy->core->getConfigAttrib(config, __DRI_ATTRIB_GREEN_MASK,
> > +  &mask_lo);
> > +  green = (unsigned long long int)mask_hi << 32 | mask_lo;
> > +
> > +  dri2_dpy->core->getConfigAttrib(config, __DRI_ATTRIB_BLUE_MASK_HI,
> > +  &mask_hi);
> > +  dri2_dpy->core->getConfigAttrib(config, __DRI_ATTRIB_BLUE_MASK,
> > +  &mask_lo);
> > +  blue = (unsigned long long 
