[Mesa-dev] [PATCH] panfrost: Add pandecode (command stream debugger)

2019-02-18 Thread Alyssa Rosenzweig
The `panwrap` utility can be LD_PRELOAD'd into a GLES app, intercepting
communication between the driver and the kernel. Modern panwrap versions
do no processing of their own; instead, they create a trace directory.
This directory contains the following files:

 - control.log: a line-by-line plain text file, denoting important
   syscalls (mmaps and job submits) along with their arguments

 - memory_*.bin, shader_*.bin: binary dumps of mapped memory

Together, these files contain enough information to reconstruct the
command stream and shaders of (at minimum) a single frame.

The `pandecode` utility takes this directory structure as input,
reconstructing the mapped memory and using the job submit command as an
entrypoint. It then walks the descriptors as the hardware would, parsing
and pretty-printing. Its final output is the pretty-printed command
stream interleaved with the disassembled shaders, suitable for driver
debugging. For instance, the behaviour of two driver versions (one
working, one broken) can be compared by diff'ing their decoded logs.
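
As a rough illustration of that last workflow, the comparison can be
scripted. The sketch below is Python rather than anything shipped with
pandecode; the two "decoded logs" are invented placeholder lines, not real
pandecode output, and `diff_decoded_logs` is a hypothetical helper name.

```python
import difflib


def diff_decoded_logs(working, broken):
    """Return a unified diff between two decoded logs given as lists of lines."""
    return list(
        difflib.unified_diff(
            working,
            broken,
            fromfile="working.log",
            tofile="broken.log",
            lineterm="",
        )
    )


# Placeholder contents; real logs would come from decoding each trace.
working_log = ["job 0", "  field_a = 1", "  field_b = 2"]
broken_log = ["job 0", "  field_a = 1", "  field_b = 3"]

for line in diff_decoded_logs(working_log, broken_log):
    print(line)
```

Only the lines that differ between the two logs show up with a leading
`-` or `+`, which is usually where to start looking for the regression.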

pandecode/decode.c was originally a part of `panwrap`; it is the oldest
living code in the project. Its history is generally not worth
preserving.

panwrap itself will continue to live downstream for the foreseeable
future, as it is specifically written for the vendor kernel. It should
be simple, however, to produce equivalent traces directly from Panfrost,
bypassing the intermediate wrapping layer for well-behaved drivers.

Signed-off-by: Alyssa Rosenzweig 
---
 src/gallium/drivers/panfrost/meson.build  |   22 +
 .../drivers/panfrost/pan_pretty_print.c   |4 +-
 .../drivers/panfrost/pan_pretty_print.h   |2 +-
 .../drivers/panfrost/pandecode/cmdline.c  |  189 ++
 .../drivers/panfrost/pandecode/decode.c   | 1996 +
 src/gallium/drivers/panfrost/pandecode/mmap.h |   79 +
 6 files changed, 2289 insertions(+), 3 deletions(-)
 create mode 100644 src/gallium/drivers/panfrost/pandecode/cmdline.c
 create mode 100644 src/gallium/drivers/panfrost/pandecode/decode.c
 create mode 100644 src/gallium/drivers/panfrost/pandecode/mmap.h

diff --git a/src/gallium/drivers/panfrost/meson.build b/src/gallium/drivers/panfrost/meson.build
index 81f18c33d18..9c36daeb0f2 100644
--- a/src/gallium/drivers/panfrost/meson.build
+++ b/src/gallium/drivers/panfrost/meson.build
@@ -119,3 +119,25 @@ midgard_compiler = executable(
   ],
   build_by_default : true
 )
+
+files_pandecode = files(
+  'pandecode/cmdline.c',
+  'pandecode/decode.c',
+
+  'pan_pretty_print.c',
+
+  'midgard/disassemble.c'
+)
+
+pandecode = executable(
+  'pandecode',
+  files_pandecode,
+  include_directories : inc_panfrost,
+  dependencies : [
+    dep_thread,
+  ],
+  link_with : [
+    libmesa_util
+  ],
+  build_by_default : true
+)
diff --git a/src/gallium/drivers/panfrost/pan_pretty_print.c b/src/gallium/drivers/panfrost/pan_pretty_print.c
index fd8ad40d407..f9fd2c0e6da 100644
--- a/src/gallium/drivers/panfrost/pan_pretty_print.c
+++ b/src/gallium/drivers/panfrost/pan_pretty_print.c
@@ -27,11 +27,11 @@
 #include 
 #include 
 
-/* Some self-contained prettyprinting functions shared between panwrap and
+/* Some self-contained prettyprinting functions shared between pandecode and
  * the main driver */
 
 #define DEFINE_CASE(name) case MALI_## name: return "MALI_" #name
-char *panwrap_format_name(enum mali_format format)
+char *pandecode_format_name(enum mali_format format)
 {
         static char unk_format_str[5];
 
diff --git a/src/gallium/drivers/panfrost/pan_pretty_print.h b/src/gallium/drivers/panfrost/pan_pretty_print.h
index a781ceaf582..22dca4abbf6 100644
--- a/src/gallium/drivers/panfrost/pan_pretty_print.h
+++ b/src/gallium/drivers/panfrost/pan_pretty_print.h
@@ -26,7 +26,7 @@
 
 #include "panfrost-job.h"
 
-char *panwrap_format_name(enum mali_format format);
+char *pandecode_format_name(enum mali_format format);
 void panfrost_print_blend_equation(struct mali_blend_equation eq);
 
 #endif
diff --git a/src/gallium/drivers/panfrost/pandecode/cmdline.c b/src/gallium/drivers/panfrost/pandecode/cmdline.c
new file mode 100644
index 00000000000..b2ba21cfe41
--- /dev/null
+++ b/src/gallium/drivers/panfrost/pandecode/cmdline.c
@@ -0,0 +1,189 @@
+/*
+ * Copyright (C) 2019 Alyssa Rosenzweig
+ * Copyright (C) 2017-2018 Lyude Paul
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE 

Re: [Mesa-dev] [PATCH 2/2] anv/image: fix offset's alignment to the surface alignment

2019-02-18 Thread Jason Ekstrand
This confuses me. When is this ever a problem?  I suspect that we're doing 
something wrong with alignments now.


--Jason

On February 18, 2019 09:02:39 Lionel Landwerlin wrote:



On 15/02/2019 14:43, Samuel Iglesias Gonsálvez wrote:

Signed-off-by: Samuel Iglesias Gonsálvez 



Hey Samuel,


Thanks for this change. Would you mind changing the align_u32 in the
if() branch too?

It won't fix anything but that's just to be consistent.


With that :

Reviewed-by: Lionel Landwerlin 



---
  src/intel/vulkan/anv_image.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/intel/vulkan/anv_image.c b/src/intel/vulkan/anv_image.c
index 3999c7399d0..f4a65044a3b 100644
--- a/src/intel/vulkan/anv_image.c
+++ b/src/intel/vulkan/anv_image.c
@@ -142,7 +142,7 @@ add_surface(struct anv_image *image, struct anv_surface *surf, uint32_t plane)
                                    surf->isl.alignment_B);
       /* Plane offset is always 0 when it's disjoint. */
    } else {
-      surf->offset = align_u32(image->size, surf->isl.alignment_B);
+      surf->offset = util_align_npot(image->size, surf->isl.alignment_B);
       /* Determine plane's offset only once when the first surface is added. */
       if (image->planes[plane].size == 0)
          image->planes[plane].offset = image->size;
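
The difference between the two helpers in the hunk above can be sketched
outside of Mesa. A mask-based round-up, the usual way to implement an
`align_u32`-style helper, is only correct when the alignment is a power of
two, whereas a modulo-based round-up works for any positive alignment. The
Python below is an illustration and not Mesa code: `align_pot` and
`align_npot` are hypothetical stand-ins, and the sample values are made up.
Whether `surf->isl.alignment_B` can actually be a non-power-of-two is the
open question in this thread; the sketch only shows what would go wrong if
it were.

```python
def align_pot(value: int, alignment: int) -> int:
    """Mask-based round-up; only valid when alignment is a power of two."""
    return (value + alignment - 1) & ~(alignment - 1)


def align_npot(value: int, alignment: int) -> int:
    """Modulo-based round-up; valid for any positive alignment."""
    remainder = value % alignment
    if remainder == 0:
        return value
    return value + (alignment - remainder)


# Made-up sample: a size of 100 bytes with two candidate alignments.
for alignment in (64, 96):
    masked = align_pot(100, alignment)
    modulo = align_npot(100, alignment)
    print(alignment, masked, masked % alignment == 0, modulo, modulo % alignment == 0)
```

For an alignment of 64 both functions return 128. For 96 the mask-based
version returns 128, which is not a multiple of 96, while the modulo-based
version returns 192.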



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev





[Mesa-dev] [PATCH] gallium/auxiliary/vl: Fix duplicate symbol build errors.

2019-02-18 Thread Vinson Lee
  CXXLD    gallium_dri.la
duplicate symbol _compute_shader_video_buffer in:

../../../../src/gallium/auxiliary/.libs/libgalliumvl.a(libgalliumvl_la-vl_compositor.o)

../../../../src/gallium/auxiliary/.libs/libgalliumvl.a(libgalliumvl_la-vl_compositor_cs.o)
duplicate symbol _compute_shader_weave in:

../../../../src/gallium/auxiliary/.libs/libgalliumvl.a(libgalliumvl_la-vl_compositor.o)

../../../../src/gallium/auxiliary/.libs/libgalliumvl.a(libgalliumvl_la-vl_compositor_cs.o)
duplicate symbol _compute_shader_rgba in:

../../../../src/gallium/auxiliary/.libs/libgalliumvl.a(libgalliumvl_la-vl_compositor.o)

../../../../src/gallium/auxiliary/.libs/libgalliumvl.a(libgalliumvl_la-vl_compositor_cs.o)

Fixes: 9364d66cb7f7 ("gallium/auxiliary/vl: Add video compositor compute shader render")
Signed-off-by: Vinson Lee 
---
 src/gallium/auxiliary/vl/vl_compositor_cs.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/gallium/auxiliary/vl/vl_compositor_cs.h b/src/gallium/auxiliary/vl/vl_compositor_cs.h
index 7a203d327eda..a73a8755fc2a 100644
--- a/src/gallium/auxiliary/vl/vl_compositor_cs.h
+++ b/src/gallium/auxiliary/vl/vl_compositor_cs.h
@@ -32,9 +32,9 @@
 
 #include "vl_compositor.h"
 
-char *compute_shader_video_buffer;
-char *compute_shader_weave;
-char *compute_shader_rgba;
+extern char *compute_shader_video_buffer;
+extern char *compute_shader_weave;
+extern char *compute_shader_rgba;
 
 /**
  * create compute shader
-- 
2.20.1


[Mesa-dev] [PATCH] drirc: Add sddm-greeter to adaptive_sync blacklist.

2019-02-18 Thread Mario Kleiner
This is the sddm login screen.

Fixes: a9c36dbf9c56 ("drirc: Initial blacklist for adaptive sync")
Signed-off-by: Mario Kleiner 
Cc: 19.0 
---
 src/util/00-mesa-defaults.conf | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/util/00-mesa-defaults.conf b/src/util/00-mesa-defaults.conf
index cb0e6e659e2..43fe95b8810 100644
--- a/src/util/00-mesa-defaults.conf
+++ b/src/util/00-mesa-defaults.conf
@@ -346,6 +346,9 @@ TODO: document the other workarounds.
 
 
 
+
+
+
 
 
 
-- 
2.17.1


Re: [Mesa-dev] Mesa CI is too slow

2019-02-18 Thread Daniel Stone
Hi,

On Mon, 18 Feb 2019 at 18:58, Eric Engestrom  wrote:
> On Monday, 2019-02-18 17:31:41 +, Daniel Stone wrote:
> > Two hours of end-to-end pipeline time is also obviously far too long.
> > Amongst other things, it practically precludes pre-merge CI: by the
> > time your build has finished, someone will have pushed to the tree, so
> > you need to start again. Even if we serialised it through a bot, that
> > would limit us to pushing 12 changesets per day, which seems too low.
> >
> > I'm currently talking to two different hosts to try to get more
> > sponsored time for CI runners. Those are both on hold this week due to
> > travel / personal circumstances, but I'll hopefully find out more next
> > week. Eric E filed an issue
> > (https://gitlab.freedesktop.org/freedesktop/freedesktop/issues/120) to
> > enable ccache cache but I don't see myself having the time to do it
> > before next month.
>
> Just to chime in to this point, I also have an MR to enable ccache per
> runner, which with our static runners setup is not much worse than the
> shared cache:
> https://gitlab.freedesktop.org/mesa/mesa/merge_requests/240
>
> From my cursory testing, this should already cut the compilations by
> 80-90% :)

That's great! Is there any reason not to merge it?

> > Doing the above would reduce the run time fairly substantially, for
> > what I can tell is no loss in functional coverage, and bring the
> > parallelism to a mere 1.5x oversubscription of the whole
> > organisation's available job slots, from the current 2x.
> >
> > Any thoughts?
>
> Your suggestions all sound good, although I can't speak for #1 and #2.
>
> #3 sounds good, I guess we can keep meson builds with the "oldest supported
> llvm" and the "current llvm version", and only the "oldest supported"
> for autotools?

We could have Meson building all the LLVM versions autotools does for
not much overhead at all. At the moment though Meson builds 3 and
autotools builds 6, which isn't bringing us any increased code coverage.

> You've suggested reducing the amount that's built (ccache,
> dropping/merging jobs) and making it more parallel (fewer jobs), but
> there's another avenue to look at: run the CI less often.
>
> In my opinion, the CI should run on every single commit. Since that's
> not realistic, we need to decide what's essential.
> From most to least important:
>
> - master: everything that hits master needs to be build- and smoke-tested
>
> - stable branches: we obviously don't want to break stable branches
>
> - merge requests: the reason I wrote the CI was to automatically test MRs
>
> - personal work on forks: it would be really useful to test things
>   before sending out an MR, especially with the less-used build systems
>   that we often forget to update, but this should be opt-in, not opt-out
>   as it is right now.
>
> Ideally, this means we add this to the .gitlab-ci.yml:
>   only:
>     - master
>     - merge_requests
>     - ci/*
>
> Until this morning, I thought `merge_requests` was an Enterprise Edition
> only feature, which is why I didn't put it in, but it appears I was wrong,
> see:
> https://docs.gitlab.com/ce/ci/merge_request_pipelines/
> (Thanks Caio for reading through the docs more carefully than I did! :)
>
> I'll send an MR in a bit with the above. This will mean that master and
> MRs get automatic CI, and pushes on forks don't (except the fork's
> master), but one can push a `ci/*` branch to their own fork to run the
> CI on it.
>
> I think this should massively drop the use of the CI, but mostly remove
> unwanted uses :)

It depends on the definition of 'unwanted', of course ... I personally
like the idea of having a very early canary in the coalmine, and
building it into peoples' workflows as quickly as possible. If a more
sensible job split could reduce compilation time by 30-40%, and using
ccache could drop the compilation overhead by a huge amount as well,
that sounds like more than enough to not need to stop CI on personal
forks. Why don't we pursue those avenues first, rather than
restricting the audience?

Cheers,
Daniel

Re: [Mesa-dev] [PATCH 0/4] Common KMS renderonly support

2019-02-18 Thread Kyle Russell
Thanks for this.  Sorry I'm just now seeing it.  I'll try this out on the
armada+etnaviv configuration.

On Thu, Jan 24, 2019 at 5:35 PM Rob Herring  wrote:

> This series aims to make supporting new platforms containing
> renderonly GPUs easier with less copy-n-paste. This hasn't been a big
> issue so far as the current renderonly drivers (vc4 and etnaviv) only
> exists on a few platforms. This is changing with i.MX+freedreno,
> armada+etnaviv and a slew of platforms using Mali lima and panfrost
> drivers.
>
> I've taken the kmsro winsys from Eric, extended the pipe-loader to
> fall back to kmsro, added etnaviv support, and switched imx to use
> kmsro.
>
> I've tested this with the panfrost tree. Help testing on i.MX would be
> nice. A git branch is here[1].
>
> Rob
>
> [1] https://github.com/robherring/mesa winsys-renderonly
>
> Eric Anholt (1):
>   pl111: Rename the pl111 driver to "kmsro".
>
> Rob Herring (3):
>   pipe-loader: Fallback to kmsro driver when no matching driver name
> found
>   kmsro: Add etnaviv renderonly support
>   Switch imx to kmsro and remove the imx winsys
>
>  .travis.yml   |  2 +-
>  Android.mk|  7 ++-
>  Makefile.am   |  2 +-
>  configure.ac  | 28 ---
>  meson.build   | 12 ++---
>  meson_options.txt |  4 +-
>  src/gallium/Android.mk|  3 +-
>  src/gallium/Makefile.am   |  8 +--
>  .../auxiliary/pipe-loader/pipe_loader_drm.c   | 18 +++
>  .../auxiliary/target-helpers/drm_helper.h | 35 +++--
>  .../target-helpers/drm_helper_public.h|  2 +-
>  src/gallium/drivers/imx/Automake.inc  |  9 
>  .../drivers/{pl111 => kmsro}/Android.mk   |  6 +--
>  src/gallium/drivers/kmsro/Automake.inc|  9 
>  .../drivers/{imx => kmsro}/Makefile.am|  4 +-
>  .../drivers/{pl111 => kmsro}/Makefile.sources |  0
>  src/gallium/drivers/pl111/Automake.inc|  9 
>  src/gallium/drivers/pl111/Makefile.am |  8 ---
>  src/gallium/meson.build   | 23 -
>  src/gallium/targets/dri/Makefile.am   |  3 +-
>  src/gallium/targets/dri/meson.build   |  8 +--
>  src/gallium/targets/dri/target.c  |  2 +-
>  src/gallium/winsys/imx/drm/Android.mk | 40 ---
>  src/gallium/winsys/imx/drm/Makefile.am| 35 -
>  src/gallium/winsys/imx/drm/Makefile.sources   |  3 --
>  src/gallium/winsys/imx/drm/imx_drm_public.h   | 34 -
>  src/gallium/winsys/imx/drm/imx_drm_winsys.c   | 50 ---
>  src/gallium/winsys/imx/drm/meson.build| 33 
>  .../winsys/{pl111 => kmsro}/drm/Android.mk|  2 +-
>  .../winsys/{pl111 => kmsro}/drm/Makefile.am   | 12 -
>  src/gallium/winsys/kmsro/drm/Makefile.sources |  3 ++
>  .../drm/kmsro_drm_public.h}   |  8 +--
>  .../drm/kmsro_drm_winsys.c}   | 42 +++-
>  .../winsys/{pl111 => kmsro}/drm/meson.build   | 23 ++---
>  src/gallium/winsys/pl111/drm/Makefile.sources |  3 --
>  35 files changed, 130 insertions(+), 360 deletions(-)
>  delete mode 100644 src/gallium/drivers/imx/Automake.inc
>  rename src/gallium/drivers/{pl111 => kmsro}/Android.mk (91%)
>  create mode 100644 src/gallium/drivers/kmsro/Automake.inc
>  rename src/gallium/drivers/{imx => kmsro}/Makefile.am (55%)
>  rename src/gallium/drivers/{pl111 => kmsro}/Makefile.sources (100%)
>  delete mode 100644 src/gallium/drivers/pl111/Automake.inc
>  delete mode 100644 src/gallium/drivers/pl111/Makefile.am
>  delete mode 100644 src/gallium/winsys/imx/drm/Android.mk
>  delete mode 100644 src/gallium/winsys/imx/drm/Makefile.am
>  delete mode 100644 src/gallium/winsys/imx/drm/Makefile.sources
>  delete mode 100644 src/gallium/winsys/imx/drm/imx_drm_public.h
>  delete mode 100644 src/gallium/winsys/imx/drm/imx_drm_winsys.c
>  delete mode 100644 src/gallium/winsys/imx/drm/meson.build
>  rename src/gallium/winsys/{pl111 => kmsro}/drm/Android.mk (97%)
>  rename src/gallium/winsys/{pl111 => kmsro}/drm/Makefile.am (87%)
>  create mode 100644 src/gallium/winsys/kmsro/drm/Makefile.sources
>  rename src/gallium/winsys/{pl111/drm/pl111_drm_public.h =>
> kmsro/drm/kmsro_drm_public.h} (89%)
>  rename src/gallium/winsys/{pl111/drm/pl111_drm_winsys.c =>
> kmsro/drm/kmsro_drm_winsys.c} (63%)
>  rename src/gallium/winsys/{pl111 => kmsro}/drm/meson.build (76%)
>  delete mode 100644 src/gallium/winsys/pl111/drm/Makefile.sources
>
> --
> 2.19.1
>
>

Re: [Mesa-dev] Mesa CI is too slow

2019-02-18 Thread Eric Engestrom
On Monday, 2019-02-18 17:31:41 +, Daniel Stone wrote:
> Hi all,
> A few people have noted that Mesa's GitLab CI is just too slow, and
> not usable in day-to-day development, which is a massive shame.

Agreed :/

> 
> I looked into it a bit this morning, and also discussed it with Emil,
> though nothing in this is speaking for him.
> 
> Taking one of the last runs as representative (nothing in it looks
> like an outlier to me, and 7min to build RadeonSI seems entirely
> reasonable):
> https://gitlab.freedesktop.org/mesa/mesa/pipelines/19692/builds
> 
> This run executed 24 jobs, which is beyond the limit of our CI
> parallelism. As documented on
> https://www.freedesktop.org/wiki/Infrastructure/ we have 14 concurrent
> job slots (each with roughly 4 vCPUs). Those 24 jobs cumulatively took
> 177 minutes of execution time, taking 120 minutes for the end-to-end
> pipeline.
> 
> 177 minutes of runtime is too long for the runners we have now: if it
> perfectly occupies all our runners it will take over 12 minutes, which
> means that even if no-one else was using the runners, they could
> execute 5 Mesa builds per hour at full occupancy. Unfortunately,
> VirGL, Wayland/Weston, libinput, X.Org, IGT, GStreamer,
> NetworkManager/ModemManager, Bolt, Poppler, etc, would all probably
> have something to say about that.
> 
> When the runners aren't occupied and there's less contention for jobs,
> it looks quite good:
> https://gitlab.freedesktop.org/anholt/mesa/pipelines/19621/builds
> 
> This run 'only' took 20.5 minutes to execute, but then again, 3
> pipelines per hour isn't that great either.
> 
> Two hours of end-to-end pipeline time is also obviously far too long.
> Amongst other things, it practically precludes pre-merge CI: by the
> time your build has finished, someone will have pushed to the tree, so
> you need to start again. Even if we serialised it through a bot, that
> would limit us to pushing 12 changesets per day, which seems too low.
> 
> I'm currently talking to two different hosts to try to get more
> sponsored time for CI runners. Those are both on hold this week due to
> travel / personal circumstances, but I'll hopefully find out more next
> week. Eric E filed an issue
> (https://gitlab.freedesktop.org/freedesktop/freedesktop/issues/120) to
> enable ccache cache but I don't see myself having the time to do it
> before next month.

Just to chime in to this point, I also have an MR to enable ccache per
runner, which with our static runners setup is not much worse than the
shared cache:
https://gitlab.freedesktop.org/mesa/mesa/merge_requests/240

From my cursory testing, this should already cut the compilations by
80-90% :)

> 
> In the meantime, it would be great to see how we could reduce the
> number of jobs Mesa runs for each pipeline. Given we're already
> exceeding the limits of parallelism, having so many independent jobs
> isn't reducing the end-to-end pipeline time, but instead just
> duplicating effort required to fetch and check out sources, cache (in
> the future), start the container, run meson or ./configure, and build
> any common files.
> 
> I'm taking it as a given that at least three separate builds are
> required: autotools, Meson, and SCons. Fair enough.
> 
> It's been suggested to me that SWR should remain separate, as it takes
> longer to build than the other drivers, and getting fast feedback is
> important, which is fair enough.
> 
> Suggestion #1: merge scons-swr into scons-llvm. scons-nollvm will
> already provide fast feedback on if we've broken the SCons build, and
> the rest is pretty uninteresting, so merging scons-swr into scons-llvm
> might help cut down on duplication.
> 
> Suggestion #2: merge the misc Gallium jobs together. Building
> gallium-radeonsi and gallium-st-other are both relatively quick. We
> could merge these into gallium-drivers-other for a very small increase
> in overall runtime for that job, and save ourselves probably about 10%
> of the overall build time here.
> 
> Suggestion #3: don't build so much LLVM in autotools. The Meson
> clover-llvm builds take half the time the autotools builds do. Perhaps
> we should only build one LLVM variant within autotools (to test the
> autotools LLVM selection still works), and then build all the rest
> only in Meson. That would be good for another 15-20% reduction in
> overall pipeline run time.
> 
> Suggestion #4 (if necessary): build SWR less frequently. Can we
> perhaps demote SWR to an 'only:' job which will only rebuild SWR if
> SWR itself or Gallium have changed? This would save a good chunk of
> runtime - again close to 10%.
> 
> Doing the above would reduce the run time fairly substantially, for
> what I can tell is no loss in functional coverage, and bring the
> parallelism to a mere 1.5x oversubscription of the whole
> organisation's available job slots, from the current 2x.
> 
> Any thoughts?

Your suggestions all sound good, although I can't speak for #1 and #2.

#3 sounds good, I guess we can 

[Mesa-dev] [ANNOUNCE] mesa 18.3.4

2019-02-18 Thread Emil Velikov
Mesa 18.3.4 is now available.

In this release we have:

A fix in the XvMC state tracker, which was causing some video attributes to
not take effect. On the video front, the VAAPI state tracker has seen
improvements with VP9 streams, while the amdgpu driver advertises all
available profiles.

On the Intel side we have compiler fixes and extra PCI IDs for Coffee Lake and
Ice Lake parts. In the Broadcom drivers a couple of memory leaks were
addressed and the NEON assembly should compile properly on armhf.

Other drivers such as radeonsi, nouveau and freedreno have also seen some
love. The RADV driver has been fixed to compile correctly with GCC 9,
amongst other changes.

The Xlib-based libGL has been fixed to work with X servers that lack
the MIT-SHM extension, such as Xming.

To top it off, we have a few fixes to the meson build system.


Bart Oldeman (1):
  gallium-xlib: query MIT-SHM before using it.

Bas Nieuwenhuizen (2):
  radv: Only look at pImmutableSamples if the descriptor has a sampler.
  amd/common: Use correct writemask for shared memory stores.

Dylan Baker (2):
  get-pick-list: Add --pretty=medium to the arguments for Cc patches
  meson: Add dependency on genxml to anvil

Emil Velikov (6):
  docs: add sha256 checksums for 18.3.3
  cherry-ignore: nv50,nvc0: add explicit settings for recent caps
  cherry-ignore: add more 19.0 only nominations from Ilia
  cherry-ignore: radv: fix using LOAD_CONTEXT_REG with old GFX ME firmwares on GFX8
  Update version to 18.3.4
  docs: add release notes for 18.3.4

Eric Anholt (1):
  vc4: Fix copy-and-paste fail in backport of NEON asm fixes.

Eric Engestrom (2):
  xvmc: fix string comparison
  xvmc: fix string comparison

Ernestas Kulik (2):
  vc4: Fix leak in HW queries error path
  v3d: Fix leak in resource setup error path

Iago Toral Quiroga (1):
  intel/compiler: do not copy-propagate strided regions to ddx/ddy arguments

Ilia Mirkin (1):
  nvc0: we have 16k-sized framebuffers, fix default scissors

Jason Ekstrand (3):
  intel/fs: Handle IMAGE_SIZE in size_read() and is_send_from_grf()
  intel/fs: Do the grf127 hack on SIMD8 instructions in SIMD16 mode
  nir/deref: Rematerialize parents in rematerialize_derefs_in_use_blocks

Juan A. Suarez Romero (1):
  anv/cmd_buffer: check for NULL framebuffer

Kenneth Graunke (1):
  st/mesa: Limit GL_MAX_[NATIVE_]PROGRAM_PARAMETERS_ARB to 2048

Kristian H. Kristensen (1):
  freedreno/a6xx: Emit blitter dst with OUT_RELOCW

Leo Liu (2):
  st/va: fix the incorrect max profiles report
  st/va/vp9: set max reference as default of VP9 reference number

Marek Olšák (4):
  meson: drop the xcb-xrandr version requirement
  gallium/u_threaded: fix EXPLICIT_FLUSH for flush offsets > 0
  radeonsi: fix EXPLICIT_FLUSH for flush offsets > 0
  winsys/amdgpu: don't drop manually added fence dependencies

Mario Kleiner (2):
  egl/wayland: Allow client->server format conversion for PRIME offload. (v2)
  egl/wayland-drm: Only announce formats via wl_drm which the driver supports.

Oscar Blumberg (1):
  radeonsi: Fix guardband computation for large render targets

Rob Clark (1):
  freedreno: stop frob'ing pipe_resource::nr_samples

Rodrigo Vivi (1):
  intel: Add more PCI Device IDs for Coffee Lake and Ice Lake.

Samuel Pitoiset (2):
  radv: fix compiler issues with GCC 9
  radv: always export gl_SampleMask when the fragment shader uses it

git tag: mesa-18.3.4

https://mesa.freedesktop.org/archive/mesa-18.3.4.tar.gz
MD5:  c7c1c02ab654cca06f37027f91cbe365  mesa-18.3.4.tar.gz
SHA1: 54c78b574c22326f2e2f5b73a7685cb15c2e09f3  mesa-18.3.4.tar.gz
SHA256: e22e6fe4c3aca80fe872a0a7285b6c5523e0cfc0bfb57ffcc3b3d66d292593e4  mesa-18.3.4.tar.gz
SHA512: dcbd871e374e9c7038c73741e3e8df4f2d7048335621f7cea68e8716d2458a9dd6968c2cb2dbb4e067dd2308f7eae48d4f67592d1c26b410f8df431cb6551a04  mesa-18.3.4.tar.gz
PGP:  https://mesa.freedesktop.org/archive/mesa-18.3.4.tar.gz.sig

https://mesa.freedesktop.org/archive/mesa-18.3.4.tar.xz
MD5:  6f2a5e01dd5cb91d05a9534f5a80c35d  mesa-18.3.4.tar.xz
SHA1: a9a6ea0f5b99df362dc35770ade10c1738e2629c  mesa-18.3.4.tar.xz
SHA256: 32314da4365d37f80d84f599bd9625b00161c273c39600ba63b45002d500bb07  mesa-18.3.4.tar.xz
SHA512: e4ead944ba053aa05425e9e199d633f576dfa424976253fc32438e8db6da5e8d381122e4c4b7fb18f94177421f208bab5567cfec8d2692d104e266483ca02a99  mesa-18.3.4.tar.xz
PGP:  https://mesa.freedesktop.org/archive/mesa-18.3.4.tar.xz.sig



Re: [Mesa-dev] Mesa CI is too slow

2019-02-18 Thread Matt Turner
On Mon, Feb 18, 2019 at 9:32 AM Daniel Stone  wrote:
>
> Hi all,
> A few people have noted that Mesa's GitLab CI is just too slow, and
> not usable in day-to-day development, which is a massive shame.
>
> I looked into it a bit this morning, and also discussed it with Emil,
> though nothing in this is speaking for him.
>
> Taking one of the last runs as representative (nothing in it looks
> like an outlier to me, and 7min to build RadeonSI seems entirely
> reasonable):
> https://gitlab.freedesktop.org/mesa/mesa/pipelines/19692/builds
>
> This run executed 24 jobs, which is beyond the limit of our CI
> parallelism. As documented on
> https://www.freedesktop.org/wiki/Infrastructure/ we have 14 concurrent
> job slots (each with roughly 4 vCPUs). Those 24 jobs cumulatively took
> 177 minutes of execution time, taking 120 minutes for the end-to-end
> pipeline.
>
> 177 minutes of runtime is too long for the runners we have now: if it
> perfectly occupies all our runners it will take over 12 minutes, which
> means that even if no-one else was using the runners, they could
> execute 5 Mesa builds per hour at full occupancy. Unfortunately,
> VirGL, Wayland/Weston, libinput, X.Org, IGT, GStreamer,
> NetworkManager/ModemManager, Bolt, Poppler, etc, would all probably
> have something to say about that.
>
> When the runners aren't occupied and there's less contention for jobs,
> it looks quite good:
> https://gitlab.freedesktop.org/anholt/mesa/pipelines/19621/builds
>
> This run 'only' took 20.5 minutes to execute, but then again, 3
> pipelines per hour isn't that great either.
>
> Two hours of end-to-end pipeline time is also obviously far too long.
> Amongst other things, it practically precludes pre-merge CI: by the
> time your build has finished, someone will have pushed to the tree, so
> you need to start again. Even if we serialised it through a bot, that
> would limit us to pushing 12 changesets per day, which seems too low.
>
> I'm currently talking to two different hosts to try to get more
> sponsored time for CI runners. Those are both on hold this week due to
> travel / personal circumstances, but I'll hopefully find out more next
> week. Eric E filed an issue
> (https://gitlab.freedesktop.org/freedesktop/freedesktop/issues/120) to
> enable ccache cache but I don't see myself having the time to do it
> before next month.
>
> In the meantime, it would be great to see how we could reduce the
> number of jobs Mesa runs for each pipeline. Given we're already
> exceeding the limits of parallelism, having so many independent jobs
> isn't reducing the end-to-end pipeline time, but instead just
> duplicating effort required to fetch and check out sources, cache (in
> the future), start the container, run meson or ./configure, and build
> any common files.
>
> I'm taking it as a given that at least three separate builds are
> required: autotools, Meson, and SCons. Fair enough.
>
> It's been suggested to me that SWR should remain separate, as it takes
> longer to build than the other drivers, and getting fast feedback is
> important, which is fair enough.
>
> Suggestion #1: merge scons-swr into scons-llvm. scons-nollvm will
> already provide fast feedback on if we've broken the SCons build, and
> the rest is pretty uninteresting, so merging scons-swr into scons-llvm
> might help cut down on duplication.
>
> Suggestion #2: merge the misc Gallium jobs together. Building
> gallium-radeonsi and gallium-st-other are both relatively quick. We
> could merge these into gallium-drivers-other for a very small increase
> in overall runtime for that job, and save ourselves probably about 10%
> of the overall build time here.
>
> Suggestion #3: don't build so much LLVM in autotools. The Meson
> clover-llvm builds take half the time the autotools builds do. Perhaps
> we should only build one LLVM variant within autotools (to test the
> autotools LLVM selection still works), and then build all the rest
> only in Meson. That would be good for another 15-20% reduction in
> overall pipeline run time.
>
> Suggestion #4 (if necessary): build SWR less frequently. Can we
> perhaps demote SWR to an 'only:' job which will only rebuild SWR if
> SWR itself or Gallium have changed? This would save a good chunk of
> runtime - again close to 10%.
>
> Doing the above would reduce the run time fairly substantially, for
> what I can tell is no loss in functional coverage, and bring the
> parallelism to a mere 1.5x oversubscription of the whole
> organisation's available job slots, from the current 2x.
>
> Any thoughts?

All of your suggestions seem reasonable.

Removing autotools [1] would obviously reduce the number of builds.

If I understood correctly, we are kicking off a CI run for every push
to a fork of the Mesa repo, and not just for merge requests. I think
that's absolutely the wrong thing to do. CI for personal branches
should be opt-in.

[1] https://bugs.freedesktop.org/show_bug.cgi?id=mesa-autotools-removal

[Mesa-dev] [Bug 109443] Build failure with MSVC 2017 when using Scons >= 3.0.2

2019-02-18 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=109443

Alex Granni  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #143208|Mesa3D MSVC build fails     |Mesa3D MSVC build fails
        description|with Scons >= 3.0.3         |with Scons >= 3.0.2
 Attachment #143208|scons-3.0.3-3.0.4-mesa3d-bu |scons-3.0.2-3.0.4-mesa3d-bu
           filename|ild-fails.txt               |ild-fails.txt

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] Mesa CI is too slow

2019-02-18 Thread Daniel Stone
Hi all,
A few people have noted that Mesa's GitLab CI is just too slow, and
not usable in day-to-day development, which is a massive shame.

I looked into it a bit this morning, and also discussed it with Emil,
though nothing in this is speaking for him.

Taking one of the last runs as representative (nothing in it looks
like an outlier to me, and 7min to build RadeonSI seems entirely
reasonable):
https://gitlab.freedesktop.org/mesa/mesa/pipelines/19692/builds

This run executed 24 jobs, which is beyond the limit of our CI
parallelism. As documented on
https://www.freedesktop.org/wiki/Infrastructure/ we have 14 concurrent
job slots (each with roughly 4 vCPUs). Those 24 jobs cumulatively took
177 minutes of execution time, taking 120 minutes for the end-to-end
pipeline.

177 minutes of runtime is too long for the runners we have now: if it
perfectly occupies all our runners it will take over 12 minutes, which
means that even if no-one else was using the runners, they could
execute 5 Mesa builds per hour at full occupancy. Unfortunately,
VirGL, Wayland/Weston, libinput, X.Org, IGT, GStreamer,
NetworkManager/ModemManager, Bolt, Poppler, etc, would all probably
have something to say about that.

When the runners aren't occupied and there's less contention for jobs,
it looks quite good:
https://gitlab.freedesktop.org/anholt/mesa/pipelines/19621/builds

This run 'only' took 20.5 minutes to execute, but then again, 3
pipelines per hour isn't that great either.

Two hours of end-to-end pipeline time is also obviously far too long.
Amongst other things, it practically precludes pre-merge CI: by the
time your build has finished, someone will have pushed to the tree, so
you need to start again. Even if we serialised it through a bot, that
would limit us to pushing 12 changesets per day, which seems too low.
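
The throughput figures above are plain division. As a rough sketch, using only the numbers quoted in this mail (177 job-minutes, 14 slots, pipelines of 20.5 and 120 minutes), the arithmetic can be written out as follows. The function names are made up for illustration and are not part of any CI tooling.

```c
#include <assert.h>

/* Best-case wall-clock minutes for one pipeline, assuming its jobs could
 * be spread perfectly over every runner slot (hypothetical helper). */
double ideal_pipeline_minutes(double job_minutes, int slots)
{
   assert(slots > 0);
   return job_minutes / slots;
}

/* How many pipelines of a given duration fit back-to-back in a window. */
double pipelines_per_window(double window_minutes, double pipeline_minutes)
{
   assert(pipeline_minutes > 0.0);
   return window_minutes / pipeline_minutes;
}
```

With these, ideal_pipeline_minutes(177, 14) comes to roughly 12.6 minutes, so a little under 5 pipelines fit in an hour; a 20.5 minute pipeline gives just under 3 per hour; and a serialised 120 minute pipeline gives pipelines_per_window(24 * 60, 120) = 12 per day.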

I'm currently talking to two different hosts to try to get more
sponsored time for CI runners. Those are both on hold this week due to
travel / personal circumstances, but I'll hopefully find out more next
week. Eric E filed an issue
(https://gitlab.freedesktop.org/freedesktop/freedesktop/issues/120) to
enable ccache cache but I don't see myself having the time to do it
before next month.

In the meantime, it would be great to see how we could reduce the
number of jobs Mesa runs for each pipeline. Given we're already
exceeding the limits of parallelism, having so many independent jobs
isn't reducing the end-to-end pipeline time, but instead just
duplicating effort required to fetch and check out sources, cache (in
the future), start the container, run meson or ./configure, and build
any common files.

I'm taking it as a given that at least three separate builds are
required: autotools, Meson, and SCons. Fair enough.

It's been suggested to me that SWR should remain separate, as it takes
longer to build than the other drivers, and getting fast feedback is
important, which is fair enough.

Suggestion #1: merge scons-swr into scons-llvm. scons-nollvm will
already provide fast feedback on if we've broken the SCons build, and
the rest is pretty uninteresting, so merging scons-swr into scons-llvm
might help cut down on duplication.

Suggestion #2: merge the misc Gallium jobs together. Building
gallium-radeonsi and gallium-st-other are both relatively quick. We
could merge these into gallium-drivers-other for a very small increase
in overall runtime for that job, and save ourselves probably about 10%
of the overall build time here.

Suggestion #3: don't build so much LLVM in autotools. The Meson
clover-llvm builds take half the time the autotools builds do. Perhaps
we should only build one LLVM variant within autotools (to test the
autotools LLVM selection still works), and then build all the rest
only in Meson. That would be good for another 15-20% reduction in
overall pipeline run time.

Suggestion #4 (if necessary): build SWR less frequently. Can we
perhaps demote SWR to an 'only:' job which will only rebuild SWR if
SWR itself or Gallium have changed? This would save a good chunk of
runtime - again close to 10%.

Doing the above would reduce the run time fairly substantially, for
what I can tell is no loss in functional coverage, and bring the
parallelism to a mere 1.5x oversubscription of the whole
organisation's available job slots, from the current 2x.

Any thoughts?

Cheers,
Daniel

Re: [Mesa-dev] [PATCH 3/3] egl/sl: use kms_swrast with vgem instead of a random GPU

2019-02-18 Thread Eric Engestrom
On Tuesday, 2019-02-05 15:31:08 +, Emil Velikov wrote:
> From: Emil Velikov 
> 
> VGEM and kms_swrast were introduced to work with one another.
> 
> All we do is CPU rendering to dumb buffers. There is no reason to carve
> out GPU memory, increasing the memory pressure on a device that could
> make a better use of it.
> 
> For kms_swrast to work properly we require the primary node, as the dumb
> buffer ioctls are not exposed via the render node.
> 
> Note that this requires libdrm commit 3df8a7f0 ("xf86drm: fallback to
> MODALIAS for OF less platform devices")

Without this, what happens? swrast stops working?

A couple style comments below, but this question is my main concern.

> 
> Signed-off-by: Emil Velikov 
> ---
>  src/egl/drivers/dri2/platform_surfaceless.c | 20 ++--
>  1 file changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git a/src/egl/drivers/dri2/platform_surfaceless.c b/src/egl/drivers/dri2/platform_surfaceless.c
> index e1151e3585c..54c6856c63c 100644
> --- a/src/egl/drivers/dri2/platform_surfaceless.c
> +++ b/src/egl/drivers/dri2/platform_surfaceless.c
> @@ -286,10 +286,11 @@ surfaceless_probe_device(_EGLDisplay *dpy, bool swrast)
> for (i = 0; i < num_devices; ++i) {
>device = devices[i];
>  
> -  if (!(device->available_nodes & (1 << DRM_NODE_RENDER)))
> +  const unsigned node_type = swrast ? DRM_NODE_PRIMARY : DRM_NODE_RENDER;
> +  if (!(device->available_nodes & (1 << node_type)))
>   continue;
>  
> -  dri2_dpy->fd = loader_open_device(device->nodes[DRM_NODE_RENDER]);
> +  dri2_dpy->fd = loader_open_device(device->nodes[node_type]);
>if (dri2_dpy->fd < 0)
>   continue;
>  
> @@ -300,10 +301,17 @@ surfaceless_probe_device(_EGLDisplay *dpy, bool swrast)
>   continue;
>}
>  
> -  if (swrast)
> - dri2_dpy->driver_name = strdup("kms_swrast");
> -  else
> - dri2_dpy->driver_name = loader_get_driver_for_fd(dri2_dpy->fd);
> +  dri2_dpy->driver_name = loader_get_driver_for_fd(dri2_dpy->fd);

Can you keep the else branch like before? Makes it more readable IMO,
and avoids allocating memory just to free it a couple lines below.

> +  if (swrast) {
> + /* Use kms swrast only with vgem */
> + if (strcmp(dri2_dpy->driver_name, "vgem") != 0) {
> +free(dri2_dpy->driver_name);
> +dri2_dpy->driver_name = NULL;
> + } else {
> +free(dri2_dpy->driver_name);
> +dri2_dpy->driver_name = strdup("kms_swrast");

Again, IMO this would be more readable as "if vgem use kms_swrast"
instead of "if not vgem skip else use kms_swrast".

The above two comments combined give us this code instead:

  if swrast
    if driver == vgem
      driver = kms_swrast
  else
    driver = get_driver(fd)
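
One possible reading of that pseudocode in C, purely as a sketch: choose_driver_name() and dup_string() are hypothetical helpers, and kernel_driver stands in for whatever the loader reports for the fd. None of this is code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Portable strdup() replacement so the sketch builds as strict ISO C. */
static char *dup_string(const char *s)
{
   assert(s != NULL);
   size_t len = strlen(s) + 1;
   char *copy = malloc(len);
   if (copy)
      memcpy(copy, s, len);
   return copy;
}

/* Returns a newly allocated driver name that the caller must free(),
 * or NULL when the device should be skipped. */
char *choose_driver_name(bool swrast, const char *kernel_driver)
{
   if (!kernel_driver)
      return NULL;

   if (swrast) {
      /* Use kms_swrast only with vgem */
      if (strcmp(kernel_driver, "vgem") == 0)
         return dup_string("kms_swrast");
      return NULL;
   }

   return dup_string(kernel_driver);
}
```

Written this way there is a single allocation per path, and the "is it vgem?" test reads positively.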

> + }
> +  }
>  
>if (dri2_dpy->driver_name && dri2_load_driver_dri3(dpy))
>   break;
> -- 
> 2.20.1
> 

Re: [Mesa-dev] [PATCH 2/3] egl/sl: use drmDevice API to enumerate available devices

2019-02-18 Thread Eric Engestrom
On Tuesday, 2019-02-05 15:31:07 +, Emil Velikov wrote:
> From: Emil Velikov 
> 
> This provides for a more comprehensive iteration and a more
> straight-forward codebase, while minimising the platform specifics.
> 
> Signed-off-by: Emil Velikov 
> ---
>  src/egl/drivers/dri2/platform_surfaceless.c | 73 +++--
>  1 file changed, 37 insertions(+), 36 deletions(-)
> 
> diff --git a/src/egl/drivers/dri2/platform_surfaceless.c b/src/egl/drivers/dri2/platform_surfaceless.c
> index d6e48ba11b2..e1151e3585c 100644
> --- a/src/egl/drivers/dri2/platform_surfaceless.c
> +++ b/src/egl/drivers/dri2/platform_surfaceless.c
> @@ -274,55 +274,56 @@ static const __DRIextension *swrast_loader_extensions[] = {
>  static bool
>  surfaceless_probe_device(_EGLDisplay *dpy, bool swrast)
>  {
> +#define MAX_DRM_DEVICES 32
> struct dri2_egl_display *dri2_dpy = dpy->DriverData;
> -   const int limit = 64;

Any reason to drop the 64 down to 32?

Other than that, looks good to me:
Reviewed-by: Eric Engestrom 

> -   const int base = 128;
> -   int fd;
> -   int i;
> -
> -   /* Attempt to find DRM device. */
> -   for (i = 0; i < limit; ++i) {
> -  char *card_path;
> -  if (asprintf(&card_path, DRM_RENDER_DEV_NAME, DRM_DIR_NAME, base + i) < 0)
> +   drmDevicePtr device, devices[MAX_DRM_DEVICES] = { NULL };
> +   int i, num_devices;
> +
> +   num_devices = drmGetDevices2(0, devices, ARRAY_SIZE(devices));
> +   if (num_devices < 0)
> +  return false;
> +
> +   for (i = 0; i < num_devices; ++i) {
> +  device = devices[i];
> +
> +  if (!(device->available_nodes & (1 << DRM_NODE_RENDER)))
>   continue;
>  
> -  fd = loader_open_device(card_path);
> -  free(card_path);
> -  if (fd < 0)
> +  dri2_dpy->fd = loader_open_device(device->nodes[DRM_NODE_RENDER]);
> +  if (dri2_dpy->fd < 0)
>   continue;
>  
> -  if (swrast) {
> - dri2_dpy->driver_name = strdup("kms_swrast");
> - dri2_dpy->loader_extensions = swrast_loader_extensions;
> -  } else {
> - dri2_dpy->driver_name = loader_get_driver_for_fd(fd);
> - dri2_dpy->loader_extensions = image_loader_extensions;
> -  }
> -  if (!dri2_dpy->driver_name) {
> - close(fd);
> +  dpy->Device = _eglAddDevice(dri2_dpy->fd, swrast);
> +  if (!dpy->Device) {
> + close(dri2_dpy->fd);
> + dri2_dpy->fd = -1;
>   continue;
>}
>  
> -  dri2_dpy->fd = fd;
> -  if (dri2_load_driver_dri3(dpy)) {
> - _EGLDevice *dev = _eglAddDevice(dri2_dpy->fd, swrast);
> - if (!dev) {
> -dlclose(dri2_dpy->driver);
> -_eglLog(_EGL_WARNING, "DRI2: failed to find EGLDevice");
> -continue;
> - }
> - dpy->Device = dev;
> - return true;
> -  }
> +  if (swrast)
> + dri2_dpy->driver_name = strdup("kms_swrast");
> +  else
> + dri2_dpy->driver_name = loader_get_driver_for_fd(dri2_dpy->fd);
> +
> +  if (dri2_dpy->driver_name && dri2_load_driver_dri3(dpy))
> + break;
>  
> -  close(fd);
> -  dri2_dpy->fd = -1;
>free(dri2_dpy->driver_name);
>dri2_dpy->driver_name = NULL;
> -  dri2_dpy->loader_extensions = NULL;
> +  close(dri2_dpy->fd);
> +  dri2_dpy->fd = -1;
> }
> +   drmFreeDevices(devices, num_devices);
> +
> +   if (i == num_devices)
> +  return false;
> +
> +   if (swrast)
> +  dri2_dpy->loader_extensions = swrast_loader_extensions;
> +   else
> +  dri2_dpy->loader_extensions = image_loader_extensions;

I feel like you could've left this in the other `if (swrast)` above, but
it doesn't really matter.

>  
> -   return false;
> +   return true;
>  }
>  
>  static bool
> -- 
> 2.20.1
> 

Re: [Mesa-dev] [PATCH 1/3] egl/sl: split out swrast probe into separate function

2019-02-18 Thread Eric Engestrom
On Tuesday, 2019-02-05 15:31:06 +, Emil Velikov wrote:
> From: Emil Velikov 
> 
> Make the code a bit easier to read.
> 
> As a bonus point this makes it obvious that we forgot to call
> _eglAddDevice() for the device - do so.
> 
> Signed-off-by: Emil Velikov 
> ---
>  src/egl/drivers/dri2/platform_surfaceless.c | 46 -
>  1 file changed, 27 insertions(+), 19 deletions(-)
> 
> diff --git a/src/egl/drivers/dri2/platform_surfaceless.c b/src/egl/drivers/dri2/platform_surfaceless.c
> index f9809561611..d6e48ba11b2 100644
> --- a/src/egl/drivers/dri2/platform_surfaceless.c
> +++ b/src/egl/drivers/dri2/platform_surfaceless.c
> @@ -322,25 +322,27 @@ surfaceless_probe_device(_EGLDisplay *dpy, bool swrast)
>dri2_dpy->loader_extensions = NULL;
> }
>  
> -   /* No DRM device, so attempt to fall back to software path w/o DRM. */
> -   if (swrast) {
> -  _eglLog(_EGL_DEBUG, "Falling back to surfaceless swrast without DRM.");
> -  dri2_dpy->fd = -1;
> -  dri2_dpy->driver_name = strdup("swrast");
> -  if (!dri2_dpy->driver_name) {
> - return false;
> -  }
> +   return false;
> +}
>  
> -  if (dri2_load_driver_swrast(dpy)) {
> - dri2_dpy->loader_extensions = swrast_loader_extensions;
> - return true;
> -  }
> +static bool
> +surfaceless_probe_device_sw(_EGLDisplay *dpy)

Don't forget to rename s/dpy/disp/ :)

> +{
> +   struct dri2_egl_display *dri2_dpy = dpy->DriverData;
>  
> -  free(dri2_dpy->driver_name);
> -  dri2_dpy->driver_name = NULL;
> -   }
> +   dri2_dpy->fd = -1;
> +   dpy->Device = _eglAddDevice(dri2_dpy->fd, true);
> +   assert(dpy->Device);
>  
> -   return false;
> +   dri2_dpy->driver_name = strdup("swrast");
> +   if (!dri2_dpy->driver_name)
> +  return false;
> +
> +   if (!dri2_load_driver_swrast(dpy))

free(dri2_dpy->driver_name);

With that:
Reviewed-by: Eric Engestrom 

> +  return false;
> +
> +   dri2_dpy->loader_extensions = swrast_loader_extensions;
> +   return true;
>  }
>  
>  EGLBoolean
> @@ -364,9 +366,15 @@ dri2_initialize_surfaceless(_EGLDriver *drv, _EGLDisplay *disp)
>   "No hardware driver found, falling back to software rendering");
> }
>  
> -   if (!driver_loaded && !surfaceless_probe_device(disp, true)) {
> -  err = "DRI2: failed to load driver";
> -  goto cleanup;
> +   if (!driver_loaded)
> +  driver_loaded = surfaceless_probe_device(disp, true);
> +
> +   if (!driver_loaded) {
> +  _eglLog(_EGL_DEBUG, "Falling back to surfaceless swrast without DRM.");
> +  if (!surfaceless_probe_device_sw(disp)) {
> + err = "DRI2: failed to load driver";
> + goto cleanup;
> +  }
> }
>  
> if (!dri2_create_screen(disp)) {
> -- 
> 2.20.1
> 

Re: [Mesa-dev] [PATCH 1/2] isl: remove the cache line size alignment requirement

2019-02-18 Thread Lionel Landwerlin

On 18/02/2019 15:08, Chris Wilson wrote:
> Quoting Lionel Landwerlin (2019-02-18 15:06:15)
> > On 15/02/2019 14:43, Samuel Iglesias Gonsálvez wrote:
> > > There are formats which bpp are not aligned to a power-of-two and
> > > that can cause problems in the checks we do.
> > >
> > > The cacheline size was a requirement for using the BLT engine, which
> > > we don't use anymore except for a few things on old HW, so we drop it.
> > >
> > > Fixes CTS's CL#3500 test:
> > >
> > > dEQP-VK.api.image_clearing.core.clear_color_image.2d.linear.single_layer.r8g8b8_unorm
> > >
> > > Signed-off-by: Samuel Iglesias Gonsálvez 
> >
> > That looks good to me :
> > Reviewed-by: Lionel Landwerlin 
> >
> > I'm doing a CI run just to convince myself, so if you can wait for that.
>
> Is scanout a concern? The display engine also requires 64B alignment for
> linear.
> -Chris

Thanks for reminding us :)

I guess we need an additional check with

   if (isl_surf_usage_is_display(info->usage))
      base_alignment_B = MAX(base_alignment_B, 64);

-Lionel


Re: [Mesa-dev] [PATCH 1/2] isl: remove the cache line size alignment requirement

2019-02-18 Thread Chris Wilson
Quoting Lionel Landwerlin (2019-02-18 15:06:15)
> On 15/02/2019 14:43, Samuel Iglesias Gonsálvez wrote:
> > There are formats which bpp are not aligned to a power-of-two and
> > that can cause problems in the checks we do.
> >
> > The cacheline size was a requirement for using the BLT engine, which
> > we don't use anymore except for a few things on old HW, so we drop it.
> >
> > Fixes CTS's CL#3500 test:
> >
> > dEQP-VK.api.image_clearing.core.clear_color_image.2d.linear.single_layer.r8g8b8_unorm
> >
> > Signed-off-by: Samuel Iglesias Gonsálvez 
> 
> 
> That looks good to me :
> Reviewed-by: Lionel Landwerlin 
> 
> I'm doing a CI run just to convince myself, so if you can wait for that.

Is scanout a concern? The display engine also requires 64B alignment for
linear.
-Chris

Re: [Mesa-dev] [PATCH 1/2] isl: remove the cache line size alignment requirement

2019-02-18 Thread Lionel Landwerlin

On 15/02/2019 14:43, Samuel Iglesias Gonsálvez wrote:

There are formats which bpp are not aligned to a power-of-two and
that can cause problems in the checks we do.

The cacheline size was a requirement for using the BLT engine, which
we don't use anymore except for a few things on old HW, so we drop it.

Fixes CTS's CL#3500 test:

dEQP-VK.api.image_clearing.core.clear_color_image.2d.linear.single_layer.r8g8b8_unorm

Signed-off-by: Samuel Iglesias Gonsálvez 



That looks good to me :
Reviewed-by: Lionel Landwerlin 

I'm doing a CI run just to convince myself, so if you can wait for that.

Thanks,

-Lionel



---
  src/intel/isl/isl.c | 21 -
  1 file changed, 4 insertions(+), 17 deletions(-)

diff --git a/src/intel/isl/isl.c b/src/intel/isl/isl.c
index eaaa28014a3..7f1f2339931 100644
--- a/src/intel/isl/isl.c
+++ b/src/intel/isl/isl.c
@@ -1381,20 +1381,6 @@ isl_calc_row_pitch(const struct isl_device *dev,
 uint32_t alignment_B =
isl_calc_row_pitch_alignment(surf_info, tile_info);
  
-   /* If pitch isn't given and it can be chosen freely, align it by cache line
-* allowing one to use blit engine on the surface.
-*/
-   if (surf_info->row_pitch_B == 0 && tile_info->tiling == ISL_TILING_LINEAR) {
-  /* From the Broadwell PRM docs for XY_SRC_COPY_BLT::SourceBaseAddress:
-   *
-   *"Base address of the destination surface: X=0, Y=0. Lower 32bits
-   *of the 48bit addressing. When Src Tiling is enabled (Bit_15
-   *enabled), this address must be 4KB-aligned. When Tiling is not
-   *enabled, this address should be CL (64byte) aligned."
-   */
-  alignment_B = MAX2(alignment_B, 64);
-   }
-
 const uint32_t min_row_pitch_B =
isl_calc_min_row_pitch(dev, surf_info, tile_info, phys_total_el,
   alignment_B);
@@ -1527,12 +1513,13 @@ isl_surf_init_s(const struct isl_device *dev,
base_alignment_B = MAX(1, info->min_alignment_B);
if (info->usage & ISL_SURF_USAGE_RENDER_TARGET_BIT) {
   if (isl_format_is_yuv(info->format)) {
-base_alignment_B = MAX(base_alignment_B, fmtl->bpb / 4);
+base_alignment_B = isl_align_npot(base_alignment_B, fmtl->bpb / 4);
   } else {
-base_alignment_B = MAX(base_alignment_B, fmtl->bpb / 8);
+base_alignment_B = isl_align_npot(base_alignment_B, fmtl->bpb / 8);
   }
+  } else {
+ base_alignment_B = isl_round_up_to_power_of_two(base_alignment_B);
}
-  base_alignment_B = isl_round_up_to_power_of_two(base_alignment_B);
 } else {
const uint32_t total_h_tl =
   isl_align_div(phys_total_el.h, tile_info.logical_extent_el.height);




Re: [Mesa-dev] [PATCH 2/2] anv/image: fix offset's alignment to the surface alignment

2019-02-18 Thread Lionel Landwerlin

On 15/02/2019 14:43, Samuel Iglesias Gonsálvez wrote:

Signed-off-by: Samuel Iglesias Gonsálvez 



Hey Samuel,


Thanks for this change. Would you mind changing the align_u32 in the 
if() branch too?


It won't fix anything but that's just to be consistent.


With that :

Reviewed-by: Lionel Landwerlin 



---
  src/intel/vulkan/anv_image.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/intel/vulkan/anv_image.c b/src/intel/vulkan/anv_image.c
index 3999c7399d0..f4a65044a3b 100644
--- a/src/intel/vulkan/anv_image.c
+++ b/src/intel/vulkan/anv_image.c
@@ -142,7 +142,7 @@ add_surface(struct anv_image *image, struct anv_surface *surf, uint32_t plane)
 surf->isl.alignment_B);
/* Plane offset is always 0 when it's disjoint. */
 } else {
-  surf->offset = align_u32(image->size, surf->isl.alignment_B);
+  surf->offset = util_align_npot(image->size, surf->isl.alignment_B);
/* Determine plane's offset only once when the first surface is added. */
if (image->planes[plane].size == 0)
   image->planes[plane].offset = image->size;
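
For context on why the helper matters here: a mask-based round-up only works when the alignment is a power of two, while formats such as r8g8b8 can yield alignments like 3. The sketch below contrasts the two approaches. The function names are hypothetical and only mirror the idea; they are not the Mesa implementations of align_u32() or util_align_npot().

```c
#include <assert.h>
#include <stdint.h>

/* Mask-based round-up: only correct when alignment is a power of two. */
uint32_t align_pot_sketch(uint32_t value, uint32_t alignment)
{
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   return (value + alignment - 1) & ~(alignment - 1);
}

/* Remainder-based round-up: correct for any non-zero alignment. */
uint32_t align_npot_sketch(uint32_t value, uint32_t alignment)
{
   assert(alignment != 0);
   uint32_t rem = value % alignment;
   return rem ? value + (alignment - rem) : value;
}
```

As an illustration, applying the mask formula with an alignment of 3 to the value 4 gives (4 + 2) & ~2 = 4, which is not a multiple of 3, whereas align_npot_sketch(4, 3) returns 6.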




Re: [Mesa-dev] [PATCH 00/38] radv, ac: 16-bit and 8-bit arithmetic and 8-bit storage

2019-02-18 Thread Rhys Perry
The CTS is buggy: the input_output_float_64_to_16 tests are run even
though they use an unadvertised (and unimplemented) optional feature.
Some of them crash for unrelated reasons, though: load_tess_varyings()
in ac_nir_to_llvm.c doesn't handle 64-bit varyings. So not all of
them would work even if VK_FORMAT_R64_SFLOAT were an implemented vertex
format.

On Mon, 18 Feb 2019 at 08:53, Samuel Pitoiset  wrote:
>
>
> On 2/16/19 1:21 AM, Rhys Perry wrote:
> > This series adds support for:
> > - VK_KHR_shader_float16_int8
> > - VK_AMD_gpu_shader_half_float
> > - VK_AMD_gpu_shader_int16
> > - VK_KHR_8bit_storage
> > on VI+. Half floats are disabled on LLVM 7 because of a bug causing large
> > memory usage and long (or unbounded) compilation times with some CTS
> > tests.
> >
> > It is written against the following patch series:
> > - https://patchwork.freedesktop.org/series/53454/ (v4)
> > - https://patchwork.freedesktop.org/series/53660/ (v1)
> >
> > With LLVM 9, there are no reproducible Vulkan CTS regressions with Vega
> > and VI except for
> > dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.input_output_float_64_to_16.*
> > which fails or crashes because of unrelated radv bugs with 64-bit varyings
> > and because the tests use VK_FORMAT_R64_SFLOAT as a vertex format even
> > though radv does not support it.
>
> test bug?
>
> The two NIR related patches (22 and 25) should be sent separately,
> otherwise people working on NIR might miss them.
>
> >
> > With LLVM 9, there are no reproducible piglit regressions except for
> > glsl-array-bounds-12.shader_test because of a LLVM bug when
> > SLP vectorization is enabled.
> >
> > With LLVM 8, there are no reproducible Vulkan CTS regressions with Vega
> > and VI except for those with LLVM 9 and a couple of tests because of a
> > LLVM bug after the SLP vectorizer and with the current lack of fallback
> > for 16-bit interpolation on LLVM versions before LLVM 9.
> >
> > With LLVM 7, there are no reproducible Vulkan CTS regressions with Vega
> > and VI except for those with LLVM 9 and a couple of tests because of a
> > LLVM bug after the SLP vectorizer.
> >
> > The SLP vectorization patch is marked as WIP because it exposes LLVM bugs
> > with piglit's glsl-array-bounds-12.shader_test, some Vulkan CTS tests and
> > some shader-db test for a game I can't remember. It also over-vectorizes
> > 32-bit code which can cause significant worsening in generated code
> > quality.
> >
> > The 16-bit interpolation patch is marked as WIP because it currently
> > requires intrinsics only available in LLVM 9 and does not have a fallback.
> >
> > A branch on Github containing this series can be found at:
> > https://github.com/pendingchaos/mesa/commits/radv_fp16_int16_int8_v2
> >
> > v2: rebase
> > v2: implement 16-bit interpolation
> > v2: move LLVMAddSLPVectorizePass to after LLVMAddEarlyCSEMemSSAPass
> > v2: run vectorization unconditionally on GFX9 and later
> > v2: remove ac_get_one(), ac_get_zero(), ac_get_onef() and ac_get_zerof()
> > v2: remove ac_int_of_size()
> > v2: fix 64-bit visit_load_var()
> > v2: mark VK_KHR_8bit_storage as DONE in features.txt
> > v2: mark SLP vectorization patch as WIP
> > v2: fix C++ style comment
> >
> > Rhys Perry (41):
> >radv: bitcast 16-bit outputs to integers
> >radv: ensure export arguments are always float
> >ac: add various helpers for float16/int16/int8
> >ac/nir: implement 8-bit push constant, ssbo and ubo loads
> >ac/nir: implement 8-bit ssbo stores
> >ac/nir: fix 16-bit ssbo stores
> >ac/nir: implement 8-bit nir_load_const_instr
> >ac/nir: implement 8-bit conversions
> >ac/nir: fix 64-bit nir_op_f2f16_rtz
> >ac/nir: make ac_build_clamp work on all bit sizes
> >ac/nir: make ac_build_fract work on all bit sizes
> >ac/nir: make ac_build_isign work on all bit sizes
> >ac/nir: make ac_build_fsign work on all bit sizes
> >ac/nir: make ac_build_fdiv support 16-bit floats
> >ac/nir: implement half-float nir_op_frcp
> >ac/nir: implement half-float nir_op_frsq
> >ac/nir: implement half-float nir_op_ldexp
> >radv: lower 16-bit flrp
> >ac/nir: support half floats in emit_b2f
> >ac/nir: make emit_b2i work on all bit sizes
> >ac/nir: implement 16-bit shifts
> >compiler/nir: add lowering option for 16-bit ffma
> >ac/nir: implement 16-bit ac_build_ddxy
> >ac/nir: implement 8 and 16 bit ac_build_readlane
> >nir: make bitfield_reverse and ifind_msb work with all integers
> >ac/nir: make ac_find_lsb work on all bit sizes
> >ac/nir: make ac_build_umsb work on all bit sizes
> >ac/nir: implement 8 and 16 bit ac_build_imsb
> >ac/nir: make ac_build_bit_count work on all bit sizes
> >ac/nir: make ac_build_bitfield_reverse work on all bit sizes
> >ac/nir: implement 16-bit pack/unpack opcodes
> >ac/nir: add 8-bit types to glsl_base_to_llvm_type
> >ac/nir,radv: create an array of varying 

Re: [Mesa-dev] [PATCH v2 06/41] ac/nir: fix 16-bit ssbo stores

2019-02-18 Thread Rhys Perry
I don't see a 16-bit version of tbuffer.store in IntrinsicsAMDGPU.td
and simply changing "llvm.amdgcn.tbuffer.store.i32" to
"llvm.amdgcn.tbuffer.store.i16" and removing the zext doesn't seem to
work.

On Mon, 18 Feb 2019 at 08:55, Samuel Pitoiset  wrote:
>
> Does this fix anything known? There is a 16-bit version of tbuffer.store,
> maybe we should use it?
>
> On 2/16/19 1:21 AM, Rhys Perry wrote:
> > Signed-off-by: Rhys Perry 
> > ---
> >   src/amd/common/ac_nir_to_llvm.c | 2 ++
> >   1 file changed, 2 insertions(+)
> >
> > diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
> > index 89a78b43c6f..b260142c177 100644
> > --- a/src/amd/common/ac_nir_to_llvm.c
> > +++ b/src/amd/common/ac_nir_to_llvm.c
> > @@ -1586,6 +1586,8 @@ static void visit_store_ssbo(struct ac_nir_context *ctx,
> >   } else if (num_bytes == 2) {
> >   store_name = "llvm.amdgcn.tbuffer.store.i32";
> >   data_type = ctx->ac.i32;
> > + data = LLVMBuildBitCast(ctx->ac.builder, data, ctx->ac.i16, "");
> > + data = LLVMBuildZExt(ctx->ac.builder, data, data_type, "");
> >   LLVMValueRef tbuffer_params[] = {
> >   data,
> >   rsrc,
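
As a loose C analogy for the two added lines (a bitcast to a 16-bit integer followed by a zero-extension to 32 bits), the sketch below reinterprets two bytes as raw bits and widens them with the upper half cleared. widen_16bit_payload() is a hypothetical name, and this only illustrates the bit-level effect; it is not how the LLVM IR is built.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 2-byte value as raw bits (the "bitcast"), then widen it
 * to 32 bits with the upper 16 bits cleared (the "zext"). */
uint32_t widen_16bit_payload(const void *two_bytes)
{
   assert(two_bytes != NULL);
   uint16_t bits;
   memcpy(&bits, two_bytes, sizeof(bits));
   return (uint32_t)bits;
}
```

The point of zero-extending rather than sign-extending is that a payload with its top bit set, such as a negative int16_t, still ends up with clear upper bits in the 32-bit value handed to the store.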

[Mesa-dev] [PATCH 3/3] i965: Take responsibility for context recovery after any GPU hang

2019-02-18 Thread Chris Wilson
To make wedging even more likely, we use a new "no recovery" context
parameter that tells the kernel to not even attempt to replay any
batches in flight against the default context image, as experience shows
the HW is not always robust enough to cope with the conflicting state.
This allows us to always take over responsibility of rebuilding the lost
context following a GPU hang.

Cc: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_bufmgr.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.c b/src/mesa/drivers/dri/i965/brw_bufmgr.c
index 1248f8b9fa4..a0cbc315b60 100644
--- a/src/mesa/drivers/dri/i965/brw_bufmgr.c
+++ b/src/mesa/drivers/dri/i965/brw_bufmgr.c
@@ -1589,6 +1589,28 @@ init_cache_buckets(struct brw_bufmgr *bufmgr)
}
 }
 
+static void init_context(struct brw_bufmgr *bufmgr, uint32_t ctx_id)
+{
+   /*
+* Upon declaring a GPU hang, the kernel will zap the guilty context
+* back to the default logical HW state and attempt to continue on to
+* our next submitted batchbuffer. However, we only send incremental
+* logical state (e.g. we only ever setup invariant register state
+* once in brw_initial_gpu_upload()) and so attempting to replay the
+* next batchbuffer without the correct logical state can be fatal.
+* Here we tell the kernel not to attempt to recover our context but
+* immediately (on the next batchbuffer submission) report that the
+* context is lost, and we will do the recovery ourselves -- 2 lost
+* batches instead of a continual stream until we are banned, or the
+* machine is dead.
+*/
+   struct drm_i915_gem_context_param p = {
+  .ctx_id = ctx_id,
+  .param = I915_CONTEXT_PARAM_RECOVERABLE,
+   };
+   drmIoctl(bufmgr->fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &p);
+}
+
 uint32_t
 brw_create_hw_context(struct brw_bufmgr *bufmgr)
 {
@@ -1599,6 +1621,8 @@ brw_create_hw_context(struct brw_bufmgr *bufmgr)
   return 0;
}
 
+   init_context(bufmgr, create.ctx_id);
+
return create.ctx_id;
 }
 
-- 
2.20.1


[Mesa-dev] [PATCH 2/3] drm-uapi: Update i915_drm.h for I915_CONTEXT_PARAM_RECOVERABLE

2019-02-18 Thread Chris Wilson
XXX Not in drm-next XXX

Pull i915_drm.h to include

commit ba4fda620a5f7db521aa9e0262cf49854c1b1d9c (HEAD -> drm-intel-next-queued, drm-intel/drm-intel-next-queued)
Author: Chris Wilson 
Date:   Mon Feb 18 10:58:21 2019 +

drm/i915: Optionally disable automatic recovery after a GPU reset

for improved resilience in handling GPU hangs.
---
 include/drm-uapi/i915_drm.h | 114 
 1 file changed, 114 insertions(+)

diff --git a/include/drm-uapi/i915_drm.h b/include/drm-uapi/i915_drm.h
index 16e452aa12d..43fb8ede2fe 100644
--- a/include/drm-uapi/i915_drm.h
+++ b/include/drm-uapi/i915_drm.h
@@ -412,6 +412,14 @@ typedef struct drm_i915_irq_wait {
int irq_seq;
 } drm_i915_irq_wait_t;
 
+/*
+ * Different modes of per-process Graphics Translation Table,
+ * see I915_PARAM_HAS_ALIASING_PPGTT
+ */
+#define I915_GEM_PPGTT_NONE0
+#define I915_GEM_PPGTT_ALIASING1
+#define I915_GEM_PPGTT_FULL2
+
 /* Ioctl to query kernel params:
  */
 #define I915_PARAM_IRQ_ACTIVE1
@@ -529,6 +537,28 @@ typedef struct drm_i915_irq_wait {
  */
 #define I915_PARAM_CS_TIMESTAMP_FREQUENCY 51
 
+/*
+ * Once upon a time we supposed that writes through the GGTT would be
+ * immediately in physical memory (once flushed out of the CPU path). However,
+ * on a few different processors and chipsets, this is not necessarily the case
+ * as the writes appear to be buffered internally. Thus a read of the backing
+ * storage (physical memory) via a different path (with different physical tags
+ * to the indirect write via the GGTT) will see stale values from before
+ * the GGTT write. Inside the kernel, we can for the most part keep track of
+ * the different read/write domains in use (e.g. set-domain), but the assumption
+ * of coherency is baked into the ABI, hence reporting its true state in this
+ * parameter.
+ *
+ * Reports true when writes via mmap_gtt are immediately visible following an
+ * lfence to flush the WCB.
+ *
+ * Reports false when writes via mmap_gtt are indeterminately delayed in an in
+ * internal buffer and are _not_ immediately visible to third parties accessing
+ * directly via mmap_cpu/mmap_wc. Use of mmap_gtt as part of an IPC
+ * communications channel when reporting false is strongly disadvised.
+ */
+#define I915_PARAM_MMAP_GTT_COHERENT   52
+
 typedef struct drm_i915_getparam {
__s32 param;
/*
@@ -1456,9 +1486,93 @@ struct drm_i915_gem_context_param {
 #define   I915_CONTEXT_MAX_USER_PRIORITY   1023 /* inclusive */
 #define   I915_CONTEXT_DEFAULT_PRIORITY    0
 #define   I915_CONTEXT_MIN_USER_PRIORITY   -1023 /* inclusive */
+   /*
+* When using the following param, value should be a pointer to
+* drm_i915_gem_context_param_sseu.
+*/
+#define I915_CONTEXT_PARAM_SSEU        0x7
+
+/*
+ * Not all clients may want to attempt automatic recover of a context after
+ * a hang (for example, some clients may only submit very small incremental
+ * batches relying on known logical state of previous batches which will never
+ * recover correctly and each attempt will hang), and so would prefer that
+ * the context is forever banned instead.
+ *
+ * If set to false (0), after a reset, subsequent (and in flight) rendering
+ * from this context is discarded, and the client will need to create a new
+ * context to use instead.
+ *
+ * If set to true (1), the kernel will automatically attempt to recover the
+ * context by skipping the hanging batch and executing the next batch starting
+ * from the default context state (discarding the incomplete logical context
+ * state lost due to the reset).
+ *
+ * On creation, all new contexts are marked as recoverable.
+ */
+#define I915_CONTEXT_PARAM_RECOVERABLE 0x8
__u64 value;
 };
 
+/**
+ * Context SSEU programming
+ *
+ * It may be necessary for either functional or performance reason to configure
+ * a context to run with a reduced number of SSEU (where SSEU stands for Slice/
+ * Sub-slice/EU).
+ *
+ * This is done by configuring SSEU configuration using the below
+ * @struct drm_i915_gem_context_param_sseu for every supported engine which
+ * userspace intends to use.
+ *
+ * Not all GPUs or engines support this functionality in which case an error
+ * code -ENODEV will be returned.
+ *
+ * Also, flexibility of possible SSEU configuration permutations varies between
+ * GPU generations and software imposed limitations. Requesting such a
+ * combination will return an error code of -EINVAL.
+ *
+ * NOTE: When perf/OA is active the context's SSEU configuration is ignored in
+ * favour of a single global setting.
+ */
+struct drm_i915_gem_context_param_sseu {
+   /*
+* Engine class & instance to be configured or queried.
+*/
+   __u16 engine_class;
+   __u16 engine_instance;
+
+   /*
+* Unused for now. Must be cleared to zero.
+*/
+   __u32 flags;
+
+   /*
+* Mask of slices 

[Mesa-dev] [PATCH 1/3] i965: Be resilient in the face of GPU hangs

2019-02-18 Thread Chris Wilson
If we hang the GPU and end up banning our context, we will no longer be
able to submit and abort with an error (exit(1) no less). As we submit
minimal incremental batches that rely on the logical context state of
previous batches, we can not rely on the kernel's recovery mechanism
which tries to restore the context back to a "golden" renderstate (the
default HW setup) and replay the batches in flight. Instead, we must
create a new context and set it up, including all the lost register
settings that we only apply once during setup, before allowing the user to
continue rendering. The batches already submitted are lost
(unrecoverable) so there will be a momentary glitch and lost rendering
across frames, but the application should be able to recover and
continue on fairly oblivious.

To make wedging even more likely, we use a new "no recovery" context
parameter that tells the kernel to not even attempt to replay any
batches in flight against the default context image, as experience shows
the HW is not always robust enough to cope with the conflicting state.
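
For illustration only, the "no recovery" request could be sketched as
below. The struct and function names are hypothetical stand-ins so that the
snippet compiles without kernel headers; the parameter number 0x8 and the
meaning of a zero value are taken from the uapi patch in this series. A real
driver would presumably fill a struct drm_i915_gem_context_param and hand it
to drmIoctl() with DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM.

```c
#include <assert.h>
#include <stdint.h>

/* Parameter number as defined by the uapi patch in this series. */
#define SKETCH_CONTEXT_PARAM_RECOVERABLE 0x8

/* Hypothetical mirror of the context-param fields used here; a stand-in
 * for struct drm_i915_gem_context_param, not the real definition. */
struct sketch_context_param {
   uint32_t ctx_id;
   uint32_t size;
   uint64_t param;
   uint64_t value;
};

/* Build the request that opts a context out of automatic recovery. */
static struct sketch_context_param
sketch_no_recovery_request(uint32_t ctx_id)
{
   struct sketch_context_param p = {
      .ctx_id = ctx_id,
      .param = SKETCH_CONTEXT_PARAM_RECOVERABLE,
      .value = 0, /* false: discard rendering after a reset, do not replay */
   };
   return p;
}
```

New contexts default to recoverable, so a driver following this approach
would need to issue the request once for every context it creates,
including the replacement context built after a hang.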

v2: Export brw_reset_state() to improve the amount of state we clobber
on return to a starting context. (Kenneth)

Cc: Kenneth Graunke 
Reviewed-by: Kenneth Graunke  # pre-uapi split
---
 src/mesa/drivers/dri/i965/brw_bufmgr.c| 11 +
 src/mesa/drivers/dri/i965/brw_bufmgr.h|  1 +
 src/mesa/drivers/dri/i965/brw_context.h   |  3 +++
 src/mesa/drivers/dri/i965/brw_state_upload.c  | 22 ++
 src/mesa/drivers/dri/i965/intel_batchbuffer.c | 23 +++
 5 files changed, 56 insertions(+), 4 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.c 
b/src/mesa/drivers/dri/i965/brw_bufmgr.c
index b33a30930db..1248f8b9fa4 100644
--- a/src/mesa/drivers/dri/i965/brw_bufmgr.c
+++ b/src/mesa/drivers/dri/i965/brw_bufmgr.c
@@ -1621,6 +1621,17 @@ brw_hw_context_set_priority(struct brw_bufmgr *bufmgr,
return err;
 }
 
+int
+brw_hw_context_get_priority(struct brw_bufmgr *bufmgr, uint32_t ctx_id)
+{
+   struct drm_i915_gem_context_param p = {
+  .ctx_id = ctx_id,
+  .param = I915_CONTEXT_PARAM_PRIORITY,
+   };
+   drmIoctl(bufmgr->fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &p);
+   return p.value; /* on error, return 0 i.e. default priority */
+}
+
 void
 brw_destroy_hw_context(struct brw_bufmgr *bufmgr, uint32_t ctx_id)
 {
diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.h 
b/src/mesa/drivers/dri/i965/brw_bufmgr.h
index 32fc7a553c9..9e80c2a831b 100644
--- a/src/mesa/drivers/dri/i965/brw_bufmgr.h
+++ b/src/mesa/drivers/dri/i965/brw_bufmgr.h
@@ -356,6 +356,7 @@ uint32_t brw_create_hw_context(struct brw_bufmgr *bufmgr);
 int brw_hw_context_set_priority(struct brw_bufmgr *bufmgr,
 uint32_t ctx_id,
 int priority);
+int brw_hw_context_get_priority(struct brw_bufmgr *bufmgr, uint32_t ctx_id);
 
 void brw_destroy_hw_context(struct brw_bufmgr *bufmgr, uint32_t ctx_id);
 
diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index 66fe5b3a8a0..4a306c4217a 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -1647,6 +1647,9 @@ brw_get_graphics_reset_status(struct gl_context *ctx);
 void
 brw_check_for_reset(struct brw_context *brw);
 
+void
+brw_reset_state(struct brw_context *brw);
+
 /* brw_compute.c */
 extern void
 brw_init_compute_functions(struct dd_function_table *functions);
diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c 
b/src/mesa/drivers/dri/i965/brw_state_upload.c
index 50049d325b3..b873cf1b58a 100644
--- a/src/mesa/drivers/dri/i965/brw_state_upload.c
+++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
@@ -228,12 +228,8 @@ brw_copy_pipeline_atoms(struct brw_context *brw,
 
 void brw_init_state( struct brw_context *brw )
 {
-   struct gl_context *ctx = &brw->ctx;
const struct gen_device_info *devinfo = &brw->screen->devinfo;
 
-   /* Force the first brw_select_pipeline to emit pipeline select */
-   brw->last_pipeline = BRW_NUM_PIPELINES;
-
brw_init_caches(brw);
 
if (devinfo->gen >= 11)
@@ -257,6 +253,17 @@ void brw_init_state( struct brw_context *brw )
else
   gen4_init_atoms(brw);
 
+   brw_reset_state(brw);
+}
+
+
+void brw_reset_state( struct brw_context *brw )
+{
+   struct gl_context *ctx = &brw->ctx;
+
+   /* Force the first brw_select_pipeline to emit pipeline select */
+   brw->last_pipeline = BRW_NUM_PIPELINES;
+
brw_upload_initial_gpu_state(brw);
 
brw->NewGLState = ~0;
@@ -267,6 +274,13 @@ void brw_init_state( struct brw_context *brw )
 */
brw->pma_stall_bits = ~0;
 
+   brw->no_depth_or_stencil = false;
+
+   brw->urb.vsize = 0;
+   brw->urb.gsize = 0;
+   brw->urb.hsize = 0;
+   brw->urb.dsize = 0;
+
/* Make sure that brw->ctx.NewDriverState has enough bits to hold all possible
 * dirty flags.
 */
diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c 

[Mesa-dev] [Bug 108949] RADV: Subgroup codegen is sub-optimal

2019-02-18 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=108949

Connor Abbott  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #4 from Connor Abbott  ---
Yes, the LLVM patch to re-enable the DPP combining pass landed recently:
https://github.com/llvm-mirror/llvm/commit/a0ecdf4bba1ba47b4dd8550c5a8c4a3a9183832d#diff-ad4812397731e1d4ff6992207b4d38fa

So neither of these should be issues anymore.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 108949] RADV: Subgroup codegen is sub-optimal

2019-02-18 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=108949

--- Comment #3 from Samuel Pitoiset  ---
Can this be closed now?

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.

[Mesa-dev] [Bug 109597] wreckfest issues with transparent objects & skybox

2019-02-18 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=109597

Samuel Pitoiset  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #1 from Samuel Pitoiset  ---
Should be fixed with
https://cgit.freedesktop.org/mesa/mesa/commit/?id=0d8f09629377da9cf48ab4315574d69fdef5369d

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.

[Mesa-dev] [Bug 109647] /usr/include/xf86drm.h:40:10: fatal error: drm.h: No such file or directory

2019-02-18 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=109647

Michel Dänzer  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #1 from Michel Dänzer  ---


*** This bug has been marked as a duplicate of bug 109645 ***

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.

[Mesa-dev] [Bug 109645] build error on arm64: tegra_screen.c:33: /usr/include/xf86drm.h:41:10: fatal error: drm.h: No such file or directory

2019-02-18 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=109645

Michel Dänzer  changed:

   What|Removed |Added

 CC||v...@freedesktop.org

--- Comment #1 from Michel Dänzer  ---
*** Bug 109647 has been marked as a duplicate of this bug. ***

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.

Re: [Mesa-dev] [PATCH 2/2] ac: use new LLVM 8 intrinsic when loading 16-bit values

2019-02-18 Thread Bas Nieuwenhuizen
Reviewed-by: Bas Nieuwenhuizen 

for both.

On Thu, Feb 14, 2019 at 2:39 PM Samuel Pitoiset
 wrote:
>
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/amd/common/ac_llvm_build.c | 41 ++
>  1 file changed, 27 insertions(+), 14 deletions(-)
>
> diff --git a/src/amd/common/ac_llvm_build.c b/src/amd/common/ac_llvm_build.c
> index 3acf41728ac..867a13622f9 100644
> --- a/src/amd/common/ac_llvm_build.c
> +++ b/src/amd/common/ac_llvm_build.c
> @@ -1347,20 +1347,33 @@ ac_build_tbuffer_load_short(struct ac_llvm_context 
> *ctx,
> LLVMValueRef immoffset,
> LLVMValueRef glc)
>  {
> -   const char *name = "llvm.amdgcn.tbuffer.load.i32";
> -   LLVMTypeRef type = ctx->i32;
> -   LLVMValueRef params[] = {
> -   rsrc,
> -   vindex,
> -   voffset,
> -   soffset,
> -   immoffset,
> -   LLVMConstInt(ctx->i32, 
> V_008F0C_BUF_DATA_FORMAT_16, false),
> -   LLVMConstInt(ctx->i32, 
> V_008F0C_BUF_NUM_FORMAT_UINT, false),
> -   glc,
> -   ctx->i1false,
> -   };
> -   LLVMValueRef res = ac_build_intrinsic(ctx, name, type, params, 9, 0);
> +   unsigned dfmt = V_008F0C_BUF_DATA_FORMAT_16;
> +   unsigned nfmt = V_008F0C_BUF_NUM_FORMAT_UINT;
> +   LLVMValueRef res;
> +
> +   if (HAVE_LLVM >= 0x0800) {
> +   voffset = LLVMBuildAdd(ctx->builder, voffset, immoffset, "");
> +
> +   res = ac_build_llvm8_tbuffer_load(ctx, rsrc, vindex, voffset,
> + soffset, 1, dfmt, nfmt, glc,
> + false, true, true);
> +   } else {
> +   const char *name = "llvm.amdgcn.tbuffer.load.i32";
> +   LLVMTypeRef type = ctx->i32;
> +   LLVMValueRef params[] = {
> +   rsrc,
> +   vindex,
> +   voffset,
> +   soffset,
> +   immoffset,
> +   LLVMConstInt(ctx->i32, dfmt, false),
> +   LLVMConstInt(ctx->i32, nfmt, false),
> +   glc,
> +   ctx->i1false,
> +   };
> +   res = ac_build_intrinsic(ctx, name, type, params, 9, 0);
> +   }
> +
> return LLVMBuildTrunc(ctx->builder, res, ctx->i16, "");
>  }
>
> --
> 2.20.1
>

Re: [Mesa-dev] [PATCH] radv: write the alpha channel of MRT0 when alpha coverage is enabled

2019-02-18 Thread Bas Nieuwenhuizen
Reviewed-by: Bas Nieuwenhuizen 

On Fri, Feb 15, 2019 at 6:00 PM Samuel Pitoiset
 wrote:
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109597
> Cc: 18.3 19.0 
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/amd/vulkan/radv_pipeline.c | 8 
>  1 file changed, 8 insertions(+)
>
> diff --git a/src/amd/vulkan/radv_pipeline.c b/src/amd/vulkan/radv_pipeline.c
> index 9745a1f2aa7..6b54da2e31b 100644
> --- a/src/amd/vulkan/radv_pipeline.c
> +++ b/src/amd/vulkan/radv_pipeline.c
> @@ -511,6 +511,13 @@ radv_pipeline_compute_spi_color_formats(struct 
> radv_pipeline *pipeline,
>
> if (subpass->color_attachments[i].attachment == 
> VK_ATTACHMENT_UNUSED) {
> cf = V_028714_SPI_SHADER_ZERO;
> +
> +   if (blend->need_src_alpha & (1 << i)) {
> +   /* Write the alpha channel of MRT0 when alpha 
> coverage is
> +* enabled because the depth attachment needs 
> it.
> +*/
> +   col_format |= V_028714_SPI_SHADER_32_ABGR;
> +   }
> } else {
> struct radv_render_pass_attachment *attachment = 
> pass->attachments + subpass->color_attachments[i].attachment;
> bool blend_enable =
> @@ -689,6 +696,7 @@ radv_pipeline_init_blend_state(struct radv_pipeline 
> *pipeline,
>
> if (vkms && vkms->alphaToCoverageEnable) {
> blend.db_alpha_to_mask |= S_028B70_ALPHA_TO_MASK_ENABLE(1);
> +   blend.need_src_alpha |= 0x1;
> }
>
> blend.cb_target_mask = 0;
> --
> 2.20.1
>

Re: [Mesa-dev] [PATCH v4 00/40] intel: VK_KHR_shader_float16_int8 implementation

2019-02-18 Thread Iago Toral
FWIW, the is_partial_write() patch is not strictly required, so I think
it would be okay to not merge it until someone has a bit more time to
think about it if we are not sure that this is the right way to go. We
would lose copy propagation for SIMD8 dispatches, which is not great,
but it is probably something we can live with temporarily until we have put in
the time to think about this properly. Even if we decide that it is the
right approach, we could probably still work a bit more on it to better
understand which of the opt passes that rely on that functionality
should be using the new helper.
Iago
On Sat, 2019-02-16 at 09:59 -0600, Jason Ekstrand wrote:
> I believe I've now reviewed everything except some of the validator
> patches and the is_partial_write() patch.  The validator patches I'm
> hoping Matt or Curro can look at.  For the is_partial_write() patch,
> I just need to convince myself that it doesn't make the compiler
> significantly more bogus than it already is today.
> 
> On Tue, Feb 12, 2019 at 5:57 AM
> Iago Toral Quiroga  wrote:
> > The changes in this version address review feedback to v3. The most
> > significant changes include:
> >
> > 1. A more generic constant combining pass that can handle more
> > constant types (not just F and HF) requested by Jason.
> >
> > 2. The addition of assembly validation for half-float restrictions,
> > and also for mixed float mode, requested by Curro. It should be noted
> > that this implementation of VK_KHR_shader_float16_int8 does not emit
> > any mixed mode float instructions at this moment so I have not
> > empirically validated the restrictions implemented here.
> >
> > As always, a branch with these patches is available for testing in
> > the itoral/VK_KHR_shader_float16_int8 branch of the Igalia Mesa
> > repository at https://github.com/Igalia/mesa.
> >
> > Iago Toral Quiroga (40):
> >   compiler/nir: add an is_conversion field to nir_op_info
> >   intel/compiler: add a NIR pass to lower conversions
> >   intel/compiler: split float to 64-bit opcodes from int to 64-bit
> >   intel/compiler: handle b2i/b2f with other integer conversion opcodes
> >   intel/compiler: assert restrictions on conversions to half-float
> >   intel/compiler: lower some 16-bit float operations to 32-bit
> >   intel/compiler: handle extended math restrictions for half-float
> >   intel/compiler: implement 16-bit fsign
> >   intel/compiler: drop unnecessary temporary from 32-bit fsign implementation
> >   compiler/nir: add lowering option for 16-bit fmod
> >   compiler/nir: add lowering for 16-bit flrp
> >   compiler/nir: add lowering for 16-bit ldexp
> >   intel/compiler: add instruction setters for Src1Type and Src2Type.
> >   intel/compiler: add new half-float register type for 3-src instructions
> >   intel/compiler: don't compact 3-src instructions with Src1Type or Src2Type bits
> >   intel/compiler: allow half-float on 3-source instructions since gen8
> >   intel/compiler: set correct precision fields for 3-source float instructions
> >   intel/compiler: fix ddx and ddy for 16-bit float
> >   intel/compiler: fix ddy for half-float in Broadwell
> >   intel/compiler: workaround for SIMD8 half-float MAD in gen8
> >   intel/compiler: split is_partial_write() into two variants
> >   intel/compiler: activate 16-bit bit-size lowerings also for 8-bit
> >   intel/compiler: rework conversion opcodes
> >   intel/compiler: implement isign for int8
> >   intel/compiler: ask for an integer type if requesting an 8-bit type
> >   intel/eu: force stride of 2 on NULL register for Byte instructions
> >   intel/compiler: generalize the combine constants pass
> >   intel/compiler: implement is_zero, is_one, is_negative_one for 8-bit/16-bit
> >   intel/compiler: add a brw_reg_type_is_integer helper
> >   intel/compiler: fix cmod propagation for non 32-bit types
> >   intel/compiler: remove inexact algebraic optimizations from the backend
> >   intel/compiler: skip MAD algebraic optimization for half-float or mixed mode
> >   intel/compiler: also set F execution type for mixed float mode in BDW
> >   intel/compiler: validate region restrictions for half-float conversions
> >   intel/compiler: validate conversions between 64-bit and 8-bit types
> >   intel/compiler: skip validating restrictions on operand types for mixed float
> >   intel/compiler: validate region restrictions for mixed float mode
> >   compiler/spirv: move the check for Int8 capability
> >   anv/pipeline: support Float16 and Int8 SPIR-V capabilities in gen8+
> >   anv/device: 

Re: [Mesa-dev] [PATCH v4 35/40] intel/compiler: validate conversions between 64-bit and 8-bit types

2019-02-18 Thread Iago Toral
On Sat, 2019-02-16 at 09:42 -0600, Jason Ekstrand wrote:
> On Tue, Feb 12, 2019 at 5:56 AM Iago Toral Quiroga  > wrote:
> > ---
> > 
> >  src/intel/compiler/brw_eu_validate.c| 10 +-
> > 
> >  src/intel/compiler/test_eu_validate.cpp | 46
> > +
> > 
> >  2 files changed, 55 insertions(+), 1 deletion(-)
> > 
> > 
> > 
> > diff --git a/src/intel/compiler/brw_eu_validate.c
> > b/src/intel/compiler/brw_eu_validate.c
> > 
> > index 203641fecb9..b1fdd1ce941 100644
> > 
> > --- a/src/intel/compiler/brw_eu_validate.c
> > 
> > +++ b/src/intel/compiler/brw_eu_validate.c
> > 
> > @@ -533,10 +533,18 @@
> > general_restrictions_based_on_operand_types(const struct
> > gen_device_info *devinf
> > 
> > 
> > 
> > /* From the BDW+ PRM:
> > 
> >  *
> > 
> > -*"There is no direct conversion from HF to DF or DF to HF.
> > 
> > +*"There is no direct conversion from B/UB to DF or DF to
> > B/UB.
> > 
> > +* There is no direct conversion from B/UB to Q/UQ or Q/UQ
> > to B/UB.
> > 
> > +* There is no direct conversion from HF to DF or DF to HF.
> > 
> >  * There is no direct conversion from HF to Q/UQ or Q/UQ to
> > HF."
> > 
> >  */
> > 
> > enum brw_reg_type src0_type = brw_inst_src0_type(devinfo,
> > inst);
> > 
> > +
> > 
> > +   ERROR_IF(brw_inst_opcode(devinfo, inst) == BRW_OPCODE_MOV &&
> > 
> > +((dst_type_size == 1 && type_sz(src0_type) == 8) ||
> > 
> > + (dst_type_size == 8 && type_sz(src0_type) == 1)),
> > 
> > +"There are no direct conversion between 64-bit types
> > and B/UB");
> > 
> > +
> > 
> > ERROR_IF(brw_inst_opcode(devinfo, inst) == BRW_OPCODE_MOV &&
> > 
> >  ((dst_type == BRW_REGISTER_TYPE_HF &&
> > type_sz(src0_type) == 8) ||
> > 
> >   (dst_type_size == 8 && src0_type ==
> > BRW_REGISTER_TYPE_HF)),
> > 
> > diff --git a/src/intel/compiler/test_eu_validate.cpp
> > b/src/intel/compiler/test_eu_validate.cpp
> > 
> > index 1557b6d2452..06beb53eb5d 100644
> > 
> > --- a/src/intel/compiler/test_eu_validate.cpp
> > 
> > +++ b/src/intel/compiler/test_eu_validate.cpp
> > 
> > @@ -848,6 +848,52 @@ TEST_P(validation_test,
> > byte_destination_relaxed_alignment)
> > 
> > }
> > 
> >  }
> > 
> > 
> > 
> > +TEST_P(validation_test, byte_64bit_conversion)
> > 
> > +{
> > 
> > +   static const struct {
> > 
> > +  enum brw_reg_type dst_type;
> > 
> > +  enum brw_reg_type src_type;
> > 
> > +  unsigned dst_stride;
> > 
> > +  bool expected_result;
> > 
> > +   } inst[] = {
> > 
> > +#define INST(dst_type, src_type, dst_stride, expected_result) 
> >\
> > 
> > +  {   
> >\
> > 
> > + BRW_REGISTER_TYPE_##dst_type,   
> > \
> > 
> > + BRW_REGISTER_TYPE_##src_type,   
> > \
> > 
> > + BRW_HORIZONTAL_STRIDE_##dst_stride, 
> > \
> > 
> > + expected_result, 
> >\
> > 
> > +  }
> > 
> > +
> > 
> > +  INST(B,  Q, 1, false),
> > 
> > +  INST(B, UQ, 1, false),
> > 
> > +  INST(B, DF, 1, false),
> > 
> > +
> > 
> > +  INST(B,  Q, 2, false),
> > 
> > +  INST(B, UQ, 2, false),
> > 
> > +  INST(B, DF, 2, false),
> > 
> > +
> > 
> > +  INST(B,  Q, 4, false),
> > 
> > +  INST(B, UQ, 4, false),
> > 
> > +  INST(B, DF, 4, false),
> 
> Probably want some tests with a B or UB source too. :-)  With those
> added,

Sure, I added UB variants for all tests locally.
> Reviewed-by: Jason Ekstrand 
> 
> > +
> > 
> > +#undef INST
> > 
> > +   };
> > 
> > +
> > 
> > +   if (devinfo.gen < 8)
> > 
> > +  return;
> > 
> > +
> > 
> > +   for (unsigned i = 0; i < sizeof(inst) / sizeof(inst[0]); i++) {
> > 
> > +  if (!devinfo.has_64bit_types && type_sz(inst[i].src_type) ==
> > 8)
> > 
> > + continue;
> > 
> > +
> > 
> > +  brw_MOV(p, retype(g0, inst[i].dst_type), retype(g0,
> > inst[i].src_type));
> > 
> > +  brw_inst_set_dst_hstride(&devinfo, last_inst,
> > inst[i].dst_stride);
> > 
> > +  EXPECT_EQ(inst[i].expected_result, validate(p));
> > 
> > +
> > 
> > +  clear_instructions(p);
> > 
> > +   }
> > 
> > +}
> > 
> > +
> > 
> >  TEST_P(validation_test, half_float_conversion)
> > 
> >  {
> > 
> > static const struct {
> > 
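
For reference, the size check added to the validator in this patch boils
down to the predicate sketched below. The helper name is hypothetical and
sizes are operand type sizes in bytes; in the patch itself the condition is
additionally gated on the opcode being BRW_OPCODE_MOV.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: true when a MOV would be a direct conversion
 * between a byte type (B/UB, 1 byte) and a 64-bit type (Q/UQ/DF, 8 bytes). */
static bool
sketch_byte_64bit_conversion(unsigned dst_type_size, unsigned src_type_size)
{
   return (dst_type_size == 1 && src_type_size == 8) ||
          (dst_type_size == 8 && src_type_size == 1);
}
```

Both directions are covered, which is why tests with a B or UB source are
useful alongside the B destination cases.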

Re: [Mesa-dev] [PATCH 9/9] ir3/compiler: Handle the new ir3 intrinsics for SSBO

2019-02-18 Thread Eduardo Lima Mitev
On 2/13/19 10:29 PM, Eduardo Lima Mitev wrote:
> These intrinsics have the offset in dwords already computed in the last
> source, so the change here is basically using that instead of emitting
> the ir3_SHR to divide the byte-offset by 4.
> 
> The improvement in shader stats is dramatic, of up to ~15% in
> instruction count in some cases. Tested on a5xx.
> 
> shader-db is unfortunately not very useful here because shaders that use
> SSBO require GLSL versions that are not supported by freedreno yet.
> 
> For examples, most Khronos CTS tests under 'dEQP-GLES31.functional.ssbo.*'
> are helped.
> 
> A random case:
> 
> dEQP-GLES31.functional.ssbo.layout.2_level_array.packed.row_major_mat3x2
> 
> with current master:
> 
> ; CL prog 14/1: 1252 instructions, 0 half, 48 full
> ; 8 const, 8 constlen
> ; 61 (ss), 43 (sy)
> 
> with the SSBO dword-offset moved to NIR:
> 
> ; CL prog 14/1: 1053 instructions, 0 half, 45 full
> ; 7 const, 7 constlen
> ; 34 (ss), 73 (sy)
> 
> The SHR previously emitted for every single SSBO instruction disappears
> in most cases, and the dword-offset ends up embedded in the STGB
> instruction as immediate in many cases as well.
> 
> No regressions observed with relevant CTS and piglit tests.
> ---
>  src/freedreno/ir3/ir3_compiler_nir.c | 71 +++-
>  1 file changed, 48 insertions(+), 23 deletions(-)
> 
> diff --git a/src/freedreno/ir3/ir3_compiler_nir.c 
> b/src/freedreno/ir3/ir3_compiler_nir.c
> index 0e141f03181..c494913f254 100644
> --- a/src/freedreno/ir3/ir3_compiler_nir.c
> +++ b/src/freedreno/ir3/ir3_compiler_nir.c
> @@ -760,7 +760,12 @@ emit_intrinsic_load_ssbo(struct ir3_context *ctx, 
> nir_intrinsic_instr *intr,
>   offset,
>   create_immed(b, 0),
>   }, 2);
> - src1 = ir3_SHR_B(b, offset, 0, create_immed(b, 2), 0);
> +
> + /* intrinsic->src[2] holds the dword-offset as placed by
> +  * 'ir3_nir_lower_io_offsets' pass.
> +  */
> + assert(intr->intrinsic == nir_intrinsic_load_ssbo_ir3);

This should be debug_assert(). Fixed locally.
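
For readers unfamiliar with the distinction, a debug-only assertion can be
sketched roughly as follows. The macro name is hypothetical and this is an
approximation of the idea, not a copy of Mesa's definition: the check is
active only when DEBUG is defined, and otherwise the expression is still
compiled but never evaluated.

```c
#include <assert.h>

#ifdef DEBUG
/* Debug builds: behave like a regular assert(). */
#define sketch_debug_assert(expr) assert(expr)
#else
/* Other builds: keep the expression type-checked, but never evaluate it. */
#define sketch_debug_assert(expr) ((void)(0 && (expr)))
#endif
```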

> + src1 = ir3_get_src(ctx, &intr->src[2])[0];
>  
>   ldgb = ir3_LDGB(b, create_immed(b, const_offset->u32[0]), 0,
>   src0, 0, src1, 0);
> @@ -798,7 +803,13 @@ emit_intrinsic_store_ssbo(struct ir3_context *ctx, 
> nir_intrinsic_instr *intr)
>* nir already *= 4:
>*/
>   src0 = ir3_create_collect(ctx, ir3_get_src(ctx, &intr->src[0]), ncomp);
> - src1 = ir3_SHR_B(b, offset, 0, create_immed(b, 2), 0);
> +
> + /* intrinsic->src[3] holds the dword-offset as placed by
> +  * 'ir3_nir_lower_io_offsets' pass.
> +  */
> + assert(intr->intrinsic == nir_intrinsic_store_ssbo_ir3);

Same here.
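
As a reminder of what the removed ir3_SHR_B() computed in these hunks, the
byte-offset to dword-offset conversion is just a right shift by two. The
helper below is a hypothetical sketch of that arithmetic, not code from the
patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert a byte offset into a dword (4-byte) offset. */
static uint32_t
sketch_byte_to_dword_offset(uint32_t byte_offset)
{
   return byte_offset >> 2; /* same as an unsigned divide by 4 */
}
```

With the dword offset already present as an intrinsic source, the backend no
longer needs to emit this shift for each SSBO access.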

> + src1 = ir3_get_src(ctx, &intr->src[3])[0];
> +
>   src2 = ir3_create_collect(ctx, (struct ir3_instruction*[]){
>   offset,
>   create_immed(b, 0),
> @@ -869,40 +880,50 @@ emit_intrinsic_atomic_ssbo(struct ir3_context *ctx, 
> nir_intrinsic_instr *intr)
>* Note that nir already multiplies the offset by four
>*/
>   src0 = ir3_get_src(ctx, &intr->src[2])[0];
> - src1 = ir3_SHR_B(b, offset, 0, create_immed(b, 2), 0);
> +
> + /* intrinsic->src[3] holds the dword-offset as placed by
> +  * 'ir3_nir_lower_io_offsets' pass. It doesn't handle
> +  * 'atomic_comp_swap', though, because that intrinsic already used
> +  * all its 4 sources.
> +  */
> + if (intr->intrinsic == nir_intrinsic_ssbo_atomic_comp_swap)
> + src1 = ir3_SHR_B(b, offset, 0, create_immed(b, 2), 0);
> + else
> + src1 = ir3_get_src(ctx, &intr->src[3])[0];
> +
>   src2 = ir3_create_collect(ctx, (struct ir3_instruction*[]){
>   offset,
>   create_immed(b, 0),
>   }, 2);
>  
>   switch (intr->intrinsic) {
> - case nir_intrinsic_ssbo_atomic_add:
> + case nir_intrinsic_ssbo_atomic_add_ir3:
>   atomic = ir3_ATOMIC_ADD_G(b, ssbo, 0, src0, 0, src1, 0, src2, 
> 0);
>   break;
> - case nir_intrinsic_ssbo_atomic_imin:
> + case nir_intrinsic_ssbo_atomic_imin_ir3:
>   atomic = ir3_ATOMIC_MIN_G(b, ssbo, 0, src0, 0, src1, 0, src2, 
> 0);
>   type = TYPE_S32;
>   break;
> - case nir_intrinsic_ssbo_atomic_umin:
> + case nir_intrinsic_ssbo_atomic_umin_ir3:
>   atomic = ir3_ATOMIC_MIN_G(b, ssbo, 0, src0, 0, src1, 0, src2, 
> 0);
>   break;
> - case nir_intrinsic_ssbo_atomic_imax:
> + case nir_intrinsic_ssbo_atomic_imax_ir3:
>   atomic = ir3_ATOMIC_MAX_G(b, ssbo, 0, src0, 0, src1, 0, src2, 
> 0);
>   type = TYPE_S32;
>   break;
> - case nir_intrinsic_ssbo_atomic_umax:
> + case nir_intrinsic_ssbo_atomic_umax_ir3:
>   atomic = ir3_ATOMIC_MAX_G(b, ssbo, 0, src0, 0, src1, 0, src2, 
> 0);
>   break;
> - case nir_intrinsic_ssbo_atomic_and:
> + case nir_intrinsic_ssbo_atomic_and_ir3:
>

[Mesa-dev] [Bug 109659] Missing OpenGL symbols in OSMesa Gallium when building with meson

2019-02-18 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=109659

Bug ID: 109659
   Summary: Missing OpenGL symbols in OSMesa Gallium when building
with meson
   Product: Mesa
   Version: 18.3
  Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: NEW
  Severity: normal
  Priority: medium
 Component: Drivers/OSMesa
  Assignee: mesa-dev@lists.freedesktop.org
  Reporter: pierre.guil...@lip6.fr
QA Contact: mesa-dev@lists.freedesktop.org

Building OSMesa Gallium with meson leads to missing exported OpenGL symbols in
libOSMesa.so compared to using the autotools/configure script.

Use case/context: ParaView in-situ analysis and rendering
(https://blog.kitware.com/messing-with-mesa-for-paraview-5-0vtk-7-0/)

When using meson to build OSMesa from Mesa 18.3:

$ meson [sourcedir] -Dosmesa=gallium
$ ninja

I noticed that libOSMesa.so does not export OpenGL symbols:

$ nm -D src/gallium/targets/osmesa/libOSMesa.so | grep " T "
0006cbe0 T OSMesaColorClamp
0006c6a0 T OSMesaCreateContext
0006c1a0 T OSMesaCreateContextAttribs
0006c620 T OSMesaCreateContextExt
0006cc60 T OSMesaDestroyContext
0006c0a0 T OSMesaGetColorBuffer
0006c0f0 T OSMesaGetCurrentContext
0006cf50 T OSMesaGetDepthBuffer
0006c6c0 T OSMesaGetIntegerv
0006cb70 T OSMesaGetProcAddress
0006c7d0 T OSMesaMakeCurrent
0006cc00 T OSMesaPixelStore
0006cee0 T OSMesaPostprocess

However, using the autotools/configure script (Mesa 18.3):

$ [sourcedir]/autogen.sh --enable-gallium-osmesa
or, from release tarball
$ [sourcedir]/configure --enable-gallium-osmesa
$ make

the generated library exports the whole OpenGL API:

$ nm -D lib/gallium/libOSMesa.so | grep " T "
00558f80 T glAccum
0055e300 T glActiveShaderProgram
[...]
0055ae40 T glWindowPos3sv
0055ae40 T glWindowPos3svARB
00557200 T OSMesaColorClamp
005569d0 T OSMesaCreateContext
005564d0 T OSMesaCreateContextAttribs
00556950 T OSMesaCreateContextExt
00557280 T OSMesaDestroyContext
005563d0 T OSMesaGetColorBuffer
00556420 T OSMesaGetCurrentContext
00556ea0 T OSMesaGetDepthBuffer
005569f0 T OSMesaGetIntegerv
00557190 T OSMesaGetProcAddress
00556b00 T OSMesaMakeCurrent
00557220 T OSMesaPixelStore
005572b0 T OSMesaPostprocess

This behavior has been witnessed on up-to-date ArchLinux and Ubuntu 18.04.

Assuming this is a bug in the Meson build script, one quick fix would be to
edit src/gallium/targets/osmesa/meson.build and move libglapi_static from the
link_with section to the link_whole section in the osmesa library declaration:

diff --git a/src/gallium/targets/osmesa/meson.build
b/src/gallium/targets/osmesa/meson.build
index b4ae8f4b6ec..e873e311aa0 100644
--- a/src/gallium/targets/osmesa/meson.build
+++ b/src/gallium/targets/osmesa/meson.build
@@ -43,9 +43,9 @@ libosmesa = shared_library(
 inc_gallium_drivers,
   ],
   link_depends : osmesa_link_deps,
-  link_whole : [libosmesa_st],
+  link_whole : [libosmesa_st, libglapi_static],
   link_with : [
-libmesa_gallium, libgallium, libglapi_static, libws_null, osmesa_link_with,
+libmesa_gallium, libgallium, libws_null, osmesa_link_with,
   ],
   dependencies : [
 dep_selinux, dep_thread, dep_clock, dep_unwind,

However, being quite new at building Mesa and maybe missing the big picture,
I'm not sure
* if what I'm describing here is really a bug, or is intentional and
* if there are any unforeseen consequences of the aforementioned quick fix.

Is there a more appropriate way to use OpenGL through OSMesa?

Similar to: https://bugs.freedesktop.org/show_bug.cgi?id=94489

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH v4 34/40] intel/compiler: validate region restrictions for half-float conversions

2019-02-18 Thread Iago Toral
On Sat, 2019-02-16 at 09:40 -0600, Jason Ekstrand wrote:
> On Tue, Feb 12, 2019 at 11:53 AM Iago Toral Quiroga <
> ito...@igalia.com> wrote:
> > ---
> > 
> >  src/intel/compiler/brw_eu_validate.c|  64 -
> > 
> >  src/intel/compiler/test_eu_validate.cpp | 122
> > 
> > 
> >  2 files changed, 185 insertions(+), 1 deletion(-)
> > 
> > 
> > 
> > diff --git a/src/intel/compiler/brw_eu_validate.c
> > b/src/intel/compiler/brw_eu_validate.c
> > 
> > index 000a05cb6ac..203641fecb9 100644
> > 
> > --- a/src/intel/compiler/brw_eu_validate.c
> > 
> > +++ b/src/intel/compiler/brw_eu_validate.c
> > 
> > @@ -531,7 +531,69 @@
> > general_restrictions_based_on_operand_types(const struct
> > gen_device_info *devinf
> > 
> > exec_type_size == 8 && dst_type_size == 4)
> > 
> >dst_type_size = 8;
> > 
> > 
> > 
> > -   if (exec_type_size > dst_type_size) {
> > 
> > +   /* From the BDW+ PRM:
> > 
> > +*
> > 
> > +*"There is no direct conversion from HF to DF or DF to HF.
> > 
> > +* There is no direct conversion from HF to Q/UQ or Q/UQ to
> > HF."
> > 
> > +*/
> > 
> > +   enum brw_reg_type src0_type = brw_inst_src0_type(devinfo,
> > inst);
> > 
> > +   ERROR_IF(brw_inst_opcode(devinfo, inst) == BRW_OPCODE_MOV &&
> > 
> > +((dst_type == BRW_REGISTER_TYPE_HF &&
> > type_sz(src0_type) == 8) ||
> > 
> > + (dst_type_size == 8 && src0_type ==
> > BRW_REGISTER_TYPE_HF)),
> > 
> > +"There are no direct conversion between 64-bit types
> > and HF");
> > 
> > +
> > 
> > +   /* From the BDW+ PRM:
> > 
> > +*
> > 
> > +*   "Conversion between Integer and HF (Half Float) must be
> > 
> > +*DWord-aligned and strided by a DWord on the destination."
> > 
> > +*
> > 
> > +* But this seems to be expanded on CHV and SKL+ by:
> > 
> > +*
> > 
> > +*   "There is a relaxed alignment rule for word destinations.
> > When
> > 
> > +*the destination type is word (UW, W, HF), destination
> > data types
> > 
> > +*can be aligned to either the lowest word or the second
> > lowest
> > 
> > +*word of the execution channel. This means the destination
> > data
> > 
> > +*words can be either all in the even word locations or all
> > in the
> > 
> > +*odd word locations."
> > 
> > +*
> > 
> > +* We do not implement the second rule as is though, since
> > empirical testing
> > 
> > +* shows inconsistencies:
> > 
> > +*   - It suggests that packed 16-bit is not allowed, which is
> > not true.
> > 
> > +*   - It suggests that conversions from Q/DF to W (which need
> > to be 64-bit
> > 
> > +* aligned on the destination) are not possible, which is
> > not true.
> > 
> > +*   - It suggests that conversions from 16-bit executions
> > types to W need
> > 
> > +* to be 32-bit aligned, which doesn't seem to be
> > necessary.
> > 
> > +*
> > 
> > +* So from this rule we only validate the implication that
> > conversion from
> > 
> > +* F to HF needs to be DWord aligned too (in BDW this is
> > limited to
> > 
> > +* conversions from integer types).
> > 
> > +*/
> > 
> > +   bool is_half_float_conversion =
> > 
> > +   brw_inst_opcode(devinfo, inst) == BRW_OPCODE_MOV &&
> > 
> > +   dst_type != src0_type &&
> > 
> > +   (dst_type == BRW_REGISTER_TYPE_HF || src0_type ==
> > BRW_REGISTER_TYPE_HF);
> > 
> > +
> > 
> > +   if (is_half_float_conversion) {
> > 
> > +  assert(devinfo->gen >= 8);
> > 
> > +
> > 
> > +  if ((dst_type == BRW_REGISTER_TYPE_HF &&
> > brw_reg_type_is_integer(src0_type)) ||
> > 
> > +  (brw_reg_type_is_integer(dst_type) && src0_type ==
> > BRW_REGISTER_TYPE_HF)) {
> > 
> > + ERROR_IF(dst_stride * dst_type_size != 4,
> > 
> > +  "Conversions between integer and half-float must
> > be strided "
> > 
> > +  "by a DWord on the destination");
> 
> Does this mean stride must be 4B or does it mean a multiple of 4B?
>  
> > +
> > 
> > + unsigned subreg = brw_inst_dst_da1_subreg_nr(devinfo,
> > inst);
> > 
> > + ERROR_IF(subreg % 4 != 0,
> > 
> > +  "Conversions between integer and half-float must
> > be aligned "
> > 
> > +  "to a DWord on the destination");
> > 
> > +  } else if ((devinfo->is_cherryview || devinfo->gen >= 9) &&
> > 
> > + dst_type == BRW_REGISTER_TYPE_HF) {
> > 
> > + ERROR_IF(dst_stride != 2,
> 
> Should this be dst_stride != 2 or dst_stride == 1?  If dst_stride
> were, say 4, that would place them all in even or all in odd
> locations.  It's only if dst_stride == 1 that you end up with both
> even and odd.
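
To make the even/odd argument above concrete, here is a minimal C sketch. It is not Mesa code: `touches_even_and_odd_words` is a hypothetical helper, and it assumes 16-bit destination elements with both `subreg_words` and `stride` counted in 16-bit words. It walks the word slots a region would write and reports whether both parities show up.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not part of Mesa. Returns true when a region of
 * 16-bit destination elements lands on both even and odd word slots.
 * subreg_words: starting offset, in 16-bit words.
 * stride:       destination stride, in elements (words for HF/W/UW).
 * channels:     number of execution channels to walk.
 */
bool
touches_even_and_odd_words(unsigned subreg_words, unsigned stride,
                           unsigned channels)
{
   bool even = false, odd = false;

   assert(channels > 0);

   for (unsigned ch = 0; ch < channels; ch++) {
      unsigned word = subreg_words + ch * stride;

      if (word % 2 == 0)
         even = true;
      else
         odd = true;
   }

   return even && odd;
}
```

With 8 channels starting at word 0, a stride of 1 mixes even and odd slots, while strides of 2 and 4 stay on a single parity. That is the distinction the question draws between checking `dst_stride != 2` and checking `dst_stride == 1`; whether the hardware actually accepts a stride of 4 is a separate matter this sketch does not answer.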

I think this needs to be exactly a DWord for both the stride and the
alignment. When Curro explained this he made the case that what is
probably happening under the hood is that there is a promotion of the
exec type to 32-bit, and then the following 

Re: [Mesa-dev] [PATCH v2 18/41] radv: lower 16-bit flrp

2019-02-18 Thread Samuel Pitoiset

Reviewed-by: Samuel Pitoiset 

On 2/16/19 1:22 AM, Rhys Perry wrote:

Signed-off-by: Rhys Perry 
---
  src/amd/vulkan/radv_shader.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/src/amd/vulkan/radv_shader.c b/src/amd/vulkan/radv_shader.c
index 1dcb0606246..adba730ad8b 100644
--- a/src/amd/vulkan/radv_shader.c
+++ b/src/amd/vulkan/radv_shader.c
@@ -53,6 +53,7 @@
  static const struct nir_shader_compiler_options nir_options = {
.vertex_id_zero_based = true,
.lower_scmp = true,
+   .lower_flrp16 = true,
.lower_flrp32 = true,
.lower_flrp64 = true,
.lower_device_index_to_zero = true,


Re: [Mesa-dev] [PATCH v2 17/41] ac/nir: implement half-float nir_op_ldexp

2019-02-18 Thread Samuel Pitoiset

Patches 14-17 are:

Reviewed-by: Samuel Pitoiset 

On 2/16/19 1:22 AM, Rhys Perry wrote:

Signed-off-by: Rhys Perry 
---
  src/amd/common/ac_nir_to_llvm.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index 8b0e07d2930..0e5946dfdb3 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -829,8 +829,10 @@ static void visit_alu(struct ac_nir_context *ctx, const nir_alu_instr *instr)
break;
case nir_op_ldexp:
src[0] = ac_to_float(&ctx->ac, src[0]);
-   if (ac_get_elem_bits(&ctx->ac, LLVMTypeOf(src[0])) == 32)
+   if (ac_get_elem_bits(&ctx->ac, def_type) == 32)
result = ac_build_intrinsic(&ctx->ac, "llvm.amdgcn.ldexp.f32", ctx->ac.f32, src, 2, AC_FUNC_ATTR_READNONE);
+   else if (ac_get_elem_bits(&ctx->ac, def_type) == 16)
+   result = ac_build_intrinsic(&ctx->ac, "llvm.amdgcn.ldexp.f16", ctx->ac.f16, src, 2, AC_FUNC_ATTR_READNONE);
else
result = ac_build_intrinsic(&ctx->ac, "llvm.amdgcn.ldexp.f64", ctx->ac.f64, src, 2, AC_FUNC_ATTR_READNONE);
break;


Re: [Mesa-dev] [PATCH v2 12/41] ac/nir: make ac_build_isign work on all bit sizes

2019-02-18 Thread Samuel Pitoiset

Reviewed-by: Samuel Pitoiset 

On 2/16/19 1:22 AM, Rhys Perry wrote:

v2: don't use ac_get_zero(), ac_get_one() and ac_int_of_size()

Signed-off-by: Rhys Perry 
---
  src/amd/common/ac_llvm_build.c | 27 ---
  1 file changed, 4 insertions(+), 23 deletions(-)

diff --git a/src/amd/common/ac_llvm_build.c b/src/amd/common/ac_llvm_build.c
index db937eb66fb..3b2257e8bf0 100644
--- a/src/amd/common/ac_llvm_build.c
+++ b/src/amd/common/ac_llvm_build.c
@@ -2064,30 +2064,11 @@ LLVMValueRef ac_build_fract(struct ac_llvm_context *ctx, LLVMValueRef src0,
  LLVMValueRef ac_build_isign(struct ac_llvm_context *ctx, LLVMValueRef src0,
unsigned bitsize)
  {
-   LLVMValueRef cmp, val, zero, one;
-   LLVMTypeRef type;
-
-   switch (bitsize) {
-   case 64:
-   type = ctx->i64;
-   zero = ctx->i64_0;
-   one = ctx->i64_1;
-   break;
-   case 32:
-   type = ctx->i32;
-   zero = ctx->i32_0;
-   one = ctx->i32_1;
-   break;
-   case 16:
-   type = ctx->i16;
-   zero = ctx->i16_0;
-   one = ctx->i16_1;
-   break;
-   default:
-   unreachable(!"invalid bitsize");
-   break;
-   }
+   LLVMTypeRef type = LLVMIntTypeInContext(ctx->context, bitsize);
+   LLVMValueRef zero = LLVMConstInt(type, 0, false);
+   LLVMValueRef one = LLVMConstInt(type, 1, false);
  
+	LLVMValueRef cmp, val;

cmp = LLVMBuildICmp(ctx->builder, LLVMIntSGT, src0, zero, "");
val = LLVMBuildSelect(ctx->builder, cmp, one, src0, "");
cmp = LLVMBuildICmp(ctx->builder, LLVMIntSGE, val, zero, "");
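
For readers following the select sequence in this hunk, a scalar model may help. This is only a sketch: `isign_model` is a hypothetical name, it works on plain `int32_t` rather than LLVM values, and the final "select -1" step is an assumption, since the quoted context ends right after the second comparison.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of the compare/select chain; not the LLVM builder code. */
int32_t
isign_model(int32_t src0)
{
   /* cmp = src0 > 0;  val = cmp ? one : src0 */
   int32_t val = (src0 > 0) ? 1 : src0;

   /* cmp = val >= 0;  the select to -1 below is assumed, not quoted. */
   return (val >= 0) ? val : -1;
}

/* Usage: the three interesting cases. */
void
isign_model_check(void)
{
   assert(isign_model(42) == 1);
   assert(isign_model(0) == 0);
   assert(isign_model(-42) == -1);
}
```

Because `zero` and `one` are now built from the integer type of the requested bit size, the same two-select shape works for any width without a per-size switch.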


Re: [Mesa-dev] [PATCH v2 10/41] ac/nir: make ac_build_clamp work on all bit sizes

2019-02-18 Thread Samuel Pitoiset

We usually use 'name' instead of 'intr'.

With that renamed, patch is:

Reviewed-by: Samuel Pitoiset 

On 2/16/19 1:21 AM, Rhys Perry wrote:

v2: don't use ac_get_zerof() and ac_get_onef()

Signed-off-by: Rhys Perry 
---
  src/amd/common/ac_llvm_build.c | 13 +
  1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/src/amd/common/ac_llvm_build.c b/src/amd/common/ac_llvm_build.c
index b53d9c7ff8c..667f9700764 100644
--- a/src/amd/common/ac_llvm_build.c
+++ b/src/amd/common/ac_llvm_build.c
@@ -1597,16 +1597,20 @@ ac_build_umsb(struct ac_llvm_context *ctx,
  LLVMValueRef ac_build_fmin(struct ac_llvm_context *ctx, LLVMValueRef a,
   LLVMValueRef b)
  {
+   char intr[64];
+   snprintf(intr, sizeof(intr), "llvm.minnum.f%d", ac_get_elem_bits(ctx, LLVMTypeOf(a)));
LLVMValueRef args[2] = {a, b};
-   return ac_build_intrinsic(ctx, "llvm.minnum.f32", ctx->f32, args, 2,
+   return ac_build_intrinsic(ctx, intr, LLVMTypeOf(a), args, 2,
  AC_FUNC_ATTR_READNONE);
  }
  
  LLVMValueRef ac_build_fmax(struct ac_llvm_context *ctx, LLVMValueRef a,
   LLVMValueRef b)
  {
+   char intr[64];
+   snprintf(intr, sizeof(intr), "llvm.maxnum.f%d", ac_get_elem_bits(ctx, LLVMTypeOf(a)));
LLVMValueRef args[2] = {a, b};
-   return ac_build_intrinsic(ctx, "llvm.maxnum.f32", ctx->f32, args, 2,
+   return ac_build_intrinsic(ctx, intr, LLVMTypeOf(a), args, 2,
  AC_FUNC_ATTR_READNONE);
  }
  
@@ -1633,8 +1637,9 @@ LLVMValueRef ac_build_umin(struct ac_llvm_context *ctx, LLVMValueRef a,
  
  LLVMValueRef ac_build_clamp(struct ac_llvm_context *ctx, LLVMValueRef value)
  {
-   return ac_build_fmin(ctx, ac_build_fmax(ctx, value, ctx->f32_0),
-ctx->f32_1);
+   LLVMTypeRef t = LLVMTypeOf(value);
+   return ac_build_fmin(ctx, ac_build_fmax(ctx, value, LLVMConstReal(t, 0.0)),
+LLVMConstReal(t, 1.0));
  }
  
  void ac_build_export(struct ac_llvm_context *ctx, struct ac_export_args *a)
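
The name-building step in this patch can be tried in isolation. The following is a minimal sketch, not the Mesa code: `format_float_intrinsic` is a hypothetical helper that formats an intrinsic name from an operation and a bit width with `snprintf`, and additionally rejects truncated names (a check the patch itself does not make, since 64 bytes is ample for these names).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: writes e.g. "llvm.minnum.f16" into buf.
 * Returns false if the name did not fit or formatting failed.
 */
bool
format_float_intrinsic(char *buf, size_t len, const char *op, unsigned bits)
{
   assert(buf != NULL && len > 0);

   int n = snprintf(buf, len, "llvm.%s.f%u", op, bits);

   /* snprintf reports the untruncated length; treat overflow as failure. */
   return n >= 0 && (size_t)n < len;
}
```

Deriving the suffix from the operand's element size is what lets a single `ac_build_fmin`/`ac_build_fmax` serve 16-, 32- and 64-bit floats, provided the return type passed to the intrinsic builder matches the operand type as the patch does.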


Re: [Mesa-dev] [PATCH v2 09/41] ac/nir: fix 64-bit nir_op_f2f16_rtz

2019-02-18 Thread Samuel Pitoiset

Reviewed-by: Samuel Pitoiset 

On 2/16/19 1:21 AM, Rhys Perry wrote:

Signed-off-by: Rhys Perry 
---
  src/amd/common/ac_nir_to_llvm.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index 691d444db05..741059b5f1a 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -886,6 +886,8 @@ static void visit_alu(struct ac_nir_context *ctx, const nir_alu_instr *instr)
break;
case nir_op_f2f16_rtz:
src[0] = ac_to_float(&ctx->ac, src[0]);
+   if (LLVMTypeOf(src[0]) == ctx->ac.f64)
+   src[0] = LLVMBuildFPTrunc(ctx->ac.builder, src[0], ctx->ac.f32, "");
LLVMValueRef param[2] = { src[0], ctx->ac.f32_0 };
result = ac_build_cvt_pkrtz_f16(&ctx->ac, param);
result = LLVMBuildExtractElement(ctx->ac.builder, result, ctx->ac.i32_0, "");


Re: [Mesa-dev] [PATCH v2 07/41] ac/nir: implement 8-bit nir_load_const_instr

2019-02-18 Thread Samuel Pitoiset

Reviewed-by: Samuel Pitoiset 

On 2/16/19 1:21 AM, Rhys Perry wrote:

Signed-off-by: Rhys Perry 
---
  src/amd/common/ac_nir_to_llvm.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index b260142c177..f39232b91a1 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -1114,6 +1114,10 @@ static void visit_load_const(struct ac_nir_context *ctx,
  
  	for (unsigned i = 0; i < instr->def.num_components; ++i) {

switch (instr->def.bit_size) {
+   case 8:
+   values[i] = LLVMConstInt(element_type,
+instr->value.u8[i], false);
+   break;
case 16:
values[i] = LLVMConstInt(element_type,
 instr->value.u16[i], false);


Re: [Mesa-dev] [PATCH v2 06/41] ac/nir: fix 16-bit ssbo stores

2019-02-18 Thread Samuel Pitoiset
Does this fix anything known? There is a 16-bit version of tbuffer.store;
maybe we should use it?


On 2/16/19 1:21 AM, Rhys Perry wrote:

Signed-off-by: Rhys Perry 
---
  src/amd/common/ac_nir_to_llvm.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index 89a78b43c6f..b260142c177 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -1586,6 +1586,8 @@ static void visit_store_ssbo(struct ac_nir_context *ctx,
} else if (num_bytes == 2) {
store_name = "llvm.amdgcn.tbuffer.store.i32";
data_type = ctx->ac.i32;
+   data = LLVMBuildBitCast(ctx->ac.builder, data, ctx->ac.i16, "");
+   data = LLVMBuildZExt(ctx->ac.builder, data, data_type, "");
LLVMValueRef tbuffer_params[] = {
data,
rsrc,
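
What the added bitcast and zero-extension do to the payload can be modelled on plain integers. This is a sketch under assumptions: `widen_store_payload` is a hypothetical name, and `int16_t` merely stands in for whatever 16-bit value (integer or half-float bits) reaches the store.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical model of the bitcast + zext pair: reinterpret a 16-bit
 * value as raw bits, then widen it to the 32-bit type the store takes.
 */
uint32_t
widen_store_payload(int16_t value)
{
   uint16_t bits;

   assert(sizeof(bits) == sizeof(value));

   /* "Bitcast": copy the bit pattern without changing it. */
   memcpy(&bits, &value, sizeof(bits));

   /* "ZExt": the upper 16 bits become zero, whatever the sign bit is. */
   return (uint32_t)bits;
}
```

The point of zero-extending rather than sign-extending is that a value such as -1 reaches the store as 0x0000ffff instead of 0xffffffff, so the upper half of the 32-bit operand never carries stray bits.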


Re: [Mesa-dev] [PATCH 00/38] radv, ac: 16-bit and 8-bit arithmetic and 8-bit storage

2019-02-18 Thread Samuel Pitoiset


On 2/16/19 1:21 AM, Rhys Perry wrote:

This series adds support for:
- VK_KHR_shader_float16_int8
- VK_AMD_gpu_shader_half_float
- VK_AMD_gpu_shader_int16
- VK_KHR_8bit_storage
on VI+. Half floats are disabled on LLVM 7 because of a bug causing large
memory usage and long (or unbounded) compilation times with some CTS
tests.

It is written against the following patch series:
- https://patchwork.freedesktop.org/series/53454/ (v4)
- https://patchwork.freedesktop.org/series/53660/ (v1)

With LLVM 9, there are no reproducible Vulkan CTS regressions with Vega
and VI except for
dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.input_output_float_64_to_16.*
which fails or crashes because of unrelated radv bugs with 64-bit varyings
and because the tests use VK_FORMAT_R64_SFLOAT as a vertex format even
though radv does not support it.


test bug?

The two NIR related patches (22 and 25) should be sent separately, 
otherwise people working on NIR might miss them.




With LLVM 9, there are no reproducible piglit regressions except for
glsl-array-bounds-12.shader_test because of an LLVM bug when
SLP vectorization is enabled.

With LLVM 8, there are no reproducible Vulkan CTS regressions with Vega
and VI except for those with LLVM 9 and a couple of tests because of an
LLVM bug after the SLP vectorizer and with the current lack of fallback
for 16-bit interpolation on LLVM versions before LLVM 9.

With LLVM 7, there are no reproducible Vulkan CTS regressions with Vega
and VI except for those with LLVM 9 and a couple of tests because of an
LLVM bug after the SLP vectorizer.

The SLP vectorization patch is marked as WIP because it exposes LLVM bugs
with piglit's glsl-array-bounds-12.shader_test, some Vulkan CTS tests and
some shader-db test for a game I can't remember. It also over-vectorizes
32-bit code which can cause significant worsening in generated code
quality.

The 16-bit interpolation patch is marked as WIP because it currently
requires intrinsics only available in LLVM 9 and does not have a fallback.

A branch on Github containing this series can be found at:
https://github.com/pendingchaos/mesa/commits/radv_fp16_int16_int8_v2

v2: rebase
v2: implement 16-bit interpolation
v2: move LLVMAddSLPVectorizePass to after LLVMAddEarlyCSEMemSSAPass
v2: run vectorization unconditionally on GFX9 and later
v2: remove ac_get_one(), ac_get_zero(), ac_get_onef() and ac_get_zerof()
v2: remove ac_int_of_size()
v2: fix 64-bit visit_load_var()
v2: mark VK_KHR_8bit_storage as DONE in features.txt
v2: mark SLP vectorization patch as WIP
v2: fix C++ style comment

Rhys Perry (41):
   radv: bitcast 16-bit outputs to integers
   radv: ensure export arguments are always float
   ac: add various helpers for float16/int16/int8
   ac/nir: implement 8-bit push constant, ssbo and ubo loads
   ac/nir: implement 8-bit ssbo stores
   ac/nir: fix 16-bit ssbo stores
   ac/nir: implement 8-bit nir_load_const_instr
   ac/nir: implement 8-bit conversions
   ac/nir: fix 64-bit nir_op_f2f16_rtz
   ac/nir: make ac_build_clamp work on all bit sizes
   ac/nir: make ac_build_fract work on all bit sizes
   ac/nir: make ac_build_isign work on all bit sizes
   ac/nir: make ac_build_fsign work on all bit sizes
   ac/nir: make ac_build_fdiv support 16-bit floats
   ac/nir: implement half-float nir_op_frcp
   ac/nir: implement half-float nir_op_frsq
   ac/nir: implement half-float nir_op_ldexp
   radv: lower 16-bit flrp
   ac/nir: support half floats in emit_b2f
   ac/nir: make emit_b2i work on all bit sizes
   ac/nir: implement 16-bit shifts
   compiler/nir: add lowering option for 16-bit ffma
   ac/nir: implement 16-bit ac_build_ddxy
   ac/nir: implement 8 and 16 bit ac_build_readlane
   nir: make bitfield_reverse and ifind_msb work with all integers
   ac/nir: make ac_find_lsb work on all bit sizes
   ac/nir: make ac_build_umsb work on all bit sizes
   ac/nir: implement 8 and 16 bit ac_build_imsb
   ac/nir: make ac_build_bit_count work on all bit sizes
   ac/nir: make ac_build_bitfield_reverse work on all bit sizes
   ac/nir: implement 16-bit pack/unpack opcodes
   ac/nir: add 8-bit types to glsl_base_to_llvm_type
   ac/nir,radv: create an array of varying output types
   ac/nir: store all outputs as f32
   radv: store all fragment shader inputs as f32
   radv: handle all fragment output types
   WIP: radv,ac: implement 16-bit interpolation
   WIP: ac,radv: run LLVM's SLP vectorizer
   ac/nir: generate better code for nir_op_f2f16_rtz
   ac/nir: have nir_op_f2f16 round to zero
   radv,docs: expose float16, int16 and int8 features and extensions

  docs/features.txt|   2 +-
  src/amd/common/ac_llvm_build.c   | 325 +++
  src/amd/common/ac_llvm_build.h   |  18 +-
  src/amd/common/ac_llvm_util.c|   8 +-
  src/amd/common/ac_nir_to_llvm.c  | 268 +++
  src/amd/common/ac_shader_abi.h   |   1 +
  src/amd/vulkan/radv_device.c |  17 ++
  

Re: [Mesa-dev] [PATCH v4 27/40] intel/compiler: generalize the combine constants pass

2019-02-18 Thread Iago Toral
On Sat, 2019-02-16 at 09:29 -0600, Jason Ekstrand wrote:
> On Tue, Feb 12, 2019 at 5:57 AM Iago Toral Quiroga wrote:
> > At the very least we need it to handle HF too, since we are doing
> > 
> > constant propagation for MAD and LRP, which relies on this pass
> > 
> > to promote the immediates to GRF in the end, but ideally
> > 
> > we want it to support even more types so we can take advantage
> > 
> > of it to improve register pressure in some scenarios.
> > 
> > ---
> > 
> >  .../compiler/brw_fs_combine_constants.cpp | 202
> > --
> > 
> >  1 file changed, 180 insertions(+), 22 deletions(-)
> > 
> > 
> > 
> > diff --git a/src/intel/compiler/brw_fs_combine_constants.cpp
> > b/src/intel/compiler/brw_fs_combine_constants.cpp
> > 
> > index 7343f77bb45..5d79f1a0826 100644
> > 
> > --- a/src/intel/compiler/brw_fs_combine_constants.cpp
> > 
> > +++ b/src/intel/compiler/brw_fs_combine_constants.cpp
> > 
> > @@ -36,6 +36,7 @@
> > 
> > 
> > 
> >  #include "brw_fs.h"
> > 
> >  #include "brw_cfg.h"
> > 
> > +#include "util/half_float.h"
> > 
> > 
> > 
> >  using namespace brw;
> > 
> > 
> > 
> > @@ -114,8 +115,17 @@ struct imm {
> > 
> >  */
> > 
> > exec_list *uses;
> > 
> > 
> > 
> > -   /** The immediate value.  We currently only handle floats. */
> > 
> > -   float val;
> > 
> > +   /** The immediate value */
> > 
> > +   union {
> > 
> > +  char bytes[8];
> > 
> > +  float f;
> > 
> > +  int32_t d;
> > 
> > +  int16_t w;
> > 
> > +   };
> > 
> > +   uint8_t size;
> > 
> > +
> > 
> > +   /** When promoting half-float we need to account for certain
> > restrictions */
> > 
> > +   bool is_half_float;
> > 
> > 
> > 
> > /**
> > 
> >  * The GRF register and subregister number where we've decided
> > to store the
> > 
> > @@ -145,10 +155,11 @@ struct table {
> > 
> >  };
> > 
> > 
> > 
> >  static struct imm *
> > 
> > -find_imm(struct table *table, float val)
> > 
> > +find_imm(struct table *table, void *data, uint8_t size)
> > 
> >  {
> > 
> > for (int i = 0; i < table->len; i++) {
> > 
> > -  if (table->imm[i].val == val) {
> > 
> > +  if (table->imm[i].size == size &&
> > 
> > +  !memcmp(table->imm[i].bytes, data, size)) {
> > 
> >   return &table->imm[i];
> > 
> >}
> > 
> > }
> > 
> > @@ -190,6 +201,96 @@ compare(const void *_a, const void *_b)
> > 
> > return a->first_use_ip - b->first_use_ip;
> > 
> >  }
> > 
> > 
> > 
> > +static bool
> > 
> > +get_constant_value(const struct gen_device_info *devinfo,
> > 
> > +   const fs_inst *inst, uint32_t src_idx,
> > 
> > +   void *out, brw_reg_type *out_type)
> > 
> > +{
> > 
> > +   const bool can_do_source_mods = inst->can_do_source_mods(devinfo);
> > 
> > +   const fs_reg *src = &inst->src[src_idx];
> > 
> > +
> > 
> > +   *out_type = src->type;
> > 
> > +
> > 
> > +   switch (*out_type) {
> > 
> > +   case BRW_REGISTER_TYPE_F: {
> > 
> > +  float val = !can_do_source_mods ? src->f : fabsf(src->f);
> > 
> > +  memcpy(out, &val, 4);
> > 
> > +  break;
> > 
> > +   }
> > 
> > +   case BRW_REGISTER_TYPE_HF: {
> > 
> > +  uint16_t val = src->d & 0xffffu;
> > 
> > +  if (can_do_source_mods)
> > 
> > + val = _mesa_float_to_half(fabsf(_mesa_half_to_float(val)));
> > 
> > +  memcpy(out, &val, 2);
> > 
> > +  break;
> > 
> > +   }
> > 
> > +   case BRW_REGISTER_TYPE_D: {
> > 
> > +  int32_t val = !can_do_source_mods ? src->d : abs(src->d);
> > 
> > +  memcpy(out, &val, 4);
> > 
> > +  break;
> > 
> > +   }
> > 
> > +   case BRW_REGISTER_TYPE_UD:
> > 
> > +  memcpy(out, &src->ud, 4);
> > 
> > +  break;
> > 
> > +   case BRW_REGISTER_TYPE_W: {
> > 
> > +  int16_t val = src->d & 0xffffu;
> > 
> > +  if (can_do_source_mods)
> > 
> > + val = abs(val);
> > 
> > +  memcpy(out, &val, 2);
> > 
> > +  break;
> > 
> > +   }
> > 
> > +   case BRW_REGISTER_TYPE_UW:
> > 
> > +  memcpy(out, &src->ud, 2);
> > 
> > +  break;
> 
> You could also throw in DF and Q types.  This is probably sufficient
> for now though.

Sure, I was waiting to do that until I started enabling constant
propagation of 64-bit types but there is no reason why we can't leave
the pass prepared for that I guess. It seems that for platforms that
don't support 64-bit types we should not be seeing 64-bit constants
here, so I guess there is no risk.
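
As a side note for readers, the size-plus-bytes lookup used by the reworked `find_imm` can be sketched standalone. Everything below is a simplified stand-in rather than the patch's code: `imm_model`, `table_model`, `find_imm_model` and `add_imm_model` are hypothetical names, and the fixed-size table replaces the real growable one. The 8-byte storage mirrors the patch's `bytes[8]`, which is why 64-bit constants would fit without changing the lookup.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the patch's struct imm. */
struct imm_model {
   union {
      char bytes[8];
      float f;
      int32_t d;
      int16_t w;
   };
   uint8_t size;
};

/* Simplified stand-in for struct table, with a fixed capacity. */
struct table_model {
   struct imm_model imm[16];
   int len;
};

/* Match on size first, then compare exactly `size` raw bytes. */
struct imm_model *
find_imm_model(struct table_model *table, const void *data, uint8_t size)
{
   for (int i = 0; i < table->len; i++) {
      if (table->imm[i].size == size &&
          !memcmp(table->imm[i].bytes, data, size))
         return &table->imm[i];
   }

   return NULL;
}

/* Append a constant of 1..8 bytes; unused storage bytes are zeroed. */
struct imm_model *
add_imm_model(struct table_model *table, const void *data, uint8_t size)
{
   assert(size > 0 && size <= 8 && table->len < 16);

   struct imm_model *imm = &table->imm[table->len++];
   memset(imm->bytes, 0, sizeof(imm->bytes));
   memcpy(imm->bytes, data, size);
   imm->size = size;

   return imm;
}
```

One consequence of comparing raw bytes instead of using float `==` is that `0.0f` and `-0.0f` are treated as different constants, because their bit patterns differ.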
>  
> > +   default:
> > 
> > +  return false;
> > 
> > +   };
> > 
> > +
> > 
> > +   return true;
> > 
> > +}
> > 
> > +
> > 
> > +static struct brw_reg
> > 
> > +build_imm_reg_for_copy(struct imm *imm)
> > 
> > +{
> > 
> > +   switch (imm->size) {
> > 
> > +   case 4:
> > 
> > +  return brw_imm_d(imm->d);
> > 
> > +   case 2:
> > 
> > +  return brw_imm_w(imm->w);
> > 
> > +   default:
> > 
> > +  unreachable("not implemented");
> > 
> > +   }
> > 
> > +}
> > 
> > +
> > 
> > +static inline uint32_t
> > 
> > +get_alignment_for_imm(const struct imm *imm)
> > 
> > +{
> > 
> > 

Re: [Mesa-dev] [PATCH v2 02/41] radv: ensure export arguments are always float

2019-02-18 Thread Samuel Pitoiset

Patch 1-2 are:

Reviewed-by: Samuel Pitoiset 

On 2/16/19 1:21 AM, Rhys Perry wrote:

So that the signature is correct and consistent, the inputs to an export
intrinsic should always be 32-bit floats.

This and the previous commit fix a large number of crashes from
dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.input_output_int_*
tests

Fixes: b722b29f10d ('radv: add support for 16bit input/output')
Signed-off-by: Rhys Perry 
---
  src/amd/vulkan/radv_nir_to_llvm.c | 6 +-
  1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/src/amd/vulkan/radv_nir_to_llvm.c b/src/amd/vulkan/radv_nir_to_llvm.c
index a8268c44ecf..d3795eec403 100644
--- a/src/amd/vulkan/radv_nir_to_llvm.c
+++ b/src/amd/vulkan/radv_nir_to_llvm.c
@@ -2429,12 +2429,8 @@ si_llvm_init_export_args(struct radv_shader_context *ctx,
} else
memcpy(&args->out[0], values, sizeof(values[0]) * 4);
  
-	for (unsigned i = 0; i < 4; ++i) {

-   if (!(args->enabled_channels & (1 << i)))
-   continue;
-
+   for (unsigned i = 0; i < 4; ++i)
args->out[i] = ac_to_float(&ctx->ac, args->out[i]);
-   }
  }
  
  static void


Re: [Mesa-dev] [PATCH v4 20/40] intel/compiler: workaround for SIMD8 half-float MAD in gen8

2019-02-18 Thread Iago Toral
On Sat, 2019-02-16 at 09:02 -0600, Jason Ekstrand wrote:
> On Tue, Feb 12, 2019 at 5:56 AM Iago Toral Quiroga wrote:
> > Empirical testing shows that gen8 has a bug where MAD instructions
> > with
> > 
> > a half-float source starting at a non-zero offset fail to execute
> > 
> > properly.
> > 
> > 
> > 
> > This scenario usually happened in SIMD8 executions, where we used
> > to
> > 
> > pack vector components Y and W in the second half of SIMD registers
> > 
> > (therefore, with a 16B offset). It looks like we are not currently
> > doing
> > 
> > this any more but this would handle the situation properly if we
> > ever
> > 
> > happen to produce code like this again.
> > 
> > 
> > 
> > v2 (Jason):
> > 
> >  - Move this workaround to the lower_regioning pass as an
> > additional case
> > 
> >to has_invalid_src_region()
> > 
> >  - Do not apply the workaround if the stride of the source operand
> > is 0,
> > 
> >testing suggests the problem doesn't exist in that case.
> > 
> > 
> > 
> > Reviewed-by: Topi Pohjolainen  (v1)
> > 
> > ---
> > 
> >  src/intel/compiler/brw_fs_lower_regioning.cpp | 39 +
> > --
> > 
> >  1 file changed, 28 insertions(+), 11 deletions(-)
> > 
> > 
> > 
> > diff --git a/src/intel/compiler/brw_fs_lower_regioning.cpp
> > b/src/intel/compiler/brw_fs_lower_regioning.cpp
> > 
> > index df50993dee6..7c70cfab535 100644
> > 
> > --- a/src/intel/compiler/brw_fs_lower_regioning.cpp
> > 
> > +++ b/src/intel/compiler/brw_fs_lower_regioning.cpp
> > 
> > @@ -109,20 +109,37 @@ namespace {
> > 
> > has_invalid_src_region(const gen_device_info *devinfo, const
> > fs_inst *inst,
> > 
> >unsigned i)
> > 
> > {
> > 
> > -  if (is_unordered(inst)) {
> > 
> > +  if (is_unordered(inst))
> > 
> >   return false;
> > 
> > -  } else {
> > 
> > - const unsigned dst_byte_stride = inst->dst.stride *
> > type_sz(inst->dst.type);
> > 
> > - const unsigned src_byte_stride = inst->src[i].stride *
> > 
> > -type_sz(inst->src[i].type);
> > 
> > - const unsigned dst_byte_offset = reg_offset(inst->dst) %
> > REG_SIZE;
> > 
> > - const unsigned src_byte_offset = reg_offset(inst->src[i]) 
> > % REG_SIZE;
> > 
> > 
> > 
> > - return has_dst_aligned_region_restriction(devinfo, inst)
> > &&
> > 
> > -!is_uniform(inst->src[i]) &&
> > 
> > -(src_byte_stride != dst_byte_stride ||
> > 
> > - src_byte_offset != dst_byte_offset);
> > 
> > +  /* Empirical testing shows that Broadwell has a bug
> > affecting half-float
> > 
> > +   * MAD instructions when any of its sources has a non-zero
> > offset, such
> > 
> > +   * as:
> > 
> > +   *
> > 
> > +   * mad(8) g18<1>HF -g17<4,4,1>HF g14.8<4,4,1>HF g11<4,4,1>HF
> > { align16 1Q };
> > 
> > +   *
> > 
> > +   * We used to generate code like this for SIMD8 executions
> > where we
> > 
> > +   * used to pack components Y and W of a vector at offset 16B
> > of a SIMD
> > 
> > +   * register. The problem doesn't occur if the stride of the
> > source is 0.
> > 
> > +   */
> > 
> > +  if (devinfo->gen == 8 &&
> > 
> > +  inst->opcode == BRW_OPCODE_MAD &&
> > 
> > +  inst->src[i].type == BRW_REGISTER_TYPE_HF &&
> > 
> > +  inst->src[i].offset > 0 &&
> > 
> > +  inst->src[i].stride != 0) {
> 
> The above assumes the register is a GRF.  Perhaps we should make this
> assumption explicit?  Or you can use some of curro's helpers and add
> another one to get the subreg offset.  Also, the real problem here
> isn't offset > 0, it's offset % REG_SIZE > 0.  If we have an array of
> 4 things, they'll be at offsets 0, 16, 32, and 48.  We don't want an
> offset of 32 triggering it.
> 

You're right. We already have a helper available that does what we
want, reg_offset() in brw_ir_fs.h. I have this now:
  if (devinfo->gen == 8 &&
      inst->opcode == BRW_OPCODE_MAD &&
      inst->src[i].type == BRW_REGISTER_TYPE_HF &&
      reg_offset(inst->src[i]) % REG_SIZE > 0 &&
      inst->src[i].stride != 0) {
     return true;
  }
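
The difference between testing `offset > 0` and testing `offset % REG_SIZE > 0` is easy to see on the array example from the review (elements at byte offsets 0, 16, 32 and 48). The sketch below is illustrative only: the function names are hypothetical, and `REG_SIZE_MODEL` assumes a 32-byte GRF.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: one GRF is 32 bytes. */
#define REG_SIZE_MODEL 32

/* Check as written in v4 of the patch: any non-zero byte offset. */
bool
offset_check_v4(unsigned byte_offset)
{
   return byte_offset > 0;
}

/* Revised check: only a non-zero offset within the register counts. */
bool
offset_check_revised(unsigned byte_offset)
{
   return byte_offset % REG_SIZE_MODEL > 0;
}

/* Usage: offset 32 starts a fresh register and must not trigger. */
void
offset_check_demo(void)
{
   assert(offset_check_v4(32));
   assert(!offset_check_revised(32));
}
```

Offsets 16 and 48 trigger both checks, offset 0 triggers neither, and offset 32 only triggers the v4 check, which is the false positive the review points out.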
> > + return true;
> > 
> >}
> > 
> > +
> > 
> > +  const unsigned dst_byte_stride = inst->dst.stride *
> > type_sz(inst->dst.type);
> > 
> > +  const unsigned src_byte_stride = inst->src[i].stride *
> > 
> > + type_sz(inst->src[i].type);
> > 
> > +  const unsigned dst_byte_offset = reg_offset(inst->dst) %
> > REG_SIZE;
> > 
> > +  const unsigned src_byte_offset = reg_offset(inst->src[i]) %
> > REG_SIZE;
> > 
> > +
> > 
> > +  return has_dst_aligned_region_restriction(devinfo, inst) &&
> > 
> > + !is_uniform(inst->src[i]) &&
> > 
> > + (src_byte_stride != dst_byte_stride ||
> > 
> > +  src_byte_offset != dst_byte_offset);
> > 
> > }
> > 
> > 
> > 
> > /*
> > 
___
mesa-dev mailing