Re: [Mesa-dev] NaN behavior in GLSL (was Re: [PATCH] glsl: always do sqrt(abs()) and inversesqrt(abs()))

2017-01-13 Thread Marek Olšák
On Fri, Jan 13, 2017 at 5:25 PM, Jason Ekstrand  wrote:
> On Fri, Jan 13, 2017 at 4:05 AM, Marek Olšák  wrote:
>>
>> On Fri, Jan 13, 2017 at 3:37 AM, Ilia Mirkin  wrote:
>> > On Thu, Jan 12, 2017 at 9:13 PM, Jason Ekstrand wrote:
>> >> Unless, of course, it's controlled by the same hardware bit... Clearly, we
>> >> can give you abs on rsq without denorm flushing (easy shader hacks) but
>> >> not the other way around.
>> >
>> > OK, so somehow I missed that earlier. However there's an interesting
>> > section in the PRM:
>> >
>> >
>> > https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-skl-vol07-3d_media_gpgpu.pdf
>> >
>> > on PDF page 854, "Dismissed Legacy Behaviors" which has a list of
>> > suggested IEEE 754 deviations for DX9. One of them is indeed that 0 *
>> > x = 0, but another is that input NaNs be propagated with certain
>> > exceptions. Also they suggest that RCP(0)/RSQ(0) = fmax. Interesting.
>> >
>> > So at this point, the zero_wins thing is pretty much blown. i965
>> > appears to have an all-or-nothing approach, and additionally that
>> > approach doesn't match up exactly to what NVIDIA does (or at least I'm
>> > not aware of a clamp-everything mode).
>> >
>> > This will take some thought to figure out how something can be
>> > specified so that a single spec works for both i965 and nv/amd. OTOH
>> > we could have two different specs that just expose different things -
>> > e.g. i965 could expose a MESA_shader_float_alt_mode or whatever which
>> > is spec'd to do the things that the PRM says, and nv/amd have the
>> > MESA_shader_float_zero_wins ext which does what we were talking about
>> > earlier.
>> >
>> > I'm open to other suggestions too.
>>
>> There is also the "small" problem that it would take a non-trivial
>> effort for us on the LLVM side. You guys can flip a switch. We can't.
>
>
> Don't you have to expend that effort for ARB programs anyway?  I thought
> they weren't supposed to generate NaN either.

No, we don't, because st/mesa adds abs before RSQ and the driver
implements POW as log+mul+exp, where mul follows the rule
0*anything=0. I don't think any other opcode follows that rule though.

Marek
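
A minimal sketch of the behaviour described above, written in Python rather than driver code. The helper names (legacy_mul, log2_gpu, pow_via_log_mul_exp, legacy_rsq) are hypothetical, and the results chosen for log2(0) and RSQ(0) are assumptions made for illustration, not taken from st/mesa or any driver. It shows why POW built as log+mul+exp stays NaN-free when mul follows 0*anything=0, and why abs before RSQ avoids the NaN for negative inputs.

```python
import math

def ieee_mul(a, b):
    """Plain IEEE 754 multiply: 0 * inf gives NaN."""
    return a * b

def legacy_mul(a, b):
    """Multiply following the rule 0 * anything = 0 (even inf or NaN)."""
    if a == 0.0 or b == 0.0:
        return 0.0
    return a * b

def log2_gpu(x):
    """log2 that returns -inf for 0 and NaN for negatives instead of raising.
    This mapping is an assumption for the sketch."""
    if math.isnan(x) or x < 0.0:
        return math.nan
    if x == 0.0:
        return -math.inf
    return math.log2(x)

def pow_via_log_mul_exp(x, y, mul):
    """POW(x, y) lowered to exp2(mul(y, log2(x)))."""
    return 2.0 ** mul(y, log2_gpu(x))

def legacy_rsq(x):
    """RSQ with abs applied to the operand first."""
    x = abs(x)
    if x == 0.0:
        return math.inf  # assumed; hardware may return fmax instead
    return 1.0 / math.sqrt(x)

print(pow_via_log_mul_exp(0.0, 0.0, ieee_mul))    # NaN: 0 * -inf
print(pow_via_log_mul_exp(0.0, 0.0, legacy_mul))  # 1.0: 0 * -inf treated as 0
print(legacy_rsq(-4.0))                           # 0.5 instead of NaN
```

With the IEEE multiply, POW(0, 0) becomes exp2(0 * -inf) and yields NaN; with the 0*anything=0 rule the product is 0 and the result is 1.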
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] fixup! EGL: Implement the libglvnd interface for EGL (v2)

2017-01-13 Thread Kyle Brenneman

I can if that's preferable.

On 01/11/2017 04:24 PM, Timo Aaltonen wrote:

On 05.01.2017 23:29, Kyle Brenneman wrote:

---
 src/egl/generate/eglFunctionList.py | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/egl/generate/eglFunctionList.py b/src/egl/generate/eglFunctionList.py
index b19b5f7..80cb834 100644
--- a/src/egl/generate/eglFunctionList.py
+++ b/src/egl/generate/eglFunctionList.py
@@ -53,12 +53,14 @@ method values:
  Select the vendor that owns the current context.
 """
 
-def _eglFunc(name, method, static=False, public=False, inheader=None, prefix="", extension=None, retval=None):
+def _eglFunc(name, method, static=None, public=False, inheader=None, prefix="dispatch_", extension=None, retval=None):
     """
     A convenience function to define an entry in the EGL function list.
     """
+    if static is None:
+        static = (not public and method != "custom")
     if inheader is None:
-        inheader = (not public)
+        inheader = (not static)
     values = {
         "method" : method,
         "prefix" : prefix,

You probably need to send a v3 with this added?
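
To make the new defaults in the hunk above easier to follow, here is a small standalone sketch of just the default-resolution logic. The function name resolve_defaults is hypothetical, and "display" is used only as an example of a method value other than "custom"; the real _eglFunc goes on to build a dictionary that is not reproduced here.

```python
def resolve_defaults(method, static=None, public=False, inheader=None):
    """Mirror the default handling added by the patch (sketch only)."""
    if static is None:
        # Static unless the function is public or uses a custom dispatch stub.
        static = (not public and method != "custom")
    if inheader is None:
        # Only non-static functions get a declaration in the header.
        inheader = (not static)
    return static, inheader

print(resolve_defaults("display"))               # (True, False)
print(resolve_defaults("custom"))                # (False, True)
print(resolve_defaults("display", public=True))  # (False, True)
```

An explicit static= or inheader= argument still overrides the computed default, as in the patch.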







[Mesa-dev] [PATCH] util: import sha1 implementation from OpenBSD

2017-01-13 Thread Emil Velikov
From: Emil Velikov 

At the moment we support 5+ different implementations each with varying
amount of bugs - from thread safety problems [1], to outright broken
implementation(s) [2]

In order to accommodate these we have 150+ lines of configure script and
two extra configure toggles. Whilst an actual implementation is
~200 loc and our current compat wrapping ~250.

Let's not forget that different people use different code paths, which
effectively makes it harder to test and debug since the default
implementation is automatically detected.

To minimise all these lovely experiences, import the "100% Public
Domain" OpenBSD sha1 implementation. Clearly document any changes needed
to get building correctly, since many/most of those can be upstreamed
making future syncs easier.

As an added bonus this will avoid all the 'fun' experiences trying to
integrate it with the Android and SCons builds.

Bugzilla [1]: https://bugs.freedesktop.org/show_bug.cgi?id=94904
Bugzilla [2]: https://bugs.freedesktop.org/show_bug.cgi?id=97967
Cc: Mark Janes 
Cc: Vinson Lee 
Cc: Tapani Pälli 
Cc: Jonathan Gray 
Signed-off-by: Emil Velikov 
---
 configure.ac | 161 +--
 src/compiler/glsl/tests/cache_test.c |   5 -
 src/mesa/main/shaderapi.c|   6 -
 src/util/Makefile.am |   3 -
 src/util/Makefile.sources|   2 +
 src/util/SConscript  |   5 -
 src/util/disk_cache.c|   4 -
 src/util/disk_cache.h|  42 --
 src/util/mesa-sha1.c | 242 +--
 src/util/sha1/README |  55 
 src/util/sha1/sha1.c | 173 +
 src/util/sha1/sha1.h |  47 +++
 12 files changed, 279 insertions(+), 466 deletions(-)
 create mode 100644 src/util/sha1/README
 create mode 100644 src/util/sha1/sha1.c
 create mode 100644 src/util/sha1/sha1.h

diff --git a/configure.ac b/configure.ac
index 459f3e8b0a..5772b378c7 100644
--- a/configure.ac
+++ b/configure.ac
@@ -9,7 +9,6 @@ dnl Copyright © 2009-2014 Jon TURNEY
 dnl Copyright © 2011-2012 Benjamin Franzke
 dnl Copyright © 2008-2014 David Airlie
 dnl Copyright © 2009-2013 Brian Paul
-dnl Copyright © 2003-2007 Keith Packard, Daniel Stone
 dnl
 dnl Permission is hereby granted, free of charge, to any person obtaining a
 dnl copy of this software and associated documentation files (the "Software"),
@@ -1432,151 +1431,6 @@ if test "x$enable_gallium_osmesa" = xyes; then
 fi
 fi
 
-# SHA1 hashing
-AC_ARG_WITH([sha1],
-[AS_HELP_STRING([--with-sha1=libc|libmd|libnettle|libgcrypt|libcrypto|libsha1|CommonCrypto|CryptoAPI],
-[choose SHA1 implementation])])
-case "x$with_sha1" in
-x | xlibc | xlibmd | xlibnettle | xlibgcrypt | xlibcrypto | xlibsha1 | xCommonCrypto | xCryptoAPI)
-  ;;
-*)
-AC_MSG_ERROR([Illegal value for --with-sha1: $with_sha1])
-esac
-
-AC_CHECK_FUNC([SHA1Init], [HAVE_SHA1_IN_LIBC=yes])
-if test "x$with_sha1" = x && test "x$HAVE_SHA1_IN_LIBC" = xyes; then
-   with_sha1=libc
-fi
-if test "x$with_sha1" = xlibc && test "x$HAVE_SHA1_IN_LIBC" != xyes; then
-   AC_MSG_ERROR([sha1 in libc requested but not found])
-fi
-if test "x$with_sha1" = xlibc; then
-   AC_DEFINE([HAVE_SHA1_IN_LIBC], [1],
-   [Use libc SHA1 functions])
-   SHA1_LIBS=""
-fi
-AC_CHECK_FUNC([CC_SHA1_Init], [HAVE_SHA1_IN_COMMONCRYPTO=yes])
-if test "x$with_sha1" = x && test "x$HAVE_SHA1_IN_COMMONCRYPTO" = xyes; then
-   with_sha1=CommonCrypto
-fi
-if test "x$with_sha1" = xCommonCrypto && test "x$HAVE_SHA1_IN_COMMONCRYPTO" != xyes; then
-   AC_MSG_ERROR([CommonCrypto requested but not found])
-fi
-if test "x$with_sha1" = xCommonCrypto; then
-   AC_DEFINE([HAVE_SHA1_IN_COMMONCRYPTO], [1],
-   [Use CommonCrypto SHA1 functions])
-   SHA1_LIBS=""
-fi
-dnl stdcall functions cannot be tested with AC_CHECK_LIB
-AC_CHECK_HEADER([wincrypt.h], [HAVE_SHA1_IN_CRYPTOAPI=yes], [], [#include ])
-if test "x$with_sha1" = x && test "x$HAVE_SHA1_IN_CRYPTOAPI" = xyes; then
-   with_sha1=CryptoAPI
-fi
-if test "x$with_sha1" = xCryptoAPI && test "x$HAVE_SHA1_IN_CRYPTOAPI" != xyes; then
-   AC_MSG_ERROR([CryptoAPI requested but not found])
-fi
-if test "x$with_sha1" = xCryptoAPI; then
-   AC_DEFINE([HAVE_SHA1_IN_CRYPTOAPI], [1],
-   [Use CryptoAPI SHA1 functions])
-   SHA1_LIBS=""
-fi
-AC_CHECK_LIB([md], [SHA1Init], [HAVE_LIBMD=yes])
-if test "x$with_sha1" = x && test "x$HAVE_LIBMD" = xyes; then
-   with_sha1=libmd
-fi
-if test "x$with_sha1" = xlibmd && test "x$HAVE_LIBMD" != xyes; then
-   AC_MSG_ERROR([libmd requested but not found])
-fi
-if test "x$with_sha1" = xlibmd; then
-   AC_DEFINE([HAVE_SHA1_IN_LIBMD], [1],
- [Use libmd SHA1 functions])

[Mesa-dev] [Bug 98428] Undefined non-weak-symbol in dri-drivers

2017-01-13 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98428

--- Comment #16 from NicolasChauvet  ---
I dug deeper into this issue, assuming the real fix is to build glapi shared and
link to it anyway. (I'm not sure if there are any users that will find
dlopening glapi only as needed useful over not building glapi completely.)

If there are real users, maybe this can be made a build time configuration
switch? (In this case I will try to make a v4.)

For Fedora users, there is a test repo with this patch added:
https://copr.fedorainfracloud.org/coprs/kwizart/glvnd/
dnf copr enable kwizart/glvnd
dnf update



Re: [Mesa-dev] [PATCH] st/va: delay calling begin_frame until we have all parameters

2017-01-13 Thread Andy Furniss

Nayan Deshmukh wrote:

On Fri, Jan 13, 2017 at 9:54 PM, Andy Furniss  wrote:



Would be interesting to see if you see the same with this vid
which easily shows the corruption.

https://drive.google.com/drive/folders/0BxP5-S1t9VEEbkR4dWhTUFozV2s?usp=sharing

Looks bad with --hwdec=vaapi, with or without --vo=vaapi


with --hwdec=vaapi and --vo=vaapi I see the corruption. But without
--vo=vaapi it uses VAAPI EGL interop and leads to this error
unsupported VA image format unknown


OK, and thanks for looking into the bugzilla bug.

I don't know why you get EGL interop - I get "normal" OpenGL and don't
know how to force mpv to try EGL.



Re: [Mesa-dev] NaN behavior in GLSL (was Re: [PATCH] glsl: always do sqrt(abs()) and inversesqrt(abs()))

2017-01-13 Thread Jason Ekstrand
On Fri, Jan 13, 2017 at 8:43 AM, Marek Olšák  wrote:

> On Fri, Jan 13, 2017 at 5:25 PM, Jason Ekstrand wrote:
> > On Fri, Jan 13, 2017 at 4:05 AM, Marek Olšák  wrote:
> >>
> >> On Fri, Jan 13, 2017 at 3:37 AM, Ilia Mirkin wrote:
> >> > On Thu, Jan 12, 2017 at 9:13 PM, Jason Ekstrand wrote:
> >> >> Unless, of course, it's controlled by the same hardware bit... Clearly, we
> >> >> can give you abs on rsq without denorm flushing (easy shader hacks) but
> >> >> not the other way around.
> >> >
> >> > OK, so somehow I missed that earlier. However there's an interesting
> >> > section in the PRM:
> >> >
> >> >
> >> > https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-skl-vol07-3d_media_gpgpu.pdf
> >> >
> >> > on PDF page 854, "Dismissed Legacy Behaviors" which has a list of
> >> > suggested IEEE 754 deviations for DX9. One of them is indeed that 0 *
> >> > x = 0, but another is that input NaNs be propagated with certain
> >> > exceptions. Also they suggest that RCP(0)/RSQ(0) = fmax. Interesting.
> >> >
> >> > So at this point, the zero_wins thing is pretty much blown. i965
> >> > appears to have an all-or-nothing approach, and additionally that
> >> > approach doesn't match up exactly to what NVIDIA does (or at least I'm
> >> > not aware of a clamp-everything mode).
> >> >
> >> > This will take some thought to figure out how something can be
> >> > specified so that a single spec works for both i965 and nv/amd. OTOH
> >> > we could have two different specs that just expose different things -
> >> > e.g. i965 could expose a MESA_shader_float_alt_mode or whatever which
> >> > is spec'd to do the things that the PRM says, and nv/amd have the
> >> > MESA_shader_float_zero_wins ext which does what we were talking about
> >> > earlier.
> >> >
> >> > I'm open to other suggestions too.
> >>
> >> There is also the "small" problem that it would take a non-trivial
> >> effort for us on the LLVM side. You guys can flip a switch. We can't.
> >
> >
> > Don't you have to expend that effort for ARB programs anyway?  I thought
> > they weren't supposed to generate NaN either.
>
> No, we don't, because st/mesa adds abs before RSQ and the driver
> implements POW as log+mul+exp, where mul follows the rule
> 0*anything=0. I don't think any other opcode follows that rule though.
>

Ah.  That makes sense.  Do you also implement DIV as MUL+RCP?  If so, the
two of those should take care of NaN getting generated in the shader.  We'd
still have to do something about inf and maybe denorms.
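
A rough illustration of the point about DIV, assuming a multiply that follows 0*anything=0 and an RCP that returns infinity for a zero operand. Both helpers are hypothetical stand-ins written in Python, not any driver's actual lowering. The cases that produce NaN under IEEE rules (0/0 and inf/inf) come out as 0, while infinities can still appear, which is the remaining problem mentioned above.

```python
import math

def legacy_mul(a, b):
    """Multiply following the rule 0 * anything = 0."""
    if a == 0.0 or b == 0.0:
        return 0.0
    return a * b

def rcp(x):
    """Reciprocal; RCP(0) = +/-inf is an assumption for this sketch."""
    if x == 0.0:
        return math.copysign(math.inf, x)
    return 1.0 / x

def div_via_mul_rcp(a, b):
    """DIV lowered to MUL + RCP."""
    return legacy_mul(a, rcp(b))

print(div_via_mul_rcp(0.0, 0.0))            # 0.0, IEEE would give NaN
print(div_via_mul_rcp(math.inf, math.inf))  # 0.0, IEEE would give NaN
print(div_via_mul_rcp(1.0, 0.0))            # inf is still produced
```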


Re: [Mesa-dev] [PATCH 4/4] glsl: Use hash table cloning in copy propagation

2017-01-13 Thread Thomas Helland
2017-01-13 18:41 GMT+01:00 Vladislav Egorov :
> 13.01.2017 15:31, Tapani Pälli пишет:
>>
>>
>>
>> On 01/12/2017 09:23 PM, Thomas Helland wrote:
>>>
>>> Walking the whole hash table, inserting entries by hashing them first
>>> is just a really really bad idea. We can simply memcpy the whole thing.
>>
>>
>> Maybe it is just 'really' not 'really really' since I don't spot any
>> difference in time running the torture test in bug #94477 (oscillates close
>> to 120s with both with and without these patches), I would expect at least
>> some difference as it is utilizing this path a lot. Did you measure
>> performance difference?
>>
>
> It wouldn't help the torture case from the bug, because that shader doesn't
> have LOOP and IF blocks, so more efficient copying the ACP for LOOP/IF
> blocks would not be even touched.
>
> Quick benchmark of Tom's patches on shader-db.
>
> Default shader-db, ./run -1, 10 runs:
>
>                 BEFORE   AFTER
> softpipe        3.20s    3.15s
> radeonsi        5.17s    5.12s
> i965/Haswell    7.33s    7.19s
>
> On my full shader-db (50K+ shaders from games):
>
>                     BEFORE   AFTER
> softpipe (5 runs)   156.6s   153.9s
> i965                625s     613s
>
> So it brings 1-2% speed across the board.
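
The 1-2% figure can be checked directly from the timings quoted above. This is only a sanity-check sketch; the dictionary labels are made up for readability and the numbers are copied from the benchmark.

```python
def speedup_percent(before, after):
    """Relative reduction in run time, in percent."""
    return (before - after) / before * 100.0

# (before, after) run times in seconds, taken from the quoted benchmark.
runs = {
    "softpipe (default shader-db)": (3.20, 3.15),
    "radeonsi (default shader-db)": (5.17, 5.12),
    "i965/Haswell (default shader-db)": (7.33, 7.19),
    "softpipe (full shader-db)": (156.6, 153.9),
    "i965 (full shader-db)": (625.0, 613.0),
}

results = {name: speedup_percent(b, a) for name, (b, a) in runs.items()}
for name, pct in results.items():
    print(f"{name}: {pct:.2f}% less time")
```

Every entry lands between roughly 1% and 2%, consistent with the summary.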

What he said. It only helps when there are if's or loops.
The other patch I wrote based on Connor's suggestion makes a big impact.
But as he found out, and I confirmed, yesterday, the approach doesn't work.
So it is back to the drawing board on that one. And I thought I was so close :-/
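
For readers unfamiliar with the idea behind the patch, here is a toy open-addressing table in Python that contrasts the two ways of cloning. It is not Mesa's hash table; the class and method names are hypothetical, and a list copy stands in for the memcpy of the slot array.

```python
class OpenAddressingTable:
    """Tiny linear-probing table, kept at most half full."""

    def __init__(self, capacity=8):
        self.slots = [None] * capacity  # each slot is None or (key, value)
        self.count = 0

    def _probe(self, key):
        i = hash(key) % len(self.slots)
        while self.slots[i] is not None and self.slots[i][0] != key:
            i = (i + 1) % len(self.slots)
        return i

    def _grow(self):
        old = self.slots
        self.slots = [None] * (2 * len(old))
        self.count = 0
        for entry in old:
            if entry is not None:
                self.insert(*entry)

    def insert(self, key, value):
        if (self.count + 1) * 2 > len(self.slots):
            self._grow()
        i = self._probe(key)
        if self.slots[i] is None:
            self.count += 1
        self.slots[i] = (key, value)

    def get(self, key, default=None):
        entry = self.slots[self._probe(key)]
        return default if entry is None else entry[1]

    def clone_by_reinsert(self):
        """Walk the table and hash every entry again (the slow way)."""
        new = OpenAddressingTable(len(self.slots))
        for entry in self.slots:
            if entry is not None:
                new.insert(*entry)
        return new

    def clone_by_copy(self):
        """Copy the slot array wholesale; no hashing or probing needed."""
        new = OpenAddressingTable.__new__(OpenAddressingTable)
        new.slots = list(self.slots)
        new.count = self.count
        return new


table = OpenAddressingTable()
for n in range(20):
    table.insert(f"var{n}", n)
copy_a = table.clone_by_reinsert()
copy_b = table.clone_by_copy()
```

Both clones answer lookups identically; the second skips hashing and probing because the slot layout is already valid for a table of the same size.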


Re: [Mesa-dev] [PATCH] shader-db: Update the README

2017-01-13 Thread Matt Turner
On Fri, Jan 13, 2017 at 10:04 AM, Elie Tournier  wrote:
> Use the binary to run shader-db instead of run.py
>
> Signed-off-by: Elie Tournier 
> ---
>  README | 18 ++++++++++++++++--
>  1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/README b/README
> index 5e9bb2d..6f6a7e2 100644
> --- a/README
> +++ b/README
> @@ -1,6 +1,6 @@
>  === What ===
>
> -A giant pile of shaders from various apps, for whatever purpose.  In
> +A giant pile of shaders from various apps, for whatever purpose. In
>  particular, we use it to capture assembly output of the shader
>  compiler for analysis of regressions in compiler behavior.
>
> @@ -16,10 +16,16 @@ MESA_SHADER_CAPTURE_PATH=dirpath executable
>
>  # "fdupes" can be used to remove duplicates
>
> +=== Compiling ===
> +
> +Some libraries are required when building. See section "Dependencies" below.
> +To build the binary, do:
> +make
>
>  === i965 Usage ===
>
>  === Running shaders ===
> +
>  ./run shaders 2> err | tee new-run
>
>  # To run just a subset:
> @@ -34,8 +40,8 @@ To compile shaders for an i965 PCI ID different from your system, pass
>  to run.
>
>  === Analysis ===
> -./report.py old-run new-run
>
> +./report.py old-run new-run
>
>  === radeonsi Usage ===
>
> @@ -46,6 +52,7 @@ to run.
>  Note that a debug mesa build required (ie. --enable-debug)
>
>  === Analysis ===
> +
>  ./si-report.py old-run new-run
>
>  === freedreno Usage ===
> @@ -59,15 +66,22 @@ Note that a debug mesa build required (ie. --enable-debug)
>  -1 option for disabling multi-threading is required to avoid garbled shader dumps.
>
>  === Analysis ===
> +
>  ./fd-report.py old-run new-run
>
>  === Dependencies ===
> +
>  run requires some GNU C extensions, render nodes (/dev/dri/renderD128),
>  libepoxy, OpenMP, and Mesa configured with --with-egl-platforms=x11,drm
>
>  === jemalloc ===
> +
>  Since run compiles shaders in different threads, malloc/free locking overhead
>  from inside Mesa can be expensive. Preloading jemalloc can cut significant
>  amounts of time:
>
>  LD_PRELOAD=/usr/lib64/libjemalloc.so.1 ./run shaders 2> err | tee new-run
> +
> +=== Depreciated ===

Typo: Deprecated

Otherwise, fine by me.


Re: [Mesa-dev] [PATCH] util: import sha1 implementation from OpenBSD

2017-01-13 Thread Emil Velikov
On 13 January 2017 at 19:22, Vladislav Egorov  wrote:
> 13.01.2017 19:51, Emil Velikov пишет:
>>
>> From: Emil Velikov 
>>
>> At the moment we support 5+ different implementations each with varying
>> amount of bugs - from thread safety problems [1], to outright broken
>> implementation(s) [2]
>>
>> In order to accommodate these we have 150+ lines of configure script and
>> two extra configure toggles. Whilst an actual implementation is
>> ~200 loc and our current compat wrapping ~250.
>>
>> Let's not forget that different people use different code paths, which
>> effectively makes it harder to test and debug since the default
>> implementation is automatically detected.
>>
>> To minimise all these lovely experiences, import the "100% Public
>> Domain" OpenBSD sha1 implementation. Clearly document any changes needed
>> to get building correctly, since many/most of those can be upstreamed
>> making future syncs easier.
>
>
> It can hurt performance.
This is not performance critical path ;-) If that ever changes we can
rethink our options.

Emil


Re: [Mesa-dev] [PATCH] util: import sha1 implementation from OpenBSD

2017-01-13 Thread Matt Turner
On Fri, Jan 13, 2017 at 11:51 AM, Dylan Baker  wrote:
> Quoting Emil Velikov (2017-01-13 08:51:31)
>> From: Emil Velikov 
>>
>> At the moment we support 5+ different implementations each with varying
>> amount of bugs - from thread safety problems [1], to outright broken
>> implementation(s) [2]
>>
>> In order to accommodate these we have 150+ lines of configure script and
>> two extra configure toggles. Whilst an actual implementation is
>> ~200 loc and our current compat wrapping ~250.
>>
>> Let's not forget that different people use different code paths, which
>> effectively makes it harder to test and debug since the default
>> implementation is automatically detected.
>>
>> To minimise all these lovely experiences, import the "100% Public
>> Domain" OpenBSD sha1 implementation. Clearly document any changes needed
>> to get building correctly, since many/most of those can be upstreamed
>> making future syncs easier.
>>
>> As an added bonus this will avoid all the 'fun' experiences trying to
>> integrate it with the Android and SCons builds.
>>
>> Bugzilla [1]: https://bugs.freedesktop.org/show_bug.cgi?id=94904
>> Bugzilla [2]: https://bugs.freedesktop.org/show_bug.cgi?id=97967
>> Cc: Mark Janes 
>> Cc: Vinson Lee 
>> Cc: Tapani Pälli 
>> Cc: Jonathan Gray 
>> Signed-off-by: Emil Velikov 
>> ---
>>  configure.ac | 161 +--
>>  src/compiler/glsl/tests/cache_test.c |   5 -
>>  src/mesa/main/shaderapi.c|   6 -
>>  src/util/Makefile.am |   3 -
>>  src/util/Makefile.sources|   2 +
>>  src/util/SConscript  |   5 -
>>  src/util/disk_cache.c|   4 -
>>  src/util/disk_cache.h|  42 --
>>  src/util/mesa-sha1.c | 242 +--
>>  src/util/sha1/README |  55 
>>  src/util/sha1/sha1.c | 173 +
>>  src/util/sha1/sha1.h |  47 +++
>>  12 files changed, 279 insertions(+), 466 deletions(-)
>>  create mode 100644 src/util/sha1/README
>>  create mode 100644 src/util/sha1/sha1.c
>>  create mode 100644 src/util/sha1/sha1.h
>>
>> diff --git a/configure.ac b/configure.ac
>> index 459f3e8b0a..5772b378c7 100644
>> --- a/configure.ac
>> +++ b/configure.ac
>> @@ -9,7 +9,6 @@ dnl Copyright © 2009-2014 Jon TURNEY
>>  dnl Copyright © 2011-2012 Benjamin Franzke
>>  dnl Copyright © 2008-2014 David Airlie
>>  dnl Copyright © 2009-2013 Brian Paul
>> -dnl Copyright © 2003-2007 Keith Packard, Daniel Stone
>
> This change seems like a mistake?

Actually no, since that line was added in commit a24bdce46 which was
the import of all of the SHA1 configuration machinery from the
xserver.


[Mesa-dev] [PATCH 3/3] gallivm: Reenable PPC VSX

2017-01-13 Thread Ben Crocker
Reenable the PPC64LE Vector-Scalar Extension for LLVM versions >= 3.8.1,
now that LLVM bug 26775 and its corollary, 25503, are fixed.

Signed-off-by: Ben Crocker 
---
 src/gallium/auxiliary/gallivm/lp_bld_misc.cpp | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
index 0bd5044..fbffa8e 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
+++ b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
@@ -50,6 +50,8 @@
 
 #include 
 
+#define LLVM_VERSION( MAJOR_MINOR, PATCH_LEVEL) ((MAJOR_MINOR << 8) + PATCH_LEVEL)
+
 // Workaround http://llvm.org/PR23628
 #if HAVE_LLVM >= 0x0307
 #  pragma push_macro("DEBUG")
@@ -614,7 +616,8 @@ lp_build_create_jit_compiler_for_module(LLVMExecutionEngineRef *OutJIT,
 
 #if defined(PIPE_ARCH_PPC)
MAttrs.push_back(util_cpu_caps.has_altivec ? "+altivec" : "-altivec");
-#if HAVE_LLVM >= 0x0304
+#if (HAVE_LLVM >= 0x0304) && \
+   (LLVM_VERSION( HAVE_LLVM, MESA_LLVM_VERSION_PATCH) <= LLVM_VERSION(0x0308, 0x00))
/*
 * Make sure VSX instructions are disabled
 * See LLVM bug https://llvm.org/bugs/show_bug.cgi?id=25503#c7
@@ -622,6 +625,16 @@ lp_build_create_jit_compiler_for_module(LLVMExecutionEngineRef *OutJIT,
if (util_cpu_caps.has_altivec) {
   MAttrs.push_back("-vsx");
}
+#elif LLVM_VERSION( HAVE_LLVM, MESA_LLVM_VERSION_PATCH) > LLVM_VERSION(0x0308, 0x00)
+   /*
+* However, bug 25503 is fixed, by the same fix that fixed
+* bug 26775, in versions of LLVM later than 3.8 (starting with 3.8.1):
+* Make sure VSX instructions are ENABLED
+* See LLVM bug https://llvm.org/bugs/show_bug.cgi?id=26775
+*/
+   if (util_cpu_caps.has_altivec) {
+  MAttrs.push_back("+vsx");
+   }
 #endif
 #endif
 
-- 
2.7.4
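
As a quick check of the version comparison in the patch above, the packing done by LLVM_VERSION and the resulting choice of attribute can be modelled in a few lines of Python. The function names are hypothetical, and the has_altivec condition from the C++ code is left out; this is a sketch of the preprocessor logic, not of the driver.

```python
def llvm_version(major_minor, patch_level):
    """Same packing as the LLVM_VERSION macro: (MAJOR_MINOR << 8) + PATCH."""
    return (major_minor << 8) + patch_level

def vsx_attribute(have_llvm, patch_level):
    """Return the attribute the #if/#elif chain would push, or None."""
    version = llvm_version(have_llvm, patch_level)
    if have_llvm >= 0x0304 and version <= llvm_version(0x0308, 0x00):
        return "-vsx"   # 3.4 through 3.8.0: keep VSX disabled
    if version > llvm_version(0x0308, 0x00):
        return "+vsx"   # 3.8.1 and later: enable VSX
    return None         # older than 3.4: neither branch applies

print(vsx_attribute(0x0308, 0))  # -vsx
print(vsx_attribute(0x0308, 1))  # +vsx
print(vsx_attribute(0x0309, 0))  # +vsx
```

Packing the patch level into the low byte is what lets 3.8.1 compare greater than 3.8.0 while both still share HAVE_LLVM == 0x0308.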



Re: [Mesa-dev] [PATCH 3/3] gallivm: Reenable PPC VSX

2017-01-13 Thread Matt Turner
On Fri, Jan 13, 2017 at 12:39 PM, Ben Crocker  wrote:
> Reenable the PPC64LE Vector-Scalar Extension for LLVM versions >= 3.8.1,
> now that LLVM bug 26775 and its corollary, 25503, are fixed.
>
> Signed-off-by: Ben Crocker 
> ---
>  src/gallium/auxiliary/gallivm/lp_bld_misc.cpp | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
> index 0bd5044..fbffa8e 100644
> --- a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
> +++ b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
> @@ -50,6 +50,8 @@
>
>  #include 
>
> +#define LLVM_VERSION( MAJOR_MINOR, PATCH_LEVEL) ((MAJOR_MINOR << 8) + PATCH_LEVEL)

Stray space after opening (


Re: [Mesa-dev] [PATCH 4/8] android: fix llvmpipe build

2017-01-13 Thread Jose Fonseca

On 13/01/17 18:37, Emil Velikov wrote:

On 11 January 2017 at 19:19, Jose Fonseca  wrote:

On 10/01/17 15:54, Emil Velikov wrote:


On 6 January 2017 at 17:35, Wu Zhen  wrote:


From: WuZhen 

Since cf410574 (gallivm: Make MCJIT a runtime optioni.), llvmpipe assumes
MCJIT is available on x86(_64). This is not the case for Android prior to
M.


Wu Zhen, what exactly is the issue you're getting - build or link-time
error ?

Looking at the hunk [1] in the offending commit makes me wonder.
 - Why do we call LLVMLinkInJIT() even if one selects MCJIT via the env
var.
 - Why do we always call LLVMLinkInMCJIT regardless of a) if we've
built against old LLVM and b) the env var.

Jose, shouldn't we honour the above ? One way that comes to mind is to
have USE_MCJIT always as static variable. Then we can guard the
debug_get_bool_option() override with the current LLVM_VERSION/ARCH
heuristics while preserving original invocation.

if (USE_MCJIT) // use lowercase name since it's not a macro ?
   LLVMLinkInMCJIT();
else
   LLVMLinkInJIT();


Thanks
Emil

[1]
@@ -385,18 +382,18 @@ lp_build_init(void)
   if (gallivm_initialized)
  return TRUE;

+   LLVMLinkInMCJIT();
+#if !defined(USE_MCJIT)
+   USE_MCJIT = debug_get_bool_option("GALLIVM_MCJIT", 0);
+   LLVMLinkInJIT();
+#endif
+
#ifdef DEBUG
   gallivm_debug = debug_get_option_gallivm_debug();
#endif

   lp_set_target_options();

-#if USE_MCJIT
-   LLVMLinkInMCJIT();
-#else
-   LLVMLinkInJIT();
-#endif
-



USE_MCJIT used to be a statically defined macro, but now it can also be a
runtime boolean.

We require LLVM 3.3, and MCJIT has been available since then, so there was
no reason not to link.

Android seems a new beast: it has LLVM 3.3 but not MCJIT??


The Android discussion aside I was trying to point out that the commit
in question does more than making the compile time decision run-time
one.

Before the commit - LLVMLinkInMCJIT() was executed only when USE_MCJIT
is set, and after it's executed regardless. On the LLVMLinkInJIT front
- we seem to execute it even if the user has requested USE_MCJIT. I'm
either missing something here, or things look nasty/wrong ?

Thanks
Emil




Note that LLVMLinkInMCJIT() is a no-op at runtime.  (At least it was 
when I last looked at this.)  And the purpose of using LLVMLinkInMCJIT 
is merely to ensure that the MCJIT library gets linked (otherwise trying to 
enable MCJIT would fail.)


Which is precisely the objective here: in order to potentially enable 
the use of MCJIT at runtime, the library needs to be statically linked.


It's not even necessary to actually call LLVMLinkInMCJIT() at runtime, 
but it's important that the LLVMLinkInMCJIT call doesn't get optimized away.



Jose


Re: [Mesa-dev] [PATCH] util: import sha1 implementation from OpenBSD

2017-01-13 Thread Vladislav Egorov
2017-01-13 22:43 GMT+03:00 Emil Velikov :
>
> On 13 January 2017 at 19:22, Vladislav Egorov  wrote:
> > 13.01.2017 19:51, Emil Velikov пишет:
> >>
> >> From: Emil Velikov 
> >>
> >> At the moment we support 5+ different implementations each with varying
> >> amount of bugs - from thread safety problems [1], to outright broken
> >> implementation(s) [2]
> >>
> >> In order to accommodate these we have 150+ lines of configure script and
> >> two extra configure toggles. Whilst an actual implementation is
> >> ~200 loc and our current compat wrapping ~250.
> >>
> >> Let's not forget that different people use different code paths, which
> >> effectively makes it harder to test and debug since the default
> >> implementation is automatically detected.
> >>
> >> To minimise all these lovely experiences, import the "100% Public
> >> Domain" OpenBSD sha1 implementation. Clearly document any changes needed
> >> to get building correctly, since many/most of those can be upstreamed
> >> making future syncs easier.
> >
> >
> > It can hurt performance.
> This is not performance critical path ;-) If that ever changes we can
> rethink our options.
>
> Emil


If it's used by shader-cache, it's certainly along the critical path.
And 7-8 cycles per byte of shader text (or more than 10 cycles per byte
on Atoms, Celerons and low-end AMDs) is something to be
considered. In comparison the entire preprocessing stage takes ~15
cycles per byte -- well, after my optimizations :) I regularly see
util_hash_crc32() in perf top - because it uses an inefficient
table-based implementation with the same ~8 cycles per byte.


Re: [Mesa-dev] [PATCH] util: import sha1 implementation from OpenBSD

2017-01-13 Thread Jason Ekstrand
On Fri, Jan 13, 2017 at 11:22 AM, Vladislav Egorov 
wrote:

> 13.01.2017 19:51, Emil Velikov пишет:
>
>> From: Emil Velikov 
>>
>> At the moment we support 5+ different implementations each with varying
>> amount of bugs - from thread safety problems [1], to outright broken
>> implementation(s) [2]
>>
>> In order to accommodate these we have 150+ lines of configure script and
>> two extra configure toggles. Whilst an actual implementation is
>> ~200 loc and our current compat wrapping ~250.
>>
>
Yes, this is a problem.  Especially given that at least one of those
implementations (openssl?) is something that a certain major game
distributor likes to hard-link into things causing interesting and
hard-to-debug problems.  I am all for getting rid of the "piles of
different dependencies" approach.

Also, something I would like to see (maybe a follow-on patch?) would be a
change to the mesa internal API to be able to put the SHA context on the
stack and not need to malloc it.  It's not really a memory or cycle-saving
thing so much as it leaves one fewer cleanup path you have to worry about.


>> Let's not forget that different people use different code paths, which
>> effectively makes it harder to test and debug since the default
>> implementation is automatically detected.
>>
>> To minimise all these lovely experiences, import the "100% Public
>> Domain" OpenBSD sha1 implementation. Clearly document any changes needed
>> to get building correctly, since many/most of those can be upstreamed
>> making future syncs easier.
>>
>
> It can hurt performance. OpenSSL implementation is optimized for all
> thinkable architectures and it will use hardware SHA-1 instructions on
> newer CPUs. From
> https://github.com/openssl/openssl/blob/master/crypto/sha/asm/sha1-x86_64.pl :
>
> > Current performance is summarized in following table. Numbers are
> > CPU clock cycles spent to process single byte (less is better).
> >
> >                 x86_64     SSSE3        AVX[2]
> > P4              9.05       -
> > Opteron         6.26       -
> > Core2           6.55       6.05/+8%     -
> > Westmere        6.73       5.30/+27%    -
> > Sandy Bridge    7.70       6.10/+26%    4.99/+54%
> > Ivy Bridge      6.06       4.67/+30%    4.60/+32%
> > Haswell         5.45       4.15/+31%    3.57/+53%
> > Skylake         5.18       4.06/+28%    3.54/+46%
> > Bulldozer       9.11       5.95/+53%
> > VIA Nano        9.32       7.15/+30%
> > Atom            10.3       9.17/+12%
> > Silvermont      13.1(*)    9.37/+40%
> > Goldmont        8.13       6.42/+27%    1.70/+380%(**)
>
> Quick benchmark on my Haswell of the OpenBSD implementation compiled with
> GCC5 -O2: ~8 cycles per byte on 32-bit, ~7 cycles per byte on 64-bit. But
> Haswell is a very powerful CPU, on weaker CPUs the difference would be
> probably larger, especially on new CPUs that have SHA instruction set.
>

Thanks for the numbers.  It sounds like, on Haswell, the openSSL
implementation is about 2x as fast which is very useful to know.  However,
this isn't on a super perf-critical path.  We never use SHA1 on any
draw-time paths; we always use a simpler hash function in those cases and
reserve SHA1 for when we really don't want collisions.  That said, it's a
bit more critical than Emil makes it sound.  A typical Vulkan application
may easily create 10k pipelines and each of those will involve hashing at
least about 100B of data (not including the SPIR-V source).  I doubt,
however, that this is enough to really cause a problem given how much other
work goes into building a pipeline.

Unfortunately, the OpenSSL implementation, while fast, is one of the ones
that is causing problems.  One of our favorite game distributors likes to
hard-link against openssl in some of their games and/or libraries (not sure
which).  This means that, if mesa tries to dynamically open libssl, you get
mysterious crashes due to slight differences between the system-installed
version and the one that has been linked into the game.  This makes trying
to use the OpenSSL implementation a non-starter without being able to
wholesale import the implementation.

Emil, I'm fine with this change.  I haven't reviewed the details, but my
gut tells me we can eat the perf difference for now.  Consider that an
Acked-by if you'd like but it would be good to have someone review at least
the build system stuff.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] mesa/main: fix version/extension checks in _mesa_ClampColor

2017-01-13 Thread Mark Janes
This patch regressed i915 systems:

https://bugs.freedesktop.org/show_bug.cgi?id=99401

Please don't apply to stable until the bug is resolved.

Nicolai Hähnle  writes:

> From: Nicolai Hähnle 
>
> Add a proper check for feature support, and raise an invalid enum for
> GL_CLAMP_VERTEX/FRAGMENT_COLOR unconditionally in core profiles, since
> those enums were explicitly removed after the extension was promoted
> to core functionality (not in the profile sense) with OpenGL 3.0.
>
> This matches the behavior of the AMD closed source driver and fixes
> GL45-CTS.gtf30.GL3Tests.half_float.half_float_textures.
>
> Cc: "12.0 13.0" 
> ---
>  src/mesa/main/blend.c | 16 ++--
>  1 file changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/src/mesa/main/blend.c b/src/mesa/main/blend.c
> index 0322799..955fda1 100644
> --- a/src/mesa/main/blend.c
> +++ b/src/mesa/main/blend.c
> @@ -854,40 +854,44 @@ _mesa_ColorMaski( GLuint buf, GLboolean red, GLboolean green,
> FLUSH_VERTICES(ctx, _NEW_COLOR);
> COPY_4UBV(ctx->Color.ColorMask[buf], tmp);
>  }
>  
>  
>  void GLAPIENTRY
>  _mesa_ClampColor(GLenum target, GLenum clamp)
>  {
> GET_CURRENT_CONTEXT(ctx);
>  
> +   /* Check for both the extension and the GL version, since the Intel driver
> +* does not advertise the extension in core profiles.
> +*/
> +   if (ctx->Version <= 30 && !ctx->Extensions.ARB_color_buffer_float) {
> +  _mesa_error(ctx, GL_INVALID_OPERATION, "glClampColor()");
> +  return;
> +   }
> +
> if (clamp != GL_TRUE && clamp != GL_FALSE && clamp != GL_FIXED_ONLY_ARB) {
>_mesa_error(ctx, GL_INVALID_ENUM, "glClampColorARB(clamp)");
>return;
> }
>  
> switch (target) {
> case GL_CLAMP_VERTEX_COLOR_ARB:
> -  if (ctx->API == API_OPENGL_CORE &&
> -  !ctx->Extensions.ARB_color_buffer_float) {
> +  if (ctx->API == API_OPENGL_CORE)
>   goto invalid_enum;
> -  }
>FLUSH_VERTICES(ctx, _NEW_LIGHT);
>ctx->Light.ClampVertexColor = clamp;
>_mesa_update_clamp_vertex_color(ctx, ctx->DrawBuffer);
>break;
> case GL_CLAMP_FRAGMENT_COLOR_ARB:
> -  if (ctx->API == API_OPENGL_CORE &&
> -  !ctx->Extensions.ARB_color_buffer_float) {
> +  if (ctx->API == API_OPENGL_CORE)
>   goto invalid_enum;
> -  }
>FLUSH_VERTICES(ctx, _NEW_FRAG_CLAMP);
>ctx->Color.ClampFragmentColor = clamp;
>_mesa_update_clamp_fragment_color(ctx, ctx->DrawBuffer);
>break;
> case GL_CLAMP_READ_COLOR_ARB:
>ctx->Color.ClampReadColor = clamp;
>break;
> default:
>goto invalid_enum;
> }
> -- 
> 2.7.4
>
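
For anyone following the bug, the decision logic of the quoted patch can be modeled outside of Mesa roughly as below. This is only a sketch: struct fake_ctx, the enums and check_clamp_color() are hypothetical stand-ins, not Mesa API, and the conditions are copied from the diff as posted without any claim about what the eventual fix should look like.

```c
#include <assert.h>
#include <stdbool.h>

enum clamp_result { CLAMP_OK, CLAMP_INVALID_OPERATION, CLAMP_INVALID_ENUM };
enum clamp_target { TARGET_VERTEX, TARGET_FRAGMENT, TARGET_READ, TARGET_OTHER };

/* Hypothetical stand-in for the few context fields the check reads. */
struct fake_ctx {
   int version;                  /* e.g. 30 for OpenGL 3.0 */
   bool is_core;                 /* core profile context */
   bool has_color_buffer_float;  /* ARB_color_buffer_float advertised */
};

static enum clamp_result
check_clamp_color(const struct fake_ctx *ctx, enum clamp_target target,
                  bool clamp_is_valid)
{
   assert(ctx);

   /* Feature check, mirroring the condition in the quoted patch. */
   if (ctx->version <= 30 && !ctx->has_color_buffer_float)
      return CLAMP_INVALID_OPERATION;

   /* The clamp argument must be one of the accepted enums. */
   if (!clamp_is_valid)
      return CLAMP_INVALID_ENUM;

   switch (target) {
   case TARGET_VERTEX:
   case TARGET_FRAGMENT:
      /* The patch rejects these targets unconditionally in core profiles. */
      return ctx->is_core ? CLAMP_INVALID_ENUM : CLAMP_OK;
   case TARGET_READ:
      return CLAMP_OK;
   default:
      return CLAMP_INVALID_ENUM;
   }
}
```

Note that with the first condition as written, a context at version 30 or below that does not advertise the extension gets GL_INVALID_OPERATION for every target, including the read-color one.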


Re: [Mesa-dev] [PATCH] util: import sha1 implementation from OpenBSD

2017-01-13 Thread Matt Turner
I am generally in favor of this for all the reasons you've described.


Re: [Mesa-dev] [PATCH] util: import sha1 implementation from OpenBSD

2017-01-13 Thread Matt Turner
On Fri, Jan 13, 2017 at 1:01 PM, Vladislav Egorov  wrote:
> 2017-01-13 22:43 GMT+03:00 Emil Velikov :
>>
>> On 13 January 2017 at 19:22, Vladislav Egorov  wrote:
>> > 13.01.2017 19:51, Emil Velikov wrote:
>> >>
>> >> From: Emil Velikov 
>> >>
>> >> At the moment we support 5+ different implementations, each with a varying
>> >> number of bugs - from thread safety problems [1], to outright broken
>> >> implementation(s) [2].
>> >>
>> >> In order to accommodate these we have 150+ lines of configure script and
>> >> two extra configure toggles, whilst an actual implementation is
>> >> ~200loc and our current compat wrapping ~250.
>> >>
>> >> Let's not forget that different people use different code paths, which
>> >> effectively makes it harder to test and debug since the default
>> >> implementation is automatically detected.
>> >>
>> >> To minimise all these lovely experiences, import the "100% Public
>> >> Domain" OpenBSD sha1 implementation. Clearly document any changes needed
>> >> to get building correctly, since many/most of those can be upstreamed
>> >> making future syncs easier.
>> >
>> >
>> > It can hurt performance.
>> This is not performance critical path ;-) If that ever changes we can
>> rethink our options.
>>
>> Emil
>
>
> If it's used by shader-cache, it's certainly along the critical path.
> And 7-8 cycles per byte of shader text (or more than 10 cycles per byte
> on Atoms, Celerons and low-end AMDs) is something to be
> considered. In comparison the entire preprocessing stage takes ~15
> cycles per byte -- well, after my optimizations :) I regularly see
> util_hash_crc32() in perf top - because it uses an inefficient
> table-based implementation with the same ~8 cycles per byte.

Perhaps we should consider using CRC32C (for which an instruction
exists in SSE 4.2 with a latency of 3 cycles) as the hashing function?

http://stackoverflow.com/questions/2694740/can-one-construct-a-good-hash-function-using-crc32c-as-a-base
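
For reference, here is a minimal portable sketch of CRC32C (the Castagnoli polynomial, reflected form 0x82F63B78), computed bit by bit. It is not the SSE 4.2 path; the hardware instruction would replace the inner loop. The function name crc32c_sw is made up for this example, and whether a 32-bit checksum is acceptable where we currently rely on SHA1 is a separate question that this does not answer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32C over `len` bytes of `data`.
 * Initial value and final XOR are both 0xFFFFFFFF, bits are processed
 * least significant first (the reflected convention). */
static uint32_t
crc32c_sw(const void *data, size_t len)
{
   const uint8_t *p = data;
   uint32_t crc = 0xFFFFFFFFu;

   assert(data != NULL || len == 0);

   for (size_t i = 0; i < len; i++) {
      crc ^= p[i];
      for (int bit = 0; bit < 8; bit++) {
         /* If the low bit is set, shift and fold in the polynomial. */
         crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
      }
   }

   return ~crc;
}
```

The standard check value applies: crc32c_sw("123456789", 9) should give 0xE3069283.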


[Mesa-dev] Subset of patches from wip-high-priority that can be useful

2017-01-13 Thread Andres Rodriguez
The following are a subset of patches from my wip-high-priority branch that
may be useful outside that context.

The HW priority debugging may take a little while, so I wanted to make some
of the more generic bits available on master, as other work could benefit
from them.



[Mesa-dev] [PATCH 3/3] radv: make device extension setup dynamic

2017-01-13 Thread Andres Rodriguez
Each physical device may support different extensions from the others.
Furthermore, depending on the software stack, some extensions may not be
accessible.

If an extension is conditional, it can be registered only when
necessary.
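
The idea, reduced to plain C, looks roughly like the sketch below: a growable array that starts with the common extensions and gets conditional ones appended later. The names (ext_props, ext_list and its functions) are hypothetical and realloc() stands in for the driver allocator, so treat this as an illustration of the approach rather than the code in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define EXT_NAME_MAX 256

/* Hypothetical stand-in for VkExtensionProperties. */
struct ext_props {
   char name[EXT_NAME_MAX];
   uint32_t spec_version;
};

struct ext_list {
   struct ext_props *items;
   uint32_t count;
};

/* Append `num` entries. On allocation failure the old array stays valid
 * and nothing is updated. */
static bool
ext_list_register(struct ext_list *list, const struct ext_props *new_ext,
                  uint32_t num)
{
   struct ext_props *grown;

   assert(list && new_ext && num > 0);

   grown = realloc(list->items,
                   ((size_t)list->count + num) * sizeof(struct ext_props));
   if (!grown)
      return false;

   memcpy(&grown[list->count], new_ext, num * sizeof(struct ext_props));
   list->items = grown;
   list->count += num;
   return true;
}

/* Linear lookup by name, as done when validating enabled extensions. */
static bool
ext_list_contains(const struct ext_list *list, const char *name)
{
   assert(list && name);

   for (uint32_t i = 0; i < list->count; i++) {
      if (strcmp(name, list->items[i].name) == 0)
         return true;
   }
   return false;
}

static void
ext_list_finish(struct ext_list *list)
{
   assert(list);
   free(list->items);
   list->items = NULL;
   list->count = 0;
}
```

A device would register the common table once at init, then call ext_list_register() again with a single entry for each extension whose preconditions are met.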

Signed-off-by: Andres Rodriguez 
---
 src/amd/vulkan/radv_device.c  | 196 --
 src/amd/vulkan/radv_private.h |   6 ++
 2 files changed, 137 insertions(+), 65 deletions(-)

diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
index 5669fd7..0333688 100644
--- a/src/amd/vulkan/radv_device.c
+++ b/src/amd/vulkan/radv_device.c
@@ -77,6 +77,115 @@ radv_device_get_cache_uuid(enum radeon_family family, void *uuid)
return 0;
 }
 
+static const VkExtensionProperties instance_extensions[] = {
+   {
+   .extensionName = VK_KHR_SURFACE_EXTENSION_NAME,
+   .specVersion = 25,
+   },
+#ifdef VK_USE_PLATFORM_XCB_KHR
+   {
+   .extensionName = VK_KHR_XCB_SURFACE_EXTENSION_NAME,
+   .specVersion = 6,
+   },
+#endif
+#ifdef VK_USE_PLATFORM_XLIB_KHR
+   {
+   .extensionName = VK_KHR_XLIB_SURFACE_EXTENSION_NAME,
+   .specVersion = 6,
+   },
+#endif
+#ifdef VK_USE_PLATFORM_WAYLAND_KHR
+   {
+   .extensionName = VK_KHR_WAYLAND_SURFACE_EXTENSION_NAME,
+   .specVersion = 5,
+   },
+#endif
+};
+
+static const VkExtensionProperties common_device_extensions[] = {
+   {
+   .extensionName = VK_KHR_SAMPLER_MIRROR_CLAMP_TO_EDGE_EXTENSION_NAME,
+   .specVersion = 1,
+   },
+   {
+   .extensionName = VK_KHR_SWAPCHAIN_EXTENSION_NAME,
+   .specVersion = 68,
+   },
+   {
+   .extensionName = VK_AMD_DRAW_INDIRECT_COUNT_EXTENSION_NAME,
+   .specVersion = 1,
+   },
+   {
+   .extensionName = VK_AMD_NEGATIVE_VIEWPORT_HEIGHT_EXTENSION_NAME,
+   .specVersion = 1,
+   },
+};
+
+static VkResult
+radv_extensions_register(struct radv_instance *instance,
+   struct radv_extensions *extensions,
+   const VkExtensionProperties *new_ext,
+   uint32_t num_ext)
+{
+   size_t new_size;
+   VkExtensionProperties *new_ptr;
+
+   assert(new_ext && num_ext > 0);
+
+   if (!new_ext)
+   return VK_ERROR_INITIALIZATION_FAILED;
+
+   new_size = (extensions->num_ext + num_ext) * sizeof(VkExtensionProperties);
+   new_ptr = vk_realloc(&instance->alloc, extensions->ext_array,
+   new_size, 8, VK_SYSTEM_ALLOCATION_SCOPE_INSTANCE);
+
+   /* Old array continues to be valid, update nothing */
+   if (!new_ptr)
+   return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+   memcpy(&new_ptr[extensions->num_ext], new_ext,
+   num_ext * sizeof(VkExtensionProperties));
+   extensions->ext_array = new_ptr;
+   extensions->num_ext += num_ext;
+
+   return VK_SUCCESS;
+}
+
+#define radv_extensions_register_single(instance, extensions, name, version) \
+   radv_extensions_register(instance, extensions, \
+   &(VkExtensionProperties){ \
+   .extensionName = name, \
+   .specVersion = version \
+   }, 1);
+
+static void
+radv_extensions_finish(struct radv_instance *instance,
+   struct radv_extensions *extensions)
+{
+   assert(extensions);
+
+   if (!extensions)
+   radv_loge("Attempted to free invalid extension struct\n");
+
+   if (extensions->ext_array)
+   vk_free(&instance->alloc, extensions->ext_array);
+}
+
+static bool
+is_extension_enabled(const VkExtensionProperties *extensions,
+   size_t num_ext,
+   const char *name)
+{
+   assert(extensions && name);
+
+   for (uint32_t i = 0; i < num_ext; i++) {
+   if (strcmp(name, extensions[i].extensionName) == 0)
+   return true;
+   }
+
+   return false;
+}
+
 static VkResult
 radv_physical_device_init(struct radv_physical_device *device,
  struct radv_instance *instance,
@@ -130,6 +239,13 @@ radv_physical_device_init(struct radv_physical_device *device,
goto fail;
}
 
+   result = radv_extensions_register(instance,
+   
>extensions,
+   
common_device_extensions,
+ 

[Mesa-dev] [PATCH 1/3] radv: use a winsys context per-queue, instead of per device

2017-01-13 Thread Andres Rodriguez
Queues are independent execution streams. The vulkan spec provides no
ordering guarantees for different queues.

By using a single context for all queues, we are forcing all commands
into an unnecessary FIFO ordering.

This change is a preparation step to allow out-of-order scheduling of
certain work tasks.

As a side effect, vkQueueWaitIdle will be marginally faster. Previously
due to the shared context, vkQueueWaitIdle was equivalent to
vkDeviceWaitIdle.
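
To make the ownership change concrete, here is a standalone sketch of the pattern: each queue creates its own context at init and destroys it at finish, with an unwind if creation fails partway. Everything here (struct hw_ctx, ctx_create(), queues_init() and so on) is a hypothetical stand-in using malloc(), not the radv or winsys API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a winsys hardware context. */
struct hw_ctx {
   int id;
};

struct queue {
   struct hw_ctx *ctx;   /* owned by the queue, not by the device */
};

static int next_ctx_id;

static struct hw_ctx *
ctx_create(void)
{
   struct hw_ctx *ctx = malloc(sizeof(*ctx));
   if (ctx)
      ctx->id = next_ctx_id++;
   return ctx;
}

static void
ctx_destroy(struct hw_ctx *ctx)
{
   free(ctx);
}

/* Give every queue its own context. Returns 0 on success and -1 on
 * failure; on failure the contexts created so far are destroyed. */
static int
queues_init(struct queue *queues, unsigned count)
{
   assert(queues || count == 0);

   for (unsigned i = 0; i < count; i++) {
      queues[i].ctx = ctx_create();
      if (!queues[i].ctx) {
         while (i-- > 0) {
            ctx_destroy(queues[i].ctx);
            queues[i].ctx = NULL;
         }
         return -1;
      }
   }
   return 0;
}

/* Safe to call on queues that never got a context. */
static void
queues_finish(struct queue *queues, unsigned count)
{
   for (unsigned i = 0; i < count; i++) {
      if (queues[i].ctx) {
         ctx_destroy(queues[i].ctx);
         queues[i].ctx = NULL;
      }
   }
}
```

Since each queue then waits on its own context only, an idle wait on one queue no longer has to cover work submitted on the others.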

Signed-off-by: Andres Rodriguez 
---
 src/amd/vulkan/radv_device.c  | 34 --
 src/amd/vulkan/radv_private.h |  2 +-
 src/amd/vulkan/radv_wsi.c |  2 +-
 3 files changed, 22 insertions(+), 16 deletions(-)

diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
index 64fbce8..e8a91a3 100644
--- a/src/amd/vulkan/radv_device.c
+++ b/src/amd/vulkan/radv_device.c
@@ -662,7 +662,7 @@ void radv_GetPhysicalDeviceMemoryProperties(
};
 }
 
-static void
+static int
 radv_queue_init(struct radv_device *device, struct radv_queue *queue,
int queue_family_index, int idx)
 {
@@ -670,11 +670,19 @@ radv_queue_init(struct radv_device *device, struct radv_queue *queue,
queue->device = device;
queue->queue_family_index = queue_family_index;
queue->queue_idx = idx;
+
+   queue->hw_ctx = device->ws->ctx_create(device->ws);
+   if (!queue->hw_ctx)
+   return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+   return VK_SUCCESS;
 }
 
 static void
 radv_queue_finish(struct radv_queue *queue)
 {
+   if (queue->hw_ctx)
+   queue->device->ws->ctx_destroy(queue->hw_ctx);
 }
 
 VkResult radv_CreateDevice(
@@ -730,23 +738,20 @@ VkResult radv_CreateDevice(
goto fail;
}
 
-   device->queue_count[qfi] = queue_create->queueCount;
+   memset(device->queues[qfi], 0, queue_create->queueCount * sizeof(struct radv_queue));
 
-   for (unsigned q = 0; q < queue_create->queueCount; q++)
-   radv_queue_init(device, &device->queues[qfi][q], qfi, q);
-   }
+   device->queue_count[qfi] = queue_create->queueCount;
 
-   device->hw_ctx = device->ws->ctx_create(device->ws);
-   if (!device->hw_ctx) {
-   result = VK_ERROR_OUT_OF_HOST_MEMORY;
-   goto fail;
+   for (unsigned q = 0; q < queue_create->queueCount; q++) {
+   result = radv_queue_init(device, &device->queues[qfi][q], qfi, q);
+   if (result != VK_SUCCESS)
+   goto fail;
+   }
}
 
result = radv_device_init_meta(device);
-   if (result != VK_SUCCESS) {
-   device->ws->ctx_destroy(device->hw_ctx);
+   if (result != VK_SUCCESS)
goto fail;
-   }
 
radv_device_init_msaa(device);
 
@@ -808,6 +813,7 @@ void radv_DestroyDevice(
device->ws->buffer_destroy(device->trace_bo);
 
device->ws->ctx_destroy(device->hw_ctx);
+
for (unsigned i = 0; i < RADV_MAX_QUEUE_FAMILIES; i++) {
for (unsigned q = 0; q < device->queue_count[i]; q++)
radv_queue_finish(&device->queues[i][q]);
@@ -920,7 +926,7 @@ VkResult radv_QueueSubmit(
RADV_FROM_HANDLE(radv_queue, queue, _queue);
RADV_FROM_HANDLE(radv_fence, fence, _fence);
struct radeon_winsys_fence *base_fence = fence ? fence->fence : NULL;
-   struct radeon_winsys_ctx *ctx = queue->device->hw_ctx;
+   struct radeon_winsys_ctx *ctx = queue->hw_ctx;
int ret;
uint32_t max_cs_submission = queue->device->trace_bo ? 1 : UINT32_MAX;
 
@@ -999,7 +1005,7 @@ VkResult radv_QueueWaitIdle(
 {
RADV_FROM_HANDLE(radv_queue, queue, _queue);
 
-   queue->device->ws->ctx_wait_idle(queue->device->hw_ctx,
+   queue->device->ws->ctx_wait_idle(queue->hw_ctx,
 
radv_queue_family_to_ring(queue->queue_family_index),
 queue->queue_idx);
return VK_SUCCESS;
diff --git a/src/amd/vulkan/radv_private.h b/src/amd/vulkan/radv_private.h
index fc3cbca..ab4ede6 100644
--- a/src/amd/vulkan/radv_private.h
+++ b/src/amd/vulkan/radv_private.h
@@ -459,6 +459,7 @@ enum ring_type radv_queue_family_to_ring(int f);
 struct radv_queue {
VK_LOADER_DATA  _loader_data;
struct radv_device * device;
+   struct radeon_winsys_ctx*hw_ctx;
int queue_family_index;
int queue_idx;
 };
@@ -470,7 +471,6 @@ struct radv_device {
 
struct radv_instance *   instance;
struct radeon_winsys *ws;
-   struct radeon_winsys_ctx *hw_ctx;
 
struct radv_meta_state   meta_state;
 
diff --git a/src/amd/vulkan/radv_wsi.c b/src/amd/vulkan/radv_wsi.c
index 952f2c3..002b3a8 100644
--- 

[Mesa-dev] [PATCH 2/3] radv: rename global extension properties structs

2017-01-13 Thread Andres Rodriguez
All extension arrays are global, but only one of them refers to instance
extensions.

The device extension array refers to extensions that are common across
all physical devices. This distinction will be more important once we
have dynamic extension support for devices.

Signed-off-by: Andres Rodriguez 
---
 src/amd/vulkan/radv_device.c | 28 ++--
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
index e8a91a3..5669fd7 100644
--- a/src/amd/vulkan/radv_device.c
+++ b/src/amd/vulkan/radv_device.c
@@ -147,7 +147,7 @@ radv_physical_device_finish(struct radv_physical_device *device)
device->ws->destroy(device->ws);
 }
 
-static const VkExtensionProperties global_extensions[] = {
+static const VkExtensionProperties instance_extensions[] = {
{
.extensionName = VK_KHR_SURFACE_EXTENSION_NAME,
.specVersion = 25,
@@ -172,7 +172,7 @@ static const VkExtensionProperties global_extensions[] = {
 #endif
 };
 
-static const VkExtensionProperties device_extensions[] = {
+static const VkExtensionProperties common_device_extensions[] = {
{
.extensionName = 
VK_KHR_SAMPLER_MIRROR_CLAMP_TO_EDGE_EXTENSION_NAME,
.specVersion = 1,
@@ -258,9 +258,9 @@ VkResult radv_CreateInstance(
 
for (uint32_t i = 0; i < pCreateInfo->enabledExtensionCount; i++) {
bool found = false;
-   for (uint32_t j = 0; j < ARRAY_SIZE(global_extensions); j++) {
+   for (uint32_t j = 0; j < ARRAY_SIZE(instance_extensions); j++) {
if (strcmp(pCreateInfo->ppEnabledExtensionNames[i],
-  global_extensions[j].extensionName) == 0) {
+  instance_extensions[j].extensionName) == 0) {
found = true;
break;
}
@@ -697,9 +697,9 @@ VkResult radv_CreateDevice(
 
for (uint32_t i = 0; i < pCreateInfo->enabledExtensionCount; i++) {
bool found = false;
-   for (uint32_t j = 0; j < ARRAY_SIZE(device_extensions); j++) {
+   for (uint32_t j = 0; j < ARRAY_SIZE(common_device_extensions); 
j++) {
if (strcmp(pCreateInfo->ppEnabledExtensionNames[i],
-  device_extensions[j].extensionName) == 0) {
+  common_device_extensions[j].extensionName) 
== 0) {
found = true;
break;
}
@@ -831,14 +831,14 @@ VkResult radv_EnumerateInstanceExtensionProperties(
VkExtensionProperties*  pProperties)
 {
if (pProperties == NULL) {
-   *pPropertyCount = ARRAY_SIZE(global_extensions);
+   *pPropertyCount = ARRAY_SIZE(instance_extensions);
return VK_SUCCESS;
}
 
-   *pPropertyCount = MIN2(*pPropertyCount, ARRAY_SIZE(global_extensions));
-   typed_memcpy(pProperties, global_extensions, *pPropertyCount);
+   *pPropertyCount = MIN2(*pPropertyCount, 
ARRAY_SIZE(instance_extensions));
+   typed_memcpy(pProperties, instance_extensions, *pPropertyCount);
 
-   if (*pPropertyCount < ARRAY_SIZE(global_extensions))
+   if (*pPropertyCount < ARRAY_SIZE(instance_extensions))
return VK_INCOMPLETE;
 
return VK_SUCCESS;
@@ -851,14 +851,14 @@ VkResult radv_EnumerateDeviceExtensionProperties(
VkExtensionProperties*  pProperties)
 {
if (pProperties == NULL) {
-   *pPropertyCount = ARRAY_SIZE(device_extensions);
+   *pPropertyCount = ARRAY_SIZE(common_device_extensions);
return VK_SUCCESS;
}
 
-   *pPropertyCount = MIN2(*pPropertyCount, ARRAY_SIZE(device_extensions));
-   typed_memcpy(pProperties, device_extensions, *pPropertyCount);
+   *pPropertyCount = MIN2(*pPropertyCount, 
ARRAY_SIZE(common_device_extensions));
+   typed_memcpy(pProperties, common_device_extensions, *pPropertyCount);
 
-   if (*pPropertyCount < ARRAY_SIZE(device_extensions))
+   if (*pPropertyCount < ARRAY_SIZE(common_device_extensions))
return VK_INCOMPLETE;
 
return VK_SUCCESS;
-- 
2.9.3



[Mesa-dev] [PATCH 2/3] gallivm: Override getHostCPUName() "generic" w/ "pwr8"

2017-01-13 Thread Ben Crocker
If llvm::sys::getHostCPUName() returns "generic", override
it with "pwr8" (on PPC64LE).

This is a work-around for a bug in LLVM: a table entry for "POWER8NVL"
is missing, resulting in (big-endian) "generic" being returned on
little-endian Power8NVL systems.  The result is that code that
attempts to load the least significant 32 bits of a 64-bit quantity in
memory loads the wrong half.

This omission should be fixed in the next version of LLVM,
but this work-around should be left in place in case some
future version of POWER also ends up unrepresented in LLVM's table.
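
To illustrate the failure mode, the sketch below shows why the assumed endianness matters when picking out the low 32 bits of a 64-bit value in memory: the low half lives at byte offset 0 on little-endian and at offset 4 on big-endian. The helper names are made up for this example and nothing here is LLVM or gallivm code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns nonzero when the host stores the least significant byte first. */
static int
host_is_little_endian(void)
{
   const uint16_t probe = 1;
   uint8_t first;

   memcpy(&first, &probe, 1);
   return first == 1;
}

/* Byte offset of the low 32 bits inside a 64-bit value in memory. */
static size_t
low_half_offset(int little_endian)
{
   return little_endian ? 0 : 4;
}

/* Load the 32 bits found at byte offset 0 or 4 of a 64-bit value. */
static uint32_t
load_half(const uint64_t *value, size_t offset)
{
   uint32_t half;

   assert(value && (offset == 0 || offset == 4));
   memcpy(&half, (const uint8_t *)value + offset, sizeof(half));
   return half;
}
```

For 0x1122334455667788, loading at low_half_offset() of the real host endianness gives 0x55667788, while using the offset for the opposite endianness gives 0x11223344, which is the "wrong half" described above.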

Signed-off-by: Ben Crocker 
---
 src/gallium/auxiliary/gallivm/lp_bld_misc.cpp | 4 
 1 file changed, 4 insertions(+)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp 
b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
index f7b31ee..0bd5044 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
+++ b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
@@ -649,6 +649,10 @@ lp_build_create_jit_compiler_for_module(LLVMExecutionEngineRef *OutJIT,
 * when not using MCJIT so no instructions are generated which the old JIT
 * can't handle. Not entirely sure if we really need to do anything yet.
 */
+#if defined( PIPE_ARCH_LITTLE_ENDIAN )  && defined( PIPE_ARCH_PPC_64 )
+   if (MCPU == "generic")
+  MCPU = "pwr8";
+#endif
builder.setMCPU(MCPU);
debug_printf("llc -mcpu option: %s\n", MCPU.str().c_str());
 #endif
-- 
2.7.4



[Mesa-dev] [PATCH 1/3] gallivm: Improve debug output

2017-01-13 Thread Ben Crocker
Improve debug output from gallivm_compile_module and
lp_build_create_jit_compiler_for_module, printing the
-mcpu and -mattr options passed to LLC.

Signed-off-by: Ben Crocker 
---
 src/gallium/auxiliary/gallivm/lp_bld_init.c   | 5 -
 src/gallium/auxiliary/gallivm/lp_bld_misc.cpp | 9 +
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_init.c 
b/src/gallium/auxiliary/gallivm/lp_bld_init.c
index d1b2369..fed43e9 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_init.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_init.c
@@ -606,7 +606,10 @@ gallivm_compile_module(struct gallivm_state *gallivm)
   util_snprintf(filename, sizeof(filename), "ir_%s.bc", 
gallivm->module_name);
   LLVMWriteBitcodeToFile(gallivm->module, filename);
   debug_printf("%s written\n", filename);
-  debug_printf("Invoke as \"llc -o - %s\"\n", filename);
+  debug_printf("Invoke as \"llc %s%s -o - %s\"\n",
+   (HAVE_LLVM >= 0x0305) ? "[-mcpu=<-mcpu option>] " : "",
+   "[-mattr=<-mattr option(s)>]",
+   filename);
}
 
if (USE_MCJIT) {
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp 
b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
index 21d9e15..f7b31ee 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
+++ b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
@@ -627,6 +627,14 @@ lp_build_create_jit_compiler_for_module(LLVMExecutionEngineRef *OutJIT,
 
builder.setMAttrs(MAttrs);
 
+   int n = MAttrs.size();
+   if (n > 0) {
+  debug_printf("llc -mattr option(s): ");
+  for (int i = 0; i < n; i++)
+ debug_printf("%s%s", MAttrs[i].c_str(), (i < n - 1) ? "," : "");
+  debug_printf("\n");
+   }
+
 #if HAVE_LLVM >= 0x0305
StringRef MCPU = llvm::sys::getHostCPUName();
/*
@@ -642,6 +650,7 @@ lp_build_create_jit_compiler_for_module(LLVMExecutionEngineRef *OutJIT,
 * can't handle. Not entirely sure if we really need to do anything yet.
 */
builder.setMCPU(MCPU);
+   debug_printf("llc -mcpu option: %s\n", MCPU.str().c_str());
 #endif
 
ShaderMemoryManager *MM = NULL;
-- 
2.7.4



[Mesa-dev] [PATCH] i965/vec4: Fix mapping attributes

2017-01-13 Thread Jordan Justen
From: "Juan A. Suarez Romero" 

This patch reverts 57bab6708f2bbc1ab8a3d202e9a467963596d462, which was
causing issues with ILK and earlier VS programs.

1. Revert "i965/vec4/nir: vec4 also needs to remap vs attributes"

   Do not perform a remap in the vec4 backend. Rather, do it later when
   setting up attributes.

2. This fixes mapping ATTRx to proper GRFn.
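
The two mappings involved can be sketched in isolation as below: the scalar path turns an attribute index into a compacted slot by counting the inputs read below it, and the vec4 path fills a table indexed by the original attribute number with consecutive payload registers. This is a simplified illustration with hypothetical names, and it ignores double-precision inputs that take two slots.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_ATTRS 64

/* Compacted slot of `attr`: the number of attributes read below it. */
static int
attr_slot(uint64_t inputs_read, int attr)
{
   uint64_t below;
   int slot = 0;

   assert(attr >= 0 && attr < MAX_ATTRS);

   below = inputs_read & ((UINT64_C(1) << attr) - 1);
   while (below) {
      below &= below - 1;   /* clear the lowest set bit */
      slot++;
   }
   return slot;
}

/* Map every attribute that is read to consecutive registers starting at
 * `payload_reg`, indexed by the original attribute number. Attributes
 * that are not read get -1. Returns the number of registers used. */
static int
build_attribute_map(uint64_t inputs_read, int payload_reg,
                    int attribute_map[MAX_ATTRS])
{
   int nr_attributes = 0;

   for (int attr = 0; attr < MAX_ATTRS; attr++) {
      attribute_map[attr] = -1;
      if (inputs_read & (UINT64_C(1) << attr))
         attribute_map[attr] = payload_reg + nr_attributes++;
   }
   return nr_attributes;
}
```

With attributes 0, 3 and 5 read and a payload starting at register 1, attribute 3 lands in register 2 and attribute 5 in register 3, which is the ATTRx to GRFn correspondence item 2 refers to.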

Suggested-by: Kenneth Graunke 
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99391
[jordan.l.jus...@intel.com: merge Juan's two patches from bugzilla]
Signed-off-by: Jordan Justen 
Cc: Kenneth Graunke 
---
 I merged Juan's revert + fix patches, as suggested by Ken.

 I put this patch through jenkins, and they appeared to fix the
 ilk/g45/g965 regressions.

 The revert is in brw_nir.c, and Juan's new change is in brw_vec4.cpp.

 src/mesa/drivers/dri/i965/brw_nir.c| 32 ++--
 src/mesa/drivers/dri/i965/brw_vec4.cpp |  2 +-
 2 files changed, 11 insertions(+), 23 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_nir.c 
b/src/mesa/drivers/dri/i965/brw_nir.c
index b39e2b1f523..3c1bc5162fc 100644
--- a/src/mesa/drivers/dri/i965/brw_nir.c
+++ b/src/mesa/drivers/dri/i965/brw_nir.c
@@ -95,19 +95,9 @@ add_const_offset_to_base(nir_shader *nir, nir_variable_mode mode)
}
 }
 
-struct remap_vs_attrs_params {
-   shader_info *nir_info;
-   bool is_scalar;
-};
-
 static bool
-remap_vs_attrs(nir_block *block, void *closure)
+remap_vs_attrs(nir_block *block, shader_info *nir_info)
 {
-   struct remap_vs_attrs_params *params =
-  (struct remap_vs_attrs_params *) closure;
-   shader_info *nir_info = params->nir_info;
-   bool is_scalar = params->is_scalar;
-
nir_foreach_instr(instr, block) {
   if (instr->type != nir_instr_type_intrinsic)
  continue;
@@ -123,7 +113,7 @@ remap_vs_attrs(nir_block *block, void *closure)
  int attr = intrin->const_index[0];
  int slot = _mesa_bitcount_64(nir_info->inputs_read &
   BITFIELD64_MASK(attr));
- intrin->const_index[0] = is_scalar ? 4 * slot : slot;
+ intrin->const_index[0] = 4 * slot;
   }
}
return true;
@@ -267,11 +257,6 @@ brw_nir_lower_vs_inputs(nir_shader *nir,
 bool use_legacy_snorm_formula,
 const uint8_t *vs_attrib_wa_flags)
 {
-   struct remap_vs_attrs_params params = {
-  .nir_info = nir->info,
-  .is_scalar = is_scalar
-   };
-
/* Start with the location of the variable's base. */
foreach_list_typed(nir_variable, var, node, &nir->inputs) {
   var->data.driver_location = var->data.location;
@@ -291,11 +276,14 @@ brw_nir_lower_vs_inputs(nir_shader *nir,
brw_nir_apply_attribute_workarounds(nir, use_legacy_snorm_formula,
vs_attrib_wa_flags);
 
-   /* Finally, translate VERT_ATTRIB_* values into the actual registers. */
-   nir_foreach_function(function, nir) {
-  if (function->impl) {
- nir_foreach_block(block, function->impl) {
-remap_vs_attrs(block, &params);
+   if (is_scalar) {
+  /* Finally, translate VERT_ATTRIB_* values into the actual registers. */
+
+  nir_foreach_function(function, nir) {
+ if (function->impl) {
+nir_foreach_block(block, function->impl) {
+   remap_vs_attrs(block, nir->info);
+}
  }
   }
}
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 748a068b142..5e60eb657a7 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -1739,7 +1739,7 @@ vec4_vs_visitor::setup_attributes(int payload_reg)
   int needed_slots =
  (vs_prog_data->double_inputs_read & BITFIELD64_BIT(first)) ? 2 : 1;
   for (int c = 0; c < needed_slots; c++) {
- attribute_map[nr_attributes] = payload_reg + nr_attributes;
+ attribute_map[first + c] = payload_reg + nr_attributes;
  nr_attributes++;
  vs_inputs &= ~BITFIELD64_BIT(first + c);
   }
-- 
2.11.0



Re: [Mesa-dev] [PATCH] util: import sha1 implementation from OpenBSD

2017-01-13 Thread Dylan Baker
Quoting Emil Velikov (2017-01-13 08:51:31)
> From: Emil Velikov 
> 
> At the moment we support 5+ different implementations, each with a varying
> number of bugs - from thread safety problems [1], to outright broken
> implementation(s) [2].
> 
> In order to accommodate these we have 150+ lines of configure script and
> two extra configure toggles, whilst an actual implementation is
> ~200loc and our current compat wrapping ~250.
> 
> Let's not forget that different people use different code paths, which
> effectively makes it harder to test and debug since the default
> implementation is automatically detected.
> 
> To minimise all these lovely experiences, import the "100% Public
> Domain" OpenBSD sha1 implementation. Clearly document any changes needed
> to get building correctly, since many/most of those can be upstreamed
> making future syncs easier.
> 
> As an added bonus this will avoid all the 'fun' experiences trying to
> integrate it with the Android and SCons builds.
> 
> Bugzilla [1]: https://bugs.freedesktop.org/show_bug.cgi?id=94904
> Bugzilla [2]: https://bugs.freedesktop.org/show_bug.cgi?id=97967
> Cc: Mark Janes 
> Cc: Vinson Lee 
> Cc: Tapani Pälli 
> Cc: Jonathan Gray 
> Signed-off-by: Emil Velikov 
> ---
>  configure.ac | 161 +--
>  src/compiler/glsl/tests/cache_test.c |   5 -
>  src/mesa/main/shaderapi.c|   6 -
>  src/util/Makefile.am |   3 -
>  src/util/Makefile.sources|   2 +
>  src/util/SConscript  |   5 -
>  src/util/disk_cache.c|   4 -
>  src/util/disk_cache.h|  42 --
>  src/util/mesa-sha1.c | 242 +--
>  src/util/sha1/README |  55 
>  src/util/sha1/sha1.c | 173 +
>  src/util/sha1/sha1.h |  47 +++
>  12 files changed, 279 insertions(+), 466 deletions(-)
>  create mode 100644 src/util/sha1/README
>  create mode 100644 src/util/sha1/sha1.c
>  create mode 100644 src/util/sha1/sha1.h
> 
> diff --git a/configure.ac b/configure.ac
> index 459f3e8b0a..5772b378c7 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -9,7 +9,6 @@ dnl Copyright © 2009-2014 Jon TURNEY
>  dnl Copyright © 2011-2012 Benjamin Franzke
>  dnl Copyright © 2008-2014 David Airlie
>  dnl Copyright © 2009-2013 Brian Paul
> -dnl Copyright © 2003-2007 Keith Packard, Daniel Stone

This change seems like a mistake?

>  dnl
>  dnl Permission is hereby granted, free of charge, to any person obtaining a
>  dnl copy of this software and associated documentation files (the 
> "Software"),
> @@ -1432,151 +1431,6 @@ if test "x$enable_gallium_osmesa" = xyes; then
>  fi
>  fi
>  
> -# SHA1 hashing
> -AC_ARG_WITH([sha1],
> -
> [AS_HELP_STRING([--with-sha1=libc|libmd|libnettle|libgcrypt|libcrypto|libsha1|CommonCrypto|CryptoAPI],
> -[choose SHA1 implementation])])
> -case "x$with_sha1" in
> -x | xlibc | xlibmd | xlibnettle | xlibgcrypt | xlibcrypto | xlibsha1 | 
> xCommonCrypto | xCryptoAPI)
> -  ;;
> -*)
> -AC_MSG_ERROR([Illegal value for --with-sha1: $with_sha1])
> -esac
> -
> -AC_CHECK_FUNC([SHA1Init], [HAVE_SHA1_IN_LIBC=yes])
> -if test "x$with_sha1" = x && test "x$HAVE_SHA1_IN_LIBC" = xyes; then
> -   with_sha1=libc
> -fi
> -if test "x$with_sha1" = xlibc && test "x$HAVE_SHA1_IN_LIBC" != xyes; then
> -   AC_MSG_ERROR([sha1 in libc requested but not found])
> -fi
> -if test "x$with_sha1" = xlibc; then
> -   AC_DEFINE([HAVE_SHA1_IN_LIBC], [1],
> -   [Use libc SHA1 functions])
> -   SHA1_LIBS=""
> -fi
> -AC_CHECK_FUNC([CC_SHA1_Init], [HAVE_SHA1_IN_COMMONCRYPTO=yes])
> -if test "x$with_sha1" = x && test "x$HAVE_SHA1_IN_COMMONCRYPTO" = xyes; then
> -   with_sha1=CommonCrypto
> -fi
> -if test "x$with_sha1" = xCommonCrypto && test "x$HAVE_SHA1_IN_COMMONCRYPTO" 
> != xyes; then
> -   AC_MSG_ERROR([CommonCrypto requested but not found])
> -fi
> -if test "x$with_sha1" = xCommonCrypto; then
> -   AC_DEFINE([HAVE_SHA1_IN_COMMONCRYPTO], [1],
> -   [Use CommonCrypto SHA1 functions])
> -   SHA1_LIBS=""
> -fi
> -dnl stdcall functions cannot be tested with AC_CHECK_LIB
> -AC_CHECK_HEADER([wincrypt.h], [HAVE_SHA1_IN_CRYPTOAPI=yes], [], [#include 
> ])
> -if test "x$with_sha1" = x && test "x$HAVE_SHA1_IN_CRYPTOAPI" = xyes; then
> -   with_sha1=CryptoAPI
> -fi
> -if test "x$with_sha1" = xCryptoAPI && test "x$HAVE_SHA1_IN_CRYPTOAPI" != 
> xyes; then
> -   AC_MSG_ERROR([CryptoAPI requested but not found])
> -fi
> -if test "x$with_sha1" = xCryptoAPI; then
> -   AC_DEFINE([HAVE_SHA1_IN_CRYPTOAPI], [1],
> -   [Use CryptoAPI SHA1 functions])
> -   SHA1_LIBS=""
> -fi
> -AC_CHECK_LIB([md], [SHA1Init], [HAVE_LIBMD=yes])
> -if test "x$with_sha1" = x 

Re: [Mesa-dev] [PATCH v4 0/7] etnaviv: update derived texture resources of (re)imported buffers

2017-01-13 Thread Christian Gmeiner
Hi all.

I am looking for some r-b/s-b for the core bits - anyone?

2016-12-06 17:17 GMT+01:00 Philipp Zabel :
> Hi,
>
> to get weston / wayland_egl working on etnaviv, we need to update the texture
> resources derived from imported buffers every time they are re-imported.
>
> This patchset is based on the github-etnaviv/for_mainline_v1 branch and adds
> a new pipe_screen::resource_changed callback that is called inside
> dri2_from_planar and instructs the pipe driver to invalidate the internal
> (texture) resources that are derived from the re-imported resource.
>
> I've also added an updated version of the earlier GL_OES_EGL_image_external
> patches that now use resource_changed to invalidate internal derived resources
> when an external texture is (re-)bound, to comply with the specification.
>
> The etnaviv implementation of resource_changed just sets the texture seqno
> to the resource seqno - 1. The initial seqno of imported resources is set to 1
> so that texture resources created from them are actually older and trigger the
> resolve on first use.
>
> Changes since v3:
>  - Added resource_changed to ddebug, rbug, and trace wrapper drivers
>
> regards
> Philipp
>
> Philipp Zabel (7):
>   gallium: add pipe_screen::resource_changed
>   st/dri: ask the driver to update its internal copies on reimport
>   etnaviv: initialize seqno of imported resources
>   etnaviv: implement resource_changed to invalidate internal resources
> derived from imported buffers
>   mesa: update external textures when (re-)binding
>   st/mesa: ask pipe driver to recreate derived internal resources when
> (re-)binding external textures
>   gallium: add pipe_screen::resource_changed callback wrappers
>
>  src/gallium/docs/source/screen.rst | 14 ++
>  src/gallium/drivers/ddebug/dd_screen.c | 10 ++
>  src/gallium/drivers/etnaviv/etnaviv_resource.c | 15 +++
>  src/gallium/drivers/rbug/rbug_screen.c | 11 +++
>  src/gallium/drivers/trace/tr_screen.c  | 20 
>  src/gallium/include/pipe/p_screen.h|  8 
>  src/gallium/state_trackers/dri/dri2.c  |  4 
>  src/mesa/main/texobj.c |  5 +++--
>  src/mesa/state_tracker/st_atom_texture.c   |  4 
>  9 files changed, 89 insertions(+), 2 deletions(-)
>
> --
> 2.10.2
>

greets
--
Christian Gmeiner, MSc

https://www.youtube.com/user/AloryOFFICIAL
https://soundcloud.com/christian-gmeiner
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] NaN behavior in GLSL (was Re: [PATCH] glsl: always do sqrt(abs()) and inversesqrt(abs()))

2017-01-13 Thread Axel Davy

On 13/01/2017 19:50, Matteo Bruni wrote:

> 2017-01-13 3:37 GMT+01:00 Ilia Mirkin:
>> On Thu, Jan 12, 2017 at 9:13 PM, Jason Ekstrand wrote:
>>> Unless, of course, it's controlled by the same hardware bit... Clearly, we
>>> can give you abs on rsq without denorm flushing (easy shader hacks) but
>>> not the other way around.
>>
>> OK, so somehow I missed that earlier. However there's an interesting
>> section in the PRM:
>>
>> https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-skl-vol07-3d_media_gpgpu.pdf
>>
>> on PDF page 854, "Dismissed Legacy Behaviors" which has a list of
>> suggested IEEE 754 deviations for DX9. One of them is indeed that 0 *
>> x = 0, but another is that input NaNs be propagated with certain
>> exceptions. Also they suggest that RCP(0)/RSQ(0) = fmax. Interesting.
>>
>> So at this point, the zero_wins thing is pretty much blown. i965
>> appears to have an all-or-nothing approach, and additionally that
>> approach doesn't match up exactly to what NVIDIA does (or at least I'm
>> not aware of a clamp-everything mode).
>>
>> This will take some thought to figure out how something can be
>> specified so that a single spec works for both i965 and nv/amd. OTOH
>> we could have two different specs that just expose different things -
>> e.g. i965 could expose a MESA_shader_float_alt_mode or whatever which
>> is spec'd to do the things that the PRM says, and nv/amd have the
>> MESA_shader_float_zero_wins ext which does what we were talking about
>> earlier.
>>
>> I'm open to other suggestions too.
>
> Maybe we can go back to the original idea and have the extension
> require that no NaNs can be generated by GLSL mathematical operators
> and builtin functions (if no operand is a NaN?) It's possible that's
> not exactly it but in any case the idea is to just specify expected
> results, without requiring a specific route to get there. The
> extension could introduce undefined behavior where necessary e.g.
> allowing (but not requiring) INF results to be always flushed to fmax
> when enabled.
>
> For Intel that would work trivially. For AMD it should be a matter of
> using the special instructions where necessary and "be careful" in a
> few places (in the same vein as the RSQ and POW opcodes of ARB
> programs Marek mentioned). Not sure about nouveau, I guess it should
> be similar to AMD in the end.
>
> Would that be too messy? Am I completely missing the point?


Specifying just the behaviour for NaN doesn't solve the 0*inf issue for 
MAD operations. 24 + 0*inf = NaN gets converted to 0 instead of 24.
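
To make the MAD case concrete, here is a minimal C sketch (not Mesa code; the helper names mad_nan_to_zero and mad_zero_wins are invented for illustration). The first helper only guarantees that no NaN comes out, by replacing a NaN result with 0; the second applies a 0 * x = 0 rule to the multiply before the add.

```c
#include <math.h>

/* Hypothetical helper: IEEE multiply-add, then replace a NaN result by 0.
 * This models "only specify that no NaN is produced". */
static float mad_nan_to_zero(float a, float b, float c)
{
    float r = a * b + c;
    return isnan(r) ? 0.0f : r;
}

/* Hypothetical helper: multiply in which a zero operand always wins,
 * followed by an ordinary add. This models the DX9-style 0 * x = 0 rule. */
static float mad_zero_wins(float a, float b, float c)
{
    float prod = (a == 0.0f || b == 0.0f) ? 0.0f : a * b;
    return prod + c;
}
```

With a = 0, b = INFINITY and c = 24, mad_nan_to_zero returns 0 while mad_zero_wins returns 24, which is the discrepancy described above.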



Axel



Re: [Mesa-dev] [PATCH] util: import sha1 implementation from OpenBSD

2017-01-13 Thread Vladislav Egorov

13.01.2017 19:51, Emil Velikov wrote:

> From: Emil Velikov
>
> At the moment we support 5+ different implementations, each with a varying
> number of bugs - from thread safety problems [1], to outright broken
> implementation(s) [2]
>
> In order to accommodate these we have 150+ lines of configure script and
> two extra configure toggles, whilst an actual implementation is
> ~200 loc and our current compat wrapping ~250.
>
> Let's not forget that different people use different code paths, which
> effectively makes it harder to test and debug since the default
> implementation is automatically detected.
>
> To minimise all these lovely experiences, import the "100% Public
> Domain" OpenBSD sha1 implementation. Clearly document any changes needed
> to get building correctly, since many/most of those can be upstreamed
> making future syncs easier.


This can hurt performance. The OpenSSL implementation is optimized for
every conceivable architecture and uses hardware SHA-1 instructions on
newer CPUs. From
https://github.com/openssl/openssl/blob/master/crypto/sha/asm/sha1-x86_64.pl
:


> Current performance is summarized in following table. Numbers are
> CPU clock cycles spent to process single byte (less is better).
>
>               x86_64     SSSE3        AVX[2]
> P4            9.05       -
> Opteron       6.26       -
> Core2         6.55       6.05/+8%     -
> Westmere      6.73       5.30/+27%    -
> Sandy Bridge  7.70       6.10/+26%    4.99/+54%
> Ivy Bridge    6.06       4.67/+30%    4.60/+32%
> Haswell       5.45       4.15/+31%    3.57/+53%
> Skylake       5.18       4.06/+28%    3.54/+46%
> Bulldozer     9.11       5.95/+53%
> VIA Nano      9.32       7.15/+30%
> Atom          10.3       9.17/+12%
> Silvermont    13.1(*)    9.37/+40%
> Goldmont      8.13       6.42/+27%    1.70/+380%(**)

Quick benchmark on my Haswell of the OpenBSD implementation compiled
with GCC5 -O2: ~8 cycles per byte on 32-bit, ~7 cycles per byte on
64-bit. But Haswell is a very powerful CPU; on weaker CPUs the
difference would probably be larger, especially on new CPUs that have
the SHA instruction set.
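
For a rough sense of scale, cycles per byte can be converted to throughput. The sketch below is plain arithmetic; the function names are made up, and the 3.5 GHz clock used in the note after it is an assumed figure, not a measurement from this thread.

```c
/* Convert a cycles-per-byte figure into MB/s (1 MB = 10^6 bytes)
 * for a given core clock in Hz. */
static double throughput_mb_per_s(double cycles_per_byte, double clock_hz)
{
    return clock_hz / cycles_per_byte / 1e6;
}

/* Ratio of two cycles-per-byte figures: how many times faster the
 * implementation with the lower figure is. */
static double speedup(double slow_cpb, double fast_cpb)
{
    return slow_cpb / fast_cpb;
}
```

At an assumed 3.5 GHz, 7 cycles per byte works out to 500 MB/s, and against the 3.57 cycles per byte quoted for Haswell with AVX2 the ratio is 7 / 3.57, or roughly 1.96.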




Re: [Mesa-dev] [PATCH] mesa/get: Remove unused extra_ARB_viewport_array

2017-01-13 Thread Emil Velikov
On 9 January 2017 at 14:48, Boyan Ding  wrote:
> Unused since 0a7691ee (mesa: Enable enums for OES_viewport_array).
> Silence a warning of unused variable.
>
> Signed-off-by: Boyan Ding 
R-b and pushed to master.

Thanks
Emil


Re: [Mesa-dev] [PATCH] anv: Default PointSize to 1.0 if not written by the shader

2017-01-13 Thread Kenneth Graunke
On Friday, January 13, 2017 9:41:58 AM PST Jason Ekstrand wrote:
> The Vulkan rules for point size are a bit whacky.  If you only have a
> vertex shader and you use points, then you must write PointSize in your
> vertex shader.  If you have a geometry or tessellation shader, then it's
> dependent on the shaderTessellationAndGeometryPointSize device feature.
> From the Vulkan 1.0.38 specification:
> 
>"shaderTessellationAndGeometryPointSize indicates whether the
>PointSize built-in decoration is available in the tessellation
>control, tessellation evaluation, and geometry shader stages. If this
>feature is not enabled, members decorated with the PointSize built-in
>decoration must not be read from or written to and all points written
>from a tessellation or geometry shader will have a size of 1.0. This
>also indicates whether shader modules can declare the
>TessellationPointSize capability for tessellation control and
>evaluation shaders, or if the shader modules can declare the
>GeometryPointSize capability for geometry shaders. An implementation
>supporting this feature must also support one or both of the
>tessellationShader or geometryShader features."
> 
> In other words, if the feature is disabled (the client can disable
> features!) then they don't write PointSize and we provide a 1.0 default
> but if the feature is enabled, they do write PointSize and we use the
> one they wrote in the shader.  There are at least two valid ways we can
> implement this:
> 
>  1) Track whether or not shaderTessellationAndGeometryPointSize is
> enabled and set the 3DSTATE_SF bits based on that and what stages
> are enabled, ignoring the shader source.
> 
>  2) Just look at the last geometry stage VUE map and see if they wrote
> PointSize and set the 3DSTATE_SF accordingly.
> 
> The second solution is the easiest and the most robust against invalid
> usage of the Vulkan API, so we choose to go with that one.
> 
> This fixes all of the dEQP-VK.tessellation.primitive_discard.*point_mode
> tests.  The tests are also broken because they unconditionally enable
> shaderTessellationAndGeometryPointSize if it's supported by the
> implementation and then don't write PointSize in the evaluation shader.
> However, since this is the "robust against invalid API usage" solution,
> the tests happily pass. :-)
> ---
>  src/intel/vulkan/genX_pipeline.c | 12 ++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/src/intel/vulkan/genX_pipeline.c b/src/intel/vulkan/genX_pipeline.c
> index 7fa68c0..a537a40 100644
> --- a/src/intel/vulkan/genX_pipeline.c
> +++ b/src/intel/vulkan/genX_pipeline.c
> @@ -420,8 +420,16 @@ emit_rs_state(struct anv_pipeline *pipeline,
> sf.TriangleStripListProvokingVertexSelect = 0;
> sf.LineStripListProvokingVertexSelect = 0;
> sf.TriangleFanProvokingVertexSelect = 1;
> -   sf.PointWidthSource = Vertex;
> -   sf.PointWidth = 1.0;
> +
> +   const struct brw_vue_prog_data *last_vue_prog_data =
> +  anv_pipeline_get_last_vue_prog_data(pipeline);
> +
> +   if (last_vue_prog_data->vue_map.slots_valid & VARYING_BIT_PSIZ) {
> +  sf.PointWidthSource = Vertex;
> +   } else {
> +  sf.PointWidthSource = State;
> +  sf.PointWidth = 1.0;
> +   }
>  
>  #if GEN_GEN >= 8
> struct GENX(3DSTATE_RASTER) raster = {
> 

Reviewed-by: Kenneth Graunke 




Re: [Mesa-dev] Unify the style of function pointer calls in structs

2017-01-13 Thread Emil Velikov
Hi Boyan,

On 25 November 2015 at 05:27, Boyan Ding  wrote:
> This series is a follow-up of Brian's patch ([1], commit 47fae842). It
> converts nearly all of the function-pointer-in-a-struct calls from
>   (*foo->bar)(...) or (foo->bar)(...)
> to
>   foo->bar(...)
>
> The sed regex to do the conversion looks like this (really ugly):
>   s/(\*\?\([^*(), \]]*->[^*), \]]*\))(/\1(/
> It doesn't affect pointer-to-member operation in C++ (the parenthesis
> there can't be omitted).
>
> This series didn't touch the gtest directory since it seems to be an
> external project. If people think it is also necessary to apply
> conversion there, I can send the patch for it.
>
> I compile-tested the series and it builds okay.
>
I could swear I pushed this series ages ago... but it is pushed now.

I've added a couple of extra fixes and pushed the lot excluding 2/8
and 8/8. The former is quite incomplete whilst the latter is no longer
applicable.
According to the following grep we have another ~60 cases in glx, with
a couple of odd ones throughout.

Please let us know if some patches fall through the cracks.

Thanks
Emil
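
The three call spellings mentioned in the quoted cover letter are interchangeable in C, which is why the conversion is purely cosmetic. A minimal sketch (the struct demo_ops type, its add member and the helper names are invented for illustration):

```c
/* Hypothetical struct holding a function pointer, as in a driver vtable. */
struct demo_ops {
    int (*add)(int, int);
};

static int add_impl(int a, int b)
{
    return a + b;
}

/* All three spellings call the same function through the pointer. */
static int call_deref(const struct demo_ops *o)  { return (*o->add)(1, 2); }
static int call_parens(const struct demo_ops *o) { return (o->add)(1, 2); }
static int call_plain(const struct demo_ops *o)  { return o->add(1, 2); }
```

With a struct demo_ops whose add member points at add_impl, each of the three helpers returns 3.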


[Mesa-dev] [PATCH 3/7] i965/sync: Add brw_fence::type

2017-01-13 Thread Chad Versace
This is a refactor patch; no change in behavior is expected.

Add `enum brw_fence_type` and brw_fence::type. There is only one type
currently, BRW_FENCE_TYPE_BO_WAIT. This patch reduces a lot of noise in
the next patch, which adds a new type, BRW_FENCE_TYPE_SYNC_FD.
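
The pattern can be sketched in isolation as follows. This is not the i965 code: struct demo_fence, DEMO_FENCE_TYPE_BO_WAIT and the integer payload are stand-ins, chosen only to show an enum tag stored in the struct and a switch on it in each operation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for enum brw_fence_type; a second type would be added here. */
enum demo_fence_type {
    DEMO_FENCE_TYPE_BO_WAIT,
};

/* Stand-in for struct brw_fence: the tag plus per-type state. */
struct demo_fence {
    enum demo_fence_type type;
    const int *payload;   /* plays the role of batch_bo; *payload != 0 means busy */
    bool signalled;
};

static void demo_fence_init(struct demo_fence *fence, enum demo_fence_type type)
{
    fence->type = type;
    fence->signalled = false;

    switch (type) {
    case DEMO_FENCE_TYPE_BO_WAIT:
        fence->payload = NULL;
        break;
    }
}

/* Returns true once the fence is done; each type gets its own case. */
static bool demo_fence_has_completed(struct demo_fence *fence)
{
    if (fence->signalled)
        return true;

    switch (fence->type) {
    case DEMO_FENCE_TYPE_BO_WAIT:
        if (!fence->payload || *fence->payload != 0)
            return false;   /* not inserted yet, or still busy */
        fence->payload = NULL;
        fence->signalled = true;
        return true;
    }

    assert(!"bad enum demo_fence_type");
    return false;
}
```

Adding a second fence type then means adding one enumerator and one case per switch, and a compiler that warns about unhandled enumerators can point out any switch that was missed.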
---
 src/mesa/drivers/dri/i965/brw_sync.c | 103 ---
 1 file changed, 71 insertions(+), 32 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_sync.c b/src/mesa/drivers/dri/i965/brw_sync.c
index 1df5610385..f9127a41ea 100644
--- a/src/mesa/drivers/dri/i965/brw_sync.c
+++ b/src/mesa/drivers/dri/i965/brw_sync.c
@@ -45,6 +45,11 @@
 
 struct brw_fence {
struct brw_context *brw;
+
+   enum brw_fence_type {
+  BRW_FENCE_TYPE_BO_WAIT,
+   } type;
+
/** The fence waits for completion of this batch. */
drm_intel_bo *batch_bo;
 
@@ -58,18 +63,29 @@ struct brw_gl_sync {
 };
 
 static void
-brw_fence_init(struct brw_context *brw, struct brw_fence *fence)
+brw_fence_init(struct brw_context *brw, struct brw_fence *fence,
+   enum brw_fence_type type)
 {
fence->brw = brw;
-   fence->batch_bo = NULL;
+   fence->type = type;
   mtx_init(&fence->mutex, mtx_plain);
+
+   switch (type) {
+   case BRW_FENCE_TYPE_BO_WAIT:
+  fence->batch_bo = NULL;
+  break;
+   }
 }
 
 static void
 brw_fence_finish(struct brw_fence *fence)
 {
-   if (fence->batch_bo)
-  drm_intel_bo_unreference(fence->batch_bo);
+   switch (fence->type) {
+   case BRW_FENCE_TYPE_BO_WAIT:
+  if (fence->batch_bo)
+ drm_intel_bo_unreference(fence->batch_bo);
+  break;
+   }
 
   mtx_destroy(&fence->mutex);
 }
@@ -77,13 +93,18 @@ brw_fence_finish(struct brw_fence *fence)
 static void
 brw_fence_insert(struct brw_context *brw, struct brw_fence *fence)
 {
-   assert(!fence->batch_bo);
-   assert(!fence->signalled);
-
brw_emit_mi_flush(brw);
-   fence->batch_bo = brw->batch.bo;
-   drm_intel_bo_reference(fence->batch_bo);
-   intel_batchbuffer_flush(brw);
+
+   switch (fence->type) {
+   case BRW_FENCE_TYPE_BO_WAIT:
+  assert(!fence->batch_bo);
+  assert(!fence->signalled);
+
+  fence->batch_bo = brw->batch.bo;
+  drm_intel_bo_reference(fence->batch_bo);
+  intel_batchbuffer_flush(brw);
+  break;
+   }
 }
 
 static bool
@@ -92,10 +113,18 @@ brw_fence_has_completed_locked(struct brw_fence *fence)
if (fence->signalled)
   return true;
 
-   if (fence->batch_bo && !drm_intel_bo_busy(fence->batch_bo)) {
+   switch (fence->type) {
+   case BRW_FENCE_TYPE_BO_WAIT:
+  if (!fence->batch_bo)
+ return false;
+
+  if (drm_intel_bo_busy(fence->batch_bo))
+ return false;
+
   drm_intel_bo_unreference(fence->batch_bo);
   fence->batch_bo = NULL;
   fence->signalled = true;
+
   return true;
}
 
@@ -121,24 +150,30 @@ brw_fence_client_wait_locked(struct brw_context *brw, struct brw_fence *fence,
if (fence->signalled)
   return true;
 
-   assert(fence->batch_bo);
+   switch (fence->type) {
+   case BRW_FENCE_TYPE_BO_WAIT:
+  assert(fence->batch_bo);
 
-   /* DRM_IOCTL_I915_GEM_WAIT uses a signed 64 bit timeout and returns
-* immediately for timeouts <= 0.  The best we can do is to clamp the
-* timeout to INT64_MAX.  This limits the maximum timeout from 584 years to
-* 292 years - likely not a big deal.
-*/
-   if (timeout > INT64_MAX)
-  timeout = INT64_MAX;
+  /* DRM_IOCTL_I915_GEM_WAIT uses a signed 64 bit timeout and returns
+   * immediately for timeouts <= 0.  The best we can do is to clamp the
+   * timeout to INT64_MAX.  This limits the maximum timeout from 584 years to
+   * 292 years - likely not a big deal.
+   */
+  if (timeout > INT64_MAX)
+ timeout = INT64_MAX;
 
-   if (drm_intel_gem_bo_wait(fence->batch_bo, timeout) != 0)
-  return false;
+  if (drm_intel_gem_bo_wait(fence->batch_bo, timeout) != 0)
+ return false;
+
+  fence->signalled = true;
+  drm_intel_bo_unreference(fence->batch_bo);
+  fence->batch_bo = NULL;
 
-   fence->signalled = true;
-   drm_intel_bo_unreference(fence->batch_bo);
-   fence->batch_bo = NULL;
+  return true;
+   }
 
-   return true;
+   assert(!"bad enum brw_fence_type");
+   return false;
 }
 
 /**
@@ -161,11 +196,15 @@ brw_fence_client_wait(struct brw_context *brw, struct brw_fence *fence,
 static void
 brw_fence_server_wait(struct brw_context *brw, struct brw_fence *fence)
 {
-   /* We have nothing to do for WaitSync.  Our GL command stream is sequential,
-* so given that the sync object has already flushed the batchbuffer, any
-* batchbuffers coming after this waitsync will naturally not occur until
-* the previous one is done.
-*/
+   switch (fence->type) {
+   case BRW_FENCE_TYPE_BO_WAIT:
+  /* We have nothing to do for WaitSync.  Our GL command stream is sequential,
+   * so given that the sync object has already flushed the batchbuffer, any
+   * batchbuffers coming after this waitsync will 

[Mesa-dev] [PATCH 5/7] i965/sync: Rename brw_fence_insert()

2017-01-13 Thread Chad Versace
Rename to brw_fence_insert_locked(). This is correct because the fence's
mutex is effectively locked, as all callers are also *creators* of the
fence, and have not yet returned the new fence.

This reduces noise in the next patch, which defines and uses
brw_fence_insert(), an unlocked variant.
---
 src/mesa/drivers/dri/i965/brw_sync.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_sync.c b/src/mesa/drivers/dri/i965/brw_sync.c
index 24c8cbd3b0..77d382cab6 100644
--- a/src/mesa/drivers/dri/i965/brw_sync.c
+++ b/src/mesa/drivers/dri/i965/brw_sync.c
@@ -91,7 +91,7 @@ brw_fence_finish(struct brw_fence *fence)
 }
 
 static bool MUST_CHECK
-brw_fence_insert(struct brw_context *brw, struct brw_fence *fence)
+brw_fence_insert_locked(struct brw_context *brw, struct brw_fence *fence)
 {
brw_emit_mi_flush(brw);
 
@@ -249,7 +249,7 @@ brw_gl_fence_sync(struct gl_context *ctx, struct gl_sync_object *_sync,
 
   brw_fence_init(brw, &sync->fence, BRW_FENCE_TYPE_BO_WAIT);
 
-   if (!brw_fence_insert(brw, &sync->fence)) {
+   if (!brw_fence_insert_locked(brw, &sync->fence)) {
   /* FIXME: There exists no way to report a GL error here. If an error
* occurs, continue silently and hope for the best.
*/
@@ -309,7 +309,7 @@ brw_dri_create_fence(__DRIcontext *ctx)
 
brw_fence_init(brw, fence, BRW_FENCE_TYPE_BO_WAIT);
 
-   if (!brw_fence_insert(brw, fence)) {
+   if (!brw_fence_insert_locked(brw, fence)) {
   brw_fence_finish(fence);
   free(fence);
   return NULL;
-- 
2.11.0.21.ga274e0a



Re: [Mesa-dev] [Mesa-stable] [PATCH] glx: Add missing glproto dependency for gallium-xlib glx

2017-01-13 Thread Emil Velikov
On 13 January 2017 at 17:41, Chuck Atkins  wrote:
> Just saw this got merged, thanks!  Any chance of it getting to stable for
> the 13.1 release?
>
Not sure I parse that - currently we have 12.0 and 13.0 series. With a
17.0 one coming in shortly.
It will land for 13.0, but I can check if it's applicable for 12.0 and
get it in there as well.

-Emil


Re: [Mesa-dev] [PATCH 4/8] android: fix llvmpipe build

2017-01-13 Thread Emil Velikov
On 11 January 2017 at 19:19, Jose Fonseca  wrote:
> On 10/01/17 15:54, Emil Velikov wrote:
>>
>> On 6 January 2017 at 17:35, Wu Zhen  wrote:
>>>
>>> From: WuZhen 
>>>
>>> Since cf410574 (gallivm: Make MCJIT a runtime option.), llvmpipe assumes
>>> MCJIT is available on x86(_64). This is not the case for Android prior to
>>> M.
>>>
>> Wu Zhen, what exactly is the issue you're getting - build or link-time
>> error ?
>>
>> Looking at the hunk [1] in the offending commit makes me wonder.
>>  - Why do we call LLVMLinkInJIT() even if one selects MCJIT via the env
>> var.
>>  - Why do we always call LLVMLinkInMCJIT regardless of a) if we've
>> built against old LLVM and b) the env var.
>>
>> Jose, shouldn't we honour the above ? One way that comes to mind is to
>> have USE_MCJIT always as static variable. Then we can guard the
>> debug_get_bool_option() override with the current LLVM_VERSION/ARCH
>> heuristics while preserving original invocation.
>>
>> if (USE_MCJIT) // use lowercase name since it's not a macro ?
>>LLVMLinkInMCJIT();
>> else
>>LLVMLinkInJIT();
>>
>>
>> Thanks
>> Emil
>>
>> [1]
>> @@ -385,18 +382,18 @@ lp_build_init(void)
>>if (gallivm_initialized)
>>   return TRUE;
>>
>> +   LLVMLinkInMCJIT();
>> +#if !defined(USE_MCJIT)
>> +   USE_MCJIT = debug_get_bool_option("GALLIVM_MCJIT", 0);
>> +   LLVMLinkInJIT();
>> +#endif
>> +
>> #ifdef DEBUG
>>gallivm_debug = debug_get_option_gallivm_debug();
>> #endif
>>
>>lp_set_target_options();
>>
>> -#if USE_MCJIT
>> -   LLVMLinkInMCJIT();
>> -#else
>> -   LLVMLinkInJIT();
>> -#endif
>> -
>>
>
> USE_MCJIT used to be a statically defined macro, but now it can also be a
> runtime boolean.
>
> We require LLVM 3.3, and MCJIT has been available since then, so there was
> no reason not to link.
>
> Android seems a new beast: it has LLVM 3.3 but not MCJIT??
>
The Android discussion aside, I was trying to point out that the commit
in question does more than turn the compile-time decision into a
run-time one.

Before the commit, LLVMLinkInMCJIT() was executed only when USE_MCJIT
was set; after it, the call is executed regardless. On the LLVMLinkInJIT
front, we seem to execute it even if the user has requested USE_MCJIT.
Either I'm missing something here, or things look nasty/wrong?
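
Put differently, the expectation is that exactly one of the two link calls runs, chosen by a single flag. A minimal sketch of that control flow follows, with stub functions in place of the real LLVM entry points and getenv in place of Mesa's debug_get_bool_option; both substitutions, and all names below, are assumptions made to keep the example self-contained.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Counters standing in for LLVMLinkInMCJIT() / LLVMLinkInJIT(). */
static int mcjit_linked;
static int jit_linked;

static void stub_link_in_mcjit(void) { mcjit_linked++; }
static void stub_link_in_jit(void)   { jit_linked++; }

/* Hypothetical helper: read a boolean from the environment,
 * falling back to dflt when the variable is unset. */
static bool env_bool(const char *name, bool dflt)
{
    const char *v = getenv(name);

    if (!v)
        return dflt;
    return strcmp(v, "0") != 0 && strcmp(v, "false") != 0;
}

/* Link exactly one JIT, depending on the flag the caller resolved. */
static void init_jit(bool use_mcjit)
{
    if (use_mcjit)
        stub_link_in_mcjit();
    else
        stub_link_in_jit();
}
```

A caller would resolve the flag once, for example init_jit(env_bool("GALLIVM_MCJIT", true)), where the default passed in would come from the build-time LLVM version and architecture checks.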

Thanks
Emil


Re: [Mesa-dev] [PATCH] st/va: delay calling begin_frame until we have all parameters

2017-01-13 Thread Nayan Deshmukh
On Fri, Jan 13, 2017 at 9:54 PM, Andy Furniss  wrote:
>
> Nayan Deshmukh wrote:
>>
>> On Fri, Jan 13, 2017 at 8:32 PM, Andy Furniss  wrote:
>>
>>> Nayan Deshmukh wrote:
>>>
>>>> Hi Andy,
>>>>
>>>> Please test this patch for regressions.
>>>>
>>>
>>> Do you have a testcase to show the fix?
>>>
>>> TBH I've not tested gstreamer with mpeg2 before as vaapi mpeg2
>>> h/w dec never worked properly anyway.
>>>
>>> https://bugs.freedesktop.org/show_bug.cgi?id=93760
>>>
>>> With mpv --hwdec=vaapi it doesn't seem to regress anything.
>>>
>> I was talking about --hwdec=vaapi. Before this patch I was not able to play
>> any mpeg videos with vaapi as mpv --hwdec=vaapi --vo=vaapi always
>> segfaulted. With this patch I can see videos properly. Just wanted to
>> make sure it did not cause any regression when using hardware decoder.
>
>
> Oh, OK, I can't reproduce that with mpv, but it will still just assert with 
> mesa debug build
>
> mpv: picture_mpeg12.c:84: vlVaHandleSliceParameterBufferMPEG12: Assertion 
> `buf->size >= sizeof(VASliceParameterBufferMPEG2) && buf->num_elements == 1' 
> failed.
>
> Or play with non debug build, but depending on source vid may be
> slightly corrupted.
>
> Would be interesting to see if you see the same with this vid
> which easily shows the corruption.
>
> https://drive.google.com/drive/folders/0BxP5-S1t9VEEbkR4dWhTUFozV2s?usp=sharing
>
> Looks bad with --hwdec=vaapi, with or without --vo=vaapi
>
With --hwdec=vaapi and --vo=vaapi I see the corruption. But without
--vo=vaapi it uses VAAPI EGL interop and fails with this error:
unsupported VA image format unknown

> OK with --hwdec=vdpau --vo=vdpau (just --hwdec=vdpau will be slightly wrong
> currently as there is a vdpau gl interop bug that causes half res)
>
Same for me.
>
>>> More generally - it's really good you are working on vaapi - I don't
>>> know what you've discussed with anyone but did you see the old threads
>>> around VAAPI_DISABLE_INTERLACE?
>>>
>> I haven't discussed it with anyone but I will try reading the old threads
>> and the bug reports.
>
>
> Thanks.
>


[Mesa-dev] [PATCH] anv: Default PointSize to 1.0 if not written by the shader

2017-01-13 Thread Jason Ekstrand
The Vulkan rules for point size are a bit whacky.  If you only have a
vertex shader and you use points, then you must write PointSize in your
vertex shader.  If you have a geometry or tessellation shader, then it's
dependent on the shaderTessellationAndGeometryPointSize device feature.
From the Vulkan 1.0.38 specification:

   "shaderTessellationAndGeometryPointSize indicates whether the
   PointSize built-in decoration is available in the tessellation
   control, tessellation evaluation, and geometry shader stages. If this
   feature is not enabled, members decorated with the PointSize built-in
   decoration must not be read from or written to and all points written
   from a tessellation or geometry shader will have a size of 1.0. This
   also indicates whether shader modules can declare the
   TessellationPointSize capability for tessellation control and
   evaluation shaders, or if the shader modules can declare the
   GeometryPointSize capability for geometry shaders. An implementation
   supporting this feature must also support one or both of the
   tessellationShader or geometryShader features."

In other words, if the feature is disabled (the client can disable
features!) then they don't write PointSize and we provide a 1.0 default
but if the feature is enabled, they do write PointSize and we use the
one they wrote in the shader.  There are at least two valid ways we can
implement this:

 1) Track whether or not shaderTessellationAndGeometryPointSize is
enabled and set the 3DSTATE_SF bits based on that and what stages
are enabled, ignoring the shader source.

 2) Just look at the last geometry stage VUE map and see if they wrote
PointSize and set the 3DSTATE_SF accordingly.

The second solution is the easiest and the most robust against invalid
usage of the Vulkan API, so we choose to go with that one.

This fixes all of the dEQP-VK.tessellation.primitive_discard.*point_mode
tests.  The tests are also broken because they unconditionally enable
shaderTessellationAndGeometryPointSize if it's supported by the
implementation and then don't write PointSize in the evaluation shader.
However, since this is the "robust against invalid API usage" solution,
the tests happily pass. :-)
---
 src/intel/vulkan/genX_pipeline.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/src/intel/vulkan/genX_pipeline.c b/src/intel/vulkan/genX_pipeline.c
index 7fa68c0..a537a40 100644
--- a/src/intel/vulkan/genX_pipeline.c
+++ b/src/intel/vulkan/genX_pipeline.c
@@ -420,8 +420,16 @@ emit_rs_state(struct anv_pipeline *pipeline,
sf.TriangleStripListProvokingVertexSelect = 0;
sf.LineStripListProvokingVertexSelect = 0;
sf.TriangleFanProvokingVertexSelect = 1;
-   sf.PointWidthSource = Vertex;
-   sf.PointWidth = 1.0;
+
+   const struct brw_vue_prog_data *last_vue_prog_data =
+  anv_pipeline_get_last_vue_prog_data(pipeline);
+
+   if (last_vue_prog_data->vue_map.slots_valid & VARYING_BIT_PSIZ) {
+  sf.PointWidthSource = Vertex;
+   } else {
+  sf.PointWidthSource = State;
+  sf.PointWidth = 1.0;
+   }
 
 #if GEN_GEN >= 8
struct GENX(3DSTATE_RASTER) raster = {
-- 
2.5.0.400.gff86faf
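
The difference between the two approaches listed in the commit message can be sketched outside the driver. The types, the bit value and the function names below are hypothetical stand-ins for the 3DSTATE_SF fields and VARYING_BIT_PSIZ, and option 1 is written out here as one plausible reading of the description, not as code that ever existed in anv.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the 3DSTATE_SF fields and the PSIZ slot bit. */
enum demo_point_width_source { DEMO_SOURCE_VERTEX, DEMO_SOURCE_STATE };

#define DEMO_VARYING_BIT_PSIZ (UINT64_C(1) << 12)   /* arbitrary bit for the demo */

struct demo_sf {
    enum demo_point_width_source source;
    float point_width;   /* only meaningful when source is DEMO_SOURCE_STATE */
};

/* Option 1: decide from API state alone (enabled stages and feature bit). */
static struct demo_sf sf_from_api_state(bool has_tess_or_geom, bool feature_enabled)
{
    struct demo_sf sf = { DEMO_SOURCE_VERTEX, 0.0f };

    if (has_tess_or_geom && !feature_enabled) {
        sf.source = DEMO_SOURCE_STATE;
        sf.point_width = 1.0f;
    }
    return sf;
}

/* Option 2: decide from what the last geometry stage actually wrote. */
static struct demo_sf sf_from_vue_map(uint64_t slots_valid)
{
    struct demo_sf sf = { DEMO_SOURCE_VERTEX, 0.0f };

    if (!(slots_valid & DEMO_VARYING_BIT_PSIZ)) {
        sf.source = DEMO_SOURCE_STATE;
        sf.point_width = 1.0f;
    }
    return sf;
}
```

For a pipeline that enables the feature but never writes PointSize (the dEQP case described in the commit message), sf_from_api_state(true, true) selects the per-vertex value, which was never written, while sf_from_vue_map(0) falls back to a state-supplied 1.0.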



Re: [Mesa-dev] [PATCH] glx: Add missing glproto dependency for gallium-xlib glx

2017-01-13 Thread Chuck Atkins
Just saw this got merged, thanks!  Any chance of it getting to stable for
the 13.1 release?

--
Chuck Atkins
Staff R&D Engineer, Scientific Computing
Kitware, Inc.

On Mon, Jan 9, 2017 at 11:10 PM, Cherniak, Bruce wrote:

> This comes in very handy on a SLES11 (or similar) based install.
>
> Reviewed-by: Bruce Cherniak 
>
> > On Jan 6, 2017, at 7:27 AM, Chuck Atkins wrote:
> >
> > Cc: mesa-sta...@lists.freedesktop.org
> > Cc: Bruce Cherniak 
> > Signed-off-by: Chuck Atkins
> > ---
> > configure.ac| 4 +++-
> > src/gallium/state_trackers/glx/xlib/Makefile.am | 1 +
> > 2 files changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/configure.ac b/configure.ac
> > index d1ffb57..092bea0 100644
> > --- a/configure.ac
> > +++ b/configure.ac
> > @@ -1597,6 +1597,9 @@ AC_ARG_ENABLE([driglx-direct],
> > dnl
> > dnl libGL configuration per driver
> > dnl
> > +if test "x$enable_glx" != xno; then
> > +PKG_CHECK_MODULES([GLPROTO], [glproto >= $GLPROTO_REQUIRED])
> > +fi
> > case "x$enable_glx" in
> > xxlib | xgallium-xlib)
> > # Xlib-based GLX
> > @@ -1610,7 +1613,6 @@ xxlib | xgallium-xlib)
> > ;;
> > xdri)
> > # DRI-based GLX
> > -PKG_CHECK_MODULES([GLPROTO], [glproto >= $GLPROTO_REQUIRED])
> >
> > # find the DRI deps for libGL
> > dri_modules="x11 xext xdamage xfixes x11-xcb xcb xcb-glx >= $XCBGLX_REQUIRED"
> > diff --git a/src/gallium/state_trackers/glx/xlib/Makefile.am b/src/gallium/state_trackers/glx/xlib/Makefile.am
> > index a7e6c0c..112030be 100644
> > --- a/src/gallium/state_trackers/glx/xlib/Makefile.am
> > +++ b/src/gallium/state_trackers/glx/xlib/Makefile.am
> > @@ -25,6 +25,7 @@ include $(top_srcdir)/src/gallium/Automake.inc
> >
> > AM_CFLAGS = \
> >   $(GALLIUM_CFLAGS) \
> > + $(GLPROTO_CFLAGS) \
> >   $(X11_INCLUDES)
> > AM_CPPFLAGS = \
> >   -I$(top_srcdir)/include \
> > --
> > 2.7.4
> >
>
>


Re: [Mesa-dev] [PATCH 4/4] glsl: Use hash table cloning in copy propagation

2017-01-13 Thread Vladislav Egorov

13.01.2017 15:31, Tapani Pälli wrote:

> On 01/12/2017 09:23 PM, Thomas Helland wrote:
>> Walking the whole hash table, inserting entries by hashing them first
>> is just a really really bad idea. We can simply memcpy the whole thing.
>
> Maybe it is just 'really' not 'really really' since I don't spot any
> difference in time running the torture test in bug #94477 (oscillates
> close to 120s both with and without these patches). I would
> expect at least some difference as it is utilizing this path a lot.
> Did you measure performance difference?

It wouldn't help the torture case from the bug, because that shader
doesn't have LOOP and IF blocks, so the more efficient copying of the
ACP for LOOP/IF blocks would not even be touched.


Quick benchmark of Tom's patches on shader-db.

Default shader-db, ./run -1, 10 runs:

                   BEFORE   AFTER
softpipe           3.20s    3.15s
radeonsi           5.17s    5.12s
i965/Haswell       7.33s    7.19s

On my full shader-db (50K+ shaders from games):

                   BEFORE   AFTER
softpipe (5 runs)  156.6s   153.9s
i965               625s     613s

So it brings a 1-2% speedup across the board.
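
The idea behind the patch can be illustrated with a toy open-addressing table. This is only a sketch under simplifying assumptions (key 0 marks an empty slot, power-of-two size, no deletion, table never full); it is not Mesa's hash table, and all names are invented.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy open-addressing table: key 0 marks an empty slot. */
struct demo_table {
    uint32_t size;      /* number of slots, a power of two */
    uint32_t *keys;
    uint32_t *values;
};

static uint32_t demo_hash(uint32_t key)
{
    return key * 2654435761u;   /* multiplicative hash */
}

static void demo_table_destroy(struct demo_table *t)
{
    if (!t)
        return;
    free(t->keys);
    free(t->values);
    free(t);
}

static struct demo_table *demo_table_create(uint32_t size)
{
    struct demo_table *t = calloc(1, sizeof(*t));

    if (!t)
        return NULL;
    t->size = size;
    t->keys = calloc(size, sizeof(*t->keys));
    t->values = calloc(size, sizeof(*t->values));
    if (!t->keys || !t->values) {
        demo_table_destroy(t);
        return NULL;
    }
    return t;
}

/* Insert with linear probing; assumes key != 0 and a free slot exists. */
static void demo_table_insert(struct demo_table *t, uint32_t key, uint32_t value)
{
    uint32_t i = demo_hash(key) & (t->size - 1);

    while (t->keys[i] != 0 && t->keys[i] != key)
        i = (i + 1) & (t->size - 1);
    t->keys[i] = key;
    t->values[i] = value;
}

/* Returns 1 and stores the value if key is present, 0 otherwise. */
static int demo_table_lookup(const struct demo_table *t, uint32_t key, uint32_t *value)
{
    uint32_t i = demo_hash(key) & (t->size - 1);

    while (t->keys[i] != 0) {
        if (t->keys[i] == key) {
            *value = t->values[i];
            return 1;
        }
        i = (i + 1) & (t->size - 1);
    }
    return 0;
}

/* Slow clone: walk every slot and re-hash each live entry. */
static struct demo_table *demo_table_clone_rehash(const struct demo_table *src)
{
    struct demo_table *dst = demo_table_create(src->size);

    if (!dst)
        return NULL;
    for (uint32_t i = 0; i < src->size; i++)
        if (src->keys[i] != 0)
            demo_table_insert(dst, src->keys[i], src->values[i]);
    return dst;
}

/* Fast clone: same slot count, so the slot arrays can be copied verbatim. */
static struct demo_table *demo_table_clone_memcpy(const struct demo_table *src)
{
    struct demo_table *dst = demo_table_create(src->size);

    if (!dst)
        return NULL;
    memcpy(dst->keys, src->keys, src->size * sizeof(*dst->keys));
    memcpy(dst->values, src->values, src->size * sizeof(*dst->values));
    return dst;
}
```

Both clones answer lookups identically; the memcpy variant simply skips the per-entry hashing and probing, which is where any saving would come from.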


Re: [Mesa-dev] NaN behavior in GLSL (was Re: [PATCH] glsl: always do sqrt(abs()) and inversesqrt(abs()))

2017-01-13 Thread Nicolai Hähnle

On 13.01.2017 18:53, Jason Ekstrand wrote:

> On Fri, Jan 13, 2017 at 8:43 AM, Marek Olšák wrote:
>
>> On Fri, Jan 13, 2017 at 5:25 PM, Jason Ekstrand wrote:
>> > On Fri, Jan 13, 2017 at 4:05 AM, Marek Olšák wrote:
>> >>
>> >> On Fri, Jan 13, 2017 at 3:37 AM, Ilia Mirkin wrote:
>> >> > On Thu, Jan 12, 2017 at 9:13 PM, Jason Ekstrand wrote:
>> >> >> Unless, of course, it's controlled by the same hardware bit... Clearly,
>> >> >> we can give you abs on rsq without denorm flushing (easy shader hacks)
>> >> >> but not the other way around.
>> >> >
>> >> > OK, so somehow I missed that earlier. However there's an interesting
>> >> > section in the PRM:
>> >> >
>> >> > https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-skl-vol07-3d_media_gpgpu.pdf
>> >> >
>> >> > on PDF page 854, "Dismissed Legacy Behaviors" which has a list of
>> >> > suggested IEEE 754 deviations for DX9. One of them is indeed that 0 *
>> >> > x = 0, but another is that input NaNs be propagated with certain
>> >> > exceptions. Also they suggest that RCP(0)/RSQ(0) = fmax. Interesting.
>> >> >
>> >> > So at this point, the zero_wins thing is pretty much blown. i965
>> >> > appears to have an all-or-nothing approach, and additionally that
>> >> > approach doesn't match up exactly to what NVIDIA does (or at least I'm
>> >> > not aware of a clamp-everything mode).
>> >> >
>> >> > This will take some thought to figure out how something can be
>> >> > specified so that a single spec works for both i965 and nv/amd. OTOH
>> >> > we could have two different specs that just expose different things -
>> >> > e.g. i965 could expose a MESA_shader_float_alt_mode or whatever which
>> >> > is spec'd to do the things that the PRM says, and nv/amd have the
>> >> > MESA_shader_float_zero_wins ext which does what we were talking about
>> >> > earlier.
>> >> >
>> >> > I'm open to other suggestions too.
>> >>
>> >> There is also the "small" problem that it would take a non-trivial
>> >> effort for us on the LLVM side. You guys can flip a switch. We can't.
>> >
>> > Don't you have to expend that effort for ARB programs anyway?  I thought
>> > they weren't supposed to generate NaN either.
>>
>> No, we don't, because st/mesa adds abs before RSQ and the driver
>> implements POW as log+mul+exp, where mul follows the rule
>> 0*anything=0. I don't think any other opcode follows that rule though.
>
> Ah.  That makes sense.  Do you also implement DIV as MUL+RCP?

For single-precision, yes. For double-precision, it seems we need to
move away from that due to precision issues (which is itself a bit odd,
since you don't seem to have encountered that?).

Nicolai

> If so, the two of those should take care of NaN getting generated in the
> shader.  We'd still have to do something about inf and maybe denorms.




Re: [Mesa-dev] NaN behavior in GLSL (was Re: [PATCH] glsl: always do sqrt(abs()) and inversesqrt(abs()))

2017-01-13 Thread Matteo Bruni
2017-01-13 3:37 GMT+01:00 Ilia Mirkin :
> On Thu, Jan 12, 2017 at 9:13 PM, Jason Ekstrand  wrote:
>> Unless, of course, it's controlled by the same hardware bit... Clearly, we
>> can give you abs on rsq without denorm flushing (easy shader hacks) but
>> not the other way around.
>
> OK, so somehow I missed that earlier. However there's an interesting
> section in the PRM:
>
> https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-skl-vol07-3d_media_gpgpu.pdf
>
> on PDF page 854, "Dismissed Legacy Behaviors" which has a list of
> suggested IEEE 754 deviations for DX9. One of them is indeed that 0 *
> x = 0, but another is that input NaNs be propagated with certain
> exceptions. Also they suggest that RCP(0)/RSQ(0) = fmax. Interesting.
>
> So at this point, the zero_wins thing is pretty much blown. i965
> appears to have an all-or-nothing approach, and additionally that
> approach doesn't match up exactly to what NVIDIA does (or at least I'm
> not aware of a clamp-everything mode).
>
> This will take some thought to figure out how something can be
> specified so that a single spec works for both i965 and nv/amd. OTOH
> we could have two different specs that just expose different things -
> e.g. i965 could expose a MESA_shader_float_alt_mode or whatever which
> is spec'd to do the things that the PRM says, and nv/amd have the
> MESA_shader_float_zero_wins ext which does what we were talking about
> earlier.
>
> I'm open to other suggestions too.

Maybe we can go back to the original idea and have the extension
require that no NaNs can be generated by GLSL mathematical operators
and builtin functions (if no operand is a NaN?) It's possible that's
not exactly it but in any case the idea is to just specify expected
results, without requiring a specific route to get there. The
extension could introduce undefined behavior where necessary e.g.
allowing (but not requiring) INF results to be always flushed to fmax
when enabled.

For Intel that would work trivially. For AMD it should be a matter of
using the special instructions where necessary and "be careful" in a
few places (in the same vein as the RSQ and POW opcodes of ARB
programs Marek mentioned). Not sure about nouveau, I guess it should
be similar to AMD in the end.

Would that be too messy? Am I completely missing the point?


[Mesa-dev] [Bug 98428] Undefined non-weak-symbol in dri-drivers

2017-01-13 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98428

NicolasChauvet  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #127532|0                           |1
        is obsolete|                            |
 Attachment #128291|0                           |1
        is obsolete|                            |

--- Comment #13 from NicolasChauvet  ---
Created attachment 128934
  --> https://bugs.freedesktop.org/attachment.cgi?id=128934&action=edit
glapi: v3 Link with glapi when built shared

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.


[Mesa-dev] [Bug 98428] Undefined non-weak-symbol in dri-drivers

2017-01-13 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98428

--- Comment #15 from NicolasChauvet  ---
Created attachment 128936
  --> https://bugs.freedesktop.org/attachment.cgi?id=128936&action=edit
mesa: glapi: Clean-up dlopening glapi as we are building shared by default



[Mesa-dev] [Bug 98428] Undefined non-weak-symbol in dri-drivers

2017-01-13 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98428

--- Comment #14 from NicolasChauvet  ---
Created attachment 128935
  --> https://bugs.freedesktop.org/attachment.cgi?id=128935&action=edit
Revert "gbm: dlopen libglapi so gbm_create_device works"



[Mesa-dev] [PATCH] egl/wayland: resolve quirky try_damage_buffer() implementation

2017-01-13 Thread Emil Velikov
From: Emil Velikov 

The implementation was added with commit d085a5dff5b and effectively
provided a hidden dependency.

Namely: the codepath used was determined solely at build time. Thus, if
we build against a new wayland and then run against an older one (yet
still within the requirements, as per the configure check), we get
undefined symbols.

As of earlier commit 36b9976e1f9 "egl/wayland: Avoid race conditions
when on non-main thread" the required version was bumped to one which
provides the API, thus we can drop the quirky solution.

Cc: Derek Foreman 
Signed-off-by: Emil Velikov 
---
One way to avoid the issue w/o bumping the requirement (for -stable) is
to add fall-back define alongside weak implementation of the functions.
The latter should "return false" and will get automatically overridden
if new enough wayland is used.

Not sure how much one should care - just thinking out loud.
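
For reference, the weak-fallback idea could look roughly like the sketch below. The name damage_buffer_supported() is hypothetical and is not a wayland symbol. The sketch relies on the GCC/Clang weak attribute: the weak definition is used unless a strong definition of the same symbol is linked into the program. How this resolves across shared libraries depends on the dynamic linker and is not shown here.

```c
#include <stdbool.h>

/* Weak fallback: used only when no strong definition is linked in.
 * Hypothetical name, standing in for a function from a newer library. */
__attribute__((weak)) bool damage_buffer_supported(void)
{
    return false;
}

/* Caller: take the old (correct but suboptimal) path when the fallback
 * reports that the newer feature is unavailable. */
bool try_new_damage_path(void)
{
    if (!damage_buffer_supported())
        return false;
    /* The newer code path would go here. */
    return true;
}
```

Built on its own, try_new_damage_path() returns false, which matches the "return false" fallback described above.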
---
 src/egl/drivers/dri2/platform_wayland.c | 10 --
 1 file changed, 10 deletions(-)

diff --git a/src/egl/drivers/dri2/platform_wayland.c b/src/egl/drivers/dri2/platform_wayland.c
index 4009cc9691..3057604d3c 100644
--- a/src/egl/drivers/dri2/platform_wayland.c
+++ b/src/egl/drivers/dri2/platform_wayland.c
@@ -669,14 +669,6 @@ try_damage_buffer(struct dri2_egl_surface *dri2_surf,
   const EGLint *rects,
   EGLint n_rects)
 {
-/* The WL_SURFACE_DAMAGE_BUFFER_SINCE_VERSION macro and
- * wl_proxy_get_version() were both introduced in wayland 1.10.
- * Instead of bumping our wayland dependency we just make this
- * function conditional on the required 1.10 features, falling
- * back to old (correct but suboptimal) behaviour for older
- * wayland.
- */
-#ifdef WL_SURFACE_DAMAGE_BUFFER_SINCE_VERSION
int i;
 
if (wl_proxy_get_version((struct wl_proxy *) dri2_surf->wl_win->surface)
@@ -692,8 +684,6 @@ try_damage_buffer(struct dri2_egl_surface *dri2_surf,
rect[2], rect[3]);
}
return EGL_TRUE;
-#endif
-   return EGL_FALSE;
 }
 /**
  * Called via eglSwapBuffers(), drv->API.SwapBuffers().
-- 
2.11.0



Re: [Mesa-dev] [PATCH] egl/wayland: resolve quirky try_damage_buffer() implementation

2017-01-13 Thread Daniel Stone
Hi Emil,

On 13 January 2017 at 17:27, Emil Velikov  wrote:
> The implementation was added with commit d085a5dff5b and effectively
> provided a hidden dependency.
>
> Namely: the codepath used was determined solely at build time. Thus, if
> we build against a new wayland and then run against an older one (yet
> still within the requirements, as per the configure check), we get
> undefined symbols.
>
> As of earlier commit 36b9976e1f9 "egl/wayland: Avoid race conditions
> when on non-main thread" the required version was bumped to one which
> provides the API, thus we can drop the quirky solution.

Works for me:
Reviewed-by: Daniel Stone 

> One way to avoid the issue w/o bumping the requirement (for -stable) is
> to add fall-back define alongside weak implementation of the functions.
> The latter should "return false" and will get automatically overridden
> if new enough wayland is used.
>
> Not sure how much one should care - just thinking out loud.

On the other hand, I'm more than happy to leave the runtime-resolution
and stable branch details to you!

Cheers,
Daniel


Re: [Mesa-dev] [PATCH 4/4] glsl: Use hash table cloning in copy propagation

2017-01-13 Thread Владислав Егоров
> Quick benchmark of Tom's patches on shader-db.

Thomas' patch, sorry. It's hard to compose messages and play Paw Patrol
with a 3-year-old kid at the same time.

2017-01-13 20:41 GMT+03:00 Vladislav Egorov :

> On 13.01.2017 15:31, Tapani Pälli wrote:
>
>>
>>
>> On 01/12/2017 09:23 PM, Thomas Helland wrote:
>>
>>> Walking the whole hash table, inserting entries by hashing them first
>>> is just a really really bad idea. We can simply memcpy the whole thing.
>>>
>>
>> Maybe it is just 'really' not 'really really' since I don't spot any
>> difference in time running the torture test in bug #94477 (oscillates close
>> to 120s with both with and without these patches), I would expect at least
>> some difference as it is utilizing this path a lot. Did you measure
>> performance difference?
>>
>>
> It wouldn't help the torture case from the bug, because that shader
> doesn't have LOOP and IF blocks, so the more efficient copying of the
> ACP for LOOP/IF blocks would not even be exercised.
>
> Quick benchmark of Tom's patches on shader-db.
>
> Default shader-db, ./run -1, 10 runs:
>
>                BEFORE   AFTER
> softpipe       3.20s    3.15s
> radeonsi       5.17s    5.12s
> i965/Haswell   7.33s    7.19s
>
> On my full shader-db (50K+ shaders from games):
>
>                     BEFORE   AFTER
> softpipe (5 runs)   156.6s   153.9s
> i965                625s     613s
>
> So it brings 1-2% speed across the board.
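
For readers who have not looked at the patch, the memcpy idea quoted above can be sketched as follows. The struct layout here is a hypothetical open-addressing table, not Mesa's actual util/hash_table. Copying the slot array wholesale is valid because each entry's position depends only on its hash and the table size, and both are unchanged in the clone.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical open-addressing table, for illustration only. */
struct entry { uint32_t hash; const void *key; void *data; };
struct table { struct entry *slots; uint32_t size; uint32_t entries; };

/* Clone by copying the slot array instead of re-hashing every entry. */
struct table *table_clone(const struct table *src)
{
    struct table *dst = malloc(sizeof(*dst));
    if (!dst)
        return NULL;

    *dst = *src;
    dst->slots = malloc((size_t)src->size * sizeof(*dst->slots));
    if (!dst->slots) {
        free(dst);
        return NULL;
    }
    memcpy(dst->slots, src->slots, (size_t)src->size * sizeof(*dst->slots));
    return dst;
}
```

The clone costs one allocation and one linear copy, with no hashing and no probing, which is where the saving comes from when the ACP is duplicated at every LOOP/IF block.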
>


[Mesa-dev] [PATCH 2/7] i965: Add intel_batchbuffer_flush_fence()

2017-01-13 Thread Chad Versace
A variant of intel_batchbuffer_flush() with parameters for in and out
fence fds.
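
The fd ownership convention used by the new function (in-fence ignored if -1 and otherwise consumed, out-fence skipped if NULL and otherwise owned by the caller) can be illustrated with this small stand-in. submit_with_fences() is a hypothetical placeholder, not the libdrm call, and it hands back a /dev/null fd where a real driver would return a sync_file fd.

```c
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-in for a submit call with in/out fence fds. */
int submit_with_fences(int in_fd, int *out_fd)
{
    /* The callee takes ownership of a valid in-fence and closes it. */
    if (in_fd != -1)
        close(in_fd);

    /* The caller owns whatever fd is returned through out_fd. */
    if (out_fd != NULL) {
        *out_fd = open("/dev/null", O_RDONLY); /* placeholder fd */
        if (*out_fd == -1)
            return -1;
    }
    return 0;
}

/* The plain flush is then just the "no fences" special case. */
int submit_plain(void)
{
    return submit_with_fences(-1, NULL);
}
```

A caller that passes an in-fence must not touch that fd afterwards, and a caller that asks for an out-fence must close it when done.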
---
 src/mesa/drivers/dri/i965/intel_batchbuffer.c | 24 ++--
 src/mesa/drivers/dri/i965/intel_batchbuffer.h | 14 --
 2 files changed, 26 insertions(+), 12 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
index d1b9317a8c..67054cf77f 100644
--- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
+++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
@@ -319,7 +319,7 @@ throttle(struct brw_context *brw)
 /* TODO: Push this whole function into bufmgr.
  */
 static int
-do_flush_locked(struct brw_context *brw)
+do_flush_locked(struct brw_context *brw, int in_fence_fd, int *out_fence_fd)
 {
struct intel_batchbuffer *batch = &brw->batch;
int ret = 0;
@@ -353,11 +353,15 @@ do_flush_locked(struct brw_context *brw)
 brw_annotate_aub(brw);
 
 if (brw->hw_ctx == NULL || batch->ring != RENDER_RING) {
+assert(in_fence_fd == -1);
+assert(out_fence_fd == NULL);
 ret = drm_intel_bo_mrb_exec(batch->bo, 4 * USED_BATCH(*batch),
 NULL, 0, 0, flags);
 } else {
-   ret = drm_intel_gem_bo_context_exec(batch->bo, brw->hw_ctx,
-4 * USED_BATCH(*batch), flags);
+   ret = drm_intel_gem_bo_fence_exec(batch->bo, brw->hw_ctx,
+4 * USED_BATCH(*batch),
+in_fence_fd, out_fence_fd,
+flags);
 }
   }
 
@@ -378,9 +382,17 @@ do_flush_locked(struct brw_context *brw)
return ret;
 }
 
+/**
+ * The in_fence_fd is ignored if -1.  Otherwise this function takes ownership
+ * of the fd.
+ *
+ * The out_fence_fd is ignored if NULL. Otherwise, the caller takes ownership
+ * of the returned fd.
+ */
 int
-_intel_batchbuffer_flush(struct brw_context *brw,
-const char *file, int line)
+_intel_batchbuffer_flush_fence(struct brw_context *brw,
+   int in_fence_fd, int *out_fence_fd,
+   const char *file, int line)
 {
int ret;
 
@@ -419,7 +431,7 @@ _intel_batchbuffer_flush(struct brw_context *brw,
/* Check that we didn't just wrap our batchbuffer at a bad time. */
assert(!brw->no_batch_wrap);
 
-   ret = do_flush_locked(brw);
+   ret = do_flush_locked(brw, in_fence_fd, out_fence_fd);
 
if (unlikely(INTEL_DEBUG & DEBUG_SYNC)) {
   fprintf(stderr, "waiting for idle\n");
diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.h b/src/mesa/drivers/dri/i965/intel_batchbuffer.h
index ee03a44c9e..bf7cadfc4d 100644
--- a/src/mesa/drivers/dri/i965/intel_batchbuffer.h
+++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.h
@@ -46,14 +46,16 @@ void intel_batchbuffer_save_state(struct brw_context *brw);
 void intel_batchbuffer_reset_to_saved(struct brw_context *brw);
 void intel_batchbuffer_require_space(struct brw_context *brw, GLuint sz,
  enum brw_gpu_ring ring);
+int _intel_batchbuffer_flush_fence(struct brw_context *brw,
+   int in_fence_fd, int *out_fence_fd,
+   const char *file, int line);
 
-int _intel_batchbuffer_flush(struct brw_context *brw,
-const char *file, int line);
-
-#define intel_batchbuffer_flush(intel) \
-   _intel_batchbuffer_flush(intel, __FILE__, __LINE__)
-
+#define intel_batchbuffer_flush(brw) \
+   _intel_batchbuffer_flush_fence((brw), -1, NULL, __FILE__, __LINE__)
 
+#define intel_batchbuffer_flush_fence(brw, in_fence_fd, out_fence_fd) \
+   _intel_batchbuffer_flush_fence((brw), (in_fence_fd), (out_fence_fd), \
+  __FILE__, __LINE__)
 
 /* Unlike bmBufferData, this currently requires the buffer be mapped.
  * Consider it a convenience function wrapping multple
-- 
2.11.0.21.ga274e0a



Re: [Mesa-dev] [Mesa-stable] [PATCH] glx: Add missing glproto dependency for gallium-xlib glx

2017-01-13 Thread Chuck Atkins
Hi Emil,


> It will land for 13.0,


Excellent!  Sorry for the confusion.  That's what I was looking for.  It
caused specific pains for deploying on "older" Cray systems, which are a
large part of my userbase.  This way I can stop patching the builds and
move to an actual release.



> but I can check if it's applicable for 12.0 and get it in there as well.
>

It should be applicable but I don't particularly need it for anything; I've
moved all of my customers to 13.x so unless anybody's specifically asking
for it then I wouldn't bother.

 Thanks again
- Chuck


[Mesa-dev] [PATCH 1/7] i965: Add intel_screen::has_fence_fd

2017-01-13 Thread Chad Versace
This bool maps to I915_PARAM_HAS_EXEC_FENCE.

TODO: The i915 param is not yet upstream.  Wait for the kernel interface
  before committing.
---
 src/mesa/drivers/dri/i965/intel_screen.c | 3 +++
 src/mesa/drivers/dri/i965/intel_screen.h | 2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/intel_screen.c b/src/mesa/drivers/dri/i965/intel_screen.c
index a8d401cdff..dffb003c99 100644
--- a/src/mesa/drivers/dri/i965/intel_screen.c
+++ b/src/mesa/drivers/dri/i965/intel_screen.c
@@ -1811,6 +1811,9 @@ __DRIconfig **intelInitScreen2(__DRIscreen *dri_screen)
 intel_get_boolean(screen, I915_PARAM_HAS_RESOURCE_STREAMER);
}
 
+   screen->has_exec_fence =
+ intel_get_boolean(screen, I915_PARAM_HAS_EXEC_FENCE);
+
return (const __DRIconfig**) intel_screen_make_configs(dri_screen);
 }
 
diff --git a/src/mesa/drivers/dri/i965/intel_screen.h b/src/mesa/drivers/dri/i965/intel_screen.h
index 890dd9044b..a1e2b31774 100644
--- a/src/mesa/drivers/dri/i965/intel_screen.h
+++ b/src/mesa/drivers/dri/i965/intel_screen.h
@@ -47,8 +47,8 @@ struct intel_screen
uint64_t max_gtt_map_object_size;
 
bool no_hw;
-
bool hw_has_swizzling;
+   bool has_exec_fence; /**< I915_PARAM_HAS_EXEC_FENCE */
 
int hw_has_timestamp;
 
-- 
2.11.0.21.ga274e0a



[Mesa-dev] [PATCH 0/7] i965: Implement EGL_ANDROID_native_fence_sync

2017-01-13 Thread Chad Versace
This series depends on fence fd support in I915_GEM_EXECBUFFER2, which
isn't upstream in libdrm nor the kernel yet.  I tested this with kmscube
on Skylake, and everything looked good to me.

I pushed tags for this series as well as all the code I tested with:

mesa: 
http://git.kiwitree.net/cgit/~chadv/mesa/tag/?h=chadv/review/i965-exec-fence-v03
libdrm: 
http://git.kiwitree.net/cgit/~chadv/libdrm/tag/?h=chadv/review/intel-exec-fence-v01
linux: 
http://git.kiwitree.net/cgit/~chadv/linux/tag/?h=chadv/test/i915-exec-fence-v03
kmscube: 
http://git.kiwitree.net/cgit/~chadv/kmscube/tag/?h=chadv/test/fences-v02

I submitted the libdrm patches to the intel-gfx list. Someone else
should submit the kernel patches, as I tested them but don't grok them.

Chad Versace (7):
  i965: Add intel_screen::has_fence_fd
  i965: Add intel_batchbuffer_flush_fence()
  i965/sync: Add brw_fence::type
  i965/sync: Fail sync creation when batchbuffer flush fails
  i965/sync: Rename brw_fence_insert()
  WAIT: configure: Bump libdrm requirement to 2.4.XX
  i965/sync: Implement fences based on Linux sync_file

 configure.ac  |   3 +-
 src/mesa/drivers/dri/i965/brw_sync.c  | 293 ++
 src/mesa/drivers/dri/i965/intel_batchbuffer.c |  24 ++-
 src/mesa/drivers/dri/i965/intel_batchbuffer.h |  14 +-
 src/mesa/drivers/dri/i965/intel_screen.c  |   3 +
 src/mesa/drivers/dri/i965/intel_screen.h  |   2 +-
 6 files changed, 287 insertions(+), 52 deletions(-)

-- 
2.11.0.21.ga274e0a



Re: [Mesa-dev] [PATCH v4 1/2] egl/wayland: Cleanup private display connection when init fails

2017-01-13 Thread Emil Velikov
On 13 January 2017 at 15:05, Jonas Ådahl  wrote:
> When failing to initialize the Wayland EGL driver, don't leak the
> display server connection if it was us who created it.
>
> Signed-off-by: Jonas Ådahl 
> Cc: mesa-sta...@lists.freedesktop.org
Added the r-b tags and pushed to master.

Thanks Jonas !
Emil


[Mesa-dev] [PATCH] shader-db: Update the README

2017-01-13 Thread Elie Tournier
Use the binary to run shader-db instead of run.py

Signed-off-by: Elie Tournier 
---
 README | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/README b/README
index 5e9bb2d..6f6a7e2 100644
--- a/README
+++ b/README
@@ -1,6 +1,6 @@
 === What ===
 
-A giant pile of shaders from various apps, for whatever purpose.  In
+A giant pile of shaders from various apps, for whatever purpose. In
 particular, we use it to capture assembly output of the shader
 compiler for analysis of regressions in compiler behavior.
 
@@ -16,10 +16,16 @@ MESA_SHADER_CAPTURE_PATH=dirpath executable
 
 # "fdupes" can be used to remove duplicates
 
+=== Compiling ===
+
+Some libraries are required when building. See section "Dependencies" below.
+To build the binary, do:
+make
 
 === i965 Usage ===
 
 === Running shaders ===
+
 ./run shaders 2> err | tee new-run
 
 # To run just a subset:
@@ -34,8 +40,8 @@ To compile shaders for an i965 PCI ID different from your system, pass
 to run.
 
 === Analysis ===
-./report.py old-run new-run
 
+./report.py old-run new-run
 
 === radeonsi Usage ===
 
@@ -46,6 +52,7 @@ to run.
 Note that a debug mesa build required (ie. --enable-debug)
 
 === Analysis ===
+
 ./si-report.py old-run new-run
 
 === freedreno Usage ===
@@ -59,15 +66,22 @@ Note that a debug mesa build required (ie. --enable-debug)
 -1 option for disabling multi-threading is required to avoid garbled shader dumps.
 
 === Analysis ===
+
 ./fd-report.py old-run new-run
 
 === Dependencies ===
+
 run requires some GNU C extensions, render nodes (/dev/dri/renderD128),
 libepoxy, OpenMP, and Mesa configured with --with-egl-platforms=x11,drm
 
 === jemalloc ===
+
 Since run compiles shaders in different threads, malloc/free locking overhead
 from inside Mesa can be expensive. Preloading jemalloc can cut significant
 amounts of time:
 
 LD_PRELOAD=/usr/lib64/libjemalloc.so.1 ./run shaders 2> err | tee new-run
+
+=== Deprecated ===
+
+run.py is obsolete. Use the 'run' binary instead.
-- 
2.11.0



[Mesa-dev] [PATCH 6/7] WAIT: configure: Bump libdrm requirement to 2.4.XX

2017-01-13 Thread Chad Versace
Required to implement EGL_ANDROID_native_fence_sync on i965.
Specifically, i965 needs drm_intel_gem_bo_fence_exec(),
I915_PARAM_HAS_EXEC_FENCE, and libsync.h.

TODO: Pick real libdrm version after Intel exec fences land.
---
 configure.ac | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/configure.ac b/configure.ac
index 459f3e8b0a..845999fb17 100644
--- a/configure.ac
+++ b/configure.ac
@@ -68,7 +68,8 @@ OPENCL_VERSION=1
 AC_SUBST([OPENCL_VERSION])
 
 dnl Versions for external dependencies
-LIBDRM_REQUIRED=2.4.66
+# TODO(chadv): Pick real libdrm version after Intel exec fences land
+LIBDRM_REQUIRED=2.4.73
 LIBDRM_RADEON_REQUIRED=2.4.56
 LIBDRM_AMDGPU_REQUIRED=2.4.63
 LIBDRM_INTEL_REQUIRED=2.4.61
-- 
2.11.0.21.ga274e0a



[Mesa-dev] [PATCH 7/7] i965/sync: Implement fences based on Linux sync_file

2017-01-13 Thread Chad Versace
This patch implements a new type of struct brw_fence, one that is based
on struct sync_file.

This completes support for EGL_ANDROID_native_fence_sync.

* Background

  Linux 4.7 added a new file type, struct sync_file. See

commit 460bfc41fd52959311ed0328163f785e023857af
Author:  Gustavo Padovan 
Date:Thu Apr 28 10:46:57 2016 -0300
Subject: dma-buf/sync_file: de-stage sync_file headers

  A sync file is a cross-driver explicit synchronization primitive. In a
  sense, sync_file's relation to synchronization is similar to dma_buf's
  relation to memory: both are primitives that can be imported and
  exported across drivers (at least in theory).
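
  As a rough illustration of how userspace waits on such an fd, the
  sketch below polls a file descriptor until it becomes readable. It
  assumes that a signalled sync_file reports POLLIN, and the helper name
  wait_fd_signaled() is made up here; the patch itself uses sync_wait()
  from libsync.h rather than this code.

```c
#include <errno.h>
#include <poll.h>

/* Returns 0 once fd is readable, -1 on timeout (errno = ETIME) or error.
 * A negative timeout_ms waits forever, as with poll() itself. */
int wait_fd_signaled(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ret;

    /* Retry if the wait was interrupted before anything happened. */
    do {
        ret = poll(&pfd, 1, timeout_ms);
    } while (ret == -1 && (errno == EINTR || errno == EAGAIN));

    if (ret == 0) {
        errno = ETIME;
        return -1;
    }
    if (ret < 0)
        return -1;
    if (pfd.revents & (POLLERR | POLLNVAL)) {
        errno = EINVAL;
        return -1;
    }
    return 0;
}
```

  A timeout of 0 turns this into a non-blocking "has it signalled yet"
  query, which is the shape of check a has-completed path needs.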
---
 src/mesa/drivers/dri/i965/brw_sync.c | 162 ++-
 1 file changed, 159 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_sync.c b/src/mesa/drivers/dri/i965/brw_sync.c
index 77d382cab6..1c5d5a50b6 100644
--- a/src/mesa/drivers/dri/i965/brw_sync.c
+++ b/src/mesa/drivers/dri/i965/brw_sync.c
@@ -38,6 +38,8 @@
  * performance bottleneck, though.
  */
 
+#include <libsync.h> /* Requires Android or libdrm-2.4.72 */
+
 #include "main/imports.h"
 
 #include "brw_context.h"
@@ -47,11 +49,19 @@ struct brw_fence {
struct brw_context *brw;
 
enum brw_fence_type {
+  /** The fence waits for completion of brw_fence::batch_bo. */
   BRW_FENCE_TYPE_BO_WAIT,
+
+  /** The fence waits for brw_fence::sync_fd to signal. */
+  BRW_FENCE_TYPE_SYNC_FD,
} type;
 
-   /** The fence waits for completion of this batch. */
-   drm_intel_bo *batch_bo;
+   union {
+  drm_intel_bo *batch_bo;
+
+  /* This struct owns the fd. */
+  int sync_fd;
+   };
 
mtx_t mutex;
bool signalled;
@@ -74,6 +84,9 @@ brw_fence_init(struct brw_context *brw, struct brw_fence *fence,
case BRW_FENCE_TYPE_BO_WAIT:
   fence->batch_bo = NULL;
   break;
+case BRW_FENCE_TYPE_SYNC_FD:
+  fence->sync_fd = -1;
+  break;
}
 }
 
@@ -85,6 +98,10 @@ brw_fence_finish(struct brw_fence *fence)
   if (fence->batch_bo)
  drm_intel_bo_unreference(fence->batch_bo);
   break;
+   case BRW_FENCE_TYPE_SYNC_FD:
+  if (fence->sync_fd != -1)
+ close(fence->sync_fd);
+  break;
}
 
mtx_destroy(&fence->mutex);
@@ -109,11 +126,46 @@ brw_fence_insert_locked(struct brw_context *brw, struct brw_fence *fence)
  return false;
   }
   break;
+   case BRW_FENCE_TYPE_SYNC_FD:
+  assert(!fence->signalled);
+
+  if (fence->sync_fd == -1) {
+ /* Create an out-fence that signals after all pending commands
+  * complete.
+  */
+ if (intel_batchbuffer_flush_fence(brw, -1, &fence->sync_fd) < 0)
+return false;
+ assert(fence->sync_fd != -1);
+  } else {
+ /* Wait on the in-fence before executing any subsequently submitted
+  * commands.
+  */
+ if (intel_batchbuffer_flush(brw) < 0)
+return false;
+
+ /* Emit a dummy batch just for the fence. */
+ brw_emit_mi_flush(brw);
+ if (intel_batchbuffer_flush_fence(brw, fence->sync_fd, NULL) < 0)
+return false;
+  }
+  break;
}
 
return true;
 }
 
+static bool MUST_CHECK
+brw_fence_insert(struct brw_context *brw, struct brw_fence *fence)
+{
+   bool ret;
+
+   mtx_lock(&fence->mutex);
+   ret = brw_fence_insert_locked(brw, fence);
+   mtx_unlock(&fence->mutex);
+
+   return ret;
+}
+
 static bool
 brw_fence_has_completed_locked(struct brw_fence *fence)
 {
@@ -135,6 +187,16 @@ brw_fence_has_completed_locked(struct brw_fence *fence)
   fence->signalled = true;
 
   return true;
+
+   case BRW_FENCE_TYPE_SYNC_FD:
+  assert(fence->sync_fd != -1);
+
+  if (sync_wait(fence->sync_fd, 0) == -1)
+ return false;
+
+  fence->signalled = true;
+
+  return true;
}
 
return false;
@@ -156,6 +218,8 @@ static bool
 brw_fence_client_wait_locked(struct brw_context *brw, struct brw_fence *fence,
  uint64_t timeout)
 {
+   int32_t timeout_i32;
+
if (fence->signalled)
   return true;
 
@@ -182,6 +246,20 @@ brw_fence_client_wait_locked(struct brw_context *brw, struct brw_fence *fence,
   fence->batch_bo = NULL;
 
   return true;
+   case BRW_FENCE_TYPE_SYNC_FD:
+  if (fence->sync_fd == -1)
+ return false;
+
+  if (timeout > INT32_MAX)
+ timeout_i32 = -1;
+  else
+ timeout_i32 = timeout;
+
+  if (sync_wait(fence->sync_fd, timeout_i32) == -1)
+ return false;
+
+  fence->signalled = true;
+  return true;
}
 
assert(!"bad enum brw_fence_type");
@@ -216,6 +294,16 @@ brw_fence_server_wait(struct brw_context *brw, struct brw_fence *fence)
* the previous one is done.
*/
   break;
+   case BRW_FENCE_TYPE_SYNC_FD:
+  assert(fence->sync_fd != -1);
+
+  /* The user wants explicit synchronization, so give them what they want. 

[Mesa-dev] [PATCH 4/7] i965/sync: Fail sync creation when batchbuffer flush fails

2017-01-13 Thread Chad Versace
Pre-patch, brw_sync.c ignored the return value of
intel_batchbuffer_flush().

When intel_batchbuffer_flush() fails during eglCreateSync
(brw_dri_create_fence), we now give up, cleanup, and return NULL.

When it fails during glFenceSync, however, we blindly continue and hope
for the best because there is not yet a way to tell core GL that
sync creation failed.
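
The create path then follows the usual "init, try, unwind on failure" shape. The sketch below shows that shape with hypothetical stand-in types and a fake flush result; MUST_CHECK is defined locally as the GCC/Clang warn_unused_result attribute so the example builds on its own.

```c
#include <stdbool.h>
#include <stdlib.h>

#define MUST_CHECK __attribute__((warn_unused_result))

struct fence { int handle; }; /* hypothetical stand-in for brw_fence */

static void fence_init(struct fence *f)   { f->handle = -1; }
static void fence_finish(struct fence *f) { f->handle = -1; }

/* Pretend insert: fails when the (simulated) flush fails. */
static bool MUST_CHECK fence_insert(struct fence *f, bool flush_ok)
{
    if (!flush_ok)
        return false;
    f->handle = 1;
    return true;
}

/* Give up, clean up, and return NULL if the insert fails. */
struct fence *create_fence(bool flush_ok)
{
    struct fence *f = calloc(1, sizeof(*f));
    if (!f)
        return NULL;

    fence_init(f);
    if (!fence_insert(f, flush_ok)) {
        fence_finish(f);
        free(f);
        return NULL;
    }
    return f;
}
```

Marking the insert function MUST_CHECK makes the compiler warn about any caller that drops the result, which is how the silently ignored flush failure becomes visible.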
---
 src/mesa/drivers/dri/i965/brw_sync.c | 34 --
 1 file changed, 28 insertions(+), 6 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_sync.c b/src/mesa/drivers/dri/i965/brw_sync.c
index f9127a41ea..24c8cbd3b0 100644
--- a/src/mesa/drivers/dri/i965/brw_sync.c
+++ b/src/mesa/drivers/dri/i965/brw_sync.c
@@ -90,7 +90,7 @@ brw_fence_finish(struct brw_fence *fence)
mtx_destroy(&fence->mutex);
 }
 
-static void
+static bool MUST_CHECK
 brw_fence_insert(struct brw_context *brw, struct brw_fence *fence)
 {
brw_emit_mi_flush(brw);
@@ -102,9 +102,16 @@ brw_fence_insert(struct brw_context *brw, struct brw_fence *fence)
 
   fence->batch_bo = brw->batch.bo;
   drm_intel_bo_reference(fence->batch_bo);
-  intel_batchbuffer_flush(brw);
+
+  if (intel_batchbuffer_flush(brw) < 0) {
+ drm_intel_bo_unreference(fence->batch_bo);
+ fence->batch_bo = NULL;
+ return false;
+  }
   break;
}
+
+   return true;
 }
 
 static bool
@@ -115,8 +122,10 @@ brw_fence_has_completed_locked(struct brw_fence *fence)
 
switch (fence->type) {
case BRW_FENCE_TYPE_BO_WAIT:
-  if (!fence->batch_bo)
+  if (!fence->batch_bo) {
+ /* There may be no batch if intel_batchbuffer_flush() failed. */
  return false;
+  }
 
   if (drm_intel_bo_busy(fence->batch_bo))
  return false;
@@ -152,7 +161,10 @@ brw_fence_client_wait_locked(struct brw_context *brw, struct brw_fence *fence,
 
switch (fence->type) {
case BRW_FENCE_TYPE_BO_WAIT:
-  assert(fence->batch_bo);
+  if (!fence->batch_bo) {
+ /* There may be no batch if intel_batchbuffer_flush() failed. */
+ return false;
+  }
 
   /* DRM_IOCTL_I915_GEM_WAIT uses a signed 64 bit timeout and returns
* immediately for timeouts <= 0.  The best we can do is to clamp the
@@ -236,7 +248,12 @@ brw_gl_fence_sync(struct gl_context *ctx, struct gl_sync_object *_sync,
struct brw_gl_sync *sync = (struct brw_gl_sync *) _sync;
 
brw_fence_init(brw, &sync->fence, BRW_FENCE_TYPE_BO_WAIT);
-   brw_fence_insert(brw, &sync->fence);
+
+   if (!brw_fence_insert(brw, &sync->fence)) {
+  /* FIXME: There exists no way to report a GL error here. If an error
+   * occurs, continue silently and hope for the best.
+   */
+   }
 }
 
 static void
@@ -291,7 +308,12 @@ brw_dri_create_fence(__DRIcontext *ctx)
   return NULL;
 
brw_fence_init(brw, fence, BRW_FENCE_TYPE_BO_WAIT);
-   brw_fence_insert(brw, fence);
+
+   if (!brw_fence_insert(brw, fence)) {
+  brw_fence_finish(fence);
+  free(fence);
+  return NULL;
+   }
 
return fence;
 }
-- 
2.11.0.21.ga274e0a



[Mesa-dev] [Bug 97879] [amdgpu] Rocket League: long hangs (several seconds) when loading assets (models/textures/shaders?)

2017-01-13 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=97879

--- Comment #51 from Jani Kärkkäinen  ---
Sent a support ticket to Psyonix about this (and the request for a debug
build). Hopefully a random person's support ticket gets to the dev team and they
deem it something that's in the realm of possibility. On a good note, already
got a positive response from the first-level support, so, fingers crossed. :D

-- 
You are receiving this mail because:
You are the assignee for the bug.


[Mesa-dev] [PATCH] anv: remove some unused macros and functions

2017-01-13 Thread Grazvydas Ignotas
VK_ICD_WSI_PLATFORM_MAX is used, but it duplicates the definition in wsi_common.h.

Signed-off-by: Grazvydas Ignotas 
---
no commit access
requested by Emil: 
https://lists.freedesktop.org/archives/mesa-dev/2017-January/140733.html

 src/intel/vulkan/anv_private.h | 15 ---
 src/intel/vulkan/anv_util.c| 19 ---
 2 files changed, 34 deletions(-)

diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
index 2342fcb..afa5cbb 100644
--- a/src/intel/vulkan/anv_private.h
+++ b/src/intel/vulkan/anv_private.h
@@ -95,9 +95,7 @@ extern "C" {
 #define MAX_PUSH_CONSTANTS_SIZE 128
 #define MAX_DYNAMIC_BUFFERS 16
 #define MAX_IMAGES 8
-#define MAX_SAMPLES_LOG2 4 /* SKL supports 16 samples */
 
-#define anv_noreturn __attribute__((__noreturn__))
 #define anv_printflike(a, b) __attribute__((__format__(__printf__, a, b)))
 
 static inline uint32_t
@@ -243,9 +241,6 @@ void anv_loge_v(const char *format, va_list va);
 #define anv_validate if (0)
 #endif
 
-void anv_abortf(const char *format, ...) anv_noreturn anv_printflike(1, 2);
-void anv_abortfv(const char *format, va_list va) anv_noreturn;
-
 #define stub_return(v) \
do { \
   anv_finishme("stub %s", __func__); \
@@ -495,8 +490,6 @@ struct anv_bo *anv_scratch_pool_alloc(struct anv_device *device,
 
 extern struct anv_dispatch_table dtable;
 
-#define VK_ICD_WSI_PLATFORM_MAX 5
-
 struct anv_physical_device {
 VK_LOADER_DATA  _loader_data;
 
@@ -1893,14 +1886,6 @@ ANV_DEFINE_NONDISP_HANDLE_CASTS(anv_render_pass, VkRenderPass)
 ANV_DEFINE_NONDISP_HANDLE_CASTS(anv_sampler, VkSampler)
 ANV_DEFINE_NONDISP_HANDLE_CASTS(anv_shader_module, VkShaderModule)
 
-#define ANV_DEFINE_STRUCT_CASTS(__anv_type, __VkType) \
-   \
-   static inline const __VkType * \
-   __anv_type ## _to_ ## __VkType(const struct __anv_type *__anv_obj) \
-   { \
-  return (const __VkType *) __anv_obj; \
-   }
-
 /* Gen-specific function declarations */
 #ifdef genX
 #  include "anv_genX.h"
diff --git a/src/intel/vulkan/anv_util.c b/src/intel/vulkan/anv_util.c
index 2972cd2..6408ac8 100644
--- a/src/intel/vulkan/anv_util.c
+++ b/src/intel/vulkan/anv_util.c
@@ -63,25 +63,6 @@ __anv_finishme(const char *file, int line, const char *format, ...)
    fprintf(stderr, "%s:%d: FINISHME: %s\n", file, line, buffer);
 }
 
-void anv_noreturn anv_printflike(1, 2)
-anv_abortf(const char *format, ...)
-{
-   va_list va;
-
-   va_start(va, format);
-   anv_abortfv(format, va);
-   va_end(va);
-}
-
-void anv_noreturn
-anv_abortfv(const char *format, va_list va)
-{
-   fprintf(stderr, "vk: error: ");
-   vfprintf(stderr, format, va);
-   fprintf(stderr, "\n");
-   abort();
-}
-
 VkResult
 __vk_errorf(VkResult error, const char *file, int line, const char *format, ...)
 {
-- 
2.7.4



Re: [Mesa-dev] [PATCH] anv: remove some unused macros and functions

2017-01-13 Thread Jason Ekstrand
Acked-by: Jason Ekstrand 

I'll push it.

On Fri, Jan 13, 2017 at 3:10 PM, Grazvydas Ignotas 
wrote:

> VK_ICD_WSI_PLATFORM_MAX is used, but a duplicate from wsi_common.h .
>
> Signed-off-by: Grazvydas Ignotas 
> ---
> no commit access
> requested by Emil:
> https://lists.freedesktop.org/archives/mesa-dev/2017-January/140733.html
>
>  src/intel/vulkan/anv_private.h | 15 ---
>  src/intel/vulkan/anv_util.c| 19 ---
>  2 files changed, 34 deletions(-)
>
> diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_
> private.h
> index 2342fcb..afa5cbb 100644
> --- a/src/intel/vulkan/anv_private.h
> +++ b/src/intel/vulkan/anv_private.h
> @@ -95,9 +95,7 @@ extern "C" {
>  #define MAX_PUSH_CONSTANTS_SIZE 128
>  #define MAX_DYNAMIC_BUFFERS 16
>  #define MAX_IMAGES 8
> -#define MAX_SAMPLES_LOG2 4 /* SKL supports 16 samples */
>
> -#define anv_noreturn __attribute__((__noreturn__))
>  #define anv_printflike(a, b) __attribute__((__format__(__printf__, a,
> b)))
>
>  static inline uint32_t
> @@ -243,9 +241,6 @@ void anv_loge_v(const char *format, va_list va);
>  #define anv_validate if (0)
>  #endif
>
> -void anv_abortf(const char *format, ...) anv_noreturn anv_printflike(1,
> 2);
> -void anv_abortfv(const char *format, va_list va) anv_noreturn;
> -
>  #define stub_return(v) \
> do { \
>anv_finishme("stub %s", __func__); \
> @@ -495,8 +490,6 @@ struct anv_bo *anv_scratch_pool_alloc(struct
> anv_device *device,
>
>  extern struct anv_dispatch_table dtable;
>
> -#define VK_ICD_WSI_PLATFORM_MAX 5
> -
>  struct anv_physical_device {
>  VK_LOADER_DATA  _loader_data;
>
> @@ -1893,14 +1886,6 @@ ANV_DEFINE_NONDISP_HANDLE_CASTS(anv_render_pass,
> VkRenderPass)
>  ANV_DEFINE_NONDISP_HANDLE_CASTS(anv_sampler, VkSampler)
>  ANV_DEFINE_NONDISP_HANDLE_CASTS(anv_shader_module, VkShaderModule)
>
> -#define ANV_DEFINE_STRUCT_CASTS(__anv_type, __VkType) \
> -   \
> -   static inline const __VkType * \
> -   __anv_type ## _to_ ## __VkType(const struct __anv_type *__anv_obj) \
> -   { \
> -  return (const __VkType *) __anv_obj; \
> -   }
> -
>  /* Gen-specific function declarations */
>  #ifdef genX
>  #  include "anv_genX.h"
> diff --git a/src/intel/vulkan/anv_util.c b/src/intel/vulkan/anv_util.c
> index 2972cd2..6408ac8 100644
> --- a/src/intel/vulkan/anv_util.c
> +++ b/src/intel/vulkan/anv_util.c
> @@ -63,25 +63,6 @@ __anv_finishme(const char *file, int line, const char
> *format, ...)
> fprintf(stderr, "%s:%d: FINISHME: %s\n", file, line, buffer);
>  }
>
> -void anv_noreturn anv_printflike(1, 2)
> -anv_abortf(const char *format, ...)
> -{
> -   va_list va;
> -
> -   va_start(va, format);
> -   anv_abortfv(format, va);
> -   va_end(va);
> -}
> -
> -void anv_noreturn
> -anv_abortfv(const char *format, va_list va)
> -{
> -   fprintf(stderr, "vk: error: ");
> -   vfprintf(stderr, format, va);
> -   fprintf(stderr, "\n");
> -   abort();
> -}
> -
>  VkResult
>  __vk_errorf(VkResult error, const char *file, int line, const char
> *format, ...)
>  {
> --
> 2.7.4
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>


[Mesa-dev] [PATCH] gallivm: correctly manage MCJIT at run-time

2017-01-13 Thread Emil Velikov
An earlier commit made the decision whether to use MCJIT a run-time one. At
the same time, it changed the code flow in the following manner:
 - LLVMLinkInMCJIT() was executed regardless of whether MCJIT is to be
used or not. Admittedly it is a no-op at least in some builds.
 - LLVMLinkInJIT() was executed regardless of whether MCJIT is to be
used or not.

Resolve that by promoting USE_MCJIT to be a static bool, always. Make sure
it's honoured and the correct LLVMLinkIn{MC,}JIT() function is called
only as needed.

Fixes: cf4105740f0 "gallivm: Make MCJIT a runtime option."
Cc: Zhen Wu 
Cc: Jose Fonseca 
Signed-off-by: Emil Velikov 
---
Jose, rather than jumping around like a headless chicken I've gone ahead
and fixed things... or maybe I broke them? AFAICT this preserves the
original behaviour and linking should be perfectly fine.

XXX: is it worth dropping the ALL_CAPS from the, now, variable name? Should
we squash it here or do it as a separate patch?

As an added bonus might even solve the issue Wu Zhen is hitting :-)
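
For anyone skimming the diff, the intended control flow is roughly the
following. This is only a sketch: parse_bool_option() is a simplified
stand-in for Mesa's debug_get_bool_option(), the jit_kind enum is
hypothetical, and the real code calls LLVMLinkInMCJIT() or LLVMLinkInJIT()
instead of returning a value.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical tag for the two engines; the real code links one of them in. */
enum jit_kind { JIT_LEGACY, JIT_MCJIT };

/* Simplified stand-in for debug_get_bool_option(): an unset option keeps
 * the default, "0"/"false"/"no" disable it, anything else enables it.
 */
bool
parse_bool_option(const char *value, bool dfault)
{
   if (value == NULL)
      return dfault;
   if (!strcmp(value, "0") || !strcmp(value, "false") || !strcmp(value, "no"))
      return false;
   return true;
}

/* The build-time default seeds the decision; the option may override it. */
enum jit_kind
choose_jit(bool build_default, const char *option_value)
{
   bool use_mcjit = parse_bool_option(option_value, build_default);

   return use_mcjit ? JIT_MCJIT : JIT_LEGACY;
}

/* Convenience wrapper reading the same variable name the patch uses. */
enum jit_kind
choose_jit_from_env(bool build_default)
{
   return choose_jit(build_default, getenv("GALLIVM_MCJIT"));
}
```

The point is that exactly one of the two link calls runs, whichever way the
option falls.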
---
 src/gallium/auxiliary/gallivm/lp_bld_init.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_init.c b/src/gallium/auxiliary/gallivm/lp_bld_init.c
index d1b2369f34..9a77c87ae4 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_init.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_init.c
@@ -45,9 +45,9 @@
 
 /* Only MCJIT is available as of LLVM SVN r216982 */
 #if HAVE_LLVM >= 0x0306
-#  define USE_MCJIT 1
+static bool USE_MCJIT = 1;
 #elif defined(PIPE_ARCH_PPC_64) || defined(PIPE_ARCH_S390) || defined(PIPE_ARCH_ARM) || defined(PIPE_ARCH_AARCH64)
-#  define USE_MCJIT 1
+static bool USE_MCJIT = 1;
 #else
 static bool USE_MCJIT = 0;
 #endif
@@ -395,11 +395,11 @@ lp_build_init(void)
    if (gallivm_initialized)
       return TRUE;
 
-   LLVMLinkInMCJIT();
-#if !defined(USE_MCJIT)
-   USE_MCJIT = debug_get_bool_option("GALLIVM_MCJIT", 0);
-   LLVMLinkInJIT();
-#endif
+   USE_MCJIT = debug_get_bool_option("GALLIVM_MCJIT", USE_MCJIT);
+   if (USE_MCJIT)
+  LLVMLinkInMCJIT();
+   else
+  LLVMLinkInJIT();
 
 #ifdef DEBUG
gallivm_debug = debug_get_option_gallivm_debug();
-- 
2.11.0



Re: [Mesa-dev] [PATCH] util: import sha1 implementation from OpenBSD

2017-01-13 Thread Vladislav Egorov

14.01.2017 01:45, Timothy Arceri wrote:

> I'm asking for a chance to test before we jump in. It's probably not a
> big deal and I may even still be able to reduce my use of hashing, but
> it would be nice to be given a few days to test and even explore
> alternatives before jumping on this implementation.

A very quick and dirty benchmark. I took shader-cache from GitHub, branch
shader-cache39, and applied my preprocessor patch on top (shader-cache
still runs the preprocessor even if the shader is cached, and it was
painful to see the preprocessor taking more than half of the whole time).
Then I compiled it once with OpenSSL and once with Emil's patch.

Full run on shader-db (300+ MB of shaders) with the shader cache warmed up:
it takes 78s and spends 0.27% in libcrypto. With the OpenBSD SHA1 it runs
in approximately the same time and spends 0.53% in SHA1Transform() and
other SHA1* functions.

Subtest, 46 MB of shaders from Total War: Attila: 3.10s (for some reason,
the cache works much faster on smaller subsets than on the full shader-db).
1.08% was spent in libcrypto, 1.04% in sha1_block_data_order_avx2(). With
OpenBSD: 3.07s, 2.27% in SHA1Transform() and other SHA1* functions.


Overall not that significant in the context of shader-cache but, as
expected, on Haswell it is about twice as slow as OpenSSL's AVX2
implementation.
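
To put the percentages in absolute terms, here is a small helper (the
function names are mine, not anything from Mesa or OpenSSL). Taking 78s for
both full runs, since the times were about the same, it gives roughly 0.21s
vs 0.41s of hashing on the full run, and about 0.03s vs 0.07s on the Attila
subtest.

```c
#include <assert.h>

/* Seconds attributed to a function that the profiler reports as taking
 * `percent` of a run that lasted `total_seconds`.
 */
double
seconds_in_function(double total_seconds, double percent)
{
   assert(total_seconds >= 0.0);
   assert(percent >= 0.0 && percent <= 100.0);
   return total_seconds * percent / 100.0;
}

/* How many times slower the candidate is than the baseline, comparing
 * absolute time rather than raw percentages.
 */
double
slowdown(double base_seconds, double base_percent,
         double cand_seconds, double cand_percent)
{
   double base = seconds_in_function(base_seconds, base_percent);
   double cand = seconds_in_function(cand_seconds, cand_percent);

   assert(base > 0.0);
   return cand / base;
}
```

For the subtest, slowdown(3.10, 1.04, 3.07, 2.27) comes out a little above
2, which is where the "twice" comes from.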



Re: [Mesa-dev] [PATCH 15/22] i965/vec4: fix SIMD-width lowering for double_to_single operation in IVB/VLV

2017-01-13 Thread Matt Turner
On Thu, Jan 5, 2017 at 5:07 AM, Samuel Iglesias Gonsálvez
 wrote:
> From: "Juan A. Suarez Romero" 
>
> When spliting double_to_single() in Ivybridge/Valleyview, the second
> part should use a temporal register, and then move the values to the
> second half of the original destiny, so we get all the results in the

Typo: destination

> same register.

Please change double_to_single() to VEC4_OPCODE_FROM_DOUBLE  (or just
FROM_DOUBLE) throughout.

> ---
>  src/mesa/drivers/dri/i965/brw_vec4.cpp   | 17 +
>  src/mesa/drivers/dri/i965/brw_vec4_generator.cpp |  1 +
>  2 files changed, 14 insertions(+), 4 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
> b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> index f533207..afabc22 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> @@ -2199,9 +2199,15 @@ vec4_visitor::lower_simd_width()
>   linst->group = channel_offset;
>   linst->size_written = size_written;
>
> + /* When spliting double_to_single() in Ivybridge, the second part

Typo: splitting

s/in/on/


Re: [Mesa-dev] [PATCH] util: import sha1 implementation from OpenBSD

2017-01-13 Thread Timothy Arceri
On Fri, 2017-01-13 at 13:59 -0800, Jason Ekstrand wrote:
> On Fri, Jan 13, 2017 at 11:22 AM, Vladislav Egorov wrote:
> > 13.01.2017 19:51, Emil Velikov wrote:
> > > From: Emil Velikov 
> > > 
> > > At the moment we support 5+ different implementations, each with a
> > > varying amount of bugs - from thread safety problems [1], to outright
> > > broken implementation(s) [2]
> > >
> > > In order to accommodate these we have 150+ lines of configure script
> > > and two extra configure toggles, whilst an actual implementation is
> > > ~200 loc and our current compat wrapping is ~250.
> 
> Yes, this is a problem.  Especially given that at least one of those
> implementations (openssl?) is something that a certain major game
> distributor likes to hard-link into things causing interesting and
> hard-to-debug problems.  I am all for getting rid of the "piles of
> different dependencies" approach.
> 
> Also, something I would like to see (maybe a follow-on patch?) would
> be a change to the mesa internal API to be able to put the SHA context
> on the stack and not need to malloc it.  It's not really a memory or
> cycle-saving thing so much as it leaves one fewer cleanup path you
> have to worry about.
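
To illustrate the kind of API shape meant here, a minimal sketch follows.
The names (struct hash_ctx, hash_init() and friends) are hypothetical and
not the current mesa-sha1 interface, and the mixing step is 32-bit FNV-1a
used purely as a short stand-in for SHA-1. What matters is that the caller
owns the context, so it can live on the stack.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical caller-owned context.  A real SHA-1 context would hold the
 * five state words, a byte count and a 64-byte block buffer instead.
 */
struct hash_ctx {
   uint32_t state;
};

void
hash_init(struct hash_ctx *ctx)
{
   ctx->state = 2166136261u;           /* FNV-1a offset basis */
}

void
hash_update(struct hash_ctx *ctx, const void *data, size_t len)
{
   const unsigned char *p = data;

   for (size_t i = 0; i < len; i++) {
      ctx->state ^= p[i];
      ctx->state *= 16777619u;         /* FNV-1a prime */
   }
}

uint32_t
hash_final(const struct hash_ctx *ctx)
{
   return ctx->state;
}

/* Example caller: the context is a local, so there is nothing to free on
 * any return path.
 */
uint32_t
hash_two_parts(const void *a, size_t a_len, const void *b, size_t b_len)
{
   struct hash_ctx ctx;

   hash_init(&ctx);
   hash_update(&ctx, a, a_len);
   hash_update(&ctx, b, b_len);
   return hash_final(&ctx);
}
```

With a malloc'ed context every early return in the caller needs a matching
free; with this shape the caller has no cleanup at all.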
>  
> > > Let's not forget that different people use different code paths,
> > > thus
> > > effectively makes it harder to test and debug since the default
> > > implementation is automatically detected.
> > > 
> > > To minimise all these lovely experiences, import the "100% Public
> > > Domain" OpenBSD sha1 implementation. Clearly document any changes
> > > needed
> > > to get building correctly, since many/most of those can be
> > > upstreamed
> > > making future syncs easier.
> > > 
> >  
> > It can hurt performance. OpenSSL implementation is optimized for
> > all thinkable architectures and it will use hardware SHA-1
> > instructions on newer CPUs. From
> > https://github.com/openssl/openssl/blob/master/crypto/sha/asm/sha1-x86_64.pl :
> > 
> > > Current performance is summarized in following table. Numbers are
> > > CPU clock cycles spent to process single byte (less is better).
> > >
> > >                 x86_64      SSSE3       AVX[2]
> > > P4              9.05        -
> > > Opteron         6.26        -
> > > Core2           6.55        6.05/+8%    -
> > > Westmere        6.73        5.30/+27%   -
> > > Sandy Bridge    7.70        6.10/+26%   4.99/+54%
> > > Ivy Bridge      6.06        4.67/+30%   4.60/+32%
> > > Haswell         5.45        4.15/+31%   3.57/+53%
> > > Skylake         5.18        4.06/+28%   3.54/+46%
> > > Bulldozer       9.11        5.95/+53%
> > > VIA Nano        9.32        7.15/+30%
> > > Atom            10.3        9.17/+12%
> > > Silvermont      13.1(*)     9.37/+40%
> > > Goldmont        8.13        6.42/+27%   1.70/+380%(**)
> > 
> > Quick benchmark on my Haswell of the OpenBSD implementation
> > compiled with GCC5 -O2: ~8 cycles per byte on 32-bit, ~7 cycles per
> > byte on 64-bit. But Haswell is a very powerful CPU, on weaker CPUs
> > the difference would be probably larger, especially on new CPUs
> > that have SHA instruction set.
> 
> Thanks for the numbers.  It sounds like, on Haswell, the openSSL
> implementation is about 2x as fast which is very useful to know. 
> However, this isn't on a super perf-critical path.  We never use SHA1
> on any draw-time paths; we always use a simpler hash function in
> those cases and reserve SHA1 for when we really don't want
> collisions.

Actually the OpenGL shader cache uses it at draw time to find cached
variants. I looked at pulling an implementation into Mesa a while ago
but found the perf drop wasn't worth it.

I really like the idea of having an internal implementation, but I don't
think we should dismiss performance so quickly. It would be nice if we
could hold this off until more testing can be done.

>   That said, it's a bit more critical than Emil makes it sound.  A
> typical Vulkan application may easily create 10k pipelines and each
> of those will involve hashing at least about 100B of data (not
> including the SPIR-V source).  I doubt, however, that this is enough to
> really cause a problem given how much other work goes into building a
> pipeline.
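
For a rough sense of scale: 10k pipelines at about 100B each is about 1MB
to hash. The helper below is only back-of-the-envelope arithmetic. The
3 GHz clock is my assumption, the ~7 cycles/byte figure is Vladislav's
OpenBSD measurement quoted above, and block padding and per-call overhead
are ignored.

```c
#include <assert.h>

/* Estimated total hashing time in milliseconds for a batch of pipelines,
 * given a cycles-per-byte figure and a CPU clock in Hz.
 */
double
hashing_cost_ms(double num_pipelines, double bytes_per_pipeline,
                double cycles_per_byte, double clock_hz)
{
   assert(clock_hz > 0.0);

   double cycles = num_pipelines * bytes_per_pipeline * cycles_per_byte;

   return cycles / clock_hz * 1000.0;
}
```

Under those assumptions hashing_cost_ms(10000, 100, 7.0, 3.0e9) is a bit
over 2ms in total.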
> 
> Unfortunately, the OpenSSL implementation, while fast, is one of the
> ones that is causing problems.  One of our favorite game distributors
> likes to hard-link against openssl in some of their games and/or
> libraries (not sure which).  This means that, if mesa tries to
> dynamically open libssl, you get mysterious crashes due to slight
> differences between the system-installed version and the one that has
> been linked into the game.  This makes trying to use the OpenSSL
> implementation a non-starter without being able to wholesale import
> the implementation.
> 
> Emil, I'm fine with this change.  I haven't reviewed the details, but
> my gut tells me we can eat the perf difference for now.  Consider
> that an Acked-by if you'd like but it would be good 

Re: [Mesa-dev] [PATCH 09/22] i965/fs: add lowering x2d step for IVB/VLV

2017-01-13 Thread Matt Turner
On Thu, Jan 5, 2017 at 5:07 AM, Samuel Iglesias Gonsálvez
 wrote:
> From: "Juan A. Suarez Romero" 
>
> On Ivybridge/Valleyview, when converting a float (F) to a double
> precision float (DF), the hardware automatically duplicates the source
> horizontal stride, hence converting only the values in odd positions.
>
> This commit adds a new lowering step, exclusively for IVB/VLV, where the
> sources are first copied in a temporal register with stride 2, and
> then converted from this temporal register. Thus, we do not lose any
> value.

Curro explained to me how he thinks the hardware works. I'll try to
reproduce that description here.

The FPU channels are 32 bits wide on IVB/BYT. Normally, for example
when operating on 8 float channels, each FPU channel is given a channel of
the source register to operate on, and each FPU channel produces a value
which is written to the corresponding channel of the destination.

But when operating on doubles, each *pair* of FPU channels operates on
one (double-precision) value. Unfortunately the hardware designers
didn't seem to update the input and output logic, so for instance
every pair of float channels from the source region is given as input
to the FPU, even though only the low (or even-numbered) channel will
be used. This is why it appears that the hardware doubles the stride,
but it's really just ignoring all of the odd channels.

A similar thing happens on output. The output elements are 64 bits wide
(even if the output type is float), and so a destination stride of 1
means the writes are strided by 64 bits. This explains the strange-looking
behavior you discovered of an instruction like mov(8) gX<1>F
gY<8,8,1>DF.

With that understanding, we actually can read consecutive float
channels and convert them to doubles in one instruction -- by using a
<1,2,0> region. Each float channel is read twice, and the second read
will be ignored by the FPU.
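
A toy model of that description, in case it helps. It is a sketch of the
behaviour as described above, not of the real hardware: struct region and
both functions are made up for illustration, and indices are in units of
32-bit floats.

```c
#include <assert.h>

/* Source region <vstride,width,hstride>, in units of elements. */
struct region {
   unsigned vstride;
   unsigned width;
   unsigned hstride;
};

/* Index of the source element fetched for channel `ch` of the region. */
unsigned
region_element(struct region r, unsigned ch)
{
   assert(r.width > 0);
   return (ch / r.width) * r.vstride + (ch % r.width) * r.hstride;
}

/* Model of F->DF on IVB/BYT as described: for n doubles the region is
 * walked for 2*n float channels, but only the even-numbered channel of
 * each pair is actually converted.
 */
void
convert_f_to_df_model(struct region r, const float *src, double *dst,
                      unsigned n)
{
   for (unsigned i = 0; i < n; i++)
      dst[i] = (double) src[region_element(r, 2 * i)];
}
```

With src = {0, 1, ..., 7} and n = 4, a <4,4,1> region converts elements
0, 2, 4, 6, while <1,2,0> converts 0, 1, 2, 3.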

So we can replace this patch with the one I have attached. A nice side
effect of this is that we can simplify VEC4_OPCODE_TO_DOUBLE.

From 5637d035982c89415d47be119ad6a1c9d2e14e42 Mon Sep 17 00:00:00 2001
From: Matt Turner 
Date: Thu, 12 Jan 2017 18:05:58 -0800
Subject: [PATCH] i965: Use source region <1,2,0> when converting to DF.

Doing so allows us to use a single MOV in VEC4_OPCODE_TO_DOUBLE instead
of two.
---
 src/mesa/drivers/dri/i965/brw_eu_emit.c  | 28 +++-
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 13 +--
 2 files changed, 28 insertions(+), 13 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c
index 5f81b7a..ebdd557 100644
--- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
@@ -1101,7 +1101,6 @@ void brw_##OP(struct brw_codegen *p,	  \
 }
 
 
-ALU1(MOV)
 ALU2(SEL)
 ALU1(NOT)
 ALU2(AND)
@@ -1135,6 +1134,33 @@ ALU2(SUBB)
 ROUND(RNDZ)
 ROUND(RNDE)
 
+brw_inst *
+brw_MOV(struct brw_codegen *p, struct brw_reg dest, struct brw_reg src0)
+{
+   const struct gen_device_info *devinfo = p->devinfo;
+
+   /* When converting F->DF on IVB/BYT, every odd source channel is ignored.
+    * To avoid the problems that causes, we use a <1,2,0> source region to read
+    * each element twice.
+    */
+   if (devinfo->gen == 7 && !devinfo->is_haswell &&
+       brw_inst_access_mode(devinfo, p->current) == BRW_ALIGN_1 &&
+       dest.type == BRW_REGISTER_TYPE_DF &&
+       (src0.type == BRW_REGISTER_TYPE_F ||
+        src0.type == BRW_REGISTER_TYPE_D ||
+        src0.type == BRW_REGISTER_TYPE_UD) &&
+       !has_scalar_region(src0)) {
+      assert(src0.vstride == BRW_VERTICAL_STRIDE_4 &&
+             src0.width == BRW_WIDTH_4 &&
+             src0.hstride == BRW_HORIZONTAL_STRIDE_1);
+
+  src0.vstride = BRW_VERTICAL_STRIDE_1;
+  src0.width = BRW_WIDTH_2;
+  src0.hstride = BRW_HORIZONTAL_STRIDE_0;
+   }
+
+   return brw_alu1(p, BRW_OPCODE_MOV, dest, src0);
+}
 
 brw_inst *
 brw_ADD(struct brw_codegen *p, struct brw_reg dest,
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
index f68baab..847a01b 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
@@ -1958,18 +1958,7 @@ generate_code(struct brw_codegen *p,
 
  brw_set_default_access_mode(p, BRW_ALIGN_1);
 
- struct brw_reg tmp = retype(dst, src[0].type);
- tmp.hstride = BRW_HORIZONTAL_STRIDE_2;
- tmp.width = BRW_WIDTH_4;
- src[0].vstride = BRW_VERTICAL_STRIDE_4;
- src[0].hstride = BRW_HORIZONTAL_STRIDE_1;
- src[0].width = BRW_WIDTH_4;
- brw_MOV(p, tmp, src[0]);
-
- tmp.vstride = BRW_VERTICAL_STRIDE_8;
- tmp.hstride = BRW_HORIZONTAL_STRIDE_2;
- tmp.width = BRW_WIDTH_4;
- brw_MOV(p, dst, tmp);
+ brw_MOV(p, dst, src[0]);
 
  brw_set_default_access_mode(p, 

Re: [Mesa-dev] [PATCH] util: import sha1 implementation from OpenBSD

2017-01-13 Thread Jason Ekstrand
On Fri, Jan 13, 2017 at 2:18 PM, Timothy Arceri <
timothy.arc...@collabora.com> wrote:

> On Fri, 2017-01-13 at 13:59 -0800, Jason Ekstrand wrote:
> > On Fri, Jan 13, 2017 at 11:22 AM, Vladislav Egorov wrote:
> > > 13.01.2017 19:51, Emil Velikov wrote:
> > > > From: Emil Velikov 
> > > >
> > > > At the moment we support 5+ different implementations, each with a
> > > > varying amount of bugs - from thread safety problems [1], to
> > > > outright broken implementation(s) [2]
> > > >
> > > > In order to accommodate these we have 150+ lines of configure
> > > > script and two extra configure toggles, whilst an actual
> > > > implementation is ~200 loc and our current compat wrapping is ~250.
> >
> > Yes, this is a problem.  Especially given that at least one of those
> > implementations (openssl?) is something that a certain major game
> > distributor likes to hard-link into things causing interesting and
> > hard-to-debug problems.  I am all for getting rid of the "piles of
> > different dependencies" approach.
> >
> > Also, something I would like to see (maybe a follow-on patch?) would
> > be a change to the mesa internal API to be able to put the SHA context
> > on the stack and not need to malloc it.  It's not really a memory or
> > cycle-saving thing so much as it leaves one fewer cleanup path you
> > have to worry about.
> >
> > > > Let's not forget that different people use different code paths,
> > > > thus
> > > > effectively makes it harder to test and debug since the default
> > > > implementation is automatically detected.
> > > >
> > > > To minimise all these lovely experiences, import the "100% Public
> > > > Domain" OpenBSD sha1 implementation. Clearly document any changes
> > > > needed
> > > > to get building correctly, since many/most of those can be
> > > > upstreamed
> > > > making future syncs easier.
> > > >
> > >
> > > It can hurt performance. OpenSSL implementation is optimized for
> > > all thinkable architectures and it will use hardware SHA-1
> > > instructions on newer CPUs. From
> > > https://github.com/openssl/openssl/blob/master/crypto/sha/asm/sha1-x86_64.pl :
> > >
> > > > Current performance is summarized in following table. Numbers are
> > > > CPU clock cycles spent to process single byte (less is better).
> > > >
> > > >                 x86_64      SSSE3       AVX[2]
> > > > P4              9.05        -
> > > > Opteron         6.26        -
> > > > Core2           6.55        6.05/+8%    -
> > > > Westmere        6.73        5.30/+27%   -
> > > > Sandy Bridge    7.70        6.10/+26%   4.99/+54%
> > > > Ivy Bridge      6.06        4.67/+30%   4.60/+32%
> > > > Haswell         5.45        4.15/+31%   3.57/+53%
> > > > Skylake         5.18        4.06/+28%   3.54/+46%
> > > > Bulldozer       9.11        5.95/+53%
> > > > VIA Nano        9.32        7.15/+30%
> > > > Atom            10.3        9.17/+12%
> > > > Silvermont      13.1(*)     9.37/+40%
> > > > Goldmont        8.13        6.42/+27%   1.70/+380%(**)
> > >
> > > Quick benchmark on my Haswell of the OpenBSD implementation
> > > compiled with GCC5 -O2: ~8 cycles per byte on 32-bit, ~7 cycles per
> > > byte on 64-bit. But Haswell is a very powerful CPU, on weaker CPUs
> > > the difference would be probably larger, especially on new CPUs
> > > that have SHA instruction set.
> >
> > Thanks for the numbers.  It sounds like, on Haswell, the openSSL
> > implementation is about 2x as fast which is very useful to know.
> > However, this isn't on a super perf-critical path.  We never use SHA1
> > on any draw-time paths; we always use a simpler hash function in
> > those cases and reserve SHA1 for when we really don't want
> > collisions.
>
> Actually the OpenGL shader cache uses it at draw time to find cached
> variants. I looked at pulling an implementation into Mesa a while ago
> but found the perf drop wasn't worth it.
>

Why doesn't the usual in-memory cache stand as a front-line defense?  Could
you please be more specific about the perf implications you've seen?  Also,
which implementation were you linking to that was so much faster?


> I really like the idea of having an internal implementation but I don't
> think we should dismiss performance so quickly. It would be nice if we
> could hold this off until more testing can be done.
>
> >   That said, it's a bit more critical than Emil makes it sound.  A
> > typical Vulkan application may easily create 10k pipelines and each
> > of those will involve hashing at least about 100B of data (not
> > including the SPIR-V source).  I doubt, however, that this is enough to
> > really cause a problem given how much other work goes into building a
> > pipeline.
> >
> > Unfortunately, the OpenSSL implementation, while fast, is one of the
> > ones that is causing problems.  One of our favorite game distributors
> > likes to hard-link against openssl in some of their games and/or
> > libraries (not sure which).  This means that, if mesa tries to
> > 

Re: [Mesa-dev] [PATCH 14/22] i965/vec4: fix double_to_single() for IVB/VLV

2017-01-13 Thread Matt Turner
On Thu, Jan 5, 2017 at 5:07 AM, Samuel Iglesias Gonsálvez
 wrote:
> From: "Juan A. Suarez Romero" 
>
> In the generator we must generate slightly different code for
> Ivybridge/Valleview, because of the way the stride works in
> this hardware.
> ---
>  src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 26 
> +---
>  1 file changed, 23 insertions(+), 3 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp 
> b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
> index 0eaa91b..a68e14c 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
> @@ -1936,13 +1936,28 @@ generate_code(struct brw_codegen *p,
>
>   brw_set_default_access_mode(p, BRW_ALIGN_1);
>
> - dst.hstride = BRW_HORIZONTAL_STRIDE_2;
> + /* When converting from DF->F, we set destination's stride as 2 as 
> an
> +  * aligment requirement. But in IVB/VLV, each DF implicitly writes

Typo: alignment

> +  * two floats, being the first one the converted value. So we don't
> +  * need to explicitly set stride 2, but 1.
> +  */
> + if (devinfo->gen == 7 && !devinfo->is_haswell)
> +dst.hstride = BRW_HORIZONTAL_STRIDE_1;
> + else
> +dst.hstride = BRW_HORIZONTAL_STRIDE_2;
> +
>   dst.width = BRW_WIDTH_4;
>   src[0].vstride = BRW_VERTICAL_STRIDE_4;
>   src[0].width = BRW_WIDTH_4;
>   brw_MOV(p, dst, src[0]);
>
>   struct brw_reg dst_as_src = dst;
> + /* As we have set horizontal stride 1 instead of 2 in IVB/VLV, we
> +  * need to fix it here to have the expected value.
> +  */
> + if (devinfo->gen == 7 && !devinfo->is_haswell)
> +dst_as_src.hstride = BRW_HORIZONTAL_STRIDE_2;
> +
>   dst.hstride = BRW_HORIZONTAL_STRIDE_1;
>   dst.width = BRW_WIDTH_8;
>   brw_MOV(p, dst, dst_as_src);
> @@ -1965,8 +1980,13 @@ generate_code(struct brw_codegen *p,
>   src[0].width = BRW_WIDTH_4;
>   brw_MOV(p, tmp, src[0]);
>
> - tmp.vstride = BRW_VERTICAL_STRIDE_8;
> - tmp.hstride = BRW_HORIZONTAL_STRIDE_2;
> + if (devinfo->gen == 7 && !devinfo->is_haswell) {
> +tmp.vstride = BRW_VERTICAL_STRIDE_4;
> +tmp.hstride = BRW_HORIZONTAL_STRIDE_1;
> + } else {
> +tmp.vstride = BRW_VERTICAL_STRIDE_8;
> +tmp.hstride = BRW_HORIZONTAL_STRIDE_2;
> + }

With the patch I sent to replace 09/22, there should be no changes
needed to VEC4_OPCODE_TO_DOUBLE. :)

Please change double_to_single() to VEC4_OPCODE_FROM_DOUBLE in the title.
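
The destination-stride reasoning in the quoted comment can be modelled like
this. It is a sketch under the assumption stated in this thread that
IVB/VLV writes 64-bit destination elements during DF->F; the helper names
are hypothetical.

```c
#include <stdbool.h>

/* Float slot (32-bit units) written by channel `ch`.  On IVB/VLV-like
 * hardware each destination element is assumed to be 64 bits wide during
 * DF->F, so a stride of 1 already skips every other float.
 */
unsigned
dst_float_slot(unsigned ch, unsigned hstride, bool ivb_like)
{
   unsigned floats_per_element = ivb_like ? 2 : 1;

   return ch * hstride * floats_per_element;
}

/* Destination hstride the quoted hunk picks for the first MOV. */
unsigned
from_double_dst_hstride(bool ivb_like)
{
   return ivb_like ? 1 : 2;
}
```

Either way the converted floats land in slots 0, 2, 4, 6, which is why the
second MOV reads them back with a stride of 2.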


Re: [Mesa-dev] [PATCH] util: import sha1 implementation from OpenBSD

2017-01-13 Thread Timothy Arceri
On Fri, 2017-01-13 at 14:32 -0800, Jason Ekstrand wrote:
> On Fri, Jan 13, 2017 at 2:18 PM, Timothy Arceri wrote:
> > On Fri, 2017-01-13 at 13:59 -0800, Jason Ekstrand wrote:
> > > On Fri, Jan 13, 2017 at 11:22 AM, Vladislav Egorov wrote:
> > > > 13.01.2017 19:51, Emil Velikov wrote:
> > > > > From: Emil Velikov 
> > > > >
> > > > > At the moment we support 5+ different implementations, each with a
> > > > > varying amount of bugs - from thread safety problems [1], to
> > > > > outright broken implementation(s) [2]
> > > > >
> > > > > In order to accommodate these we have 150+ lines of configure
> > > > > script and two extra configure toggles, whilst an actual
> > > > > implementation is ~200 loc and our current compat wrapping is ~250.
> > >
> > > Yes, this is a problem.  Especially given that at least one of those
> > > implementations (openssl?) is something that a certain major game
> > > distributor likes to hard-link into things causing interesting and
> > > hard-to-debug problems.  I am all for getting rid of the "piles of
> > > different dependencies" approach.
> > >
> > > Also, something I would like to see (maybe a follow-on patch?) would
> > > be a change to the mesa internal API to be able to put the SHA context
> > > on the stack and not need to malloc it.  It's not really a memory or
> > > cycle-saving thing so much as it leaves one fewer cleanup path you
> > > have to worry about.
> > >  
> > > > > Let's not forget that different people use different code
> > paths,
> > > > > thus
> > > > > effectively makes it harder to test and debug since the
> > default
> > > > > implementation is automatically detected.
> > > > >
> > > > > To minimise all these lovely experiences, import the "100%
> > Public
> > > > > Domain" OpenBSD sha1 implementation. Clearly document any
> > changes
> > > > > needed
> > > > > to get building correctly, since many/most of those can be
> > > > > upstreamed
> > > > > making future syncs easier.
> > > > >
> > > >  
> > > > It can hurt performance. OpenSSL implementation is optimized for
> > > > all thinkable architectures and it will use hardware SHA-1
> > > > instructions on newer CPUs. From
> > > > https://github.com/openssl/openssl/blob/master/crypto/sha/asm/sha1-x86_64.pl :
> > > >
> > > > > Current performance is summarized in following table. Numbers
> > are
> > > > > CPU clock cycles spent to process single byte (less is
> > better).
> > > > >
> > > > >                 x86_64      SSSE3       AVX[2]
> > > > > P4              9.05        -
> > > > > Opteron         6.26        -
> > > > > Core2           6.55        6.05/+8%    -
> > > > > Westmere        6.73        5.30/+27%   -
> > > > > Sandy Bridge    7.70        6.10/+26%   4.99/+54%
> > > > > Ivy Bridge      6.06        4.67/+30%   4.60/+32%
> > > > > Haswell         5.45        4.15/+31%   3.57/+53%
> > > > > Skylake         5.18        4.06/+28%   3.54/+46%
> > > > > Bulldozer       9.11        5.95/+53%
> > > > > VIA Nano        9.32        7.15/+30%
> > > > > Atom            10.3        9.17/+12%
> > > > > Silvermont      13.1(*)     9.37/+40%
> > > > > Goldmont        8.13        6.42/+27%   1.70/+380%(**)
> > > >
> > > > Quick benchmark on my Haswell of the OpenBSD implementation
> > > > compiled with GCC5 -O2: ~8 cycles per byte on 32-bit, ~7 cycles
> > per
> > > > byte on 64-bit. But Haswell is a very powerful CPU, on weaker
> > CPUs
> > > > the difference would be probably larger, especially on new CPUs
> > > > that have SHA instruction set.
> > >
> > > Thanks for the numbers.  It sounds like, on Haswell, the openSSL
> > > implementation is about 2x as fast which is very useful to know. 
> > > However, this isn't on a super perf-critical path.  We never use
> > SHA1
> > > on any draw-time paths; we always use a simpler hash function in
> > > those cases and reserve SHA1 for when we really don't want
> > > collisions.
> > 
> > Actually the OpenGL shader cache uses it at draw time to find cached
> > variants. I looked at pulling an implementation into Mesa a while ago
> > but found the perf drop wasn't worth it.
> 
> Why doesn't the usual in-memory cache stand as a front-line defense?

It does :)

>   Could you please be more specific about the perf implications
> you've seen? 

I'm asking for a chance to test before we jump in. It's probably not a
big deal and I may even still be able to reduce my use of hashing, but
it would be nice to be given a few days to test and even explore
alternatives before jumping on this implementation.

>  Also, which implementation were you linking to that was so much
> faster?

I didn't test the OpenBSD implementation; I tried another small
implementation that claimed it was fast. Pretty much any of the
available libraries were much faster, as you would expect from something
that has been tweaked over the years.

>  
> > I really like the idea of 

Re: [Mesa-dev] [PATCH 3/3] radv: make device extension setup dynamic

2017-01-13 Thread Bas Nieuwenhuizen
On Sat, Jan 14, 2017 at 12:20 AM, Andres Rodriguez  wrote:
>
>
> On 2017-01-13 06:04 PM, Bas Nieuwenhuizen wrote:
>>
>> On Fri, Jan 13, 2017 at 11:06 PM, Andres Rodriguez 
>> wrote:
>>>
>>> Each physical may have different extensions than one another.
>>> Furthermore, depending on the software stack, some extensions may not be
>>> accessible.
>>>
>>> If an extension is conditional, it can be registered only when
>>> necessary.
>>>
>>> Signed-off-by: Andres Rodriguez 
>>> ---
>>>   src/amd/vulkan/radv_device.c  | 196
>>> --
>>>   src/amd/vulkan/radv_private.h |   6 ++
>>>   2 files changed, 137 insertions(+), 65 deletions(-)
>>>
>>> diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
>>> index 5669fd7..0333688 100644
>>> --- a/src/amd/vulkan/radv_device.c
>>> +++ b/src/amd/vulkan/radv_device.c
>>> @@ -77,6 +77,115 @@ radv_device_get_cache_uuid(enum radeon_family family,
>>> void *uuid)
>>>  return 0;
>>>   }
>>>
>>> +static const VkExtensionProperties instance_extensions[] = {
>>> +   {
>>> +   .extensionName = VK_KHR_SURFACE_EXTENSION_NAME,
>>> +   .specVersion = 25,
>>> +   },
>>> +#ifdef VK_USE_PLATFORM_XCB_KHR
>>> +   {
>>> +   .extensionName = VK_KHR_XCB_SURFACE_EXTENSION_NAME,
>>> +   .specVersion = 6,
>>> +   },
>>> +#endif
>>> +#ifdef VK_USE_PLATFORM_XLIB_KHR
>>> +   {
>>> +   .extensionName = VK_KHR_XLIB_SURFACE_EXTENSION_NAME,
>>> +   .specVersion = 6,
>>> +   },
>>> +#endif
>>> +#ifdef VK_USE_PLATFORM_WAYLAND_KHR
>>> +   {
>>> +   .extensionName = VK_KHR_WAYLAND_SURFACE_EXTENSION_NAME,
>>> +   .specVersion = 5,
>>> +   },
>>> +#endif
>>> +};
>>> +
>>> +static const VkExtensionProperties common_device_extensions[] = {
>>> +   {
>>> +   .extensionName =
>>> VK_KHR_SAMPLER_MIRROR_CLAMP_TO_EDGE_EXTENSION_NAME,
>>> +   .specVersion = 1,
>>> +   },
>>> +   {
>>> +   .extensionName = VK_KHR_SWAPCHAIN_EXTENSION_NAME,
>>> +   .specVersion = 68,
>>> +   },
>>> +   {
>>> +   .extensionName =
>>> VK_AMD_DRAW_INDIRECT_COUNT_EXTENSION_NAME,
>>> +   .specVersion = 1,
>>> +   },
>>> +   {
>>> +   .extensionName =
>>> VK_AMD_NEGATIVE_VIEWPORT_HEIGHT_EXTENSION_NAME,
>>> +   .specVersion = 1,
>>> +   },
>>> +};
>>> +
>>> +static VkResult
>>> +radv_extensions_register(struct radv_instance *instance,
>>> +   struct radv_extensions
>>> *extensions,
>>> +   const VkExtensionProperties
>>> *new_ext,
>>> +   uint32_t num_ext)
>>> +{
>>> +   size_t new_size;
>>> +   VkExtensionProperties *new_ptr;
>>> +
>>> +   assert(new_ext && num_ext > 0);
>>> +
>>> +   if (!new_ext)
>>> +   return VK_ERROR_INITIALIZATION_FAILED;
>>> +
>>> +   new_size = (extensions->num_ext + num_ext) *
>>> sizeof(VkExtensionProperties);
>>> +   new_ptr = vk_realloc(&instance->alloc, extensions->ext_array,
>>> +   new_size, 8,
>>> VK_SYSTEM_ALLOCATION_SCOPE_INSTANCE);
>>> +
>>> +   /* Old array continues to be valid, update nothing */
>>> +   if (!new_ptr)
>>> +   return VK_ERROR_OUT_OF_HOST_MEMORY;
>>> +
>>> +   memcpy(&new_ptr[extensions->num_ext], new_ext,
>>> +   num_ext * sizeof(VkExtensionProperties));
>>> +   extensions->ext_array = new_ptr;
>>> +   extensions->num_ext += num_ext;
>>> +
>>> +   return VK_SUCCESS;
>>> +}
>>> +
>>> +#define radv_extensions_register_single(instance, extensions, name,
>>> version) \
>>> +   radv_extensions_register(instance, extensions, \
>>> +
>>> &(VkExtensionProperties){ \
>>> +
>>> .extensionName = name, \
>>> +
>>> .specVersion = version \
>>> +   }, 1);
>>
>> Please make this a function, I see no reason to keep this a macro. Or
>> lose it, as I can't find an user in this patch.
>>>
>>> +
>>> +static void
>>> +radv_extensions_finish(struct radv_instance *instance,
>>> +   struct radv_extensions
>>> *extensions)
>>> +{
>>> +   assert(extensions);
>>> +
>>> +   if (!extensions)
>>> +   radv_loge("Attemted to free invalid extension struct\n");
>>> +
>>> +   if (extensions->ext_array)
>>> +   vk_free(&instance->alloc, extensions->ext_array);
>>> +}
>>> +
>>> +static bool
>>> +is_extension_enabled(const VkExtensionProperties *extensions,
>>> +   size_t num_ext,
>>> +   const char *name)
>>> +{
>>> +   assert(extensions && name);
>>> +
>>> +   for (uint32_t i = 0; i < num_ext; 

Re: [Mesa-dev] [PATCH 3/3] radv: make device extension setup dynamic

2017-01-13 Thread Bas Nieuwenhuizen
On Sat, Jan 14, 2017 at 12:34 AM, Andres Rodriguez  wrote:
>
>
> On 2017-01-13 06:30 PM, Bas Nieuwenhuizen wrote:
>>
>> On Sat, Jan 14, 2017 at 12:20 AM, Andres Rodriguez 
>> wrote:
>>>
>>>
>>> On 2017-01-13 06:04 PM, Bas Nieuwenhuizen wrote:

>>>> On Fri, Jan 13, 2017 at 11:06 PM, Andres Rodriguez
>>>> wrote:
>
> Each physical may have different extensions than one another.
> Furthermore, depending on the software stack, some extensions may not
> be
> accessible.
>
> If an extension is conditional, it can be registered only when
> necessary.
>
> Signed-off-by: Andres Rodriguez 
> ---
>src/amd/vulkan/radv_device.c  | 196
> --
>src/amd/vulkan/radv_private.h |   6 ++
>2 files changed, 137 insertions(+), 65 deletions(-)
>
> diff --git a/src/amd/vulkan/radv_device.c
> b/src/amd/vulkan/radv_device.c
> index 5669fd7..0333688 100644
> --- a/src/amd/vulkan/radv_device.c
> +++ b/src/amd/vulkan/radv_device.c
> @@ -77,6 +77,115 @@ radv_device_get_cache_uuid(enum radeon_family
> family,
> void *uuid)
>   return 0;
>}
>
> +static const VkExtensionProperties instance_extensions[] = {
> +   {
> +   .extensionName = VK_KHR_SURFACE_EXTENSION_NAME,
> +   .specVersion = 25,
> +   },
> +#ifdef VK_USE_PLATFORM_XCB_KHR
> +   {
> +   .extensionName = VK_KHR_XCB_SURFACE_EXTENSION_NAME,
> +   .specVersion = 6,
> +   },
> +#endif
> +#ifdef VK_USE_PLATFORM_XLIB_KHR
> +   {
> +   .extensionName = VK_KHR_XLIB_SURFACE_EXTENSION_NAME,
> +   .specVersion = 6,
> +   },
> +#endif
> +#ifdef VK_USE_PLATFORM_WAYLAND_KHR
> +   {
> +   .extensionName = VK_KHR_WAYLAND_SURFACE_EXTENSION_NAME,
> +   .specVersion = 5,
> +   },
> +#endif
> +};
> +
> +static const VkExtensionProperties common_device_extensions[] = {
> +   {
> +   .extensionName =
> VK_KHR_SAMPLER_MIRROR_CLAMP_TO_EDGE_EXTENSION_NAME,
> +   .specVersion = 1,
> +   },
> +   {
> +   .extensionName = VK_KHR_SWAPCHAIN_EXTENSION_NAME,
> +   .specVersion = 68,
> +   },
> +   {
> +   .extensionName =
> VK_AMD_DRAW_INDIRECT_COUNT_EXTENSION_NAME,
> +   .specVersion = 1,
> +   },
> +   {
> +   .extensionName =
> VK_AMD_NEGATIVE_VIEWPORT_HEIGHT_EXTENSION_NAME,
> +   .specVersion = 1,
> +   },
> +};
> +
> +static VkResult
> +radv_extensions_register(struct radv_instance *instance,
> +   struct radv_extensions
> *extensions,
> +   const VkExtensionProperties
> *new_ext,
> +   uint32_t num_ext)
> +{
> +   size_t new_size;
> +   VkExtensionProperties *new_ptr;
> +
> +   assert(new_ext && num_ext > 0);
> +
> +   if (!new_ext)
> +   return VK_ERROR_INITIALIZATION_FAILED;
> +
> +   new_size = (extensions->num_ext + num_ext) *
> sizeof(VkExtensionProperties);
> +   new_ptr = vk_realloc(&instance->alloc, extensions->ext_array,
> +   new_size, 8,
> VK_SYSTEM_ALLOCATION_SCOPE_INSTANCE);
> +
> +   /* Old array continues to be valid, update nothing */
> +   if (!new_ptr)
> +   return VK_ERROR_OUT_OF_HOST_MEMORY;
> +
> +   memcpy(&new_ptr[extensions->num_ext], new_ext,
> +   num_ext * sizeof(VkExtensionProperties));
> +   extensions->ext_array = new_ptr;
> +   extensions->num_ext += num_ext;
> +
> +   return VK_SUCCESS;
> +}
> +
> +#define radv_extensions_register_single(instance, extensions, name,
> version) \
> +   radv_extensions_register(instance, extensions, \
> +
> &(VkExtensionProperties){ \
> +
> .extensionName = name, \
> +
> .specVersion = version \
> +   }, 1);

>>>> Please make this a function, I see no reason to keep this a macro. Or
>>>> lose it, as I can't find an user in this patch.
>
> +
> +static void
> +radv_extensions_finish(struct radv_instance *instance,
> +   struct radv_extensions
> *extensions)
> +{
> +   assert(extensions);
> +
> +   if (!extensions)
> +   radv_loge("Attemted to free invalid extension
> 

Re: [Mesa-dev] [PATCH 17/22] i965/vec4: fix register_coalesce() for partial writes

2017-01-13 Thread Matt Turner
On Thu, Jan 5, 2017 at 5:07 AM, Samuel Iglesias Gonsálvez
 wrote:
> From: "Juan A. Suarez Romero" 
>
> When lowering double_to_single() we added a final mov() that puts 32-bit

I can't confirm that this patch is necessary in the current
i965-fp64-gen7-ivb-scalar-vec4-rc2 branch. It passes Jenkins with it
reverted.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965: Move Gen4-5 interpolation stuff to brw_wm_prog_data.

2017-01-13 Thread Timothy Arceri
On Fri, 2017-01-13 at 16:07 -0800, Kenneth Graunke wrote:
> This fixes glxgears rendering, which had surprisingly been broken
> since
> late October!  Specifically, commit
> 91d61fbf7cb61a44adcaae51ee08ad0dd6b.
> 
> glxgears uses glShadeModel(GL_FLAT) when drawing the main portion of
> the
> gears, then uses glShadeModel(GL_SMOOTH) for drawing the Gouraud-
> shaded
> inner portion of the gears.  This results in the same fragment
> program
> having two different state-dependent interpolation maps: one where
> gl_Color is flat, and another where it's smooth.
> 
> The problem is that there's only one gen4_fragment_program, so it
> can't
> store both.  Each FS compile would trash the last one.  But, the FS
> compiles are cached, so the first one would store FLAT, and the
> second
> would see a matching program in the cache and never bother to compile
> one with SMOOTH.  (Clearing the program cache on every draw made it
> render correctly.)

I believe the fs key should have caused the one with SMOOTH to be
compiled, but since variants share a gl_program, the interpolation state
would be set to whatever the last compiled variant had set, and would
never change again because later draws grab the variants from the cache.
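
To make the failure mode concrete, here is a toy model of it. This is only
a sketch: struct program, struct variant and get_variant() are hypothetical
stand-ins for the gl_program / brw_wm_prog_data split, not the real i965
structures, and the "cache" is just a two-entry array keyed by shade model.

```c
#include <stdbool.h>
#include <stddef.h>

enum interp { INTERP_FLAT, INTERP_SMOOTH };

/* Per-variant compile output (hypothetical stand-in for brw_wm_prog_data). */
struct variant {
   bool valid;
   enum interp interp_mode;
};

/* One program object shared by all of its variants (hypothetical stand-in
 * for the gl_program / gen4_fragment_program pair). */
struct program {
   enum interp shared_interp_mode;   /* where the state lived before the patch */
   struct variant cache[2];          /* variant cache, keyed by shade model */
};

/* Returns the variant for `key`, compiling it on a cache miss.  Only a
 * compile writes the shared field; a cache hit leaves it untouched, which
 * is the staleness being modelled here. */
static const struct variant *
get_variant(struct program *prog, enum interp key)
{
   struct variant *v = &prog->cache[key];

   if (!v->valid) {
      v->valid = true;
      v->interp_mode = key;             /* per-variant copy: always right */
      prog->shared_interp_mode = key;   /* shared copy: last compile wins */
   }
   return v;
}
```

Requesting FLAT, then SMOOTH, then FLAT again leaves the shared field at
SMOOTH while the FLAT variant is being used, whereas the per-variant field
still says FLAT. That is why keeping the data next to each variant avoids
the problem.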

Thanks so much for fixing this. It would have been tricky without the
hardware :)

Reviewed-by: Timothy Arceri 


> 
> Instead, move it to brw_wm_prog_data, where we can keep a copy for
> every specialization of the program.  The only downside is bloating
> the structure a bit, but we can tighten that up a bit if we need to.
> This also lets us kill gen4_fragment_program entirely!
> 
> Signed-off-by: Kenneth Graunke 
> Cc: Timothy Arceri 
> ---
>  src/mesa/drivers/dri/i965/brw_clip.c  | 19 +++
> ---
>  src/mesa/drivers/dri/i965/brw_clip.h  |  2 +-
>  src/mesa/drivers/dri/i965/brw_compiler.h  | 10 +++-
>  src/mesa/drivers/dri/i965/brw_context.h   | 14 ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp  |  9 ++-
>  src/mesa/drivers/dri/i965/brw_interpolation_map.c | 30 +++
> 
>  src/mesa/drivers/dri/i965/brw_nir.c   |  8 +-
>  src/mesa/drivers/dri/i965/brw_nir.h   |  3 +--
>  src/mesa/drivers/dri/i965/brw_program.c   |  9 +--
>  src/mesa/drivers/dri/i965/brw_sf.c| 16 ++--
>  src/mesa/drivers/dri/i965/brw_sf.h|  2 +-
>  11 files changed, 52 insertions(+), 70 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_clip.c
> b/src/mesa/drivers/dri/i965/brw_clip.c
> index 8560dd4..e375674 100644
> --- a/src/mesa/drivers/dri/i965/brw_clip.c
> +++ b/src/mesa/drivers/dri/i965/brw_clip.c
> @@ -139,7 +139,7 @@ brw_upload_clip_prog(struct brw_context *brw)
>  _NEW_POLYGON |
>  _NEW_TRANSFORM,
>  BRW_NEW_BLORP |
> -BRW_NEW_FRAGMENT_PROGRAM |
> +BRW_NEW_FS_PROG_DATA |
>  BRW_NEW_REDUCED_PRIMITIVE |
>  BRW_NEW_VUE_MAP_GEOM_OUT))
>    return;
> @@ -149,15 +149,14 @@ brw_upload_clip_prog(struct brw_context *brw)
> /* Populate the key:
>  */
>  
> -   const struct gl_program *fprog = brw->fragment_program;
> -   if (fprog) {
> -  assert(brw->gen < 6);
> -  struct gen4_fragment_program *p = (struct
> gen4_fragment_program *) fprog;
> -
> -  /* BRW_NEW_FRAGMENT_PROGRAM */
> -  key.contains_flat_varying = p->contains_flat_varying;
> -  key.contains_noperspective_varying = p-
> >contains_noperspective_varying;
> -  key.interp_mode = p->interp_mode;
> +   /* BRW_NEW_FS_PROG_DATA */
> +   const struct brw_wm_prog_data *wm_prog_data =
> +  brw_wm_prog_data(brw->wm.base.prog_data);
> +   if (wm_prog_data) {
> +  key.contains_flat_varying = wm_prog_data-
> >contains_flat_varying;
> +  key.contains_noperspective_varying =
> + wm_prog_data->contains_noperspective_varying;
> +  key.interp_mode = wm_prog_data->interp_mode;
> }
>  
> /* BRW_NEW_REDUCED_PRIMITIVE */
> diff --git a/src/mesa/drivers/dri/i965/brw_clip.h
> b/src/mesa/drivers/dri/i965/brw_clip.h
> index 355ae64..a8ee394 100644
> --- a/src/mesa/drivers/dri/i965/brw_clip.h
> +++ b/src/mesa/drivers/dri/i965/brw_clip.h
> @@ -49,7 +49,7 @@ struct brw_clip_prog_key {
> GLbitfield64 attrs;
> bool contains_flat_varying;
> bool contains_noperspective_varying;
> -   unsigned char *interp_mode;
> +   const unsigned char *interp_mode;
> GLuint primitive:4;
> GLuint nr_userclip:4;
> GLuint pv_first:1;
> diff --git a/src/mesa/drivers/dri/i965/brw_compiler.h
> b/src/mesa/drivers/dri/i965/brw_compiler.h
> index c378e93..3b3b7e0 100644
> --- a/src/mesa/drivers/dri/i965/brw_compiler.h
> +++ b/src/mesa/drivers/dri/i965/brw_compiler.h
> @@ -412,6 +412,9 @@ struct brw_wm_prog_data {
> bool 

Re: [Mesa-dev] [PATCH 2/3] radv: rename global extension properties structs

2017-01-13 Thread Emil Velikov
On 13 January 2017 at 23:44, Andres Rodriguez  wrote:
> All extension arrays are global, but only one of them refers to instance
> extensions.
>
> The device extension array refers to extensions that are common across
> all physical devices. This disctinction will be more imporant once we
Typos: "distinction" and "important"

> have dynamic extension support for devices.
>
I think that this and 3/3 are a very good idea, but since RADV supports
only one device I'm not sure that they're applicable yet.
I'm not too familiar with the RADV code, so I might be off there.

Emil


Re: [Mesa-dev] [PATCH] util: import sha1 implementation from OpenBSD

2017-01-13 Thread Vladislav Egorov



14.01.2017 00:17, Matt Turner wrote:

On Fri, Jan 13, 2017 at 1:01 PM, Vladislav Egorov  wrote:

2017-01-13 22:43 GMT+03:00 Emil Velikov :

On 13 January 2017 at 19:22, Vladislav Egorov  wrote:

13.01.2017 19:51, Emil Velikov wrote:

From: Emil Velikov 

At the moment we support 5+ different implementations, each with a varying
number of bugs - from thread safety problems [1] to outright broken
implementation(s) [2].

In order to accommodate these we have 150+ lines of configure script and
two extra configure toggles, whilst an actual implementation is
~200 loc and our current compat wrapping ~250.

Let's not forget that different people use different code paths, which
effectively makes it harder to test and debug since the default
implementation is automatically detected.

To minimise all these lovely experiences, import the "100% Public
Domain" OpenBSD sha1 implementation. Clearly document any changes needed
to get building correctly, since many/most of those can be upstreamed
making future syncs easier.


It can hurt performance.

This is not performance critical path ;-) If that ever changes we can
rethink our options.

Emil


If it's used by shader-cache, it's certainly along the critical path.
And 7-8 cycles per byte of shader text (or more than 10 cycles per byte
on Atoms, Celerons and low-end AMDs) is something to be
considered. In comparison the entire preprocessing stage takes ~15
cycles per byte -- well, after my optimizations :) I regularly see
util_hash_crc32() in perf top - because it uses an inefficient
table-based implementation with the same ~8 cycles per byte.

Perhaps we should consider using CRC32C (for which an instruction
exists in SSE 4.2 with a latency of 3 cycles) as the hashing function?

http://stackoverflow.com/questions/2694740/can-one-construct-a-good-hash-function-using-crc32c-as-a-base


Disregard the part of my previous comment about util_hash_crc32(): my
memory served me wrong, and it's nowhere near the hottest functions in
perf top. But generally speaking, CRC32 from SSE4.2 can't be used as a
drop-in replacement for util_hash_crc32(), because it uses a different
polynomial (not to mention that it would require CPU detection and so
on). I don't know whether that matters in this case. And if the
checksum/hash function can be changed, maybe CRC32 should not be used at
all (if CRC32 is used just as a non-cryptographic hash function). There
are much faster non-secure hash functions around [1].


[1] https://github.com/rurban/smhasher#smhasher
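
To illustrate the polynomial point, here is a small bitwise sketch. It
assumes util_hash_crc32() uses the common IEEE polynomial (as zlib does);
crc32_reflected() and the two macro names are made up for this example.
The only thing that differs between the two checksums is the polynomial
constant.

```c
#include <stddef.h>
#include <stdint.h>

/* Reflected polynomial constants.  The macro names are hypothetical. */
#define CRC32_IEEE_POLY 0xedb88320u   /* classic CRC-32 (zlib, Ethernet) */
#define CRC32C_POLY     0x82f63b78u   /* Castagnoli, used by the SSE4.2 crc32 instruction */

/* Bit-at-a-time reflected CRC-32 with initial value and final XOR of
 * 0xffffffff.  Far too slow for real use; it only exists to compare the
 * two polynomials on the same input. */
static uint32_t
crc32_reflected(uint32_t poly, const void *data, size_t len)
{
   const uint8_t *p = data;
   uint32_t crc = 0xffffffffu;

   for (size_t i = 0; i < len; i++) {
      crc ^= p[i];
      for (int bit = 0; bit < 8; bit++) {
         /* Mask is all ones when the low bit is set, zero otherwise. */
         crc = (crc >> 1) ^ (poly & (0u - (crc & 1u)));
      }
   }
   return ~crc;
}
```

For the usual check string "123456789" the IEEE polynomial gives 0xcbf43926
and the Castagnoli one gives 0xe3069283, so any value stored with one
cannot be compared against the other.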


Re: [Mesa-dev] [PATCH 3/3] radv: make device extension setup dynamic

2017-01-13 Thread Andres Rodriguez



On 2017-01-13 06:04 PM, Bas Nieuwenhuizen wrote:

On Fri, Jan 13, 2017 at 11:06 PM, Andres Rodriguez  wrote:

Each physical may have different extensions than one another.
Furthermore, depending on the software stack, some extensions may not be
accessible.

If an extension is conditional, it can be registered only when
necessary.
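
The registration scheme described above boils down to an append-by-realloc
list. A minimal sketch of that pattern follows, using plain realloc() and
stand-in types so it builds without the Vulkan headers. struct ext_entry,
struct ext_list and ext_list_append() are hypothetical names, not the radv
ones, and the patch itself goes through vk_realloc() with the instance
allocator rather than libc.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for VkExtensionProperties. */
struct ext_entry {
   char name[256];
   uint32_t version;
};

/* Stand-in for a per-device extension list. */
struct ext_list {
   struct ext_entry *entries;
   uint32_t count;
};

/* Appends `num` entries to the list.  Returns 0 on success and -1 on bad
 * arguments or allocation failure; on failure the list is left untouched,
 * because realloc() keeps the old block valid when it returns NULL. */
static int
ext_list_append(struct ext_list *list,
                const struct ext_entry *new_ext,
                uint32_t num)
{
   struct ext_entry *grown;

   if (!new_ext || num == 0)
      return -1;

   grown = realloc(list->entries,
                   ((size_t)list->count + num) * sizeof(*grown));
   if (!grown)
      return -1;

   /* Copy the new entries behind the existing ones. */
   memcpy(&grown[list->count], new_ext, (size_t)num * sizeof(*grown));
   list->entries = grown;
   list->count += num;
   return 0;
}
```

A device would first append the common table and then, only when the
hardware or software stack allows it, append the conditional entries one
at a time.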

Signed-off-by: Andres Rodriguez 
---
  src/amd/vulkan/radv_device.c  | 196 --
  src/amd/vulkan/radv_private.h |   6 ++
  2 files changed, 137 insertions(+), 65 deletions(-)

diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
index 5669fd7..0333688 100644
--- a/src/amd/vulkan/radv_device.c
+++ b/src/amd/vulkan/radv_device.c
@@ -77,6 +77,115 @@ radv_device_get_cache_uuid(enum radeon_family family, void 
*uuid)
 return 0;
  }

+static const VkExtensionProperties instance_extensions[] = {
+   {
+   .extensionName = VK_KHR_SURFACE_EXTENSION_NAME,
+   .specVersion = 25,
+   },
+#ifdef VK_USE_PLATFORM_XCB_KHR
+   {
+   .extensionName = VK_KHR_XCB_SURFACE_EXTENSION_NAME,
+   .specVersion = 6,
+   },
+#endif
+#ifdef VK_USE_PLATFORM_XLIB_KHR
+   {
+   .extensionName = VK_KHR_XLIB_SURFACE_EXTENSION_NAME,
+   .specVersion = 6,
+   },
+#endif
+#ifdef VK_USE_PLATFORM_WAYLAND_KHR
+   {
+   .extensionName = VK_KHR_WAYLAND_SURFACE_EXTENSION_NAME,
+   .specVersion = 5,
+   },
+#endif
+};
+
+static const VkExtensionProperties common_device_extensions[] = {
+   {
+   .extensionName = 
VK_KHR_SAMPLER_MIRROR_CLAMP_TO_EDGE_EXTENSION_NAME,
+   .specVersion = 1,
+   },
+   {
+   .extensionName = VK_KHR_SWAPCHAIN_EXTENSION_NAME,
+   .specVersion = 68,
+   },
+   {
+   .extensionName = VK_AMD_DRAW_INDIRECT_COUNT_EXTENSION_NAME,
+   .specVersion = 1,
+   },
+   {
+   .extensionName = VK_AMD_NEGATIVE_VIEWPORT_HEIGHT_EXTENSION_NAME,
+   .specVersion = 1,
+   },
+};
+
+static VkResult
+radv_extensions_register(struct radv_instance *instance,
+   struct radv_extensions *extensions,
+   const VkExtensionProperties *new_ext,
+   uint32_t num_ext)
+{
+   size_t new_size;
+   VkExtensionProperties *new_ptr;
+
+   assert(new_ext && num_ext > 0);
+
+   if (!new_ext)
+   return VK_ERROR_INITIALIZATION_FAILED;
+
+   new_size = (extensions->num_ext + num_ext) * 
sizeof(VkExtensionProperties);
+   new_ptr = vk_realloc(&instance->alloc, extensions->ext_array,
+   new_size, 8, 
VK_SYSTEM_ALLOCATION_SCOPE_INSTANCE);
+
+   /* Old array continues to be valid, update nothing */
+   if (!new_ptr)
+   return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+   memcpy(&new_ptr[extensions->num_ext], new_ext,
+   num_ext * sizeof(VkExtensionProperties));
+   extensions->ext_array = new_ptr;
+   extensions->num_ext += num_ext;
+
+   return VK_SUCCESS;
+}
+
+#define radv_extensions_register_single(instance, extensions, name, version) \
+   radv_extensions_register(instance, extensions, \
+   
&(VkExtensionProperties){ \
+   
.extensionName = name, \
+   
.specVersion = version \
+   }, 1);

Please make this a function, I see no reason to keep this a macro. Or
lose it, as I can't find an user in this patch.
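
One detail worth keeping in mind when turning the macro into a function:
extensionName is a fixed-size char array, so the compound literal in the
macro only works when `name` is a string literal. A function taking a
const char * has to copy the string instead. A rough sketch, with a
stand-in struct so it builds without the Vulkan headers (struct ext_props
and make_ext_props() are hypothetical names):

```c
#include <stdint.h>
#include <string.h>

#define MAX_EXTENSION_NAME_SIZE 256   /* mirrors VK_MAX_EXTENSION_NAME_SIZE */

/* Stand-in for VkExtensionProperties. */
struct ext_props {
   char     extensionName[MAX_EXTENSION_NAME_SIZE];
   uint32_t specVersion;
};

/* Builds a single entry from a runtime string.  The name is truncated if
 * it does not fit, and is always NUL-terminated thanks to the memset. */
static struct ext_props
make_ext_props(const char *name, uint32_t version)
{
   struct ext_props props;

   memset(&props, 0, sizeof(props));
   strncpy(props.extensionName, name, sizeof(props.extensionName) - 1);
   props.specVersion = version;
   return props;
}
```

The result could then be handed to radv_extensions_register() with a
count of 1, which is all the macro did.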

+
+static void
+radv_extensions_finish(struct radv_instance *instance,
+   struct radv_extensions 
*extensions)
+{
+   assert(extensions);
+
+   if (!extensions)
+   radv_loge("Attemted to free invalid extension struct\n");
+
+   if (extensions->ext_array)
+   vk_free(&instance->alloc, extensions->ext_array);
+}
+
+static bool
+is_extension_enabled(const VkExtensionProperties *extensions,
+   size_t num_ext,
+   const char *name)
+{
+   assert(extensions && name);
+
+   for (uint32_t i = 0; i < num_ext; i++) {
+   if (strcmp(name, extensions[i].extensionName) == 0)
+   return true;
+   }
+
+   return false;
+}
+
  static VkResult
  radv_physical_device_init(struct radv_physical_device *device,
   struct radv_instance *instance,
@@ -130,6 +239,13 @@ radv_physical_device_init(struct radv_physical_device 
*device,
 goto fail;
 }

+   result = 

[Mesa-dev] [PATCH] i965: Move Gen4-5 interpolation stuff to brw_wm_prog_data.

2017-01-13 Thread Kenneth Graunke
This fixes glxgears rendering, which had surprisingly been broken since
late October!  Specifically, commit 91d61fbf7cb61a44adcaae51ee08ad0dd6b.

glxgears uses glShadeModel(GL_FLAT) when drawing the main portion of the
gears, then uses glShadeModel(GL_SMOOTH) for drawing the Gouraud-shaded
inner portion of the gears.  This results in the same fragment program
having two different state-dependent interpolation maps: one where
gl_Color is flat, and another where it's smooth.

The problem is that there's only one gen4_fragment_program, so it can't
store both.  Each FS compile would trash the last one.  But, the FS
compiles are cached, so the first one would store FLAT, and the second
would see a matching program in the cache and never bother to compile
one with SMOOTH.  (Clearing the program cache on every draw made it
render correctly.)

Instead, move it to brw_wm_prog_data, where we can keep a copy for
every specialization of the program.  The only downside is bloating
the structure a bit, but we can tighten that up a bit if we need to.
This also lets us kill gen4_fragment_program entirely!

Signed-off-by: Kenneth Graunke 
Cc: Timothy Arceri 
---
 src/mesa/drivers/dri/i965/brw_clip.c  | 19 +++---
 src/mesa/drivers/dri/i965/brw_clip.h  |  2 +-
 src/mesa/drivers/dri/i965/brw_compiler.h  | 10 +++-
 src/mesa/drivers/dri/i965/brw_context.h   | 14 ---
 src/mesa/drivers/dri/i965/brw_fs.cpp  |  9 ++-
 src/mesa/drivers/dri/i965/brw_interpolation_map.c | 30 +++
 src/mesa/drivers/dri/i965/brw_nir.c   |  8 +-
 src/mesa/drivers/dri/i965/brw_nir.h   |  3 +--
 src/mesa/drivers/dri/i965/brw_program.c   |  9 +--
 src/mesa/drivers/dri/i965/brw_sf.c| 16 ++--
 src/mesa/drivers/dri/i965/brw_sf.h|  2 +-
 11 files changed, 52 insertions(+), 70 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_clip.c 
b/src/mesa/drivers/dri/i965/brw_clip.c
index 8560dd4..e375674 100644
--- a/src/mesa/drivers/dri/i965/brw_clip.c
+++ b/src/mesa/drivers/dri/i965/brw_clip.c
@@ -139,7 +139,7 @@ brw_upload_clip_prog(struct brw_context *brw)
 _NEW_POLYGON |
 _NEW_TRANSFORM,
 BRW_NEW_BLORP |
-BRW_NEW_FRAGMENT_PROGRAM |
+BRW_NEW_FS_PROG_DATA |
 BRW_NEW_REDUCED_PRIMITIVE |
 BRW_NEW_VUE_MAP_GEOM_OUT))
   return;
@@ -149,15 +149,14 @@ brw_upload_clip_prog(struct brw_context *brw)
/* Populate the key:
 */
 
-   const struct gl_program *fprog = brw->fragment_program;
-   if (fprog) {
-  assert(brw->gen < 6);
-  struct gen4_fragment_program *p = (struct gen4_fragment_program *) fprog;
-
-  /* BRW_NEW_FRAGMENT_PROGRAM */
-  key.contains_flat_varying = p->contains_flat_varying;
-  key.contains_noperspective_varying = p->contains_noperspective_varying;
-  key.interp_mode = p->interp_mode;
+   /* BRW_NEW_FS_PROG_DATA */
+   const struct brw_wm_prog_data *wm_prog_data =
+  brw_wm_prog_data(brw->wm.base.prog_data);
+   if (wm_prog_data) {
+  key.contains_flat_varying = wm_prog_data->contains_flat_varying;
+  key.contains_noperspective_varying =
+ wm_prog_data->contains_noperspective_varying;
+  key.interp_mode = wm_prog_data->interp_mode;
}
 
/* BRW_NEW_REDUCED_PRIMITIVE */
diff --git a/src/mesa/drivers/dri/i965/brw_clip.h 
b/src/mesa/drivers/dri/i965/brw_clip.h
index 355ae64..a8ee394 100644
--- a/src/mesa/drivers/dri/i965/brw_clip.h
+++ b/src/mesa/drivers/dri/i965/brw_clip.h
@@ -49,7 +49,7 @@ struct brw_clip_prog_key {
GLbitfield64 attrs;
bool contains_flat_varying;
bool contains_noperspective_varying;
-   unsigned char *interp_mode;
+   const unsigned char *interp_mode;
GLuint primitive:4;
GLuint nr_userclip:4;
GLuint pv_first:1;
diff --git a/src/mesa/drivers/dri/i965/brw_compiler.h 
b/src/mesa/drivers/dri/i965/brw_compiler.h
index c378e93..3b3b7e0 100644
--- a/src/mesa/drivers/dri/i965/brw_compiler.h
+++ b/src/mesa/drivers/dri/i965/brw_compiler.h
@@ -412,6 +412,9 @@ struct brw_wm_prog_data {
bool has_side_effects;
bool pulls_bary;
 
+   bool contains_flat_varying;
+   bool contains_noperspective_varying;
+
/**
 * Mask of which interpolation modes are required by the fragment shader.
 * Used in hardware setup on gen6+.
@@ -424,6 +427,11 @@ struct brw_wm_prog_data {
 */
uint32_t flat_inputs;
 
+   /* Mapping of VUE slots to interpolation modes.
+* Used by the Gen4-5 clip/sf/wm stages.
+*/
+   unsigned char interp_mode[65]; /* BRW_VARYING_SLOT_COUNT */
+
/**
 * Map from gl_varying_slot to the position within the FS setup data
 * payload where the varying's attribute vertex deltas should be delivered.
@@ -580,7 +588,7 @@ void 

[Mesa-dev] [Bug 92877] Add support for libglvnd

2017-01-13 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=92877

Darek  changed:

   What|Removed |Added

 CC||dz1125.bug.trac...@gmail.co
   ||m

-- 
You are receiving this mail because:
You are the assignee for the bug.


[Mesa-dev] [PATCH] nvc0: allow TK1 (NVEA) queries to work

2017-01-13 Thread Ilia Mirkin
The NVEA 3D class is numerically larger than the NVF0 3D class. The TK1
chip uses the SM35 ISA and likely has the same hw counters. Allow these
to be used like on all the other supported chips.
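
As a quick illustration of why the old comparison excluded TK1, here is a
sketch of the two checks. The class values are what I believe the nouveau
headers define, but treat them as assumptions; old_check() and new_check()
are names made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* 3D class IDs.  Assumed values; check them against the nouveau headers. */
#define NVE4_3D_CLASS  0xa097
#define NVF0_3D_CLASS  0xa197
#define NVEA_3D_CLASS  0xa297   /* TK1: numerically above NVF0 */
#define GM107_3D_CLASS 0xb097

/* Condition before the patch: NVEA falls outside the range. */
static bool
old_check(uint16_t class_3d)
{
   return class_3d <= NVF0_3D_CLASS;
}

/* Condition after the patch: everything below Maxwell is accepted. */
static bool
new_check(uint16_t class_3d)
{
   return class_3d < GM107_3D_CLASS;
}
```

With these values old_check(NVEA_3D_CLASS) is false and
new_check(NVEA_3D_CLASS) is true, while both checks agree for NVE4, NVF0
and GM107.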

Signed-off-by: Ilia Mirkin 
---
 src/gallium/drivers/nouveau/nvc0/nvc0_query.c |  4 ++--
 .../drivers/nouveau/nvc0/nvc0_query_hw_metric.c   |  3 +++
 src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c   | 19 ++-
 3 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_query.c
index 8b9e6b6..6bf2285 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_query.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query.c
@@ -205,7 +205,7 @@ nvc0_screen_get_driver_query_group_info(struct pipe_screen 
*pscreen,
 
if (screen->base.drm->version >= 0x01000101) {
   if (screen->compute) {
- if (screen->base.class_3d <= NVF0_3D_CLASS) {
+ if (screen->base.class_3d < GM107_3D_CLASS) {
 count += 2;
  }
   }
@@ -229,7 +229,7 @@ nvc0_screen_get_driver_query_group_info(struct pipe_screen 
*pscreen,
} else
if (id == NVC0_HW_METRIC_QUERY_GROUP) {
   if (screen->compute) {
-  if (screen->base.class_3d <= NVF0_3D_CLASS) {
+  if (screen->base.class_3d < GM107_3D_CLASS) {
 info->name = "Performance metrics";
 info->max_active_queries = 4; /* A metric uses at least 2 queries 
*/
 info->num_queries = nvc0_hw_metric_get_num_queries(screen);
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c
index 089af61..494f2dd 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c
@@ -403,6 +403,7 @@ nvc0_hw_metric_get_queries(struct nvc0_screen *screen)
case GM200_3D_CLASS:
case GM107_3D_CLASS:
   return sm50_hw_metric_queries;
+   case NVEA_3D_CLASS:
case NVF0_3D_CLASS:
   return sm35_hw_metric_queries;
case NVE4_3D_CLASS:
@@ -425,6 +426,7 @@ nvc0_hw_metric_get_num_queries(struct nvc0_screen *screen)
case GM200_3D_CLASS:
case GM107_3D_CLASS:
   return ARRAY_SIZE(sm50_hw_metric_queries);
+   case NVEA_3D_CLASS:
case NVF0_3D_CLASS:
   return ARRAY_SIZE(sm35_hw_metric_queries);
case NVE4_3D_CLASS:
@@ -684,6 +686,7 @@ nvc0_hw_metric_get_query_result(struct nvc0_context *nvc0,
switch (screen->base.class_3d) {
case GM200_3D_CLASS:
case GM107_3D_CLASS:
+   case NVEA_3D_CLASS:
case NVF0_3D_CLASS:
   value = sm35_hw_metric_calc_result(hq, res64);
   break;
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
index df5723d..440e5d3 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
@@ -2239,6 +2239,7 @@ nvc0_hw_sm_get_queries(struct nvc0_screen *screen)
   return sm52_hw_sm_queries;
case GM107_3D_CLASS:
   return sm50_hw_sm_queries;
+   case NVEA_3D_CLASS:
case NVF0_3D_CLASS:
   return sm35_hw_sm_queries;
case NVE4_3D_CLASS:
@@ -2262,6 +2263,7 @@ nvc0_hw_sm_get_num_queries(struct nvc0_screen *screen)
   return ARRAY_SIZE(sm52_hw_sm_queries);
case GM107_3D_CLASS:
   return ARRAY_SIZE(sm50_hw_sm_queries);
+   case NVEA_3D_CLASS:
case NVF0_3D_CLASS:
   return ARRAY_SIZE(sm35_hw_sm_queries);
case NVE4_3D_CLASS:
@@ -2475,15 +2477,14 @@ nvc0_hw_sm_get_program(struct nvc0_screen *screen)
   prog->code_size = sizeof(gm107_read_hw_sm_counters_code);
   prog->num_gprs = 14;
} else
-   if (screen->base.class_3d == NVE4_3D_CLASS ||
-   screen->base.class_3d == NVF0_3D_CLASS) {
-  if (screen->base.class_3d == NVE4_3D_CLASS) {
- prog->code = (uint32_t *)nve4_read_hw_sm_counters_code;
- prog->code_size = sizeof(nve4_read_hw_sm_counters_code);
-  } else {
- prog->code = (uint32_t *)nvf0_read_hw_sm_counters_code;
- prog->code_size = sizeof(nvf0_read_hw_sm_counters_code);
-  }
+   if (screen->base.class_3d > NVE4_3D_CLASS) {
+  prog->code = (uint32_t *)nvf0_read_hw_sm_counters_code;
+  prog->code_size = sizeof(nvf0_read_hw_sm_counters_code);
+  prog->num_gprs = 14;
+   } else
+   if (screen->base.class_3d == NVE4_3D_CLASS) {
+  prog->code = (uint32_t *)nve4_read_hw_sm_counters_code;
+  prog->code_size = sizeof(nve4_read_hw_sm_counters_code);
   prog->num_gprs = 14;
} else {
   prog->code = (uint32_t *)nvc0_read_hw_sm_counters_code;
-- 
2.10.2



[Mesa-dev] [PATCH 2/3] radv: rename global extension properties structs

2017-01-13 Thread Andres Rodriguez
All extension arrays are global, but only one of them refers to instance
extensions.

The device extension array refers to extensions that are common across
all physical devices. This disctinction will be more imporant once we
have dynamic extension support for devices.

Signed-off-by: Andres Rodriguez 
---
 src/amd/vulkan/radv_device.c | 28 ++--
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
index 99c56a4..e0991d4 100644
--- a/src/amd/vulkan/radv_device.c
+++ b/src/amd/vulkan/radv_device.c
@@ -147,7 +147,7 @@ radv_physical_device_finish(struct radv_physical_device 
*device)
device->ws->destroy(device->ws);
 }
 
-static const VkExtensionProperties global_extensions[] = {
+static const VkExtensionProperties instance_extensions[] = {
{
.extensionName = VK_KHR_SURFACE_EXTENSION_NAME,
.specVersion = 25,
@@ -172,7 +172,7 @@ static const VkExtensionProperties global_extensions[] = {
 #endif
 };
 
-static const VkExtensionProperties device_extensions[] = {
+static const VkExtensionProperties common_device_extensions[] = {
{
.extensionName = 
VK_KHR_SAMPLER_MIRROR_CLAMP_TO_EDGE_EXTENSION_NAME,
.specVersion = 1,
@@ -258,9 +258,9 @@ VkResult radv_CreateInstance(
 
for (uint32_t i = 0; i < pCreateInfo->enabledExtensionCount; i++) {
bool found = false;
-   for (uint32_t j = 0; j < ARRAY_SIZE(global_extensions); j++) {
+   for (uint32_t j = 0; j < ARRAY_SIZE(instance_extensions); j++) {
if (strcmp(pCreateInfo->ppEnabledExtensionNames[i],
-  global_extensions[j].extensionName) == 0) {
+  instance_extensions[j].extensionName) == 0) {
found = true;
break;
}
@@ -697,9 +697,9 @@ VkResult radv_CreateDevice(
 
for (uint32_t i = 0; i < pCreateInfo->enabledExtensionCount; i++) {
bool found = false;
-   for (uint32_t j = 0; j < ARRAY_SIZE(device_extensions); j++) {
+   for (uint32_t j = 0; j < ARRAY_SIZE(common_device_extensions); 
j++) {
if (strcmp(pCreateInfo->ppEnabledExtensionNames[i],
-  device_extensions[j].extensionName) == 0) {
+  common_device_extensions[j].extensionName) 
== 0) {
found = true;
break;
}
@@ -826,14 +826,14 @@ VkResult radv_EnumerateInstanceExtensionProperties(
VkExtensionProperties*  pProperties)
 {
if (pProperties == NULL) {
-   *pPropertyCount = ARRAY_SIZE(global_extensions);
+   *pPropertyCount = ARRAY_SIZE(instance_extensions);
return VK_SUCCESS;
}
 
-   *pPropertyCount = MIN2(*pPropertyCount, ARRAY_SIZE(global_extensions));
-   typed_memcpy(pProperties, global_extensions, *pPropertyCount);
+   *pPropertyCount = MIN2(*pPropertyCount, 
ARRAY_SIZE(instance_extensions));
+   typed_memcpy(pProperties, instance_extensions, *pPropertyCount);
 
-   if (*pPropertyCount < ARRAY_SIZE(global_extensions))
+   if (*pPropertyCount < ARRAY_SIZE(instance_extensions))
return VK_INCOMPLETE;
 
return VK_SUCCESS;
@@ -846,14 +846,14 @@ VkResult radv_EnumerateDeviceExtensionProperties(
VkExtensionProperties*  pProperties)
 {
if (pProperties == NULL) {
-   *pPropertyCount = ARRAY_SIZE(device_extensions);
+   *pPropertyCount = ARRAY_SIZE(common_device_extensions);
return VK_SUCCESS;
}
 
-   *pPropertyCount = MIN2(*pPropertyCount, ARRAY_SIZE(device_extensions));
-   typed_memcpy(pProperties, device_extensions, *pPropertyCount);
+   *pPropertyCount = MIN2(*pPropertyCount, 
ARRAY_SIZE(common_device_extensions));
+   typed_memcpy(pProperties, common_device_extensions, *pPropertyCount);
 
-   if (*pPropertyCount < ARRAY_SIZE(device_extensions))
+   if (*pPropertyCount < ARRAY_SIZE(common_device_extensions))
return VK_INCOMPLETE;
 
return VK_SUCCESS;
-- 
2.9.3
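
For readers skimming the archive: the enumerate functions touched by this rename follow the usual two-call idiom (query the count with a NULL array, then fetch up to the caller's capacity and report an incomplete result if the array was too small). A minimal standalone sketch of that idiom follows. The types, result codes, and extension names here are hypothetical stand-ins so the snippet builds without the Vulkan headers; it is not the radv code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for VkResult and VkExtensionProperties. */
enum result { RESULT_SUCCESS = 0, RESULT_INCOMPLETE = 1 };

struct ext_props {
   char name[64];
   uint32_t spec_version;
};

/* Placeholder names, chosen only for illustration. */
static const struct ext_props instance_exts[] = {
   { "EXT_surface", 25 },
   { "EXT_xcb_surface", 6 },
   { "EXT_wayland_surface", 5 },
};

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Two-call idiom: with props == NULL only the total count is reported.
 * Otherwise at most *count entries are copied, *count is updated to the
 * number actually written, and RESULT_INCOMPLETE signals truncation. */
static enum result
enumerate_instance_exts(uint32_t *count, struct ext_props *props)
{
   const uint32_t total = (uint32_t)ARRAY_SIZE(instance_exts);

   if (props == NULL) {
      *count = total;
      return RESULT_SUCCESS;
   }

   if (*count > total)
      *count = total;
   memcpy(props, instance_exts, *count * sizeof(*props));

   return *count < total ? RESULT_INCOMPLETE : RESULT_SUCCESS;
}
```

A caller would normally call once with a NULL array, allocate that many entries, and call again; passing a smaller capacity is legal and yields the truncated list plus the incomplete result.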



[Mesa-dev] Subset of patches from wip-high-priority that can be useful v2

2017-01-13 Thread Andres Rodriguez
Following are a subset of patches from my wip-high-priority branch that
may be useful outside that context.

The HW priority debugging may take a little while, so I wanted to make some
of the more generic bits available on master, as other work could benefit
from them.

v2: Fixed a rebasing mistake, indentation and commit message logs



[Mesa-dev] [PATCH 1/3] radv: use a winsys context per-queue, instead of per device v2

2017-01-13 Thread Andres Rodriguez
Queues are independent execution streams. The Vulkan spec provides no
ordering guarantees across different queues.

By using a single context for all queues, we are forcing all commands
into an unnecessary FIFO ordering.

This change is a preparation step to allow out-of-order scheduling of
certain work tasks.

v2: Fix a rebase error with radv_QueueSubmit() and trace_bo
Signed-off-by: Andres Rodriguez 
---
 src/amd/vulkan/radv_device.c  | 39 ---
 src/amd/vulkan/radv_private.h |  2 +-
 src/amd/vulkan/radv_wsi.c |  2 +-
 3 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
index 64fbce8..99c56a4 100644
--- a/src/amd/vulkan/radv_device.c
+++ b/src/amd/vulkan/radv_device.c
@@ -662,7 +662,7 @@ void radv_GetPhysicalDeviceMemoryProperties(
};
 }
 
-static void
+static int
 radv_queue_init(struct radv_device *device, struct radv_queue *queue,
int queue_family_index, int idx)
 {
@@ -670,11 +670,19 @@ radv_queue_init(struct radv_device *device, struct 
radv_queue *queue,
queue->device = device;
queue->queue_family_index = queue_family_index;
queue->queue_idx = idx;
+
+   queue->hw_ctx = device->ws->ctx_create(device->ws);
+   if (!queue->hw_ctx)
+   return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+   return VK_SUCCESS;
 }
 
 static void
 radv_queue_finish(struct radv_queue *queue)
 {
+   if (queue->hw_ctx)
+   queue->device->ws->ctx_destroy(queue->hw_ctx);
 }
 
 VkResult radv_CreateDevice(
@@ -730,23 +738,20 @@ VkResult radv_CreateDevice(
goto fail;
}
 
-   device->queue_count[qfi] = queue_create->queueCount;
+   memset(device->queues[qfi], 0, queue_create->queueCount * 
sizeof(struct radv_queue));
 
-   for (unsigned q = 0; q < queue_create->queueCount; q++)
-   radv_queue_init(device, &device->queues[qfi][q], qfi, q);
-   }
+   device->queue_count[qfi] = queue_create->queueCount;
 
-   device->hw_ctx = device->ws->ctx_create(device->ws);
-   if (!device->hw_ctx) {
-   result = VK_ERROR_OUT_OF_HOST_MEMORY;
-   goto fail;
+   for (unsigned q = 0; q < queue_create->queueCount; q++) {
+   result = radv_queue_init(device, &device->queues[qfi][q], qfi, q);
+   if (result != VK_SUCCESS)
+   goto fail;
+   }
}
 
result = radv_device_init_meta(device);
-   if (result != VK_SUCCESS) {
-   device->ws->ctx_destroy(device->hw_ctx);
+   if (result != VK_SUCCESS)
goto fail;
-   }
 
radv_device_init_msaa(device);
 
@@ -791,9 +796,6 @@ fail:
vk_free(&device->alloc, device->queues[i]);
}
 
-   if (device->hw_ctx)
-   device->ws->ctx_destroy(device->hw_ctx);
-
vk_free(&device->alloc, device);
return result;
 }
@@ -807,7 +809,6 @@ void radv_DestroyDevice(
if (device->trace_bo)
device->ws->buffer_destroy(device->trace_bo);
 
-   device->ws->ctx_destroy(device->hw_ctx);
for (unsigned i = 0; i < RADV_MAX_QUEUE_FAMILIES; i++) {
for (unsigned q = 0; q < device->queue_count[i]; q++)
radv_queue_finish(&device->queues[i][q]);
@@ -920,7 +921,7 @@ VkResult radv_QueueSubmit(
RADV_FROM_HANDLE(radv_queue, queue, _queue);
RADV_FROM_HANDLE(radv_fence, fence, _fence);
struct radeon_winsys_fence *base_fence = fence ? fence->fence : NULL;
-   struct radeon_winsys_ctx *ctx = queue->device->hw_ctx;
+   struct radeon_winsys_ctx *ctx = queue->hw_ctx;
int ret;
uint32_t max_cs_submission = queue->device->trace_bo ? 1 : UINT32_MAX;
 
@@ -968,7 +969,7 @@ VkResult radv_QueueSubmit(
}
if (queue->device->trace_bo) {
bool success = queue->device->ws->ctx_wait_idle(
-   queue->device->hw_ctx,
+   queue->hw_ctx,

radv_queue_family_to_ring(

queue->queue_family_index),
queue->queue_idx);
@@ -999,7 +1000,7 @@ VkResult radv_QueueWaitIdle(
 {
RADV_FROM_HANDLE(radv_queue, queue, _queue);
 
-   queue->device->ws->ctx_wait_idle(queue->device->hw_ctx,
+   queue->device->ws->ctx_wait_idle(queue->hw_ctx,
 
radv_queue_family_to_ring(queue->queue_family_index),
 queue->queue_idx);
return VK_SUCCESS;
diff --git a/src/amd/vulkan/radv_private.h b/src/amd/vulkan/radv_private.h
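
The ownership change in this patch (each queue creates and destroys its own context, and device creation unwinds if any queue fails) can be sketched in isolation. This is a simplified illustration under stated assumptions: `hw_ctx`, `ctx_create`, `ctx_destroy` and the counters are hypothetical stand-ins for the winsys interface, and plain int return codes replace VkResult.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a winsys submission context. */
struct hw_ctx { int id; };

struct queue {
   struct hw_ctx *ctx;   /* owned by the queue, not by the device */
};

static int live_contexts;  /* bookkeeping for this sketch only */
static int next_ctx_id;

static struct hw_ctx *
ctx_create(void)
{
   struct hw_ctx *ctx = malloc(sizeof(*ctx));
   if (!ctx)
      return NULL;
   ctx->id = next_ctx_id++;
   live_contexts++;
   return ctx;
}

static void
ctx_destroy(struct hw_ctx *ctx)
{
   free(ctx);
   live_contexts--;
}

/* Returns 0 on success, -1 if the context could not be created. */
static int
queue_init(struct queue *q)
{
   q->ctx = ctx_create();
   return q->ctx ? 0 : -1;
}

/* Safe to call on a zeroed or already finished queue. */
static void
queue_finish(struct queue *q)
{
   if (q->ctx) {
      ctx_destroy(q->ctx);
      q->ctx = NULL;
   }
}

/* Zero the array first so the failure path can finish every slot,
 * including those that were never initialized. */
static int
queues_create(struct queue *queues, int count)
{
   memset(queues, 0, (size_t)count * sizeof(*queues));

   for (int i = 0; i < count; i++) {
      if (queue_init(&queues[i]) != 0) {
         for (int j = 0; j < count; j++)
            queue_finish(&queues[j]);
         return -1;
      }
   }
   return 0;
}
```

Zeroing before initialization mirrors the memset added in the patch: it is what makes the NULL check in the finish function sufficient on the error path.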

[Mesa-dev] [PATCH 3/3] radv: make device extension setup dynamic

2017-01-13 Thread Andres Rodriguez
Physical devices may differ from one another in the extensions they support.
Furthermore, depending on the software stack, some extensions may not be
accessible.

If an extension is conditional, it can be registered only when
necessary.

v2: removed unused function and fixed indentation

Signed-off-by: Andres Rodriguez 
---
 src/amd/vulkan/radv_device.c  | 189 +++---
 src/amd/vulkan/radv_private.h |   6 ++
 2 files changed, 130 insertions(+), 65 deletions(-)

diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
index e0991d4..08a1bf3 100644
--- a/src/amd/vulkan/radv_device.c
+++ b/src/amd/vulkan/radv_device.c
@@ -77,6 +77,108 @@ radv_device_get_cache_uuid(enum radeon_family family, void 
*uuid)
return 0;
 }
 
+static const VkExtensionProperties instance_extensions[] = {
+   {
+   .extensionName = VK_KHR_SURFACE_EXTENSION_NAME,
+   .specVersion = 25,
+   },
+#ifdef VK_USE_PLATFORM_XCB_KHR
+   {
+   .extensionName = VK_KHR_XCB_SURFACE_EXTENSION_NAME,
+   .specVersion = 6,
+   },
+#endif
+#ifdef VK_USE_PLATFORM_XLIB_KHR
+   {
+   .extensionName = VK_KHR_XLIB_SURFACE_EXTENSION_NAME,
+   .specVersion = 6,
+   },
+#endif
+#ifdef VK_USE_PLATFORM_WAYLAND_KHR
+   {
+   .extensionName = VK_KHR_WAYLAND_SURFACE_EXTENSION_NAME,
+   .specVersion = 5,
+   },
+#endif
+};
+
+static const VkExtensionProperties common_device_extensions[] = {
+   {
+   .extensionName = 
VK_KHR_SAMPLER_MIRROR_CLAMP_TO_EDGE_EXTENSION_NAME,
+   .specVersion = 1,
+   },
+   {
+   .extensionName = VK_KHR_SWAPCHAIN_EXTENSION_NAME,
+   .specVersion = 68,
+   },
+   {
+   .extensionName = VK_AMD_DRAW_INDIRECT_COUNT_EXTENSION_NAME,
+   .specVersion = 1,
+   },
+   {
+   .extensionName = VK_AMD_NEGATIVE_VIEWPORT_HEIGHT_EXTENSION_NAME,
+   .specVersion = 1,
+   },
+};
+
+static VkResult
+radv_extensions_register(struct radv_instance *instance,
+   struct radv_extensions *extensions,
+   const VkExtensionProperties *new_ext,
+   uint32_t num_ext)
+{
+   size_t new_size;
+   VkExtensionProperties *new_ptr;
+
+   assert(new_ext && num_ext > 0);
+
+   if (!new_ext)
+   return VK_ERROR_INITIALIZATION_FAILED;
+
+   new_size = (extensions->num_ext + num_ext) * 
sizeof(VkExtensionProperties);
+   new_ptr = vk_realloc(&instance->alloc, extensions->ext_array,
+   new_size, 8, 
VK_SYSTEM_ALLOCATION_SCOPE_INSTANCE);
+
+   /* Old array continues to be valid, update nothing */
+   if (!new_ptr)
+   return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+   memcpy(&new_ptr[extensions->num_ext], new_ext,
+   num_ext * sizeof(VkExtensionProperties));
+   extensions->ext_array = new_ptr;
+   extensions->num_ext += num_ext;
+
+   return VK_SUCCESS;
+}
+
+static void
+radv_extensions_finish(struct radv_instance *instance,
+   struct radv_extensions *extensions)
+{
+   assert(extensions);
+
+   if (!extensions)
+   radv_loge("Attempted to free invalid extension struct\n");
+
+   if (extensions->ext_array)
+   vk_free(&instance->alloc, extensions->ext_array);
+}
+
+static bool
+is_extension_enabled(const VkExtensionProperties *extensions,
+   size_t num_ext,
+   const char *name)
+{
+   assert(extensions && name);
+
+   for (uint32_t i = 0; i < num_ext; i++) {
+   if (strcmp(name, extensions[i].extensionName) == 0)
+   return true;
+   }
+
+   return false;
+}
+
 static VkResult
 radv_physical_device_init(struct radv_physical_device *device,
  struct radv_instance *instance,
@@ -130,6 +232,13 @@ radv_physical_device_init(struct radv_physical_device 
*device,
goto fail;
}
 
+   result = radv_extensions_register(instance,
+   &device->extensions,
+   common_device_extensions,
+   ARRAY_SIZE(common_device_extensions));
+   if (result != VK_SUCCESS)
+   goto fail;
+
fprintf(stderr, "WARNING: radv is not a conformant vulkan 
implementation, testing use only.\n");
device->name = device->rad_info.name;
close(fd);
@@ -143,53 +252,11 @@ fail:
 static void
 radv_physical_device_finish(struct radv_physical_device *device)
 {
+   radv_extensions_finish(device->instance, &device->extensions);
radv_finish_wsi(device);
device->ws->destroy(device->ws);
 }
 
-static const VkExtensionProperties instance_extensions[] = {
-   {
-   .extensionName = VK_KHR_SURFACE_EXTENSION_NAME,
-
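
The registration scheme in this patch boils down to a growable array that is extended with realloc and left untouched when the allocation fails. Here is a minimal standalone sketch of that pattern, assuming plain malloc-family allocation instead of the Vulkan allocator callbacks; `ext_list`, `ext_props` and the function names are hypothetical and only loosely mirror the patch. Unlike the finish helper in the patch, which logs on a NULL struct and then carries on, the sketch returns early in that case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for VkExtensionProperties. */
struct ext_props {
   char name[64];
   uint32_t spec_version;
};

struct ext_list {
   struct ext_props *array;
   uint32_t count;
};

/* Append num_ext entries. On allocation failure the list is unchanged,
 * because realloc leaves the old block valid when it returns NULL. */
static bool
ext_list_register(struct ext_list *list,
                  const struct ext_props *new_ext, uint32_t num_ext)
{
   if (!new_ext || num_ext == 0)
      return false;

   size_t new_size = ((size_t)list->count + num_ext) * sizeof(*list->array);
   struct ext_props *grown = realloc(list->array, new_size);
   if (!grown)
      return false;

   memcpy(&grown[list->count], new_ext, num_ext * sizeof(*grown));
   list->array = grown;
   list->count += num_ext;
   return true;
}

/* Release the array; a NULL list is ignored rather than dereferenced. */
static void
ext_list_finish(struct ext_list *list)
{
   if (!list)
      return;
   free(list->array);
   list->array = NULL;
   list->count = 0;
}

/* Linear lookup by name, as in the patch's is_extension_enabled(). */
static bool
ext_list_has(const struct ext_list *list, const char *name)
{
   for (uint32_t i = 0; i < list->count; i++) {
      if (strcmp(name, list->array[i].name) == 0)
         return true;
   }
   return false;
}
```

With this shape, a device would register the common table once and then append conditional entries one call at a time, which is the use case the commit message describes.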

Re: [Mesa-dev] [PATCH] util: import sha1 implementation from OpenBSD

2017-01-13 Thread Emil Velikov
Thanks for the comments gents.

This is the type of discussion I was aiming at.

On 13 January 2017 at 22:45, Timothy Arceri
 wrote:
> On Fri, 2017-01-13 at 14:32 -0800, Jason Ekstrand wrote:
>> On Fri, Jan 13, 2017 at 2:18 PM, Timothy Arceri wrote:
>> > On Fri, 2017-01-13 at 13:59 -0800, Jason Ekstrand wrote:
>> > > On Fri, Jan 13, 2017 at 11:22 AM, Vladislav Egorov wrote:
>> > > > 13.01.2017 19:51, Emil Velikov wrote:
>> > > > > From: Emil Velikov 
>> > > > >
>> > > > > At the moment we support 5+ different implementations, each with
>> > > > > a varying number of bugs - from thread safety problems [1], to
>> > > > > outright broken implementation(s) [2]
>> > > > >
>> > > > > In order to accommodate these we have 150+ lines of configure
>> > > > > script and two extra configure toggles, whilst an actual
>> > > > > implementation is ~200loc and our current compat wrapping ~250.
>> > >
>> > > Yes, this is a problem.  Especially given that at least one of
>> > those
>> > > implementations (openssl?) is something that a certain major game
>> > > distributor likes to hard-link into things causing interesting
>> > and
>> > > hard-to-debug problems.  I am all for getting rid of the "piles
>> > of
>> > > different dependencies" approach.
>> > >
>> > > Also, something I would like to see (maybe a follow-on patch?)
>> > would
>> > > a change to the mesa internal API to be able to put the SHA
>> > context
>> > > on the stack and not need to malloc it.  It's not really a memory
>> > or
>> > > cycle-saving thing so much as it leaves one fewer cleanup paths
>> > you
>> > > have to worry about.
>> > >
>> > > > > Let's not forget that different people use different code paths,
>> > > > > which effectively makes it harder to test and debug since the
>> > > > > default implementation is automatically detected.
>> > > > >
>> > > > > To minimise all these lovely experiences, import the "100%
>> > Public
>> > > > > Domain" OpenBSD sha1 implementation. Clearly document any
>> > changes
>> > > > > needed
>> > > > > to get building correctly, since many/most of those can be
>> > > > > upstreamed
>> > > > > making future syncs easier.
>> > > > >
>> > > >
>> > > > It can hurt performance. OpenSSL implementation is optimized for
>> > > > all thinkable architectures and it will use hardware SHA-1
>> > > > instructions on newer CPUs. From
>> > > > https://github.com/openssl/openssl/blob/master/crypto/sha/asm/sha1-x86_64.pl :
>> > > >
>> > > > > Current performance is summarized in following table. Numbers
>> > > > > are CPU clock cycles spent to process single byte (less is
>> > > > > better).
>> > > > >
>> > > > >               x86_64    SSSE3        AVX[2]
>> > > > > P4            9.05      -
>> > > > > Opteron       6.26      -
>> > > > > Core2         6.55      6.05/+8%     -
>> > > > > Westmere      6.73      5.30/+27%    -
>> > > > > Sandy Bridge  7.70      6.10/+26%    4.99/+54%
>> > > > > Ivy Bridge    6.06      4.67/+30%    4.60/+32%
>> > > > > Haswell       5.45      4.15/+31%    3.57/+53%
>> > > > > Skylake       5.18      4.06/+28%    3.54/+46%
>> > > > > Bulldozer     9.11      5.95/+53%
>> > > > > VIA Nano      9.32      7.15/+30%
>> > > > > Atom          10.3      9.17/+12%
>> > > > > Silvermont    13.1(*)   9.37/+40%
>> > > > > Goldmont      8.13      6.42/+27%    1.70/+380%(**)
>> > > >
>> > > > Quick benchmark on my Haswell of the OpenBSD implementation
>> > > > compiled with GCC5 -O2: ~8 cycles per byte on 32-bit, ~7 cycles
>> > per
>> > > > byte on 64-bit. But Haswell is a very powerful CPU, on weaker
>> > CPUs
>> > > > the difference would be probably larger, especially on new CPUs
>> > > > that have SHA instruction set.
>> > >
>> > > Thanks for the numbers.  It sounds like, on Haswell, the openSSL
>> > > implementation is about 2x as fast which is very useful to know.
>> > > However, this isn't on a super perf-critical path.  We never use
>> > SHA1
>> > > on any draw-time paths; we always use a simpler hash function in
>> > > those cases and reserve SHA1 for when we really don't want
>> > > collisions.
>> >
>> > Actually the OpenGL shader cache uses it at draw time to find cached
>> > variants. I looked at pulling an implementation into Mesa a while
>> > ago but found the perf drop wasn't worth it.
>>
>> Why doesn't the usual in-memory cache stand as a front-line defense?
>
> It does :)
>
>>   Could you please be more specific about the perf implications
>> you've seen?
>
> I'm asking for a chance to test before we jump in, it's probably not a
> big deal and I may even still be able to reduce my use of hashing but
> it would be nice to be given a few days to test and even explore
> alternatives before jumping on this implementation.
>
>>  Also, which implementation were you linking to that was so much
>> 
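
One point from the quoted discussion is easy to illustrate: letting callers keep the hash context on the stack removes the allocation and its cleanup path. The sketch below shows only the API shape (init, update, final on a caller-owned struct). It assumes a stand-in hash, 64-bit FNV-1a, purely to keep the example short; it is not SHA-1 and not the Mesa or OpenBSD interface, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in streaming hash state: 64-bit FNV-1a, NOT SHA-1. The struct is
 * small and fully defined in the header so callers can declare it on the
 * stack. */
struct hash_ctx {
   uint64_t state;
};

static void
hash_init(struct hash_ctx *ctx)
{
   ctx->state = 14695981039346656037ULL;   /* FNV-1a offset basis */
}

static void
hash_update(struct hash_ctx *ctx, const void *data, size_t size)
{
   const unsigned char *bytes = data;

   for (size_t i = 0; i < size; i++) {
      ctx->state ^= bytes[i];
      ctx->state *= 1099511628211ULL;      /* FNV-1a prime */
   }
}

static uint64_t
hash_final(const struct hash_ctx *ctx)
{
   return ctx->state;
}

/* Hash two buffers as one stream. No malloc, so there is no failure
 * path and nothing to free on early return. */
static uint64_t
hash_two_parts(const void *a, size_t a_size, const void *b, size_t b_size)
{
   struct hash_ctx ctx;

   hash_init(&ctx);
   hash_update(&ctx, a, a_size);
   hash_update(&ctx, b, b_size);
   return hash_final(&ctx);
}
```

The trade-off is that the context layout becomes part of the internal API, which is acceptable when the implementation lives in the same tree.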

Re: [Mesa-dev] [PATCH 3/3] radv: make device extension setup dynamic

2017-01-13 Thread Andres Rodriguez



On 2017-01-13 06:30 PM, Bas Nieuwenhuizen wrote:

On Sat, Jan 14, 2017 at 12:20 AM, Andres Rodriguez  wrote:


On 2017-01-13 06:04 PM, Bas Nieuwenhuizen wrote:

On Fri, Jan 13, 2017 at 11:06 PM, Andres Rodriguez 
wrote:

Each physical may have different extensions than one another.
Furthermore, depending on the software stack, some extensions may not be
accessible.

If an extension is conditional, it can be registered only when
necessary.

Signed-off-by: Andres Rodriguez 
---
   src/amd/vulkan/radv_device.c  | 196
--
   src/amd/vulkan/radv_private.h |   6 ++
   2 files changed, 137 insertions(+), 65 deletions(-)

diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
index 5669fd7..0333688 100644
--- a/src/amd/vulkan/radv_device.c
+++ b/src/amd/vulkan/radv_device.c
@@ -77,6 +77,115 @@ radv_device_get_cache_uuid(enum radeon_family family,
void *uuid)
  return 0;
   }

+static const VkExtensionProperties instance_extensions[] = {
+   {
+   .extensionName = VK_KHR_SURFACE_EXTENSION_NAME,
+   .specVersion = 25,
+   },
+#ifdef VK_USE_PLATFORM_XCB_KHR
+   {
+   .extensionName = VK_KHR_XCB_SURFACE_EXTENSION_NAME,
+   .specVersion = 6,
+   },
+#endif
+#ifdef VK_USE_PLATFORM_XLIB_KHR
+   {
+   .extensionName = VK_KHR_XLIB_SURFACE_EXTENSION_NAME,
+   .specVersion = 6,
+   },
+#endif
+#ifdef VK_USE_PLATFORM_WAYLAND_KHR
+   {
+   .extensionName = VK_KHR_WAYLAND_SURFACE_EXTENSION_NAME,
+   .specVersion = 5,
+   },
+#endif
+};
+
+static const VkExtensionProperties common_device_extensions[] = {
+   {
+   .extensionName =
VK_KHR_SAMPLER_MIRROR_CLAMP_TO_EDGE_EXTENSION_NAME,
+   .specVersion = 1,
+   },
+   {
+   .extensionName = VK_KHR_SWAPCHAIN_EXTENSION_NAME,
+   .specVersion = 68,
+   },
+   {
+   .extensionName =
VK_AMD_DRAW_INDIRECT_COUNT_EXTENSION_NAME,
+   .specVersion = 1,
+   },
+   {
+   .extensionName =
VK_AMD_NEGATIVE_VIEWPORT_HEIGHT_EXTENSION_NAME,
+   .specVersion = 1,
+   },
+};
+
+static VkResult
+radv_extensions_register(struct radv_instance *instance,
+   struct radv_extensions
*extensions,
+   const VkExtensionProperties
*new_ext,
+   uint32_t num_ext)
+{
+   size_t new_size;
+   VkExtensionProperties *new_ptr;
+
+   assert(new_ext && num_ext > 0);
+
+   if (!new_ext)
+   return VK_ERROR_INITIALIZATION_FAILED;
+
+   new_size = (extensions->num_ext + num_ext) *
sizeof(VkExtensionProperties);
+   new_ptr = vk_realloc(&instance->alloc, extensions->ext_array,
+   new_size, 8,
VK_SYSTEM_ALLOCATION_SCOPE_INSTANCE);
+
+   /* Old array continues to be valid, update nothing */
+   if (!new_ptr)
+   return VK_ERROR_OUT_OF_HOST_MEMORY;
+
+   memcpy(&new_ptr[extensions->num_ext], new_ext,
+   num_ext * sizeof(VkExtensionProperties));
+   extensions->ext_array = new_ptr;
+   extensions->num_ext += num_ext;
+
+   return VK_SUCCESS;
+}
+
+#define radv_extensions_register_single(instance, extensions, name,
version) \
+   radv_extensions_register(instance, extensions, \
+
&(VkExtensionProperties){ \
+
.extensionName = name, \
+
.specVersion = version \
+   }, 1);

Please make this a function, I see no reason to keep this a macro. Or
lose it, as I can't find a user in this patch.

+
+static void
+radv_extensions_finish(struct radv_instance *instance,
+   struct radv_extensions
*extensions)
+{
+   assert(extensions);
+
+   if (!extensions)
+   radv_loge("Attemted to free invalid extension struct\n");
+
+   if (extensions->ext_array)
+   vk_free(&instance->alloc, extensions->ext_array);
+}
+
+static bool
+is_extension_enabled(const VkExtensionProperties *extensions,
+   size_t num_ext,
+   const char *name)
+{
+   assert(extensions && name);
+
+   for (uint32_t i = 0; i < num_ext; i++) {
+   if (strcmp(name, extensions[i].extensionName) == 0)
+   return true;
+   }
+
+   return false;
+}
+
   static VkResult
   radv_physical_device_init(struct radv_physical_device *device,
struct radv_instance *instance,
@@ -130,6 +239,13 @@ radv_physical_device_init(struct
radv_physical_device *device,
  goto fail;
  }

+   result = radv_extensions_register(instance,
+   &device->extensions,
+

Re: [Mesa-dev] [PATCH] i965/vec4: Fix mapping attributes

2017-01-13 Thread Kenneth Graunke
On Friday, January 13, 2017 1:41:34 PM PST Jordan Justen wrote:
> From: "Juan A. Suarez Romero" 
> 
> This patch reverts 57bab6708f2bbc1ab8a3d202e9a467963596d462, which was
> causing issues with ILK and earlier VS programs.
> 
> 1. Revert "i965/vec4/nir: vec4 also needs to remap vs attributes"
> 
>Do not perform a remap in vec4 backend. Rather, do it later when
>setup attributes
> 
> 2. This fixes mapping ATTRx to proper GRFn.
> 
> Suggested-by: Kenneth Graunke 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99391
> [jordan.l.jus...@intel.com: merge Juan's two patches from bugzilla]
> Signed-off-by: Jordan Justen 
> Cc: Kenneth Graunke 
> ---
>  I merged Juan's revert + fix patches, as suggested by Ken.
> 
>  I put this patch through jenkins, and they appeared to fix the
>  ilk/g45/g965 regressions.
> 
>  The revert is in brw_nir.c, and Juan's new change is in brw_vec4.cpp.
> 
>  src/mesa/drivers/dri/i965/brw_nir.c| 32 ++--
>  src/mesa/drivers/dri/i965/brw_vec4.cpp |  2 +-
>  2 files changed, 11 insertions(+), 23 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_nir.c 
> b/src/mesa/drivers/dri/i965/brw_nir.c
> index b39e2b1f523..3c1bc5162fc 100644
> --- a/src/mesa/drivers/dri/i965/brw_nir.c
> +++ b/src/mesa/drivers/dri/i965/brw_nir.c
> @@ -95,19 +95,9 @@ add_const_offset_to_base(nir_shader *nir, 
> nir_variable_mode mode)
> }
>  }
>  
> -struct remap_vs_attrs_params {
> -   shader_info *nir_info;
> -   bool is_scalar;
> -};
> -
>  static bool
> -remap_vs_attrs(nir_block *block, void *closure)
> +remap_vs_attrs(nir_block *block, shader_info *nir_info)
>  {
> -   struct remap_vs_attrs_params *params =
> -  (struct remap_vs_attrs_params *) closure;
> -   shader_info *nir_info = params->nir_info;
> -   bool is_scalar = params->is_scalar;
> -
> nir_foreach_instr(instr, block) {
>if (instr->type != nir_instr_type_intrinsic)
>   continue;
> @@ -123,7 +113,7 @@ remap_vs_attrs(nir_block *block, void *closure)
>   int attr = intrin->const_index[0];
>   int slot = _mesa_bitcount_64(nir_info->inputs_read &
>BITFIELD64_MASK(attr));
> - intrin->const_index[0] = is_scalar ? 4 * slot : slot;
> + intrin->const_index[0] = 4 * slot;
>}
> }
> return true;
> @@ -267,11 +257,6 @@ brw_nir_lower_vs_inputs(nir_shader *nir,
>  bool use_legacy_snorm_formula,
>  const uint8_t *vs_attrib_wa_flags)
>  {
> -   struct remap_vs_attrs_params params = {
> -  .nir_info = nir->info,
> -  .is_scalar = is_scalar
> -   };
> -
> /* Start with the location of the variable's base. */
> foreach_list_typed(nir_variable, var, node, &nir->inputs) {
>var->data.driver_location = var->data.location;
> @@ -291,11 +276,14 @@ brw_nir_lower_vs_inputs(nir_shader *nir,
> brw_nir_apply_attribute_workarounds(nir, use_legacy_snorm_formula,
> vs_attrib_wa_flags);
>  
> -   /* Finally, translate VERT_ATTRIB_* values into the actual registers. */
> -   nir_foreach_function(function, nir) {
> -  if (function->impl) {
> - nir_foreach_block(block, function->impl) {
> -remap_vs_attrs(block, &params);
> +   if (is_scalar) {
> +  /* Finally, translate VERT_ATTRIB_* values into the actual registers. 
> */
> +
> +  nir_foreach_function(function, nir) {
> + if (function->impl) {
> +nir_foreach_block(block, function->impl) {
> +   remap_vs_attrs(block, nir->info);
> +}
>   }
>}
> }
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
> b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> index 748a068b142..5e60eb657a7 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> @@ -1739,7 +1739,7 @@ vec4_vs_visitor::setup_attributes(int payload_reg)
>int needed_slots =
>   (vs_prog_data->double_inputs_read & BITFIELD64_BIT(first)) ? 2 : 1;
>for (int c = 0; c < needed_slots; c++) {
> - attribute_map[nr_attributes] = payload_reg + nr_attributes;
> + attribute_map[first + c] = payload_reg + nr_attributes;
>   nr_attributes++;
>   vs_inputs &= ~BITFIELD64_BIT(first + c);
>}
> 

Thanks Juan and Jordan!

Reviewed-by: Kenneth Graunke 
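
For anyone reading the one-line vec4 change without the surrounding code: the fix indexes the map by attribute slot rather than by a running counter, so that sparse inputs still land in consecutive payload registers. A minimal sketch of that loop is below, assuming a simplified 64-slot model; `setup_attribute_map` and `MAX_ATTRS` are hypothetical names and this is not the i965 visitor code.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_ATTRS 64

/* For every enabled input slot, record which payload register holds it.
 * The map is indexed by slot number (first + c); registers are handed out
 * consecutively starting at payload_reg. Slots flagged in double_inputs
 * take two registers. Returns the number of registers assigned. */
static int
setup_attribute_map(uint64_t inputs, uint64_t double_inputs,
                    int payload_reg, int attribute_map[MAX_ATTRS])
{
   int nr_attributes = 0;

   for (int slot = 0; slot < MAX_ATTRS; slot++)
      attribute_map[slot] = -1;   /* -1 marks an unused slot */

   for (int first = 0; first < MAX_ATTRS; first++) {
      if (!(inputs & (UINT64_C(1) << first)))
         continue;

      int needed = (double_inputs & (UINT64_C(1) << first)) ? 2 : 1;
      for (int c = 0; c < needed && first + c < MAX_ATTRS; c++) {
         attribute_map[first + c] = payload_reg + nr_attributes;
         nr_attributes++;
         /* Clear the bit so the second half of a double is not revisited. */
         inputs &= ~(UINT64_C(1) << (first + c));
      }
   }

   return nr_attributes;
}
```

Indexing by the running counter instead would fill entries 0, 1, 2 and so on regardless of which slots are actually enabled, which is the behaviour the patch replaces.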




[Mesa-dev] [PATCH 1/2] i965: Combine some dead code elimination NOP'ing code.

2017-01-13 Thread Kenneth Graunke
In theory we might have incorrectly NOP'd instructions that write the
flag, but where that flag value isn't used, and yet the instruction
either writes the accumulator or has side effects.

I don't believe any such instructions exist, so this is mostly a
code cleanup.

Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
index 8a0469a51b9..930dc733b45 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
@@ -70,17 +70,11 @@ fs_visitor::dead_code_eliminate()
 }
  }
 
- if (inst->dst.is_null() && inst->flags_written()) {
-if (!(flag_live[0] & inst->flags_written())) {
-   inst->opcode = BRW_OPCODE_NOP;
-   progress = true;
-}
- }
-
  if ((inst->opcode != BRW_OPCODE_IF &&
   inst->opcode != BRW_OPCODE_WHILE) &&
  inst->dst.is_null() &&
  !inst->has_side_effects() &&
+ !(flag_live[0] & inst->flags_written()) &&
  !inst->flags_written() &&
  !inst->writes_accumulator) {
 inst->opcode = BRW_OPCODE_NOP;
-- 
2.11.0



[Mesa-dev] [PATCH 2/2] i965: Make DCE set null destinations on messages with side effects.

2017-01-13 Thread Kenneth Graunke
(Co-authored by Matt Turner.)

Image atomics, for example, return a value - but the shader may not
want to use it.  We assigned a useless VGRF destination.  This seemed
harmless, but it can actually be quite harmful.  The register allocator
has to assign that VGRF to a real register.  It may assign the same
actual GRF to the destination of an instruction that follows soon after.

This results in a write-after-write (WAW) dependency, and stall.

A number of "Deus Ex: Mankind Divided" shaders use image atomics, but
don't use the return value.  Several of these were hitting WAW stalls
for nearly 14,000 (poorly estimated) cycles a pop.  Making dead code
elimination null out the destination avoids this issue.

This patch cuts one shader's estimated cycles by 98.39%!  Removing the
message response should also help with data cluster bandwidth.

On Skylake:

total instructions in shared programs: 13366907 -> 13363051 (-0.03%)
instructions in affected programs: 49635 -> 45779 (-7.77%)
helped: 133
HURT: 0

total cycles in shared programs: 255433388 -> 248081818 (-2.88%)
cycles in affected programs: 12370702 -> 5019132 (-59.43%)
helped: 100
HURT: 24

Signed-off-by: Kenneth Graunke 
---
 .../dri/i965/brw_fs_dead_code_eliminate.cpp| 56 --
 1 file changed, 41 insertions(+), 15 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
index 930dc733b45..885ae2638a8 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
@@ -34,6 +34,43 @@
  * yet in the tail end of this block.
  */
 
+static bool
+can_eliminate(const fs_inst *inst, BITSET_WORD *flag_live)
+{
+return inst->opcode != BRW_OPCODE_IF &&
+   inst->opcode != BRW_OPCODE_WHILE &&
+   !inst->has_side_effects() &&
+   !(flag_live[0] & inst->flags_written()) &&
+   !inst->writes_accumulator;
+}
+
+static bool
+can_omit_write(const fs_inst *inst, BITSET_WORD *flag_live)
+{
+   switch (inst->opcode) {
+   case SHADER_OPCODE_UNTYPED_ATOMIC:
+   case SHADER_OPCODE_UNTYPED_ATOMIC_LOGICAL:
+   case SHADER_OPCODE_TYPED_ATOMIC:
+   case SHADER_OPCODE_TYPED_ATOMIC_LOGICAL:
+  return true;
+   default:
+  /* If we're going to eliminate the instruction entirely, omitting the
+   * write is always safe.
+   */
+  if (can_eliminate(inst, flag_live))
+ return true;
+
+  /* We can eliminate the destination write for ordinary instructions,
+   * but not most SENDs.
+   */
+  if (inst->opcode < 128 && inst->mlen == 0)
+ return true;
+
+  /* It might not be safe for other virtual opcodes. */
+  return false;
+   }
+}
+
 bool
 fs_visitor::dead_code_eliminate()
 {
@@ -52,31 +89,20 @@ fs_visitor::dead_code_eliminate()
  sizeof(BITSET_WORD));
 
   foreach_inst_in_block_reverse_safe(fs_inst, inst, block) {
- if (inst->dst.file == VGRF && !inst->has_side_effects()) {
+ if (inst->dst.file == VGRF) {
 const unsigned var = live_intervals->var_from_reg(inst->dst);
 bool result_live = false;
 
 for (unsigned i = 0; i < regs_written(inst); i++)
result_live |= BITSET_TEST(live, var + i);
 
-if (!result_live) {
+if (!result_live && can_omit_write(inst, flag_live)) {
+   inst->dst = fs_reg(retype(brw_null_reg(), inst->dst.type));
progress = true;
-
-   if (inst->writes_accumulator || inst->flags_written()) {
-  inst->dst = fs_reg(retype(brw_null_reg(), inst->dst.type));
-   } else {
-  inst->opcode = BRW_OPCODE_NOP;
-   }
 }
  }
 
- if ((inst->opcode != BRW_OPCODE_IF &&
-  inst->opcode != BRW_OPCODE_WHILE) &&
- inst->dst.is_null() &&
- !inst->has_side_effects() &&
- !(flag_live[0] & inst->flags_written()) &&
- !inst->flags_written() &&
- !inst->writes_accumulator) {
+ if (inst->dst.is_null() && can_eliminate(inst, flag_live)) {
 inst->opcode = BRW_OPCODE_NOP;
 progress = true;
  }
-- 
2.11.0



Re: [Mesa-dev] [PATCH 3/3] radv: make device extension setup dynamic

2017-01-13 Thread Bas Nieuwenhuizen
On Fri, Jan 13, 2017 at 11:06 PM, Andres Rodriguez  wrote:
> Each physical may have different extensions than one another.
> Furthermore, depending on the software stack, some extensions may not be
> accessible.
>
> If an extension is conditional, it can be registered only when
> necessary.
>
> Signed-off-by: Andres Rodriguez 
> ---
>  src/amd/vulkan/radv_device.c  | 196 
> --
>  src/amd/vulkan/radv_private.h |   6 ++
>  2 files changed, 137 insertions(+), 65 deletions(-)
>
> diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
> index 5669fd7..0333688 100644
> --- a/src/amd/vulkan/radv_device.c
> +++ b/src/amd/vulkan/radv_device.c
> @@ -77,6 +77,115 @@ radv_device_get_cache_uuid(enum radeon_family family, 
> void *uuid)
> return 0;
>  }
>
> +static const VkExtensionProperties instance_extensions[] = {
> +   {
> +   .extensionName = VK_KHR_SURFACE_EXTENSION_NAME,
> +   .specVersion = 25,
> +   },
> +#ifdef VK_USE_PLATFORM_XCB_KHR
> +   {
> +   .extensionName = VK_KHR_XCB_SURFACE_EXTENSION_NAME,
> +   .specVersion = 6,
> +   },
> +#endif
> +#ifdef VK_USE_PLATFORM_XLIB_KHR
> +   {
> +   .extensionName = VK_KHR_XLIB_SURFACE_EXTENSION_NAME,
> +   .specVersion = 6,
> +   },
> +#endif
> +#ifdef VK_USE_PLATFORM_WAYLAND_KHR
> +   {
> +   .extensionName = VK_KHR_WAYLAND_SURFACE_EXTENSION_NAME,
> +   .specVersion = 5,
> +   },
> +#endif
> +};
> +
> +static const VkExtensionProperties common_device_extensions[] = {
> +   {
> +   .extensionName = 
> VK_KHR_SAMPLER_MIRROR_CLAMP_TO_EDGE_EXTENSION_NAME,
> +   .specVersion = 1,
> +   },
> +   {
> +   .extensionName = VK_KHR_SWAPCHAIN_EXTENSION_NAME,
> +   .specVersion = 68,
> +   },
> +   {
> +   .extensionName = VK_AMD_DRAW_INDIRECT_COUNT_EXTENSION_NAME,
> +   .specVersion = 1,
> +   },
> +   {
> +   .extensionName = 
> VK_AMD_NEGATIVE_VIEWPORT_HEIGHT_EXTENSION_NAME,
> +   .specVersion = 1,
> +   },
> +};
> +
> +static VkResult
> +radv_extensions_register(struct radv_instance *instance,
> +   struct radv_extensions *extensions,
> +   const VkExtensionProperties *new_ext,
> +   uint32_t num_ext)
> +{
> +   size_t new_size;
> +   VkExtensionProperties *new_ptr;
> +
> +   assert(new_ext && num_ext > 0);
> +
> +   if (!new_ext)
> +   return VK_ERROR_INITIALIZATION_FAILED;
> +
> +   new_size = (extensions->num_ext + num_ext) * sizeof(VkExtensionProperties);
> +   new_ptr = vk_realloc(&instance->alloc, extensions->ext_array,
> +                        new_size, 8, VK_SYSTEM_ALLOCATION_SCOPE_INSTANCE);
> +
> +   /* Old array continues to be valid, update nothing */
> +   if (!new_ptr)
> +   return VK_ERROR_OUT_OF_HOST_MEMORY;
> +
> +   memcpy(&new_ptr[extensions->num_ext], new_ext,
> +   num_ext * sizeof(VkExtensionProperties));
> +   extensions->ext_array = new_ptr;
> +   extensions->num_ext += num_ext;
> +
> +   return VK_SUCCESS;
> +}
> +
> +#define radv_extensions_register_single(instance, extensions, name, version) \
> +   radv_extensions_register(instance, extensions, \
> +                            &(VkExtensionProperties){ \
> +                               .extensionName = name, \
> +                               .specVersion = version \
> +                            }, 1);

Please make this a function; I see no reason to keep it as a macro. Or
drop it, as I can't find a user in this patch.
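
For illustration, a minimal sketch of what the function form might look like. To keep it compilable on its own, the Vulkan and radv types are replaced by hypothetical stand-ins (example_ext_props, example_extensions, example_register), and plain realloc() stands in for vk_realloc(). None of these names come from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for VkExtensionProperties. */
struct example_ext_props {
   char name[256];
   uint32_t version;
};

/* Hypothetical stand-in for struct radv_extensions. */
struct example_extensions {
   struct example_ext_props *ext_array;
   uint32_t num_ext;
};

/* Stand-in for radv_extensions_register(): appends num_ext entries.
 * Returns 0 on success, -1 on allocation failure (old array stays valid). */
static int
example_register(struct example_extensions *extensions,
                 const struct example_ext_props *new_ext,
                 uint32_t num_ext)
{
   assert(extensions && new_ext && num_ext > 0);

   size_t new_size = ((size_t)extensions->num_ext + num_ext) * sizeof(*new_ext);
   struct example_ext_props *new_ptr = realloc(extensions->ext_array, new_size);
   if (!new_ptr)
      return -1;

   memcpy(&new_ptr[extensions->num_ext], new_ext, num_ext * sizeof(*new_ext));
   extensions->ext_array = new_ptr;
   extensions->num_ext += num_ext;
   return 0;
}

/* Function replacement for the macro: builds one entry and registers it. */
static int
example_register_single(struct example_extensions *extensions,
                        const char *name, uint32_t version)
{
   struct example_ext_props ext = { .version = version };

   assert(name && strlen(name) < sizeof(ext.name));
   /* ext is zero-initialized, so the copied name stays NUL-terminated. */
   memcpy(ext.name, name, strlen(name));

   return example_register(extensions, &ext, 1);
}
```

Compared with the macro, the function gives type checking on name and version and avoids the trailing semicolon inside the macro body, which would misbehave in an unbraced if/else.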
> +
> +static void
> +radv_extensions_finish(struct radv_instance *instance,
> +   struct radv_extensions 
> *extensions)
> +{
> +   assert(extensions);
> +
> +   if (!extensions)
> +   radv_loge("Attempted to free invalid extension struct\n");
> +
> +   if (extensions->ext_array)
> +   vk_free(&instance->alloc, extensions->ext_array);
> +}
> +
> +static bool
> +is_extension_enabled(const VkExtensionProperties *extensions,
> +   size_t num_ext,
> +   const char *name)
> +{
> +   assert(extensions && name);
> +
> +   for (uint32_t i = 0; i < num_ext; i++) {
> +   if (strcmp(name, extensions[i].extensionName) == 0)
> +   return true;
> +   }
> +
> +   return false;
> +}
> +
>  static VkResult
>  radv_physical_device_init(struct 

Re: [Mesa-dev] [PATCH 4/4] glsl: Use hash table cloning in copy propagation

2017-01-13 Thread Connor Abbott
On Fri, Jan 13, 2017 at 1:55 PM, Thomas Helland
 wrote:
> 2017-01-13 18:41 GMT+01:00 Vladislav Egorov :
>> 13.01.2017 15:31, Tapani Pälli wrote:
>>>
>>>
>>>
>>> On 01/12/2017 09:23 PM, Thomas Helland wrote:

>>>> Walking the whole hash table, inserting entries by hashing them first
>>>> is just a really really bad idea. We can simply memcpy the whole thing.
>>>
>>>
>>> Maybe it is just 'really' not 'really really' since I don't spot any
>>> difference in time running the torture test in bug #94477 (oscillates close
>>> to 120s with both with and without these patches), I would expect at least
>>> some difference as it is utilizing this path a lot. Did you measure
>>> performance difference?
>>>
>>
>> It wouldn't help the torture case from the bug, because that shader doesn't
>> have LOOP and IF blocks, so more efficient copying the ACP for LOOP/IF
>> blocks would not be even touched.
>>
>> Quick benchmark of Tom's patches on shader-db.
>>
>> Default shader-db, ./run -1, 10 runs:
>>
>>               BEFORE  AFTER
>> softpipe      3.20s   3.15s
>> radeonsi      5.17s   5.12s
>> i965/Haswell  7.33s   7.19s
>>
>> On my full shader-db (50K+ shaders from games):
>>
>>                    BEFORE   AFTER
>> softpipe (5 runs)  156.6s   153.9s
>> i965               625s     613s
>>
>> So it brings a 1-2% speedup across the board.
>
> What he said. It only helps when there are if's or loops.
> The other patch I wrote based on Connor's suggestion makes a big impact.
> But as he found out, and I confirmed, yesterday, the approach doesn't work.
> So it is back to the drawing board on that one. And I thought I was so close :-/

Did you try deep copying the hash table (making new copies of the
acp_entry's and fixing up their pointers)? It won't be as fast, and
you can't do this optimization, but it might still be faster.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 3/4] anv: generate entry points from vk.xml

2017-01-13 Thread Lionel Landwerlin

On 13/01/17 15:22, Emil Velikov wrote:

On 13 January 2017 at 15:02, Lionel Landwerlin
 wrote:

On 13/01/17 14:31, Emil Velikov wrote:

On 13 January 2017 at 12:00, Lionel Landwerlin
 wrote:

v2: rework entry point iteration (Jason)
  cleanup unused imports

Signed-off-by: Lionel Landwerlin 
---
   src/intel/vulkan/Makefile.am|  15 ++--
   src/intel/vulkan/anv_entrypoints_gen.py | 126

   2 files changed, 71 insertions(+), 70 deletions(-)

diff --git a/src/intel/vulkan/Makefile.am b/src/intel/vulkan/Makefile.am
index df7645fb13..d32b57f267 100644
--- a/src/intel/vulkan/Makefile.am
+++ b/src/intel/vulkan/Makefile.am
@@ -23,11 +23,6 @@ include Makefile.sources

   vulkan_includedir = $(includedir)/vulkan

-vulkan_include_HEADERS = \
-   $(top_srcdir)/include/vulkan/vk_platform.h \
-   $(top_srcdir)/include/vulkan/vulkan.h \
-   $(top_srcdir)/include/vulkan/vulkan_intel.h
-

I think that in the long term we might want to remove the local
headers/xml files and use the Khronos ones directly. Distros, Arch at
least, already ship them in separate package.

Until then we really want to keep the above hunk. vulkan_intel.h must
always be there - Mesa is the canonical source of it, afaict.


But they're not a dependency to build the entry point files anymore with
these 2 patches.
So why have them as a dependency?


You've already handled dependency tracking nicely in the now missing hunk.
The above 4-5 lines install the headers.


Oh right!
Thanks!



Thanks
Emil


Re: [Mesa-dev] [PATCH] st/va: delay calling begin_frame until we have all parameters

2017-01-13 Thread Andy Furniss

Nayan Deshmukh wrote:

On Fri, Jan 13, 2017 at 8:32 PM, Andy Furniss  wrote:


Nayan Deshmukh wrote:


Hi Andy,

Please test this patch for regressions.



Do you have a testcase to show the fix?

TBH I've not tested gstreamer with mpeg2 before as vaapi mpeg2
h/w dec never worked properly anyway.

https://bugs.freedesktop.org/show_bug.cgi?id=93760

With mpv --hwdec=vaapi it doesn't seem to regress anything.


I was talking about --hwdec=vaapi. Before this patch I was not able to play
any mpeg videos with vaapi as mpv --hwdec=vaapi --vo=vaapi always
segfaulted. With this patch I can see videos properly. Just wanted to
make sure it did not cause any regression when using hardware decoder.


Oh, OK, I can't reproduce that with mpv, but it will still just assert 
with mesa debug build


mpv: picture_mpeg12.c:84: vlVaHandleSliceParameterBufferMPEG12: 
Assertion `buf->size >= sizeof(VASliceParameterBufferMPEG2) && 
buf->num_elements == 1' failed.


Or play with non debug build, but depending on source vid may be
slightly corrupted.

Would be interesting to see if you see the same with this vid
which easily shows the corruption.

https://drive.google.com/drive/folders/0BxP5-S1t9VEEbkR4dWhTUFozV2s?usp=sharing

Looks bad with --hwdec=vaapi, with or without --vo=vaapi

OK with --hwdec=vdpau --vo=vdpau (just --hwdec=vdpau will be slightly wrong
currently as there is a vdpau gl interop bug that causes half res)



More generally - it's really good you are working on vaapi - I don't
know what you've discussed with anyone but did you see the old threads
around VAAPI_DISABLE_INTERLACE?


I haven't discussed it with anyone but I will try reading the old threads
and the bug reports.


Thanks.



Re: [Mesa-dev] NaN behavior in GLSL (was Re: [PATCH] glsl: always do sqrt(abs()) and inversesqrt(abs()))

2017-01-13 Thread Jason Ekstrand
On Fri, Jan 13, 2017 at 4:05 AM, Marek Olšák  wrote:

> On Fri, Jan 13, 2017 at 3:37 AM, Ilia Mirkin  wrote:
> > On Thu, Jan 12, 2017 at 9:13 PM, Jason Ekstrand wrote:
> >> Unless, of course, it's controlled by the same hardware bit... Clearly, we
> >> can give you abs on rsq without denorm flushing (easy shader hacks) but
> >> not the other way around.
> >
> > OK, so somehow I missed that earlier. However there's an interesting
> > section in the PRM:
> >
> > https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-skl-vol07-3d_media_gpgpu.pdf
> >
> > on PDF page 854, "Dismissed Legacy Behaviors" which has a list of
> > suggested IEEE 754 deviations for DX9. One of them is indeed that 0 *
> > x = 0, but another is that input NaNs be propagated with certain
> > exceptions. Also they suggest that RCP(0)/RSQ(0) = fmax. Interesting.
> >
> > So at this point, the zero_wins thing is pretty much blown. i965
> > appears to have an all-or-nothing approach, and additionally that
> > approach doesn't match up exactly to what NVIDIA does (or at least I'm
> > not aware of a clamp-everything mode).
> >
> > This will take some thought to figure out how something can be
> > specified so that a single spec works for both i965 and nv/amd. OTOH
> > we could have two different specs that just expose different things -
> > e.g. i965 could expose a MESA_shader_float_alt_mode or whatever which
> > is spec'd to do the things that the PRM says, and nv/amd have the
> > MESA_shader_float_zero_wins ext which does what we were talking about
> > earlier.
> >
> > I'm open to other suggestions too.
>
> There is also the "small" problem that it would take a non-trivial
> effort for us on the LLVM side. You guys can flip a switch. We can't.
>

Don't you have to expend that effort for ARB programs anyway?  I thought
they weren't supposed to generate NaN either.


Re: [Mesa-dev] [PATCH] anv: increase ANV_MAX_STATE_SIZE_LOG2 limit to 256 kB

2017-01-13 Thread Jason Ekstrand
On Fri, Jan 13, 2017 at 1:33 AM, Samuel Iglesias Gonsálvez <
sigles...@igalia.com> wrote:

> Fixes crash in dEQP-VK.ubo.random.all_shared_buffer.48 due to a
> fragment shader code bigger than 128 kB.
>
> This patch increases the allocation size limit to 256 kB.
>

That limit will have to be changed in two places: where you did, and also
where we init the instruction_block_pool in anv_device.c.


> Signed-off-by: Samuel Iglesias Gonsálvez 
> ---
>
> OpenGL driver compares the code size against the allocated cache buffer
> object size and allocates more if needed. I don't know how plausible
> is to have something similar in ANV instead of hardcoding the maximum
> size.
>
> Anyway, if setting it to 256 kB is too much for some reason, please discard
> this patch.
>
>  src/intel/vulkan/anv_private.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
> index 2342fcbfeb4..cd3f17648dd 100644
> --- a/src/intel/vulkan/anv_private.h
> +++ b/src/intel/vulkan/anv_private.h
> @@ -393,7 +393,7 @@ struct anv_fixed_size_state_pool {
>  };
>
>  #define ANV_MIN_STATE_SIZE_LOG2 6
> -#define ANV_MAX_STATE_SIZE_LOG2 17
> +#define ANV_MAX_STATE_SIZE_LOG2 18
>

Let's make it 1 MB while we're at it.  I'm tired of bumping this.
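
For reference, the limit is an exponent, so each bump doubles the maximum state size: 17 gives 128 kB, 18 gives 256 kB, and 1 MB would correspond to 20. A small sketch of that arithmetic follows; the helper names are illustrative and are not anv functions.

```c
#include <assert.h>
#include <stdint.h>

/* Largest state size, in bytes, for a given log2 limit. */
static uint32_t
max_state_size_bytes(unsigned size_log2)
{
   assert(size_log2 < 32);
   return UINT32_C(1) << size_log2;
}

/* Number of power-of-two buckets between two log2 limits, inclusive,
 * mirroring the ANV_STATE_BUCKETS expression in the quoted hunk. */
static unsigned
state_bucket_count(unsigned min_log2, unsigned max_log2)
{
   assert(min_log2 <= max_log2);
   return max_log2 - min_log2 + 1;
}
```

With the minimum left at 6, going from 18 to 20 would add two buckets to the count.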


>
>  #define ANV_STATE_BUCKETS (ANV_MAX_STATE_SIZE_LOG2 - ANV_MIN_STATE_SIZE_LOG2 + 1)
>
> --
> 2.11.0
>
>


Re: [Mesa-dev] [PATCH] egl/wayland: resolve quirky try_damage_buffer() implementation

2017-01-13 Thread Derek Foreman

On 13/01/17 11:27 AM, Emil Velikov wrote:

From: Emil Velikov 

The implementation was added with commit d085a5dff5b and effectively
provided a hidden dependency.

Namely: the codepath used was determined solely during build time. Thus
if we built against new wayland and then ran against an older one (yet still
within the requirements, as per the configure), we would get undefined
symbols.


indeed. :(


As of earlier commit 36b9976e1f9 "egl/wayland: Avoid race conditions
when on non-main thread" the required version was bumped to one which
provides the API, thus we can drop the quirky solution.

Cc: Derek Foreman 
Signed-off-by: Emil Velikov 


I'd forgotten this was still in place - thanks for coming back to it.

Reviewed-by: Derek Foreman 


---
One way to avoid the issue w/o bumping the requirement (for -stable) is
to add fall-back define alongside weak implementation of the functions.
The latter should "return false" and will get automatically overridden
if new enough wayland is used.


Not sure how much one should care - just thinking out loud.


If one should care, that sounds like a good way to go about it...

Let me know if you want me to care, and I'll write it up.
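
A rough sketch of the weak-fallback idea, for reference. The function name below is hypothetical and is not a wayland or Mesa symbol. It assumes a GCC/Clang-style weak attribute. When object files are linked together, a strong definition elsewhere replaces the weak one; how this plays out across shared-library boundaries depends on the dynamic linker's lookup order, which this sketch does not cover.

```c
#include <stdbool.h>

/* Hypothetical probe, not a real wayland or Mesa symbol. With no strong
 * definition linked in, this weak fallback is used and reports "no". */
__attribute__((weak)) bool
example_have_damage_buffer(void)
{
   return false;
}

/* The caller picks the code path at run time instead of at build time. */
static const char *
example_damage_path(void)
{
   return example_have_damage_buffer() ? "damage_buffer" : "damage";
}
```

Built on its own, the fallback is the only definition, so the older "damage" path is taken.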

Thanks,
Derek


---
 src/egl/drivers/dri2/platform_wayland.c | 10 --
 1 file changed, 10 deletions(-)

diff --git a/src/egl/drivers/dri2/platform_wayland.c b/src/egl/drivers/dri2/platform_wayland.c
index 4009cc9691..3057604d3c 100644
--- a/src/egl/drivers/dri2/platform_wayland.c
+++ b/src/egl/drivers/dri2/platform_wayland.c
@@ -669,14 +669,6 @@ try_damage_buffer(struct dri2_egl_surface *dri2_surf,
   const EGLint *rects,
   EGLint n_rects)
 {
-/* The WL_SURFACE_DAMAGE_BUFFER_SINCE_VERSION macro and
- * wl_proxy_get_version() were both introduced in wayland 1.10.
- * Instead of bumping our wayland dependency we just make this
- * function conditional on the required 1.10 features, falling
- * back to old (correct but suboptimal) behaviour for older
- * wayland.
- */
-#ifdef WL_SURFACE_DAMAGE_BUFFER_SINCE_VERSION
int i;

if (wl_proxy_get_version((struct wl_proxy *) dri2_surf->wl_win->surface)
@@ -692,8 +684,6 @@ try_damage_buffer(struct dri2_egl_surface *dri2_surf,
rect[2], rect[3]);
}
return EGL_TRUE;
-#endif
-   return EGL_FALSE;
 }
 /**
  * Called via eglSwapBuffers(), drv->API.SwapBuffers().





Re: [Mesa-dev] [PATCH] util: import sha1 implementation from OpenBSD

2017-01-13 Thread Jonathan Gray
On Fri, Jan 13, 2017 at 04:51:31PM +, Emil Velikov wrote:
> From: Emil Velikov 
> 
> At the moment we support 5+ different implementations each with varying
> amount of bugs - from thread safety problems [1], to outright broken
> implementation(s) [2]
> 
> In order to accommodate these we have 150+ lines of configure script and
> two extra configure toggles, whilst an actual implementation is
> ~200 loc and our current compat wrapping ~250.
> 
> Let's not forget that different people use different code paths, thus
> effectively makes it harder to test and debug since the default
> implementation is automatically detected.
> 
> To minimise all these lovely experiences, import the "100% Public
> Domain" OpenBSD sha1 implementation. Clearly document any changes needed
> to get building correctly, since many/most of those can be upstreamed
> making future syncs easier.

I had feared that this would somehow collide with the symbols
in libc but it seems to build and run xorg/glxgears at least
on broadwell with i965.

Patches for OpenBSD go to tech@, and you should look at how portable
openssh and libressl handle systems that lack functions like
explicit_bzero: autoconf detects systems that lack functions or are
known to have broken implementations, and alternate versions are
provided.  Damien Miller described how this is handled for ssh in
https://www.openbsd.org/papers/portability.pdf
https://www.openbsd.org/papers/auug2005-portability/

The attribute could also be checked in autoconf as is already done
for various other attributes.

Other parts seem odd: POSIX defines size_t as being in sys/types.h,
not stddef.h, for example.

u_int* are BSD types which predate the C99 types. I could see an
argument being made for changing the types there, but it
would likely have to cover all the other hashes as well,
not just sha1.

> 
> As an added bonus this will avoid all the 'fun' experiences trying to
> integrate it with the Android and SCons builds.
> 
> Bugzilla [1]: https://bugs.freedesktop.org/show_bug.cgi?id=94904
> Bugzilla [2]: https://bugs.freedesktop.org/show_bug.cgi?id=97967
> Cc: Mark Janes 
> Cc: Vinson Lee 
> Cc: Tapani Pälli 
> Cc: Jonathan Gray 
> Signed-off-by: Emil Velikov 
> ---
>  configure.ac | 161 +--
>  src/compiler/glsl/tests/cache_test.c |   5 -
>  src/mesa/main/shaderapi.c|   6 -
>  src/util/Makefile.am |   3 -
>  src/util/Makefile.sources|   2 +
>  src/util/SConscript  |   5 -
>  src/util/disk_cache.c|   4 -
>  src/util/disk_cache.h|  42 --
>  src/util/mesa-sha1.c | 242 
> +--
>  src/util/sha1/README |  55 
>  src/util/sha1/sha1.c | 173 +
>  src/util/sha1/sha1.h |  47 +++
>  12 files changed, 279 insertions(+), 466 deletions(-)
>  create mode 100644 src/util/sha1/README
>  create mode 100644 src/util/sha1/sha1.c
>  create mode 100644 src/util/sha1/sha1.h
> 
> diff --git a/configure.ac b/configure.ac
> index 459f3e8b0a..5772b378c7 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -9,7 +9,6 @@ dnl Copyright © 2009-2014 Jon TURNEY
>  dnl Copyright © 2011-2012 Benjamin Franzke
>  dnl Copyright © 2008-2014 David Airlie
>  dnl Copyright © 2009-2013 Brian Paul
> -dnl Copyright © 2003-2007 Keith Packard, Daniel Stone
>  dnl
>  dnl Permission is hereby granted, free of charge, to any person obtaining a
>  dnl copy of this software and associated documentation files (the 
> "Software"),
> @@ -1432,151 +1431,6 @@ if test "x$enable_gallium_osmesa" = xyes; then
>  fi
>  fi
>  
> -# SHA1 hashing
> -AC_ARG_WITH([sha1],
> -
> [AS_HELP_STRING([--with-sha1=libc|libmd|libnettle|libgcrypt|libcrypto|libsha1|CommonCrypto|CryptoAPI],
> -[choose SHA1 implementation])])
> -case "x$with_sha1" in
> -x | xlibc | xlibmd | xlibnettle | xlibgcrypt | xlibcrypto | xlibsha1 | 
> xCommonCrypto | xCryptoAPI)
> -  ;;
> -*)
> -AC_MSG_ERROR([Illegal value for --with-sha1: $with_sha1])
> -esac
> -
> -AC_CHECK_FUNC([SHA1Init], [HAVE_SHA1_IN_LIBC=yes])
> -if test "x$with_sha1" = x && test "x$HAVE_SHA1_IN_LIBC" = xyes; then
> - with_sha1=libc
> -fi
> -if test "x$with_sha1" = xlibc && test "x$HAVE_SHA1_IN_LIBC" != xyes; then
> - AC_MSG_ERROR([sha1 in libc requested but not found])
> -fi
> -if test "x$with_sha1" = xlibc; then
> - AC_DEFINE([HAVE_SHA1_IN_LIBC], [1],
> - [Use libc SHA1 functions])
> - SHA1_LIBS=""
> -fi
> -AC_CHECK_FUNC([CC_SHA1_Init], [HAVE_SHA1_IN_COMMONCRYPTO=yes])
> -if test "x$with_sha1" = x && test "x$HAVE_SHA1_IN_COMMONCRYPTO" = xyes; then
> - with_sha1=CommonCrypto
> -fi
> -if test "x$with_sha1" = xCommonCrypto && test "x$HAVE_SHA1_IN_COMMONCRYPTO" 

Re: [Mesa-dev] [PATCH] nir/i965: assert first is always less than 64

2017-01-13 Thread Kenneth Graunke
On Thursday, January 12, 2017 12:24:41 PM PST Juan A. Suarez Romero wrote:
> This fixes a defect detected by Coverity Scan.
> ---
>  src/mesa/drivers/dri/i965/brw_draw_upload.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_draw_upload.c 
> b/src/mesa/drivers/dri/i965/brw_draw_upload.c
> index b7527f2cd9..abf3859614 100644
> --- a/src/mesa/drivers/dri/i965/brw_draw_upload.c
> +++ b/src/mesa/drivers/dri/i965/brw_draw_upload.c
> @@ -482,6 +482,7 @@ brw_prepare_vertices(struct brw_context *brw)
> brw->vb.nr_enabled = 0;
> while (vs_inputs) {
>GLuint first = ffsll(vs_inputs) - 1;
> +  assert (first < 64);
>GLuint index =
>   first - 
> DIV_ROUND_UP(_mesa_bitcount_64(vs_prog_data->double_inputs_read &
>  BITFIELD64_MASK(first)), 2);
> 

FWIW, I believe the actual bound is (first <= VERT_ATTRIB_MAX + 1).
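
To spell out why the assert holds in general: an ffsll-style call returns a 1-based bit position (0 only for an empty mask), so inside a loop that runs only while the mask is non-zero, the result minus one is always in [0, 63]. A minimal sketch using the GCC/Clang builtin rather than the libc function; the helper names are made up, and the way the loop clears bits is an assumption, since the quoted hunk does not show that part.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the lowest set bit in a non-zero 64-bit mask.
 * __builtin_ffsll (GCC/Clang) returns a 1-based position, or 0 when no
 * bit is set, so for non-zero input the result here is in [0, 63]. */
static unsigned
lowest_set_bit(uint64_t mask)
{
   assert(mask != 0);
   return (unsigned)__builtin_ffsll((long long)mask) - 1;
}

/* Visit every set bit, lowest first, and return how many were visited.
 * Clearing the visited bit each iteration is an assumption about the
 * loop body, not something taken from the patch. */
static unsigned
count_visited_bits(uint64_t mask)
{
   unsigned visited = 0;

   while (mask) {
      unsigned first = lowest_set_bit(mask);
      assert(first < 64);
      mask &= ~(UINT64_C(1) << first);
      visited++;
   }
   return visited;
}
```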

Reviewed-by: Kenneth Graunke 




Re: [Mesa-dev] [PATCH] i965: Enable OpenGL 4.5 on Haswell.

2017-01-13 Thread Matt Turner
On Fri, Jan 13, 2017 at 10:53 PM, Kenneth Graunke  wrote:
> Everything is in place and the test results look solid.
>
> Signed-off-by: Kenneth Graunke 

Wow, that's been a long time coming. Great work all around. I feel
like we've reached the top of the mountain.

Reviewed-by: Matt Turner 


Re: [Mesa-dev] [PATCH] st/va: delay calling begin_frame until we have all parameters

2017-01-13 Thread Christian König

On 13.01.2017 at 14:15, Nayan Deshmukh wrote:

If begin_frame is called before setting intra_matrix and
non_intra_matrix it leads to segmentation faults when
vl_mpeg12_decoder.c is used.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92634
Signed-off-by: Nayan Deshmukh 


At some point I would like to fix all the codecs (both decoders
and encoders) so they don't rely on the picture info being complete, but
that is clearly a different problem.


So that patch is Reviewed-by: Christian König for now.


Regards,
Christian.


---
  src/gallium/state_trackers/va/picture.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/gallium/state_trackers/va/picture.c 
b/src/gallium/state_trackers/va/picture.c
index b5b9a83..dc7121c 100644
--- a/src/gallium/state_trackers/va/picture.c
+++ b/src/gallium/state_trackers/va/picture.c
@@ -178,9 +178,6 @@ handlePictureParameterBuffer(vlVaDriver *drv, vlVaContext 
*context, vlVaBuffer *
  
if (!context->decoder)

   return VA_STATUS_ERROR_ALLOCATION_FAILED;
-
-  context->decoder->begin_frame(context->decoder, context->target,
- &context->desc.base);
 }
  
 return vaStatus;

@@ -310,6 +307,9 @@ handleVASliceDataBufferType(vlVaContext *context, 
vlVaBuffer *buf)
 buffers[num_buffers] = buf->data;
 sizes[num_buffers] = buf->size;
 ++num_buffers;
+
+   context->decoder->begin_frame(context->decoder, context->target,
+  &context->desc.base);
 context->decoder->decode_bitstream(context->decoder, context->target, &context->desc.base,
num_buffers, (const void * const*)buffers, sizes);
  }





Re: [Mesa-dev] [PATCH v3 2/2] egl/wayland: Cleanup private display connection when init fails

2017-01-13 Thread Daniel Stone
Hi,

On 13 January 2017 at 14:09, Emil Velikov  wrote:
> Please use conditional as the one during setup. Namely:
> disp->PlatformDisplay == NULL
> dri2_initialize_wayland_swrast needs a similar hunk, as well as
> platform_drm.c. Can you address those as with later patches ?
>
> To make it cleaner to cherry-pick let's have this as 1/2. Do add the
> following tag.
> Cc: mesa-sta...@lists.freedesktop.org
>
> With the above, pre-emptively
> Reviewed-by: Emil Velikov 

and
Reviewed-by: Daniel Stone 

Cheers,
Daniel


Re: [Mesa-dev] [PATCH 1/2] gallium: Moving X11 dependencies under HAVE_PLATFORM_X11

2017-01-13 Thread Emil Velikov
On 13 January 2017 at 13:19, Christian König  wrote:
> On 29.11.2016 at 13:22, Emil Velikov wrote:
>>
>> On 28 November 2016 at 14:51, Christian König 
>> wrote:
>>
>>> --- a/src/gallium/state_trackers/va/context.c
>>> +++ b/src/gallium/state_trackers/va/context.c
>>> @@ -118,6 +118,7 @@ VA_DRIVER_INIT_FUNC(VADriverContextP ctx)
>>> return VA_STATUS_ERROR_UNIMPLEMENTED;
>>>  case VA_DISPLAY_GLX:
>>>  case VA_DISPLAY_X11:
>>> +#if defined(HAVE_PLATFORM_X11)
>>>   #if defined(HAVE_DRI3)
>>> drv->vscreen = vl_dri3_screen_create(ctx->native_dpy,
>>> ctx->x11_screen);
>>>   #endif
>>> @@ -125,6 +126,7 @@ VA_DRIVER_INIT_FUNC(VADriverContextP ctx)
>>>drv->vscreen = vl_dri2_screen_create(ctx->native_dpy,
>>> ctx->x11_screen);
>>> if (!drv->vscreen)
>>>goto error_screen;
>>> +#endif
>>
>> As mentioned off-list we want an #else return
>> VA_STATUS_ERROR_UNIMPLEMENTED; here.
>>
>> I've added that locally and noticed that the patch (as-is) causes
>> breakage. I have a WIP that unwraps the --with-egl-platform, fixing
>> Vulkan implementations along the way.
>> I should have that finished later on today.
>
>
> Sorry for coming back to this thread after nearly two months, but I had to
> take some involuntary time off.
>
> Did you have the chance to fix all this, and if yes, how should we proceed with
> st/va?
>
As-is this is a band-aid with nasty side-effects. The "Enable
--with-platforms for all" series addresses things fully, but I need to
retest and send out v2.

Thanks
Emil