from:"Matt Turner"

Re: [Mesa-dev] [PATCH] i965/fs_live_variables: Do liveness analysis bottom-to-top

2015-06-10 Thread Matt Turner

On Mon, Jun 8, 2015 at 4:44 PM, Jason Ekstrand  wrote:
> Generally, liveness information propagates up the program, not down.

I'd replace this with an actual quote from the cited text:

"To determine which variables are live at each point in a flowgraph,
we perform a backward data-flow analysis"

> Previously, we were walking the blocks forwards and updating the livein and
> then the liveout.  However, the livein calculation depends on the liveout
> and the liveout depends on the successor blocks.  The net result is that it
> takes one full iteration to go from liveout to livein and then another
> full iteration to propagate to the predecessors.  This works out to an
> O(n^2) computation where n is the number of blocks.  If we run things in
> the other order, it's O(nl) where l is the maximum loop depth which is
> practically bounded by 3.
>
> This seems to shave about 5s off the compile timem of one particular
> shadertoy shader.

typo: time

I'd mention the absolute times as well. Extra points for collecting
actual numbers and using ministat.

> ---
>  .../drivers/dri/i965/brw_fs_live_variables.cpp | 38 
> +++---
>  1 file changed, 19 insertions(+), 19 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp
> index 502161d..f683a0d 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp
> @@ -204,27 +204,9 @@ fs_live_variables::compute_live_variables()
> while (cont) {
>cont = false;
>
> -  foreach_block (block, cfg) {
> +  foreach_block_reverse (block, cfg) {

Ken didn't like me putting a space between the macro name and the (.
You could make him happy by removing it when you have the chance. :)

>   struct block_data *bd = &block_data[block->num];
>
> -/* Update livein */
> -for (int i = 0; i < bitset_words; i++) {
> -BITSET_WORD new_livein = (bd->use[i] |
> -  (bd->liveout[i] &
> -   ~bd->def[i]));
> -   if (new_livein & ~bd->livein[i]) {
> -   bd->livein[i] |= new_livein;
> -   cont = true;
> -   }
> -}
> - BITSET_WORD new_livein = (bd->flag_use[0] |
> -   (bd->flag_liveout[0] &
> -~bd->flag_def[0]));
> - if (new_livein & ~bd->flag_livein[0]) {
> -bd->flag_livein[0] |= new_livein;
> -cont = true;
> - }
> -
>  /* Update liveout */
>  foreach_list_typed(bblock_link, child_link, link, &block->children) {
>  struct block_data *child_bd = 
> &block_data[child_link->block->num];
> @@ -244,6 +226,24 @@ fs_live_variables::compute_live_variables()
> cont = true;
>  }
>  }
> +
> +/* Update livein */
> +for (int i = 0; i < bitset_words; i++) {
> +BITSET_WORD new_livein = (bd->use[i] |
> +  (bd->liveout[i] &
> +   ~bd->def[i]));
> +   if (new_livein & ~bd->livein[i]) {
> +   bd->livein[i] |= new_livein;
> +   cont = true;
> +   }
> +}

There are some tabs in the lines your moving, and this line's
indentation is messed up. Fix those while you're here.

> + BITSET_WORD new_livein = (bd->flag_use[0] |
> +   (bd->flag_liveout[0] &
> +~bd->flag_def[0]));
> + if (new_livein & ~bd->flag_livein[0]) {
> +bd->flag_livein[0] |= new_livein;
> +cont = true;
> + }
>}
> }
>  }
> --

With those comments addressed,

Reviewed-by: Matt Turner 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 6/7] mesa: build xmlconfig to a separate static library

2015-06-10 Thread Matt Turner

On Wed, Jun 10, 2015 at 3:54 PM, Emil Velikov  wrote:
> From: Erik Faye-Lund 
>
> As we use the file from both the dri modules and loader, we end up with
> multiple definition of the symbols provided in our gallium dri  modules.
> Additionally we compile the file twice.
>
> Resolve both issues, effectively enabling the build on toolchains which
> don't support -Wl,--allow-multiple-definition.
>
> v2: [Emil Velikov]
>  - Fix the Scons/Android build.
>  - Resolve libgbm build issues (bring back the missing -lm)
>
> Cc: Julien Isorce 
> Cc: "10.5 10.6" 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90310
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90905
> Signed-off-by: Emil Velikov 
> ---
>  src/gallium/targets/dri/Makefile.am  |  6 --
>  src/loader/Makefile.am   | 10 +++---
>  src/mesa/drivers/dri/Makefile.am |  1 +
>  src/mesa/drivers/dri/common/Android.mk   |  4 +++-
>  src/mesa/drivers/dri/common/Makefile.am  |  6 +-
>  src/mesa/drivers/dri/common/Makefile.sources |  4 +++-
>  src/mesa/drivers/dri/common/SConscript   |  2 +-
>  src/mesa/drivers/dri/i965/Makefile.am|  1 +
>  8 files changed, 17 insertions(+), 17 deletions(-)
>
> diff --git a/src/gallium/targets/dri/Makefile.am 
> b/src/gallium/targets/dri/Makefile.am
> index f9e4ada..9648396 100644
> --- a/src/gallium/targets/dri/Makefile.am
> +++ b/src/gallium/targets/dri/Makefile.am
> @@ -53,12 +53,6 @@ gallium_dri_la_LIBADD = \
> $(LIBDRM_LIBS) \
> $(GALLIUM_COMMON_LIB_DEPS)
>
> -# XXX: Temporary allow duplicated symbols, as the loader pulls in xmlconfig.c
> -# which already provides driParse* and driQuery* amongst others.
> -# Remove this hack as we come up with a cleaner solution.
> -gallium_dri_la_LDFLAGS += \
> -   -Wl,--allow-multiple-definition
> -
>  EXTRA_gallium_dri_la_DEPENDENCIES = \
> dri.sym \
> $(top_srcdir)/src/gallium/targets/dri-vdpau.dyn
> diff --git a/src/loader/Makefile.am b/src/loader/Makefile.am
> index 36ddba8..aef1bd6 100644
> --- a/src/loader/Makefile.am
> +++ b/src/loader/Makefile.am
> @@ -41,15 +41,11 @@ libloader_la_CPPFLAGS += \
> -I$(top_builddir)/src/mesa/drivers/dri/common/ \
> -I$(top_srcdir)/src/mesa/ \
> -I$(top_srcdir)/src/mapi/ \
> -   -DUSE_DRICONF \
> -   $(EXPAT_CFLAGS)
> +   -DUSE_DRICONF
>
> -libloader_la_SOURCES += \
> -   $(top_srcdir)/src/mesa/drivers/dri/common/xmlconfig.c
> + libloader_la_LIBADD += \

Looks like we have an extra leading space here.

Do I understand correctly that after this patch the Gallium drivers
will get their only copy of xmlconfig via linking against
libloader.la?

If that's correct,

Acked-by: Matt Turner 

> +   $(top_builddir)/src/mesa/drivers/dri/common/libxmlconfig.la
>
> -libloader_la_LIBADD += \
> -   -lm \
> -   $(EXPAT_LIBS)
>  endif
>
>  if !HAVE_LIBDRM
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] mesa: Enable subdir-objects globally.

2015-06-10 Thread Matt Turner

---
Emil,

With your series, I think we can enable subdir-objects globally!

Please give it a test.

 configure.ac  | 2 +-
 src/Makefile.am   | 2 --
 src/gallium/auxiliary/Makefile.am | 2 --
 src/gallium/drivers/freedreno/Makefile.am | 2 --
 src/gallium/drivers/ilo/Makefile.am   | 2 --
 src/gallium/drivers/nouveau/Makefile.am   | 2 --
 src/gallium/drivers/r300/Makefile.am  | 2 --
 src/gallium/drivers/r600/Makefile.am  | 2 --
 src/gallium/drivers/svga/Makefile.am  | 2 --
 src/gallium/drivers/vc4/Makefile.am   | 2 --
 src/gallium/state_trackers/clover/Makefile.am | 2 --
 src/gallium/state_trackers/xvmc/Makefile.am   | 1 -
 src/gallium/targets/opencl/Makefile.am| 2 --
 src/gbm/Makefile.am   | 2 --
 src/glsl/Makefile.am  | 2 --
 src/gtest/Makefile.am | 2 --
 src/mapi/Makefile.am  | 2 --
 src/mesa/Makefile.am  | 2 --
 18 files changed, 1 insertion(+), 34 deletions(-)

diff --git a/configure.ac b/configure.ac
index be0cd7d..f363311 100644
--- a/configure.ac
+++ b/configure.ac
@@ -44,7 +44,7 @@ AC_INIT([Mesa], [MESA_VERSION],
 AC_CONFIG_AUX_DIR([bin])
 AC_CONFIG_MACRO_DIR([m4])
 AC_CANONICAL_SYSTEM
-AM_INIT_AUTOMAKE([foreign tar-ustar dist-xz])
+AM_INIT_AUTOMAKE([foreign tar-ustar dist-xz subdir-objects])
 
 dnl We only support native Windows builds (MinGW/MSVC) through SCons.
 case "$host_os" in
diff --git a/src/Makefile.am b/src/Makefile.am
index 18cb4ce..0df31cc 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -19,8 +19,6 @@
 # FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
 # IN THE SOFTWARE.
 
-AUTOMAKE_OPTIONS = subdir-objects
-
 SUBDIRS = . gtest util mapi/glapi/gen mapi
 
 if NEED_OPENGL_COMMON
diff --git a/src/gallium/auxiliary/Makefile.am 
b/src/gallium/auxiliary/Makefile.am
index 89c7a13..ab91062 100644
--- a/src/gallium/auxiliary/Makefile.am
+++ b/src/gallium/auxiliary/Makefile.am
@@ -1,5 +1,3 @@
-AUTOMAKE_OPTIONS = subdir-objects
-
 if HAVE_LOADER_GALLIUM
 SUBDIRS := pipe-loader
 endif
diff --git a/src/gallium/drivers/freedreno/Makefile.am 
b/src/gallium/drivers/freedreno/Makefile.am
index 4b2629f..6ac70e8 100644
--- a/src/gallium/drivers/freedreno/Makefile.am
+++ b/src/gallium/drivers/freedreno/Makefile.am
@@ -1,5 +1,3 @@
-AUTOMAKE_OPTIONS = subdir-objects
-
 include Makefile.sources
 include $(top_srcdir)/src/gallium/Automake.inc
 
diff --git a/src/gallium/drivers/ilo/Makefile.am 
b/src/gallium/drivers/ilo/Makefile.am
index a8785a5..1f14153 100644
--- a/src/gallium/drivers/ilo/Makefile.am
+++ b/src/gallium/drivers/ilo/Makefile.am
@@ -21,8 +21,6 @@
 # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
 # DEALINGS IN THE SOFTWARE.
 
-AUTOMAKE_OPTIONS = subdir-objects
-
 include Makefile.sources
 include $(top_srcdir)/src/gallium/Automake.inc
 
diff --git a/src/gallium/drivers/nouveau/Makefile.am 
b/src/gallium/drivers/nouveau/Makefile.am
index 0aefc03..ce53022 100644
--- a/src/gallium/drivers/nouveau/Makefile.am
+++ b/src/gallium/drivers/nouveau/Makefile.am
@@ -20,8 +20,6 @@
 # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
 # DEALINGS IN THE SOFTWARE.
 
-AUTOMAKE_OPTIONS = subdir-objects
-
 include Makefile.sources
 include $(top_srcdir)/src/gallium/Automake.inc
 
diff --git a/src/gallium/drivers/r300/Makefile.am 
b/src/gallium/drivers/r300/Makefile.am
index dd1a5ed..081f332 100644
--- a/src/gallium/drivers/r300/Makefile.am
+++ b/src/gallium/drivers/r300/Makefile.am
@@ -1,5 +1,3 @@
-AUTOMAKE_OPTIONS = subdir-objects
-
 include Makefile.sources
 include $(top_srcdir)/src/gallium/Automake.inc
 
diff --git a/src/gallium/drivers/r600/Makefile.am 
b/src/gallium/drivers/r600/Makefile.am
index dc0d90d..8317da7 100644
--- a/src/gallium/drivers/r600/Makefile.am
+++ b/src/gallium/drivers/r600/Makefile.am
@@ -1,5 +1,3 @@
-AUTOMAKE_OPTIONS = subdir-objects
-
 include Makefile.sources
 include $(top_srcdir)/src/gallium/Automake.inc
 
diff --git a/src/gallium/drivers/svga/Makefile.am 
b/src/gallium/drivers/svga/Makefile.am
index e0a8cad..d46de95 100644
--- a/src/gallium/drivers/svga/Makefile.am
+++ b/src/gallium/drivers/svga/Makefile.am
@@ -20,8 +20,6 @@
 # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
 # DEALINGS IN THE SOFTWARE.
 
-AUTOMAKE_OPTIONS = subdir-objects
-
 include Makefile.sources
 include $(top_srcdir)/src/gallium/Automake.inc
 
diff --git a/src/gallium/drivers/vc4/Makefile.am 
b/src/gallium/drivers/vc4/Makefile.am
index 3f62ce2..7744631 100644
--- a/src/gallium/drivers/vc4/Makefile.am
+++ b/src/gallium/drivers/vc4/Makefile.am
@@ -19,8 +19,6 @@
 # FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
 # IN THE SOFTWARE.
 
-AUTOMAKE_OPTIONS = subdir-objects
-
 include Makefile.sources
 include $(top_srcdir)/src/gallium/Automake.inc
 
diff --git a/src/gallium

Re: [Mesa-dev] Request for Mentorship

2015-06-10 Thread Matt Turner

On Wed, Jun 10, 2015 at 5:04 PM, Emil Velikov  wrote:
> On 10 June 2015 at 23:56, Rob Clark  wrote:
>> On Wed, Jun 10, 2015 at 7:28 PM, Emil Velikov  
>> wrote:
>>> On 5 June 2015 at 22:08, Rob Clark  wrote:
 so, maybe a bit off topic (and maybe doesn't really help with the
 whole finding a mentor thing).. but a sort of wish-list thing for
 piglit that I've had in the back of my head is something like "tig"
 for piglit.  I suppose it doesn't have to necessarily be a curses
 based UI, it could be gui..  but a lot of times while I find myself
 searching through results.json to find cmdline to re-run failed tests,
 which is kind of time consuming.  Some sort of front-end to piglit
 which would let me do things like filter on test status, then see
 output/results and if I want re-run the selected test(s) would be kind
 of nice.

>>> Sounds like exactly what the html test summary provides. Although I'm
>>> not sure how many people actually use it - json seems to be more
>>> popular nowadays :-) I even recall naggin' on Dylan when things broke
>>> on an occasion or two.
>>
>> except html is neither interactive nor fast/convenient ;-)
>>
>> The use case I'm thinking of is mostly driver devel/debug where I want
>> to check if some updated version of a change I am working on fixed the
>> regressions, and that sort of thing.  (Or re-run the failed tests with
>> some FD_MESA_DEBUG env var debug flag.)  So ability to re-run
>> individual tests (or group of tests based on filtered result status,
>> etc) is what I'm looking for.  Versus digging through .json or html to
>> find individual cmdlines for some 100's of tests ;-)
>>
> I see. I was fortunate enough to rerun only 2-3 regressing tests,
> rather than 100's, so html was convenient enough :) The "rerun all
> {failing,...} tests" does sound like a useful feature.
>
> -Emil

Why are you guys using a thread about requesting an EVoC mentor for this?
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH v5] egl/dri2: implement platform_surfaceless

2015-06-12 Thread Matt Turner

On Fri, Jun 12, 2015 at 8:04 AM, Daniel Stone  wrote:
> Hi,
>
> On 12 June 2015 at 15:43, Emil Velikov  wrote:
>> On 11/06/15 17:43, Zach Reizner wrote:
>>> +  dri2_dpy->fd = open(card_path, O_RDWR);
>> Pretty sure that we'd want some O_CLOEXEC/fcntl() magic but that can be
>> done as a separate patch.
>
> We absolutely want O_CLOEXEC; if that could be put in now, that would be 
> great.

Related, Derek Foreman sent a patch at the beginning of May

egl/drm: Try to use CLOEXEC for drm fds

that I've been hoping someone more knowledgeable about EGL would review.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH shader-db] si-report: Fix algorithm for determing difference in shader stats

2015-06-15 Thread Matt Turner

On Mon, Jun 15, 2015 at 11:39 AM, Tom Stellard  wrote:
> ---
>  si-report.py | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)

Feel free to commit. I think anyone with mesa commit access can push
to shader-db as well. Let me know if that's not the case.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH v2 23/82] glsl: Do not do CSE for expressions involving SSBO loads

2015-06-17 Thread Matt Turner

On Wed, Jun 17, 2015 at 5:20 PM, Jordan Justen
 wrote:
> I wanted to question whether this was required, based on this text
> from the extension spec:
>
> "The ability to write to buffer objects creates the potential for
>  multiple independent shader invocations to read and write the same
>  underlying memory. The same issue exists with the
>  ARB_shader_image_load_store extension provided in OpenGL 4.2, which
>  can write to texture objects and buffers. In both cases, the
>  specification makes few guarantees related to the relative order of
>  memory reads and writes performed by the shader invocations."
>
> But I'm not sure if we can reconcile CSE with 'memoryBarrier' and
> 'barrier'. curro, any thoughts from image load/store?

I think the point is that if you have two reads of the same location,
you can't combine them into a single load since in the intervening
time another shader invocation might have written to that location...?

The question about the semantics of memory-barriers is a good one though.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] i965: Add missing braces around if-statement.

2015-06-18 Thread Matt Turner

Fixes a performance problem caused by commit b639ed2f.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90895
---
 src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c 
b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
index c0c8dfa..49f2e3e 100644
--- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
+++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
@@ -339,12 +339,13 @@ is_color_fast_clear_compatible(struct brw_context *brw,
mesa_format format,
const union gl_color_union *color)
 {
-   if (_mesa_is_format_integer_color(format))
+   if (_mesa_is_format_integer_color(format)) {
   if (brw->gen >= 8) {
  perf_debug("Integer fast clear not enabled for (%s)",
 _mesa_get_format_name(format));
   }
   return false;
+   }
 
for (int i = 0; i < 4; i++) {
   if (color->f[i] != 0.0 && color->f[i] != 1.0 &&
-- 
2.3.6

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] docs: update developer info

2015-06-18 Thread Matt Turner

On Thu, Jun 18, 2015 at 8:14 PM, Timothy Arceri  wrote:
> Just link directly to the piglit repo the old link has outdated information.
>
> Add note about updating patchwork when sending patch revisions.
> ---
>  docs/devinfo.html | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/docs/devinfo.html b/docs/devinfo.html
> index 0da18b9..6779ab4 100644
> --- a/docs/devinfo.html
> +++ b/docs/devinfo.html
> @@ -244,7 +244,7 @@ to update the tests themselves.
>
>  
>  Whenever possible and applicable, test the patch with
> -http://people.freedesktop.org/~nh/piglit/";>Piglit to
> +http://cgit.freedesktop.org/piglit";>Piglit to
>  check for regressions.
>  
>
> @@ -266,6 +266,12 @@ re-sending the whole series). Using --in-reply-to makes
>  it harder for reviewers to accidentally review old patches.
>  
>
> +
> +When submitting follow-up patches you should also login to
> +https://patchwork.freedesktop.org";>patchwork and change the
> +state of your old patches to Superseded.
> +
> +
>  Reviewing Patches
>
>  
> --
> 2.1.0

Thanks! Can't believe we're still seeing piglit links to ~nh/ these days.

Acked-by: Matt Turner 

Feel free to commit.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] mesa: move ARB_gs5 enums to core, EXT_polygon_offset_clamp to desktop

2015-06-19 Thread Matt Turner

Reviewed-by: Matt Turner 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] i965: add EXT_polygon_offset_clamp support to gen4/gen5

2015-06-19 Thread Matt Turner

From: Ilia Mirkin 

Reviewed-by: Matt Turner 
Signed-off-by: Ilia Mirkin 
---
mattst88: Changed BRW_CONDITIONAL_G -> BRW_CONDITIONAL_GE (see commit 3b7f683f)

Somewhat worryingly, I wasn't able to break any of the piglit tests by inverting
the cmod on the CMP in brw_clip_unfilled.c.

 src/mesa/drivers/dri/i965/brw_clip.c  |  1 +
 src/mesa/drivers/dri/i965/brw_clip.h  |  1 +
 src/mesa/drivers/dri/i965/brw_clip_unfilled.c | 14 ++
 src/mesa/drivers/dri/i965/brw_context.h   |  2 ++
 src/mesa/drivers/dri/i965/brw_misc_state.c|  8 
 src/mesa/drivers/dri/i965/brw_wm_state.c  | 11 +++
 src/mesa/drivers/dri/i965/intel_extensions.c  |  2 +-
 7 files changed, 30 insertions(+), 9 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_clip.c 
b/src/mesa/drivers/dri/i965/brw_clip.c
index 3a73c64..2d5abc7 100644
--- a/src/mesa/drivers/dri/i965/brw_clip.c
+++ b/src/mesa/drivers/dri/i965/brw_clip.c
@@ -223,6 +223,7 @@ brw_upload_clip_prog(struct brw_context *brw)
   /* _NEW_POLYGON, _NEW_BUFFERS */
   key.offset_units = ctx->Polygon.OffsetUnits * 
ctx->DrawBuffer->_MRD * 2;
   key.offset_factor = ctx->Polygon.OffsetFactor * 
ctx->DrawBuffer->_MRD;
+  key.offset_clamp = ctx->Polygon.OffsetClamp * 
ctx->DrawBuffer->_MRD;
}
 
if (!ctx->Polygon._FrontBit) {
diff --git a/src/mesa/drivers/dri/i965/brw_clip.h 
b/src/mesa/drivers/dri/i965/brw_clip.h
index 4e38f2f..54c7682 100644
--- a/src/mesa/drivers/dri/i965/brw_clip.h
+++ b/src/mesa/drivers/dri/i965/brw_clip.h
@@ -62,6 +62,7 @@ struct brw_clip_prog_key {
 
GLfloat offset_factor;
GLfloat offset_units;
+   GLfloat offset_clamp;
 };
 
 
diff --git a/src/mesa/drivers/dri/i965/brw_clip_unfilled.c 
b/src/mesa/drivers/dri/i965/brw_clip_unfilled.c
index 6baf620..9a4d2a9 100644
--- a/src/mesa/drivers/dri/i965/brw_clip_unfilled.c
+++ b/src/mesa/drivers/dri/i965/brw_clip_unfilled.c
@@ -188,6 +188,12 @@ static void copy_bfc( struct brw_clip_compile *c )
   GLfloat bc   = dir.y * iz;
   offset = ctx->Polygon.OffsetUnits * DEPTH_SCALE;
   offset += MAX2( abs(ac), abs(bc) ) * ctx->Polygon.OffsetFactor;
+  if (ctx->Polygon.OffsetClamp && isfinite(ctx->Polygon.OffsetClamp)) {
+if (ctx->Polygon.OffsetClamp < 0)
+  offset = MAX2( offset, ctx->Polygon.OffsetClamp );
+else
+  offset = MIN2( offset, ctx->Polygon.OffsetClamp );
+  }
   offset *= MRD;
 */
 static void compute_offset( struct brw_clip_compile *c )
@@ -211,6 +217,14 @@ static void compute_offset( struct brw_clip_compile *c )
 
brw_MUL(p, vec1(off), vec1(off), brw_imm_f(c->key.offset_factor));
brw_ADD(p, vec1(off), vec1(off), brw_imm_f(c->key.offset_units));
+   if (c->key.offset_clamp && isfinite(c->key.offset_clamp)) {
+  brw_CMP(p,
+  vec1(brw_null_reg()),
+  c->key.offset_clamp < 0 ? BRW_CONDITIONAL_GE : BRW_CONDITIONAL_L,
+  vec1(off),
+  brw_imm_f(c->key.offset_clamp));
+  brw_SEL(p, vec1(off), vec1(off), brw_imm_f(c->key.offset_clamp));
+   }
 }
 
 
diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index 58119ee..7331276 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -1394,6 +1394,8 @@ struct brw_context
*/
   drm_intel_bo *multisampled_null_render_target_bo;
   uint32_t fast_clear_op;
+
+  float offset_clamp;
} wm;
 
struct {
diff --git a/src/mesa/drivers/dri/i965/brw_misc_state.c 
b/src/mesa/drivers/dri/i965/brw_misc_state.c
index 5a4515b..08885d9 100644
--- a/src/mesa/drivers/dri/i965/brw_misc_state.c
+++ b/src/mesa/drivers/dri/i965/brw_misc_state.c
@@ -884,14 +884,6 @@ brw_upload_invariant_state(struct brw_context *brw)
 
brw_select_pipeline(brw, BRW_RENDER_PIPELINE);
 
-   if (brw->gen < 6) {
-  /* Disable depth offset clamping. */
-  BEGIN_BATCH(2);
-  OUT_BATCH(_3DSTATE_GLOBAL_DEPTH_OFFSET_CLAMP << 16 | (2 - 2));
-  OUT_BATCH_F(0.0);
-  ADVANCE_BATCH();
-   }
-
if (brw->gen >= 8) {
   BEGIN_BATCH(3);
   OUT_BATCH(CMD_STATE_SIP << 16 | (3 - 2));
diff --git a/src/mesa/drivers/dri/i965/brw_wm_state.c 
b/src/mesa/drivers/dri/i965/brw_wm_state.c
index 0cd4390..e436175 100644
--- a/src/mesa/drivers/dri/i965/brw_wm_state.c
+++ b/src/mesa/drivers/dri/i965/brw_wm_state.c
@@ -31,6 +31,7 @@
 
 
 
+#include "intel_batchbuffer.h"
 #include "intel_fbo.h"
 #include "brw_context.h"
 #include "brw_state.h"
@@ -251,6 +252,16 @@ brw_upload_wm_unit(struct brw_context *brw)
}
 
brw->ctx.NewDriverState |= BRW_NEW_GEN4_UNIT_STATE;
+
+   /* _NEW_POLGYON */
+   if (brw->wm.offset_clamp != ctx->Polygon.OffsetClamp) {
+  BEGIN_BATCH(2);
+  OUT_BATCH(_3DSTATE_GLOBAL_DEPTH_OFFSET_C

Re: [Mesa-dev] [PATCH v2 02/18] i965/fs: Fix fs_inst::regs_read() for uniform pull constant loads

2015-06-19 Thread Matt Turner

On Fri, Jun 19, 2015 at 1:18 PM, Jason Ekstrand  wrote:
> Previously, fs_inst::regs_read() fell back to depending on the register
> width for the second source.  This isn't really correct since it isn't a
> SIMD8 value at all, but a SIMD4x2 value.  This commit changes it to
> explicitly be always one register.
>
> Reviewed-by: Iago Toral Quiroga 
>
> v2: Use mlen for determining the number of registers written
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index 17a940b..6d25ba4 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -758,6 +758,12 @@ fs_inst::regs_read(int arg) const
>   return mlen;
>break;
>
> +   case FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD_GEN7:
> +  /* The payload is actually storred in src1 */

stored just has one r
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 1/2] [RFC] i965/vec4: Reward spills in if/else/endif blocks

2015-06-19 Thread Matt Turner

On Fri, Jun 19, 2015 at 6:53 PM, Connor Abbott  wrote:
> I don't think this is doing what you think it's doing. This code is
> for calculating the *cost* of spills, so a higher cost means a lower
> priority for choosing the register. We increase the cost for things
> inside loops because we don't want to spill inside loops, and by doing
> the same thing for if's you're actually discouraging spills inside an
> if block.

Top quoting is bad, m'kay.

But, I think it is doing what he thinks since he increases costs for
ENDIF and decreases costs for IF. That is, it's backwards from
DO/WHILE.

Why this is a good thing to do... I don't know. I'd expect some data
along with this patch in order to evaluate it properly.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] i965: Don't count NIR instructions for shader-db.

2015-06-22 Thread Matt Turner

Reviewed-by: Matt Turner 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 3/3] i965: Initialize backend_shader::mem_ctx in its constructor.

2015-06-22 Thread Matt Turner

We were initializing it in each subclasses' constructors for some
reason.
---
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp   | 4 +---
 src/mesa/drivers/dri/i965/brw_shader.cpp   | 2 ++
 src/mesa/drivers/dri/i965/brw_shader.h | 1 +
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 3 +--
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index 4770838..dc992dd 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -1984,13 +1984,11 @@ fs_visitor::fs_visitor(struct brw_context *brw,
struct gl_shader_program *shader_prog,
struct gl_program *prog,
unsigned dispatch_width)
-   : backend_shader(brw, shader_prog, prog, prog_data, stage),
+   : backend_shader(brw, mem_ctx, shader_prog, prog, prog_data, stage),
  key(key), prog_data(prog_data),
  dispatch_width(dispatch_width), promoted_constants(0),
  bld(fs_builder(this, dispatch_width).at_end())
 {
-   this->mem_ctx = mem_ctx;
-
switch (stage) {
case MESA_SHADER_FRAGMENT:
   key_tex = &((const brw_wm_prog_key *) key)->tex;
diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp 
b/src/mesa/drivers/dri/i965/brw_shader.cpp
index 545ec26..7a26939 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.cpp
+++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
@@ -757,6 +757,7 @@ brw_abs_immediate(enum brw_reg_type type, struct brw_reg 
*reg)
 }
 
 backend_shader::backend_shader(struct brw_context *brw,
+   void *mem_ctx,
struct gl_shader_program *shader_prog,
struct gl_program *prog,
struct brw_stage_prog_data *stage_prog_data,
@@ -769,6 +770,7 @@ backend_shader::backend_shader(struct brw_context *brw,
  shader_prog(shader_prog),
  prog(prog),
  stage_prog_data(stage_prog_data),
+ mem_ctx(mem_ctx),
  cfg(NULL),
  stage(stage)
 {
diff --git a/src/mesa/drivers/dri/i965/brw_shader.h 
b/src/mesa/drivers/dri/i965/brw_shader.h
index da01d2f..e647749 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.h
+++ b/src/mesa/drivers/dri/i965/brw_shader.h
@@ -215,6 +215,7 @@ class backend_shader {
 protected:
 
backend_shader(struct brw_context *brw,
+  void *mem_ctx,
   struct gl_shader_program *shader_prog,
   struct gl_program *prog,
   struct brw_stage_prog_data *stage_prog_data,
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index 0a76bde..669f769 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
@@ -3691,7 +3691,7 @@ vec4_visitor::vec4_visitor(struct brw_context *brw,
shader_time_shader_type st_base,
shader_time_shader_type st_written,
shader_time_shader_type st_reset)
-   : backend_shader(brw, shader_prog, prog, &prog_data->base, stage),
+   : backend_shader(brw, mem_ctx, shader_prog, prog, &prog_data->base, stage),
  c(c),
  key(key),
  prog_data(prog_data),
@@ -3704,7 +3704,6 @@ vec4_visitor::vec4_visitor(struct brw_context *brw,
  st_written(st_written),
  st_reset(st_reset)
 {
-   this->mem_ctx = mem_ctx;
this->failed = false;
 
this->base_ir = NULL;
-- 
2.3.6

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 1/3] i965/cfg: Assert that cur_do/while/if pointers are non-NULL.

2015-06-22 Thread Matt Turner

Coverity sees that the functions immediately below the new assertions
dereference these pointers, but is unaware that an ENDIF always follows
an IF, etc.
---
 src/mesa/drivers/dri/i965/brw_cfg.cpp | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_cfg.cpp 
b/src/mesa/drivers/dri/i965/brw_cfg.cpp
index 39c419b..f1f230e 100644
--- a/src/mesa/drivers/dri/i965/brw_cfg.cpp
+++ b/src/mesa/drivers/dri/i965/brw_cfg.cpp
@@ -231,6 +231,7 @@ cfg_t::cfg_t(exec_list *instructions)
  if (cur_else) {
 cur_else->add_successor(mem_ctx, cur_endif);
  } else {
+assert(cur_if != NULL);
 cur_if->add_successor(mem_ctx, cur_endif);
  }
 
@@ -299,6 +300,7 @@ cfg_t::cfg_t(exec_list *instructions)
  inst->exec_node::remove();
  cur->instructions.push_tail(inst);
 
+ assert(cur_do != NULL && cur_while != NULL);
 cur->add_successor(mem_ctx, cur_do);
 set_next_block(&cur, cur_while, ip);
 
-- 
2.3.6

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 2/3] i965: Assert that the GL primitive isn't out of range.

2015-06-22 Thread Matt Turner

Coverity sees the if (mode >= BRW_PRIM_OFFSET (128)) test and assumes
that the else-branch might execute for mode to up 127, which out be out
of bounds.
---
 src/mesa/drivers/dri/i965/brw_draw.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_draw.c 
b/src/mesa/drivers/dri/i965/brw_draw.c
index a7164db..b91597a 100644
--- a/src/mesa/drivers/dri/i965/brw_draw.c
+++ b/src/mesa/drivers/dri/i965/brw_draw.c
@@ -92,8 +92,10 @@ get_hw_prim_for_gl_prim(int mode)
 {
if (mode >= BRW_PRIM_OFFSET)
   return mode - BRW_PRIM_OFFSET;
-   else
+   else {
+  assert(mode < ARRAY_SIZE(prim_to_hw_prim));
   return prim_to_hw_prim[mode];
+   }
 }
 
 
-- 
2.3.6

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] i965/fs: Don't mess up stride for uniform integer multiplication.

2015-06-22 Thread Matt Turner

If the stride is 0, the source is a uniform and we should not modify the
stride.

Cc: "10.6" 
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91047
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 20 
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 5563c5a..903624c 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -3196,10 +3196,16 @@ fs_visitor::lower_integer_multiplication()
src1_1_w.fixed_hw_reg.dw1.ud >>= 16;
 } else {
src1_0_w.type = BRW_REGISTER_TYPE_UW;
-   src1_0_w.stride = 2;
+   if (src1_0_w.stride != 0) {
+  assert(src1_0_w.stride == 1);
+  src1_0_w.stride = 2;
+   }
 
src1_1_w.type = BRW_REGISTER_TYPE_UW;
-   src1_1_w.stride = 2;
+   if (src1_1_w.stride != 0) {
+  assert(src1_1_w.stride == 1);
+  src1_1_w.stride = 2;
+   }
src1_1_w.subreg_offset += type_sz(BRW_REGISTER_TYPE_UW);
 }
 ibld.MUL(low, inst->src[0], src1_0_w);
@@ -3209,10 +3215,16 @@ fs_visitor::lower_integer_multiplication()
 fs_reg src0_1_w = inst->src[0];
 
 src0_0_w.type = BRW_REGISTER_TYPE_UW;
-src0_0_w.stride = 2;
+if (src0_0_w.stride != 0) {
+   assert(src0_0_w.stride == 1);
+   src0_0_w.stride = 2;
+}
 
 src0_1_w.type = BRW_REGISTER_TYPE_UW;
-src0_1_w.stride = 2;
+if (src0_1_w.stride != 0) {
+   assert(src0_1_w.stride == 1);
+   src0_1_w.stride = 2;
+}
 src0_1_w.subreg_offset += type_sz(BRW_REGISTER_TYPE_UW);
 
 ibld.MUL(low, src0_0_w, inst->src[1]);
-- 
2.3.6

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] mesa: Delete unused ICEIL().

2015-06-22 Thread Matt Turner

Can't find any uses of it in git history.
---
Strangely, when it was moved to its current location in commit 27558a1,
it was moved from mmath.h... which seems to have been lost from git's
history. Searching further git log --grep mmath.h shows that various
commit messages mention modifying mmath.h and none of the commits
actually do.

 src/mesa/main/imports.h | 32 
 1 file changed, 32 deletions(-)

diff --git a/src/mesa/main/imports.h b/src/mesa/main/imports.h
index c4d917e..9ffe3de 100644
--- a/src/mesa/main/imports.h
+++ b/src/mesa/main/imports.h
@@ -230,38 +230,6 @@ static inline int IFLOOR(float f)
 }
 
 
-/** Return (as an integer) ceiling of float */
-static inline int ICEIL(float f)
-{
-#if defined(USE_X86_ASM) && defined(__GNUC__) && defined(__i386__)
-   /*
-* IEEE ceil for computers that round to nearest or even.
-* 'f' must be between -4194304 and 4194303.
-* This ceil operation is done by "(iround(f + .5) + iround(f - .5) + 1) >> 
1",
-* but uses some IEEE specific tricks for better speed.
-* Contributed by Josh Vanderhoof
-*/
-   int ai, bi;
-   double af, bf;
-   af = (3 << 22) + 0.5 + (double)f;
-   bf = (3 << 22) + 0.5 - (double)f;
-   /* GCC generates an extra fstp/fld without this. */
-   __asm__ ("fstps %0" : "=m" (ai) : "t" (af) : "st");
-   __asm__ ("fstps %0" : "=m" (bi) : "t" (bf) : "st");
-   return (ai - bi + 1) >> 1;
-#else
-   int ai, bi;
-   double af, bf;
-   fi_type u;
-   af = (3 << 22) + 0.5 + (double)f;
-   bf = (3 << 22) + 0.5 - (double)f;
-   u.f = (float) af; ai = u.i;
-   u.f = (float) bf; bi = u.i;
-   return (ai - bi + 1) >> 1;
-#endif
-}
-
-
 /**
  * Is x a power of two?
  */
-- 
2.3.6

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 6/6] mesa/es3.1: Fix error code for glCreateShaderProgram

2015-06-23 Thread Matt Turner

On Tue, Jun 23, 2015 at 5:23 AM, Marta Lofstedt
 wrote:
> From: Marta Lofstedt 
>
> According to the OpenGL ES standard, 7.3.
> For a call to glCreateShaderProgram with count < 0,
> a GL_INVALID_VALUE error should be generated.
>
> Signed-off-by: Marta Lofstedt 
> ---
>  src/mesa/main/shaderapi.c | 9 +
>  1 file changed, 9 insertions(+)
>
> diff --git a/src/mesa/main/shaderapi.c b/src/mesa/main/shaderapi.c
> index c783c69..266064d 100644
> --- a/src/mesa/main/shaderapi.c
> +++ b/src/mesa/main/shaderapi.c
> @@ -1890,6 +1890,15 @@ _mesa_create_shader_program(struct gl_context* ctx, 
> GLboolean separate,
> const GLuint shader = create_shader(ctx, type);
> GLuint program = 0;
>
> +   /*
> +* According to OpenGL ES 3.1 standard 7.3: GL_INVALID_VALUE
> +* should be generated, if count < 0.
> +*/

The format of spec citations is

/* Page 65 in section 7.3 Program Objects of the OpenGL ES 3.1 spec says:
 *
 * "An INVALID_VALUE error is generated if count is negative."
 *
 * "If an error is generated, zero is returned."
 */

> +   if (_mesa_is_gles31(ctx) && count < 0) {

glCreateShaderProgramv comes from ARB_separate_shader_objects (merged
in GL 4.1), and in GL 4.3 the spec gained the new text cited above. I
think we should take that change as a clarification (a change that
should apply to previous versions retroactively) instead of an actual
change in behavior. As such, I think we should remove the
_mesa_is_gles31 check. I'd modify the citation to mention GL 4.3 as
well.

> +  _mesa_error(ctx, GL_INVALID_VALUE, "glCreateShaderProgram (count < 
> 0)");

Missing the "v" at the end of "glCreateShaderProgramv"

> +  return program;

Just return literal 0 here to make it more clear.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 6/6] mesa/es3.1: Fix error code for glCreateShaderProgram

2015-06-23 Thread Matt Turner

I should have also mentioned -- the commit titles need some
improvement. "Fix error code" isn't very descriptive of the change,
and the change isn't actually specific to es3.1. How about

> mesa: Raise INVALID_VALUE from glCreateShaderProgramv if count < 0.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 4/6] mesa/es3.1 : Correct error code for defect texture target

2015-06-23 Thread Matt Turner

On Tue, Jun 23, 2015 at 5:23 AM, Marta Lofstedt
 wrote:
> From: Marta Lofstedt 
>
> According to GLES 3.1 CTS test:
> ES31-CTS.texture_storage_multisample.
> APIGLGetTexLevelParameterifv.
> invalid_texture_target_rejected:
>
> GL_INVALID_ENUM should be generated when
> glGetTexLevelParameteriv is called with a defect
> texture target.
>
> Signed-off-by: Marta Lofstedt 
> ---
>  src/mesa/main/texobj.c | 11 +++
>  1 file changed, 11 insertions(+)
>
> diff --git a/src/mesa/main/texobj.c b/src/mesa/main/texobj.c
> index c563f1e..c239deb 100644
> --- a/src/mesa/main/texobj.c
> +++ b/src/mesa/main/texobj.c
> @@ -222,6 +222,17 @@ _mesa_get_current_tex_object(struct gl_context *ctx, 
> GLenum target)

Since this function is called internally from a lot of places, it's
not clear to me that it's appropriate to raise a GL error here, but
I'm not sure.

For instance, it's called by TexSubImage, and the spec says for TexSubImage...

An INVALID_VALUE error is generated by *TexSubImage* if target does
not match the command, as shown in table 8.15.

>   return ctx->Extensions.ARB_texture_multisample
>  ? ctx->Texture.ProxyTex[TEXTURE_2D_MULTISAMPLE_ARRAY_INDEX] : 
> NULL;
>default:
> + if(_mesa_is_gles31(ctx))

Not specific to ES 3.1.

Commit message needs to be improved.

> + {

{ goes on the line with the if, like the rest of Mesa.

> +/*
> + * According to OpenGL ES 3.1 CTS:
> + * 
> ES31-CTS.texture_storage_multisample.APIGLGetTexLevelParameterifv.
> + * invalid_value_argument_rejected
> + * 
> es31cTextureStorageMultisampleGetTexLevelParameterifvTests.cpp:1277
> + * INVALID_ENUM should be reported for bad targets.

Like Erik says, need an actual spec citation.

> + */
> +_mesa_error(ctx, GL_INVALID_ENUM, "%s(target)", __func__);
> + }
>   _mesa_problem(NULL, "bad target in _mesa_get_current_tex_object()");
>   return NULL;
> }
> --
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 0/6] OpenGL ES 3.1 API checks and corner cases.

2015-06-23 Thread Matt Turner

On Tue, Jun 23, 2015 at 5:23 AM, Marta Lofstedt
 wrote:
> This is a series of patches that solves a couple of
> API check and corner cases issues that the OpenGL ES 3.1
> CTS exploits.
>
> Marta Lofstedt (6):
>   mesa/es3.1: Do not allow zero size multisampled textures
>   mesa/es3.1: Correct error code for illegal internal formats
>   mesa/es3.1 : Correct error code for zero samples
>   mesa/es3.1 : Correct error code for defect texture target
>   mesa/es31: AtomicBufferBindings should be initialized to zero.
>   mesa/es3.1: Fix error code for glCreateShaderProgram
>
>  src/mesa/main/bufferobj.c | 11 ---
>  src/mesa/main/shaderapi.c |  9 +
>  src/mesa/main/teximage.c  | 25 +
>  src/mesa/main/texobj.c| 11 +++
>  4 files changed, 53 insertions(+), 3 deletions(-)
>
> --

I commented on #4 and #6, but the rest basically had the same problems:

  - the spec needs to cited, not the test suite
  - changes likely apply to desktop, don't raise errors just
if(_mesa_gles_31(ctx))
  - commit titles need to be more specific
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] mesa : NULL check InfoLog

2015-06-23 Thread Matt Turner

On Tue, Jun 23, 2015 at 4:03 AM, Marta Lofstedt
 wrote:
> From: Marta Lofstedt 
>
> When a program is compiled, but linking failed the
> sh->InfoLog could be NULL. This is expoloited
> by OpenGL ES 3.1 conformance tests.
>
> Signed-off-by: Marta Lofstedt 
> ---
>  src/mesa/main/shaderapi.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/src/mesa/main/shaderapi.c b/src/mesa/main/shaderapi.c
> index a4296ad..c783c69 100644
> --- a/src/mesa/main/shaderapi.c
> +++ b/src/mesa/main/shaderapi.c
> @@ -1920,8 +1920,8 @@ _mesa_create_shader_program(struct gl_context* ctx, 
> GLboolean separate,
> }
>  #endif
>  }
> -
> -ralloc_strcat(&shProg->InfoLog, sh->InfoLog);
> + if (sh->InfoLog)
> +   ralloc_strcat(&shProg->InfoLog, sh->InfoLog);

Wrong indentation on this line.

Surely just not writing to the info log isn't the right fix? If it's
null, shouldn't we instead ralloc_strdup() the string?
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] glsl/es3.1: Fix up GL_ARB_compute_shader for GLSL ES 3.1

2015-06-23 Thread Matt Turner

I don't really think the "/es3.1" in the commit summary adds anything.
With that removed:

Reviewed-by: Matt Turner 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] mesa/es3.1: Enable GL_ARB_separate_shader_objects for GLES 3.1

2015-06-23 Thread Matt Turner

On Tue, Jun 23, 2015 at 4:30 AM, Marta Lofstedt
 wrote:
> From: Marta Lofstedt 
>
> Signed-off-by: Marta Lofstedt 
> ---
>  src/mesa/main/get_hash_params.py | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/src/mesa/main/get_hash_params.py 
> b/src/mesa/main/get_hash_params.py
> index 74ff3ba..f7134a9 100644
> --- a/src/mesa/main/get_hash_params.py
> +++ b/src/mesa/main/get_hash_params.py
> @@ -407,6 +407,10 @@ descriptor=[
>  { "apis": ["GL", "GL_CORE", "GLES3"], "params": [
>  # GL_ARB_sampler_objects / GL 3.3 / GLES 3.0
>[ "SAMPLER_BINDING", "LOC_CUSTOM, TYPE_INT, GL_SAMPLER_BINDING, NO_EXTRA" 
> ],
> +
> +# GL_ARB_vertex_attrib_binding / GLES 3.1
> +  [ "MAX_VERTEX_ATTRIB_RELATIVE_OFFSET", 
> "CONTEXT_ENUM(Const.MaxVertexAttribRelativeOffset), NO_EXTRA" ],
> +  [ "MAX_VERTEX_ATTRIB_BINDINGS", 
> "CONTEXT_ENUM(Const.MaxVertexAttribBindings), NO_EXTRA" ],

Won't this enable it in ES 3.0 as well? I think you need to replace
NO_EXTRA with something that contains EXTRA_API_ES31. See
extra_ARB_draw_indirect_es31 for an example.

>  ]},
>
>  # Enums in OpenGL Core profile and ES 3.1
> @@ -776,10 +780,6 @@ descriptor=[
>[ "MAX_COMBINED_ATOMIC_COUNTER_BUFFERS", 
> "CONTEXT_INT(Const.MaxCombinedAtomicBuffers), 
> extra_ARB_shader_atomic_counters" ],
>[ "MAX_COMBINED_ATOMIC_COUNTERS", 
> "CONTEXT_INT(Const.MaxCombinedAtomicCounters), 
> extra_ARB_shader_atomic_counters" ],
>
> -# GL_ARB_vertex_attrib_binding
> -  [ "MAX_VERTEX_ATTRIB_RELATIVE_OFFSET", 
> "CONTEXT_ENUM(Const.MaxVertexAttribRelativeOffset), NO_EXTRA" ],
> -  [ "MAX_VERTEX_ATTRIB_BINDINGS", 
> "CONTEXT_ENUM(Const.MaxVertexAttribBindings), NO_EXTRA" ],
> -
>  # GL_ARB_shader_image_load_store
>[ "MAX_IMAGE_UNITS", "CONTEXT_INT(Const.MaxImageUnits), 
> extra_ARB_shader_image_load_store"],
>[ "MAX_COMBINED_IMAGE_UNITS_AND_FRAGMENT_OUTPUTS", 
> "CONTEXT_INT(Const.MaxCombinedImageUnitsAndFragmentOutputs), 
> extra_ARB_shader_image_load_store"],
> --
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] Fix strict-aliasing violations in GLSL shader list implementation

2015-06-23 Thread Matt Turner

On Wed, Jun 24, 2015 at 2:05 PM, Davin McCall  wrote:
> Hi - I'm new here.

Welcome!

> I've recently started poking the Mesa codebase for little reason other than
> personal interest. In the "help wanted" section of the website it mentions
> aliasing violations as a target for newcomers to fix, so with that in mind
> I've attached a patch (against git head) which resolves a few of them, by
> targeting the linked list implementation (list.h) used in the GLSL
> compiler/optimizers. This change slightly increases the storage requirements
> for a list (adds one word) but resolves the blatant aliasing violation that
> was caused by the trick used to conserve that word in the first place.
>
> (I toyed with another approach - using a single sentinel node for both the
> head and tail of a list - but this was much more invasive, and meant that
> you could no longer check whether a particular node was a sentinel node
> unless you had a reference to the list, so I gave up and went with this
> simpler approach).
>
> The most essential change is in the 'exec_list' structure. Three fields
> 'head', 'tail' and 'tail_pred' are removed, and two separate sentinel nodes
> are inserted in their place. The old 'head' is replaced by
> 'head_sentinel.next', 'tail_pred' by 'tail_sentinel.prev', and tail (always
> NULL) by 'head_sentinel.prev' and 'tail_sentinel.next' (both always NULL).
>
> With this patch, I can build a working (though perhaps not 100% bug-free)
> Mesa without using -fno-strict-aliasing during compilation. Before the
> patch, applications using Mesa would hang at runtime.

Thanks for the patch. I'm impressed that fixing exec_list/exec_node
allows the removal of -fno-strict-aliasing (at least, we don't know of
the rest of the bugs yet :).

To summarize your change, instead of the head/tail/tail_pred pointers
with "tail" shared between two aliased exec_nodes, you've simply
allocated two sentinel exec_nodes inside exec_list directly.

Like Ilia says, some memory usage statistics before/after your patch
would be very welcome. There's an apitrace of Dota2 that we've often
used for memory usage testing (via valgrind massif):

http://people.freedesktop.org/~anholt/dota_linux.trace

> I don't really know the process you have in place so if I need to do
> anything else to get this patch into the codebase please let me know. Any
> comments and criticisms welcome.

Please send patches inline using git send-email with a commit title
and message that looks similar to others. In this case, the patch
should be titled something like "glsl: Modify exec_list to avoid
strict-aliasing violations."

If you wanted to go farther and remove -fno-strict-aliasing from configure.ac --

Since strict-aliasing allows some additional optimization
opportunities, I'd expect some justification for the patch such as the
output of the 'size' command on the resulting binaries, like in this
commit:

commit d35720da9b9824d104532028775e497491f433ad
Author: Matt Turner 
Date:   Wed Mar 4 17:27:21 2015 -0800

i965: Mark paths in linear <-> tiled functions as unreachable().

textdata bss dec hex filename
9663   0   0966325bf intel_tiled_memcpy.o   before
8215   0   082152017 intel_tiled_memcpy.o   after

Presumably if strict-aliasing indeed allows additional optimizations,
we'd see smaller .text sizes after the removal of
-fno-strict-aliasing.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] configure: use $target_cpu, not $host_cpu when setting asm_arch

2015-06-23 Thread Matt Turner

On Tue, Jun 23, 2015 at 3:04 PM, Brian Paul  wrote:
> Otherwise, if we're trying to build a 32-bit Mesa on a 64-bit host
> we wind up with -DUSE_X86_64_ASM, which is incorrect.
> ---
>  configure.ac | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/configure.ac b/configure.ac
> index ddc757e..b12f5f9 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -605,7 +605,7 @@ if test "x$enable_asm" = xyes -a "x$cross_compiling" = 
> xyes; then
>  fi
>  # check for supported arches
>  if test "x$enable_asm" = xyes; then
> -case "$host_cpu" in
> +case "$target_cpu" in
>  i?86)
>  case "$host_os" in
>  linux* | *freebsd* | dragonfly* | *netbsd* | openbsd* | gnu*)
> --
> 1.9.1

According to [1], host is "the machine that you are building for" and
target is "the machine that GCC will produce code for". In the context
of building GCC, I think this means that the resulting GCC binaries
will run on $host and will produce code for $target. In the context of
Mesa, I can't come up with a way that host != target makes sense.

docs/autoconf.html suggests using --build=x86_64-pc-linux-gnu
--host=i686-pc-linux-gnu to build on x86_64 for i686. Is that what
you're doing?

[1] https://gcc.gnu.org/onlinedocs/gccint/Configure-Terms.html
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] Fix strict-aliasing violations in GLSL shader list implementation

2015-06-23 Thread Matt Turner

On Tue, Jun 23, 2015 at 6:44 PM, Ian Romanick  wrote:
> Please quote the spec.

Section 6.5 Expressions of the draft C99 spec I have says (page 68, 80
of the pdf):

7 An object shall have its stored value accessed only by an lvalue
expression that has one of the following types: 76)

— a type compatible with the effective type of the object,

— a qualified version of a type compatible with the effective type of
the object,

— a type that is the signed or unsigned type corresponding to the
effective type of the object,

— a type that is the signed or unsigned type corresponding to a
qualified version of the effective type of the object,

— an aggregate or union type that includes one of the aforementioned
types among its members (including, recursively, a member of a
subaggregate or contained union), or

— a character type.

And the footnote 76 is:

76) The intent of this list is to specify those circumstances in which
an object may or may not be aliased.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] Fix strict-aliasing violations in GLSL shader list implementation

2015-06-23 Thread Matt Turner

On Tue, Jun 23, 2015 at 7:57 PM, Michel Dänzer  wrote:
> On 24.06.2015 11:39, Dave Airlie wrote:
>>>
>>> Actually, I'm almost 100% certain that there are lots of other strict
>>> aliasing violations in the Mesa code. That's why we've always disabled it.
>>>
>>> More generally, IMO it's unrealistic to rely on strict aliasing for
>>> optimization, because very few people really understand it (I'm not one
>>> of them).
>>
>> I personally think we should get past the, aliasing is hard, lets go 
>> shopping,
>
> I'm not saying that. I'm saying we'll keep getting it wrong, which will
> cause subtle and very hard to debug problems.

As far as I know, we haven't really tried to get it right for a long
time. -fno-strict-aliasing was added to Mesa's build system in 2007.

For what it's worth, gcc has a -Wstrict-aliasing flag [1] (part of
-Wall) that will warn about potential problems. Its default level is
advertised as "Should have very few false positives and few false
negatives."

I recompiled Mesa in my normal configuration (i965 only) with
-fstrict-aliasing -Wstrict-aliasing and it warned about a handful of
things in src/glx, one in the FXT1 compressor, another in swrast and
they all look valid from my understanding of the strict-aliasing
rules.

[1] 
https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wstrict-aliasing-407
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] mesa: fold duplicated GL/GL_CORE/GLES3 entry in get_hash_params.py

2015-06-24 Thread Matt Turner

On Wed, Jun 24, 2015 at 6:07 AM, Emil Velikov  wrote:
> Signed-off-by: Emil Velikov 
> ---
>  src/mesa/main/get_hash_params.py | 8 +++-
>  1 file changed, 3 insertions(+), 5 deletions(-)

Doh!

Reviewed-by: Matt Turner 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH v2 4/7] glsl: No need to lock in _mesa_glsl_release_types

2015-06-25 Thread Matt Turner

On Thu, Jun 25, 2015 at 2:05 PM, Erik Faye-Lund  wrote:
> This function only gets called while mesa is unloading, so there's
> no potential of racing or multiple calls at the same time. So let's
> just get rid of the locking.
>
> Signed-off-by: Erik Faye-Lund 
> ---
>  src/glsl/glsl_types.cpp | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/src/glsl/glsl_types.cpp b/src/glsl/glsl_types.cpp
> index f675e90..dbe2382 100644
> --- a/src/glsl/glsl_types.cpp
> +++ b/src/glsl/glsl_types.cpp
> @@ -324,8 +324,10 @@ const glsl_type *glsl_type::get_scalar_type() const
>  void
>  _mesa_glsl_release_types(void)
>  {
> -   mtx_lock(&glsl_type::mutex);
> -
> +   /* should only be called during atexit (either when unloading shared

Let's capitalize "should"

> +* object, or if process terminates), so no mutex-locking should be
> +* nessecary.

typo: necessary
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH v2 6/7] mesa/main: free locale at exit

2015-06-25 Thread Matt Turner

On Thu, Jun 25, 2015 at 2:05 PM, Erik Faye-Lund  wrote:
> In order to save a small leak if mesa is continously loaded and
> unloaded, let's free the locale when the shared object is unloaded.
>
> Signed-off-by: Erik Faye-Lund 
> ---
>  src/mesa/main/context.c | 12 +++-
>  src/util/strtod.c   |  8 
>  src/util/strtod.h   |  3 +++
>  3 files changed, 22 insertions(+), 1 deletion(-)
>
> diff --git a/src/mesa/main/context.c b/src/mesa/main/context.c
> index e68de68..dee1fa8 100644
> --- a/src/mesa/main/context.c
> +++ b/src/mesa/main/context.c
> @@ -346,6 +346,16 @@ _mesa_destroy_visual( struct gl_config *vis )
>  mtx_t OneTimeLock = _MTX_INITIALIZER_NP;
>
>
> +/**
> + * Calls all the various one-time-fini functions in Mesa
> + */
> +
> +static void
> +one_time_fini()
> +{
> +   _mesa_destroy_shader_compiler();
> +   _mesa_locale_fini();
> +}
>
>  /**
>   * Calls all the various one-time-init functions in Mesa.
> @@ -385,7 +395,7 @@ one_time_init( struct gl_context *ctx )
>   _mesa_ubyte_to_float_color_tab[i] = (float) i / 255.0F;
>}
>
> -  atexit(_mesa_destroy_shader_compiler);
> +  atexit(one_time_fini);
>
>  #if defined(DEBUG) && defined(__DATE__) && defined(__TIME__)
>if (MESA_VERBOSE != 0) {
> diff --git a/src/util/strtod.c b/src/util/strtod.c
> index e5e6f76..5c36b05 100644
> --- a/src/util/strtod.c
> +++ b/src/util/strtod.c
> @@ -46,6 +46,14 @@ _mesa_locale_init(void)
>  #endif
>  }
>
> +void
> +_mesa_locale_fini(void)
> +{
> +#if defined(_GNU_SOURCE) && defined(HAVE_XLOCALE_H)

_GNU_SOURCE isn't a macro that you're supposed to check if it's
defined -- you're supposed to define it if you want GNU extensions.
We're misusing it elsewhere, but it would be cool if we could do this
right.

The man page of freelocale says

   Since glibc 2.10:
  _XOPEN_SOURCE >= 700
   Before glibc 2.10:
  _GNU_SOURCE

(same for newlocale() used in

glibc-2.10 is more than 6 years old. I'd be happy to use only _XOPEN_SOURCE.

I'd be fine with fixing this in a follow up.

> +   freelocale(loc);
> +#endif
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH v2 5/7] util: port _mesa_strto[df] to C

2015-06-25 Thread Matt Turner

On Thu, Jun 25, 2015 at 2:05 PM, Erik Faye-Lund  wrote:
> _mesa_strtod and _mesa_strtof are only used from the GLSL compiler and
> the ARB_[vertex|fragment]_program code, meaning that the locale doesn't
> need to be initialized before the first OpenGL context gets initialized.
>
> So let's use explicit initialization from the one-time init code instead
> of depending on a C++ compiler to initialize at image-load time.
>
> Signed-off-by: Erik Faye-Lund 
> Reviewed-by: Matt Turner 
> ---
>  src/glsl/glcpp/glcpp.c|  3 ++
>  src/glsl/main.cpp |  3 ++
>  src/mesa/main/context.c   |  3 ++
>  src/util/Makefile.sources |  2 +-
>  src/util/strtod.c | 78 
> +++
>  src/util/strtod.cpp   | 75 -
>  src/util/strtod.h |  3 ++
>  7 files changed, 91 insertions(+), 76 deletions(-)
>  create mode 100644 src/util/strtod.c
>  delete mode 100644 src/util/strtod.cpp
>
> diff --git a/src/glsl/glcpp/glcpp.c b/src/glsl/glcpp/glcpp.c
> index 5144516..c62f4ef 100644
> --- a/src/glsl/glcpp/glcpp.c
> +++ b/src/glsl/glcpp/glcpp.c
> @@ -29,6 +29,7 @@
>  #include "glcpp.h"
>  #include "main/mtypes.h"
>  #include "main/shaderobj.h"
> +#include "util/strtod.h"
>
>  extern int glcpp_parser_debug;
>
> @@ -168,6 +169,8 @@ main (int argc, char *argv[])
> if (shader == NULL)
>return 1;
>
> +   _mesa_locale_init();
> +
> ret = glcpp_preprocess(ctx, &shader, &info_log, NULL, &gl_ctx);
>
> printf("%s", shader);
> diff --git a/src/glsl/main.cpp b/src/glsl/main.cpp
> index 2341298..58651df 100644
> --- a/src/glsl/main.cpp
> +++ b/src/glsl/main.cpp
> @@ -38,6 +38,7 @@
>  #include "program/hash_table.h"
>  #include "loop_analysis.h"
>  #include "standalone_scaffolding.h"
> +#include "util/strtod.h"
>
>  static int glsl_version = 330;
>
> @@ -46,6 +47,8 @@ initialize_context(struct gl_context *ctx, gl_api api)
>  {
> initialize_context_to_defaults(ctx, api);
>
> +   _mesa_locale_init();
> +
> /* The standalone compiler needs to claim support for almost
>  * everything in order to compile the built-in functions.
>  */
> diff --git a/src/mesa/main/context.c b/src/mesa/main/context.c
> index c4af8ea..e68de68 100644
> --- a/src/mesa/main/context.c
> +++ b/src/mesa/main/context.c
> @@ -120,6 +120,7 @@
>  #include "shaderobj.h"
>  #include "shaderimage.h"
>  #include "util/simple_list.h"
> +#include "util/strtod.h"
>  #include "state.h"
>  #include "stencil.h"
>  #include "texcompress_s3tc.h"
> @@ -374,6 +375,8 @@ one_time_init( struct gl_context *ctx )
>assert( sizeof(GLint) == 4 );
>assert( sizeof(GLuint) == 4 );
>
> +  _mesa_locale_init();
> +
>_mesa_one_time_init_extension_overrides();
>
>_mesa_get_cpu_features();
> diff --git a/src/util/Makefile.sources b/src/util/Makefile.sources
> index dc55939..82df3bc 100644
> --- a/src/util/Makefile.sources
> +++ b/src/util/Makefile.sources
> @@ -19,7 +19,7 @@ MESA_UTIL_FILES :=\
> set.c \
> set.h \
> simple_list.h \
> -   strtod.cpp \
> +   strtod.c \
> strtod.h \
> texcompress_rgtc_tmp.h \
> u_atomic.h
> diff --git a/src/util/strtod.c b/src/util/strtod.c
> new file mode 100644
> index 000..e5e6f76
> --- /dev/null
> +++ b/src/util/strtod.c
> @@ -0,0 +1,78 @@
> +/*
> + * Copyright 2010 VMware, Inc.
> + * All Rights Reserved.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the
> + * "Software"), to deal in the Software without restriction, including
> + * without limitation the rights to use, copy, modify, merge, publish,
> + * distribute, sub license, and/or sell copies of the Software, and to
> + * permit persons to whom the Software is furnished to do so, subject to
> + * the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the
> + * next paragraph) shall be included in all copies or substantial portions
> + * of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
> + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
> + * IN NO EVENT SHALL VMWARE AND/OR ITS SUPPLIERS BE LIABLE FOR
> + * A

Re: [Mesa-dev] [PATCH v2 0/7] port _mesa_strto[df] to C

2015-06-25 Thread Matt Turner

On Thu, Jun 25, 2015 at 2:05 PM, Erik Faye-Lund  wrote:
> Back in March[1], I sent a patch porting _mesa_strto[df] to
> C rather than C++. I fixed up the patch according to the
> criticism, but unfortunately I dropped the ball before I sent
> out the result. So here I am, picking it back up!
>
> This time I've taken a deeper dive into the whole init/deinit
> of Mesa, and cleaned up a bunch of stuff in that area. And as
> a result, this time we end up freeing the locale also.
>
> No Piglit regressions observed.
>
> [1]: <1426446329-23984-1-git-send-email-kusmab...@gmail.com>
>
> Erik Faye-Lund (7):
>   mesa/main: Get rid of outdated GDB-hack
>   dri: don't touch the shader compiler
>   mesa/main: only call _mesa_destroy_shader_compiler once on exit
>   glsl: No need to lock in _mesa_glsl_release_types
>   util: port _mesa_strto[df] to C
>   mesa/main: free locale at exit
>   util: assert to verify that locale is initialized

Thanks for this! The series looks good to me.

I'm slightly worried about 2/7, but not for any reasons other than I'm
not very familiar with that code.

1-6 (with the caveat that I may not have any idea what I'm saying
about 2/7 ;) are:

Reviewed-by: Matt Turner 

7/7 looks like it may have been useful for debugging, but I don't
think we should necessarily commit it.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] Allow setting GL_TEXTURE_COMPARE_MODE on a sampler object without ARB_shadow support.

2015-06-25 Thread Matt Turner

On Mon, Mar 23, 2015 at 12:25 PM, Stefan Dösinger
 wrote:
> This fixes a GL error warning on r200 in Wine.
>
> The GL_ARB_sampler_objects extension does not specify a dependency on
> GL_ARB_shadow or GL_ARB_depth_texture for this value. Just set the value
> and don't do anything else. It won't matter without a depth texture
> being assigned anyway.
> ---

Brian, Ian -- you guys seem like you might have opinions about this.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] i965/vec4_live_variables: Do liveness analysis bottom-to-top

2015-06-25 Thread Matt Turner

Reviewed-by: Matt Turner 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] mesa : NULL check InfoLog

2015-06-25 Thread Matt Turner

On Tue, Jun 23, 2015 at 2:04 PM, Matt Turner  wrote:
> On Tue, Jun 23, 2015 at 4:03 AM, Marta Lofstedt
>  wrote:
>> From: Marta Lofstedt 
>>
>> When a program is compiled, but linking failed the
>> sh->InfoLog could be NULL. This is expoloited

exploited is misspelled

>> by OpenGL ES 3.1 conformance tests.
>>
>> Signed-off-by: Marta Lofstedt 
>> ---
>>  src/mesa/main/shaderapi.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/mesa/main/shaderapi.c b/src/mesa/main/shaderapi.c
>> index a4296ad..c783c69 100644
>> --- a/src/mesa/main/shaderapi.c
>> +++ b/src/mesa/main/shaderapi.c
>> @@ -1920,8 +1920,8 @@ _mesa_create_shader_program(struct gl_context* ctx, 
>> GLboolean separate,
>> }
>>  #endif
>>  }
>> -
>> -ralloc_strcat(&shProg->InfoLog, sh->InfoLog);
>> + if (sh->InfoLog)
>> +   ralloc_strcat(&shProg->InfoLog, sh->InfoLog);
>
> Wrong indentation on this line.
>
> Surely just not writing to the info log isn't the right fix? If it's
> null, shouldn't we instead ralloc_strdup() the string?

In the process of reading the second version of this patch, I realized
I had misread this as "if the existing log is empty, don't add
anything to the log", when in fact the code was "if there's no new
info, don't attempt to add new info"!

So this patch is correct as is.

Remove the space between "mesa" and ":" in the title, and maybe just
change the title to something like "mesa: Don't give ralloc_strcat a
NULL src."

With those changes, this is

Reviewed-by: Matt Turner 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH v2] mesa : NULL check InfoLog

2015-06-25 Thread Matt Turner

On Thu, Jun 25, 2015 at 4:16 PM, Ben Widawsky  wrote:
> On Thu, Jun 25, 2015 at 02:52:47PM +0200, Marta Lofstedt wrote:
>> From: Marta Lofstedt 
>>
>> When a program is compiled, but linking failed the
>> sh->InfoLog could be NULL. This is expoloited
>> by OpenGL ES 3.1 conformance tests.
>>
>> V2: ralloc_strdup shProg->InfoLog
>>
>> Signed-off-by: Marta Lofstedt 
>> ---
>>  src/mesa/main/shaderapi.c | 5 -
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/mesa/main/shaderapi.c b/src/mesa/main/shaderapi.c
>> index a4296ad..bc6625a 100644
>> --- a/src/mesa/main/shaderapi.c
>> +++ b/src/mesa/main/shaderapi.c
>> @@ -1921,7 +1921,10 @@ _mesa_create_shader_program(struct gl_context* ctx, 
>> GLboolean separate,
>>  #endif
>>}
>>
>> -  ralloc_strcat(&shProg->InfoLog, sh->InfoLog);
>> + if (sh->InfoLog)
>> +ralloc_strcat(&shProg->InfoLog, sh->InfoLog);
>> + else
>> +ralloc_strdup(ctx, shProg->InfoLog);
>
> I don't understand what the strdup part is meant to do. Without the else, this
> is:
> Reviewed-by: Ben Widawsky 
>
> Feel free to explain why you need to dup the log in the else case, and I'll 
> look
> again.

The story is that I misread version 1 and thought the code was not
adding anything to the log if the existing log was empty and told her
to just ralloc_strdup a new log in that case.

But as you identified, the ralloc_strdup in this patch is broken (not
using the return value), and with your suggested change we're back to
v1, so:

Marta, please add Ben's R-b to your patch as well. :)
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] Fix for make[4]: *** No rule to make target '../../../mesa/src/util/strtod.cpp', needed by 'libmesautil_la-strtod.lo'. Stop.

2015-06-29 Thread Matt Turner

In commit c61bc6ed8 ("util: port _mesa_strto[df] to C") we renamed
strtod.cpp -> strtod.c.

Applying this patch on top of a tree that has already built strtod.cpp
will yield the error in the subject. To fix, simply run

sed -i -e 's/strtod\.cpp/strtod.c/' src/util/.deps/libmesautil_la-strtod.Plo

in your build tree.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH v4 0/6] port _mesa_strto[df] to C

2015-06-29 Thread Matt Turner

Thanks!

Pushed, and sent a note describing how to fix the problems caused by
renaming strtod.cpp -> strtod.c.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] mesa: Convert some asserts into STATIC_ASSERT.

2015-06-29 Thread Matt Turner

---
 src/mesa/main/context.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/src/mesa/main/context.c b/src/mesa/main/context.c
index fdef412..faa1de7 100644
--- a/src/mesa/main/context.c
+++ b/src/mesa/main/context.c
@@ -377,13 +377,12 @@ one_time_init( struct gl_context *ctx )
if (!api_init_mask) {
   GLuint i;
 
-  /* do some implementation tests */
-  assert( sizeof(GLbyte) == 1 );
-  assert( sizeof(GLubyte) == 1 );
-  assert( sizeof(GLshort) == 2 );
-  assert( sizeof(GLushort) == 2 );
-  assert( sizeof(GLint) == 4 );
-  assert( sizeof(GLuint) == 4 );
+  STATIC_ASSERT(sizeof(GLbyte) == 1);
+  STATIC_ASSERT(sizeof(GLubyte) == 1);
+  STATIC_ASSERT(sizeof(GLshort) == 2);
+  STATIC_ASSERT(sizeof(GLushort) == 2);
+  STATIC_ASSERT(sizeof(GLint) == 4);
+  STATIC_ASSERT(sizeof(GLuint) == 4);
 
   _mesa_locale_init();
 
-- 
2.3.6

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] glx: Drop CRAY support.

2015-06-29 Thread Matt Turner

It couldn't have worked anyway. There were calls to undefined functions.
---
 src/glx/packrender.h | 46 --
 src/glx/packsingle.h | 56 
 2 files changed, 102 deletions(-)

diff --git a/src/glx/packrender.h b/src/glx/packrender.h
index 4266d5c..f8f38ca 100644
--- a/src/glx/packrender.h
+++ b/src/glx/packrender.h
@@ -157,7 +157,6 @@
 #define __GLX_PUT_CHAR(offset,a)\
*((INT8 *) (pc + offset)) = a
 
-#ifndef _CRAY
 #define __GLX_PUT_SHORT(offset,a)   \
*((INT16 *) (pc + offset)) = a
 
@@ -167,29 +166,6 @@
 #define __GLX_PUT_FLOAT(offset,a)   \
*((FLOAT32 *) (pc + offset)) = a
 
-#else
-#define __GLX_PUT_SHORT(offset,a)   \
-   { GLubyte *cp = (pc+offset); \
-  int shift = (64-16) - ((int)(cp) >> (64-6));  \
-  *(int *)cp = (*(int *)cp & ~(0x << shift)) | ((a & 0x) << 
shift); }
-
-#define __GLX_PUT_LONG(offset,a)\
-   { GLubyte *cp = (pc+offset); \
-  int shift = (64-32) - ((int)(cp) >> (64-6));  \
-  *(int *)cp = (*(int *)cp & ~(0x << shift)) | ((a & 0x) 
<< shift); }
-
-#define __GLX_PUT_FLOAT(offset,a)   \
-   gl_put_float((pc + offset),a)
-
-#define __GLX_PUT_DOUBLE(offset,a)  \
-   gl_put_double(pc + offset, a)
-
-extern void gl_put_float( /*GLubyte *, struct cray_single */ );
-extern void gl_put_double( /*GLubyte *, struct cray_double */ );
-#endif
-
-#ifndef _CRAY
-
 #ifdef __GLX_ALIGN64
 /*
 ** This can certainly be done better for a particular machine
@@ -202,12 +178,9 @@ extern void gl_put_double( /*GLubyte *, struct cray_double 
*/ );
*((FLOAT64 *) (pc + offset)) = a
 #endif
 
-#endif
-
 #define __GLX_PUT_CHAR_ARRAY(offset,a,alen) \
__GLX_MEM_COPY(pc + offset, a, alen * __GLX_SIZE_INT8)
 
-#ifndef _CRAY
 #define __GLX_PUT_SHORT_ARRAY(offset,a,alen)\
__GLX_MEM_COPY(pc + offset, a, alen * __GLX_SIZE_INT16)
 
@@ -220,24 +193,5 @@ extern void gl_put_double( /*GLubyte *, struct cray_double 
*/ );
 #define __GLX_PUT_DOUBLE_ARRAY(offset,a,alen)  \
__GLX_MEM_COPY(pc + offset, a, alen * __GLX_SIZE_FLOAT64)
 
-#else
-#define __GLX_PUT_SHORT_ARRAY(offset,a,alen)\
-   gl_put_short_array((GLubyte *)(pc + offset), a, alen * __GLX_SIZE_INT16)
-
-#define __GLX_PUT_LONG_ARRAY(offset,a,alen) \
-   gl_put_long_array((GLubyte *)(pc + offset), (long *)a, alen * 
__GLX_SIZE_INT32)
-
-#define __GLX_PUT_FLOAT_ARRAY(offset,a,alen)\
-   gl_put_float_array((GLubyte *)(pc + offset), (float *)a, alen * 
__GLX_SIZE_FLOAT32)
-
-#define __GLX_PUT_DOUBLE_ARRAY(offset,a,alen)   \
-   gl_put_double_array((GLubyte *)(pc + offset), (double *)a, alen * 
__GLX_SIZE_FLOAT64)
-
-extern gl_put_short_array(GLubyte *, short *, int);
-extern gl_put_long_array(GLubyte *, long *, int);
-extern gl_put_float_array(GLubyte *, float *, int);
-extern gl_put_double_array(GLubyte *, double *, int);
-
-#endif /* _CRAY */
 
 #endif /* !__GLX_packrender_h__ */
diff --git a/src/glx/packsingle.h b/src/glx/packsingle.h
index 037265a..fddcbf1 100644
--- a/src/glx/packsingle.h
+++ b/src/glx/packsingle.h
@@ -83,7 +83,6 @@
 #define __GLX_SINGLE_PUT_CHAR(offset,a) \
*((INT8 *) (pc + offset)) = a
 
-#ifndef CRAY
 #define __GLX_SINGLE_PUT_SHORT(offset,a)\
*((INT16 *) (pc + offset)) = a
 
@@ -93,21 +92,6 @@
 #define __GLX_SINGLE_PUT_FLOAT(offset,a)\
*((FLOAT32 *) (pc + offset)) = a
 
-#else
-#define __GLX_SINGLE_PUT_SHORT(offset,a)\
-   { GLubyte *cp = (pc+offset);\
-  int shift = (64-16) - ((int)(cp) >> (64-6));  \
-  *(int *)cp = (*(int *)cp & ~(0x << shift)) | ((a & 0x) << 
shift); }
-
-#define __GLX_SINGLE_PUT_LONG(offset,a) \
-   { GLubyte *cp = (pc+offset);\
-  int shift = (64-32) - ((int)(cp) >> (64-6));  \
-  *(int *)cp = (*(int *)cp & ~(0x << shift)) | ((a & 0x) 
<< shift); }
-
-#define __GLX_SINGLE_PUT_FLOAT(offset,a)\
-   gl_put_float(pc + offset, a)
-#endif
-
 /* Read support macros */
 #define __GLX_SINGLE_READ_XREPLY()\
(void) _XReply(dpy, (xReply*) &reply, 0, False)
@@ -118,7 +102,6 @@
 #define __GLX_SINGLE_GET_SIZE(a)\
a = (GLint) reply.size
 
-#ifndef _CRAY
 #define __GLX_SINGLE_GET_CHAR(p)\
*p = *(GLbyte *)&reply.pad3;
 
@@ -131,31 +114,6 @@
 #define __GLX_SINGLE_GET_FLOAT(p)   \
*p = *(GLfloat *)&reply.pad3;
 
-#else
-#define __GLX_SINGLE_GET_CHAR(p)\
-   *p = reply.pad3 >> 24;
-
-#define __GLX_SINGLE_GET_SHORT(p)   \
-   {int t = reply.pad3 >> 16;\
-  *p = (t &

Re: [Mesa-dev] [PATCH] mesa/prog: relative offsets into constbufs are not constant

2015-07-01 Thread Matt Turner

On Wed, Jul 1, 2015 at 3:22 PM, Ilia Mirkin  wrote:
> The optimization logic relies on being able to read out constbuf values
> from program parameters. However that only works if there's no relative
> addressing involved.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91173
> Signed-off-by: Ilia Mirkin 
> ---

It would be a pretty neat project to make i915 and r200 consume NIR.

We'd get to delete Mesa IR and these optimization passes and probably
generate better code.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] i965/chv|skl: Apply sampler bypass w/a

2015-07-02 Thread Matt Turner

On Wed, Jul 1, 2015 at 4:03 PM, Ben Widawsky
 wrote:
> Certain compressed formats require this setting. The docs don't go into much
> detail as to why it's needed exactly.
>
> This fixes 0 piglit failures with a GBM gpu piglit run.

That's a really weird way of saying that.

>
> Signed-off-by: Ben Widawsky 
> ---
>
> I had this one sitting around for almost 2 months. I'm not sure why I didn't
> send it out sooner. It seems like it's needed.
>
> ---
>  src/mesa/drivers/dri/i965/brw_defines.h|  1 +
>  src/mesa/drivers/dri/i965/gen8_surface_state.c | 26 
> ++
>  2 files changed, 27 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
> b/src/mesa/drivers/dri/i965/brw_defines.h
> index 66b9abc..f55fd49 100644
> --- a/src/mesa/drivers/dri/i965/brw_defines.h
> +++ b/src/mesa/drivers/dri/i965/brw_defines.h
> @@ -276,6 +276,7 @@
>  #define GEN8_SURFACE_TILING_W   (1 << 12)
>  #define GEN8_SURFACE_TILING_X   (2 << 12)
>  #define GEN8_SURFACE_TILING_Y   (3 << 12)
> +#define GEN8_SURFACE_SAMPLER_L2_BYPASS_DISABLE  (1 << 9)
>  #define BRW_SURFACE_RC_READ_WRITE  (1 << 8)
>  #define BRW_SURFACE_MIPLAYOUT_SHIFT10
>  #define BRW_SURFACE_MIPMAPLAYOUT_BELOW   0
> diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c 
> b/src/mesa/drivers/dri/i965/gen8_surface_state.c
> index ca5ed17..a245379 100644
> --- a/src/mesa/drivers/dri/i965/gen8_surface_state.c
> +++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c
> @@ -264,6 +264,19 @@ gen8_emit_texture_surface_state(struct brw_context *brw,
>surf[0] |= BRW_SURFACE_CUBEFACE_ENABLES;
> }
>
> +   if (brw->is_cherryview || brw->gen >= 9) {
> +  /* "This bit must be set for the following surface types: BC2_UNORM
> +   * BC3_UNORM BC5_UNORM BC5_SNORM BC7_UNORM"

Don't do naked quotes -- Use the normal style.

> +   */
> +  switch (format) {
> +  case BRW_SURFACEFORMAT_BC2_UNORM:
> +  case BRW_SURFACEFORMAT_BC3_UNORM:
> +  case BRW_SURFACEFORMAT_BC5_SNORM:

Missing BRW_SURFACEFORMAT_BC5_UNORM.

> +  case BRW_SURFACEFORMAT_BC7_UNORM:
> + surf[0] |= GEN8_SURFACE_SAMPLER_L2_BYPASS_DISABLE;

It wouldn't surprise me if static analysis tools complain about the
missing break.

Add a break or make it an if statement (which would be two lines
shorter). All together, how about

   /* From the CHV PRM, Volume 2d, page 321 (RENDER_SURFACE_STATE dword 0
* bit 9 "Sampler L2 Bypass Mode Disable" Programming Notes):
*
*This bit must be set for the following surface types: BC2_UNORM
*BC3_UNORM BC5_UNORM BC5_SNORM BC7_UNORM
*/
   if (format == BRW_SURFACEFORMAT_BC2_UNORM ||
   format == BRW_SURFACEFORMAT_BC3_UNORM ||
   format == BRW_SURFACEFORMAT_BC5_SNORM ||
   format == BRW_SURFACEFORMAT_BC5_UNORM ||
   format == BRW_SURFACEFORMAT_BC7_UNORM)
  surf[0] |= GEN8_SURFACE_SAMPLER_L2_BYPASS_DISABLE;
   }
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] i965/chv|skl: Apply sampler bypass w/a

2015-07-02 Thread Matt Turner

On Thu, Jul 2, 2015 at 12:57 PM, Matt Turner  wrote:
> On Wed, Jul 1, 2015 at 4:03 PM, Ben Widawsky
>  wrote:
>> Certain compressed formats require this setting. The docs don't go into much
>> detail as to why it's needed exactly.
>>
>> This fixes 0 piglit failures with a GBM gpu piglit run.
>
> That's a really weird way of saying that.
>
>>
>> Signed-off-by: Ben Widawsky 
>> ---
>>
>> I had this one sitting around for almost 2 months. I'm not sure why I didn't
>> send it out sooner. It seems like it's needed.
>>
>> ---
>>  src/mesa/drivers/dri/i965/brw_defines.h|  1 +
>>  src/mesa/drivers/dri/i965/gen8_surface_state.c | 26 
>> ++
>>  2 files changed, 27 insertions(+)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
>> b/src/mesa/drivers/dri/i965/brw_defines.h
>> index 66b9abc..f55fd49 100644
>> --- a/src/mesa/drivers/dri/i965/brw_defines.h
>> +++ b/src/mesa/drivers/dri/i965/brw_defines.h
>> @@ -276,6 +276,7 @@
>>  #define GEN8_SURFACE_TILING_W   (1 << 12)
>>  #define GEN8_SURFACE_TILING_X   (2 << 12)
>>  #define GEN8_SURFACE_TILING_Y   (3 << 12)
>> +#define GEN8_SURFACE_SAMPLER_L2_BYPASS_DISABLE  (1 << 9)
>>  #define BRW_SURFACE_RC_READ_WRITE  (1 << 8)
>>  #define BRW_SURFACE_MIPLAYOUT_SHIFT10
>>  #define BRW_SURFACE_MIPMAPLAYOUT_BELOW   0
>> diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c 
>> b/src/mesa/drivers/dri/i965/gen8_surface_state.c
>> index ca5ed17..a245379 100644
>> --- a/src/mesa/drivers/dri/i965/gen8_surface_state.c
>> +++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c
>> @@ -264,6 +264,19 @@ gen8_emit_texture_surface_state(struct brw_context *brw,
>>surf[0] |= BRW_SURFACE_CUBEFACE_ENABLES;
>> }
>>
>> +   if (brw->is_cherryview || brw->gen >= 9) {
>> +  /* "This bit must be set for the following surface types: BC2_UNORM
>> +   * BC3_UNORM BC5_UNORM BC5_SNORM BC7_UNORM"
>
> Don't do naked quotes -- Use the normal style.
>
>> +   */
>> +  switch (format) {
>> +  case BRW_SURFACEFORMAT_BC2_UNORM:
>> +  case BRW_SURFACEFORMAT_BC3_UNORM:
>> +  case BRW_SURFACEFORMAT_BC5_SNORM:
>
> Missing BRW_SURFACEFORMAT_BC5_UNORM.
>
>> +  case BRW_SURFACEFORMAT_BC7_UNORM:
>> + surf[0] |= GEN8_SURFACE_SAMPLER_L2_BYPASS_DISABLE;
>
> It wouldn't surprise me if static analysis tools complain about the
> missing break.
>
> Add a break or make it an if statement (which would be two lines
> shorter). All together, how about
>
>/* From the CHV PRM, Volume 2d, page 321 (RENDER_SURFACE_STATE dword 0
> * bit 9 "Sampler L2 Bypass Mode Disable" Programming Notes):
> *
> *This bit must be set for the following surface types: BC2_UNORM
> *BC3_UNORM BC5_UNORM BC5_SNORM BC7_UNORM
> */
>if (format == BRW_SURFACEFORMAT_BC2_UNORM ||
>format == BRW_SURFACEFORMAT_BC3_UNORM ||
>format == BRW_SURFACEFORMAT_BC5_SNORM ||
>format == BRW_SURFACEFORMAT_BC5_UNORM ||

Bah, would be nice to so BC5_UNORM and then BC5_SNORM to better match
the comment.

>format == BRW_SURFACEFORMAT_BC7_UNORM)
>   surf[0] |= GEN8_SURFACE_SAMPLER_L2_BYPASS_DISABLE;
>}
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] glsl: add a missing call to _mesa_locale_init

2015-07-02 Thread Matt Turner

On Thu, Jul 2, 2015 at 2:22 PM, Ilia Mirkin  wrote:
> Can this be done at dlopen/init time? For example what happens if you do
>
> static int foo = _mesa_locale_init()
>
> IIRC things like that are possible in C++, not sure about C.

gcc has __attribute__((constructor)).

But I don't think we really care... Erik's series converted the strtod
code from C++ to C (including moving locale init from being a static
constructor to being called in one_time_init) and fixing the memory
leak.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] glsl: add a missing call to _mesa_locale_init

2015-07-02 Thread Matt Turner

On Thu, Jul 2, 2015 at 2:18 PM, Erik Faye-Lund  wrote:
> After c61bc6e ("util: port _mesa_strto[df] to C"), "make check"
> fails due to a missing _mesa_locale_init. Fixup this oversight.
>
> Signed-off-by: Erik Faye-Lund 
> Tested-by: Vinson Lee 

Reviewed-by: Matt Turner 

I'll commit this soon.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH v2] glsl: add a missing call to _mesa_locale_init

2015-07-03 Thread Matt Turner

On Fri, Jul 3, 2015 at 12:46 AM, Erik Faye-Lund  wrote:
> After c61bc6e ("util: port _mesa_strto[df] to C"), "make check"
> fails due to a missing _mesa_locale_init. Fixup this oversight,
> by moving the stand-alone compiler initializer inside
> initialize_context_to_defaults().
>
> Signed-off-by: Erik Faye-Lund 
> ---
> Here's what the latter suggestion would look like. Personally,
> I have a slight preference for this version.

Thanks, I've pushed this instead.

Reviewed-by: Matt Turner http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] i965/fs: Don't disable SIMD16 when using the pixel interpolator

2015-07-05 Thread Matt Turner

On Fri, Jul 3, 2015 at 3:46 AM, Francisco Jerez  wrote:
> Heh, I happened to come across this comment yesterday while looking for
> the remaining no16 calls and wondered why on earth it couldn't do the
> same that the normal interpolation code does.  After this patch and a
> series coming up that will remove all SIMD8 fallbacks from the texturing
> code, the only case left still applicable to Gen7 hardware and later
> will be "SIMD16 explicit accumulator operands unsupported".  Anyone?

I can explain the problem:

Prior to Gen7, the were were two accumulator registers usable for most
datatypes (acc0, acc1). On Gen7, they removed integer-support from
acc1, which was necessary to implement SIMD16 integer multiplication
using the normal MUL/MACH sequence. I implemented 32-bit integer
multiplication without using the accumulator in:

commit f7df169ba13d22338e9276839a7e9629ca0a6b4f
Author: Matt Turner 
Date:   Wed May 13 18:34:03 2015 -0700

i965/fs: Implement integer multiply without mul/mach.

The remaining cases of "SIMD16 explicit accumulator operands
unsupported" are ADDC, SUBB, and 32x32 -> high 32-bit multiplication.
The remaining multiplication case can probably be reimplemented
without the accumulator, like I did for the low 32-bit result.

The ADDC and SUBB instructions implicitly write a bit to the
accumulator if their operations overflowed. The 1Q/2Q quarter control
is supposed to select which register is implicitly written -- except
that there is no acc1 for integer types. Haswell and newer ignore the
quarter control and always write acc0, but IVB (and presumably BYT)
attempt to write to the nonexistent acc1.

You could split the the SIMD16 operations into 2x SIMD8s and set
force_writemask_all on the second, followed by a 2Q MOV from the
accumulator. Maybe we'd rather use the .o (overflow) conditional mod
on a result ADD to implement this.

Ideally, we'd recognize merge the addition and carry operations into a
single ADDC instruction, but it's pretty unimportant. It's all pretty
academic -- I've never seen an application use either operation (or
[iu]mulExtended either).
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH v2 5/6] i965: Upload binding tables in hw-generated binding table format.

2015-07-05 Thread Matt Turner

On Fri, Jul 3, 2015 at 12:00 AM, Abdiel Janulgue
 wrote:
> diff --git a/src/mesa/drivers/dri/i965/brw_binding_tables.c 
> b/src/mesa/drivers/dri/i965/brw_binding_tables.c
> index 7396c94..34a88b9 100644
> --- a/src/mesa/drivers/dri/i965/brw_binding_tables.c
> +++ b/src/mesa/drivers/dri/i965/brw_binding_tables.c
> @@ -70,30 +90,51 @@ brw_upload_binding_table(struct brw_context *brw,
>
>stage_state->bind_bo_offset = 0;
> } else {
> -  /* Upload a new binding table. */
> -  if (INTEL_DEBUG & DEBUG_SHADER_TIME) {
> - brw->vtbl.emit_buffer_surface_state(
> -brw, &stage_state->surf_offset[
> -prog_data->binding_table.shader_time_start],
> -brw->shader_time.bo, 0, BRW_SURFACEFORMAT_RAW,
> -brw->shader_time.bo->size, 1, true);
> +  /* When RS is enabled use hw-binding table uploads, otherwise fallback 
> to
> +   * software-uploads.
> +   */
> +  if (brw->use_resource_streamer) {
> + gen7_update_binding_table_from_array(brw, stage_state->stage,
> +  stage_state->surf_offset,
> +  prog_data->binding_table
> +  .size_bytes / 4);
> +  } else {
> + /* Upload a new binding table. */
> + if (INTEL_DEBUG & DEBUG_SHADER_TIME) {
> +brw->vtbl.emit_buffer_surface_state(
> +   brw, &stage_state->surf_offset[
> +  prog_data->binding_table.shader_time_start],
> +   brw->shader_time.bo, 0, BRW_SURFACEFORMAT_RAW,
> +   brw->shader_time.bo->size, 1, true);
> + }
> +
> + uint32_t *bind = brw_state_batch(brw, AUB_TRACE_BINDING_TABLE,
> +  
> prog_data->binding_table.size_bytes,
> +  32,
> +  &stage_state->bind_bo_offset);
> +
> + /* BRW_NEW_SURFACES and BRW_NEW_*_CONSTBUF */
> + memcpy(bind, stage_state->surf_offset,
> +prog_data->binding_table.size_bytes);
>}
> -
> -  uint32_t *bind = brw_state_batch(brw, AUB_TRACE_BINDING_TABLE,
> -   prog_data->binding_table.size_bytes, 
> 32,
> -   &stage_state->bind_bo_offset);
> -
> -  /* BRW_NEW_SURFACES and BRW_NEW_*_CONSTBUF */
> -  memcpy(bind, stage_state->surf_offset,
> - prog_data->binding_table.size_bytes);
> }
>
> brw->ctx.NewDriverState |= brw_new_binding_table;
>
> if (brw->gen >= 7) {
> +

Extra newline.

> +  if (brw->use_resource_streamer)
> + stage_state->bind_bo_offset =
> +reserve_hw_bt_space(brw, prog_data->binding_table.size_bytes);

We usually put braces around nested if statements, but *always* put
braces around if statements if they stretch across multiple lines (see
commit 22af95af).

> +
>BEGIN_BATCH(2);
>OUT_BATCH(packet_name << 16 | (2 - 2));
> -  OUT_BATCH(stage_state->bind_bo_offset);
> +  /* Align SurfaceStateOffset[16:6] format to [15:5] PS Binding Table 
> field
> +   * when hw-generated binding table is enabled.
> +   */
> +  OUT_BATCH(brw->use_resource_streamer ?
> +(stage_state->bind_bo_offset >> 1) :
> +stage_state->bind_bo_offset);
>ADVANCE_BATCH();
> }
>  }
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 1/2] glsl: Add missing check for whether an expression is an add operation

2015-07-06 Thread Matt Turner

Reviewed-by: Matt Turner 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/2] glsl: Make sure not to dereference NULL

2015-07-06 Thread Matt Turner

On Sat, Jul 4, 2015 at 2:40 PM, Neil Roberts  wrote:
> In this bit of code point_five can be NULL if the expression is not a
> constant. This fixes it to match the pattern of the rest of the chunk
> of code so that it checks for NULLs.
>
> Cc: Matt Turner 
> Cc: "10.6" 
> ---
>  src/glsl/opt_algebraic.cpp | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp
> index 9b8a426..c4b8715 100644
> --- a/src/glsl/opt_algebraic.cpp
> +++ b/src/glsl/opt_algebraic.cpp
> @@ -589,7 +589,7 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
> continue;
>
>  ir_constant *point_five = add_expr->operands[1 - 
> j]->as_constant();
> -if (!point_five->is_value(0.5, 0))
> +if (!point_five || !point_five->is_value(0.5, 0))
> continue;
>
>  if (abs_expr->operands[0]->equals(sign_expr->operands[0])) {
> --
> 1.9.3
>

Whoops. I think I thought that is_value NULL checked 'this'.

Reviewed-by: Matt Turner 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] mesa: Add a MUST_CHECK macro for attribute((warn_unused_result)).

2015-07-06 Thread Matt Turner

On Mon, Jul 6, 2015 at 11:18 AM, Kenneth Graunke  wrote:
> In the kernel, this is called __must_check; all our attribute macros in
> Mesa appear to be uppercase, so I went with that.
>
> Signed-off-by: Kenneth Graunke 
> Cc: ch...@chris-wilson.co.uk
> Cc: matts...@gmail.com
> ---

Reviewed-by: Matt Turner 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 1/3] i965/gen4-5: Set ENDIF dst and src0 fields to the null register.

2015-07-06 Thread Matt Turner

On Mon, Jul 6, 2015 at 11:03 AM, Francisco Jerez  wrote:
> The hardware docs don't mention explicitly what these fields should
> be, but I've verified experimentally on ILK that using a GRF as
> destination causes the register to be corrupted when the execution
> size of an ENDIF instruction is higher than 8 -- and because the
> destination we were using was g0, eventually a hang.
>
> Fixes some 150 piglit tests on Gen4-5 when forced to run shaders with
> if conditionals 16-wide, e.g. shaders/glsl-fs-sampler-numbering-3.

Neat!

> ---
>  src/mesa/drivers/dri/i965/brw_eu_emit.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c 
> b/src/mesa/drivers/dri/i965/brw_eu_emit.c
> index 0f53604..4d39762 100644
> --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
> +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
> @@ -1584,8 +1584,8 @@ brw_ENDIF(struct brw_codegen *p)
> }
>
> if (devinfo->gen < 6) {
> -  brw_set_dest(p, insn, retype(brw_vec4_grf(0,0), BRW_REGISTER_TYPE_UD));
> -  brw_set_src0(p, insn, retype(brw_vec4_grf(0,0), BRW_REGISTER_TYPE_UD));
> +  brw_set_dest(p, insn, retype(brw_null_reg(), BRW_REGISTER_TYPE_D));
> +  brw_set_src0(p, insn, retype(brw_null_reg(), BRW_REGISTER_TYPE_D));

Strange that it was an imm32 (type D) on gen < 6 and then imm16 after that.

Reviewed-by: Matt Turner 

>brw_set_src1(p, insn, brw_imm_d(0x0));
> } else if (devinfo->gen == 6) {
>brw_set_dest(p, insn, brw_imm_w(0));
> --
> 2.4.3
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/3] i965/gen4-5: Program the execution size correctly for DO/WHILE instructions.

2015-07-06 Thread Matt Turner

On Mon, Jul 6, 2015 at 11:03 AM, Francisco Jerez  wrote:
> From the hardware docs for the DO instruction:
>
>  "Execution size is ignored for this instruction."
>
> My observation on ILK hardware contradicts the spec though, channels

/facepalm

> over the execution size of a DO instruction won't enter the loop, and
> channels over the execution size of a WHILE instruction will exit the
> loop after the first iteration -- The latter is consistent with the
> spec though, there's no claim that the execution size is ignored for
> the WHILE instruction so it's not completely unexpected that it has an
> influence on the evaluation of EMask.
>
> The execute_size argument of brw_DO() shouldn't have any effect on
> Gen6 and newer hardware.  On Gen4-5 WHILE instructions inherit the
> execution size from the matching DO, so this patch should fix them
> too.  The execution size of BREAK and CONT instructions was already
> being set correctly.
>
> Fixes some 50 piglit tests on Gen4-5 when forced to run shaders with
> conditional and loop instructions 16-wide,
> e.g. shaders/glsl-fs-continue-inside-do-while.

Awesome!

Reviewed-by: Matt Turner 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 3/3] i965/gen4-5: Enable 16-wide dispatch on shaders with control flow.

2015-07-06 Thread Matt Turner

On Mon, Jul 6, 2015 at 11:03 AM, Francisco Jerez  wrote:
> This was probably disabled due to a combination of several bugs in the
> generator code (fixed earlier in this series) and a misunderstanding
> of the hardware spec.  The documentation for most control flow
> instructions mentions among other restrictions:
>
>  "Instruction compression is not allowed."
>
> This however doesn't have any implications on 16 wide not being
> supported, because none of the control flow instructions have
> multi-register operands (control flow instructions are not compressed
> on more recent hardware either, except maybe SNB's IF with inline
> compare).  In fact Gen4-5 had 16-wide control flow masks and stacks,
> and the spec mentions in several places that control flow instructions
> push and pop 16 channels worth of data -- Otherwise there doesn't seem
> to be any indication that it shouldn't work.

Awesome. Yeah, I was wondering aloud to Ken recently about what
actually prevented non-uniform control flow from working -- because as
you say the mask registers are 16-channels wide.

Really cool to find out that it was just a couple of bugs and nothing
fundamental. Really nice work.

> Causes no piglit regressions, and gives the following shader-db
> results on ILK:

Assuming no regressions on Gen4 and G45 (running through Jenkins should do it):

Reviewed-by: Matt Turner 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 15/18] swrast: Defer _tnl_vertex_init until first use

2015-07-06 Thread Matt Turner

On Mon, Jul 6, 2015 at 3:33 AM, Chris Wilson  wrote:
> The vertices require a large chunk of memory, currently allocated during
> context creation. However, this memory is not required until use so we
> can defer the allocation until the first swrast_Wakeup().

Makes sense to me. Someone like Brian like to take a look.

>
> Signed-off-by: Chris Wilson 
> Cc: Kenneth Graunke 
> ---
>  src/mesa/swrast_setup/ss_context.c | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/src/mesa/swrast_setup/ss_context.c 
> b/src/mesa/swrast_setup/ss_context.c
> index 74b1da3..028eccb 100644
> --- a/src/mesa/swrast_setup/ss_context.c
> +++ b/src/mesa/swrast_setup/ss_context.c
> @@ -59,10 +59,6 @@ _swsetup_CreateContext( struct gl_context *ctx )
> swsetup->NewState = ~0;
> _swsetup_trifuncs_init( ctx );
>
> -   _tnl_init_vertices( ctx, ctx->Const.MaxArrayLockSize + 12,
> -  sizeof(SWvertex) );
> -
> -
> return GL_TRUE;
>  }
>
> @@ -233,6 +229,11 @@ _swsetup_Wakeup( struct gl_context *ctx )
> TNLcontext *tnl = TNL_CONTEXT(ctx);
> SScontext *swsetup = SWSETUP_CONTEXT(ctx);
>
> +   if (!(GET_VERTEX_STATE(ctx))->max_vertex_size)
> +  _tnl_init_vertices(ctx,
> +ctx->Const.MaxArrayLockSize + 12,
> +sizeof(SWvertex));

Replace the tabs on these two lines with spaces.

> +
> tnl->Driver.Render.Start = _swsetup_RenderStart;
> tnl->Driver.Render.Finish = _swsetup_RenderFinish;
> tnl->Driver.Render.PrimitiveNotify = _swsetup_RenderPrimitive;
> --
> 2.1.4
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] i965: Fix missing BRW_NEW_FS_PROG_DATA in gen6_renderbuffer_surfaces.

2015-07-06 Thread Matt Turner

On Mon, Jul 6, 2015 at 9:55 AM, Kenneth Graunke  wrote:
> It looks like this was forgotten in commit 3c9dc2d31b80fc73bffa1f40a
> (i965: Make a brw_stage_prog_data for storing the SURF_INDEX
> information.)   In other words, it's been missing since we moved to
> dynamic binding table slot assignments.

Author: Eric Anholt 
AuthorDate: Wed Oct 2 14:07:40 2013 -0700
CommitDate: Tue Oct 15 10:18:42 2013 -0700

Dang.

How did you find this?
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] i965/fs: Don't disable SIMD16 when using the pixel interpolator

2015-07-06 Thread Matt Turner

On Sun, Jul 5, 2015 at 4:45 PM, Francisco Jerez  wrote:
> Hi Matt,
>
> Matt Turner  writes:
>
>> On Fri, Jul 3, 2015 at 3:46 AM, Francisco Jerez  
>> wrote:
>>> Heh, I happened to come across this comment yesterday while looking for
>>> the remaining no16 calls and wondered why on earth it couldn't do the
>>> same that the normal interpolation code does.  After this patch and a
>>> series coming up that will remove all SIMD8 fallbacks from the texturing
>>> code, the only case left still applicable to Gen7 hardware and later
>>> will be "SIMD16 explicit accumulator operands unsupported".  Anyone?
>>
>> I can explain the problem:
>>
>> Prior to Gen7, the were were two accumulator registers usable for most
>> datatypes (acc0, acc1). On Gen7, they removed integer-support from
>> acc1, which was necessary to implement SIMD16 integer multiplication
>> using the normal MUL/MACH sequence.
>
> IIRC they got rid of the acc1 register on IVB altogether, but managed to
> emulate it for floating point types by taking advantage of the extra
> precision not normally used for floating point arithmetic (the fake acc1
> basically uses the same storage in the EU that holds the 32 MSBs of each
> component of acc0), what explains the apparent asymmetry between integer
> and floating point data types.

I've never read anything that told me that -- what have you seen?

>> I implemented 32-bit integer multiplication without using the
>> accumulator in:
>>
>> commit f7df169ba13d22338e9276839a7e9629ca0a6b4f
>> Author: Matt Turner 
>> Date:   Wed May 13 18:34:03 2015 -0700
>>
>> i965/fs: Implement integer multiply without mul/mach.
>>
>> The remaining cases of "SIMD16 explicit accumulator operands
>> unsupported" are ADDC, SUBB, and 32x32 -> high 32-bit multiplication.
>> The remaining multiplication case can probably be reimplemented
>> without the accumulator, like I did for the low 32-bit result.
>>
> Hmm, I have the suspicion that high 32-bit multiplication is the one
> legit use-case of the accumulator we have left, any algorithm breaking
> it up into individual 32/16-bit MULs would end up doing more
> multiplications than the two MUL/MACH instructions we do now, because we
> wouldn't be able to take advantage of the full precision implemented in
> the hardware if we truncate the 48-bit intermediate results to fit in a
> 32-bit register.

That's probably true. It's just that Sandybridge and earlier don't
expose the functionality (but could do 64-bit integer multiplication
just fine), Ivybridge has the quarter-control/accumulator bug, Haswell
works fine if you split the multiplication sequence into SIMD8, and
Broadwell let's you do 32x32 -> 64-bit multiplication without the
accumulator.

So you have only two platforms where it's you have to use the
accumulator, and one of them is broken (but I guess can be trivially
fixed by some force-writemask-all hackery).

The best SIMD16 code for [iu]mulExtended() where both lsb and msb
results are used is probably 2 sets of mul/mach/mov (with some kind of
work around for Ivybridge), but that's kind of hard to recognize.

> How about we use the SIMD width lowering pass to split the computation
> in half?  It should be quite straightforward but will probably require
> adding a new virtual opcode so that the SIMD width lowering pass doesn't
> have to deal with (seriously fucked-up) accumulators directly.

Seems fine to me.

>> The ADDC and SUBB instructions implicitly write a bit to the
>> accumulator if their operations overflowed. The 1Q/2Q quarter control
>> is supposed to select which register is implicitly written -- except
>> that there is no acc1 for integer types. Haswell and newer ignore the
>> quarter control and always write acc0, but IVB (and presumably BYT)
>> attempt to write to the nonexistent acc1.
>>
>> You could split the the SIMD16 operations into 2x SIMD8s and set
>> force_writemask_all on the second, followed by a 2Q MOV from the
>> accumulator. Maybe we'd rather use the .o (overflow) conditional mod
>> on a result ADD to implement this.
>>
> Yeah.  I did in fact try to implement uaddCarry last Friday without
> using the accumulator by doing something like:
>
> | CMP.o tmp, src0, -src1
> | MOV dst, -tmp
>
> ...what of course didn't work because of the extra argument precision
> post-source modifiers and also because the .o condmod doesn't work at
> all on CMP, but...

Ah, you were trying to use the fact that CMP returns 0/-1. That's a
cool idea. It's too bad that the CMP instruction doesn't do .o

I'd been thinking of doing "ADD.o tmp, src0, src1" and then something
that sets/selects 0/1 based on the flag register. Maybe even a move
from the flag register would be best.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] util: Don't link to SHA1 library if shader-cache is disabled.

2015-07-07 Thread Matt Turner

Cc: "10.6" 
---
 src/util/Makefile.am | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/util/Makefile.am b/src/util/Makefile.am
index 2e7542e..1e087b4 100644
--- a/src/util/Makefile.am
+++ b/src/util/Makefile.am
@@ -46,9 +46,9 @@ libmesautil_la_SOURCES = \
 
 if ENABLE_SHADER_CACHE
 libmesautil_la_SOURCES += $(MESA_UTIL_SHADER_CACHE_FILES)
-endif
 
 libmesautil_la_LIBADD = $(SHA1_LIBS)
+endif
 
 roundeven_test_LDADD = -lm
 
-- 
2.3.6

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [Mesa-stable] [PATCH] util: Don't link to SHA1 library if shader-cache is disabled.

2015-07-08 Thread Matt Turner

On Wed, Jul 8, 2015 at 3:00 AM, Emil Velikov  wrote:
> On 08/07/15 07:33, Matt Turner wrote:
>> Cc: "10.6" 
> Thanks for that I've completely forgot about this.
>
> Seems like we never got to using mesa_sha. Can we just remove it for now
> and revive it as a user comes along ?

I don't have a problem with that. I shouldn't have let in what's been committed.

I guess it mostly depends on how soon we expect to finish the shader-cache.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 02/11] i915; remove unused driFd variable

2015-07-08 Thread Matt Turner

On Wed, Jul 8, 2015 at 10:07 AM, Emil Velikov  wrote:
> Signed-off-by: Emil Velikov 
> ---
>  src/mesa/drivers/dri/i915/intel_context.c | 1 -
>  src/mesa/drivers/dri/i915/intel_context.h | 2 --
>  2 files changed, 3 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i915/intel_context.c 
> b/src/mesa/drivers/dri/i915/intel_context.c
> index 5618dcd..c780103 100644
> --- a/src/mesa/drivers/dri/i915/intel_context.c
> +++ b/src/mesa/drivers/dri/i915/intel_context.c
> @@ -428,7 +428,6 @@ intelInitContext(struct intel_context *intel,
>
> driContextPriv->driverPrivate = intel;
> intel->driContext = driContextPriv;
> -   intel->driFd = sPriv->fd;
>
> intel->gen = intelScreen->gen;
>
> diff --git a/src/mesa/drivers/dri/i915/intel_context.h 
> b/src/mesa/drivers/dri/i915/intel_context.h
> index 350d35d..4ec4015 100644
> --- a/src/mesa/drivers/dri/i915/intel_context.h
> +++ b/src/mesa/drivers/dri/i915/intel_context.h
> @@ -273,8 +273,6 @@ struct intel_context
>
> bool use_early_z;
>
> -   int driFd;
> -
> __DRIcontext *driContext;
> struct intel_screen *intelScreen;
>
> --

s/;/:/ in the subject.

Wow, that's some DRI1 stuff!

Reviewed-by: Matt Turner 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 2/2] i965: Optimize intel_batchbuffer_emit_dword().

2015-07-08 Thread Matt Turner

By keeping a pointer to the next available location, we reduce the
number of memory accesses needed to write to the batchbuffer.

A net ~7k reduction of .text size, 7.5k of which is from the change to
intel_batchbuffer_emit_dword().

   text data  bss  dec  hex  filename
4943740   19515226192  5165084   4ed01c  i965_dri.so before
4936804   19515226192  5158148   4eb504  i965_dri.so after

Combined with the previous patch, improves performance of Synmark
OglBatch7 by 4.05914% +/- 1.49373% (n=270) on Haswell.
---
Full disclosure: when testing on an IVB desktop, I measured a
regression in the same benchmark of -4.19005% +/- 1.15188% (n=30).
I don't have any explanation.

Testing on other benchmarks and especially on Atom systems would
be very welcome.

 src/mesa/drivers/dri/i965/brw_blorp.cpp|  4 +--
 src/mesa/drivers/dri/i965/brw_context.h|  5 ++--
 .../drivers/dri/i965/brw_performance_monitor.c |  6 ++--
 src/mesa/drivers/dri/i965/brw_state_batch.c|  4 +--
 src/mesa/drivers/dri/i965/brw_urb.c|  6 ++--
 src/mesa/drivers/dri/i965/intel_batchbuffer.c  | 35 +++---
 src/mesa/drivers/dri/i965/intel_batchbuffer.h  | 11 ---
 7 files changed, 38 insertions(+), 33 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_blorp.cpp 
b/src/mesa/drivers/dri/i965/brw_blorp.cpp
index 2ccfae1..eac1f00 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp.cpp
+++ b/src/mesa/drivers/dri/i965/brw_blorp.cpp
@@ -226,7 +226,7 @@ retry:
intel_batchbuffer_require_space(brw, estimated_max_batch_usage, 
RENDER_RING);
intel_batchbuffer_save_state(brw);
drm_intel_bo *saved_bo = brw->batch.bo;
-   uint32_t saved_used = brw->batch.used;
+   uint32_t saved_used = USED_BATCH(brw->batch);
uint32_t saved_state_batch_offset = brw->batch.state_batch_offset;
 
switch (brw->gen) {
@@ -245,7 +245,7 @@ retry:
 * reserved enough space that a wrap will never happen.
 */
assert(brw->batch.bo == saved_bo);
-   assert((brw->batch.used - saved_used) * 4 +
+   assert((USED_BATCH(brw->batch) - saved_used) * 4 +
   (saved_state_batch_offset - brw->batch.state_batch_offset) <
   estimated_max_batch_usage);
/* Shut up compiler warnings on release build */
diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index afb714b..c7ad35e 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -875,7 +875,8 @@ struct intel_batchbuffer {
 #ifdef DEBUG
uint16_t emit, total;
 #endif
-   uint16_t used, reserved_space;
+   uint16_t reserved_space;
+   uint32_t *map_next;
uint32_t *map;
uint32_t *cpu_map;
 #define BATCH_SZ (8192*sizeof(uint32_t))
@@ -887,7 +888,7 @@ struct intel_batchbuffer {
uint8_t pipe_controls_since_last_cs_stall;
 
struct {
-  uint16_t used;
+  uint32_t *map_next;
   int reloc_count;
} saved;
 };
diff --git a/src/mesa/drivers/dri/i965/brw_performance_monitor.c 
b/src/mesa/drivers/dri/i965/brw_performance_monitor.c
index 0a12375..7e90e8a 100644
--- a/src/mesa/drivers/dri/i965/brw_performance_monitor.c
+++ b/src/mesa/drivers/dri/i965/brw_performance_monitor.c
@@ -710,7 +710,7 @@ emit_mi_report_perf_count(struct brw_context *brw,
/* Make sure the commands to take a snapshot fits in a single batch. */
intel_batchbuffer_require_space(brw, MI_REPORT_PERF_COUNT_BATCH_DWORDS * 4,
RENDER_RING);
-   int batch_used = brw->batch.used;
+   int batch_used = USED_BATCH(brw->batch);
 
/* Reports apparently don't always get written unless we flush first. */
brw_emit_mi_flush(brw);
@@ -754,7 +754,7 @@ emit_mi_report_perf_count(struct brw_context *brw,
brw_emit_mi_flush(brw);
 
(void) batch_used;
-   assert(brw->batch.used - batch_used <= MI_REPORT_PERF_COUNT_BATCH_DWORDS * 
4);
+   assert(USED_BATCH(brw->batch) - batch_used <= 
MI_REPORT_PERF_COUNT_BATCH_DWORDS * 4);
 }
 
 /**
@@ -1386,7 +1386,7 @@ void
 brw_perf_monitor_new_batch(struct brw_context *brw)
 {
assert(brw->batch.ring == RENDER_RING);
-   assert(brw->gen < 6 || brw->batch.used == 0);
+   assert(brw->gen < 6 || USED_BATCH(brw->batch) == 0);
 
if (brw->perfmon.oa_users == 0)
   return;
diff --git a/src/mesa/drivers/dri/i965/brw_state_batch.c 
b/src/mesa/drivers/dri/i965/brw_state_batch.c
index a405a80..d79e0ea 100644
--- a/src/mesa/drivers/dri/i965/brw_state_batch.c
+++ b/src/mesa/drivers/dri/i965/brw_state_batch.c
@@ -87,7 +87,7 @@ brw_annotate_aub(struct brw_context *brw)
drm_intel_aub_annotation annotations[annotation_count];
int a = 0;
make_annotation(&annotations[a++], AUB_TRACE_TYPE_BATCH, 0,
-   4*brw->batch.used);
+   4 * USED_BATCH(brw->batch));
for (int i = brw->state_batch_count; i-- > 0; ) {
   uint32_t type = brw->state_batch_list[i].type;
   uint32_t start_offset = brw->state_batch_list[i].offset;

[Mesa-dev] [PATCH 1/2] i965: Set brw->batch.emit only #ifdef DEBUG.

2015-07-08 Thread Matt Turner

It's only used inside #ifdef DEBUG. Cuts ~1.7k of .text, and more
importantly prevents a larger code size regression in the next commit
when the .used field is replaced and calculated on demand.

   text data  bss  dec  hex  filename
4945468   19515226192  5166812   4ed6dc  i965_dri.so before
4943740   19515226192  5165084   4ed01c  i965_dri.so after

And surround the emit and total fields with #ifdef DEBUG to prevent
such mistakes from happening again.
---
 src/mesa/drivers/dri/i965/brw_context.h   | 2 ++
 src/mesa/drivers/dri/i965/intel_batchbuffer.h | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index 7596139..afb714b 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -872,7 +872,9 @@ struct intel_batchbuffer {
/** BO for post-sync nonzero writes for gen6 workaround. */
drm_intel_bo *workaround_bo;
 
+#ifdef DEBUG
uint16_t emit, total;
+#endif
uint16_t used, reserved_space;
uint32_t *map;
uint32_t *cpu_map;
diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.h 
b/src/mesa/drivers/dri/i965/intel_batchbuffer.h
index fdd07e0..f0971e9 100644
--- a/src/mesa/drivers/dri/i965/intel_batchbuffer.h
+++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.h
@@ -134,8 +134,8 @@ intel_batchbuffer_begin(struct brw_context *brw, int n, 
enum brw_gpu_ring ring)
 {
intel_batchbuffer_require_space(brw, n * 4, ring);
 
-   brw->batch.emit = brw->batch.used;
 #ifdef DEBUG
+   brw->batch.emit = brw->batch.used;
brw->batch.total = n;
 #endif
 }
-- 
2.3.6

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/2] i965: Optimize intel_batchbuffer_emit_dword().

2015-07-08 Thread Matt Turner

On Wed, Jul 8, 2015 at 2:07 PM, Chris Wilson  wrote:
> On Wed, Jul 08, 2015 at 02:00:02PM -0700, Matt Turner wrote:
>> By keeping a pointer to the next available location, we reduce the
>> number of memory accesses needed to write to the batchbuffer.
>>
>> A net ~7k reduction of .text size, 7.5k of which is from the change to
>> intel_batchbuffer_emit_dword().
>>
>>text data  bss  dec  hex  filename
>> 4943740   19515226192  5165084   4ed01c  i965_dri.so before
>> 4936804   19515226192  5158148   4eb504  i965_dri.so after
>>
>> Combined with the previous patch, improves performance of Synmark
>> OglBatch7 by 4.05914% +/- 1.49373% (n=270) on Haswell.
>> ---
>> Full disclosure: when testing on an IVB desktop, I measured a
>> regression in the same benchmark of -4.19005% +/- 1.15188% (n=30).
>> I don't have any explanation.
>
> The problem is that it seems to generate worse code with multiple
> adjacent emit_dwords. I have seen similar regressions when doing the
> same batch[index] to *batch++ elsewhere.
> -Chris

That is in conflict with the data I've provided. In fact, I started by
noticing that if I added intel_batchbuffer_emit_dword* functions that
took multiple dwords that there was a reduction in .text size, so it's
something that I've considered.

The two attached patches demonstrate that the batch[index++] pattern
generates larger (worse) code than *batch++. 1.patch

   text   databssdechex filename
4936804 195152  26192 5158148 4eb504 mesa-release/lib/i965_dri.so
after 1.patch (no change)
4936884 195152  26192 5158228 4eb554 mesa-release/lib/i965_dri.so after 2.patch
From 03256a2898bf74b56a35f8fcc130f8fee5902b59 Mon Sep 17 00:00:00 2001
From: Matt Turner 
Date: Wed, 8 Jul 2015 15:11:07 -0700
Subject: [PATCH 1/2] 1

---
 src/mesa/drivers/dri/i965/gen7_blorp.cpp  | 17 +
 src/mesa/drivers/dri/i965/intel_batchbuffer.h | 24 
 2 files changed, 33 insertions(+), 8 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen7_blorp.cpp b/src/mesa/drivers/dri/i965/gen7_blorp.cpp
index abace6d..74e0763 100644
--- a/src/mesa/drivers/dri/i965/gen7_blorp.cpp
+++ b/src/mesa/drivers/dri/i965/gen7_blorp.cpp
@@ -520,14 +520,15 @@ gen7_blorp_emit_ps_config(struct brw_context *brw,
}
 
BEGIN_BATCH(8);
-   OUT_BATCH(_3DSTATE_PS << 16 | (8 - 2));
-   OUT_BATCH(params->use_wm_prog ? prog_offset : 0);
-   OUT_BATCH(dw2);
-   OUT_BATCH(0);
-   OUT_BATCH(dw4);
-   OUT_BATCH(dw5);
-   OUT_BATCH(0);
-   OUT_BATCH(0);
+   intel_batchbuffer_emit_dword8(brw,
+ _3DSTATE_PS << 16 | (8 - 2),
+ params->use_wm_prog ? prog_offset : 0,
+ dw2,
+ 0,
+ dw4,
+ dw5,
+ 0,
+ 0);
ADVANCE_BATCH();
 }
 
diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.h b/src/mesa/drivers/dri/i965/intel_batchbuffer.h
index a12387b..a5d8f0b 100644
--- a/src/mesa/drivers/dri/i965/intel_batchbuffer.h
+++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.h
@@ -101,6 +101,30 @@ intel_batchbuffer_emit_dword(struct brw_context *brw, GLuint dword)
 }
 
 static inline void
+intel_batchbuffer_emit_dword8(struct brw_context *brw, GLuint dword1,
+   GLuint dword2,
+   GLuint dword3,
+   GLuint dword4,
+   GLuint dword5,
+   GLuint dword6,
+   GLuint dword7,
+   GLuint dword8)
+{
+#ifdef DEBUG
+   assert(intel_batchbuffer_space(brw) >= 4);
+#endif
+   *brw->batch.map_next++ = dword1;
+   *brw->batch.map_next++ = dword2;
+   *brw->batch.map_next++ = dword3;
+   *brw->batch.map_next++ = dword4;
+   *brw->batch.map_next++ = dword5;
+   *brw->batch.map_next++ = dword6;
+   *brw->batch.map_next++ = dword7;
+   *brw->batch.map_next++ = dword8;
+   assert(brw->batch.ring != UNKNOWN_RING);
+}
+
+static inline void
 intel_batchbuffer_emit_float(struct brw_context *brw, float f)
 {
intel_batchbuffer_emit_dword(brw, float_as_int(f));
-- 
2.3.6

From 08c78f95dfbaa39303d25c3348ba1fa99b04b31b Mon Sep 17 00:00:00 2001
From: Matt Turner 
Date: Wed, 8 Jul 2015 15:30:48 -0700
Subject: [PATCH 2/2] 2

---
 src/mesa/drivers/dri/i965/brw_context.h   |  2 +-
 src/mesa/drivers/dri/i965/intel_batchbuffer.h | 11 +++
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/

Re: [Mesa-dev] [PATCH] mesa: Implement faster streaming memcpy

2015-07-08 Thread Matt Turner

On Wed, Jul 8, 2015 at 2:07 PM, Ben Widawsky
 wrote:
> WARNING: No perf data, please keep reading though)
>
> This implements the suggestion provided by the paper, "Fast USWC to WB Memory
> Copy"
> (https://software.intel.com/en-us/articles/copying-accelerated-video-decode-frame-buffers).
> This is described throughout the paper, but the sample code lives in Figure 
> 3-3.
> That paper purports a roughly 40% performance gain in Mbyte/second over the
> original implementation done by Matt.

Sounds interesting!

> Section 3.1.2 is the summary of why an intermediate cache buffer is used. It
> claims that if you use the naive implementation, fill buffers are contended 
> for.
> To be honest, I can't quite fathom the underlying explanation, but I'll think
> about it some more. Most importantly would be to get the perf data... This 
> patch
> does need performance data. I don't currently have a platform that this would
> benefit (BYT or BSW), so I can't get anything useful. As soon as I get a
> platform to test it on, I will - meanwhile, maybe whomever tested the original
> patch the first time around come run this through?
>
> Cc: Matt Turner 
> Cc: Chad Versace 
> Cc: Kristian Høgsberg 
> Signed-off-by: Ben Widawsky 
> ---
>  src/mesa/main/streaming-load-memcpy.c | 61 
> +++
>  1 file changed, 47 insertions(+), 14 deletions(-)
>
> diff --git a/src/mesa/main/streaming-load-memcpy.c 
> b/src/mesa/main/streaming-load-memcpy.c
> index d7147af..3cd310a 100644
> --- a/src/mesa/main/streaming-load-memcpy.c
> +++ b/src/mesa/main/streaming-load-memcpy.c
> @@ -30,6 +30,8 @@
>  #include "main/streaming-load-memcpy.h"
>  #include 
>
> +static uint8_t rsvd_space[4096];

This being static makes this thread-unsafe.

Also, I don't know if it's guaranteed to be 16-byte aligned.
__attribute__((__aligned__(16))) would avoid any problems, but I think
you should do 64-byte alignment to ensure the block is cache-line
aligned. Oh, in fact the linked article says as much: "This buffer
should also be 64 byte (cache line) aligned."

> +
>  /* Copies memory from src to dst, using SSE 4.1's MOVNTDQA to get streaming
>   * read performance from uncached memory.
>   */
> @@ -59,23 +61,54 @@ _mesa_streaming_load_memcpy(void *restrict dst, void 
> *restrict src, size_t len)
>len -= MIN2(bytes_before_alignment_boundary, len);
> }
>
> -   while (len >= 64) {
> -  __m128i *dst_cacheline = (__m128i *)d;
> -  __m128i *src_cacheline = (__m128i *)s;
> +   while (len > 64) {

The article suggests we align the source to a 64-byte boundary as
well. We should do some performance testing around that.

It also suggests copying data to a 64-byte aligned destination as
well, which we could ensure by making the coalignment check above more
strict, though at the cost of skipping this fast path for some other
cases.

> +  __m128i *cached_buffer = (__m128i *)rsvd_space;
> +  size_t streaming_len = len > 4096 ? 4096 : len;

MIN2(len, 4096)

> +
> +  __asm__ volatile("mfence" ::: "memory");

Use _mm_mfence() like the sample code does.

> +
> +  while (streaming_len >= 64) {
> + __m128i *src_cacheline = (__m128i *)s;
> +
> + __m128i temp1 = _mm_stream_load_si128(src_cacheline + 0);
> + __m128i temp2 = _mm_stream_load_si128(src_cacheline + 1);
> + __m128i temp3 = _mm_stream_load_si128(src_cacheline + 2);
> + __m128i temp4 = _mm_stream_load_si128(src_cacheline + 3);
> +
> + _mm_store_si128(cached_buffer + 0, temp1);
> + _mm_store_si128(cached_buffer + 1, temp2);
> + _mm_store_si128(cached_buffer + 2, temp3);
> + _mm_store_si128(cached_buffer + 3, temp4);
> +
> + s += 64;
> + streaming_len -= 64;
> + cached_buffer += 4;
> +  }
> +
> +  cached_buffer = (__m128i *)rsvd_space;
> +  streaming_len = len > 4096 ? 4096 : len;

MIN2(len, 4096)

> +
> +  __asm__ volatile("mfence" ::: "memory");

_mm_mfence()

> +
> +  while (streaming_len >= 64) {
> + __m128i *dst_cacheline = (__m128i *)d;
> +
> + __m128i temp1 = _mm_stream_load_si128(cached_buffer + 0);
> + __m128i temp2 = _mm_stream_load_si128(cached_buffer + 1);
> + __m128i temp3 = _mm_stream_load_si128(cached_buffer + 2);
> + __m128i temp4 = _mm_stream_load_si128(cached_buffer + 3);

_mm_load_si128()

>
> -  __m128i temp1 = _mm_stream_load_si128(src_cacheline + 0);
> -  __m128i temp2 = _mm_stream_load_si128(src_cacheline + 1);
> -  __m128i temp3 = _mm_stream_load_si128(src_cacheline + 2);
> -

Re: [Mesa-dev] [PATCH 2/2] i965: Optimize intel_batchbuffer_emit_dword().

2015-07-08 Thread Matt Turner

On Wed, Jul 8, 2015 at 4:53 PM, Chris Wilson  wrote:
> On Wed, Jul 08, 2015 at 03:33:17PM -0700, Matt Turner wrote:
>> On Wed, Jul 8, 2015 at 2:07 PM, Chris Wilson  
>> wrote:
>> > On Wed, Jul 08, 2015 at 02:00:02PM -0700, Matt Turner wrote:
>> >> By keeping a pointer to the next available location, we reduce the
>> >> number of memory accesses needed to write to the batchbuffer.
>> >>
>> >> A net ~7k reduction of .text size, 7.5k of which is from the change to
>> >> intel_batchbuffer_emit_dword().
>> >>
>> >>text data  bss  dec  hex  filename
>> >> 4943740   19515226192  5165084   4ed01c  i965_dri.so before
>> >> 4936804   19515226192  5158148   4eb504  i965_dri.so after
>> >>
>> >> Combined with the previous patch, improves performance of Synmark
>> >> OglBatch7 by 4.05914% +/- 1.49373% (n=270) on Haswell.
>> >> ---
>> >> Full disclosure: when testing on an IVB desktop, I measured a
>> >> regression in the same benchmark of -4.19005% +/- 1.15188% (n=30).
>> >> I don't have any explanation.
>> >
>> > The problem is that it seems to generate worse code with multiple
>> > adjacent emit_dwords. I have seen similar regressions when doing the
>> > same batch[index] to *batch++ elsewhere.
>> > -Chris
>>
>> That is in conflict with the data I've provided. In fact, I started by
>> noticing that if I added intel_batchbuffer_emit_dword* functions that
>> took multiple dwords that there was a reduction in .text size, so it's
>> something that I've considered.
>
> This is part of a disassembly of the index version of
> gen6_viewport_state.c::upload_viewport_state_pointers()
> (ivb with -march=native -Ofast)
>
>0x0287 <+103>:   lea0x1(%rax),%esi
>0x028a <+106>:   mov%si,0x22f24(%rdi)
>0x0291 <+113>:   mov%ecx,(%rdx,%rax,4)
>0x0294 <+116>:   movzwl 0x22f24(%rdi),%eax
>0x029b <+123>:   mov0x24330(%rdi),%ecx
>0x02a1 <+129>:   mov0x22f08(%rdi),%rdx
>0x02a8 <+136>:   lea0x1(%rax),%esi
>0x02ab <+139>:   mov%si,0x22f24(%rdi)
>0x02b2 <+146>:   mov%ecx,(%rdx,%rax,4)
>0x02b5 <+149>:   movzwl 0x22f24(%rdi),%eax
>0x02bc <+156>:   mov0x2480c(%rdi),%ecx
>0x02c2 <+162>:   mov0x22f08(%rdi),%rdx
>0x02c9 <+169>:   lea0x1(%rax),%esi
>0x02cc <+172>:   mov%si,0x22f24(%rdi)
>0x02d3 <+179>:   mov%ecx,(%rdx,%rax,4)
>0x02d6 <+182>:   pop%rbp
>0x02d7 <+183>:   retq
>
> and for comparison the pointer version:
>
>0x0280 <+96>:lea0x4(%rax),%rcx
>0x0284 <+100>:   mov%rcx,0x22f10(%rdi)
>0x028b <+107>:   mov%edx,(%rax)
>0x028d <+109>:   mov0x22f10(%rdi),%rax
>0x0294 <+116>:   mov0x24338(%rdi),%edx
>0x029a <+122>:   lea0x4(%rax),%rcx
>0x029e <+126>:   mov%rcx,0x22f10(%rdi)
>0x02a5 <+133>:   mov%edx,(%rax)
>0x02a7 <+135>:   mov0x22f10(%rdi),%rax
>0x02ae <+142>:   mov0x24814(%rdi),%edx
>0x02b4 <+148>:   lea0x4(%rax),%rcx
>0x02b8 <+152>:   mov%rcx,0x22f10(%rdi)
>0x02bf <+159>:   mov%edx,(%rax)
>0x02c1 <+161>:   pop%rbp
>0x02c2 <+162>:   retq
>
> So in neither case does gcc avoid incrementing either the index or the
> pointer in the struct after emit_dword, and then reloads it for the
> next.
>
> This is what I expected to see
>0x025e <+62>:movl   $0x780d1c02,(%rcx)
>0x0264 <+68>:mov0x22f08(%rdi),%rax
>0x026b <+75>:mov0x24320(%rdi),%edx
>0x0271 <+81>:mov%edx,0x4(%rax)
>0x0274 <+84>:mov0x22f08(%rdi),%rax
>0x027b <+91>:mov0x24338(%rdi),%edx
>0x0281 <+97>:mov%edx,0x8(%rax)
>0x0284 <+100>:   mov0x22f08(%rdi),%rax
>0x028b <+107>:   mov0x24814(%rdi),%edx
>0x0291 <+113>:   mov%ed

Re: [Mesa-dev] [PATCH 1/2] i965: Implement b2f and b2i using negation.

2015-07-10 Thread Matt Turner

On Fri, Jul 10, 2015 at 10:06 AM, Francisco Jerez  wrote:
> Booleans are represented as 0/-1 on modern hardware which means we can
> just negate them to convert them into a numeric type.  Negation has
> the benefit that it can be implemented using a source modifier which
> can easily be propagated into some other instruction.  shader-db
> results on HSW:
>
>  total instructions in shared programs: 5264246 -> 5264211 (-0.00%)
>  instructions in affected programs: 1464 -> 1429 (-2.39%)
>  helped:15
>  HURT:  1

Strange, I get different (better) numbers on Haswell:

total instructions in shared programs: 6279705 -> 6277316 (-0.04%)
instructions in affected programs: 40948 -> 38559 (-5.83%)
helped:123
HURT:  1
GAINED:1
LOST:  0

Certainly more than 15 helped programs in Civilization Beyond Earth alone.

The one hurt program is
rocketbirds-hardboiled-chicken/fp-2.shader_test, which is hurt because
we do not CSE the MOV instructions. I'll send a patch to fix this.

> No piglit regressions.

As a rule, this is implied by sending the patch. Don't put it in the
commit log -- in the worst case the patch is rebased and it's no
longer true (this has happened, embarrassingly enough). Same thing in
2/2.

> ---
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp   | 4 +---
>  src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 7 +--
>  2 files changed, 2 insertions(+), 9 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> index 4690d00..64ff24c 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> @@ -969,10 +969,8 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, 
> nir_alu_instr *instr)
>break;
>
> case nir_op_b2i:
> -  bld.AND(result, op[0], fs_reg(1));
> -  break;
> case nir_op_b2f:
> -  bld.AND(retype(result, BRW_REGISTER_TYPE_UD), op[0], 
> fs_reg(0x3f80u));
> +  bld.MOV(result, negate(op[0]));
>break;
>
> case nir_op_f2b:
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp 
> b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> index c9c2661..fd94a70 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> @@ -1733,16 +1733,11 @@ vec4_visitor::visit(ir_expression *ir)
>emit(MOV(result_dst, op[0]));
>break;
> case ir_unop_b2i:
> -  emit(AND(result_dst, op[0], src_reg(1)));
> -  break;
> case ir_unop_b2f:
>if (devinfo->gen <= 5) {
>   resolve_bool_comparison(ir->operands[0], &op[0]);
>}
> -  op[0].type = BRW_REGISTER_TYPE_D;
> -  result_dst.type = BRW_REGISTER_TYPE_D;
> -  emit(AND(result_dst, op[0], src_reg(0x3f80u)));
> -  result_dst.type = BRW_REGISTER_TYPE_F;
> +  emit(MOV(result_dst, negate(op[0])));
>    break;
> case ir_unop_f2b:
>emit(CMP(result_dst, op[0], src_reg(0.0f), BRW_CONDITIONAL_NZ));
> --
> 2.4.3
>

Good idea. Not sure why I didn't think of that before.

Both are:

Reviewed-by: Matt Turner 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 7/7] egl/wayland: remove dead code

2015-07-10 Thread Matt Turner

On Fri, Jul 10, 2015 at 10:49 AM, Emil Velikov  wrote:
> The macro HAVE_MKOSTEMP was never defined.
>
> Cc: Axel Davy 
> Signed-off-by: Emil Velikov 
> ---
>  src/egl/drivers/dri2/platform_wayland.c | 6 --
>  1 file changed, 6 deletions(-)
>
> diff --git a/src/egl/drivers/dri2/platform_wayland.c 
> b/src/egl/drivers/dri2/platform_wayland.c
> index 9005eb7..bdccee3 100644
> --- a/src/egl/drivers/dri2/platform_wayland.c
> +++ b/src/egl/drivers/dri2/platform_wayland.c
> @@ -1255,17 +1255,11 @@ create_tmpfile_cloexec(char *tmpname)
>  {
> int fd;
>
> -#ifdef HAVE_MKOSTEMP
> -   fd = mkostemp(tmpname, O_CLOEXEC);
> -   if (fd >= 0)
> -  unlink(tmpname);

It is indeed dead-code, but wouldn't you rather simply add

AC_CHECK_FUNC([mkostemp], AC_DEFINE(HAVE_MKOSTEMP, 1, [Use mkostemp(3)]))

to configure.ac and bring it to life?
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] mesa: Implement faster streaming memcpy

2015-07-10 Thread Matt Turner

On Fri, Jul 10, 2015 at 4:20 AM, Marek Olšák  wrote:
> Shouldn't this stuff be in src/util?

I guess? I'm not opposed to moving it, if other drivers can make use
of it, but I don't want this patch to do the move.

I wrote it in November 2013 and src/util only came into being in
August 2014 if you're wondering why it's not in src/util.

Also, top quoting.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 5/5] i965: Optimize batchbuffer macros.

2015-07-10 Thread Matt Turner

Previously OUT_BATCH was just a macro around an inline function which
does

   brw->batch.map[brw->batch.used++] = dword;

When making consecutive calls to intel_batchbuffer_emit_dword() the
compiler isn't able to recognize that we're writing consecutive memory
locations or that it doesn't need to write batch.used back to memory
each time.

We can avoid both of these problems by making a local pointer to the
next location in the batch in BEGIN_BATCH(), indexing it with a local
variable, and incrementing batch.used once in ADVANCE_BATCH().

Cuts 18k from the .text size.

   text data  bss  dec  hex  filename
4946956   19515226192  5168300   4edcac  i965_dri.so before
4928588   19515226192  5149932   4e94ec  i965_dri.so after

This series (including commit c0433948) improves performance of Synmark
OglBatch7 by 3.64514% +/- 0.298131% (n=282) on Ivybridge.
---
That -4.19005% +/- 1.15188% (n=30) regression on Ivybridge is now a
performance improvement! Thanks Chris for the help!

 src/mesa/drivers/dri/i965/intel_batchbuffer.c |  8 ++---
 src/mesa/drivers/dri/i965/intel_batchbuffer.h | 52 +++
 2 files changed, 41 insertions(+), 19 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c 
b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
index f82958f..93f2872 100644
--- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
+++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
@@ -395,13 +395,13 @@ _intel_batchbuffer_flush(struct brw_context *brw,
  */
 uint32_t
 intel_batchbuffer_reloc(struct brw_context *brw,
-drm_intel_bo *buffer,
+drm_intel_bo *buffer, uint32_t offset,
 uint32_t read_domains, uint32_t write_domain,
 uint32_t delta)
 {
int ret;
 
-   ret = drm_intel_bo_emit_reloc(brw->batch.bo, 4*brw->batch.used,
+   ret = drm_intel_bo_emit_reloc(brw->batch.bo, offset,
 buffer, delta,
 read_domains, write_domain);
assert(ret == 0);
@@ -416,11 +416,11 @@ intel_batchbuffer_reloc(struct brw_context *brw,
 
 uint64_t
 intel_batchbuffer_reloc64(struct brw_context *brw,
-  drm_intel_bo *buffer,
+  drm_intel_bo *buffer, uint32_t offset,
   uint32_t read_domains, uint32_t write_domain,
   uint32_t delta)
 {
-   int ret = drm_intel_bo_emit_reloc(brw->batch.bo, 4*brw->batch.used,
+   int ret = drm_intel_bo_emit_reloc(brw->batch.bo, offset,
  buffer, delta,
  read_domains, write_domain);
assert(ret == 0);
diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.h 
b/src/mesa/drivers/dri/i965/intel_batchbuffer.h
index c0456f3..6342c97 100644
--- a/src/mesa/drivers/dri/i965/intel_batchbuffer.h
+++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.h
@@ -59,14 +59,16 @@ void intel_batchbuffer_data(struct brw_context *brw,
 
 uint32_t intel_batchbuffer_reloc(struct brw_context *brw,
  drm_intel_bo *buffer,
+ uint32_t offset,
  uint32_t read_domains,
  uint32_t write_domain,
- uint32_t offset);
+ uint32_t delta);
 uint64_t intel_batchbuffer_reloc64(struct brw_context *brw,
drm_intel_bo *buffer,
+   uint32_t offset,
uint32_t read_domains,
uint32_t write_domain,
-   uint32_t offset);
+   uint32_t delta);
 static inline uint32_t float_as_int(float f)
 {
union {
@@ -160,23 +162,43 @@ intel_batchbuffer_advance(struct brw_context *brw)
 #endif
 }
 
-#define BEGIN_BATCH(n) intel_batchbuffer_begin(brw, n, RENDER_RING)
-#define BEGIN_BATCH_BLT(n) intel_batchbuffer_begin(brw, n, BLT_RING)
-#define OUT_BATCH(d) intel_batchbuffer_emit_dword(brw, d)
-#define OUT_BATCH_F(f) intel_batchbuffer_emit_float(brw, f)
-#define OUT_RELOC(buf, read_domains, write_domain, delta)  \
-   OUT_BATCH(intel_batchbuffer_reloc(brw, buf, read_domains, write_domain, \
- delta))
+#define BEGIN_BATCH(n) do {\
+   intel_batchbuffer_begin(brw, (n), RENDER_RING); \
+   uint32_t *__map = &brw->batch.map[brw->batch.used]; \
+   int __idx = 0, UNUSED __final_idx = (n)
+
+#define BEGIN_BATCH_BLT(n) do {\
+   intel_batchbuffer_begin(brw, (n), BLT_RING);\
+   uint32_t *__map = &brw->batch.map[brw->batch.used]; \
+   int __idx = 0, UNUSED __final_idx = (n)
+
+#define OUT_BATCH(d) __map[__idx++] = (d)
+#define OUT_BATCH_F(f) OUT_BATCH(float_as_int((f)))
+
+#define OUT_RELOC(buf, read_d

[Mesa-dev] [PATCH 1/5] i965: Move BEGIN_BATCH() into same control flow as ADVANCE_BATCH().

2015-07-10 Thread Matt Turner

BEGIN_BATCH() and ADVANCE_BATCH() will contain "do {" and "} while (0)"
respectively to allow declaring local variables used by intervening
OUT_BATCH macros. As such, BEGIN_BATCH() and ADVANCE_BATCH() need to be
in the same control flow.
---
 src/mesa/drivers/dri/i965/brw_draw.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_draw.c 
b/src/mesa/drivers/dri/i965/brw_draw.c
index 69ad4d4..ec13473 100644
--- a/src/mesa/drivers/dri/i965/brw_draw.c
+++ b/src/mesa/drivers/dri/i965/brw_draw.c
@@ -261,17 +261,17 @@ static void brw_emit_prim(struct brw_context *brw,
   indirect_flag = 0;
}
 
+   BEGIN_BATCH(brw->gen >= 7 ? 7 : 6);
+
if (brw->gen >= 7) {
   if (brw->predicate.state == BRW_PREDICATE_STATE_USE_BIT)
  predicate_enable = GEN7_3DPRIM_PREDICATE_ENABLE;
   else
  predicate_enable = 0;
 
-  BEGIN_BATCH(7);
   OUT_BATCH(CMD_3D_PRIM << 16 | (7 - 2) | indirect_flag | 
predicate_enable);
   OUT_BATCH(hw_prim | vertex_access_type);
} else {
-  BEGIN_BATCH(6);
   OUT_BATCH(CMD_3D_PRIM << 16 | (6 - 2) |
 hw_prim << GEN4_3DPRIM_TOPOLOGY_TYPE_SHIFT |
 vertex_access_type);
-- 
2.3.6

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 3/5] i965: Turn set_blitter_tiling() into a macro.

2015-07-10 Thread Matt Turner

Its uses of OUT_BATCH will need a local variable defined by BEGIN_BATCH.

Increases .text size by 528 bytes.
---
 src/mesa/drivers/dri/i965/intel_blit.c | 55 --
 1 file changed, 25 insertions(+), 30 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_blit.c 
b/src/mesa/drivers/dri/i965/intel_blit.c
index bc39053..2a0f621 100644
--- a/src/mesa/drivers/dri/i965/intel_blit.c
+++ b/src/mesa/drivers/dri/i965/intel_blit.c
@@ -176,36 +176,31 @@ get_tr_vertical_align(uint32_t tr_mode, uint32_t cpp, 
bool is_src) {
  * tiling state would leak into other unsuspecting applications (like the X
  * server).
  */
-static void
-set_blitter_tiling(struct brw_context *brw,
-   bool dst_y_tiled, bool src_y_tiled)
-{
-   assert(brw->gen >= 6);
-
-   /* Idle the blitter before we update how tiling is interpreted. */
-   OUT_BATCH(MI_FLUSH_DW);
-   OUT_BATCH(0);
-   OUT_BATCH(0);
-   OUT_BATCH(0);
-
-   OUT_BATCH(MI_LOAD_REGISTER_IMM | (3 - 2));
-   OUT_BATCH(BCS_SWCTRL);
-   OUT_BATCH((BCS_SWCTRL_DST_Y | BCS_SWCTRL_SRC_Y) << 16 |
- (dst_y_tiled ? BCS_SWCTRL_DST_Y : 0) |
- (src_y_tiled ? BCS_SWCTRL_SRC_Y : 0));
-}
-
-#define BEGIN_BATCH_BLT_TILED(n, dst_y_tiled, src_y_tiled) do { \
-  BEGIN_BATCH_BLT(n + ((dst_y_tiled || src_y_tiled) ? 14 : 0)); \
-  if (dst_y_tiled || src_y_tiled)   \
- set_blitter_tiling(brw, dst_y_tiled, src_y_tiled); \
-   } while (0)
-
-#define ADVANCE_BATCH_TILED(dst_y_tiled, src_y_tiled) do {  \
-  if (dst_y_tiled || src_y_tiled)   \
- set_blitter_tiling(brw, false, false); \
-  ADVANCE_BATCH();  \
-   } while (0)
+#define SET_BLITTER_TILING(dst_y_tiled, src_y_tiled) do { \
+   assert(brw->gen >= 6); \
+  \
+   /* Idle the blitter before we update how tiling is interpreted. */ \
+   OUT_BATCH(MI_FLUSH_DW);\
+   OUT_BATCH(0);  \
+   OUT_BATCH(0);  \
+   OUT_BATCH(0);  \
+  \
+   OUT_BATCH(MI_LOAD_REGISTER_IMM | (3 - 2)); \
+   OUT_BATCH(BCS_SWCTRL); \
+   OUT_BATCH((BCS_SWCTRL_DST_Y | BCS_SWCTRL_SRC_Y) << 16 |\
+ (dst_y_tiled ? BCS_SWCTRL_DST_Y : 0) |   \
+ (src_y_tiled ? BCS_SWCTRL_SRC_Y : 0));   \
+} while (0)
+
+#define BEGIN_BATCH_BLT_TILED(n, dst_y_tiled, src_y_tiled)\
+   BEGIN_BATCH_BLT(n + ((dst_y_tiled || src_y_tiled) ? 14 : 0));  \
+   if (dst_y_tiled || src_y_tiled)\
+  SET_BLITTER_TILING(dst_y_tiled, src_y_tiled)\
+
+#define ADVANCE_BATCH_TILED(dst_y_tiled, src_y_tiled) \
+   if (dst_y_tiled || src_y_tiled)\
+  SET_BLITTER_TILING(false, false);   \
+   ADVANCE_BATCH()
 
 static int
 blt_pitch(struct intel_mipmap_tree *mt)
-- 
2.3.6

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 2/5] i965: Turn emit_vertex_buffer_state() into a macro.

2015-07-10 Thread Matt Turner

Its uses of OUT_BATCH will need a local variable defined by BEGIN_BATCH.

Increases .text size by 8 bytes.
---
 src/mesa/drivers/dri/i965/brw_draw_upload.c | 79 +
 1 file changed, 36 insertions(+), 43 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_draw_upload.c 
b/src/mesa/drivers/dri/i965/brw_draw_upload.c
index 320e40e..0536ac3 100644
--- a/src/mesa/drivers/dri/i965/brw_draw_upload.c
+++ b/src/mesa/drivers/dri/i965/brw_draw_upload.c
@@ -604,46 +604,40 @@ brw_prepare_shader_draw_parameters(struct brw_context 
*brw)
 /**
  * Emit a VERTEX_BUFFER_STATE entry (part of 3DSTATE_VERTEX_BUFFERS).
  */
-static void
-emit_vertex_buffer_state(struct brw_context *brw,
- unsigned buffer_nr,
- drm_intel_bo *bo,
- unsigned bo_ending_address,
- unsigned bo_offset,
- unsigned stride,
- unsigned step_rate)
-{
-   struct gl_context *ctx = &brw->ctx;
-   uint32_t dw0;
-
-   if (brw->gen >= 6) {
-  dw0 = (buffer_nr << GEN6_VB0_INDEX_SHIFT) |
-(step_rate ? GEN6_VB0_ACCESS_INSTANCEDATA
-   : GEN6_VB0_ACCESS_VERTEXDATA);
-   } else {
-  dw0 = (buffer_nr << BRW_VB0_INDEX_SHIFT) |
-(step_rate ? BRW_VB0_ACCESS_INSTANCEDATA
-   : BRW_VB0_ACCESS_VERTEXDATA);
-   }
-
-   if (brw->gen >= 7)
-  dw0 |= GEN7_VB0_ADDRESS_MODIFYENABLE;
-
-   if (brw->gen == 7)
-  dw0 |= GEN7_MOCS_L3 << 16;
-
-   WARN_ONCE(stride >= (brw->gen >= 5 ? 2048 : 2047),
- "VBO stride %d too large, bad rendering may occur\n",
- stride);
-   OUT_BATCH(dw0 | (stride << BRW_VB0_PITCH_SHIFT));
-   OUT_RELOC(bo, I915_GEM_DOMAIN_VERTEX, 0, bo_offset);
-   if (brw->gen >= 5) {
-  OUT_RELOC(bo, I915_GEM_DOMAIN_VERTEX, 0, bo_ending_address);
-   } else {
-  OUT_BATCH(0);
-   }
-   OUT_BATCH(step_rate);
-}
+#define EMIT_VERTEX_BUFFER_STATE(buffer_nr, bo, bo_ending_address, \
+ bo_offset,stride, step_rate)  \
+do {   \
+   struct gl_context *ctx = &brw->ctx; \
+   uint32_t dw0;   \
+   \
+   if (brw->gen >= 6) {\
+  dw0 = (buffer_nr << GEN6_VB0_INDEX_SHIFT) |  \
+(step_rate ? GEN6_VB0_ACCESS_INSTANCEDATA  \
+   : GEN6_VB0_ACCESS_VERTEXDATA);  \
+   } else {\
+  dw0 = (buffer_nr << BRW_VB0_INDEX_SHIFT) |   \
+(step_rate ? BRW_VB0_ACCESS_INSTANCEDATA   \
+   : BRW_VB0_ACCESS_VERTEXDATA);   \
+   }   \
+   \
+   if (brw->gen >= 7)  \
+  dw0 |= GEN7_VB0_ADDRESS_MODIFYENABLE;\
+   \
+   if (brw->gen == 7)  \
+  dw0 |= GEN7_MOCS_L3 << 16;   \
+   \
+   WARN_ONCE(stride >= (brw->gen >= 5 ? 2048 : 2047),  \
+ "VBO stride %d too large, bad rendering may occur\n", \
+ stride);  \
+   OUT_BATCH(dw0 | (stride << BRW_VB0_PITCH_SHIFT));   \
+   OUT_RELOC(bo, I915_GEM_DOMAIN_VERTEX, 0, bo_offset);\
+   if (brw->gen >= 5) {\
+  OUT_RELOC(bo, I915_GEM_DOMAIN_VERTEX, 0, bo_ending_address); \
+   } else {\
+  OUT_BATCH(0);\
+   }   \
+   OUT_BATCH(step_rate);   \
+} while (0)
 
 static void brw_emit_vertices(struct brw_context *brw)
 {
@@ -704,14 +698,13 @@ static void brw_emit_vertices(struct brw_context *brw)
   OUT_BATCH((_3DSTATE_VERTEX_BUFFERS << 16) | (4 * nr_buffers - 1));
   for (i = 0; i < brw->vb.nr_buffers; i++) {
 struct brw_vertex_buffer *buffer = &brw->vb.buffers[i];
- emit_vertex_buffer_state(brw, i, buffer->bo, buffer->bo->size - 1,
+ EMIT_VERTEX_BUFFER_STATE(i, buffer->bo, buffer->bo->size - 1,
   buffer->offset, buffer->stride,
   buffer->step_rate);
-
   }
 
   if (brw->vs.prog_data->uses_vertexid) {
- emit_vertex_buffer_state(brw, brw->vb.nr_buffers

[Mesa-dev] [PATCH 4/5] i965: Split batch emission from relocation functions.

2015-07-10 Thread Matt Turner

So that everything writing to the batch between BEGIN_BATCH() and
ADVANCE_BATCH() goes through OUT_BATCH.
---
 src/mesa/drivers/dri/i965/intel_batchbuffer.c | 30 ++-
 src/mesa/drivers/dri/i965/intel_batchbuffer.h | 34 ++-
 2 files changed, 30 insertions(+), 34 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c 
b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
index d93ee6e..f82958f 100644
--- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
+++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
@@ -393,11 +393,11 @@ _intel_batchbuffer_flush(struct brw_context *brw,
 
 /*  This is the only way buffers get added to the validate list.
  */
-bool
-intel_batchbuffer_emit_reloc(struct brw_context *brw,
- drm_intel_bo *buffer,
- uint32_t read_domains, uint32_t write_domain,
-uint32_t delta)
+uint32_t
+intel_batchbuffer_reloc(struct brw_context *brw,
+drm_intel_bo *buffer,
+uint32_t read_domains, uint32_t write_domain,
+uint32_t delta)
 {
int ret;
 
@@ -411,16 +411,14 @@ intel_batchbuffer_emit_reloc(struct brw_context *brw,
 * case the buffer doesn't move and we can short-circuit the relocation
 * processing in the kernel
 */
-   intel_batchbuffer_emit_dword(brw, buffer->offset64 + delta);
-
-   return true;
+   return buffer->offset64 + delta;
 }
 
-bool
-intel_batchbuffer_emit_reloc64(struct brw_context *brw,
-   drm_intel_bo *buffer,
-   uint32_t read_domains, uint32_t write_domain,
-  uint32_t delta)
+uint64_t
+intel_batchbuffer_reloc64(struct brw_context *brw,
+  drm_intel_bo *buffer,
+  uint32_t read_domains, uint32_t write_domain,
+  uint32_t delta)
 {
int ret = drm_intel_bo_emit_reloc(brw->batch.bo, 4*brw->batch.used,
  buffer, delta,
@@ -432,11 +430,7 @@ intel_batchbuffer_emit_reloc64(struct brw_context *brw,
 * case the buffer doesn't move and we can short-circuit the relocation
 * processing in the kernel
 */
-   uint64_t offset = buffer->offset64 + delta;
-   intel_batchbuffer_emit_dword(brw, offset);
-   intel_batchbuffer_emit_dword(brw, offset >> 32);
-
-   return true;
+   return buffer->offset64 + delta;
 }
 
 
diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.h 
b/src/mesa/drivers/dri/i965/intel_batchbuffer.h
index e58eae4..c0456f3 100644
--- a/src/mesa/drivers/dri/i965/intel_batchbuffer.h
+++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.h
@@ -57,16 +57,16 @@ void intel_batchbuffer_data(struct brw_context *brw,
 const void *data, GLuint bytes,
 enum brw_gpu_ring ring);
 
-bool intel_batchbuffer_emit_reloc(struct brw_context *brw,
-   drm_intel_bo *buffer,
-  uint32_t read_domains,
-  uint32_t write_domain,
-  uint32_t offset);
-bool intel_batchbuffer_emit_reloc64(struct brw_context *brw,
-drm_intel_bo *buffer,
-uint32_t read_domains,
-uint32_t write_domain,
-uint32_t offset);
+uint32_t intel_batchbuffer_reloc(struct brw_context *brw,
+ drm_intel_bo *buffer,
+ uint32_t read_domains,
+ uint32_t write_domain,
+ uint32_t offset);
+uint64_t intel_batchbuffer_reloc64(struct brw_context *brw,
+   drm_intel_bo *buffer,
+   uint32_t read_domains,
+   uint32_t write_domain,
+   uint32_t offset);
 static inline uint32_t float_as_int(float f)
 {
union {
@@ -164,14 +164,16 @@ intel_batchbuffer_advance(struct brw_context *brw)
 #define BEGIN_BATCH_BLT(n) intel_batchbuffer_begin(brw, n, BLT_RING)
 #define OUT_BATCH(d) intel_batchbuffer_emit_dword(brw, d)
 #define OUT_BATCH_F(f) intel_batchbuffer_emit_float(brw, f)
-#define OUT_RELOC(buf, read_domains, write_domain, delta) do { \
-   intel_batchbuffer_emit_reloc(brw, buf,  \
-   read_domains, write_domain, delta); \
-} while (0)
+#define OUT_RELOC(buf, read_domains, write_domain, delta)  \
+   OUT_BATCH(intel_batchbuffer_reloc(brw, buf, read_domains, write_domain, \
+ delta))
 
 /* Handle 48-bit address relocations for Gen8+ */
-#define OUT_RELOC64(buf, read_domains, write_domain, delta) do { \
-   intel_batchbuffer_emit

Re: [Mesa-dev] [PATCH] [v3] i965: Split out gen8 push constant state upload

2015-07-10 Thread Matt Turner

On Thu, Jul 9, 2015 at 11:00 AM, Ben Widawsky
 wrote:
> While implementing the workaround in the previous patch I noticed things were
> starting to get a bit messy. Since gen8 works differently enough from gen7, I
> thought splitting it out with be good.
>
> While here, get rid of gen8 MOCS which does nothing and was in the wrong place
> anyway.
>
> This patch is totally optional. I'd be willing to just always use buffer #2 on
> gen8+. Pre-HSW this wasn't allowed, but it looks like it's okay for gen8 too.
>
> v2: Move inactive batch generation to the top of the function in order to make
> the rest of the code easier to read.
>
> Jenkins results (still a bunch of spurious failures, I miss Mark):
> http://otc-mesa-ci.jf.intel.com/job/bwidawsk/169/
>
> v3: v2 had a bug in that it both didn't emit the right number of dwords, and 
> it
> didn't do ADVANCE_BATCH(). I'm moderately worried that there were no failures 
> as
> a result.
> http://otc-mesa-ci.jf.intel.com/job/bwidawsk/170/

I don't think putting Intel-internal links in the commit message is a good idea.

Ken's made similar comments to me.

Also, so much off the wall commentary...
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] i965: Replace illegal compacted NOP with valid compact instruction

2015-07-10 Thread Matt Turner

On Wed, Jul 8, 2015 at 10:58 PM, Zhenyu Wang  wrote:
> NOP actually has no compact version, but we use it for instruction
> alignment for compact kernel. Although it seems working on HW, it is
> illegal and might not be valid for any future one.
>
> This trys to get a temporary compact instruction with no effect for
> alignment to replace compacted NOP. G45 spec has note that HW compact
> logic could determine NENOP and drop it right away, so we can still
> keep with that.
>
> v2: rebase to master, we still need this to work with internal tool.
>
> Signed-off-by: Zhenyu Wang 
> ---
>  src/mesa/drivers/dri/i965/brw_eu_compact.c | 41 
> +-
>  1 file changed, 35 insertions(+), 6 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_eu_compact.c 
> b/src/mesa/drivers/dri/i965/brw_eu_compact.c
> index 67f0b45..719667a 100644
> --- a/src/mesa/drivers/dri/i965/brw_eu_compact.c
> +++ b/src/mesa/drivers/dri/i965/brw_eu_compact.c
> @@ -1367,6 +1367,39 @@ brw_init_compaction_tables(const struct 
> brw_device_info *devinfo)
> }
>  }
>
> +static void
> +brw_get_noop_compact(struct brw_codegen *p, brw_compact_inst *dst)

I'd rather call this function fill_compaction_padding() or something
similar -- there's no need for the brw_ prefix since it's static, it's
not "get"ting anything, and "noop" in the name is a little confusing
since it's not emitting a NOP. :)

> +{
> +   const struct brw_device_info *devinfo = p->devinfo;
> +   brw_inst *inst, i;
> +   struct brw_reg g0 = brw_vec8_grf(0, 0);
> +
> +   memset(dst, 0, sizeof(*dst));
> +
> +   /* G45 compact logic could recognize NENOP and drop right away. */
> +   if (devinfo->is_g4x) {
> +  brw_compact_inst_set_opcode(dst, BRW_OPCODE_NENOP);
> +  brw_compact_inst_set_cmpt_control(dst, true);
> +  return;
> +   }
> +
> +   /*
> +* As NOP has no legal compact version, try to use a legal compact
> +* instruction for compact instruction alignment.
> +*/
> +   brw_push_insn_state(p);
> +   brw_set_default_predicate_control(p, BRW_PREDICATE_NONE);
> +   brw_set_default_access_mode(p, BRW_ALIGN_1);
> +   inst = brw_MOV(p, g0, g0);
> +   memcpy(&i, inst, sizeof(brw_inst));
> +   brw_pop_insn_state(p);
> +
> +   if (!brw_try_compact_instruction(devinfo, dst, &i)) {
> +  fprintf(stderr, "Failed to generate compact inst for alignment!\n");
> +  exit(1);

This isn't an error we ever expect a user to hit, to let's make it an assert:

bool UNUSED ret = brw_try_compact_instruction(devinfo, dst, &i);
assert(ret);

With those small changes, this patch is

Reviewed-by: Matt Turner 

I'd be happy to make the changes myself and commit the patch if you'd
like -- just tell me so. :)

Thanks Zhenyu!
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/5] i965/fs: fix stride and type for hw_reg's in regs_read()

2015-07-10 Thread Matt Turner

On Wed, Jul 1, 2015 at 11:51 AM, Connor Abbott  wrote:
> sources with file == HW_REG get all their information from the
> fixed_hw_reg field, so we need to get the stride and type from there
> when computing the size.
>
> Signed-off-by: Connor Abbott 
> ---

Patches 1 and 2 are

Reviewed-by: Matt Turner 

I think everything in this series is now reviewed.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 1/2] i965/gs: Move vertex_count != 0 check up a level; skip one caller.

2015-07-10 Thread Matt Turner

On Wed, Jul 1, 2015 at 7:28 PM, Kenneth Graunke  wrote:
> Paul's original code had emit_control_data_bits() skip the URB write if
> vertex_count was 0.  This meant wrapping every control data write in a
> conditional write.
>
> We accumulate control data bits in a single UD (32-bit) register.  For
> simple shaders that don't emit many vertices, the control data header
> will be <= 32-bits long, so we only need to write it once at the end of
> the shader.
>
> For shaders with larger headers, we write out batches of control data
> bits at EmitVertex(), when (vertex_count * bits_per_vertex) % 32 == 0.
> On the first EmitVertex() call, the above expression will evaluate to
> true simply because vertex_count == 0.  But we want to avoid emitting
> the control data bits, because we haven't accumulated 32-bits worth yet.
>
> In other words, the vertex_count != 0 check is really only necessary in
> the EmitVertex() batching case, not the end-of-thread case.
>
> This saves a CMP/IF/ENDIF in every shader that uses EndPrimitive() or
> multiple streams.  The only downside is that a shader which emits no
> vertices at all will execute an additional URB write---but such shaders
> are pointless and not worth optimizing.
>
> Signed-off-by: Kenneth Graunke 
> ---
>  src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp | 14 --
>  1 file changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp 
> b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
> index 2f948ee..55408eb 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
> @@ -348,11 +348,6 @@ vec4_gs_visitor::emit_control_data_bits()
> if (c->control_data_header_size_bits > 128)
>urb_write_flags = urb_write_flags | BRW_URB_WRITE_PER_SLOT_OFFSET;
>
> -   /* If vertex_count is 0, then no control data bits have been accumulated
> -* yet, so we should do nothing.
> -*/
> -   emit(CMP(dst_null_d(), this->vertex_count, 0u, BRW_CONDITIONAL_NEQ));
> -   emit(IF(BRW_PREDICATE_NORMAL));
> {
>/* If we are using either channel masks or a per-slot offset, then we
> * need to figure out which DWORD we are trying to write to, using the
> @@ -431,7 +426,6 @@ vec4_gs_visitor::emit_control_data_bits()
>inst->base_mrf = base_mrf;
>inst->mlen = 2;
> }
> -   emit(BRW_OPCODE_ENDIF);
>  }
>
>  void
> @@ -531,9 +525,17 @@ vec4_gs_visitor::visit(ir_emit_vertex *ir)
>  emit(AND(dst_null_d(), this->vertex_count,
>   (uint32_t) (32 / c->control_data_bits_per_vertex - 1)));
>   inst->conditional_mod = BRW_CONDITIONAL_Z;
> +
>   emit(IF(BRW_PREDICATE_NORMAL));
>   {
> +/* If vertex_count is 0, then no control data bits have been
> + * accumulated yet, so we skip emitting them.
> + */
> +    emit(CMP(dst_null_d(), this->vertex_count, 0u,
> + BRW_CONDITIONAL_NEQ));

I think you wanted to indent BRW_CONDITIONAL_NEQ to match dst_null_d().

Also, can we s/NEQ/NZ/ while we're here?

Reviewed-by: Matt Turner 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/2] i965: Fix indentation in emit_control_data_bits().

2015-07-10 Thread Matt Turner

Confirmed that git show -w shows only the removal of matching braces.

Reviewed-by: Matt Turner 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] [v2] i965/chv|skl: Apply sampler bypass w/a

2015-07-10 Thread Matt Turner

On Wed, Jul 8, 2015 at 5:04 PM, Ben Widawsky
 wrote:
> Certain compressed formats require this setting. The docs don't go into much
> detail as to why it's needed exactly.
>
> This patch introduces no piglit regressions on gen9 (bsw is untested). Note 
> that
> the SKL "regressions" are fixed tests, and the egl_khr_gl_colorspace tests are
> WTF. The patch also fixes nothing I can find.
> http://otc-mesa-ci.jf.intel.com/job/Leeroy/127820/
>
> v2:
> Reworded commit message (Matt); Added piglit results link.
> Restructured condition (Matt)
> Moved check out to function (Nanley). I left the setting of the bit in the
>   surface state open coded because it seems to go better with the existing 
> code.
>
> Cc: Matt Turner 
> Cc: Nanley Chery 
> Cc: Jordan Justen  (aux-hiz needs this too)
> Signed-off-by: Ben Widawsky 
> ---
>  src/mesa/drivers/dri/i965/brw_defines.h|  1 +
>  src/mesa/drivers/dri/i965/gen8_surface_state.c | 29 
> ++
>  2 files changed, 30 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
> b/src/mesa/drivers/dri/i965/brw_defines.h
> index 19489ab..3668967 100644
> --- a/src/mesa/drivers/dri/i965/brw_defines.h
> +++ b/src/mesa/drivers/dri/i965/brw_defines.h
> @@ -276,6 +276,7 @@
>  #define GEN8_SURFACE_TILING_W   (1 << 12)
>  #define GEN8_SURFACE_TILING_X   (2 << 12)
>  #define GEN8_SURFACE_TILING_Y   (3 << 12)
> +#define GEN8_SURFACE_SAMPLER_L2_BYPASS_DISABLE  (1 << 9)
>  #define BRW_SURFACE_RC_READ_WRITE  (1 << 8)
>  #define BRW_SURFACE_MIPLAYOUT_SHIFT10
>  #define BRW_SURFACE_MIPMAPLAYOUT_BELOW   0
> diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c 
> b/src/mesa/drivers/dri/i965/gen8_surface_state.c
> index bd3eb00..9bbe8ae 100644
> --- a/src/mesa/drivers/dri/i965/gen8_surface_state.c
> +++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c
> @@ -132,6 +132,27 @@ horizontal_alignment(const struct brw_context *brw,
> }
>  }
>
> +static bool
> +sampler_l2_bypass_disable(struct brw_context *brw, unsigned format)
> +{
> +   /* From the CHV PRM, Volume 2d, page 321 (RENDER_SURFACE_STATE dword 0
> +* bit 9 "Sampler L2 Bypass Mode Disable" Programming Notes):
> +*
> +*This bit must be set for the following surface types: BC2_UNORM
> +*BC3_UNORM BC5_UNORM BC5_SNORM BC7_UNORM
> +*/
> +   if ((brw->gen >= 9 || brw->is_cherryview) &&
> +   (format == BRW_SURFACEFORMAT_BC2_UNORM ||
> +format == BRW_SURFACEFORMAT_BC3_UNORM ||
> +format == BRW_SURFACEFORMAT_BC5_UNORM ||
> +format == BRW_SURFACEFORMAT_BC5_SNORM ||
> +format == BRW_SURFACEFORMAT_BC7_UNORM)) {
> +  return true;
> +   }
> +
> +   return false;
> +}
> +
>  static uint32_t *
>  allocate_surface_state(struct brw_context *brw, uint32_t *out_offset, int 
> index)
>  {
> @@ -238,6 +259,10 @@ gen8_emit_texture_surface_state(struct brw_context *brw,
>surf[0] |= BRW_SURFACE_CUBEFACE_ENABLES;
> }
>
> +   if (sampler_l2_bypass_disable(brw, format)) {
> +  surf[0] |= GEN8_SURFACE_SAMPLER_L2_BYPASS_DISABLE;
> +   }
> +
> if (_mesa_is_array_texture(target) || target == GL_TEXTURE_CUBE_MAP)
>surf[0] |= GEN8_SURFACE_IS_ARRAY;
>
> @@ -465,6 +490,10 @@ gen8_update_renderbuffer_surface(struct brw_context *brw,
>   horizontal_alignment(brw, mt, surf_type) |
>   surface_tiling_mode(tiling);
>
> +   if (sampler_l2_bypass_disable(brw, format)) {
> +  surf[0] |= GEN8_SURFACE_SAMPLER_L2_BYPASS_DISABLE;
> +   }

I don't think any of the formats this workaround applies to can be
render targets, can they?

All BC* RT entries in the brw_surface_formats.c table are 'x'
(unsupported) and I don't think new hardware is going to implement a
compressor for these. :)

Remove the change to gen8_update_renderbuffer_surface, and inline the
function in its one actual use in gen8_emit_texture_surface_state(),
and it gets a

Reviewed-by: Matt Turner 

> +
> surf[1] = SET_FIELD(mocs, GEN8_SURFACE_MOCS) | mt->qpitch >> 2;
>
> surf[2] = SET_FIELD(width - 1, GEN7_SURFACE_WIDTH) |
> --
> 2.4.4
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 1/3] i965: Check the INTEL_USE_NIR environment variable once at context creation

2015-07-10 Thread Matt Turner

On Fri, Apr 3, 2015 at 10:04 AM, Jason Ekstrand  wrote:
> On Fri, Apr 3, 2015 at 9:46 AM, Matt Turner  wrote:
>> On Fri, Apr 3, 2015 at 1:07 AM, Jordan Justen  
>> wrote:
>>> On 2015-04-02 20:56:15, Jason Ekstrand wrote:
>>>> ---
>>>>  src/mesa/drivers/dri/i965/brw_context.c | 10 +-
>>>>  src/mesa/drivers/dri/i965/brw_fs.cpp|  4 ++--
>>>>  src/mesa/drivers/dri/i965/brw_vec4.cpp  |  4 +++-
>>>>  3 files changed, 14 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/src/mesa/drivers/dri/i965/brw_context.c 
>>>> b/src/mesa/drivers/dri/i965/brw_context.c
>>>> index 84818f0..f0de711 100644
>>>> --- a/src/mesa/drivers/dri/i965/brw_context.c
>>>> +++ b/src/mesa/drivers/dri/i965/brw_context.c
>>>> @@ -560,6 +560,12 @@ brw_initialize_context_constants(struct brw_context 
>>>> *brw)
>>>>.lower_ffma = true,
>>>> };
>>>>
>>>> +   bool use_nir_default[MESA_SHADER_STAGES];
>>>> +   use_nir_default[MESA_SHADER_VERTEX] = false;
>>>> +   use_nir_default[MESA_SHADER_GEOMETRY] = false;
>>>> +   use_nir_default[MESA_SHADER_FRAGMENT] = false;
>>>> +   use_nir_default[MESA_SHADER_COMPUTE] = false;
>>>
>>> How about memset to 0 for now to make sure all stages are set? We can
>>> add use_nir_default[MESA_SHADER_FOO] = true; after the memset to
>>> update the default for the shader stage.
>
> Sure, we could do that.  I'm not sure if it really saves us anything.
> I guess it would make sure that we initialize everything.
>
>> Isn't this sufficient?
>>
>> bool use_nir_default[MESA_SHADER_STAGES] = {false};
>
> Yes, that would accomplish the memset in less code.
>
>> and use C99 designated initializers when we want to change the default
>> per-stage.
>
> No, we can't do this.  When we flip the switch, we're going to have
>
>> use_nir_default[MESA_SHADER_VERTEX] = brw->gen >= 8;
>
> and you can only use compile-time constants in initializers.

I was cleaning out old mail and came across this. It's irrelevant now,
especially since this patch landed long ago and has since been
deleted, but it's totally okay to have non-constant initializers in
initializer lists -- you can even initialize with function calls:

struct brw_context {
int gen;
};

struct s {
int bool;
};

int gencheck(struct brw_context *brw) {
return brw->gen >= 8;
}

struct s foo(struct brw_context *brw)
{
struct s s = { .bool = gencheck(brw) };
return s;
}

I suspect what you were thinking about was initializer lists for
things that are themselves compile-time constants. Of course you can't
initialize them with non-constants, but that has nothing to do with
initializer lists.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 2/4] i965: Split batch emission from relocation functions.

2015-07-13 Thread Matt Turner

So that everything writing to the batch between BEGIN_BATCH() and
ADVANCE_BATCH() goes through OUT_BATCH.

Reviewed-by: Chris Wilson 
---
 src/mesa/drivers/dri/i965/intel_batchbuffer.c | 30 ++-
 src/mesa/drivers/dri/i965/intel_batchbuffer.h | 34 ++-
 2 files changed, 30 insertions(+), 34 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c 
b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
index d93ee6e..f82958f 100644
--- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
+++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
@@ -393,11 +393,11 @@ _intel_batchbuffer_flush(struct brw_context *brw,
 
 /*  This is the only way buffers get added to the validate list.
  */
-bool
-intel_batchbuffer_emit_reloc(struct brw_context *brw,
- drm_intel_bo *buffer,
- uint32_t read_domains, uint32_t write_domain,
-uint32_t delta)
+uint32_t
+intel_batchbuffer_reloc(struct brw_context *brw,
+drm_intel_bo *buffer,
+uint32_t read_domains, uint32_t write_domain,
+uint32_t delta)
 {
int ret;
 
@@ -411,16 +411,14 @@ intel_batchbuffer_emit_reloc(struct brw_context *brw,
 * case the buffer doesn't move and we can short-circuit the relocation
 * processing in the kernel
 */
-   intel_batchbuffer_emit_dword(brw, buffer->offset64 + delta);
-
-   return true;
+   return buffer->offset64 + delta;
 }
 
-bool
-intel_batchbuffer_emit_reloc64(struct brw_context *brw,
-   drm_intel_bo *buffer,
-   uint32_t read_domains, uint32_t write_domain,
-  uint32_t delta)
+uint64_t
+intel_batchbuffer_reloc64(struct brw_context *brw,
+  drm_intel_bo *buffer,
+  uint32_t read_domains, uint32_t write_domain,
+  uint32_t delta)
 {
int ret = drm_intel_bo_emit_reloc(brw->batch.bo, 4*brw->batch.used,
  buffer, delta,
@@ -432,11 +430,7 @@ intel_batchbuffer_emit_reloc64(struct brw_context *brw,
 * case the buffer doesn't move and we can short-circuit the relocation
 * processing in the kernel
 */
-   uint64_t offset = buffer->offset64 + delta;
-   intel_batchbuffer_emit_dword(brw, offset);
-   intel_batchbuffer_emit_dword(brw, offset >> 32);
-
-   return true;
+   return buffer->offset64 + delta;
 }
 
 
diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.h 
b/src/mesa/drivers/dri/i965/intel_batchbuffer.h
index e58eae4..c0456f3 100644
--- a/src/mesa/drivers/dri/i965/intel_batchbuffer.h
+++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.h
@@ -57,16 +57,16 @@ void intel_batchbuffer_data(struct brw_context *brw,
 const void *data, GLuint bytes,
 enum brw_gpu_ring ring);
 
-bool intel_batchbuffer_emit_reloc(struct brw_context *brw,
-   drm_intel_bo *buffer,
-  uint32_t read_domains,
-  uint32_t write_domain,
-  uint32_t offset);
-bool intel_batchbuffer_emit_reloc64(struct brw_context *brw,
-drm_intel_bo *buffer,
-uint32_t read_domains,
-uint32_t write_domain,
-uint32_t offset);
+uint32_t intel_batchbuffer_reloc(struct brw_context *brw,
+ drm_intel_bo *buffer,
+ uint32_t read_domains,
+ uint32_t write_domain,
+ uint32_t offset);
+uint64_t intel_batchbuffer_reloc64(struct brw_context *brw,
+   drm_intel_bo *buffer,
+   uint32_t read_domains,
+   uint32_t write_domain,
+   uint32_t offset);
 static inline uint32_t float_as_int(float f)
 {
union {
@@ -164,14 +164,16 @@ intel_batchbuffer_advance(struct brw_context *brw)
 #define BEGIN_BATCH_BLT(n) intel_batchbuffer_begin(brw, n, BLT_RING)
 #define OUT_BATCH(d) intel_batchbuffer_emit_dword(brw, d)
 #define OUT_BATCH_F(f) intel_batchbuffer_emit_float(brw, f)
-#define OUT_RELOC(buf, read_domains, write_domain, delta) do { \
-   intel_batchbuffer_emit_reloc(brw, buf,  \
-   read_domains, write_domain, delta); \
-} while (0)
+#define OUT_RELOC(buf, read_domains, write_domain, delta)  \
+   OUT_BATCH(intel_batchbuffer_reloc(brw, buf, read_domains, write_domain, \
+ delta))
 
 /* Handle 48-bit address relocations for Gen8+ */
-#define OUT_RELOC64(buf, read_domains, write_domain, delta) do {

[Mesa-dev] [PATCH 1/4] i965: Move BEGIN_BATCH() into same control flow as ADVANCE_BATCH().

2015-07-13 Thread Matt Turner

BEGIN_BATCH() and ADVANCE_BATCH() will contain "do {" and "} while (0)"
respectively to allow declaring local variables used by intervening
OUT_BATCH macros. As such, BEGIN_BATCH() and ADVANCE_BATCH() need to be
in the same control flow.

Reviewed-by: Iago Toral Quiroga 
---
 src/mesa/drivers/dri/i965/brw_draw.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_draw.c 
b/src/mesa/drivers/dri/i965/brw_draw.c
index 69ad4d4..ec13473 100644
--- a/src/mesa/drivers/dri/i965/brw_draw.c
+++ b/src/mesa/drivers/dri/i965/brw_draw.c
@@ -261,17 +261,17 @@ static void brw_emit_prim(struct brw_context *brw,
   indirect_flag = 0;
}
 
+   BEGIN_BATCH(brw->gen >= 7 ? 7 : 6);
+
if (brw->gen >= 7) {
   if (brw->predicate.state == BRW_PREDICATE_STATE_USE_BIT)
  predicate_enable = GEN7_3DPRIM_PREDICATE_ENABLE;
   else
  predicate_enable = 0;
 
-  BEGIN_BATCH(7);
   OUT_BATCH(CMD_3D_PRIM << 16 | (7 - 2) | indirect_flag | 
predicate_enable);
   OUT_BATCH(hw_prim | vertex_access_type);
} else {
-  BEGIN_BATCH(6);
   OUT_BATCH(CMD_3D_PRIM << 16 | (6 - 2) |
 hw_prim << GEN4_3DPRIM_TOPOLOGY_TYPE_SHIFT |
 vertex_access_type);
-- 
2.3.6

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 3/4] i965: Add and use USED_BATCH macro.

2015-07-13 Thread Matt Turner

The next patch will replace the .used field with an on-demand
calculation of batchbuffer usage.
---
 src/mesa/drivers/dri/i965/brw_blorp.cpp |  4 ++--
 src/mesa/drivers/dri/i965/brw_performance_monitor.c |  6 +++---
 src/mesa/drivers/dri/i965/brw_state_batch.c |  4 ++--
 src/mesa/drivers/dri/i965/brw_urb.c |  4 ++--
 src/mesa/drivers/dri/i965/intel_batchbuffer.c   | 20 ++--
 src/mesa/drivers/dri/i965/intel_batchbuffer.h   |  9 ++---
 6 files changed, 25 insertions(+), 22 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_blorp.cpp 
b/src/mesa/drivers/dri/i965/brw_blorp.cpp
index 2ccfae1..eac1f00 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp.cpp
+++ b/src/mesa/drivers/dri/i965/brw_blorp.cpp
@@ -226,7 +226,7 @@ retry:
intel_batchbuffer_require_space(brw, estimated_max_batch_usage, 
RENDER_RING);
intel_batchbuffer_save_state(brw);
drm_intel_bo *saved_bo = brw->batch.bo;
-   uint32_t saved_used = brw->batch.used;
+   uint32_t saved_used = USED_BATCH(brw->batch);
uint32_t saved_state_batch_offset = brw->batch.state_batch_offset;
 
switch (brw->gen) {
@@ -245,7 +245,7 @@ retry:
 * reserved enough space that a wrap will never happen.
 */
assert(brw->batch.bo == saved_bo);
-   assert((brw->batch.used - saved_used) * 4 +
+   assert((USED_BATCH(brw->batch) - saved_used) * 4 +
   (saved_state_batch_offset - brw->batch.state_batch_offset) <
   estimated_max_batch_usage);
/* Shut up compiler warnings on release build */
diff --git a/src/mesa/drivers/dri/i965/brw_performance_monitor.c 
b/src/mesa/drivers/dri/i965/brw_performance_monitor.c
index 0a12375..7e90e8a 100644
--- a/src/mesa/drivers/dri/i965/brw_performance_monitor.c
+++ b/src/mesa/drivers/dri/i965/brw_performance_monitor.c
@@ -710,7 +710,7 @@ emit_mi_report_perf_count(struct brw_context *brw,
/* Make sure the commands to take a snapshot fits in a single batch. */
intel_batchbuffer_require_space(brw, MI_REPORT_PERF_COUNT_BATCH_DWORDS * 4,
RENDER_RING);
-   int batch_used = brw->batch.used;
+   int batch_used = USED_BATCH(brw->batch);
 
/* Reports apparently don't always get written unless we flush first. */
brw_emit_mi_flush(brw);
@@ -754,7 +754,7 @@ emit_mi_report_perf_count(struct brw_context *brw,
brw_emit_mi_flush(brw);
 
(void) batch_used;
-   assert(brw->batch.used - batch_used <= MI_REPORT_PERF_COUNT_BATCH_DWORDS * 
4);
+   assert(USED_BATCH(brw->batch) - batch_used <= 
MI_REPORT_PERF_COUNT_BATCH_DWORDS * 4);
 }
 
 /**
@@ -1386,7 +1386,7 @@ void
 brw_perf_monitor_new_batch(struct brw_context *brw)
 {
assert(brw->batch.ring == RENDER_RING);
-   assert(brw->gen < 6 || brw->batch.used == 0);
+   assert(brw->gen < 6 || USED_BATCH(brw->batch) == 0);
 
if (brw->perfmon.oa_users == 0)
   return;
diff --git a/src/mesa/drivers/dri/i965/brw_state_batch.c 
b/src/mesa/drivers/dri/i965/brw_state_batch.c
index a405a80..d79e0ea 100644
--- a/src/mesa/drivers/dri/i965/brw_state_batch.c
+++ b/src/mesa/drivers/dri/i965/brw_state_batch.c
@@ -87,7 +87,7 @@ brw_annotate_aub(struct brw_context *brw)
drm_intel_aub_annotation annotations[annotation_count];
int a = 0;
make_annotation(&annotations[a++], AUB_TRACE_TYPE_BATCH, 0,
-   4*brw->batch.used);
+   4 * USED_BATCH(brw->batch));
for (int i = brw->state_batch_count; i-- > 0; ) {
   uint32_t type = brw->state_batch_list[i].type;
   uint32_t start_offset = brw->state_batch_list[i].offset;
@@ -136,7 +136,7 @@ __brw_state_batch(struct brw_context *brw,
 * space, then flush and try again.
 */
if (batch->state_batch_offset < size ||
-   offset < 4*batch->used + batch->reserved_space) {
+   offset < 4 * USED_BATCH(*batch) + batch->reserved_space) {
   intel_batchbuffer_flush(brw);
   offset = ROUND_DOWN_TO(batch->state_batch_offset - size, alignment);
}
diff --git a/src/mesa/drivers/dri/i965/brw_urb.c 
b/src/mesa/drivers/dri/i965/brw_urb.c
index 6fcf1b0..8fc06ba 100644
--- a/src/mesa/drivers/dri/i965/brw_urb.c
+++ b/src/mesa/drivers/dri/i965/brw_urb.c
@@ -249,8 +249,8 @@ void brw_upload_urb_fence(struct brw_context *brw)
uf.bits1.cs_fence  = brw->urb.size;
 
/* erratum: URB_FENCE must not cross a 64byte cacheline */
-   if ((brw->batch.used & 15) > 12) {
-  int pad = 16 - (brw->batch.used & 15);
+   if ((USED_BATCH(brw->batch) & 15) > 12) {
+  int pad = 16 - (USED_BATCH(brw->batch) & 15);
   do
 brw->batch.map[brw->batch.used++] = MI_NOOP;
   while (--pad);
diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c 
b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
index f82958f..628a7b7 100644
--- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
+++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
@@ -94,7 +94,7 @@ intel_batchbuffer_reset_to_saved(struct brw_context *brw)
drm_intel_gem_bo_clear_relocs(brw->batch.bo, brw->b

[Mesa-dev] [PATCH 4/4] i965: Optimize batchbuffer macros.

2015-07-13 Thread Matt Turner

Previously OUT_BATCH was just a macro around an inline function which
does

   brw->batch.map[brw->batch.used++] = dword;

When making consecutive calls to intel_batchbuffer_emit_dword() the
compiler isn't able to recognize that we're writing consecutive memory
locations or that it doesn't need to write batch.used back to memory
each time.

We can avoid both of these problems by making a local pointer to the
next location in the batch in BEGIN_BATCH().

Cuts 18k from the .text size.

   text data  bss  dec  hex  filename
4946956   19515226192  5168300   4edcac  i965_dri.so before
4928956   19515226192  5150300   4e965c  i965_dri.so after

This series (including commit c0433948) improves performance of Synmark
OglBatch7 by 8.01389% +/- 0.63922% (n=83) on Ivybridge.
---
Again, Chris makes an excellent suggestion that in turn improves performance!

 src/mesa/drivers/dri/i965/brw_context.h   |  5 ++-
 src/mesa/drivers/dri/i965/brw_draw_upload.c   | 12 --
 src/mesa/drivers/dri/i965/brw_urb.c   |  2 +-
 src/mesa/drivers/dri/i965/intel_batchbuffer.c | 19 -
 src/mesa/drivers/dri/i965/intel_batchbuffer.h | 55 ++-
 src/mesa/drivers/dri/i965/intel_blit.c| 19 -
 6 files changed, 70 insertions(+), 42 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index 44d1aea..34a49b2 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -873,7 +873,8 @@ struct intel_batchbuffer {
 #ifdef DEBUG
uint16_t emit, total;
 #endif
-   uint16_t used, reserved_space;
+   uint16_t reserved_space;
+   uint32_t *map_next;
uint32_t *map;
uint32_t *cpu_map;
 #define BATCH_SZ (8192*sizeof(uint32_t))
@@ -883,7 +884,7 @@ struct intel_batchbuffer {
bool needs_sol_reset;
 
struct {
-  uint16_t used;
+  uint32_t *map_next;
   int reloc_count;
} saved;
 };
diff --git a/src/mesa/drivers/dri/i965/brw_draw_upload.c 
b/src/mesa/drivers/dri/i965/brw_draw_upload.c
index 320e40e..7cb091d 100644
--- a/src/mesa/drivers/dri/i965/brw_draw_upload.c
+++ b/src/mesa/drivers/dri/i965/brw_draw_upload.c
@@ -604,14 +604,15 @@ brw_prepare_shader_draw_parameters(struct brw_context 
*brw)
 /**
  * Emit a VERTEX_BUFFER_STATE entry (part of 3DSTATE_VERTEX_BUFFERS).
  */
-static void
+static uint32_t *
 emit_vertex_buffer_state(struct brw_context *brw,
  unsigned buffer_nr,
  drm_intel_bo *bo,
  unsigned bo_ending_address,
  unsigned bo_offset,
  unsigned stride,
- unsigned step_rate)
+ unsigned step_rate,
+ uint32_t *__map)
 {
struct gl_context *ctx = &brw->ctx;
uint32_t dw0;
@@ -643,7 +644,10 @@ emit_vertex_buffer_state(struct brw_context *brw,
   OUT_BATCH(0);
}
OUT_BATCH(step_rate);
+
+   return __map;
 }
+#define EMIT_VERTEX_BUFFER_STATE(...) __map = 
emit_vertex_buffer_state(__VA_ARGS__, __map)
 
 static void brw_emit_vertices(struct brw_context *brw)
 {
@@ -704,14 +708,14 @@ static void brw_emit_vertices(struct brw_context *brw)
   OUT_BATCH((_3DSTATE_VERTEX_BUFFERS << 16) | (4 * nr_buffers - 1));
   for (i = 0; i < brw->vb.nr_buffers; i++) {
 struct brw_vertex_buffer *buffer = &brw->vb.buffers[i];
- emit_vertex_buffer_state(brw, i, buffer->bo, buffer->bo->size - 1,
+ EMIT_VERTEX_BUFFER_STATE(brw, i, buffer->bo, buffer->bo->size - 1,
   buffer->offset, buffer->stride,
   buffer->step_rate);
 
   }
 
   if (brw->vs.prog_data->uses_vertexid) {
- emit_vertex_buffer_state(brw, brw->vb.nr_buffers,
+ EMIT_VERTEX_BUFFER_STATE(brw, brw->vb.nr_buffers,
   brw->draw.draw_params_bo,
   brw->draw.draw_params_bo->size - 1,
   brw->draw.draw_params_offset,
diff --git a/src/mesa/drivers/dri/i965/brw_urb.c 
b/src/mesa/drivers/dri/i965/brw_urb.c
index 8fc06ba..6078c38 100644
--- a/src/mesa/drivers/dri/i965/brw_urb.c
+++ b/src/mesa/drivers/dri/i965/brw_urb.c
@@ -252,7 +252,7 @@ void brw_upload_urb_fence(struct brw_context *brw)
if ((USED_BATCH(brw->batch) & 15) > 12) {
   int pad = 16 - (USED_BATCH(brw->batch) & 15);
   do
-brw->batch.map[brw->batch.used++] = MI_NOOP;
+ *brw->batch.map_next++ = MI_NOOP;
   while (--pad);
}
 
diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c 
b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
index 628a7b7..088ffd2 100644
--- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
+++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
@@ -48,6 +48,7 @@ intel_batchbuffer_init(struct brw_context *brw)
if (!brw->has_llc) {
   brw->batch.cpu_map = malloc(BATCH_SZ);
   b

Re: [Mesa-dev] [PATCH 4/5] i965: Split batch emission from relocation functions.

2015-07-13 Thread Matt Turner

On Sat, Jul 11, 2015 at 11:08 AM, Chris Wilson  wrote:
> On Fri, Jul 10, 2015 at 11:44:58AM -0700, Matt Turner wrote:
>> So that everything writing to the batch between BEGIN_BATCH() and
>> ADVANCE_BATCH() goes through OUT_BATCH.
>
> Reviewed-by: Chris Wilson 
>
>> +#define OUT_RELOC64(buf, read_domains, write_domain, delta) do {\
>> +   uint64_t reloc64 = intel_batchbuffer_reloc64(brw, buf, read_domains, \
>> +write_domain, delta);   \
>> +   OUT_BATCH(reloc64);  \
>> +   OUT_BATCH(reloc64 >> 32);\
>>  } while (0)
>
> Is the compiler smart enough to convert this to movq?

It is not, but I'm not totally sure that's better -- an unaligned movq
might cross a cache boundary which is apparently pretty expensive.

> The pointer emission variant at least makes it trivial to use
>*((uint64_t *)map)++ = reloc64
> if so desired.

I'll do some performance tests on top of my v3 series.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 5/5] i965: Optimize batchbuffer macros.

2015-07-13 Thread Matt Turner

On Sat, Jul 11, 2015 at 11:02 AM, Chris Wilson  wrote:
> On Fri, Jul 10, 2015 at 11:44:59AM -0700, Matt Turner wrote:
>> Previously OUT_BATCH was just a macro around an inline function which
>> does
>>
>>brw->batch.map[brw->batch.used++] = dword;
>>
>> When making consecutive calls to intel_batchbuffer_emit_dword() the
>> compiler isn't able to recognize that we're writing consecutive memory
>> locations or that it doesn't need to write batch.used back to memory
>> each time.
>>
>> We can avoid both of these problems by making a local pointer to the
>> next location in the batch in BEGIN_BATCH(), indexing it with a local
>> variable, and incrementing batch.used once in ADVANCE_BATCH().
>>
>> Cuts 18k from the .text size.
>>
>>text data  bss  dec  hex  filename
>> 4946956   19515226192  5168300   4edcac  i965_dri.so before
>> 4928588   19515226192  5149932   4e94ec  i965_dri.so after
>>
>> This series (including commit c0433948) improves performance of Synmark
>> OglBatch7 by 3.64514% +/- 0.298131% (n=282) on Ivybridge.
>> ---
>> That -4.19005% +/- 1.15188% (n=30) regression on Ivybridge is now a
>> performance improvement! Thanks Chris for the help!
>>
>>  src/mesa/drivers/dri/i965/intel_batchbuffer.c |  8 ++---
>>  src/mesa/drivers/dri/i965/intel_batchbuffer.h | 52 
>> +++
>>  2 files changed, 41 insertions(+), 19 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c 
>> b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
>> index f82958f..93f2872 100644
>> --- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
>> +++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
>> @@ -395,13 +395,13 @@ _intel_batchbuffer_flush(struct brw_context *brw,
>>   */
>>  uint32_t
>>  intel_batchbuffer_reloc(struct brw_context *brw,
>> -drm_intel_bo *buffer,
>> +drm_intel_bo *buffer, uint32_t offset,
>>  uint32_t read_domains, uint32_t write_domain,
>>  uint32_t delta)
>>  {
>> int ret;
>>
>> -   ret = drm_intel_bo_emit_reloc(brw->batch.bo, 4*brw->batch.used,
>> +   ret = drm_intel_bo_emit_reloc(brw->batch.bo, offset,
>>buffer, delta,
>>read_domains, write_domain);
>> assert(ret == 0);
>> @@ -416,11 +416,11 @@ intel_batchbuffer_reloc(struct brw_context *brw,
>>
>>  uint64_t
>>  intel_batchbuffer_reloc64(struct brw_context *brw,
>> -  drm_intel_bo *buffer,
>> +  drm_intel_bo *buffer, uint32_t offset,
>>uint32_t read_domains, uint32_t write_domain,
>>uint32_t delta)
>>  {
>> -   int ret = drm_intel_bo_emit_reloc(brw->batch.bo, 4*brw->batch.used,
>> +   int ret = drm_intel_bo_emit_reloc(brw->batch.bo, offset,
>>   buffer, delta,
>>   read_domains, write_domain);
>> assert(ret == 0);
>> diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.h 
>> b/src/mesa/drivers/dri/i965/intel_batchbuffer.h
>> index c0456f3..6342c97 100644
>> --- a/src/mesa/drivers/dri/i965/intel_batchbuffer.h
>> +++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.h
>> @@ -59,14 +59,16 @@ void intel_batchbuffer_data(struct brw_context *brw,
>>
>>  uint32_t intel_batchbuffer_reloc(struct brw_context *brw,
>>   drm_intel_bo *buffer,
>> + uint32_t offset,
>>   uint32_t read_domains,
>>   uint32_t write_domain,
>> - uint32_t offset);
>> + uint32_t delta);
>>  uint64_t intel_batchbuffer_reloc64(struct brw_context *brw,
>> drm_intel_bo *buffer,
>> +   uint32_t offset,
>> uint32_t read_domains,
>> uint32_t write_domain,
>> -   uint32_t offset);
>> +   uint32_t delta);
>>  static inline uint32_t float_as_int(float f)
>>  {
>> union {
>> @@ -160,23 +162,43 @@ intel_batchbuffer_advance(struct brw_context *brw)
>>  #endif
>>  }
>>
>> -#define BEGIN_BATCH(n) intel

[Mesa-dev] [PATCH] i965: Mark constant static data as const.

2015-07-13 Thread Matt Turner

---
 src/mesa/drivers/dri/i965/brw_curbe.c   |  2 +-
 src/mesa/drivers/dri/i965/brw_draw_upload.c | 44 ++---
 2 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_curbe.c 
b/src/mesa/drivers/dri/i965/brw_curbe.c
index befd7a9..a149ce3 100644
--- a/src/mesa/drivers/dri/i965/brw_curbe.c
+++ b/src/mesa/drivers/dri/i965/brw_curbe.c
@@ -176,7 +176,7 @@ void brw_upload_cs_urb_state(struct brw_context *brw)
ADVANCE_BATCH();
 }
 
-static GLfloat fixed_plane[6][4] = {
+static const GLfloat fixed_plane[6][4] = {
{ 0,0,   -1, 1 },
{ 0,0,1, 1 },
{ 0,   -1,0, 1 },
diff --git a/src/mesa/drivers/dri/i965/brw_draw_upload.c 
b/src/mesa/drivers/dri/i965/brw_draw_upload.c
index 7cb091d..c95f0c3 100644
--- a/src/mesa/drivers/dri/i965/brw_draw_upload.c
+++ b/src/mesa/drivers/dri/i965/brw_draw_upload.c
@@ -40,7 +40,7 @@
 #include "intel_batchbuffer.h"
 #include "intel_buffer_objects.h"
 
-static GLuint double_types[5] = {
+static const GLuint double_types[5] = {
0,
BRW_SURFACEFORMAT_R64_FLOAT,
BRW_SURFACEFORMAT_R64G64_FLOAT,
@@ -48,7 +48,7 @@ static GLuint double_types[5] = {
BRW_SURFACEFORMAT_R64G64B64A64_FLOAT
 };
 
-static GLuint float_types[5] = {
+static const GLuint float_types[5] = {
0,
BRW_SURFACEFORMAT_R32_FLOAT,
BRW_SURFACEFORMAT_R32G32_FLOAT,
@@ -56,7 +56,7 @@ static GLuint float_types[5] = {
BRW_SURFACEFORMAT_R32G32B32A32_FLOAT
 };
 
-static GLuint half_float_types[5] = {
+static const GLuint half_float_types[5] = {
0,
BRW_SURFACEFORMAT_R16_FLOAT,
BRW_SURFACEFORMAT_R16G16_FLOAT,
@@ -64,7 +64,7 @@ static GLuint half_float_types[5] = {
BRW_SURFACEFORMAT_R16G16B16A16_FLOAT
 };
 
-static GLuint fixed_point_types[5] = {
+static const GLuint fixed_point_types[5] = {
0,
BRW_SURFACEFORMAT_R32_SFIXED,
BRW_SURFACEFORMAT_R32G32_SFIXED,
@@ -72,7 +72,7 @@ static GLuint fixed_point_types[5] = {
BRW_SURFACEFORMAT_R32G32B32A32_SFIXED,
 };
 
-static GLuint uint_types_direct[5] = {
+static const GLuint uint_types_direct[5] = {
0,
BRW_SURFACEFORMAT_R32_UINT,
BRW_SURFACEFORMAT_R32G32_UINT,
@@ -80,7 +80,7 @@ static GLuint uint_types_direct[5] = {
BRW_SURFACEFORMAT_R32G32B32A32_UINT
 };
 
-static GLuint uint_types_norm[5] = {
+static const GLuint uint_types_norm[5] = {
0,
BRW_SURFACEFORMAT_R32_UNORM,
BRW_SURFACEFORMAT_R32G32_UNORM,
@@ -88,7 +88,7 @@ static GLuint uint_types_norm[5] = {
BRW_SURFACEFORMAT_R32G32B32A32_UNORM
 };
 
-static GLuint uint_types_scale[5] = {
+static const GLuint uint_types_scale[5] = {
0,
BRW_SURFACEFORMAT_R32_USCALED,
BRW_SURFACEFORMAT_R32G32_USCALED,
@@ -96,7 +96,7 @@ static GLuint uint_types_scale[5] = {
BRW_SURFACEFORMAT_R32G32B32A32_USCALED
 };
 
-static GLuint int_types_direct[5] = {
+static const GLuint int_types_direct[5] = {
0,
BRW_SURFACEFORMAT_R32_SINT,
BRW_SURFACEFORMAT_R32G32_SINT,
@@ -104,7 +104,7 @@ static GLuint int_types_direct[5] = {
BRW_SURFACEFORMAT_R32G32B32A32_SINT
 };
 
-static GLuint int_types_norm[5] = {
+static const GLuint int_types_norm[5] = {
0,
BRW_SURFACEFORMAT_R32_SNORM,
BRW_SURFACEFORMAT_R32G32_SNORM,
@@ -112,7 +112,7 @@ static GLuint int_types_norm[5] = {
BRW_SURFACEFORMAT_R32G32B32A32_SNORM
 };
 
-static GLuint int_types_scale[5] = {
+static const GLuint int_types_scale[5] = {
0,
BRW_SURFACEFORMAT_R32_SSCALED,
BRW_SURFACEFORMAT_R32G32_SSCALED,
@@ -120,7 +120,7 @@ static GLuint int_types_scale[5] = {
BRW_SURFACEFORMAT_R32G32B32A32_SSCALED
 };
 
-static GLuint ushort_types_direct[5] = {
+static const GLuint ushort_types_direct[5] = {
0,
BRW_SURFACEFORMAT_R16_UINT,
BRW_SURFACEFORMAT_R16G16_UINT,
@@ -128,7 +128,7 @@ static GLuint ushort_types_direct[5] = {
BRW_SURFACEFORMAT_R16G16B16A16_UINT
 };
 
-static GLuint ushort_types_norm[5] = {
+static const GLuint ushort_types_norm[5] = {
0,
BRW_SURFACEFORMAT_R16_UNORM,
BRW_SURFACEFORMAT_R16G16_UNORM,
@@ -136,7 +136,7 @@ static GLuint ushort_types_norm[5] = {
BRW_SURFACEFORMAT_R16G16B16A16_UNORM
 };
 
-static GLuint ushort_types_scale[5] = {
+static const GLuint ushort_types_scale[5] = {
0,
BRW_SURFACEFORMAT_R16_USCALED,
BRW_SURFACEFORMAT_R16G16_USCALED,
@@ -144,7 +144,7 @@ static GLuint ushort_types_scale[5] = {
BRW_SURFACEFORMAT_R16G16B16A16_USCALED
 };
 
-static GLuint short_types_direct[5] = {
+static const GLuint short_types_direct[5] = {
0,
BRW_SURFACEFORMAT_R16_SINT,
BRW_SURFACEFORMAT_R16G16_SINT,
@@ -152,7 +152,7 @@ static GLuint short_types_direct[5] = {
BRW_SURFACEFORMAT_R16G16B16A16_SINT
 };
 
-static GLuint short_types_norm[5] = {
+static const GLuint short_types_norm[5] = {
0,
BRW_SURFACEFORMAT_R16_SNORM,
BRW_SURFACEFORMAT_R16G16_SNORM,
@@ -160,7 +160,7 @@ static GLuint short_types_norm[5] = {
BRW_SURFACEFORMAT_R16G16B16A16_SNORM
 };
 
-static GLuint short_types_scale[5] = {
+static const GL

[Mesa-dev] [PATCH 11/13] program: Avoid double promotion.

2015-07-13 Thread Matt Turner

---
 src/mesa/program/prog_execute.c | 30 +++---
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/src/mesa/program/prog_execute.c b/src/mesa/program/prog_execute.c
index 77274e2..2c52d0d 100644
--- a/src/mesa/program/prog_execute.c
+++ b/src/mesa/program/prog_execute.c
@@ -623,7 +623,7 @@ _mesa_execute_program(struct gl_context * ctx,
 GLfloat a[4], result[4];
 fetch_vector1(&inst->SrcReg[0], machine, a);
 result[0] = result[1] = result[2] = result[3]
-   = (GLfloat) cos(a[0]);
+   = cosf(a[0]);
 store_vector4(inst, machine, result);
  }
  break;
@@ -776,7 +776,7 @@ _mesa_execute_program(struct gl_context * ctx,
 if (inst->SrcReg[0].File != PROGRAM_UNDEFINED) {
GLfloat a[4];
fetch_vector1(&inst->SrcReg[0], machine, a);
-   cond = (a[0] != 0.0);
+   cond = (a[0] != 0.0F);
 }
 else {
cond = eval_condition(machine, inst);
@@ -834,7 +834,7 @@ _mesa_execute_program(struct gl_context * ctx,
val = -FLT_MAX;
 }
 else {
-   val = (float)(log(a[0]) * 1.442695F);
+   val = logf(a[0]) * 1.442695F;
 }
 result[0] = result[1] = result[2] = result[3] = val;
 store_vector4(inst, machine, result);
@@ -853,10 +853,10 @@ _mesa_execute_program(struct gl_context * ctx,
 result[1] = a[0];
 /* XXX we could probably just use pow() here */
 if (a[0] > 0.0F) {
-   if (a[1] == 0.0 && a[3] == 0.0)
+   if (a[1] == 0.0F && a[3] == 0.0F)
   result[2] = 1.0F;
else
-  result[2] = (GLfloat) pow(a[1], a[3]);
+  result[2] = powf(a[1], a[3]);
 }
 else {
result[2] = 0.0F;
@@ -886,12 +886,12 @@ _mesa_execute_program(struct gl_context * ctx,
   int exponent;
   GLfloat mantissa = frexpf(t[0], &exponent);
   q[0] = (GLfloat) (exponent - 1);
-  q[1] = (GLfloat) (2.0 * mantissa); /* map [.5, 1) -> [1, 2) 
*/
+  q[1] = 2.0F * mantissa; /* map [.5, 1) -> [1, 2) */
 
  /* The fast LOG2 macro doesn't meet the precision
   * requirements.
   */
-  q[2] = (float)(log(t[0]) * 1.442695F);
+  q[2] = logf(t[0]) * 1.442695F;
}
 }
 else {
@@ -1051,7 +1051,7 @@ _mesa_execute_program(struct gl_context * ctx,
 fetch_vector1(&inst->SrcReg[0], machine, a);
 fetch_vector1(&inst->SrcReg[1], machine, b);
 result[0] = result[1] = result[2] = result[3]
-   = (GLfloat) pow(a[0], b[0]);
+   = powf(a[0], b[0]);
 store_vector4(inst, machine, result);
  }
  break;
@@ -1095,10 +1095,10 @@ _mesa_execute_program(struct gl_context * ctx,
  {
 GLfloat a[4], result[4];
 fetch_vector1(&inst->SrcReg[0], machine, a);
-result[0] = (GLfloat) cos(a[0]);
-result[1] = (GLfloat) sin(a[0]);
-result[2] = 0.0;/* undefined! */
-result[3] = 0.0;/* undefined! */
+result[0] = cosf(a[0]);
+result[1] = sinf(a[0]);
+result[2] = 0.0F;/* undefined! */
+result[3] = 0.0F;/* undefined! */
 store_vector4(inst, machine, result);
  }
  break;
@@ -1161,7 +1161,7 @@ _mesa_execute_program(struct gl_context * ctx,
 GLfloat a[4], result[4];
 fetch_vector1(&inst->SrcReg[0], machine, a);
 result[0] = result[1] = result[2] = result[3]
-   = (GLfloat) sin(a[0]);
+   = sinf(a[0]);
 store_vector4(inst, machine, result);
  }
  break;
@@ -1360,7 +1360,7 @@ _mesa_execute_program(struct gl_context * ctx,
  * zero, we'd probably be fine except for an assert in
  * IROUND_POS() which gets triggered by the inf values created.
  */
-if (texcoord[3] != 0.0) {
+if (texcoord[3] != 0.0F) {
texcoord[0] /= texcoord[3];
texcoord[1] /= texcoord[3];
texcoord[2] /= texcoord[3];
@@ -1380,7 +1380,7 @@ _mesa_execute_program(struct gl_context * ctx,
 
 fetch_vector4(&inst->SrcReg[0], machine, texcoord);
 if (inst->TexSrcTarget != TEXTURE_CUBE_INDEX &&
-texcoord[3] != 0.0) {
+texcoord[3] != 0.0F) {
texcoord[0] /= texcoord[3];
texcoord[1] /= texcoord[3];
texcoord[2] /= texcoord[3];
-- 
2.3.6

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.fr

[Mesa-dev] [PATCH 04/13] i965: Use float calculations when double is unnecessary.

2015-07-13 Thread Matt Turner

Literals without an f/F suffix are of type double, and implicit
conversion rules specify that the float in (float op double) be
converted to a double before the operation is performed. I believe float
execution was intended (in nearly all cases) or is sufficient (in the
case of gen7_urb.c).

Removes a lot of float <-> double conversion instructions and replaces
many double instructions with float instructions which are cheaper.

   text data  bss  dec  hex  filename
4928659   19516026192  5150011   4e953b  i965_dri.so before
4928315   19515226192  5149659   4e93db  i965_dri.so after
---
 src/mesa/drivers/dri/i965/brw_blorp_blit.cpp   | 22 +++---
 src/mesa/drivers/dri/i965/brw_fs.cpp   |  4 ++--
 src/mesa/drivers/dri/i965/brw_meta_fast_clear.c|  4 ++--
 src/mesa/drivers/dri/i965/brw_meta_stencil_blit.c  |  4 ++--
 src/mesa/drivers/dri/i965/brw_misc_state.c |  4 ++--
 src/mesa/drivers/dri/i965/brw_sampler_state.c  |  4 ++--
 src/mesa/drivers/dri/i965/brw_sf_state.c   |  9 +
 src/mesa/drivers/dri/i965/brw_state_cache.c|  2 +-
 src/mesa/drivers/dri/i965/brw_util.h   |  4 ++--
 src/mesa/drivers/dri/i965/gen6_multisample_state.c |  4 ++--
 src/mesa/drivers/dri/i965/gen6_sf_state.c  |  2 +-
 src/mesa/drivers/dri/i965/gen7_sf_state.c  |  2 +-
 src/mesa/drivers/dri/i965/gen7_urb.c   |  2 +-
 src/mesa/drivers/dri/i965/gen8_sf_state.c  |  2 +-
 14 files changed, 35 insertions(+), 34 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp 
b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
index 1561b59..205c905 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
+++ b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
@@ -1285,8 +1285,8 @@ brw_blorp_blit_program::translate_dst_to_src()
   /* Round the float coordinates down to nearest integer */
   emit_rndd(Xp_f, X_f);
   emit_rndd(Yp_f, Y_f);
-  emit_mul(X_f, Xp_f, brw_imm_f(1 / key->x_scale));
-  emit_mul(Y_f, Yp_f, brw_imm_f(1 / key->y_scale));
+  emit_mul(X_f, Xp_f, brw_imm_f(1.0f / key->x_scale));
+  emit_mul(Y_f, Yp_f, brw_imm_f(1.0f / key->y_scale));
   SWAP_XY_AND_XPYP();
} else if (!key->bilinear_filter) {
   /* Round the float coordinates down to nearest integer by moving to
@@ -1442,7 +1442,7 @@ brw_blorp_blit_program::manual_blend_average(unsigned 
num_samples)
   for (int j = 0; j < 4; ++j) {
  emit_mul(offset(texture_data[0], 2*j),
  offset(vec8(texture_data[0]), 2*j),
- brw_imm_f(1.0/num_samples));
+ brw_imm_f(1.0f / num_samples));
   }
}
 
@@ -1475,9 +1475,9 @@ brw_blorp_blit_program::manual_blend_bilinear(unsigned 
num_samples)
 
   /* Compute pixel coordinates */
   emit_add(vec16(x_sample_coords), Xp_f,
-  brw_imm_f((float)(i & 0x1) * (1.0 / key->x_scale)));
+  brw_imm_f((float)(i & 0x1) * (1.0f / key->x_scale)));
   emit_add(vec16(y_sample_coords), Yp_f,
-  brw_imm_f((float)((i >> 1) & 0x1) * (1.0 / key->y_scale)));
+  brw_imm_f((float)((i >> 1) & 0x1) * (1.0f / key->y_scale)));
   emit_mov(vec16(X), x_sample_coords);
   emit_mov(vec16(Y), y_sample_coords);
 
@@ -1789,7 +1789,7 @@ brw_blorp_coord_transform_params::setup(GLfloat src0, 
GLfloat src1,
* so 0.5 provides the necessary correction.
*/
   multiplier = scale;
-  offset = src0 + (-dst0 + 0.5) * scale;
+  offset = src0 + (-dst0 + 0.5f) * scale;
} else {
   /* When mirroring X we need:
*   src_x - src_x0 = dst_x1 - dst_x - 0.5
@@ -1797,7 +1797,7 @@ brw_blorp_coord_transform_params::setup(GLfloat src0, 
GLfloat src1,
*   src_x = src_x0 + (dst_x1 -dst_x - 0.5) * scale
*/
   multiplier = -scale;
-  offset = src0 + (dst1 - 0.5) * scale;
+  offset = src0 + (dst1 - 0.5f) * scale;
}
 }
 
@@ -1952,8 +1952,8 @@ brw_blorp_blit_params::brw_blorp_blit_params(struct 
brw_context *brw,
/* Scaling factors used for bilinear filtering in multisample scaled
 * blits.
 */
-   wm_prog_key.x_scale = 2.0;
-   wm_prog_key.y_scale = src_mt->num_samples / 2.0;
+   wm_prog_key.x_scale = 2.0f;
+   wm_prog_key.y_scale = src_mt->num_samples / 2.0f;
 
if (filter == GL_LINEAR && src.num_samples <= 1 && dst.num_samples <= 1)
   wm_prog_key.bilinear_filter = true;
@@ -2000,9 +2000,9 @@ brw_blorp_blit_params::brw_blorp_blit_params(struct 
brw_context *brw,
x1 = wm_push_consts.dst_x1 = roundf(dst_x1);
y1 = wm_push_consts.dst_y1 = roundf(dst_y1);
wm_push_consts.rect_grid_x1 = (minify(src_mt->logical_width0, src_level) *
-  wm_prog_key.x_scale - 1.0);
+  wm_prog_key.x_scale - 1.0f);
wm_push_consts.rect_grid_y1 = (minify(src_mt->logical_height0, src_level) *
-  wm_prog_key.y_scale - 1.0);
+

[Mesa-dev] [PATCH 06/13] gallium/auxiliary: Avoid double promotion.

2015-07-13 Thread Matt Turner

---
 src/gallium/auxiliary/util/u_format_rgb9e5.h | 2 +-
 src/gallium/auxiliary/util/u_math.h  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/gallium/auxiliary/util/u_format_rgb9e5.h 
b/src/gallium/auxiliary/util/u_format_rgb9e5.h
index 7a01f7f..d1ace3f 100644
--- a/src/gallium/auxiliary/util/u_format_rgb9e5.h
+++ b/src/gallium/auxiliary/util/u_format_rgb9e5.h
@@ -75,7 +75,7 @@ typedef union {
 
 static INLINE float rgb9e5_ClampRange(float x)
 {
-   if (x > 0.0) {
+   if (x > 0.0f) {
   if (x >= MAX_RGB9E5) {
  return MAX_RGB9E5;
   } else {
diff --git a/src/gallium/auxiliary/util/u_math.h 
b/src/gallium/auxiliary/util/u_math.h
index 3b4040f..9c3cb6a 100644
--- a/src/gallium/auxiliary/util/u_math.h
+++ b/src/gallium/auxiliary/util/u_math.h
@@ -240,7 +240,7 @@ util_iround(float f)
 static INLINE boolean
 util_is_approx(float a, float b, float tol)
 {
-   return fabs(b - a) <= tol;
+   return fabsf(b - a) <= tol;
 }
 
 
-- 
2.3.6

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 03/13] gallium/auxiliary: Use exp2(x) instead of pow(2.0, x).

2015-07-13 Thread Matt Turner

---
 src/gallium/auxiliary/util/u_format_rgb9e5.h | 6 +++---
 src/gallium/auxiliary/util/u_math.c  | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/gallium/auxiliary/util/u_format_rgb9e5.h 
b/src/gallium/auxiliary/util/u_format_rgb9e5.h
index c2a3f6f..7a01f7f 100644
--- a/src/gallium/auxiliary/util/u_format_rgb9e5.h
+++ b/src/gallium/auxiliary/util/u_format_rgb9e5.h
@@ -115,8 +115,8 @@ static INLINE unsigned float3_to_rgb9e5(const float rgb[3])
exp_shared = MAX2(-RGB9E5_EXP_BIAS-1, rgb9e5_FloorLog2(maxrgb)) + 1 + 
RGB9E5_EXP_BIAS;
assert(exp_shared <= RGB9E5_MAX_VALID_BIASED_EXP);
assert(exp_shared >= 0);
-   /* This pow function could be replaced by a table. */
-   denom = pow(2, exp_shared - RGB9E5_EXP_BIAS - RGB9E5_MANTISSA_BITS);
+   /* This exp2 function could be replaced by a table. */
+   denom = exp2(exp_shared - RGB9E5_EXP_BIAS - RGB9E5_MANTISSA_BITS);
 
maxm = (int) floor(maxrgb / denom + 0.5);
if (maxm == MAX_RGB9E5_MANTISSA+1) {
@@ -154,7 +154,7 @@ static INLINE void rgb9e5_to_float3(unsigned rgb, float 
retval[3])
 
v.raw = rgb;
exponent = v.field.biasedexponent - RGB9E5_EXP_BIAS - RGB9E5_MANTISSA_BITS;
-   scale = (float) pow(2, exponent);
+   scale = exp2f(exponent);
 
retval[0] = v.field.r * scale;
retval[1] = v.field.g * scale;
diff --git a/src/gallium/auxiliary/util/u_math.c 
b/src/gallium/auxiliary/util/u_math.c
index ae9e951..c58af91 100644
--- a/src/gallium/auxiliary/util/u_math.c
+++ b/src/gallium/auxiliary/util/u_math.c
@@ -48,7 +48,7 @@ init_pow2_table(void)
 {
int i;
for (i = 0; i < POW2_TABLE_SIZE; i++)
-  pow2_table[i] = (float) pow(2.0, (i - POW2_TABLE_OFFSET) / 
POW2_TABLE_SCALE);
+  pow2_table[i] = exp2f((i - POW2_TABLE_OFFSET) / POW2_TABLE_SCALE);
 }
 
 
-- 
2.3.6

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 02/13] program: Use exp2(x) instead of pow(2.0, x).

2015-07-13 Thread Matt Turner

---
 src/mesa/program/prog_execute.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/mesa/program/prog_execute.c b/src/mesa/program/prog_execute.c
index 46260b5..77274e2 100644
--- a/src/mesa/program/prog_execute.c
+++ b/src/mesa/program/prog_execute.c
@@ -723,7 +723,7 @@ _mesa_execute_program(struct gl_context * ctx,
 * result.z = result.x * APPX(result.y)
 * We do what the ARB extension says.
 */
-   q[2] = (GLfloat) pow(2.0, t[0]);
+   q[2] = exp2f(t[0]);
 }
 q[1] = t[0] - floor_t0;
 q[3] = 1.0F;
@@ -734,7 +734,7 @@ _mesa_execute_program(struct gl_context * ctx,
  {
 GLfloat a[4], result[4], val;
 fetch_vector1(&inst->SrcReg[0], machine, a);
-val = (GLfloat) pow(2.0, a[0]);
+val = exp2f(a[0]);
 /*
 if (IS_INF_OR_NAN(val))
val = 1.0e10;
-- 
2.3.6

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 07/13] util: Avoid double promition.

2015-07-13 Thread Matt Turner

---
 src/util/register_allocate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/util/register_allocate.c b/src/util/register_allocate.c
index 2ad8c3c..95be20f 100644
--- a/src/util/register_allocate.c
+++ b/src/util/register_allocate.c
@@ -648,7 +648,7 @@ ra_get_best_spill_node(struct ra_graph *g)
   float cost = g->nodes[n].spill_cost;
   float benefit;
 
-  if (cost <= 0.0)
+  if (cost <= 0.0f)
 continue;
 
   if (g->nodes[n].in_stack)
-- 
2.3.6

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 08/13] vbo: Avoid double promotion.

2015-07-13 Thread Matt Turner

---
 src/mesa/vbo/vbo_context.c| 6 +++---
 src/mesa/vbo/vbo_exec_array.c | 4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/mesa/vbo/vbo_context.c b/src/mesa/vbo/vbo_context.c
index fd1ffe2..e3eb286 100644
--- a/src/mesa/vbo/vbo_context.c
+++ b/src/mesa/vbo/vbo_context.c
@@ -37,9 +37,9 @@
 
 static GLuint check_size( const GLfloat *attr )
 {
-   if (attr[3] != 1.0) return 4;
-   if (attr[2] != 0.0) return 3;
-   if (attr[1] != 0.0) return 2;
+   if (attr[3] != 1.0F) return 4;
+   if (attr[2] != 0.0F) return 3;
+   if (attr[1] != 0.0F) return 2;
return 1;   
 }
 
diff --git a/src/mesa/vbo/vbo_exec_array.c b/src/mesa/vbo/vbo_exec_array.c
index 72b8206..b73aa97 100644
--- a/src/mesa/vbo/vbo_exec_array.c
+++ b/src/mesa/vbo/vbo_exec_array.c
@@ -255,7 +255,7 @@ check_array_data(struct gl_context *ctx, struct 
gl_client_array *array,
 GLint k;
 for (k = 0; k < array->Size; k++) {
if (IS_INF_OR_NAN(f[k]) ||
-   f[k] >= 1.0e20 || f[k] <= -1.0e10) {
+   f[k] >= 1.0e20F || f[k] <= -1.0e10F) {
   printf("Bad array data:\n");
   printf("  Element[%u].%u = %f\n", j, k, f[k]);
   printf("  Array %u at %p\n", attrib, (void* ) array);
@@ -263,7 +263,7 @@ check_array_data(struct gl_context *ctx, struct 
gl_client_array *array,
 array->Type, array->Size, array->Stride);
   printf("  Address/offset %p in Buffer Object %u\n",
 array->Ptr, array->BufferObj->Name);
-  f[k] = 1.0; /* XXX replace the bad value! */
+  f[k] = 1.0F; /* XXX replace the bad value! */
}
/*assert(!IS_INF_OR_NAN(f[k]));*/
 }
-- 
2.3.6

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 05/13] nir: Avoid double promition.

2015-07-13 Thread Matt Turner

---
 src/glsl/nir/nir_opcodes.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/glsl/nir/nir_opcodes.py b/src/glsl/nir/nir_opcodes.py
index 56e96d9..df5b7e2 100644
--- a/src/glsl/nir/nir_opcodes.py
+++ b/src/glsl/nir/nir_opcodes.py
@@ -474,10 +474,10 @@ else
 """)
 
 opcode("ldexp", 0, tfloat, [0, 0], [tfloat, tint], "", """
-dst = ldexp(src0, src1);
+dst = ldexpf(src0, src1);
 /* flush denormals to zero. */
 if (!isnormal(dst))
-   dst = copysign(0.0f, src0);
+   dst = copysignf(0.0f, src0);
 """)
 
 # Combines the first component of each input to make a 2-component vector.
-- 
2.3.6

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 10/13] swrast: Avoid double promotion.

2015-07-13 Thread Matt Turner

---
 src/mesa/swrast/s_aaline.c| 28 ++--
 src/mesa/swrast/s_aalinetemp.h|  4 ++--
 src/mesa/swrast/s_atifragshader.c |  4 ++--
 src/mesa/swrast/s_copypix.c   |  6 +++---
 src/mesa/swrast/s_drawpix.c   | 12 ++--
 src/mesa/swrast/s_fragprog.c  |  4 ++--
 src/mesa/swrast/s_lines.c |  4 ++--
 src/mesa/swrast/s_points.c| 10 +-
 src/mesa/swrast/s_span.c  | 10 +-
 src/mesa/swrast/s_texcombine.c|  6 +++---
 src/mesa/swrast/s_texfilter.c |  8 
 src/mesa/swrast/s_tritemp.h   |  2 +-
 src/mesa/swrast/s_zoom.c  |  2 +-
 13 files changed, 50 insertions(+), 50 deletions(-)

diff --git a/src/mesa/swrast/s_aaline.c b/src/mesa/swrast/s_aaline.c
index f3258e8..de5b42b 100644
--- a/src/mesa/swrast/s_aaline.c
+++ b/src/mesa/swrast/s_aaline.c
@@ -116,11 +116,11 @@ compute_plane(GLfloat x0, GLfloat y0, GLfloat x1, GLfloat 
y1,
const GLfloat b = pz * py;
const GLfloat c = px * px + py * py;
const GLfloat d = -(a * x0 + b * y0 + c * z0);
-   if (a == 0.0 && b == 0.0 && c == 0.0 && d == 0.0) {
-  plane[0] = 0.0;
-  plane[1] = 0.0;
-  plane[2] = 1.0;
-  plane[3] = 0.0;
+   if (a == 0.0F && b == 0.0F && c == 0.0F && d == 0.0F) {
+  plane[0] = 0.0F;
+  plane[1] = 0.0F;
+  plane[2] = 1.0F;
+  plane[3] = 0.0F;
}
else {
   plane[0] = a;
@@ -135,9 +135,9 @@ compute_plane(GLfloat x0, GLfloat y0, GLfloat x1, GLfloat 
y1,
 static inline void
 constant_plane(GLfloat value, GLfloat plane[4])
 {
-   plane[0] = 0.0;
-   plane[1] = 0.0;
-   plane[2] = -1.0;
+   plane[0] = 0.0F;
+   plane[1] = 0.0F;
+   plane[2] = -1.0F;
plane[3] = value;
 }
 
@@ -160,8 +160,8 @@ static inline GLfloat
 solve_plane_recip(GLfloat x, GLfloat y, const GLfloat plane[4])
 {
const GLfloat denom = plane[3] + plane[0] * x + plane[1] * y;
-   if (denom == 0.0)
-  return 0.0;
+   if (denom == 0.0F)
+  return 0.0F;
else
   return -plane[2] / denom;
 }
@@ -374,7 +374,7 @@ segment(struct gl_context *ctx,
   if (x0 < x1) {
  xLeft = x0 - line->halfWidth;
  xRight = x1 + line->halfWidth;
- if (line->dy >= 0.0) {
+ if (line->dy >= 0.0F) {
 yBot = y0 - 3.0F * line->halfWidth;
 yTop = y0 + line->halfWidth;
  }
@@ -386,7 +386,7 @@ segment(struct gl_context *ctx,
   else {
  xLeft = x1 - line->halfWidth;
  xRight = x0 + line->halfWidth;
- if (line->dy <= 0.0) {
+ if (line->dy <= 0.0F) {
 yBot = y1 - 3.0F * line->halfWidth;
 yTop = y1 + line->halfWidth;
  }
@@ -420,7 +420,7 @@ segment(struct gl_context *ctx,
   if (y0 < y1) {
  yBot = y0 - line->halfWidth;
  yTop = y1 + line->halfWidth;
- if (line->dx >= 0.0) {
+ if (line->dx >= 0.0F) {
 xLeft = x0 - 3.0F * line->halfWidth;
 xRight = x0 + line->halfWidth;
  }
@@ -432,7 +432,7 @@ segment(struct gl_context *ctx,
   else {
  yBot = y1 - line->halfWidth;
  yTop = y0 + line->halfWidth;
- if (line->dx <= 0.0) {
+ if (line->dx <= 0.0F) {
 xLeft = x1 - 3.0F * line->halfWidth;
 xRight = x1 + line->halfWidth;
  }
diff --git a/src/mesa/swrast/s_aalinetemp.h b/src/mesa/swrast/s_aalinetemp.h
index f1d078f..bebb131 100644
--- a/src/mesa/swrast/s_aalinetemp.h
+++ b/src/mesa/swrast/s_aalinetemp.h
@@ -44,7 +44,7 @@ NAME(plot)(struct gl_context *ctx, struct LineInfo *line, int 
ix, int iy)
 
(void) swrast;
 
-   if (coverage == 0.0)
+   if (coverage == 0.0F)
   return;
 
line->span.end++;
@@ -123,7 +123,7 @@ NAME(line)(struct gl_context *ctx, const SWvertex *v0, 
const SWvertex *v1)
  ctx->Const.MinLineWidthAA,
  ctx->Const.MaxLineWidthAA);
 
-   if (line.len == 0.0 || IS_INF_OR_NAN(line.len))
+   if (line.len == 0.0F || IS_INF_OR_NAN(line.len))
   return;
 
INIT_SPAN(line.span, GL_LINE);
diff --git a/src/mesa/swrast/s_atifragshader.c 
b/src/mesa/swrast/s_atifragshader.c
index 9e029db..2974dee 100644
--- a/src/mesa/swrast/s_atifragshader.c
+++ b/src/mesa/swrast/s_atifragshader.c
@@ -436,13 +436,13 @@ execute_shader(struct gl_context *ctx, const struct 
ati_fragment_shader *shader,
 for (i = 0; i < 3; i++) {
dst[optype][i] =
   (src[optype][2][i] >
-   0.5) ? src[optype][0][i] : src[optype][1][i];
+   0.5F) ? src[optype][0][i] : src[optype][1][i];
 }
  }
  else {
 dst[optype][3] =
(src[optype][2][3] >
-0.5) ? src[optype][0][3] : src[optype][1][3];
+0.5F) ? src[optype][0][3] : src[optype][1][3];
  }
  break;
 
diff --g

[Mesa-dev] [PATCH 12/13] mesa/math: Avoid double promotion.

2015-07-13 Thread Matt Turner

---
 src/mesa/math/m_clip_tmp.h | 20 ++---
 src/mesa/math/m_matrix.c   | 70 +++---
 src/mesa/math/m_norm_tmp.h |  2 +-
 3 files changed, 46 insertions(+), 46 deletions(-)

diff --git a/src/mesa/math/m_clip_tmp.h b/src/mesa/math/m_clip_tmp.h
index e289be7..60c0004 100644
--- a/src/mesa/math/m_clip_tmp.h
+++ b/src/mesa/math/m_clip_tmp.h
@@ -194,13 +194,13 @@ static GLvector4f * TAG(cliptest_points3)( GLvector4f 
*clip_vec,
STRIDE_LOOP {
   const GLfloat cx = from[0], cy = from[1], cz = from[2];
   GLubyte mask = 0;
-  if (cx >  1.0)   mask |= CLIP_RIGHT_BIT;
-  else if (cx < -1.0)  mask |= CLIP_LEFT_BIT;
-  if (cy >  1.0)   mask |= CLIP_TOP_BIT;
-  else if (cy < -1.0)  mask |= CLIP_BOTTOM_BIT;
+  if (cx >  1.0F)   mask |= CLIP_RIGHT_BIT;
+  else if (cx < -1.0F)  mask |= CLIP_LEFT_BIT;
+  if (cy >  1.0F)   mask |= CLIP_TOP_BIT;
+  else if (cy < -1.0F)  mask |= CLIP_BOTTOM_BIT;
   if (viewport_z_clip) {
-if (cz >  1.0)   mask |= CLIP_FAR_BIT;
-else if (cz < -1.0)  mask |= CLIP_NEAR_BIT;
+if (cz >  1.0F)   mask |= CLIP_FAR_BIT;
+else if (cz < -1.0F)  mask |= CLIP_NEAR_BIT;
   }
   clipMask[i] = mask;
   tmpOrMask |= mask;
@@ -230,10 +230,10 @@ static GLvector4f * TAG(cliptest_points2)( GLvector4f 
*clip_vec,
STRIDE_LOOP {
   const GLfloat cx = from[0], cy = from[1];
   GLubyte mask = 0;
-  if (cx >  1.0)   mask |= CLIP_RIGHT_BIT;
-  else if (cx < -1.0)  mask |= CLIP_LEFT_BIT;
-  if (cy >  1.0)   mask |= CLIP_TOP_BIT;
-  else if (cy < -1.0)  mask |= CLIP_BOTTOM_BIT;
+  if (cx >  1.0F)   mask |= CLIP_RIGHT_BIT;
+  else if (cx < -1.0F)  mask |= CLIP_LEFT_BIT;
+  if (cy >  1.0F)   mask |= CLIP_TOP_BIT;
+  else if (cy < -1.0F)  mask |= CLIP_BOTTOM_BIT;
   clipMask[i] = mask;
   tmpOrMask |= mask;
   tmpAndMask &= mask;
diff --git a/src/mesa/math/m_matrix.c b/src/mesa/math/m_matrix.c
index 6a42c6c..6522200 100644
--- a/src/mesa/math/m_matrix.c
+++ b/src/mesa/math/m_matrix.c
@@ -380,7 +380,7 @@ static GLboolean invert_matrix_general( GLmatrix *mat )
if (fabsf(r3[0])>fabsf(r2[0])) SWAP_ROWS(r3, r2);
if (fabsf(r2[0])>fabsf(r1[0])) SWAP_ROWS(r2, r1);
if (fabsf(r1[0])>fabsf(r0[0])) SWAP_ROWS(r1, r0);
-   if (0.0 == r0[0])  return GL_FALSE;
+   if (0.0F == r0[0])  return GL_FALSE;
 
/* eliminate first variable */
m1 = r1[0]/r0[0]; m2 = r2[0]/r0[0]; m3 = r3[0]/r0[0];
@@ -388,31 +388,31 @@ static GLboolean invert_matrix_general( GLmatrix *mat )
s = r0[2]; r1[2] -= m1 * s; r2[2] -= m2 * s; r3[2] -= m3 * s;
s = r0[3]; r1[3] -= m1 * s; r2[3] -= m2 * s; r3[3] -= m3 * s;
s = r0[4];
-   if (s != 0.0) { r1[4] -= m1 * s; r2[4] -= m2 * s; r3[4] -= m3 * s; }
+   if (s != 0.0F) { r1[4] -= m1 * s; r2[4] -= m2 * s; r3[4] -= m3 * s; }
s = r0[5];
-   if (s != 0.0) { r1[5] -= m1 * s; r2[5] -= m2 * s; r3[5] -= m3 * s; }
+   if (s != 0.0F) { r1[5] -= m1 * s; r2[5] -= m2 * s; r3[5] -= m3 * s; }
s = r0[6];
-   if (s != 0.0) { r1[6] -= m1 * s; r2[6] -= m2 * s; r3[6] -= m3 * s; }
+   if (s != 0.0F) { r1[6] -= m1 * s; r2[6] -= m2 * s; r3[6] -= m3 * s; }
s = r0[7];
-   if (s != 0.0) { r1[7] -= m1 * s; r2[7] -= m2 * s; r3[7] -= m3 * s; }
+   if (s != 0.0F) { r1[7] -= m1 * s; r2[7] -= m2 * s; r3[7] -= m3 * s; }
 
/* choose pivot - or die */
if (fabsf(r3[1])>fabsf(r2[1])) SWAP_ROWS(r3, r2);
if (fabsf(r2[1])>fabsf(r1[1])) SWAP_ROWS(r2, r1);
-   if (0.0 == r1[1])  return GL_FALSE;
+   if (0.0F == r1[1])  return GL_FALSE;
 
/* eliminate second variable */
m2 = r2[1]/r1[1]; m3 = r3[1]/r1[1];
r2[2] -= m2 * r1[2]; r3[2] -= m3 * r1[2];
r2[3] -= m2 * r1[3]; r3[3] -= m3 * r1[3];
-   s = r1[4]; if (0.0 != s) { r2[4] -= m2 * s; r3[4] -= m3 * s; }
-   s = r1[5]; if (0.0 != s) { r2[5] -= m2 * s; r3[5] -= m3 * s; }
-   s = r1[6]; if (0.0 != s) { r2[6] -= m2 * s; r3[6] -= m3 * s; }
-   s = r1[7]; if (0.0 != s) { r2[7] -= m2 * s; r3[7] -= m3 * s; }
+   s = r1[4]; if (0.0F != s) { r2[4] -= m2 * s; r3[4] -= m3 * s; }
+   s = r1[5]; if (0.0F != s) { r2[5] -= m2 * s; r3[5] -= m3 * s; }
+   s = r1[6]; if (0.0F != s) { r2[6] -= m2 * s; r3[6] -= m3 * s; }
+   s = r1[7]; if (0.0F != s) { r2[7] -= m2 * s; r3[7] -= m3 * s; }
 
/* choose pivot - or die */
if (fabsf(r3[2])>fabsf(r2[2])) SWAP_ROWS(r3, r2);
-   if (0.0 == r2[2])  return GL_FALSE;
+   if (0.0F == r2[2])  return GL_FALSE;
 
/* eliminate third variable */
m3 = r3[2]/r2[2];
@@ -421,7 +421,7 @@ static GLboolean invert_matrix_general( GLmatrix *mat )
r3[7] -= m3 * r2[7];
 
/* last check */
-   if (0.0 == r3[3]) return GL_FALSE;
+   if (0.0F == r3[3]) return GL_FALSE;
 
s = 1.0F/r3[3]; /* now back substitute row 3 */
r3[4] *= s; r3[5] *= s; r3[6] *= s; r3[7] *= s;
@@ -490,26 +490,26 @@ static GLboolean invert_matrix_3d_general( GLmatrix *mat )
 */
pos = neg = 0.0;

1 2 3 4 5 6 7 8 9 10 >

1 - 100 of 6487 matches

Mail list logo