Re: [Mesa-dev] threading in OSMesa and gallium swr driver

2018-05-18 Thread Rowley, Timothy O

On May 16, 2018, at 9:25 AM, Alexandre wrote:

Thank you for your answer.
I understand that I can control the number of threads and prevent them from being 
bound to actual hardware threads.
Preventing oversubscription of the hardware threads is challenging when using 
OpenMP/TBB/OpenSWR in hybrid environments.

I am wondering whether having N SWR contexts (where N corresponds to the number of 
hardware threads), each single-threaded,
is *good enough* (performance not much worse than a single SWR context 
that renders the tasks serially).
Do you have a take on this?
That might do the trick.

A single threaded swr context would not give high performance; swr was 
architected to parallelize the pipeline stages and depends on multiple 
threads/cpus to deliver high performance.  Notably compared to llvmpipe we can 
parallelize the geometry frontend and thus achieve much higher throughput.


Similar oversubscription problems occur with all applications that use multiple 
threading technologies (Cilk, TBB, OpenMP, ...) and there are few solutions 
to prevent it besides rewriting the code to use only one technology.


Yes, getting different threading libraries to agree can be tricky.  Does your 
application overlap heavy compute with graphics rendering?  If not, the 
oversubscription point might be moot.  One bit of advice we give to TBB library 
users is to initialize the TBB library before creating an OpenGL/SWR context.  
This allows TBB to size its thread pool to the entire machine, and then SWR 
will come in and create all its threads.  The other way round, SWR binds 
threads to cores, which TBB understands as unavailable resources resulting in a 
thread pool size of one.

If your concern is multiple SWR contexts running simultaneously and 
oversubscribing, it’s true that the swr thread pool creation is per-context and, 
as Bruce says, the only way to prevent that currently is setting the 
KNOB_MAX_WORKER_THREADS environment variable to limit the number of worker 
threads.  This number should be greater than 1 for good performance, though.

-Tim

An alternative solution would be to have a callback mechanism in OpenSWR to 
launch a task on the application.

Cheers

Alex


On 16 May 2018, at 14:34, Cherniak, Bruce wrote:


On May 14, 2018, at 8:59 AM, Alexandre wrote:

Hello,

Sorry for the inconvenience if this message is not appropriate for this mailing 
list.

The following is a question for developers of the swr driver of gallium.

I am the main developer of a motion graphics application.
Our application internally has a dependency graph where each node may run 
concurrently.
We use OpenGL extensively in the implementation of the nodes (for example with 
Shadertoy).

Our application has 2 main requirements:
- A GPU backend, mainly for user interaction and fast results
- A CPU backend for batch rendering

Internally we use OSMesa for CPU backend so that our code is mostly identical 
for both GPU and CPU paths.
However, when it comes to CPU rendering, our application is heavily multi-threaded: each 
processing node can potentially run in parallel with others in the dependency graph.
We use Intel TBB to schedule the CPU threads.

For each actual hardware thread (not task) we allocate a new OSMesa context so 
that we can freely multi-thread the rendering of operators. It works fine with 
llvmpipe and also SWR so far (with a patch to fix some static variables inside 
state_trackers/osmesa.c).

However, with SWR using its own thread pool, I’m afraid of over-threading, 
introducing a bottleneck in thread scheduling,
e.g. on a 32-core processor, we already have, let's say, 24 threads, each busy with a TBB 
task and each with 1 OSMesa context.
I looked at the code: each of those concurrent OSMesa contexts will create an SWR 
context, and each will try to initialise its own thread pool in CreateThreadPool 
in swr/rasterizer/core/api.cpp.
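
To put rough numbers on that concern, here is a minimal C++ sketch of the arithmetic. The pool size per context is an assumption (the thread only states that each context creates its own pool), and both function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Total SWR worker threads if every context builds its own pool.
// workersPerContext is an assumed value, not something SWR reports here.
inline std::uint32_t TotalWorkerThreads(std::uint32_t numContexts,
                                        std::uint32_t workersPerContext)
{
    return numContexts * workersPerContext;
}

// Software threads per hardware thread, rounded up. appThreads counts the
// application's own (e.g. TBB) threads that run alongside the SWR workers.
inline std::uint32_t OversubscriptionFactor(std::uint32_t appThreads,
                                            std::uint32_t workerThreads,
                                            std::uint32_t hardwareThreads)
{
    assert(hardwareThreads > 0);
    return (appThreads + workerThreads + hardwareThreads - 1) / hardwareThreads;
}
```

With 24 contexts and an assumed 32 workers per context on a 32-thread machine, this gives 768 worker threads and roughly 25 software threads per hardware thread.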

Is there a way to have a single “static” thread-pool shared across all contexts?

There is not currently a way to create a single thread-pool shared across all 
contexts.  Each context creates unique worker threads.

However, OpenSWR provides an environment variable, KNOB_MAX_WORKER_THREADS, 
that overrides the default thread allocation.
Setting this will limit the number of threads created by an OpenSWR context 
*and* prevent the threads from being bound to physical cores.

Please give this a try.  By adjusting the value, you may find the optimal 
setting for your situation.
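
One way to pick a starting value is to divide the hardware threads among the contexts that render at the same time. The sketch below assumes a POSIX system (it uses setenv). The helper names are hypothetical, and the floor of two workers follows the advice elsewhere in this thread that the number should be greater than 1. When the driver actually reads the variable is also an assumption, so exporting it before the process starts is the safest option.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <string>

// Split the hardware threads across concurrently rendering contexts,
// never going below two workers per context.
inline unsigned WorkersPerContext(unsigned hardwareThreads, unsigned numContexts)
{
    assert(numContexts > 0);
    return std::max(2u, hardwareThreads / numContexts);
}

// Export the cap for this process (POSIX setenv). Call it before the
// first OSMesa/SWR context is created. Returns true on success.
inline bool SetMaxWorkerThreads(unsigned workers)
{
    const std::string value = std::to_string(workers);
    return setenv("KNOB_MAX_WORKER_THREADS", value.c_str(), /*overwrite=*/1) == 0;
}
```

For example, 32 hardware threads shared by 4 contexts yields 8 workers each, while 24 contexts hit the floor of 2.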

Cheers,
Bruce

Thank you

Alexandre










___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Mesa-stable] [PATCH] swr: Fix KNOB_MAX_WORKER_THREADS thread creation override.

2017-12-13 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley

On Dec 12, 2017, at 5:37 PM, Bruce Cherniak wrote:

Environment variable KNOB_MAX_WORKER_THREADS allows the user to override
default thread creation and thread binding.  Previous commit to adjust
linux cpu topology caused setting this KNOB to bind all threads to a single
core.

This patch restores correct functionality of override.

Cc: 
>
---
src/gallium/drivers/swr/rasterizer/core/threads.cpp | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/core/threads.cpp 
b/src/gallium/drivers/swr/rasterizer/core/threads.cpp
index f4ddc21226..6242cb3fc7 100644
--- a/src/gallium/drivers/swr/rasterizer/core/threads.cpp
+++ b/src/gallium/drivers/swr/rasterizer/core/threads.cpp
@@ -213,8 +213,7 @@ void CalculateProcessorTopology(CPUNumaNodes& out_nodes, 
uint32_t& out_numThread
{
for (auto &core : node.cores)
{
-out_numThreadsPerProcGroup = 
std::max((size_t)out_numThreadsPerProcGroup,
-  core.threadIds.size());
+out_numThreadsPerProcGroup += core.threadIds.size();
}
}

--
2.11.0
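
The one-line change above swaps a maximum for a sum. A minimal sketch of the difference follows, using a simplified stand-in for the topology structures: the Core type and both function names are illustrative, not SWR's own.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified stand-in for one core: the hardware thread ids it exposes.
struct Core
{
    std::vector<unsigned> threadIds;
};

// Pre-patch computation: the widest single core.
inline std::size_t MaxThreadsPerCore(const std::vector<Core>& cores)
{
    std::size_t result = 0;
    for (const Core& core : cores)
        result = std::max(result, core.threadIds.size());
    return result;
}

// Post-patch computation: every hardware thread on every core.
inline std::size_t TotalThreads(const std::vector<Core>& cores)
{
    std::size_t result = 0;
    for (const Core& core : cores)
        result += core.threadIds.size();
    return result;
}
```

For four cores with two hardware threads each, the first function returns 2 and the second returns 8.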

___
mesa-stable mailing list
mesa-sta...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-stable



Re: [Mesa-dev] [PATCH] gallivm: allow arch rounding with avx512

2017-11-01 Thread Rowley, Timothy O
I agree that we’re probably dropping into fallback paths in a variety of 
locations as there are a number of width==256 tests in the gallivm code.  Right 
now I’m working through piglit regressions versus avx2 in our driver, and the 
rounding tests weren't passing.

Thanks.

> On Nov 1, 2017, at 2:27 PM, Roland Scheidegger  wrote:
> 
> Looks good to me.
> Albeit I think there's quite a few more places which probably should
> handle avx512...
> 
> Reviewed-by: Roland Scheidegger 
> 
> Am 01.11.2017 um 20:17 schrieb Tim Rowley:
>> Fixes piglit vs-roundeven-{float,vec[234]} with simd16 VS.
>> ---
>> src/gallium/auxiliary/gallivm/lp_bld_arit.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>> 
>> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_arit.c 
>> b/src/gallium/auxiliary/gallivm/lp_bld_arit.c
>> index cf1958b3b6..a1edd349f1 100644
>> --- a/src/gallium/auxiliary/gallivm/lp_bld_arit.c
>> +++ b/src/gallium/auxiliary/gallivm/lp_bld_arit.c
>> @@ -1953,7 +1953,8 @@ arch_rounding_available(const struct lp_type type)
>> {
>>if ((util_cpu_caps.has_sse4_1 &&
>>(type.length == 1 || type.width*type.length == 128)) ||
>> -   (util_cpu_caps.has_avx && type.width*type.length == 256))
>> +   (util_cpu_caps.has_avx && type.width*type.length == 256) ||
>> +   (util_cpu_caps.has_avx512f && type.width*type.length == 512))
>>   return TRUE;
>>else if ((util_cpu_caps.has_altivec &&
>> (type.width == 32 && type.length == 4)))
>> 
> 
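
The extended check in the quoted patch can be sketched in isolation. The structures below are trimmed-down, hypothetical versions of util_cpu_caps and lp_type, and the AltiVec branch is left out. This illustrates the logic only and is not the gallivm code.

```cpp
#include <cassert>

// Hypothetical, reduced capability and vector-type descriptions.
struct CpuCaps
{
    bool has_sse4_1;
    bool has_avx;
    bool has_avx512f;
};

struct VecType
{
    unsigned width;   // bits per element
    unsigned length;  // number of elements
};

// True when a native rounding instruction covers the whole vector.
inline bool ArchRoundingAvailable(const CpuCaps& caps, const VecType& type)
{
    const unsigned bits = type.width * type.length;
    return (caps.has_sse4_1 && (type.length == 1 || bits == 128)) ||
           (caps.has_avx && bits == 256) ||
           (caps.has_avx512f && bits == 512);
}
```

A 16-wide vector of 32-bit floats (512 bits) passes only when has_avx512f is set, which matches the simd16 case the patch targets.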



Re: [Mesa-dev] [PATCH 1/2] configure.ac: add _DEBUG to strip_unwanted_llvm_flags

2017-10-16 Thread Rowley, Timothy O
Ping.  This is useful for those building their own llvm.

> On Oct 3, 2017, at 3:23 PM, Rowley, Timothy O <timothy.o.row...@intel.com> 
> wrote:
> 
> Assert-enabled builds of llvm add _DEBUG to the LLVM_CFLAGS.
> 
> This was causing a crash with swr running the ParaView
> waveletcontour.py test, due to a bug in our _DEBUG code.
> ---
> configure.ac | 1 +
> 1 file changed, 1 insertion(+)
> 
> diff --git a/configure.ac b/configure.ac
> index 903a3979d4..b2768f46c0 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -987,6 +987,7 @@ strip_unwanted_llvm_flags() {
> echo " `$1` " | sed -E \
> -e 's/[[[:space:]]]+-m[[^[:space:]]]*//g' \
> -e 's/[[[:space:]]]+-DNDEBUG[[[:space:]]]/ /g' \
> +-e 's/[[[:space:]]]+-D_DEBUG[[[:space:]]]/ /g' \
> -e 's/[[[:space:]]]+-D_GNU_SOURCE[[[:space:]]]/ /g' \
> -e 's/[[[:space:]]]+-pedantic[[[:space:]]]/ /g' \
> -e 's/[[[:space:]]]+-W[[^[:space:]]]*//g' \
> -- 
> 2.11.0
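
For readers less familiar with sed, the effect of the added expression can be reproduced with std::regex. This is a sketch of the same substitution, not part of the build system, and the function name is made up. In configure.ac the doubled brackets are m4 quoting, so the pattern sed actually sees is `[[:space:]]+-D_DEBUG[[:space:]]`.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Remove a standalone -D_DEBUG token from a space-padded flag string.
// The trailing whitespace in the pattern keeps longer names such as
// -D_DEBUG_EXTRA intact.
inline std::string StripDebugDefine(const std::string& flags)
{
    static const std::regex pattern(R"(\s+-D_DEBUG\s)");
    return std::regex_replace(flags, pattern, " ");
}
```

The input is padded with a space on both sides, as the shell function does with `echo " ... "`, so a flag at either end of the list still matches.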
> 



Re: [Mesa-dev] [PATCH 2/9] swr/rast: New GS state/context API

2017-09-25 Thread Rowley, Timothy O
Ok, made the following changes - want a full v2 commit, or ok to do this on 
push?

--- a/src/gallium/drivers/swr/swr_shader.cpp
+++ b/src/gallium/drivers/swr/swr_shader.cpp
@@ -533,12 +533,12 @@ BuilderSWR::CompileGS(struct swr_context *ctx, 
swr_jit_gs_key &key)
pGS->inputVertStride = pGS->numInputAttribs + pGS->vertexAttribOffset;
pGS->outputVertexSize = SWR_VTX_NUM_SLOTS;
pGS->controlDataSize = 8; // GS ouputs max of 8 32B units
-   pGS->controlDataOffset = 32;
-   pGS->outputVertexOffset = pGS->controlDataOffset + pGS->controlDataSize * 
32;
+   pGS->controlDataOffset = VERTEX_COUNT_SIZE;
+   pGS->outputVertexOffset = pGS->controlDataOffset + CONTROL_HEADER_SIZE;

pGS->allocationSize =
-  32 + // vertex count
-  (8 * 32) + // control header
+  VERTEX_COUNT_SIZE + // vertex count
+  CONTROL_HEADER_SIZE + // control header
   (SWR_VTX_NUM_SLOTS * 16) * // sizeof vertex
   pGS->maxNumVerts; // num verts
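
The layout arithmetic behind the named constants can be checked with a small sketch. The two constant values are inferred from the literals the diff removes (32 and 8 * 32). numSlots is a stand-in for SWR_VTX_NUM_SLOTS, whose value is not shown in this thread, and the function name is illustrative.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t VERTEX_COUNT_SIZE = 32;        // replaces the literal 32
constexpr std::uint32_t CONTROL_HEADER_SIZE = 8 * 32;  // replaces the literal 8 * 32

// Vertex count + control header + one 16-unit entry per slot per vertex.
constexpr std::uint32_t GsAllocationSize(std::uint32_t numSlots,
                                         std::uint32_t maxNumVerts)
{
    return VERTEX_COUNT_SIZE + CONTROL_HEADER_SIZE + (numSlots * 16) * maxNumVerts;
}

// The output vertex offset is unchanged by the rename: 32 + 8 * 32.
static_assert(VERTEX_COUNT_SIZE + CONTROL_HEADER_SIZE == 288, "layout unchanged");
```

With an assumed 32 slots and 4 vertices, the allocation comes to 288 + 2048 = 2336.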


On Sep 23, 2017, at 9:51 PM, Cherniak, Bruce wrote:


On Sep 21, 2017, at 7:46 PM, Tim Rowley wrote:

One piglit regression, which was a false pass:
spec@glsl-1.50@execution@geometry@dynamic_input_array_index
---
.../drivers/swr/rasterizer/core/frontend.cpp   | 227 -
src/gallium/drivers/swr/rasterizer/core/state.h|  55 +++--
src/gallium/drivers/swr/swr_shader.cpp | 183 -
3 files changed, 253 insertions(+), 212 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/core/frontend.cpp 
b/src/gallium/drivers/swr/rasterizer/core/frontend.cpp
index f882869..26e76a9 100644
--- a/src/gallium/drivers/swr/rasterizer/core/frontend.cpp
+++ b/src/gallium/drivers/swr/rasterizer/core/frontend.cpp
@@ -710,45 +710,67 @@ void ProcessStreamIdBuffer(uint32_t stream, uint8_t* 
pStreamIdBase, uint32_t num

THREAD SWR_GS_CONTEXT tlsGsContext;

-template
-struct GsBufferInfo
+// Buffers that are allocated if GS is enabled
+struct GsBuffers
{
-GsBufferInfo(const SWR_GS_STATE &gsState)
-{
-const uint32_t vertexCount = gsState.maxNumVerts;
-const uint32_t vertexStride = sizeof(SIMDVERTEX);
-const uint32_t numSimdBatches = (vertexCount + SIMD_WIDTH - 1) / 
SIMD_WIDTH;
+uint8_t* pGsIn;
+uint8_t* pGsOut[KNOB_SIMD_WIDTH];
+uint8_t* pGsTransposed;
+void* pStreamCutBuffer;
+};

-vertexPrimitiveStride = vertexStride * numSimdBatches;
-vertexInstanceStride = vertexPrimitiveStride * SIMD_WIDTH;
+//
+/// @brief Transposes GS output from SOA to AOS to feed the primitive assembler
+/// @param pDst - Destination buffer in AOS form for the current SIMD width, 
fed into the primitive assembler
+/// @param pSrc - Buffer of vertices in SOA form written by the geometry shader
+/// @param numVerts - Number of vertices outputted by the GS
+/// @param numAttribs - Number of attributes per vertex
+template
+void TransposeSOAtoAOS(uint8_t* pDst, uint8_t* pSrc, uint32_t numVerts, 
uint32_t numAttribs)
+{
+uint32_t srcVertexStride = numAttribs * sizeof(float) * 4;
+uint32_t dstVertexStride = numAttribs * sizeof(typename SIMD_T::Float) * 4;

-if (gsState.isSingleStream)
-{
-cutPrimitiveStride = (vertexCount + 7) / 8;
-cutInstanceStride = cutPrimitiveStride * SIMD_WIDTH;
+OSALIGNSIMD16(uint32_t) gatherOffsets[SimdWidth];

-streamCutPrimitiveStride = 0;
-streamCutInstanceStride = 0;
-}
-else
-{
-cutPrimitiveStride = AlignUp(vertexCount * 2 / 8, 4);
-cutInstanceStride = cutPrimitiveStride * SIMD_WIDTH;
-
-streamCutPrimitiveStride = (vertexCount + 7) / 8;
-streamCutInstanceStride = streamCutPrimitiveStride * SIMD_WIDTH;
-}
+for (uint32_t i = 0; i < SimdWidth; ++i)
+{
+gatherOffsets[i] = srcVertexStride * i;
   }
+auto vGatherOffsets = SIMD_T::load_si((typename 
SIMD_T::Integer*)&gatherOffsets[0]);

-uint32_t vertexPrimitiveStride;
-uint32_t vertexInstanceStride;
+uint32_t numSimd = AlignUp(numVerts, SimdWidth) / SimdWidth;
+uint32_t remainingVerts = numVerts;

-uint32_t cutPrimitiveStride;
-uint32_t cutInstanceStride;
+for (uint32_t s = 0; s < numSimd; ++s)
+{
+uint8_t* pSrcBase = pSrc + s * srcVertexStride * SimdWidth;
+uint8_t* pDstBase = pDst + s * dstVertexStride;

-uint32_t streamCutPrimitiveStride;
-uint32_t streamCutInstanceStride;
-};
+// Compute mask to prevent src overflow
+uint32_t mask = std::min(remainingVerts, SimdWidth);
+mask = GenMask(mask);
+auto vMask = SIMD_T::vmask_ps(mask);
+auto viMask = SIMD_T::castps_si(vMask);
+
+for (uint32_t a = 0; a < numAttribs; ++a)
+{
+auto attribGatherX = SIMD_T::template 

Re: [Mesa-dev] [PATCH] swr/rast: do not crash on NULL strings returned by getenv

2017-09-19 Thread Rowley, Timothy O
I have a bit of a preference for Eric’s version.

-Tim

On Sep 18, 2017, at 7:10 AM, Emil Velikov wrote:

On 18 September 2017 at 11:48, Eric Engestrom wrote:
On Monday, 2017-09-18 11:29:21 +0100, Emil Velikov wrote:
From: Bernhard Rosenkraenzer

The current convenience function GetEnv feeds the result of getenv
directly into std::string(). That is a bad idea, since the variable
may be unset, in which case we pass NULL to the C++ constructor.

That is not allowed and leads to a crash.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101832
Fixes: a25093de718 ("swr/rast: Implement JIT shader caching to disk")
Cc: Tim Rowley
Cc: Laurent Carlier
Cc: Bernhard Rosenkraenzer
[Emil Velikov: make an actual commit from the misc diff]
Signed-off-by: Emil Velikov

Laurent just sent this to the ML an hour before you, but this commit
message is much better.

Guilty, sorry about that gents.

Reviewed-by: Eric Engestrom

---
src/gallium/drivers/swr/rasterizer/core/utils.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/swr/rasterizer/core/utils.h 
b/src/gallium/drivers/swr/rasterizer/core/utils.h
index b096d2120cb..3c849e82d3b 100644
--- a/src/gallium/drivers/swr/rasterizer/core/utils.h
+++ b/src/gallium/drivers/swr/rasterizer/core/utils.h
@@ -365,7 +365,8 @@ static INLINE std::string GetEnv(const std::string& 
variableName)
output.resize(valueSize - 1); // valueSize includes null, output.resize() 
does not
GetEnvironmentVariableA(variableName.c_str(), &output[0], valueSize);
#else
-output = getenv(variableName.c_str());
+char *o = getenv(variableName.c_str());
+output = o ? std::string(o) : std::string();

Like I mentioned [2] on his patch, I think this could be written
better, eg:

   char *env = getenv(variableName.c_str());
   output = env ? env : "";
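
Either variant boils down to checking the pointer before it reaches std::string. A self-contained sketch of that pattern is shown below. SafeGetEnv is an illustrative name, not the function in utils.h, and the Windows branch is omitted.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Returns the variable's value, or an empty string when it is unset.
// std::getenv returns a null pointer for unset variables, and building
// a std::string from a null pointer is undefined behavior.
inline std::string SafeGetEnv(const std::string& variableName)
{
    const char* env = std::getenv(variableName.c_str());
    return env ? std::string(env) : std::string();
}
```

Note that a caller cannot distinguish an unset variable from one set to the empty string. That matches what both proposed patches do.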

I'll let Tim and other SWR devs weigh in - I'm fine either way.

Thanks
Emil


Re: [Mesa-dev] [PATCH] swr: Add arch flags to support Cray and PGI compilers

2017-08-01 Thread Rowley, Timothy O

On Jul 31, 2017, at 3:51 PM, Chuck Atkins wrote:

Hi Tim,

If the Cray flags are for wrapper scripts, why do we need specific flags for 
that instead of using the underlying compiler flags?

Short answer: It's the "Cray" way of doing things.

Long answer: The target-cpu flag sometimes just controls the -march flags (or 
equivalent), but it can also add other low-level flags.  By using the target-cpu flag 
with the Cray compiler wrappers, you ensure that you're using whatever flags 
are appropriate for a given architecture and the underlying compiler, even if 
you don't have that compiler knowledge encoded anywhere in your 
configure.  For instance, when using another compiler backend that ./configure 
isn't explicitly checking for (PathScale, the actual Cray compiler, etc.), the 
build will continue to work because -target-cpu gets translated by the wrapper 
to whatever is appropriate.  You'll also get a default set of flags loaded 
anyway, based on your module environment.  Specifying target-cpu replaces those 
default flags, whereas adding -xCORE-AVX512 would just append to them, maybe 
overriding the default flags, maybe not, depending on how the module 
environment is set up.  It's one of the many quirks and oddities of the Cray 
Programming Environment.

Thanks for the explanation.

Reviewed-by: Tim Rowley


I’m guessing you intend this for the 17.2 branch as well?

Nope.  I've no pressing customer need for it so keeping it in master but out of 
stable is fine with me.


--
Chuck Atkins
Staff R&D Engineer, Scientific Computing
Kitware, Inc.



Re: [Mesa-dev] [PATCH 12/13] swr/rast: split gen_knobs template into .cpp and .h files

2017-08-01 Thread Rowley, Timothy O

On Jul 31, 2017, at 3:18 PM, Emil Velikov wrote:

Hi Tim,

What's the goal behind the split? Please add a couple of words in the
commit message.

Will do.


On 31 July 2017 at 20:40, Tim Rowley wrote:
---
src/gallium/drivers/swr/Makefile.am|   3 +-
src/gallium/drivers/swr/SConscript |   4 +-
.../drivers/swr/rasterizer/codegen/gen_knobs.py|  14 +-
.../swr/rasterizer/codegen/templates/gen_knobs.cpp | 112 +---
.../swr/rasterizer/codegen/templates/gen_knobs.h   | 147 +
.../drivers/swr/rasterizer/core/knobs_init.h   |  12 +-
6 files changed, 166 insertions(+), 126 deletions(-)
create mode 100644 
src/gallium/drivers/swr/rasterizer/codegen/templates/gen_knobs.h

diff --git a/src/gallium/drivers/swr/Makefile.am 
b/src/gallium/drivers/swr/Makefile.am
index 73fe904..b20f128 100644
--- a/src/gallium/drivers/swr/Makefile.am
+++ b/src/gallium/drivers/swr/Makefile.am
@@ -115,7 +115,7 @@ rasterizer/codegen/gen_knobs.cpp: 
rasterizer/codegen/gen_knobs.py rasterizer/cod
   --output rasterizer/codegen/gen_knobs.cpp \
   --gen_cpp

-rasterizer/codegen/gen_knobs.h: rasterizer/codegen/gen_knobs.py 
rasterizer/codegen/knob_defs.py rasterizer/codegen/templates/gen_knobs.cpp 
rasterizer/codegen/gen_common.py
+rasterizer/codegen/gen_knobs.h: rasterizer/codegen/gen_knobs.py 
rasterizer/codegen/knob_defs.py rasterizer/codegen/templates/gen_knobs.h 
rasterizer/codegen/gen_common.py
   $(MKDIR_GEN)
   $(PYTHON_GEN) \
   $(srcdir)/rasterizer/codegen/gen_knobs.py \
@@ -347,5 +347,6 @@ EXTRA_DIST = \
   rasterizer/codegen/templates/gen_builder.hpp \
   rasterizer/codegen/templates/gen_header_init.hpp \
   rasterizer/codegen/templates/gen_knobs.cpp \
+   rasterizer/codegen/templates/gen_knobs.h \
   rasterizer/codegen/templates/gen_llvm.hpp \
   rasterizer/codegen/templates/gen_rasterizer.cpp
diff --git a/src/gallium/drivers/swr/SConscript 
b/src/gallium/drivers/swr/SConscript
index a32807d..b394cbc 100644
--- a/src/gallium/drivers/swr/SConscript
+++ b/src/gallium/drivers/swr/SConscript
@@ -53,8 +53,8 @@ env.CodeGenerate(
source = '',
command = python_cmd + ' $SCRIPT --output $TARGET --gen_h'
)
-Depends('rasterizer/codegen/gen_knobs.cpp',
Seems like this should have been gen_knobs.h in the first place - oops :-)

Yep, noticed that bug when updating the rule - I’ve pulled the fix into a 
separate commit.


-swrroot + 'rasterizer/codegen/templates/gen_knobs.cpp')
+Depends('rasterizer/codegen/gen_knobs.h',
+swrroot + 'rasterizer/codegen/templates/gen_knobs.h')


The build bits are
Reviewed-by: Emil Velikov

--- /dev/null
+++ b/src/gallium/drivers/swr/rasterizer/codegen/templates/gen_knobs.h
@@ -0,0 +1,147 @@
+/**
+*
+* Copyright 2015-2017
+* Intel Corporation
+*
+* Licensed under the Apache License, Version 2.0 (the "License");
+* you may not use this file except in compliance with the License.
+* You may obtain a copy of the License at
+*
+* http://www.apache.org/licenses/LICENSE-2.0
+*
I'm not a lawyer so I'm not sure if having Apache licensed code is
fine with the rest of Mesa.

Considering that the rest of SWR (barring the original gen_knobs.cpp where
this comes from) uses MIT X11/Expat, I'd stay consistent and
re-license this/these files.
If possible, of course.

Adding files with another license was unintentional - I’ve added a relicense 
commit prior to the split.



--- a/src/gallium/drivers/swr/rasterizer/core/knobs_init.h
+++ b/src/gallium/drivers/swr/rasterizer/core/knobs_init.h
@@ -91,16 +91,18 @@ static inline void ConvertEnvToKnob(const char* pOverride, 
std::string& knobValu
template <typename T>
static inline void InitKnob(T& knob)
{
-
-// TODO, read registry first
-
-// Second, read environment variables
+// Read environment variables
const char* pOverride = getenv(knob.Name());

if (pOverride)
{
-auto knobValue = knob.Value();
+auto knobValue = knob.DefaultValue();
ConvertEnvToKnob(pOverride, knobValue);
knob.Value(knobValue);
}
+else
+{
+// Set default value
+knob.Value(knob.DefaultValue());
This and the underlying code seems to have changed a bit.

Would be nice to keep "dummy split" and functionality changes as
separate patches.
Then again: it's not my code, so please don't read too much into my suggestion.
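
The behavior the hunk introduces (start from the default, apply the environment override if present, otherwise store the default) can be sketched with a reduced knob type. IntKnob and its fields are hypothetical stand-ins for the generated knob classes, and the strtol parsing is an assumed substitute for ConvertEnvToKnob.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical minimal knob: a name, a default, and a current value.
struct IntKnob
{
    const char* name;
    int defaultValue;
    int value;
};

inline void InitKnob(IntKnob& knob)
{
    const char* pOverride = std::getenv(knob.name);
    if (pOverride)
    {
        char* end = nullptr;
        const long parsed = std::strtol(pOverride, &end, 0);
        // Keep the default when the override is not a number.
        knob.value = (end != pOverride) ? static_cast<int>(parsed)
                                        : knob.defaultValue;
    }
    else
    {
        // No override: set the default value explicitly.
        knob.value = knob.defaultValue;
    }
}
```

Seeding from the default rather than the current value means a repeated call yields the same result regardless of what the knob held before.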

I’ve unwoven this commit into five commits for the upcoming v2 of the patchset:

commit 566ea1983277bf62f07ea02571854009b667081f
Author: Tim Rowley
Date:   Mon Jul 31 17:22:54 2017 -0500

swr/rast: simplify knob 

Re: [Mesa-dev] [PATCH 11/13] swr/rast: fixes for 32-bit builds

2017-08-01 Thread Rowley, Timothy O

> On Jul 31, 2017, at 3:56 PM, Emil Velikov  wrote:
> 
> Hi Tim,
> 
> Some of the inline functions seem unused.
> Very quick search showed the following:
> 
> InterpolateComponent
> _simd128_abs_ps
> _simd_abs_ps

The intent of simdlib is a general purpose vector library, so some 
functions/methods are in there for completeness' sake so that when a developer 
is using it in the future they’re not surprised by obvious holes in the api.

> Might be worth cleaning things up, first?

I’ve been experimenting with --print-gc-sections to see if I can prune unused code 
from the tree, but real unused functions seem to get lost among the inline 
functions/methods defined in headers, which the linker is reaping as expected.

> -Emil



Re: [Mesa-dev] [PATCH] swr: Add arch flags to support Cray and PGI compilers

2017-07-31 Thread Rowley, Timothy O
If the Cray flags are for wrapper scripts, why do we need specific flags for 
that instead of using the underlying compiler flags?

I’m guessing you intend this for the 17.2 branch as well?

-Tim

> On Jul 31, 2017, at 2:53 PM, Chuck Atkins  wrote:
> 
> Note that the Cray flags (-target-cpu=) need to come first since the
> Cray programming environment uses wrappers around other compilers.  By
> checking the wrapper flags first, you can be sure to match the wrapper
> flag instead of the underlying compiler (gcc, intel, pgi, etc.) flags.
> 
> Signed-off-by: Chuck Atkins 
> Cc: Tim Rowley 
> ---
> configure.ac | 8 
> 1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/configure.ac b/configure.ac
> index 6302aa2b0c..3b45baf6d0 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -2511,7 +2511,7 @@ if test -n "$with_gallium_drivers"; then
> AC_SUBST([SWR_CXX11_CXXFLAGS])
> 
> swr_require_cxx_feature_flags "AVX" "defined(__AVX__)" \
> -",-mavx,-march=core-avx" \
> +
> ",-target-cpu=sandybridge,-mavx,-march=core-avx,-tp=sandybridge" \
> SWR_AVX_CXXFLAGS
> AC_SUBST([SWR_AVX_CXXFLAGS])
> 
> @@ -2523,21 +2523,21 @@ if test -n "$with_gallium_drivers"; then
> ;;
> xavx2)
> swr_require_cxx_feature_flags "AVX2" "defined(__AVX2__)" \
> -",-mavx2 -mfma -mbmi2 -mf16c,-march=core-avx2" \
> +",-target-cpu=haswell,-mavx2 -mfma -mbmi2 
> -mf16c,-march=core-avx2,-tp=haswell" \
> SWR_AVX2_CXXFLAGS
> AC_SUBST([SWR_AVX2_CXXFLAGS])
> HAVE_SWR_AVX2=yes
> ;;
> xknl)
> swr_require_cxx_feature_flags "KNL" "defined(__AVX512F__) 
> && defined(__AVX512ER__)" \
> -",-march=knl,-xMIC-AVX512" \
> +",-target-cpu=mic-knl,-march=knl,-xMIC-AVX512" \
> SWR_KNL_CXXFLAGS
> AC_SUBST([SWR_KNL_CXXFLAGS])
> HAVE_SWR_KNL=yes
> ;;
> xskx)
> swr_require_cxx_feature_flags "SKX" "defined(__AVX512F__) 
> && defined(__AVX512BW__)" \
> -",-march=skylake-avx512,-xCORE-AVX512" \
> +
> ",-target-cpu=x86-skylake,-march=skylake-avx512,-xCORE-AVX512" \
> SWR_SKX_CXXFLAGS
> AC_SUBST([SWR_SKX_CXXFLAGS])
> HAVE_SWR_SKX=yes
> -- 
> 2.13.3
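
The ordering argument in the commit message amounts to "first accepted candidate wins". A minimal sketch of that selection follows, with the compiler probe passed in as a callback. The function name and the probe are hypothetical, and the real check lives in configure.ac's swr_require_cxx_feature_flags.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Returns the index of the first candidate flag set the compiler
// accepts, or -1 when none is accepted.
inline int FirstAcceptedFlags(
    const std::vector<std::string>& candidates,
    const std::function<bool(const std::string&)>& compilerAccepts)
{
    for (std::size_t i = 0; i < candidates.size(); ++i)
    {
        if (compilerAccepts(candidates[i]))
            return static_cast<int>(i);
    }
    return -1;
}
```

Because the search stops at the first hit, listing -target-cpu= ahead of -mavx2 lets a Cray wrapper match its own flag before any flag of the underlying compiler is tried.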
> 



Re: [Mesa-dev] [PATCH] swr: fix transform feedback logic

2017-07-23 Thread Rowley, Timothy O

> On Jul 23, 2017, at 11:08 AM, George Kyriazis  
> wrote:
> 
> The shader that is used to copy vertex data out of the vs/gs shaders to
> the user-specified buffer (streamout or SO shader) was not using the
> correct offsets.
> 
> Adjust the offsets that are used just for the SO shader:
> - Make sure that position is handled in the same special way
>  as in the vs/gs shaders
> - Use the correct offset to be passed in the core
> - Consolidate register slot mapping logic into one function, since it's
>  been calculated in 2 different places (one for calculating the slot mask,
>  and one for the register offsets themselves)
> 
> Also make room for all attributes in the backend vertex area.

Add a comment to the commit indicating that as Ilia states, this is not a 
complete solution.

> 
> Fixes:
> - all vtk GL2PS tests
> - 18 piglit tests (16 ext_transform_feedback tests,
>  arb-quads-follow-provoking-vertex and primitive-type gl_points)
> ---
> src/gallium/drivers/swr/swr_draw.cpp  | 11 ---
> src/gallium/drivers/swr/swr_state.cpp | 31 +--
> src/gallium/drivers/swr/swr_state.h   |  3 +++
> 3 files changed, 40 insertions(+), 5 deletions(-)
> 
> diff --git a/src/gallium/drivers/swr/swr_draw.cpp 
> b/src/gallium/drivers/swr/swr_draw.cpp
> index 62ad3f7..218de0f 100644
> --- a/src/gallium/drivers/swr/swr_draw.cpp
> +++ b/src/gallium/drivers/swr/swr_draw.cpp
> @@ -26,6 +26,7 @@
> #include "swr_resource.h"
> #include "swr_fence.h"
> #include "swr_query.h"
> +#include "swr_state.h"
> #include "jit_api.h"
> 
> #include "util/u_draw.h"
> @@ -81,8 +82,11 @@ swr_draw_vbo(struct pipe_context *pipe, const struct 
> pipe_draw_info *info)
>offsets[output_buffer] = so->output[i].dst_offset;
> }
> 
> +unsigned attrib_slot = so->output[i].register_index;
> +attrib_slot = swr_so_adjust_attrib(attrib_slot, ctx->vs);
> +
> state.stream.decl[num].bufferIndex = output_buffer;
> -state.stream.decl[num].attribSlot = so->output[i].register_index 
> - 1;
> +state.stream.decl[num].attribSlot = attrib_slot;
> state.stream.decl[num].componentMask =
>((1 << so->output[i].num_components) - 1)
><< so->output[i].start_component;
> @@ -130,9 +134,10 @@ swr_draw_vbo(struct pipe_context *pipe, const struct 
> pipe_draw_info *info)
>SWR_FRONTEND_STATE feState = {0};
> 
>feState.vsVertexSize =
> -  VERTEX_ATTRIB_START_SLOT +
> +  VERTEX_ATTRIB_START_SLOT
>   + ctx->vs->info.base.num_outputs
> -  - (ctx->vs->info.base.writes_position ? 1 : 0);
> +  - (ctx->vs->info.base.writes_position ? 1 : 0)
> +  + ctx->fs->info.base.num_outputs;

Sizing vsVertexSize to essentially vs->num_outputs + fs->num_outputs seems odd, 
as the fe shouldn’t care about the number of outputs of the fs (inputs, maybe).

>if (ctx->rasterizer->flatshade_first) {
>   feState.provokingVertex = {1, 0, 0};
> diff --git a/src/gallium/drivers/swr/swr_state.cpp 
> b/src/gallium/drivers/swr/swr_state.cpp
> index 501fdea..3e07929 100644
> --- a/src/gallium/drivers/swr/swr_state.cpp
> +++ b/src/gallium/drivers/swr/swr_state.cpp
> @@ -345,13 +345,15 @@ swr_create_vs_state(struct pipe_context *pipe,
>   // soState.streamToRasterizer not used
> 
>   for (uint32_t i = 0; i < stream_output->num_outputs; i++) {
> + unsigned attrib_slot = stream_output->output[i].register_index;
> + attrib_slot = swr_so_adjust_attrib(attrib_slot, swr_vs);
>  swr_vs->soState.streamMasks[stream_output->output[i].stream] |=
> -1 << (stream_output->output[i].register_index - 1);
> +(1 << attrib_slot);
>   }
>   for (uint32_t i = 0; i < MAX_SO_STREAMS; i++) {
> swr_vs->soState.streamNumEntries[i] =
>  _mm_popcnt_u32(swr_vs->soState.streamMasks[i]);
> -swr_vs->soState.vertexAttribOffset[i] = VERTEX_ATTRIB_START_SLOT; // 
> TODO: optimize
> +swr_vs->soState.vertexAttribOffset[i] = 0;
>}
>}
> 
> @@ -1777,6 +1779,31 @@ swr_update_derived(struct pipe_context *pipe,
>ctx->dirty = post_update_dirty_flags;
> }
> 
> +unsigned
> +swr_so_adjust_attrib(unsigned in_attrib,
> + swr_vertex_shader *swr_vs)
> +{
> +   ubyte semantic_name;
> +   unsigned attrib;
> +
> +   attrib = in_attrib + VERTEX_ATTRIB_START_SLOT;
> +
> +   if (swr_vs) {
> +  semantic_name = swr_vs->info.base.output_semantic_name[in_attrib];
> +  if (semantic_name == TGSI_SEMANTIC_POSITION) {
> + attrib = VERTEX_POSITION_SLOT;
> +  } else {
> + for (int i = 0; i < PIPE_MAX_SHADER_OUTPUTS; i++) {
> +if (swr_vs->info.base.output_semantic_name[i] == 
> TGSI_SEMANTIC_POSITION) {
> +   attrib--;
> +   break;
> +}
> + }

Couldn’t this for loop be replaced with “if 
(swr_vs->info.base.writes_position) attrib--;”?

> +  }
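
The simplification suggested in the review can be sketched as follows. The slot constants, the Semantic enum and ShaderInfo are hypothetical stand-ins for VERTEX_POSITION_SLOT, VERTEX_ATTRIB_START_SLOT, the TGSI semantic names and the shader info structure. Their values are chosen for illustration only, and the null-shader case from the patch is omitted.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Illustrative values; the real constants are defined by the driver.
constexpr unsigned kPositionSlot = 0;
constexpr unsigned kAttribStartSlot = 1;
constexpr std::size_t kMaxOutputs = 8;

enum Semantic { kSemanticGeneric, kSemanticPosition };

struct ShaderInfo
{
    std::array<Semantic, kMaxOutputs> outputSemantic;
    bool writesPosition;
};

// Map a stream-output register index to a vertex attribute slot.
inline unsigned AdjustAttrib(unsigned inAttrib, const ShaderInfo& info)
{
    assert(inAttrib < kMaxOutputs);
    if (info.outputSemantic[inAttrib] == kSemanticPosition)
        return kPositionSlot;

    unsigned attrib = inAttrib + kAttribStartSlot;
    // Position lives in its own slot, so it does not take up an
    // attribute slot; shift the other outputs down by one.
    if (info.writesPosition)
        attrib--;
    return attrib;
}
```

This relies on writesPosition being true exactly when some output carries the position semantic, which is the condition the original loop searched for.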

Re: [Mesa-dev] [Mesa-stable] [PATCH 3/3] swr: use the correct variable for no undefined symbols

2017-07-21 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley 

> On Jul 21, 2017, at 1:05 PM, Emil Velikov  wrote:
> 
> From: Emil Velikov 
> 
> The variable name was missing a leading LD_, which resulted in the
> backend binaries having unresolved symbols.
> 
> With the link addressed with earlier patches, we can correct the typo.
> 
> Thanks to Laurent for the help spotting this.
> 
> v2: Split from a larger patch.
> 
> Cc: mesa-sta...@lists.freedesktop.org
> Cc: Bruce Cherniak 
> Cc: Tim Rowley 
> Cc: Laurent Carlier 
> Fixes: 9475251145174882b532 "swr: standardize linkage and check for
> unresolved symbols"
> Reviewed-by: Eric Engestrom 
> Reported-by: Laurent Carlier 
> Signed-off-by: Emil Velikov 
> ---
> src/gallium/drivers/swr/Makefile.am | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/gallium/drivers/swr/Makefile.am 
> b/src/gallium/drivers/swr/Makefile.am
> index bc7abad06af..05fc3b3595b 100644
> --- a/src/gallium/drivers/swr/Makefile.am
> +++ b/src/gallium/drivers/swr/Makefile.am
> @@ -238,7 +238,7 @@ COMMON_LDFLAGS = \
>   -module \
>   -no-undefined \
>   $(GC_SECTIONS) \
> - $(NO_UNDEFINED)
> + $(LD_NO_UNDEFINED)
> 
> lib_LTLIBRARIES =
> 
> -- 
> 2.13.0
> 


Re: [Mesa-dev] [PATCH 1/3] swr: don't forget to link AVX/AVX2 against pthreads

2017-07-21 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley 

> On Jul 21, 2017, at 1:05 PM, Emil Velikov  wrote:
> 
> From: Emil Velikov 
> 
> Seems like the backends have been using pthreads since day one, yet
> we've been missing the link.
> 
> With a later commit we'll fix a typo, hence the libraries will be built
> with -Wl,no-undefined, aka failing the build on unresolved symbols.
> 
> v2: Split from a larger patch.
> 
> Cc: mesa-sta...@lists.freedesktop.org
> Cc: Bruce Cherniak 
> Cc: Tim Rowley 
> Cc: Laurent Carlier 
> Fixes: c6e67f5a9373e916a8d2 "gallium/swr: add OpenSWR rasterizer"
> Reviewed-by: Eric Engestrom 
> Signed-off-by: Emil Velikov 
> ---
> src/gallium/drivers/swr/Makefile.am | 8 
> 1 file changed, 8 insertions(+)
> 
> diff --git a/src/gallium/drivers/swr/Makefile.am 
> b/src/gallium/drivers/swr/Makefile.am
> index 64950214572..02010727d9b 100644
> --- a/src/gallium/drivers/swr/Makefile.am
> +++ b/src/gallium/drivers/swr/Makefile.am
> @@ -246,6 +246,7 @@ if HAVE_SWR_AVX
> lib_LTLIBRARIES += libswrAVX.la
> 
> libswrAVX_la_CXXFLAGS = \
> + $(PTHREAD_CFLAGS) \
>   $(SWR_AVX_CXXFLAGS) \
>   -DKNOB_ARCH=KNOB_ARCH_AVX \
>   $(COMMON_CXXFLAGS)
> @@ -253,6 +254,9 @@ libswrAVX_la_CXXFLAGS = \
> libswrAVX_la_SOURCES = \
>   $(COMMON_SOURCES)
> 
> +libswrAVX_la_LIBADD = \
> + $(PTHREAD_LIBS)
> +
> libswrAVX_la_LDFLAGS = \
>   $(COMMON_LDFLAGS)
> endif
> @@ -260,6 +264,7 @@ endif
> if HAVE_SWR_AVX2
> lib_LTLIBRARIES += libswrAVX2.la
> libswrAVX2_la_CXXFLAGS = \
> + $(PTHREAD_CFLAGS) \
>   $(SWR_AVX2_CXXFLAGS) \
>   -DKNOB_ARCH=KNOB_ARCH_AVX2 \
>   $(COMMON_CXXFLAGS)
> @@ -267,6 +272,9 @@ libswrAVX2_la_CXXFLAGS = \
> libswrAVX2_la_SOURCES = \
>   $(COMMON_SOURCES)
> 
> +libswrAVX2_la_LIBADD = \
> + $(PTHREAD_LIBS)
> +
> libswrAVX2_la_LDFLAGS = \
>   $(COMMON_LDFLAGS)
> endif
> -- 
> 2.13.0
> 



Re: [Mesa-dev] [PATCH 2/3] swr: don't forget to link KNL/SKX against pthreads

2017-07-21 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley 

> On Jul 21, 2017, at 1:05 PM, Emil Velikov  wrote:
> 
> From: Emil Velikov 
> 
> Analogous to previous commit but for the KNL/SKX backends.
> 
> Cc: Bruce Cherniak 
> Cc: Tim Rowley 
> Cc: Laurent Carlier 
> Fixes: 1cb5a6061ce ("configure/swr: add KNL and SKX architecture targets")
> Signed-off-by: Emil Velikov 
> ---
> src/gallium/drivers/swr/Makefile.am | 8 
> 1 file changed, 8 insertions(+)
> 
> diff --git a/src/gallium/drivers/swr/Makefile.am 
> b/src/gallium/drivers/swr/Makefile.am
> index 02010727d9b..bc7abad06af 100644
> --- a/src/gallium/drivers/swr/Makefile.am
> +++ b/src/gallium/drivers/swr/Makefile.am
> @@ -283,6 +283,7 @@ if HAVE_SWR_KNL
> lib_LTLIBRARIES += libswrKNL.la
> 
> libswrKNL_la_CXXFLAGS = \
> + $(PTHREAD_CFLAGS) \
>   $(SWR_KNL_CXXFLAGS) \
>   -DKNOB_ARCH=KNOB_ARCH_AVX512 -DAVX512F_STRICT \
>   $(COMMON_CXXFLAGS)
> @@ -290,6 +291,9 @@ libswrKNL_la_CXXFLAGS = \
> libswrKNL_la_SOURCES = \
>   $(COMMON_SOURCES)
> 
> +libswrKNL_la_LIBADD = \
> + $(PTHREAD_LIBS)
> +
> libswrKNL_la_LDFLAGS = \
>   $(COMMON_LDFLAGS)
> endif
> @@ -298,6 +302,7 @@ if HAVE_SWR_SKX
> lib_LTLIBRARIES += libswrSKX.la
> 
> libswrSKX_la_CXXFLAGS = \
> + $(PTHREAD_CFLAGS) \
>   $(SWR_SKX_CXXFLAGS) \
>   -DKNOB_ARCH=KNOB_ARCH_AVX512 \
>   $(COMMON_CXXFLAGS)
> @@ -305,6 +310,9 @@ libswrSKX_la_CXXFLAGS = \
> libswrSKX_la_SOURCES = \
>   $(COMMON_SOURCES)
> 
> +libswrSKX_la_LIBADD = \
> + $(PTHREAD_LIBS)
> +
> libswrSKX_la_LDFLAGS = \
>   $(COMMON_LDFLAGS)
> endif
> -- 
> 2.13.0
> 



Re: [Mesa-dev] [Mesa-stable] [PATCH] swr: use the correct variable for no undefined symbols

2017-07-21 Thread Rowley, Timothy O
Couple things about the patch: should PTHREAD_CFLAGS be added to 
COMMON_CXXFLAGS to avoid needing to modify the section for each architecture?  
The KNL and SKX sections should have similar changes.

I’ve tested that the scons binaries work on linux, and George tests them on 
windows.

-Tim

> On Jul 21, 2017, at 7:53 AM, Emil Velikov  wrote:
> 
> From: Emil Velikov 
> 
> The variable name was missing a leading LD_, which resulted in the
> backend binaries having unresolved symbols.
> 
> Thanks to Laurent for the list.
> 
> The fix is applicable for stable as well, although the actual pthread
> linking may not be. That plus additional [missing] links will be
> resolved in that branch.
> 
> Cc: mesa-sta...@lists.freedesktop.org
> Cc: Bruce Cherniak 
> Cc: Tim Rowley 
> Cc: Laurent Carlier 
> Reported-by: Laurent Carlier 
> Signed-off-by: Emil Velikov 
> ---
> Laurent, the output of `ldd -r $binary` should be free of undefined
> symbols. Can you give it a quick test?
> 
> Tim, Bruce - the new backends might need the PTHREAD* bits.
> The SCons build has the -Wl,no-undef... parts but one might want to
> double-check the binaries it produced.
> 
> Thanks
> ---
> src/gallium/drivers/swr/Makefile.am | 10 +-
> 1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/src/gallium/drivers/swr/Makefile.am 
> b/src/gallium/drivers/swr/Makefile.am
> index 74612280fe7..3bffa9595d5 100644
> --- a/src/gallium/drivers/swr/Makefile.am
> +++ b/src/gallium/drivers/swr/Makefile.am
> @@ -222,11 +222,12 @@ COMMON_LDFLAGS = \
>   -module \
>   -no-undefined \
>   $(GC_SECTIONS) \
> - $(NO_UNDEFINED)
> + $(LD_NO_UNDEFINED)
> 
> lib_LTLIBRARIES = libswrAVX.la libswrAVX2.la
> 
> libswrAVX_la_CXXFLAGS = \
> + $(PTHREAD_CFLAGS) \
>   $(SWR_AVX_CXXFLAGS) \
>   -DKNOB_ARCH=KNOB_ARCH_AVX \
>   $(COMMON_CXXFLAGS)
> @@ -234,10 +235,14 @@ libswrAVX_la_CXXFLAGS = \
> libswrAVX_la_SOURCES = \
>   $(COMMON_SOURCES)
> 
> +libswrAVX_la_LIBADD = \
> + $(PTHREAD_LIBS)
> +
> libswrAVX_la_LDFLAGS = \
>   $(COMMON_LDFLAGS)
> 
> libswrAVX2_la_CXXFLAGS = \
> + $(PTHREAD_CFLAGS) \
>   $(SWR_AVX2_CXXFLAGS) \
>   -DKNOB_ARCH=KNOB_ARCH_AVX2 \
>   $(COMMON_CXXFLAGS)
> @@ -245,6 +250,9 @@ libswrAVX2_la_CXXFLAGS = \
> libswrAVX2_la_SOURCES = \
>   $(COMMON_SOURCES)
> 
> +libswrAVX2_la_LIBADD = \
> + $(PTHREAD_LIBS)
> +
> libswrAVX2_la_LDFLAGS = \
>   $(COMMON_LDFLAGS)
> 
> -- 
> 2.13.0
> 



Re: [Mesa-dev] [PATCH] gallium/util: fix nondeterministic avx512 detection

2017-07-19 Thread Rowley, Timothy O
There weren’t any direct users of the avx512 features yet, but probably worth 
proposing for stable pickup.

Thanks.

-Tim

> On Jul 19, 2017, at 2:40 PM, Roland Scheidegger  wrote:
> 
> Makes sense to me.
> Probably should go into stable?
> 
> Reviewed-by: Roland Scheidegger 
> 
> Am 19.07.2017 um 21:29 schrieb Tim Rowley:
>> cpuid.7 requires cx=0 to select the extended feature leaf.
>> 
>> avx512 detection was using the non-indexed cpuid resulting
>> in random non-detection of avx512.
>> ---
>> src/gallium/auxiliary/util/u_cpu_detect.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/src/gallium/auxiliary/util/u_cpu_detect.c 
>> b/src/gallium/auxiliary/util/u_cpu_detect.c
>> index 3d6ccb5822..4e71041bc9 100644
>> --- a/src/gallium/auxiliary/util/u_cpu_detect.c
>> +++ b/src/gallium/auxiliary/util/u_cpu_detect.c
>> @@ -438,7 +438,7 @@ util_cpu_detect(void)
>>   (xgetbv() & (0x7 << 5)) && // OPMASK: upper-256 enabled by OS
>>   ((xgetbv() & 6) == 6)) { // XMM/YMM enabled by OS
>>  uint32_t regs3[4];
>> - cpuid(0x0007, regs3);
>> + cpuid_count(0x0007, 0x0000, regs3);
>>  util_cpu_caps.has_avx512f= (regs3[1] >> 16) & 1;
>>  util_cpu_caps.has_avx512dq   = (regs3[1] >> 17) & 1;
>>  util_cpu_caps.has_avx512ifma = (regs3[1] >> 21) & 1;
>> 
> 
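
For readers who want to see the indexed query in isolation, here is a minimal sketch. It assumes a GCC- or clang-style <cpuid.h> on x86 (the __get_cpuid_max helper and the __cpuid_count macro); the struct and function names are hypothetical and are not Mesa's. The bit positions are the three shown in the quoted patch. Unlike the real detection code, this sketch skips the xgetbv check for OS-enabled register state.

```cpp
#include <cstdint>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   // GCC/clang helper header (assumed available)
#endif

// Hypothetical container for the three feature bits decoded below.
struct Avx512Caps {
    bool f;
    bool dq;
    bool ifma;
};

// Decode EBX of CPUID leaf 7, subleaf 0, using the bit positions
// from the quoted patch (16 = AVX512F, 17 = AVX512DQ, 21 = AVX512IFMA).
inline Avx512Caps decode_leaf7_ebx(uint32_t ebx)
{
    Avx512Caps caps;
    caps.f    = ((ebx >> 16) & 1) != 0;
    caps.dq   = ((ebx >> 17) & 1) != 0;
    caps.ifma = ((ebx >> 21) & 1) != 0;
    return caps;
}

// Query leaf 7 with ECX explicitly set to 0. Returns false when the
// leaf is unavailable or the target is not x86.
inline bool query_leaf7_ebx(uint32_t *ebx_out)
{
#if defined(__x86_64__) || defined(__i386__)
    if (__get_cpuid_max(0, nullptr) < 7)
        return false;
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    __cpuid_count(7, 0, eax, ebx, ecx, edx);   // subleaf index goes in ECX
    *ebx_out = ebx;
    return true;
#else
    (void)ebx_out;
    return false;
#endif
}
```

Keeping the decode step separate from the query makes the bit handling easy to check with synthetic register values, independent of the machine the code runs on.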



Re: [Mesa-dev] [PATCH 2/2] configure/swr: add KNL and SKX architecture targets

2017-07-18 Thread Rowley, Timothy O

On Jul 17, 2017, at 11:51 AM, Emil Velikov wrote:

On 17 July 2017 at 15:08, Tim Rowley wrote:
Not built by default.
---
configure.ac   | 16 ++
src/gallium/drivers/swr/Makefile.am| 38 ++
src/gallium/drivers/swr/swr_loader.cpp | 20 +-
3 files changed, 73 insertions(+), 1 deletion(-)

diff --git a/configure.ac b/configure.ac
index 3a8fa4d7ea..4437c8189d 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2518,6 +2518,20 @@ if test -n "$with_gallium_drivers"; then
AC_SUBST([SWR_AVX2_CXXFLAGS])
HAVE_SWR_AVX2=yes
;;
+xknl)
+swr_require_cxx_feature_flags "KNL" "defined(__AVX512F__) 
&& defined(__AVX512ER__)" \
+",-march=knl,-xMIC-AVX512" \
+SWR_KNL_CXXFLAGS
+AC_SUBST([SWR_KNL_CXXFLAGS])
+HAVE_SWR_KNL=yes
+;;
+xskx)
+swr_require_cxx_feature_flags "SKX" "defined(__AVX512F__) 
&& defined(__AVX512BW__)" \
+",-march=skylake-avx512,-xCORE-AVX512" \
+SWR_SKX_CXXFLAGS
+AC_SUBST([SWR_SKX_CXXFLAGS])
+HAVE_SWR_SKX=yes
+;;
Please update the help string. Otherwise these two are completely
undocumented.


Will do.

Can I bribe you to add Travis entries for the above? If it doesn't
take too long to build, you can squash it into the existing ones.

I’ll do that in a future patch; the avx512 code in swr as it currently exists 
in mesa-master only works on icc, which isn’t one of the Travis options.  We’re 
working on a patch which enables modern clang and gcc to also build it.


Thanks
Emil



Re: [Mesa-dev] [PATCH 1/2] configure/swr: configurable swr architectures

2017-07-18 Thread Rowley, Timothy O

On Jul 17, 2017, at 11:42 AM, Emil Velikov wrote:

On 17 July 2017 at 15:08, Tim Rowley wrote:
Allow configuration of the SWR architecture-dependent libraries
we build for with --with-swr-archs.  Maintains current behavior
by defaulting to avx,avx2.

Scons changes made to make it still build and work, but
without the changes for configuring which architectures.
---
configure.ac   | 39 ++
src/gallium/drivers/swr/Makefile.am| 17 ++-
src/gallium/drivers/swr/SConscript |  1 +
src/gallium/drivers/swr/swr_loader.cpp | 22 +++
4 files changed, 70 insertions(+), 9 deletions(-)

diff --git a/configure.ac b/configure.ac
index 46fcd8f3fe..3a8fa4d7ea 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2349,6 +2349,15 @@ AC_ARG_WITH([d3d-libdir],
[D3D_DRIVER_INSTALL_DIR="${libdir}/d3d"])
AC_SUBST([D3D_DRIVER_INSTALL_DIR])

+dnl Architectures to build SWR library for
+
+AC_ARG_WITH([swr-archs],
+[AS_HELP_STRING([--with-swr-archs@<:@=DIRS...@:>@],
+[comma delimited swr architectures list, e.g.
+"avx,avx2" @<:@default="avx,avx2"@:>@])],
+[with_swr_archs="$withval"],
+[with_swr_archs="avx avx2"])
Add the missing comma - "avx,avx2"

Will do.


+
dnl
dnl r300 doesn't strictly require LLVM, but for performance reasons we
dnl highly recommend LLVM usage. So require it at least on x86 and x86_64
@@ -2496,10 +2505,24 @@ if test -n "$with_gallium_drivers"; then
SWR_AVX_CXXFLAGS
AC_SUBST([SWR_AVX_CXXFLAGS])

-swr_require_cxx_feature_flags "AVX2" "defined(__AVX2__)" \
-",-mavx2 -mfma -mbmi2 -mf16c,-march=core-avx2" \
-SWR_AVX2_CXXFLAGS
-AC_SUBST([SWR_AVX2_CXXFLAGS])
+swr_archs=`IFS=', '; echo $with_swr_archs`
+for arch in $swr_archs; do
+case "x$arch" in
+xavx)
You want to move the AVX flag detection here, right?

No, since we need to have SWR_AVX_CXXFLAGS to build the driver proper.


+HAVE_SWR_AVX=yes
+;;
+xavx2)
+swr_require_cxx_feature_flags "AVX2" "defined(__AVX2__)" \
+",-mavx2 -mfma -mbmi2 -mf16c,-march=core-avx2" \
+SWR_AVX2_CXXFLAGS
+AC_SUBST([SWR_AVX2_CXXFLAGS])
+HAVE_SWR_AVX2=yes
+;;
+*)
+AC_MSG_ERROR([unknown SWR build architecture '$arch'])
+;;
+esac
+done

And error out if building without any arch?

Added.
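
The list handling discussed here (split on commas or spaces, reject unknown names, fail when nothing is selected) can be restated outside of shell. The following is only an illustrative sketch in C++, not part of the patch; the function name and the error text for the empty case are assumptions, and the set of known names is limited to the two architectures this patch handles.

```cpp
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper mirroring the configure logic: split the list on
// ',' or ' ' (like IFS=', '), validate each entry, and require at least one.
std::vector<std::string> parse_swr_archs(const std::string &list)
{
    static const std::set<std::string> known = {"avx", "avx2"};
    std::vector<std::string> archs;
    std::string current;

    // Validate and store the token collected so far, if any.
    auto flush = [&]() {
        if (current.empty())
            return;
        if (known.count(current) == 0)
            throw std::invalid_argument(
                "unknown SWR build architecture '" + current + "'");
        archs.push_back(current);
        current.clear();
    };

    for (char c : list) {
        if (c == ',' || c == ' ')
            flush();
        else
            current += c;
    }
    flush();

    if (archs.empty())
        throw std::invalid_argument("no SWR build architecture selected");
    return archs;
}
```

Both "avx,avx2" and "avx avx2" yield the same two entries, which is why the default string's missing comma was harmless in practice but still worth fixing for consistency with the help text.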

HAVE_GALLIUM_SWR=yes
;;
@@ -2538,6 +2561,9 @@ if test "x$enable_llvm" = "xyes" -a 
"$with_gallium_drivers"; then
llvm_add_default_components "gallium"
fi

+AM_CONDITIONAL(HAVE_SWR_AVX, test "x$HAVE_SWR_AVX" = xyes)
+AM_CONDITIONAL(HAVE_SWR_AVX2, test "x$HAVE_SWR_AVX2" = xyes)
+
dnl We need to validate some needed dependencies for renderonly drivers.

if test "x$HAVE_GALLIUM_ETNAVIV" != xyes -a "x$HAVE_GALLIUM_IMX" = xyes  ; then
@@ -2977,6 +3003,11 @@ else
echo "HUD lmsensors:   yes"
fi

+echo ""
+if test "x$HAVE_GALLIUM_SWR" != x; then
+echo "SWR archs:   $swr_archs"
+fi
+
dnl Libraries
echo ""
echo "Shared libs: $enable_shared"
diff --git a/src/gallium/drivers/swr/Makefile.am 
b/src/gallium/drivers/swr/Makefile.am
index 74612280fe..f38ce7b1d9 100644
--- a/src/gallium/drivers/swr/Makefile.am
+++ b/src/gallium/drivers/swr/Makefile.am
@@ -55,6 +55,14 @@ libmesaswr_la_CXXFLAGS = \
   $(SWR_AVX_CXXFLAGS) \
   $(COMMON_CXXFLAGS)

+if HAVE_SWR_AVX
+libmesaswr_la_CXXFLAGS += -DHAVE_SWR_AVX
+endif
+
+if HAVE_SWR_AVX2
+libmesaswr_la_CXXFLAGS += -DHAVE_SWR_AVX2
+endif
+
COMMON_SOURCES = \
   $(ARCHRAST_CXX_SOURCES) \
   $(COMMON_CXX_SOURCES) \
@@ -224,7 +232,10 @@ COMMON_LDFLAGS = \
   $(GC_SECTIONS) \
   $(NO_UNDEFINED)

-lib_LTLIBRARIES = libswrAVX.la libswrAVX2.la
+lib_LTLIBRARIES =
+
+if HAVE_SWR_AVX
+lib_LTLIBRARIES += libswrAVX.la

libswrAVX_la_CXXFLAGS = \
   $(SWR_AVX_CXXFLAGS) \
@@ -236,7 +247,10 @@ libswrAVX_la_SOURCES = \

libswrAVX_la_LDFLAGS = \
   $(COMMON_LDFLAGS)
+endif

+if HAVE_SWR_AVX2
+lib_LTLIBRARIES += libswrAVX2.la
libswrAVX2_la_CXXFLAGS = \
   $(SWR_AVX2_CXXFLAGS) \
   -DKNOB_ARCH=KNOB_ARCH_AVX2 \
@@ -247,6 +261,7 @@ libswrAVX2_la_SOURCES = \

libswrAVX2_la_LDFLAGS = \
   $(COMMON_LDFLAGS)
+endif

include $(top_srcdir)/install-gallium-links.mk

diff --git a/src/gallium/drivers/swr/SConscript 
b/src/gallium/drivers/swr/SConscript
index cdfb91a5bb..a32807d36b 100644
--- a/src/gallium/drivers/swr/SConscript
+++ b/src/gallium/drivers/swr/SConscript
@@ -245,6 +245,7 @@ 

Re: [Mesa-dev] [PATCH] swr: remove unneeded fallback strcasecmp define

2017-07-17 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley 

> On Jul 17, 2017, at 9:34 AM, Emil Velikov  wrote:
> 
> From: Emil Velikov 
> 
> The last user of the function was removed with earlier commit.
> 
> Fixes: 50842e8a931 ("swr: replace gallium->swr format enum conversion")
> Cc: Tim Rowley 
> Signed-off-by: Emil Velikov 
> ---
> Continue and purge all the strcasecmp cases... it is locale specific YAY
> ---
> src/gallium/drivers/swr/swr_screen.cpp | 5 -
> 1 file changed, 5 deletions(-)
> 
> diff --git a/src/gallium/drivers/swr/swr_screen.cpp 
> b/src/gallium/drivers/swr/swr_screen.cpp
> index e88b4551ae9..952ae0c77a5 100644
> --- a/src/gallium/drivers/swr/swr_screen.cpp
> +++ b/src/gallium/drivers/swr/swr_screen.cpp
> @@ -46,11 +46,6 @@
> #include 
> #include 
> 
> -/* MSVC case instensitive compare */
> -#if defined(PIPE_CC_MSVC)
> -   #define strcasecmp lstrcmpiA
> -#endif
> -
> /*
>  * Max texture sizes
>  * XXX Check max texture size values against core and sampler.
> -- 
> 2.13.0
> 



Re: [Mesa-dev] [PATCH v2 0/3] swr: Optimize large draws from client arrays.

2017-07-12 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley 

> On Jul 12, 2017, at 3:04 PM, Bruce Cherniak  wrote:
> 
> If size of client memory copy is too large, don't copy. The draw will
> access user-buffer directly and then block.  This is faster and more
> efficient than queuing many large client draws.
> 
> Applications that use large draws from client arrays benefit from this.
> VMD is an example.
> 
> The threshold for this path defaults to 32KB.  This value can be
> overridden by setting environment variable SWR_CLIENT_COPY_LIMIT.
> 
> v2: Use #define for default value, rather than hard-coded constant.
> 
> 
> 
> Bruce Cherniak (3):
>  swr: Remove hard-coded constant and "todo" comment.
>  swr: Move environment config options into separate function.
>  swr: Add path to draw directly from client memory without copy.
> 
> src/gallium/drivers/swr/swr_context.h   |  1 +
> src/gallium/drivers/swr/swr_draw.cpp|  9 
> src/gallium/drivers/swr/swr_scratch.cpp |  3 +-
> src/gallium/drivers/swr/swr_screen.cpp  | 73 +
> src/gallium/drivers/swr/swr_screen.h|  2 +
> src/gallium/drivers/swr/swr_state.cpp   | 37 -
> 6 files changed, 87 insertions(+), 38 deletions(-)
> 
> -- 
> 2.11.0



Re: [Mesa-dev] [PATCH 3/3] swr: Add path to draw directly from client memory without copy.

2017-07-12 Thread Rowley, Timothy O

> On Jul 11, 2017, at 8:20 PM, Bruce Cherniak  wrote:
> 
> If size of client memory copy is too large, don't copy. The draw will
> access user-buffer directly and then block.  This is faster and more
> efficient than queuing many large client draws.
> 
> Applications that use large draws from client arrays benefit from this.
> VMD is an example.
> 
> The threshold for this path defaults to 32KB.  This value can be
> overridden by setting environment variable SWR_CLIENT_COPY_LIMIT.
> ---
> src/gallium/drivers/swr/swr_context.h  |  1 +
> src/gallium/drivers/swr/swr_draw.cpp   |  9 +
> src/gallium/drivers/swr/swr_screen.cpp | 10 +
> src/gallium/drivers/swr/swr_screen.h   |  2 ++
> src/gallium/drivers/swr/swr_state.cpp  | 37 --
> 5 files changed, 48 insertions(+), 11 deletions(-)
> 
> diff --git a/src/gallium/drivers/swr/swr_context.h 
> b/src/gallium/drivers/swr/swr_context.h
> index 3ff4bf3e2f..ab3057af96 100644
> --- a/src/gallium/drivers/swr/swr_context.h
> +++ b/src/gallium/drivers/swr/swr_context.h
> @@ -51,6 +51,7 @@
> #define SWR_NEW_FRAMEBUFFER (1 << 15)
> #define SWR_NEW_CLIP (1 << 16)
> #define SWR_NEW_SO (1 << 17)
> +#define SWR_LARGE_CLIENT_DRAW (1<<18) // Indicates client draw will block
> 
> namespace std
> {
> diff --git a/src/gallium/drivers/swr/swr_draw.cpp 
> b/src/gallium/drivers/swr/swr_draw.cpp
> index f26b8e873c..cbd1558624 100644
> --- a/src/gallium/drivers/swr/swr_draw.cpp
> +++ b/src/gallium/drivers/swr/swr_draw.cpp
> @@ -188,6 +188,15 @@ swr_draw_vbo(struct pipe_context *pipe, const struct 
> pipe_draw_info *info)
>info->instance_count,
>info->start,
>info->start_instance);
> +
> +   /* On large client-buffer draw, we used client buffer directly, without
> +* copy.  Block until draw is finished.
> +* VMD is an example application that benefits from this. */
> +   if (ctx->dirty & SWR_LARGE_CLIENT_DRAW) {
> +  struct swr_screen *screen = swr_screen(pipe->screen);
> +  swr_fence_submit(ctx, screen->flush_fence);
> +  swr_fence_finish(pipe->screen, NULL, screen->flush_fence, 0);
> +   }
> }
> 
> 
> diff --git a/src/gallium/drivers/swr/swr_screen.cpp 
> b/src/gallium/drivers/swr/swr_screen.cpp
> index 9b3897ce6b..8be09697e6 100644
> --- a/src/gallium/drivers/swr/swr_screen.cpp
> +++ b/src/gallium/drivers/swr/swr_screen.cpp
> @@ -1066,6 +1066,16 @@ swr_destroy_screen(struct pipe_screen *p_screen)
> static void
> swr_validate_env_options(struct swr_screen *screen)
> {
> +   /* The client_copy_limit sets a maximum on the amount of user-buffer 
> memory
> +* copied to scratch space on a draw.  Past this, the draw will access
> +* user-buffer directly and then block.  This is faster than queuing many
> +* large client draws. */
> +   screen->client_copy_limit = 32768;
> +   int client_copy_limit =
> +  debug_get_num_option("SWR_CLIENT_COPY_LIMIT", 32768);

Could you move the default value into a macro defined at the top of the file, 
so it can be easily spotted in the future?
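
To make the suggestion concrete, here is a rough standalone sketch. It assumes a macro name (SWR_CLIENT_COPY_LIMIT_DEFAULT) and helper names that are purely hypothetical, and it uses std::getenv and std::strtoll in place of Mesa's debug_get_num_option so that it compiles on its own. Only the environment variable name, the 32768 default, the "positive values only" rule and the size >= limit comparison come from the quoted patch.

```cpp
#include <cstdint>
#include <cstdlib>

// Hypothetical macro name; the point is that the default lives in one
// visible place near the top of the file.
#define SWR_CLIENT_COPY_LIMIT_DEFAULT 32768

// Parse a raw option string. Falls back to the default when the value
// is missing, not a number, not positive, or too large for uint32_t.
inline uint32_t parse_client_copy_limit(const char *value)
{
    if (!value)
        return SWR_CLIENT_COPY_LIMIT_DEFAULT;
    char *end = nullptr;
    long long parsed = std::strtoll(value, &end, 0);
    if (end == value || parsed <= 0 || parsed > UINT32_MAX)
        return SWR_CLIENT_COPY_LIMIT_DEFAULT;
    return static_cast<uint32_t>(parsed);
}

// Read the limit from the environment variable named in the patch.
inline uint32_t client_copy_limit_from_env()
{
    return parse_client_copy_limit(std::getenv("SWR_CLIENT_COPY_LIMIT"));
}

// Same comparison as the quoted swr_state.cpp hunk: at or above the
// limit, draw from the user buffer directly (and block afterwards).
inline bool draw_directly_from_client(uint32_t size, uint32_t limit)
{
    return size >= limit;
}
```

With the default in a single macro, the initial assignment and the fallback passed to the option lookup cannot drift apart.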

> +   if (client_copy_limit > 0)
> +  screen->client_copy_limit = client_copy_limit;
> +
>/* XXX msaa under development, disable by default for now */
>screen->msaa_max_count = 0; /* was SWR_MAX_NUM_MULTISAMPLES; */
> 
> diff --git a/src/gallium/drivers/swr/swr_screen.h 
> b/src/gallium/drivers/swr/swr_screen.h
> index dc1bb47f02..6d6d1cb87d 100644
> --- a/src/gallium/drivers/swr/swr_screen.h
> +++ b/src/gallium/drivers/swr/swr_screen.h
> @@ -43,8 +43,10 @@ struct swr_screen {
> 
>struct sw_winsys *winsys;
> 
> +   /* Configurable environment settings */
>boolean msaa_force_enable;
>uint8_t msaa_max_count;
> +   uint32_t client_copy_limit;
> 
>HANDLE hJitMgr;
> };
> diff --git a/src/gallium/drivers/swr/swr_state.cpp 
> b/src/gallium/drivers/swr/swr_state.cpp
> index 45c9c213e5..6c406a37ec 100644
> --- a/src/gallium/drivers/swr/swr_state.cpp
> +++ b/src/gallium/drivers/swr/swr_state.cpp
> @@ -1267,12 +1267,20 @@ swr_update_derived(struct pipe_context *pipe,
> partial_inbounds = 0;
> min_vertex_index = info.min_index;
> 
> -/* Copy only needed vertices to scratch space */
> size = AlignUp(size, 4);
> -const void *ptr = (const uint8_t *) vb->buffer.user + base;
> -ptr = (uint8_t *)swr_copy_to_scratch_space(
> -   ctx, &ctx->scratch->vertex_buffer, ptr, size);
> -p_data = (const uint8_t *)ptr - base;
> +/* If size of client memory copy is too large, don't copy. The
> + * draw will access user-buffer directly and then block.  This is
> + * faster than queuing many large client draws. */
> +if (size >= screen->client_copy_limit) {
> +   post_update_dirty_flags |= SWR_LARGE_CLIENT_DRAW;
> +   p_data = (const uint8_t *) vb->buffer.user;
> +} else {

Re: [Mesa-dev] [PATCH 2/2] swr: build driver proper separate from rasterizer

2017-07-10 Thread Rowley, Timothy O

On Jul 10, 2017, at 8:24 AM, Emil Velikov wrote:

Hi Tim,

On 7 July 2017 at 22:25, Tim Rowley wrote:
swr used to build and link the rasterizer to the driver, and to support
multiple architectures we needed to have multiple versions of the
driver/rasterizer combination, which needed to link in much of mesa.

Changing to having one instance of the driver and just building
architecture specific versions of the rasterizer gives a large reduction
in disk space.

libGL.so        6464 Kb ->  7000 Kb
libswrAVX.so   10068 Kb ->  5432 Kb
libswrAVX2.so   9828 Kb ->  5200 Kb

If one considers the other binaries which include libmesaswr.la
(swr_dri.so, osmesa, etc) savings might be a bit smaller ;-)
Regardless, thank you for working on this.

I do have an ulterior motive in mind for reducing our footprint - there’s a 
couple follow-up patches to come, one of which will make the swr architectures 
we build configurable, and another which will add a KNL architecture (disabled 
by default).


Total  26360 Kb -> 17632 Kb
---
src/gallium/drivers/swr/Makefile.am | 24 +---
src/gallium/drivers/swr/swr_context.cpp |  2 +-
src/gallium/drivers/swr/swr_loader.cpp  | 14 ++
src/gallium/drivers/swr/swr_screen.h|  2 ++
4 files changed, 22 insertions(+), 20 deletions(-)

diff --git a/src/gallium/drivers/swr/Makefile.am 
b/src/gallium/drivers/swr/Makefile.am
index 4b4bd37..e764e0d 100644
--- a/src/gallium/drivers/swr/Makefile.am
+++ b/src/gallium/drivers/swr/Makefile.am
@@ -26,7 +26,13 @@ AM_CXXFLAGS = $(GALLIUM_DRIVER_CFLAGS) $(SWR_CXX11_CXXFLAGS)

noinst_LTLIBRARIES = libmesaswr.la

-libmesaswr_la_SOURCES = $(LOADER_SOURCES)
+libmesaswr_la_SOURCES = \

+   $(COMMON_CXX_SOURCES) \
+   rasterizer/codegen/gen_knobs.cpp \
+   rasterizer/codegen/gen_knobs.h \
These three now seems to be duplicated across the frontend and
AVX/AVX2 backends. Is that intentional?
Worth adding a note?

Yes, that was intentional - our driver looks at a handful of knobs (primarily 
the hardcoded ones in knobs.h, but also one out of gen_knobs.h).  Adding a knob 
query to the api table didn’t really fit the rest of the api, so I decided to 
take the small hit of duplication.  That does mean the driver can no longer 
override knobs for the core, which is why a previous patch moved the tuning of 
the frontend draw split from the driver to the core.

+libmesaswr_la_CXXFLAGS = \
+   $(SWR_AVX_CXXFLAGS) \
+   -DKNOB_ARCH=KNOB_ARCH_AVX \
With this KNOB, the frontend will be built for AVX. What about AVX2?

This is an artifact of api.h including state.h, which contains both api-exposed 
state structures and internal ones which depend on the simd size.  The former 
are simd-size safe, but as our intrinsics layer is included we need to specify 
some architecture to allow a compile.  I chose the lowest common denominator 
architecture in case some simd-using helper function got called.  I’ll look 
into splitting state.h into internal/external in a future commit.

-COMMON_LIBADD = \
-   $(top_builddir)/src/gallium/auxiliary/libgallium.la \
-   $(top_builddir)/src/mesa/libmesagallium.la \
-   $(LLVM_LIBS)
-
With this gone libswrAVX{,2}_la_LIBADD become empty, so we can drop them.

Will remove.


Can you check that configure --with-gallium-drivers=swr
--enable-gallium-osmesa --disable-dri --enable-glx=gallium-xlib build
fine (needs a second run dropping the latter two options). I cannot
spot anything obvious - just a gut feeling. You might want to sort the
SCons build as well?


gallium-xlib is the configuration we normally build and test with.  A dri 
version builds, but I don’t have a machine that I can actually run it on.

SCons - the build system I keep forgetting.  Working on getting this updated 
and tested for v2 of the patch.

-Tim



Re: [Mesa-dev] [RFC] travis: lower SWR requirement to GCC 4.8, aka std=c++11

2017-07-06 Thread Rowley, Timothy O

On Jul 6, 2017, at 5:39 AM, Emil Velikov <emil.l.veli...@gmail.com> wrote:

On 5 July 2017 at 19:28, Rowley, Timothy O <timothy.o.row...@intel.com> wrote:

On Jul 4, 2017, at 12:01 PM, Emil Velikov <emil.l.veli...@gmail.com> wrote:

From: Emil Velikov <emil.veli...@collabora.com>

With an earlier commit we relaxed the requirement from C++14 to C++11.
Update the build script so that it

Cc: Tim Rowley <timothy.o.row...@intel.com>
Fixes: 0b80b025021 ("swr: relax c++ requirement from c++14 to c++11")
Signed-off-by: Emil Velikov <emil.veli...@collabora.com>
---
Tim, this does _not_ quite work, hence I'm sending it as RFC.
The current build failures can be seen here

Autotools
https://travis-ci.org/evelikov/Mesa/jobs/250043586
Scons
https://travis-ci.org/evelikov/Mesa/jobs/250043595


gcc-4.8 is throwing up some issues; I’ve patched them in the submission I just 
made to mesa-stable (swr: modifications to allow gcc-4.8 compilation).  Still 
get a weird link problem (undefined reference to 
llvm::RTDyldMemoryManager::getSymbolAddressInProcess), but that looks to be 
more of a core gallium problem.

Can we have the GCC 4.8 fix in master first? On the linking issue - I
recall similar one, and updating LLVM solved it.
Let me see if I can dig up some details.

mesa-master swr intrinsic usage has changed a fair bit with the switchover to 
simdlib - half the patch proposed for mesa-stable still applies, with one other 
change needed.  Both of these have been sent to mesa-dev for review.  LLVM 
linkage continues to be a problem for gcc-4.8 on mesa-master.


Although I expect additional issues, since w/o the patch a local build
throws the following:

src/gallium/drivers/swr/rasterizer/core/api.cpp: In function ‘void* 
SwrCreateContext(SWR_CREATECONTEXT_INFO*)’:
src/gallium/drivers/swr/rasterizer/core/api.cpp:111:64: warning: ‘new’ of type 
‘SWR_STATS’ with extended alignment 64 [-Waligned-new=]
   pContext->pStats = new SWR_STATS[pContext->NumWorkerThreads];
  ^
src/gallium/drivers/swr/rasterizer/core/api.cpp:111:64: note: uses ‘void* 
operator new [](std::size_t)’, which does not have an alignment parameter
src/gallium/drivers/swr/rasterizer/core/api.cpp:111:64: note: use 
‘-faligned-new’ to enable C++17 over-aligned new support

src/gallium/drivers/swr/rasterizer/core/threads.cpp: In function ‘void 
CreateThreadPool(SWR_CONTEXT*, THREAD_POOL*)’:
src/gallium/drivers/swr/rasterizer/core/threads.cpp:989:72: warning: ‘new’ of 
type ‘SWR_STATS’ with extended alignment 64 [-Waligned-new=]
   pContext->dcRing[dc].dynState.pStats = new SWR_STATS[numThreads];
  ^
src/gallium/drivers/swr/rasterizer/core/threads.cpp:989:72: note: uses ‘void* 
operator new [](std::size_t)’, which does not have an alignment parameter
src/gallium/drivers/swr/rasterizer/core/threads.cpp:989:72: note: use 
‘-faligned-new’ to enable C++17 over-aligned new support

These are curious - I’m not seeing warnings like this on gcc-4.8.5.

Using gcc (GCC) 7.1.1 20170528, but I've seen it with earlier GCC 7 versions.

Checking SWR_STATS and expanding all the macros gives us:

struct __attribute__((aligned(64))) SWR_STATS
{
   uint64_t DepthPassCount;

   uint64_t PsInvocations;
   uint64_t CsInvocations;
};

... which, strictly speaking, does not even need the aligned bits.
Dropping the ALIGN bits may lead to breakage later on.
Which one should be able to catch at compile time via a combination of
static_assert and sizeof/offsetof.

Installed gcc-7.1 and now see these warnings.  Alignment was intentionally 
added to the stats structure and gave a good speedup with msvc at the time (a 
bit more than a year ago).  Investigating if this is still the case and if so 
what the clean c++11 friendly solution is.
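
One possible C++11-friendly direction, offered only as a sketch and not as what swr ended up doing: keep the 64-byte alignment on the type, pin the layout with static_assert as Emil suggests, and allocate arrays through an explicitly aligned allocator instead of plain new[]. The struct below is a stand-in built from the three fields quoted above (the real SWR_STATS may have more), the helper names are hypothetical, and posix_memalign is a POSIX call, so a Windows build would need a different allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <stdlib.h>   // posix_memalign (POSIX, not standard C++)

// Stand-in for SWR_STATS using only the fields quoted in the thread.
struct alignas(64) Stats {
    uint64_t DepthPassCount;
    uint64_t PsInvocations;
    uint64_t CsInvocations;
};

// Catch layout changes at compile time.
static_assert(alignof(Stats) == 64, "Stats must be cache-line aligned");
static_assert(sizeof(Stats) % 64 == 0, "array elements must stay aligned");

// Allocate 'count' zero-initialized Stats at 64-byte alignment without
// relying on C++17 over-aligned new. Returns nullptr on failure.
inline Stats *alloc_stats(std::size_t count)
{
    assert(count > 0);
    void *mem = nullptr;
    if (posix_memalign(&mem, alignof(Stats), count * sizeof(Stats)) != 0)
        return nullptr;
    Stats *stats = static_cast<Stats *>(mem);
    for (std::size_t i = 0; i < count; ++i)
        new (&stats[i]) Stats();   // value-initialize each element
    return stats;
}

// Stats is trivially destructible, so releasing the block is enough.
inline void free_stats(Stats *stats)
{
    free(stats);
}
```

Whether the alignment still pays off is a separate measurement question; the sketch only shows that the gcc-7 warning can be avoided without dropping the attribute.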


Thanks
Emil



Re: [Mesa-dev] [RFC] travis: lower SWR requirement to GCC 4.8, aka std=c++11

2017-07-05 Thread Rowley, Timothy O

> On Jul 4, 2017, at 12:01 PM, Emil Velikov  wrote:
> 
> From: Emil Velikov 
> 
> With an earlier commit we relaxed the requirement from C++14 to C++11.
> Update the build script so that it
> 
> Cc: Tim Rowley
> Fixes: 0b80b025021 ("swr: relax c++ requirement from c++14 to c++11")
> Signed-off-by: Emil Velikov 
> ---
> Tim, this does _not_ quite work, hence I'm sending it as RFC.
> The current build failures can be seen here
> 
> Autotools
> https://travis-ci.org/evelikov/Mesa/jobs/250043586
> Scons
> https://travis-ci.org/evelikov/Mesa/jobs/250043595
> 

gcc-4.8 is throwing up some issues; I’ve patched them in the submission I just 
made to mesa-stable (swr: modifications to allow gcc-4.8 compilation).  Still 
get a weird link problem (undefined reference to 
llvm::RTDyldMemoryManager::getSymbolAddressInProcess), but that looks to be 
more of a core gallium problem.

> Although I expect additional issues, since w/o the patch a local build 
> throws the following:
> 
> src/gallium/drivers/swr/rasterizer/core/api.cpp: In function ‘void* 
> SwrCreateContext(SWR_CREATECONTEXT_INFO*)’:
> src/gallium/drivers/swr/rasterizer/core/api.cpp:111:64: warning: ‘new’ of 
> type ‘SWR_STATS’ with extended alignment 64 [-Waligned-new=]
> pContext->pStats = new SWR_STATS[pContext->NumWorkerThreads];
>^
> src/gallium/drivers/swr/rasterizer/core/api.cpp:111:64: note: uses ‘void* 
> operator new [](std::size_t)’, which does not have an alignment parameter
> src/gallium/drivers/swr/rasterizer/core/api.cpp:111:64: note: use 
> ‘-faligned-new’ to enable C++17 over-aligned new support
> 
> src/gallium/drivers/swr/rasterizer/core/threads.cpp: In function ‘void 
> CreateThreadPool(SWR_CONTEXT*, THREAD_POOL*)’:
> src/gallium/drivers/swr/rasterizer/core/threads.cpp:989:72: warning: ‘new’ of 
> type ‘SWR_STATS’ with extended alignment 64 [-Waligned-new=]
> pContext->dcRing[dc].dynState.pStats = new SWR_STATS[numThreads];
>^
> src/gallium/drivers/swr/rasterizer/core/threads.cpp:989:72: note: uses ‘void* 
> operator new [](std::size_t)’, which does not have an alignment parameter
> src/gallium/drivers/swr/rasterizer/core/threads.cpp:989:72: note: use 
> ‘-faligned-new’ to enable C++17 over-aligned new support

These are curious - I’m not seeing warnings like this on gcc-4.8.5.

> 
> 
> .travis.yml | 12 
> 1 file changed, 4 insertions(+), 8 deletions(-)
> 
> diff --git a/.travis.yml b/.travis.yml
> index 4fde6f45f4a..fa52bf96f16 100644
> --- a/.travis.yml
> +++ b/.travis.yml
> @@ -57,8 +57,8 @@ matrix:
> - MAKE_CHECK_COMMAND="true"
> - LLVM_VERSION=3.9
> - LLVM_CONFIG="llvm-config-${LLVM_VERSION}"
> -- OVERRIDE_CC="gcc-5"
> -- OVERRIDE_CXX="g++-5"
> +- OVERRIDE_CC="gcc-4.8"
> +- OVERRIDE_CXX="g++-4.8"
> - DRI_LOADERS="--disable-glx --disable-gbm --disable-egl"
> - DRI_DRIVERS=""
> - GALLIUM_ST="--enable-dri --disable-opencl --disable-xa 
> --disable-nine --disable-xvmc --disable-vdpau --disable-va --disable-omx 
> --disable-gallium-osmesa"
> @@ -67,13 +67,11 @@ matrix:
>   addons:
> apt:
>   sources:
> -- ubuntu-toolchain-r-test
> - llvm-toolchain-trusty-3.9
>   packages:
> # LLVM packaging is broken and misses these dependencies
> - libedit-dev
> # From sources above
> -- g++-5
> - llvm-3.9-dev
> # Common
> - xz-utils
> @@ -250,19 +248,17 @@ matrix:
> - LLVM_CONFIG="llvm-config-${LLVM_VERSION}"
> # Keep it symmetrical to the make build. There's no actual SWR, yet.
> - SCONS_CHECK_COMMAND="true"
> -- OVERRIDE_CC="gcc-5"
> -- OVERRIDE_CXX="g++-5"
> +- OVERRIDE_CC="gcc-4.8"
> +- OVERRIDE_CXX="g++-4.8"
>   addons:
> apt:
>   sources:
> -- ubuntu-toolchain-r-test
> - llvm-toolchain-trusty-3.9
>   packages:
> - scons
> # LLVM packaging is broken and misses these dependencies
> - libedit-dev
> # From sources above
> -- g++-5
> - llvm-3.9-dev
> # Common
> - xz-utils
> -- 
> 2.13.0
> 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] swr: Minor cleanup of variable usage, no functional change.

2017-06-30 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley 

> On Jun 29, 2017, at 2:41 PM, Bruce Cherniak  wrote:
> 
> In swr_update_derived, for consistency, index buffer validation should
> be using the p_draw_info copy "info" rather than referencing
> p_draw_info.
> 
> No functional change.
> ---
> src/gallium/drivers/swr/swr_state.cpp | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/src/gallium/drivers/swr/swr_state.cpp 
> b/src/gallium/drivers/swr/swr_state.cpp
> index 7a8786d96f..03dc324afe 100644
> --- a/src/gallium/drivers/swr/swr_state.cpp
> +++ b/src/gallium/drivers/swr/swr_state.cpp
> @@ -1293,7 +1293,7 @@ swr_update_derived(struct pipe_context *pipe,
>  const uint8_t *p_data;
>  uint32_t size, pitch;
> 
> - pitch = p_draw_info->index_size ? p_draw_info->index_size : 
> sizeof(uint32_t);
> + pitch = info.index_size ? info.index_size : sizeof(uint32_t);
>  index_type = swr_convert_index_type(pitch);
> 
>  if (!info.has_user_indices) {
> @@ -1319,7 +1319,7 @@ swr_update_derived(struct pipe_context *pipe,
>  }
> 
>  SWR_INDEX_BUFFER_STATE swrIndexBuffer;
> - swrIndexBuffer.format = 
> swr_convert_index_type(p_draw_info->index_size);
> + swrIndexBuffer.format = swr_convert_index_type(info.index_size);
>  swrIndexBuffer.pIndices = p_data;
>  swrIndexBuffer.size = size;
> 
> -- 
> 2.11.0
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/8] swr/rast: Split backend.cpp to improve compile time

2017-06-29 Thread Rowley, Timothy O

> On Jun 28, 2017, at 3:56 AM, Emil Velikov <emil.l.veli...@gmail.com> wrote:
> 
> On 26 June 2017 at 17:14, Rowley, Timothy O <timothy.o.row...@intel.com> 
> wrote:
>> 
>> On Jun 26, 2017, at 7:57 AM, Emil Velikov <emil.l.veli...@gmail.com> wrote:
> 
>>>> +.INTERMEDIATE: backend.intermediate
>>>> 
>>> I have limited experience with .INTERMEDIATE and it didn't seem to
>>> bring single/incremental build times improvements.
>>> Have you seen any on your end? If not I'll just drop it.
>> 
>> 
>> I’m not really familiar with .INTERMEDIATE myself; found it when googling
>> around looking for a way to specify a code generator rule that produced
>> multiple files.  If there’s a better/cleaner way of doing this I’d like to
>> hear about it.
>> 
> AFAICT one can omit the line altogether. I doubt it will hurt
> anything, so don't bother removing it just yet.
> 
>>> Hardcoding file names in generator scripts tends to be a bad idea. One
>>> example is the extra code needed to generate the cmake bits :-)
>>> One could prune that, but it's not a priority AFAICT.
>>> 
>>> 
>> I would like to be able to wildcard on the generated name, but it seems that
>> automake wants to have a static list of filenames at invocation.  Our cmake
>> approach internally generates a cmake fragment that is included by the
>> parent cmake, which is a little confusing but adds flexibility.
>> 
> Automake can use wildcards and template/suffix rules. Although we try
> to omit the former.
> 
> Can you share the flexibility points - I can only think of drawbacks.
> 
> I'm assuming your flow is as follows:
> A Ok, let's build the project
> B First go into go and execute a 'random file with magic arguments'
> C Now you can build via cmake
> 
> Do any of the following and you're in a world of hurt:
> - Forget to do B - enjoy the strange error messages that you'll get ;-)
> - Do B, even when the file or any of its dependencies have not changed.
> Your cmake files (and hence whole build) will be rebuilt unnecessarily.
> 
> I think you/the team might have experienced some of those :-P
> Either way, I'll stop now since it's getting a tad much.

Ok, dug into the internal build system for this and the cmake solution isn’t as 
elegant as I thought.  We have a wrapper python script that runs cmake in the 
codegen directory to generate the cmake fragments and generated files based on 
the dependencies, then a normal cmake for the actual build that includes these 
fragments.  As long as people use that wrapper script everything builds without 
unnecessary work.

> -Emil
> P.S. Can we bribe you back to use plain text emails - I had to redo
> the email formatting :-\

This should be plain text.


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] swr: Remove need to allocate vertex buffer scratch space all in one go.

2017-06-29 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley

On Jun 28, 2017, at 1:42 PM, Bruce Cherniak wrote:

Deferred deletion (via "fence_work") has obsoleted the need to allocate
all client vertex buffer scratch space in a single chunk.  Scratch
allocations are now valid until the referenced fence is complete.
---
src/gallium/drivers/swr/swr_state.cpp | 25 ++---
1 file changed, 2 insertions(+), 23 deletions(-)

diff --git a/src/gallium/drivers/swr/swr_state.cpp 
b/src/gallium/drivers/swr/swr_state.cpp
index 6dc06ed156..7a8786d96f 100644
--- a/src/gallium/drivers/swr/swr_state.cpp
+++ b/src/gallium/drivers/swr/swr_state.cpp
@@ -1219,32 +1219,12 @@ swr_update_derived(struct pipe_context *pipe,
*/
   if (ctx->dirty & SWR_NEW_VERTEX ||
  (p_draw_info && p_draw_info->index_size)) {
-  uint32_t scratch_total;
-  uint8_t *scratch = NULL;

  /* If being called by swr_draw_vbo, copy draw details */
  struct pipe_draw_info info = {0};
  if (p_draw_info)
 info = *p_draw_info;

-  /* We must get all the scratch space in one go */
-  scratch_total = 0;
-  for (UINT i = 0; i < ctx->num_vertex_buffers; i++) {
- struct pipe_vertex_buffer *vb = &ctx->vertex_buffer[i];
-
- if (!vb->is_user_buffer)
-continue;
-
- uint32_t elems, base, size;
- swr_user_vbuf_range(&info, ctx->velems, vb, i, &elems, &base, &size);
- scratch_total += AlignUp(size, 4);
-  }
-
-  if (scratch_total) {
- scratch = (uint8_t *)swr_copy_to_scratch_space(
-   ctx, &ctx->scratch->vertex_buffer, NULL, scratch_total);
-  }
-
  /* vertex buffers */
  SWR_VERTEX_BUFFER_STATE swrVertexBuffers[PIPE_MAX_ATTRIBS];
  for (UINT i = 0; i < ctx->num_vertex_buffers; i++) {
@@ -1289,9 +1269,8 @@ swr_update_derived(struct pipe_context *pipe,
/* Copy only needed vertices to scratch space */
size = AlignUp(size, 4);
const void *ptr = (const uint8_t *) vb->buffer.user + base;
-memcpy(scratch, ptr, size);
-ptr = scratch;
-scratch += size;
+ptr = (uint8_t *)swr_copy_to_scratch_space(
+   ctx, &ctx->scratch->vertex_buffer, ptr, size);
p_data = (const uint8_t *)ptr - base;
 }

--
2.11.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev



Re: [Mesa-dev] [PATCH] swr: conditionally validate vertex buffer state

2017-06-29 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley

On Jun 27, 2017, at 5:49 PM, Bruce Cherniak wrote:

Vertex buffer state doesn't need to be validated on every call,
only on dirty _NEW_VERTEX or indexed draws.

Unconditional validation was introduced as part of patch 330d0607ed6,
"remove pipe_index_buffer and set_index_buffer", with the expectation
we'd optimize later.
---
src/gallium/drivers/swr/swr_state.cpp | 9 +
1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/swr/swr_state.cpp 
b/src/gallium/drivers/swr/swr_state.cpp
index f65e642753..6dc06ed156 100644
--- a/src/gallium/drivers/swr/swr_state.cpp
+++ b/src/gallium/drivers/swr/swr_state.cpp
@@ -1212,12 +1212,13 @@ swr_update_derived(struct pipe_context *pipe,
  SwrSetViewports(ctx->swrContext, 1, vp, vpm);
   }

-   /* Set vertex & index buffers */
-   /* (using draw info if called by swr_draw_vbo) */
-   /* TODO: This is always true, because the index buffer comes from
+   /* Set vertex & index buffers
+* (using draw info if called by swr_draw_vbo)
+* If indexed draw, revalidate since index buffer comes from
* pipe_draw_info.
*/
-   if (1 || ctx->dirty & SWR_NEW_VERTEX) {
+   if (ctx->dirty & SWR_NEW_VERTEX ||
+  (p_draw_info && p_draw_info->index_size)) {
  uint32_t scratch_total;
  uint8_t *scratch = NULL;

--
2.11.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev



Re: [Mesa-dev] [PATCH 5/8] swr/rast: Switch intrinsic usage to SIMDLib

2017-06-26 Thread Rowley, Timothy O

> On Jun 26, 2017, at 8:11 AM, Emil Velikov  wrote:
> 
> Hi Tim,
> 
> On 22 June 2017 at 22:13, Tim Rowley  wrote:
>> Switch from a macro-based simd intrinsics layer to a more C++
>> implementation, which also adds AVX512 optimizations to 128-bit
>> and 256-bit SIMD.
> 
>> +   rasterizer/common/simdlib_128_avx.inl \
>> +   rasterizer/common/simdlib_128_avx2.inl \
>> +   rasterizer/common/simdlib_128_avx512.inl \
>> +   rasterizer/common/simdlib_256_avx.inl \
>> +   rasterizer/common/simdlib_256_avx2.inl \
>> +   rasterizer/common/simdlib_256_avx512.inl \
>> +   rasterizer/common/simdlib_512_avx512.inl \
>> +   rasterizer/common/simdlib_512_avx512_masks.inl \
>> +   rasterizer/common/simdlib_512_emu.inl \
>> +   rasterizer/common/simdlib_512_emu_masks.inl \
> Your commit message said "make dist/check" but I'd imagine you used
> SCons for the whole series as well, correct?
> 
> Merely double-checking as some versions of SCons had issues with non
> {c,h}{,pp} files listed as sources, IIRC. Sadly I don't recall the
> specifics.

I thought this had been tested internally on scons; turns out I was mistaken - 
we will get it working for the next version of the commit.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/8] swr/rast: Split rasterizer.cpp to improve compile times

2017-06-26 Thread Rowley, Timothy O

On Jun 26, 2017, at 8:02 AM, Emil Velikov wrote:

On 22 June 2017 at 22:13, Tim Rowley wrote:
Hardcode split to four files currently.  Decreases swr build
time on KNL by over 50%.
Out of curiosity what is KNL?

KNL is the Intel Xeon Phi x200 Processor family, codenamed Knights Landing, 
which has between 64 and 72 cores with AVX512.

Also, over 50% decrease - time to pop the champagne ;-)

---
src/gallium/drivers/swr/Makefile.am|   36 +-
src/gallium/drivers/swr/Makefile.sources   |2 +-
src/gallium/drivers/swr/SConscript |   24 +-
.../drivers/swr/rasterizer/codegen/gen_backends.py |   15 +-
.../codegen/templates/gen_rasterizer.cpp   |   42 +
src/gallium/drivers/swr/rasterizer/core/api.cpp|1 +
.../drivers/swr/rasterizer/core/multisample.cpp|   48 -
.../drivers/swr/rasterizer/core/rasterizer.cpp | 1788 +++-
.../drivers/swr/rasterizer/core/rasterizer.h   |   31 +-
.../drivers/swr/rasterizer/core/rasterizer_impl.h  | 1376 +++
10 files changed, 1738 insertions(+), 1625 deletions(-)
create mode 100644 
src/gallium/drivers/swr/rasterizer/codegen/templates/gen_rasterizer.cpp
delete mode 100644 src/gallium/drivers/swr/rasterizer/core/multisample.cpp
create mode 100644 src/gallium/drivers/swr/rasterizer/core/rasterizer_impl.h

diff --git a/src/gallium/drivers/swr/Makefile.am 
b/src/gallium/drivers/swr/Makefile.am
index 0daec90..1a69cfc 100644
--- a/src/gallium/drivers/swr/Makefile.am
+++ b/src/gallium/drivers/swr/Makefile.am
@@ -67,7 +67,12 @@ BUILT_SOURCES = \
   rasterizer/core/backends/gen_BackendPixelRate1.cpp \
   rasterizer/core/backends/gen_BackendPixelRate2.cpp \
   rasterizer/core/backends/gen_BackendPixelRate3.cpp \
-   rasterizer/core/backends/gen_BackendPixelRate.hpp
+   rasterizer/core/backends/gen_BackendPixelRate.hpp \
+   rasterizer/core/backends/gen_rasterizer0.cpp \
+   rasterizer/core/backends/gen_rasterizer1.cpp \
+   rasterizer/core/backends/gen_rasterizer2.cpp \
+   rasterizer/core/backends/gen_rasterizer3.cpp \
+   rasterizer/core/backends/gen_rasterizer.hpp

MKDIR_GEN = $(AM_V_at)$(MKDIR_P) $(@D)
PYTHON_GEN = $(AM_V_GEN)$(PYTHON2) $(PYTHON_FLAGS)
@@ -170,6 +175,32 @@ backend.intermediate: rasterizer/codegen/gen_backends.py 
rasterizer/codegen/temp
   --cpp \
   --hpp

+rasterizer/core/backends/gen_rasterizer0.cpp \
+rasterizer/core/backends/gen_rasterizer1.cpp \
+rasterizer/core/backends/gen_rasterizer2.cpp \
+rasterizer/core/backends/gen_rasterizer3.cpp \
+rasterizer/core/backends/gen_rasterizer.hpp: \
+rasterizer.intermediate
+
+# 5 SWR_MULTISAMPLE_TYPE_COUNT
+# 2 CenterPattern
+# 2 Conservative
+# 3 SWR_INPUT_COVERAGE_COUNT
+# 5 STATE_VALID_TRI_EDGE_COUNT
+# 2 RasterScissorEdges
+
+.INTERMEDIATE: rasterizer.intermediate
Same question/suggestion as in PATCH 1 - please add a note (helps XXX) or drop

With that from build POV
Reviewed-by: Emil Velikov

Mentioned in my other mail that I’m not wed to the .INTERMEDIATE approach; I’ll 
address this the same way we decide upon for the backend split-up.


--- a/src/gallium/drivers/swr/rasterizer/codegen/gen_backends.py
+++ b/src/gallium/drivers/swr/rasterizer/codegen/gen_backends.py

+
+args = parser.parse_args(args)

-args = parser.parse_args(args);

Unrelated cleanup?

I’ll try to pull the cleanups into a separate commit.


-Emil

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/8] swr/rast: Split backend.cpp to improve compile time

2017-06-26 Thread Rowley, Timothy O

On Jun 26, 2017, at 7:57 AM, Emil Velikov wrote:

Hi Tim,

On 22 June 2017 at 22:13, Tim Rowley wrote:
Hardcode split to four files currently.  Decreases swr build
time on a quad-core by ~10%.
---
src/gallium/drivers/swr/Makefile.am|   26 +-
src/gallium/drivers/swr/Makefile.sources   |4 +
src/gallium/drivers/swr/SConscript |   19 +-
.../drivers/swr/rasterizer/codegen/gen_backends.py |   38 +-
.../drivers/swr/rasterizer/codegen/gen_common.py   |7 +
.../rasterizer/codegen/templates/gen_backend.cpp   |1 +
.../codegen/templates/gen_header_init.hpp  |   43 +
src/gallium/drivers/swr/rasterizer/core/api.cpp|7 +-
.../drivers/swr/rasterizer/core/backend.cpp|  809 +--
src/gallium/drivers/swr/rasterizer/core/backend.h  | 1033 +--
.../drivers/swr/rasterizer/core/backend_clear.cpp  |  281 ++
.../drivers/swr/rasterizer/core/backend_impl.h | 1067 
.../drivers/swr/rasterizer/core/backend_sample.cpp |  345 +++
.../swr/rasterizer/core/backend_singlesample.cpp   |  321 ++
14 files changed, 2160 insertions(+), 1841 deletions(-)
create mode 100644 
src/gallium/drivers/swr/rasterizer/codegen/templates/gen_header_init.hpp
create mode 100644 src/gallium/drivers/swr/rasterizer/core/backend_clear.cpp
create mode 100644 src/gallium/drivers/swr/rasterizer/core/backend_impl.h
create mode 100644 src/gallium/drivers/swr/rasterizer/core/backend_sample.cpp
create mode 100644 
src/gallium/drivers/swr/rasterizer/core/backend_singlesample.cpp

diff --git a/src/gallium/drivers/swr/Makefile.am 
b/src/gallium/drivers/swr/Makefile.am
index 6650abd..0daec90 100644
--- a/src/gallium/drivers/swr/Makefile.am
+++ b/src/gallium/drivers/swr/Makefile.am
@@ -34,6 +34,7 @@ COMMON_CXXFLAGS = \
   $(LLVM_CXXFLAGS) \
   $(SWR_CXX11_CXXFLAGS) \
   -I$(builddir)/rasterizer/codegen \
+   -I$(builddir)/rasterizer/core \
   -I$(builddir)/rasterizer/jitter \
   -I$(builddir)/rasterizer/archrast \
   -I$(srcdir)/rasterizer \
@@ -62,7 +63,11 @@ BUILT_SOURCES = \
   rasterizer/archrast/gen_ar_event.cpp \
   rasterizer/archrast/gen_ar_eventhandler.hpp \
   rasterizer/archrast/gen_ar_eventhandlerfile.hpp \
-   rasterizer/core/gen_BackendPixelRate0.cpp
+   rasterizer/core/backends/gen_BackendPixelRate0.cpp \
+   rasterizer/core/backends/gen_BackendPixelRate1.cpp \
+   rasterizer/core/backends/gen_BackendPixelRate2.cpp \
+   rasterizer/core/backends/gen_BackendPixelRate3.cpp \
+   rasterizer/core/backends/gen_BackendPixelRate.hpp

MKDIR_GEN = $(AM_V_at)$(MKDIR_P) $(@D)
PYTHON_GEN = $(AM_V_GEN)$(PYTHON2) $(PYTHON_FLAGS)
@@ -140,20 +145,30 @@ rasterizer/archrast/gen_ar_eventhandlerfile.hpp: 
rasterizer/codegen/gen_archrast
   --output rasterizer/archrast/gen_ar_eventhandlerfile.hpp \
   --gen_eventhandlerfile_h

+rasterizer/core/backends/gen_BackendPixelRate0.cpp \
+rasterizer/core/backends/gen_BackendPixelRate1.cpp \
+rasterizer/core/backends/gen_BackendPixelRate2.cpp \
+rasterizer/core/backends/gen_BackendPixelRate3.cpp \
+rasterizer/core/backends/gen_BackendPixelRate.hpp: \
+backend.intermediate
+
# 5 SWR_MULTISAMPLE_TYPE_COUNT
# 2 SWR_MSAA_SAMPLE_PATTERN_COUNT
# 3 SWR_INPUT_COVERAGE_COUNT
# 2 centroid
# 2 forcedSampleCount
# 2 canEarlyZ
-rasterizer/core/gen_BackendPixelRate0.cpp: rasterizer/codegen/gen_backends.py 
rasterizer/codegen/templates/gen_backend.cpp
+
+.INTERMEDIATE: backend.intermediate
I have limited experience with .INTERMEDIATE and it didn't seem to
bring single/incremental build times improvements.
Have you seen any on your end? If not I'll just drop it.

I’m not really familiar with .INTERMEDIATE myself; found it when googling 
around looking for a way to specify a code generator rule that produced 
multiple files.  If there’s a better/cleaner way of doing this I’d like to hear 
about it.


+backend.intermediate: rasterizer/codegen/gen_backends.py 
rasterizer/codegen/templates/gen_backend.cpp 
rasterizer/codegen/templates/gen_header_init.hpp
   $(MKDIR_GEN)
   $(PYTHON_GEN) \
   $(srcdir)/rasterizer/codegen/gen_backends.py \
-   --outdir rasterizer/core \
+   --outdir rasterizer/core/backends \
   --dim 5 2 3 2 2 2 \
-   --split 0 \
-   --cpp
+   --numfiles 4 \
+   --cpp \
+   --hpp

Hardcoding file names in generator scripts tends to be a bad idea. One
example is the extra code needed to generate the cmake bits :-)
One could prune that, but it's not a priority AFAICT.

I would like to be able to wildcard on the generated name, but it seems that 
automake wants to have a static list of filenames at invocation.  Our cmake 
approach internally generates a cmake fragment that is included by 

Re: [Mesa-dev] [PATCH 0/8] swr: update rasterizer

2017-06-26 Thread Rowley, Timothy O

> On Jun 26, 2017, at 7:41 AM, Emil Velikov  wrote:
> On 22 June 2017 at 22:12, Tim Rowley  wrote:
>> Highlights include splitting the heavily templated files into multiple
>> chunks to speed compile (2x for a large machine), and switching the
>> simd intrinsic usage from a macro-based header to a more c++ feeling
>> library.
>> 
> Yay \o/. Out of curiosity - does the simd library bring much more
> apart from a C++ feel?

A couple major intentions, mainly to produce better code for avx512:

* hide the differences in masking operations - avx/avx2 uses a normal ymm 
register for masking, while avx512 has separate mask registers

* allow reduced vector width operations to be implemented in terms of avx512 
code, so that a larger register set and mask registers can be used
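
To make the first point a little more concrete, here is a rough sketch of how a traits type can hide the two mask representations behind one interface. It emulates the lanes with plain arrays rather than real intrinsics, and every name in it (`VectorMaskTraits`, `BitMaskTraits`, `Blend`) is hypothetical, not the actual SIMDLib API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 8;
using Vec = std::array<float, kLanes>;

// AVX/AVX2 style: the mask lives in an ordinary vector register and a
// lane is selected when the top bit of its 32-bit element is set.
struct VectorMaskTraits {
    using Mask = std::array<std::uint32_t, kLanes>;

    static Mask FromBits(std::uint8_t bits)
    {
        Mask m{};
        for (std::size_t i = 0; i < kLanes; ++i)
            m[i] = ((bits >> i) & 1u) ? 0xFFFFFFFFu : 0u;
        return m;
    }

    static bool Test(const Mask& m, std::size_t lane)
    {
        return (m[lane] & 0x80000000u) != 0;
    }
};

// AVX512 style: a dedicated mask register holding one bit per lane.
struct BitMaskTraits {
    using Mask = std::uint8_t;

    static Mask FromBits(std::uint8_t bits) { return bits; }

    static bool Test(Mask m, std::size_t lane)
    {
        return ((m >> lane) & 1u) != 0;
    }
};

// Caller-facing operation: identical source for either representation.
// Picks b[i] where the mask selects lane i, otherwise a[i].
template <typename Traits>
Vec Blend(const Vec& a, const Vec& b, const typename Traits::Mask& mask)
{
    Vec r{};
    for (std::size_t i = 0; i < kLanes; ++i)
        r[i] = Traits::Test(mask, i) ? b[i] : a[i];
    return r;
}
```

Code written against `Blend<Traits>` does not need to know whether the mask is a full vector or a bitfield, which is the kind of difference the library is meant to hide.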

> Did you notice the errors in the Travis build [1]? For some reason
> they don't flag up when building locally, although a few C++17
> warnings did pop up. Speaking of which, since we're back to C++11 for
> SWR can we toggle back to GCC 4.8(.1) for Travis?
> 
> Can you guys look at those, please... in case you haven't already.

Sorry, had a patch for this ready to go Friday, but we were working through 
some other issues and I forgot to send it to the list.  I’ve done so now.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] swr: set an explicit clear_rect if scissor is not enabled.

2017-06-26 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley

On Jun 26, 2017, at 10:26 AM, Bruce Cherniak wrote:

Fix regression of "no rendering" on simple apps like glxgears by
setting an explicit full surface clear_rect when scissor is not
enabled.

This regressed with commit 00173d91 "st/mesa: don't set 16
scissors and 16 viewports if they're unused" due to an assumption
that a default scissor rect is always set, which was the case prior
to this optimization.
---
src/gallium/drivers/swr/swr_clear.cpp | 10 +-
1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/swr/swr_clear.cpp 
b/src/gallium/drivers/swr/swr_clear.cpp
index 53f4e02d45..3a35805a7a 100644
--- a/src/gallium/drivers/swr/swr_clear.cpp
+++ b/src/gallium/drivers/swr/swr_clear.cpp
@@ -68,11 +68,19 @@ swr_clear(struct pipe_context *pipe,
   ((union pipe_color_union *)color)->f[3] = 1.0; /* cast off your const'd-ness 
*/
#endif

+   SWR_RECT clear_rect;
+   /* If enabled, clear to scissor; otherwise clear full surface */
+   if (ctx->rasterizer && ctx->rasterizer->scissor) {
+  clear_rect = ctx->swr_scissor;
+   } else {
+  clear_rect = {0, 0, (int32_t)fb->width, (int32_t)fb->height};
+   }
+
   for (unsigned i = 0; i < layers; ++i) {
  swr_update_draw_context(ctx);
  SwrClearRenderTarget(ctx->swrContext, clearMask, i,
   color->f, depth, stencil,
-   ctx->swr_scissor);
+   clear_rect);

  // Mask out the attachments that are out of layers.
  if (fb->zsbuf &&
--
2.11.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev



Re: [Mesa-dev] [PATCH] swr: invalidate attachment on transition change

2017-06-22 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley

On Jun 20, 2017, at 11:42 AM, George Kyriazis wrote:

Consider the following RT attachment order:
1. Attach surfaces attachments 0 & 1, and render with them
2. Detach 0 & 1
3. Re-attach 0 & 1 to different surfaces
4. Render with the new attachment

The definition of a tile being resolved is that local changes have been
flushed out to the surface, hence there is no need to reload the tile before
it's written to.  For an invalid tile, the tile has to be reloaded from
the surface before rendering.

Stage (2) was marking hot tiles for attachments 0 & 1 as RESOLVED,
which means that the hot tiles can be written out to memory with no
need to read them back in (they are "clean").  They need to be marked as
resolved here, because a surface may be destroyed after a detach, and we
don't want to have un-resolved tiles that may force a readback from a
NULL (destroyed) surface.  (Part of a destroy is to detach all attachments
first.)

In stage (3), during the no att -> att transition, we need to realize that the
"new" surface tiles need to be fetched fresh from the new surface, instead
of using the resolved tiles, which belong to a stale attachment.

This is done by marking the hot tiles as invalid in stage (3), when we realize
that a new attachment is being made, so that they are re-fetched during
rendering in stage (4).

Also note that hot tiles are indexed by attachment.

- Fixes VTK dual depth-peeling tests.
- No piglit changes
---
src/gallium/drivers/swr/swr_draw.cpp   | 19 +++
src/gallium/drivers/swr/swr_resource.h |  4 
src/gallium/drivers/swr/swr_state.cpp  |  5 +
3 files changed, 28 insertions(+)

diff --git a/src/gallium/drivers/swr/swr_draw.cpp 
b/src/gallium/drivers/swr/swr_draw.cpp
index 03c82a7..ac300e2 100644
--- a/src/gallium/drivers/swr/swr_draw.cpp
+++ b/src/gallium/drivers/swr/swr_draw.cpp
@@ -215,6 +215,25 @@ swr_finish(struct pipe_context *pipe)
   swr_fence_reference(pipe->screen, &fence, NULL);
}

+/*
+ * Invalidate tiles so they can be reloaded back when needed
+ */
+void
+swr_invalidate_render_target(struct pipe_context *pipe,
+ uint32_t attachment,
+ uint16_t width, uint16_t height)
+{
+   struct swr_context *ctx = swr_context(pipe);
+
+   /* grab the rect from the passed in arguments */
+   swr_update_draw_context(ctx);
+   SWR_RECT full_rect =
+  {0, 0, (int32_t)width, (int32_t)height};
+   SwrInvalidateTiles(ctx->swrContext,
+  1 << attachment,
+  full_rect);
+}
+

/*
 * Store SWR HotTiles back to renderTarget surface.
diff --git a/src/gallium/drivers/swr/swr_resource.h 
b/src/gallium/drivers/swr/swr_resource.h
index ae9954c..4effd46 100644
--- a/src/gallium/drivers/swr/swr_resource.h
+++ b/src/gallium/drivers/swr/swr_resource.h
@@ -96,6 +96,10 @@ swr_resource_data(struct pipe_resource *resource)
}


+void swr_invalidate_render_target(struct pipe_context *pipe,
+  uint32_t attachment,
+  uint16_t width, uint16_t height);
+
void swr_store_render_target(struct pipe_context *pipe,
 uint32_t attachment,
 enum SWR_TILE_STATE post_tile_state);
diff --git a/src/gallium/drivers/swr/swr_state.cpp 
b/src/gallium/drivers/swr/swr_state.cpp
index 08549e5..deae4e6 100644
--- a/src/gallium/drivers/swr/swr_state.cpp
+++ b/src/gallium/drivers/swr/swr_state.cpp
@@ -933,6 +933,11 @@ swr_change_rt(struct swr_context *ctx,
   * INVALID so they are reloaded from surface. */
  swr_store_render_target(&ctx->pipe, attachment, SWR_TILE_INVALID);
  need_fence = true;
+   } else {
+  /* if no previous attachment, invalidate tiles that may be marked
+   * RESOLVED because of an old attachment */
+  swr_invalidate_render_target(&ctx->pipe, attachment, sf->width, 
sf->height);
+  /* no need to set fence here */
   }

   /* Make new attachment */
--
2.7.4
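
The four stages described in the commit message can be sketched as a tiny state model. This is only an illustration under simplifying assumptions (one pixel per surface, one hot tile per attachment slot); `Surface`, `AttachmentSlot` and the `TileState` values are hypothetical and do not mirror the real swr structures.

```cpp
#include <cassert>

// Hypothetical tile states, following the vocabulary of the message above.
enum class TileState { Invalid, Dirty, Resolved };

struct Surface {
    int pixel; // a whole surface reduced to one value for illustration
};

struct AttachmentSlot {
    Surface* surface = nullptr;          // currently attached surface, if any
    int cached = 0;                      // hot tile contents
    TileState state = TileState::Invalid;

    // Stage 2: flush local changes and mark the tile RESOLVED, so that a
    // surface destroyed after the detach never needs to be read back.
    void Detach()
    {
        if (!surface)
            return;
        if (state == TileState::Dirty)
            surface->pixel = cached;
        state = TileState::Resolved;
        surface = nullptr;
    }

    // Stage 3: whether or not something was attached before, the tile must
    // end up INVALID so that it is fetched from the new surface. Without the
    // invalidate, a RESOLVED tile left over from the old surface is reused.
    void Attach(Surface* s)
    {
        Detach();
        state = TileState::Invalid;
        surface = s;
    }

    // Stages 1 and 4: load the tile if it is invalid, then modify it.
    void Render(int delta)
    {
        if (state == TileState::Invalid) {
            cached = surface->pixel;
            state = TileState::Resolved;
        }
        cached += delta;
        state = TileState::Dirty;
    }
};
```

If `Attach` skipped the invalidate step, rendering to the second surface would start from the first surface's stale tile contents, which is the failure mode the patch addresses.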

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev



Re: [Mesa-dev] [PATCH] swr/rast: Include definition of missing function

2017-06-20 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley

On Jun 20, 2017, at 12:53 PM, George Kyriazis wrote:

Inline function SWR_MULTISAMPLE_POS::PrecalcSampleData() was missing
definition.  Include definition in core/state_funcs.h.

Fixes windows build.
---
src/gallium/drivers/swr/swr_state.cpp | 1 +
1 file changed, 1 insertion(+)

diff --git a/src/gallium/drivers/swr/swr_state.cpp 
b/src/gallium/drivers/swr/swr_state.cpp
index c87393c..12da99f 100644
--- a/src/gallium/drivers/swr/swr_state.cpp
+++ b/src/gallium/drivers/swr/swr_state.cpp
@@ -31,6 +31,7 @@
#include "jit_api.h"
#include "gen_state_llvm.h"
#include "core/multisample.h"
+#include "core/state_funcs.h"

#include "gallivm/lp_bld_tgsi.h"
#include "util/u_format.h"
--
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev



Re: [Mesa-dev] [PATCH v2] swr: Don't crash when encountering a VBO with stride = 0.

2017-06-16 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley

On Jun 15, 2017, at 11:24 AM, Bruce Cherniak wrote:

The swr driver uses vertex_buffer->stride to determine the number
of elements in a VBO. A recent change to the state-tracker made
VBOs with stride=0 possible. This resulted in a divide-by-zero
crash in the driver. The solution is to use the pre-calculated vertex
element stream_pitch in this case.

This patch fixes the crash in a number of piglit and VTK tests introduced
by 17f776c27be266f2.

There are several VTK tests that still crash and need proper handling of
vertex_buffer_index.  This will come in a follow-on patch.

v2: Correctly update all parameters for VBO constants (stride = 0).
   Also fixes the remaining crashes/regressions that v1 did
   not address, without touching vertex_buffer_index.
---
src/gallium/drivers/swr/swr_state.cpp | 25 ++---
1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/src/gallium/drivers/swr/swr_state.cpp 
b/src/gallium/drivers/swr/swr_state.cpp
index 08549e51a1..316872581d 100644
--- a/src/gallium/drivers/swr/swr_state.cpp
+++ b/src/gallium/drivers/swr/swr_state.cpp
@@ -1247,13 +1247,24 @@ swr_update_derived(struct pipe_context *pipe,

 pitch = vb->stride;
 if (!vb->is_user_buffer) {
-/* VBO
- * size is based on buffer->width0 rather than info.max_index
- * to prevent having to validate VBO on each draw */
-size = vb->buffer.resource->width0;
-elems = size / pitch;
-partial_inbounds = size % pitch;
-min_vertex_index = 0;
+/* VBO */
+if (!pitch) {
+   /* If pitch=0 (ie vb->stride), buffer contains a single
+* constant attribute.  Use the stream_pitch which was
+* calculated during creation of vertex_elements_state for the
+* size of the attribute. */
+   size = ctx->velems->stream_pitch[i];
+   elems = 1;
+   partial_inbounds = 0;
+   min_vertex_index = 0;
+} else {
+   /* size is based on buffer->width0 rather than info.max_index
+* to prevent having to validate VBO on each draw. */
+   size = vb->buffer.resource->width0;
+   elems = size / pitch;
+   partial_inbounds = size % pitch;
+   min_vertex_index = 0;
+}

p_data = swr_resource_data(vb->buffer.resource) + vb->buffer_offset;
 } else {
--
2.11.0
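
Reduced to its arithmetic, the fix above amounts to the following. This is a standalone sketch, not the driver code; `VboExtent`, `ComputeVboExtent` and the parameter names are hypothetical, with `bufferBytes` playing the role of width0 and `attribBytes` the role of stream_pitch.

```cpp
#include <cassert>
#include <cstdint>

struct VboExtent {
    std::uint32_t size;            // bytes that may be fetched
    std::uint32_t elems;           // number of whole vertices
    std::uint32_t partialInbounds; // trailing bytes of a partial vertex
};

// bufferBytes: total size of the buffer
// stride:      distance between vertices; 0 means one constant attribute
// attribBytes: size of a single attribute, used only when stride is 0
inline VboExtent ComputeVboExtent(std::uint32_t bufferBytes,
                                  std::uint32_t stride,
                                  std::uint32_t attribBytes)
{
    if (stride == 0) {
        // Constant attribute: exactly one element, and no division.
        return {attribBytes, 1, 0};
    }
    return {bufferBytes, bufferBytes / stride, bufferBytes % stride};
}
```

For example, a 100-byte buffer with a 16-byte stride gives 6 whole elements and 4 trailing bytes, while the same buffer with stride 0 and a 12-byte attribute gives a single 12-byte element.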

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev



Re: [Mesa-dev] [PATCH 2/4] configure.ac: remove explicit -lpthread link

2017-06-09 Thread Rowley, Timothy O
With this patch series applied, the build fails for me on ubuntu 16.04.

Tree configured thusly:

../mesa/configure --with-platforms=x11 --disable-gbm --disable-egl 
--disable-dri --disable-xvmc --disable-vdpau --disable-omx --disable-va 
--with-gallium-drivers=swrast,swr 
LLVM_CONFIG=/home/torowley/llvm-3.9.0-opt/bin/llvm-config

Error:

make[5]: Entering directory '/home/torowley/work/mesa-opt/src/util'
  CC   libmesautil_la-u_atomic.lo
  CC   libmesautil_la-u_queue.lo
In file included from ../../../mesa/src/util/u_thread.h:32:0,
 from ../../../mesa/src/util/u_queue.h:39,
 from ../../../mesa/src/util/u_queue.c:27:
../../../mesa/include/c11/threads.h:79:2: error: #error Not supported on this 
platform.
 #error Not supported on this platform.
  ^
In file included from ../../../mesa/src/util/u_queue.h:39:0,
 from ../../../mesa/src/util/u_queue.c:27:
../../../mesa/src/util/u_thread.h:39:15: error: unknown type name ‘thrd_t’
 static inline thrd_t u_thread_create(int (*routine)(void *), void *param)
   ^
../../../mesa/src/util/u_thread.h: In function ‘u_thread_create’:
../../../mesa/src/util/u_thread.h:41:4: error: unknown type name ‘thrd_t’
thrd_t thread;
^
../../../mesa/src/util/u_thread.h:52:10: error: implicit declaration of 
function ‘thrd_create’ [-Werror=implicit-function-declaration]
ret = thrd_create( &thread, routine, param );
  ^
...

On Jun 9, 2017, at 5:42 AM, Emil Velikov wrote:

Hi all,

On 5 June 2017 at 00:04, Emil Velikov wrote:
From: Emil Velikov

As mentioned in last commit - pthread manual suggests using -pthread.
Furthermore, to the best of my knowledge anything built with GCC and
Clang should just work (tm) with the said flag.

AFAICT using the Sun or Intel compiler may need special treatment, but
that is to be confirmed/dismissed.

Cc: Randy Fishel
Cc: Niveditha Rau
Cc: Jon Turney
Cc: Tim Rowley
Cc: Bruce Cherniak
Cc: Jeremy Huddleston Sequoia
Signed-off-by: Emil Velikov
---
Ladies and gents,

Please confirm if "-pthread" works on your platform/compiler combos.
If fixup changes are needed, please send over a patch to squash.

Thanks
Emil
---
configure.ac |  15 +--
m4/ax_pthread.m4 | 309 ---
2 files changed, 2 insertions(+), 322 deletions(-)
delete mode 100644 m4/ax_pthread.m4

diff --git a/configure.ac b/configure.ac
index 1c15eb482f9..2e4264cf592 100644
--- a/configure.ac
+++ b/configure.ac
@@ -825,23 +825,12 @@ AC_CHECK_FUNC([posix_memalign], [DEFINES="$DEFINES 
-DHAVE_POSIX_MEMALIGN"])
dnl Check for zlib
PKG_CHECK_MODULES([ZLIB], [zlib >= $ZLIB_REQUIRED])

-dnl Check for pthreads
-AX_PTHREAD
-if test "x$ax_pthread_ok" = xno; then
-AC_MSG_ERROR([Building mesa on this platform requires pthreads])
-fi
-dnl AX_PTHREADS leaves PTHREAD_LIBS empty for gcc and sets PTHREAD_CFLAGS
-dnl to -pthread, which causes problems if we need -lpthread to appear in
-dnl pkgconfig files.  Since Android doesn't have a pthread lib, this check
-dnl is not valid for that platform.
-if test "x$android" = xno; then
-test -z "$PTHREAD_LIBS" && PTHREAD_LIBS="-lpthread"
-fi
dnl According to the manual when using pthreads, one should add -pthread to
dnl both compile and link-time arguments.
dnl In practise that should be sufficient for all platforms, since any
dnl platforms build with GCC and Clang support the flag.
-PTHREAD_LIBS="$PTHREAD_LIBS -pthread"
+PTHREAD_CFLAGS="-pthread"
+PTHREAD_LIBS="-pthread"

Humble ping? Anyone's testing would be appreciated, although I would
love to hear from anyone in the CC-chain.

If the commit message seems sparse/ambiguous or in general you think
more information is needed, please let me know.
I intentionally tried to keep it concise, although it may have ended up too brief.

Thanks
Emil

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Mesa-stable] [PATCH] automake: add SWR LLVM gen_builder.hpp workaround

2017-05-19 Thread Rowley, Timothy O
Thanks for doing this; I would have been hunting for the dist-hook: magic for a 
while.

Tested “make dist” on llvm-3.9.0 (works) and llvm-4.0/llvm-svn (fails, which is 
the expected and desired behavior).

Built result of llvm-3.9.0 “make dist” with llvm-4.0 and llvm-svn and it 
compiles/works.

Reviewed-by: Tim Rowley

On May 19, 2017, at 12:31 PM, Emil Velikov wrote:

From: Emil Velikov

As gen_builder.hpp file is generated, it contains information that is
specific to the LLVM version it originates from.

As suggested by Tim, the file seems to be forwards compatible. So in
order to ship a file which will work everywhere we should be
using the earliest supported LLVM - 3.9.

With this we're back on track and can build all of mesa without
python/mako/flex and friends.

In the long term we might want to see if the python generators can be
updated to produce LLVM version agnostic files. At least within the
range supported by SWR.

Cc: 
>
Cc: Chuck Atkins
Cc: Tim Rowley
Signed-off-by: Emil Velikov
---
configure.ac|  4 
src/gallium/drivers/swr/Makefile.am | 41 ++---
2 files changed, 15 insertions(+), 30 deletions(-)

diff --git a/configure.ac b/configure.ac
index ce5301f3e45..3d10a4b8935 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2472,6 +2472,10 @@ if test -n "$with_gallium_drivers"; then
done
fi

+# XXX: Keep in sync with LLVM_REQUIRED_SWR
+AM_CONDITIONAL(SWR_INVALID_LLVM_VERSION, test "x$LLVM_VERSION" != x3.9.0 -a \
+  "x$LLVM_VERSION" != x3.9.1)
+
if test "x$enable_llvm" = "xyes" -a "$with_gallium_drivers"; then
llvm_require_version $LLVM_REQUIRED_GALLIUM "gallium"
llvm_add_default_components "gallium"
diff --git a/src/gallium/drivers/swr/Makefile.am 
b/src/gallium/drivers/swr/Makefile.am
index 0d71f52b1e6..7b2da074162 100644
--- a/src/gallium/drivers/swr/Makefile.am
+++ b/src/gallium/drivers/swr/Makefile.am
@@ -56,6 +56,7 @@ BUILT_SOURCES = \
rasterizer/codegen/gen_knobs.cpp \
rasterizer/codegen/gen_knobs.h \
rasterizer/jitter/gen_state_llvm.h \
+ rasterizer/jitter/gen_builder.hpp \
rasterizer/jitter/gen_builder_x86.hpp \
rasterizer/archrast/gen_ar_event.hpp \
rasterizer/archrast/gen_ar_event.cpp \
@@ -168,20 +169,6 @@ COMMON_LDFLAGS = \
$(LLVM_LDFLAGS)


-# XXX: As we cannot use BUILT_SOURCES (the files will end up in the dist
-# tarball) just annotate the dependency directly.
-# As the single direct user of gen_builder.hpp is a header (builder.h) trace all
-# the translusive users (one that use the latter header).
-rasterizer/jitter/blend_jit.cpp: rasterizer/jitter/gen_builder.hpp
-rasterizer/jitter/builder.cpp: rasterizer/jitter/gen_builder.hpp
-rasterizer/jitter/builder_misc.cpp: rasterizer/jitter/gen_builder.hpp
-rasterizer/jitter/fetch_jit.cpp: rasterizer/jitter/gen_builder.hpp
-rasterizer/jitter/streamout_jit.cpp: rasterizer/jitter/gen_builder.hpp
-swr_shader.cpp: rasterizer/jitter/gen_builder.hpp
-
-CLEANFILES = \
- rasterizer/jitter/gen_builder.hpp
-
lib_LTLIBRARIES = libswrAVX.la 
libswrAVX2.la

libswrAVX_la_CXXFLAGS = \
@@ -192,14 +179,6 @@ libswrAVX_la_CXXFLAGS = \
libswrAVX_la_SOURCES = \
$(COMMON_SOURCES)

-# XXX: Don't ship these generated sources for now, since they are specific
-# to the LLVM version they are generated from. Thus a release tarball
-# containing the said files, generated against eg. LLVM 3.8 will fail to build
-# on systems with other versions of LLVM eg. 3.7 or 3.6.
-# Move these back to BUILT_SOURCES once that is resolved.
-nodist_libswrAVX_la_SOURCES = \
- rasterizer/jitter/gen_builder.hpp
-
libswrAVX_la_LIBADD = \
$(COMMON_LIBADD)

@@ -214,14 +193,6 @@ libswrAVX2_la_CXXFLAGS = \
libswrAVX2_la_SOURCES = \
$(COMMON_SOURCES)

-# XXX: Don't ship these generated sources for now, since they are specific
-# to the LLVM version they are generated from. Thus a release tarball
-# containing the said files, generated against eg. LLVM 3.8 will fail to build
-# on systems with other versions of LLVM eg. 3.7 or 3.6.
-# Move these back to BUILT_SOURCES once that is resolved.
-nodist_libswrAVX2_la_SOURCES = \
- rasterizer/jitter/gen_builder.hpp
-
libswrAVX2_la_LIBADD = \
$(COMMON_LIBADD)

@@ -230,6 +201,16 @@ libswrAVX2_la_LDFLAGS = \

include $(top_srcdir)/install-gallium-links.mk

+# Generated gen_builder.hpp is not backwards compatible. So ship only one
+# created with the oldest supported version of LLVM.
+dist-hook:
+if SWR_INVALID_LLVM_VERSION
+ @echo 

Re: [Mesa-dev] Bug in 17.1.0-rc4 source packaging for swr?

2017-05-19 Thread Rowley, Timothy O

On May 19, 2017, at 10:26 AM, Emil Velikov wrote:

On 19 May 2017 at 13:11, Chuck Atkins wrote:
Would it be feasible for packaging purposes to generate multiple headers,
i.e. gen_builder_llvm38.hpp, gen_builder_llvm39.hpp,
gen_builder_llvm40.hpp, etc. and then have gen_builder.hpp be a stub that
just has something like:

#include 
#if llvm_version >= 4.0
#include "gen_builder_llvm40.hpp"
#elif llvm_version >= 3.9
#include "gen_builder_llvm39.hpp"
#elif llvm_version >= 3.8
#include "gen_builder_llvm38.hpp"
#else
#error llvm version >= 3.8 is required
#endif

Idea sounds ok, but has a few drawbacks:
- creating all the files at the same time would be quite picky/hard
- shipping only one file solves the issue only for some people

We need to figure out where to put the burden of keeping gen_builder up to date:
  - swr developers take care of updating new versions as needed
  - build system + release maintainer generates a lowest common denominator 
version

The original idea by Tim sounds OK imho and I'm actually giving it a try.

Are you referring to using a llvm-3.9 generated version?  Did you envision me 
checking that in a gen_builder.h file, or removing the logic that omitted it 
from the tarball and somehow enforcing that a packaging build needs llvm-3.9?

FWIW the diff between 3.9 and 4.0 seems quite trivial - see below.
It should be possible to update the python scripts to handle most/all of those.
Perhaps we can have this as a long term solution?

At this point llvm seems to be stable in just having intrinsics being added; 
for a while there was some churn.  Unless/until the swr driver/rasterizer 
starts to take advantage of new llvm intrinsics, we should be fine using the 
3.9 version.

-Tim


-Emil

@@ -86,6 +86,11 @@
   return IRB()->CreateLifetimeEnd(Ptr, Size);
}

+CallInst* INVARIANT_START(Value *Ptr, ConstantInt *Size = nullptr)
+{
+return IRB()->CreateInvariantStart(Ptr, Size);
+}
+
CallInst* MASKED_LOAD(Value *Ptr, unsigned Align, Value *Mask, Value
*PassThru = nullptr, const Twine &Name = "")
{
   return IRB()->CreateMaskedLoad(Ptr, Align, Mask, PassThru, Name);
@@ -176,6 +181,11 @@
   return IRB()->CreateCondBr(Cond, True, False, BranchWeights, Unpredictable);
}

+BranchInst* COND_BR(Value *Cond, BasicBlock *True, BasicBlock *False,
Instruction *MDSrc)
+{
+return IRB()->CreateCondBr(Cond, True, False, MDSrc);
+}
+
SwitchInst* SWITCH(Value *V, BasicBlock *Dest, unsigned NumCases = 10,
MDNode *BranchWeights = nullptr, MDNode *Unpredictable = nullptr)
{
   return IRB()->CreateSwitch(V, Dest, NumCases, BranchWeights, Unpredictable);
@@ -866,7 +876,7 @@
   return IRB()->CreateCall(Callee, Args, Name, FPMathTag);
}

-CallInst* CALLA(llvm::FunctionType *FTy, Value *Callee, ArrayRef<Value *> Args, const Twine &Name = "", MDNode *FPMathTag = nullptr)
+CallInst* CALLA(FunctionType *FTy, Value *Callee, ArrayRef<Value *> Args, const Twine &Name = "", MDNode *FPMathTag = nullptr)
{
   return IRB()->CreateCall(FTy, Callee, Args, Name, FPMathTag);
}



Re: [Mesa-dev] Bug in 17.1.0-rc4 source packaging for swr?

2017-05-18 Thread Rowley, Timothy O

On May 17, 2017, at 12:08 PM, Emil Velikov wrote:

On 10 May 2017 at 03:51, Chuck Atkins wrote:
I just tried to build 17.1.0-rc4 from the tarball with swr enabled and got
errors about my python not having mako:

make[5]: Entering directory
'/tmp/atkins3/mesa/build/mesa-17.1.0-rc4_gcc-6.3.0_haswell/src/gallium/drivers/swr'
 GEN  rasterizer/jitter/gen_builder.hpp
Traceback (most recent call last):
  File "../../../../../../mesa-17.1.0-rc4/src/gallium/drivers/swr/rasterizer/codegen/gen_llvm_ir_macros.py", line 24, in <module>
    from gen_common import MakoTemplateWriter, ArgumentParser
  File "/tmp/atkins3/mesa/mesa-17.1.0-rc4/src/gallium/drivers/swr/rasterizer/codegen/gen_common.py", line 27, in <module>
    from mako.template import Template
ImportError: No module named mako.template
Makefile:2424: recipe for target 'rasterizer/jitter/gen_builder.hpp' failed

As I understood it, mako should only be required when building out of git.
Unless this has changed, it seems there are some generated files missing
from the source tarball.

You're spot on here Chuck. Tarball should build without any tools such
as python/lex/etc.

At the moment the rasterizer/jitter/gen_builder.hpp file isn't shipped
in the tarball, hence the problem.
The file is omitted intentionally, as mentioned in the Makefile [1].
Would be great if we can fix that, so any suggestions would be
appreciated.

Tim, I believe we briefly had a chat about this a while back. Do you
have any ideas how to generate the file that works across all
supported LLVM versions?

We could use a gen_builder.hpp generated from llvm-3.9 for all current versions 
of llvm at this time (as the changes are now just llvm IR additions), but I’m 
not sure how to enforce that the person creating the tarball has a particular 
version of llvm installed.

-Tim


Thanks
Emil

[1] 
https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/swr/Makefile.am#n195



Re: [Mesa-dev] [PATCH v3] swr: move msaa resolve to generalized StoreTile

2017-05-08 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley

On May 4, 2017, at 7:33 PM, Bruce Cherniak wrote:

v3: list piglit tests fixed by this patch. Fixed typo Tim pointed out.
v2: Reword commit message to more closely adhere to community
guidelines.

This patch moves msaa resolve down into core/StoreTiles where the
surface format conversion routines are available.  The previous
"experimental" resolve was limited to 8-bit unsigned render targets.

This fixes a number of piglit msaa tests by adding resolve support for
all the render target formats we support.

Specifically:
layered-rendering/gl-layer-render: fail->pass
layered-rendering/gl-layer-render-storage: fail->pass
multisample-formats *[2,4,8,16] gl_arb_texture_rg: crash->pass
multisample-formats *[2,4,8,16] gl_ext_texture_snorm: crash->pass
multisample-formats *[2,4,8,16] gl_arb_texture_float: fail->pass
multisample-formats *[2,4,8,16] gl_arb_texture_rg-float: fail->pass

MSAA is still disabled by default, but can be enabled with
"export SWR_MSAA_MAX_COUNT=4" (1,2,4,8,16 are options)
The default is 0, which is disabled.

This patch improves the number of multisample-formats supported by swr,
and fixes several crashes currently in the 17.1 branch.  Therefore, it
should be considered for inclusion in the 17.1 stable release.  Being
disabled by default, it poses no risk to most users of swr.

cc: mesa-sta...@lists.freedesktop.org
---
.../drivers/swr/rasterizer/memory/StoreTile.h  | 75 +
src/gallium/drivers/swr/swr_context.cpp| 77 +-
src/gallium/drivers/swr/swr_screen.cpp | 10 +--
3 files changed, 82 insertions(+), 80 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/memory/StoreTile.h 
b/src/gallium/drivers/swr/rasterizer/memory/StoreTile.h
index ffde574c03..12a5f3d8ce 100644
--- a/src/gallium/drivers/swr/rasterizer/memory/StoreTile.h
+++ b/src/gallium/drivers/swr/rasterizer/memory/StoreTile.h
@@ -1133,6 +1133,64 @@ struct StoreRasterTile
}
}
}
+
+//
+/// @brief Resolves an 8x8 raster tile to the resolve destination surface.
+/// @param pSrc - Pointer to raster tile.
+/// @param pDstSurface - Destination surface state
+/// @param x, y - Coordinates to raster tile.
+/// @param sampleOffset - Offset between adjacent multisamples
+INLINE static void Resolve(
+uint8_t *pSrc,
+SWR_SURFACE_STATE* pDstSurface,
+uint32_t x, uint32_t y, uint32_t sampleOffset, uint32_t 
renderTargetArrayIndex) // (x, y) pixel coordinate to start of raster tile.
+{
+uint32_t lodWidth = std::max(pDstSurface->width >> pDstSurface->lod, 
1U);
+uint32_t lodHeight = std::max(pDstSurface->height >> pDstSurface->lod, 
1U);
+
+float oneOverNumSamples = 1.0f / pDstSurface->numSamples;
+
+// For each raster tile pixel (rx, ry)
+for (uint32_t ry = 0; ry < KNOB_TILE_Y_DIM; ++ry)
+{
+for (uint32_t rx = 0; rx < KNOB_TILE_X_DIM; ++rx)
+{
+// Perform bounds checking.
+if (((x + rx) < lodWidth) &&
+((y + ry) < lodHeight))
+{
+// Sum across samples
+float resolveColor[4] = {0};
+for (uint32_t sampleNum = 0; sampleNum < 
pDstSurface->numSamples; sampleNum++)
+{
+float sampleColor[4] = {0};
+uint8_t *pSampleSrc = pSrc + sampleOffset * sampleNum;
+GetSwizzledSrcColor(pSampleSrc, rx, ry, sampleColor);
+resolveColor[0] += sampleColor[0];
+resolveColor[1] += sampleColor[1];
+resolveColor[2] += sampleColor[2];
+resolveColor[3] += sampleColor[3];
+}
+
+// Divide by numSamples to average
+resolveColor[0] *= oneOverNumSamples;
+resolveColor[1] *= oneOverNumSamples;
+resolveColor[2] *= oneOverNumSamples;
+resolveColor[3] *= oneOverNumSamples;
+
+// Use the resolve surface state
+SWR_SURFACE_STATE* pResolveSurface = 
(SWR_SURFACE_STATE*)pDstSurface->pAuxBaseAddress;
+uint8_t *pDst = (uint8_t*)ComputeSurfaceAddress<false, false>((x + rx), (y + ry),
+pResolveSurface->arrayIndex + renderTargetArrayIndex, 
pResolveSurface->arrayIndex + renderTargetArrayIndex,
+0, pResolveSurface->lod, pResolveSurface);
+{
+ConvertPixelFromFloat(pDst, resolveColor);
+}
+}
+   
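
The resolve in this patch is a plain box filter: for each pixel, sum the colour of every sample and multiply by 1/numSamples. A minimal standalone sketch of that averaging step follows. It is not the StoreTile code; DemoRgba, resolve_pixel and the sample-major buffer layout are assumptions made for illustration, and the format conversion done by the real patch is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical float RGBA colour, standing in for the float[4] used in the patch.
struct DemoRgba {
    float c[4];
};

// Averages the numSamples colours belonging to one pixel.
// Assumed layout (illustrative only): samples[s * pixelCount + pixel].
DemoRgba resolve_pixel(const std::vector<DemoRgba> &samples,
                       std::uint32_t pixel,
                       std::uint32_t pixelCount,
                       std::uint32_t numSamples)
{
    assert(numSamples > 0);
    assert(pixel < pixelCount);
    assert(samples.size() >= static_cast<std::size_t>(pixelCount) * numSamples);

    // Sum across samples.
    DemoRgba out = {{0.0f, 0.0f, 0.0f, 0.0f}};
    for (std::uint32_t s = 0; s < numSamples; ++s) {
        const DemoRgba &src = samples[static_cast<std::size_t>(s) * pixelCount + pixel];
        for (int ch = 0; ch < 4; ++ch)
            out.c[ch] += src.c[ch];
    }

    // Multiply by the reciprocal once instead of dividing per channel.
    const float oneOverNumSamples = 1.0f / static_cast<float>(numSamples);
    for (int ch = 0; ch < 4; ++ch)
        out.c[ch] *= oneOverNumSamples;

    return out;
}
```

Accumulating in float before converting to the destination format is what lets the same loop serve every render target format, rather than only 8-bit unsigned ones.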

Re: [Mesa-dev] [Mesa-stable] [PATCH v2] swr: move msaa resolve to generalized StoreTile

2017-05-04 Thread Rowley, Timothy O

> On Apr 27, 2017, at 6:22 PM, Bruce Cherniak wrote:
> 
> v2: Reword commit message to more closely adhere to community
> guidelines.
> 
> This patch moves msaa resolve down into core/StoreTiles where the
> surface format conversion routines are available.  The previous
> "experimental" resolve was limited to 8-bit unsigned render targets.
> 
> This fixes a number of piglit msaa tests by adding resolve support for
> all the render target formats we support.
> 
> MSAA is still disabled by default, but can be enabled with
> "export SWR_MSAA_MAX_COUNT=4" (1,2,4,8,16 are options)
> The default is 0, which is disabled.
> 
> Because it fixes a number of piglit tests, I kindly request inclusion
> into 17.1 stable.

Could you list the fixed piglit tests, using wildcards where the list would 
otherwise be overly large?

> cc: mesa-sta...@lists.freedesktop.org
> ---
> .../drivers/swr/rasterizer/memory/StoreTile.h  | 75 +
> src/gallium/drivers/swr/swr_context.cpp| 77 +-
> src/gallium/drivers/swr/swr_screen.cpp | 10 +--
> 3 files changed, 82 insertions(+), 80 deletions(-)
> 
> diff --git a/src/gallium/drivers/swr/rasterizer/memory/StoreTile.h 
> b/src/gallium/drivers/swr/rasterizer/memory/StoreTile.h
> index ffde574c03..12a5f3d8ce 100644
> --- a/src/gallium/drivers/swr/rasterizer/memory/StoreTile.h
> +++ b/src/gallium/drivers/swr/rasterizer/memory/StoreTile.h
> @@ -1133,6 +1133,64 @@ struct StoreRasterTile
> }
> }
> }
> +
> +
> //
> +/// @brief Resolves an 8x8 raster tile to the resolve destination 
> surface.
> +/// @param pSrc - Pointer to raster tile.
> +/// @param pDstSurface - Destination surface state
> +/// @param x, y - Coordinates to raster tile.
> +/// @param sampleOffset - Offset between adjacent multisamples
> +INLINE static void Resolve(
> +uint8_t *pSrc,
> +SWR_SURFACE_STATE* pDstSurface,
> +uint32_t x, uint32_t y, uint32_t sampleOffset, uint32_t 
> renderTargetArrayIndex) // (x, y) pixel coordinate to start of raster tile.
> +{
> +uint32_t lodWidth = std::max(pDstSurface->width >> pDstSurface->lod, 
> 1U);
> +uint32_t lodHeight = std::max(pDstSurface->height >> 
> pDstSurface->lod, 1U);
> +
> +float oneOverNumSamples = 1.0f / pDstSurface->numSamples;
> +
> +// For each raster tile pixel (rx, ry)
> +for (uint32_t ry = 0; ry < KNOB_TILE_Y_DIM; ++ry)
> +{
> +for (uint32_t rx = 0; rx < KNOB_TILE_X_DIM; ++rx)
> +{
> +// Perform bounds checking.
> +if (((x + rx) < lodWidth) &&
> +((y + ry) < lodHeight))
> +{
> +// Sum across samples
> +float resolveColor[4] = {0};
> +for (uint32_t sampleNum = 0; sampleNum < 
> pDstSurface->numSamples; sampleNum++)
> +{
> +float sampleColor[4] = {0};
> +uint8_t *pSampleSrc = pSrc + sampleOffset * 
> sampleNum;
> +GetSwizzledSrcColor(pSampleSrc, rx, ry, sampleColor);
> +resolveColor[0] += sampleColor[0];
> +resolveColor[1] += sampleColor[1];
> +resolveColor[2] += sampleColor[2];
> +resolveColor[3] += sampleColor[3];
> +}
> +
> +// Divide by numSamples to average
> +resolveColor[0] *= oneOverNumSamples;
> +resolveColor[1] *= oneOverNumSamples;
> +resolveColor[2] *= oneOverNumSamples;
> +resolveColor[3] *= oneOverNumSamples;
> +
> +// Use the resolve surface state
> +SWR_SURFACE_STATE* pResolveSurface = 
> (SWR_SURFACE_STATE*)pDstSurface->pAuxBaseAddress;
> +uint8_t *pDst = (uint8_t*)ComputeSurfaceAddress<false, false>((x + rx), (y + ry),
> +pResolveSurface->arrayIndex + 
> renderTargetArrayIndex, pResolveSurface->arrayIndex + renderTargetArrayIndex,
> +0, pResolveSurface->lod, pResolveSurface);
> +{
> +ConvertPixelFromFloat(pDst, resolveColor);
> +}
> +}
> +}
> +}
> +}
> +
> };
> 
> template
> @@ -2316,6 +2374,9 @@ struct StoreMacroTile
> pfnStore[sampleNum] = (bForceGeneric || 
> KNOB_USE_GENERIC_STORETILE) ? StoreRasterTile DstFormat>::Store : OptStoreRasterTile::Store;
> }
> 
> +// Save original for pSrcHotTile resolve.
> +uint8_t *pResolveSrcHotTile = pSrcHotTile;
> +
> // Store each raster tile from the hot tile to the destination 

Re: [Mesa-dev] [PATCH 2/2] swr: Fix polygonmode for front==back

2017-04-25 Thread Rowley, Timothy O
Additionally I don’t think this should go into stable - without the 
corresponding rasterizer commit (which feels like a risky change post -rc1) it 
is of limited use.

On Apr 25, 2017, at 6:58 PM, Ilia Mirkin wrote:

This will cause asserts on piglit and dEQP runs instead of failures. This is 
incredibly inconvenient, as e.g. dEQP runs everything in a single process.
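
One way to avoid taking down a whole dEQP process would be to report the unsupported combination once and let the caller fall back to the front-face mode instead of asserting. A minimal sketch of that idea, under the assumption that a one-time warning is acceptable; DemoPolygonMode and demo_fill_modes_supported are hypothetical names, not part of the patch or of gallium.

```cpp
#include <cstdio>

// Hypothetical stand-ins for PIPE_POLYGON_MODE_FILL/LINE/POINT.
enum DemoPolygonMode { DEMO_MODE_FILL, DEMO_MODE_LINE, DEMO_MODE_POINT };

// Returns true when front and back modes match. On a mismatch it prints a
// warning the first time only and returns false, so the caller can fall back
// to the front-face mode rather than aborting the process.
bool demo_fill_modes_supported(DemoPolygonMode front, DemoPolygonMode back)
{
    if (front == back)
        return true;

    static bool warned = false;
    if (!warned) {
        std::fprintf(stderr,
                     "swr: front != back polygon mode not supported, "
                     "using front mode\n");
        warned = true;
    }
    return false;
}
```

With this shape a test that hits the unsupported path still fails on its rendered output, but the remaining tests in the same process keep running.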

On Apr 25, 2017 7:29 PM, "George Kyriazis" wrote:
Add logic for converting enums and also making sure stipple works.

CC: 
>

---
 src/gallium/drivers/swr/swr_state.cpp | 14 +-
 src/gallium/drivers/swr/swr_state.h   | 20 
 2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/swr/swr_state.cpp 
b/src/gallium/drivers/swr/swr_state.cpp
index 56b1374..24a6759 100644
--- a/src/gallium/drivers/swr/swr_state.cpp
+++ b/src/gallium/drivers/swr/swr_state.cpp
@@ -201,6 +201,12 @@ swr_create_rasterizer_state(struct pipe_context *pipe,
struct pipe_rasterizer_state *state;
state = (pipe_rasterizer_state *)mem_dup(rast, sizeof *rast);

+   if (state) {
+  if (state->fill_front != state->fill_back) {
+ assert(0 && "front != back polygon mode not supported");
+  }
+   }
+
return state;
 }

@@ -1153,6 +1159,10 @@ swr_update_derived(struct pipe_context *pipe,
  rastState->slopeScaledDepthBias = 0;
  rastState->depthBiasClamp = 0;
   }
+
+  /* translate polygon mode, at least for the front==back case */
+  rastState->fillMode = swr_convert_fill_mode(rasterizer->fill_front);
+
   struct pipe_surface *zb = fb->zsbuf;
   if (zb && swr_resource(zb->texture)->has_depth)
  rastState->depthFormat = swr_resource(zb->texture)->swr.format;
@@ -1423,7 +1433,9 @@ swr_update_derived(struct pipe_context *pipe,
/* and points, since we rasterize them as triangles, too */
/* Has to be before fragment shader, since it sets SWR_NEW_FS */
if (p_draw_info) {
-  bool new_prim_is_poly = (u_reduced_prim(p_draw_info->mode) == 
PIPE_PRIM_TRIANGLES);
+  bool new_prim_is_poly =
+ (u_reduced_prim(p_draw_info->mode) == PIPE_PRIM_TRIANGLES) &&
+ (ctx->derived.rastState.fillMode == SWR_FILLMODE_SOLID);
   if (new_prim_is_poly != ctx->poly_stipple.prim_is_poly) {
  ctx->dirty |= SWR_NEW_FS;
  ctx->poly_stipple.prim_is_poly = new_prim_is_poly;
diff --git a/src/gallium/drivers/swr/swr_state.h 
b/src/gallium/drivers/swr/swr_state.h
index 9a8c4e1..7940a96 100644
--- a/src/gallium/drivers/swr/swr_state.h
+++ b/src/gallium/drivers/swr/swr_state.h
@@ -376,4 +376,24 @@ swr_convert_prim_topology(const unsigned mode)
   return TOP_UNKNOWN;
}
 };
+
+/*
+ * convert mesa PIPE_POLYGON_MODE_X to SWR enum SWR_FILLMODE
+ */
+static INLINE enum SWR_FILLMODE
+swr_convert_fill_mode(const unsigned mode)
+{
+   switch(mode) {
+   case PIPE_POLYGON_MODE_FILL:
+  return SWR_FILLMODE_SOLID;
+   case PIPE_POLYGON_MODE_LINE:
+  return SWR_FILLMODE_WIREFRAME;
+   case PIPE_POLYGON_MODE_POINT:
+  return SWR_FILLMODE_POINT;
+   default:
+  assert(0 && "Unknown fillmode");
+  return SWR_FILLMODE_SOLID; // at least do something sensible
+   }
+}
+
 #endif
--
2.7.4



Re: [Mesa-dev] [PATCH] swr: add linux to scons build

2017-04-14 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley

On Apr 13, 2017, at 2:17 PM, George Kyriazis wrote:

Make swr compile for both linux and windows.
---
src/gallium/drivers/swr/SConscript| 7 +--
src/gallium/targets/libgl-xlib/SConscript | 2 +-
2 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/src/gallium/drivers/swr/SConscript 
b/src/gallium/drivers/swr/SConscript
index eca5dba..5e3784b 100644
--- a/src/gallium/drivers/swr/SConscript
+++ b/src/gallium/drivers/swr/SConscript
@@ -17,11 +17,6 @@ if env['LLVM_VERSION'] < 
distutils.version.LooseVersion('3.9'):
env['swr'] = False
Return()

-if env['platform'] != 'windows':
-print "warning: swr scons build only supports windows: not building swr"
-env['swr'] = False
-Return()
-
env.MSVC2013Compat()

env = env.Clone()
@@ -205,7 +200,7 @@ envavx2.Append(CPPDEFINES = ['KNOB_ARCH=KNOB_ARCH_AVX2'])
if env['platform'] == 'windows':
envavx2.Append(CCFLAGS = ['/arch:AVX2'])
else:
-envavx2.Append(CCFLAGS = ['-mavx2'])
+envavx2.Append(CCFLAGS = ['-mavx2', '-mfma', '-mbmi2', '-mf16c'])

swrAVX2 = envavx2.SharedLibrary(
target = 'swrAVX2',
diff --git a/src/gallium/targets/libgl-xlib/SConscript 
b/src/gallium/targets/libgl-xlib/SConscript
index d01bb3c..a81ac79 100644
--- a/src/gallium/targets/libgl-xlib/SConscript
+++ b/src/gallium/targets/libgl-xlib/SConscript
@@ -49,7 +49,7 @@ if env['llvm']:
env.Prepend(LIBS = [llvmpipe])

if env['swr']:
-env.Append(CPPDEFINES = 'HAVE_SWR')
+env.Append(CPPDEFINES = 'GALLIUM_SWR')
env.Prepend(LIBS = [swr])

if env['platform'] != 'darwin':
--
2.7.4



Re: [Mesa-dev] [PATCH] swr: Removed unnecessary PIPE_BIND flags from swr_is_format_supported

2017-04-13 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley

On Apr 12, 2017, at 6:53 PM, Bruce Cherniak wrote:

Removed the unnecessary and probably wrong PIPE_BIND_SCANOUT and PIPE_BIND_SHARED
flags in favor of a check on the single PIPE_BIND_DISPLAY_TARGET flag.

Reference llvmpipe change 

---
src/gallium/drivers/swr/swr_screen.cpp | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/src/gallium/drivers/swr/swr_screen.cpp 
b/src/gallium/drivers/swr/swr_screen.cpp
index 3d3d103..87fd898 100644
--- a/src/gallium/drivers/swr/swr_screen.cpp
+++ b/src/gallium/drivers/swr/swr_screen.cpp
@@ -103,8 +103,7 @@ swr_is_format_supported(struct pipe_screen *screen,
   if (sample_count > 1)
  return FALSE;

-   if (bind
-   & (PIPE_BIND_DISPLAY_TARGET | PIPE_BIND_SCANOUT | PIPE_BIND_SHARED)) {
+   if (bind & PIPE_BIND_DISPLAY_TARGET) {
  if (!winsys->is_displaytarget_format_supported(winsys, bind, format))
 return FALSE;
   }
--
2.7.4
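
The behavioural difference is easiest to see with the masks written out. In the sketch below the bit values are made up for illustration (the real PIPE_BIND_* constants are defined by gallium and are not reproduced here), and both function names are hypothetical.

```cpp
#include <cstdint>

// Illustrative bit values only; these are NOT the real PIPE_BIND_* values.
constexpr std::uint32_t DEMO_BIND_DISPLAY_TARGET = 1u << 0;
constexpr std::uint32_t DEMO_BIND_SCANOUT        = 1u << 1;
constexpr std::uint32_t DEMO_BIND_SHARED         = 1u << 2;

// Before the patch: any of the three flags triggered the winsys format check.
bool demo_needs_winsys_check_old(std::uint32_t bind)
{
    return (bind & (DEMO_BIND_DISPLAY_TARGET | DEMO_BIND_SCANOUT |
                    DEMO_BIND_SHARED)) != 0;
}

// After the patch: only the display-target flag triggers it.
bool demo_needs_winsys_check_new(std::uint32_t bind)
{
    return (bind & DEMO_BIND_DISPLAY_TARGET) != 0;
}
```

So a resource bound only as shared or scanout no longer goes through is_displaytarget_format_supported(), while anything that includes the display-target flag still does.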



Re: [Mesa-dev] [PATCH] swr: Align swr_context allocation to SIMD alignment.

2017-04-13 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley

On Apr 12, 2017, at 6:43 PM, Bruce Cherniak wrote:

The context now contains SIMD vectors which must be aligned (specifically
samplePositions in the rastState in the derived state).  Failure to align
can result in segv crash on unaligned memory access in vector
instructions.

---
src/gallium/drivers/swr/swr_context.cpp | 7 +--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/swr/swr_context.cpp 
b/src/gallium/drivers/swr/swr_context.cpp
index 8c5a269..6f46d66 100644
--- a/src/gallium/drivers/swr/swr_context.cpp
+++ b/src/gallium/drivers/swr/swr_context.cpp
@@ -386,7 +386,7 @@ swr_destroy(struct pipe_context *pipe)
   if (screen->pipe == pipe)
  screen->pipe = NULL;

-   FREE(ctx);
+   AlignedFree(ctx);
}


@@ -452,7 +452,10 @@ swr_UpdateStatsFE(HANDLE hPrivateContext, const 
SWR_STATS_FE *pStats)
struct pipe_context *
swr_create_context(struct pipe_screen *p_screen, void *priv, unsigned flags)
{
-   struct swr_context *ctx = CALLOC_STRUCT(swr_context);
+   struct swr_context *ctx = (struct swr_context *)
+  AlignedMalloc(sizeof(struct swr_context), KNOB_SIMD_BYTES);
+   memset(ctx, 0, sizeof(struct swr_context));
+
   ctx->blendJIT =
  new std::unordered_map;

--
2.7.4
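
The pattern in this patch (aligned allocation, zero fill, matching aligned free) can be sketched standalone as follows. This is not the swr implementation: it assumes a POSIX system and uses posix_memalign() in place of AlignedMalloc(), kDemoSimdBytes is a hypothetical stand-in for KNOB_SIMD_BYTES, and DemoContext is a toy replacement for swr_context. Unlike the hunk above, the sketch checks the allocation result before calling memset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdlib.h>   // posix_memalign, free (POSIX)

// Hypothetical stand-in for KNOB_SIMD_BYTES (32 bytes would suit AVX/AVX2).
constexpr std::size_t kDemoSimdBytes = 32;

// Toy context holding a member that must be SIMD-aligned.
struct DemoContext {
    alignas(32) float samplePositions[8];
    int dirty;
};

// Allocates size bytes at the requested alignment and zero-fills them.
// Returns nullptr on failure instead of writing through a bad pointer.
void *demo_aligned_calloc(std::size_t size, std::size_t alignment)
{
    // posix_memalign needs a power-of-two alignment that is a multiple of
    // sizeof(void *).
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    assert(alignment % sizeof(void *) == 0);

    void *p = nullptr;
    if (posix_memalign(&p, alignment, size) != 0)
        return nullptr;
    std::memset(p, 0, size);
    return p;
}

DemoContext *demo_create_context()
{
    return static_cast<DemoContext *>(
        demo_aligned_calloc(sizeof(DemoContext), kDemoSimdBytes));
}

// Memory from posix_memalign is released with plain free().
void demo_destroy_context(DemoContext *ctx)
{
    free(ctx);
}
```

The allocation and release calls have to stay paired, which is why the patch changes FREE(ctx) to AlignedFree(ctx) in swr_destroy() at the same time as it changes the allocation.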



Re: [Mesa-dev] [PATCH] swr: return true for PIPE_CAP_DOUBLES

2017-04-13 Thread Rowley, Timothy O

On Apr 13, 2017, at 4:26 AM, Nicolai Hähnle wrote:

On 11.04.2017 18:53, Tim Rowley wrote:
---
src/gallium/drivers/swr/swr_screen.cpp | 1 +
1 file changed, 1 insertion(+)

diff --git a/src/gallium/drivers/swr/swr_screen.cpp 
b/src/gallium/drivers/swr/swr_screen.cpp
index d737ddf..3d3d103 100644
--- a/src/gallium/drivers/swr/swr_screen.cpp
+++ b/src/gallium/drivers/swr/swr_screen.cpp
@@ -248,6 +248,7 @@ swr_get_param(struct pipe_screen *screen, enum pipe_cap 
param)
   case PIPE_CAP_TEXTURE_HALF_FLOAT_LINEAR:
   case PIPE_CAP_CULL_DISTANCE:
   case PIPE_CAP_CUBE_MAP_ARRAY:
+   case PIPE_CAP_DOUBLES:
  return 1;

  /* unsupported features */


You probably also want to update features.txt and the relnotes?


I don’t think so; we always returned the llvmpipe enablement of this feature, 
so the value hasn’t changed.  When the cap moved from being a shader cap to a 
screen cap, we missed adding it to swr.

-Tim

Cheers,
Nicolai
--
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.



Re: [Mesa-dev] [PATCH] docs: document the C++14 SWR requirement

2017-04-12 Thread Rowley, Timothy O

On Apr 12, 2017, at 9:11 AM, Emil Velikov wrote:

From: Emil Velikov

Earlier commit bumped the requirement for the SWR driver.

Cc: Tim Rowley
Fixes: 3c52a7316a1 ("swr: [configure.ac/scons] require c++14")
Signed-off-by: Emil Velikov
---
Tim, couple of unrelated questions:
- Did you get to nuke the in-tree mako copy?

Yes, this is now gone.

- Any plans on adding !Windows support in the scons build?

George is the one working with scons.  George, your thoughts?


docs/relnotes/17.1.0.html | 1 +
1 file changed, 1 insertion(+)

diff --git a/docs/relnotes/17.1.0.html b/docs/relnotes/17.1.0.html
index 0a5cabe4f1b..eb324e25a44 100644
--- a/docs/relnotes/17.1.0.html
+++ b/docs/relnotes/17.1.0.html
@@ -68,6 +68,7 @@ Note: some of the new features are only available with 
certain drivers.
The swr driver now requires LLVM >= 3.9.0.
The radeonsi driver now requires LLVM 3.8.0.
The MESA_GLSL=opt and MESA_GLSL=no_opt environment vars have been 
removed.
+The SWR gallium driver now requires C++14 capable compiler

Wording should be adjusted to “The SWR gallium driver now requires a C++14 
capable compiler.”

Or possibly the two swr items could be combined: “The swr driver now requires 
LLVM >= 3.9.0 and a C++14 capable compiler."

Pick your poison, Reviewed-by: Tim Rowley



Re: [Mesa-dev] [PATCH v3] swr: [rasterizer codegen] Fix windows build

2017-03-29 Thread Rowley, Timothy O
Commit comment should not include “[rasterizer codegen]”, as it doesn’t modify 
that code.

With that fixed, Reviewed-by: Tim Rowley

On Mar 28, 2017, at 4:44 PM, George Kyriazis wrote:

Fix codegen build break that was introduced earlier

v2: update rules for gen_knobs.cpp and gen_knobs.h

v3: Introduce bldroot and revert generator file changes, making patch simpler.
---
src/gallium/drivers/swr/SConscript | 38 +++---
1 file changed, 31 insertions(+), 7 deletions(-)

diff --git a/src/gallium/drivers/swr/SConscript 
b/src/gallium/drivers/swr/SConscript
index ad16162..18d6c9b 100644
--- a/src/gallium/drivers/swr/SConscript
+++ b/src/gallium/drivers/swr/SConscript
@@ -47,20 +47,25 @@ if not env['msvc'] :
])

swrroot = '#src/gallium/drivers/swr/'
+bldroot = Dir('.').abspath

env.CodeGenerate(
target = 'rasterizer/codegen/gen_knobs.cpp',
script = swrroot + 'rasterizer/codegen/gen_knobs.py',
-source = 'rasterizer/codegen/templates/gen_knobs.cpp',
-command = python_cmd + ' $SCRIPT --input $SOURCE --output $TARGET 
--gen_cpp'
+source = '',
+command = python_cmd + ' $SCRIPT --output $TARGET --gen_cpp'
)
+Depends('rasterizer/codegen/gen_knobs.cpp',
+swrroot + 'rasterizer/codegen/templates/gen_knobs.cpp')

env.CodeGenerate(
target = 'rasterizer/codegen/gen_knobs.h',
script = swrroot + 'rasterizer/codegen/gen_knobs.py',
-source = 'rasterizer/codegen/templates/gen_knobs.cpp',
-command = python_cmd + ' $SCRIPT --input $SOURCE --output $TARGET --gen_h'
+source = '',
+command = python_cmd + ' $SCRIPT --output $TARGET --gen_h'
)
+Depends('rasterizer/codegen/gen_knobs.cpp',
+swrroot + 'rasterizer/codegen/templates/gen_knobs.cpp')

env.CodeGenerate(
target = 'rasterizer/jitter/gen_state_llvm.h',
@@ -68,20 +73,26 @@ env.CodeGenerate(
source = 'rasterizer/core/state.h',
command = python_cmd + ' $SCRIPT --input $SOURCE --output $TARGET'
)
+Depends('rasterizer/jitter/gen_state_llvm.h',
+swrroot + 'rasterizer/codegen/templates/gen_llvm.hpp')

env.CodeGenerate(
target = 'rasterizer/jitter/gen_builder.hpp',
script = swrroot + 'rasterizer/codegen/gen_llvm_ir_macros.py',
source = os.path.join(llvm_includedir, 'llvm/IR/IRBuilder.h'),
-command = python_cmd + ' $SCRIPT --input $SOURCE --output 
rasterizer/jitter --gen_h'
+command = python_cmd + ' $SCRIPT --input $SOURCE --output ' + bldroot + 
'/rasterizer/jitter --gen_h'
)
+Depends('rasterizer/jitter/gen_builder.hpp',
+swrroot + 'rasterizer/codegen/templates/gen_builder.hpp')

env.CodeGenerate(
target = 'rasterizer/jitter/gen_builder_x86.hpp',
script = swrroot + 'rasterizer/codegen/gen_llvm_ir_macros.py',
source = '',
-command = python_cmd + ' $SCRIPT --output rasterizer/jitter --gen_x86_h'
+command = python_cmd + ' $SCRIPT --output ' + bldroot + 
'/rasterizer/jitter --gen_x86_h'
)
+Depends('rasterizer/jitter/gen_builder.hpp',
+swrroot + 'rasterizer/codegen/templates/gen_builder.hpp')

env.CodeGenerate(
target = './gen_swr_context_llvm.h',
@@ -89,6 +100,8 @@ env.CodeGenerate(
source = 'swr_context.h',
command = python_cmd + ' $SCRIPT --input $SOURCE --output $TARGET'
)
+Depends('rasterizer/jitter/gen_state_llvm.h',
+swrroot + 'rasterizer/codegen/templates/gen_llvm.hpp')

env.CodeGenerate(
target = 'rasterizer/archrast/gen_ar_event.hpp',
@@ -96,6 +109,8 @@ env.CodeGenerate(
source = 'rasterizer/archrast/events.proto',
command = python_cmd + ' $SCRIPT --proto $SOURCE --output $TARGET 
--gen_event_h'
)
+Depends('rasterizer/jitter/gen_state_llvm.h',
+swrroot + 'rasterizer/codegen/templates/gen_ar_event.hpp')

env.CodeGenerate(
target = 'rasterizer/archrast/gen_ar_event.cpp',
@@ -103,6 +118,8 @@ env.CodeGenerate(
source = 'rasterizer/archrast/events.proto',
command = python_cmd + ' $SCRIPT --proto $SOURCE --output $TARGET 
--gen_event_cpp'
)
+Depends('rasterizer/jitter/gen_state_llvm.h',
+swrroot + 'rasterizer/codegen/templates/gen_ar_event.cpp')

env.CodeGenerate(
target = 'rasterizer/archrast/gen_ar_eventhandler.hpp',
@@ -110,6 +127,8 @@ env.CodeGenerate(
source = 'rasterizer/archrast/events.proto',
command = python_cmd + ' $SCRIPT --proto $SOURCE --output $TARGET 
--gen_eventhandler_h'
)
+Depends('rasterizer/jitter/gen_state_llvm.h',
+swrroot + 'rasterizer/codegen/templates/gen_ar_eventhandler.hpp')

env.CodeGenerate(
target = 'rasterizer/archrast/gen_ar_eventhandlerfile.hpp',
@@ -117,6 +136,8 @@ env.CodeGenerate(
source = 'rasterizer/archrast/events.proto',
command = python_cmd + ' $SCRIPT --proto $SOURCE --output $TARGET 
--gen_eventhandlerfile_h'
)
+Depends('rasterizer/jitter/gen_state_llvm.h',
+swrroot + 'rasterizer/codegen/templates/gen_ar_eventhandlerfile.hpp')

# 5 

Re: [Mesa-dev] [PATCH v2 10/10] swr: [rasterizer codegen] Fix windows build

2017-03-28 Thread Rowley, Timothy O
I’m going to drop patch 10 from the patchset for now.

-Tim

On Mar 27, 2017, at 9:42 PM, Kyriazis, George <george.kyria...@intel.com> wrote:

Tried Depends(), but it doesn’t work all the time.  There are some cases where 
it works, and some others where it doesn’t.

I’ll need to investigate more.  Fix in a separate checkin later?

George

On Mar 27, 2017, at 8:38 PM, Rowley, Timothy O <timothy.o.row...@intel.com> wrote:

On closer review of 10/10, I don’t like the approach taken here.

You’ve added a --template argument to gen_backends.py, making it different from 
the rest of the scripts and actually running it with different parameters on 
automake and scons.  Can’t you get scons to have the necessary dependency using 
its Depends() call?

http://scons.org/doc/production/HTML/scons-user/ch06s05.html

On Mar 27, 2017, at 7:39 PM, George Kyriazis <george.kyria...@intel.com> wrote:

Fix codegen build break that was introduced earlier

v2: update rules for gen_knobs.cpp and gen_knobs.h

---
src/gallium/drivers/swr/Makefile.am|  4 +--
src/gallium/drivers/swr/SConscript | 15 ++-
.../drivers/swr/rasterizer/codegen/gen_backends.py | 30 ++
.../swr/rasterizer/codegen/gen_llvm_ir_macros.py   | 20 +++
4 files changed, 39 insertions(+), 30 deletions(-)

diff --git a/src/gallium/drivers/swr/Makefile.am 
b/src/gallium/drivers/swr/Makefile.am
index 515a9089cc..cc37abf3e8 100644
--- a/src/gallium/drivers/swr/Makefile.am
+++ b/src/gallium/drivers/swr/Makefile.am
@@ -97,14 +97,14 @@ rasterizer/jitter/gen_builder.hpp: 
rasterizer/codegen/gen_llvm_ir_macros.py rast
$(PYTHON_GEN) \
$(srcdir)/rasterizer/codegen/gen_llvm_ir_macros.py \
--input $(LLVM_INCLUDEDIR)/llvm/IR/IRBuilder.h \
- --output rasterizer/jitter \
+ --output $@ \
--gen_h

rasterizer/jitter/gen_builder_x86.hpp: rasterizer/codegen/gen_llvm_ir_macros.py 
rasterizer/codegen/templates/gen_builder.hpp rasterizer/codegen/gen_common.py
$(MKDIR_GEN)
$(PYTHON_GEN) \
$(srcdir)/rasterizer/codegen/gen_llvm_ir_macros.py \
- --output rasterizer/jitter \
+ --output $@ \
--gen_x86_h

rasterizer/archrast/gen_ar_event.hpp: rasterizer/codegen/gen_archrast.py 
rasterizer/codegen/templates/gen_ar_event.hpp rasterizer/archrast/events.proto 
rasterizer/codegen/gen_common.py
diff --git a/src/gallium/drivers/swr/SConscript 
b/src/gallium/drivers/swr/SConscript
index ad16162c29..aa4a8e6d55 100644
--- a/src/gallium/drivers/swr/SConscript
+++ b/src/gallium/drivers/swr/SConscript
@@ -51,15 +51,15 @@ swrroot = '#src/gallium/drivers/swr/'
env.CodeGenerate(
target = 'rasterizer/codegen/gen_knobs.cpp',
script = swrroot + 'rasterizer/codegen/gen_knobs.py',
-source = 'rasterizer/codegen/templates/gen_knobs.cpp',
-command = python_cmd + ' $SCRIPT --input $SOURCE --output $TARGET 
--gen_cpp'
+source = '',
+command = python_cmd + ' $SCRIPT --output $TARGET --gen_cpp'
)

env.CodeGenerate(
target = 'rasterizer/codegen/gen_knobs.h',
script = swrroot + 'rasterizer/codegen/gen_knobs.py',
-source = 'rasterizer/codegen/templates/gen_knobs.cpp',
-command = python_cmd + ' $SCRIPT --input $SOURCE --output $TARGET --gen_h'
+source = '',
+command = python_cmd + ' $SCRIPT --output $TARGET --gen_h'
)

env.CodeGenerate(
@@ -73,14 +73,14 @@ env.CodeGenerate(
target = 'rasterizer/jitter/gen_builder.hpp',
script = swrroot + 'rasterizer/codegen/gen_llvm_ir_macros.py',
source = os.path.join(llvm_includedir, 'llvm/IR/IRBuilder.h'),
-command = python_cmd + ' $SCRIPT --input $SOURCE --output 
rasterizer/jitter --gen_h'
+command = python_cmd + ' $SCRIPT --input $SOURCE --output $TARGET --gen_h'
)

env.CodeGenerate(
target = 'rasterizer/jitter/gen_builder_x86.hpp',
script = swrroot + 'rasterizer/codegen/gen_llvm_ir_macros.py',
source = '',
-command = python_cmd + ' $SCRIPT --output rasterizer/jitter --gen_x86_h'
+command = python_cmd + ' $SCRIPT --output $TARGET --gen_x86_h'
)

env.CodeGenerate(
@@ -127,7 +127,8 @@ env.CodeGenerate(
env.CodeGenerate(
target = 'rasterizer/core/gen_BackendPixelRate0.cpp',
script = swrroot + 'rasterizer/codegen/gen_backends.py',
-command = python_cmd + ' $SCRIPT --output rasterizer/core --dim 5 2 3 2 2 
2 --split 0 --cpp'
+source = swrroot + 'rasterizer/codegen/templates/gen_backend.cpp',
+command = python_cmd + ' $SCRIPT --output $TARGET --template $SOURCE --dim 
5 2 3 2 2 2 --split 0 --cpp'
)

# Auto-generated .cpp files (that need to generate object files)
diff --git a/src/gallium/drivers/swr/rasterizer/codegen/gen_backends.py 
b/src/gallium/drivers/swr/rasterizer/codegen/gen_backends.py
index 242ab7a73e..8f7ba94ba1 100644
--- a/src/gallium/drivers/swr/rasterizer/codegen/gen_backends.py
+++ b/src/gallium/drivers/swr/rasterizer/codegen/gen_backends.py
@@ -34,7 +34,10 @@ def

Re: [Mesa-dev] [PATCH v2 10/10] swr: [rasterizer codegen] Fix windows build

2017-03-27 Thread Rowley, Timothy O
On closer review of 10/10, I don’t like the approach taken here.

You’ve added a --template argument to gen_backends.py, making it different from 
the rest of the scripts and actually running it with different parameters on 
automake and scons.  Can’t you get scons to have the necessary dependency using 
its Depends() call?

http://scons.org/doc/production/HTML/scons-user/ch06s05.html
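To illustrate why an explicit dependency matters here: when a generator script opens its template internally, the build system never sees the template as an input, so editing the template does not trigger regeneration. The following is a toy Python model of that idea, not SCons itself. The class `ToyBuild` and its methods are hypothetical names, and the use of content hashes to detect changes is an assumption made for the sketch.

```python
import hashlib


def signature(data: bytes) -> str:
    """Content signature for one file (the hash choice is arbitrary here)."""
    return hashlib.sha256(data).hexdigest()


class ToyBuild:
    """Toy model of explicit dependency tracking. Hypothetical, not SCons."""

    def __init__(self):
        self.deps = {}      # target -> set of dependency names
        self.recorded = {}  # target -> {dependency: signature at last build}

    def depends(self, target, dependency):
        # Declare an input the build system cannot discover by itself.
        self.deps.setdefault(target, set()).add(dependency)

    def _current(self, target, contents):
        return {d: signature(contents[d]) for d in self.deps.get(target, ())}

    def out_of_date(self, target, contents):
        # Never built, or a declared dependency changed since the last build.
        return self.recorded.get(target) != self._current(target, contents)

    def mark_built(self, target, contents):
        self.recorded[target] = self._current(target, contents)
```

With the template declared as a dependency of the generated file, changing the template contents makes `out_of_date` return True again. Without the declaration, the toy build would consider the target current after its first build.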

On Mar 27, 2017, at 7:39 PM, George Kyriazis wrote:

Fix codegen build break that was introduced earlier

v2: update rules for gen_knobs.cpp and gen_knobs.h

---
src/gallium/drivers/swr/Makefile.am|  4 +--
src/gallium/drivers/swr/SConscript | 15 ++-
.../drivers/swr/rasterizer/codegen/gen_backends.py | 30 ++
.../swr/rasterizer/codegen/gen_llvm_ir_macros.py   | 20 +++
4 files changed, 39 insertions(+), 30 deletions(-)

diff --git a/src/gallium/drivers/swr/Makefile.am 
b/src/gallium/drivers/swr/Makefile.am
index 515a9089cc..cc37abf3e8 100644
--- a/src/gallium/drivers/swr/Makefile.am
+++ b/src/gallium/drivers/swr/Makefile.am
@@ -97,14 +97,14 @@ rasterizer/jitter/gen_builder.hpp: 
rasterizer/codegen/gen_llvm_ir_macros.py rast
$(PYTHON_GEN) \
$(srcdir)/rasterizer/codegen/gen_llvm_ir_macros.py \
--input $(LLVM_INCLUDEDIR)/llvm/IR/IRBuilder.h \
- --output rasterizer/jitter \
+ --output $@ \
--gen_h

rasterizer/jitter/gen_builder_x86.hpp: rasterizer/codegen/gen_llvm_ir_macros.py 
rasterizer/codegen/templates/gen_builder.hpp rasterizer/codegen/gen_common.py
$(MKDIR_GEN)
$(PYTHON_GEN) \
$(srcdir)/rasterizer/codegen/gen_llvm_ir_macros.py \
- --output rasterizer/jitter \
+ --output $@ \
--gen_x86_h

rasterizer/archrast/gen_ar_event.hpp: rasterizer/codegen/gen_archrast.py 
rasterizer/codegen/templates/gen_ar_event.hpp rasterizer/archrast/events.proto 
rasterizer/codegen/gen_common.py
diff --git a/src/gallium/drivers/swr/SConscript 
b/src/gallium/drivers/swr/SConscript
index ad16162c29..aa4a8e6d55 100644
--- a/src/gallium/drivers/swr/SConscript
+++ b/src/gallium/drivers/swr/SConscript
@@ -51,15 +51,15 @@ swrroot = '#src/gallium/drivers/swr/'
env.CodeGenerate(
target = 'rasterizer/codegen/gen_knobs.cpp',
script = swrroot + 'rasterizer/codegen/gen_knobs.py',
-source = 'rasterizer/codegen/templates/gen_knobs.cpp',
-command = python_cmd + ' $SCRIPT --input $SOURCE --output $TARGET 
--gen_cpp'
+source = '',
+command = python_cmd + ' $SCRIPT --output $TARGET --gen_cpp'
)

env.CodeGenerate(
target = 'rasterizer/codegen/gen_knobs.h',
script = swrroot + 'rasterizer/codegen/gen_knobs.py',
-source = 'rasterizer/codegen/templates/gen_knobs.cpp',
-command = python_cmd + ' $SCRIPT --input $SOURCE --output $TARGET --gen_h'
+source = '',
+command = python_cmd + ' $SCRIPT --output $TARGET --gen_h'
)

env.CodeGenerate(
@@ -73,14 +73,14 @@ env.CodeGenerate(
target = 'rasterizer/jitter/gen_builder.hpp',
script = swrroot + 'rasterizer/codegen/gen_llvm_ir_macros.py',
source = os.path.join(llvm_includedir, 'llvm/IR/IRBuilder.h'),
-command = python_cmd + ' $SCRIPT --input $SOURCE --output 
rasterizer/jitter --gen_h'
+command = python_cmd + ' $SCRIPT --input $SOURCE --output $TARGET --gen_h'
)

env.CodeGenerate(
target = 'rasterizer/jitter/gen_builder_x86.hpp',
script = swrroot + 'rasterizer/codegen/gen_llvm_ir_macros.py',
source = '',
-command = python_cmd + ' $SCRIPT --output rasterizer/jitter --gen_x86_h'
+command = python_cmd + ' $SCRIPT --output $TARGET --gen_x86_h'
)

env.CodeGenerate(
@@ -127,7 +127,8 @@ env.CodeGenerate(
env.CodeGenerate(
target = 'rasterizer/core/gen_BackendPixelRate0.cpp',
script = swrroot + 'rasterizer/codegen/gen_backends.py',
-command = python_cmd + ' $SCRIPT --output rasterizer/core --dim 5 2 3 2 2 
2 --split 0 --cpp'
+source = swrroot + 'rasterizer/codegen/templates/gen_backend.cpp',
+command = python_cmd + ' $SCRIPT --output $TARGET --template $SOURCE --dim 
5 2 3 2 2 2 --split 0 --cpp'
)

# Auto-generated .cpp files (that need to generate object files)
diff --git a/src/gallium/drivers/swr/rasterizer/codegen/gen_backends.py 
b/src/gallium/drivers/swr/rasterizer/codegen/gen_backends.py
index 242ab7a73e..8f7ba94ba1 100644
--- a/src/gallium/drivers/swr/rasterizer/codegen/gen_backends.py
+++ b/src/gallium/drivers/swr/rasterizer/codegen/gen_backends.py
@@ -34,7 +34,10 @@ def main(args=sys.argv[1:]):
parser = ArgumentParser("Generate files and initialization functions for 
all permutuations of BackendPixelRate.")
parser.add_argument('--dim', help="gBackendPixelRateTable array 
dimensions", nargs='+', type=int, required=True)
parser.add_argument('--outdir', help="output directory", nargs='?', 
type=str, default=thisDir)
+parser.add_argument('--output', help="output filename", nargs='?', 
type=str)
+parser.add_argument('--template', help="input template", nargs='?', 
type=str)

Re: [Mesa-dev] [PATCH 01/10] swr: [rasterizer codegen] Refactor codegen

2017-03-27 Thread Rowley, Timothy O

On Mar 27, 2017, at 5:06 AM, Emil Velikov wrote:

On 25 March 2017 at 12:00, Tim Rowley wrote:
Move common codegen functions into gen_common.py.
---
src/gallium/drivers/swr/Makefile.am|  22 +--
.../drivers/swr/rasterizer/codegen/gen_archrast.py |  30 +---
.../drivers/swr/rasterizer/codegen/gen_backends.py |  30 +---
.../drivers/swr/rasterizer/codegen/gen_common.py   | 162 +
.../drivers/swr/rasterizer/codegen/gen_knobs.py|  55 +++
.../swr/rasterizer/codegen/gen_llvm_ir_macros.py   |  35 +
.../swr/rasterizer/codegen/gen_llvm_types.py   |  32 +---
7 files changed, 212 insertions(+), 154 deletions(-)
create mode 100644 src/gallium/drivers/swr/rasterizer/codegen/gen_common.py

diff --git a/src/gallium/drivers/swr/Makefile.am 
b/src/gallium/drivers/swr/Makefile.am
index 8ba9ac9..3a0d8da 100644
--- a/src/gallium/drivers/swr/Makefile.am
+++ b/src/gallium/drivers/swr/Makefile.am
@@ -71,30 +71,30 @@ gen_swr_context_llvm.h: 
rasterizer/codegen/gen_llvm_types.py rasterizer/codegen/
   --input $(srcdir)/swr_context.h \
   --output ./gen_swr_context_llvm.h

-rasterizer/codegen/gen_knobs.cpp: rasterizer/codegen/gen_knobs.py 
rasterizer/codegen/knob_defs.py rasterizer/codegen/templates/gen_knobs.cpp
+rasterizer/codegen/gen_knobs.cpp: rasterizer/codegen/gen_knobs.py 
rasterizer/codegen/knob_defs.py rasterizer/codegen/templates/gen_knobs.cpp 
rasterizer/codegen/gen_common.py
   $(MKDIR_GEN)
   $(PYTHON_GEN) \
   $(srcdir)/rasterizer/codegen/gen_knobs.py \
-   --input $(srcdir)/rasterizer/codegen/templates/gen_knobs.cpp \
+   --input $(realpath 
$(srcdir)/rasterizer/codegen/templates/gen_knobs.cpp) \
Do we need this? There are no changes to the scons build.
If so, abs_srcdir will give you the full path.

Don’t need it if gen_knobs.py is modified to find the template file internally, 
like the rest of our gen* scripts do.

v2 patch coming up shortly does that.
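As a rough sketch of what "find the template file internally" can look like: the script derives the template location from its own path, so neither build system has to pass it on the command line. The helper name `template_path` is hypothetical. The `templates/` subdirectory next to the script matches the layout shown in the Makefile.am rule above, but the actual gen_knobs.py code may differ.

```python
import os


def template_path(script_file, template_name):
    """Return the path of a template stored in a 'templates' directory
    next to the given script file (hypothetical helper)."""
    script_dir = os.path.dirname(os.path.abspath(script_file))
    return os.path.join(script_dir, "templates", template_name)


if __name__ == "__main__":
    # Inside a generator script one would typically pass __file__.
    print(template_path(__file__, "gen_knobs.cpp"))
```

Because the path is computed from the script location, the same invocation works regardless of the current working directory of the build.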

-Tim



Re: [Mesa-dev] [PATCH 00/32] update swr rasterizer

2017-03-20 Thread Rowley, Timothy O

> On Mar 20, 2017, at 1:42 PM, Emil Velikov  wrote:
> 
> On 16 March 2017 at 19:09, Tim Rowley  wrote:
>> Highlights include: lots of simd16 work, assert rework, and autogen
>> changes (scripts centralized, one file added, two removed).
>> 
>> v2:
>> * adjust scons build system along with automake
>> * separate backend template comment style change
>> * only one declaration of gBackendPixelRateTable
>> * prefix template and generated files with gen_
>> 
> Haven't looked at v2 of the patches, but I wanted to say a big thank
> you for addressing these !

Not sure why acting on comments seems unusual - what you were asking for was 
certainly reasonable.

(By the way, haven’t forgotten the swr format tables - it’s on my list)

-Tim

> -Emil


Re: [Mesa-dev] [PATCH 00/28] update swr rasterizer

2017-03-16 Thread Rowley, Timothy O

On Mar 15, 2017, at 9:59 PM, Emil Velikov wrote:

Hi Tim,

On 16 March 2017 at 00:12, Tim Rowley wrote:
Highlights include: lots of simd16 work, assert rework, and autogen
changes (scripts centralized, one file added, two removed).

Tim Rowley (28):
 swr: [rasterizer core] Support sparse numa id values on all OSes
 swr: [rasterizer core] Finish SIMD16 PA OPT except tesselation
 swr: [rasterizer core] Finish SIMD16 PA OPT including tesselation
 swr: [rasterizer core/scripts] Autogen backend initialization
   function(s)
 swr: [rasterizer archrast] Add additional API events
 swr: [rasterizer core] Implement SIMD16 GS and STREAMOUT
 swr: [rasterizer archrast] Fix performance issue with archrast stats
 swr: [rasterizer common] Add InterpolateComponentFlat utility
 swr: [rasterizer core] Fix RECT_LIST primitive assembly
 swr: [rasterizer archrast/scripts] Further archrast cleanups
 swr: [rasterizer archrast] Remove redundant data from archrast files
 swr: [rasterizer archrast/core/scripts] Fix archrast multithreading
   issue
 swr: [rasterizer core] Implement double pumped SIMD16 TESS
 swr: [rasterizer archrast] Fix the early and late depthstencil events
 swr: [rasterizer] Backend code adjustments
 swr: [rasterizer] Slight assert refactoring
 swr: [rasterizer core] Allow no arguments to SWR_INVALID macro
 swr: [rasterizer core/common] Fix the native AVX512 build under ICC
 swr: [rasterizer core] Fix typo in SIMD16 code path
 swr: [rasterizer] Convert more SWR_ASSERT(false, ...) to
   SWR_INVALID(...)
 swr: [rasterizer jitter] Fix LogicOp blend jit after assert changes
 swr: [rasterizer core] SIMD16 Frontend WIP - fix tesselation crashes
 swr: [rasterizer core] fix trifan regression from 52f9f54dce
 swr: [rasterizer scripts] Put codegen scripts into a separate
   directory
 swr: [rasterizer codegen] Quiet gen_backends.py execution
 swr: [rasterizer codegen] Rewrite gen_llvm_ir_macros.py to use mako
 swr: [rasterizer codegen] Fix generation of knobs
 swr: [rasterizer codegen] Rewrite gen_llvm_types.py to use mako

Above all - thanks for omitting the execute bit/shebang for the new
python script !

A few high-level comments/suggestions.

v2 of the patch set is upcoming, with the issues below addressed as follows.

- seems like this series will break the scons build
You really do _not_ want to do that, please ensure that things build
[always] with both scons and autoconf.

Ah, I keep forgetting about the second build system.  I’ve gone back and added 
the scons changes.

- there's at least one patch that mixes how licence text is commented
alongside other changes
Please keep that to a separate patch ?

Separated out the comment style change patch.

- some variables (gBackendPixelRateTable comes to mind) are declared
in various places, and during the series not all instances are updated
at once
Just move them to a header, in a prep patch ?

Moved declaration of gBackendPixelRateTable to backend.h in a patch prior to 
the autogen and other changes.

- renaming actual mako templates to .cpp/.hpp is confusing/misleading
Worth leaving as-is and/or checking how other generators in-tree name
such files ?

The idea behind using a cpp/hpp extension was that the editor would better 
automatically handle the template files during editing.  In the interest of 
consistency, a commit has been added to clean up the naming so all template and 
generated files are now prefixed with gen_.


I would strongly urge you to address the first issue, but at the end
of the day it's wishful thinking.

Thanks
Emil


Re: [Mesa-dev] [PATCH 1/4] configure.ac: increase required swr llvm to 3.9.0

2017-03-03 Thread Rowley, Timothy O

> On Mar 3, 2017, at 5:55 AM, Emil Velikov  wrote:
> 
> On 3 March 2017 at 01:16, Tim Rowley  wrote:
>> GS implementation uses the masked.{gather,store} intrinsics,
>> introduced in llvm-3.9.0.
> 
> Please mention in the commit message that the SCons build already
> requires 3.9 or later.
> Can you add a note about the LLVM requirement and GS support in
> docs/relnotes/17.1.0.html, with a separate commit on top ?

Both of these are in v2 of the patch set.

> With this we have some ~20 preprocessor conditionals which want to be
> cleaned up. Look for
> $ git grep  "LLVM_.*VERSION\|HAVE_LLVM" -- src/gallium/drivers/swr/

Ah, good catch.  We’ve been ratcheting up our required llvm version without 
cleaning out some of the cruft.  Internally we’re still using 3.8 so not all of 
these can be removed.  I’ll work on that in a follow-up patch, as it’s 
unrelated to the geometry shader implementation.
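For anyone doing that cleanup, here is a small Python sketch that applies the same alternation as the git grep pattern quoted above to a list of source lines. The function name is hypothetical, and unlike the grep it only reports preprocessor lines (those starting with `#`), which is an extra filter added for this sketch.

```python
import re

# Same alternation as the quoted git grep pattern, in Python regex syntax.
PATTERN = re.compile(r"LLVM_.*VERSION|HAVE_LLVM")


def find_version_conditionals(lines):
    """Return (line_number, text) for preprocessor lines mentioning an
    LLVM version macro. Line numbers start at 1."""
    hits = []
    for number, line in enumerate(lines, 1):
        if line.lstrip().startswith("#") and PATTERN.search(line):
            hits.append((number, line.rstrip()))
    return hits
```

Each hit still needs a manual look, since a conditional may guard a version that is still in use.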

-Tim


Re: [Mesa-dev] [PATCH] swr: Fix crash in swr_update_derived following st/mesa state changes.

2017-03-02 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley

On Mar 1, 2017, at 10:58 PM, Bruce Cherniak wrote:

Recent change to st/mesa state update logic caused major regressions to
swr validation code.

swr uses the same validation logic (swr_update_derived) for both draw
and Clear calls.  New st/mesa state update logic results in certain state
objects not being set/bound during Clear.  This was causing null ptr
exceptions.  Creation of static dummy state objects allows setting these
pointers during Clear validation, without interfering with relevant state
validation.

Once fixed, new logic also highlighted an error in dirty bit checking for
fragment shader and clip validation.

(The alternative is to have a simplified validation routine for Clear.
We may do that at some point.)
---
src/gallium/drivers/swr/swr_shader.cpp |  6 +
src/gallium/drivers/swr/swr_state.cpp  | 43 +++---
2 files changed, 46 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/swr/swr_shader.cpp 
b/src/gallium/drivers/swr/swr_shader.cpp
index 676938c..9169f6d 100644
--- a/src/gallium/drivers/swr/swr_shader.cpp
+++ b/src/gallium/drivers/swr/swr_shader.cpp
@@ -366,6 +366,9 @@ BuilderSWR::CompileVS(struct swr_context *ctx, 
swr_jit_vs_key &key)
PFN_VERTEX_FUNC
swr_compile_vs(struct swr_context *ctx, swr_jit_vs_key &key)
{
+   if (!ctx->vs->pipe.tokens)
+  return NULL;
+
   BuilderSWR builder(
  reinterpret_cast<JitManager *>(swr_screen(ctx->pipe.screen)->hJitMgr),
  "VS");
@@ -726,6 +729,9 @@ BuilderSWR::CompileFS(struct swr_context *ctx, 
swr_jit_fs_key &key)
PFN_PIXEL_KERNEL
swr_compile_fs(struct swr_context *ctx, swr_jit_fs_key &key)
{
+   if (!ctx->fs->pipe.tokens)
+  return NULL;
+
   BuilderSWR builder(
  reinterpret_cast<JitManager *>(swr_screen(ctx->pipe.screen)->hJitMgr),
  "FS");
diff --git a/src/gallium/drivers/swr/swr_state.cpp 
b/src/gallium/drivers/swr/swr_state.cpp
index 5e3d58d..e1f1734 100644
--- a/src/gallium/drivers/swr/swr_state.cpp
+++ b/src/gallium/drivers/swr/swr_state.cpp
@@ -914,6 +914,39 @@ swr_update_derived(struct pipe_context *pipe,
   struct swr_context *ctx = swr_context(pipe);
   struct swr_screen *screen = swr_screen(pipe->screen);

+   /* When called from swr_clear (p_draw_info = null), set any null
+* state-objects to the dummy state objects to prevent nullptr dereference
+* in validation below.
+*
+* Important that this remains static for zero initialization.  These
+* aren't meant to be proper state objects, just empty structs. They will
+* not be written to.
+*
+* Shaders can't be part of the union since they contain std::unordered_map
+*/
+   static struct {
+  union {
+ struct pipe_rasterizer_state rasterizer;
+ struct pipe_depth_stencil_alpha_state depth_stencil;
+ struct swr_blend_state blend;
+  } state;
+  struct swr_vertex_shader vs;
+  struct swr_fragment_shader fs;
+   } swr_dummy;
+
+   if (!p_draw_info) {
+  if (!ctx->rasterizer)
+ ctx->rasterizer = &swr_dummy.state.rasterizer;
+  if (!ctx->depth_stencil)
+ ctx->depth_stencil = &swr_dummy.state.depth_stencil;
+  if (!ctx->blend)
+ ctx->blend = &swr_dummy.state.blend;
+  if (!ctx->vs)
+ ctx->vs = &swr_dummy.vs;
+  if (!ctx->fs)
+ ctx->fs = &swr_dummy.fs;
+   }
+
   /* Update screen->pipe to current pipe context. */
   if (screen->pipe != pipe)
  screen->pipe = pipe;
@@ -1236,8 +1269,12 @@ swr_update_derived(struct pipe_context *pipe,
   }

   /* FragmentShader */
-   if (ctx->dirty & (SWR_NEW_FS | SWR_NEW_SAMPLER | SWR_NEW_SAMPLER_VIEW
- | SWR_NEW_RASTERIZER | SWR_NEW_FRAMEBUFFER)) {
+   if (ctx->dirty & (SWR_NEW_FS |
+ SWR_NEW_VS |
+ SWR_NEW_RASTERIZER |
+ SWR_NEW_SAMPLER |
+ SWR_NEW_SAMPLER_VIEW |
+ SWR_NEW_FRAMEBUFFER)) {
  swr_jit_fs_key key;
  swr_generate_fs_key(key, ctx, ctx->fs);
  auto search = ctx->fs->map.find(key);
@@ -1505,7 +1542,7 @@ swr_update_derived(struct pipe_context *pipe,
  }
   }

-   if (ctx->dirty & SWR_NEW_CLIP) {
+   if (ctx->dirty & (SWR_NEW_CLIP | SWR_NEW_RASTERIZER | SWR_NEW_VS)) {
  // shader exporting clip distances overrides all user clip planes
  if (ctx->rasterizer->clip_plane_enable &&
  !ctx->vs->info.base.num_written_clipdistance)
--
2.7.4
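
The dirty-bit part of the patch above can be illustrated with a small Python sketch. The flag names mirror the SWR_NEW_* bits in the diff, but the numeric values and the helper `clip_needs_validation` are hypothetical. The point is only that a validation step must test every dirty bit whose state it reads, not just its own.

```python
from enum import IntFlag


class Dirty(IntFlag):
    # Hypothetical bit values; only the names echo the SWR_NEW_* flags.
    VS = 1
    FS = 2
    RASTERIZER = 4
    CLIP = 8


# Clip validation reads rasterizer state (clip_plane_enable) and vertex
# shader info (num_written_clipdistance), so all three bits are inputs.
CLIP_INPUTS = Dirty.CLIP | Dirty.RASTERIZER | Dirty.VS


def clip_needs_validation(dirty):
    """True if any state consumed by clip validation is marked dirty."""
    return bool(dirty & CLIP_INPUTS)
```

With the narrower check (testing only the CLIP bit), a change to the rasterizer state alone would skip clip validation. With the widened mask it does not.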


Re: [Mesa-dev] [PATCH] docs: update features.txt for GL_ARB_clear_texture with swr

2017-03-02 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley

On Feb 25, 2017, at 9:17 PM, Bruce Cherniak wrote:

---
docs/features.txt | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/features.txt b/docs/features.txt
index d9528e9..c42581a 100644
--- a/docs/features.txt
+++ b/docs/features.txt
@@ -192,7 +192,7 @@ GL 4.4, GLSL 4.40 -- all DONE: i965/gen8+, nvc0, radeonsi

  GL_MAX_VERTEX_ATTRIB_STRIDE   DONE (all drivers)
  GL_ARB_buffer_storage DONE (i965, nv50, r600)
-  GL_ARB_clear_texture  DONE (i965, nv50, 
r600, llvmpipe, softpipe)
+  GL_ARB_clear_texture  DONE (i965, nv50, 
r600, llvmpipe, softpipe, swr)
  GL_ARB_enhanced_layouts   DONE (i965, nv50, 
llvmpipe, softpipe)
  - compile-time constant expressions   DONE
  - explicit byte offsets for blocksDONE
--
2.7.4


Re: [Mesa-dev] [PATCH] swr: [rasterizer common/core/jitter] fetch support for GL_FIXED

2017-03-01 Thread Rowley, Timothy O

> On Feb 9, 2017, at 8:50 AM, Emil Velikov  wrote:
> 
> On 7 December 2016 at 23:58, Tim Rowley  wrote:
>> ---
>> .../drivers/swr/rasterizer/common/formats.cpp  | 104 
>> ++---
>> .../drivers/swr/rasterizer/common/formats.h|   7 +-
>> .../drivers/swr/rasterizer/core/format_traits.h|  90 +-
> Tim, these three seems to be auto-generated from somewhere, yet the
> scripts are missing.
> Can we merge the script(s) and drop the files from git ?

Sorry, I’ve been meaning to get back to this mail sooner.

There’s a bit of work to allow these files to be autogenerated in the mesa 
tree, but I’m looking to get that done soon after finishing a couple large 
patch sets.

-Tim


Re: [Mesa-dev] [PATCH] swr: [rasterizer core] Removed unused clip code.

2017-02-06 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley

On Feb 3, 2017, at 11:35 AM, Bruce Cherniak wrote:

Removed unused Clip() and FRUSTUM_CLIP_MASK define.
---
src/gallium/drivers/swr/rasterizer/core/clip.cpp | 22 --
src/gallium/drivers/swr/rasterizer/core/clip.h   |  4 
2 files changed, 26 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/core/clip.cpp 
b/src/gallium/drivers/swr/rasterizer/core/clip.cpp
index 7b1e09d..0a6afe5 100644
--- a/src/gallium/drivers/swr/rasterizer/core/clip.cpp
+++ b/src/gallium/drivers/swr/rasterizer/core/clip.cpp
@@ -157,28 +157,6 @@ int ClipTriToPlane( const float *pInPts, int numInPts,
return i;
}

-
-
-void Clip(const float *pTriangle, const float *pAttribs, int numAttribs, float 
*pOutTriangles, int *numVerts, float *pOutAttribs)
-{
-// temp storage to hold at least 6 sets of vertices, the max number that 
can be created during clipping
-OSALIGNSIMD(float) tempPts[6 * 4];
-OSALIGNSIMD(float) tempAttribs[6 * KNOB_NUM_ATTRIBUTES * 4];
-
-// we opt to clip to viewport frustum to produce smaller triangles for 
rasterization precision
-int NumOutPts = ClipTriToPlane(pTriangle, 3, pAttribs, 
numAttribs, tempPts, tempAttribs);
-NumOutPts = ClipTriToPlane(tempPts, NumOutPts, tempAttribs, 
numAttribs, pOutTriangles, pOutAttribs);
-NumOutPts = ClipTriToPlane(pOutTriangles, NumOutPts, 
pOutAttribs, numAttribs, tempPts, tempAttribs);
-NumOutPts = ClipTriToPlane(tempPts, NumOutPts, tempAttribs, 
numAttribs, pOutTriangles, pOutAttribs);
-NumOutPts = ClipTriToPlane(pOutTriangles, NumOutPts, 
pOutAttribs, numAttribs, tempPts, tempAttribs);
-NumOutPts = ClipTriToPlane(tempPts, NumOutPts, tempAttribs, 
numAttribs, pOutTriangles, pOutAttribs);
-
-SWR_ASSERT(NumOutPts <= 6);
-
-*numVerts = NumOutPts;
-return;
-}
-
void ClipTriangles(DRAW_CONTEXT *pDC, PA_STATE& pa, uint32_t workerId, 
simdvector prims[], uint32_t primMask, simdscalari primId, simdscalari 
viewportIdx)
{
SWR_CONTEXT *pContext = pDC->pContext;
diff --git a/src/gallium/drivers/swr/rasterizer/core/clip.h 
b/src/gallium/drivers/swr/rasterizer/core/clip.h
index f19858f..23a768f 100644
--- a/src/gallium/drivers/swr/rasterizer/core/clip.h
+++ b/src/gallium/drivers/swr/rasterizer/core/clip.h
@@ -56,12 +56,8 @@ enum SWR_CLIPCODES
GUARDBAND_BOTTOM = (0x80 << CLIPCODE_SHIFT | 0x8)
};

-#define FRUSTUM_CLIP_MASK 
(FRUSTUM_LEFT|FRUSTUM_TOP|FRUSTUM_RIGHT|FRUSTUM_BOTTOM|FRUSTUM_NEAR|FRUSTUM_FAR)
#define GUARDBAND_CLIP_MASK 
(FRUSTUM_NEAR|FRUSTUM_FAR|GUARDBAND_LEFT|GUARDBAND_TOP|GUARDBAND_RIGHT|GUARDBAND_BOTTOM|NEGW)

-void Clip(const float *pTriangle, const float *pAttribs, int numAttribs, float 
*pOutTriangles,
-  int *numVerts, float *pOutAttribs);
-
INLINE
void ComputeClipCodes(const API_STATE& state, const simdvector& vertex, 
simdscalar& clipCodes, simdscalari viewportIndexes)
{
--
2.7.4


Re: [Mesa-dev] [Mesa-stable] [PATCH] swr: [rasterizer core] Remove dead code Clipper::ClipScalar()

2017-02-06 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley

On Feb 4, 2017, at 5:55 PM, Vinson Lee wrote:

Tested-by: Vinson Lee

On Thu, Feb 2, 2017 at 12:42 PM, Cherniak, Bruce wrote:
I followed up with a v2 that includes the bugzilla reference.

Good point, I’ll look into following up with a patch to remove Clip().

Thanks for the quick review.

On Feb 2, 2017, at 2:26 PM, Ilia Mirkin wrote:

Reviewed-by: Ilia Mirkin

I got confused by this code as well when I was trying to understand
the clipper. I think the Clip() function can go too now in the .cpp
file (as well as the fwd decl in the header)?

On Thu, Feb 2, 2017 at 3:15 PM, Bruce Cherniak wrote:
Clipper::ClipScalar() is dead code and should be removed.  It is causing
an error with gcc-7 because it references a now defunct member.

CC: "13.0 17.0"
---
src/gallium/drivers/swr/rasterizer/core/clip.h | 39 --
1 file changed, 39 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/core/clip.h 
b/src/gallium/drivers/swr/rasterizer/core/clip.h
index 085e4a9..f19858f 100644
--- a/src/gallium/drivers/swr/rasterizer/core/clip.h
+++ b/src/gallium/drivers/swr/rasterizer/core/clip.h
@@ -262,45 +262,6 @@ public:
   return _simd_movemask_ps(vClipCullMask);
   }

-// clip a single primitive
-int ClipScalar(PA_STATE& pa, uint32_t primIndex, float* pOutPos, float* 
pOutAttribs)
-{
-OSALIGNSIMD(float) inVerts[3 * 4];
-OSALIGNSIMD(float) inAttribs[3 * KNOB_NUM_ATTRIBUTES * 4];
-
-// transpose primitive position
-__m128 verts[3];
-pa.AssembleSingle(VERTEX_POSITION_SLOT, primIndex, verts);
-_mm_store_ps(&inVerts[0], verts[0]);
-_mm_store_ps(&inVerts[4], verts[1]);
-_mm_store_ps(&inVerts[8], verts[2]);
-
-// transpose attribs
-uint32_t numScalarAttribs = this->state.linkageCount * 4;
-
-int idx = 0;
-DWORD slot = 0;
-uint32_t mapIdx = 0;
-uint32_t tmpLinkage = uint32_t(this->state.linkageMask);
-while (_BitScanForward(&slot, tmpLinkage))
-{
-tmpLinkage &= ~(1 << slot);
-// Compute absolute attrib slot in vertex array
-uint32_t inputSlot = VERTEX_ATTRIB_START_SLOT + 
this->state.linkageMap[mapIdx++];
-__m128 attrib[3];// triangle attribs (always 4 wide)
-pa.AssembleSingle(inputSlot, primIndex, attrib);
-_mm_store_ps(&inAttribs[idx], attrib[0]);
-_mm_store_ps(&inAttribs[idx + numScalarAttribs], attrib[1]);
-_mm_store_ps(&inAttribs[idx + numScalarAttribs * 2], attrib[2]);
-idx += 4;
-}
-
-int numVerts;
-Clip(inVerts, inAttribs, numScalarAttribs, pOutPos, &numVerts, 
pOutAttribs);
-
-return numVerts;
-}
-
   // clip SIMD primitives
   void ClipSimd(const simdscalar& vPrimMask, const simdscalar& vClipMask, 
PA_STATE& pa, const simdscalari& vPrimId, const simdscalari& vViewportIdx)
   {
--
2.7.4


Re: [Mesa-dev] [PATCH] swr: Fix BugID 9919 compile error (icc-only).

2016-12-22 Thread Rowley, Timothy O
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99119
Reviewed-by: Tim Rowley

On Dec 22, 2016, at 6:06 PM, Bruce Cherniak wrote:

ICC doesn't like the use of nullptr (std::nullptr_t) argument in
p_atomic_set.  GCC and clang don't complain.
---
src/gallium/drivers/swr/swr_fence_work.cpp | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/swr/swr_fence_work.cpp 
b/src/gallium/drivers/swr/swr_fence_work.cpp
index 3f83e61..1fd2a83 100644
--- a/src/gallium/drivers/swr/swr_fence_work.cpp
+++ b/src/gallium/drivers/swr/swr_fence_work.cpp
@@ -39,7 +39,7 @@ swr_fence_do_work(struct swr_fence *fence)
  work = fence->work.head.next;
  /* Immediately clear the head so any new work gets added to a new work
   * queue */
-  p_atomic_set(&fence->work.head.next, nullptr);
+  p_atomic_set(&fence->work.head.next, 0);
  p_atomic_set(&fence->work.tail, &fence->work.head);
  p_atomic_set(&fence->work.count, 0);

--
2.7.4


Re: [Mesa-dev] [PATCH v2 2/2] swr: supply proper clip distances to point sprites

2016-12-08 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley

On Dec 8, 2016, at 8:21 PM, Ilia Mirkin wrote:

Large points become pairs of triangles when rasterized, so we must feed
it three clip distances, one for each vertex.

The clip distance is not subject to sprite coord replacement, so there's
no interpolation of it. We just take its value and put it in the "z"
component of the barycentric-ready plane equation.

(We could also just cull it at an earlier point in time, but that would
require larger changes.)

Signed-off-by: Ilia Mirkin
---
src/gallium/drivers/swr/rasterizer/core/binner.cpp | 12 +---
1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/core/binner.cpp 
b/src/gallium/drivers/swr/rasterizer/core/binner.cpp
index 1538020..d5f2e97 100644
--- a/src/gallium/drivers/swr/rasterizer/core/binner.cpp
+++ b/src/gallium/drivers/swr/rasterizer/core/binner.cpp
@@ -1185,9 +1185,15 @@ void BinPoints(
if (rastState.clipDistanceMask)
{
uint32_t numClipDist = 
_mm_popcnt_u32(rastState.clipDistanceMask);
-float one[2] = {1.0f, 1.0f};
-desc.pUserClipBuffer = (float*)pArena->Alloc(numClipDist * 2 * 
sizeof(float));
-ProcessUserClipDist<2>(pa, primIndex, 
rastState.clipDistanceMask, one, desc.pUserClipBuffer);
+desc.pUserClipBuffer = (float*)pArena->Alloc(numClipDist * 3 * 
sizeof(float));
+float dists[8];
+float one = 1.0f;
+ProcessUserClipDist<1>(pa, primIndex, 
rastState.clipDistanceMask, &one, dists);
+for (uint32_t i = 0; i < numClipDist; i++) {
+desc.pUserClipBuffer[3*i + 0] = 0.0f;
+desc.pUserClipBuffer[3*i + 1] = 0.0f;
+desc.pUserClipBuffer[3*i + 2] = dists[i];
+}
}

MacroTileMgr *pTileMgr = pDC->pTileMgr;
--
2.7.3
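
The point-sprite change above can be pictured with a small sketch. It assumes the backend evaluates each clip distance as a plane equation of the form a*i + b*j + c over the barycentric coordinates (i, j), which is what the "z" component wording in the commit message suggests. PackPointClipPlanes and EvalPlane are hypothetical names used only for illustration; they are not swr functions.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: store one plane equation (a, b, c) per enabled clip
// distance. For a point, a and b are zero, so only the constant term is used.
std::vector<float> PackPointClipPlanes(const float* dists, std::uint32_t numClipDist)
{
    std::vector<float> buf(numClipDist * 3);
    for (std::uint32_t i = 0; i < numClipDist; ++i) {
        buf[3 * i + 0] = 0.0f;     // no variation along the first barycentric
        buf[3 * i + 1] = 0.0f;     // no variation along the second barycentric
        buf[3 * i + 2] = dists[i]; // constant clip distance of the point
    }
    return buf;
}

// Assumed evaluation form: a*i + b*j + c.
float EvalPlane(const float* plane, float i, float j)
{
    return plane[0] * i + plane[1] * j + plane[2];
}
```

Whatever barycentrics the two triangles of the sprite produce, the evaluated distance stays equal to the point's own value, so the whole sprite is kept or clipped together.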


Re: [Mesa-dev] [PATCH v2 1/2] swr: perform perspective division on clip distances

2016-12-08 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley

On Dec 8, 2016, at 8:21 PM, Ilia Mirkin wrote:

Clip distances need to be perspective-divided. This fixes all the
interpolation-*-{distance,vertex} piglits.

Signed-off-by: Ilia Mirkin
---
src/gallium/drivers/swr/rasterizer/core/binner.cpp | 14 --
1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/core/binner.cpp 
b/src/gallium/drivers/swr/rasterizer/core/binner.cpp
index 6f9259f..1538020 100644
--- a/src/gallium/drivers/swr/rasterizer/core/binner.cpp
+++ b/src/gallium/drivers/swr/rasterizer/core/binner.cpp
@@ -383,7 +383,7 @@ PFN_PROCESS_ATTRIBUTES GetProcessAttributesFunc(uint32_t 
NumVerts, bool IsSwizzl
/// @param clipDistMask - mask of enabled clip distances
/// @param pUserClipBuffer - buffer to store results
template
-void ProcessUserClipDist(PA_STATE& pa, uint32_t primIndex, uint8_t 
clipDistMask, float* pUserClipBuffer)
+void ProcessUserClipDist(PA_STATE& pa, uint32_t primIndex, uint8_t 
clipDistMask, float *pRecipW, float* pUserClipBuffer)
{
DWORD clipDist;
while (_BitScanForward(&clipDist, clipDistMask))
@@ -407,11 +407,12 @@ void ProcessUserClipDist(PA_STATE& pa, uint32_t 
primIndex, uint8_t clipDistMask,

// setup plane equations for barycentric interpolation in the backend
float baryCoeff[NumVerts];
+float last = vertClipDist[NumVerts - 1] * pRecipW[NumVerts - 1];
for (uint32_t e = 0; e < NumVerts - 1; ++e)
{
-baryCoeff[e] = vertClipDist[e] - vertClipDist[NumVerts - 1];
+baryCoeff[e] = vertClipDist[e] * pRecipW[e] - last;
}
-baryCoeff[NumVerts - 1] = vertClipDist[NumVerts - 1];
+baryCoeff[NumVerts - 1] = last;

for (uint32_t e = 0; e < NumVerts; ++e)
{
@@ -834,7 +835,7 @@ endBinTriangles:
{
uint32_t numClipDist = _mm_popcnt_u32(rastState.clipDistanceMask);
desc.pUserClipBuffer = (float*)pArena->Alloc(numClipDist * 3 * 
sizeof(float));
-ProcessUserClipDist<3>(pa, triIndex, rastState.clipDistanceMask, 
desc.pUserClipBuffer);
+ProcessUserClipDist<3>(pa, triIndex, rastState.clipDistanceMask, 
[12], desc.pUserClipBuffer);
}

for (uint32_t y = aMTTop[triIndex]; y <= aMTBottom[triIndex]; ++y)
@@ -1184,8 +1185,9 @@ void BinPoints(
if (rastState.clipDistanceMask)
{
uint32_t numClipDist = 
_mm_popcnt_u32(rastState.clipDistanceMask);
+float one[2] = {1.0f, 1.0f};
desc.pUserClipBuffer = (float*)pArena->Alloc(numClipDist * 2 * 
sizeof(float));
-ProcessUserClipDist<2>(pa, primIndex, 
rastState.clipDistanceMask, desc.pUserClipBuffer);
+ProcessUserClipDist<2>(pa, primIndex, 
rastState.clipDistanceMask, one, desc.pUserClipBuffer);
}

MacroTileMgr *pTileMgr = pDC->pTileMgr;
@@ -1396,7 +1398,7 @@ void BinPostSetupLines(
{
uint32_t numClipDist = _mm_popcnt_u32(rastState.clipDistanceMask);
desc.pUserClipBuffer = (float*)pArena->Alloc(numClipDist * 2 * 
sizeof(float));
-ProcessUserClipDist<2>(pa, primIndex, rastState.clipDistanceMask, 
desc.pUserClipBuffer);
+ProcessUserClipDist<2>(pa, primIndex, rastState.clipDistanceMask, 
[12], desc.pUserClipBuffer);
}

MacroTileMgr *pTileMgr = pDC->pTileMgr;
--
2.7.3
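
As a rough illustration of the ProcessUserClipDist hunk above, here is a standalone sketch of the coefficient setup with the perspective divide applied. BaryCoeffs is a hypothetical name and std::array stands in for the raw float arrays in the patch; only the arithmetic is taken from the diff.

```cpp
#include <array>
#include <cstddef>

// Hypothetical standalone helper (not an swr function).
// clipDist[e] is the clip distance at vertex e, recipW[e] is 1/w at vertex e.
template <std::size_t NumVerts>
std::array<float, NumVerts> BaryCoeffs(const std::array<float, NumVerts>& clipDist,
                                       const std::array<float, NumVerts>& recipW)
{
    static_assert(NumVerts >= 1, "need at least one vertex");
    std::array<float, NumVerts> coeff{};
    // Perspective-divide the last vertex once; it becomes the constant term.
    const float last = clipDist[NumVerts - 1] * recipW[NumVerts - 1];
    for (std::size_t e = 0; e + 1 < NumVerts; ++e) {
        // The other terms are deltas against the divided last vertex.
        coeff[e] = clipDist[e] * recipW[e] - last;
    }
    coeff[NumVerts - 1] = last;
    return coeff;
}
```

Passing 1.0f for every recipW entry, as the BinPoints call in this patch does, reduces the computation to the pre-patch behaviour.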


Re: [Mesa-dev] [PATCH] swr: perform perspective division on clip distances

2016-12-08 Thread Rowley, Timothy O

> On Nov 24, 2016, at 2:29 PM, Ilia Mirkin  wrote:
> 
> Clip distances need to be perspective-divided. This fixes all the
> interpolation-*-{distance,vertex} piglits.
> 
> Also take this opportunity to fix clip distances for points rasterized
> as triangles - the clip distance is not subject to sprite coord
> replacement, so there's no interpolation of it. We just take its value
> and put it in the "z" component of the barycentric-ready plane equation.
> (We could also just cull it at an earlier point in time, but that would
> require larger changes.)
> 

Would prefer this second change moved to a separate commit.  I’ve spent the 
most time looking at that, and still not convinced it’s correct.

> Signed-off-by: Ilia Mirkin 
> ---
> src/gallium/drivers/swr/rasterizer/core/binner.cpp | 22 +++---
> 1 file changed, 15 insertions(+), 7 deletions(-)
> 
> diff --git a/src/gallium/drivers/swr/rasterizer/core/binner.cpp 
> b/src/gallium/drivers/swr/rasterizer/core/binner.cpp
> index 6f9259f..d5f2e97 100644
> --- a/src/gallium/drivers/swr/rasterizer/core/binner.cpp
> +++ b/src/gallium/drivers/swr/rasterizer/core/binner.cpp
> @@ -383,7 +383,7 @@ PFN_PROCESS_ATTRIBUTES GetProcessAttributesFunc(uint32_t 
> NumVerts, bool IsSwizzl
> /// @param clipDistMask - mask of enabled clip distances
> /// @param pUserClipBuffer - buffer to store results
> template
> -void ProcessUserClipDist(PA_STATE& pa, uint32_t primIndex, uint8_t 
> clipDistMask, float* pUserClipBuffer)
> +void ProcessUserClipDist(PA_STATE& pa, uint32_t primIndex, uint8_t 
> clipDistMask, float *pRecipW, float* pUserClipBuffer)
> {
> DWORD clipDist;
> while (_BitScanForward(&clipDist, clipDistMask))
> @@ -407,11 +407,12 @@ void ProcessUserClipDist(PA_STATE& pa, uint32_t 
> primIndex, uint8_t clipDistMask,
> 
> // setup plane equations for barycentric interpolation in the backend
> float baryCoeff[NumVerts];
> +float last = vertClipDist[NumVerts - 1] * pRecipW[NumVerts - 1];
> for (uint32_t e = 0; e < NumVerts - 1; ++e)
> {
> -baryCoeff[e] = vertClipDist[e] - vertClipDist[NumVerts - 1];
> +baryCoeff[e] = vertClipDist[e] * pRecipW[e] - last;
> }
> -baryCoeff[NumVerts - 1] = vertClipDist[NumVerts - 1];
> +baryCoeff[NumVerts - 1] = last;
> 
> for (uint32_t e = 0; e < NumVerts; ++e)
> {
> @@ -834,7 +835,7 @@ endBinTriangles:
> {
> uint32_t numClipDist = _mm_popcnt_u32(rastState.clipDistanceMask);
> desc.pUserClipBuffer = (float*)pArena->Alloc(numClipDist * 3 * 
> sizeof(float));
> -ProcessUserClipDist<3>(pa, triIndex, rastState.clipDistanceMask, 
> desc.pUserClipBuffer);
> +ProcessUserClipDist<3>(pa, triIndex, rastState.clipDistanceMask, 
> [12], desc.pUserClipBuffer);
> }
> 
> for (uint32_t y = aMTTop[triIndex]; y <= aMTBottom[triIndex]; ++y)
> @@ -1184,8 +1185,15 @@ void BinPoints(
> if (rastState.clipDistanceMask)
> {
> uint32_t numClipDist = 
> _mm_popcnt_u32(rastState.clipDistanceMask);
> -desc.pUserClipBuffer = (float*)pArena->Alloc(numClipDist * 2 
> * sizeof(float));
> -ProcessUserClipDist<2>(pa, primIndex, 
> rastState.clipDistanceMask, desc.pUserClipBuffer);
> +desc.pUserClipBuffer = (float*)pArena->Alloc(numClipDist * 3 
> * sizeof(float));
> +float dists[8];
> +float one = 1.0f;
> +ProcessUserClipDist<1>(pa, primIndex, 
> rastState.clipDistanceMask, &one, dists);
> +for (uint32_t i = 0; i < numClipDist; i++) {
> +desc.pUserClipBuffer[3*i + 0] = 0.0f;
> +desc.pUserClipBuffer[3*i + 1] = 0.0f;
> +desc.pUserClipBuffer[3*i + 2] = dists[i];
> +}
> }
> 
> MacroTileMgr *pTileMgr = pDC->pTileMgr;
> @@ -1396,7 +1404,7 @@ void BinPostSetupLines(
> {
> uint32_t numClipDist = _mm_popcnt_u32(rastState.clipDistanceMask);
> desc.pUserClipBuffer = (float*)pArena->Alloc(numClipDist * 2 * 
> sizeof(float));
> -ProcessUserClipDist<2>(pa, primIndex, 
> rastState.clipDistanceMask, desc.pUserClipBuffer);
> +ProcessUserClipDist<2>(pa, primIndex, 
> rastState.clipDistanceMask, [12], desc.pUserClipBuffer);
> }
> 
> MacroTileMgr *pTileMgr = pDC->pTileMgr;
> -- 
> 2.7.3
> 


Re: [Mesa-dev] [PATCH] gallivm: use getHostCPUFeatures on x86/llvm-4.0+.

2016-12-06 Thread Rowley, Timothy O
Interesting.  My testing was done using piglit on an avx512 capable processor, 
where I didn’t see any regressions.

llvmpipe’s “make check” also passes for me with this change on avx2 and avx512 
machines.

Was this the only regression you saw?

-Tim

> On Dec 6, 2016, at 12:27 AM, Michel Dänzer wrote:
> 
> On 06/12/16 02:39 AM, Tim Rowley wrote:
>> Use llvm provided API based on cpuid rather than our own
>> manually maintained list of mattr enabling/disabling.
> 
> This change broke the llvmpipe unit test lp_test_format for me:
> 
> Testing PIPE_FORMAT_R32_FLOAT (float) ...
> FAILED
>  Packed: 00 00 00 00
>  Unpacked (0,0): 1 0 0 1 obtained
>  0 0 0 1 expected
> FAILED
>  Packed: 00 00 80 bf
>  Unpacked (0,0): 1 0 0 1 obtained
>  -1 0 0 1 expected
> 
> 
> This is on:
> 
> processor : 0
> vendor_id : AuthenticAMD
> cpu family: 21
> model : 48
> model name: AMD A10-7850K Radeon R7, 12 Compute Cores 4C+8G
> stepping  : 1
> microcode : 0x6003106
> cpu MHz   : 4100.000
> cache size: 2048 KB
> physical id   : 0
> siblings  : 4
> core id   : 0
> cpu cores : 2
> apicid: 16
> initial apicid: 0
> fpu   : yes
> fpu_exception : yes
> cpuid level   : 13
> wp: yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
> pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb 
> rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf 
> eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave 
> avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 
> 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext 
> perfctr_core perfctr_nb bpext ptsc cpb hw_pstate vmmcall fsgsbase bmi1 
> xsaveopt arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid 
> decodeassists pausefilter pfthreshold overflow_recov
> bugs  : fxsave_leak sysret_ss_attrs null_seg
> bogomips  : 8200.42
> TLB size  : 1536 4K pages
> clflush size  : 64
> cache_alignment   : 64
> address sizes : 48 bits physical, 48 bits virtual
> power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro [13]
> 
> 
> 
> -- 
> Earthling Michel Dänzer   |   http://www.amd.com
> Libre software enthusiast | Mesa and X developer


Re: [Mesa-dev] [PATCH] swr: Fix active_queries count

2016-12-02 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley

On Dec 1, 2016, at 7:08 PM, Bruce Cherniak wrote:

The active_query count was incorrect for query types that don't require
a begin_query.  Removed the unnecessary assert.
---
src/gallium/drivers/swr/swr_query.cpp | 13 +++--
1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/swr/swr_query.cpp 
b/src/gallium/drivers/swr/swr_query.cpp
index a95e0d8..6eb0781 100644
--- a/src/gallium/drivers/swr/swr_query.cpp
+++ b/src/gallium/drivers/swr/swr_query.cpp
@@ -165,8 +165,9 @@ swr_begin_query(struct pipe_context *pipe, struct 
pipe_query *q)
   /* Initialize Results */
   memset(&pq->result, 0, sizeof(pq->result));
   switch (pq->type) {
+   case PIPE_QUERY_GPU_FINISHED:
   case PIPE_QUERY_TIMESTAMP:
-  /* nothing to do */
+  /* nothing to do, but don't want the default */
  break;
   case PIPE_QUERY_TIME_ELAPSED:
  pq->result.timestamp_start = swr_get_timestamp(pipe->screen);
@@ -181,10 +182,10 @@ swr_begin_query(struct pipe_context *pipe, struct 
pipe_query *q)
 SwrEnableStatsFE(ctx->swrContext, TRUE);
 SwrEnableStatsBE(ctx->swrContext, TRUE);
  }
+  ctx->active_queries++;
  break;
   }

-   ctx->active_queries++;

   return true;
}
@@ -195,11 +196,10 @@ swr_end_query(struct pipe_context *pipe, struct 
pipe_query *q)
   struct swr_context *ctx = swr_context(pipe);
   struct swr_query *pq = swr_query(q);

-   assert(ctx->active_queries
-  && "swr_end_query, there are no active queries!");
-   ctx->active_queries--;
-
   switch (pq->type) {
+   case PIPE_QUERY_GPU_FINISHED:
+  /* nothing to do, but don't want the default */
+  break;
   case PIPE_QUERY_TIMESTAMP:
   case PIPE_QUERY_TIME_ELAPSED:
  pq->result.timestamp_end = swr_get_timestamp(pipe->screen);
@@ -214,6 +214,7 @@ swr_end_query(struct pipe_context *pipe, struct pipe_query 
*q)
  swr_fence_submit(ctx, pq->fence);

  /* Only change stat collection if there are no active queries */
+  ctx->active_queries--;
  if (ctx->active_queries == 0) {
 SwrEnableStatsFE(ctx->swrContext, FALSE);
 SwrEnableStatsBE(ctx->swrContext, FALSE);
--
2.7.4
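
The counting rule this patch moves to can be sketched in isolation. QueryCtx, QueryType, BeginQuery and EndQuery below are hypothetical stand-ins for the gallium and swr types, and the "enable stats only when the count is zero" guard in BeginQuery is an assumption, since the diff context does not show that condition.

```cpp
// Hypothetical stand-ins for the pipe query types touched by the patch.
enum class QueryType { GpuFinished, Timestamp, TimeElapsed, OcclusionCounter };

struct QueryCtx {
    unsigned activeQueries = 0;
    bool statsEnabled = false;
};

void BeginQuery(QueryCtx& ctx, QueryType type)
{
    switch (type) {
    case QueryType::GpuFinished:
    case QueryType::Timestamp:
    case QueryType::TimeElapsed:
        break; // these need no stat collection, so they are not counted
    default:
        if (ctx.activeQueries == 0)
            ctx.statsEnabled = true; // assumed guard, see lead-in
        ++ctx.activeQueries;
        break;
    }
}

void EndQuery(QueryCtx& ctx, QueryType type)
{
    switch (type) {
    case QueryType::GpuFinished:
    case QueryType::Timestamp:
    case QueryType::TimeElapsed:
        break; // never counted in BeginQuery, so nothing to undo
    default:
        --ctx.activeQueries;
        if (ctx.activeQueries == 0)
            ctx.statsEnabled = false;
        break;
    }
}
```

Because the increment and decrement now sit in the same switch arm, ending a timestamp query that never had a matching begin can no longer drive the counter below zero.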


Re: [Mesa-dev] [PATCH v2] swr: Fix type to match parameters of std::max()

2016-12-02 Thread Rowley, Timothy O
Should have parens on the zsbuf test line to match your corresponding change 
for cbuf attachments.

With that change, Reviewed-by: Tim Rowley

On Dec 2, 2016, at 1:18 PM, George Kyriazis wrote:

Include propagation of comparisons further down.
---
src/gallium/drivers/swr/swr_clear.cpp | 14 +++---
1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/src/gallium/drivers/swr/swr_clear.cpp 
b/src/gallium/drivers/swr/swr_clear.cpp
index f59179f..08eead8 100644
--- a/src/gallium/drivers/swr/swr_clear.cpp
+++ b/src/gallium/drivers/swr/swr_clear.cpp
@@ -35,7 +35,7 @@ swr_clear(struct pipe_context *pipe,
   struct pipe_framebuffer_state *fb = &ctx->framebuffer;

   UINT clearMask = 0;
-   int layers = 0;
+   unsigned layers = 0;

   if (!swr_check_render_cond(pipe))
  return;
@@ -47,20 +47,20 @@ swr_clear(struct pipe_context *pipe,
 if (fb->cbufs[i] && (buffers & (PIPE_CLEAR_COLOR0 << i))) {
clearMask |= (SWR_ATTACHMENT_COLOR0_BIT << i);
layers = std::max(layers, fb->cbufs[i]->u.tex.last_layer -
-  fb->cbufs[i]->u.tex.first_layer + 1);
+  fb->cbufs[i]->u.tex.first_layer + 1u);
 }
   }

   if (buffers & PIPE_CLEAR_DEPTH && fb->zsbuf) {
  clearMask |= SWR_ATTACHMENT_DEPTH_BIT;
  layers = std::max(layers, fb->zsbuf->u.tex.last_layer -
-fb->zsbuf->u.tex.first_layer + 1);
+fb->zsbuf->u.tex.first_layer + 1u);
   }

   if (buffers & PIPE_CLEAR_STENCIL && fb->zsbuf) {
  clearMask |= SWR_ATTACHMENT_STENCIL_BIT;
  layers = std::max(layers, fb->zsbuf->u.tex.last_layer -
-fb->zsbuf->u.tex.first_layer + 1);
+fb->zsbuf->u.tex.first_layer + 1u);
   }

#if 0 // XXX HACK, override clear color alpha. On ubuntu, clears are
@@ -68,7 +68,7 @@ swr_clear(struct pipe_context *pipe,
   ((union pipe_color_union *)color)->f[3] = 1.0; /* cast off your const'd-ness 
*/
#endif

-   for (int i = 0; i < layers; ++i) {
+   for (unsigned i = 0; i < layers; ++i) {
  swr_update_draw_context(ctx);
  SwrClearRenderTarget(ctx->swrContext, clearMask, i,
   color->f, depth, stencil,
@@ -76,11 +76,11 @@ swr_clear(struct pipe_context *pipe,

  // Mask out the attachments that are out of layers.
  if (fb->zsbuf &&
-  fb->zsbuf->u.tex.last_layer - fb->zsbuf->u.tex.first_layer <= i)
+  fb->zsbuf->u.tex.last_layer <= fb->zsbuf->u.tex.first_layer + i)
 clearMask &= ~(SWR_ATTACHMENT_DEPTH_BIT | SWR_ATTACHMENT_STENCIL_BIT);
  for (unsigned c = 0; c < fb->nr_cbufs; ++c) {
 const struct pipe_surface *sf = fb->cbufs[c];
- if (sf && sf->u.tex.last_layer - sf->u.tex.first_layer <= i)
+ if (sf && (sf->u.tex.last_layer <= sf->u.tex.first_layer + i))
clearMask &= ~(SWR_ATTACHMENT_COLOR0_BIT << c);
  }
   }
--
2.10.0.windows.1
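
A minimal sketch of why the 1u suffix matters: std::max takes two arguments of one deduced type, so both operands must agree. LayerRange is a hypothetical stand-in for the surface layer fields, and declaring them as 16-bit unsigned bitfields is an assumption made for illustration.

```cpp
#include <algorithm>
#include <type_traits>

// Hypothetical stand-in for the first_layer/last_layer pair of a surface.
struct LayerRange {
    unsigned first_layer : 16;
    unsigned last_layer : 16;
};

unsigned LayerCount(unsigned layers, const LayerRange& r)
{
    // Adding 1u forces the whole expression to unsigned, whatever type the
    // compiler gives the bitfield subtraction on its own.
    static_assert(std::is_same<decltype(r.last_layer - r.first_layer + 1u),
                               unsigned>::value,
                  "expression must be unsigned to match 'layers'");
    return std::max(layers, r.last_layer - r.first_layer + 1u);
}

// Rewritten comparison from the patch: true once layer index i is the last
// layer this attachment has, so it should be masked out of later clears.
bool OutOfLayers(const LayerRange& r, unsigned i)
{
    return r.last_layer <= r.first_layer + i;
}
```

With layers also declared unsigned, both arguments to std::max have the same type and template deduction succeeds.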


Re: [Mesa-dev] [PATCH 5/5] swr: add streamout buffer offset into pBuffer pointer

2016-11-30 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley

On Nov 29, 2016, at 8:23 PM, Ilia Mirkin wrote:

The buffer_size does not take the offset into account. Just add the
offset into the pointer which lines up the structures much better.

Signed-off-by: Ilia Mirkin
---

This doesn't really fix anything right now, but logically the streamOffset
is incremented on each draw, and is optionally written back out as a watermark
indicator (for pausing/resuming streams). So it should be relative to the
logical start of the buffer.

src/gallium/drivers/swr/swr_state.cpp | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/swr/swr_state.cpp 
b/src/gallium/drivers/swr/swr_state.cpp
index fc835dc..4475252 100644
--- a/src/gallium/drivers/swr/swr_state.cpp
+++ b/src/gallium/drivers/swr/swr_state.cpp
@@ -1488,10 +1488,11 @@ swr_update_derived(struct pipe_context *pipe,
continue;
 buffer.enable = true;
 buffer.pBuffer =
-(uint32_t *)swr_resource_data(ctx->so_targets[i]->buffer);
+(uint32_t *)(swr_resource_data(ctx->so_targets[i]->buffer) +
+ ctx->so_targets[i]->buffer_offset);
 buffer.bufferSize = ctx->so_targets[i]->buffer_size >> 2;
 buffer.pitch = stream_output->stride[i];
- buffer.streamOffset = ctx->so_targets[i]->buffer_offset >> 2;
+ buffer.streamOffset = 0;

 SwrSetSoBuffers(ctx->swrContext, &buffer, i);
  }
--
2.7.3
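
The pointer adjustment in this patch can be sketched on its own. SoBuffer and BindSoTarget are hypothetical names; the sketch assumes the resource data is addressed in bytes while the buffer size and stream offset are counted in 32-bit words, as the ">> 2" shifts in the diff imply.

```cpp
#include <cstdint>

// Hypothetical mirror of the fields the patch fills in.
struct SoBuffer {
    std::uint32_t* pBuffer;      // logical start of the stream-out target
    std::uint32_t  bufferSize;   // size in 32-bit words
    std::uint32_t  streamOffset; // write position in words, relative to pBuffer
};

SoBuffer BindSoTarget(std::uint8_t* resourceData,
                      std::uint32_t offsetBytes,
                      std::uint32_t sizeBytes)
{
    SoBuffer b;
    // Fold the target's byte offset into the base pointer...
    b.pBuffer = reinterpret_cast<std::uint32_t*>(resourceData + offsetBytes);
    // ...so the size describes exactly the region that follows the pointer,
    b.bufferSize = sizeBytes >> 2;
    // and the running offset starts at the logical start of the target.
    b.streamOffset = 0;
    return b;
}
```

Keeping streamOffset relative to the logical start means a value written back as a watermark does not include the binding offset.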


Re: [Mesa-dev] [PATCH 2/5] swr: turn off queries around blits

2016-11-30 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley

On Nov 29, 2016, at 8:23 PM, Ilia Mirkin wrote:

Signed-off-by: Ilia Mirkin
---
src/gallium/drivers/swr/swr_context.cpp | 10 +-
1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/swr/swr_context.cpp 
b/src/gallium/drivers/swr/swr_context.cpp
index b355bba..b8c87fa 100644
--- a/src/gallium/drivers/swr/swr_context.cpp
+++ b/src/gallium/drivers/swr/swr_context.cpp
@@ -301,7 +301,10 @@ swr_blit(struct pipe_context *pipe, const struct 
pipe_blit_info *blit_info)
  return;
   }

-   /* XXX turn off occlusion and streamout queries */
+   if (ctx->active_queries) {
+  SwrEnableStatsFE(ctx->swrContext, FALSE);
+  SwrEnableStatsBE(ctx->swrContext, FALSE);
+   }

   util_blitter_save_vertex_buffer_slot(ctx->blitter, ctx->vertex_buffer);
   util_blitter_save_vertex_elements(ctx->blitter, (void *)ctx->velems);
@@ -335,6 +338,11 @@ swr_blit(struct pipe_context *pipe, const struct 
pipe_blit_info *blit_info)
  ctx->render_cond_mode);

   util_blitter_blit(ctx->blitter, );
+
+   if (ctx->active_queries) {
+  SwrEnableStatsFE(ctx->swrContext, TRUE);
+  SwrEnableStatsBE(ctx->swrContext, TRUE);
+   }
}


--
2.7.3
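
The pattern in this patch, pausing stat collection for the duration of an internal blit and restoring it afterwards, can be sketched without the driver. BlitCtx and BlitWithStatsPaused are hypothetical names, and the two booleans stand in for the state toggled by SwrEnableStatsFE and SwrEnableStatsBE.

```cpp
// Hypothetical stand-in for the parts of the context the patch touches.
struct BlitCtx {
    unsigned activeQueries;
    bool statsFE;
    bool statsBE;
};

// Runs 'blit' with stat collection paused if any query is active, so the
// draw issued by the blit itself is not counted by the application's queries.
template <typename BlitFn>
void BlitWithStatsPaused(BlitCtx& ctx, BlitFn blit)
{
    if (ctx.activeQueries) {
        ctx.statsFE = false;
        ctx.statsBE = false;
    }

    blit(ctx);

    if (ctx.activeQueries) {
        ctx.statsFE = true;
        ctx.statsBE = true;
    }
}
```

When no query is active, the stats state is left untouched on both sides of the blit.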


Re: [Mesa-dev] [PATCH 4/5] swr: fix assertion for max number of so targets

2016-11-30 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley

On Nov 29, 2016, at 8:23 PM, Ilia Mirkin wrote:

The number has to be less than or equal to the max, not just less than.

Signed-off-by: Ilia Mirkin
---
src/gallium/drivers/swr/swr_state.cpp | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/swr/swr_state.cpp 
b/src/gallium/drivers/swr/swr_state.cpp
index 9f6b5b0..fc835dc 100644
--- a/src/gallium/drivers/swr/swr_state.cpp
+++ b/src/gallium/drivers/swr/swr_state.cpp
@@ -1570,7 +1570,7 @@ swr_set_so_targets(struct pipe_context *pipe,
   struct swr_context *swr = swr_context(pipe);
   uint32_t i;

-   assert(num_targets < MAX_SO_STREAMS);
+   assert(num_targets <= MAX_SO_STREAMS);

   for (i = 0; i < num_targets; i++) {
  pipe_so_target_reference(
--
2.7.3


Re: [Mesa-dev] [PATCH 3/5] swr: properly report max number of SO components

2016-11-30 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley

On Nov 29, 2016, at 8:23 PM, Ilia Mirkin wrote:

The components count the number of individual values, not the number of
slots.

Signed-off-by: Ilia Mirkin
---
src/gallium/drivers/swr/swr_screen.cpp | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/swr/swr_screen.cpp 
b/src/gallium/drivers/swr/swr_screen.cpp
index e184548..2388922 100644
--- a/src/gallium/drivers/swr/swr_screen.cpp
+++ b/src/gallium/drivers/swr/swr_screen.cpp
@@ -166,7 +166,7 @@ swr_get_param(struct pipe_screen *screen, enum pipe_cap 
param)
  return MAX_SO_STREAMS;
   case PIPE_CAP_MAX_STREAM_OUTPUT_SEPARATE_COMPONENTS:
   case PIPE_CAP_MAX_STREAM_OUTPUT_INTERLEAVED_COMPONENTS:
-  return MAX_ATTRIBUTES;
+  return MAX_ATTRIBUTES * 4;
   case PIPE_CAP_MAX_GEOMETRY_OUTPUT_VERTICES:
   case PIPE_CAP_MAX_GEOMETRY_TOTAL_OUTPUT_COMPONENTS:
  return 1024;
--
2.7.3


Re: [Mesa-dev] [PATCH 1/5] swr: don't advertise stream pause/resume

2016-11-30 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley

On Nov 29, 2016, at 8:23 PM, Ilia Mirkin wrote:

There is no support for resuming streamout. Furthermore, this also
controls glDrawTransformFeedback functionality which requires the same
ability to query how many primitives were sent out of TF.

Signed-off-by: Ilia Mirkin
---

I have a partially-working patch for bringing this back, but it's not 100%
quite yet - some sort of issues with concurrency I have yet to track down.

However in the current state, this is just totally not supported by the FE
(but the swr core does do this).

src/gallium/drivers/swr/swr_screen.cpp | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/swr/swr_screen.cpp 
b/src/gallium/drivers/swr/swr_screen.cpp
index 19bb102..e184548 100644
--- a/src/gallium/drivers/swr/swr_screen.cpp
+++ b/src/gallium/drivers/swr/swr_screen.cpp
@@ -232,7 +232,6 @@ swr_get_param(struct pipe_screen *screen, enum pipe_cap 
param)
   case PIPE_CAP_USER_VERTEX_BUFFERS:
   case PIPE_CAP_USER_INDEX_BUFFERS:
   case PIPE_CAP_USER_CONSTANT_BUFFERS:
-   case PIPE_CAP_STREAM_OUTPUT_PAUSE_RESUME:
   case PIPE_CAP_STREAM_OUTPUT_INTERLEAVE_BUFFERS:
   case PIPE_CAP_QUERY_TIMESTAMP:
   case PIPE_CAP_TEXTURE_BUFFER_OBJECTS:
@@ -311,6 +310,7 @@ swr_get_param(struct pipe_screen *screen, enum pipe_cap 
param)
   case PIPE_CAP_POLYGON_OFFSET_UNITS_UNSCALED:
   case PIPE_CAP_VIEWPORT_SUBPIXEL_BITS:
   case PIPE_CAP_TGSI_ARRAY_COMPONENTS:
+   case PIPE_CAP_STREAM_OUTPUT_PAUSE_RESUME:
  return 0;

   case PIPE_CAP_VENDOR_ID:
--
2.7.3


Re: [Mesa-dev] [PATCH] swr: remove warning about multi-layer surfaces

2016-11-30 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley

On Nov 29, 2016, at 8:05 PM, Ilia Mirkin wrote:

We now support clearing these, and actually rendering to multiple layers
would require GS support, which will fail in much more spectacular ways
for now. Once that is hooked up, there won't be anything else to do
here.

Signed-off-by: Ilia Mirkin
---
src/gallium/drivers/swr/swr_context.cpp | 4 
1 file changed, 4 deletions(-)

diff --git a/src/gallium/drivers/swr/swr_context.cpp 
b/src/gallium/drivers/swr/swr_context.cpp
index 5a1927c..b355bba 100644
--- a/src/gallium/drivers/swr/swr_context.cpp
+++ b/src/gallium/drivers/swr/swr_context.cpp
@@ -62,10 +62,6 @@ swr_create_surface(struct pipe_context *pipe,
 ps->u.tex.level = surf_tmpl->u.tex.level;
 ps->u.tex.first_layer = surf_tmpl->u.tex.first_layer;
 ps->u.tex.last_layer = surf_tmpl->u.tex.last_layer;
- if (ps->u.tex.first_layer != ps->u.tex.last_layer) {
-debug_printf("creating surface with multiple layers, rendering "
- "to first layer only\n");
- }
  } else {
 /* setting width as number of elements should get us correct
  * renderbuffer width */
--
2.7.3


Re: [Mesa-dev] [PATCH] swr: [rasterizer core] don't attempt to load another RTAI when storing

2016-11-30 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley

On Nov 16, 2016, at 9:04 PM, Ilia Mirkin wrote:

Since we don't pass a renderTargetArrayIndex in, and the current hot
tile may be for a different index, we may end up loading the RTAI=0 into
the hot tile for no reason.

Signed-off-by: Ilia Mirkin
---

Noticed this when doing an audit of GetHotTile calls without a 
renderTargetArrayIndex being passed in. In this case, I don't think it should 
be loading at all...

Note that this has not been rigorously tested.

src/gallium/drivers/swr/rasterizer/core/backend.cpp | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/swr/rasterizer/core/backend.cpp 
b/src/gallium/drivers/swr/rasterizer/core/backend.cpp
index 3375585..29d0ff5 100644
--- a/src/gallium/drivers/swr/rasterizer/core/backend.cpp
+++ b/src/gallium/drivers/swr/rasterizer/core/backend.cpp
@@ -361,7 +361,7 @@ void ProcessStoreTileBE(DRAW_CONTEXT *pDC, uint32_t 
workerId, uint32_t macroTile
MacroTileMgr::getTileIndices(macroTile, x, y);

// Only need to store the hottile if it's been rendered to...
-HOTTILE *pHotTile = pContext->pHotTileMgr->GetHotTile(pContext, pDC, 
macroTile, attachment, false);
+HOTTILE *pHotTile = pContext->pHotTileMgr->GetHotTileNoLoad(pContext, pDC, 
macroTile, attachment, false);
if (pHotTile)
{
// clear if clear is pending (i.e., not rendered to), then mark as 
dirty for store.
--
2.7.3


Re: [Mesa-dev] [PATCH] swr: [rasterizer memory] assert when trying to convert an unknown format

2016-11-30 Thread Rowley, Timothy O
Not seeing this assert fire on our tests either.

Reviewed-by: Tim Rowley

On Nov 29, 2016, at 8:04 PM, Ilia Mirkin wrote:

Signed-off-by: Ilia Mirkin
---

I've been running this for a little while and haven't hit it. I had a theory
at one point that there was a missing format in there which turned out to be
false, but I think this is still good to have rather than silently fail.

src/gallium/drivers/swr/rasterizer/memory/Convert.h | 1 +
1 file changed, 1 insertion(+)

diff --git a/src/gallium/drivers/swr/rasterizer/memory/Convert.h 
b/src/gallium/drivers/swr/rasterizer/memory/Convert.h
index c31459c..527324c 100644
--- a/src/gallium/drivers/swr/rasterizer/memory/Convert.h
+++ b/src/gallium/drivers/swr/rasterizer/memory/Convert.h
@@ -724,6 +724,7 @@ INLINE static void ConvertPixelFromFloat(
case R8G8B8_SINT: ConvertPixelFromFloat<R8G8B8_SINT>(pDst, srcPixel); break;
case RAW: ConvertPixelFromFloat<RAW>(pDst, srcPixel); break;
default:
+SWR_ASSERT(0);
break;
}
}
--
2.7.3


Re: [Mesa-dev] [PATCH] swr: [rasterizer jit] use signed integer representation for logic op

2016-11-29 Thread Rowley, Timothy O

On Nov 27, 2016, at 11:13 PM, Ilia Mirkin wrote:

On Thu, Nov 24, 2016 at 6:11 PM, Ilia Mirkin wrote:
Instead of (incorrectly) biasing the snorm value to make it look like a
unorm, just use signed integer math.

This fixes arb_color_buffer_float-render GL_RGBA8_SNORM

Signed-off-by: Ilia Mirkin
---
src/gallium/drivers/swr/rasterizer/jitter/blend_jit.cpp | 17 -
1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/jitter/blend_jit.cpp 
b/src/gallium/drivers/swr/rasterizer/jitter/blend_jit.cpp
index ad809c4..339ca52 100644
--- a/src/gallium/drivers/swr/rasterizer/jitter/blend_jit.cpp
+++ b/src/gallium/drivers/swr/rasterizer/jitter/blend_jit.cpp
@@ -692,9 +692,13 @@ struct BlendJit : public Builder
dst[i] = BITCAST(dst[i], mSimdInt32Ty);
break;
case SWR_TYPE_SNORM:
-src[i] = FADD(src[i], VIMMED1(0.5f));
-dst[i] = FADD(dst[i], VIMMED1(0.5f));
-/* fallthrough */
+src[i] = FP_TO_SI(
+FMUL(src[i], VIMMED1(scale[i])),
+mSimdInt32Ty);
+dst[i] = FP_TO_SI(
+FMUL(dst[i], VIMMED1(scale[i])),
+mSimdInt32Ty);
+break;
case SWR_TYPE_UNORM:
src[i] = FP_TO_UI(
FMUL(src[i], VIMMED1(scale[i])),
@@ -728,11 +732,14 @@ struct BlendJit : public Builder
result[i] = BITCAST(result[i], mSimdFP32Ty);
break;
case SWR_TYPE_SNORM:
+result[i] = SHL(result[i], 32 - info.bpc[i]);
+result[i] = ASHR(result[i], 32 - info.bpc[i]);

These two immediate arguments should probably have a C() around them.
I've fixed that up in my tree. Hopefully these will emit as VPSLLD and
VPSRAD. Not sure how to check that.

With the version of the patch from your tree, I’m seeing this IR:

  %24 = ashr exact <8 x i32> %23, i32 24
  %25 = sitofp <8 x i32> %24 to <8 x float>
  %26 = fmul <8 x float> %25, 
  store <8 x float> %26, <8 x float>* %result, align 32

Turn into this x86 code:

  9a:   vpslld ymm1,ymm3,0x18
  9f:   vpsrad ymm1,ymm1,0x18
  a4:   vcvtdq2ps ymm1,ymm1
  a8:   vmulps ymm1,ymm1,ymm2
  ac:   vmovaps YMMWORD PTR [rax+0x20],ymm1

So llvm does what you expected.
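
For anyone following along, here is a minimal scalar sketch of what the SHL/ASHR pair in the patch does. It is not the JIT code: sign_extend is a hypothetical helper, and its bits argument plays the role of info.bpc[i].

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar analogue of the SHL/ASHR pair: sign-extend the low
// `bits` bits of `v` to a full 32-bit signed integer.
inline int32_t sign_extend(uint32_t v, unsigned bits)
{
    assert(bits >= 1 && bits <= 32);
    const unsigned shift = 32 - bits;
    // Shift left as unsigned (no overflow UB), then shift right as signed.
    // The signed right shift is arithmetic on mainstream compilers and is
    // guaranteed to be so from C++20 onwards.
    return static_cast<int32_t>(v << shift) >> shift;
}
```

With bits = 8 the shift amount is 24, which matches the 0x18 immediates in the vpslld/vpsrad pair above.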

The version of this patch from your tree is
Reviewed-by: Tim Rowley



+result[i] = FMUL(SI_TO_FP(result[i], mSimdFP32Ty),
+ VIMMED1(1.0f / scale[i]));
+break;
case SWR_TYPE_UNORM:
result[i] = FMUL(UI_TO_FP(result[i], mSimdFP32Ty),
 VIMMED1(1.0f / scale[i]));
-if (info.type[i] == SWR_TYPE_SNORM)
-result[i] = FADD(result[i], VIMMED1(-0.5f));
break;
}

--
2.7.3



Re: [Mesa-dev] [PATCH 6/6] swr: add missing rgbx8_srgb variant

2016-11-29 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley

On Nov 22, 2016, at 7:37 PM, Ilia Mirkin wrote:

Signed-off-by: Ilia Mirkin
---
src/gallium/drivers/swr/swr_screen.cpp | 1 +
1 file changed, 1 insertion(+)

diff --git a/src/gallium/drivers/swr/swr_screen.cpp 
b/src/gallium/drivers/swr/swr_screen.cpp
index 642f9be..19bb102 100644
--- a/src/gallium/drivers/swr/swr_screen.cpp
+++ b/src/gallium/drivers/swr/swr_screen.cpp
@@ -497,6 +497,7 @@ mesa_to_swr_format(enum pipe_format format)
  {PIPE_FORMAT_R8G8B8A8_UNORM, R8G8B8A8_UNORM},
  {PIPE_FORMAT_R8G8B8A8_SRGB,  R8G8B8A8_UNORM_SRGB},
  {PIPE_FORMAT_R8G8B8X8_UNORM, R8G8B8X8_UNORM},
+  {PIPE_FORMAT_R8G8B8X8_SRGB,  R8G8B8X8_UNORM_SRGB},

  {PIPE_FORMAT_R8_USCALED, R8_USCALED},
  {PIPE_FORMAT_R8G8_USCALED,   R8G8_USCALED},
--
2.7.3
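
As a rough illustration of the table-lookup pattern this patch extends, here is a self-contained sketch. PipeFormat, SwrFormat and to_swr_format are hypothetical stand-ins for the real enums and mesa_to_swr_format, and returning Invalid for a missing entry is an assumption, since the patch does not show the lookup itself.

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-ins for pipe_format and SWR_FORMAT.
enum class PipeFormat {
    R8G8B8A8_UNORM, R8G8B8A8_SRGB, R8G8B8X8_UNORM, R8G8B8X8_SRGB, Unlisted
};
enum class SwrFormat {
    Invalid, R8G8B8A8_UNORM, R8G8B8A8_UNORM_SRGB,
    R8G8B8X8_UNORM, R8G8B8X8_UNORM_SRGB
};

// Translate a format through a static table. A format with no entry
// yields Invalid (assumed behaviour for this sketch).
inline SwrFormat to_swr_format(PipeFormat format)
{
    static const std::map<PipeFormat, SwrFormat> table = {
        {PipeFormat::R8G8B8A8_UNORM, SwrFormat::R8G8B8A8_UNORM},
        {PipeFormat::R8G8B8A8_SRGB,  SwrFormat::R8G8B8A8_UNORM_SRGB},
        {PipeFormat::R8G8B8X8_UNORM, SwrFormat::R8G8B8X8_UNORM},
        {PipeFormat::R8G8B8X8_SRGB,  SwrFormat::R8G8B8X8_UNORM_SRGB},
    };
    const auto it = table.find(format);
    return it == table.end() ? SwrFormat::Invalid : it->second;
}
```

Supporting a new variant then only takes one more table row, which is all this patch adds.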




Re: [Mesa-dev] [PATCH 5/6] swr: reorder renderable formats, add grouping comments

2016-11-29 Thread Rowley, Timothy O
I’ve verified the same entries are in the list before/after.

Reviewed-by: Tim Rowley

On Nov 22, 2016, at 7:37 PM, Ilia Mirkin wrote:

Signed-off-by: Ilia Mirkin
---
src/gallium/drivers/swr/swr_screen.cpp | 152 +++--
1 file changed, 87 insertions(+), 65 deletions(-)

diff --git a/src/gallium/drivers/swr/swr_screen.cpp 
b/src/gallium/drivers/swr/swr_screen.cpp
index b17faee..642f9be 100644
--- a/src/gallium/drivers/swr/swr_screen.cpp
+++ b/src/gallium/drivers/swr/swr_screen.cpp
@@ -377,89 +377,141 @@ SWR_FORMAT
mesa_to_swr_format(enum pipe_format format)
{
   static const std::map<pipe_format, SWR_FORMAT> mesa2swr = {
-  {PIPE_FORMAT_B8G8R8A8_UNORM, B8G8R8A8_UNORM},
-  {PIPE_FORMAT_B8G8R8X8_UNORM, B8G8R8X8_UNORM},
-  {PIPE_FORMAT_B5G5R5A1_UNORM, B5G5R5A1_UNORM},
-  {PIPE_FORMAT_B4G4R4A4_UNORM, B4G4R4A4_UNORM},
-  {PIPE_FORMAT_B5G6R5_UNORM,   B5G6R5_UNORM},
-  {PIPE_FORMAT_R10G10B10A2_UNORM,  R10G10B10A2_UNORM},
-  {PIPE_FORMAT_A8_UNORM,   A8_UNORM},
+  /* depth / stencil */
  {PIPE_FORMAT_Z16_UNORM,  R16_UNORM}, // z
  {PIPE_FORMAT_Z32_FLOAT,  R32_FLOAT}, // z
  {PIPE_FORMAT_Z24_UNORM_S8_UINT,  R24_UNORM_X8_TYPELESS}, // z
  {PIPE_FORMAT_Z24X8_UNORM,R24_UNORM_X8_TYPELESS}, // z
+  {PIPE_FORMAT_Z32_FLOAT_S8X24_UINT,   R32_FLOAT_X8X24_TYPELESS}, // z
+
+  /* alpha */
+  {PIPE_FORMAT_A8_UNORM,   A8_UNORM},
+  {PIPE_FORMAT_A16_UNORM,  A16_UNORM},
+  {PIPE_FORMAT_A16_FLOAT,  A16_FLOAT},
+  {PIPE_FORMAT_A32_FLOAT,  A32_FLOAT},
+
+  /* odd sizes, bgr */
+  {PIPE_FORMAT_B5G6R5_UNORM,   B5G6R5_UNORM},
+  {PIPE_FORMAT_B5G6R5_SRGB,B5G6R5_UNORM_SRGB},
+  {PIPE_FORMAT_B5G5R5A1_UNORM, B5G5R5A1_UNORM},
+  {PIPE_FORMAT_B5G5R5X1_UNORM, B5G5R5X1_UNORM},
+  {PIPE_FORMAT_B4G4R4A4_UNORM, B4G4R4A4_UNORM},
+  {PIPE_FORMAT_B8G8R8A8_UNORM, B8G8R8A8_UNORM},
+  {PIPE_FORMAT_B8G8R8A8_SRGB,  B8G8R8A8_UNORM_SRGB},
+  {PIPE_FORMAT_B8G8R8X8_UNORM, B8G8R8X8_UNORM},
+  {PIPE_FORMAT_B8G8R8X8_SRGB,  B8G8R8X8_UNORM_SRGB},
+
+  /* rgb10a2 */
+  {PIPE_FORMAT_R10G10B10A2_UNORM,  R10G10B10A2_UNORM},
+  {PIPE_FORMAT_R10G10B10A2_SNORM,  R10G10B10A2_SNORM},
+  {PIPE_FORMAT_R10G10B10A2_USCALED,R10G10B10A2_USCALED},
+  {PIPE_FORMAT_R10G10B10A2_SSCALED,R10G10B10A2_SSCALED},
+  {PIPE_FORMAT_R10G10B10A2_UINT,   R10G10B10A2_UINT},
+
+  /* rgb10x2 */
+  {PIPE_FORMAT_R10G10B10X2_USCALED,R10G10B10X2_USCALED},
+
+  /* bgr10a2 */
+  {PIPE_FORMAT_B10G10R10A2_UNORM,  B10G10R10A2_UNORM},
+  {PIPE_FORMAT_B10G10R10A2_SNORM,  B10G10R10A2_SNORM},
+  {PIPE_FORMAT_B10G10R10A2_USCALED,B10G10R10A2_USCALED},
+  {PIPE_FORMAT_B10G10R10A2_SSCALED,B10G10R10A2_SSCALED},
+  {PIPE_FORMAT_B10G10R10A2_UINT,   B10G10R10A2_UINT},
+
+  /* bgr10x2 */
+  {PIPE_FORMAT_B10G10R10X2_UNORM,  B10G10R10X2_UNORM},
+
+  /* r11g11b10 */
+  {PIPE_FORMAT_R11G11B10_FLOAT,R11G11B10_FLOAT},
+
+  /* 32 bits per component */
  {PIPE_FORMAT_R32_FLOAT,  R32_FLOAT},
  {PIPE_FORMAT_R32G32_FLOAT,   R32G32_FLOAT},
  {PIPE_FORMAT_R32G32B32_FLOAT,R32G32B32_FLOAT},
  {PIPE_FORMAT_R32G32B32A32_FLOAT, R32G32B32A32_FLOAT},
+  {PIPE_FORMAT_R32G32B32X32_FLOAT, R32G32B32X32_FLOAT},
+
  {PIPE_FORMAT_R32_USCALED,R32_USCALED},
  {PIPE_FORMAT_R32G32_USCALED, R32G32_USCALED},
  {PIPE_FORMAT_R32G32B32_USCALED,  R32G32B32_USCALED},
  {PIPE_FORMAT_R32G32B32A32_USCALED,   R32G32B32A32_USCALED},
+
  {PIPE_FORMAT_R32_SSCALED,R32_SSCALED},
  {PIPE_FORMAT_R32G32_SSCALED, R32G32_SSCALED},
  {PIPE_FORMAT_R32G32B32_SSCALED,  R32G32B32_SSCALED},
  {PIPE_FORMAT_R32G32B32A32_SSCALED,   R32G32B32A32_SSCALED},
+
+  {PIPE_FORMAT_R32_UINT,   R32_UINT},
+  {PIPE_FORMAT_R32G32_UINT,R32G32_UINT},
+  {PIPE_FORMAT_R32G32B32_UINT, R32G32B32_UINT},
+  {PIPE_FORMAT_R32G32B32A32_UINT,  R32G32B32A32_UINT},
+
+  {PIPE_FORMAT_R32_SINT,   R32_SINT},
+  {PIPE_FORMAT_R32G32_SINT,R32G32_SINT},
+  {PIPE_FORMAT_R32G32B32_SINT, R32G32B32_SINT},
+  {PIPE_FORMAT_R32G32B32A32_SINT,  R32G32B32A32_SINT},
+
+  /* 16 bits per component */
  {PIPE_FORMAT_R16_UNORM,  R16_UNORM},
  {PIPE_FORMAT_R16G16_UNORM,   R16G16_UNORM},
  {PIPE_FORMAT_R16G16B16_UNORM,R16G16B16_UNORM},
  {PIPE_FORMAT_R16G16B16A16_UNORM, R16G16B16A16_UNORM},
+

Re: [Mesa-dev] [PATCH 3/6] swr: enable cubemap arrays

2016-11-29 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley

On Nov 22, 2016, at 7:37 PM, Ilia Mirkin wrote:

Everything is in place for these.

Signed-off-by: Ilia Mirkin
---
src/gallium/drivers/swr/swr_screen.cpp | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/swr/swr_screen.cpp 
b/src/gallium/drivers/swr/swr_screen.cpp
index dc55d3e..b17faee 100644
--- a/src/gallium/drivers/swr/swr_screen.cpp
+++ b/src/gallium/drivers/swr/swr_screen.cpp
@@ -247,6 +247,7 @@ swr_get_param(struct pipe_screen *screen, enum pipe_cap 
param)
   case PIPE_CAP_TEXTURE_FLOAT_LINEAR:
   case PIPE_CAP_TEXTURE_HALF_FLOAT_LINEAR:
   case PIPE_CAP_CULL_DISTANCE:
+   case PIPE_CAP_CUBE_MAP_ARRAY:
  return 1;

  /* unsupported features */
@@ -264,7 +265,6 @@ swr_get_param(struct pipe_screen *screen, enum pipe_cap 
param)
   case PIPE_CAP_VERTEX_BUFFER_STRIDE_4BYTE_ALIGNED_ONLY:
   case PIPE_CAP_VERTEX_ELEMENT_SRC_OFFSET_4BYTE_ALIGNED_ONLY:
   case PIPE_CAP_TEXTURE_MULTISAMPLE:
-   case PIPE_CAP_CUBE_MAP_ARRAY:
   case PIPE_CAP_TGSI_TEXCOORD:
   case PIPE_CAP_PREFER_BLIT_BASED_TEXTURE_TRANSFER:
   case PIPE_CAP_MAX_TEXTURE_GATHER_COMPONENTS:
--
2.7.3




Re: [Mesa-dev] [PATCH 4/6] swr: use util_copy_framebuffer_state helper

2016-11-29 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley

On Nov 22, 2016, at 7:37 PM, Ilia Mirkin wrote:

Signed-off-by: Ilia Mirkin
---
src/gallium/drivers/swr/swr_state.cpp | 13 +
1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/src/gallium/drivers/swr/swr_state.cpp 
b/src/gallium/drivers/swr/swr_state.cpp
index 4119379..8541aca 100644
--- a/src/gallium/drivers/swr/swr_state.cpp
+++ b/src/gallium/drivers/swr/swr_state.cpp
@@ -617,18 +617,7 @@ swr_set_framebuffer_state(struct pipe_context *pipe,
   assert(fb->height <= KNOB_GUARDBAND_HEIGHT);

   if (changed) {
-  unsigned i;
-  for (i = 0; i < fb->nr_cbufs; ++i)
- pipe_surface_reference(&ctx->framebuffer.cbufs[i], fb->cbufs[i]);
-  for (; i < ctx->framebuffer.nr_cbufs; ++i)
- pipe_surface_reference(&ctx->framebuffer.cbufs[i], NULL);
-
-  ctx->framebuffer.nr_cbufs = fb->nr_cbufs;
-
-  ctx->framebuffer.width = fb->width;
-  ctx->framebuffer.height = fb->height;
-
-  pipe_surface_reference(&ctx->framebuffer.zsbuf, fb->zsbuf);
+  util_copy_framebuffer_state(&ctx->framebuffer, fb);

  ctx->dirty |= SWR_NEW_FRAMEBUFFER;
   }
--
2.7.3




Re: [Mesa-dev] [PATCH 2/6] swr: rearrange caps into limits/supported/unsupported groups

2016-11-29 Thread Rowley, Timothy O
Ouch, that must have been a pain to reorganize - thanks.  Visual inspection 
says the caps are the same before and after, and testing shows it still passing 
the same tests.

Reviewed-by: Tim Rowley

On Nov 22, 2016, at 7:37 PM, Ilia Mirkin wrote:

I find this a lot more readable and compact - much easier to scan
through the list and see what's on and what's off.

No functional change intended.

Signed-off-by: Ilia Mirkin
---
src/gallium/drivers/swr/swr_screen.cpp | 213 +
1 file changed, 84 insertions(+), 129 deletions(-)

diff --git a/src/gallium/drivers/swr/swr_screen.cpp 
b/src/gallium/drivers/swr/swr_screen.cpp
index 194b8f0..dc55d3e 100644
--- a/src/gallium/drivers/swr/swr_screen.cpp
+++ b/src/gallium/drivers/swr/swr_screen.cpp
@@ -153,54 +153,15 @@ static int
swr_get_param(struct pipe_screen *screen, enum pipe_cap param)
{
   switch (param) {
-   case PIPE_CAP_NPOT_TEXTURES:
-   case PIPE_CAP_MIXED_FRAMEBUFFER_SIZES:
-   case PIPE_CAP_MIXED_COLOR_DEPTH_BITS:
-  return 1;
-   case PIPE_CAP_TWO_SIDED_STENCIL:
-  return 1;
-   case PIPE_CAP_SM3:
-  return 1;
-   case PIPE_CAP_ANISOTROPIC_FILTER:
-  return 0;
-   case PIPE_CAP_POINT_SPRITE:
-  return 1;
+  /* limits */
   case PIPE_CAP_MAX_RENDER_TARGETS:
  return PIPE_MAX_COLOR_BUFS;
-   case PIPE_CAP_MAX_DUAL_SOURCE_RENDER_TARGETS:
-  return 1;
-   case PIPE_CAP_OCCLUSION_QUERY:
-   case PIPE_CAP_QUERY_TIME_ELAPSED:
-   case PIPE_CAP_QUERY_PIPELINE_STATISTICS:
-  return 1;
-   case PIPE_CAP_TEXTURE_MIRROR_CLAMP:
-  return 1;
-   case PIPE_CAP_TEXTURE_SHADOW_MAP:
-  return 1;
-   case PIPE_CAP_TEXTURE_SWIZZLE:
-  return 1;
-   case PIPE_CAP_TEXTURE_BORDER_COLOR_QUIRK:
-  return 0;
   case PIPE_CAP_MAX_TEXTURE_2D_LEVELS:
  return SWR_MAX_TEXTURE_2D_LEVELS;
   case PIPE_CAP_MAX_TEXTURE_3D_LEVELS:
  return SWR_MAX_TEXTURE_3D_LEVELS;
   case PIPE_CAP_MAX_TEXTURE_CUBE_LEVELS:
  return SWR_MAX_TEXTURE_CUBE_LEVELS;
-   case PIPE_CAP_BLEND_EQUATION_SEPARATE:
-  return 1;
-   case PIPE_CAP_INDEP_BLEND_ENABLE:
-  return 1;
-   case PIPE_CAP_INDEP_BLEND_FUNC:
-  return 1;
-   case PIPE_CAP_TGSI_FS_COORD_ORIGIN_LOWER_LEFT:
-  return 0; // Don't support lower left frag coord.
-   case PIPE_CAP_TGSI_FS_COORD_ORIGIN_UPPER_LEFT:
-   case PIPE_CAP_TGSI_FS_COORD_PIXEL_CENTER_HALF_INTEGER:
-   case PIPE_CAP_TGSI_FS_COORD_PIXEL_CENTER_INTEGER:
-  return 1;
-   case PIPE_CAP_DEPTH_CLIP_DISABLE:
-  return 1;
   case PIPE_CAP_MAX_STREAM_OUTPUT_BUFFERS:
  return MAX_SO_STREAMS;
   case PIPE_CAP_MAX_STREAM_OUTPUT_SEPARATE_COMPONENTS:
@@ -213,134 +174,112 @@ swr_get_param(struct pipe_screen *screen, enum pipe_cap 
param)
  return 1;
   case PIPE_CAP_MAX_VERTEX_ATTRIB_STRIDE:
  return 2048;
-   case PIPE_CAP_PRIMITIVE_RESTART:
-  return 1;
-   case PIPE_CAP_SHADER_STENCIL_EXPORT:
-  return 0;
-   case PIPE_CAP_TGSI_INSTANCEID:
-   case PIPE_CAP_VERTEX_ELEMENT_INSTANCE_DIVISOR:
-   case PIPE_CAP_START_INSTANCE:
-  return 1;
-   case PIPE_CAP_SEAMLESS_CUBE_MAP:
-   case PIPE_CAP_SEAMLESS_CUBE_MAP_PER_TEXTURE:
-  return 1;
   case PIPE_CAP_MAX_TEXTURE_ARRAY_LAYERS:
  return SWR_MAX_TEXTURE_ARRAY_LAYERS;
   case PIPE_CAP_MIN_TEXEL_OFFSET:
  return -8;
   case PIPE_CAP_MAX_TEXEL_OFFSET:
  return 7;
-   case PIPE_CAP_CONDITIONAL_RENDER:
-  return 1;
-   case PIPE_CAP_TEXTURE_BARRIER:
+   case PIPE_CAP_GLSL_FEATURE_LEVEL:
+  return 330;
+   case PIPE_CAP_CONSTANT_BUFFER_OFFSET_ALIGNMENT:
+  return 16;
+   case PIPE_CAP_MIN_MAP_BUFFER_ALIGNMENT:
+  return 64;
+   case PIPE_CAP_MAX_TEXTURE_BUFFER_SIZE:
+  return 65536;
+   case PIPE_CAP_TEXTURE_BUFFER_OFFSET_ALIGNMENT:
  return 0;
-   case PIPE_CAP_FRAGMENT_COLOR_CLAMPED:
-   case PIPE_CAP_VERTEX_COLOR_CLAMPED:
+   case PIPE_CAP_MAX_VIEWPORTS:
+  return 1;
+   case PIPE_CAP_ENDIANNESS:
+  return PIPE_ENDIAN_NATIVE;
+   case PIPE_CAP_MIN_TEXTURE_GATHER_OFFSET:
+   case PIPE_CAP_MAX_TEXTURE_GATHER_OFFSET:
  return 0;
+
+  /* supported features */
+   case PIPE_CAP_NPOT_TEXTURES:
+   case PIPE_CAP_MIXED_FRAMEBUFFER_SIZES:
+   case PIPE_CAP_MIXED_COLOR_DEPTH_BITS:
+   case PIPE_CAP_TWO_SIDED_STENCIL:
+   case PIPE_CAP_SM3:
+   case PIPE_CAP_POINT_SPRITE:
+   case PIPE_CAP_MAX_DUAL_SOURCE_RENDER_TARGETS:
+   case PIPE_CAP_OCCLUSION_QUERY:
+   case PIPE_CAP_QUERY_TIME_ELAPSED:
+   case PIPE_CAP_QUERY_PIPELINE_STATISTICS:
+   case PIPE_CAP_TEXTURE_MIRROR_CLAMP:
+   case PIPE_CAP_TEXTURE_SHADOW_MAP:
+   case PIPE_CAP_TEXTURE_SWIZZLE:
+   case PIPE_CAP_BLEND_EQUATION_SEPARATE:
+   case PIPE_CAP_INDEP_BLEND_ENABLE:
+   case PIPE_CAP_INDEP_BLEND_FUNC:
+   case PIPE_CAP_TGSI_FS_COORD_ORIGIN_UPPER_LEFT:
+   case PIPE_CAP_TGSI_FS_COORD_PIXEL_CENTER_HALF_INTEGER:
+   case 

Re: [Mesa-dev] [PATCH 1/6] swr: only store up to the LOD size

2016-11-29 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley

On Nov 22, 2016, at 7:37 PM, Ilia Mirkin wrote:

Signed-off-by: Ilia Mirkin
---
src/gallium/drivers/swr/swr_draw.cpp | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/swr/swr_draw.cpp 
b/src/gallium/drivers/swr/swr_draw.cpp
index e8c5b23..c4d5e5c 100644
--- a/src/gallium/drivers/swr/swr_draw.cpp
+++ b/src/gallium/drivers/swr/swr_draw.cpp
@@ -259,7 +259,9 @@ swr_store_render_target(struct pipe_context *pipe,
   if (renderTarget->pBaseAddress) {
  swr_update_draw_context(ctx);
  SWR_RECT full_rect =
- {0, 0, (int32_t)renderTarget->width, (int32_t)renderTarget->height};
+ {0, 0,
+  (int32_t)u_minify(renderTarget->width, renderTarget->lod),
+  (int32_t)u_minify(renderTarget->height, renderTarget->lod)};
  SwrStoreTiles(ctx->swrContext,
1 << attachment,
post_tile_state,
--
2.7.3
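
For context, the mip-level size computation the patch relies on can be sketched as below. minify is a hypothetical stand-in for Mesa's u_minify helper, written from the usual definition (shift right by the level, clamp to one texel), not copied from the Mesa source.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for u_minify: extent of mip level `lod` for a
// base extent of `size`, never smaller than one texel.
inline uint32_t minify(uint32_t size, unsigned lod)
{
    assert(lod < 32);
    return std::max<uint32_t>(size >> lod, 1u);
}
```

A 256x64 render target at lod 2 therefore stores a 64x16 rectangle rather than the full base-level extent.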




Re: [Mesa-dev] [PATCH 4/4] swr: [rasterizer core] use ClearTile helper to store fast clears

2016-11-28 Thread Rowley, Timothy O
This patch is showing some regressions in internal testing.  As we discussed 
on IRC, it appears to be a combination of crashes (probably missing table 
entries) and possibly wrong clear values.  I will need to get back to you 
later about the errors, but for now we need to hold off on this patch.

-Tim

> On Nov 19, 2016, at 9:48 AM, Ilia Mirkin  wrote:
> 
> No point in clearing the hot tile and then storing that - may as well
> just store the clear color to the surface directly. Use the helper that
> already exists for this purpose.
> 
> Signed-off-by: Ilia Mirkin 
> ---
> 
> My theory is that this is going to be a very modest perf improvement. Instead
> of first clearing the hot tile and then storing it, we store the clear color
> directly.
> 
> It does bring up a rare case where a tile might be cleared, stored, and then
> re-used with the same buffer. In that case, the former logic would avoid the
> load while the new logic will end up reloading the clear color/etc. There was
> a grand total of one piglit that was hit by this:
> 
>  fbo-attachments-blit-scaled-linear
> 
> (and that is the reason that we have to set the hottile to INVALID rather than
> the post state when storing.)
> 
> src/gallium/drivers/swr/rasterizer/core/backend.cpp | 17 ++---
> src/gallium/drivers/swr/rasterizer/core/tilemgr.cpp | 15 +--
> 2 files changed, 15 insertions(+), 17 deletions(-)
> 
> diff --git a/src/gallium/drivers/swr/rasterizer/core/backend.cpp 
> b/src/gallium/drivers/swr/rasterizer/core/backend.cpp
> index c45c0a7..ff08233 100644
> --- a/src/gallium/drivers/swr/rasterizer/core/backend.cpp
> +++ b/src/gallium/drivers/swr/rasterizer/core/backend.cpp
> @@ -358,16 +358,19 @@ void ProcessStoreTileBE(DRAW_CONTEXT *pDC, uint32_t 
> workerId, uint32_t macroTile
> HOTTILE *pHotTile = pContext->pHotTileMgr->GetHotTileNoLoad(pContext, 
> pDC, macroTile, attachment, false);
> if (pHotTile)
> {
> -// clear if clear is pending (i.e., not rendered to), then mark as 
> dirty for store.
> +// clear the surface directly
> if (pHotTile->state == HOTTILE_CLEAR)
> {
> -PFN_CLEAR_TILES pfnClearTiles = sClearTilesTable[srcFormat];
> -SWR_ASSERT(pfnClearTiles != nullptr);
> -
> -pfnClearTiles(pDC, attachment, macroTile, 
> pHotTile->renderTargetArrayIndex, pHotTile->clearData, pDesc->rect);
> +pContext->pfnClearTile(GetPrivateState(pDC), attachment,
> +x * KNOB_MACROTILE_X_DIM, y * KNOB_MACROTILE_Y_DIM,
> +pHotTile->renderTargetArrayIndex,
> +(const float *)pHotTile->clearData);
> +
> +// Since the state is effectively uninitialized, make sure that 
> we
> +// reload any data.
> +pHotTile->state = HOTTILE_INVALID;
> }
> -
> -if (pHotTile->state == HOTTILE_DIRTY || pDesc->postStoreTileState == 
> (SWR_TILE_STATE)HOTTILE_DIRTY)
> +else if (pHotTile->state == HOTTILE_DIRTY || 
> pDesc->postStoreTileState == (SWR_TILE_STATE)HOTTILE_DIRTY)
> {
> int32_t destX = KNOB_MACROTILE_X_DIM * x;
> int32_t destY = KNOB_MACROTILE_Y_DIM * y;
> diff --git a/src/gallium/drivers/swr/rasterizer/core/tilemgr.cpp 
> b/src/gallium/drivers/swr/rasterizer/core/tilemgr.cpp
> index f398667..a4a6152 100644
> --- a/src/gallium/drivers/swr/rasterizer/core/tilemgr.cpp
> +++ b/src/gallium/drivers/swr/rasterizer/core/tilemgr.cpp
> @@ -151,17 +151,12 @@ HOTTILE* HotTileMgr::GetHotTile(SWR_CONTEXT* pContext, 
> DRAW_CONTEXT* pDC, uint32
> 
> if (hotTile.state == HOTTILE_CLEAR)
> {
> -if (attachment == SWR_ATTACHMENT_STENCIL)
> -ClearStencilHotTile();
> -else if (attachment == SWR_ATTACHMENT_DEPTH)
> -ClearDepthHotTile();
> -else
> -ClearColorHotTile();
> -
> -hotTile.state = HOTTILE_DIRTY;
> +pContext->pfnClearTile(GetPrivateState(pDC), attachment,
> +x * KNOB_MACROTILE_X_DIM, y * KNOB_MACROTILE_Y_DIM,
> +hotTile.renderTargetArrayIndex,
> +(const float *)hotTile.clearData);
> }
> -
> -if (hotTile.state == HOTTILE_DIRTY)
> +else if (hotTile.state == HOTTILE_DIRTY)
> {
> pContext->pfnStoreTile(GetPrivateState(pDC), format, 
> attachment,
> x * KNOB_MACROTILE_X_DIM, y * KNOB_MACROTILE_Y_DIM, 
> hotTile.renderTargetArrayIndex, hotTile.pBuffer);
> -- 
> 2.7.3
> 



Re: [Mesa-dev] [PATCH 2/4] swr: [rasterizer memory] hook up stencil clears for ClearTile

2016-11-28 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley

On Nov 19, 2016, at 9:48 AM, Ilia Mirkin wrote:

Signed-off-by: Ilia Mirkin
---
src/gallium/drivers/swr/rasterizer/memory/ClearTile.cpp | 13 -
1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/memory/ClearTile.cpp 
b/src/gallium/drivers/swr/rasterizer/memory/ClearTile.cpp
index 8501e21..31a40a3 100644
--- a/src/gallium/drivers/swr/rasterizer/memory/ClearTile.cpp
+++ b/src/gallium/drivers/swr/rasterizer/memory/ClearTile.cpp
@@ -156,16 +156,19 @@ void StoreHotTileClear(
{
PFN_STORE_TILES_CLEAR pfnStoreTilesClear = NULL;

-SWR_ASSERT(renderTargetIndex != SWR_ATTACHMENT_STENCIL);  ///@todo Not 
supported yet.
-
-if (renderTargetIndex != SWR_ATTACHMENT_DEPTH)
+if (renderTargetIndex == SWR_ATTACHMENT_STENCIL)
{
-pfnStoreTilesClear = sStoreTilesClearColorTable[pDstSurface->format];
+SWR_ASSERT(pDstSurface->format == R8_UINT);
+pfnStoreTilesClear = StoreMacroTileClear::StoreClear;
}
-else
+else if (renderTargetIndex == SWR_ATTACHMENT_DEPTH)
{
pfnStoreTilesClear = sStoreTilesClearDepthTable[pDstSurface->format];
}
+else
+{
+pfnStoreTilesClear = sStoreTilesClearColorTable[pDstSurface->format];
+}

SWR_ASSERT(pfnStoreTilesClear != NULL);

--
2.7.3




Re: [Mesa-dev] [PATCH 3/4] swr: [rasterizer memory] only clear up to the LOD size

2016-11-28 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley

On Nov 19, 2016, at 9:48 AM, Ilia Mirkin wrote:

Signed-off-by: Ilia Mirkin
---
src/gallium/drivers/swr/rasterizer/memory/ClearTile.cpp | 10 --
1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/memory/ClearTile.cpp 
b/src/gallium/drivers/swr/rasterizer/memory/ClearTile.cpp
index 31a40a3..ee13f55 100644
--- a/src/gallium/drivers/swr/rasterizer/memory/ClearTile.cpp
+++ b/src/gallium/drivers/swr/rasterizer/memory/ClearTile.cpp
@@ -60,6 +60,12 @@ struct StoreRasterTileClear
UINT x, UINT y, // (x, y) pixel coordinate to start of raster tile.
uint32_t renderTargetArrayIndex)
{
+// If we're outside of the surface, stop.
+uint32_t lodWidth = std::max(pDstSurface->width >> 
pDstSurface->lod, 1U);
+uint32_t lodHeight = std::max(pDstSurface->height >> 
pDstSurface->lod, 1U);
+if (x >= lodWidth || y >= lodHeight)
+return;
+
// Compute destination address for raster tile.
uint8_t* pDstTile = (uint8_t*)ComputeSurfaceAddress(
x, y, pDstSurface->arrayIndex + renderTargetArrayIndex,
@@ -73,7 +79,7 @@ struct StoreRasterTileClear
UINT dstBytesPerRow = 0;

// For each raster tile pixel in row 0 (rx, 0)
-for (UINT rx = 0; (rx < KNOB_TILE_X_DIM) && ((x + rx) < 
pDstSurface->width); ++rx)
+for (UINT rx = 0; (rx < KNOB_TILE_X_DIM) && ((x + rx) < lodWidth); 
++rx)
{
memcpy(pDst, dstFormattedColor, dstBytesPerPixel);

@@ -86,7 +92,7 @@ struct StoreRasterTileClear
pDst = pDstTile + pDstSurface->pitch;

// For each remaining row in the rest of the raster tile
-for (UINT ry = 1; (ry < KNOB_TILE_Y_DIM) && ((y + ry) < 
pDstSurface->height); ++ry)
+for (UINT ry = 1; (ry < KNOB_TILE_Y_DIM) && ((y + ry) < lodHeight); 
++ry)
{
// copy row
memcpy(pDst, pDstTile, dstBytesPerRow);
--
2.7.3
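
The effect of the new bounds can be sketched with a small standalone function that counts how many pixels a single tile clear would touch. kTileDim stands in for KNOB_TILE_X_DIM / KNOB_TILE_Y_DIM (the value 8 is an assumption for this sketch), and pixels_cleared is a hypothetical helper, not part of the rasterizer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Assumed tile size for this sketch only.
constexpr uint32_t kTileDim = 8;

// Number of pixels written by a tile clear starting at (x, y) when the
// clear is clipped to the extent of mip level `lod`.
inline uint32_t pixels_cleared(uint32_t x, uint32_t y,
                               uint32_t width, uint32_t height, unsigned lod)
{
    const uint32_t lodWidth  = std::max<uint32_t>(width >> lod, 1u);
    const uint32_t lodHeight = std::max<uint32_t>(height >> lod, 1u);

    // Tile starts outside the mip level: nothing to clear.
    if (x >= lodWidth || y >= lodHeight)
        return 0;

    // Clip the tile to the right and bottom edges of the mip level.
    const uint32_t cols = std::min<uint32_t>(kTileDim, lodWidth - x);
    const uint32_t rows = std::min<uint32_t>(kTileDim, lodHeight - y);
    return cols * rows;
}
```

Before the patch the loops were bounded by the base-level width and height, so a tile on a smaller mip level could write past the level's actual extent.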




Re: [Mesa-dev] [PATCH 1/4] swr: [rasterizer memory] add support for clearing Z32F_X32 and Z16

2016-11-28 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley

On Nov 19, 2016, at 9:48 AM, Ilia Mirkin wrote:

Signed-off-by: Ilia Mirkin
---
src/gallium/drivers/swr/rasterizer/memory/ClearTile.cpp | 2 ++
1 file changed, 2 insertions(+)

diff --git a/src/gallium/drivers/swr/rasterizer/memory/ClearTile.cpp 
b/src/gallium/drivers/swr/rasterizer/memory/ClearTile.cpp
index 717d12c..8501e21 100644
--- a/src/gallium/drivers/swr/rasterizer/memory/ClearTile.cpp
+++ b/src/gallium/drivers/swr/rasterizer/memory/ClearTile.cpp
@@ -282,7 +282,9 @@ void StoreHotTileClear(
memset(sStoreTilesClearDepthTable, 0, sizeof(sStoreTilesClearDepthTable)); \
\
sStoreTilesClearDepthTable[R32_FLOAT] = StoreMacroTileClear::StoreClear; \
+sStoreTilesClearDepthTable[R32_FLOAT_X8X24_TYPELESS] = 
StoreMacroTileClear::StoreClear; \
sStoreTilesClearDepthTable[R24_UNORM_X8_TYPELESS] = 
StoreMacroTileClear::StoreClear; \
+sStoreTilesClearDepthTable[R16_UNORM] = StoreMacroTileClear::StoreClear; \

//
/// @brief Sets up tables for ClearTile
--
2.7.3




Re: [Mesa-dev] [PATCH] swr: [rasterizer core] fix typo in scissor tile-alignment logic

2016-11-28 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley

On Nov 25, 2016, at 7:35 PM, Ilia Mirkin wrote:

Signed-off-by: Ilia Mirkin
---
src/gallium/drivers/swr/rasterizer/core/api.cpp | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/swr/rasterizer/core/api.cpp 
b/src/gallium/drivers/swr/rasterizer/core/api.cpp
index 383a7ad..6c0d5dd 100644
--- a/src/gallium/drivers/swr/rasterizer/core/api.cpp
+++ b/src/gallium/drivers/swr/rasterizer/core/api.cpp
@@ -765,7 +765,7 @@ void SetupMacroTileScissors(DRAW_CONTEXT *pDC)
tileAligned  = (scissorInFixedPoint.xmin % KNOB_TILE_X_DIM) == 0;
tileAligned &= (scissorInFixedPoint.ymin % KNOB_TILE_Y_DIM) == 0;
tileAligned &= (scissorInFixedPoint.xmax % KNOB_TILE_X_DIM) == 0;
-tileAligned &= (scissorInFixedPoint.xmax % KNOB_TILE_Y_DIM) == 0;
+tileAligned &= (scissorInFixedPoint.ymax % KNOB_TILE_Y_DIM) == 0;

pState->scissorsTileAligned &= tileAligned;

--
2.7.3
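
To make the effect of the typo concrete, here is a simplified sketch of the corrected check. Scissor and tile_aligned are hypothetical names, and the coordinates are plain integers rather than the fixed-point values used in api.cpp.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scissor rectangle in integer pixel coordinates.
struct Scissor {
    int32_t xmin, ymin, xmax, ymax;
};

// True when all four edges of the scissor fall on tile boundaries.
inline bool tile_aligned(const Scissor &s, int32_t tileX, int32_t tileY)
{
    bool aligned = (s.xmin % tileX) == 0;
    aligned &= (s.ymin % tileY) == 0;
    aligned &= (s.xmax % tileX) == 0;
    aligned &= (s.ymax % tileY) == 0; // the typo tested xmax here instead
    return aligned;
}
```

With 8x8 tiles, a scissor of {0, 0, 16, 12} is not tile aligned, but the pre-patch check never looked at ymax and would have reported it as aligned.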




Re: [Mesa-dev] [PATCH 4/4] swr: clear every layer of the attached surfaces

2016-11-23 Thread Rowley, Timothy O
Ah, didn’t notice that they were all shifted by arrayIndex.  Fine to leave the 
changes as they are, then.

This series of four patches (or rather, the rebased versions in your repo) are 
Reviewed-by: Tim Rowley 
<timothy.o.row...@intel.com>


On Nov 23, 2016, at 2:11 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:

On Wed, Nov 23, 2016 at 3:02 PM, Rowley, Timothy O
<timothy.o.row...@intel.com> wrote:
This code seems to assume that all attached buffers have the same start layer, 
and that start will be zero.  Maybe it should construct the clearMask inside 
the layer loop, which would also be a bit clearer than the code you added to 
drop bits out of the mask?

They have a logical start layer which is the same (0), since the real
start layer is in the SWR_SURFACE_STATE's arrayIndex. The arrayIndex
is added to the renderTargetArrayIndex to compute a final layer to
operate on.

If you'd like to simplify this code, I could just clear every
attachment/layer one at a time rather than trying to do it in fewer
steps. I suspect that the end effect on the swr backend will be
largely identical.

 -ilia
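
The alternative raised in the quoted review below, building the clear mask inside the layer loop, might look roughly like this. It is only a sketch: LayerRange and clear_mask_for_layer are hypothetical, bit i of the mask simply stands for attachment i, and layers are logical (zero-based) as described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a surface's inclusive layer range
// (first_layer / last_layer in the patch).
struct LayerRange {
    unsigned first_layer;
    unsigned last_layer;
};

// Mask of attachments that still have a logical layer `layer` to clear.
inline uint32_t clear_mask_for_layer(const std::vector<LayerRange> &attachments,
                                     unsigned layer)
{
    uint32_t mask = 0;
    for (std::size_t i = 0; i < attachments.size(); ++i) {
        const unsigned count =
            attachments[i].last_layer - attachments[i].first_layer + 1;
        if (layer < count)
            mask |= (1u << i);
    }
    return mask;
}
```

Recomputing the mask for each layer avoids mutating it after every clear, at the cost of walking the attachment list once per layer.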


-Tim

On Nov 17, 2016, at 6:51 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---

With this patch, the layered-rendering clear tests pass, both with fast clear
enabled and disabled.

src/gallium/drivers/swr/swr_clear.cpp | 35 +--
1 file changed, 29 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/swr/swr_clear.cpp 
b/src/gallium/drivers/swr/swr_clear.cpp
index 25f066e..7ac308e 100644
--- a/src/gallium/drivers/swr/swr_clear.cpp
+++ b/src/gallium/drivers/swr/swr_clear.cpp
@@ -35,6 +35,7 @@ swr_clear(struct pipe_context *pipe,
  struct pipe_framebuffer_state *fb = &ctx->framebuffer;

  UINT clearMask = 0;
+   int layers = 0;

  if (!swr_check_render_cond(pipe))
 return;
@@ -44,24 +45,46 @@ swr_clear(struct pipe_context *pipe,

  if (buffers & PIPE_CLEAR_COLOR && fb->nr_cbufs) {
 for (unsigned i = 0; i < fb->nr_cbufs; ++i)
- if (fb->cbufs[i])
+ if (fb->cbufs[i] && (buffers & (PIPE_CLEAR_COLOR0 << i))) {
   clearMask |= (SWR_ATTACHMENT_COLOR0_BIT << i);
+layers = std::max(layers, fb->cbufs[i]->u.tex.last_layer -
+  fb->cbufs[i]->u.tex.first_layer + 1);
+ }
  }

-   if (buffers & PIPE_CLEAR_DEPTH && fb->zsbuf)
+   if (buffers & PIPE_CLEAR_DEPTH && fb->zsbuf) {
 clearMask |= SWR_ATTACHMENT_DEPTH_BIT;
+  layers = std::max(layers, fb->zsbuf->u.tex.last_layer -
+fb->zsbuf->u.tex.first_layer + 1);
+   }

-   if (buffers & PIPE_CLEAR_STENCIL && fb->zsbuf)
+   if (buffers & PIPE_CLEAR_STENCIL && fb->zsbuf) {
 clearMask |= SWR_ATTACHMENT_STENCIL_BIT;
+  layers = std::max(layers, fb->zsbuf->u.tex.last_layer -
+fb->zsbuf->u.tex.first_layer + 1);
+   }

#if 0 // XXX HACK, override clear color alpha. On ubuntu, clears are
 // transparent.
  ((union pipe_color_union *)color)->f[3] = 1.0; /* cast off your const'd-ness 
*/
#endif

-   swr_update_draw_context(ctx);
-   SwrClearRenderTarget(ctx->swrContext, clearMask, 0, color->f, depth, 
stencil,
-ctx->swr_scissor);
+   for (int i = 0; i < layers; ++i) {
+  swr_update_draw_context(ctx);
+  SwrClearRenderTarget(ctx->swrContext, clearMask, i,
+   color->f, depth, stencil,
+   ctx->swr_scissor);
+
+  // Mask out the attachments that are out of layers.
+  if (fb->zsbuf &&
+  fb->zsbuf->u.tex.last_layer - fb->zsbuf->u.tex.first_layer <= i)
+ clearMask &= ~(SWR_ATTACHMENT_DEPTH_BIT | SWR_ATTACHMENT_STENCIL_BIT);
+  for (unsigned c = 0; c < fb->nr_cbufs; ++c) {
+ const struct pipe_surface *sf = fb->cbufs[c];
+ if (sf && sf->u.tex.last_layer - sf->u.tex.first_layer <= i)
+clearMask &= ~(SWR_ATTACHMENT_COLOR0_BIT << c);
+  }
+   }
}


--
2.7.3





Re: [Mesa-dev] [PATCH 4/4] swr: clear every layer of the attached surfaces

2016-11-23 Thread Rowley, Timothy O
This code seems to assume that all attached buffers have the same start layer, 
and that start will be zero.  Maybe it should construct the clearMask inside 
the layer loop, which would also be a bit clearer than the code you added to 
drop bits out of the mask?

-Tim

> On Nov 17, 2016, at 6:51 PM, Ilia Mirkin  wrote:
> 
> Signed-off-by: Ilia Mirkin 
> ---
> 
> With this patch, the layered-rendering clear tests pass, both with fast clear
> enabled and disabled.
> 
> src/gallium/drivers/swr/swr_clear.cpp | 35 +--
> 1 file changed, 29 insertions(+), 6 deletions(-)
> 
> diff --git a/src/gallium/drivers/swr/swr_clear.cpp 
> b/src/gallium/drivers/swr/swr_clear.cpp
> index 25f066e..7ac308e 100644
> --- a/src/gallium/drivers/swr/swr_clear.cpp
> +++ b/src/gallium/drivers/swr/swr_clear.cpp
> @@ -35,6 +35,7 @@ swr_clear(struct pipe_context *pipe,
>struct pipe_framebuffer_state *fb = &ctx->framebuffer;
> 
>UINT clearMask = 0;
> +   int layers = 0;
> 
>if (!swr_check_render_cond(pipe))
>   return;
> @@ -44,24 +45,46 @@ swr_clear(struct pipe_context *pipe,
> 
>if (buffers & PIPE_CLEAR_COLOR && fb->nr_cbufs) {
>   for (unsigned i = 0; i < fb->nr_cbufs; ++i)
> - if (fb->cbufs[i])
> + if (fb->cbufs[i] && (buffers & (PIPE_CLEAR_COLOR0 << i))) {
> clearMask |= (SWR_ATTACHMENT_COLOR0_BIT << i);
> +layers = std::max(layers, fb->cbufs[i]->u.tex.last_layer -
> +  fb->cbufs[i]->u.tex.first_layer + 1);
> + }
>}
> 
> -   if (buffers & PIPE_CLEAR_DEPTH && fb->zsbuf)
> +   if (buffers & PIPE_CLEAR_DEPTH && fb->zsbuf) {
>   clearMask |= SWR_ATTACHMENT_DEPTH_BIT;
> +  layers = std::max(layers, fb->zsbuf->u.tex.last_layer -
> +fb->zsbuf->u.tex.first_layer + 1);
> +   }
> 
> -   if (buffers & PIPE_CLEAR_STENCIL && fb->zsbuf)
> +   if (buffers & PIPE_CLEAR_STENCIL && fb->zsbuf) {
>   clearMask |= SWR_ATTACHMENT_STENCIL_BIT;
> +  layers = std::max(layers, fb->zsbuf->u.tex.last_layer -
> +fb->zsbuf->u.tex.first_layer + 1);
> +   }
> 
> #if 0 // XXX HACK, override clear color alpha. On ubuntu, clears are
>   // transparent.
>((union pipe_color_union *)color)->f[3] = 1.0; /* cast off your 
> const'd-ness */
> #endif
> 
> -   swr_update_draw_context(ctx);
> -   SwrClearRenderTarget(ctx->swrContext, clearMask, 0, color->f, depth, 
> stencil,
> -ctx->swr_scissor);
> +   for (int i = 0; i < layers; ++i) {
> +  swr_update_draw_context(ctx);
> +  SwrClearRenderTarget(ctx->swrContext, clearMask, i,
> +   color->f, depth, stencil,
> +   ctx->swr_scissor);
> +
> +  // Mask out the attachments that are out of layers.
> +  if (fb->zsbuf &&
> +  fb->zsbuf->u.tex.last_layer - fb->zsbuf->u.tex.first_layer <= i)
> + clearMask &= ~(SWR_ATTACHMENT_DEPTH_BIT | 
> SWR_ATTACHMENT_STENCIL_BIT);
> +  for (unsigned c = 0; c < fb->nr_cbufs; ++c) {
> + const struct pipe_surface *sf = fb->cbufs[c];
> + if (sf && sf->u.tex.last_layer - sf->u.tex.first_layer <= i)
> +clearMask &= ~(SWR_ATTACHMENT_COLOR0_BIT << c);
> +  }
> +   }
> }
> 
> 
> -- 
> 2.7.3
> 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/4] swr: [rasterizer core] actually perform clear before store in GetHotTile

2016-11-23 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley

On Nov 17, 2016, at 6:51 PM, Ilia Mirkin wrote:

When switching render target array indexes (as might happen in a GS, or
in a future change, with layered clears), if the previous state is
HOTTILE_CLEAR, we should actually clear the tile before saving it off.

Signed-off-by: Ilia Mirkin
---
src/gallium/drivers/swr/rasterizer/core/tilemgr.cpp | 12 
1 file changed, 12 insertions(+)

diff --git a/src/gallium/drivers/swr/rasterizer/core/tilemgr.cpp 
b/src/gallium/drivers/swr/rasterizer/core/tilemgr.cpp
index 804fc4f..f398667 100644
--- a/src/gallium/drivers/swr/rasterizer/core/tilemgr.cpp
+++ b/src/gallium/drivers/swr/rasterizer/core/tilemgr.cpp
@@ -149,6 +149,18 @@ HOTTILE* HotTileMgr::GetHotTile(SWR_CONTEXT* pContext, 
DRAW_CONTEXT* pDC, uint32
default: SWR_ASSERT(false, "Unknown attachment: %d", attachment); 
format = KNOB_COLOR_HOT_TILE_FORMAT; break;
}

+if (hotTile.state == HOTTILE_CLEAR)
+{
+if (attachment == SWR_ATTACHMENT_STENCIL)
+ClearStencilHotTile();
+else if (attachment == SWR_ATTACHMENT_DEPTH)
+ClearDepthHotTile();
+else
+ClearColorHotTile();
+
+hotTile.state = HOTTILE_DIRTY;
+}
+
if (hotTile.state == HOTTILE_DIRTY)
{
pContext->pfnStoreTile(GetPrivateState(pDC), format, attachment,
--
2.7.3
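
A minimal sketch of the ordering this patch enforces, assuming a much simplified tile: a tile in the "clear" state holds only a pending clear value, so it has to be materialized before it can be stored. `HotTile`, `TileState`, and `flush_hot_tile` here are hypothetical, and the state chosen after the store is an assumption, not taken from the patch.

```cpp
#include <array>
#include <cassert>

// Hypothetical, reduced mirror of the HOTTILE_* states.
enum class TileState { Invalid, Clear, Dirty };

struct HotTile {
    TileState state = TileState::Invalid;
    float clearValue = 0.0f;
    std::array<float, 4> pixels{};   // stale until the clear is applied
};

// Flush a tile to its backing surface before it is reused for another
// render target array index. Returns true if the surface was written.
bool flush_hot_tile(HotTile& tile, std::array<float, 4>& surface)
{
    if (tile.state == TileState::Clear) {
        // Perform the deferred clear first, otherwise stale pixels get stored.
        tile.pixels.fill(tile.clearValue);
        tile.state = TileState::Dirty;
    }
    if (tile.state != TileState::Dirty)
        return false;
    surface = tile.pixels;
    tile.state = TileState::Invalid;  // assumption: tile is reloaded on next use
    return true;
}
```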




Re: [Mesa-dev] [PATCH 3/4] swr: [rasterizer core] pipe renderTargetArrayIndex through to clears

2016-11-23 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley

On Nov 17, 2016, at 6:51 PM, Ilia Mirkin wrote:

Currently clears only operate on the 0th array index (ignoring surface
layout parameters). Instead normalize to take a RTAI like all the
load/store tile logic does, and use ComputeSurfaceAddress to properly
take the surface state's lod/array index into account.
---
src/gallium/drivers/swr/rasterizer/core/api.cpp  |  3 +++
src/gallium/drivers/swr/rasterizer/core/api.h|  5 -
src/gallium/drivers/swr/rasterizer/core/backend.cpp  | 20 ++--
src/gallium/drivers/swr/rasterizer/core/context.h|  1 +
.../drivers/swr/rasterizer/memory/ClearTile.cpp  | 20 +---
src/gallium/drivers/swr/swr_clear.cpp|  2 +-
src/gallium/drivers/swr/swr_memory.h |  4 +++-
7 files changed, 35 insertions(+), 20 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/core/api.cpp 
b/src/gallium/drivers/swr/rasterizer/core/api.cpp
index 6ade65a..383a7ad 100644
--- a/src/gallium/drivers/swr/rasterizer/core/api.cpp
+++ b/src/gallium/drivers/swr/rasterizer/core/api.cpp
@@ -1476,6 +1476,7 @@ void SWR_API SwrStoreTiles(
/// @brief SwrClearRenderTarget - Clear attached render targets / depth / 
stencil
/// @param hContext - Handle passed back from SwrCreateContext
/// @param attachmentMask - combination of SWR_ATTACHMENT_*_BIT attachments to 
clear
+/// @param renderTargetArrayIndex - the RT array index to clear
/// @param clearColor - color use for clearing render targets
/// @param z - depth value use for clearing depth buffer
/// @param stencil - stencil value used for clearing stencil buffer
@@ -1483,6 +1484,7 @@ void SWR_API SwrStoreTiles(
void SWR_API SwrClearRenderTarget(
HANDLE hContext,
uint32_t attachmentMask,
+uint32_t renderTargetArrayIndex,
const float clearColor[4],
float z,
uint8_t stencil,
@@ -1503,6 +1505,7 @@ void SWR_API SwrClearRenderTarget(
pDC->FeWork.desc.clear.rect = clearRect;
pDC->FeWork.desc.clear.rect &= g_MaxScissorRect;
pDC->FeWork.desc.clear.attachmentMask = attachmentMask;
+pDC->FeWork.desc.clear.renderTargetArrayIndex = renderTargetArrayIndex;
pDC->FeWork.desc.clear.clearDepth = z;
pDC->FeWork.desc.clear.clearRTColor[0] = clearColor[0];
pDC->FeWork.desc.clear.clearRTColor[1] = clearColor[1];
diff --git a/src/gallium/drivers/swr/rasterizer/core/api.h 
b/src/gallium/drivers/swr/rasterizer/core/api.h
index 1a41637..d0f29dd 100644
--- a/src/gallium/drivers/swr/rasterizer/core/api.h
+++ b/src/gallium/drivers/swr/rasterizer/core/api.h
@@ -137,10 +137,11 @@ typedef void(SWR_API *PFN_STORE_TILE)(HANDLE 
hPrivateContext, SWR_FORMAT srcForm
/// @param renderTargetIndex - render target to store, can be color, depth or 
stencil
/// @param x - destination x coordinate
/// @param y - destination y coordinate
+/// @param renderTargetArrayIndex - render target array offset from arrayIndex
/// @param pClearColor - pointer to the hot tile's clear value
typedef void(SWR_API *PFN_CLEAR_TILE)(HANDLE hPrivateContext,
SWR_RENDERTARGET_ATTACHMENT rtIndex,
-uint32_t x, uint32_t y, const float* pClearColor);
+uint32_t x, uint32_t y, uint32_t renderTargetArrayIndex, const float* 
pClearColor);

//
/// @brief Callback to allow driver to update their copy of streamout write 
offset.
@@ -559,6 +560,7 @@ void SWR_API SwrStoreTiles(
/// @brief SwrClearRenderTarget - Clear attached render targets / depth / 
stencil
/// @param hContext - Handle passed back from SwrCreateContext
/// @param attachmentMask - combination of SWR_ATTACHMENT_*_BIT attachments to 
clear
+/// @param renderTargetArrayIndex - the RT array index to clear
/// @param clearColor - color use for clearing render targets
/// @param z - depth value use for clearing depth buffer
/// @param stencil - stencil value used for clearing stencil buffer
@@ -566,6 +568,7 @@ void SWR_API SwrStoreTiles(
void SWR_API SwrClearRenderTarget(
HANDLE hContext,
uint32_t attachmentMask,
+uint32_t renderTargetArrayIndex,
const float clearColor[4],
float z,
uint8_t stencil,
diff --git a/src/gallium/drivers/swr/rasterizer/core/backend.cpp 
b/src/gallium/drivers/swr/rasterizer/core/backend.cpp
index 45eff15..c45c0a7 100644
--- a/src/gallium/drivers/swr/rasterizer/core/backend.cpp
+++ b/src/gallium/drivers/swr/rasterizer/core/backend.cpp
@@ -37,7 +37,7 @@

#include 

-typedef void(*PFN_CLEAR_TILES)(DRAW_CONTEXT*, SWR_RENDERTARGET_ATTACHMENT rt, 
uint32_t, DWORD[4], const SWR_RECT& rect);
+typedef void(*PFN_CLEAR_TILES)(DRAW_CONTEXT*, SWR_RENDERTARGET_ATTACHMENT rt, 
uint32_t, uint32_t, DWORD[4], const SWR_RECT& rect);
static PFN_CLEAR_TILES sClearTilesTable[NUM_SWR_FORMATS];

//
@@ -134,7 +134,7 @@ 

Re: [Mesa-dev] [PATCH 5/5] swr: color interpolation is also supposed to get perspective division

2016-11-22 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley

On Nov 21, 2016, at 11:52 AM, Ilia Mirkin wrote:

Signed-off-by: Ilia Mirkin
---
src/gallium/drivers/swr/swr_shader.cpp | 6 --
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/swr/swr_shader.cpp 
b/src/gallium/drivers/swr/swr_shader.cpp
index 428c9b3..294a568 100644
--- a/src/gallium/drivers/swr/swr_shader.cpp
+++ b/src/gallium/drivers/swr/swr_shader.cpp
@@ -457,7 +457,8 @@ BuilderSWR::CompileFS(struct swr_context *ctx, 
swr_jit_fs_key &key)

  // load/compute w
  Value *vw = nullptr, *pAttribs;
-  if (interpMode == TGSI_INTERPOLATE_PERSPECTIVE) {
+  if (interpMode == TGSI_INTERPOLATE_PERSPECTIVE ||
+  interpMode == TGSI_INTERPOLATE_COLOR) {
 pAttribs = pPerspAttribs;
 switch (interpLoc) {
 case TGSI_INTERPOLATE_LOC_CENTER:
@@ -596,7 +597,8 @@ BuilderSWR::CompileFS(struct swr_context *ctx, 
swr_jit_fs_key &key)
   Value *interp1 = FMUL(vb, vj);
   interp = FADD(interp, interp1);
   interp = FADD(interp, vc);
-   if (interpMode == TGSI_INTERPOLATE_PERSPECTIVE)
+   if (interpMode == TGSI_INTERPOLATE_PERSPECTIVE ||
+   interpMode == TGSI_INTERPOLATE_COLOR)
  interp = FMUL(interp, vw);
   inputs[attrib][channel] = wrap(interp);
}
--
2.7.3
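
As a worked illustration of why the final multiply by w matters, here is a sketch of perspective-correct interpolation between two vertices. It is not the swr JIT code: the function names are hypothetical, and a single parameter `t` stands in for the barycentric setup.

```cpp
#include <cassert>
#include <cmath>

// Screen-space linear interpolation (no perspective correction).
float interp_linear(float a0, float a1, float t)
{
    return (1.0f - t) * a0 + t * a1;
}

// Perspective-correct interpolation: interpolate a/w and 1/w linearly in
// screen space, then multiply by the recovered w (divide by 1/w).
float interp_perspective(float a0, float w0, float a1, float w1, float t)
{
    float a_over_w   = (1.0f - t) * (a0 / w0) + t * (a1 / w1);
    float one_over_w = (1.0f - t) / w0 + t / w1;
    return a_over_w / one_over_w;
}
```

For a0 = 0 at w0 = 1 and a1 = 1 at w1 = 3, the screen-space midpoint gives 0.5 with linear interpolation but 0.25 with perspective correction, which is the difference a color input would see once it takes the same path as other perspective inputs.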




Re: [Mesa-dev] [PATCH 4/5] swr: add sprite coord enable mask to fs key

2016-11-22 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley

On Nov 21, 2016, at 11:52 AM, Ilia Mirkin wrote:

This fixes gl-coord-replace-doesnt-eliminate-frag-tex-coords

Signed-off-by: Ilia Mirkin
---
src/gallium/drivers/swr/swr_shader.cpp | 3 ++-
src/gallium/drivers/swr/swr_shader.h   | 1 +
2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/swr/swr_shader.cpp 
b/src/gallium/drivers/swr/swr_shader.cpp
index d29f635..428c9b3 100644
--- a/src/gallium/drivers/swr/swr_shader.cpp
+++ b/src/gallium/drivers/swr/swr_shader.cpp
@@ -131,6 +131,7 @@ swr_generate_fs_key(struct swr_jit_fs_key &key,

   key.nr_cbufs = ctx->framebuffer.nr_cbufs;
   key.light_twoside = ctx->rasterizer->light_twoside;
+   key.sprite_coord_enable = ctx->rasterizer->sprite_coord_enable;
    memcpy(&key.vs_output_semantic_name,
           &ctx->vs->info.base.output_semantic_name,
           sizeof(key.vs_output_semantic_name));
@@ -515,7 +516,7 @@ BuilderSWR::CompileFS(struct swr_context *ctx, 
swr_jit_fs_key &key)
  unsigned linkedAttrib =
     locate_linkage(semantic_name, semantic_idx, &ctx->vs->info.base);
  if (semantic_name == TGSI_SEMANTIC_GENERIC &&
-  ctx->rasterizer->sprite_coord_enable & (1 << semantic_idx)) {
+  key.sprite_coord_enable & (1 << semantic_idx)) {
 /* we add an extra attrib to the backendState in swr_update_derived. */
 linkedAttrib = ctx->vs->info.base.num_outputs - 1;
 swr_fs->pointSpriteMask |= (1 << linkedAttrib);
diff --git a/src/gallium/drivers/swr/swr_shader.h 
b/src/gallium/drivers/swr/swr_shader.h
index ccdda44..7e3399c 100644
--- a/src/gallium/drivers/swr/swr_shader.h
+++ b/src/gallium/drivers/swr/swr_shader.h
@@ -51,6 +51,7 @@ struct swr_jit_sampler_key {
struct swr_jit_fs_key : swr_jit_sampler_key {
   unsigned nr_cbufs;
   unsigned light_twoside;
+   unsigned sprite_coord_enable;
   ubyte vs_output_semantic_name[PIPE_MAX_SHADER_OUTPUTS];
   ubyte vs_output_semantic_idx[PIPE_MAX_SHADER_OUTPUTS];
};
--
2.7.3
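
A sketch of why state that affects code generation belongs in the shader key, assuming a simple variant cache keyed on the state. `FsKey`, `Variant`, and `VariantCache` are hypothetical reductions, not the swr structures. If `sprite_coord_enable` were missing from the key, two different rasterizer states would map to the same cached variant.

```cpp
#include <cassert>
#include <map>
#include <tuple>

// Hypothetical reduced key: every field that changes the generated code.
struct FsKey {
    unsigned nr_cbufs;
    unsigned light_twoside;
    unsigned sprite_coord_enable;
};

inline bool operator<(const FsKey& a, const FsKey& b)
{
    return std::tie(a.nr_cbufs, a.light_twoside, a.sprite_coord_enable) <
           std::tie(b.nr_cbufs, b.light_twoside, b.sprite_coord_enable);
}

// Stand-in for a compiled variant; records what it was compiled against.
struct Variant { unsigned pointSpriteMask; };

class VariantCache {
    std::map<FsKey, Variant> variants_;
    unsigned compiles_ = 0;
public:
    const Variant& get(const FsKey& key)
    {
        auto it = variants_.find(key);
        if (it == variants_.end()) {
            ++compiles_;  // "compile" reads only from the key, never from live state
            it = variants_.emplace(key, Variant{key.sprite_coord_enable}).first;
        }
        return it->second;
    }
    unsigned compiles() const { return compiles_; }
};
```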




Re: [Mesa-dev] [PATCH 3/5] swr: rework vert <-> frag shader linkage logic

2016-11-22 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley

On Nov 21, 2016, at 11:52 AM, Ilia Mirkin wrote:

Fixes a few things:
- sprite coords only apply to generic varyings, and are a bitmask
- back color only applies in 2-sided lighting mode
- handle odd situations where only some of the front/back colors are
  present. This is only semi-legal in GL, but we shouldn't start
  crashing.

Signed-off-by: Ilia Mirkin
---
src/gallium/drivers/swr/swr_shader.cpp | 93 ++
1 file changed, 50 insertions(+), 43 deletions(-)

diff --git a/src/gallium/drivers/swr/swr_shader.cpp 
b/src/gallium/drivers/swr/swr_shader.cpp
index 2f72239..d29f635 100644
--- a/src/gallium/drivers/swr/swr_shader.cpp
+++ b/src/gallium/drivers/swr/swr_shader.cpp
@@ -372,15 +372,6 @@ locate_linkage(ubyte name, ubyte index, struct 
tgsi_shader_info *info)
  }
   }

-   if (name == TGSI_SEMANTIC_COLOR) { // BCOLOR fallback
-  for (int i = 0; i < PIPE_MAX_SHADER_OUTPUTS; i++) {
- if ((info->output_semantic_name[i] == TGSI_SEMANTIC_BCOLOR)
- && (info->output_semantic_index[i] == index)) {
-return i - 1; // position is not part of the linkage
- }
-  }
-   }
-
   return 0x;
}

@@ -523,54 +514,70 @@ BuilderSWR::CompileFS(struct swr_context *ctx, 
swr_jit_fs_key &key)

  unsigned linkedAttrib =
     locate_linkage(semantic_name, semantic_idx, &ctx->vs->info.base);
-  if (linkedAttrib == 0x) {
- // not found - check for point sprite
- if (ctx->rasterizer->sprite_coord_enable) {
-linkedAttrib = ctx->vs->info.base.num_outputs - 1;
-swr_fs->pointSpriteMask |= (1 << linkedAttrib);
- } else {
-fprintf(stderr,
-"Missing %s[%d]\n",
-tgsi_semantic_names[semantic_name],
-semantic_idx);
-assert(0 && "attribute linkage not found");
+  if (semantic_name == TGSI_SEMANTIC_GENERIC &&
+  ctx->rasterizer->sprite_coord_enable & (1 << semantic_idx)) {
+ /* we add an extra attrib to the backendState in swr_update_derived. 
*/
+ linkedAttrib = ctx->vs->info.base.num_outputs - 1;
+ swr_fs->pointSpriteMask |= (1 << linkedAttrib);
+  } else if (linkedAttrib == 0x) {
+ inputs[attrib][0] = wrap(VIMMED1(0.0f));
+ inputs[attrib][1] = wrap(VIMMED1(0.0f));
+ inputs[attrib][2] = wrap(VIMMED1(0.0f));
+ inputs[attrib][3] = wrap(VIMMED1(1.0f));
+ /* If we're reading in color and 2-sided lighting is enabled, we have
+  * to keep going.
+  */
+ if (semantic_name != TGSI_SEMANTIC_COLOR || !key.light_twoside)
+continue;
+  } else {
+ if (interpMode == TGSI_INTERPOLATE_CONSTANT) {
+swr_fs->constantMask |= 1 << linkedAttrib;
+ } else if (interpMode == TGSI_INTERPOLATE_COLOR) {
+swr_fs->flatConstantMask |= 1 << linkedAttrib;
 }
  }

-  if (interpMode == TGSI_INTERPOLATE_CONSTANT) {
- swr_fs->constantMask |= 1 << linkedAttrib;
-  } else if (interpMode == TGSI_INTERPOLATE_COLOR) {
- swr_fs->flatConstantMask |= 1 << linkedAttrib;
-  }
-
-  for (int channel = 0; channel < TGSI_NUM_CHANNELS; channel++) {
- if (mask & (1 << channel)) {
-Value *indexA = C(linkedAttrib * 12 + channel);
-Value *indexB = C(linkedAttrib * 12 + channel + 4);
-Value *indexC = C(linkedAttrib * 12 + channel + 8);
+  unsigned bcolorAttrib = 0x;
+  Value *offset = NULL;
+  if (semantic_name == TGSI_SEMANTIC_COLOR && key.light_twoside) {
+ bcolorAttrib = locate_linkage(
+   TGSI_SEMANTIC_BCOLOR, semantic_idx, &ctx->vs->info.base);
+ /* Neither front nor back colors were available. Nothing to load. */
+ if (bcolorAttrib == 0x && linkedAttrib == 0x)
+continue;
+ /* If there is no front color, just always use the back color. */
+ if (linkedAttrib == 0x)
+linkedAttrib = bcolorAttrib;

-if ((semantic_name == TGSI_SEMANTIC_COLOR)
-&& ctx->rasterizer->light_twoside) {
-   unsigned bcolorAttrib = locate_linkage(
-  TGSI_SEMANTIC_BCOLOR, semantic_idx, &ctx->vs->info.base);
+ if (bcolorAttrib != 0x) {
+if (interpMode == TGSI_INTERPOLATE_CONSTANT) {
+   swr_fs->constantMask |= 1 << bcolorAttrib;
+} else if (interpMode == TGSI_INTERPOLATE_COLOR) {
+   swr_fs->flatConstantMask |= 1 << bcolorAttrib;
+}

-   unsigned diff = 12 * (bcolorAttrib - linkedAttrib);
+unsigned diff = 12 * (bcolorAttrib - linkedAttrib);

+if (diff) {
   Value *back =

Re: [Mesa-dev] [PATCH 2/5] swr: flatshading makes color outputs flat, it doesn't affect others

2016-11-22 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley

On Nov 21, 2016, at 11:52 AM, Ilia Mirkin wrote:

We were previously not marking the "regular" flat outputs as flat when
flatshading was enabled.

Signed-off-by: Ilia Mirkin
---
src/gallium/drivers/swr/swr_state.cpp | 6 ++
1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/swr/swr_state.cpp 
b/src/gallium/drivers/swr/swr_state.cpp
index dcbe434..8541aca 100644
--- a/src/gallium/drivers/swr/swr_state.cpp
+++ b/src/gallium/drivers/swr/swr_state.cpp
@@ -1490,10 +1490,8 @@ swr_update_derived(struct pipe_context *pipe,
  (ctx->rasterizer->sprite_coord_enable ? 1 : 0);
   for (unsigned i = 0; i < backendState.numAttributes; i++)
  backendState.numComponents[i] = 4;
-   backendState.constantInterpolationMask =
-  ctx->rasterizer->flatshade ?
-  ctx->fs->flatConstantMask :
-  ctx->fs->constantMask;
+   backendState.constantInterpolationMask = ctx->fs->constantMask |
+  (ctx->rasterizer->flatshade ? ctx->fs->flatConstantMask : 0);
   backendState.pointSpriteTexCoordMask = ctx->fs->pointSpriteMask;

    SwrSetBackendState(ctx->swrContext, &backendState);
--
2.7.3
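
The mask change in this patch can be checked in isolation. The two helpers below restate the old and new expressions from the diff as free functions; the function names are hypothetical, and the example bit values are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Old behaviour: flatshading replaced the constant mask outright.
uint32_t old_constant_interp_mask(uint32_t constantMask,
                                  uint32_t flatConstantMask, bool flatshade)
{
    return flatshade ? flatConstantMask : constantMask;
}

// New behaviour: explicitly constant inputs always stay constant, and
// flatshading adds the color inputs on top.
uint32_t new_constant_interp_mask(uint32_t constantMask,
                                  uint32_t flatConstantMask, bool flatshade)
{
    return constantMask | (flatshade ? flatConstantMask : 0u);
}
```

With attribute 2 declared constant (mask 0x4) and colors in attributes 0 and 1 (mask 0x3), enabling flatshade yields 0x3 with the old expression, dropping the constant attribute, and 0x7 with the new one.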




Re: [Mesa-dev] [PATCH 1/5] swr: only broadcast color0 value, not all color values

2016-11-22 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley

On Nov 21, 2016, at 11:52 AM, Ilia Mirkin wrote:

The way that dual-source blending is described for GLES2 is very odd,
and we end up with a shader that both has this property set *and* has a
color1 value to be used as the second source. While changing the state
tracker is an option, it seems more reliable to verify that the
broadcast is only done on color0.

Fixes arb_blend_func_extended-fbo-extended-blend-pattern_gles2

Signed-off-by: Ilia Mirkin
---
src/gallium/drivers/swr/swr_shader.cpp | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/swr/swr_shader.cpp 
b/src/gallium/drivers/swr/swr_shader.cpp
index e4f9796..2f72239 100644
--- a/src/gallium/drivers/swr/swr_shader.cpp
+++ b/src/gallium/drivers/swr/swr_shader.cpp
@@ -645,7 +645,8 @@ BuilderSWR::CompileFS(struct swr_context *ctx, 
swr_jit_fs_key &key)

LLVMValueRef out =
   LLVMBuildLoad(gallivm->builder, outputs[attrib][channel], "");
-if 
(swr_fs->info.base.properties[TGSI_PROPERTY_FS_COLOR0_WRITES_ALL_CBUFS]) {
+if 
(swr_fs->info.base.properties[TGSI_PROPERTY_FS_COLOR0_WRITES_ALL_CBUFS] &&
+swr_fs->info.base.output_semantic_index[attrib] == 0) {
   for (uint32_t rt = 0; rt < key.nr_cbufs; rt++) {
  STORE(unwrap(out),
pPS,
--
2.7.3
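
A small sketch of the broadcast rule after this patch, assuming that a color output which is not broadcast is written to the render target matching its semantic index (that fallback is not visible in the hunk above). `targets_for_output` is a hypothetical helper used only to make the condition testable.

```cpp
#include <cassert>
#include <vector>

// Which render targets does a fragment color output get written to?
std::vector<unsigned> targets_for_output(bool color0WritesAllCbufs,
                                         unsigned semanticIndex,
                                         unsigned nrCbufs)
{
    std::vector<unsigned> rts;
    if (color0WritesAllCbufs && semanticIndex == 0) {
        // Only color0 is broadcast to every bound color buffer.
        for (unsigned rt = 0; rt < nrCbufs; ++rt)
            rts.push_back(rt);
    } else {
        // Assumed fallback: color1 (e.g. the dual-source value) keeps its own slot.
        rts.push_back(semanticIndex);
    }
    return rts;
}
```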




Re: [Mesa-dev] [PATCH] swr: don't claim to allow setting layer/viewport from VS

2016-11-21 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley

On Nov 20, 2016, at 12:20 PM, Ilia Mirkin wrote:

This may ultimately be possible to support, but for now it's not hooked
up and the swr core only supports this output from GS.

This normally wouldn't matter, but we lie about supporting GL 3.2, and
also the blitter and st/mesa will make use of this functionality if
claimed.

Signed-off-by: Ilia Mirkin
---
src/gallium/drivers/swr/swr_screen.cpp | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/swr/swr_screen.cpp 
b/src/gallium/drivers/swr/swr_screen.cpp
index 9affa02..bbecee5 100644
--- a/src/gallium/drivers/swr/swr_screen.cpp
+++ b/src/gallium/drivers/swr/swr_screen.cpp
@@ -252,10 +252,10 @@ swr_get_param(struct pipe_screen *screen, enum pipe_cap 
param)
   case PIPE_CAP_USER_CONSTANT_BUFFERS:
   case PIPE_CAP_STREAM_OUTPUT_PAUSE_RESUME:
   case PIPE_CAP_STREAM_OUTPUT_INTERLEAVE_BUFFERS:
-   case PIPE_CAP_TGSI_VS_LAYER_VIEWPORT:
  return 1;
   case PIPE_CAP_CONSTANT_BUFFER_OFFSET_ALIGNMENT:
  return 16;
+   case PIPE_CAP_TGSI_VS_LAYER_VIEWPORT:
   case PIPE_CAP_TGSI_CAN_COMPACT_CONSTANTS:
   case PIPE_CAP_VERTEX_BUFFER_OFFSET_4BYTE_ALIGNED_ONLY:
   case PIPE_CAP_VERTEX_BUFFER_STRIDE_4BYTE_ALIGNED_ONLY:
--
2.7.3




Re: [Mesa-dev] [PATCH v3] swr: calculate viewport width/height based on the scale

2016-11-21 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley

On Nov 20, 2016, at 10:32 AM, Ilia Mirkin wrote:

The former calculations were for min/max y. The width/height don't take
translate into account.

Signed-off-by: Ilia Mirkin
---

v2 -> v3:
- reduce viewport width when clamping the x/y offsets to 0
- subtract vp->y from height, not vp->x

Let's hope I don't need to write a v4 of this trivial patch.

src/gallium/drivers/swr/swr_state.cpp | 18 --
1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/swr/swr_state.cpp 
b/src/gallium/drivers/swr/swr_state.cpp
index 520faea..0302439 100644
--- a/src/gallium/drivers/swr/swr_state.cpp
+++ b/src/gallium/drivers/swr/swr_state.cpp
@@ -1018,9 +1018,9 @@ swr_update_derived(struct pipe_context *pipe,
  SWR_VIEWPORT_MATRICES *vpm = &ctx->derived.vpm;

  vp->x = state->translate[0] - state->scale[0];
-  vp->width = state->translate[0] + state->scale[0];
+  vp->width = 2 * state->scale[0];
  vp->y = state->translate[1] - fabs(state->scale[1]);
-  vp->height = state->translate[1] + fabs(state->scale[1]);
+  vp->height = 2 * fabs(state->scale[1]);
  util_viewport_zmin_zmax(state, rasterizer->clip_halfz,
  &vp->minZ, &vp->maxZ);

@@ -1033,10 +1033,16 @@ swr_update_derived(struct pipe_context *pipe,

  /* Now that the matrix is calculated, clip the view coords to screen
   * size.  OpenGL allows for -ve x,y in the viewport. */
-  vp->x = std::max(vp->x, 0.0f);
-  vp->y = std::max(vp->y, 0.0f);
-  vp->width = std::min(vp->width, (float)fb->width);
-  vp->height = std::min(vp->height, (float)fb->height);
+  if (vp->x < 0.0f) {
+ vp->width += vp->x;
+ vp->x = 0.0f;
+  }
+  if (vp->y < 0.0f) {
+ vp->height += vp->y;
+ vp->y = 0.0f;
+  }
+  vp->width = std::min(vp->width, (float)fb->width - vp->x);
+  vp->height = std::min(vp->height, (float)fb->height - vp->y);

  SwrSetViewports(ctx->swrContext, 1, vp, vpm);
   }
--
2.7.3




Re: [Mesa-dev] [PATCH] swr: report a reasonable max lod bias

2016-11-21 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley

On Nov 19, 2016, at 10:11 AM, Ilia Mirkin wrote:

This is the same value that llvmpipe uses. Since swr uses the same
sampler logic, makes sense for this value to also be the same. Most
applications don't care.

Signed-off-by: Ilia Mirkin
---

I kind of assume this is dependent on my layout patches since LODs weren't
always properly handled before.

src/gallium/drivers/swr/swr_screen.cpp | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/swr/swr_screen.cpp 
b/src/gallium/drivers/swr/swr_screen.cpp
index 36afcc3..9affa02 100644
--- a/src/gallium/drivers/swr/swr_screen.cpp
+++ b/src/gallium/drivers/swr/swr_screen.cpp
@@ -406,7 +406,7 @@ swr_get_paramf(struct pipe_screen *screen, enum pipe_capf 
param)
   case PIPE_CAPF_MAX_TEXTURE_ANISOTROPY:
  return 0.0;
   case PIPE_CAPF_MAX_TEXTURE_LOD_BIAS:
-  return 0.0;
+  return 16.0; /* arbitrary */
   case PIPE_CAPF_GUARD_BAND_LEFT:
   case PIPE_CAPF_GUARD_BAND_TOP:
   case PIPE_CAPF_GUARD_BAND_RIGHT:
--
2.7.3




Re: [Mesa-dev] [PATCH v2 1/6] swr: [rasterizer memory] minify original sizes for block formats

2016-11-21 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley

On Nov 17, 2016, at 10:56 PM, Ilia Mirkin wrote:

There's no guarantee that mip width/height will be a multiple of the
compressed block size. Doing a divide by the block size first yields
different results than GL expects, so we do the divide at the end.

Signed-off-by: Ilia Mirkin
---
.../swr/rasterizer/memory/TilingFunctions.h| 36 +++---
1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/memory/TilingFunctions.h 
b/src/gallium/drivers/swr/rasterizer/memory/TilingFunctions.h
index 0694a99..11ed451 100644
--- a/src/gallium/drivers/swr/rasterizer/memory/TilingFunctions.h
+++ b/src/gallium/drivers/swr/rasterizer/memory/TilingFunctions.h
@@ -274,9 +274,12 @@ INLINE void ComputeLODOffset1D(
else
{
uint32_t curWidth = baseWidth;
-// translate mip width from pixels to blocks for block compressed 
formats
-// @note hAlign is already in blocks for compressed formats so no need 
to convert
-if (info.isBC) curWidth /= info.bcWidth;
+// @note hAlign is already in blocks for compressed formats so 
upconvert
+//   so that we have the desired alignment post-divide.
+if (info.isBC)
+{
+hAlign *= info.bcWidth;
+}

offset = GFX_ALIGN(curWidth, hAlign);
for (uint32_t l = 1; l < lod; ++l)
@@ -285,7 +288,7 @@ INLINE void ComputeLODOffset1D(
offset += curWidth;
}

-if (info.isSubsampled)
+if (info.isSubsampled || info.isBC)
{
offset /= info.bcWidth;
}
@@ -312,14 +315,17 @@ INLINE void ComputeLODOffsetX(
else
{
uint32_t curWidth = baseWidth;
-// convert mip width from pixels to blocks for block compressed formats
-// @note hAlign is already in blocks for compressed formats so no need 
to convert
-if (info.isBC) curWidth /= info.bcWidth;
+// @note hAlign is already in blocks for compressed formats so 
upconvert
+//   so that we have the desired alignment post-divide.
+if (info.isBC)
+{
+hAlign *= info.bcWidth;
+}

curWidth = std::max(curWidth >> 1, 1U);
curWidth = GFX_ALIGN(curWidth, hAlign);

-if (info.isSubsampled)
+if (info.isSubsampled || info.isBC)
{
curWidth /= info.bcWidth;
}
@@ -350,9 +356,12 @@ INLINE void ComputeLODOffsetY(
offset = 0;
uint32_t mipHeight = baseHeight;

-// translate mip height from pixels to blocks for block compressed 
formats
-// @note VAlign is already in blocks for compressed formats so no need 
to convert
-if (info.isBC) mipHeight /= info.bcHeight;
+// @note vAlign is already in blocks for compressed formats so 
upconvert
+//   so that we have the desired alignment post-divide.
+if (info.isBC)
+{
+vAlign *= info.bcHeight;
+}

for (uint32_t l = 1; l <= lod; ++l)
{
@@ -360,6 +369,11 @@ INLINE void ComputeLODOffsetY(
offset += ((l != 2) ? alignedMipHeight : 0);
mipHeight = std::max(mipHeight >> 1, 1U);
}
+
+if (info.isBC)
+{
+offset /= info.bcHeight;
+}
}
}

--
2.7.3



