Re: [Mesa-dev] [PATCH v2] swr: move msaa resolve to generalized StoreTile

2017-04-28 Thread Cherniak, Bruce

> On Apr 28, 2017, at 3:20 PM, Ilia Mirkin  wrote:
> 
> On Fri, Apr 28, 2017 at 3:58 PM, Cherniak, Bruce
>  wrote:
>> 
>>> On Apr 27, 2017, at 7:50 PM, Ilia Mirkin  wrote:
>>> 
>>> On Thu, Apr 27, 2017 at 8:45 PM, Cherniak, Bruce
>>>  wrote:
 
>>>>> On Apr 27, 2017, at 7:38 PM, Ilia Mirkin wrote:
>>>>> 
>>>>> Erm, so ... what happens if I render to FB1, then render to FB2, then
>>>>> render to FB1 again (and I have blending enabled)? Doesn't the resolve
>>>>> lose the per-sample information? Or does the resolve merely precompute
>>>>> the resolved version on the off chance that it's needed, without
>>>>> losing the source data?
>>>> 
>>>> The resolve occurs into a secondary, driver private, surface.  All per-sample
>>>> information is maintained in the original surfaces.
>>>> 
>>>> Yes, the resolve is currently done "on the off chance that it’s needed".
>>>> There is likely an optimization to be had there, but it should be functionally
>>>> correct.
>>> 
>>> Got it. May I ask why this isn't done on-demand instead? Is it a pain
>>> to plug into swr's execution engine? I'm just concerned that
>>> StoreTile() may get called a lot, more than even there are draws, as
>>> tiles are swapped in and out of "hotness", and I wouldn't be surprised
>>> if resolves were needed only a fraction of the time.
>>> 
>>> Cheers,
>>> 
>>> -ilia
>> 
>> 
>> Good observation.  I haven’t yet seen this to be the case in the scientific
>> visualization applications I’ve been running. But, I can envision where that
>> becomes a performance concern.
>> 
>> Do you mean a blit based “state_tracker initiated” on-demand resolve (via
>> pipe_blit)?  If so, here are my thoughts:
> 
> Yes. The resolve is always initiated via a blit() call anyways (with a
> dst surface with nr_samples == 0).
> 
>> 1) The software winsys and state trackers don't support multisample surfaces
>>   for software renderers, nor will/should they (except for swr).  So, I
>>   thought keeping most of the changes local to our driver would be most
>>   desirable and safest, as far as swrast and llvmpipe are concerned.  Not
>>   sure about wgl yet, but I don't see it.
>> 
>> 2) A blit based resolve causes a pipeline reconfiguration (save/restore around
>>   the blit) that is inherently less efficient than simply
>>   storing-out/resolving HotTiles.
>> 
>> 3) A blit based resolve needs to sample from the multisample surface using a
>>   texture sampler with 2DMS/3DMS support.  We’re currently using llvmpipe's
>>   sampler which doesn't need this support.  I’m looking into extending it, as
>>   I know we need the functionality for compliance; it’s just not there yet.
>> 
>> I may be off-base on any of these thoughts.  If so, please correct me.
>> 
>> We’ll probably move to a “driver internal” on-demand resolve, implemented
>> similar to StoreTiles.  It's a simple matter to only resolve for the times we
>> know it's needed and the multisample surface is in HotTiles.  But, I need to
>> work out the LoadTiles case for surfaces that aren’t currently in HotTiles.
>> Tricky, since we're checking the resolve status of the secondary (resolved)
>> surface and the HotTile state of the multisample surface.
>> 
>> Thanks for the feedback.  Getting this completely correct and optimized is
>> going to be iterative.  This current patch, while maybe not optimal, helps
>> with functionality.  So, I think it's a step in the right direction.
> 
> I hope you realize I wasn't looking to derail your attempts at
> progress, more like providing some things to think about on your march
> towards perfection :) MS textures/fbo's are definitely a thing,
> probably more so than MS winsys surfaces these days. At least for
> games, maybe not visualization software, with which I have next to no
> experience. Try it with e.g. Unigine Heaven or Valley (with MSAA
> enabled). I'm fairly sure that at least Heaven uses MSAA textures.

We always value and appreciate your input.  And, while we’re primarily
focused on sci vis software, we’d like to be as compliant as possible, which
means running a wide variety of applications… even *gasp* games.

I’ll definitely give Unigine a try.  Not expecting great performance, but
we can at least strive for correct functionality.

> I believe most hardware uses MSAA compression, based on the
> observation that it's pretty common for all samples in a pixel to have
> the same color, or bg color + fg color + coverage mask. TBH I'm not
> sure how it all works. Something for the future when you get all the
> basics right.

“march towards perfection” :)

> Some hardware has built-in resolve functionality (e.g. Adreno, maybe
> other tilers as well) for moving a MS FBO out of a "hot tile", while
> most hardware requires the pipeline reconfiguration + blit. Perhaps
> it'd make sense to add a special FE command for computing the resolved
> version 

Re: [Mesa-dev] [PATCH v2] swr: move msaa resolve to generalized StoreTile

2017-04-28 Thread Ilia Mirkin
On Fri, Apr 28, 2017 at 3:58 PM, Cherniak, Bruce
 wrote:
>
>> On Apr 27, 2017, at 7:50 PM, Ilia Mirkin  wrote:
>>
>> On Thu, Apr 27, 2017 at 8:45 PM, Cherniak, Bruce
>>  wrote:
>>>
>>>> On Apr 27, 2017, at 7:38 PM, Ilia Mirkin wrote:
>>>>
>>>> Erm, so ... what happens if I render to FB1, then render to FB2, then
>>>> render to FB1 again (and I have blending enabled)? Doesn't the resolve
>>>> lose the per-sample information? Or does the resolve merely precompute
>>>> the resolved version on the off chance that it's needed, without
>>>> losing the source data?
>>>
>>> The resolve occurs into a secondary, driver private, surface.  All per-sample
>>> information is maintained in the original surfaces.
>>>
>>> Yes, the resolve is currently done "on the off chance that it’s needed".
>>> There is likely an optimization to be had there, but it should be functionally
>>> correct.
>>
>> Got it. May I ask why this isn't done on-demand instead? Is it a pain
>> to plug into swr's execution engine? I'm just concerned that
>> StoreTile() may get called a lot, more than even there are draws, as
>> tiles are swapped in and out of "hotness", and I wouldn't be surprised
>> if resolves were needed only a fraction of the time.
>>
>> Cheers,
>>
>>  -ilia
>
>
> Good observation.  I haven’t yet seen this to be the case in the scientific
> visualization applications I’ve been running. But, I can envision where that
> becomes a performance concern.
>
> Do you mean a blit based “state_tracker initiated” on-demand resolve (via
> pipe_blit)?  If so, here are my thoughts:

Yes. The resolve is always initiated via a blit() call anyways (with a
dst surface with nr_samples == 0).

> 1) The software winsys and state trackers don't support multisample surfaces
>    for software renderers, nor will/should they (except for swr).  So, I
>    thought keeping most of the changes local to our driver would be most
>    desirable and safest, as far as swrast and llvmpipe are concerned.  Not
>    sure about wgl yet, but I don't see it.
>
> 2) A blit based resolve causes a pipeline reconfiguration (save/restore around
>    the blit) that is inherently less efficient than simply
>    storing-out/resolving HotTiles.
>
> 3) A blit based resolve needs to sample from the multisample surface using a
>    texture sampler with 2DMS/3DMS support.  We’re currently using llvmpipe's
>    sampler which doesn't need this support.  I’m looking into extending it, as
>    I know we need the functionality for compliance; it’s just not there yet.
>
> I may be off-base on any of these thoughts.  If so, please correct me.
>
> We’ll probably move to a “driver internal” on-demand resolve, implemented
> similar to StoreTiles.  It's a simple matter to only resolve for the times we
> know it's needed and the multisample surface is in HotTiles.  But, I need to
> work out the LoadTiles case for surfaces that aren’t currently in HotTiles.
> Tricky, since we're checking the resolve status of the secondary (resolved)
> surface and the HotTile state of the multisample surface.
>
> Thanks for the feedback.  Getting this completely correct and optimized is
> going to be iterative.  This current patch, while maybe not optimal, helps
> with functionality.  So, I think it's a step in the right direction.

I hope you realize I wasn't looking to derail your attempts at
progress, more like providing some things to think about on your march
towards perfection :) MS textures/fbo's are definitely a thing,
probably more so than MS winsys surfaces these days. At least for
games, maybe not visualization software, with which I have next to no
experience. Try it with e.g. Unigine Heaven or Valley (with MSAA
enabled). I'm fairly sure that at least Heaven uses MSAA textures.

I believe most hardware uses MSAA compression, based on the
observation that it's pretty common for all samples in a pixel to have
the same color, or bg color + fg color + coverage mask. TBH I'm not
sure how it all works. Something for the future when you get all the
basics right.

Some hardware has built-in resolve functionality (e.g. Adreno, maybe
other tilers as well) for moving a MS FBO out of a "hot tile", while
most hardware requires the pipeline reconfiguration + blit. Perhaps
it'd make sense to add a special FE command for computing the resolved
version of all the tiles, and have that state get dirtied when you
render. There are also extensions like
GL_EXT_multisampled_render_to_texture which support the
"insta-resolve" use-case more directly. However they're not
implemented in mesa AFAIK.
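
A minimal sketch of the "resolved state gets dirtied when you render" idea, assuming per-tile bookkeeping. `LazyResolveTracker`, `MarkRendered` and `ResolveIfNeeded` are hypothetical names for illustration, not part of swr; the real HotTile state machine would need to be wired in where the callback is.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-surface bookkeeping; none of these names come from swr.
class LazyResolveTracker {
public:
    // Every tile starts stale: nothing has been resolved yet.
    explicit LazyResolveTracker(uint32_t numTiles)
        : mDirty(numTiles, true) {}

    // Called whenever a draw touches a tile: its resolved copy is now stale.
    void MarkRendered(uint32_t tile) { mDirty.at(tile) = true; }

    // Called when the resolved surface is actually consumed (e.g. by a blit).
    // Invokes resolveTile(tile) only for stale tiles and returns how many ran.
    template <typename ResolveFn>
    uint32_t ResolveIfNeeded(ResolveFn resolveTile)
    {
        uint32_t resolvedCount = 0;
        for (uint32_t tile = 0; tile < mDirty.size(); ++tile) {
            if (!mDirty[tile])
                continue;
            resolveTile(tile);
            mDirty[tile] = false;
            ++resolvedCount;
        }
        return resolvedCount;
    }

private:
    std::vector<bool> mDirty;
};
```

With this shape, tiles that are swapped in and out of "hotness" without the resolved surface ever being read cost nothing beyond setting a flag.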

Cheers,

  -ilia
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2] swr: move msaa resolve to generalized StoreTile

2017-04-28 Thread Cherniak, Bruce

> On Apr 27, 2017, at 7:50 PM, Ilia Mirkin  wrote:
> 
> On Thu, Apr 27, 2017 at 8:45 PM, Cherniak, Bruce
>  wrote:
>> 
>>> On Apr 27, 2017, at 7:38 PM, Ilia Mirkin  wrote:
>>> 
>>> Erm, so ... what happens if I render to FB1, then render to FB2, then
>>> render to FB1 again (and I have blending enabled)? Doesn't the resolve
>>> lose the per-sample information? Or does the resolve merely precompute
>>> the resolved version on the off chance that it's needed, without
>>> losing the source data?
>> 
>> The resolve occurs into a secondary, driver private, surface.  All per-sample
>> information is maintained in the original surfaces.
>> 
>> Yes, the resolve is currently done "on the off chance that it’s needed".
>> There is likely an optimization to be had there, but it should be functionally
>> correct.
> 
> Got it. May I ask why this isn't done on-demand instead? Is it a pain
> to plug into swr's execution engine? I'm just concerned that
> StoreTile() may get called a lot, more than even there are draws, as
> tiles are swapped in and out of "hotness", and I wouldn't be surprised
> if resolves were needed only a fraction of the time.
> 
> Cheers,
> 
>  -ilia


Good observation.  I haven’t yet seen this to be the case in the scientific
visualization applications I’ve been running. But, I can envision where that
becomes a performance concern.

Do you mean a blit based “state_tracker initiated” on-demand resolve (via
pipe_blit)?  If so, here are my thoughts:

1) The software winsys and state trackers don't support multisample surfaces
   for software renderers, nor will/should they (except for swr).  So, I
   thought keeping most of the changes local to our driver would be most
   desirable and safest, as far as swrast and llvmpipe are concerned.  Not
   sure about wgl yet, but I don't see it.

2) A blit based resolve causes a pipeline reconfiguration (save/restore around
   the blit) that is inherently less efficient than simply
   storing-out/resolving HotTiles.

3) A blit based resolve needs to sample from the multisample surface using a
   texture sampler with 2DMS/3DMS support.  We’re currently using llvmpipe's
   sampler which doesn't need this support.  I’m looking into extending it, as
   I know we need the functionality for compliance; it’s just not there yet.

I may be off-base on any of these thoughts.  If so, please correct me.

We’ll probably move to a “driver internal” on-demand resolve, implemented
similar to StoreTiles.  It's a simple matter to only resolve for the times we
know it's needed and the multisample surface is in HotTiles.  But, I need to
work out the LoadTiles case for surfaces that aren’t currently in HotTiles.
Tricky, since we're checking the resolve status of the secondary (resolved)
surface and the HotTile state of the multisample surface.

Thanks for the feedback.  Getting this completely correct and optimized is
going to be iterative.  This current patch, while maybe not optimal, helps
with functionality.  So, I think it's a step in the right direction.

Thanks,

Bruce



Re: [Mesa-dev] [PATCH v2] swr: move msaa resolve to generalized StoreTile

2017-04-27 Thread Ilia Mirkin
On Thu, Apr 27, 2017 at 8:45 PM, Cherniak, Bruce
 wrote:
>
>> On Apr 27, 2017, at 7:38 PM, Ilia Mirkin  wrote:
>>
>> Erm, so ... what happens if I render to FB1, then render to FB2, then
>> render to FB1 again (and I have blending enabled)? Doesn't the resolve
>> lose the per-sample information? Or does the resolve merely precompute
>> the resolved version on the off chance that it's needed, without
>> losing the source data?
>
> The resolve occurs into a secondary, driver private, surface.  All per-sample
> information is maintained in the original surfaces.
>
> Yes, the resolve is currently done "on the off chance that it’s needed".
> There is likely an optimization to be had there, but it should be functionally
> correct.

Got it. May I ask why this isn't done on-demand instead? Is it a pain
to plug into swr's execution engine? I'm just concerned that
StoreTile() may get called a lot, more than even there are draws, as
tiles are swapped in and out of "hotness", and I wouldn't be surprised
if resolves were needed only a fraction of the time.

Cheers,

  -ilia


Re: [Mesa-dev] [PATCH v2] swr: move msaa resolve to generalized StoreTile

2017-04-27 Thread Cherniak, Bruce

> On Apr 27, 2017, at 7:38 PM, Ilia Mirkin  wrote:
> 
> Erm, so ... what happens if I render to FB1, then render to FB2, then
> render to FB1 again (and I have blending enabled)? Doesn't the resolve
> lose the per-sample information? Or does the resolve merely precompute
> the resolved version on the off chance that it's needed, without
> losing the source data?

The resolve occurs into a secondary, driver private, surface.  All per-sample
information is maintained in the original surfaces.

Yes, the resolve is currently done "on the off chance that it’s needed".
There is likely an optimization to be had there, but it should be functionally
correct.
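
To illustrate why the per-sample data survives, here is a small self-contained sketch of a box-filter resolve into a separate buffer. `MsaaSurface` and `ResolveToSecondary` are hypothetical names, and the planar one-plane-per-sample layout is an assumption for illustration; it is not the driver's actual surface layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a multisampled color buffer: one plane per
// sample index, each plane holding width * height RGBA float pixels.
struct MsaaSurface {
    uint32_t width, height, numSamples;
    std::vector<float> data; // numSamples * width * height * 4 floats
};

// Box-filter resolve: average all samples of each pixel into a new buffer.
// The source is taken by const reference, so per-sample data is untouched.
std::vector<float> ResolveToSecondary(const MsaaSurface& src)
{
    assert(src.numSamples > 0);
    const size_t planeSize = size_t(src.width) * src.height * 4;
    assert(src.data.size() == planeSize * src.numSamples);

    // Sum across samples.
    std::vector<float> resolved(planeSize, 0.0f);
    for (uint32_t s = 0; s < src.numSamples; ++s)
        for (size_t i = 0; i < planeSize; ++i)
            resolved[i] += src.data[s * planeSize + i];

    // Divide by numSamples to average.
    const float oneOverNumSamples = 1.0f / float(src.numSamples);
    for (float& c : resolved)
        c *= oneOverNumSamples;
    return resolved;
}
```

Rendering to FB1, then FB2, then FB1 again with blending keeps working in this model, because blending reads the multisampled source and never the averaged copy.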

> On Thu, Apr 27, 2017 at 8:22 PM, Bruce Cherniak
>  wrote:
>> v2: Reword commit message to more closely adhere to community
>> guidelines.
>> 
>> This patch moves msaa resolve down into core/StoreTiles where the
>> surface format conversion routines are available.  The previous
>> "experimental" resolve was limited to 8-bit unsigned render targets.
>> 
>> This fixes a number of piglit msaa tests by adding resolve support for
>> all the render target formats we support.
>> 
>> MSAA is still disabled by default, but can be enabled with
>> "export SWR_MSAA_MAX_COUNT=4" (1,2,4,8,16 are options)
>> The default is 0, which is disabled.
>> 
>> Because it fixes a number of piglit tests, I kindly request inclusion
>> into 17.1 stable.
>> 
>> cc: mesa-sta...@lists.freedesktop.org
>> ---
>> .../drivers/swr/rasterizer/memory/StoreTile.h  | 75 +
>> src/gallium/drivers/swr/swr_context.cpp| 77 
>> +-
>> src/gallium/drivers/swr/swr_screen.cpp | 10 +--
>> 3 files changed, 82 insertions(+), 80 deletions(-)
>> 
>> diff --git a/src/gallium/drivers/swr/rasterizer/memory/StoreTile.h 
>> b/src/gallium/drivers/swr/rasterizer/memory/StoreTile.h
>> index ffde574c03..12a5f3d8ce 100644
>> --- a/src/gallium/drivers/swr/rasterizer/memory/StoreTile.h
>> +++ b/src/gallium/drivers/swr/rasterizer/memory/StoreTile.h
>> @@ -1133,6 +1133,64 @@ struct StoreRasterTile
>> }
>> }
>> }
>> +
>> +
>> //
>> +/// @brief Resolves an 8x8 raster tile to the resolve destination 
>> surface.
>> +/// @param pSrc - Pointer to raster tile.
>> +/// @param pDstSurface - Destination surface state
>> +/// @param x, y - Coordinates to raster tile.
>> +/// @param sampleOffset - Offset between adjacent multisamples
>> +INLINE static void Resolve(
>> +uint8_t *pSrc,
>> +SWR_SURFACE_STATE* pDstSurface,
>> +uint32_t x, uint32_t y, uint32_t sampleOffset, uint32_t 
>> renderTargetArrayIndex) // (x, y) pixel coordinate to start of raster tile.
>> +{
>> +uint32_t lodWidth = std::max(pDstSurface->width >> 
>> pDstSurface->lod, 1U);
>> +uint32_t lodHeight = std::max(pDstSurface->height >> 
>> pDstSurface->lod, 1U);
>> +
>> +float oneOverNumSamples = 1.0f / pDstSurface->numSamples;
>> +
>> +// For each raster tile pixel (rx, ry)
>> +for (uint32_t ry = 0; ry < KNOB_TILE_Y_DIM; ++ry)
>> +{
>> +for (uint32_t rx = 0; rx < KNOB_TILE_X_DIM; ++rx)
>> +{
>> +// Perform bounds checking.
>> +if (((x + rx) < lodWidth) &&
>> +((y + ry) < lodHeight))
>> +{
>> +// Sum across samples
>> +float resolveColor[4] = {0};
>> +for (uint32_t sampleNum = 0; sampleNum < 
>> pDstSurface->numSamples; sampleNum++)
>> +{
>> +float sampleColor[4] = {0};
>> +uint8_t *pSampleSrc = pSrc + sampleOffset * 
>> sampleNum;
>> +GetSwizzledSrcColor(pSampleSrc, rx, ry, 
>> sampleColor);
>> +resolveColor[0] += sampleColor[0];
>> +resolveColor[1] += sampleColor[1];
>> +resolveColor[2] += sampleColor[2];
>> +resolveColor[3] += sampleColor[3];
>> +}
>> +
>> +// Divide by numSamples to average
>> +resolveColor[0] *= oneOverNumSamples;
>> +resolveColor[1] *= oneOverNumSamples;
>> +resolveColor[2] *= oneOverNumSamples;
>> +resolveColor[3] *= oneOverNumSamples;
>> +
>> +// Use the resolve surface state
>> +SWR_SURFACE_STATE* pResolveSurface = 
>> (SWR_SURFACE_STATE*)pDstSurface->pAuxBaseAddress;
>> +uint8_t *pDst = (uint8_t*)ComputeSurfaceAddress> false>((x + rx), (y + ry),
>> +pResolveSurface->arrayIndex + 
>> renderTargetArrayIndex, pResolveSurface->arrayIndex + renderTargetArrayIndex,
>> +0, 

Re: [Mesa-dev] [PATCH v2] swr: move msaa resolve to generalized StoreTile

2017-04-27 Thread Ilia Mirkin
Erm, so ... what happens if I render to FB1, then render to FB2, then
render to FB1 again (and I have blending enabled)? Doesn't the resolve
lose the per-sample information? Or does the resolve merely precompute
the resolved version on the off chance that it's needed, without
losing the source data?

On Thu, Apr 27, 2017 at 8:22 PM, Bruce Cherniak
 wrote:
> v2: Reword commit message to more closely adhere to community
> guidelines.
>
> This patch moves msaa resolve down into core/StoreTiles where the
> surface format conversion routines are available.  The previous
> "experimental" resolve was limited to 8-bit unsigned render targets.
>
> This fixes a number of piglit msaa tests by adding resolve support for
> all the render target formats we support.
>
> MSAA is still disabled by default, but can be enabled with
> "export SWR_MSAA_MAX_COUNT=4" (1,2,4,8,16 are options)
> The default is 0, which is disabled.
>
> Because it fixes a number of piglit tests, I kindly request inclusion
> into 17.1 stable.
>
> cc: mesa-sta...@lists.freedesktop.org
> ---
>  .../drivers/swr/rasterizer/memory/StoreTile.h  | 75 +
>  src/gallium/drivers/swr/swr_context.cpp| 77 
> +-
>  src/gallium/drivers/swr/swr_screen.cpp | 10 +--
>  3 files changed, 82 insertions(+), 80 deletions(-)
>
> diff --git a/src/gallium/drivers/swr/rasterizer/memory/StoreTile.h 
> b/src/gallium/drivers/swr/rasterizer/memory/StoreTile.h
> index ffde574c03..12a5f3d8ce 100644
> --- a/src/gallium/drivers/swr/rasterizer/memory/StoreTile.h
> +++ b/src/gallium/drivers/swr/rasterizer/memory/StoreTile.h
> @@ -1133,6 +1133,64 @@ struct StoreRasterTile
>  }
>  }
>  }
> +
> +
> //
> +/// @brief Resolves an 8x8 raster tile to the resolve destination 
> surface.
> +/// @param pSrc - Pointer to raster tile.
> +/// @param pDstSurface - Destination surface state
> +/// @param x, y - Coordinates to raster tile.
> +/// @param sampleOffset - Offset between adjacent multisamples
> +INLINE static void Resolve(
> +uint8_t *pSrc,
> +SWR_SURFACE_STATE* pDstSurface,
> +uint32_t x, uint32_t y, uint32_t sampleOffset, uint32_t 
> renderTargetArrayIndex) // (x, y) pixel coordinate to start of raster tile.
> +{
> +uint32_t lodWidth = std::max(pDstSurface->width >> pDstSurface->lod, 
> 1U);
> +uint32_t lodHeight = std::max(pDstSurface->height >> 
> pDstSurface->lod, 1U);
> +
> +float oneOverNumSamples = 1.0f / pDstSurface->numSamples;
> +
> +// For each raster tile pixel (rx, ry)
> +for (uint32_t ry = 0; ry < KNOB_TILE_Y_DIM; ++ry)
> +{
> +for (uint32_t rx = 0; rx < KNOB_TILE_X_DIM; ++rx)
> +{
> +// Perform bounds checking.
> +if (((x + rx) < lodWidth) &&
> +((y + ry) < lodHeight))
> +{
> +// Sum across samples
> +float resolveColor[4] = {0};
> +for (uint32_t sampleNum = 0; sampleNum < 
> pDstSurface->numSamples; sampleNum++)
> +{
> +float sampleColor[4] = {0};
> +uint8_t *pSampleSrc = pSrc + sampleOffset * 
> sampleNum;
> +GetSwizzledSrcColor(pSampleSrc, rx, ry, sampleColor);
> +resolveColor[0] += sampleColor[0];
> +resolveColor[1] += sampleColor[1];
> +resolveColor[2] += sampleColor[2];
> +resolveColor[3] += sampleColor[3];
> +}
> +
> +// Divide by numSamples to average
> +resolveColor[0] *= oneOverNumSamples;
> +resolveColor[1] *= oneOverNumSamples;
> +resolveColor[2] *= oneOverNumSamples;
> +resolveColor[3] *= oneOverNumSamples;
> +
> +// Use the resolve surface state
> +SWR_SURFACE_STATE* pResolveSurface = 
> (SWR_SURFACE_STATE*)pDstSurface->pAuxBaseAddress;
> +uint8_t *pDst = (uint8_t*)ComputeSurfaceAddress false>((x + rx), (y + ry),
> +pResolveSurface->arrayIndex + 
> renderTargetArrayIndex, pResolveSurface->arrayIndex + renderTargetArrayIndex,
> +0, pResolveSurface->lod, pResolveSurface);
> +{
> +ConvertPixelFromFloat(pDst, resolveColor);
> +}
> +}
> +}
> +}
> +}
> +
>  };
>
>  template
> @@ -2316,6 +2374,9 @@ struct StoreMacroTile
>  pfnStore[sampleNum] = (bForceGeneric || 
> KNOB_USE_GENERIC_STORETILE) ? StoreRasterTile DstFormat>::Store : OptStoreRasterTile

[Mesa-dev] [PATCH v2] swr: move msaa resolve to generalized StoreTile

2017-04-27 Thread Bruce Cherniak
v2: Reword commit message to more closely adhere to community
guidelines.

This patch moves msaa resolve down into core/StoreTiles where the
surface format conversion routines are available.  The previous
"experimental" resolve was limited to 8-bit unsigned render targets.

This fixes a number of piglit msaa tests by adding resolve support for
all the render target formats we support.

MSAA is still disabled by default, but can be enabled with
"export SWR_MSAA_MAX_COUNT=4" (1,2,4,8,16 are options)
The default is 0, which is disabled.
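
As a rough sketch of how such a knob could be read, assuming invalid values simply fall back to the disabled default: `ParseMsaaMaxCount` and `MsaaMaxCountFromEnv` are hypothetical helpers, not the code in swr_screen.cpp.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper, not the driver's actual code: map the text of
// SWR_MSAA_MAX_COUNT to a sample count, falling back to 0 (MSAA disabled)
// for a missing, malformed, or unsupported value.
uint32_t ParseMsaaMaxCount(const char* text)
{
    if (text == nullptr || *text == '\0')
        return 0;

    char* end = nullptr;
    const long value = std::strtol(text, &end, 10);
    if (*end != '\0')
        return 0; // trailing garbage such as "8x"

    switch (value) {
    case 1: case 2: case 4: case 8: case 16:
        return static_cast<uint32_t>(value);
    default:
        return 0;
    }
}

// Reads the environment variable named above.
uint32_t MsaaMaxCountFromEnv()
{
    return ParseMsaaMaxCount(std::getenv("SWR_MSAA_MAX_COUNT"));
}
```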

Because it fixes a number of piglit tests, I kindly request inclusion
into 17.1 stable.

cc: mesa-sta...@lists.freedesktop.org
---
 .../drivers/swr/rasterizer/memory/StoreTile.h  | 75 +
 src/gallium/drivers/swr/swr_context.cpp| 77 +-
 src/gallium/drivers/swr/swr_screen.cpp | 10 +--
 3 files changed, 82 insertions(+), 80 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/memory/StoreTile.h b/src/gallium/drivers/swr/rasterizer/memory/StoreTile.h
index ffde574c03..12a5f3d8ce 100644
--- a/src/gallium/drivers/swr/rasterizer/memory/StoreTile.h
+++ b/src/gallium/drivers/swr/rasterizer/memory/StoreTile.h
@@ -1133,6 +1133,64 @@ struct StoreRasterTile
             }
         }
     }
+
+    //
+    /// @brief Resolves an 8x8 raster tile to the resolve destination surface.
+    /// @param pSrc - Pointer to raster tile.
+    /// @param pDstSurface - Destination surface state
+    /// @param x, y - Coordinates to raster tile.
+    /// @param sampleOffset - Offset between adjacent multisamples
+    INLINE static void Resolve(
+        uint8_t *pSrc,
+        SWR_SURFACE_STATE* pDstSurface,
+        uint32_t x, uint32_t y, uint32_t sampleOffset, uint32_t renderTargetArrayIndex) // (x, y) pixel coordinate to start of raster tile.
+    {
+        uint32_t lodWidth = std::max(pDstSurface->width >> pDstSurface->lod, 1U);
+        uint32_t lodHeight = std::max(pDstSurface->height >> pDstSurface->lod, 1U);
+
+        float oneOverNumSamples = 1.0f / pDstSurface->numSamples;
+
+        // For each raster tile pixel (rx, ry)
+        for (uint32_t ry = 0; ry < KNOB_TILE_Y_DIM; ++ry)
+        {
+            for (uint32_t rx = 0; rx < KNOB_TILE_X_DIM; ++rx)
+            {
+                // Perform bounds checking.
+                if (((x + rx) < lodWidth) &&
+                        ((y + ry) < lodHeight))
+                {
+                    // Sum across samples
+                    float resolveColor[4] = {0};
+                    for (uint32_t sampleNum = 0; sampleNum < pDstSurface->numSamples; sampleNum++)
+                    {
+                        float sampleColor[4] = {0};
+                        uint8_t *pSampleSrc = pSrc + sampleOffset * sampleNum;
+                        GetSwizzledSrcColor(pSampleSrc, rx, ry, sampleColor);
+                        resolveColor[0] += sampleColor[0];
+                        resolveColor[1] += sampleColor[1];
+                        resolveColor[2] += sampleColor[2];
+                        resolveColor[3] += sampleColor[3];
+                    }
+
+                    // Divide by numSamples to average
+                    resolveColor[0] *= oneOverNumSamples;
+                    resolveColor[1] *= oneOverNumSamples;
+                    resolveColor[2] *= oneOverNumSamples;
+                    resolveColor[3] *= oneOverNumSamples;
+
+                    // Use the resolve surface state
+                    SWR_SURFACE_STATE* pResolveSurface = (SWR_SURFACE_STATE*)pDstSurface->pAuxBaseAddress;
+                    uint8_t *pDst = (uint8_t*)ComputeSurfaceAddress((x + rx), (y + ry),
+                        pResolveSurface->arrayIndex + renderTargetArrayIndex, pResolveSurface->arrayIndex + renderTargetArrayIndex,
+                        0, pResolveSurface->lod, pResolveSurface);
+                    {
+                        ConvertPixelFromFloat(pDst, resolveColor);
+                    }
+                }
+            }
+        }
+    }
+
 };
 
 template
@@ -2316,6 +2374,9 @@ struct StoreMacroTile
             pfnStore[sampleNum] = (bForceGeneric || KNOB_USE_GENERIC_STORETILE) ? StoreRasterTile::Store : OptStoreRasterTile::Store;
         }
 
+        // Save original for pSrcHotTile resolve.
+        uint8_t *pResolveSrcHotTile = pSrcHotTile;
+
         // Store each raster tile from the hot tile to the destination surface.
         for(uint32_t row = 0; row < KNOB_MACROTILE_Y_DIM; row += KNOB_TILE_Y_DIM)
         {
@@ -2328,6 +2389,20 @@ struct StoreMacroTile
                 }
             }
         }
+
+        if (pDstSurface->pAuxBaseAddress)
+        {
+            uint32_t sampleOffset = KNOB_TILE_X_DIM * KNOB_TILE_Y_DIM * (FormatTraits::bpp / 8);
+            // Store each raster tile from