Re: [Mesa-dev] [PATCH v2] swr: move msaa resolve to generalized StoreTile
> On Apr 28, 2017, at 3:20 PM, Ilia Mirkin wrote:
>
> On Fri, Apr 28, 2017 at 3:58 PM, Cherniak, Bruce wrote:
>>
>>> On Apr 27, 2017, at 7:50 PM, Ilia Mirkin wrote:
>>>
>>> On Thu, Apr 27, 2017 at 8:45 PM, Cherniak, Bruce wrote:
>>>>
>>>>> On Apr 27, 2017, at 7:38 PM, Ilia Mirkin wrote:
>>>>>
>>>>> Erm, so ... what happens if I render to FB1, then render to FB2, then
>>>>> render to FB1 again (and I have blending enabled)? Doesn't the resolve
>>>>> lose the per-sample information? Or does the resolve merely precompute
>>>>> the resolved version on the off chance that it's needed, without
>>>>> losing the source data?
>>>>
>>>> The resolve occurs into a secondary, driver private, surface. All
>>>> per-sample information is maintained in the original surfaces.
>>>>
>>>> Yes, the resolve is currently done "on the off chance that it's needed".
>>>> There is likely an optimization to be had there, but it should be
>>>> functionally correct.
>>>
>>> Got it. May I ask why this isn't done on-demand instead? Is it a pain
>>> to plug into swr's execution engine? I'm just concerned that
>>> StoreTile() may get called a lot, more than even there are draws, as
>>> tiles are swapped in and out of "hotness", and I wouldn't be surprised
>>> if resolves were needed only a fraction of the time.
>>>
>>> Cheers,
>>>
>>>   -ilia
>>
>>
>> Good observation. I haven’t yet seen this to be the case in the scientific
>> visualization applications I’ve been running. But, I can envision where that
>> becomes a performance concern.
>>
>> Do you mean a blit based “state_tracker initiated” on-demand resolve (via
>> pipe_blit)? If so, here are my thoughts:
>
> Yes. The resolve is always initiated via a blit() call anyways (with a
> dst surface with nr_samples == 0).
>
>> 1) The software winsys and state trackers don't support multisample surfaces
>>    for software renderers, nor will/should they (except for swr). So, I
>>    thought keeping most of the changes local to our driver would be most
>>    desirable and safest, as far as swrast and llvmpipe are concerned. Not
>>    sure about wgl yet, but I don't see it.
>>
>> 2) A blit based resolve causes a pipeline reconfiguration (save/restore
>>    around the blit) that is inherently less efficient than simply
>>    storing-out/resolving HotTiles.
>>
>> 3) A blit based resolve needs to sample from the multisample surface using a
>>    texture sampler with 2DMS/3DMS support. We’re currently using llvmpipe's
>>    sampler which doesn't need this support. I’m looking into extending it, as
>>    I know we need the functionality for compliance; it’s just not there yet.
>>
>> I may be off-base on any of these thoughts. If so, please correct me.
>>
>> We’ll probably move to a “driver internal” on-demand resolve, implemented
>> similar to StoreTiles. It's a simple matter to only resolve for the times we
>> know it's needed and the multisample surface is in HotTiles. But, I need to
>> work out the LoadTiles case for surfaces that aren’t currently in HotTiles.
>> Tricky, since we're checking the resolve status of the secondary (resolved)
>> surface and the HotTile state of the multisample surface.
>>
>> Thanks for the feedback. Getting this completely correct and optimized is
>> going to be iterative. This current patch, while maybe not optimal, helps
>> with functionality. So, I think it's a step in the right direction.
>
> I hope you realize I wasn't looking to derail your attempts at
> progress, more like providing some things to think about on your march
> towards perfection :) MS textures/fbo's are definitely a thing,
> probably more so than MS winsys surfaces these days. At least for
> games, maybe not visualization software, with which I have next to no
> experience. Try it with e.g. Unigine Heaven or Valley (with MSAA
> enabled). I'm fairly sure that at least Heaven uses MSAA textures.

We always value and appreciate your input. And, while we’re primarily focused
on sci vis software, we’d like to be as compliant as possible, which means
running a wide variety of applications… even *gasp* games.

I’ll definitely give Unigine a try. Not expecting great performance, but we
can at least strive for correct functionality.

> I believe most hardware uses MSAA compression, based on the
> observation that it's pretty common for all samples in a pixel to have
> the same color, or bg color + fg color + coverage mask. TBH I'm not
> sure how it all works. Something for the future when you get all the
> basics right.

“march towards perfection” :)

> Some hardware has built-in resolve functionality (e.g. Adreno, maybe
> other tilers as well) for moving a MS FBO out of a "hot tile", while
> most hardware requires the pipeline reconfiguration + blit. Perhaps
> it'd make sense to add a special FE command for computing the resolved
> version
Re: [Mesa-dev] [PATCH v2] swr: move msaa resolve to generalized StoreTile
On Fri, Apr 28, 2017 at 3:58 PM, Cherniak, Bruce wrote:
>
>> On Apr 27, 2017, at 7:50 PM, Ilia Mirkin wrote:
>>
>> On Thu, Apr 27, 2017 at 8:45 PM, Cherniak, Bruce wrote:
>>>
>>>> On Apr 27, 2017, at 7:38 PM, Ilia Mirkin wrote:
>>>>
>>>> Erm, so ... what happens if I render to FB1, then render to FB2, then
>>>> render to FB1 again (and I have blending enabled)? Doesn't the resolve
>>>> lose the per-sample information? Or does the resolve merely precompute
>>>> the resolved version on the off chance that it's needed, without
>>>> losing the source data?
>>>
>>> The resolve occurs into a secondary, driver private, surface. All
>>> per-sample information is maintained in the original surfaces.
>>>
>>> Yes, the resolve is currently done "on the off chance that it's needed".
>>> There is likely an optimization to be had there, but it should be
>>> functionally correct.
>>
>> Got it. May I ask why this isn't done on-demand instead? Is it a pain
>> to plug into swr's execution engine? I'm just concerned that
>> StoreTile() may get called a lot, more than even there are draws, as
>> tiles are swapped in and out of "hotness", and I wouldn't be surprised
>> if resolves were needed only a fraction of the time.
>>
>> Cheers,
>>
>>   -ilia
>
>
> Good observation. I haven’t yet seen this to be the case in the scientific
> visualization applications I’ve been running. But, I can envision where that
> becomes a performance concern.
>
> Do you mean a blit based “state_tracker initiated” on-demand resolve (via
> pipe_blit)? If so, here are my thoughts:

Yes. The resolve is always initiated via a blit() call anyways (with a
dst surface with nr_samples == 0).

> 1) The software winsys and state trackers don't support multisample surfaces
>    for software renderers, nor will/should they (except for swr). So, I
>    thought keeping most of the changes local to our driver would be most
>    desirable and safest, as far as swrast and llvmpipe are concerned. Not
>    sure about wgl yet, but I don't see it.
>
> 2) A blit based resolve causes a pipeline reconfiguration (save/restore around
>    the blit) that is inherently less efficient than simply
>    storing-out/resolving HotTiles.
>
> 3) A blit based resolve needs to sample from the multisample surface using a
>    texture sampler with 2DMS/3DMS support. We’re currently using llvmpipe's
>    sampler which doesn't need this support. I’m looking into extending it, as
>    I know we need the functionality for compliance; it’s just not there yet.
>
> I may be off-base on any of these thoughts. If so, please correct me.
>
> We’ll probably move to a “driver internal” on-demand resolve, implemented
> similar to StoreTiles. It's a simple matter to only resolve for the times we
> know it's needed and the multisample surface is in HotTiles. But, I need to
> work out the LoadTiles case for surfaces that aren’t currently in HotTiles.
> Tricky, since we're checking the resolve status of the secondary (resolved)
> surface and the HotTile state of the multisample surface.
>
> Thanks for the feedback. Getting this completely correct and optimized is
> going to be iterative. This current patch, while maybe not optimal, helps
> with functionality. So, I think it's a step in the right direction.

I hope you realize I wasn't looking to derail your attempts at
progress, more like providing some things to think about on your march
towards perfection :) MS textures/fbo's are definitely a thing,
probably more so than MS winsys surfaces these days. At least for
games, maybe not visualization software, with which I have next to no
experience. Try it with e.g. Unigine Heaven or Valley (with MSAA
enabled). I'm fairly sure that at least Heaven uses MSAA textures.

I believe most hardware uses MSAA compression, based on the
observation that it's pretty common for all samples in a pixel to have
the same color, or bg color + fg color + coverage mask. TBH I'm not
sure how it all works. Something for the future when you get all the
basics right.

Some hardware has built-in resolve functionality (e.g. Adreno, maybe
other tilers as well) for moving a MS FBO out of a "hot tile", while
most hardware requires the pipeline reconfiguration + blit. Perhaps
it'd make sense to add a special FE command for computing the resolved
version of all the tiles, and have that state get dirtied when you
render.

There are also extensions like GL_EXT_multisampled_render_to_texture
which support the "insta-resolve" use-case more directly. However
they're not implemented in mesa AFAIK.

Cheers,

  -ilia
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
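
The "have that state get dirtied when you render" idea above can be illustrated with a small dirty-flag model. This is only a sketch under stated assumptions: `MsaaTarget`, `Render`, `GetResolved`, `resolveDirty` and `resolveCount` are hypothetical names invented for the illustration, not swr or gallium API, and a flat array of floats stands in for the real surfaces and hot tiles.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of one multisampled render target together with its
// private single-sample resolve surface. None of these names come from swr.
struct MsaaTarget {
    uint32_t numSamples;
    std::vector<float> samples;   // numPixels * numSamples values, pixel-major
    std::vector<float> resolved;  // numPixels values
    bool resolveDirty = true;     // set by rendering, cleared by a resolve
    uint32_t resolveCount = 0;    // how many times the resolve actually ran

    MsaaTarget(size_t numPixels, uint32_t n)
        : numSamples(n), samples(numPixels * n, 0.0f), resolved(numPixels, 0.0f) {}

    // Rendering writes per-sample data and only marks the resolve as stale.
    void Render(size_t pixel, uint32_t sample, float value) {
        samples[pixel * numSamples + sample] = value;
        resolveDirty = true;
    }

    // Called when a consumer (for example a blit to a single-sample surface)
    // needs resolved data. The per-sample data is left untouched.
    const std::vector<float>& GetResolved() {
        if (resolveDirty) {
            for (size_t p = 0; p < resolved.size(); ++p) {
                float sum = 0.0f;
                for (uint32_t s = 0; s < numSamples; ++s)
                    sum += samples[p * numSamples + s];
                resolved[p] = sum / static_cast<float>(numSamples);
            }
            resolveDirty = false;
            ++resolveCount;
        }
        return resolved;
    }
};
```

In this model any number of `Render` calls between two reads costs a single resolve, and a second read without intervening rendering costs none. A real driver would presumably track the flag per surface or per tile rather than per target.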
Re: [Mesa-dev] [PATCH v2] swr: move msaa resolve to generalized StoreTile
> On Apr 27, 2017, at 7:50 PM, Ilia Mirkin wrote:
>
> On Thu, Apr 27, 2017 at 8:45 PM, Cherniak, Bruce wrote:
>>
>>> On Apr 27, 2017, at 7:38 PM, Ilia Mirkin wrote:
>>>
>>> Erm, so ... what happens if I render to FB1, then render to FB2, then
>>> render to FB1 again (and I have blending enabled)? Doesn't the resolve
>>> lose the per-sample information? Or does the resolve merely precompute
>>> the resolved version on the off chance that it's needed, without
>>> losing the source data?
>>
>> The resolve occurs into a secondary, driver private, surface. All per-sample
>> information is maintained in the original surfaces.
>>
>> Yes, the resolve is currently done "on the off chance that it's needed".
>> There is likely an optimization to be had there, but it should be
>> functionally correct.
>
> Got it. May I ask why this isn't done on-demand instead? Is it a pain
> to plug into swr's execution engine? I'm just concerned that
> StoreTile() may get called a lot, more than even there are draws, as
> tiles are swapped in and out of "hotness", and I wouldn't be surprised
> if resolves were needed only a fraction of the time.
>
> Cheers,
>
>   -ilia


Good observation. I haven’t yet seen this to be the case in the scientific
visualization applications I’ve been running. But, I can envision where that
becomes a performance concern.

Do you mean a blit based “state_tracker initiated” on-demand resolve (via
pipe_blit)? If so, here are my thoughts:

1) The software winsys and state trackers don't support multisample surfaces
   for software renderers, nor will/should they (except for swr). So, I
   thought keeping most of the changes local to our driver would be most
   desirable and safest, as far as swrast and llvmpipe are concerned. Not
   sure about wgl yet, but I don't see it.

2) A blit based resolve causes a pipeline reconfiguration (save/restore around
   the blit) that is inherently less efficient than simply
   storing-out/resolving HotTiles.

3) A blit based resolve needs to sample from the multisample surface using a
   texture sampler with 2DMS/3DMS support. We’re currently using llvmpipe's
   sampler which doesn't need this support. I’m looking into extending it, as
   I know we need the functionality for compliance; it’s just not there yet.

I may be off-base on any of these thoughts. If so, please correct me.

We’ll probably move to a “driver internal” on-demand resolve, implemented
similar to StoreTiles. It's a simple matter to only resolve for the times we
know it's needed and the multisample surface is in HotTiles. But, I need to
work out the LoadTiles case for surfaces that aren’t currently in HotTiles.
Tricky, since we're checking the resolve status of the secondary (resolved)
surface and the HotTile state of the multisample surface.

Thanks for the feedback. Getting this completely correct and optimized is
going to be iterative. This current patch, while maybe not optimal, helps
with functionality. So, I think it's a step in the right direction.

Thanks,
Bruce
Re: [Mesa-dev] [PATCH v2] swr: move msaa resolve to generalized StoreTile
On Thu, Apr 27, 2017 at 8:45 PM, Cherniak, Bruce wrote:
>
>> On Apr 27, 2017, at 7:38 PM, Ilia Mirkin wrote:
>>
>> Erm, so ... what happens if I render to FB1, then render to FB2, then
>> render to FB1 again (and I have blending enabled)? Doesn't the resolve
>> lose the per-sample information? Or does the resolve merely precompute
>> the resolved version on the off chance that it's needed, without
>> losing the source data?
>
> The resolve occurs into a secondary, driver private, surface. All per-sample
> information is maintained in the original surfaces.
>
> Yes, the resolve is currently done "on the off chance that it's needed".
> There is likely an optimization to be had there, but it should be functionally
> correct.

Got it. May I ask why this isn't done on-demand instead? Is it a pain
to plug into swr's execution engine? I'm just concerned that
StoreTile() may get called a lot, more than even there are draws, as
tiles are swapped in and out of "hotness", and I wouldn't be surprised
if resolves were needed only a fraction of the time.

Cheers,

  -ilia
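
For readers following the exchange, the point that the resolve writes into a separate surface and leaves per-sample data intact can be sketched as a plain box-filter average over one tile. This is a simplified illustration, not the code from the patch: `Rgba`, `kTileDim` and `ResolveTile` are hypothetical names, colors are kept as floats rather than going through surface-format conversion, and the layout (one 8x8 plane per sample, planes stored back to back) is an assumption modeled on the `sampleOffset` parameter in the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-ins. The real patch operates on
// SWR_SURFACE_STATE and converts through the surface format.
constexpr uint32_t kTileDim = 8;  // assumed 8x8 raster tile

struct Rgba { float c[4]; };

// src holds numSamples planes of kTileDim * kTileDim pixels, one plane per
// sample. dst is a width * height single-sample surface, and (x, y) is the
// tile origin inside dst. src is read-only, so no sample data is lost.
inline void ResolveTile(const std::vector<Rgba>& src, uint32_t numSamples,
                        std::vector<Rgba>& dst, uint32_t width, uint32_t height,
                        uint32_t x, uint32_t y)
{
    const uint32_t sampleOffset = kTileDim * kTileDim;  // pixels between planes
    const float oneOverNumSamples = 1.0f / static_cast<float>(numSamples);

    for (uint32_t ry = 0; ry < kTileDim; ++ry) {
        for (uint32_t rx = 0; rx < kTileDim; ++rx) {
            // Skip pixels of a partial tile that fall outside the surface.
            if (x + rx >= width || y + ry >= height) continue;

            // Sum the color of every sample covering this pixel.
            Rgba sum = {{0.0f, 0.0f, 0.0f, 0.0f}};
            for (uint32_t s = 0; s < numSamples; ++s) {
                const Rgba& sample = src[s * sampleOffset + ry * kTileDim + rx];
                for (int ch = 0; ch < 4; ++ch) sum.c[ch] += sample.c[ch];
            }
            // Scale by 1/numSamples to get the average.
            for (int ch = 0; ch < 4; ++ch) sum.c[ch] *= oneOverNumSamples;

            dst[(y + ry) * width + (x + rx)] = sum;
        }
    }
}
```

Because the source planes are only read, rendering to FB1, switching to FB2 and returning to FB1 with blending enabled would still see the original samples in this model. The cost concern raised above is about how often such a loop runs, not about its correctness.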
Re: [Mesa-dev] [PATCH v2] swr: move msaa resolve to generalized StoreTile
> On Apr 27, 2017, at 7:38 PM, Ilia Mirkin wrote:
>
> Erm, so ... what happens if I render to FB1, then render to FB2, then
> render to FB1 again (and I have blending enabled)? Doesn't the resolve
> lose the per-sample information? Or does the resolve merely precompute
> the resolved version on the off chance that it's needed, without
> losing the source data?

The resolve occurs into a secondary, driver private, surface. All per-sample
information is maintained in the original surfaces.

Yes, the resolve is currently done "on the off chance that it's needed".
There is likely an optimization to be had there, but it should be functionally
correct.

> On Thu, Apr 27, 2017 at 8:22 PM, Bruce Cherniak wrote:
>> v2: Reword commit message to more closely adhere to community
>> guidelines.
>>
>> This patch moves msaa resolve down into core/StoreTiles where the
>> surface format conversion routines are available. The previous
>> "experimental" resolve was limited to 8-bit unsigned render targets.
>>
>> This fixes a number of piglit msaa tests by adding resolve support for
>> all the render target formats we support.
>>
>> MSAA is still disabled by default, but can be enabled with
>> "export SWR_MSAA_MAX_COUNT=4" (1,2,4,8,16 are options)
>> The default is 0, which is disabled.
>>
>> Because it fixes a number of piglit tests, I kindly request inclusion
>> into 17.1 stable.
>> >> cc: mesa-sta...@lists.freedesktop.org >> --- >> .../drivers/swr/rasterizer/memory/StoreTile.h | 75 + >> src/gallium/drivers/swr/swr_context.cpp| 77 >> +- >> src/gallium/drivers/swr/swr_screen.cpp | 10 +-- >> 3 files changed, 82 insertions(+), 80 deletions(-) >> >> diff --git a/src/gallium/drivers/swr/rasterizer/memory/StoreTile.h >> b/src/gallium/drivers/swr/rasterizer/memory/StoreTile.h >> index ffde574c03..12a5f3d8ce 100644 >> --- a/src/gallium/drivers/swr/rasterizer/memory/StoreTile.h >> +++ b/src/gallium/drivers/swr/rasterizer/memory/StoreTile.h >> @@ -1133,6 +1133,64 @@ struct StoreRasterTile >> } >> } >> } >> + >> + >> // >> +/// @brief Resolves an 8x8 raster tile to the resolve destination >> surface. >> +/// @param pSrc - Pointer to raster tile. >> +/// @param pDstSurface - Destination surface state >> +/// @param x, y - Coordinates to raster tile. >> +/// @param sampleOffset - Offset between adjacent multisamples >> +INLINE static void Resolve( >> +uint8_t *pSrc, >> +SWR_SURFACE_STATE* pDstSurface, >> +uint32_t x, uint32_t y, uint32_t sampleOffset, uint32_t >> renderTargetArrayIndex) // (x, y) pixel coordinate to start of raster tile. >> +{ >> +uint32_t lodWidth = std::max(pDstSurface->width >> >> pDstSurface->lod, 1U); >> +uint32_t lodHeight = std::max(pDstSurface->height >> >> pDstSurface->lod, 1U); >> + >> +float oneOverNumSamples = 1.0f / pDstSurface->numSamples; >> + >> +// For each raster tile pixel (rx, ry) >> +for (uint32_t ry = 0; ry < KNOB_TILE_Y_DIM; ++ry) >> +{ >> +for (uint32_t rx = 0; rx < KNOB_TILE_X_DIM; ++rx) >> +{ >> +// Perform bounds checking. 
>> +if (((x + rx) < lodWidth) && >> +((y + ry) < lodHeight)) >> +{ >> +// Sum across samples >> +float resolveColor[4] = {0}; >> +for (uint32_t sampleNum = 0; sampleNum < >> pDstSurface->numSamples; sampleNum++) >> +{ >> +float sampleColor[4] = {0}; >> +uint8_t *pSampleSrc = pSrc + sampleOffset * >> sampleNum; >> +GetSwizzledSrcColor(pSampleSrc, rx, ry, >> sampleColor); >> +resolveColor[0] += sampleColor[0]; >> +resolveColor[1] += sampleColor[1]; >> +resolveColor[2] += sampleColor[2]; >> +resolveColor[3] += sampleColor[3]; >> +} >> + >> +// Divide by numSamples to average >> +resolveColor[0] *= oneOverNumSamples; >> +resolveColor[1] *= oneOverNumSamples; >> +resolveColor[2] *= oneOverNumSamples; >> +resolveColor[3] *= oneOverNumSamples; >> + >> +// Use the resolve surface state >> +SWR_SURFACE_STATE* pResolveSurface = >> (SWR_SURFACE_STATE*)pDstSurface->pAuxBaseAddress; >> +uint8_t *pDst = (uint8_t*)ComputeSurfaceAddress > false>((x + rx), (y + ry), >> +pResolveSurface->arrayIndex + >> renderTargetArrayIndex, pResolveSurface->arrayIndex + renderTargetArrayIndex, >> +0,
Re: [Mesa-dev] [PATCH v2] swr: move msaa resolve to generalized StoreTile
Erm, so ... what happens if I render to FB1, then render to FB2, then
render to FB1 again (and I have blending enabled)? Doesn't the resolve
lose the per-sample information? Or does the resolve merely precompute
the resolved version on the off chance that it's needed, without
losing the source data?

On Thu, Apr 27, 2017 at 8:22 PM, Bruce Cherniak wrote:
> v2: Reword commit message to more closely adhere to community
> guidelines.
>
> This patch moves msaa resolve down into core/StoreTiles where the
> surface format conversion routines are available. The previous
> "experimental" resolve was limited to 8-bit unsigned render targets.
>
> This fixes a number of piglit msaa tests by adding resolve support for
> all the render target formats we support.
>
> MSAA is still disabled by default, but can be enabled with
> "export SWR_MSAA_MAX_COUNT=4" (1,2,4,8,16 are options)
> The default is 0, which is disabled.
>
> Because it fixes a number of piglit tests, I kindly request inclusion
> into 17.1 stable.
>
> cc: mesa-sta...@lists.freedesktop.org
> ---
>  .../drivers/swr/rasterizer/memory/StoreTile.h | 75 +
>  src/gallium/drivers/swr/swr_context.cpp       | 77 +-
>  src/gallium/drivers/swr/swr_screen.cpp        | 10 +--
>  3 files changed, 82 insertions(+), 80 deletions(-)
>
> diff --git a/src/gallium/drivers/swr/rasterizer/memory/StoreTile.h b/src/gallium/drivers/swr/rasterizer/memory/StoreTile.h
> index ffde574c03..12a5f3d8ce 100644
> --- a/src/gallium/drivers/swr/rasterizer/memory/StoreTile.h
> +++ b/src/gallium/drivers/swr/rasterizer/memory/StoreTile.h
> @@ -1133,6 +1133,64 @@ struct StoreRasterTile
>              }
>          }
>      }
> +
> +    //
> +    /// @brief Resolves an 8x8 raster tile to the resolve destination surface.
> +    /// @param pSrc - Pointer to raster tile.
> +    /// @param pDstSurface - Destination surface state
> +    /// @param x, y - Coordinates to raster tile.
> +/// @param sampleOffset - Offset between adjacent multisamples > +INLINE static void Resolve( > +uint8_t *pSrc, > +SWR_SURFACE_STATE* pDstSurface, > +uint32_t x, uint32_t y, uint32_t sampleOffset, uint32_t > renderTargetArrayIndex) // (x, y) pixel coordinate to start of raster tile. > +{ > +uint32_t lodWidth = std::max(pDstSurface->width >> pDstSurface->lod, > 1U); > +uint32_t lodHeight = std::max(pDstSurface->height >> > pDstSurface->lod, 1U); > + > +float oneOverNumSamples = 1.0f / pDstSurface->numSamples; > + > +// For each raster tile pixel (rx, ry) > +for (uint32_t ry = 0; ry < KNOB_TILE_Y_DIM; ++ry) > +{ > +for (uint32_t rx = 0; rx < KNOB_TILE_X_DIM; ++rx) > +{ > +// Perform bounds checking. > +if (((x + rx) < lodWidth) && > +((y + ry) < lodHeight)) > +{ > +// Sum across samples > +float resolveColor[4] = {0}; > +for (uint32_t sampleNum = 0; sampleNum < > pDstSurface->numSamples; sampleNum++) > +{ > +float sampleColor[4] = {0}; > +uint8_t *pSampleSrc = pSrc + sampleOffset * > sampleNum; > +GetSwizzledSrcColor(pSampleSrc, rx, ry, sampleColor); > +resolveColor[0] += sampleColor[0]; > +resolveColor[1] += sampleColor[1]; > +resolveColor[2] += sampleColor[2]; > +resolveColor[3] += sampleColor[3]; > +} > + > +// Divide by numSamples to average > +resolveColor[0] *= oneOverNumSamples; > +resolveColor[1] *= oneOverNumSamples; > +resolveColor[2] *= oneOverNumSamples; > +resolveColor[3] *= oneOverNumSamples; > + > +// Use the resolve surface state > +SWR_SURFACE_STATE* pResolveSurface = > (SWR_SURFACE_STATE*)pDstSurface->pAuxBaseAddress; > +uint8_t *pDst = (uint8_t*)ComputeSurfaceAddress false>((x + rx), (y + ry), > +pResolveSurface->arrayIndex + > renderTargetArrayIndex, pResolveSurface->arrayIndex + renderTargetArrayIndex, > +0, pResolveSurface->lod, pResolveSurface); > +{ > +ConvertPixelFromFloat(pDst, resolveColor); > +} > +} > +} > +} > +} > + > }; > > template > @@ -2316,6 +2374,9 @@ struct StoreMacroTile > pfnStore[sampleNum] = (bForceGeneric || > 
KNOB_USE_GENERIC_STORETILE) ? StoreRasterTile DstFormat>::Store : OptStoreRasterTile
[Mesa-dev] [PATCH v2] swr: move msaa resolve to generalized StoreTile
v2: Reword commit message to more closely adhere to community
guidelines.

This patch moves msaa resolve down into core/StoreTiles where the
surface format conversion routines are available. The previous
"experimental" resolve was limited to 8-bit unsigned render targets.

This fixes a number of piglit msaa tests by adding resolve support for
all the render target formats we support.

MSAA is still disabled by default, but can be enabled with
"export SWR_MSAA_MAX_COUNT=4" (1,2,4,8,16 are options)
The default is 0, which is disabled.

Because it fixes a number of piglit tests, I kindly request inclusion
into 17.1 stable.

cc: mesa-sta...@lists.freedesktop.org
---
 .../drivers/swr/rasterizer/memory/StoreTile.h | 75 +
 src/gallium/drivers/swr/swr_context.cpp       | 77 +-
 src/gallium/drivers/swr/swr_screen.cpp        | 10 +--
 3 files changed, 82 insertions(+), 80 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/memory/StoreTile.h b/src/gallium/drivers/swr/rasterizer/memory/StoreTile.h
index ffde574c03..12a5f3d8ce 100644
--- a/src/gallium/drivers/swr/rasterizer/memory/StoreTile.h
+++ b/src/gallium/drivers/swr/rasterizer/memory/StoreTile.h
@@ -1133,6 +1133,64 @@ struct StoreRasterTile
             }
         }
     }
+
+    //
+    /// @brief Resolves an 8x8 raster tile to the resolve destination surface.
+    /// @param pSrc - Pointer to raster tile.
+    /// @param pDstSurface - Destination surface state
+    /// @param x, y - Coordinates to raster tile.
+    /// @param sampleOffset - Offset between adjacent multisamples
+    INLINE static void Resolve(
+        uint8_t *pSrc,
+        SWR_SURFACE_STATE* pDstSurface,
+        uint32_t x, uint32_t y, uint32_t sampleOffset, uint32_t renderTargetArrayIndex) // (x, y) pixel coordinate to start of raster tile.
+    {
+        uint32_t lodWidth = std::max<uint32_t>(pDstSurface->width >> pDstSurface->lod, 1U);
+        uint32_t lodHeight = std::max<uint32_t>(pDstSurface->height >> pDstSurface->lod, 1U);
+
+        float oneOverNumSamples = 1.0f / pDstSurface->numSamples;
+
+        // For each raster tile pixel (rx, ry)
+        for (uint32_t ry = 0; ry < KNOB_TILE_Y_DIM; ++ry)
+        {
+            for (uint32_t rx = 0; rx < KNOB_TILE_X_DIM; ++rx)
+            {
+                // Perform bounds checking.
+                if (((x + rx) < lodWidth) &&
+                        ((y + ry) < lodHeight))
+                {
+                    // Sum across samples
+                    float resolveColor[4] = {0};
+                    for (uint32_t sampleNum = 0; sampleNum < pDstSurface->numSamples; sampleNum++)
+                    {
+                        float sampleColor[4] = {0};
+                        uint8_t *pSampleSrc = pSrc + sampleOffset * sampleNum;
+                        GetSwizzledSrcColor(pSampleSrc, rx, ry, sampleColor);
+                        resolveColor[0] += sampleColor[0];
+                        resolveColor[1] += sampleColor[1];
+                        resolveColor[2] += sampleColor[2];
+                        resolveColor[3] += sampleColor[3];
+                    }
+
+                    // Divide by numSamples to average
+                    resolveColor[0] *= oneOverNumSamples;
+                    resolveColor[1] *= oneOverNumSamples;
+                    resolveColor[2] *= oneOverNumSamples;
+                    resolveColor[3] *= oneOverNumSamples;
+
+                    // Use the resolve surface state
+                    SWR_SURFACE_STATE* pResolveSurface = (SWR_SURFACE_STATE*)pDstSurface->pAuxBaseAddress;
+                    uint8_t *pDst = (uint8_t*)ComputeSurfaceAddress<false, false>((x + rx), (y + ry),
+                        pResolveSurface->arrayIndex + renderTargetArrayIndex, pResolveSurface->arrayIndex + renderTargetArrayIndex,
+                        0, pResolveSurface->lod, pResolveSurface);
+                    {
+                        ConvertPixelFromFloat<DstFormat>(pDst, resolveColor);
+                    }
+                }
+            }
+        }
+    }
+
 };
 
 template<typename TTraits, SWR_FORMAT SrcFormat, SWR_FORMAT DstFormat>
@@ -2316,6 +2374,9 @@ struct StoreMacroTile
             pfnStore[sampleNum] = (bForceGeneric || KNOB_USE_GENERIC_STORETILE) ? StoreRasterTile<TTraits, SrcFormat, DstFormat>::Store : OptStoreRasterTile<TTraits, SrcFormat, DstFormat>::Store;
         }
 
+        // Save original for pSrcHotTile resolve.
+        uint8_t *pResolveSrcHotTile = pSrcHotTile;
+
         // Store each raster tile from the hot tile to the destination surface.
         for(uint32_t row = 0; row < KNOB_MACROTILE_Y_DIM; row += KNOB_TILE_Y_DIM)
         {
@@ -2328,6 +2389,20 @@ struct StoreMacroTile
                 }
             }
         }
+
+        if (pDstSurface->pAuxBaseAddress)
+        {
+            uint32_t sampleOffset = KNOB_TILE_X_DIM * KNOB_TILE_Y_DIM * (FormatTraits<SrcFormat>::bpp / 8);
+            // Store each raster tile from