Re: [Mesa-dev] [PATCH] r600g: fix abysmal performance in Reaction Quake
On Sam, 2012-11-10 at 16:52 +0100, Marek Olšák wrote:
> On Fri, Nov 9, 2012 at 9:44 PM, Jerome Glisse <j.gli...@gmail.com> wrote:
> > On Thu, Nov 01, 2012 at 03:13:31AM +0100, Marek Olšák wrote:
> > > On Thu, Nov 1, 2012 at 2:13 AM, Alex Deucher <alexdeuc...@gmail.com> wrote:
> > > > On Wed, Oct 31, 2012 at 8:05 PM, Marek Olšák <mar...@gmail.com> wrote:
> > > > > The problem was we set VRAM|GTT for relocations of STATIC
> > > > > resources. Setting just VRAM increases the framerate 4 times on
> > > > > my machine. I rewrote the switch statement and adjusted the
> > > > > domains for window framebuffers too.
> > > >
> > > > Reviewed-by: Alex Deucher <alexander.deuc...@amd.com>
> > > >
> > > > Stable branches?
> > >
> > > Yes, good idea.
> > >
> > > Marek
> >
> > Btw, as a follow-up on this, I did some experiments with TTM and
> > eviction. Blocking any VRAM eviction improves average fps (20-30%)
> > and minimum fps (40-60%), but it diminishes maximum fps (100%).
> > Overall, blocking eviction just makes the framerate more consistent.
> > I then tried several heuristics in the eviction process (not
> > evicting a buffer if it was used in the last 1 ms, 10 ms, 20 ms,
> > ...; sorting the LRU differently between buffers used for rendering
> > and auxiliary buffers used by the kernel; ...). None of those
> > heuristics improved anything. I also removed the bo wait from the
> > eviction pipeline, but still no improvement. I haven't had time to
> > look further, but the bottom line is that some benchmarks are
> > memory-tight and constant eviction hurts. (I used Unigine Heaven and
> > Reaction Quake for benchmarking.)
>
> I've come up with the following solution, which I think would improve
> the situation a lot. We should prepare a list of command streams and
> one list of relocations for an entire frame, do buffer
> validation/placement for the entire frame at the beginning, and then
> just render the whole frame (schedule all the command streams at
> once). That would minimize buffer evictions and give us the ideal
> buffer placements for the whole frame, and the GPU would then run the
> commands uninterrupted by other processes (and we wouldn't have to
> flush caches so much).
>
> The only downsides are:
>
> - Buffers would be marked as busy for the entire frame, because the
>   fence would only be at the end of the frame. We definitely need more
>   fine-grained distribution of fences for apps which map buffers
>   during rendering. One possible solution is to let userspace emit
>   fences by itself and associate the fences with the buffers in the
>   relocation list. The bo-wait mechanism would then use the fence from
>   the (buffer, fence) pair, while TTM would use the end-of-frame fence
>   (we can't trust userspace to give us the right fences).
>
> - We should find out how to offload flushing and SwapBuffers to
>   another thread, because the final CS ioctl will be really big.
>   Currently, the radeon winsys doesn't offload the CS ioctl if it's in
>   the SwapBuffers call.

- Deferring to a single big flush like that might introduce additional
  latency before the GPU starts processing a frame and hurt some apps.

> Possible improvement:
>
> - Userspace should emit commands into a GPU buffer and not into user
>   memory, so that we don't have to do copy_from_user in the kernel. I
>   expect the CS ioctl to unmap the GPU buffer and forbid later mapping
>   as well as putting the buffer in the relocation list.

Unmapping etc. shouldn't be necessary in the long run with GPUVM.


-- 
Earthling Michel Dänzer            | http://www.amd.com
Libre software enthusiast          | Debian, X and DRI developer
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] r600g: fix abysmal performance in Reaction Quake
On Mon, Nov 12, 2012 at 12:23 PM, Christian König <deathsim...@vodafone.de> wrote:
> On 12.11.2012 11:08, Michel Dänzer wrote:
> > [...]
>
> Instead of fencing the buffers in userspace, how about something like
> this for the kernel CS interface:
>
>   RADEON_CHUNK_ID_IB
>   RADEON_CHUNK_ID_IB
>   RADEON_CHUNK_ID_IB
>   RADEON_CHUNK_ID_IB
>   RADEON_CHUNK_ID_RELOCS
>   RADEON_CHUNK_ID_IB
>   RADEON_CHUNK_ID_IB
>   RADEON_CHUNK_ID_RELOCS
>   RADEON_CHUNK_ID_IB
>   RADEON_CHUNK_ID_IB
>   RADEON_CHUNK_ID_RELOCS
>   RADEON_CHUNK_ID_FLAGS
>
> Fences are only emitted at RADEON_CHUNK_ID_RELOCS borders, but the
> whole CS call is submitted as one single chunk of work, and so all
> BOs get reserved and placed at once. That of course doesn't help with
> the higher latency before actually starting a frame, but I don't
> think that would actually be such a big problem.

The latency can add input lag, which can negatively impact the gaming
experience, especially in first-person shooters.

In the long run, I think Radeon/TTM should plan buffer moves across
several frames, i.e. an incremental approach that eventually converges
to the ideal state: a process generating the highest GPU load should
get much better buffer placements than idling processes after about
10-20 CS ioctls, with a guarantee that its buffers won't be evicted
anytime soon.

Also, I think the domains in the relocation list should be ignored
completely and only the initial domain from GEM_CREATE should be taken
into account, because that's the domain the Gallium driver wanted in
the first place.

Marek
Re: [Mesa-dev] [PATCH] r600g: fix abysmal performance in Reaction Quake
On Fri, Nov 9, 2012 at 9:44 PM, Jerome Glisse <j.gli...@gmail.com> wrote:
> [...]
> Btw, as a follow-up on this, I did some experiments with TTM and
> eviction. [...] The bottom line is that some benchmarks are
> memory-tight and constant eviction hurts.

I've come up with the following solution, which I think would improve
the situation a lot: prepare a list of command streams and one list of
relocations for an entire frame, do buffer validation/placement for
the entire frame at the beginning, and then render the whole frame
(schedule all the command streams at once). That would minimize buffer
evictions, give us the ideal buffer placements for the whole frame,
and let the GPU run the commands uninterrupted by other processes (and
we wouldn't have to flush caches so much).

[...]

Marek
Re: [Mesa-dev] [PATCH] r600g: fix abysmal performance in Reaction Quake
On Sat, Nov 10, 2012 at 10:52 AM, Marek Olšák <mar...@gmail.com> wrote:
> [...]
> I've come up with the following solution, which I think would improve
> the situation a lot. We should prepare a list of command streams and
> one list of relocations for an entire frame, do buffer
> validation/placement for the entire frame at the beginning, and then
> just render the whole frame (schedule all the command streams at
> once). That would minimize buffer evictions and give us the ideal
> buffer placements for the whole frame, and the GPU would then run the
> commands uninterrupted by other processes (and we wouldn't have to
> flush caches so much).

Another possibility would be to allocate a small number of very large
buffers and then sub-allocate from them in the 3D driver. That should
alleviate some of the overhead of dealing with lots of small buffers
in TTM and also reduce fragmentation.

Alex
Re: [Mesa-dev] [PATCH] r600g: fix abysmal performance in Reaction Quake
On Sun, Nov 11, 2012 at 1:52 AM, Marek Olšák <mar...@gmail.com> wrote:
> [...]
> I've come up with the following solution, which I think would improve
> the situation a lot. We should prepare a list of command streams and
> one list of relocations for an entire frame, do buffer
> validation/placement for the entire frame at the beginning, and then
> just render the whole frame (schedule all the command streams at
> once).

I actually did something a bit similar once, but it didn't show up as
useful at the time:

  http://cgit.freedesktop.org/~airlied/linux/log/?h=radeon-cs-setup

It meant one ioctl per frame, and it avoided flushes between ioctls.
At the time I just tested it with OA and it made no difference, so I
didn't pursue it further. There might be something in there to pick
over, though the code has moved a fair bit since then.

My only worry is fragmentation; I expect we need to have an evict-all
validation path in place (do we have this already? I'm not sure).

> - Buffers would be marked as busy for the entire frame, because the
>   fence would only be at the end of the frame. [...]
>
> - We should find out how to offload flushing and SwapBuffers to
>   another thread, because the final CS ioctl will be really big.
>   Currently, the radeon winsys doesn't offload the CS ioctl if it's
>   in the SwapBuffers call.

The Intel guys were looking at some of this as well; interactions with
GLX and threads are very messy.

> Possible improvement:
>
> - Userspace should emit commands into a GPU buffer and not into user
>   memory, so that we don't have to do copy_from_user in the kernel.
>   I expect the CS ioctl to unmap the GPU buffer and forbid later
>   mapping as well as putting the buffer in the relocation list.

This I've gone back and forth on a few times. You'd be surprised how
much overhead there is in mapping/unmapping, and AGP would also need
the older paths to avoid accessing uncached memory.

Dave.
Re: [Mesa-dev] [PATCH] r600g: fix abysmal performance in Reaction Quake
On Thu, Nov 01, 2012 at 03:13:31AM +0100, Marek Olšák wrote:
> [...]
> Yes, good idea.
>
> Marek

Btw, as a follow-up on this, I did some experiments with TTM and
eviction. Blocking any VRAM eviction improves average fps (20-30%) and
minimum fps (40-60%), but it diminishes maximum fps (100%). Overall,
blocking eviction just makes the framerate more consistent. [...]

Cheers,
Jerome
Re: [Mesa-dev] [PATCH] r600g: fix abysmal performance in Reaction Quake
On Wed, Oct 31, 2012 at 8:05 PM, Marek Olšák <mar...@gmail.com> wrote:
> The problem was we set VRAM|GTT for relocations of STATIC resources.
> Setting just VRAM increases the framerate 4 times on my machine.
>
> I rewrote the switch statement and adjusted the domains for window
> framebuffers too.

Reviewed-by: Alex Deucher <alexander.deuc...@amd.com>

Stable branches?

---
 src/gallium/drivers/r600/r600_buffer.c  | 42 ++++++++++++++-----------
 src/gallium/drivers/r600/r600_texture.c |  3 ++-
 2 files changed, 24 insertions(+), 21 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_buffer.c b/src/gallium/drivers/r600/r600_buffer.c
index f4566ee..116ab51 100644
--- a/src/gallium/drivers/r600/r600_buffer.c
+++ b/src/gallium/drivers/r600/r600_buffer.c
@@ -206,29 +206,31 @@ bool r600_init_resource(struct r600_screen *rscreen,
 {
 	uint32_t initial_domain, domains;
 
-	/* Staging resources particpate in transfers and blits only
-	 * and are used for uploads and downloads from regular
-	 * resources. We generate them internally for some transfers.
-	 */
-	if (usage == PIPE_USAGE_STAGING) {
+	switch(usage) {
+	case PIPE_USAGE_STAGING:
+		/* Staging resources participate in transfers, i.e. are used
+		 * for uploads and downloads from regular resources.
+		 * We generate them internally for some transfers.
+		 */
+		initial_domain = RADEON_DOMAIN_GTT;
 		domains = RADEON_DOMAIN_GTT;
+		break;
+	case PIPE_USAGE_DYNAMIC:
+	case PIPE_USAGE_STREAM:
+		/* Default to GTT, but allow the memory manager to move it to VRAM. */
 		initial_domain = RADEON_DOMAIN_GTT;
-	} else {
 		domains = RADEON_DOMAIN_GTT | RADEON_DOMAIN_VRAM;
-
-		switch(usage) {
-		case PIPE_USAGE_DYNAMIC:
-		case PIPE_USAGE_STREAM:
-		case PIPE_USAGE_STAGING:
-			initial_domain = RADEON_DOMAIN_GTT;
-			break;
-		case PIPE_USAGE_DEFAULT:
-		case PIPE_USAGE_STATIC:
-		case PIPE_USAGE_IMMUTABLE:
-		default:
-			initial_domain = RADEON_DOMAIN_VRAM;
-			break;
-		}
+		break;
+	case PIPE_USAGE_DEFAULT:
+	case PIPE_USAGE_STATIC:
+	case PIPE_USAGE_IMMUTABLE:
+	default:
+		/* Don't list GTT here, because the memory manager would put some
+		 * resources to GTT no matter what the initial domain is.
+		 * Not listing GTT in the domains improves performance a lot. */
+		initial_domain = RADEON_DOMAIN_VRAM;
+		domains = RADEON_DOMAIN_VRAM;
+		break;
 	}
 
 	res->buf = rscreen->ws->buffer_create(rscreen->ws, size, alignment, bind, initial_domain);
diff --git a/src/gallium/drivers/r600/r600_texture.c b/src/gallium/drivers/r600/r600_texture.c
index 785eeff..2df390d 100644
--- a/src/gallium/drivers/r600/r600_texture.c
+++ b/src/gallium/drivers/r600/r600_texture.c
@@ -421,9 +421,10 @@ r600_texture_create_object(struct pipe_screen *screen,
 			return NULL;
 		}
 	} else if (buf) {
+		/* This is usually the window framebuffer. We want it in VRAM, always. */
 		resource->buf = buf;
 		resource->cs_buf = rscreen->ws->buffer_get_cs_handle(buf);
-		resource->domains = RADEON_DOMAIN_GTT | RADEON_DOMAIN_VRAM;
+		resource->domains = RADEON_DOMAIN_VRAM;
 	}
 
 	if (rtex->cmask_size) {
-- 
1.7.9.5
Re: [Mesa-dev] [PATCH] r600g: fix abysmal performance in Reaction Quake
On Wed, Oct 31, 2012 at 8:05 PM, Marek Olšák <mar...@gmail.com> wrote:
> The problem was we set VRAM|GTT for relocations of STATIC resources.
> Setting just VRAM increases the framerate 4 times on my machine.
>
> I rewrote the switch statement and adjusted the domains for window
> framebuffers too.
>
> [...]

Reviewed-by: Jerome Glisse <jgli...@redhat.com>
Re: [Mesa-dev] [PATCH] r600g: fix abysmal performance in Reaction Quake
On Thu, Nov 1, 2012 at 2:13 AM, Alex Deucher <alexdeuc...@gmail.com> wrote:
> On Wed, Oct 31, 2012 at 8:05 PM, Marek Olšák <mar...@gmail.com> wrote:
> > The problem was we set VRAM|GTT for relocations of STATIC resources.
> > Setting just VRAM increases the framerate 4 times on my machine.
> >
> > I rewrote the switch statement and adjusted the domains for window
> > framebuffers too.
>
> Reviewed-by: Alex Deucher <alexander.deuc...@amd.com>
>
> Stable branches?

Yes, good idea.

Marek