Re: [Mesa-dev] [PATCH 00/13] Use the constant engine in radeonsi
On Thu, Apr 14, 2016 at 8:29 PM, Nicolai Hähnlewrote: > On 13.04.2016 20:34, Bas Nieuwenhuizen wrote: >> >> This series implements updating descriptors using the constant engine. >> This should result in a 0%-3% improvement for CPU bound applications, >> as we only have to upload the change descriptors from the CPU. > > > Thanks again for looking at this! > >> There are very slight performance advantages on the GPU too, as the >> CE uploads the data directly to the L2 cache. This series reduces the >> number of cycles waited on LGKM counters by about 1/6th across multiple >> applications. However this translates in almost negligible performance >> improvements. >> >> Only the amdgpu winsys supports this for now. I think the radeon >> winsys can be extended to support it, but I don't have any radeon >> hardware so it is probably better that someone else implements that. >> >> This series does not include support for partially updating >> uniforms. My gallium interface proof of concept became kind of messy >> and the constant engine patches have been requested for descriptors. > > > Interesting coincidence that Jakob wrote an email about "Rework uniform > storage" on the same day... > > Two related general observations on the series: > > 1. The CE also has a LOAD_CONST_RAM command. Instead of writing all > descriptors to the CE IB at CS begin, perhaps you could save space and CPU > time by using that for the first upload per CS? Though it also might make > sense to have this depend on the dirty_mask population count. > > 2. It seems like it should be possible to get rid of the list_dirty flag > now, but I actually don't think that's true, because AFAICS doing that on > top of this series will force a re-upload of all descriptors on every new > CS, which is clearly bad. Still, logically there really shouldn't be a > list_dirty _and_ a dirty_mask. > > So how about a dirty_mask and a ce_ram_dirty flag? Do not set the dirty_mask > on begin_new_cs, and use the dirty_mask to gate descriptor uploads. Set the > ce_ram_dirty flag on begin_new_cs, and use it to either (a) override the > dirty_mask as appropriate for the first descriptor upload in each CS or (b) > emit a LOAD_CONST_RAM command. > > I'd appreciate if you could address these points - I don't feel too strongly > about the LOAD_CONST_RAM optimization, but having both list_dirty and > dirty_mask is weird. The AMDGPU kernel driver support the preamble CE IB, which is conditionally executed before the CE IB. It was designed to skip CE RAM initialization at the beginning of IBs. The usage should be: The preamble CE IB should initialize CE RAM from memory (LOAD_CONST_RAM). The main CE & DE IBs should assume CE RAM is initialized. The kernel driver doesn't schedule the preamble CE IB if the command submission immediately follows a previous submission from the same context. Marek ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 00/13] Use the constant engine in radeonsi
On 13.04.2016 20:34, Bas Nieuwenhuizen wrote: This series implements updating descriptors using the constant engine. This should result in a 0%-3% improvement for CPU bound applications, as we only have to upload the change descriptors from the CPU. Thanks again for looking at this! There are very slight performance advantages on the GPU too, as the CE uploads the data directly to the L2 cache. This series reduces the number of cycles waited on LGKM counters by about 1/6th across multiple applications. However this translates in almost negligible performance improvements. Only the amdgpu winsys supports this for now. I think the radeon winsys can be extended to support it, but I don't have any radeon hardware so it is probably better that someone else implements that. This series does not include support for partially updating uniforms. My gallium interface proof of concept became kind of messy and the constant engine patches have been requested for descriptors. Interesting coincidence that Jakob wrote an email about "Rework uniform storage" on the same day... Two related general observations on the series: 1. The CE also has a LOAD_CONST_RAM command. Instead of writing all descriptors to the CE IB at CS begin, perhaps you could save space and CPU time by using that for the first upload per CS? Though it also might make sense to have this depend on the dirty_mask population count. 2. It seems like it should be possible to get rid of the list_dirty flag now, but I actually don't think that's true, because AFAICS doing that on top of this series will force a re-upload of all descriptors on every new CS, which is clearly bad. Still, logically there really shouldn't be a list_dirty _and_ a dirty_mask. So how about a dirty_mask and a ce_ram_dirty flag? Do not set the dirty_mask on begin_new_cs, and use the dirty_mask to gate descriptor uploads. Set the ce_ram_dirty flag on begin_new_cs, and use it to either (a) override the dirty_mask as appropriate for the first descriptor upload in each CS or (b) emit a LOAD_CONST_RAM command. I'd appreciate if you could address these points - I don't feel too strongly about the LOAD_CONST_RAM optimization, but having both list_dirty and dirty_mask is weird. Thanks, Nicolai Bas Nieuwenhuizen (10): winsys/amdgpu: Enlarge const IB size. radeonsi: Create CE IB. radeonsi: Add dirty_mask to descriptor list. radeonsi: Add CE packet definitions. radeonsi: Add CE synchronization. radeonsi: Allocate chunks of CE ram. radeonsi: Add CE uploader. radeonsi: Use CE for vertex buffers. gallium/util: Add u_bit_scan_consecutive_range64. radeonsi: Use CE for all descriptors. Marek Olšák (3): gallium/radeon: move ring_type into winsyses winsys/amdgpu: split IB data into a new structure in preparation for CE winsys/amdgpu: add support for const IB src/gallium/auxiliary/util/u_math.h | 8 ++ src/gallium/drivers/radeon/r600_pipe_common.c | 1 + src/gallium/drivers/radeon/r600_pipe_common.h | 1 + src/gallium/drivers/radeon/radeon_winsys.h| 19 +++- src/gallium/drivers/radeonsi/si_descriptors.c | 149 +- src/gallium/drivers/radeonsi/si_hw_context.c | 4 +- src/gallium/drivers/radeonsi/si_pipe.c| 18 src/gallium/drivers/radeonsi/si_pipe.h| 6 ++ src/gallium/drivers/radeonsi/si_state.h | 4 + src/gallium/drivers/radeonsi/si_state_draw.c | 24 + src/gallium/drivers/radeonsi/sid.h| 6 ++ src/gallium/winsys/amdgpu/drm/amdgpu_bo.c | 5 - src/gallium/winsys/amdgpu/drm/amdgpu_bo.h | 6 ++ src/gallium/winsys/amdgpu/drm/amdgpu_cs.c | 125 ++--- src/gallium/winsys/amdgpu/drm/amdgpu_cs.h | 26 +++-- src/gallium/winsys/radeon/drm/radeon_drm_cs.c | 10 +- src/gallium/winsys/radeon/drm/radeon_drm_cs.h | 1 + 17 files changed, 326 insertions(+), 87 deletions(-) ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 00/13] Use the constant engine in radeonsi
This series implements updating descriptors using the constant engine. This should result in a 0%-3% improvement for CPU bound applications, as we only have to upload the change descriptors from the CPU. There are very slight performance advantages on the GPU too, as the CE uploads the data directly to the L2 cache. This series reduces the number of cycles waited on LGKM counters by about 1/6th across multiple applications. However this translates in almost negligible performance improvements. Only the amdgpu winsys supports this for now. I think the radeon winsys can be extended to support it, but I don't have any radeon hardware so it is probably better that someone else implements that. This series does not include support for partially updating uniforms. My gallium interface proof of concept became kind of messy and the constant engine patches have been requested for descriptors. Bas Nieuwenhuizen (10): winsys/amdgpu: Enlarge const IB size. radeonsi: Create CE IB. radeonsi: Add dirty_mask to descriptor list. radeonsi: Add CE packet definitions. radeonsi: Add CE synchronization. radeonsi: Allocate chunks of CE ram. radeonsi: Add CE uploader. radeonsi: Use CE for vertex buffers. gallium/util: Add u_bit_scan_consecutive_range64. radeonsi: Use CE for all descriptors. Marek Olšák (3): gallium/radeon: move ring_type into winsyses winsys/amdgpu: split IB data into a new structure in preparation for CE winsys/amdgpu: add support for const IB src/gallium/auxiliary/util/u_math.h | 8 ++ src/gallium/drivers/radeon/r600_pipe_common.c | 1 + src/gallium/drivers/radeon/r600_pipe_common.h | 1 + src/gallium/drivers/radeon/radeon_winsys.h| 19 +++- src/gallium/drivers/radeonsi/si_descriptors.c | 149 +- src/gallium/drivers/radeonsi/si_hw_context.c | 4 +- src/gallium/drivers/radeonsi/si_pipe.c| 18 src/gallium/drivers/radeonsi/si_pipe.h| 6 ++ src/gallium/drivers/radeonsi/si_state.h | 4 + src/gallium/drivers/radeonsi/si_state_draw.c | 24 + src/gallium/drivers/radeonsi/sid.h| 6 ++ src/gallium/winsys/amdgpu/drm/amdgpu_bo.c | 5 - src/gallium/winsys/amdgpu/drm/amdgpu_bo.h | 6 ++ src/gallium/winsys/amdgpu/drm/amdgpu_cs.c | 125 ++--- src/gallium/winsys/amdgpu/drm/amdgpu_cs.h | 26 +++-- src/gallium/winsys/radeon/drm/radeon_drm_cs.c | 10 +- src/gallium/winsys/radeon/drm/radeon_drm_cs.h | 1 + 17 files changed, 326 insertions(+), 87 deletions(-) -- 2.8.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev