from:"Niels Ole Salscheider"

Re: [Mesa-dev] [PATCH] st/clover: Define __OPENCL_VERSION__ on the device side

2016-09-10 Thread Niels Ole Salscheider

On Wednesday, 31 August 2016, 15:53:05 CEST, Serge Martin wrote:
> On Wednesday 31 August 2016 12:39:23 Vedran Miletić wrote:
> > On 08/28/2016 04:42 PM, Niels Ole Salscheider wrote:
> > > This is required by the OpenCL standard.
> > > 
> > > Signed-off-by: Niels Ole Salscheider <niels_...@salscheider-online.de>
> > 
> > Reviewed-by: Vedran Miletić <ved...@miletic.net>
> > 
> > Good catch. Do we miss more defines from [1]?
> 
> I think __IMAGE_SUPPORT__ and __EMBEDDED_PROFILE__ should be managed by
> Clover too, but neither of them would ever be defined with our current
> feature level, so this is OK.
> 
> I think __ENDIAN_LITTLE__ is missing.
> 
> Anyway, adding some piglit tests would be nice :)

I have posted a patch with a piglit test. Can somebody push this for me?

> Serge
> 
> > Regards,
> > Vedran
> > 
> > [1]
> > https://www.khronos.org/registry/cl/sdk/1.1/docs/man/xhtml/preprocessorDirectives.html
> > 
> > > ---
> > > 
> > >  src/gallium/state_trackers/clover/llvm/invocation.cpp | 3 +++
> > >  1 file changed, 3 insertions(+)
> > > 
> > > diff --git a/src/gallium/state_trackers/clover/llvm/invocation.cpp b/src/gallium/state_trackers/clover/llvm/invocation.cpp
> > > index 5490d72..b5e8b52 100644
> > > --- a/src/gallium/state_trackers/clover/llvm/invocation.cpp
> > > +++ b/src/gallium/state_trackers/clover/llvm/invocation.cpp
> > > @@ -153,6 +153,9 @@ namespace {
> > >    // Add libclc include
> > >    c.getPreprocessorOpts().Includes.push_back("clc/clc.h");
> > > 
> > > +  // Add definition for the OpenCL version
> > > +  c.getPreprocessorOpts().addMacroDef("__OPENCL_VERSION__=110");
> > > +
> > >    // clc.h requires that this macro be defined:
> > >    c.getPreprocessorOpts().addMacroDef("cl_clang_storage_class_specifiers");
> > >    c.getPreprocessorOpts().addRemappedFile(


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] st/clover: Define __OPENCL_VERSION__ on the device side

2016-08-28 Thread Niels Ole Salscheider

This is required by the OpenCL standard.

Signed-off-by: Niels Ole Salscheider <niels_...@salscheider-online.de>
---
 src/gallium/state_trackers/clover/llvm/invocation.cpp | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/gallium/state_trackers/clover/llvm/invocation.cpp b/src/gallium/state_trackers/clover/llvm/invocation.cpp
index 5490d72..b5e8b52 100644
--- a/src/gallium/state_trackers/clover/llvm/invocation.cpp
+++ b/src/gallium/state_trackers/clover/llvm/invocation.cpp
@@ -153,6 +153,9 @@ namespace {
   // Add libclc include
   c.getPreprocessorOpts().Includes.push_back("clc/clc.h");
 
+  // Add definition for the OpenCL version
+  c.getPreprocessorOpts().addMacroDef("__OPENCL_VERSION__=110");
+
   // clc.h requires that this macro be defined:
   c.getPreprocessorOpts().addMacroDef("cl_clang_storage_class_specifiers");
   c.getPreprocessorOpts().addRemappedFile(
-- 
2.9.3
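
For context, here is a minimal sketch of how code typically consumes this macro. It is written as plain C rather than OpenCL C so that it builds on its own. The fallback define is an assumption that stands in for the value the patch passes to the preprocessor, and select_code_path is a hypothetical name, not something from clover or libclc.

```c
/* Stand-in for the definition the patch injects through
 * addMacroDef("__OPENCL_VERSION__=110"). An OpenCL C compiler would
 * provide it; this fallback is only an assumption for host builds. */
#ifndef __OPENCL_VERSION__
#define __OPENCL_VERSION__ 110
#endif

/* Hypothetical helper: reports which code path was compiled in. */
static int select_code_path(void)
{
#if __OPENCL_VERSION__ >= 110
    return 1;   /* OpenCL 1.1 or newer */
#else
    return 0;   /* OpenCL 1.0 fallback */
#endif
}
```

Without the definition, the `#if` above would evaluate the undefined macro as 0 and silently pick the fallback branch, which is the kind of behaviour kernels relying on the standard macro would run into.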


[Mesa-dev] [PATCH] winsys/radeon: Do not deinit the pb cache if it was not initialized

2016-01-29 Thread Niels Ole Salscheider

This fixes a crash in pb_cache_release_all_buffers.

Signed-off-by: Niels Ole Salscheider <niels_...@salscheider-online.de>
---
 src/gallium/winsys/radeon/drm/radeon_drm_winsys.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_winsys.c b/src/gallium/winsys/radeon/drm/radeon_drm_winsys.c
index 8a1ed3a..4823bf3 100644
--- a/src/gallium/winsys/radeon/drm/radeon_drm_winsys.c
+++ b/src/gallium/winsys/radeon/drm/radeon_drm_winsys.c
@@ -742,7 +742,7 @@ radeon_drm_winsys_create(int fd, radeon_screen_create_t screen_create)
     ws->fd = dup(fd);
 
     if (!do_winsys_init(ws))
-        goto fail;
+        goto fail1;
 
     pb_cache_init(&ws->bo_cache, 50, 2.0f, 0,
                   MIN2(ws->info.vram_size, ws->info.gart_size),
@@ -812,8 +812,9 @@ radeon_drm_winsys_create(int fd, radeon_screen_create_t screen_create)
     return &ws->base;
 
 fail:
-    pipe_mutex_unlock(fd_tab_mutex);
     pb_cache_deinit(&ws->bo_cache);
+fail1:
+    pipe_mutex_unlock(fd_tab_mutex);
     if (ws->surf_man)
         radeon_surface_manager_free(ws->surf_man);
     if (ws->fd >= 0)
-- 
2.7.0
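
The patch relies on the staged-label cleanup idiom: each failure jumps to the label that undoes only what has been set up so far. The following is a simplified, self-contained sketch of that idiom, not the winsys code. All names (cache, winsys_create, fail_early, fail_late) are hypothetical, and the assert in cache_deinit stands in for the crash the patch avoids.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct cache { bool initialized; };
struct winsys { struct cache bo_cache; };

static int cache_deinit_calls;   /* counts deinit calls, for illustration */

static void cache_init(struct cache *c) { c->initialized = true; }

static void cache_deinit(struct cache *c)
{
    /* Tearing down a cache that was never set up is the bug being avoided. */
    assert(c->initialized);
    c->initialized = false;
    cache_deinit_calls++;
}

/* fail_early simulates a failure before the cache exists,
 * fail_late a failure after it has been initialized. */
static struct winsys *winsys_create(bool fail_early, bool fail_late)
{
    struct winsys *ws = calloc(1, sizeof(*ws));
    if (!ws)
        return NULL;

    if (fail_early)
        goto fail1;              /* skip the cache teardown */

    cache_init(&ws->bo_cache);

    if (fail_late)
        goto fail;               /* cache exists, so tear it down */

    return ws;

fail:
    cache_deinit(&ws->bo_cache);
fail1:
    free(ws);
    return NULL;
}
```

The labels are ordered so that later failures fall through the earlier cleanup steps, which keeps every teardown action in exactly one place.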


[Mesa-dev] [PATCH] configure: Link against all LLVM targets when building clover

2015-01-24 Thread Niels Ole Salscheider

Since 8e7df519bd8556591794b2de08a833a67e34d526, we initialise all targets in
clover. This fixes bug 85380.

v2: Mention correct bug in commit message

Signed-off-by: Niels Ole Salscheider <niels_...@salscheider-online.de>
---
 configure.ac | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/configure.ac b/configure.ac
index 1cce517..2b7f576 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1902,7 +1902,7 @@ if test "x$enable_gallium_llvm" = xyes; then
     fi
 
     if test "x$enable_opencl" = xyes; then
-        LLVM_COMPONENTS="${LLVM_COMPONENTS} ipo linker instrumentation"
+        LLVM_COMPONENTS="${LLVM_COMPONENTS} all-targets ipo linker instrumentation"
         # LLVM 3.3 >= 177971 requires IRReader
         if $LLVM_CONFIG --components | grep -qw 'irreader'; then
             LLVM_COMPONENTS="${LLVM_COMPONENTS} irreader"
-- 
2.2.1


Re: [Mesa-dev] [PATCH] configure: Link against all LLVM targets when building clover

2015-01-24 Thread Niels Ole Salscheider

On Saturday 24 January 2015, 18:24:16, Jan Vesely wrote:
> On Sat, 2015-01-24 at 22:49 +0100, Niels Ole Salscheider wrote:
> > Since 8e7df519bd8556591794b2de08a833a67e34d526, we initialise all targets
> > in clover. This fixes bug 85380.
> > 
> > v2: Mention correct bug in commit message
> > 
> > Signed-off-by: Niels Ole Salscheider <niels_...@salscheider-online.de>
> 
> I thought you already had Tom's rb.
> you can add mine as well
> Reviewed-by: Jan Vesely <jan.ves...@rutgers.edu>

OK, thanks. But I do not have write access to Mesa. Would you mind pushing it 
for me?

 
> > ---
> > 
> >  configure.ac | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/configure.ac b/configure.ac
> > index 1cce517..2b7f576 100644
> > --- a/configure.ac
> > +++ b/configure.ac
> > @@ -1902,7 +1902,7 @@ if test "x$enable_gallium_llvm" = xyes; then
> >      fi
> >  
> >      if test "x$enable_opencl" = xyes; then
> > -        LLVM_COMPONENTS="${LLVM_COMPONENTS} ipo linker instrumentation"
> > +        LLVM_COMPONENTS="${LLVM_COMPONENTS} all-targets ipo linker instrumentation"
> >          # LLVM 3.3 >= 177971 requires IRReader
> >          if $LLVM_CONFIG --components | grep -qw 'irreader'; then
> >              LLVM_COMPONENTS="${LLVM_COMPONENTS} irreader"
> 
> --
> Jan Vesely <jan.ves...@rutgers.edu>


Re: [Mesa-dev] [PATCH] configure: Link against all LLVM targets when building clover

2015-01-22 Thread Niels Ole Salscheider

On Thursday 22 January 2015, 13:46:14, Jan Vesely wrote:
 On Thu, 2015-01-22 at 16:45 +, Emil Velikov wrote:
  On 15/01/15 21:38, Tom Stellard wrote:
   On Thu, Jan 15, 2015 at 07:25:56PM +0100, Niels Ole Salscheider wrote:
   Since 8e7df519bd8556591794b2de08a833a67e34d526, we initialise all
   targets in clover. This fixes bug 85189.
   
   Signed-off-by: Niels Ole Salscheider niels_...@salscheider-online.de
   
   Reviewed-by: Tom Stellard thomas.stell...@amd.com
  
  Hi Niels,
  
  Can you confirm if this is needed for the 10.4 branch ? The commit
  mentioned got in the 10.4 devel cycle.
  
  Also the bug mentioned
  (https://bugs.freedesktop.org/show_bug.cgi?id=85189) seems to have
  alternative fix which is already in master. I take that this fix is
  required when building with static llvm ?
 
 the patch looks like it fixes
 https://bugs.freedesktop.org/show_bug.cgi?id=85380
 instead of 85189

Yes, Jan is right. This patch fixes bug 85380 instead of 85189 - this was 
probably a copy-paste error.

This patch is relevant for the 10.4 branch, too, since commit 
8e7df519bd8556591794b2de08a833a67e34d526 is in it.

Ole

 jan
 
  Thanks
  Emil
  


[Mesa-dev] [PATCH] configure: Link against all LLVM targets when building clover

2015-01-15 Thread Niels Ole Salscheider

Since 8e7df519bd8556591794b2de08a833a67e34d526, we initialise all targets in
clover. This fixes bug 85189.

Signed-off-by: Niels Ole Salscheider <niels_...@salscheider-online.de>
---
 configure.ac | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/configure.ac b/configure.ac
index c72fe92..1761c32 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1732,7 +1732,7 @@ if test "x$enable_gallium_llvm" = xyes; then
     fi
 
     if test "x$enable_opencl" = xyes; then
-        LLVM_COMPONENTS="${LLVM_COMPONENTS} ipo linker instrumentation"
+        LLVM_COMPONENTS="${LLVM_COMPONENTS} all-targets ipo linker instrumentation"
         # LLVM 3.3 >= 177971 requires IRReader
         if $LLVM_CONFIG --components | grep -qw 'irreader'; then
             LLVM_COMPONENTS="${LLVM_COMPONENTS} irreader"
-- 
2.2.1


Re: [Mesa-dev] [PATCH v2] r600: Use DMA transfers in r600_copy_global_buffer

2014-09-28 Thread Niels Ole Salscheider

On Sunday 28 September 2014, 17:44:53, Bruno Jimenez wrote:
> Hi,
> 
> Sorry for not answering until now, but I have had some personal issues
> (changing university, moving to another city...)
> 
> As you said, this is used from clover's resource::copy, which is used by
> clEnqueueCopyBuffer if I remember correctly (and understand clover
> correctly). If it doesn't regress any piglit test then it has my R-b :)
> 
> Thanks a lot!
> Bruno

Hi,

no problem, I have been a bit busy with my thesis anyway (I have to hand it in 
on Tuesday)...

You are right, it is used from clEnqueueCopyBuffer - and it does not regress 
any piglit tests for me.
Can someone with write access please push this?

Thanks,

Ole

 On Mon, 2014-09-08 at 20:10 +0200, Niels Ole Salscheider wrote:
  v2: Do not demote items that are already in the pool
  
  Signed-off-by: Niels Ole Salscheider niels_...@salscheider-online.de
  ---
  
   src/gallium/drivers/r600/evergreen_compute.h |  1 +
   src/gallium/drivers/r600/r600_blit.c | 59
    2 files changed, 43 insertions(+), 17
   deletions(-)
  
  diff --git a/src/gallium/drivers/r600/evergreen_compute.h
  b/src/gallium/drivers/r600/evergreen_compute.h index 4fb53a1..e4d3a38
  100644
  --- a/src/gallium/drivers/r600/evergreen_compute.h
  +++ b/src/gallium/drivers/r600/evergreen_compute.h
  @@ -45,6 +45,7 @@ void evergreen_init_atom_start_compute_cs(struct
  r600_context *rctx); 
   void evergreen_init_compute_state_functions(struct r600_context *rctx);
   void evergreen_emit_cs_shader(struct r600_context *rctx, struct r600_atom
   * atom); 
  +struct r600_resource* r600_compute_buffer_alloc_vram(struct r600_screen
  *screen, unsigned size); 
   struct pipe_resource *r600_compute_global_buffer_create(struct
   pipe_screen *screen, const struct pipe_resource *templ); void
   r600_compute_global_buffer_destroy(struct pipe_screen *screen, struct
   pipe_resource *res); void *r600_compute_global_transfer_map(
  
  diff --git a/src/gallium/drivers/r600/r600_blit.c
  b/src/gallium/drivers/r600/r600_blit.c index f766e37..b334a75 100644
  --- a/src/gallium/drivers/r600/r600_blit.c
  +++ b/src/gallium/drivers/r600/r600_blit.c
  @@ -21,6 +21,8 @@
  
* USE OR OTHER DEALINGS IN THE SOFTWARE.
*/
   
   #include r600_pipe.h
  
  +#include compute_memory_pool.h
  +#include evergreen_compute.h
  
   #include util/u_surface.h
   #include util/u_format.h
   #include evergreend.h
  
  @@ -514,29 +516,52 @@ static void r600_copy_buffer(struct pipe_context
  *ctx, struct pipe_resource *dst 
* into a single global resource (r600_screen::global_pool).  The means
* they don't have their own cs_buf handle, so they cannot be passed
* to r600_copy_buffer() and must be handled separately.
  
  - *
  - * XXX: It should be possible to implement this function using
  - * r600_copy_buffer() by passing the memory_pool resource as both src
  - * and dst and updating dstx and src_box to point to the correct offsets.
  - * This would likely perform better than the current implementation.
  
*/
   
   static void r600_copy_global_buffer(struct pipe_context *ctx,
   
  struct pipe_resource *dst, unsigned
  dstx, struct pipe_resource *src,
  const struct pipe_box *src_box)
   
   {
  
  -   struct pipe_box dst_box; struct pipe_transfer *src_pxfer,
  -   *dst_pxfer;
  -
  -   u_box_1d(dstx, src_box-width, dst_box);
  -   void *src_ptr = ctx-transfer_map(ctx, src, 0, PIPE_TRANSFER_READ,
  - src_box, src_pxfer);
  -   void *dst_ptr = ctx-transfer_map(ctx, dst, 0, PIPE_TRANSFER_WRITE,
  - dst_box, dst_pxfer);
  -   memcpy(dst_ptr, src_ptr, src_box-width);
  -
  -   ctx-transfer_unmap(ctx, src_pxfer);
  -   ctx-transfer_unmap(ctx, dst_pxfer);
  +   struct r600_context *rctx = (struct r600_context*)ctx;
  +   struct compute_memory_pool *pool = rctx-screen-global_pool;
  +   struct pipe_box new_src_box = *src_box;
  +
  +   if (src-bind  PIPE_BIND_GLOBAL) {
  +   struct r600_resource_global *rsrc =
  +   (struct r600_resource_global *)src;
  +   struct compute_memory_item *item = rsrc-chunk;
  +
  +   if (is_item_in_pool(item)) {
  +   new_src_box.x += 4 * item-start_in_dw;
  +   src = (struct pipe_resource *)pool-bo;
  +   } else {
  +   if (item-real_buffer == NULL) {
  +   item-real_buffer = (struct r600_resource*)
  +   
  r600_compute_buffer_alloc_vram(pool-screen,
  +  
  item-size_in_dw * 4);
  +   }
  +   src = (struct pipe_resource*)item-real_buffer;
  +   }
  +   }
  +   if (dst-bind  PIPE_BIND_GLOBAL) {
  +   struct r600_resource_global *rdst

Re: [Mesa-dev] [PATCH 1/4] radeonsi/compute: directly emit CONTEXT_CONTROL

2014-09-22 Thread Niels Ole Salscheider

On Monday 22 September 2014, 12:16:13, Alex Deucher wrote:
> On Sat, Sep 20, 2014 at 6:11 AM, Marek Olšák <mar...@gmail.com> wrote:
> > From: Marek Olšák <marek.ol...@amd.com>
> 
> Looks good.  Tom should probably take a look as well.  As a further
> improvement, it would be nice to be able to use the compute rings for
> compute rather than gfx, but I'm not sure how much additional effort
> it would take to clean that up.

This is completely untested, but now that we can detect compute contexts, 
something like the attached patches might be sufficient...

> Reviewed-by: Alex Deucher <alexander.deuc...@amd.com>
 
  ---
  
   src/gallium/drivers/radeonsi/si_compute.c | 6 +-
   1 file changed, 5 insertions(+), 1 deletion(-)
  
  diff --git a/src/gallium/drivers/radeonsi/si_compute.c
  b/src/gallium/drivers/radeonsi/si_compute.c index 4b2662d..3ad9182 100644
  --- a/src/gallium/drivers/radeonsi/si_compute.c
  +++ b/src/gallium/drivers/radeonsi/si_compute.c
  @@ -168,6 +168,7 @@ static void si_launch_grid(
  
  uint32_t pc, const void *input)
   
   {
   
  struct si_context *sctx = (struct si_context*)ctx;
  
  +   struct radeon_winsys_cs *cs = sctx-b.rings.gfx.cs;
  
  struct si_compute *program = sctx-cs_shader_state.program;
  struct si_pm4_state *pm4 = CALLOC_STRUCT(si_pm4_state);
  struct r600_resource *input_buffer = program-input_buffer;
  
  @@ -184,8 +185,11 @@ static void si_launch_grid(
  
  unsigned lds_blocks;
  unsigned num_waves_for_scratch;
  
  +   radeon_emit(cs, PKT3(PKT3_CONTEXT_CONTROL, 1, 0) |
  PKT3_SHADER_TYPE_S(1)); +   radeon_emit(cs, 0x8000);
  +   radeon_emit(cs, 0x8000);
  +
  
  pm4-compute_pkt = true;
  
  -   si_cmd_context_control(pm4);
  
  si_pm4_cmd_begin(pm4, PKT3_EVENT_WRITE);
  si_pm4_cmd_add(pm4, EVENT_TYPE(EVENT_TYPE_CACHE_FLUSH) |
  
  --
  1.9.1
  
From 9714d3ee55ee0ddb0bcf63934b552df641b866a2 Mon Sep 17 00:00:00 2001
From: Niels Ole Salscheider <niels_...@salscheider-online.de>
Date: Mon, 22 Sep 2014 19:41:20 +0200
Subject: [PATCH 1/2] radeon: submit compute packets to the compute ring

They have been submitted to the gfx ring since
764502b481e2288cb5e751de739253fdee886e3e.

Signed-off-by: Niels Ole Salscheider <niels_...@salscheider-online.de>
---
 src/gallium/drivers/radeon/r600_pipe_common.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c b/src/gallium/drivers/radeon/r600_pipe_common.c
index ae203b6..0d9ce17 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.c
+++ b/src/gallium/drivers/radeon/r600_pipe_common.c
@@ -174,6 +174,9 @@ static void r600_flush_from_st(struct pipe_context *ctx,
 	if (flags & PIPE_FLUSH_END_OF_FRAME)
 		rflags |= RADEON_FLUSH_END_OF_FRAME;
 
+	if (rctx->flags & R600_CONTEXT_FLAG_COMPUTE)
+		rflags |= RADEON_FLUSH_COMPUTE;
+
 	if (rctx->rings.dma.cs) {
 		rctx->rings.dma.flush(rctx, rflags, NULL);
 	}
-- 
2.1.0
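
As an illustration of the flag translation this hunk performs, here is a standalone sketch that maps state-tracker and context flags onto winsys flush flags. The bit values and the names build_flush_flags, CTX_FLAG_COMPUTE, FLUSH_* and PIPE_END_OF_FRAME are assumptions chosen for the example; the real definitions live in the driver headers and may differ.

```c
#include <stdint.h>

/* Hypothetical bit assignments, for illustration only. */
#define PIPE_END_OF_FRAME   (1u << 0)   /* flag from the state tracker */
#define CTX_FLAG_COMPUTE    (1u << 0)   /* flag stored in the context  */
#define FLUSH_END_OF_FRAME  (1u << 0)   /* flags passed to the winsys  */
#define FLUSH_COMPUTE       (1u << 1)

/* Combine both inputs into the flush flags handed to the winsys. */
static uint32_t build_flush_flags(uint32_t pipe_flags, uint32_t ctx_flags)
{
    uint32_t rflags = 0;

    if (pipe_flags & PIPE_END_OF_FRAME)
        rflags |= FLUSH_END_OF_FRAME;

    /* The new part: a compute context requests the compute ring. */
    if (ctx_flags & CTX_FLAG_COMPUTE)
        rflags |= FLUSH_COMPUTE;

    return rflags;
}
```

The two inputs use separate bit namespaces, so each is tested against its own mask before a bit is set in the result.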

From e578f9c067de68e9401f798a78c1ed785ceb1137 Mon Sep 17 00:00:00 2001
From: Niels Ole Salscheider <niels_...@salscheider-online.de>
Date: Mon, 22 Sep 2014 19:57:52 +0200
Subject: [PATCH 2/2] r600: set R600_CONTEXT_FLAG_COMPUTE in compute_emit_cs

Signed-off-by: Niels Ole Salscheider <niels_...@salscheider-online.de>
---
 src/gallium/drivers/r600/evergreen_compute.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/r600/evergreen_compute.c b/src/gallium/drivers/r600/evergreen_compute.c
index 38b78c7..03118e1 100644
--- a/src/gallium/drivers/r600/evergreen_compute.c
+++ b/src/gallium/drivers/r600/evergreen_compute.c
@@ -420,7 +420,9 @@ static void compute_emit_cs(struct r600_context *ctx, const uint *block_layout,
 	 */
 	r600_emit_command_buffer(cs, &ctx->start_compute_cs_cmd);
 
-	ctx->b.flags |= R600_CONTEXT_WAIT_3D_IDLE | R600_CONTEXT_FLUSH_AND_INV;
+	ctx->b.flags |= R600_CONTEXT_WAIT_3D_IDLE |
+	  R600_CONTEXT_FLUSH_AND_INV |
+	  R600_CONTEXT_FLAG_COMPUTE;
 	r600_flush_emit(ctx);
 
 	/* Emit colorbuffers. */
@@ -485,7 +487,8 @@ static void compute_emit_cs(struct r600_context *ctx, const uint *block_layout,
 	 */
 	ctx->b.flags |= R600_CONTEXT_INV_CONST_CACHE |
 		  R600_CONTEXT_INV_VERTEX_CACHE |
-	  R600_CONTEXT_INV_TEX_CACHE;
+	  R600_CONTEXT_INV_TEX_CACHE |
+	  R600_CONTEXT_FLAG_COMPUTE;
 	r600_flush_emit(ctx);
 	ctx->b.flags = 0;
 
-- 
2.1.0


Re: [Mesa-dev] [PATCH] r600: Use DMA transfers in r600_copy_global_buffer

2014-09-09 Thread Niels Ole Salscheider

On Tuesday 09 September 2014, 11:40:49, Bruno Jimenez wrote:
> On Mon, 2014-09-08 at 18:30 +0200, Niels Ole Salscheider wrote:
> > On Monday 08 September 2014, 15:19:15, Bruno Jimenez wrote:
> > > Hi,
> > > 
> > > I'm not sure if this will work. Imagine this case:
> > > 
> > > We have an item in the pool, and we want to use
> > > r600_resource_copy_region with it, for example because we want to demote
> > > it. This will call r600_copy_global_buffer, and with your patch it will
> > > call r600_compute_global_demote_or_alloc, which will again call
> > > compute_memory_demote_item causing an infinite cycle.
> > 
> > I think this will not be a problem because neither the pool bo nor the
> > real_buffer will have the PIPE_BIND_GLOBAL flag. Therefore,
> > r600_compute_global_demote_or_alloc will not be called again.
> 
> Hi,
> 
> You are completely right, for a moment I thought that the resources
> associated with the items also had the PIPE_BIND_GLOBAL flag.
> 
> Then I think that this code isn't truly necessary, as every call to
> resource_copy_region related with compute items is done to the
> r600_resources directly without touching the global resources.
> 
> > > Also, why are you reassigning src and dst in r600_copy_global_buffer?
> > 
> > For r600, resources with PIPE_BIND_GLOBAL are not real resources but only
> > correspond to items in the compute pool. There they can either have the
> > real_buffer bo when they should be mapped or be part of the pool bo.
> > Therefore the pipe_resources have to be reassigned accordingly.
> 
> You are right again. I'm not thinking clearly lately, sorry.
> 
> > I am however not sure if it is really necessary to demote the item from
> > the pool before copying data to it. Otherwise it would be possible to
> > directly access the pool bo if the item is already in it.
> 
> I hope that it isn't necessary to demote the items for this. But, as I
> have said, resource_copy_region isn't called with r600_resource_globals
> (as far as I know)

Yes, I sent an updated patch to the list yesterday that does not demote 
the item.

This code is used, though. resource_copy_region is called from clover's 
resource::copy with global compute resources as arguments.

> Hopefully, I haven't said any other dumb thing.
> 
> Thanks!
> Bruno
> 
> > > - Bruno
   
   On Sun, 2014-09-07 at 18:32 +0200, Niels Ole Salscheider wrote:
Signed-off-by: Niels Ole Salscheider niels_...@salscheider-online.de
---

 src/gallium/drivers/r600/evergreen_compute.c | 27 ---
 src/gallium/drivers/r600/evergreen_compute.h |  1 +
 src/gallium/drivers/r600/r600_blit.c | 40
  3 files changed, 41 insertions(+), 27
 deletions(-)

diff --git a/src/gallium/drivers/r600/evergreen_compute.c
b/src/gallium/drivers/r600/evergreen_compute.c index 38b78c7..b495868
100644
--- a/src/gallium/drivers/r600/evergreen_compute.c
+++ b/src/gallium/drivers/r600/evergreen_compute.c
@@ -953,6 +953,22 @@ void r600_compute_global_buffer_destroy(

free(res);
 
 }

+void r600_compute_global_demote_or_alloc(
+   struct compute_memory_pool *pool,
+   struct compute_memory_item *item,
+   struct pipe_context *ctx)
+{
+   if (is_item_in_pool(item)) {
+   compute_memory_demote_item(pool, item, ctx);
+   } else {
+   if (item-real_buffer == NULL) {
+   item-real_buffer = (struct r600_resource*)
+   
r600_compute_buffer_alloc_vram(pool-screen, item-
  
  size_in_dw * 4);
  
+   }
+   }
+
+}
+

 void *r600_compute_global_transfer_map(
 
struct pipe_context *ctx_,
struct pipe_resource *resource,

@@ -970,16 +986,7 @@ void *r600_compute_global_transfer_map(

struct pipe_resource *dst = NULL;
unsigned offset = box-x;

-   if (is_item_in_pool(item)) {
-   compute_memory_demote_item(pool, item, ctx_);
-   }
-   else {
-   if (item-real_buffer == NULL) {
-   item-real_buffer = (struct r600_resource*)
-   
r600_compute_buffer_alloc_vram(pool-screen, item-
  
  size_in_dw * 4);
  
-   }
-   }
-
+   r600_compute_global_demote_or_alloc(pool, item, ctx_);

dst = (struct pipe_resource*)item-real_buffer;

if (usage  PIPE_TRANSFER_READ)

diff --git a/src/gallium/drivers/r600/evergreen_compute.h
b/src/gallium/drivers/r600/evergreen_compute.h index 4fb53a1..39bb854
100644
--- a/src/gallium/drivers/r600/evergreen_compute.h
+++ b/src/gallium/drivers/r600/evergreen_compute.h
@@ -47,6 +47,7 @@ void evergreen_emit_cs_shader(struct r600_context
*rctx,
struct r600_atom * atom

 struct pipe_resource

Re: [Mesa-dev] [PATCH] r600: Use DMA transfers in r600_copy_global_buffer

2014-09-08 Thread Niels Ole Salscheider

On Monday 08 September 2014, 15:19:15, Bruno Jimenez wrote:
> Hi,
> 
> I'm not sure if this will work. Imagine this case:
> 
> We have an item in the pool, and we want to use
> r600_resource_copy_region with it, for example because we want to demote
> it. This will call r600_copy_global_buffer, and with your patch it will
> call r600_compute_global_demote_or_alloc, which will again call
> compute_memory_demote_item causing an infinite cycle.

I think this will not be a problem because neither the pool bo nor the 
real_buffer will have the PIPE_BIND_GLOBAL flag. Therefore, 
r600_compute_global_demote_or_alloc will not be called again.

> Also, why are you reassigning src and dst in r600_copy_global_buffer?

For r600, resources with PIPE_BIND_GLOBAL are not real resources but only 
correspond to items in the compute pool. There they can either have the 
real_buffer bo when they should be mapped or be part of the pool bo. 
Therefore the pipe_resources have to be reassigned accordingly.

I am however not sure if it is really necessary to demote the item from the 
pool before copying data to it. Otherwise it would be possible to directly 
access the pool bo if the item is already in it.
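
The reassignment described above can be sketched outside the driver as follows. This is a simplified model, not the r600 code: struct buffer, struct item, resolve and copy_global are hypothetical names, BIND_GLOBAL stands in for PIPE_BIND_GLOBAL, and lazy allocation of the real buffer is left out. A resource carrying the flag is replaced by its backing buffer (either the pool bo at the item's offset, or its real buffer), and because neither backing buffer carries the flag, the copy cannot re-enter the global path.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BIND_GLOBAL 0x1u   /* hypothetical stand-in for PIPE_BIND_GLOBAL */

struct buffer {
    unsigned bind;         /* bind flags */
    uint8_t *data;         /* backing storage, NULL for global wrappers */
};

struct item {
    struct buffer base;         /* carries BIND_GLOBAL, owns no storage */
    int in_pool;                /* non-zero if the item lives in the pool */
    unsigned start_in_dw;       /* offset inside the pool, in 32-bit words */
    struct buffer *real_buffer; /* used while the item is outside the pool */
};

/* Map a resource to the buffer that really holds its data and
 * adjust the byte offset accordingly. */
static struct buffer *resolve(struct buffer *res, struct buffer *pool_bo,
                              unsigned *offset)
{
    if (!(res->bind & BIND_GLOBAL))
        return res;                          /* already a real buffer */

    struct item *it = (struct item *)res;    /* base is the first member */
    if (it->in_pool) {
        *offset += 4 * it->start_in_dw;      /* words to bytes */
        return pool_bo;
    }
    return it->real_buffer;
}

static void copy_global(struct buffer *pool_bo,
                        struct buffer *dst, unsigned dstx,
                        struct buffer *src, unsigned srcx, size_t size)
{
    dst = resolve(dst, pool_bo, &dstx);
    src = resolve(src, pool_bo, &srcx);
    /* memmove, since both sides may end up inside the same pool bo */
    memmove(dst->data + dstx, src->data + srcx, size);
}
```

In this model an item that is already in the pool is accessed in place through the pool bo, so no demotion is needed before the copy.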

 - Bruno
 
 On Sun, 2014-09-07 at 18:32 +0200, Niels Ole Salscheider wrote:
  Signed-off-by: Niels Ole Salscheider niels_...@salscheider-online.de
  ---
  
   src/gallium/drivers/r600/evergreen_compute.c | 27 ---
   src/gallium/drivers/r600/evergreen_compute.h |  1 +
   src/gallium/drivers/r600/r600_blit.c | 40
    3 files changed, 41 insertions(+), 27
   deletions(-)
  
  diff --git a/src/gallium/drivers/r600/evergreen_compute.c
  b/src/gallium/drivers/r600/evergreen_compute.c index 38b78c7..b495868
  100644
  --- a/src/gallium/drivers/r600/evergreen_compute.c
  +++ b/src/gallium/drivers/r600/evergreen_compute.c
  @@ -953,6 +953,22 @@ void r600_compute_global_buffer_destroy(
  
  free(res);
   
   }
  
  +void r600_compute_global_demote_or_alloc(
  +   struct compute_memory_pool *pool,
  +   struct compute_memory_item *item,
  +   struct pipe_context *ctx)
  +{
  +   if (is_item_in_pool(item)) {
  +   compute_memory_demote_item(pool, item, ctx);
  +   } else {
  +   if (item-real_buffer == NULL) {
  +   item-real_buffer = (struct r600_resource*)
  +   
  r600_compute_buffer_alloc_vram(pool-screen, item-
size_in_dw * 4);
  +   }
  +   }
  +
  +}
  +
  
   void *r600_compute_global_transfer_map(
   
  struct pipe_context *ctx_,
  struct pipe_resource *resource,
  
  @@ -970,16 +986,7 @@ void *r600_compute_global_transfer_map(
  
  struct pipe_resource *dst = NULL;
  unsigned offset = box-x;
  
  -   if (is_item_in_pool(item)) {
  -   compute_memory_demote_item(pool, item, ctx_);
  -   }
  -   else {
  -   if (item-real_buffer == NULL) {
  -   item-real_buffer = (struct r600_resource*)
  -   
  r600_compute_buffer_alloc_vram(pool-screen, item-
size_in_dw * 4);
  -   }
  -   }
  -
  +   r600_compute_global_demote_or_alloc(pool, item, ctx_);
  
  dst = (struct pipe_resource*)item-real_buffer;
  
  if (usage  PIPE_TRANSFER_READ)
  
  diff --git a/src/gallium/drivers/r600/evergreen_compute.h
  b/src/gallium/drivers/r600/evergreen_compute.h index 4fb53a1..39bb854
  100644
  --- a/src/gallium/drivers/r600/evergreen_compute.h
  +++ b/src/gallium/drivers/r600/evergreen_compute.h
  @@ -47,6 +47,7 @@ void evergreen_emit_cs_shader(struct r600_context *rctx,
  struct r600_atom * atom 
   struct pipe_resource *r600_compute_global_buffer_create(struct
   pipe_screen *screen, const struct pipe_resource *templ); void
   r600_compute_global_buffer_destroy(struct pipe_screen *screen, struct
   pipe_resource *res); 
  +void r600_compute_global_demote_or_alloc(struct compute_memory_pool
  *pool, struct compute_memory_item *item, struct pipe_context *ctx); 
   void *r600_compute_global_transfer_map(
   
  struct pipe_context *ctx_,
  struct pipe_resource *resource,
  
  diff --git a/src/gallium/drivers/r600/r600_blit.c
  b/src/gallium/drivers/r600/r600_blit.c index f766e37..f6471cb 100644
  --- a/src/gallium/drivers/r600/r600_blit.c
  +++ b/src/gallium/drivers/r600/r600_blit.c
  @@ -21,6 +21,8 @@
  
* USE OR OTHER DEALINGS IN THE SOFTWARE.
*/
   
   #include r600_pipe.h
  
  +#include compute_memory_pool.h
  +#include evergreen_compute.h
  
   #include util/u_surface.h
   #include util/u_format.h
   #include evergreend.h
  
  @@ -514,29 +516,33 @@ static void r600_copy_buffer(struct pipe_context
  *ctx, struct pipe_resource *dst 
* into a single global resource (r600_screen::global_pool).  The means
* they don't have their own cs_buf handle, so they cannot be passed
* to r600_copy_buffer() and must be handled separately.
  
  - *
  - * XXX: It should be possible to implement this function using

[Mesa-dev] [PATCH v2] r600: Use DMA transfers in r600_copy_global_buffer

2014-09-08 Thread Niels Ole Salscheider

v2: Do not demote items that are already in the pool

Signed-off-by: Niels Ole Salscheider <niels_...@salscheider-online.de>
---
 src/gallium/drivers/r600/evergreen_compute.h |  1 +
 src/gallium/drivers/r600/r600_blit.c | 59 
 2 files changed, 43 insertions(+), 17 deletions(-)

diff --git a/src/gallium/drivers/r600/evergreen_compute.h b/src/gallium/drivers/r600/evergreen_compute.h
index 4fb53a1..e4d3a38 100644
--- a/src/gallium/drivers/r600/evergreen_compute.h
+++ b/src/gallium/drivers/r600/evergreen_compute.h
@@ -45,6 +45,7 @@ void evergreen_init_atom_start_compute_cs(struct r600_context *rctx);
 void evergreen_init_compute_state_functions(struct r600_context *rctx);
 void evergreen_emit_cs_shader(struct r600_context *rctx, struct r600_atom * atom);
 
+struct r600_resource* r600_compute_buffer_alloc_vram(struct r600_screen *screen, unsigned size);
 struct pipe_resource *r600_compute_global_buffer_create(struct pipe_screen *screen, const struct pipe_resource *templ);
 void r600_compute_global_buffer_destroy(struct pipe_screen *screen, struct pipe_resource *res);
 void *r600_compute_global_transfer_map(
diff --git a/src/gallium/drivers/r600/r600_blit.c b/src/gallium/drivers/r600/r600_blit.c
index f766e37..b334a75 100644
--- a/src/gallium/drivers/r600/r600_blit.c
+++ b/src/gallium/drivers/r600/r600_blit.c
@@ -21,6 +21,8 @@
  * USE OR OTHER DEALINGS IN THE SOFTWARE.
  */
 #include "r600_pipe.h"
+#include "compute_memory_pool.h"
+#include "evergreen_compute.h"
 #include "util/u_surface.h"
 #include "util/u_format.h"
 #include "evergreend.h"
@@ -514,29 +516,52 @@ static void r600_copy_buffer(struct pipe_context *ctx, struct pipe_resource *dst
  * into a single global resource (r600_screen::global_pool).  The means
  * they don't have their own cs_buf handle, so they cannot be passed
  * to r600_copy_buffer() and must be handled separately.
- *
- * XXX: It should be possible to implement this function using
- * r600_copy_buffer() by passing the memory_pool resource as both src
- * and dst and updating dstx and src_box to point to the correct offsets.
- * This would likely perform better than the current implementation.
  */
 static void r600_copy_global_buffer(struct pipe_context *ctx,
 				    struct pipe_resource *dst, unsigned
 				    dstx, struct pipe_resource *src,
 				    const struct pipe_box *src_box)
 {
-	struct pipe_box dst_box; struct pipe_transfer *src_pxfer,
-	*dst_pxfer;
-
-	u_box_1d(dstx, src_box->width, &dst_box);
-	void *src_ptr = ctx->transfer_map(ctx, src, 0, PIPE_TRANSFER_READ,
-					  src_box, &src_pxfer);
-	void *dst_ptr = ctx->transfer_map(ctx, dst, 0, PIPE_TRANSFER_WRITE,
-					  &dst_box, &dst_pxfer);
-	memcpy(dst_ptr, src_ptr, src_box->width);
-
-	ctx->transfer_unmap(ctx, src_pxfer);
-	ctx->transfer_unmap(ctx, dst_pxfer);
+	struct r600_context *rctx = (struct r600_context*)ctx;
+	struct compute_memory_pool *pool = rctx->screen->global_pool;
+	struct pipe_box new_src_box = *src_box;
+
+	if (src->bind & PIPE_BIND_GLOBAL) {
+		struct r600_resource_global *rsrc =
+			(struct r600_resource_global *)src;
+		struct compute_memory_item *item = rsrc->chunk;
+
+		if (is_item_in_pool(item)) {
+			new_src_box.x += 4 * item->start_in_dw;
+			src = (struct pipe_resource *)pool->bo;
+		} else {
+			if (item->real_buffer == NULL) {
+				item->real_buffer = (struct r600_resource*)
+					r600_compute_buffer_alloc_vram(pool->screen,
+								       item->size_in_dw * 4);
+			}
+			src = (struct pipe_resource*)item->real_buffer;
+		}
+	}
+	if (dst->bind & PIPE_BIND_GLOBAL) {
+		struct r600_resource_global *rdst =
+			(struct r600_resource_global *)dst;
+		struct compute_memory_item *item = rdst->chunk;
+
+		if (is_item_in_pool(item)) {
+			dstx += 4 * item->start_in_dw;
+			dst = (struct pipe_resource *)pool->bo;
+		} else {
+			if (item->real_buffer == NULL) {
+				item->real_buffer = (struct r600_resource*)
+					r600_compute_buffer_alloc_vram(pool->screen,
+								       item->size_in_dw * 4);
+			}
+			dst = (struct pipe_resource*)item->real_buffer;
+		}
+	}
+
+	r600_copy_buffer(ctx, dst, dstx, src, &new_src_box);
 }
 
 static void r600_clear_buffer(struct

[Mesa-dev] [PATCH] r600: Use DMA transfers in r600_copy_global_buffer

2014-09-07 Thread Niels Ole Salscheider

Signed-off-by: Niels Ole Salscheider <niels_...@salscheider-online.de>
---
 src/gallium/drivers/r600/evergreen_compute.c | 27 ---
 src/gallium/drivers/r600/evergreen_compute.h |  1 +
 src/gallium/drivers/r600/r600_blit.c | 40 
 3 files changed, 41 insertions(+), 27 deletions(-)

diff --git a/src/gallium/drivers/r600/evergreen_compute.c 
b/src/gallium/drivers/r600/evergreen_compute.c
index 38b78c7..b495868 100644
--- a/src/gallium/drivers/r600/evergreen_compute.c
+++ b/src/gallium/drivers/r600/evergreen_compute.c
@@ -953,6 +953,22 @@ void r600_compute_global_buffer_destroy(
free(res);
 }
 
+void r600_compute_global_demote_or_alloc(
+   struct compute_memory_pool *pool,
+   struct compute_memory_item *item,
+   struct pipe_context *ctx)
+{
+	if (is_item_in_pool(item)) {
+		compute_memory_demote_item(pool, item, ctx);
+	} else {
+		if (item->real_buffer == NULL) {
+			item->real_buffer = (struct r600_resource*)
+				r600_compute_buffer_alloc_vram(pool->screen, item->size_in_dw * 4);
+		}
+	}
+
+}
+
 void *r600_compute_global_transfer_map(
struct pipe_context *ctx_,
struct pipe_resource *resource,
@@ -970,16 +986,7 @@ void *r600_compute_global_transfer_map(
 	struct pipe_resource *dst = NULL;
 	unsigned offset = box->x;
 
-	if (is_item_in_pool(item)) {
-		compute_memory_demote_item(pool, item, ctx_);
-	}
-	else {
-		if (item->real_buffer == NULL) {
-			item->real_buffer = (struct r600_resource*)
-				r600_compute_buffer_alloc_vram(pool->screen, item->size_in_dw * 4);
-		}
-	}
-
+	r600_compute_global_demote_or_alloc(pool, item, ctx_);
 	dst = (struct pipe_resource*)item->real_buffer;
 
 	if (usage & PIPE_TRANSFER_READ)
diff --git a/src/gallium/drivers/r600/evergreen_compute.h 
b/src/gallium/drivers/r600/evergreen_compute.h
index 4fb53a1..39bb854 100644
--- a/src/gallium/drivers/r600/evergreen_compute.h
+++ b/src/gallium/drivers/r600/evergreen_compute.h
@@ -47,6 +47,7 @@ void evergreen_emit_cs_shader(struct r600_context *rctx, struct r600_atom * atom
 
 struct pipe_resource *r600_compute_global_buffer_create(struct pipe_screen *screen, const struct pipe_resource *templ);
 void r600_compute_global_buffer_destroy(struct pipe_screen *screen, struct pipe_resource *res);
+void r600_compute_global_demote_or_alloc(struct compute_memory_pool *pool, struct compute_memory_item *item, struct pipe_context *ctx);
 void *r600_compute_global_transfer_map(
struct pipe_context *ctx_,
struct pipe_resource *resource,
diff --git a/src/gallium/drivers/r600/r600_blit.c 
b/src/gallium/drivers/r600/r600_blit.c
index f766e37..f6471cb 100644
--- a/src/gallium/drivers/r600/r600_blit.c
+++ b/src/gallium/drivers/r600/r600_blit.c
@@ -21,6 +21,8 @@
  * USE OR OTHER DEALINGS IN THE SOFTWARE.
  */
 #include "r600_pipe.h"
+#include "compute_memory_pool.h"
+#include "evergreen_compute.h"
 #include "util/u_surface.h"
 #include "util/u_format.h"
 #include "evergreend.h"
@@ -514,29 +516,33 @@ static void r600_copy_buffer(struct pipe_context *ctx, 
struct pipe_resource *dst
  * into a single global resource (r600_screen::global_pool).  The means
  * they don't have their own cs_buf handle, so they cannot be passed
  * to r600_copy_buffer() and must be handled separately.
- *
- * XXX: It should be possible to implement this function using
- * r600_copy_buffer() by passing the memory_pool resource as both src
- * and dst and updating dstx and src_box to point to the correct offsets.
- * This would likely perform better than the current implementation.
  */
 static void r600_copy_global_buffer(struct pipe_context *ctx,
struct pipe_resource *dst, unsigned
dstx, struct pipe_resource *src,
const struct pipe_box *src_box)
 {
-	struct pipe_box dst_box; struct pipe_transfer *src_pxfer,
-	*dst_pxfer;
-
-	u_box_1d(dstx, src_box->width, &dst_box);
-	void *src_ptr = ctx->transfer_map(ctx, src, 0, PIPE_TRANSFER_READ,
-					  src_box, &src_pxfer);
-	void *dst_ptr = ctx->transfer_map(ctx, dst, 0, PIPE_TRANSFER_WRITE,
-					  &dst_box, &dst_pxfer);
-	memcpy(dst_ptr, src_ptr, src_box->width);
-
-	ctx->transfer_unmap(ctx, src_pxfer);
-	ctx->transfer_unmap(ctx, dst_pxfer);
+	struct r600_context *rctx = (struct r600_context*)ctx;
+	struct compute_memory_pool *pool = rctx->screen->global_pool;
+
+	if (src->bind & PIPE_BIND_GLOBAL) {
+		struct r600_resource_global *rsrc =
+			(struct r600_resource_global *)src;
+		struct compute_memory_item *item = rsrc->chunk

[Mesa-dev] [PATCH] gallium/radeon: Do not use u_upload_mgr for buffer downloads

2014-08-14 Thread Niels Ole Salscheider

Instead, create a staging buffer with pipe_buffer_create and
PIPE_USAGE_STAGING.

u_upload_mgr sets the usage of its staging buffer to PIPE_USAGE_STREAM.
But since 150ac07b855b5c5f879bf6ce9ca421ccd1a6c938, CPU -> GPU streaming
buffers are created in VRAM. A staging buffer from u_upload_mgr therefore
ends up in VRAM and does not offer any performance improvement for buffer
downloads.

Signed-off-by: Niels Ole Salscheider <niels_...@salscheider-online.de>
---
 src/gallium/drivers/radeon/r600_buffer_common.c | 20 
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/src/gallium/drivers/radeon/r600_buffer_common.c 
b/src/gallium/drivers/radeon/r600_buffer_common.c
index 22bc97e..ee05776 100644
--- a/src/gallium/drivers/radeon/r600_buffer_common.c
+++ b/src/gallium/drivers/radeon/r600_buffer_common.c
@@ -303,26 +303,22 @@ static void *r600_buffer_transfer_map(struct pipe_context *ctx,
 	    !(usage & PIPE_TRANSFER_WRITE) &&
 	    rbuffer->domains == RADEON_DOMAIN_VRAM &&
 	    r600_can_dma_copy_buffer(rctx, 0, box->x, box->width)) {
-		unsigned offset;
-		struct r600_resource *staging = NULL;
-
-		u_upload_alloc(rctx->uploader, 0,
-			       box->width + (box->x % R600_MAP_BUFFER_ALIGNMENT),
-			       &offset, (struct pipe_resource**)&staging, (void**)&data);
+		struct r600_resource *staging;
 
+		staging = (struct r600_resource*) pipe_buffer_create(
+				ctx->screen, PIPE_BIND_TRANSFER_READ, PIPE_USAGE_STAGING,
+				box->width + (box->x % R600_MAP_BUFFER_ALIGNMENT));
 		if (staging) {
-			data += box->x % R600_MAP_BUFFER_ALIGNMENT;
-
 			/* Copy the VRAM buffer to the staging buffer. */
 			rctx->dma_copy(ctx, &staging->b.b, 0,
-				       offset + box->x % R600_MAP_BUFFER_ALIGNMENT,
+				       box->x % R600_MAP_BUFFER_ALIGNMENT,
 				       0, 0, resource, level, box);
 
-			/* Just do the synchronization. The buffer is mapped already. */
-			r600_buffer_map_sync_with_rings(rctx, staging, PIPE_TRANSFER_READ);
+			data = r600_buffer_map_sync_with_rings(rctx, staging, PIPE_TRANSFER_READ);
+			data += box->x % R600_MAP_BUFFER_ALIGNMENT;
 
 			return r600_buffer_get_transfer(ctx, resource, level, usage, box,
-							ptransfer, data, staging, offset);
+							ptransfer, data, staging, 0);
 		}
 	}
 
-- 
2.0.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
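
The offset arithmetic in the patch above can be sketched in isolation. The staging buffer is sized as the requested width plus the start offset modulo the map alignment, and the returned pointer is advanced by that same remainder, so the mapped range keeps the alignment it had in the source buffer. This is only an illustration: staging_layout_for, struct staging_layout and the value 64 for MAP_BUFFER_ALIGNMENT are assumptions made for the example, not names or values taken from the driver.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed value for illustration only; the driver defines the real
 * R600_MAP_BUFFER_ALIGNMENT constant. */
#define MAP_BUFFER_ALIGNMENT 64u

/* Hypothetical helper type: how a staging buffer would be laid out. */
struct staging_layout {
    size_t size;        /* bytes to allocate for the staging buffer */
    size_t data_offset; /* where the copied range starts inside it */
};

/* Compute the staging size and start offset for a download of `width`
 * bytes that begins at byte `x` of the source buffer. */
static struct staging_layout staging_layout_for(size_t x, size_t width)
{
    struct staging_layout l;

    assert(width > 0);
    l.data_offset = x % MAP_BUFFER_ALIGNMENT;
    l.size = width + l.data_offset;
    return l;
}
```

With these assumptions, a 4096-byte download starting at byte 100 gets a 4132-byte staging buffer, and the data begins 36 bytes into it.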

[Mesa-dev] [PATCH] gallium/radeon: Set gpu_address to 0 if r600_virtual_address is false

2014-08-10 Thread Niels Ole Salscheider

Without this patch I get the following during DMA transfers:
[drm:radeon_cs_ib_chunk] *ERROR* Invalid command stream !
radeon :01:00.0: CP DMA dst buffer too small (21475829792 4096)

This is a fixup for e878e154cdfd4dbb5474f776e0a6d86fcb983098.

Signed-off-by: Niels Ole Salscheider <niels_...@salscheider-online.de>
---
 src/gallium/drivers/radeon/r600_buffer_common.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/gallium/drivers/radeon/r600_buffer_common.c 
b/src/gallium/drivers/radeon/r600_buffer_common.c
index a580685..22bc97e 100644
--- a/src/gallium/drivers/radeon/r600_buffer_common.c
+++ b/src/gallium/drivers/radeon/r600_buffer_common.c
@@ -171,6 +171,8 @@ bool r600_init_resource(struct r600_common_screen *rscreen,
 
 	if (rscreen->info.r600_virtual_address)
 		res->gpu_address = rscreen->ws->buffer_get_virtual_address(res->cs_buf);
+	else
+		res->gpu_address = 0;
 
 	pb_reference(&old_buf, NULL);
 
-- 
2.0.4


Re: [Mesa-dev] [PATCH 0/2] radeon: Use the DMA engine for buffer downloads

2014-08-09 Thread Niels Ole Salscheider

On Tuesday 04 March 2014, 02:08:58, Marek Olšák wrote:
> Could you please do this without changing u_upload_mgr? You can still
> use u_upload_alloc to allocate buffer memory in the driver and the map
> buffer read/write flags are not important with persistent coherent
> buffer mappings anyway.

Since 150ac07b855b5c5f879bf6ce9ca421ccd1a6c938 we allocate CPU -> GPU
streaming buffers (i.e. those with PIPE_USAGE_STREAM) in VRAM.
We should therefore set buffer.usage to PIPE_USAGE_STAGING in
u_upload_alloc_buffer when we use u_upload_mgr for downloads; otherwise we
won't get any performance improvement.
Would it now be OK to change u_upload_mgr, or do you have a better proposal?

Ole

> Marek
>
> On Mon, Mar 3, 2014 at 9:29 PM, Niels Ole Salscheider
> <niels_...@salscheider-online.de> wrote:
> > Using the DMA engine for buffer downloads vastly improves performance.
> > CPU reads from VRAM are slow because of the high latency of the PCIe
> > bus.
> >
> > The first patch allows u_upload_mgr to be used for downloads, too. The
> > second patch then uses u_upload_mgr in the radeon driver for downloads.
> > I considered renaming u_upload_mgr to u_transfer_mgr since it might be
> > confusing that an upload manager can be used for downloads. But then
> > again we also have transfers, so u_transfer_mgr might also be
> > confusing. Thus, I decided not to rename it for now.
> >
> > Without these patches, the buffer_bandwidth benchmark from uCLbench gives
> > me:
> >
> > ./buffer_bandwidth --size=2000 --iterations=100
> > # device 0: AMD BARTS // type gpu (192 MB global memory, 64 KB constant memory, 32 KB local memory)
> >
> > 1/1 direct 2000 Bytes   759.29 MB/s(HD) 17.13 MB/s(DD) 14.61 MB/s(DH)
> >
> > With these patches, the read performance is much better:
> >
> > ./buffer_bandwidth --size=2000 --iterations=100
> > # device 0: AMD BARTS // type gpu (192 MB global memory, 64 KB constant memory, 32 KB local memory)
> >
> > 1/1 direct 2000 Bytes   759.90 MB/s(HD) 613.49 MB/s(DD) 1841.07 MB/s(DH)
> >
> > Judging by these numbers, it might even make sense to use the DMA engine
> > for larger buffer downloads...
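
To put the two benchmark runs side by side, the ratio of the new bandwidth to the old one can be computed per column. This is a small sketch only; speedup is a hypothetical helper, and reading HD, DD and DH as host-to-device, device-to-device and device-to-host is an assumption about the benchmark's labels.

```c
#include <assert.h>

/* Ratio of the bandwidth after the patches to the bandwidth before.
 * Both values are in MB/s, as printed by the benchmark. */
static double speedup(double before_mb_s, double after_mb_s)
{
    assert(before_mb_s > 0.0);
    return after_mb_s / before_mb_s;
}
```

With the figures quoted above, speedup(14.61, 1841.07) comes out at roughly 126 for the DH column and speedup(17.13, 613.49) at roughly 36 for the DD column, while the HD column (759.29 versus 759.90) is essentially unchanged.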
  
> > Niels Ole Salscheider (2):
> >   util/u_upload_mgr: Allow to also use it for downloads
> >   radeon: Use transfer manager for buffer downloads
> >
> >  src/gallium/auxiliary/hud/hud_context.c         |  3 +-
> >  src/gallium/auxiliary/util/u_blitter.c          |  3 +-
> >  src/gallium/auxiliary/util/u_upload_mgr.c       | 49 +++-
> >  src/gallium/auxiliary/util/u_upload_mgr.h       | 13 -
> >  src/gallium/auxiliary/util/u_vbuf.c             |  3 +-
> >  src/gallium/auxiliary/vl/vl_compositor.c        |  3 +-
> >  src/gallium/drivers/ilo/ilo_context.c           |  3 +-
> >  src/gallium/drivers/r300/r300_context.c         |  3 +-
> >  src/gallium/drivers/radeon/r600_buffer_common.c | 78 +++--
> >  src/gallium/drivers/radeon/r600_pipe_common.c   | 14 -
> >  src/gallium/drivers/radeon/r600_pipe_common.h   |  1 +
> >  src/mesa/state_tracker/st_context.c             |  9 ++-
> >  12 files changed, 136 insertions(+), 46 deletions(-)
> >
> > --
> > 1.9.0
  

Re: [Mesa-dev] [PATCH 1/2] clover: Call end_query before getting timestamp result

2014-07-16 Thread Niels Ole Salscheider

On Wednesday 16 July 2014, 16:49:08, Tom Stellard wrote:
> Also change the wait parameter from false to true.
> ---
>
> I'm really not sure what is correct here, but this patch fixes event
> profiling on SI.

I think you should call end_query in the constructor right after the call to
create_query. That is because you want the corresponding packet to be emitted
as soon as the query is created and not when you are interested in the results
(i.e. when the corresponding event has occurred).

>  src/gallium/state_trackers/clover/core/timestamp.cpp | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/src/gallium/state_trackers/clover/core/timestamp.cpp
> b/src/gallium/state_trackers/clover/core/timestamp.cpp index
> 481c4f9..a6edaf6 100644
> --- a/src/gallium/state_trackers/clover/core/timestamp.cpp
> +++ b/src/gallium/state_trackers/clover/core/timestamp.cpp
> @@ -47,7 +47,8 @@ cl_ulong
>  timestamp::query::operator()() const {
>     pipe_query_result result;
>
> -   if (!q().pipe->get_query_result(q().pipe, _query, false, &result))
> +   q().pipe->end_query(q().pipe, _query);
> +   if (!q().pipe->get_query_result(q().pipe, _query, true, &result))
>        throw error(CL_PROFILING_INFO_NOT_AVAILABLE);
>
>     return result.u64;
> --
> 1.8.1.5
 

Re: [Mesa-dev] [PATCH 1/2] clover: Call end_query before getting timestamp result v2

2014-07-16 Thread Niels Ole Salscheider

Reviewed-by: Niels Ole Salscheider <niels_...@salscheider-online.de>

On Wednesday 16 July 2014, 17:37:48, Tom Stellard wrote:
> v2:
>   - Move the end_query() call into the timestamp constructor.
>   - Still pass false as the wait parameter to get_query_result().
> ---
>  src/gallium/state_trackers/clover/core/timestamp.cpp | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/src/gallium/state_trackers/clover/core/timestamp.cpp
> b/src/gallium/state_trackers/clover/core/timestamp.cpp index
> 481c4f9..3fd341f 100644
> --- a/src/gallium/state_trackers/clover/core/timestamp.cpp
> +++ b/src/gallium/state_trackers/clover/core/timestamp.cpp
> @@ -30,6 +30,7 @@ using namespace clover;
>  timestamp::query::query(command_queue &q) :
>     q(q),
>     _query(q.pipe->create_query(q.pipe, PIPE_QUERY_TIMESTAMP, 0)) {
> +   q.pipe->end_query(q.pipe, _query);
>  }
>
>  timestamp::query::query(query &&other) :
> --
> 1.8.1.5


[Mesa-dev] [PATCH v2] egl/gallium: Set defines for supported APIs when using automake

2014-06-11 Thread Niels Ole Salscheider

This fixes automake builds, which have been broken since
b52a530ce2aada1967bc8fefa83ab53e6a737dae.

v2: This patch also adds back the FEATURE_* defines in targets/egl-static for
Android and SCons, which were removed in the mentioned commit.

Signed-off-by: Niels Ole Salscheider <niels_...@salscheider-online.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79885
---
 src/gallium/state_trackers/egl/Makefile.am | 20 
 src/gallium/targets/egl-static/Android.mk  |  2 ++
 src/gallium/targets/egl-static/SConscript  |  6 ++
 3 files changed, 28 insertions(+)

diff --git a/src/gallium/state_trackers/egl/Makefile.am 
b/src/gallium/state_trackers/egl/Makefile.am
index b7dcdab..828bf13 100644
--- a/src/gallium/state_trackers/egl/Makefile.am
+++ b/src/gallium/state_trackers/egl/Makefile.am
@@ -88,3 +88,23 @@ AM_CPPFLAGS += \
-I$(top_srcdir)/src/gallium/winsys/sw \
-DHAVE_NULL_BACKEND
 endif
+
+if HAVE_OPENGL
+AM_CPPFLAGS += \
+   -DFEATURE_GL=1
+endif
+
+if HAVE_OPENGL_ES1
+AM_CPPFLAGS += \
+   -DFEATURE_ES1=1
+endif
+
+if HAVE_OPENGL_ES2
+AM_CPPFLAGS += \
+   -DFEATURE_ES2=1
+endif
+
+if HAVE_OPENVG
+AM_CPPFLAGS += \
+   -DFEATURE_VG=1
+endif
diff --git a/src/gallium/targets/egl-static/Android.mk 
b/src/gallium/targets/egl-static/Android.mk
index 01408a7..37244b5 100644
--- a/src/gallium/targets/egl-static/Android.mk
+++ b/src/gallium/targets/egl-static/Android.mk
@@ -31,6 +31,8 @@ LOCAL_SRC_FILES := \
egl_st.c
 
 LOCAL_CFLAGS := \
+   -DFEATURE_ES1=1 \
+   -DFEATURE_ES2=1 \
-D_EGL_MAIN=_eglBuiltInDriverGALLIUM
 
 LOCAL_C_INCLUDES := \
diff --git a/src/gallium/targets/egl-static/SConscript 
b/src/gallium/targets/egl-static/SConscript
index 7d8d4d2..afb5c11 100644
--- a/src/gallium/targets/egl-static/SConscript
+++ b/src/gallium/targets/egl-static/SConscript
@@ -63,6 +63,11 @@ if env['platform'] == 'windows':
 
 # OpenGL ES and OpenGL
 if env['gles']:
+    env.Append(CPPDEFINES = [
+        'FEATURE_GL=1',
+        'FEATURE_ES1=1',
+        'FEATURE_ES2=1'
+    ])
     env.Prepend(LIBPATH = [shared_glapi.dir])
     # manually add LIBPREFIX on windows
     glapi_name = 'glapi' if env['platform'] != 'windows' else 'libglapi'
@@ -70,6 +75,7 @@ if env['gles']:
 
 # OpenVG
 if True:
+    env.Append(CPPDEFINES = ['FEATURE_VG=1'])
     env.Prepend(LIBPATH = [openvg.dir])
     # manually add LIBPREFIX on windows
     openvg_name = 'OpenVG' if env['platform'] != 'windows' else 'libOpenVG'
-- 
2.0.0


[Mesa-dev] [PATCH] egl/gallium: Set defines for supported APIs when using automake

2014-06-10 Thread Niels Ole Salscheider

This fixes automake builds, which have been broken since
b52a530ce2aada1967bc8fefa83ab53e6a737dae.

Signed-off-by: Niels Ole Salscheider <niels_...@salscheider-online.de>
---
 src/gallium/state_trackers/egl/Makefile.am | 20 
 1 file changed, 20 insertions(+)

diff --git a/src/gallium/state_trackers/egl/Makefile.am 
b/src/gallium/state_trackers/egl/Makefile.am
index b7dcdab..828bf13 100644
--- a/src/gallium/state_trackers/egl/Makefile.am
+++ b/src/gallium/state_trackers/egl/Makefile.am
@@ -88,3 +88,23 @@ AM_CPPFLAGS += \
-I$(top_srcdir)/src/gallium/winsys/sw \
-DHAVE_NULL_BACKEND
 endif
+
+if HAVE_OPENGL
+AM_CPPFLAGS += \
+   -DFEATURE_GL=1
+endif
+
+if HAVE_OPENGL_ES1
+AM_CPPFLAGS += \
+   -DFEATURE_ES1=1
+endif
+
+if HAVE_OPENGL_ES2
+AM_CPPFLAGS += \
+   -DFEATURE_ES2=1
+endif
+
+if HAVE_OPENVG
+AM_CPPFLAGS += \
+   -DFEATURE_VG=1
+endif
-- 
2.0.0


Re: [Mesa-dev] [PATCH] egl/gallium: Set defines for supported APIs when using automake

2014-06-10 Thread Niels Ole Salscheider

On Tuesday 10 June 2014, 16:18:56, Emil Velikov wrote:
> On 10/06/14 15:17, Niels Ole Salscheider wrote:
> > This fixes automake builds which are broken since
> > b52a530ce2aada1967bc8fefa83ab53e6a737dae.
>
> Not sure what I was smoking with the above mentioned patch.
> Seems like I've completely forgotten about automake :\
>
> Niels can you please drop the FEATURE* defines from
> src/gallium/targets/egl-static/Makefile.am

I think they are still necessary since src/gallium/targets/egl-static/egl_st.c
contains these flags, too... Or am I missing something?
I have seen that you removed them in b52a530ce2aada1967bc8fefa83ab53e6a737dae
for the other build systems...
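
As a rough illustration of why a source file that tests these flags still needs them on the command line: code guarded by a FEATURE_* macro is compiled out when the corresponding -DFEATURE_*=1 is missing. The sketch below is not taken from egl_st.c; api_bit, supported_api_mask and api_enabled are hypothetical names, and defaulting each macro to 0 is an assumption made so the example builds on its own.

```c
#include <assert.h>

/* Default every switch to 0 so the file also builds without -D flags. */
#ifndef FEATURE_GL
#define FEATURE_GL 0
#endif
#ifndef FEATURE_ES1
#define FEATURE_ES1 0
#endif
#ifndef FEATURE_ES2
#define FEATURE_ES2 0
#endif
#ifndef FEATURE_VG
#define FEATURE_VG 0
#endif

/* Hypothetical bit flags, one per client API. */
enum api_bit {
    API_GL  = 1 << 0,
    API_ES1 = 1 << 1,
    API_ES2 = 1 << 2,
    API_VG  = 1 << 3
};

/* Build the mask of APIs that were enabled at compile time. */
static unsigned supported_api_mask(void)
{
    unsigned mask = 0;
#if FEATURE_GL
    mask |= API_GL;
#endif
#if FEATURE_ES1
    mask |= API_ES1;
#endif
#if FEATURE_ES2
    mask |= API_ES2;
#endif
#if FEATURE_VG
    mask |= API_VG;
#endif
    return mask;
}

/* Check a single API bit in a mask. */
static int api_enabled(unsigned mask, enum api_bit bit)
{
    assert(bit != 0);
    return (mask & (unsigned)bit) != 0;
}
```

Compiled without any -DFEATURE_* flag, supported_api_mask() returns 0, which mirrors a build where no client API ends up being offered; adding for example -DFEATURE_GL=1 makes the API_GL bit appear.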
 
> With that fixed
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79885
> Reviewed-by: Emil Velikov <emil.l.veli...@gmail.com>
>
> Thanks
> Emil
>
> > Signed-off-by: Niels Ole Salscheider <niels_...@salscheider-online.de>
> > ---
> >
> >  src/gallium/state_trackers/egl/Makefile.am | 20
> >  1 file changed, 20 insertions(+)
> >
> > diff --git a/src/gallium/state_trackers/egl/Makefile.am
> > b/src/gallium/state_trackers/egl/Makefile.am index b7dcdab..828bf13
> > 100644
> > --- a/src/gallium/state_trackers/egl/Makefile.am
> > +++ b/src/gallium/state_trackers/egl/Makefile.am
> > @@ -88,3 +88,23 @@ AM_CPPFLAGS += \
> >
> >     -I$(top_srcdir)/src/gallium/winsys/sw \
> >     -DHAVE_NULL_BACKEND
> >
> >  endif
> >
> > +
> > +if HAVE_OPENGL
> > +AM_CPPFLAGS += \
> > +   -DFEATURE_GL=1
> > +endif
> > +
> > +if HAVE_OPENGL_ES1
> > +AM_CPPFLAGS += \
> > +   -DFEATURE_ES1=1
> > +endif
> > +
> > +if HAVE_OPENGL_ES2
> > +AM_CPPFLAGS += \
> > +   -DFEATURE_ES2=1
> > +endif
> > +
> > +if HAVE_OPENVG
> > +AM_CPPFLAGS += \
> > +   -DFEATURE_VG=1
> > +endif


[Mesa-dev] [PATCH 2/2] radeonsi: Implement DMA blit

2014-03-17 Thread Niels Ole Salscheider

This code is a slightly modified version of evergreen_dma_blit (and
evergreen_dma_copy as well as evergreen_dma_copy_tile).
It would be nice to share some of the code in the long term.

I have reused some cik-prefixed functions that also return the right
value for SI. I am not sure if they should be renamed.

Signed-off-by: Niels Ole Salscheider <niels_...@salscheider-online.de>
---
 src/gallium/drivers/radeonsi/Makefile.sources |   1 +
 src/gallium/drivers/radeonsi/si_dma.c | 352 ++
 src/gallium/drivers/radeonsi/si_pipe.h|   9 +
 src/gallium/drivers/radeonsi/si_state.c   |  25 +-
 src/gallium/drivers/radeonsi/si_state.h   |   7 +
 src/gallium/drivers/radeonsi/sid.h|  20 ++
 6 files changed, 394 insertions(+), 20 deletions(-)
 create mode 100644 src/gallium/drivers/radeonsi/si_dma.c

diff --git a/src/gallium/drivers/radeonsi/Makefile.sources 
b/src/gallium/drivers/radeonsi/Makefile.sources
index 11b3319..6a24cde 100644
--- a/src/gallium/drivers/radeonsi/Makefile.sources
+++ b/src/gallium/drivers/radeonsi/Makefile.sources
@@ -3,6 +3,7 @@ C_SOURCES := \
si_commands.c \
si_compute.c \
si_descriptors.c \
+   si_dma.c \
si_hw_context.c \
si_pipe.c \
si_pm4.c \
diff --git a/src/gallium/drivers/radeonsi/si_dma.c 
b/src/gallium/drivers/radeonsi/si_dma.c
new file mode 100644
index 000..61078eb
--- /dev/null
+++ b/src/gallium/drivers/radeonsi/si_dma.c
@@ -0,0 +1,352 @@
+/*
+ * Copyright 2010 Jerome Glisse <gli...@freedesktop.org>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * on the rights to use, copy, modify, merge, publish, distribute, sub
+ * license, and/or sell copies of the Software, and to permit persons to whom
+ * the Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHOR(S) AND/OR THEIR SUPPLIERS BE LIABLE FOR ANY CLAIM,
+ * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
+ * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
+ * USE OR OTHER DEALINGS IN THE SOFTWARE.
+ *
+ * Authors:
+ *  Jerome Glisse
+ */
+
+#include "sid.h"
+#include "si_pipe.h"
+#include "../radeon/r600_cs.h"
+
+#include "util/u_format.h"
+
+static unsigned si_array_mode(unsigned mode)
+{
+   switch (mode) {
+   case RADEON_SURF_MODE_LINEAR_ALIGNED:
+   return V_009910_ARRAY_LINEAR_ALIGNED;
+   case RADEON_SURF_MODE_1D:
+   return V_009910_ARRAY_1D_TILED_THIN1;
+   case RADEON_SURF_MODE_2D:
+   return V_009910_ARRAY_2D_TILED_THIN1;
+   default:
+   case RADEON_SURF_MODE_LINEAR:
+   return V_009910_ARRAY_LINEAR_GENERAL;
+   }
+}
+
+static uint32_t si_num_banks(uint32_t nbanks)
+{
+   switch (nbanks) {
+   case 2:
+   return V_009910_ADDR_SURF_2_BANK;
+   case 4:
+   return V_009910_ADDR_SURF_4_BANK;
+   case 8:
+   default:
+   return V_009910_ADDR_SURF_8_BANK;
+   case 16:
+   return V_009910_ADDR_SURF_16_BANK;
+   }
+}
+
+static uint32_t si_micro_tile_mode(struct si_screen *sscreen, unsigned tile_mode)
+{
+	if (sscreen->b.info.si_tile_mode_array_valid) {
+		uint32_t gb_tile_mode = sscreen->b.info.si_tile_mode_array[tile_mode];
+
+		return G_009910_MICRO_TILE_MODE(gb_tile_mode);
+	}
+
+	/* The kernel cannot return the tile mode array. Guess? */
+	return V_009910_ADDR_SURF_THIN_MICRO_TILING;
+}
+
+static void si_dma_copy_buffer(struct si_context *ctx,
+   struct pipe_resource *dst,
+   struct pipe_resource *src,
+   uint64_t dst_offset,
+   uint64_t src_offset,
+   uint64_t size)
+{
+	struct radeon_winsys_cs *cs = ctx->b.rings.dma.cs;
+	unsigned i, ncopy, csize, max_csize, sub_cmd, shift;
+	struct r600_resource *rdst = (struct r600_resource*)dst;
+	struct r600_resource *rsrc = (struct r600_resource*)src;
+
+	/* Mark the buffer range of destination as valid (initialized),
+	 * so that transfer_map knows it should wait for the GPU when mapping
+	 * that range. */
+	util_range_add(&rdst->valid_buffer_range, dst_offset,
+		       dst_offset + size);
+
+   dst_offset

[Mesa-dev] [PATCH 1/2] radeon: Move r600_need_dma_space to common code

2014-03-17 Thread Niels Ole Salscheider

Signed-off-by: Niels Ole Salscheider <niels_...@salscheider-online.de>
---
 src/gallium/drivers/r600/evergreen_hw_context.c |  2 +-
 src/gallium/drivers/r600/evergreen_state.c  |  2 +-
 src/gallium/drivers/r600/r600_hw_context.c  | 12 +---
 src/gallium/drivers/r600/r600_pipe.h|  1 -
 src/gallium/drivers/r600/r600_state.c   |  2 +-
 src/gallium/drivers/radeon/r600_pipe_common.c   | 10 ++
 src/gallium/drivers/radeon/r600_pipe_common.h   |  1 +
 7 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/src/gallium/drivers/r600/evergreen_hw_context.c 
b/src/gallium/drivers/r600/evergreen_hw_context.c
index 083b697..a433876 100644
--- a/src/gallium/drivers/r600/evergreen_hw_context.c
+++ b/src/gallium/drivers/r600/evergreen_hw_context.c
@@ -62,7 +62,7 @@ void evergreen_dma_copy(struct r600_context *rctx,
}
ncopy = (size / 0x000f) + !!(size % 0x000f);
 
-	r600_need_dma_space(rctx, ncopy * 5);
+	r600_need_dma_space(&rctx->b, ncopy * 5);
 	for (i = 0; i < ncopy; i++) {
 		csize = size < 0x000f ? size : 0x000f;
 		/* emit reloc before writting cs so that cs is always in consistent state */
diff --git a/src/gallium/drivers/r600/evergreen_state.c 
b/src/gallium/drivers/r600/evergreen_state.c
index 05cc3ef..b929f17 100644
--- a/src/gallium/drivers/r600/evergreen_state.c
+++ b/src/gallium/drivers/r600/evergreen_state.c
@@ -3295,7 +3295,7 @@ static void evergreen_dma_copy_tile(struct r600_context 
*rctx,
 
 	size = (copy_height * pitch) >> 2;
 	ncopy = (size / 0x000f) + !!(size % 0x000f);
-	r600_need_dma_space(rctx, ncopy * 9);
+	r600_need_dma_space(&rctx->b, ncopy * 9);
 
 	for (i = 0; i < ncopy; i++) {
cheight = copy_height;
diff --git a/src/gallium/drivers/r600/r600_hw_context.c 
b/src/gallium/drivers/r600/r600_hw_context.c
index 3a3b3d5..75723be 100644
--- a/src/gallium/drivers/r600/r600_hw_context.c
+++ b/src/gallium/drivers/r600/r600_hw_context.c
@@ -440,16 +440,6 @@ void r600_cp_dma_copy_buffer(struct r600_context *rctx,
 R600_CONTEXT_INV_TEX_CACHE;
 }
 
-void r600_need_dma_space(struct r600_context *ctx, unsigned num_dw)
-{
-	/* The number of dwords we already used in the DMA so far. */
-	num_dw += ctx->b.rings.dma.cs->cdw;
-	/* Flush if there's not enough space. */
-	if (num_dw > RADEON_MAX_CMDBUF_DWORDS) {
-		ctx->b.rings.dma.flush(ctx, RADEON_FLUSH_ASYNC);
-	}
-}
-
 void r600_dma_copy(struct r600_context *rctx,
struct pipe_resource *dst,
struct pipe_resource *src,
@@ -475,7 +465,7 @@ void r600_dma_copy(struct r600_context *rctx,
shift = 2;
ncopy = (size / 0x) + !!(size % 0x);
 
-	r600_need_dma_space(rctx, ncopy * 5);
+	r600_need_dma_space(&rctx->b, ncopy * 5);
 	for (i = 0; i < ncopy; i++) {
 		csize = size < 0x ? size : 0x;
 		/* emit reloc before writting cs so that cs is always in consistent state */
diff --git a/src/gallium/drivers/r600/r600_pipe.h 
b/src/gallium/drivers/r600/r600_pipe.h
index a3827e3..0472eaa 100644
--- a/src/gallium/drivers/r600/r600_pipe.h
+++ b/src/gallium/drivers/r600/r600_pipe.h
@@ -586,7 +586,6 @@ void r600_context_flush(struct r600_context *ctx, unsigned 
flags);
 void r600_begin_new_cs(struct r600_context *ctx);
 void r600_flush_emit(struct r600_context *ctx);
 void r600_need_cs_space(struct r600_context *ctx, unsigned num_dw, boolean 
count_draw_in);
-void r600_need_dma_space(struct r600_context *ctx, unsigned num_dw);
 void r600_cp_dma_copy_buffer(struct r600_context *rctx,
 struct pipe_resource *dst, uint64_t dst_offset,
 struct pipe_resource *src, uint64_t src_offset,
diff --git a/src/gallium/drivers/r600/r600_state.c 
b/src/gallium/drivers/r600/r600_state.c
index 39e38f4..6c8222b 100644
--- a/src/gallium/drivers/r600/r600_state.c
+++ b/src/gallium/drivers/r600/r600_state.c
@@ -2856,7 +2856,7 @@ static boolean r600_dma_copy_tile(struct r600_context 
*rctx,
 */
 	cheight = ((0x << 2) / pitch) & 0xfff8;
 	ncopy = (copy_height / cheight) + !!(copy_height % cheight);
-	r600_need_dma_space(rctx, ncopy * 7);
+	r600_need_dma_space(&rctx->b, ncopy * 7);
 
 	for (i = 0; i < ncopy; i++) {
 		cheight = cheight > copy_height ? copy_height : cheight;
diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c 
b/src/gallium/drivers/radeon/r600_pipe_common.c
index 05ada1c..35901c8 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.c
+++ b/src/gallium/drivers/radeon/r600_pipe_common.c
@@ -727,3 +727,13 @@ void r600_screen_clear_buffer(struct r600_common_screen 
*rscreen, struct pipe_re
 	rscreen->aux_context->flush(rscreen->aux_context, NULL, 0);
 	pipe_mutex_unlock(rscreen->aux_context_lock);
 }
+
+void r600_need_dma_space(struct

Re: [Mesa-dev] [PATCH 2/2] radeonsi: Implement DMA blit

2014-03-17 Thread Niels Ole Salscheider

I have sent an updated version of the patch to the mailing list.
I hope that the copyright header of si_dma.c is right - I copied it from 
si_hw_context.c...

Ole

On Monday 17 March 2014, 02:33:35, Marek Olšák wrote:
> Thanks for doing this! I have some comments...
>
> 1) As of SI, the maximum supported size for dword-aligned L2L, L2T,
> and T2L copies is 0x8. The maximum supported size for byte-aligned
> L2L copies is 0xfffe0. I'd like to have proper definitions for this,
> e.g. SI_DMA_COPY_MAX_SIZE and SI_DMA_COPY_MAX_SIZE_DW. All occurrences
> of 0x000f should be replaced appropriately.
>
> Now the cosmetic stuff.
>
> 2) This is quite a lot of code, so I'd like all of this to be in a
> separate file, e.g. si_dma.c.
>
> 3) r600/si_need_cs_space could be moved to drivers/radeon.
>
> 4) All calls to r600_context_bo_reloc could be moved out of the loops,
> because SI supports virtual memory and therefore it's not required to
> call the function before every packet. See also my explanation in
> patch "winsys/radeon: only add duplicate relocations for DMA if VM
> isn't supported".
>
> 5) Flushing the gfx CS is not required, because r600_context_bo_reloc
> flushes it for you.
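
The packet splitting that point 1) refers to is a rounded-up division, written in the patch as (size / max) + !!(size % max), followed by a per-packet clamp. A minimal sketch follows; dma_packet_count and dma_next_chunk are hypothetical helper names, and the per-packet limit is passed in as a parameter so that no particular hardware value is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Number of DMA packets needed to move `size` units when one packet can
 * move at most `max_per_packet` units: size / max, rounded up. */
static uint64_t dma_packet_count(uint64_t size, uint64_t max_per_packet)
{
    assert(max_per_packet > 0);
    return (size / max_per_packet) + !!(size % max_per_packet);
}

/* Size of the next packet, given how much is still left to copy. */
static uint64_t dma_next_chunk(uint64_t remaining, uint64_t max_per_packet)
{
    assert(max_per_packet > 0);
    return remaining < max_per_packet ? remaining : max_per_packet;
}
```

With a limit of 1000 units, a copy of 2500 units takes three packets of 1000, 1000 and 500 units. Using separate named limits for dword-aligned and byte-aligned copies, as suggested above, would only change the value passed as max_per_packet.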
 
> Please see also my latest DMA patches for r600g.
>
> Thanks.
>
> Marek
>
> On Thu, Mar 13, 2014 at 8:45 AM, Niels Ole Salscheider
> <niels_...@salscheider-online.de> wrote:
> > This code is a slightly modified version of evergreen_dma_blit (and
> > evergreen_dma_copy as well as evergreen_dma_copy_tile).
> > It would be nice to share some of the code in the long term.
> >
> > I have reused some cik-prefixed functions that also return the right
> > value for SI. I am not sure if they should be renamed.
> >
> > Signed-off-by: Niels Ole Salscheider <niels_...@salscheider-online.de>
> > ---
> >
> >  src/gallium/drivers/radeonsi/si_hw_context.c |  65 +++
> >  src/gallium/drivers/radeonsi/si_pipe.h       |   7 +
> >  src/gallium/drivers/radeonsi/si_state.c      | 265 ++-
> >  src/gallium/drivers/radeonsi/sid.h           |  15 ++
> >  4 files changed, 346 insertions(+), 6 deletions(-)
  
> > diff --git a/src/gallium/drivers/radeonsi/si_hw_context.c
> > b/src/gallium/drivers/radeonsi/si_hw_context.c index d9fba01..76583a3
> > 100644
> > --- a/src/gallium/drivers/radeonsi/si_hw_context.c
> > +++ b/src/gallium/drivers/radeonsi/si_hw_context.c
> > @@ -25,6 +25,8 @@
> >
> >   */
> >
> >  #include "si_pipe.h"
> >
> > +#include "sid.h"
> > +#include "../radeon/r600_cs.h"
> >
> >  /* initialize */
> >  void si_need_cs_space(struct si_context *ctx, unsigned num_dw,
> >
> > @@ -186,6 +188,69 @@ void si_begin_new_cs(struct si_context *ctx)
> >
> >         ctx->b.initial_gfx_cs_size = ctx->b.rings.gfx.cs->cdw;
> >
> >  }
  
> > +void si_need_dma_space(struct si_context *ctx, unsigned num_dw)
> > +{
> > +       /* The number of dwords we already used in the DMA so far. */
> > +       num_dw += ctx->b.rings.dma.cs->cdw;
> > +       /* Flush if there's not enough space. */
> > +       if (num_dw > RADEON_MAX_CMDBUF_DWORDS) {
> > +               ctx->b.rings.dma.flush(ctx, RADEON_FLUSH_ASYNC);
> > +       }
> > +}
> > +
> > +void si_dma_copy(struct si_context *ctx,
> > +                struct pipe_resource *dst,
> > +                struct pipe_resource *src,
> > +                uint64_t dst_offset,
> > +                uint64_t src_offset,
> > +                uint64_t size)
> > +{
> > +       struct radeon_winsys_cs *cs = ctx->b.rings.dma.cs;
> > +       unsigned i, ncopy, csize, sub_cmd, shift;
> > +       struct r600_resource *rdst = (struct r600_resource*)dst;
> > +       struct r600_resource *rsrc = (struct r600_resource*)src;
> > +
> > +       /* Mark the buffer range of destination as valid (initialized),
> > +        * so that transfer_map knows it should wait for the GPU when mapping
> > +        * that range. */
> > +       util_range_add(&rdst->valid_buffer_range, dst_offset,
> > +                      dst_offset + size);
> > +
> > +       /* make sure that the dma ring is only one active */
> > +       ctx->b.rings.gfx.flush(ctx, RADEON_FLUSH_ASYNC);
> > +       dst_offset += r600_resource_va(&ctx->screen->b.b, dst);
> > +       src_offset += r600_resource_va(&ctx->screen->b.b, src);
> > +
> > +       /* see if we use dword or byte copy */
> > +       if (!(dst_offset & 0x3) && !(src_offset & 0x3) && !(size & 0x3)) {
> > +               size >>= 2;
> > +               sub_cmd = 0x00;
> > +               shift = 2;
> > +       } else {
> > +               sub_cmd = 0x40;
> > +               shift = 0;
> > +       }
> > +       ncopy = (size / 0x000f) + !!(size % 0x000f);
> > +
> > +       si_need_dma_space(ctx, ncopy * 5);
> > +       for (i = 0; i < ncopy; i++) {
> > +               csize = size < 0x000f ? size : 0x000f;
> > +               /* emit reloc before writting cs so that cs is always in consistent state */
> > +               r600_context_bo_reloc(&ctx->b, &ctx->b.rings.dma, rsrc, RADEON_USAGE_READ,
> > +                                     RADEON_PRIO_MIN);
> > +               r600_context_bo_reloc(&ctx->b, &ctx->b.rings.dma, rdst,
> > RADEON_USAGE_WRITE

[Mesa-dev] [PATCH 1/2] radeonsi: Add DMA ring

2014-03-13 Thread Niels Ole Salscheider

Signed-off-by: Niels Ole Salscheider <niels_...@salscheider-online.de>
---
 src/gallium/drivers/radeonsi/si_hw_context.c |  3 +++
 src/gallium/drivers/radeonsi/si_pipe.c       | 22 ++
 2 files changed, 25 insertions(+)

diff --git a/src/gallium/drivers/radeonsi/si_hw_context.c b/src/gallium/drivers/radeonsi/si_hw_context.c
index c952c8d..d9fba01 100644
--- a/src/gallium/drivers/radeonsi/si_hw_context.c
+++ b/src/gallium/drivers/radeonsi/si_hw_context.c
@@ -123,6 +123,9 @@ void si_context_flush(struct si_context *ctx, unsigned flags)
 #endif
 
 	/* Flush the CS. */
+	if (ctx->b.rings.dma.cs) {
+		ctx->b.ws->cs_flush(ctx->b.rings.dma.cs, flags, 0);
+	}
 	ctx->b.ws->cs_flush(ctx->b.rings.gfx.cs, flags, 0);
 
 #if SI_TRACE_CS
diff --git a/src/gallium/drivers/radeonsi/si_pipe.c b/src/gallium/drivers/radeonsi/si_pipe.c
index 827e9fe..21cbedf 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.c
+++ b/src/gallium/drivers/radeonsi/si_pipe.c
@@ -74,6 +74,24 @@ static void si_flush_from_winsys(void *ctx, unsigned flags)
 	si_flush((struct pipe_context*)ctx, NULL, flags);
 }
 
+static void si_flush_dma_from_st(void *ctx, unsigned flags)
+{
+	struct si_context *sctx = (struct si_context *)ctx;
+	struct radeon_winsys_cs *cs = sctx->b.rings.dma.cs;
+
+	if (!cs->cdw) {
+		return;
+	}
+
+	sctx->b.ws->cs_flush(cs, flags, 0);
+}
+
+static void si_flush_dma_from_winsys(void *ctx, unsigned flags)
+{
+	struct si_context *sctx = (struct si_context *)ctx;
+	sctx->b.rings.dma.flush(sctx, flags);
+}
+
 static void si_destroy_context(struct pipe_context *context)
 {
 	struct si_context *sctx = (struct si_context *)context;
@@ -163,6 +181,10 @@ static struct pipe_context *si_create_context(struct pipe_screen *screen, void *
 
 	sctx->b.ws->cs_set_flush_callback(sctx->b.rings.gfx.cs, si_flush_from_winsys, sctx);
 
+	sctx->b.rings.dma.cs = sctx->b.ws->cs_create(sctx->b.ws, RING_DMA, NULL);
+	sctx->b.rings.dma.flush = si_flush_dma_from_st;
+	sctx->b.ws->cs_set_flush_callback(sctx->b.rings.dma.cs, si_flush_dma_from_winsys, sctx);
+
 	sctx->blitter = util_blitter_create(&sctx->b.b);
 	if (sctx->blitter == NULL)
 		goto fail;
-- 
1.9.0
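
The flush behaviour this patch introduces can be modelled outside the driver. The following is only a minimal sketch with hypothetical stand-in types (`sketch_cs`, `sketch_ring`, `sketch_context` are not Mesa structures): a ring flush callback that skips submission when nothing was emitted, and a context-wide flush that first checks whether the optional DMA ring exists.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the winsys and context types (not Mesa's). */
struct sketch_cs {
	unsigned cdw;         /* dwords emitted into the command stream so far */
	unsigned submissions; /* how many times the stream was submitted */
};

struct sketch_ring {
	struct sketch_cs *cs; /* NULL when the ring was never created */
	void (*flush)(void *ctx, unsigned flags);
};

struct sketch_context {
	struct sketch_ring dma;
};

/* Stand-in for ws->cs_flush(): submit the stream and start an empty one. */
void sketch_cs_flush(struct sketch_cs *cs, unsigned flags)
{
	(void)flags;
	cs->submissions++;
	cs->cdw = 0;
}

/* Ring flush callback: an empty command stream is not submitted. */
void sketch_flush_dma(void *ctx, unsigned flags)
{
	struct sketch_context *sctx = ctx;
	struct sketch_cs *cs = sctx->dma.cs;

	if (!cs->cdw)
		return;

	sketch_cs_flush(cs, flags);
}

/* Context-wide flush: the DMA ring is optional, so test for it first. */
void sketch_context_flush(struct sketch_context *sctx, unsigned flags)
{
	if (sctx->dma.cs) {
		assert(sctx->dma.flush != NULL);
		sctx->dma.flush(sctx, flags);
	}
}
```

Calling `sketch_context_flush()` on a context whose DMA stream is empty leaves `submissions` unchanged, which is the point of the `cdw` check in `si_flush_dma_from_st`.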


[Mesa-dev] [PATCH 2/2] radeonsi: Implement DMA blit

2014-03-13 Thread Niels Ole Salscheider

This code is a slightly modified version of evergreen_dma_blit (and
evergreen_dma_copy as well as evergreen_dma_copy_tile).
It would be nice to share some of the code in the long term.

I have reused some cik-prefixed functions that also return the right
value for SI. I am not sure if they should be renamed.

Signed-off-by: Niels Ole Salscheider <niels_...@salscheider-online.de>
---
 src/gallium/drivers/radeonsi/si_hw_context.c |  65 +++
 src/gallium/drivers/radeonsi/si_pipe.h       |   7 +
 src/gallium/drivers/radeonsi/si_state.c      | 265 ++-
 src/gallium/drivers/radeonsi/sid.h           |  15 ++
 4 files changed, 346 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_hw_context.c b/src/gallium/drivers/radeonsi/si_hw_context.c
index d9fba01..76583a3 100644
--- a/src/gallium/drivers/radeonsi/si_hw_context.c
+++ b/src/gallium/drivers/radeonsi/si_hw_context.c
@@ -25,6 +25,8 @@
  */
 
 #include "si_pipe.h"
+#include "sid.h"
+#include "../radeon/r600_cs.h"
 
 /* initialize */
 void si_need_cs_space(struct si_context *ctx, unsigned num_dw,
@@ -186,6 +188,69 @@ void si_begin_new_cs(struct si_context *ctx)
 	ctx->b.initial_gfx_cs_size = ctx->b.rings.gfx.cs->cdw;
 }
 
+void si_need_dma_space(struct si_context *ctx, unsigned num_dw)
+{
+	/* The number of dwords we already used in the DMA so far. */
+	num_dw += ctx->b.rings.dma.cs->cdw;
+	/* Flush if there's not enough space. */
+	if (num_dw > RADEON_MAX_CMDBUF_DWORDS) {
+		ctx->b.rings.dma.flush(ctx, RADEON_FLUSH_ASYNC);
+	}
+}
+
+void si_dma_copy(struct si_context *ctx,
+		 struct pipe_resource *dst,
+		 struct pipe_resource *src,
+		 uint64_t dst_offset,
+		 uint64_t src_offset,
+		 uint64_t size)
+{
+	struct radeon_winsys_cs *cs = ctx->b.rings.dma.cs;
+	unsigned i, ncopy, csize, sub_cmd, shift;
+	struct r600_resource *rdst = (struct r600_resource*)dst;
+	struct r600_resource *rsrc = (struct r600_resource*)src;
+
+	/* Mark the buffer range of destination as valid (initialized),
+	 * so that transfer_map knows it should wait for the GPU when mapping
+	 * that range. */
+	util_range_add(&rdst->valid_buffer_range, dst_offset,
+		       dst_offset + size);
+
+	/* make sure that the dma ring is only one active */
+	ctx->b.rings.gfx.flush(ctx, RADEON_FLUSH_ASYNC);
+	dst_offset += r600_resource_va(&ctx->screen->b.b, dst);
+	src_offset += r600_resource_va(&ctx->screen->b.b, src);
+
+	/* see if we use dword or byte copy */
+	if (!(dst_offset & 0x3) && !(src_offset & 0x3) && !(size & 0x3)) {
+		size >>= 2;
+		sub_cmd = 0x00;
+		shift = 2;
+	} else {
+		sub_cmd = 0x40;
+		shift = 0;
+	}
+	ncopy = (size / 0x000fffff) + !!(size % 0x000fffff);
+
+	si_need_dma_space(ctx, ncopy * 5);
+	for (i = 0; i < ncopy; i++) {
+		csize = size < 0x000fffff ? size : 0x000fffff;
+		/* emit reloc before writting cs so that cs is always in consistent state */
+		r600_context_bo_reloc(&ctx->b, &ctx->b.rings.dma, rsrc, RADEON_USAGE_READ,
+				      RADEON_PRIO_MIN);
+		r600_context_bo_reloc(&ctx->b, &ctx->b.rings.dma, rdst, RADEON_USAGE_WRITE,
+				      RADEON_PRIO_MIN);
+		cs->buf[cs->cdw++] = SI_DMA_PACKET(SI_DMA_PACKET_COPY, sub_cmd, csize);
+		cs->buf[cs->cdw++] = dst_offset & 0xffffffff;
+		cs->buf[cs->cdw++] = src_offset & 0xffffffff;
+		cs->buf[cs->cdw++] = (dst_offset >> 32UL) & 0xff;
+		cs->buf[cs->cdw++] = (src_offset >> 32UL) & 0xff;
+		dst_offset += csize << shift;
+		src_offset += csize << shift;
+		size -= csize;
+	}
+}
+
 #if SI_TRACE_CS
 void si_trace_emit(struct si_context *sctx)
 {
diff --git a/src/gallium/drivers/radeonsi/si_pipe.h b/src/gallium/drivers/radeonsi/si_pipe.h
index 47dc8e7..45def1e 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.h
+++ b/src/gallium/drivers/radeonsi/si_pipe.h
@@ -171,6 +171,13 @@ void si_decompress_color_textures(struct si_context *sctx,
 void si_context_flush(struct si_context *ctx, unsigned flags);
 void si_begin_new_cs(struct si_context *ctx);
 void si_need_cs_space(struct si_context *ctx, unsigned num_dw, boolean count_draw_in);
+void si_need_dma_space(struct si_context *ctx, unsigned num_dw);
+void si_dma_copy(struct si_context *ctx,
+		 struct pipe_resource *dst,
+		 struct pipe_resource *src,
+		 uint64_t dst_offset,
+		 uint64_t src_offset,
+		 uint64_t size);
 
 /* si_pipe.c */
 void si_flush(struct pipe_context *ctx, struct pipe_fence_handle **fence,
diff --git a/src/gallium/drivers/radeonsi/si_state.c b/src/gallium/drivers/radeonsi/si_state.c
index 3843330
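
The packet planning done by si_dma_copy (dword copy when offsets and size are all 4-byte aligned, byte copy otherwise, then one packet per chunk) can be looked at in isolation. This is a rough sketch rather than driver code: it assumes the per-packet count limit is 0x000fffff, as in evergreen_dma_copy, and the names `dma_copy_plan`, `plan_dma_copy` and `dma_chunk_count` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed per-packet count limit (taken from evergreen_dma_copy). */
#define SKETCH_DMA_MAX_COUNT 0x000fffffu

struct dma_copy_plan {
	unsigned sub_cmd; /* 0x00 = dword copy, 0x40 = byte copy */
	unsigned shift;   /* log2 of the bytes covered by one counted unit */
	uint64_t units;   /* size in dwords or bytes, depending on sub_cmd */
	unsigned ncopy;   /* number of copy packets required */
};

/* Choose dword or byte copy and compute how many packets are needed. */
struct dma_copy_plan plan_dma_copy(uint64_t dst_offset, uint64_t src_offset,
				   uint64_t size)
{
	struct dma_copy_plan p;

	if (!(dst_offset & 0x3) && !(src_offset & 0x3) && !(size & 0x3)) {
		p.units = size >> 2;
		p.sub_cmd = 0x00;
		p.shift = 2;
	} else {
		p.units = size;
		p.sub_cmd = 0x40;
		p.shift = 0;
	}

	/* Round up: a partial trailing chunk still needs its own packet. */
	p.ncopy = (unsigned)(p.units / SKETCH_DMA_MAX_COUNT) +
		  !!(p.units % SKETCH_DMA_MAX_COUNT);
	assert((uint64_t)p.ncopy * SKETCH_DMA_MAX_COUNT >= p.units);
	return p;
}

/* Count carried by the next packet, given the units still to copy. */
unsigned dma_chunk_count(uint64_t remaining)
{
	return remaining < SKETCH_DMA_MAX_COUNT ? (unsigned)remaining
						: SKETCH_DMA_MAX_COUNT;
}
```

Each packet occupies five dwords in the command stream, which is why the patch reserves `ncopy * 5` dwords before entering the loop.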

[Mesa-dev] [PATCH 1/2] radeon: Move DMA ring creation to common code

2014-03-13 Thread Niels Ole Salscheider

Signed-off-by: Niels Ole Salscheider <niels_...@salscheider-online.de>
---
 src/gallium/drivers/r600/r600_pipe.c          | 30 -
 src/gallium/drivers/r600/r600_pipe.h          |  1 -
 src/gallium/drivers/radeon/r600_pipe_common.c | 32 +++
 src/gallium/drivers/radeon/r600_pipe_common.h |  2 ++
 4 files changed, 34 insertions(+), 31 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_pipe.c b/src/gallium/drivers/r600/r600_pipe.c
index 88fbdd8..982e18d 100644
--- a/src/gallium/drivers/r600/r600_pipe.c
+++ b/src/gallium/drivers/r600/r600_pipe.c
@@ -48,7 +48,6 @@ static const struct debug_named_value r600_debug_options[] = {
 	{ "nollvm", DBG_NO_LLVM, "Disable the LLVM shader compiler" },
 #endif
 	{ "nocpdma", DBG_NO_CP_DMA, "Disable CP DMA" },
-	{ "nodma", DBG_NO_ASYNC_DMA, "Disable asynchronous DMA" },
 
 	/* shader backend */
 	{ "nosb", DBG_NO_SB, "Disable sb backend for graphics shaders" },
@@ -121,20 +120,6 @@ static void r600_flush_gfx_ring(void *ctx, unsigned flags)
 	r600_flush((struct pipe_context*)ctx, flags);
 }
 
-static void r600_flush_dma_ring(void *ctx, unsigned flags)
-{
-	struct r600_context *rctx = (struct r600_context *)ctx;
-	struct radeon_winsys_cs *cs = rctx->b.rings.dma.cs;
-
-	if (!cs->cdw) {
-		return;
-	}
-
-	rctx->b.rings.dma.flushing = true;
-	rctx->b.ws->cs_flush(cs, flags, 0);
-	rctx->b.rings.dma.flushing = false;
-}
-
 static void r600_flush_from_winsys(void *ctx, unsigned flags)
 {
 	struct r600_context *rctx = (struct r600_context *)ctx;
@@ -142,13 +127,6 @@ static void r600_flush_from_winsys(void *ctx, unsigned flags)
 	rctx->b.rings.gfx.flush(rctx, flags);
 }
 
-static void r600_flush_dma_from_winsys(void *ctx, unsigned flags)
-{
-	struct r600_context *rctx = (struct r600_context *)ctx;
-
-	rctx->b.rings.dma.flush(rctx, flags);
-}
-
 static void r600_destroy_context(struct pipe_context *context)
 {
 	struct r600_context *rctx = (struct r600_context *)context;
@@ -269,14 +247,6 @@ static struct pipe_context *r600_create_context(struct pipe_screen *screen, void
 	rctx->b.ws->cs_set_flush_callback(rctx->b.rings.gfx.cs, r600_flush_from_winsys, rctx);
 	rctx->b.rings.gfx.flushing = false;
 
-	rctx->b.rings.dma.cs = NULL;
-	if (rscreen->b.info.r600_has_dma && !(rscreen->b.debug_flags & DBG_NO_ASYNC_DMA)) {
-		rctx->b.rings.dma.cs = rctx->b.ws->cs_create(rctx->b.ws, RING_DMA, NULL);
-		rctx->b.rings.dma.flush = r600_flush_dma_ring;
-		rctx->b.ws->cs_set_flush_callback(rctx->b.rings.dma.cs, r600_flush_dma_from_winsys, rctx);
-		rctx->b.rings.dma.flushing = false;
-	}
-
 	rctx->allocator_fetch_shader = u_suballocator_create(&rctx->b.b, 64 * 1024, 256,
 							     0, PIPE_USAGE_DEFAULT, FALSE);
 	if (!rctx->allocator_fetch_shader)
diff --git a/src/gallium/drivers/r600/r600_pipe.h b/src/gallium/drivers/r600/r600_pipe.h
index 6d627e5..a3827e3 100644
--- a/src/gallium/drivers/r600/r600_pipe.h
+++ b/src/gallium/drivers/r600/r600_pipe.h
@@ -197,7 +197,6 @@ struct r600_gs_rings_state {
 /* features */
 #define DBG_NO_LLVM		(1 << 17)
 #define DBG_NO_CP_DMA		(1 << 18)
-#define DBG_NO_ASYNC_DMA	(1 << 19)
 /* shader backend */
 #define DBG_NO_SB		(1 << 21)
 #define DBG_SB_CS		(1 << 22)
diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c b/src/gallium/drivers/radeon/r600_pipe_common.c
index 3aa718d..2e39aaf 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.c
+++ b/src/gallium/drivers/radeon/r600_pipe_common.c
@@ -43,6 +43,27 @@ static void r600_memory_barrier(struct pipe_context *ctx, unsigned flags)
 {
 }
 
+static void r600_flush_dma_ring(void *ctx, unsigned flags)
+{
+	struct r600_common_context *rctx = (struct r600_common_context *)ctx;
+	struct radeon_winsys_cs *cs = rctx->rings.dma.cs;
+
+	if (!cs->cdw) {
+		return;
+	}
+
+	rctx->rings.dma.flushing = true;
+	rctx->ws->cs_flush(cs, flags, 0);
+	rctx->rings.dma.flushing = false;
+}
+
+static void r600_flush_dma_from_winsys(void *ctx, unsigned flags)
+{
+	struct r600_common_context *rctx = (struct r600_common_context *)ctx;
+
+	rctx->rings.dma.flush(rctx, flags);
+}
+
 bool r600_common_context_init(struct r600_common_context *rctx,
 			      struct r600_common_screen *rscreen)
 {
@@ -77,6 +98,14 @@ bool r600_common_context_init(struct r600_common_context *rctx,
 	if (!rctx->uploader)
 		return false;
 
+	rctx->rings.dma.cs = NULL;
+	if (rscreen->info.r600_has_dma && !(rscreen->debug_flags & DBG_NO_ASYNC_DMA)) {
+		rctx->rings.dma.cs = rctx->ws->cs_create(rctx->ws, RING_DMA, NULL);
+		rctx->rings.dma.flush = r600_flush_dma_ring;
+		rctx->ws->cs_set_flush_callback(rctx->rings.dma.cs
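
The condition guarding DMA ring creation is a plain capability-and-debug-flag test. As an illustration only (the struct and function names below are hypothetical, and the bit value assumes the removed define reads `(1 << 19)`), it could be expressed as:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed to match the DBG_NO_ASYNC_DMA bit removed from r600_pipe.h. */
#define SKETCH_DBG_NO_ASYNC_DMA (1u << 19)

/* Hypothetical stand-in for the per-context ring state. */
struct sketch_dma_ring {
	bool created;
};

/* Create the ring only when the kernel reports DMA support and the
 * user did not disable asynchronous DMA through the debug flags. */
void sketch_init_dma_ring(struct sketch_dma_ring *ring, bool has_dma,
			  unsigned debug_flags)
{
	assert(ring != NULL);
	ring->created = has_dma && !(debug_flags & SKETCH_DBG_NO_ASYNC_DMA);
}
```

Callers then have to treat the ring as optional, which is why later code checks `rings.dma.cs` before flushing it.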

[Mesa-dev] [PATCH 2/2] radeonsi: flush the dma ring in si_flush_from_st

2014-03-13 Thread Niels Ole Salscheider

Signed-off-by: Niels Ole Salscheider <niels_...@salscheider-online.de>
---
 src/gallium/drivers/radeonsi/si_pipe.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/src/gallium/drivers/radeonsi/si_pipe.c b/src/gallium/drivers/radeonsi/si_pipe.c
index 827e9fe..401bf6a 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.c
+++ b/src/gallium/drivers/radeonsi/si_pipe.c
@@ -65,6 +65,13 @@ static void si_flush_from_st(struct pipe_context *ctx,
 			     struct pipe_fence_handle **fence,
 			     unsigned flags)
 {
+	struct si_context *sctx = (struct si_context *)ctx;
+
+	if (sctx->b.rings.dma.cs) {
+		sctx->b.rings.dma.flush(sctx,
+				flags & PIPE_FLUSH_END_OF_FRAME ? RADEON_FLUSH_END_OF_FRAME : 0);
+	}
+
 	si_flush(ctx, fence,
 		 flags & PIPE_FLUSH_END_OF_FRAME ? RADEON_FLUSH_END_OF_FRAME : 0);
 }
-- 
1.9.0


Re: [Mesa-dev] [PATCH 1/3] r600g, radeonsi: use a fallback in dma_copy instead of failing

2014-03-09 Thread Niels Ole Salscheider

On Sunday 09 March 2014, 02:24:51, Marek Olšák wrote:
> From: Marek Olšák <marek.ol...@amd.com>
> 
> ---
>  src/gallium/drivers/r600/evergreen_state.c      | 37 +---
>  src/gallium/drivers/r600/r600_state.c           | 41 ++---
>  src/gallium/drivers/radeon/r600_buffer_common.c | 58 +++--
>  src/gallium/drivers/radeon/r600_pipe_common.h   | 17 
>  src/gallium/drivers/radeon/r600_texture.c       | 18 +++-
>  src/gallium/drivers/radeonsi/si_state.c         | 19 
>  6 files changed, 97 insertions(+), 93 deletions(-)
> 
> diff --git a/src/gallium/drivers/r600/evergreen_state.c b/src/gallium/drivers/r600/evergreen_state.c
> index dca7c58..5e57f8d 100644
> --- a/src/gallium/drivers/r600/evergreen_state.c
> +++ b/src/gallium/drivers/r600/evergreen_state.c
> @@ -3329,13 +3329,13 @@ static void evergreen_dma_copy_tile(struct r600_context *rctx,
>  	}
>  }
>  
> -static boolean evergreen_dma_blit(struct pipe_context *ctx,
> -				  struct pipe_resource *dst,
> -				  unsigned dst_level,
> -				  unsigned dst_x, unsigned dst_y, unsigned dst_z,
> -				  struct pipe_resource *src,
> -				  unsigned src_level,
> -				  const struct pipe_box *src_box)
> +static void evergreen_dma_blit(struct pipe_context *ctx,
> +			       struct pipe_resource *dst,
> +			       unsigned dst_level,
> +			       unsigned dst_x, unsigned dst_y, unsigned dst_z,
> +			       struct pipe_resource *src,
> +			       unsigned src_level,
> +			       const struct pipe_box *src_box)
>  {
>  	struct r600_context *rctx = (struct r600_context *)ctx;
>  	struct r600_texture *rsrc = (struct r600_texture*)src;
> @@ -3345,19 +3345,22 @@ static boolean evergreen_dma_blit(struct pipe_context *ctx,
>  	unsigned src_x, src_y;
>  
>  	if (rctx->b.rings.dma.cs == NULL) {
> -		return FALSE;
> +		goto fallback;
>  	}
>  
>  	if (dst->target == PIPE_BUFFER && src->target == PIPE_BUFFER) {
> +		if (dst_x % 4 || src_box->x % 4 || src_box->width % 4)
> +			goto fallback;

Why do we need this? I think that the async DMA engine can handle byte aligned 
copies. It is streamout that needs x and width to be dw aligned, isn't it?

> +
>  		evergreen_dma_copy(rctx, dst, src, dst_x, src_box->x, src_box->width);
> -		return TRUE;
> +		return;
>  	}
>  
>  	if (src->format != dst->format) {
> -		return FALSE;
> +		goto fallback;
>  	}
>  	if (rdst->dirty_level_mask != 0) {
> -		return FALSE;
> +		goto fallback;
>  	}
>  	if (rsrc->dirty_level_mask) {
>  		ctx->flush_resource(ctx, src);
> @@ -3383,13 +3386,13 @@ static boolean evergreen_dma_blit(struct pipe_context *ctx,
>  
>  	if (src_pitch != dst_pitch || src_box->x || dst_x || src_w != dst_w) {
>  		/* FIXME evergreen can do partial blit */
> -		return FALSE;
> +		goto fallback;
>  	}
>  	/* the x test here are currently useless (because we don't support partial blit)
>  	 * but keep them around so we don't forget about those
>  	 */
>  	if ((src_pitch & 0x7) || (src_box->x & 0x7) || (dst_x & 0x7) ||
>  	    (src_box->y & 0x7) || (dst_y & 0x7)) {
> -		return FALSE;
> +		goto fallback;
>  	}
>  
>  	/* 128 bpp surfaces require non_disp_tiling for both
> @@ -3400,7 +3403,7 @@ static boolean evergreen_dma_blit(struct pipe_context *ctx,
>  	if ((rctx->b.chip_class == CAYMAN) &&
>  	    (src_mode != dst_mode) &&
>  	    (util_format_get_blocksize(src->format) >= 16)) {
> -		return FALSE;
> +		goto fallback;
>  	}
>  
>  	if (src_mode == dst_mode) {
> @@ -3423,7 +3426,11 @@ static boolean evergreen_dma_blit(struct pipe_context *ctx,
>  					src, src_level, src_x, src_y, src_box->z,
>  					copy_height, dst_pitch, bpp);
>  	}
> -	return TRUE;
> +	return;
> +
> +fallback:
> +	ctx->resource_copy_region(ctx, dst, dst_level, dst_x, dst_y, dst_z,
> +				  src, src_level, src_box);
>  }
>  
>  void evergreen_init_state_functions(struct r600_context *rctx)
> diff --git a/src/gallium/drivers/r600/r600_state.c b/src/gallium/drivers/r600/r600_state.c
> index 6d89e6c..a0e6d2d 100644
> --- a/src/gallium/drivers/r600/r600_state.c
> +++ b/src/gallium/drivers/r600/r600_state.c
> @@ -2883,13 +2883,13 @@ static boolean r600_dma_copy_tile(struct r600_context *rctx,
>  	return TRUE;
>  }
>  
> -static boolean r600_dma_blit(struct pipe_context *ctx,
> -			     struct pipe_resource *dst,
> -			     unsigned dst_level,
> -			     unsigned dst_x, unsigned dst_y, unsigned dst_z,
> -			     struct pipe_resource *src,
> -			     unsigned src_level,
> -

Re: [Mesa-dev] [PATCH 1/3] r600g, radeonsi: use a fallback in dma_copy instead of failing

2014-03-09 Thread Niels Ole Salscheider

You are right, r600-r700 require dword alignment, while linear copies can be
byte aligned on EG+.
Apart from that, patches 1 and 2 look good to me...

Ole
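
The alignment rule agreed on here (r600-r700 need dword-aligned linear copies, Evergreen and later accept byte-aligned ones) can be written down as a small predicate. This is a sketch only; the enum and function name are hypothetical and merely order the chip generations so they can be compared.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical generation ordering, oldest first (not Mesa's enum). */
enum sketch_chip_class {
	SKETCH_R600,
	SKETCH_R700,
	SKETCH_EVERGREEN,
	SKETCH_CAYMAN
};

/* Can the async DMA engine perform this linear buffer copy directly,
 * or does the caller have to take the fallback path? */
bool sketch_dma_buffer_copy_ok(enum sketch_chip_class chip,
			       unsigned dst_x, unsigned src_x, unsigned width)
{
	assert(chip <= SKETCH_CAYMAN);

	/* Evergreen and later: linear copies may be byte aligned. */
	if (chip >= SKETCH_EVERGREEN)
		return true;

	/* r600-r700: offsets and width must all be dword aligned. */
	return !(dst_x % 4) && !(src_x % 4) && !(width % 4);
}
```

With such a check, the `% 4` test would apply only in the r600 code path, while the Evergreen path could copy unaligned ranges without falling back.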

[Mesa-dev] [PATCH v3] radeon: Use upload manager for buffer downloads

2014-03-05 Thread Niels Ole Salscheider

Using DMA for reads is much faster.

Signed-off-by: Niels Ole Salscheider <niels_...@salscheider-online.de>
---
 src/gallium/drivers/radeon/r600_buffer_common.c | 74 +++--
 1 file changed, 56 insertions(+), 18 deletions(-)

diff --git a/src/gallium/drivers/radeon/r600_buffer_common.c b/src/gallium/drivers/radeon/r600_buffer_common.c
index 340ebb2..90ca8cb 100644
--- a/src/gallium/drivers/radeon/r600_buffer_common.c
+++ b/src/gallium/drivers/radeon/r600_buffer_common.c
@@ -260,6 +260,42 @@ static void *r600_buffer_transfer_map(struct pipe_context *ctx,
 			/* At this point, the buffer is always idle (we checked it above). */
 			usage |= PIPE_TRANSFER_UNSYNCHRONIZED;
 		}
+	/* Using DMA for larger reads is much faster */
+	else if ((usage & PIPE_TRANSFER_READ) &&
+		 !(usage & PIPE_TRANSFER_WRITE) &&
+		 (rbuffer->domains == RADEON_DOMAIN_VRAM)) {
+		unsigned offset;
+		struct r600_resource *staging = NULL;
+
+		u_upload_alloc(rctx->uploader, 0,
+			       box->width + (box->x % R600_MAP_BUFFER_ALIGNMENT),
+			       &offset, (struct pipe_resource**)&staging, (void**)&data);
+
+		if (staging) {
+			data += box->x % R600_MAP_BUFFER_ALIGNMENT;
+
+			/* Copy the staging buffer into the original one. */
+			if (rctx->dma_copy(ctx, (struct pipe_resource*)staging, 0,
+					   box->x % R600_MAP_BUFFER_ALIGNMENT,
+					   0, 0, resource, level, box)) {
+				rctx->rings.gfx.flush(rctx, 0);
+				if (rctx->rings.dma.cs)
+					rctx->rings.dma.flush(rctx, 0);
+
+				/* Wait for any offloaded CS flush to complete
+				 * to avoid busy-waiting in the winsys. */
+				rctx->ws->cs_sync_flush(rctx->rings.gfx.cs);
+				if (rctx->rings.dma.cs)
+					rctx->ws->cs_sync_flush(rctx->rings.dma.cs);
+
+				rctx->ws->buffer_wait(staging->buf, RADEON_USAGE_WRITE);
+				return r600_buffer_get_transfer(ctx, resource, level, usage, box,
+								ptransfer, data, staging, offset);
+			} else {
+				pipe_resource_reference((struct pipe_resource**)&staging, NULL);
+			}
+		}
+	}
 
 	data = r600_buffer_map_sync_with_rings(rctx, rbuffer, usage);
 	if (!data) {
@@ -279,24 +315,26 @@ static void r600_buffer_transfer_unmap(struct pipe_context *ctx,
 	struct r600_resource *rbuffer = r600_resource(transfer->resource);
 
 	if (rtransfer->staging) {
-		struct pipe_resource *dst, *src;
-		unsigned soffset, doffset, size;
-		struct pipe_box box;
-
-		dst = transfer->resource;
-		src = &rtransfer->staging->b.b;
-		size = transfer->box.width;
-		doffset = transfer->box.x;
-		soffset = rtransfer->offset + transfer->box.x % R600_MAP_BUFFER_ALIGNMENT;
-
-		u_box_1d(soffset, size, &box);
-
-		/* Copy the staging buffer into the original one. */
-		if (!(size % 4) && !(doffset % 4) && !(soffset % 4) &&
-		    rctx->dma_copy(ctx, dst, 0, doffset, 0, 0, src, 0, &box)) {
-			/* DONE. */
-		} else {
-			ctx->resource_copy_region(ctx, dst, 0, doffset, 0, 0, src, 0, &box);
+		if (rtransfer->transfer.usage & PIPE_TRANSFER_WRITE) {
+			struct pipe_resource *dst, *src;
+			unsigned soffset, doffset, size;
+			struct pipe_box box;
+
+			dst = transfer->resource;
+			src = &rtransfer->staging->b.b;
+			size = transfer->box.width;
+			doffset = transfer->box.x;
+			soffset = rtransfer->offset + transfer->box.x % R600_MAP_BUFFER_ALIGNMENT;
+
+			u_box_1d(soffset, size, &box);
+
+			/* Copy the staging buffer into the original one. */
+			if (!(size % 4) && !(doffset % 4) && !(soffset % 4) &&
+			    rctx->dma_copy(ctx, dst, 0, doffset, 0, 0, src, 0, &box)) {
+				/* DONE. */
+			} else {
+				ctx->resource_copy_region(ctx, dst, 0, doffset, 0, 0, src, 0, &box);
+			}
 		}
 		pipe_resource_reference((struct pipe_resource**)&rtransfer->staging, NULL);
 	}
-- 
1.9.0
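
The staging allocation in the read path keeps the mapped pointer at the same misalignment as the original buffer offset. A small sketch of that arithmetic follows, assuming a map alignment of 64 bytes (the value of R600_MAP_BUFFER_ALIGNMENT is not shown in this patch, so 64 is an assumption) and using hypothetical helper names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed alignment; stands in for R600_MAP_BUFFER_ALIGNMENT. */
#define SKETCH_MAP_ALIGNMENT 64u

/* Bytes to request from the upload manager: the mapped width plus the
 * padding that preserves the offset within an alignment block. */
size_t sketch_staging_size(unsigned box_x, unsigned box_width)
{
	return (size_t)box_width + box_x % SKETCH_MAP_ALIGNMENT;
}

/* Pointer handed back to the caller: skip the leading padding so that
 * the returned address has the same misalignment as box_x. */
uint8_t *sketch_staging_pointer(uint8_t *staging_base, unsigned box_x)
{
	assert(staging_base != NULL);
	return staging_base + box_x % SKETCH_MAP_ALIGNMENT;
}
```

For a map of 100 bytes at offset 70, this requests 106 bytes and returns a pointer 6 bytes into the staging allocation.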

Re: [Mesa-dev] [PATCH v2] radeon: Use upload manager for buffer downloads

2014-03-05 Thread Niels Ole Salscheider

On Tuesday 04 March 2014, 23:43:01, Marek Olšák wrote:
> You check for streamout and CP DMA support, but you don't use
> resource_copy_region if DMA is not supported. The CP DMA and
> streamout-based buffer copying is only used by resource_copy_region.

Oh, right. I initially used resource_copy_region as a fallback and forgot to 
remove these checks. I have sent an updated patch to the list.

> The last parameter of buffer_wait should be RADEON_USAGE_WRITE (you're
> waiting for the last write to the staging buffer), but that parameter
> is not used by the winsys yet.
> 
> Other than those two, the patch looks good.
> 
> CP DMA != async DMA (dma_copy). CP DMA is actually a feature of the
> graphics ring.
> 
> Marek
> 
> On Tue, Mar 4, 2014 at 6:23 PM, Niels Ole Salscheider
> <niels_...@salscheider-online.de> wrote:
> > Using DMA for reads is much faster.
> > 
> > Signed-off-by: Niels Ole Salscheider <niels_...@salscheider-online.de>
> > ---
> > 
> >  src/gallium/drivers/radeon/r600_buffer_common.c | 78 +++--
> >  1 file changed, 60 insertions(+), 18 deletions(-)
> > 
> > diff --git a/src/gallium/drivers/radeon/r600_buffer_common.c b/src/gallium/drivers/radeon/r600_buffer_common.c
> > index 340ebb2..ed3a08c 100644
> > --- a/src/gallium/drivers/radeon/r600_buffer_common.c
> > +++ b/src/gallium/drivers/radeon/r600_buffer_common.c
> > @@ -260,6 +260,46 @@ static void *r600_buffer_transfer_map(struct pipe_context *ctx,
> >  			/* At this point, the buffer is always idle (we checked it above). */
> >  			usage |= PIPE_TRANSFER_UNSYNCHRONIZED;
> >  		}
> > +	/* Using DMA for larger reads is much faster */
> > +	else if ((usage & PIPE_TRANSFER_READ) &&
> > +		 !(usage & PIPE_TRANSFER_WRITE) &&
> > +		 (rbuffer->domains == RADEON_DOMAIN_VRAM) &&
> > +		 (rscreen->has_cp_dma ||
> > +		  (rscreen->has_streamout &&
> > +		   /* The buffer range must be aligned to 4 with streamout. */
> > +		   box->x % 4 == 0 && box->width % 4 == 0))) {
> > +		unsigned offset;
> > +		struct r600_resource *staging = NULL;
> > +
> > +		u_upload_alloc(rctx->uploader, 0,
> > +			       box->width + (box->x % R600_MAP_BUFFER_ALIGNMENT),
> > +			       &offset, (struct pipe_resource**)&staging, (void**)&data);
> > +
> > +		if (staging) {
> > +			data += box->x % R600_MAP_BUFFER_ALIGNMENT;
> > +
> > +			/* Copy the staging buffer into the original one. */
> > +			if (rctx->dma_copy(ctx, (struct pipe_resource*)staging, 0,
> > +					   box->x % R600_MAP_BUFFER_ALIGNMENT,
> > +					   0, 0, resource, level, box)) {
> > +				rctx->rings.gfx.flush(rctx, 0);
> > +				if (rctx->rings.dma.cs)
> > +					rctx->rings.dma.flush(rctx, 0);
> > +
> > +				/* Wait for any offloaded CS flush to complete
> > +				 * to avoid busy-waiting in the winsys. */
> > +				rctx->ws->cs_sync_flush(rctx->rings.gfx.cs);
> > +				if (rctx->rings.dma.cs)
> > +					rctx->ws->cs_sync_flush(rctx->rings.dma.cs);
> > +
> > +				rctx->ws->buffer_wait(staging->buf, RADEON_USAGE_READ);
> > +				return r600_buffer_get_transfer(ctx, resource, level, usage, box,
> > +								ptransfer, data, staging, offset);
> > +			} else {
> > +				pipe_resource_reference((struct pipe_resource**)&staging, NULL);
> > +			}
> > +		}
> > +	}
> >  
> >  	data = r600_buffer_map_sync_with_rings(rctx, rbuffer, usage);
> >  	if (!data) {
> > @@ -279,24 +319,26 @@ static void r600_buffer_transfer_unmap(struct pipe_context *ctx,
> >  	struct r600_resource *rbuffer = r600_resource(transfer->resource);
> >  
> >  	if (rtransfer->staging) {
> > -		struct pipe_resource *dst, *src;
> > -		unsigned soffset, doffset, size;
> > -		struct pipe_box box;
> > -
> > -		dst = transfer->resource;
> > -		src = &rtransfer->staging->b.b;
> > -		size = transfer->box.width;
> > -		doffset = transfer->box.x;
> > -		soffset = rtransfer->offset + transfer->box.x % R600_MAP_BUFFER_ALIGNMENT;
> > -
> > -		u_box_1d(soffset, size, &box);
> > -
> > -		/* Copy the staging buffer into the original one. */
> > -		if (!(size % 4) && !(doffset % 4) && !(soffset % 4) &&
> > -		    rctx->dma_copy(ctx, dst, 0, doffset, 0, 0, src, 0, &box)) {
> > -			/* DONE. */
> > -		} else {
> > -			ctx

[Mesa-dev] [PATCH v2] radeon: Use upload manager for buffer downloads

2014-03-04 Thread Niels Ole Salscheider

Using DMA for reads is much faster.

Signed-off-by: Niels Ole Salscheider <niels_...@salscheider-online.de>
---
 src/gallium/drivers/radeon/r600_buffer_common.c | 78 +++--
 1 file changed, 60 insertions(+), 18 deletions(-)

diff --git a/src/gallium/drivers/radeon/r600_buffer_common.c b/src/gallium/drivers/radeon/r600_buffer_common.c
index 340ebb2..ed3a08c 100644
--- a/src/gallium/drivers/radeon/r600_buffer_common.c
+++ b/src/gallium/drivers/radeon/r600_buffer_common.c
@@ -260,6 +260,46 @@ static void *r600_buffer_transfer_map(struct pipe_context *ctx,
 			/* At this point, the buffer is always idle (we checked it above). */
 			usage |= PIPE_TRANSFER_UNSYNCHRONIZED;
 		}
+	/* Using DMA for larger reads is much faster */
+	else if ((usage & PIPE_TRANSFER_READ) &&
+		 !(usage & PIPE_TRANSFER_WRITE) &&
+		 (rbuffer->domains == RADEON_DOMAIN_VRAM) &&
+		 (rscreen->has_cp_dma ||
+		  (rscreen->has_streamout &&
+		   /* The buffer range must be aligned to 4 with streamout. */
+		   box->x % 4 == 0 && box->width % 4 == 0))) {
+		unsigned offset;
+		struct r600_resource *staging = NULL;
+
+		u_upload_alloc(rctx->uploader, 0,
+			       box->width + (box->x % R600_MAP_BUFFER_ALIGNMENT),
+			       &offset, (struct pipe_resource**)&staging, (void**)&data);
+
+		if (staging) {
+			data += box->x % R600_MAP_BUFFER_ALIGNMENT;
+
+			/* Copy the staging buffer into the original one. */
+			if (rctx->dma_copy(ctx, (struct pipe_resource*)staging, 0,
+					   box->x % R600_MAP_BUFFER_ALIGNMENT,
+					   0, 0, resource, level, box)) {
+				rctx->rings.gfx.flush(rctx, 0);
+				if (rctx->rings.dma.cs)
+					rctx->rings.dma.flush(rctx, 0);
+
+				/* Wait for any offloaded CS flush to complete
+				 * to avoid busy-waiting in the winsys. */
+				rctx->ws->cs_sync_flush(rctx->rings.gfx.cs);
+				if (rctx->rings.dma.cs)
+					rctx->ws->cs_sync_flush(rctx->rings.dma.cs);
+
+				rctx->ws->buffer_wait(staging->buf, RADEON_USAGE_READ);
+				return r600_buffer_get_transfer(ctx, resource, level, usage, box,
+								ptransfer, data, staging, offset);
+			} else {
+				pipe_resource_reference((struct pipe_resource**)&staging, NULL);
+			}
+		}
+	}
 
 	data = r600_buffer_map_sync_with_rings(rctx, rbuffer, usage);
 	if (!data) {
@@ -279,24 +319,26 @@ static void r600_buffer_transfer_unmap(struct pipe_context *ctx,
 	struct r600_resource *rbuffer = r600_resource(transfer->resource);
 
 	if (rtransfer->staging) {
-		struct pipe_resource *dst, *src;
-		unsigned soffset, doffset, size;
-		struct pipe_box box;
-
-		dst = transfer->resource;
-		src = &rtransfer->staging->b.b;
-		size = transfer->box.width;
-		doffset = transfer->box.x;
-		soffset = rtransfer->offset + transfer->box.x % R600_MAP_BUFFER_ALIGNMENT;
-
-		u_box_1d(soffset, size, &box);
-
-		/* Copy the staging buffer into the original one. */
-		if (!(size % 4) && !(doffset % 4) && !(soffset % 4) &&
-		    rctx->dma_copy(ctx, dst, 0, doffset, 0, 0, src, 0, &box)) {
-			/* DONE. */
-		} else {
-			ctx->resource_copy_region(ctx, dst, 0, doffset, 0, 0, src, 0, &box);
+		if (rtransfer->transfer.usage & PIPE_TRANSFER_WRITE) {
+			struct pipe_resource *dst, *src;
+			unsigned soffset, doffset, size;
+			struct pipe_box box;
+
+			dst = transfer->resource;
+			src = &rtransfer->staging->b.b;
+			size = transfer->box.width;
+			doffset = transfer->box.x;
+			soffset = rtransfer->offset + transfer->box.x % R600_MAP_BUFFER_ALIGNMENT;
+
+			u_box_1d(soffset, size, &box);
+
+			/* Copy the staging buffer into the original one. */
+			if (!(size % 4) && !(doffset % 4) && !(soffset % 4) &&
+			    rctx->dma_copy(ctx, dst, 0, doffset, 0, 0, src, 0, &box)) {
+				/* DONE. */
+			} else {
+				ctx->resource_copy_region(ctx, dst, 0

Re: [Mesa-dev] [PATCH 0/2] radeon: Use the DMA engine for buffer downloads

2014-03-04 Thread Niels Ole Salscheider

> Could you please do this without changing u_upload_mgr? You can still
> use u_upload_alloc to allocate buffer memory in the driver and the map
> buffer read/write flags are not important with persistent coherent
> buffer mappings anyway.

I have sent an updated patch to the list.

Ole

[Mesa-dev] [PATCH] r600: compute memory pool size is given in dw

2014-03-03 Thread Niels Ole Salscheider

Multiply the dw value by 4 in order to map the complete buffer.

Signed-off-by: Niels Ole Salscheider <niels_...@salscheider-online.de>
---
 src/gallium/drivers/r600/compute_memory_pool.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/r600/compute_memory_pool.c b/src/gallium/drivers/r600/compute_memory_pool.c
index 90d5358..2f0d4c8 100644
--- a/src/gallium/drivers/r600/compute_memory_pool.c
+++ b/src/gallium/drivers/r600/compute_memory_pool.c
@@ -449,7 +449,7 @@ void compute_memory_transfer(
 
 	if (device_to_host) {
 		map = pipe->transfer_map(pipe, gart, 0, PIPE_TRANSFER_READ,
-			&(struct pipe_box) { .width = aligned_size,
+			&(struct pipe_box) { .width = aligned_size * 4,
 			.height = 1, .depth = 1 }, &xfer);
 		assert(xfer);
 		assert(map);
@@ -457,7 +457,7 @@ void compute_memory_transfer(
 		pipe->transfer_unmap(pipe, xfer);
 	} else {
 		map = pipe->transfer_map(pipe, gart, 0, PIPE_TRANSFER_WRITE,
-			&(struct pipe_box) { .width = aligned_size,
+			&(struct pipe_box) { .width = aligned_size * 4,
 			.height = 1, .depth = 1 }, &xfer);
 		assert(xfer);
 		assert(map);
-- 
1.9.0
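
The fix amounts to a unit conversion: the pool tracks sizes in dwords, while the width of a mapped box is given in bytes. A tiny sketch with a hypothetical helper name makes the conversion explicit:

```c
#include <assert.h>
#include <stdint.h>

/* Convert a size expressed in dwords (32-bit words) to bytes. */
int64_t sketch_dw_to_bytes(int64_t size_in_dw)
{
	assert(size_in_dw >= 0);
	return size_in_dw * 4;
}
```

Mapping only `aligned_size` bytes for a pool of `aligned_size` dwords covers a quarter of the buffer, which is the error the patch corrects.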


[Mesa-dev] [PATCH 1/2] util/u_upload_mgr: Allow to also use it for downloads

2014-03-03 Thread Niels Ole Salscheider

Signed-off-by: Niels Ole Salscheider niels_...@salscheider-online.de
---
 src/gallium/auxiliary/hud/hud_context.c   |  3 +-
 src/gallium/auxiliary/util/u_blitter.c|  3 +-
 src/gallium/auxiliary/util/u_upload_mgr.c | 49 +++
 src/gallium/auxiliary/util/u_upload_mgr.h | 13 +--
 src/gallium/auxiliary/util/u_vbuf.c   |  3 +-
 src/gallium/auxiliary/vl/vl_compositor.c  |  3 +-
 src/gallium/drivers/ilo/ilo_context.c |  3 +-
 src/gallium/drivers/r300/r300_context.c   |  3 +-
 src/gallium/drivers/radeon/r600_pipe_common.c |  3 +-
 src/mesa/state_tracker/st_context.c   |  9 +++--
 10 files changed, 64 insertions(+), 28 deletions(-)

diff --git a/src/gallium/auxiliary/hud/hud_context.c 
b/src/gallium/auxiliary/hud/hud_context.c
index 465013c..567ec99 100644
--- a/src/gallium/auxiliary/hud/hud_context.c
+++ b/src/gallium/auxiliary/hud/hud_context.c
@@ -938,7 +938,8 @@ hud_create(struct pipe_context *pipe, struct cso_context 
*cso)
hud->pipe = pipe;
hud->cso = cso;
hud->uploader = u_upload_create(pipe, 256 * 1024, 16,
-   PIPE_BIND_VERTEX_BUFFER);
+   PIPE_BIND_VERTEX_BUFFER,
+   UPLOAD_MGR_UPLOAD);
 
/* font */
if (!util_font_create(pipe, UTIL_FONT_FIXED_8X13, &hud->font)) {
diff --git a/src/gallium/auxiliary/util/u_blitter.c 
b/src/gallium/auxiliary/util/u_blitter.c
index 95e7fb6..fb606ee 100644
--- a/src/gallium/auxiliary/util/u_blitter.c
+++ b/src/gallium/auxiliary/util/u_blitter.c
@@ -333,7 +333,8 @@ struct blitter_context *util_blitter_create(struct 
pipe_context *pipe)
for (i = 0; i < 4; i++)
   ctx->vertices[i][0][3] = 1; /*v.w*/
 
-   ctx->upload = u_upload_create(pipe, 65536, 4, PIPE_BIND_VERTEX_BUFFER);
+   ctx->upload = u_upload_create(pipe, 65536, 4, PIPE_BIND_VERTEX_BUFFER,
+ UPLOAD_MGR_UPLOAD);
 
return &ctx->base;
 }
diff --git a/src/gallium/auxiliary/util/u_upload_mgr.c 
b/src/gallium/auxiliary/util/u_upload_mgr.c
index 744ea2e..3205cd1 100644
--- a/src/gallium/auxiliary/util/u_upload_mgr.c
+++ b/src/gallium/auxiliary/util/u_upload_mgr.c
@@ -41,11 +41,14 @@
 struct u_upload_mgr {
struct pipe_context *pipe;
 
-   unsigned default_size;  /* Minimum size of the upload buffer, in bytes. */
-   unsigned alignment; /* Alignment of each sub-allocation. */
-   unsigned bind;  /* Bitmask of PIPE_BIND_* flags. */
-   unsigned map_flags; /* Bitmask of PIPE_TRANSFER_* flags. */
-   boolean map_persistent; /* If persistent mappings are supported. */
+   unsigned default_size;  /* Minimum size of the upload buffer,
+* in bytes. */
+   unsigned alignment; /* Alignment of each sub-allocation. */
+   unsigned bind;  /* Bitmask of PIPE_BIND_* flags. */
+   unsigned map_flags; /* Bitmask of PIPE_TRANSFER_* flags. */
+   boolean map_persistent; /* If persistent mappings are supported. */
+   enum u_upload_mgr_usage usage;  /* Usage of the upload manager
+* (for uploads or downloads) */
 
struct pipe_resource *buffer;   /* Upload buffer. */
struct pipe_transfer *transfer; /* Transfer object for the upload buffer. */
@@ -58,7 +61,8 @@ struct u_upload_mgr {
 struct u_upload_mgr *u_upload_create( struct pipe_context *pipe,
   unsigned default_size,
   unsigned alignment,
-  unsigned bind )
+  unsigned bind,
+  enum u_upload_mgr_usage usage )
 {
struct u_upload_mgr *upload = CALLOC_STRUCT( u_upload_mgr );
if (!upload)
@@ -68,20 +72,29 @@ struct u_upload_mgr *u_upload_create( struct pipe_context 
*pipe,
upload->default_size = default_size;
upload->alignment = alignment;
upload->bind = bind;
+   upload->usage = usage;
 
upload->map_persistent =
   pipe->screen->get_param(pipe->screen,
   PIPE_CAP_BUFFER_MAP_PERSISTENT_COHERENT);
 
if (upload->map_persistent) {
-  upload->map_flags = PIPE_TRANSFER_WRITE |
-  PIPE_TRANSFER_PERSISTENT |
+  upload->map_flags = PIPE_TRANSFER_PERSISTENT |
   PIPE_TRANSFER_COHERENT;
+  if (usage == UPLOAD_MGR_UPLOAD) {
+  upload->map_flags |= PIPE_TRANSFER_WRITE;
+  } else {
+  upload->map_flags |= PIPE_TRANSFER_READ;
+  }
}
else {
-  upload->map_flags = PIPE_TRANSFER_WRITE |
-  PIPE_TRANSFER_UNSYNCHRONIZED |
-  PIPE_TRANSFER_FLUSH_EXPLICIT;
+  if (usage == UPLOAD_MGR_UPLOAD) {
+  upload->map_flags = PIPE_TRANSFER_WRITE |
+  PIPE_TRANSFER_UNSYNCHRONIZED |
+  PIPE_TRANSFER_FLUSH_EXPLICIT

[Mesa-dev] [PATCH 0/2] radeon: Use the DMA engine for buffer downloads

2014-03-03 Thread Niels Ole Salscheider

Using the DMA engine for buffer downloads vastly improves performance. This is
because reads from VRAM by the CPU are slow because of the high latency of the
PCIe bus.

The first patch allows u_upload_mgr to be used for downloads, too. The second
patch then uses u_upload_mgr in the radeon driver for downloads.
I considered renaming u_upload_mgr to u_transfer_mgr since it might be
confusing that an upload manager can be used for downloads. But then again we
also have transfers so that u_transfer_mgr might also be confusing. Thus, I
decided not to rename it for now.

Without these patches, the buffer_bandwidth benchmark from uCLbench gives me:

./buffer_bandwidth --size=2000 --iterations=100
# device 0: AMD BARTS // type gpu (192 MB global memory, 64 KB constant memory, 32 KB local memory)
1/1 direct 2000 Bytes   759.29 MB/s(HD) 17.13 MB/s(DD) 14.61 MB/s(DH)

With these patches, the read performance is much better:

./buffer_bandwidth --size=2000 --iterations=100
# device 0: AMD BARTS // type gpu (192 MB global memory, 64 KB constant memory, 32 KB local memory)
1/1 direct 2000 Bytes   759.90 MB/s(HD) 613.49 MB/s(DD) 1841.07 MB/s(DH)

Judging by these numbers, it might even make sense to use the DMA engine for
larger buffer downloads...
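
To put the two benchmark runs side by side, the improvement can be expressed as a simple ratio. The helper below is only an illustrative sketch; the name bandwidth_speedup is made up here, and reading DH and DD as device-to-host and device-to-device is an assumption about the uCLbench output labels.

```cpp
#include <cassert>

// Hypothetical helper (not part of Mesa or uCLbench): ratio between two
// bandwidth figures given in the same unit (MB/s here). A result above 1.0
// means the second measurement is faster than the first.
double bandwidth_speedup(double before_mb_s, double after_mb_s) {
   assert(before_mb_s > 0.0);
   return after_mb_s / before_mb_s;
}
```

With the figures quoted above, bandwidth_speedup(14.61, 1841.07) comes out at roughly 126 for the DH column and bandwidth_speedup(17.13, 613.49) at roughly 36 for the DD column, while the HD column is essentially unchanged.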

Niels Ole Salscheider (2):
  util/u_upload_mgr: Allow to also use it for downloads
  radeon: Use transfer manager for buffer downloads

 src/gallium/auxiliary/hud/hud_context.c |  3 +-
 src/gallium/auxiliary/util/u_blitter.c  |  3 +-
 src/gallium/auxiliary/util/u_upload_mgr.c   | 49 +++-
 src/gallium/auxiliary/util/u_upload_mgr.h   | 13 -
 src/gallium/auxiliary/util/u_vbuf.c |  3 +-
 src/gallium/auxiliary/vl/vl_compositor.c|  3 +-
 src/gallium/drivers/ilo/ilo_context.c   |  3 +-
 src/gallium/drivers/r300/r300_context.c |  3 +-
 src/gallium/drivers/radeon/r600_buffer_common.c | 78 +++--
 src/gallium/drivers/radeon/r600_pipe_common.c   | 14 -
 src/gallium/drivers/radeon/r600_pipe_common.h   |  1 +
 src/mesa/state_tracker/st_context.c |  9 ++-
 12 files changed, 136 insertions(+), 46 deletions(-)

-- 
1.9.0


[Mesa-dev] [PATCH 2/2] radeon: Use transfer manager for buffer downloads

2014-03-03 Thread Niels Ole Salscheider

Using DMA for reads is much faster.

Signed-off-by: Niels Ole Salscheider <niels_...@salscheider-online.de>
---
 src/gallium/drivers/radeon/r600_buffer_common.c | 78 +++--
 src/gallium/drivers/radeon/r600_pipe_common.c   | 11 
 src/gallium/drivers/radeon/r600_pipe_common.h   |  1 +
 3 files changed, 72 insertions(+), 18 deletions(-)

diff --git a/src/gallium/drivers/radeon/r600_buffer_common.c 
b/src/gallium/drivers/radeon/r600_buffer_common.c
index 340ebb2..c910107 100644
--- a/src/gallium/drivers/radeon/r600_buffer_common.c
+++ b/src/gallium/drivers/radeon/r600_buffer_common.c
@@ -260,6 +260,46 @@ static void *r600_buffer_transfer_map(struct pipe_context 
*ctx,
/* At this point, the buffer is always idle (we checked it 
above). */
usage |= PIPE_TRANSFER_UNSYNCHRONIZED;
}
+   /* Using DMA for larger reads is much faster */
+   else if ((usage & PIPE_TRANSFER_READ) &&
+!(usage & PIPE_TRANSFER_WRITE) &&
+(rbuffer->domains == RADEON_DOMAIN_VRAM) &&
+(rscreen->has_cp_dma ||
+ (rscreen->has_streamout &&
+  /* The buffer range must be aligned to 4 with streamout. */
+  box->x % 4 == 0 && box->width % 4 == 0))) {
+   unsigned offset;
+   struct r600_resource *staging = NULL;
+
+   u_upload_alloc(rctx->downloader, 0,
+  box->width + (box->x % R600_MAP_BUFFER_ALIGNMENT),
+  &offset, (struct pipe_resource**)&staging, (void**)&data);
+
+   if (staging) {
+   data += box->x % R600_MAP_BUFFER_ALIGNMENT;
+
+   /* Copy the staging buffer into the original one. */
+   if (rctx->dma_copy(ctx, (struct pipe_resource*)staging, 0,
+box->x % R600_MAP_BUFFER_ALIGNMENT,
+0, 0, resource, level, box)) {
+   rctx->rings.gfx.flush(rctx, 0);
+   if (rctx->rings.dma.cs)
+   rctx->rings.dma.flush(rctx, 0);
+
+   /* Wait for any offloaded CS flush to complete
+* to avoid busy-waiting in the winsys. */
+   rctx->ws->cs_sync_flush(rctx->rings.gfx.cs);
+   if (rctx->rings.dma.cs)
+   rctx->ws->cs_sync_flush(rctx->rings.dma.cs);
+
+   rctx->ws->buffer_wait(staging->buf, RADEON_USAGE_READWRITE);
+   return r600_buffer_get_transfer(ctx, resource, level, usage, box,
+   ptransfer, data, staging, offset);
+   } else {
+   pipe_resource_reference((struct pipe_resource**)&staging, NULL);
+   }
+   }
+   }
 
data = r600_buffer_map_sync_with_rings(rctx, rbuffer, usage);
if (!data) {
@@ -279,24 +319,26 @@ static void r600_buffer_transfer_unmap(struct 
pipe_context *ctx,
struct r600_resource *rbuffer = r600_resource(transfer->resource);
 
if (rtransfer->staging) {
-   struct pipe_resource *dst, *src;
-   unsigned soffset, doffset, size;
-   struct pipe_box box;
-
-   dst = transfer->resource;
-   src = &rtransfer->staging->b.b;
-   size = transfer->box.width;
-   doffset = transfer->box.x;
-   soffset = rtransfer->offset + transfer->box.x % R600_MAP_BUFFER_ALIGNMENT;
-
-   u_box_1d(soffset, size, &box);
-
-   /* Copy the staging buffer into the original one. */
-   if (!(size % 4) && !(doffset % 4) && !(soffset % 4) &&
-   rctx->dma_copy(ctx, dst, 0, doffset, 0, 0, src, 0, &box)) {
-   /* DONE. */
-   } else {
-   ctx->resource_copy_region(ctx, dst, 0, doffset, 0, 0, src, 0, &box);
+   if (rtransfer->transfer.usage & PIPE_TRANSFER_WRITE) {
+   struct pipe_resource *dst, *src;
+   unsigned soffset, doffset, size;
+   struct pipe_box box;
+
+   dst = transfer->resource;
+   src = &rtransfer->staging->b.b;
+   size = transfer->box.width;
+   doffset = transfer->box.x;
+   soffset = rtransfer->offset + transfer->box.x % R600_MAP_BUFFER_ALIGNMENT;
+
+   u_box_1d(soffset, size, &box);
+
+   /* Copy the staging buffer into the original one. */
+   if (!(size % 4) && !(doffset % 4) && !(soffset % 4) &&
+   rctx->dma_copy(ctx, dst, 0, doffset, 0, 0, src, 0, &box

[Mesa-dev] [PATCH] winsys/radeon: remove superfluous distinction of cases

2013-12-18 Thread Niels Ole Salscheider

Signed-off-by: Niels Ole Salscheider <niels_...@salscheider-online.de>
---
 src/gallium/winsys/radeon/drm/radeon_drm_cs.c | 20 +---
 1 file changed, 5 insertions(+), 15 deletions(-)

diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_cs.c 
b/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
index acb12b2..d8ad297 100644
--- a/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
+++ b/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
@@ -482,22 +482,12 @@ static void radeon_drm_cs_flush(struct radeon_winsys_cs 
*rcs, unsigned flags, ui
/* pad DMA ring to 8 DWs to meet CP fetch alignment requirements
 * r6xx, requires at least 4 dw alignment to avoid a hw bug.
 */
-if (flags & RADEON_FLUSH_COMPUTE) {
-   if (cs->ws->info.chip_class <= SI) {
-   while (rcs->cdw & 7)
-   OUT_CS(&cs->base, 0x80000000); /* type2 nop packet */
-   } else {
-   while (rcs->cdw & 7)
-   OUT_CS(&cs->base, 0xffff1000); /* type3 nop packet */
-   }
+   if (cs->ws->info.chip_class <= SI) {
+   while (rcs->cdw & 7)
+   OUT_CS(&cs->base, 0x80000000); /* type2 nop packet */
} else {
-   if (cs->ws->info.chip_class <= SI) {
-   while (rcs->cdw & 7)
-   OUT_CS(&cs->base, 0x80000000); /* type2 nop packet */
-   } else {
-   while (rcs->cdw & 7)
-   OUT_CS(&cs->base, 0xffff1000); /* type3 nop packet */
-   }
+   while (rcs->cdw & 7)
+   OUT_CS(&cs->base, 0xffff1000); /* type3 nop packet */
}
break;
 case RING_UVD:
-- 
1.8.5.1


Re: [Mesa-dev] [PATCH] clover: Calculate the optimal work group size when local_size is NULL

2013-10-29 Thread Niels Ole Salscheider

Hi Tom,

this has been on my todo list for quite a while.

Your patch looks good to me, but in my experience a block with approximately 
the same size for each dimension gives slightly better performance in many 
cases when compared to one where one dimension is significantly larger.
Maybe you could initialise the size for each dimension to 1 and multiply them 
by 2 in a round-robin fashion as long as feasible.
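
The round-robin idea suggested above could be sketched roughly as follows. This is not clover code: the function name balanced_block_size and the feasibility rule (the block must divide the grid evenly and the total number of work items must stay within max_threads) are assumptions made for the example, and per-dimension device limits are ignored.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper, not taken from clover: choose a local work-group size
// for a 3D grid by starting at 1x1x1 and doubling one dimension at a time in
// round-robin order for as long as that stays feasible.
std::array<std::size_t, 3>
balanced_block_size(const std::array<std::size_t, 3> &grid,
                    std::size_t max_threads) {
   assert(max_threads >= 1);
   assert(grid[0] >= 1 && grid[1] >= 1 && grid[2] >= 1);

   std::array<std::size_t, 3> block = {1, 1, 1};
   std::size_t threads = 1;
   bool grew = true;

   while (grew) {
      grew = false;
      for (std::size_t d = 0; d < 3; ++d) {
         const std::size_t next = block[d] * 2;
         // Assumed feasibility rule: the doubled block still divides the
         // grid evenly and the total work-item count fits the device limit.
         if (grid[d] % next == 0 && threads * 2 <= max_threads) {
            block[d] = next;
            threads *= 2;
            grew = true;
         }
      }
   }
   return block;
}
```

For a 64x64x1 grid and a limit of 256 work items this yields a 16x16x1 block rather than 256x1x1, which is the more balanced shape described above.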

Regards,

Ole

Re: [Mesa-dev] [PATCH 2/2] st/clover: Always flush the queue when waiting on an hard_event

2013-10-06 Thread Niels Ole Salscheider

On Thursday, 3 October 2013, 11:08:26, Francisco Jerez wrote:
> Niels Ole Salscheider <niels_...@salscheider-online.de> writes:
> >> Do you have any example of a real world application that relies on this?
> >> Or at least some reasonable use case?
> >
> > The problem is that the queue is only cleared from already signalled
> > events
> > when we flush it. And we might not do this if the user only calls
> > clWaitForEvents once the corresponding event has already been signalled.
> >
> > I am fine with not flushing the queue, but we should at least make sure
> > that signalled events are freed early enough.
>
> So your application doesn't call clFlush() explicitly nor any blocking
> call on that specific event and it stalls forever polling an event with
> clGetEventInfo() that never gets flushed to the GPU?  Is that the
> problem you've seen?  Is it an open source application?

Unfortunately, the application is not open source and I am not allowed to give 
the code to someone else, even though I have access to it.

The application calls clFinish and clWaitForEvents, but not clFlush. I think 
the problem is that the kernels might already have finished execution when the 
application calls these functions. Because of that the queue is not flushed and 
thus not cleared.
However, I cannot reproduce it right now.

Regards,

Ole

Re: [Mesa-dev] [PATCH 2/2] st/clover: Always flush the queue when waiting on an hard_event

2013-10-03 Thread Niels Ole Salscheider

> I don't think this is right, with this patch we remove *all* events from
> the command queue, signalled or not, every time the command queue is
> flushed.

You are right, I got the logic wrong here (see also 
http://lists.freedesktop.org/archives/mesa-dev/2013-September/044363.html).

The problem is that I have an application that causes a leak of event objects. 
That is, some events are never deleted from the queue. I will have to debug 
this further, but I am somewhat busy right now since I have just relocated.

Re: [Mesa-dev] [PATCH 2/2] st/clover: Always flush the queue when waiting on an hard_event

2013-10-03 Thread Niels Ole Salscheider

> Do you have any example of a real world application that relies on this?
> Or at least some reasonable use case?

The problem is that the queue is only cleared from already signalled events 
when we flush it. And we might not do this if the user only calls 
clWaitForEvents once the corresponding event has already been signalled.

I am fine with not flushing the queue, but we should at least make sure that 
signalled events are freed early enough.
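
One way to free signalled events independently of a flush is to prune the list of queued events on its own. The sketch below only illustrates that idea with stand-in types; fake_event, fake_event_ptr and prune_signalled are hypothetical names and do not correspond to the actual clover classes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Stand-in for a clover event; only the completion state is modelled.
struct fake_event {
   bool signalled;
};

using fake_event_ptr = std::shared_ptr<fake_event>;

// Drop every event that has already been signalled, regardless of whether
// the queue was flushed in the meantime. Returns the number of events freed.
std::size_t prune_signalled(std::vector<fake_event_ptr> &queued_events) {
   const std::size_t before = queued_events.size();

   queued_events.erase(
      std::remove_if(queued_events.begin(), queued_events.end(),
                     [](const fake_event_ptr &ev) {
                        assert(ev != nullptr);
                        return ev->signalled;
                     }),
      queued_events.end());

   return before - queued_events.size();
}
```

Calling such a helper from the wait path would release completed events even when no flush happens, while events that are still pending stay in the list.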

Re: [Mesa-dev] [PATCH 1/2] st/clover: Clear the complete queue

2013-09-26 Thread Niels Ole Salscheider

Ping

[Mesa-dev] [PATCH 2/2] st/clover: Always flush the queue when waiting on an hard_event

2013-09-06 Thread Niels Ole Salscheider

The OpenCL spec says:
Any blocking commands queued in a command-queue and clReleaseCommandQueue
perform an implicit flush of the command-queue. These blocking commands are
[...] or clWaitForEvents.

Flushing the queue unconditionally also helps to actually clear the
queued_events list of the queue object.

Signed-off-by: Niels Ole Salscheider <niels_...@salscheider-online.de>
---
 src/gallium/state_trackers/clover/core/event.cpp | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/src/gallium/state_trackers/clover/core/event.cpp 
b/src/gallium/state_trackers/clover/core/event.cpp
index cbb97bf..8b5acd0 100644
--- a/src/gallium/state_trackers/clover/core/event.cpp
+++ b/src/gallium/state_trackers/clover/core/event.cpp
@@ -153,8 +153,7 @@ void
 hard_event::wait() const {
pipe_screen *screen = queue()->dev.pipe;
 
-   if (status() == CL_QUEUED)
-  queue()->flush();
+   queue()->flush();
 
if (!__fence ||
!screen->fence_finish(screen, __fence, PIPE_TIMEOUT_INFINITE))
-- 
1.8.4


Re: [Mesa-dev] [PATCH 3/6] st/clover: Unreference fences as early as possible

2013-09-06 Thread Niels Ole Salscheider

While unreferencing fences as early as possible is not a bad idea, this patch 
hides the underlying problem. That is, events are never deleted from the 
queued_events list of the queue object if their fences are signalled before 
the queue is flushed.
I will send a patch that fixes the problem shortly.

Ole

[Mesa-dev] [PATCH 1/2] st/clover: Clear the complete queue

2013-09-06 Thread Niels Ole Salscheider

Events that are already signalled can be removed from the queue, too.

Signed-off-by: Niels Ole Salscheider <niels_...@salscheider-online.de>
---
 src/gallium/state_trackers/clover/core/queue.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/state_trackers/clover/core/queue.cpp 
b/src/gallium/state_trackers/clover/core/queue.cpp
index 0b1c494..500a636 100644
--- a/src/gallium/state_trackers/clover/core/queue.cpp
+++ b/src/gallium/state_trackers/clover/core/queue.cpp
@@ -56,7 +56,7 @@ _cl_command_queue::flush() {
   pipe->flush(pipe, &fence, 0);
   std::for_each(first, last, [&](event_ptr &ev) { ev->fence(fence); });
   screen->fence_reference(screen, &fence, NULL);
-  queued_events.erase(first, last);
+  queued_events.clear();
}
 }
 
-- 
1.8.4


[Mesa-dev] [PATCH] radeonsi: Do not suspend timer queries

2013-08-28 Thread Niels Ole Salscheider

Signed-off-by: Niels Ole Salscheider <niels_...@salscheider-online.de>
---
 src/gallium/drivers/radeonsi/r600.h|  1 +
 src/gallium/drivers/radeonsi/r600_hw_context.c | 28 ++
 src/gallium/drivers/radeonsi/r600_query.c  |  7 +--
 src/gallium/drivers/radeonsi/radeonsi_pipe.c   |  2 +-
 src/gallium/drivers/radeonsi/radeonsi_pipe.h   |  4 ++--
 src/gallium/drivers/radeonsi/si_state_draw.c   |  2 +-
 6 files changed, 30 insertions(+), 14 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/r600.h 
b/src/gallium/drivers/radeonsi/r600.h
index ce0468d..ac3b2f1 100644
--- a/src/gallium/drivers/radeonsi/r600.h
+++ b/src/gallium/drivers/radeonsi/r600.h
@@ -102,6 +102,7 @@ void si_context_emit_fence(struct r600_context *ctx, struct 
si_resource *fence,
unsigned offset, unsigned value);
 
 void r600_context_draw_opaque_count(struct r600_context *ctx, struct 
r600_so_target *t);
+bool si_is_timer_query(unsigned type);
 bool si_query_needs_begin(unsigned type);
 void si_need_cs_space(struct r600_context *ctx, unsigned num_dw, boolean 
count_draw_in);
 
diff --git a/src/gallium/drivers/radeonsi/r600_hw_context.c 
b/src/gallium/drivers/radeonsi/r600_hw_context.c
index 59b2d70..f050b3b 100644
--- a/src/gallium/drivers/radeonsi/r600_hw_context.c
+++ b/src/gallium/drivers/radeonsi/r600_hw_context.c
@@ -110,6 +110,13 @@ err:
return;
 }
 
+bool si_is_timer_query(unsigned type)
+{
+   return type == PIPE_QUERY_TIME_ELAPSED ||
+   type == PIPE_QUERY_TIMESTAMP ||
+   type == PIPE_QUERY_TIMESTAMP_DISJOINT;
+}
+
 bool si_query_needs_begin(unsigned type)
 {
return type != PIPE_QUERY_TIMESTAMP;
@@ -139,7 +146,7 @@ void si_need_cs_space(struct r600_context *ctx, unsigned 
num_dw,
}
 
/* Count in queries_suspend. */
-   num_dw += ctx->num_cs_dw_queries_suspend;
+   num_dw += ctx->num_cs_dw_nontimer_queries_suspend;
 
/* Count in streamout_end at the end of CS. */
num_dw += ctx-num_cs_dw_streamout_end;
@@ -211,7 +218,7 @@ void si_context_flush(struct r600_context *ctx, unsigned 
flags)
return;
 
/* suspend queries */
-   if (ctx->num_cs_dw_queries_suspend) {
+   if (ctx->num_cs_dw_nontimer_queries_suspend) {
r600_context_queries_suspend(ctx);
queries_suspended = true;
}
@@ -506,7 +513,9 @@ void r600_query_begin(struct r600_context *ctx, struct 
r600_query *query)
cs->buf[cs->cdw++] = PKT3(PKT3_NOP, 0, 0);
cs->buf[cs->cdw++] = r600_context_bo_reloc(ctx, query->buffer, RADEON_USAGE_WRITE);
 
-   ctx->num_cs_dw_queries_suspend += query->num_cs_dw;
+   if (!si_is_timer_query(query->type)) {
+   ctx->num_cs_dw_nontimer_queries_suspend += query->num_cs_dw;
+   }
 }
 
 void r600_query_end(struct r600_context *ctx, struct r600_query *query)
@@ -565,7 +574,10 @@ void r600_query_end(struct r600_context *ctx, struct 
r600_query *query)
cs->buf[cs->cdw++] = r600_context_bo_reloc(ctx, query->buffer, RADEON_USAGE_WRITE);
 
query->results_end = (query->results_end + query->result_size) % query->buffer->b.b.width0;
-   ctx->num_cs_dw_queries_suspend -= query->num_cs_dw;
+
+   if (si_query_needs_begin(query->type) && !si_is_timer_query(query->type)) {
+   ctx->num_cs_dw_nontimer_queries_suspend -= query->num_cs_dw;
+   }
 }
 
 void r600_query_predication(struct r600_context *ctx, struct r600_query 
*query, int operation,
@@ -712,19 +724,19 @@ void r600_context_queries_suspend(struct r600_context 
*ctx)
 {
struct r600_query *query;
 
-   LIST_FOR_EACH_ENTRY(query, &ctx->active_query_list, list) {
+   LIST_FOR_EACH_ENTRY(query, &ctx->active_nontimer_query_list, list) {
r600_query_end(ctx, query);
}
-   assert(ctx->num_cs_dw_queries_suspend == 0);
+   assert(ctx->num_cs_dw_nontimer_queries_suspend == 0);
 }
 
 void r600_context_queries_resume(struct r600_context *ctx)
 {
struct r600_query *query;
 
-   assert(ctx->num_cs_dw_queries_suspend == 0);
+   assert(ctx->num_cs_dw_nontimer_queries_suspend == 0);
 
-   LIST_FOR_EACH_ENTRY(query, &ctx->active_query_list, list) {
+   LIST_FOR_EACH_ENTRY(query, &ctx->active_nontimer_query_list, list) {
r600_query_begin(ctx, query);
}
 }
diff --git a/src/gallium/drivers/radeonsi/r600_query.c 
b/src/gallium/drivers/radeonsi/r600_query.c
index 927577c..aa51e74 100644
--- a/src/gallium/drivers/radeonsi/r600_query.c
+++ b/src/gallium/drivers/radeonsi/r600_query.c
@@ -50,7 +50,10 @@ static void r600_begin_query(struct pipe_context *ctx, 
struct pipe_query *query)
memset(&rquery->result, 0, sizeof(rquery->result));
rquery->results_start = rquery->results_end;
r600_query_begin(rctx, (struct r600_query *)query);
-   LIST_ADDTAIL(&rquery->list, &rctx->active_query_list);
+
+   if (!si_is_timer_query(rquery->type)) {
+   LIST_ADDTAIL

[Mesa-dev] [PATCH 3/6] st/clover: Unreference fences as early as possible

2013-08-09 Thread Niels Ole Salscheider

This makes sure that there are not too many concurrent fences.

Also, simplify status handling by keeping track of the current state.

Signed-off-by: Niels Ole Salscheider <niels_...@salscheider-online.de>
---
 src/gallium/state_trackers/clover/core/event.cpp | 29 +++-
 src/gallium/state_trackers/clover/core/event.hpp | 12 +-
 2 files changed, 24 insertions(+), 17 deletions(-)

diff --git a/src/gallium/state_trackers/clover/core/event.cpp 
b/src/gallium/state_trackers/clover/core/event.cpp
index cbb97bf..13e8130 100644
--- a/src/gallium/state_trackers/clover/core/event.cpp
+++ b/src/gallium/state_trackers/clover/core/event.cpp
@@ -28,7 +28,7 @@ using namespace clover;
 _cl_event::_cl_event(clover::context &ctx,
  std::vector<clover::event *> deps,
  action action_ok, action action_fail) :
-   ctx(ctx), __status(0), wait_count(1),
+   ctx(ctx), __status(CL_QUEUED), wait_count(1),
action_ok(action_ok), action_fail(action_fail) {
for (auto ev : deps)
   ev->chain(this);
@@ -114,6 +114,7 @@ hard_event::trigger() {
  pipe->end_query(pipe, __query_end);
  __ts_submit = screen->get_timestamp(screen);
   }
+  __status = CL_SUBMITTED;
 
   while (!__chain.empty()) {
  __chain.back()->trigger();
@@ -123,20 +124,21 @@ hard_event::trigger() {
 }
 
 cl_int
-hard_event::status() const {
+hard_event::status() {
pipe_screen *screen = queue()->dev.pipe;
 
-   if (__status < 0)
+   if (__status != CL_SUBMITTED)
   return __status;
 
-   else if (!__fence)
-  return CL_QUEUED;
-
-   else if (!screen->fence_signalled(screen, __fence))
+   else if (__fence && !screen->fence_signalled(screen, __fence))
   return CL_SUBMITTED;
 
-   else
+   else {
+  if (__fence)
+ screen->fence_reference(screen, &__fence, NULL);
+  __status = CL_COMPLETE;
   return CL_COMPLETE;
+   }
 }
 
 cl_command_queue
@@ -150,15 +152,20 @@ hard_event::command() const {
 }
 
 void
-hard_event::wait() const {
+hard_event::wait() {
pipe_screen *screen = queue()->dev.pipe;
 
if (status() == CL_QUEUED)
   queue()->flush();
 
+   if (status() == CL_COMPLETE)
+  return;
+
if (!__fence ||
!screen->fence_finish(screen, __fence, PIPE_TIMEOUT_INFINITE))
   throw error(CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST);
+   screen->fence_reference(screen, &__fence, NULL);
+   __status = CL_COMPLETE;
 }
 
 cl_ulong
@@ -231,7 +238,7 @@ soft_event::trigger() {
 }
 
 cl_int
-soft_event::status() const {
+soft_event::status() {
if (__status < 0)
   return __status;
 
@@ -256,7 +263,7 @@ soft_event::command() const {
 }
 
 void
-soft_event::wait() const {
+soft_event::wait() {
for (auto ev : deps)
   ev->wait();
 
diff --git a/src/gallium/state_trackers/clover/core/event.hpp 
b/src/gallium/state_trackers/clover/core/event.hpp
index de92de0..611b233 100644
--- a/src/gallium/state_trackers/clover/core/event.hpp
+++ b/src/gallium/state_trackers/clover/core/event.hpp
@@ -61,10 +61,10 @@ public:
void abort(cl_int status);
bool signalled() const;
 
-   virtual cl_int status() const = 0;
+   virtual cl_int status() = 0;
virtual cl_command_queue queue() const = 0;
virtual cl_command_type command() const = 0;
-   virtual void wait() const = 0;
+   virtual void wait() = 0;
 
clover::context &ctx;
 
@@ -101,10 +101,10 @@ namespace clover {
 
   virtual void trigger();
 
-  virtual cl_int status() const;
+  virtual cl_int status();
   virtual cl_command_queue queue() const;
   virtual cl_command_type command() const;
-  virtual void wait() const;
+  virtual void wait();
 
   cl_ulong ts_queued() const;
   cl_ulong ts_submit() const;
@@ -138,10 +138,10 @@ namespace clover {
 
   virtual void trigger();
 
-  virtual cl_int status() const;
+  virtual cl_int status();
   virtual cl_command_queue queue() const;
   virtual cl_command_type command() const;
-  virtual void wait() const;
+  virtual void wait();
};
 }
 
-- 
1.7.11.7


[Mesa-dev] [PATCH 5/6] radeonsi: copy r600_get_timestamp

2013-08-09 Thread Niels Ole Salscheider

Signed-off-by: Niels Ole Salscheider <niels_...@salscheider-online.de>
---
 src/gallium/drivers/radeonsi/radeonsi_pipe.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/src/gallium/drivers/radeonsi/radeonsi_pipe.c 
b/src/gallium/drivers/radeonsi/radeonsi_pipe.c
index 3ba8232..7ae5598 100644
--- a/src/gallium/drivers/radeonsi/radeonsi_pipe.c
+++ b/src/gallium/drivers/radeonsi/radeonsi_pipe.c
@@ -779,6 +779,14 @@ static int r600_init_tiling(struct r600_screen *rscreen)
return evergreen_interpret_tiling(rscreen, tiling_config);
 }
 
+static uint64_t r600_get_timestamp(struct pipe_screen *screen)
+{
+   struct r600_screen *rscreen = (struct r600_screen*)screen;
+
+   return 1000000 * rscreen->ws->query_value(rscreen->ws, RADEON_TIMESTAMP) /
+   rscreen->info.r600_clock_crystal_freq;
+}
+
 static unsigned radeon_family_from_device(unsigned device)
 {
switch (device) {
@@ -830,6 +838,7 @@ struct pipe_screen *radeonsi_screen_create(struct 
radeon_winsys *ws)
rscreen->screen.get_shader_param = r600_get_shader_param;
rscreen->screen.get_paramf = r600_get_paramf;
rscreen->screen.get_compute_param = r600_get_compute_param;
+   rscreen->screen.get_timestamp = r600_get_timestamp;
rscreen->screen.is_format_supported = si_is_format_supported;
rscreen->screen.context_create = r600_create_context;
rscreen->screen.fence_reference = r600_fence_reference;
-- 
1.7.11.7


[Mesa-dev] [PATCH 4/6] radeonsi: Implement PIPE_QUERY_TIMESTAMP

2013-08-09 Thread Niels Ole Salscheider

Signed-off-by: Niels Ole Salscheider <niels_...@salscheider-online.de>
---
 src/gallium/drivers/radeonsi/r600.h|  1 +
 src/gallium/drivers/radeonsi/r600_hw_context.c | 31 ++
 src/gallium/drivers/radeonsi/r600_query.c  | 14 +++-
 src/gallium/drivers/radeonsi/radeonsi_pipe.c   |  2 +-
 4 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/r600.h 
b/src/gallium/drivers/radeonsi/r600.h
index 8f35cc2..ce0468d 100644
--- a/src/gallium/drivers/radeonsi/r600.h
+++ b/src/gallium/drivers/radeonsi/r600.h
@@ -102,6 +102,7 @@ void si_context_emit_fence(struct r600_context *ctx, struct 
si_resource *fence,
unsigned offset, unsigned value);
 
 void r600_context_draw_opaque_count(struct r600_context *ctx, struct 
r600_so_target *t);
+bool si_query_needs_begin(unsigned type);
 void si_need_cs_space(struct r600_context *ctx, unsigned num_dw, boolean 
count_draw_in);
 
 int si_context_init(struct r600_context *ctx);
diff --git a/src/gallium/drivers/radeonsi/r600_hw_context.c 
b/src/gallium/drivers/radeonsi/r600_hw_context.c
index 25c972b..7de3745 100644
--- a/src/gallium/drivers/radeonsi/r600_hw_context.c
+++ b/src/gallium/drivers/radeonsi/r600_hw_context.c
@@ -110,6 +110,11 @@ err:
return;
 }
 
+bool si_query_needs_begin(unsigned type)
+{
+   return type != PIPE_QUERY_TIMESTAMP;
+}
+
 /* initialize */
 void si_need_cs_space(struct r600_context *ctx, unsigned num_dw,
boolean count_draw_in)
@@ -340,6 +345,12 @@ static boolean r600_query_result(struct r600_context *ctx, 
struct r600_query *qu
results_base = (results_base + 16) % query->buffer->b.b.width0;
}
break;
+   case PIPE_QUERY_TIMESTAMP:
+   {
+   uint32_t *current_result = (uint32_t*)map;
+   query->result.u64 = (uint64_t)current_result[0] | (uint64_t)current_result[1] << 32;
+   break;
+   }
case PIPE_QUERY_TIME_ELAPSED:
while (results_base != query->results_end) {
query->result.u64 +=
@@ -485,6 +496,19 @@ void r600_query_end(struct r600_context *ctx, struct 
r600_query *query)
 {
struct radeon_winsys_cs *cs = ctx->cs;
uint64_t va;
+   unsigned new_results_end;
+
+   /* The queries which need begin already called this in begin_query. */
+   if (!si_query_needs_begin(query->type)) {
+   si_need_cs_space(ctx, query->num_cs_dw, TRUE);
+
+   new_results_end = (query->results_end + query->result_size) % query->buffer->b.b.width0;
+
+   /* collect current results if query buffer is full */
+   if (new_results_end == query->results_start) {
+   r600_query_result(ctx, query, TRUE);
+   }
+   }
 
va = r600_resource_va(&ctx->screen->screen, (void*)query->buffer);
/* emit end query */
@@ -508,6 +532,8 @@ void r600_query_end(struct r600_context *ctx, struct 
r600_query *query)
break;
case PIPE_QUERY_TIME_ELAPSED:
va += query->results_end + query->result_size/2;
+   /* fall through */
+   case PIPE_QUERY_TIMESTAMP:
cs->buf[cs->cdw++] = PKT3(PKT3_EVENT_WRITE_EOP, 4, 0);
cs->buf[cs->cdw++] = EVENT_TYPE(EVENT_TYPE_CACHE_FLUSH_AND_INV_TS_EVENT) | EVENT_INDEX(5);
cs->buf[cs->cdw++] = va;
@@ -585,6 +611,10 @@ struct r600_query *r600_context_query_create(struct 
r600_context *ctx, unsigned
query->result_size = 16 * ctx->max_db;
query->num_cs_dw = 6;
break;
+   case PIPE_QUERY_TIMESTAMP:
+   query->result_size = 8;
+   query->num_cs_dw = 8;
+   break;
case PIPE_QUERY_TIME_ELAPSED:
query->result_size = 16;
query->num_cs_dw = 8;
@@ -648,6 +678,7 @@ boolean r600_context_query_result(struct r600_context *ctx,
case PIPE_QUERY_SO_OVERFLOW_PREDICATE:
*result_b = query->result.b;
break;
+   case PIPE_QUERY_TIMESTAMP:
case PIPE_QUERY_TIME_ELAPSED:
*result_u64 = (100 * query->result.u64) / 
ctx->screen->info.r600_clock_crystal_freq;
break;
diff --git a/src/gallium/drivers/radeonsi/r600_query.c 
b/src/gallium/drivers/radeonsi/r600_query.c
index 0162cce..927577c 100644
--- a/src/gallium/drivers/radeonsi/r600_query.c
+++ b/src/gallium/drivers/radeonsi/r600_query.c
@@ -42,6 +42,11 @@ static void r600_begin_query(struct pipe_context *ctx, 
struct pipe_query *query)
struct r600_context *rctx = (struct r600_context *)ctx;
struct r600_query *rquery = (struct r600_query *)query;
 
+   if (!si_query_needs_begin(rquery->type)) {
+   assert(0);
+   return;
+   }
+
memset(&rquery->result, 0, sizeof(rquery->result));
rquery->results_start
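
The TIMESTAMP handling in the hunks above does two small pieces of arithmetic: it assembles a 64-bit counter value from two 32-bit words, and it checks whether advancing the end offset of the result ring buffer would reach the start offset. The following is a minimal sketch of both steps in isolation. The function names are hypothetical and exist only for illustration; they are not part of the driver.

```cpp
#include <cstdint>

// Combine the low and high 32-bit words into one 64-bit value.
// Each word is widened first so that the shift is done in 64 bits.
std::uint64_t combine_timestamp(const std::uint32_t words[2]) {
    return static_cast<std::uint64_t>(words[0]) |
           (static_cast<std::uint64_t>(words[1]) << 32);
}

// Hypothetical helper: true when writing one more result of result_size
// bytes would make the end offset wrap around onto the start offset,
// i.e. the ring buffer has no free slot left.
bool ring_full_after_write(unsigned start, unsigned end,
                           unsigned result_size, unsigned buffer_size) {
    unsigned new_end = (end + result_size) % buffer_size;
    return new_end == start;
}
```

In the patch, the full-buffer case triggers a call that collects the pending results before the next one is written.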

[Mesa-dev] [PATCH 2/6] st/clover: Add event to deps even if it has been triggered

2013-08-09 Thread Niels Ole Salscheider

The command is submitted once the event has been triggered, but it might not
have completed yet. Therefore, we have to add it to deps in order to wait on it.

Signed-off-by: Niels Ole Salscheider <niels_...@salscheider-online.de>
---
 src/gallium/state_trackers/clover/core/event.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/state_trackers/clover/core/event.cpp 
b/src/gallium/state_trackers/clover/core/event.cpp
index de21f0c..cbb97bf 100644
--- a/src/gallium/state_trackers/clover/core/event.cpp
+++ b/src/gallium/state_trackers/clover/core/event.cpp
@@ -58,8 +58,8 @@ _cl_event::chain(clover::event *ev) {
if (wait_count) {
   ev->wait_count++;
   __chain.push_back(ev);
-  ev->deps.push_back(this);
}
+   ev->deps.push_back(this);
 }
 
 hard_event::hard_event(clover::command_queue &q, cl_command_type command,
-- 
1.7.11.7
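
The effect of this change can be shown with a toy model, assuming a much simplified event type. `toy_event` and its members are hypothetical stand-ins, not the clover classes. An event whose wait count is already zero has been submitted but may still be running, so the chained event must record it as a dependency in either case.

```cpp
#include <vector>

// Hypothetical, simplified stand-in for an event with a wait counter.
struct toy_event {
    unsigned wait_count;               // 0 means "already triggered"
    std::vector<toy_event *> chained;  // events to trigger after this one
    std::vector<toy_event *> deps;     // events this one has to wait on

    explicit toy_event(unsigned pending) : wait_count(pending) {}

    void chain(toy_event *ev) {
        if (wait_count) {
            // Not triggered yet: ev must be triggered by us later.
            ev->wait_count++;
            chained.push_back(ev);
        }
        // Recorded unconditionally: a triggered event was submitted,
        // but it is not necessarily complete.
        ev->deps.push_back(this);
    }
};
```

With the pre-patch placement of the `push_back`, the dependency list of `ev` would stay empty for an already triggered event.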

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 6/6] radeonsi: Handle additional PIPE_COMPUTE_CAP_*

2013-08-09 Thread Niels Ole Salscheider

This patch adds support for:
PIPE_COMPUTE_CAP_MAX_INPUT_SIZE
PIPE_COMPUTE_CAP_MAX_LOCAL_SIZE

Return the values reported by the closed source driver for now.

Signed-off-by: Niels Ole Salscheider <niels_...@salscheider-online.de>
---
 src/gallium/drivers/radeonsi/radeonsi_pipe.c | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeonsi/radeonsi_pipe.c 
b/src/gallium/drivers/radeonsi/radeonsi_pipe.c
index 7ae5598..47f5191 100644
--- a/src/gallium/drivers/radeonsi/radeonsi_pipe.c
+++ b/src/gallium/drivers/radeonsi/radeonsi_pipe.c
@@ -602,7 +602,20 @@ static int r600_get_compute_param(struct pipe_screen 
*screen,
*max_global_size = 20;
}
return sizeof(uint64_t);
-
+   case PIPE_COMPUTE_CAP_MAX_LOCAL_SIZE:
+   if (ret) {
+   uint64_t *max_local_size = ret;
+   /* Value reported by the closed source driver. */
+   *max_local_size = 32768;
+   }
+   return sizeof(uint64_t);
+   case PIPE_COMPUTE_CAP_MAX_INPUT_SIZE:
+   if (ret) {
+   uint64_t *max_input_size = ret;
+   /* Value reported by the closed source driver. */
+   *max_input_size = 1024;
+   }
+   return sizeof(uint64_t);
case PIPE_COMPUTE_CAP_MAX_MEM_ALLOC_SIZE:
if (ret) {
uint64_t max_global_size;
-- 
1.7.11.7
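
The new cases follow the calling convention visible in the diff: the function always returns the size of the value, and writes the value only when `ret` is non-NULL. A minimal sketch of that convention follows, assuming a hypothetical enum and function name. The two constants are the ones from the patch; everything else is illustrative.

```cpp
#include <cstdint>

// Hypothetical capability identifiers, for illustration only.
enum toy_compute_cap { TOY_CAP_MAX_LOCAL_SIZE, TOY_CAP_MAX_INPUT_SIZE };

// Returns the size in bytes of the value for `cap`; stores the value
// through `ret` only if the caller passed a buffer.
int toy_get_compute_param(toy_compute_cap cap, void *ret) {
    switch (cap) {
    case TOY_CAP_MAX_LOCAL_SIZE:
        if (ret)
            *static_cast<std::uint64_t *>(ret) = 32768;
        return static_cast<int>(sizeof(std::uint64_t));
    case TOY_CAP_MAX_INPUT_SIZE:
        if (ret)
            *static_cast<std::uint64_t *>(ret) = 1024;
        return static_cast<int>(sizeof(std::uint64_t));
    }
    return 0;  // unknown capability
}
```

A caller can first pass a null pointer to learn the required size, then call again with a buffer of that size.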


[Mesa-dev] [PATCH 1/6] st/clover: Profiling support

2013-08-09 Thread Niels Ole Salscheider

Signed-off-by: Niels Ole Salscheider <niels_...@salscheider-online.de>
---
 src/gallium/state_trackers/clover/api/event.cpp  |  26 -
 src/gallium/state_trackers/clover/core/event.cpp | 116 ---
 src/gallium/state_trackers/clover/core/event.hpp |  18 +++-
 3 files changed, 142 insertions(+), 18 deletions(-)

diff --git a/src/gallium/state_trackers/clover/api/event.cpp 
b/src/gallium/state_trackers/clover/api/event.cpp
index 39a647b..ea1576c 100644
--- a/src/gallium/state_trackers/clover/api/event.cpp
+++ b/src/gallium/state_trackers/clover/api/event.cpp
@@ -217,7 +217,31 @@ clEnqueueWaitForEvents(cl_command_queue q, cl_uint num_evs,
 PUBLIC cl_int
 clGetEventProfilingInfo(cl_event ev, cl_profiling_info param,
 size_t size, void *buf, size_t *size_ret) {
-   return CL_PROFILING_INFO_NOT_AVAILABLE;
+   hard_event *hev = dynamic_cast<hard_event *>(ev);
+   soft_event *sev = dynamic_cast<soft_event *>(ev);
+
+   if (!hev && !sev)
+  return CL_INVALID_EVENT;
+   if (!hev || !(hev->queue()->props() & CL_QUEUE_PROFILING_ENABLE) ||
+   hev->status() != CL_COMPLETE)
+  return CL_PROFILING_INFO_NOT_AVAILABLE;
+
+   switch (param) {
+   case CL_PROFILING_COMMAND_QUEUED:
+  return scalar_property<cl_ulong>(buf, size, size_ret, hev->ts_queued());
+
+   case CL_PROFILING_COMMAND_SUBMIT:
+  return scalar_property<cl_ulong>(buf, size, size_ret, hev->ts_submit());
+
+   case CL_PROFILING_COMMAND_START:
+  return scalar_property<cl_ulong>(buf, size, size_ret, hev->ts_start());
+
+   case CL_PROFILING_COMMAND_END:
+  return scalar_property<cl_ulong>(buf, size, size_ret, hev->ts_end());
+
+   default:
+  return CL_INVALID_VALUE;
+   }
 }
 
 PUBLIC cl_int
diff --git a/src/gallium/state_trackers/clover/core/event.cpp 
b/src/gallium/state_trackers/clover/core/event.cpp
index 93d3b58..de21f0c 100644
--- a/src/gallium/state_trackers/clover/core/event.cpp
+++ b/src/gallium/state_trackers/clover/core/event.cpp
@@ -38,18 +38,6 @@ _cl_event::~_cl_event() {
 }
 
 void
-_cl_event::trigger() {
-   if (!--wait_count) {
-  action_ok(*this);
-
-  while (!__chain.empty()) {
- __chain.back()->trigger();
- __chain.pop_back();
-  }
-   }
-}
-
-void
 _cl_event::abort(cl_int status) {
__status = status;
action_fail(*this);
@@ -77,14 +65,61 @@ _cl_event::chain(clover::event *ev) {
 hard_event::hard_event(clover::command_queue &q, cl_command_type command,
std::vector<clover::event *> deps, action action) :
_cl_event(q.ctx, deps, action, [](event &ev){}),
-   __queue(q), __command(command), __fence(NULL) {
+   __queue(q), __command(command), __fence(NULL),
+   __query_start(NULL), __query_end(NULL) {
q.sequence(this);
+
+   if(q.props() & CL_QUEUE_PROFILING_ENABLE) {
+  pipe_screen *screen = q.dev.pipe;
+  __ts_queued = screen->get_timestamp(screen);
+   }
+
trigger();
 }
 
 hard_event::~hard_event() {
pipe_screen *screen = queue()->dev.pipe;
+   pipe_context *pipe = queue()->pipe;
screen->fence_reference(screen, &__fence, NULL);
+
+   if(__query_start) {
+  pipe->destroy_query(pipe, __query_start);
+  __query_start = 0;
+   }
+
+   if(__query_end) {
+  pipe->destroy_query(pipe, __query_end);
+  __query_end = 0;
+   }
+}
+
+void
+hard_event::trigger() {
+   if (!--wait_count) {
+   /* XXX: Currently, a timestamp query gives wrong results for memory
+* transfers. This is because we use memcpy instead of the DMA engines. */
+
+  if(queue()->props() & CL_QUEUE_PROFILING_ENABLE) {
+ pipe_context *pipe = queue()->pipe;
+ __query_start = pipe->create_query(pipe, PIPE_QUERY_TIMESTAMP);
+ pipe->end_query(queue()->pipe, __query_start);
+  }
+
+  action_ok(*this);
+
+  if(queue()->props() & CL_QUEUE_PROFILING_ENABLE) {
+ pipe_context *pipe = queue()->pipe;
+ pipe_screen *screen = queue()->dev.pipe;
+ __query_end = pipe->create_query(pipe, PIPE_QUERY_TIMESTAMP);
+ pipe->end_query(pipe, __query_end);
+ __ts_submit = screen->get_timestamp(screen);
+  }
+
+  while (!__chain.empty()) {
+ __chain.back()->trigger();
+ __chain.pop_back();
+  }
+   }
 }
 
 cl_int
@@ -126,6 +161,49 @@ hard_event::wait() const {
   throw error(CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST);
 }
 
+cl_ulong
+hard_event::ts_queued() const {
+   return __ts_queued;
+}
+
+cl_ulong
+hard_event::ts_submit() const {
+   return __ts_submit;
+}
+
+cl_ulong
+hard_event::ts_start() {
+   get_query_results();
+   return __ts_start;
+}
+
+cl_ulong
+hard_event::ts_end() {
+   get_query_results();
+   return __ts_end;
+}
+
+void
+hard_event::get_query_results() {
+   pipe_context *pipe = queue()->pipe;
+
+   if(__query_start) {
+  pipe_query_result result;
+  pipe->get_query_result(pipe, __query_start, true, &result);
+  __ts_start = result.u64;
+  pipe->destroy_query(pipe, __query_start);
+  __query_start = 0

[Mesa-dev] [PATCH 1/2] R600/SI: Add FMA pattern

2013-08-09 Thread Niels Ole Salscheider

---
 lib/Target/R600/SIInstructions.td |  8 ++--
 test/CodeGen/R600/fma.ll  | 31 +++
 2 files changed, 37 insertions(+), 2 deletions(-)
 create mode 100644 test/CodeGen/R600/fma.ll

diff --git a/lib/Target/R600/SIInstructions.td 
b/lib/Target/R600/SIInstructions.td
index dc41885..dc14609 100644
--- a/lib/Target/R600/SIInstructions.td
+++ b/lib/Target/R600/SIInstructions.td
@@ -1007,8 +1007,12 @@ def V_BFE_U32 : VOP3_32 <0x0148, "V_BFE_U32", []>;
 def V_BFE_I32 : VOP3_32 <0x0149, "V_BFE_I32", []>;
 def V_BFI_B32 : VOP3_32 <0x014a, "V_BFI_B32", []>;
 defm : BFIPatterns <V_BFI_B32>;
-def V_FMA_F32 : VOP3_32 <0x014b, "V_FMA_F32", []>;
-def V_FMA_F64 : VOP3_64 <0x014c, "V_FMA_F64", []>;
+def V_FMA_F32 : VOP3_32 <0x014b, "V_FMA_F32",
+  [(set f32:$dst, (fma f32:$src0, f32:$src1, f32:$src2))]
+>;
+def V_FMA_F64 : VOP3_64 <0x014c, "V_FMA_F64",
+  [(set f64:$dst, (fma f64:$src0, f64:$src1, f64:$src2))]
+>;
 //def V_LERP_U8 : VOP3_U8 <0x014d, "V_LERP_U8", []>;
 def V_ALIGNBIT_B32 : VOP3_32 <0x014e, "V_ALIGNBIT_B32", []>;
 def : ROTRPattern <V_ALIGNBIT_B32>;
diff --git a/test/CodeGen/R600/fma.ll b/test/CodeGen/R600/fma.ll
new file mode 100644
index 0000000..afef970
--- /dev/null
+++ b/test/CodeGen/R600/fma.ll
@@ -0,0 +1,31 @@
+; RUN: llc < %s -march=r600 -mcpu=SI | FileCheck %s
+
+; CHECK: @fma_f32
+; CHECK: V_FMA_F32 {{VGPR[0-9]+, VGPR[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
+
+define void @fma_f32(float addrspace(1)* %out, float addrspace(1)* %in1,
+ float addrspace(1)* %in2, float addrspace(1)* %in3) {
+   %r0 = load float addrspace(1)* %in1
+   %r1 = load float addrspace(1)* %in2
+   %r2 = load float addrspace(1)* %in3
+   %r3 = tail call float @llvm.fma.f32(float %r0, float %r1, float %r2)
+   store float %r3, float addrspace(1)* %out
+   ret void
+}
+
+declare float @llvm.fma.f32(float, float, float)
+
+; CHECK: @fma_f64
+; CHECK: V_FMA_F64 {{VGPR[0-9]+_VGPR[0-9]+, VGPR[0-9]+_VGPR[0-9]+, 
VGPR[0-9]+_VGPR[0-9]+, VGPR[0-9]+_VGPR[0-9]+}}
+
+define void @fma_f64(double addrspace(1)* %out, double addrspace(1)* %in1,
+ double addrspace(1)* %in2, double addrspace(1)* %in3) {
+   %r0 = load double addrspace(1)* %in1
+   %r1 = load double addrspace(1)* %in2
+   %r2 = load double addrspace(1)* %in3
+   %r3 = tail call double @llvm.fma.f64(double %r0, double %r1, double %r2)
+   store double %r3, double addrspace(1)* %out
+   ret void
+}
+
+declare double @llvm.fma.f64(double, double, double)
-- 
1.7.11.7


[Mesa-dev] [PATCH 2/2] R600/SI: FMA is faster than fmul and fadd for f64

2013-08-09 Thread Niels Ole Salscheider

---
 lib/Target/R600/SIISelLowering.cpp | 18 ++
 lib/Target/R600/SIISelLowering.h   |  1 +
 test/CodeGen/R600/fmuladd.ll   | 31 +++
 3 files changed, 50 insertions(+)
 create mode 100644 test/CodeGen/R600/fmuladd.ll

diff --git a/lib/Target/R600/SIISelLowering.cpp 
b/lib/Target/R600/SIISelLowering.cpp
index b714fc1..a76e6ee 100644
--- a/lib/Target/R600/SIISelLowering.cpp
+++ b/lib/Target/R600/SIISelLowering.cpp
@@ -338,6 +338,24 @@ MVT SITargetLowering::getScalarShiftAmountTy(EVT VT) const 
{
   return MVT::i32;
 }
 
+bool SITargetLowering::isFMAFasterThanFMulAndFAdd(EVT VT) const {
+  VT = VT.getScalarType();
+
+  if (!VT.isSimple())
+return false;
+
+  switch (VT.getSimpleVT().SimpleTy) {
+  case MVT::f32:
+return false; /* There is V_MAD_F32 for f32 */
+  case MVT::f64:
+return true;
+  default:
+break;
+  }
+
+  return false;
+}
+
 
//===--===//
 // Custom DAG Lowering Operations
 
//===--===//
diff --git a/lib/Target/R600/SIISelLowering.h b/lib/Target/R600/SIISelLowering.h
index b4202c4..effbf1f 100644
--- a/lib/Target/R600/SIISelLowering.h
+++ b/lib/Target/R600/SIISelLowering.h
@@ -55,6 +55,7 @@ public:
   MachineBasicBlock * BB) const;
   virtual EVT getSetCCResultType(LLVMContext &Context, EVT VT) const;
   virtual MVT getScalarShiftAmountTy(EVT VT) const;
+  virtual bool isFMAFasterThanFMulAndFAdd(EVT VT) const;
   virtual SDValue LowerOperation(SDValue Op, SelectionDAG &DAG) const;
   virtual SDValue PerformDAGCombine(SDNode *N, DAGCombinerInfo &DCI) const;
   virtual SDNode *PostISelFolding(MachineSDNode *N, SelectionDAG &DAG) const;
diff --git a/test/CodeGen/R600/fmuladd.ll b/test/CodeGen/R600/fmuladd.ll
new file mode 100644
index 0000000..ac379f4
--- /dev/null
+++ b/test/CodeGen/R600/fmuladd.ll
@@ -0,0 +1,31 @@
+; RUN: llc < %s -march=r600 -mcpu=SI | FileCheck %s
+
+; CHECK: @fmuladd_f32
+; CHECK: V_MAD_F32 {{VGPR[0-9]+, VGPR[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
+
+define void @fmuladd_f32(float addrspace(1)* %out, float addrspace(1)* %in1,
+ float addrspace(1)* %in2, float addrspace(1)* %in3) {
+   %r0 = load float addrspace(1)* %in1
+   %r1 = load float addrspace(1)* %in2
+   %r2 = load float addrspace(1)* %in3
+   %r3 = tail call float @llvm.fmuladd.f32(float %r0, float %r1, float %r2)
+   store float %r3, float addrspace(1)* %out
+   ret void
+}
+
+declare float @llvm.fmuladd.f32(float, float, float)
+
+; CHECK: @fmuladd_f64
+; CHECK: V_FMA_F64 {{VGPR[0-9]+_VGPR[0-9]+, VGPR[0-9]+_VGPR[0-9]+, 
VGPR[0-9]+_VGPR[0-9]+, VGPR[0-9]+_VGPR[0-9]+}}
+
+define void @fmuladd_f64(double addrspace(1)* %out, double addrspace(1)* %in1,
+ double addrspace(1)* %in2, double addrspace(1)* %in3) 
{
+   %r0 = load double addrspace(1)* %in1
+   %r1 = load double addrspace(1)* %in2
+   %r2 = load double addrspace(1)* %in3
+   %r3 = tail call double @llvm.fmuladd.f64(double %r0, double %r1, double %r2)
+   store double %r3, double addrspace(1)* %out
+   ret void
+}
+
+declare double @llvm.fmuladd.f64(double, double, double)
-- 
1.7.11.7
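
The tests above expect `llvm.fmuladd` to be lowered to V_MAD_F32 for f32 and to the fused V_FMA_F64 for f64. A fused multiply-add rounds once, while a separate multiply and add round twice, so the two forms can give different results. The following host-side sketch illustrates that difference with `std::fma`. The helper names are hypothetical, and the code says nothing about how the GPU instructions themselves round.

```cpp
#include <cmath>

// Fused: a * b + c is computed with a single rounding at the end.
double fused_mul_add(double a, double b, double c) {
    return std::fma(a, b, c);
}

// Unfused: the product is rounded to double before the addition.
// The volatile store keeps the compiler from contracting the two
// operations into one fused instruction.
double separate_mul_add(double a, double b, double c) {
    volatile double product = a * b;
    return product + c;
}
```

For example, with a = 1 + 2^-27, b = 1 - 2^-27 and c = -1, the exact product 1 - 2^-54 rounds to 1, so the unfused form yields 0 while the fused form yields -2^-54.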


[Mesa-dev] [PATCH] clover: Fix linkage of libOpenCL

2013-08-07 Thread Niels Ole Salscheider

Clover needs the 'option' component of LLVM.

Signed-off-by: Niels Ole Salscheider <niels_...@salscheider-online.de>
---
 configure.ac | 4 
 1 file changed, 4 insertions(+)

diff --git a/configure.ac b/configure.ac
index 62d06e0..0dcd2a5 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1617,6 +1617,10 @@ if test x$enable_gallium_llvm = xyes; then
 if $LLVM_CONFIG --components | grep -qw 'irreader'; then
 LLVM_COMPONENTS="${LLVM_COMPONENTS} irreader"
 fi
+# LLVM 3.4 requires Option
+if $LLVM_CONFIG --components | grep -qw 'option'; then
+LLVM_COMPONENTS="${LLVM_COMPONENTS} option"
+fi
 fi
 DEFINES="${DEFINES} -DHAVE_LLVM=0x0$LLVM_VERSION_INT"
 MESA_LLVM=1
-- 
1.7.11.7


[Mesa-dev] [PATCH 1/2] R600/SI: Implement sint-fp64 conversions

2013-08-07 Thread Niels Ole Salscheider

---
 lib/Target/R600/SIInstrInfo.td| 6 ++
 lib/Target/R600/SIInstructions.td | 8 ++--
 test/CodeGen/R600/fp64_to_sint.ll | 9 +
 test/CodeGen/R600/sint_to_fp64.ll | 9 +
 4 files changed, 30 insertions(+), 2 deletions(-)
 create mode 100644 test/CodeGen/R600/fp64_to_sint.ll
 create mode 100644 test/CodeGen/R600/sint_to_fp64.ll

diff --git a/lib/Target/R600/SIInstrInfo.td b/lib/Target/R600/SIInstrInfo.td
index 52af79c..302fa24 100644
--- a/lib/Target/R600/SIInstrInfo.td
+++ b/lib/Target/R600/SIInstrInfo.td
@@ -184,6 +184,12 @@ multiclass VOP1_32 <bits<8> op, string opName, list<dag> 
pattern>
 multiclass VOP1_64 <bits<8> op, string opName, list<dag> pattern>
   : VOP1_Helper <op, VReg_64, VSrc_64, opName, pattern>;
 
+multiclass VOP1_32_64 <bits<8> op, string opName, list<dag> pattern>
+  : VOP1_Helper <op, VReg_32, VSrc_64, opName, pattern>;
+
+multiclass VOP1_64_32 <bits<8> op, string opName, list<dag> pattern>
+  : VOP1_Helper <op, VReg_64, VSrc_32, opName, pattern>;
+
 multiclass VOP2_Helper <bits<6> op, RegisterClass vrc, RegisterClass arc,
 string opName, list<dag> pattern, string revOp> {
   def _e32 : VOP2 <
diff --git a/lib/Target/R600/SIInstructions.td 
b/lib/Target/R600/SIInstructions.td
index 500d15e..efe7a3e 100644
--- a/lib/Target/R600/SIInstructions.td
+++ b/lib/Target/R600/SIInstructions.td
@@ -603,8 +603,12 @@ defm V_MOV_B32 : VOP1_32 <0x0001, "V_MOV_B32", []>;
 } // End neverHasSideEffects = 1, isMoveImm = 1
 
 defm V_READFIRSTLANE_B32 : VOP1_32 <0x0002, "V_READFIRSTLANE_B32", []>;
-//defm V_CVT_I32_F64 : VOP1_32 <0x0003, "V_CVT_I32_F64", []>;
-//defm V_CVT_F64_I32 : VOP1_64 <0x0004, "V_CVT_F64_I32", []>;
+defm V_CVT_I32_F64 : VOP1_32_64 <0x0003, "V_CVT_I32_F64",
+  [(set i32:$dst, (fp_to_sint f64:$src0))]
+>;
+defm V_CVT_F64_I32 : VOP1_64_32 <0x0004, "V_CVT_F64_I32",
+  [(set f64:$dst, (sint_to_fp i32:$src0))]
+>;
 defm V_CVT_F32_I32 : VOP1_32 <0x0005, "V_CVT_F32_I32",
   [(set f32:$dst, (sint_to_fp i32:$src0))]
 >;
diff --git a/test/CodeGen/R600/fp64_to_sint.ll 
b/test/CodeGen/R600/fp64_to_sint.ll
new file mode 100644
index 0000000..42f9f34
--- /dev/null
+++ b/test/CodeGen/R600/fp64_to_sint.ll
@@ -0,0 +1,9 @@
+; RUN: llc < %s -march=r600 -mcpu=SI | FileCheck %s --check-prefix=CHECK
+
+; CHECK: @fp64_to_sint
+; CHECK: V_CVT_I32_F64_e32
+define void @fp64_to_sint(i32 addrspace(1)* %out, double %in) {
+  %result = fptosi double %in to i32
+  store i32 %result, i32 addrspace(1)* %out
+  ret void
+}
diff --git a/test/CodeGen/R600/sint_to_fp64.ll 
b/test/CodeGen/R600/sint_to_fp64.ll
new file mode 100644
index 0000000..37f67c9
--- /dev/null
+++ b/test/CodeGen/R600/sint_to_fp64.ll
@@ -0,0 +1,9 @@
+; RUN: llc < %s -march=r600 -mcpu=SI | FileCheck %s --check-prefix=CHECK
+
+; CHECK: @sint_to_fp64
+; CHECK: V_CVT_F64_I32_e32
+define void @sint_to_fp64(double addrspace(1)* %out, i32 %in) {
+  %result = sitofp i32 %in to double
+  store double %result, double addrspace(1)* %out
+  ret void
+}
-- 
1.7.11.7


[Mesa-dev] [PATCH 2/2] R600/SI: Implement fp32-fp64 conversions

2013-08-07 Thread Niels Ole Salscheider

---
 lib/Target/R600/SIISelLowering.cpp | 3 +++
 lib/Target/R600/SIInstructions.td  | 8 ++--
 test/CodeGen/R600/fpext.ll | 9 +
 test/CodeGen/R600/fptrunc.ll   | 9 +
 4 files changed, 27 insertions(+), 2 deletions(-)
 create mode 100644 test/CodeGen/R600/fpext.ll
 create mode 100644 test/CodeGen/R600/fptrunc.ll

diff --git a/lib/Target/R600/SIISelLowering.cpp 
b/lib/Target/R600/SIISelLowering.cpp
index c64027f..b714fc1 100644
--- a/lib/Target/R600/SIISelLowering.cpp
+++ b/lib/Target/R600/SIISelLowering.cpp
@@ -85,6 +85,9 @@ SITargetLowering::SITargetLowering(TargetMachine TM) :
 
   setLoadExtAction(ISD::SEXTLOAD, MVT::i32, Expand);
 
+  setLoadExtAction(ISD::EXTLOAD, MVT::f32, Expand);
+  setTruncStoreAction(MVT::f64, MVT::f32, Expand);
+
   setOperationAction(ISD::GlobalAddress, MVT::i64, Custom);
 
   setTargetDAGCombine(ISD::SELECT_CC);
diff --git a/lib/Target/R600/SIInstructions.td 
b/lib/Target/R600/SIInstructions.td
index efe7a3e..dc41885 100644
--- a/lib/Target/R600/SIInstructions.td
+++ b/lib/Target/R600/SIInstructions.td
@@ -625,8 +625,12 @@ defm V_MOV_FED_B32 : VOP1_32 <0x0009, "V_MOV_FED_B32", 
[]>;
 //defm V_CVT_RPI_I32_F32 : VOP1_32 <0x000c, "V_CVT_RPI_I32_F32", []>;
 //defm V_CVT_FLR_I32_F32 : VOP1_32 <0x000d, "V_CVT_FLR_I32_F32", []>;
 //defm V_CVT_OFF_F32_I4 : VOP1_32 <0x000e, "V_CVT_OFF_F32_I4", []>;
-//defm V_CVT_F32_F64 : VOP1_32 <0x000f, "V_CVT_F32_F64", []>;
-//defm V_CVT_F64_F32 : VOP1_64 <0x0010, "V_CVT_F64_F32", []>;
+defm V_CVT_F32_F64 : VOP1_32_64 <0x000f, "V_CVT_F32_F64",
+  [(set f32:$dst, (fround f64:$src0))]
+>;
+defm V_CVT_F64_F32 : VOP1_64_32 <0x0010, "V_CVT_F64_F32",
+  [(set f64:$dst, (fextend f32:$src0))]
+>;
 //defm V_CVT_F32_UBYTE0 : VOP1_32 <0x0011, "V_CVT_F32_UBYTE0", []>;
 //defm V_CVT_F32_UBYTE1 : VOP1_32 <0x0012, "V_CVT_F32_UBYTE1", []>;
 //defm V_CVT_F32_UBYTE2 : VOP1_32 <0x0013, "V_CVT_F32_UBYTE2", []>;
diff --git a/test/CodeGen/R600/fpext.ll b/test/CodeGen/R600/fpext.ll
new file mode 100644
index 0000000..e02c19c
--- /dev/null
+++ b/test/CodeGen/R600/fpext.ll
@@ -0,0 +1,9 @@
+; RUN: llc < %s -march=r600 -mcpu=SI | FileCheck %s --check-prefix=CHECK
+
+; CHECK: @fpext
+; CHECK: V_CVT_F64_F32_e32
+define void @fpext(double addrspace(1)* %out, float %in) {
+  %result = fpext float %in to double
+  store double %result, double addrspace(1)* %out
+  ret void
+}
diff --git a/test/CodeGen/R600/fptrunc.ll b/test/CodeGen/R600/fptrunc.ll
new file mode 100644
index 0000000..2a10f63
--- /dev/null
+++ b/test/CodeGen/R600/fptrunc.ll
@@ -0,0 +1,9 @@
+; RUN: llc < %s -march=r600 -mcpu=SI | FileCheck %s --check-prefix=CHECK
+
+; CHECK: @fptrunc
+; CHECK: V_CVT_F32_F64_e32
+define void @fptrunc(float addrspace(1)* %out, double %in) {
+  %result = fptrunc double %in to float
+  store float %result, float addrspace(1)* %out
+  ret void
+}
-- 
1.7.11.7


Re: [Mesa-dev] R600/SI: Initial double precision support for Radeon SI

2013-07-09 Thread Niels Ole Salscheider

Hi Tom,

> All these patches look good to me, but #2 and #6 should have a test case
> with them.  If you resubmit these patches with test cases, I will push the
> entire series.

I have attached an updated patchset. I have added a test case to patches #2 
and #6. I have also replaced the scalar move in patch #2 with a vector move, 
since there is probably no point in having a floating-point value in a scalar 
register.

Kind regards,

Ole

From 4224b314cf2d97cdf2ac99564d6155fa04fbb971 Mon Sep 17 00:00:00 2001
From: Niels Ole Salscheider <niels_...@salscheider-online.de>
Date: Sat, 1 Jun 2013 16:48:56 +0200
Subject: [PATCH 1/6] R600/SI: Add initial double precision support for SI

---
 lib/Target/R600/AMDGPUISelLowering.cpp |  6 ++
 lib/Target/R600/SIISelLowering.cpp |  1 +
 lib/Target/R600/SIInstructions.td  | 30 +-
 test/CodeGen/R600/fadd64.ll| 13 +
 test/CodeGen/R600/fdiv64.ll| 14 ++
 test/CodeGen/R600/fmul64.ll| 13 +
 test/CodeGen/R600/load64.ll| 20 
 7 files changed, 96 insertions(+), 1 deletion(-)
 create mode 100644 test/CodeGen/R600/fadd64.ll
 create mode 100644 test/CodeGen/R600/fdiv64.ll
 create mode 100644 test/CodeGen/R600/fmul64.ll
 create mode 100644 test/CodeGen/R600/load64.ll

diff --git a/lib/Target/R600/AMDGPUISelLowering.cpp b/lib/Target/R600/AMDGPUISelLowering.cpp
index 4019a1f..5f3d496 100644
--- a/lib/Target/R600/AMDGPUISelLowering.cpp
+++ b/lib/Target/R600/AMDGPUISelLowering.cpp
@@ -60,12 +60,18 @@ AMDGPUTargetLowering::AMDGPUTargetLowering(TargetMachine TM) :
   setOperationAction(ISD::STORE, MVT::v4f32, Promote);
   AddPromotedToType(ISD::STORE, MVT::v4f32, MVT::v4i32);
 
+  setOperationAction(ISD::STORE, MVT::f64, Promote);
+  AddPromotedToType(ISD::STORE, MVT::f64, MVT::i64);
+
   setOperationAction(ISD::LOAD, MVT::f32, Promote);
   AddPromotedToType(ISD::LOAD, MVT::f32, MVT::i32);
 
   setOperationAction(ISD::LOAD, MVT::v4f32, Promote);
   AddPromotedToType(ISD::LOAD, MVT::v4f32, MVT::v4i32);
 
+  setOperationAction(ISD::LOAD, MVT::f64, Promote);
+  AddPromotedToType(ISD::LOAD, MVT::f64, MVT::i64);
+
   setOperationAction(ISD::MUL, MVT::i64, Expand);
 
   setOperationAction(ISD::UDIV, MVT::i32, Expand);
diff --git a/lib/Target/R600/SIISelLowering.cpp b/lib/Target/R600/SIISelLowering.cpp
index 9d4cfef..0d17a12 100644
--- a/lib/Target/R600/SIISelLowering.cpp
+++ b/lib/Target/R600/SIISelLowering.cpp
@@ -45,6 +45,7 @@ SITargetLowering::SITargetLowering(TargetMachine TM) :
 
   addRegisterClass(MVT::v2i32, AMDGPU::VReg_64RegClass);
   addRegisterClass(MVT::v2f32, AMDGPU::VReg_64RegClass);
+  addRegisterClass(MVT::f64, AMDGPU::VReg_64RegClass);
 
   addRegisterClass(MVT::v4i32, AMDGPU::VReg_128RegClass);
   addRegisterClass(MVT::v4f32, AMDGPU::VReg_128RegClass);
diff --git a/lib/Target/R600/SIInstructions.td b/lib/Target/R600/SIInstructions.td
index 9c96c08..b956387 100644
--- a/lib/Target/R600/SIInstructions.td
+++ b/lib/Target/R600/SIInstructions.td
@@ -660,7 +660,9 @@ defm V_RSQ_LEGACY_F32 : VOP1_32 
   [(set f32:$dst, (int_AMDGPU_rsq f32:$src0))]
 >;
 defm V_RSQ_F32 : VOP1_32 <0x002e, "V_RSQ_F32", []>;
-defm V_RCP_F64 : VOP1_64 <0x002f, "V_RCP_F64", []>;
+defm V_RCP_F64 : VOP1_64 <0x002f, "V_RCP_F64",
+  [(set f64:$dst, (fdiv FP_ONE, f64:$src0))]
+>;
 defm V_RCP_CLAMP_F64 : VOP1_64 <0x0030, "V_RCP_CLAMP_F64", []>;
 defm V_RSQ_F64 : VOP1_64 <0x0031, "V_RSQ_F64", []>;
 defm V_RSQ_CLAMP_F64 : VOP1_64 <0x0032, "V_RSQ_CLAMP_F64", []>;
@@ -996,10 +998,25 @@ def V_LSHR_B64 : VOP3_64_Shift <0x0162, "V_LSHR_B64",
 >;
 def V_ASHR_I64 : VOP3_64_Shift <0x0163, "V_ASHR_I64", []>;
 
+let isCommutable = 1 in {
+
 def V_ADD_F64 : VOP3_64 <0x0164, "V_ADD_F64", []>;
 def V_MUL_F64 : VOP3_64 <0x0165, "V_MUL_F64", []>;
 def V_MIN_F64 : VOP3_64 <0x0166, "V_MIN_F64", []>;
 def V_MAX_F64 : VOP3_64 <0x0167, "V_MAX_F64", []>;
+
+} // isCommutable = 1
+
+def : Pat <
+  (fadd f64:$src0, f64:$src1),
+  (V_ADD_F64 $src0, $src1, (i64 0))
+>;
+
+def : Pat <
+  (fmul f64:$src0, f64:$src1),
+  (V_MUL_F64 $src0, $src1, (i64 0))
+>;
+
 def V_LDEXP_F64 : VOP3_64 <0x0168, "V_LDEXP_F64", []>;
 
 let isCommutable = 1 in {
@@ -1417,6 +1434,10 @@ def : BitConvert <i32, f32, VReg_32>;
 def : BitConvert <f32, i32, SReg_32>;
 def : BitConvert <f32, i32, VReg_32>;
 
+def : BitConvert <i64, f64, VReg_64>;
+
+def : BitConvert <f64, i64, VReg_64>;
+
 /** === **/
 /** Src & Dst modifiers **/
 /** === **/
@@ -1505,6 +1526,11 @@ def : Pat <
   (V_MUL_F32_e32 $src0, (V_RCP_F32_e32 $src1))
 >;
 
+def : Pat <
+  (fdiv f64:$src0, f64:$src1),
+  (V_MUL_F64 $src0, (V_RCP_F64_e32 $src1), (i64 0))
+>;
+
 def : Pat <
   (fcos f32:$src0),
   (V_COS_F32_e32 (V_MUL_F32_e32 $src0, (V_MOV_B32_e32 CONST.TWO_PI_INV)))
@@ -1634,6 +1660,8 @@ multiclass MUBUFLoad_Pattern <MUBUF Instr_ADDR64, ValueType vt,
   >;
 }
 
+defm : MUBUFLoad_Pattern

[Mesa-dev] [PATCH] st/clover: Allow double precision operations

2013-07-02 Thread Niels Ole Salscheider

Pass the cl_khr_fp64 preprocessor definition to clang.

Signed-off-by: Niels Ole Salscheider <niels_...@salscheider-online.de>
---
 src/gallium/state_trackers/clover/llvm/invocation.cpp | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/gallium/state_trackers/clover/llvm/invocation.cpp 
b/src/gallium/state_trackers/clover/llvm/invocation.cpp
index dae61f7..bc85b61 100644
--- a/src/gallium/state_trackers/clover/llvm/invocation.cpp
+++ b/src/gallium/state_trackers/clover/llvm/invocation.cpp
@@ -175,6 +175,7 @@ namespace {
 
   // clc.h requires that this macro be defined:
   c.getPreprocessorOpts().addMacroDef("cl_clang_storage_class_specifiers");
+  c.getPreprocessorOpts().addMacroDef("cl_khr_fp64");
 
   c.getLangOpts().NoBuiltin = true;
   c.getTargetOpts().Triple = triple;
-- 
1.7.11.7
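
Source code typically reacts to such a definition with a preprocessor guard. The sketch below shows that guard pattern in plain C++, with a `#define` standing in for the macro that the patch passes to clang. The type alias `kernel_real` and the function `halve` are hypothetical names used only for illustration.

```cpp
// Stands in for the definition the patch adds on the compiler side.
#define cl_khr_fp64 1

// Pick the working precision depending on whether the macro is defined.
#ifdef cl_khr_fp64
typedef double kernel_real;
#else
typedef float kernel_real;
#endif

// Trivial computation carried out in the selected precision.
kernel_real halve(kernel_real x) {
    return x * static_cast<kernel_real>(0.5);
}
```

Without the definition, the guard would select the single-precision branch.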


[Mesa-dev] R600/SI: Initial double precision support for Radeon SI

2013-07-02 Thread Niels Ole Salscheider

Hi,

The attached patches add initial support for double-precision operations on 
Southern Islands cards.

Some expressions containing multiple double-precision kernel arguments cause 
LLVM to run until all memory is used, but I do not (yet) know why.
It works fine as long as I pass pointers to double values.

Regards,

Ole

From 4224b314cf2d97cdf2ac99564d6155fa04fbb971 Mon Sep 17 00:00:00 2001
From: Niels Ole Salscheider <niels_...@salscheider-online.de>
Date: Sat, 1 Jun 2013 16:48:56 +0200
Subject: [PATCH 1/6] R600/SI: Add initial double precision support for SI

---
 lib/Target/R600/AMDGPUISelLowering.cpp |  6 ++
 lib/Target/R600/SIISelLowering.cpp |  1 +
 lib/Target/R600/SIInstructions.td  | 30 +-
 test/CodeGen/R600/fadd64.ll| 13 +
 test/CodeGen/R600/fdiv64.ll| 14 ++
 test/CodeGen/R600/fmul64.ll| 13 +
 test/CodeGen/R600/load64.ll| 20 
 7 files changed, 96 insertions(+), 1 deletion(-)
 create mode 100644 test/CodeGen/R600/fadd64.ll
 create mode 100644 test/CodeGen/R600/fdiv64.ll
 create mode 100644 test/CodeGen/R600/fmul64.ll
 create mode 100644 test/CodeGen/R600/load64.ll

diff --git a/lib/Target/R600/AMDGPUISelLowering.cpp b/lib/Target/R600/AMDGPUISelLowering.cpp
index 4019a1f..5f3d496 100644
--- a/lib/Target/R600/AMDGPUISelLowering.cpp
+++ b/lib/Target/R600/AMDGPUISelLowering.cpp
@@ -60,12 +60,18 @@ AMDGPUTargetLowering::AMDGPUTargetLowering(TargetMachine TM) :
   setOperationAction(ISD::STORE, MVT::v4f32, Promote);
   AddPromotedToType(ISD::STORE, MVT::v4f32, MVT::v4i32);
 
+  setOperationAction(ISD::STORE, MVT::f64, Promote);
+  AddPromotedToType(ISD::STORE, MVT::f64, MVT::i64);
+
   setOperationAction(ISD::LOAD, MVT::f32, Promote);
   AddPromotedToType(ISD::LOAD, MVT::f32, MVT::i32);
 
   setOperationAction(ISD::LOAD, MVT::v4f32, Promote);
   AddPromotedToType(ISD::LOAD, MVT::v4f32, MVT::v4i32);
 
+  setOperationAction(ISD::LOAD, MVT::f64, Promote);
+  AddPromotedToType(ISD::LOAD, MVT::f64, MVT::i64);
+
   setOperationAction(ISD::MUL, MVT::i64, Expand);
 
   setOperationAction(ISD::UDIV, MVT::i32, Expand);
diff --git a/lib/Target/R600/SIISelLowering.cpp b/lib/Target/R600/SIISelLowering.cpp
index 9d4cfef..0d17a12 100644
--- a/lib/Target/R600/SIISelLowering.cpp
+++ b/lib/Target/R600/SIISelLowering.cpp
@@ -45,6 +45,7 @@ SITargetLowering::SITargetLowering(TargetMachine TM) :
 
   addRegisterClass(MVT::v2i32, AMDGPU::VReg_64RegClass);
   addRegisterClass(MVT::v2f32, AMDGPU::VReg_64RegClass);
+  addRegisterClass(MVT::f64, AMDGPU::VReg_64RegClass);
 
   addRegisterClass(MVT::v4i32, AMDGPU::VReg_128RegClass);
   addRegisterClass(MVT::v4f32, AMDGPU::VReg_128RegClass);
diff --git a/lib/Target/R600/SIInstructions.td b/lib/Target/R600/SIInstructions.td
index 9c96c08..b956387 100644
--- a/lib/Target/R600/SIInstructions.td
+++ b/lib/Target/R600/SIInstructions.td
@@ -660,7 +660,9 @@ defm V_RSQ_LEGACY_F32 : VOP1_32 
   [(set f32:$dst, (int_AMDGPU_rsq f32:$src0))]
 >;
 defm V_RSQ_F32 : VOP1_32 <0x002e, "V_RSQ_F32", []>;
-defm V_RCP_F64 : VOP1_64 <0x002f, "V_RCP_F64", []>;
+defm V_RCP_F64 : VOP1_64 <0x002f, "V_RCP_F64",
+  [(set f64:$dst, (fdiv FP_ONE, f64:$src0))]
+>;
 defm V_RCP_CLAMP_F64 : VOP1_64 <0x0030, "V_RCP_CLAMP_F64", []>;
 defm V_RSQ_F64 : VOP1_64 <0x0031, "V_RSQ_F64", []>;
 defm V_RSQ_CLAMP_F64 : VOP1_64 <0x0032, "V_RSQ_CLAMP_F64", []>;
@@ -996,10 +998,25 @@ def V_LSHR_B64 : VOP3_64_Shift <0x0162, "V_LSHR_B64",
 >;
 def V_ASHR_I64 : VOP3_64_Shift <0x0163, "V_ASHR_I64", []>;
 
+let isCommutable = 1 in {
+
 def V_ADD_F64 : VOP3_64 <0x0164, "V_ADD_F64", []>;
 def V_MUL_F64 : VOP3_64 <0x0165, "V_MUL_F64", []>;
 def V_MIN_F64 : VOP3_64 <0x0166, "V_MIN_F64", []>;
 def V_MAX_F64 : VOP3_64 <0x0167, "V_MAX_F64", []>;
+
+} // isCommutable = 1
+
+def : Pat <
+  (fadd f64:$src0, f64:$src1),
+  (V_ADD_F64 $src0, $src1, (i64 0))
+>;
+
+def : Pat <
+  (fmul f64:$src0, f64:$src1),
+  (V_MUL_F64 $src0, $src1, (i64 0))
+>;
+
 def V_LDEXP_F64 : VOP3_64 <0x0168, "V_LDEXP_F64", []>;
 
 let isCommutable = 1 in {
@@ -1417,6 +1434,10 @@ def : BitConvert <i32, f32, VReg_32>;
 def : BitConvert <f32, i32, SReg_32>;
 def : BitConvert <f32, i32, VReg_32>;
 
+def : BitConvert <i64, f64, VReg_64>;
+
+def : BitConvert <f64, i64, VReg_64>;
+
 /** === **/
 /** Src & Dst modifiers **/
 /** === **/
@@ -1505,6 +1526,11 @@ def : Pat <
   (V_MUL_F32_e32 $src0, (V_RCP_F32_e32 $src1))
 >;
 
+def : Pat <
+  (fdiv f64:$src0, f64:$src1),
+  (V_MUL_F64 $src0, (V_RCP_F64_e32 $src1), (i64 0))
+>;
+
 def : Pat <
   (fcos f32:$src0),
   (V_COS_F32_e32 (V_MUL_F32_e32 $src0, (V_MOV_B32_e32 CONST.TWO_PI_INV)))
@@ -1634,6 +1660,8 @@ multiclass MUBUFLoad_Pattern <MUBUF Instr_ADDR64, ValueType vt,
   >;
 }
 
+defm : MUBUFLoad_Pattern <BUFFER_LOAD_DWORDX2_ADDR64, i64,
+  global_load, constant_load>;
 defm : MUBUFLoad_Pattern

Re: [Mesa-dev] [PATCH 5/5] radeonsi/compute: Upload work group, work item size in input buffer

2013-05-27 Thread Niels Ole Salscheider

On Friday, 24 May 2013, 14:07:29, Tom Stellard wrote:
> From: Tom Stellard <thomas.stell...@amd.com>
> 
> ---
>  src/gallium/drivers/radeonsi/radeonsi_compute.c | 38
> ++--- 1 file changed, 27 insertions(+), 11 deletions(-)
> 
> diff --git a/src/gallium/drivers/radeonsi/radeonsi_compute.c
> b/src/gallium/drivers/radeonsi/radeonsi_compute.c index 035076d..3abf50b
> 100644
> --- a/src/gallium/drivers/radeonsi/radeonsi_compute.c
> +++ b/src/gallium/drivers/radeonsi/radeonsi_compute.c
> @@ -91,9 +91,12 @@ static void radeonsi_launch_grid(
>   struct r600_context *rctx = (struct r600_context*)ctx;
>   struct si_pipe_compute *program = rctx->cs_shader_state.program;
>   struct si_pm4_state *pm4 = CALLOC_STRUCT(si_pm4_state);
> - struct si_resource *input_buffer;
> - uint32_t input_offset = 0;
> - uint64_t input_va;
> + struct si_resource *kernel_args_buffer;
You should initialize this pointer to 0.

 + unsigned kernel_args_size;
 + unsigned num_work_size_bytes = 36;
 + uint32_t kernel_args_offset = 0;
 + uint32_t *kernel_args;
 + uint64_t kernel_args_va;
   uint64_t shader_va;
   unsigned arg_user_sgpr_count = 2;
   unsigned i;
 @@ -112,16 +115,29 @@ static void radeonsi_launch_grid(
   si_pm4_inval_shader_cache(pm4);
   si_cmd_surface_sync(pm4, pm4-cp_coher_cntl);
 
 - /* Upload the input data */
 - r600_upload_const_buffer(rctx, input_buffer, input,
 - program-input_size, input_offset);
 - input_va = r600_resource_va(ctx-screen, (struct
 pipe_resource*)input_buffer); -   input_va += input_offset;
 + /* Upload the kernel arguments */
 
 - si_pm4_add_bo(pm4, input_buffer, RADEON_USAGE_READ);
 + /* The extra num_work_size_bytes are for work group / work item size
 information */ +  kernel_args_size = program-input_size +
 num_work_size_bytes;
 + kernel_args = MALLOC(kernel_args_size);
 + for (i = 0; i  3; i++) {
 + kernel_args[i] = grid_layout[i];
 + kernel_args[i + 3] = grid_layout[i] * block_layout[i];
 + kernel_args[i + 6] = block_layout[i];
 + }
 +
 + memcpy(kernel_args + (num_work_size_bytes / 4), input,
 program-input_size); +
 + r600_upload_const_buffer(rctx, kernel_args_buffer, kernel_args,
 + kernel_args_size, kernel_args_offset);
 + kernel_args_va = r600_resource_va(ctx-screen,
 + (struct pipe_resource*)kernel_args_buffer);
 + kernel_args_va += kernel_args_offset;
 +
 + si_pm4_add_bo(pm4, kernel_args_buffer, RADEON_USAGE_READ);
 
 - si_pm4_set_reg(pm4, R_00B900_COMPUTE_USER_DATA_0, input_va);
 - si_pm4_set_reg(pm4, R_00B900_COMPUTE_USER_DATA_0 + 4,
 S_008F04_BASE_ADDRESS_HI (input_va  32) | S_008F04_STRIDE(0));
 + si_pm4_set_reg(pm4, R_00B900_COMPUTE_USER_DATA_0, kernel_args_va);
 + si_pm4_set_reg(pm4, R_00B900_COMPUTE_USER_DATA_0 + 4,
 S_008F04_BASE_ADDRESS_HI (kernel_args_va  32) | S_008F04_STRIDE(0));
 
   si_pm4_set_reg(pm4, R_00B810_COMPUTE_START_X, 0);
   si_pm4_set_reg(pm4, R_00B814_COMPUTE_START_Y, 0);
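
For reference, the buffer layout built by the quoted hunk can be sketched in isolation. This is only an illustration under stated assumptions: pack_kernel_args() is a hypothetical helper, not a Mesa function, it uses plain malloc() instead of Mesa's MALLOC, and the reading of the three header vectors as "number of work groups", "global size" and "local size" is inferred from the patch subject and the loop body.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Header size from the patch: 3 vectors of 3 dwords = 36 bytes. */
#define NUM_WORK_SIZE_BYTES 36

/*
 * Hypothetical helper (not part of Mesa): builds a buffer that starts with
 * the 9-dword work-size header and is followed by the user kernel arguments.
 * Returns a malloc'd buffer of *out_size bytes, or NULL on allocation failure.
 */
static uint32_t *pack_kernel_args(const uint32_t grid_layout[3],
                                  const uint32_t block_layout[3],
                                  const void *input, size_t input_size,
                                  size_t *out_size)
{
    size_t size = input_size + NUM_WORK_SIZE_BYTES;
    uint32_t *args = malloc(size);
    unsigned i;

    if (!args)
        return NULL;

    for (i = 0; i < 3; i++) {
        args[i]     = grid_layout[i];                   /* assumed: work groups per dimension */
        args[i + 3] = grid_layout[i] * block_layout[i]; /* assumed: global size per dimension */
        args[i + 6] = block_layout[i];                  /* assumed: local size per dimension */
    }

    /* The user arguments start right after the header (dword index 9). */
    if (input_size)
        memcpy(args + NUM_WORK_SIZE_BYTES / 4, input, input_size);

    *out_size = size;
    return args;
}
```

The caller owns the returned buffer and has to free it once the data has been uploaded.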

[Mesa-dev] [PATCH] r600g: fixup for MSAA texture support checking

2013-05-15 Thread Niels Ole Salscheider

Signed-off-by: Niels Ole Salscheider <niels_...@salscheider-online.de>
---
 src/gallium/drivers/r600/r600_shader.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index 4e5af70..4d74db0 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -305,7 +305,7 @@ int r600_compute_shader_create(struct pipe_context * ctx,
 
 	shader_ctx.bc = bytecode;
 	r600_bytecode_init(shader_ctx.bc, r600_ctx->chip_class, r600_ctx->family,
-			   r600_ctx->screen->msaa_texture_support);
+			   r600_ctx->screen->has_compressed_msaa_texturing);
 	shader_ctx.bc->type = TGSI_PROCESSOR_COMPUTE;
 	shader_ctx.bc->isa = r600_ctx->isa;
 	r600_llvm_compile(mod, r600_ctx->family,
-- 
1.8.2.3


Re: [Mesa-dev] [PATCH] gallium/opencl: Fix out-of-tree build

2013-04-09 Thread Niels Ole Salscheider

On Tuesday, 9 April 2013, 11:17:39, Michel Dänzer wrote:
> From: Michel Dänzer <michel.daen...@amd.com>
> 
> 
> Signed-off-by: Michel Dänzer <michel.daen...@amd.com>
> ---
>  src/gallium/targets/opencl/Makefile.am | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/src/gallium/targets/opencl/Makefile.am b/src/gallium/targets/opencl/Makefile.am
> index 389eecc..810f9bb 100644
> --- a/src/gallium/targets/opencl/Makefile.am
> +++ b/src/gallium/targets/opencl/Makefile.am
> @@ -32,11 +32,11 @@ libOpenCL_la_SOURCES =
>  # Force usage of a C++ linker
>  nodist_EXTRA_libOpenCL_la_SOURCES = dummy.cpp
>  
> -PIPE_SRC_DIR = $(top_srcdir)/src/gallium/targets/pipe-loader
> +PIPE_BUILD_DIR = $(top_builddir)/src/gallium/targets/pipe-loader
>  
>  # Provide compatibility with scripts for the old Mesa build system for
>  # a while by putting a link to the driver into /lib of the build tree.
>  all-local: libOpenCL.la
> -	@$(MAKE) -C $(PIPE_SRC_DIR)
> +	@$(MAKE) -C $(PIPE_BUILD_DIR)
>  	$(MKDIR_P) $(top_builddir)/$(LIB_DIR)
>  	ln -f .libs/libOpenCL.so* $(top_builddir)/$(LIB_DIR)/
> -- 
> 1.8.2

I sent that patch to the list on 24 February 2013, but Matt Turner said that he
had a better solution that does not involve calling make.

Re: [Mesa-dev] [PATCH] pipe-loader: Fix out of source build

2013-04-04 Thread Niels Ole Salscheider

On Sunday, 24 February 2013, 15:02:33, Matt Turner wrote:
> On Sun, Feb 24, 2013 at 2:00 PM, Niels Ole Salscheider <niels_...@salscheider-online.de> wrote:
> > Signed-off-by: Niels Ole Salscheider <niels_...@salscheider-online.de>
> > ---
> > 
> >  src/gallium/targets/opencl/Makefile.am | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/src/gallium/targets/opencl/Makefile.am b/src/gallium/targets/opencl/Makefile.am
> > index c5c3003..709112f 100644
> > --- a/src/gallium/targets/opencl/Makefile.am
> > +++ b/src/gallium/targets/opencl/Makefile.am
> > @@ -32,11 +32,11 @@ libOpenCL_la_SOURCES =
> > 
> >  # Force usage of a C++ linker
> >  nodist_EXTRA_libOpenCL_la_SOURCES = dummy.cpp
> > 
> > -PIPE_SRC_DIR = $(top_srcdir)/src/gallium/targets/pipe-loader
> > +PIPE_BUILD_DIR = $(top_builddir)/src/gallium/targets/pipe-loader
> > 
> >  # Provide compatibility with scripts for the old Mesa build system for
> >  # a while by putting a link to the driver into /lib of the build tree.
> >  all-local: libOpenCL.la
> > 
> > -	@$(MAKE) -C $(PIPE_SRC_DIR)
> > +	@$(MAKE) -C $(PIPE_BUILD_DIR)
> > 
> >  	$(MKDIR_P) $(top_builddir)/$(LIB_DIR)
> >  	ln -f .libs/libOpenCL.so* $(top_builddir)/$(LIB_DIR)/
> > 
> > --
> > 1.8.1.3
> 
> I think I've fixed this in a different way (that doesn't involve
> calling $(MAKE)) in this branch:
> http://cgit.freedesktop.org/~mattst88/mesa/log/?h=make-dist

Do you intend to merge this branch in the foreseeable future?

[Mesa-dev] [PATCH] clover: Fix linkage of libOpenCL

2013-04-04 Thread Niels Ole Salscheider

Clover needs the irreader component of llvm
---
 configure.ac | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/configure.ac b/configure.ac
index 81d4a3f..bfba1b3 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1649,7 +1649,7 @@ if test "x$enable_gallium_llvm" = xyes; then
 fi
 
 if test "x$enable_opencl" = xyes; then
-LLVM_COMPONENTS="${LLVM_COMPONENTS} ipo linker instrumentation"
+LLVM_COMPONENTS="${LLVM_COMPONENTS} ipo irreader linker instrumentation"
 fi
 LLVM_LDFLAGS=`$LLVM_CONFIG --ldflags`
 LLVM_BINDIR=`$LLVM_CONFIG --bindir`
-- 
1.8.2


[Mesa-dev] [PATCH] clover: Fix linkage of libOpenCL

2013-04-04 Thread Niels Ole Salscheider

Clover needs the irreader component of llvm

v2: Check for irreader component
irreader is only available with LLVM 3.3 >= 177971

Signed-off-by: Niels Ole Salscheider <niels_...@salscheider-online.de>
---
 configure.ac | 4 
 1 file changed, 4 insertions(+)

diff --git a/configure.ac b/configure.ac
index 81d4a3f..fea5868 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1650,6 +1650,10 @@ if test "x$enable_gallium_llvm" = xyes; then
 
 if test "x$enable_opencl" = xyes; then
 LLVM_COMPONENTS="${LLVM_COMPONENTS} ipo linker instrumentation"
+# LLVM 3.3 >= 177971 requires IRReader
+if $LLVM_CONFIG --components | grep -q '\<irreader\>'; then
+LLVM_COMPONENTS="${LLVM_COMPONENTS} irreader"
+fi
 fi
 LLVM_LDFLAGS=`$LLVM_CONFIG --ldflags`
 LLVM_BINDIR=`$LLVM_CONFIG --bindir`
-- 
1.8.2
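
The v2 check only adds irreader when the component list printed by llvm-config contains it as a whole word. The same idea can be sketched in C. This is a rough approximation under assumptions: has_component() is a hypothetical helper, and it splits the list on whitespace only, whereas grep's word boundaries also treat other non-word characters as separators.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if `name` appears as a complete,
 * whitespace-separated token in `list`. A substring hit such as
 * "xirreader" does not count.
 */
static bool has_component(const char *list, const char *name)
{
    size_t len = strlen(name);
    const char *p = list;

    if (len == 0)
        return false;

    while (*p) {
        size_t tok;

        p += strspn(p, " \t\n");    /* skip separators */
        tok = strcspn(p, " \t\n");  /* length of the next token */
        if (tok == len && strncmp(p, name, len) == 0)
            return true;
        p += tok;
    }
    return false;
}
```

Matching whole tokens rather than substrings keeps a differently named component that merely contains the string "irreader" from enabling the extra link dependency.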


[Mesa-dev] [PATCH] pipe-loader: Fix out of source build

2013-02-24 Thread Niels Ole Salscheider

Signed-off-by: Niels Ole Salscheider <niels_...@salscheider-online.de>
---
 src/gallium/targets/opencl/Makefile.am | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/gallium/targets/opencl/Makefile.am 
b/src/gallium/targets/opencl/Makefile.am
index c5c3003..709112f 100644
--- a/src/gallium/targets/opencl/Makefile.am
+++ b/src/gallium/targets/opencl/Makefile.am
@@ -32,11 +32,11 @@ libOpenCL_la_SOURCES =
 # Force usage of a C++ linker
 nodist_EXTRA_libOpenCL_la_SOURCES = dummy.cpp
 
-PIPE_SRC_DIR = $(top_srcdir)/src/gallium/targets/pipe-loader
+PIPE_BUILD_DIR = $(top_builddir)/src/gallium/targets/pipe-loader
 
 # Provide compatibility with scripts for the old Mesa build system for
 # a while by putting a link to the driver into /lib of the build tree.
 all-local: libOpenCL.la
-   @$(MAKE) -C $(PIPE_SRC_DIR)
+   @$(MAKE) -C $(PIPE_BUILD_DIR)
$(MKDIR_P) $(top_builddir)/$(LIB_DIR)
ln -f .libs/libOpenCL.so* $(top_builddir)/$(LIB_DIR)/
-- 
1.8.1.3


[Mesa-dev] [PATCH] st/mesa: index can be negative in the PROGRAM_CONSTANT case

2012-08-12 Thread Niels Ole Salscheider

---
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp 
b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
index 39717b6..9f58312 100644
--- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
@@ -4028,7 +4028,7 @@ dst_register(struct st_translate *t,
 static struct ureg_src
 src_register(struct st_translate *t,
  gl_register_file file,
- GLuint index)
+ GLint index)
 {
switch(file) {
case PROGRAM_UNDEFINED:
-- 
1.7.11.4

