Re: [Mesa-dev] [PATCH] winsys/radeon: fix nop packet padding v2.

2014-07-25 Thread Jerome Glisse
On Thu, Jul 24, 2014 at 8:07 PM, Alex Deucher alexdeuc...@gmail.com wrote:

 On Thu, Jul 24, 2014 at 6:28 PM,  j.gli...@gmail.com wrote:
  From: Jerome Glisse jgli...@redhat.com
 
  The ucode we got for hawaii does not support the 0x1000 special nop
  packet type 3, and this leads to the gpu reading invalid memory. As packet
  type 2 still exists, just use packet type 2.
 
  Note this only partially fixes the hawaii issues; some zbuffer tiling
  issues are still present.
 
  Changed since v1:
- use packet type 2 instead of packet 3.

 We don't need this change if we use the updated firmware in my 3.17
 tree.  Looks like the original hawaii CP ucode didn't support the new
 0x1000 special case NOP packet.


I would rather have the nop2 packet solution than yet another "is accel
working" check. Several reasons:
  - 3.16 will be out soon and has the most important fix
  - the nop2 packet can easily be backported to stable mesa
  - testing for 3.16 is easy

So I think it would be cleaner to just do nop2 and 3.16.



 Alex

 
  Signed-off-by: Jérôme Glisse jgli...@redhat.com
  ---
   src/gallium/winsys/radeon/drm/radeon_drm_cs.c | 9 ++---
   1 file changed, 2 insertions(+), 7 deletions(-)
 
  diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
 b/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
  index a06ecb2..9ac7d0e 100644
  --- a/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
  +++ b/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
  @@ -447,13 +447,8 @@ static void radeon_drm_cs_flush(struct
 radeon_winsys_cs *rcs,
   /* pad DMA ring to 8 DWs to meet CP fetch alignment requirements
* r6xx, requires at least 4 dw alignment to avoid a hw bug.
*/
  -if (cs->ws->info.chip_class <= SI) {
  -while (rcs->cdw & 7)
  -OUT_CS(&cs->base, 0x8000); /* type2 nop packet */
  -} else {
  -while (rcs->cdw & 7)
  -OUT_CS(&cs->base, 0x1000); /* type3 nop packet */
  -}
  +while (rcs->cdw & 7)
  +OUT_CS(&cs->base, 0x8000); /* type2 nop packet */
   break;
   case RING_UVD:
   while (rcs-cdw  15)
  --
  1.8.3.1
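
For readers following the v2 change, the padding loop can be sketched outside of Mesa as below. This is only an illustration under stated assumptions: `pad_with_type2_nops` and the plain array standing in for the command stream are hypothetical, and the full 32-bit header value `0x80000000` (packet type 2 in the top two bits) is my assumption, since the archived diff shows a shortened constant.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed encoding: the packet type lives in bits [31:30], so a bare
 * type-2 header (a one-dword filler packet) is 2 << 30. */
#define PKT_TYPE2_NOP 0x80000000u

/* Hypothetical helper: pad a command buffer that currently holds `cdw`
 * dwords up to the next multiple of 8 dwords. `max_dw` is the capacity
 * of `buf`. Returns the new dword count. */
static unsigned pad_with_type2_nops(uint32_t *buf, unsigned cdw, unsigned max_dw)
{
    while (cdw & 7) {
        assert(cdw < max_dw);   /* caller must leave room for padding */
        buf[cdw++] = PKT_TYPE2_NOP;
    }
    return cdw;
}
```

Since every type-2 NOP is a single dword, the loop can fill any gap from one to seven dwords, including the one-dword gap that is awkward to fill with a type-3 packet.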
 
  ___
  mesa-dev mailing list
  mesa-dev@lists.freedesktop.org
  http://lists.freedesktop.org/mailman/listinfo/mesa-dev



Re: [Mesa-dev] [PATCH] winsys/radeon: fix nop packet padding.

2014-07-24 Thread Jerome Glisse
On Thu, Jul 24, 2014 at 05:42:21PM -0400, j.gli...@gmail.com wrote:
 From: Jerome Glisse jgli...@redhat.com
 
 The gpu packet prefetcher hates the ugly big nop packet, which leads
 to prefetching invalid memory in some cases. Apparently hawaii
 is particularly sensitive to this.
 
 Note this only partially fixes the hawaii issues; some zbuffer tiling
 issues are still present.

Just to clarify, this patch is almost good to go. There is the cs[MAX_DW-1]
case that needs fixing, and I am pondering how to do that. Also, I have not
tested on bonaire, but I expect that it should only fix things and not
break anything.

Cheers,
Jérôme

 
 Signed-off-by: Jérôme Glisse jgli...@redhat.com
 ---
  src/gallium/winsys/radeon/drm/radeon_drm_cs.c | 18 --
  1 file changed, 16 insertions(+), 2 deletions(-)
 
 diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_cs.c 
 b/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
 index a06ecb2..502a550 100644
 --- a/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
 +++ b/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
 @@ -451,8 +451,22 @@ static void radeon_drm_cs_flush(struct radeon_winsys_cs 
 *rcs,
  while (rcs->cdw & 7)
  OUT_CS(&cs->base, 0x8000); /* type2 nop packet */
  } else {
 -while (rcs->cdw & 7)
 -OUT_CS(&cs->base, 0x1000); /* type3 nop packet */
 +switch (rcs->cdw & 7) {
 +case 0:
 +break;
 +case 7:
 +/* FIXME can this be bad if we are at cs[LAST_DW-1] ? Need to
 + * think of something.
 + */
 +OUT_CS(&cs->base, 0xc0001000);
 +OUT_CS(&cs->base, 0xcafedead);
 +/* Note we fallthrough as this will add another 7 dwords */
 +default:
 +OUT_CS(&cs->base, 0xc0001000 | (((8 - (rcs->cdw & 7)) - 1) << 16));
 +while (rcs->cdw & 7) {
 +OUT_CS(&cs->base, 0xcafedead);
 +}
 +}
  }
  break;
  case RING_UVD:
 -- 
 1.8.3.1
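
As a rough illustration of the arithmetic behind this v1 patch, here is a standalone sketch of padding with a single type-3 NOP. It assumes the usual type-3 header layout (type in bits [31:30], COUNT in [29:16], opcode in [15:8], with COUNT equal to the number of body dwords minus one), which is consistent with the 0xc0001000 header in the diff. The names `PKT3_HDR`, `type3_pad_dwords` and `pad_with_type3_nop` are hypothetical, and unlike the patch this emits one packet rather than using the two-step fallthrough.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed type-3 header layout: type [31:30], COUNT [29:16], opcode [15:8].
 * COUNT is (number of body dwords - 1). */
#define PKT3_HDR(op, count) \
    (0xC0000000u | (((uint32_t)(count) & 0x3FFFu) << 16) | \
     (((uint32_t)(op) & 0xFFu) << 8))
#define PKT3_NOP_OPCODE 0x10u

/* Hypothetical helper: how many dwords a single type-3 NOP must occupy
 * to bring `cdw` to a multiple of 8. */
static unsigned type3_pad_dwords(unsigned cdw)
{
    unsigned pad = (8 - (cdw & 7)) & 7;
    /* Header plus at least one body dword: a one-dword gap cannot be
     * filled by one ordinary type-3 packet, so pad a further 8 dwords. */
    if (pad == 1)
        pad += 8;
    return pad;
}

/* Hypothetical helper: write the padding packet, return the new count. */
static unsigned pad_with_type3_nop(uint32_t *buf, unsigned cdw, unsigned max_dw)
{
    unsigned pad = type3_pad_dwords(cdw);
    if (pad == 0)
        return cdw;
    assert(cdw + pad <= max_dw);   /* the "end of buffer" concern */
    /* Body is (pad - 1) dwords, so COUNT is (pad - 2). */
    buf[cdw++] = PKT3_HDR(PKT3_NOP_OPCODE, pad - 2);
    for (unsigned i = 1; i < pad; i++)
        buf[cdw++] = 0xcafedeadu;  /* filler body dwords */
    return cdw;
}
```

With these assumptions a one-dword gap costs nine dwords of padding, which is why being one dword short of the end of the buffer needs special care.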
 


Re: [Mesa-dev] rules for merging patches to libdrm

2013-11-18 Thread Jerome Glisse
On Mon, Nov 18, 2013 at 05:41:50PM +0100, Thierry Reding wrote:
 On Mon, Nov 18, 2013 at 11:21:36AM -0500, Rob Clark wrote:
  On Mon, Nov 18, 2013 at 10:23 AM, Thierry Reding
  thierry.red...@gmail.com wrote:
   On Mon, Nov 18, 2013 at 10:17:47AM -0500, Rob Clark wrote:
   On Mon, Nov 18, 2013 at 8:29 AM, Thierry Reding
   thierry.red...@gmail.com wrote:
On Sat, Nov 09, 2013 at 01:26:24PM -0800, Ian Romanick wrote:
On 11/09/2013 12:11 AM, Dave Airlie wrote:
     How does this interact with the rule that kernel interfaces require an
     open source userspace? Is "here are the mesa/libdrm patches that use
     it" sufficient to get the kernel interface merged?

     That's my understanding: open source userspace needs to exist, but it
     doesn't need to be merged upstream yet.

     Having an opensource userspace and having it committed to a final repo
     are different things. As Daniel said, patches on the mesa-list were
     sufficient; there was no hurry to merge them considering a kernel
     release with the code wasn't close, esp with a 3 month release window
     if the kernel merge window is close to that anyways.

     libdrm is easy to change and its releases are cheap. What problem does
     committing code that uses an in-progress kernel interface to libdrm
     cause? I guess I'm not understanding something.


     Releases are cheap, but ABI breaks aren't, so you can't just go release
     a libdrm with an ABI for mesa then decide later it was a bad plan.

     Introducing new kernel API usually involves assigning numbers for things
     - a new ioctl number, new #defines for bitfield members, and so on.

     Multiple patches can be in flight at the same time.  For example, Abdiel
     and I both defined execbuf2 flags:

     #define I915_EXEC_RS (1 << 13) (Abdiel's code)
     #define I915_EXEC_OA (1 << 13) (my code)

     These obviously conflict.  One of the two will land, and the second
     patch author will need to switch to (1 << 14) and resubmit.

     If we both decide to push to libdrm, we might get the order backwards,
     or maybe one series won't get pushed at all (in this case, I'm planning
     to drop my patch).  Waiting until one lands in the kernel avoids that
     problem.  Normally, I believe we copy the kernel headers to userspace
     and fix them up a bit.

     Dave may have other reasons; this is just the one I thought of.

     But mostly this, we've been stung by this exact thing happening
     before, and we made the process to stop it from happening again.
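
To make the conflict described above concrete, here is a tiny sketch. The two flag names and the bit number come from the thread; `flags_collide` is a hypothetical helper, not something in libdrm or the kernel headers.

```c
#include <assert.h>
#include <stdint.h>

/* Two in-flight patches claiming the same execbuf2 flag bit. */
#define I915_EXEC_RS (1u << 13)   /* one author's definition */
#define I915_EXEC_OA (1u << 13)   /* the other author's definition */

/* Hypothetical helper: nonzero when two flag masks share at least one
 * bit, i.e. the kernel could not tell the two requests apart. */
static int flags_collide(uint32_t a, uint32_t b)
{
    return (a & b) != 0;
}
```

Whichever patch lands second has to move to a free bit such as `(1u << 14)`, at which point the check above no longer fires.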
   
Then in all honesty, commits to libdrm should be controlled by either a
single person or a small cabal... just like the kernel and the xserver.
We're clearly in an uncomfortable middle area where we have a stringent
set of restrictions but no way to actually enforce them.
   
That doesn't sound like a bad idea at all. It obviously causes more work
for whoever will be the gatekeeper(s).
   
It seems to me that libdrm is currently more of a free-for-all type of
project, and whoever merges some new feature required for a particular X
or Mesa driver cuts a new release so that the version number can be used
to track the dependency.
   
I wonder if perhaps tying the libdrm releases more tightly to Linux
kernel releases would help. Since there already is a requirement for new
kernel APIs to be merged before the libdrm equivalent can be merged,
having both release cycles in lockstep makes some sense.
  
   Not sure about strictly tying it to kernel releases would be ideal.
   Not *everything* in libdrm is about new kernel APIs.  It tends to be
   the place for things needed both by xorg ddx and mesa driver, which I
   suppose is why it ends up a bit of a free-for-all.
  
   I didn't mean that every release would need to be tied to the Linux
   kernel. But whenever a new Linux kernel release was made, relevant
   changes from the public headers could be pulled into libdrm and a
   release be made. I could even imagine a matching of version numbers.
   libdrm releases could be numbered using the same major and minor as
   Linux kernels that they support. Micro version numbers could be used
   in intermediate releases.
  
  maybe an update-kernel-headers.sh script to grab the headers from
  drm-next and update libdrm wouldn't be a bad idea?
 
 Perhaps. But I think it could even be a manual step. It's not something
 that one person should be doing alone, but rather something that driver
 maintainers should be doing, since they know best what will be needed
 in a new version of libdrm.
 
 Like I mentioned in another subthread, I think a subtree-oriented model
 could work well.
 
 Thierry

Please stop asking for more process bureaucracy. libdrm development model
works fine. Everyone 

Re: [Mesa-dev] Update: UVD status on loongson 3a platform

2013-09-05 Thread Jerome Glisse
On Thu, Sep 05, 2013 at 10:14:32PM +0800, Chen Jie wrote:
 Hi all,
 
 This thread is about
 http://lists.freedesktop.org/archives/dri-devel/2013-April/037598.html.
 
 We recently found something interesting about UVD based playback on the
 loongson 3a platform, and also found a way to fix the problem.
 
 First, we found that the memcpy in [mesa]src/gallium/drivers/radeon/radeon_uvd.c
 caused the problem:
 * If memcpy is implemented through 16B or 8B load/store instructions,
 it will normally cause video mosaic. When a memcmp is inserted after the
 copying code in memcpy, it reports that the src and dest are not equal.
 * If memcpy uses 1B load/store instructions only, the memcmp after the
 copying code reports equal.
 
 Then we found that the following changeset fixes our problem:
 
 diff --git a/src/gallium/drivers/radeon/radeon_uvd.c
 b/src/gallium/drivers/radeon/radeon_uvd.c
 index 2f98de2..f9599b6 100644
 --- a/src/gallium/drivers/radeon/radeon_uvd.c
 +++ b/src/gallium/drivers/radeon/radeon_uvd.c
 @@ -162,7 +162,7 @@ static bool create_buffer(struct ruvd_decoder *dec,
unsigned size)
  {
   buffer->buf = dec->ws->buffer_create(dec->ws, size, 4096, false,
 - RADEON_DOMAIN_GTT | RADEON_DOMAIN_VRAM);
 + RADEON_DOMAIN_GTT);
   if (!buffer->buf)
   return false;
 
 The VRAM is mapped to an uncached area on our platform, so my
 question is: what could go wrong while using > 4B load/store
 instructions in the UVD workflow? Any idea?
 

How do you map the VRAM into the user process address space? I.e. do you
have something like Intel PAT, or something like MTRR, or something else?

In other words, can you map a region of io memory (GPU VRAM in this case)
into the process address space and mark it as uncached, so that none of
the accesses to it go through the CPU cache?

Cheers,
Jerome
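
The memcmp check described in the report can be sketched as below. This is a minimal illustration, not the code that was actually run on the loongson platform: `copy_bytewise`, `copy_and_verify` and the `copy_fn` type are hypothetical names, and on ordinary cached memory the check always passes. It only becomes interesting when `dst` points at a mapped VRAM buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical signature for the copy routine under test. */
typedef void (*copy_fn)(void *dst, const void *src, size_t n);

/* Copy one byte at a time. The volatile destination is there so that,
 * in practice, the compiler does not merge the stores into wider ones. */
static void copy_bytewise(void *dst, const void *src, size_t n)
{
    volatile uint8_t *d = dst;
    const uint8_t *s = src;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
}

/* Run `copy`, then read the destination back and compare, mirroring the
 * memcmp inserted after the copy in the report.
 * Returns 0 when the read-back matches, -1 otherwise. */
static int copy_and_verify(copy_fn copy, void *dst, const void *src, size_t n)
{
    copy(dst, src, n);
    return memcmp(dst, src, n) == 0 ? 0 : -1;
}
```

Swapping in a copy routine that uses 8-byte or 16-byte stores and comparing the two results against the same mapping would reproduce the experiment from the report.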


Re: [Mesa-dev] Update: UVD status on loongson 3a platform

2013-09-05 Thread Jerome Glisse
On Thu, Sep 05, 2013 at 03:29:52PM -0400, Jerome Glisse wrote:
 On Thu, Sep 05, 2013 at 10:14:32PM +0800, Chen Jie wrote:
  Hi all,
  
  This thread is about
  http://lists.freedesktop.org/archives/dri-devel/2013-April/037598.html.
  
  We recently found something interesting about UVD based playback on the
  loongson 3a platform, and also found a way to fix the problem.
  
  First, we found that the memcpy in [mesa]src/gallium/drivers/radeon/radeon_uvd.c
  caused the problem:
  * If memcpy is implemented through 16B or 8B load/store instructions,
  it will normally cause video mosaic. When a memcmp is inserted after the
  copying code in memcpy, it reports that the src and dest are not equal.
  * If memcpy uses 1B load/store instructions only, the memcmp after the
  copying code reports equal.
  
  Then we found that the following changeset fixes our problem:
  
  diff --git a/src/gallium/drivers/radeon/radeon_uvd.c
  b/src/gallium/drivers/radeon/radeon_uvd.c
  index 2f98de2..f9599b6 100644
  --- a/src/gallium/drivers/radeon/radeon_uvd.c
  +++ b/src/gallium/drivers/radeon/radeon_uvd.c
  @@ -162,7 +162,7 @@ static bool create_buffer(struct ruvd_decoder *dec,
 unsigned size)
   {
buffer->buf = dec->ws->buffer_create(dec->ws, size, 4096, false,
  - RADEON_DOMAIN_GTT | RADEON_DOMAIN_VRAM);
  + RADEON_DOMAIN_GTT);
if (!buffer->buf)
return false;
  
  The VRAM is mapped to an uncached area on our platform, so my
  question is: what could go wrong while using > 4B load/store
  instructions in the UVD workflow? Any idea?
  
 
 How do you map the VRAM into the user process address space? I.e. do you
 have something like Intel PAT, or something like MTRR, or something else?
 
 In other words, can you map a region of io memory (GPU VRAM in this case)
 into the process address space and mark it as uncached, so that none of
 the accesses to it go through the CPU cache?
 
 Cheers,
 Jerome

Also, it might be that you can't do write combining on your platform,
which would be a major drawback as it's assumed by radeon userspace.
I would need to check the PCIe specification, but write combining is
probably not mandatory, meaning that your architecture might not have
it. This would explain why only memcpy with byte-sized copies works.

I don't think there is any easy way to work around that.

Cheers,
Jerome


Re: [Mesa-dev] SIGFPE in libdrm_radeon on evergreen

2013-05-20 Thread Jerome Glisse
On Mon, May 20, 2013 at 5:13 AM, Vadim Girlin vadimgir...@gmail.com wrote:
 On 05/20/2013 11:27 AM, Dragomir Ivanov wrote:

 0x769058d7 in eg_surface_init_2d (surf_man=0x633ea0,
 surf=0x88d848,
 level=0x88dea8, bpe=1, tile_split=0, offset=65536, start_level=0)

 It looks like division by 0. tile_split=0 from the call site.


 Yes, I'm just not sure why tile_split is 0 here and what is the best way to
 fix it, possibly in fact this is a consequence of some problem in r600g, not
 in the libdrm. Though probably libdrm should handle it more gracefully
 anyway.

 Vadim

Just a guess: the ddx is not properly setting tile split on a surface, and
then r600g calls in while trying to rebuild the miptree ... I think I fixed
this issue in the ddx a couple of months ago, but maybe I did not.

Cheers,
Jerome
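
On the point that libdrm should handle this more gracefully, a guard of the following shape would turn the SIGFPE into an error return. This is a hypothetical sketch, not libdrm's code: only the division `tileb / tile_split` is taken from the faulting line in the backtrace, while the function name and the choice of -EINVAL are assumptions.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical guard: perform the division from the faulting line, but
 * reject a zero tile_split instead of trapping.
 * Returns 0 on success and stores the result in *slice_pt;
 * returns -EINVAL and leaves *slice_pt untouched otherwise. */
static int slices_per_tile(unsigned tileb, unsigned tile_split,
                           unsigned *slice_pt)
{
    if (tile_split == 0)
        return -EINVAL;
    *slice_pt = tileb / tile_split;
    return 0;
}
```

A guard like this only hides the symptom; whoever hands over a surface with tile_split set to 0 would still need fixing.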



 On Mon, May 20, 2013 at 4:11 AM, Vadim Girlin vadimgir...@gmail.com
 wrote:

 Reduced test app attached and below is gdb backtrace. I suspect something
 is not initialized properly but I'm not very familiar with this code.

 Vadim


 Program received signal SIGFPE, Arithmetic exception.
 0x769058d7 in eg_surface_init_2d (surf_man=0x633ea0,
 surf=0x88d848, level=0x88dea8, bpe=1, tile_split=0, offset=65536,
 start_level=0)
   at radeon_surface.c:651
 651 slice_pt = tileb / tile_split;

 #0  0x769058d7 in eg_surface_init_2d (surf_man=0x633ea0,
 surf=0x88d848, level=0x88dea8, bpe=1, tile_split=0, offset=65536,
 start_level=0)
   at radeon_surface.c:651
 #1  0x76905eea in eg_surface_init_2d_miptrees (surf_man=0x633ea0,
 surf=0x88d848) at radeon_surface.c:807
 #2  0x76906062 in eg_surface_init (surf_man=0x633ea0,
 surf=0x88d848) at radeon_surface.c:863
 #3  0x76907fe6 in radeon_surface_init (surf_man=0x633ea0,
 surf=0x88d848) at radeon_surface.c:1901
 #4  0x7713260b in radeon_drm_winsys_surface_init (rws=0x6339a0,
 surf=0x88d848) at radeon_drm_winsys.c:477
 #5  0x770a3e1c in r600_setup_surface (screen=0x6340d0,
 rtex=0x88d760, pitch_in_bytes_override=0) at r600_texture.c:203
 #6  0x770a4774 in r600_texture_create_object (screen=0x6340d0,
 base=0x7fffd6d0, pitch_in_bytes_override=0, buf=0x0,
 surface=0x7fffc8e0)
   at r600_texture.c:432
 #7  0x770a5268 in r600_texture_create (screen=0x6340d0,
 templ=0x7fffd6d0) at r600_texture.c:607
 #8  0x7708a5bd in r600_resource_create (screen=0x6340d0,
 templ=0x7fffd6d0) at r600_resource.c:38
 #9  0x77125579 in dri2_drawable_process_buffers
 (drawable=0x88af80, buffers=0x88aea0, buffer_count=1, atts=0x88b628,
 att_count=2) at dri2.c:283
 #10 0x7712590a in dri2_allocate_textures (drawable=0x88af80,
 statts=0x88b628, statts_count=2) at dri2.c:404
 #11 0x77123e6a in dri_st_framebuffer_validate (stfbi=0x88af80,
 statts=0x88b628, count=2, out=0x7fffd840) at dri_drawable.c:81
 #12 0x76e461c1 in st_framebuffer_validate (stfb=0x88b1e0,
 st=0x883870) at ../../src/mesa/state_tracker/st_manager.c:193

 #13 0x76e472a8 in st_api_make_current (stapi=0x7761b9e0
 <st_gl_api>, stctxi=0x883870, stdrawi=0x88af80, streadi=0x88af80)
   at ../../src/mesa/state_tracker/st_manager.c:721

 #14 0x77122ce8 in dri_make_current (cPriv=0x7fdb70,
 driDrawPriv=0x88af40, driReadPriv=0x88af40) at dri_context.c:255
 #15 0x76c6ba1f in driBindContext (pcp=0x7fdb70, pdp=0x88af40,
 prp=0x88af40) at ../../../../src/mesa/drivers/dri/common/dri_util.c:382

 #16 0x77dc57e3 in dri2_bind_context (context=0x7fd9d0,
 old=0x616650, draw=67108873, read=67108873) at dri2_glx.c:172
 #17 0x77d8c253 in MakeContextCurrent (dpy=0x602040,
 draw=67108873,
 read=67108873, gc_user=0x7fd9d0) at glxcurrent.c:269
 #18 0x00384e82713c in fgOpenWindow () from /lib64/libglut.so.3
 #19 0x00384e825afa in fgCreateWindow () from /lib64/libglut.so.3
 #20 0x00384e825b95 in fgCreateMenu () from /lib64/libglut.so.3
 #21 0x00384e823cd3 in glutCreateMenu () from /lib64/libglut.so.3
 #22 0x00400816 in main (argc=1, argv=0x7fffdf18) at test.c:17




Re: [Mesa-dev] [PATCH 3/3] r600g: Don't set the dest cache bits on surface sync for R600_CONTEXT_FLUSH_AND_INV

2013-05-02 Thread Jerome Glisse
On Wed, May 1, 2013 at 1:23 PM, Marek Olšák mar...@gmail.com wrote:
 This is a funny subject. Originally, we only used SURFACE_SYNC on
 Evergreen, which is what the hw guys recommend using, but then Jerome
 came and rewrote it with no reasonable argument to back it up (what he
 was trying to fix by his cache-flush rework is not fixed to this day),
 such that we now flush and invalidate more caches than we need.

I guess fixing lockup is not reasonable.

Jerome


 FLUSH_AND_INV isn't recommended, because it should be slower in theory
 when streamout is used. Frequent changes of streamout buffers would
 also flush and invalidate the framebuffer cache, which is undesirable.
 Unfortunately, I don't know of any apps using streamout.

 This patch looks good. However, once we start seeing apps taking full
 advantage of GL3 and GL4, we will have to switch back to SURFACE_SYNC
 at least for graphics.

 Marek

 On Fri, Apr 26, 2013 at 7:21 PM, Tom Stellard t...@stellard.net wrote:
 From: Tom Stellard thomas.stell...@amd.com

 We are already emitting a EVENT_TYPE_CACHE_FLUSH_AND_INV_EVENT packet
 when this flush flag is set, so flushing the dest caches with a
 SURFACE_SYNC should not be necessary.

 The motivation for this change is that emitting a SURFACE_SYNC packet with
 the CB bits set was causing compute shaders to hang on Cayman.
 ---
  src/gallium/drivers/r600/r600_hw_context.c | 28 +---
  1 file changed, 13 insertions(+), 15 deletions(-)

 diff --git a/src/gallium/drivers/r600/r600_hw_context.c 
 b/src/gallium/drivers/r600/r600_hw_context.c
 index b4fb3bf..8aebd25 100644
 --- a/src/gallium/drivers/r600/r600_hw_context.c
 +++ b/src/gallium/drivers/r600/r600_hw_context.c
 @@ -231,21 +231,19 @@ void r600_flush_emit(struct r600_context *rctx)
 cs->buf[cs->cdw++] = PKT3(PKT3_EVENT_WRITE, 0, 0);
 cs->buf[cs->cdw++] = EVENT_TYPE(EVENT_TYPE_CACHE_FLUSH_AND_INV_EVENT) | EVENT_INDEX(0);
 if (rctx->chip_class >= EVERGREEN) {
 -   cp_coher_cntl = S_0085F0_CB0_DEST_BASE_ENA(1) |
 -   S_0085F0_CB1_DEST_BASE_ENA(1) |
 -   S_0085F0_CB2_DEST_BASE_ENA(1) |
 -   S_0085F0_CB3_DEST_BASE_ENA(1) |
 -   S_0085F0_CB4_DEST_BASE_ENA(1) |
 -   S_0085F0_CB5_DEST_BASE_ENA(1) |
 -   S_0085F0_CB6_DEST_BASE_ENA(1) |
 -   S_0085F0_CB7_DEST_BASE_ENA(1) |
 -   S_0085F0_CB8_DEST_BASE_ENA(1) |
 -   S_0085F0_CB9_DEST_BASE_ENA(1) |
 -   S_0085F0_CB10_DEST_BASE_ENA(1) |
 -   S_0085F0_CB11_DEST_BASE_ENA(1) |
 -   S_0085F0_DB_DEST_BASE_ENA(1) |
 -   S_0085F0_TC_ACTION_ENA(1) |
 -   S_0085F0_CB_ACTION_ENA(1) |
 +   /* We were previously setting the CB and DB bits on
 +* cp_coher_cntl, but this is unnecessary since
 +* we are emitting the
 +* EVENT_TYPE_CACHE_FLUSH_AND_INV_EVENT packet.
 +* Setting the CB bits was causing lockups when using
 +* compute on cayman.
 +*
 +* XXX: Do even need to emit a surface sync packet 
 here?
 +* Prior to e5e4c07e7964a3258ed02b530bcdc24c0650204b
 +* surface sync was not being emitted with the
 +* R600_CONTEXT_FLUSH_AND_INV flag.
 +*/
 +   cp_coher_cntl = S_0085F0_TC_ACTION_ENA(1) |
 S_0085F0_DB_ACTION_ENA(1) |
 S_0085F0_SH_ACTION_ENA(1) |
 S_0085F0_SMX_ACTION_ENA(1) |
 --
 1.8.1.5



Re: [Mesa-dev] [PATCH] r600g: use CP DMA for buffer clears on evergreen+

2013-04-24 Thread Jerome Glisse
On Wed, Apr 24, 2013 at 3:15 PM,  alexdeuc...@gmail.com wrote:
 From: Alex Deucher alexander.deuc...@amd.com

 Lighter weight than using streamout.  Only evergreen
 and newer asics support embedded data as src with
 CP DMA.

 Signed-off-by: Alex Deucher alexander.deuc...@amd.com

Reviewed-by: Jerome Glisse jgli...@redhat.com

 ---
  src/gallium/drivers/r600/evergreen_hw_context.c |   66 
 +++
  src/gallium/drivers/r600/evergreend.h   |   42 ++
  src/gallium/drivers/r600/r600_blit.c|   10 +++-
  src/gallium/drivers/r600/r600_pipe.h|3 +
  4 files changed, 119 insertions(+), 2 deletions(-)

 diff --git a/src/gallium/drivers/r600/evergreen_hw_context.c 
 b/src/gallium/drivers/r600/evergreen_hw_context.c
 index d980c18..7cab879 100644
 --- a/src/gallium/drivers/r600/evergreen_hw_context.c
 +++ b/src/gallium/drivers/r600/evergreen_hw_context.c
 @@ -106,3 +106,69 @@ void evergreen_dma_copy(struct r600_context *rctx,
 util_range_add(&rdst->valid_buffer_range, dst_offset,
dst_offset + size);
  }
 +
 +/* The max number of bytes to copy per packet. */
 +#define CP_DMA_MAX_BYTE_COUNT ((1 << 21) - 8)
 +
 +void evergreen_cp_dma_clear_buffer(struct r600_context *rctx,
 +  struct pipe_resource *dst, uint64_t offset,
 +  unsigned size, uint32_t clear_value)
 +{
 +   struct radeon_winsys_cs *cs = rctx->rings.gfx.cs;
 +
 +   assert(size);
 +   assert(rctx->screen->has_cp_dma);
 +
 +   offset += r600_resource_va(&rctx->screen->screen, dst);
 +
 +   /* We flush the caches, because we might read from or write
 +* to resources which are bound right now. */
 +   rctx->flags |= R600_CONTEXT_INVAL_READ_CACHES |
 +  R600_CONTEXT_FLUSH_AND_INV |
 +  R600_CONTEXT_FLUSH_AND_INV_CB_META |
 +  R600_CONTEXT_FLUSH_AND_INV_DB_META |
 +  R600_CONTEXT_STREAMOUT_FLUSH |
 +  R600_CONTEXT_WAIT_3D_IDLE;
 +
 +   while (size) {
 +   unsigned sync = 0;
 +   unsigned byte_count = MIN2(size, CP_DMA_MAX_BYTE_COUNT);
 +   unsigned reloc;
 +
 +   r600_need_cs_space(rctx, 10 + (rctx->flags ? R600_MAX_FLUSH_CS_DWORDS : 0), FALSE);
 +
 +   /* Flush the caches for the first copy only. */
 +   if (rctx->flags) {
 +   r600_flush_emit(rctx);
 +   }
 +
 +   /* Do the synchronization after the last copy, so that all data is written to memory. */
 +   if (size == byte_count) {
 +   sync = PKT3_CP_DMA_CP_SYNC;
 +   }
 +
 +   /* This must be done after r600_need_cs_space. */
 +   reloc = r600_context_bo_reloc(rctx, &rctx->rings.gfx,
 + (struct r600_resource*)dst, RADEON_USAGE_WRITE);
 +
 +   r600_write_value(cs, PKT3(PKT3_CP_DMA, 4, 0));
 +   r600_write_value(cs, clear_value);  /* DATA [31:0] */
 +   r600_write_value(cs, sync | PKT3_CP_DMA_SRC_SEL(2));/* CP_SYNC [31] | SRC_SEL[30:29] */
 +   r600_write_value(cs, offset);   /* DST_ADDR_LO [31:0] */
 +   r600_write_value(cs, (offset >> 32) & 0xff);/* DST_ADDR_HI [7:0] */
 +   r600_write_value(cs, byte_count);   /* COMMAND [29:22] | BYTE_COUNT [20:0] */
 +
 +   r600_write_value(cs, PKT3(PKT3_NOP, 0, 0));
 +   r600_write_value(cs, reloc);
 +
 +   size -= byte_count;
 +   offset += byte_count;
 +   }
 +
 +   /* Invalidate the read caches. */
 +   rctx->flags |= R600_CONTEXT_INVAL_READ_CACHES;
 +
 +   util_range_add(&r600_resource(dst)->valid_buffer_range, offset,
 +  offset + size);
 +}
 +
 +
 diff --git a/src/gallium/drivers/r600/evergreend.h 
 b/src/gallium/drivers/r600/evergreend.h
 index 53b68a4..5d72432 100644
 --- a/src/gallium/drivers/r600/evergreend.h
 +++ b/src/gallium/drivers/r600/evergreend.h
 @@ -118,6 +118,48 @@
 #define PKT3_PREDICATE(x)   (((x) >> 0) & 0x1)
 #define PKT0(index, count) (PKT_TYPE_S(0) | PKT0_BASE_INDEX_S(index) | PKT_COUNT_S(count))

 +#define PKT3_CP_DMA 0x41
 +/* 1. header
 + * 2. SRC_ADDR_LO [31:0] or DATA [31:0]
 + * 3. CP_SYNC [31] | SRC_SEL [30:29] | ENGINE [27] | DST_SEL [21:20] | SRC_ADDR_HI [7:0]
 + * 4. DST_ADDR_LO [31:0]
 + * 5. DST_ADDR_HI [7:0]
 + * 6. COMMAND [29:22] | BYTE_COUNT [20:0]
 + */
 +#define PKT3_CP_DMA_CP_SYNC   (1 << 31)
 +#define PKT3_CP_DMA_SRC_SEL(x)   ((x) << 29)
 +/* 0 - SRC_ADDR
 + * 1 - GDS (program SAS to 1 as well)
 + * 2 - DATA
 + */
 +#define PKT3_CP_DMA_DST_SEL(x)   ((x) << 20)
 +/* 0 - DST_ADDR
 + * 1 - GDS (program DAS to 1 as well)
 + */
 +/* COMMAND */
 +#define PKT3_CP_DMA_CMD_SRC_SWAP(x) ((x) << 23)
 +/* 0
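
The chunking done by the clear loop in this patch can be sketched on its own, without any of the r600g state. This is only an illustration: the `CP_DMA_MAX_BYTE_COUNT` value and the "sync on the last chunk only" rule come from the diff, while `struct dma_chunk` and `split_clear` are hypothetical names that record what each packet would cover instead of emitting anything.

```c
#include <assert.h>
#include <stdint.h>

/* The max number of bytes per packet, as defined in the patch. */
#define CP_DMA_MAX_BYTE_COUNT ((1u << 21) - 8)

/* Hypothetical record of what one CP DMA packet would cover. */
struct dma_chunk {
    uint64_t dst_offset;   /* destination address of this chunk */
    unsigned byte_count;   /* bytes cleared by this chunk */
    int sync;              /* nonzero only on the final chunk */
};

/* Hypothetical helper: split a clear of `size` bytes starting at
 * `offset` into chunks of at most CP_DMA_MAX_BYTE_COUNT bytes.
 * Returns the number of chunks written to `out`. */
static unsigned split_clear(uint64_t offset, unsigned size,
                            struct dma_chunk *out, unsigned max_chunks)
{
    unsigned n = 0;

    while (size) {
        unsigned byte_count =
            size < CP_DMA_MAX_BYTE_COUNT ? size : CP_DMA_MAX_BYTE_COUNT;

        assert(n < max_chunks);
        out[n].dst_offset = offset;
        out[n].byte_count = byte_count;
        out[n].sync = (size == byte_count);   /* last chunk only */
        n++;

        size -= byte_count;
        offset += byte_count;
    }
    return n;
}
```

A clear that is 100 bytes larger than the per-packet limit therefore produces two chunks, and only the second one carries the sync flag.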

Re: [Mesa-dev] [PATCH] radeonsi: add cs tracing v2

2013-03-27 Thread Jerome Glisse
On Wed, Mar 27, 2013 at 4:45 AM, Christian König
deathsim...@vodafone.de wrote:
 Am 27.03.2013 01:43, schrieb Jerome Glisse:

 On Tue, Mar 26, 2013 at 6:45 PM, Dave Airlie airl...@gmail.com wrote:

 correctly). But Marek is quite right that this only counts for state objects
 and makes no sense for set_* and draw_* calls (and I'm currently thinking
 how to avoid that and can't come up with a proper solution). Anyway it's
 definitely not an urgent problem for radeonsi.

 It will be a problem once we actually start caring about performance
 and, most importantly, the CPU overhead of the driver.

 I still think that writing into the command buffers directly (e.g. without
 wrapper functions) is a bad idea, cause that lead to mixing driver logic
 and
 I'm convinced the exact opposite is a bad idea, because it adds
 another layer all commands must go through. A layer which brings no
 advantage. Think about apps which issue 1k-10k draw calls per frame.
 It's obvious that every byte moved around counts and the key to high
 framerate is to do (almost) nothing in the driver. It looks like the
 idea here is to make the driver as slow as possible.

 packet building in r600g. For example just try to figure out how the
 relocation in NOPs work by reading the source (please keep in mind that one
 of the primary goals why AMD is supporting this driver is to give a good
 example code for customers who want to implement that stuff on their own
 systems).

 I'm shocked. Sacrificing performance in the name of making the code
 nicer for some customers? Seriously? I thought the plan was to make
 the best graphics driver ever.


 Well, maybe I'm repeating myself: Performance is not a priority, it's only
 nice to have!

 Sorry to say so, but if we sacrifice a bit of performance for more code
 readability then that is perfectly ok with me (Don't get me wrong, I
 would really prefer to replace the closed source driver today than tomorrow,
 it's unfortunately just not what I'm paid for).

 On the other hand, we are talking about perfectly optimizable inline
 functions and/or macros. All I'm saying is that we should structure the
 code a bit more.

 It's okay to take steps in the right direction, but if you start taking
 steps away from performance in lieu of code readability then please be
 prepared to deal with objections.

 The thing is in a lot of cases, code readability is in the eye of the
 beholder. I'm sure Jerome thought r600g was perfectly readable when he wrote
 it, but a lot of us didn't and spent a lot of time trying to remove the CPU
 overheads, not least the amount of time Marek spent. The thing is
 performance is measurable, code readability isn't.

 Dave.

 Maybe once again you forgot why I did things the way I did them. I
 explained myself to you back then: I designed r600g for a new kernel
 api which was violently different from the cs one. My hope was that
 the other kernel api would be better; it was not, and I never pushed
 more on that front. So the r600g design was definitely not adapted to the
 cs ioctl and not thought out for it. History often explains a lot of things,
 and people seem to forget about it.

 That being said, I too find the code readability argument ironic: if
 one understands the cs ioctl, then the r600g code as it is nowadays makes
 sense, but the radeonsi code is closer to what r600g used to be. So
 assuming the same ioctl, I would say that radeonsi should move towards what
 r600g is nowadays.

 Anyway, I just wanted to set history straight.


 Well, I think you hit the point here quite well. May I ask what your kernel
 interface would have looked like?

 Christian.

I used to have a branch on fdo with it. Basically, what used to be
r600_hw_context was a nop in gallium, and you had state in the kernel (cb,
db, sampler view, sampler, ...). You created the state objects and then
bound them, so the security checks mostly happened at creation time and
binding was pretty quick. It was also transaction based, and relocation
was easier too. Anyway, it was a bad API. I know that in a closed world
or a more obscure stack you can have a kernel api that doesn't do much
security checking and call it a day, which gives you a lot more freedom in
the api.

Cheers,
Jerome


Re: [Mesa-dev] [PATCH] winsys/radeon: add command stream replay dump for faulty lockup

2013-03-27 Thread Jerome Glisse
On Wed, Mar 27, 2013 at 11:27 AM,  j.gli...@gmail.com wrote:
 From: Jerome Glisse jgli...@redhat.com

 Build time option: set RADEON_CS_DUMP_ON_LOCKUP to 1 in radeon_drm_cs.h to
 enable it.

 When enabled, after each cs submission the code will try to detect a lockup
 by waiting for one of the buffers of the cs to become idle. After a timeout
 it will consider that the cs triggered a lockup and will write a
 radeon_lockup.c file in the current directory that has all the information
 needed to replay the cs.

 To build this file :
 gcc -O0 -g radeon_lockup.c -ldrm -o radeon_lockup -I/usr/include/libdrm

 Signed-off-by: Jerome Glisse jgli...@redhat.com

Maybe I should add the radeon_ctx.h file to the winsys dir, as you need it
to build radeon_lockup.c; I did not want to printf the whole
helper. For example, you can check radeon_lockup.c and radeon_ctx.h
here:
http://people.freedesktop.org/~glisse/rlockup/

Note this is a radeon si verde capture for a 2d tiling case that locks up
(it can be a hard lockup sometimes, so be careful).

Cheers,
Jerome
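
The detection idea in the commit message (wait for a buffer of the cs to go idle, and call it a lockup after a timeout) can be sketched generically as below. None of this is the winsys code: `is_idle_fn`, `wait_idle_or_timeout` and the stand-in `idle_after_n_polls` are hypothetical, and a real implementation would presumably sleep or use a kernel wait between polls rather than spin.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime */
#endif
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical query: returns true once the buffer is idle. */
typedef bool (*is_idle_fn)(void *bo);

static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Poll until the buffer is idle or `timeout_s` seconds have passed.
 * Returns true if it went idle; false means "treat as a lockup". */
static bool wait_idle_or_timeout(is_idle_fn is_idle, void *bo,
                                 double timeout_s)
{
    double deadline = now_seconds() + timeout_s;

    do {
        if (is_idle(bo))
            return true;
    } while (now_seconds() < deadline);

    return false;
}

/* Stand-in for a real idle query, for experimenting: reports idle once
 * the int counter it points at has been polled down to zero. */
static bool idle_after_n_polls(void *counter)
{
    int *n = counter;
    if (*n == 0)
        return true;
    (*n)--;
    return false;
}
```

On a false return, the caller would be the one to dump the command stream for replay.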

 ---
  src/gallium/winsys/radeon/drm/Makefile.sources |   1 +
  src/gallium/winsys/radeon/drm/radeon_drm_bo.c  |  80 ++--
  src/gallium/winsys/radeon/drm/radeon_drm_bo.h  |   2 +
  src/gallium/winsys/radeon/drm/radeon_drm_cs.c  |   4 +
  src/gallium/winsys/radeon/drm/radeon_drm_cs.h  |   6 +
  src/gallium/winsys/radeon/drm/radeon_drm_cs_dump.c | 135 
 +
  6 files changed, 191 insertions(+), 37 deletions(-)
  create mode 100644 src/gallium/winsys/radeon/drm/radeon_drm_cs_dump.c

 diff --git a/src/gallium/winsys/radeon/drm/Makefile.sources 
 b/src/gallium/winsys/radeon/drm/Makefile.sources
 index 1d18d61..4ca5ebb 100644
 --- a/src/gallium/winsys/radeon/drm/Makefile.sources
 +++ b/src/gallium/winsys/radeon/drm/Makefile.sources
 @@ -1,4 +1,5 @@
  C_SOURCES := \
 radeon_drm_bo.c \
 radeon_drm_cs.c \
 +   radeon_drm_cs_dump.c \
 radeon_drm_winsys.c
 diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c 
 b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
 index f4ac526..5a9493a 100644
 --- a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
 +++ b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
 @@ -391,14 +391,54 @@ static void radeon_bo_destroy(struct pb_buffer *_buf)
  FREE(bo);
  }

 +void *radeon_bo_do_map(struct radeon_bo *bo)
 +{
 +    struct drm_radeon_gem_mmap args = {0};
 +    void *ptr;
 +
 +    /* Return the pointer if it's already mapped. */
 +    if (bo->ptr)
 +        return bo->ptr;
 +
 +    /* Map the buffer. */
 +    pipe_mutex_lock(bo->map_mutex);
 +    /* Return the pointer if it's already mapped (in case of a race). */
 +    if (bo->ptr) {
 +        pipe_mutex_unlock(bo->map_mutex);
 +        return bo->ptr;
 +    }
 +    args.handle = bo->handle;
 +    args.offset = 0;
 +    args.size = (uint64_t)bo->base.size;
 +    if (drmCommandWriteRead(bo->rws->fd,
 +                            DRM_RADEON_GEM_MMAP,
 +                            &args,
 +                            sizeof(args))) {
 +        pipe_mutex_unlock(bo->map_mutex);
 +        fprintf(stderr, "radeon: gem_mmap failed: %p 0x%08X\n",
 +                bo, bo->handle);
 +        return NULL;
 +    }
 +
 +    ptr = os_mmap(0, args.size, PROT_READ|PROT_WRITE, MAP_SHARED,
 +               bo->rws->fd, args.addr_ptr);
 +    if (ptr == MAP_FAILED) {
 +        pipe_mutex_unlock(bo->map_mutex);
 +        fprintf(stderr, "radeon: mmap failed, errno: %i\n", errno);
 +        return NULL;
 +    }
 +    bo->ptr = ptr;
 +    pipe_mutex_unlock(bo->map_mutex);
 +
 +    return bo->ptr;
 +}
 +
  static void *radeon_bo_map(struct radeon_winsys_cs_handle *buf,
 struct radeon_winsys_cs *rcs,
 enum pipe_transfer_usage usage)
  {
  struct radeon_bo *bo = (struct radeon_bo*)buf;
  struct radeon_drm_cs *cs = (struct radeon_drm_cs*)rcs;
 -struct drm_radeon_gem_mmap args = {0};
 -void *ptr;

      /* If it's not unsynchronized bo_map, flush CS if needed and then wait. */
      if (!(usage & PIPE_TRANSFER_UNSYNCHRONIZED)) {
 @@ -461,41 +501,7 @@ static void *radeon_bo_map(struct radeon_winsys_cs_handle *buf,
  }
  }

 -    /* Return the pointer if it's already mapped. */
 -    if (bo->ptr)
 -        return bo->ptr;
 -
 -    /* Map the buffer. */
 -    pipe_mutex_lock(bo->map_mutex);
 -    /* Return the pointer if it's already mapped (in case of a race). */
 -    if (bo->ptr) {
 -        pipe_mutex_unlock(bo->map_mutex);
 -        return bo->ptr;
 -    }
 -    args.handle = bo->handle;
 -    args.offset = 0;
 -    args.size = (uint64_t)bo->base.size;
 -    if (drmCommandWriteRead(bo->rws->fd,
 -                            DRM_RADEON_GEM_MMAP,
 -                            &args,
 -                            sizeof(args))) {
 -        pipe_mutex_unlock(bo->map_mutex);
 -        fprintf(stderr, "radeon: gem_mmap failed: %p 0x%08X\n",
 -                bo, bo->handle);
 -        return NULL;
 -    }
 -
 -ptr = os_mmap(0
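
The radeon_bo_do_map helper in the quoted patch checks the cached pointer, takes the map mutex, and checks again before mapping. A stand-alone sketch of that check, lock, re-check pattern is below. It is not the Mesa code: struct lazy_map, lazy_map_get and identity_map are hypothetical names, and the sketch uses C11 atomics for the unlocked read, which the patch itself does not do.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer object with a lazily created mapping. */
struct lazy_map {
    pthread_mutex_t lock;
    _Atomic(void *) ptr;      /* NULL until the first successful map */
    void *(*do_map)(void *);  /* hypothetical mapping callback */
    void *map_arg;
    unsigned map_calls;       /* counts real map operations, for illustration */
};

static void *lazy_map_get(struct lazy_map *m)
{
    /* Fast path: already mapped, no lock needed. */
    void *p = atomic_load_explicit(&m->ptr, memory_order_acquire);
    if (p)
        return p;

    pthread_mutex_lock(&m->lock);
    /* Re-check under the lock in case another thread mapped it meanwhile. */
    p = atomic_load_explicit(&m->ptr, memory_order_relaxed);
    if (!p) {
        p = m->do_map(m->map_arg);
        if (p) {
            m->map_calls++;
            atomic_store_explicit(&m->ptr, p, memory_order_release);
        }
    }
    pthread_mutex_unlock(&m->lock);

    return p; /* NULL if the mapping failed */
}

/* Trivial mapping callback for the sketch: "maps" to its own argument. */
static void *identity_map(void *arg)
{
    return arg;
}
```

The point of the second check is that only one thread ever performs the expensive map; every later call takes the lock-free fast path.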

Re: [Mesa-dev] [PATCH] radeonsi: add cs tracing v2

2013-03-26 Thread Jerome Glisse
On Tue, Mar 26, 2013 at 6:22 AM, Christian König
deathsim...@vodafone.de wrote:
 Am 25.03.2013 18:15, schrieb j.gli...@gmail.com:

 From: Jerome Glisse jgli...@redhat.com

 Same as on r600, trace cs execution by writing the cs offset after each
 state; this allows pinpointing a lockup inside the command stream and
 narrows down the scope of the lockup investigation.

 v2: Use WRITE_DATA packet instead of WRITE_MEM

 Signed-off-by: Jerome Glisse jgli...@redhat.com
 ---
   src/gallium/drivers/radeonsi/r600_hw_context.c | 61
 ++
   src/gallium/drivers/radeonsi/radeonsi_pipe.c   | 22 ++
   src/gallium/drivers/radeonsi/radeonsi_pipe.h   | 12 +
   src/gallium/drivers/radeonsi/radeonsi_pm4.c| 12 +
   src/gallium/drivers/radeonsi/si_state_draw.c   |  7 ++-
   src/gallium/drivers/radeonsi/sid.h | 14 ++
   6 files changed, 127 insertions(+), 1 deletion(-)

 diff --git a/src/gallium/drivers/radeonsi/r600_hw_context.c
 b/src/gallium/drivers/radeonsi/r600_hw_context.c
 index bd348f9..967f093 100644
 --- a/src/gallium/drivers/radeonsi/r600_hw_context.c
 +++ b/src/gallium/drivers/radeonsi/r600_hw_context.c
 @@ -142,6 +142,12 @@ void si_need_cs_space(struct r600_context *ctx, unsigned num_dw,
  	/* Save 16 dwords for the fence mechanism. */
  	num_dw += 16;

 +#if R600_TRACE_CS
 +	if (ctx->screen->trace_bo) {
 +		num_dw += R600_TRACE_CS_DWORDS;
 +	}
 +#endif
 +
  	/* Flush if there's not enough space. */
  	if (num_dw > RADEON_MAX_CMDBUF_DWORDS) {
  		radeonsi_flush(&ctx->context, NULL, RADEON_FLUSH_ASYNC);
 @@ -206,9 +212,41 @@ void si_context_flush(struct r600_context *ctx, unsigned flags)
  	/* force to keep tiling flags */
  	flags |= RADEON_FLUSH_KEEP_TILING_FLAGS;

 +#if R600_TRACE_CS
 +	if (ctx->screen->trace_bo) {
 +		struct r600_screen *rscreen = ctx->screen;
 +		unsigned i;
 +
 +		for (i = 0; i < cs->cdw; i++) {
 +			fprintf(stderr, "[%4d] [%5d] 0x%08x\n", rscreen->cs_count, i, cs->buf[i]);
 +		}
 +		rscreen->cs_count++;
 +	}
 +#endif
 +
  	/* Flush the CS. */
  	ctx->ws->cs_flush(ctx->cs, flags);

 +#if R600_TRACE_CS
 +	if (ctx->screen->trace_bo) {
 +		struct r600_screen *rscreen = ctx->screen;
 +		unsigned i;
 +
 +		for (i = 0; i < 10; i++) {
 +			usleep(5);
 +			if (!ctx->ws->buffer_is_busy(rscreen->trace_bo->buf, RADEON_USAGE_READWRITE)) {
 +				break;
 +			}
 +		}
 +		if (i == 10) {
 +			fprintf(stderr, "timeout on cs lockup likely happen at cs %d dw %d\n",
 +				rscreen->trace_ptr[1], rscreen->trace_ptr[0]);
 +		} else {
 +			fprintf(stderr, "cs %d executed in %dms\n", rscreen->trace_ptr[1], i * 5);
 +		}
 +	}
 +#endif
 +
  	ctx->pm4_dirty_cdwords = 0;
  	ctx->flags = 0;

 @@ -665,3 +703,26 @@ void r600_context_draw_opaque_count(struct r600_context *ctx, struct r600_so_tar
  	cs->buf[cs->cdw++] = r600_context_bo_reloc(ctx, t->filled_size, RADEON_USAGE_READ);
  }
 +
 +#if R600_TRACE_CS
 +void r600_trace_emit(struct r600_context *rctx)
 +{
 +	struct r600_screen *rscreen = rctx->screen;
 +	struct radeon_winsys_cs *cs = rctx->cs;
 +	uint64_t va;
 +	uint32_t reloc;
 +
 +	va = r600_resource_va(&rscreen->screen, (void*)rscreen->trace_bo);
 +	reloc = r600_context_bo_reloc(rctx, rscreen->trace_bo, RADEON_USAGE_READWRITE);
 +	cs->buf[cs->cdw++] = PKT3(PKT3_WRITE_DATA, 4, 0);
 +	cs->buf[cs->cdw++] = PKT3_WRITE_DATA_DST_SEL(PKT3_WRITE_DATA_DST_SEL_MEM_SYNC) |
 +				PKT3_WRITE_DATA_WR_CONFIRM |
 +				PKT3_WRITE_DATA_ENGINE_SEL(PKT3_WRITE_DATA_ENGINE_SEL_ME);
 +	cs->buf[cs->cdw++] = va & 0xFFFFFFFFUL;
 +	cs->buf[cs->cdw++] = (va >> 32UL) & 0xFFFFFFFFUL;
 +	cs->buf[cs->cdw++] = cs->cdw;
 +	cs->buf[cs->cdw++] = rscreen->cs_count;
 +	cs->buf[cs->cdw++] = PKT3(PKT3_NOP, 0, 0);
 +	cs->buf[cs->cdw++] = reloc;


 The NOP packet here is superfluous,  also I don't really like how this is
 implemented after all.

 May I just use this patch as base of a cleaner implementation?

 Christian.

Yeah, the nop is a leftover. What don't you like? This is a build-time-only
debug option that proved very useful (at least to me) on r600g.

Cheers,
Jerome


 +}
 +#endif
 diff --git a/src/gallium/drivers/radeonsi/radeonsi_pipe.c
 b/src/gallium/drivers/radeonsi/radeonsi_pipe.c
 index c5dac29..a370d7e 100644
 --- a/src/gallium/drivers/radeonsi/radeonsi_pipe.c
 +++ b/src/gallium/drivers/radeonsi/radeonsi_pipe.c
 @@ -525,6 +525,14 @@ static void r600_destroy_screen(struct pipe_screen* pscreen)
  	rscreen->ws->buffer_unmap(rscreen->fences.bo->cs_buf

Re: [Mesa-dev] [PATCH] radeonsi: add cs tracing v2

2013-03-26 Thread Jerome Glisse
On Tue, Mar 26, 2013 at 12:40 PM, Marek Olšák mar...@gmail.com wrote:
 On Tue, Mar 26, 2013 at 3:59 PM, Christian König
 deathsim...@vodafone.de wrote:
 Am 26.03.2013 15:34, schrieb Marek Olšák:

 Speaking of si_pm4_state, I think it's a horrible mechanism for
 anything other than constant state objects (create/bind/delete
 functions). For everything else (set/draw functions), you want to emit
 directly into the command stream. It's not so different from the bad
 state management which r600g used to have (which is now gone). If you
 have to call malloc or calloc in a set_* or draw_* function, you're
 doing it wrong. Are there plans to change it to something more
 efficient (e.g. how r300g and r600g emit non-CSO states right now), or
 will it be like this forever?


 Actually I hoped that r600g sooner or later moves into the same direction
 some more. The fact that we currently need to malloc every buffer indeed
 sucks badly, but that is still better than mixing packet generation with
 driver logic.

 I don't understand the last sentence. What mixing? The set_* and
 draw_* commands are supposed to be executed immediately, therefore
 it's reasonable and preferable to write to the CS directly. Having any
 intermediate storage for commands is a waste of time and space.

I agree here. I don't think an uncached bo for the command stream on new hw
would bring a huge perf increase; it will probably just be noise.


 Also I don't think that emitting directly into the command stream is such a
 good idea; sooner or later we want that buffer to be a buffer allocated in
 GART memory. And under this condition it is better to build up the commands
 in (heavily cached) system memory and then memcpy them to the destination
 buffer.

 AFAIK, GART memory is cached on non-AGP systems, but even uncached
 access shouldn't be a big issue, because the access pattern is
 sequential and write-only. BTW, I have talked about emitting commands
 into a buffer object with Dave and he thinks it's a bad idea due to
 the map and unmap overhead. Also, we have to disallow writing to
 certain unsafe registers anyway.

 Marek

I think Christian is thinking about new hw > cayman, where we can skip
register checking because of vm and hardware register checking (the hw
CP checks that a register in the user IB is not one of the privileged
registers, and blocks the write and throws an irq if so). On this kind of hw you
can have the cmd stream in a bo and not do the map/unmap.

Cheers,
Jerome
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] OpenGL ES 3.0 support

2013-03-26 Thread Jerome Glisse
On Tue, Mar 26, 2013 at 4:39 AM, violin yanev violin.ya...@gmail.com wrote:
 Thanks for your replies guys!

 The output of eglinfo is:
 EGL API version: 1.4
 EGL vendor string: Mesa Project
 EGL version string: 1.4 (DRI2)
 EGL client APIs: OpenGL OpenGL_ES OpenGL_ES2
 EGL extensions string:
 EGL_MESA_drm_image EGL_WL_bind_wayland_display EGL_KHR_image_base
 EGL_KHR_image_pixmap EGL_KHR_image EGL_KHR_gl_renderbuffer_image
 EGL_KHR_surfaceless_context EGL_KHR_create_context
 EGL_NOK_swap_region EGL_NOK_texture_from_pixmap
 EGL_NV_post_sub_buffer

 So apparently ES3.0 is not a supported API :(

 @Jordan: do you know if one can reenable ES3 on Intel graphics? Is a special
 flag expected? I had read a message that Fedora 18 will enable ES3.0 by
 default?

 Violin

AFAICT Fedora won't enable ES3.0 due to patent uncertainty regarding the
floating point formats. You can however build mesa yourself and enable
the floating point formats; that would give you ES3.0 support.

Cheers,
Jerome


Re: [Mesa-dev] OpenGL ES 3.0 support

2013-03-26 Thread Jerome Glisse
On Tue, Mar 26, 2013 at 2:14 PM, Jordan Justen jljus...@gmail.com wrote:
 On Tue, Mar 26, 2013 at 10:34 AM, Jerome Glisse j.gli...@gmail.com wrote:
 On Tue, Mar 26, 2013 at 4:39 AM, violin yanev violin.ya...@gmail.com wrote:
 Thanks for your replies guys!

 The output of eglinfo is:
 EGL API version: 1.4
 EGL vendor string: Mesa Project
 EGL version string: 1.4 (DRI2)
 EGL client APIs: OpenGL OpenGL_ES OpenGL_ES2
 EGL extensions string:
 EGL_MESA_drm_image EGL_WL_bind_wayland_display EGL_KHR_image_base
 EGL_KHR_image_pixmap EGL_KHR_image EGL_KHR_gl_renderbuffer_image
 EGL_KHR_surfaceless_context EGL_KHR_create_context
 EGL_NOK_swap_region EGL_NOK_texture_from_pixmap
 EGL_NV_post_sub_buffer

 So apparently ES3.0 is not a supported API :(

 @Jordan: do you know if one can reenable ES3 on Intel graphics? Is a special
 flag expected? I had read a message that Fedora 18 will enable ES3.0 by
 default?

 Violin

 AFAICT fedora won't enable ES3.0 due to patent uncertainty regarding
 floating point format

 This feature should be usable on Intel hardware which is why it was
 enabled by default (for Intel hardware) in 9bdf5be.

 -Jordan

The Fedora patch reverts this commit.

Cheers,
Jerome


Re: [Mesa-dev] [PATCH] radeonsi: add cs tracing v2

2013-03-26 Thread Jerome Glisse
On Tue, Mar 26, 2013 at 6:45 PM, Dave Airlie airl...@gmail.com wrote:

 correctly). But Marek is quite right that this only counts for state objects
 and makes no sense for set_* and draw_* calls (and I'm currently thinking
 how to avoid that and can't come up with a proper solution). Anyway it's
 definitely not an urgent problem for radeonsi.

 It will be a problem once we actually start caring about performance
 and, most importantly, the CPU overhead of the driver.

 I still think that writing into the command buffers directly (e.g. without
 wrapper functions) is a bad idea, cause that leads to mixing driver logic and

 I'm convinced the exact opposite is a bad idea, because it adds
 another layer all commands must go through. A layer which brings no
 advantage. Think about apps which issue 1k-10k draw calls per frame.
 It's obvious that every byte moved around counts and the key to high
 framerate is to do (almost) nothing in the driver. It looks like the
 idea here is to make the driver as slow as possible.

 packet building in r600g. For example just try to figure out how the
 relocation in NOPs work by reading the source (please keep in mind that one
 of the primary goals why AMD is supporting this driver is to give a good
 example code for customers who want to implement that stuff on their own
 systems).

 I'm shocked. Sacrificing performance in the name of making the code
 nicer for some customers? Seriously? I thought the plan was to make
 the best graphics driver ever.


 Well, maybe I'm repeating myself: Performance is not a priority, it's only
 nice to have!

 Sorry to say so, but if we sacrifice a bit of performance for more code
 readability then that is perfectly ok with me (don't get me wrong, I
 would really prefer to replace the closed source driver today rather than
 tomorrow, it's unfortunately just not what I'm paid for).

 On the other hand, we are talking about perfectly optimizable inline
 functions and/or macros. All I'm saying is that we should structure the
 code a bit more.

 It's okay to take steps in the right direction, but if you start taking
 steps away from performance in favour of code readability then please be
 prepared to deal with objections.

 The thing is, in a lot of cases code readability is in the eye of the
 beholder. I'm sure Jerome thought r600g was perfectly readable when he
 wrote it, but a lot of us didn't, and spent a lot of time trying to
 remove the CPU overheads, not least the amount of time Marek spent. The
 thing is, performance is measurable; code readability isn't.

 Dave.

Maybe once again you forgot why I did things the way I did them. I
explained myself to you back then: I designed r600g for a new kernel
api which was violently different from the cs one. My hope was that
the other kernel api would be better; it was not, and I never pushed
more on that front. So the r600g design was definitely not adapted to the
cs ioctl and not designed with it in mind. History often explains a lot of
things, and people seem to forget about it.

That being said, I too find the code readability argument ironic: if
one understands the cs ioctl then the r600g code as it is nowadays makes
sense, but the radeonsi code is closer to what r600g used to be. So
assuming the same ioctl, I would say that radeonsi should move towards what
r600g is nowadays.

Anyway, I just wanted to set history straight.

Cheers,
Jerome


Re: [Mesa-dev] [PATCH] radeonsi: add cs tracing

2013-03-25 Thread Jerome Glisse
On Mon, Mar 25, 2013 at 12:17 PM, Michel Dänzer mic...@daenzer.net wrote:
 On Mon, 2013-03-25 at 12:01 -0400, j.gli...@gmail.com wrote:
 From: Jerome Glisse jgli...@redhat.com

 Same as on r600, trace cs execution by writing the cs offset after each
 state; this allows pinpointing a lockup inside the command stream and
 narrows down the scope of the lockup investigation.

 Signed-off-by: Jerome Glisse jgli...@redhat.com

 [...]

 diff --git a/src/gallium/drivers/radeonsi/r600_texture.c 
 b/src/gallium/drivers/radeonsi/r600_texture.c
 index 6cafc3d..3d074a3 100644
 --- a/src/gallium/drivers/radeonsi/r600_texture.c
 +++ b/src/gallium/drivers/radeonsi/r600_texture.c
 @@ -550,7 +550,7 @@ struct pipe_resource *si_texture_create(struct pipe_screen *screen,

  	if (!(templ->flags & R600_RESOURCE_FLAG_TRANSFER) &&
  	    !(templ->bind & PIPE_BIND_SCANOUT)) {
 -		array_mode = V_009910_ARRAY_2D_TILED_THIN1;
 +		array_mode = V_009910_ARRAY_1D_TILED_THIN1;
  	}

  	r = r600_init_surface(rscreen, &surface, templ, array_mode,

 What's this hunk doing in here? :)

 The rest looks good to me on a quick look.


Oops, I did it on top of my 2D tiling stuff.

Cheers,
Jerome


Re: [Mesa-dev] [PATCH] radeonsi: add cs tracing

2013-03-25 Thread Jerome Glisse
On Mon, Mar 25, 2013 at 12:38 PM, Christian König
deathsim...@vodafone.de wrote:
 Am 25.03.2013 17:01, schrieb j.gli...@gmail.com:

 From: Jerome Glisse jgli...@redhat.com

 Same as on r600, trace cs execution by writing the cs offset after each
 state; this allows pinpointing a lockup inside the command stream and
 narrows down the scope of the lockup investigation.

 Signed-off-by: Jerome Glisse jgli...@redhat.com


 Could you rewrite this to use an si_pm4_state instead of hand coding it?
 It's cleaner and should reduce the needed code quite a bit.

 Christian.

Well no, the whole point is to emit inside each si_pm4_state_emit so
that you can pinpoint which reg/packet triggers the lockup.

Cheers,
Jerome


Re: [Mesa-dev] [PATCH] radeonsi: add cs tracing

2013-03-25 Thread Jerome Glisse
On Mon, Mar 25, 2013 at 1:12 PM, Christian König
deathsim...@vodafone.de wrote:
 Am 25.03.2013 17:50, schrieb Jerome Glisse:

 On Mon, Mar 25, 2013 at 12:38 PM, Christian König
 deathsim...@vodafone.de wrote:

 Am 25.03.2013 17:01, schrieb j.gli...@gmail.com:

 From: Jerome Glisse jgli...@redhat.com

 Same as on r600, trace cs execution by writing the cs offset after each
 state; this allows pinpointing a lockup inside the command stream and
 narrows down the scope of the lockup investigation.

 Signed-off-by: Jerome Glisse jgli...@redhat.com


 Could you rewrite this to use an si_pm4_state instead of hand coding it?
 It's cleaner and should reduce the needed code quite a bit.

 Christian.

 Well no, the whole point is to emit inside each si_pm4_state_emit so
 that you can pinpoint which reg/packet triggers the lockup.


 Ok, well then it makes no sense that you increment the counter only once per
 flush.

 Christian.

The counter is for tracking the cs number (the number of calls to the cs
ioctl), while in r600_trace_emit I emit both the counter and the
cs->cdw value, so that you have both the dword offset of the last trace
that went through and which cs ioctl call it was. The printf of the
command stream prints both so that you can easily pinpoint things.

Cheers,
Jerome
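
To make the two values concrete, here is a toy model of the tracing idea: every state emit is followed by a trace record carrying the dword offset and the cs number, and whatever record completed last tells you where execution stopped. Everything in it is hypothetical (TRACE_MARKER, struct trace_cs, toy_execute); the real patch uses a WRITE_DATA packet to a trace bo, and this sketch only mimics the bookkeeping on the CPU. It also reads the offset into a local before writing it, so the stored value does not depend on evaluation order.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical marker opcode; the real patch emits a WRITE_DATA packet. */
#define TRACE_MARKER 0xFFFF0000u
#define TRACE_DWORDS 3u /* marker + dword offset + cs number */

struct trace_cs {
    uint32_t buf[64];
    unsigned cdw; /* number of dwords written so far */
};

/* Append a trace record holding the current dword offset and cs number. */
static void trace_emit(struct trace_cs *cs, uint32_t cs_count)
{
    uint32_t offset = cs->cdw;

    cs->buf[cs->cdw++] = TRACE_MARKER;
    cs->buf[cs->cdw++] = offset;
    cs->buf[cs->cdw++] = cs_count;
}

/* Append one ordinary state dword followed by a trace record. */
static void emit_state(struct trace_cs *cs, uint32_t value, uint32_t cs_count)
{
    cs->buf[cs->cdw++] = value;
    trace_emit(cs, cs_count);
}

/*
 * Toy "GPU": consume dwords up to `hang_at` (exclusive) and mirror each
 * fully consumed trace record into trace_out[0] (offset) and
 * trace_out[1] (cs number), like the trace_ptr[] reads in the patch.
 */
static void toy_execute(const struct trace_cs *cs, unsigned hang_at,
                        uint32_t trace_out[2])
{
    unsigned i = 0;

    while (i < cs->cdw && i < hang_at) {
        if (cs->buf[i] == TRACE_MARKER) {
            if (i + TRACE_DWORDS > hang_at)
                break; /* hang hit in the middle of the record */
            trace_out[0] = cs->buf[i + 1];
            trace_out[1] = cs->buf[i + 2];
            i += TRACE_DWORDS;
        } else {
            i++;
        }
    }
}
```

With three states emitted for cs number 7 and a simulated hang at dword 9, the last completed record reports offset 5 of cs 7, so the state emitted right after that record is the suspect.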


Re: [Mesa-dev] [PATCH] r600g: allocate FMASK right after the texture, so that it's aligned with it

2013-03-04 Thread Jerome Glisse
On Sun, Mar 3, 2013 at 9:13 AM, Marek Olšák mar...@gmail.com wrote:
 This avoids the kernel CS checker errors with MSAA textures.

Reviewed-by: Jerome Glisse jgli...@redhat.com

 ---
  src/gallium/drivers/r600/r600_texture.c |2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/src/gallium/drivers/r600/r600_texture.c 
 b/src/gallium/drivers/r600/r600_texture.c
 index 484045e..4825592 100644
 --- a/src/gallium/drivers/r600/r600_texture.c
 +++ b/src/gallium/drivers/r600/r600_texture.c
 @@ -435,8 +435,8 @@ r600_texture_create_object(struct pipe_screen *screen,
 }

  	if (base->nr_samples > 1 && !rtex->is_depth && !buf) {
 -		r600_texture_allocate_cmask(rscreen, rtex);
  		r600_texture_allocate_fmask(rscreen, rtex);
 +		r600_texture_allocate_cmask(rscreen, rtex);
  	}

  	if (!rtex->is_depth && base->nr_samples > 1 &&
 --
 1.7.10.4



Re: [Mesa-dev] [PATCH 1/6] r600g: inline r600_pipe_shader function

2013-03-04 Thread Jerome Glisse
On Sun, Mar 3, 2013 at 8:39 AM, Marek Olšák mar...@gmail.com wrote:
 also change names of other functions, so that they make sense

For the series:
Reviewed-by: Jerome Glisse jgli...@redhat.com

 ---
  src/gallium/drivers/r600/evergreen_state.c   |4 +-
  src/gallium/drivers/r600/r600_pipe.h |8 +--
  src/gallium/drivers/r600/r600_shader.c   |   89 
 --
  src/gallium/drivers/r600/r600_state.c|4 +-
  src/gallium/drivers/r600/r600_state_common.c |4 +-
  5 files changed, 51 insertions(+), 58 deletions(-)

 diff --git a/src/gallium/drivers/r600/evergreen_state.c 
 b/src/gallium/drivers/r600/evergreen_state.c
 index 97f91df..5c7cd40 100644
 --- a/src/gallium/drivers/r600/evergreen_state.c
 +++ b/src/gallium/drivers/r600/evergreen_state.c
 @@ -3311,7 +3311,7 @@ void evergreen_init_atom_start_cs(struct r600_context *rctx)
  	eg_store_loop_const(cb, R_03A200_SQ_LOOP_CONST_0 + (32 * 4), 0x01000FFF);
  }

 -void evergreen_pipe_shader_ps(struct pipe_context *ctx, struct r600_pipe_shader *shader)
 +void evergreen_update_ps_state(struct pipe_context *ctx, struct r600_pipe_shader *shader)
  {
  	struct r600_context *rctx = (struct r600_context *)ctx;
  	struct r600_pipe_state *rstate = &shader->rstate;
 @@ -3460,7 +3460,7 @@ void evergreen_pipe_shader_ps(struct pipe_context *ctx, struct r600_pipe_shader
  	shader->flatshade = rctx->rasterizer->flatshade;
  }

 -void evergreen_pipe_shader_vs(struct pipe_context *ctx, struct r600_pipe_shader *shader)
 +void evergreen_update_vs_state(struct pipe_context *ctx, struct r600_pipe_shader *shader)
  {
  	struct r600_context *rctx = (struct r600_context *)ctx;
  	struct r600_pipe_state *rstate = &shader->rstate;
 diff --git a/src/gallium/drivers/r600/r600_pipe.h 
 b/src/gallium/drivers/r600/r600_pipe.h
 index 3eb2968..28c7de3 100644
 --- a/src/gallium/drivers/r600/r600_pipe.h
 +++ b/src/gallium/drivers/r600/r600_pipe.h
 @@ -626,8 +626,8 @@ void cayman_init_common_regs(struct r600_command_buffer 
 *cb,

  void evergreen_init_state_functions(struct r600_context *rctx);
  void evergreen_init_atom_start_cs(struct r600_context *rctx);
 -void evergreen_pipe_shader_ps(struct pipe_context *ctx, struct 
 r600_pipe_shader *shader);
 -void evergreen_pipe_shader_vs(struct pipe_context *ctx, struct 
 r600_pipe_shader *shader);
 +void evergreen_update_ps_state(struct pipe_context *ctx, struct 
 r600_pipe_shader *shader);
 +void evergreen_update_vs_state(struct pipe_context *ctx, struct 
 r600_pipe_shader *shader);
  void *evergreen_create_db_flush_dsa(struct r600_context *rctx);
  void *evergreen_create_resolve_blend(struct r600_context *rctx);
  void *evergreen_create_decompress_blend(struct r600_context *rctx);
 @@ -701,8 +701,8 @@ r600_create_sampler_view_custom(struct pipe_context *ctx,
 unsigned width_first_level, unsigned 
 height_first_level);
  void r600_init_state_functions(struct r600_context *rctx);
  void r600_init_atom_start_cs(struct r600_context *rctx);
 -void r600_pipe_shader_ps(struct pipe_context *ctx, struct r600_pipe_shader 
 *shader);
 -void r600_pipe_shader_vs(struct pipe_context *ctx, struct r600_pipe_shader 
 *shader);
 +void r600_update_ps_state(struct pipe_context *ctx, struct r600_pipe_shader 
 *shader);
 +void r600_update_vs_state(struct pipe_context *ctx, struct r600_pipe_shader 
 *shader);
  void *r600_create_db_flush_dsa(struct r600_context *rctx);
  void *r600_create_resolve_blend(struct r600_context *rctx);
  void *r700_create_resolve_blend(struct r600_context *rctx);
 diff --git a/src/gallium/drivers/r600/r600_shader.c 
 b/src/gallium/drivers/r600/r600_shader.c
 index 949191a..7ecab7b 100644
 --- a/src/gallium/drivers/r600/r600_shader.c
 +++ b/src/gallium/drivers/r600/r600_shader.c
 @@ -58,52 +58,6 @@ issued in the w slot as well.
  The compiler must issue the source argument to slots z, y, and x
  */

 -static int r600_pipe_shader(struct pipe_context *ctx, struct r600_pipe_shader *shader)
 -{
 -	struct r600_context *rctx = (struct r600_context *)ctx;
 -	struct r600_shader *rshader = &shader->shader;
 -	uint32_t *ptr;
 -	int i;
 -
 -	/* copy new shader */
 -	if (shader->bo == NULL) {
 -		shader->bo = (struct r600_resource*)
 -			pipe_buffer_create(ctx->screen, PIPE_BIND_CUSTOM, PIPE_USAGE_IMMUTABLE, rshader->bc.ndw * 4);
 -		if (shader->bo == NULL) {
 -			return -ENOMEM;
 -		}
 -		ptr = r600_buffer_mmap_sync_with_rings(rctx, shader->bo, PIPE_TRANSFER_WRITE);
 -		if (R600_BIG_ENDIAN) {
 -			for (i = 0; i < rshader->bc.ndw; ++i) {
 -				ptr[i] = bswap_32(rshader->bc.bytecode[i]);
 -			}
 -		} else {
 -			memcpy(ptr, rshader->bc.bytecode, rshader->bc.ndw * sizeof(*ptr

Re: [Mesa-dev] [PATCH 1/5] r600g: unify vgt states

2013-03-04 Thread Jerome Glisse
On Wed, Feb 27, 2013 at 6:11 PM, Marek Olšák mar...@gmail.com wrote:
 The states were split because we thought it caused a hardlock. Now we know
 the hardlock was caused by something else and has since been fixed.

For the series:
Reviewed-by: Jerome Glisse jgli...@redhat.com

 ---
  src/gallium/drivers/r600/evergreen_state.c   |3 +--
  src/gallium/drivers/r600/r600_hw_context.c   |1 -
  src/gallium/drivers/r600/r600_pipe.h |6 --
  src/gallium/drivers/r600/r600_state.c|3 +--
  src/gallium/drivers/r600/r600_state_common.c |   22 +++---
  5 files changed, 9 insertions(+), 26 deletions(-)

 diff --git a/src/gallium/drivers/r600/evergreen_state.c 
 b/src/gallium/drivers/r600/evergreen_state.c
 index 205bbc5..244989d 100644
 --- a/src/gallium/drivers/r600/evergreen_state.c
 +++ b/src/gallium/drivers/r600/evergreen_state.c
 @@ -2615,8 +2615,7 @@ void evergreen_init_state_functions(struct r600_context *rctx)
  	r600_init_atom(rctx, &rctx->samplers[PIPE_SHADER_GEOMETRY].views.atom, id++, evergreen_emit_gs_sampler_views, 0);
  	r600_init_atom(rctx, &rctx->samplers[PIPE_SHADER_FRAGMENT].views.atom, id++, evergreen_emit_ps_sampler_views, 0);

 -	r600_init_atom(rctx, &rctx->vgt_state.atom, id++, r600_emit_vgt_state, 6);
 -	r600_init_atom(rctx, &rctx->vgt2_state.atom, id++, r600_emit_vgt2_state, 3);
 +	r600_init_atom(rctx, &rctx->vgt_state.atom, id++, r600_emit_vgt_state, 7);

  	if (rctx->chip_class == EVERGREEN) {
  		r600_init_atom(rctx, &rctx->sample_mask.atom, id++, evergreen_emit_sample_mask, 3);
 diff --git a/src/gallium/drivers/r600/r600_hw_context.c 
 b/src/gallium/drivers/r600/r600_hw_context.c
 index 91af6b8..b78b004 100644
 --- a/src/gallium/drivers/r600/r600_hw_context.c
 +++ b/src/gallium/drivers/r600/r600_hw_context.c
 @@ -827,7 +827,6 @@ void r600_begin_new_cs(struct r600_context *ctx)
  	ctx->framebuffer.atom.dirty = true;
  	ctx->poly_offset_state.atom.dirty = true;
  	ctx->vgt_state.atom.dirty = true;
 -	ctx->vgt2_state.atom.dirty = true;
  	ctx->sample_mask.atom.dirty = true;
  	ctx->scissor.atom.dirty = true;
  	ctx->config_state.atom.dirty = true;
 diff --git a/src/gallium/drivers/r600/r600_pipe.h 
 b/src/gallium/drivers/r600/r600_pipe.h
 index 570a284..4cfade1 100644
 --- a/src/gallium/drivers/r600/r600_pipe.h
 +++ b/src/gallium/drivers/r600/r600_pipe.h
 @@ -127,10 +127,6 @@ struct r600_vgt_state {
 struct r600_atom atom;
 uint32_t vgt_multi_prim_ib_reset_en;
 uint32_t vgt_multi_prim_ib_reset_indx;
 -};
 -
 -struct r600_vgt2_state {
 -   struct r600_atom atom;
 uint32_t vgt_indx_offset;
  };

 @@ -506,7 +502,6 @@ struct r600_context {
 struct r600_config_stateconfig_state;
 struct r600_stencil_ref_state   stencil_ref;
 struct r600_vgt_state   vgt_state;
 -   struct r600_vgt2_state  vgt2_state;
 struct r600_viewport_state  viewport;
 /* Shaders and shader resources. */
 struct r600_cso_state   vertex_fetch_shader;
 @@ -733,7 +728,6 @@ void r600_emit_cso_state(struct r600_context *rctx, 
 struct r600_atom *atom);
  void r600_emit_alphatest_state(struct r600_context *rctx, struct r600_atom 
 *atom);
  void r600_emit_blend_color(struct r600_context *rctx, struct r600_atom 
 *atom);
  void r600_emit_vgt_state(struct r600_context *rctx, struct r600_atom *atom);
 -void r600_emit_vgt2_state(struct r600_context *rctx, struct r600_atom *atom);
  void r600_emit_clip_misc_state(struct r600_context *rctx, struct r600_atom 
 *atom);
  void r600_emit_stencil_ref(struct r600_context *rctx, struct r600_atom 
 *atom);
  void r600_emit_viewport_state(struct r600_context *rctx, struct r600_atom 
 *atom);
 diff --git a/src/gallium/drivers/r600/r600_state.c 
 b/src/gallium/drivers/r600/r600_state.c
 index bbff6bd..fd3b14e 100644
 --- a/src/gallium/drivers/r600/r600_state.c
 +++ b/src/gallium/drivers/r600/r600_state.c
 @@ -2312,8 +2312,7 @@ void r600_init_state_functions(struct r600_context *rctx)
  	r600_init_atom(rctx, &rctx->samplers[PIPE_SHADER_FRAGMENT].views.atom, id++, r600_emit_ps_sampler_views, 0);
  	r600_init_atom(rctx, &rctx->vertex_buffer_state.atom, id++, r600_emit_vertex_buffers, 0);

 -	r600_init_atom(rctx, &rctx->vgt_state.atom, id++, r600_emit_vgt_state, 6);
 -	r600_init_atom(rctx, &rctx->vgt2_state.atom, id++, r600_emit_vgt2_state, 3);
 +	r600_init_atom(rctx, &rctx->vgt_state.atom, id++, r600_emit_vgt_state, 7);

  	r600_init_atom(rctx, &rctx->seamless_cube_map.atom, id++, r600_emit_seamless_cube_map, 3);
  	r600_init_atom(rctx, &rctx->sample_mask.atom, id++, r600_emit_sample_mask, 3);
 diff --git a/src/gallium/drivers/r600/r600_state_common.c 
 b/src/gallium/drivers/r600/r600_state_common.c
 index 4c68506..8906695 100644
 --- a/src/gallium/drivers/r600/r600_state_common.c
 +++ b/src/gallium

Re: [Mesa-dev] [PATCH] r600g/radeonsi: unreference previous fence in flush

2013-03-04 Thread Jerome Glisse
On Mon, Mar 4, 2013 at 2:05 PM, Michel Dänzer mic...@daenzer.net wrote:
 On Mon, 2013-03-04 at 13:17 -0500, j.gli...@gmail.com wrote:
 From: Jerome Glisse jgli...@redhat.com

 Some code calling the flush function passes a fence pointer that points
 to an old fence, which should be unreferenced to avoid leaking the fence.

 Candidate for 9.1

 Signed-off-by: Jerome Glisse jgli...@redhat.com
 ---
  src/gallium/drivers/r600/r600_pipe.c | 8 +---
  src/gallium/drivers/radeonsi/radeonsi_pipe.c | 9 ++---
  2 files changed, 11 insertions(+), 6 deletions(-)

 diff --git a/src/gallium/drivers/r600/r600_pipe.c 
 b/src/gallium/drivers/r600/r600_pipe.c
 index 78002ae..4bcfc67 100644
 --- a/src/gallium/drivers/r600/r600_pipe.c
 +++ b/src/gallium/drivers/r600/r600_pipe.c
 @@ -145,12 +145,14 @@ static void r600_flush_from_st(struct pipe_context *ctx,
  			       enum pipe_flush_flags flags)
  {
  	struct r600_context *rctx = (struct r600_context *)ctx;
 -	struct r600_fence **rfence = (struct r600_fence**)fence;
 +	struct r600_fence *rfence;
  	unsigned fflags;

  	fflags = flags & PIPE_FLUSH_END_OF_FRAME ? RADEON_FLUSH_END_OF_FRAME : 0;
 -	if (rfence) {
 -		*rfence = r600_create_fence(rctx);
 +	if (fence) {
 +		rfence = r600_create_fence(rctx);
 +		ctx->screen->fence_reference(ctx->screen, fence,
 +						(struct pipe_fence_handle *)rfence);

 This change increases the reference count of the returned fence from 1
 to 2. I don't think that's correct, but if it is, the change should be
 amended with an explanation why.


No, I have an uncommitted change in my tree. I will probably resend with
the xa patchset.

Jerome
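
For reference, the counting issue Michel points out looks like this in a minimal model: creation hands out one reference, and storing the new fence through a reference helper adds a second one, so the creation reference has to be dropped again if the caller is meant to end up with exactly one. This is a sketch under assumptions, not the r600g code and not necessarily what the resent patch does; struct fence, fence_create, fence_reference and flush_get_fence are hypothetical names.

```c
#include <stddef.h>
#include <stdlib.h>

struct fence {
    int refcount;
};

/* Creation hands the caller one reference. */
static struct fence *fence_create(void)
{
    struct fence *f = malloc(sizeof(*f));

    if (f)
        f->refcount = 1;
    return f;
}

/* Make *dst point at src, adjusting both counts; frees the old one at zero. */
static void fence_reference(struct fence **dst, struct fence *src)
{
    if (src)
        src->refcount++;
    if (*dst && --(*dst)->refcount == 0)
        free(*dst);
    *dst = src;
}

/*
 * Flush variant that releases whatever fence the caller passed in and
 * returns a new fence holding exactly one reference.
 */
static void flush_get_fence(struct fence **out)
{
    struct fence *f = fence_create();

    fence_reference(out, f);   /* drops the old *out; f is now at 2 */
    fence_reference(&f, NULL); /* drop the creation reference; back to 1 */
}
```

Without the second fence_reference call the new fence would stay at a count of 2 and never be freed once the caller drops its single reference.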


Re: [Mesa-dev] [PATCH] winsys/radeon: Only add bo to hash table when creating flink

2013-03-01 Thread Jerome Glisse
On Fri, Mar 1, 2013 at 4:34 PM, Martin Andersson g02ma...@gmail.com wrote:
 The problem is that we mix bo handles and flinked names in the hash
 table. Because kms type handles are not flinked they should not be
 added to the hash table. If we do that we will sooner or later
 get a situation where we will overwrite a correct entry because
 the bo handle was the same as a flinked name.
 ---
  src/gallium/winsys/radeon/drm/radeon_drm_bo.c | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)

 diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c 
 b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
 index 2d41c26..f4ac526 100644
 --- a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
 +++ b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
 @@ -957,16 +957,16 @@ static boolean radeon_winsys_bo_get_handle(struct 
 pb_buffer *buffer,

  bo->flinked = TRUE;
  bo->flink = flink.name;
 +
 +pipe_mutex_lock(bo->mgr->bo_handles_mutex);
 +util_hash_table_set(bo->mgr->bo_handles, 
 (void*)(uintptr_t)bo->flink, bo);
 +pipe_mutex_unlock(bo->mgr->bo_handles_mutex);
  }
  whandle->handle = bo->flink;
  } else if (whandle->type == DRM_API_HANDLE_TYPE_KMS) {
  whandle->handle = bo->handle;
  }

 -pipe_mutex_lock(bo->mgr->bo_handles_mutex);
 -util_hash_table_set(bo->mgr->bo_handles, 
 (void*)(uintptr_t)whandle->handle, bo);
 -pipe_mutex_unlock(bo->mgr->bo_handles_mutex);
 -
  whandle-stride = stride;
  return TRUE;
  }
 --
 1.8.1.4


Reviewed-by: Jerome Glisse jgli...@redhat.com
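
The overwrite Martin describes can be reproduced with a small sketch. The table
below is a hypothetical linear-search map, not util_hash_table, and the two
get_handle_* helpers are simplified stand-ins for the behaviour before and
after the patch (in particular the flink name is assumed to exist already).
The point is only that KMS handles and flink names live in different number
spaces, so using both as keys in one table lets a KMS handle replace the entry
of an unrelated flinked bo.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ENTRIES 16

/* Hypothetical buffer object with the two kinds of identifiers. */
struct bo {
    uint32_t handle; /* KMS handle */
    uint32_t flink;  /* flink name, assumed already created */
};

struct entry {
    uint32_t key;
    struct bo *bo;
    int used;
};

struct table {
    struct entry e[MAX_ENTRIES];
};

/* Insert or overwrite the entry for key. */
static void table_set(struct table *t, uint32_t key, struct bo *bo)
{
    size_t i;
    size_t free_slot = MAX_ENTRIES;

    for (i = 0; i < MAX_ENTRIES; i++) {
        if (t->e[i].used && t->e[i].key == key) {
            t->e[i].bo = bo; /* same key: the old entry is lost */
            return;
        }
        if (!t->e[i].used && free_slot == MAX_ENTRIES)
            free_slot = i;
    }
    if (free_slot < MAX_ENTRIES) {
        t->e[free_slot].key = key;
        t->e[free_slot].bo = bo;
        t->e[free_slot].used = 1;
    }
}

static struct bo *table_get(const struct table *t, uint32_t key)
{
    size_t i;

    for (i = 0; i < MAX_ENTRIES; i++) {
        if (t->e[i].used && t->e[i].key == key)
            return t->e[i].bo;
    }
    return NULL;
}

enum handle_type { HANDLE_SHARED, HANDLE_KMS };

/* Before the patch: whatever handle is exported becomes a key. */
static uint32_t get_handle_old(struct table *t, struct bo *bo,
                               enum handle_type type)
{
    uint32_t h = (type == HANDLE_SHARED) ? bo->flink : bo->handle;

    table_set(t, h, bo);
    return h;
}

/* After the patch: only flink names are used as keys. */
static uint32_t get_handle_new(struct table *t, struct bo *bo,
                               enum handle_type type)
{
    if (type == HANDLE_SHARED) {
        table_set(t, bo->flink, bo);
        return bo->flink;
    }
    return bo->handle;
}
```

With a bo A whose flink name is 7 and a bo B whose KMS handle is 7, the old
helper makes a lookup of name 7 return B, while the new helper keeps returning
A.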


Re: [Mesa-dev] r600g: status of my work on the shader optimization

2013-02-26 Thread Jerome Glisse
On Tue, Feb 26, 2013 at 1:05 PM, Stefan Seifert n...@detonation.org wrote:
 Good news!

 I gave the r600-sb branch a good testing at commit
 265ae41b1f1d086d35d274c7378c43cddb8215c8 and so far I've not had a single
 lockup in about 1 1/2 hours of flight time!

 The downside is that this is with R600_HYPERZ=0. But with HYPERZ enabled, I
 get lockups on master as well, so it would seem your branch is in pretty good
 shape. Testing done on a Radeon HD 5670 using kernel 3.8

 Regards,
 Stefan

Is there a bug number for the Hyperz lockups?

Cheers,
Jerome


Re: [Mesa-dev] [PATCH 0/4] r6xx flushing rework and enable CP DMA

2013-02-22 Thread Jerome Glisse
On Fri, Feb 22, 2013 at 2:38 PM,  alexdeuc...@gmail.com wrote:
 From: Alex Deucher alexander.deuc...@amd.com

 This patch set cleans up the flushing on r6xx in what seems to be
 a logical manner.  The last patch enables CP DMA on r6xx.  No piglit
 regressions on RS780 which I was testing on.

 Alex Deucher (4):
   r600g: add missing emit_flush for R600_CONTEXT_FLUSH_AND_INV case
   r600g: synchronize streamout buffers on r6xx too (v2)
   r600g: set additional cp_coher_cntl bits for 6xx/7xx flush (v2)
   r600g: enable CP DMA on r6xx (v2)

  src/gallium/drivers/r600/r600_blit.c   |3 +--
  src/gallium/drivers/r600/r600_hw_context.c |   26 +-
  2 files changed, 18 insertions(+), 11 deletions(-)

For the series:
Reviewed-by: Jerome Glisse jgli...@redhat.com


 --
 1.7.7.5



Re: [Mesa-dev] [PATCH] r600g: properly implement S8Z24 depth-stencil format for Evergreen

2013-02-13 Thread Jerome Glisse
On Tue, Feb 12, 2013 at 8:06 PM, Marek Olšák mar...@gmail.com wrote:
 I should say fix, but it has never been used until now.
 S8Z24 is the format equivalent to the GL_UNSIGNED_INT_24_8 packing,
 so we'll start to see it more often with st/mesa now making smart decisions
 about formats.

 The DB-CB copy can change the channel ordering for transfers, other than
 that, the internal DB format doesn't really matter.

 R600-R700 support is possible except shadow mapping.
 FMT_24_8 is broken if the SAMPLE_C instruction is used (no idea why).

 Also the sampler swizzling was broken in theory and the fact it worked was
 a lucky coincidence.

 radeonsi might need to port this.

Reviewed-by: Jerome Glisse jgli...@redhat.com

 ---
  src/gallium/drivers/r600/evergreen_state.c |   13 +++-
  src/gallium/drivers/r600/r600_state.c  |8 -
  src/gallium/drivers/r600/r600_texture.c|   44 
 ++--
  3 files changed, 47 insertions(+), 18 deletions(-)

 diff --git a/src/gallium/drivers/r600/evergreen_state.c 
 b/src/gallium/drivers/r600/evergreen_state.c
 index 211c218..c6e29db 100644
 --- a/src/gallium/drivers/r600/evergreen_state.c
 +++ b/src/gallium/drivers/r600/evergreen_state.c
 @@ -200,6 +200,8 @@ static uint32_t r600_translate_dbformat(enum pipe_format 
 format)
 return V_028040_Z_16;
 case PIPE_FORMAT_Z24X8_UNORM:
 case PIPE_FORMAT_Z24_UNORM_S8_UINT:
 +   case PIPE_FORMAT_X8Z24_UNORM:
 +   case PIPE_FORMAT_S8_UINT_Z24_UNORM:
 return V_028040_Z_24;
 case PIPE_FORMAT_Z32_FLOAT:
 case PIPE_FORMAT_Z32_FLOAT_S8X24_UINT:
 @@ -339,7 +341,7 @@ static uint32_t r600_translate_colorswap(enum pipe_format 
 format)

 case PIPE_FORMAT_X8Z24_UNORM:
 case PIPE_FORMAT_S8_UINT_Z24_UNORM:
 -   return V_028C70_SWAP_STD;
 +   return V_028C70_SWAP_STD_REV;

 case PIPE_FORMAT_R10G10B10A2_UNORM:
 case PIPE_FORMAT_R10G10B10X2_SNORM:
 @@ -1106,6 +1108,11 @@ evergreen_create_sampler_view_custom(struct 
 pipe_context *ctx,
 case PIPE_FORMAT_Z32_FLOAT_S8X24_UINT:
 pipe_format = PIPE_FORMAT_Z32_FLOAT;
 break;
 +   case PIPE_FORMAT_X8Z24_UNORM:
 +   case PIPE_FORMAT_S8_UINT_Z24_UNORM:
 +   /* Z24 is always stored like this. */
 +   pipe_format = PIPE_FORMAT_Z24X8_UNORM;
 +   break;
 case PIPE_FORMAT_X24S8_UINT:
 case PIPE_FORMAT_S8X24_UINT:
 case PIPE_FORMAT_X32_S8X24_UINT:
 @@ -1603,6 +1610,8 @@ static void evergreen_init_depth_surface(struct 
 r600_context *rctx,
 switch (surf->base.format) {
 case PIPE_FORMAT_Z24X8_UNORM:
 case PIPE_FORMAT_Z24_UNORM_S8_UINT:
 +   case PIPE_FORMAT_X8Z24_UNORM:
 +   case PIPE_FORMAT_S8_UINT_Z24_UNORM:
 surf->pa_su_poly_offset_db_fmt_cntl =
 S_028B78_POLY_OFFSET_NEG_NUM_DB_BITS((char)-24);
 break;
 @@ -2179,6 +2188,8 @@ static void evergreen_emit_polygon_offset(struct 
 r600_context *rctx, struct r600
 switch (state->zs_format) {
 case PIPE_FORMAT_Z24X8_UNORM:
 case PIPE_FORMAT_Z24_UNORM_S8_UINT:
 +   case PIPE_FORMAT_X8Z24_UNORM:
 +   case PIPE_FORMAT_S8_UINT_Z24_UNORM:
 offset_units *= 2.0f;
 break;
 case PIPE_FORMAT_Z16_UNORM:
 diff --git a/src/gallium/drivers/r600/r600_state.c 
 b/src/gallium/drivers/r600/r600_state.c
 index 5322850..d1f6626 100644
 --- a/src/gallium/drivers/r600/r600_state.c
 +++ b/src/gallium/drivers/r600/r600_state.c
 @@ -270,10 +270,6 @@ static uint32_t r600_translate_colorswap(enum 
 pipe_format format)
 case PIPE_FORMAT_Z24_UNORM_S8_UINT:
 return V_0280A0_SWAP_STD;

 -   case PIPE_FORMAT_X8Z24_UNORM:
 -   case PIPE_FORMAT_S8_UINT_Z24_UNORM:
 -   return V_0280A0_SWAP_STD;
 -
 case PIPE_FORMAT_R10G10B10A2_UNORM:
 case PIPE_FORMAT_R10G10B10X2_SNORM:
 case PIPE_FORMAT_R10SG10SB10SA2U_NORM:
 @@ -440,10 +436,6 @@ static uint32_t r600_translate_colorformat(enum 
 pipe_format format)
 case PIPE_FORMAT_Z24_UNORM_S8_UINT:
 return V_0280A0_COLOR_8_24;

 -   case PIPE_FORMAT_X8Z24_UNORM:
 -   case PIPE_FORMAT_S8_UINT_Z24_UNORM:
 -   return V_0280A0_COLOR_24_8;
 -
 case PIPE_FORMAT_Z32_FLOAT_S8X24_UINT:
 return V_0280A0_COLOR_X24_8_32_FLOAT;

 diff --git a/src/gallium/drivers/r600/r600_texture.c 
 b/src/gallium/drivers/r600/r600_texture.c
 index 85fc887..7f5752d 100644
 --- a/src/gallium/drivers/r600/r600_texture.c
 +++ b/src/gallium/drivers/r600/r600_texture.c
 @@ -985,11 +985,14 @@ uint32_t r600_translate_texformat(struct pipe_screen 
 *screen,
   const unsigned char *swizzle_view

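The relation between S8Z24 and the GL_UNSIGNED_INT_24_8 packing mentioned in
Marek's message can be sketched with plain bit operations. GL_UNSIGNED_INT_24_8
keeps depth in the 24 most significant bits and stencil in the 8 least
significant bits. The helper names below are hypothetical, and the sketch
assumes the Gallium naming convention where components are listed starting
from the least significant bits, so S8_UINT_Z24_UNORM has stencil in the low
byte and Z24_UNORM_S8_UINT has depth in the low 24 bits.

```c
#include <assert.h>
#include <stdint.h>

/* S8Z24 layout: stencil in bits 0-7, depth in bits 8-31
 * (the same layout as GL_UNSIGNED_INT_24_8). */
static uint32_t pack_s8z24(uint32_t z24, uint8_t s8)
{
    return ((z24 & 0xffffffu) << 8) | s8;
}

/* Z24S8 layout: depth in bits 0-23, stencil in bits 24-31. */
static uint32_t pack_z24s8(uint32_t z24, uint8_t s8)
{
    return ((uint32_t)s8 << 24) | (z24 & 0xffffffu);
}

/* Converting between the two layouts is a rotation by 8 bits. */
static uint32_t s8z24_to_z24s8(uint32_t v)
{
    return (v >> 8) | (v << 24);
}
```

Seen this way, a copy that reorders channels is enough to move between the two
layouts, which fits the remark that the internal DB format does not really
matter for transfers.
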
Re: [Mesa-dev] [PATCH 2/2] r600g: fix lockup when hyperz alpha test are enabled together. v2

2013-02-11 Thread Jerome Glisse
On Mon, Feb 11, 2013 at 6:45 PM,  j.gli...@gmail.com wrote:
 From: Jerome Glisse jgli...@redhat.com

 It seems that enabling alpha test confuses the GPU about the order in
 which it should perform Z testing. So force the order programmed
 through db shader control.

 v2: Only force z order when alpha test is enabled

 Signed-off-by: Jerome Glisse jgli...@redhat.com
 Reviewed-by: Marek Olšák mar...@gmail.com

This one does not regress piglit (redwood or rv770) and still fixes the
lockup afaict. If there are no objections I will push tomorrow.

Cheers,
Jerome

 ---
  src/gallium/drivers/r600/evergreen_state.c | 25 +++--
  src/gallium/drivers/r600/r600_state.c  | 22 +-
  2 files changed, 44 insertions(+), 3 deletions(-)

 diff --git a/src/gallium/drivers/r600/evergreen_state.c 
 b/src/gallium/drivers/r600/evergreen_state.c
 index 211c218..b710131 100644
 --- a/src/gallium/drivers/r600/evergreen_state.c
 +++ b/src/gallium/drivers/r600/evergreen_state.c
 @@ -2251,6 +2251,13 @@ static void evergreen_emit_db_misc_state(struct 
 r600_context *rctx, struct r600_
 if (rctx->db_state.rsurf && rctx->db_state.rsurf->htile_enabled) {
 /* FORCE_OFF means HiZ/HiS are determined by 
 DB_SHADER_CONTROL */
 db_render_override |= 
 S_02800C_FORCE_HIZ_ENABLE(V_02800C_FORCE_OFF);
 +   /* This is to fix a lockup when hyperz and alpha test are 
 enabled at
 +* the same time some how GPU get confuse on which order to 
 pick for
 +* z test
 +*/
 +   if (rctx->alphatest_state.sx_alpha_test_control) {
 +   db_render_override |= 
 S_02800C_FORCE_SHADER_Z_ORDER(1);
 +   }
 } else {
 db_render_override |= 
 S_02800C_FORCE_HIZ_ENABLE(V_02800C_FORCE_DISABLE);
 }
 @@ -3240,7 +3247,7 @@ void evergreen_pipe_shader_ps(struct pipe_context *ctx, 
 struct r600_pipe_shader
 struct r600_context *rctx = (struct r600_context *)ctx;
 struct r600_pipe_state *rstate = &shader->rstate;
 struct r600_shader *rshader = &shader->shader;
 -   unsigned i, exports_ps, num_cout, spi_ps_in_control_0, spi_input_z, 
 spi_ps_in_control_1, db_shader_control;
 +   unsigned i, exports_ps, num_cout, spi_ps_in_control_0, spi_input_z, 
 spi_ps_in_control_1, db_shader_control = 0;
 int pos_index = -1, face_index = -1;
 int ninterp = 0;
 boolean have_linear = FALSE, have_centroid = FALSE, have_perspective 
 = FALSE;
 @@ -3250,7 +3257,6 @@ void evergreen_pipe_shader_ps(struct pipe_context *ctx, 
 struct r600_pipe_shader

 rstate->nregs = 0;

 -   db_shader_control = S_02880C_Z_ORDER(V_02880C_EARLY_Z_THEN_LATE_Z);
 for (i = 0; i < rshader->ninput; i++) {
 /* evergreen NUM_INTERP only contains values interpolated 
 into the LDS,
POSITION goes via GPRs from the SC so isn't counted */
 @@ -3484,6 +3490,21 @@ void evergreen_update_db_shader_control(struct 
 r600_context * rctx)
 
 V_02880C_EXPORT_DB_FULL) |
 
 S_02880C_ALPHA_TO_MASK_DISABLE(rctx->framebuffer.cb0_is_integer);

 +   /* When alpha test is enabled we can't trust the hw to make the 
 proper
 +* decision on the order in which ztest should be run related to 
 fragment
 +* shader execution.
 +*
 +* If alpha test is enabled perform early z rejection (RE_Z) but 
 don't early
 +* write to the zbuffer. Write to zbuffer is delayed after fragment 
 shader
 +* execution and thus after alpha test so if discarded by the alpha 
 test
 +* the z value is not written.
 +*/
 +   if (rctx->alphatest_state.sx_alpha_test_control) {
 +   db_shader_control |= S_02880C_Z_ORDER(V_02880C_RE_Z);
 +   } else {
 +   db_shader_control |= 
 S_02880C_Z_ORDER(V_02880C_EARLY_Z_THEN_LATE_Z);
 +   }
 +
 if (db_shader_control != rctx->db_misc_state.db_shader_control) {
 rctx->db_misc_state.db_shader_control = db_shader_control;
 rctx->db_misc_state.atom.dirty = true;
 diff --git a/src/gallium/drivers/r600/r600_state.c 
 b/src/gallium/drivers/r600/r600_state.c
 index 5322850..8efd4b3 100644
 --- a/src/gallium/drivers/r600/r600_state.c
 +++ b/src/gallium/drivers/r600/r600_state.c
 @@ -1966,6 +1966,13 @@ static void r600_emit_db_misc_state(struct 
 r600_context *rctx, struct r600_atom
 if (rctx->db_state.rsurf && rctx->db_state.rsurf->htile_enabled) {
 /* FORCE_OFF means HiZ/HiS are determined by 
 DB_SHADER_CONTROL */
 db_render_override |= 
 S_028D10_FORCE_HIZ_ENABLE(V_028D10_FORCE_OFF);
 +   /* This is to fix a lockup when hyperz and alpha test are 
 enabled at
 +* the same time some how GPU get confuse on which order to 
 pick for
 +* z test

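The decision made by the patch above can be summarized in a small sketch. The
enum and struct names are hypothetical (the driver itself uses the V_02880C_*
and S_02800C_* register macros), so this only mirrors the logic described in
the commit message and comments: use RE_Z when alpha test is enabled, keep
early-Z-then-late-Z otherwise, force the shader Z order only when hyperz and
alpha test are both on, and mark the state dirty only when something changed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names standing in for the hardware Z_ORDER field values. */
enum z_order {
    Z_ORDER_EARLY_Z_THEN_LATE_Z,
    Z_ORDER_RE_Z
};

struct db_state {
    enum z_order z_order;
    bool force_shader_z_order;
    bool dirty;
};

/* With alpha test on, reject early but delay the Z write until after the
 * fragment shader, so fragments killed by alpha test never write depth. */
static enum z_order pick_z_order(bool alpha_test_enabled)
{
    return alpha_test_enabled ? Z_ORDER_RE_Z : Z_ORDER_EARLY_Z_THEN_LATE_Z;
}

static void update_db_state(struct db_state *db, bool hyperz_enabled,
                            bool alpha_test_enabled)
{
    enum z_order order = pick_z_order(alpha_test_enabled);
    bool force = hyperz_enabled && alpha_test_enabled;

    if (order != db->z_order || force != db->force_shader_z_order) {
        db->z_order = order;
        db->force_shader_z_order = force;
        db->dirty = true; /* state must be re-emitted */
    }
}
```
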
Re: [Mesa-dev] [PATCH] r600g: add cs memory usage accounting and limit it

2013-01-31 Thread Jerome Glisse
On Wed, Jan 30, 2013 at 10:35 PM, Marek Olšák mar...@gmail.com wrote:
 On Wed, Jan 30, 2013 at 6:14 PM,  j.gli...@gmail.com wrote:
 From: Jerome Glisse jgli...@redhat.com

 We are now seeing cs that can go over the vram+gtt size. To avoid
 failing, flush early any cs that goes over 70% of (gtt+vram) usage. 70%
 is used to allow for some fragmentation.

 Signed-off-by: Jerome Glisse jgli...@redhat.com
 ---
  src/gallium/drivers/r600/evergreen_state.c|  4 
  src/gallium/drivers/r600/r600.h   |  1 +
  src/gallium/drivers/r600/r600_buffer.c|  1 +
  src/gallium/drivers/r600/r600_hw_context.c| 12 
  src/gallium/drivers/r600/r600_pipe.c  |  3 +++
  src/gallium/drivers/r600/r600_pipe.h  | 21 +
  src/gallium/drivers/r600/r600_state.c |  3 +++
  src/gallium/drivers/r600/r600_state_common.c  | 17 -
  src/gallium/winsys/radeon/drm/radeon_drm_cs.c | 11 +++
  src/gallium/winsys/radeon/drm/radeon_winsys.h | 10 ++
  10 files changed, 82 insertions(+), 1 deletion(-)

 diff --git a/src/gallium/drivers/r600/evergreen_state.c 
 b/src/gallium/drivers/r600/evergreen_state.c
 index be1c427..84f8dce 100644
 --- a/src/gallium/drivers/r600/evergreen_state.c
 +++ b/src/gallium/drivers/r600/evergreen_state.c
 @@ -1668,6 +1668,8 @@ static void evergreen_set_framebuffer_state(struct 
 pipe_context *ctx,
 surf = (struct r600_surface*)state->cbufs[i];
 rtex = (struct r600_texture*)surf->base.texture;

 +   r600_context_add_resource_size(ctx, 
 state->cbufs[i]->texture);
 +
 if (!surf->color_initialized) {
 evergreen_init_color_surface(rctx, surf);
 }
 @@ -1699,6 +1701,8 @@ static void evergreen_set_framebuffer_state(struct 
 pipe_context *ctx,
 if (state->zsbuf) {
 surf = (struct r600_surface*)state->zsbuf;

 +   r600_context_add_resource_size(ctx, state->zsbuf->texture);
 +
 if (!surf->depth_initialized) {
 evergreen_init_depth_surface(rctx, surf);
 }
 diff --git a/src/gallium/drivers/r600/r600.h 
 b/src/gallium/drivers/r600/r600.h
 index a383c90..b9f7d3d 100644
 --- a/src/gallium/drivers/r600/r600.h
 +++ b/src/gallium/drivers/r600/r600.h
 @@ -50,6 +50,7 @@ struct r600_resource {

 /* Resource state. */
 unsigned                domains;
 +   uint64_t                size;

 Don't add this. Use r600_resource::buf::size instead, which is already
 initialized.


  };

  #define R600_BLOCK_MAX_BO  32
 diff --git a/src/gallium/drivers/r600/r600_buffer.c 
 b/src/gallium/drivers/r600/r600_buffer.c
 index 6df0d91..92f549a 100644
 --- a/src/gallium/drivers/r600/r600_buffer.c
 +++ b/src/gallium/drivers/r600/r600_buffer.c
 @@ -250,6 +250,7 @@ bool r600_init_resource(struct r600_screen *rscreen,
 break;
 }

 +   res->size = size;
 res->buf = rscreen->ws->buffer_create(rscreen->ws, size, alignment,
use_reusable_pool,
initial_domain);
 diff --git a/src/gallium/drivers/r600/r600_hw_context.c 
 b/src/gallium/drivers/r600/r600_hw_context.c
 index ebafd97..44d3b4d 100644
 --- a/src/gallium/drivers/r600/r600_hw_context.c
 +++ b/src/gallium/drivers/r600/r600_hw_context.c
 @@ -359,6 +359,16 @@ out_err:
  void r600_need_cs_space(struct r600_context *ctx, unsigned num_dw,
 boolean count_draw_in)
  {
 +   if (!ctx->ws->cs_memory_below_limit(ctx->rings.gfx.cs, ctx->vram, 
 ctx->gtt)) {
 +   ctx->gtt = 0;
 +   ctx->vram = 0;
 +   ctx->rings.gfx.flush(ctx, RADEON_FLUSH_ASYNC);
 +   return;
 +   }
 +   /* all will be accounted once relocation are emited */
 +   ctx->gtt = 0;
 +   ctx->vram = 0;
 +
 /* The number of dwords we already used in the CS so far. */
 num_dw += ctx->rings.gfx.cs->cdw;

 @@ -784,6 +794,8 @@ void r600_begin_new_cs(struct r600_context *ctx)

 ctx->pm4_dirty_cdwords = 0;
 ctx->flags = 0;
 +   ctx->gtt = 0;
 +   ctx->vram = 0;

 /* Begin a new CS. */
 r600_emit_command_buffer(ctx-rings.gfx.cs, ctx-start_cs_cmd);
 diff --git a/src/gallium/drivers/r600/r600_pipe.c 
 b/src/gallium/drivers/r600/r600_pipe.c
 index a59578d..cb50cfe 100644
 --- a/src/gallium/drivers/r600/r600_pipe.c
 +++ b/src/gallium/drivers/r600/r600_pipe.c
 @@ -333,6 +333,9 @@ static struct pipe_context *r600_create_context(struct 
 pipe_screen *screen, void
 rctx->chip_class = rscreen->chip_class;
 rctx->keep_tiling_flags = rscreen->info.drm_minor >= 12;

 +   rctx->gtt = 0;
 +   rctx->vram = 0;

 There is no reason to initialize anything to 0 in context_create. The
 whole context structure is calloc'd.


 +
 LIST_INITHEAD(rctx-active_nontimer_queries

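The 70% rule from the commit message can be sketched as follows. This is not
the winsys implementation: struct mem_info, struct cs_usage and the function
names are hypothetical, and the integer rounding of the limit is an assumption.
The sketch only shows the accounting idea, which is to sum the sizes of the
resources a cs references and to flush early once the sum goes over 70% of
vram+gtt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of the memory available to the GPU. */
struct mem_info {
    uint64_t vram_size;
    uint64_t gtt_size;
};

/* Hypothetical per-cs accounting. */
struct cs_usage {
    uint64_t vram;    /* bytes referenced in vram */
    uint64_t gtt;     /* bytes referenced in gtt */
    unsigned flushes; /* number of early flushes so far */
};

/* True while the cs stays at or below 70% of vram+gtt. */
static bool memory_below_limit(const struct mem_info *info,
                               uint64_t vram, uint64_t gtt)
{
    uint64_t limit = (info->vram_size + info->gtt_size) / 10 * 7;

    return vram + gtt <= limit;
}

/* Account for one more resource referenced by the cs. */
static void add_resource_size(struct cs_usage *cs, uint64_t size, bool in_vram)
{
    if (in_vram)
        cs->vram += size;
    else
        cs->gtt += size;
}

/* Flush early when over the limit; returns true if a flush happened. */
static bool flush_if_over_limit(const struct mem_info *info,
                                struct cs_usage *cs)
{
    if (memory_below_limit(info, cs->vram, cs->gtt))
        return false;
    cs->vram = 0;
    cs->gtt = 0;
    cs->flushes++; /* a real driver would submit the cs here */
    return true;
}
```

Leaving 30% of headroom is what gives the kernel room to place buffers even
when memory is fragmented.
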
Re: [Mesa-dev] [PATCH 3/3] radeon/winsys: add async dma infrastructure

2013-01-07 Thread Jerome Glisse
On Mon, Jan 7, 2013 at 11:03 AM, Marek Olšák mar...@gmail.com wrote:
 On Mon, Jan 7, 2013 at 3:45 PM, Christian König deathsim...@vodafone.de 
 wrote:
 On 07.01.2013 01:24, Marek Olšák wrote:

 On Sun, Jan 6, 2013 at 11:58 PM, Jerome Glisse j.gli...@gmail.com wrote:

 On Sun, Jan 6, 2013 at 4:00 PM, Marek Olšák mar...@gmail.com wrote:

 I agree with Christian. You can use a separate instance of
 radeon_winsys_cs for the DMA CS. The winsys exposes all the functions
 you need (except one) for you to coordinate work between 2 command
 streams in the pipe driver. You may only need to expose one additional
 winsys function to the driver for synchronization, it's called
 radeon_drm_cs_sync_flush. I'm confident that this can be implemented
 and layered on top of the winsys, presumably with fewer lines of code
 and cleaner.

 The relocation add function needs to access both the dma ring and the
 cs ring no matter on which ring the relocation is added. Doing the
 sync in the pipe driver would increase the code: each call site of
 add_reloc would need to check whether the bo is referenced by the other
 ring and flush the other ring if so. This also means that there is a
 higher likelihood that someone adding an add_reloc call forgets about
 the flushing.

 Well, in that case, you can define a new set of functions in the pipe
 driver, which are layered on top of radeon_winsys_cs and the existing
 interface radeon_winsys::cs_*.

 If you want to be super clean, you can add a new module that defines
 this command stream pair:

 struct r600_cs_with_dma {
 struct radeon_winsys_cs *cs_main, *cs_dma;
 };

 And define a set of functions which work with that, reimplementing all
 the cs_* functions by calling the existing functions of radeon_winsys.
 The pipe driver would then use the new CS functions everywhere instead
 of radeon_winsys.

 To me, the best design decision here is not to try to *hack* the
 existing winsys code to make it do what you want without giving it
 another thought. Adding another layer is preferable, because it keeps
 both parts simple and separated.


 Well thinking about it more and more I don't think add_reloc is the right
 place to do the sync anyway.

 Imagine a loop that wants to handle a bunch of buffers, first they are zero
 cleared and then rendered to. Those buffers are unique, so we can zero clear
 them all at once. In an ideal world they should all end up in the same DMA
 command stream.

 Now comes a buffer that is first rendered to and then copied around (for
 example). At this moment the DMA command stream needs to be flushed, because
 now a new DMA command stream starts that actually needs to run after the
 rendering command stream.

 So instead of flushing when we see that a buffer gets added to a command
 stream, we need to remember in which order the command streams need to get
 submitted and only flush when this order is going to change.

 I agree with all your points. add_reloc is a bad place for
 synchronization for yet another reason: you don't really know anything
 about what the driver is trying to do and what commands and
 relocations are likely to come next, as opposed to e.g. a write
 transfer where you are 100% sure that:
 - the source resource isn't referenced by a CS nor is it busy
 - the destination resource will likely be used pretty soon as a source
 for rendering

 In addition to that, I believe that using the async DMA is useless for
 anything but write-only transfers with a staging resource. Every other
 case is always synchronous with rendering and therefore defeats the
 purpose of the *async* DMA. I therefore propose:

 1) Let's not use the async DMA in resource_copy_region *at all*.

 2) Let's replace all the resource_copy_region and copy_buffer
 occurrences in transfer_unmap with async DMA copies. The function
 implementing the copy should do all the necessary synchronization
 between command streams by itself.

 Marek

I have v3 coming that should please you.

Cheers,
Jerome
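
Christian's suggestion, to remember the submission order and flush only when
that order would change, could look roughly like the sketch below. It is one
possible reading of his description, not code from any of the patches: all
names are hypothetical, every use of a buffer is treated as a dependency (no
distinction between reads and writes), and the actual submission is reduced to
a counter.

```c
#include <assert.h>
#include <stddef.h>

enum ring { RING_NONE = -1, RING_GFX = 0, RING_DMA = 1 };

/* Hypothetical tracker for the pair of pending command streams. */
struct tracker {
    enum ring first;  /* ring whose cs must be submitted first */
    unsigned batch;   /* bumped on every flush */
    unsigned flushes;
};

/* Hypothetical per-buffer state. */
struct buffer {
    unsigned batch;      /* batch in which last_ring was recorded */
    enum ring last_ring; /* ring that touched the buffer last */
};

/* Submit both command streams in the recorded order and start over. */
static void flush_both(struct tracker *t)
{
    t->first = RING_NONE;
    t->batch++;
    t->flushes++;
}

static void use_buffer(struct tracker *t, struct buffer *b, enum ring r)
{
    /* State recorded before the last flush no longer matters. */
    enum ring prev = (b->batch == t->batch) ? b->last_ring : RING_NONE;

    if (prev != RING_NONE && prev != r) {
        /* The other ring touched this buffer earlier, so it has to run
         * before ring r. */
        if (t->first == RING_NONE)
            t->first = prev;
        else if (t->first != prev)
            flush_both(t); /* the required order would change */
    }
    b->batch = t->batch;
    b->last_ring = r;
}
```

In Christian's example, clearing several buffers on the DMA ring and then
rendering to them needs no flush, because every dependency points the same
way. The first buffer that is rendered to and then copied reverses the order
and triggers the single flush.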


Re: [Mesa-dev] [PATCH 3/3] radeon/winsys: add async dma infrastructure

2013-01-05 Thread Jerome Glisse
On Sat, Jan 5, 2013 at 9:49 AM, Christian König deathsim...@vodafone.de wrote:
 On 04.01.2013 23:19, j.gli...@gmail.com wrote:
 [SNIP]

 diff --git a/src/gallium/drivers/r300/r300_emit.c
 b/src/gallium/drivers/r300/r300_emit.c
 index d1ed4b3..c824821 100644
 --- a/src/gallium/drivers/r300/r300_emit.c
 +++ b/src/gallium/drivers/r300/r300_emit.c
 @@ -1184,7 +1184,8 @@ validate:
   assert(tex && tex->buf && "cbuf is marked, but NULL!");
   r300->rws->cs_add_reloc(r300->cs, tex->cs_buf,
   RADEON_USAGE_READWRITE,
 -r300_surface(fb->cbufs[i])->domain);
 +r300_surface(fb->cbufs[i])->domain,
 +RADEON_RING_DMA);


 ??? DMA ring on R300? At least on first glance that looks quite odd, should
 probably be GFX ring instead.

Yeah, it's a cut and paste error. I caught that when testing on r3xx.


   }
   /* ...depth buffer... */
   if (fb->zsbuf) {
 @@ -1192,7 +1193,8 @@ validate:
   assert(tex && tex->buf && "zsbuf is marked, but NULL!");
   r300->rws->cs_add_reloc(r300->cs, tex->cs_buf,
   RADEON_USAGE_READWRITE,
 -r300_surface(fb->zsbuf)->domain);
 +r300_surface(fb->zsbuf)->domain,
 +RADEON_RING_DMA);


 Same here and repeats on a couple of more places.
 [SNIP]


 diff --git a/src/gallium/winsys/radeon/drm/radeon_winsys.h
 b/src/gallium/winsys/radeon/drm/radeon_winsys.h
 index 16536dc..5ff463e 100644
 --- a/src/gallium/winsys/radeon/drm/radeon_winsys.h
 +++ b/src/gallium/winsys/radeon/drm/radeon_winsys.h
 @@ -43,11 +43,13 @@
   #include pipebuffer/pb_buffer.h
   #include libdrm/radeon_surface.h
   -#define RADEON_MAX_CMDBUF_DWORDS (16 * 1024)
 +#define RADEON_MAX_CMDBUF_DWORDS        (16 * 1024)
   -#define RADEON_FLUSH_ASYNC   (1 << 0)
 -#define RADEON_FLUSH_KEEP_TILING_FLAGS (1 << 1) /* needs DRM 2.12.0 */
 -#define RADEON_FLUSH_COMPUTE   (1 << 2)
 +#define RADEON_FLUSH_ASYNC              (1 << 0)
 +#define RADEON_FLUSH_KEEP_TILING_FLAGS  (1 << 1) /* needs DRM 2.12.0 */
 +#define RADEON_FLUSH_COMPUTE            (1 << 2)
 +#define RADEON_FLUSH_DMA                (1 << 3)
 +#define RADEON_FLUSH_GFX                (1 << 4)
 /* Tiling flags. */
   enum radeon_bo_layout {
 @@ -137,12 +139,19 @@ enum chip_class {
   TAHITI,
   };
   +enum radeon_ring_type {
 +RADEON_RING_PM4 = 0,
 +RADEON_RING_DMA = 1,
 +};
 +


 Don't use PM4 as identifier here, the PM4 packet format is used for other
 ring types beside GFX/Compute as well, but those rings can't necessary
 execute GFX/Compute commands.

I was looking for a 3 letter name that encompasses gfx and compute.

   struct winsys_handle;
   struct radeon_winsys_cs_handle;
 struct radeon_winsys_cs {
 -unsigned cdw;  /* Number of used dwords. */
 -uint32_t *buf; /* The command buffer. */
 +unsigned    cdw;      /* Number of used dwords. */
 +uint32_t    *buf;     /* The command buffer. */
 +unsigned    dma_cdw;  /* Number of used dwords. */
 +uint32_t    *dma_buf; /* The command buffer. */
   };


 Why like this? Can't we just have separate instances of the radeon_winsys_cs
 structure for each ring type we are dealing with?

 The rest looks quite good,
 Christian.

No, we can't. For a given context we need to keep track of the dma ring
and the gfx/compute/uvd/other ring at the same time. It's the relocation
code that needs that.

Cheers,
Jerome


Re: [Mesa-dev] [PATCH 1/3] r600g/radeon/winsys: indentation cleanup

2013-01-04 Thread Jerome Glisse
On Fri, Jan 4, 2013 at 5:19 PM,  j.gli...@gmail.com wrote:
 From: Jerome Glisse jgli...@redhat.com

 Signed-off-by: Jerome Glisse jgli...@redhat.com

For the series, piglit shows no regressions on r7xx/evergreen. I still need
to test r3xx/r5xx and SI.

Cheers,
Jerome

 ---
  src/gallium/drivers/r600/r600_pipe.c  | 18 +-
  src/gallium/drivers/r600/r600_pipe.h  |  2 +-
  src/gallium/winsys/radeon/drm/radeon_drm_bo.c |  3 +--
  src/gallium/winsys/radeon/drm/radeon_drm_cs.h |  2 +-
  4 files changed, 12 insertions(+), 13 deletions(-)

 diff --git a/src/gallium/drivers/r600/r600_pipe.c 
 b/src/gallium/drivers/r600/r600_pipe.c
 index 65dcbf8..e9d5e0a 100644
 --- a/src/gallium/drivers/r600/r600_pipe.c
 +++ b/src/gallium/drivers/r600/r600_pipe.c
 @@ -290,21 +290,21 @@ static struct pipe_context *r600_create_context(struct 
 pipe_screen *screen, void
 rctx->cs = rctx->ws->cs_create(rctx->ws);
 rctx->ws->cs_set_flush_callback(rctx->cs, r600_flush_from_winsys, 
 rctx);

 -rctx->uploader = u_upload_create(&rctx->context, 1024 * 1024, 256,
 - PIPE_BIND_INDEX_BUFFER |
 - PIPE_BIND_CONSTANT_BUFFER);
 -if (!rctx->uploader)
 -goto fail;
 +   rctx->uploader = u_upload_create(&rctx->context, 1024 * 1024, 256,
 +   PIPE_BIND_INDEX_BUFFER |
 +   PIPE_BIND_CONSTANT_BUFFER);
 +   if (!rctx->uploader)
 +   goto fail;

 rctx->allocator_fetch_shader = u_suballocator_create(&rctx->context, 
 64 * 1024, 256,
  0, 
 PIPE_USAGE_STATIC, FALSE);
 -if (!rctx->allocator_fetch_shader)
 -goto fail;
 +   if (!rctx->allocator_fetch_shader)
 +   goto fail;

 rctx->allocator_so_filled_size = 
 u_suballocator_create(&rctx->context, 4096, 4,
 -   0, 
 PIPE_USAGE_STATIC, TRUE);
 +   0, 
 PIPE_USAGE_STATIC, TRUE);
  if (!rctx->allocator_so_filled_size)
 -goto fail;
 +   goto fail;

 rctx->blitter = util_blitter_create(&rctx->context);
 if (rctx->blitter == NULL)
 diff --git a/src/gallium/drivers/r600/r600_pipe.h 
 b/src/gallium/drivers/r600/r600_pipe.h
 index 6b7c053..934a6f5 100644
 --- a/src/gallium/drivers/r600/r600_pipe.h
 +++ b/src/gallium/drivers/r600/r600_pipe.h
 @@ -408,7 +408,7 @@ struct r600_context {
 struct radeon_winsys*ws;
 struct radeon_winsys_cs *cs;
 struct blitter_context  *blitter;
 -   struct u_upload_mgr *uploader;
 +   struct u_upload_mgr *uploader;
 struct u_suballocator   *allocator_so_filled_size;
 struct u_suballocator   *allocator_fetch_shader;
 struct util_slab_mempool pool_transfers;
 diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c 
 b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
 index 07e92c5..897e962 100644
 --- a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
 +++ b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
 @@ -802,8 +802,7 @@ static void radeon_bo_set_tiling(struct pb_buffer *_buf,
  sizeof(args));
  }

 -static struct radeon_winsys_cs_handle *radeon_drm_get_cs_handle(
 -struct pb_buffer *_buf)
 +static struct radeon_winsys_cs_handle *radeon_drm_get_cs_handle(struct 
 pb_buffer *_buf)
  {
  /* return radeon_bo. */
  return (struct radeon_winsys_cs_handle*)get_radeon_bo(_buf);
 diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_cs.h 
 b/src/gallium/winsys/radeon/drm/radeon_drm_cs.h
 index 6336d3a..286eb6a 100644
 --- a/src/gallium/winsys/radeon/drm/radeon_drm_cs.h
 +++ b/src/gallium/winsys/radeon/drm/radeon_drm_cs.h
 @@ -33,7 +33,7 @@
  struct radeon_cs_context {
  uint32_t                    buf[RADEON_MAX_CMDBUF_DWORDS];

 -int fd;
 +int fd;
  struct drm_radeon_cs        cs;
  struct drm_radeon_cs_chunk  chunks[3];
  uint64_t                    chunk_array[3];
 --
 1.7.11.7



Re: [Mesa-dev] [PATCH 3/3] radeon/winsys: add async dma infrastructure

2013-01-04 Thread Jerome Glisse
On Fri, Jan 4, 2013 at 6:33 PM, Alex Deucher alexdeuc...@gmail.com wrote:
 On Fri, Jan 4, 2013 at 5:19 PM,  j.gli...@gmail.com wrote:
 From: Jerome Glisse jgli...@redhat.com

 The design takes advantage of the fact that the kernel emits a
 semaphore when a buffer is referenced by different rings. So the only
 thing we need to do to enforce synchronization between the dma and
 gfx/compute rings is to make sure that we never reference the same bo
 on the dma and gfx rings at the same time.

 This is achieved by tracking relocations. When we add a relocation
 to the dma ring for a bo, we first check whether the bo has an active
 relocation on the gfx ring. If so, we flush the gfx ring.
 We do the same when adding a bo to the gfx ring: we check that it does
 not have a relocation on the dma ring, and if it has one we flush the
 dma ring.

 This patch also simplifies the helper query function used to find out
 whether a bo has pending write/read commands.

 Looks good.  A couple of minor comments below.  BTW, any performance gains?


No, there aren't many benchmarks that trigger a lot of buffer copies,
AFAICT. Here is a WIP patch for texture copy:
http://people.freedesktop.org/~glisse/0001-r600g-r7xx-use-async-dma-for-resource-copy.patch

The kernel mostly rejects the command stream so far; I need to check what's going on.

Cheers,
Jerome
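
The relocation tracking described in the commit message above can be sketched
as follows. The structures and function names are hypothetical and much
simpler than the winsys code (fixed-size arrays, no read/write usage, flushing
reduced to a counter). The sketch only shows the rule itself: before a bo is
added to one ring, the other ring is flushed if it still holds a relocation
for that bo, so the same bo is never pending on both rings at once.

```c
#include <assert.h>
#include <stddef.h>

enum ring { RING_GFX = 0, RING_DMA = 1, NUM_RINGS = 2 };

#define MAX_RELOCS 32

/* Hypothetical buffer object: pending relocations per ring. */
struct bo {
    int relocs[NUM_RINGS];
};

/* Hypothetical context holding one pending cs per ring. */
struct cs {
    struct bo *bos[NUM_RINGS][MAX_RELOCS];
    size_t nbos[NUM_RINGS];
    unsigned flushes[NUM_RINGS];
};

/* Submit the cs of ring r and drop its relocations. */
static void cs_flush(struct cs *cs, enum ring r)
{
    size_t i;

    for (i = 0; i < cs->nbos[r]; i++)
        cs->bos[r][i]->relocs[r]--;
    cs->nbos[r] = 0;
    cs->flushes[r]++;
}

static void cs_add_reloc(struct cs *cs, struct bo *bo, enum ring r)
{
    enum ring other = (r == RING_GFX) ? RING_DMA : RING_GFX;

    /* Never reference the same bo on both rings at the same time. */
    if (bo->relocs[other] > 0)
        cs_flush(cs, other);

    /* Make room if the relocation list of this ring is full. */
    if (cs->nbos[r] == MAX_RELOCS)
        cs_flush(cs, r);

    cs->bos[r][cs->nbos[r]++] = bo;
    bo->relocs[r]++;
}
```

Once the two command streams are submitted separately like this, the ordering
between the rings is left to the semaphores the kernel emits.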

 Alex


 Signed-off-by: Jerome Glisse jgli...@redhat.com
 ---
  src/gallium/drivers/r300/r300_emit.c   |  21 +-
  src/gallium/drivers/r300/r300_flush.c  |   7 +-
  src/gallium/drivers/r600/evergreen_hw_context.c|  39 +++
  src/gallium/drivers/r600/evergreend.h  |  16 ++
  src/gallium/drivers/r600/r600.h|  13 +
  src/gallium/drivers/r600/r600_blit.c   |  94 +--
  src/gallium/drivers/r600/r600_hw_context.c |  44 +++-
  src/gallium/drivers/r600/r600_pipe.c   |  13 +-
  src/gallium/drivers/r600/r600_pipe.h   |   2 +-
  src/gallium/drivers/r600/r600_texture.c|   2 +-
  src/gallium/drivers/r600/r600d.h   |  16 ++
  src/gallium/drivers/radeonsi/r600_hw_context.c |   2 +-
  .../drivers/radeonsi/r600_hw_context_priv.h|   2 +-
  src/gallium/drivers/radeonsi/r600_texture.c|   2 +-
  src/gallium/drivers/radeonsi/radeonsi_pipe.c   |  13 +-
  src/gallium/winsys/radeon/drm/radeon_drm_bo.c  |  10 +-
  src/gallium/winsys/radeon/drm/radeon_drm_bo.h  |   2 +
  src/gallium/winsys/radeon/drm/radeon_drm_cs.c  | 270 
 +
  src/gallium/winsys/radeon/drm/radeon_drm_cs.h  |  40 ++-
  src/gallium/winsys/radeon/drm/radeon_drm_winsys.c  |   6 +
  src/gallium/winsys/radeon/drm/radeon_winsys.h  |  28 ++-
  21 files changed, 509 insertions(+), 133 deletions(-)

 diff --git a/src/gallium/drivers/r300/r300_emit.c 
 b/src/gallium/drivers/r300/r300_emit.c
 index d1ed4b3..c824821 100644
 --- a/src/gallium/drivers/r300/r300_emit.c
 +++ b/src/gallium/drivers/r300/r300_emit.c
 @@ -1184,7 +1184,8 @@ validate:
  assert(tex && tex->buf && "cbuf is marked, but NULL!");
  r300->rws->cs_add_reloc(r300->cs, tex->cs_buf,
  RADEON_USAGE_READWRITE,
 -r300_surface(fb->cbufs[i])->domain);
 +r300_surface(fb->cbufs[i])->domain,
 +RADEON_RING_DMA);
  }
  /* ...depth buffer... */
  if (fb->zsbuf) {
 @@ -1192,7 +1193,8 @@ validate:
  assert(tex && tex->buf && "zsbuf is marked, but NULL!");
  r300->rws->cs_add_reloc(r300->cs, tex->cs_buf,
  RADEON_USAGE_READWRITE,
 -r300_surface(fb->zsbuf)->domain);
 +r300_surface(fb->zsbuf)->domain,
 +RADEON_RING_DMA);
  }
  }
  if (r300->textures_state.dirty) {
 @@ -1204,18 +1206,21 @@ validate:

  tex = r300_resource(texstate->sampler_views[i]->base.texture);
  r300->rws->cs_add_reloc(r300->cs, tex->cs_buf, 
 RADEON_USAGE_READ,
 -tex->domain);
 +tex->domain,
 +RADEON_RING_DMA);
  }
  }
  /* ...occlusion query buffer... */
  if (r300->query_current)
  r300->rws->cs_add_reloc(r300->cs, r300->query_current->cs_buf,
 -RADEON_USAGE_WRITE, RADEON_DOMAIN_GTT);
 +RADEON_USAGE_WRITE, RADEON_DOMAIN_GTT,
 +RADEON_RING_DMA);
  /* ...vertex buffer for SWTCL path... */
  if (r300->vbo)
  r300->rws->cs_add_reloc(r300->cs, r300_resource(r300->vbo)->cs_buf,
  RADEON_USAGE_READ,
 -r300_resource(r300->vbo)->domain);
 +r300_resource(r300->vbo)->domain

Re: [Mesa-dev] [PATCH] r600g: add cs tracing infrastructure for lockup pin pointing

2012-12-19 Thread Jerome Glisse
On Wed, Dec 19, 2012 at 12:17 PM,  j.gli...@gmail.com wrote:
 From: Jerome Glisse jgli...@redhat.com

 It's a build-time option: set R600_TRACE_CS to 1 and it will print
 every cs to stderr, along with the cs trace point value, which
 gives the last offset into a cs processed by the GPU.

 Signed-off-by: Jerome Glisse jgli...@redhat.com

For information, this is something I have been using for a while, and I
am just getting tired of porting it over and over, so I cleaned it up
into something that I believe is useful. My rdb tools can be used to
annotate the cs output given by this infrastructure: rdb_annotateib
hd2xxx.rdb dumpfile > dumpfile.readablebyhuman

It gives the last dw before lockup. If you don't have many
applications running at the same time, it has proven to be accurate most
of the time.

Note you will need the kernel patch i just sent.

Cheers,
Jerome
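
The post-flush check in the patch is a bounded poll: sleep, ask whether the trace buffer is still busy, and give up after a fixed number of tries. A minimal sketch of that pattern follows, assuming a hypothetical busy callback in place of the winsys buffer_is_busy hook; wait_for_idle, busy_fn and fake_busy are illustrative names, not r600g API.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical callback: returns true while the buffer is still in use. */
typedef bool (*busy_fn)(void *cookie);

/* Sleep sleep_us microseconds, then poll; repeat at most max_polls times.
 * Returns the number of polls that found the buffer busy before it went
 * idle, or -1 if it was still busy after max_polls polls (likely lockup). */
static int wait_for_idle(busy_fn is_busy, void *cookie,
                         unsigned max_polls, unsigned sleep_us)
{
    struct timespec ts;
    unsigned i;

    assert(is_busy != NULL);
    ts.tv_sec = sleep_us / 1000000u;
    ts.tv_nsec = (long)(sleep_us % 1000000u) * 1000L;

    for (i = 0; i < max_polls; i++) {
        nanosleep(&ts, NULL);
        if (!is_busy(cookie))
            return (int)i;
    }
    return -1;
}

/* Stand-in for the GPU: stays busy for *remaining more polls. */
static bool fake_busy(void *cookie)
{
    unsigned *remaining = cookie;

    if (*remaining == 0)
        return false;
    (*remaining)--;
    return true;
}
```

On timeout the caller would report the last trace values written by the GPU, which is what the patch does with trace_ptr[0] and trace_ptr[1].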


 ---
  src/gallium/drivers/r600/r600_hw_context.c  | 41 
 +
  src/gallium/drivers/r600/r600_hw_context_priv.h |  5 +--
  src/gallium/drivers/r600/r600_pipe.c            | 20 
  src/gallium/drivers/r600/r600_pipe.h            | 16 ++
  src/gallium/drivers/r600/r600_state_common.c    | 26 
  5 files changed, 106 insertions(+), 2 deletions(-)

 diff --git a/src/gallium/drivers/r600/r600_hw_context.c 
 b/src/gallium/drivers/r600/r600_hw_context.c
 index cdd31a4..6c8cb9d 100644
 --- a/src/gallium/drivers/r600/r600_hw_context.c
 +++ b/src/gallium/drivers/r600/r600_hw_context.c
 @@ -27,6 +27,7 @@
  #include "r600d.h"
  #include "util/u_memory.h"
  #include <errno.h>
 +#include <unistd.h>

  /* Get backends mask */
  void r600_get_backend_mask(struct r600_context *ctx)
 @@ -369,6 +370,11 @@ void r600_need_cs_space(struct r600_context *ctx, 
 unsigned num_dw,
 for (i = 0; i < R600_NUM_ATOMS; i++) {
 if (ctx->atoms[i] && ctx->atoms[i]->dirty) {
 num_dw += ctx->atoms[i]->num_dw;
 +#if R600_TRACE_CS
 +   if (ctx->screen->trace_bo) {
 +   num_dw += R600_TRACE_CS_DWORDS;
 +   }
 +#endif
 }
 }

 @@ -376,6 +382,11 @@ void r600_need_cs_space(struct r600_context *ctx, 
 unsigned num_dw,

 /* The upper-bound of how much space a draw command would 
 take. */
 num_dw += R600_MAX_FLUSH_CS_DWORDS + R600_MAX_DRAW_CS_DWORDS;
 +#if R600_TRACE_CS
 +   if (ctx->screen->trace_bo) {
 +   num_dw += R600_TRACE_CS_DWORDS;
 +   }
 +#endif
 }

 /* Count in queries_suspend. */
 @@ -717,7 +728,37 @@ void r600_context_flush(struct r600_context *ctx, 
 unsigned flags)
 }

 /* Flush the CS. */
 +#if R600_TRACE_CS
 +   if (ctx->screen->trace_bo) {
 +   struct r600_screen *rscreen = ctx->screen;
 +   unsigned i;
 +
 +   for (i = 0; i < cs->cdw; i++) {
 +   fprintf(stderr, "[%4d] [%5d] 0x%08x\n", 
 rscreen->cs_count, i, cs->buf[i]);
 +   }
 +   rscreen->cs_count++;
 +   }
 +#endif
 ctx->ws->cs_flush(ctx->cs, flags);
 +#if R600_TRACE_CS
 +   if (ctx->screen->trace_bo) {
 +   struct r600_screen *rscreen = ctx->screen;
 +   unsigned i;
 +
 +   for (i = 0; i < 10; i++) {
 +   usleep(5);
 +   if (!ctx->ws->buffer_is_busy(rscreen->trace_bo->buf, 
 RADEON_USAGE_READWRITE)) {
 +   break;
 +   }
 +   }
 +   if (i == 10) {
 +   fprintf(stderr, "timeout on cs lockup likely happen 
 at cs %d dw %d\n",
 +   rscreen->trace_ptr[1], rscreen->trace_ptr[0]);
 +   } else {
 +   fprintf(stderr, "cs %d executed in %dms\n", 
 rscreen->trace_ptr[1], i * 5);
 +   }
 +   }
 +#endif

 r600_begin_new_cs(ctx);
  }
 diff --git a/src/gallium/drivers/r600/r600_hw_context_priv.h 
 b/src/gallium/drivers/r600/r600_hw_context_priv.h
 index 050c472..692e6ec 100644
 --- a/src/gallium/drivers/r600/r600_hw_context_priv.h
 +++ b/src/gallium/drivers/r600/r600_hw_context_priv.h
 @@ -29,8 +29,9 @@
  #include "r600_pipe.h"

  /* the number of CS dwords for flushing and drawing */
 -#define R600_MAX_FLUSH_CS_DWORDS 12
 -#define R600_MAX_DRAW_CS_DWORDS 34
 +#define R600_MAX_FLUSH_CS_DWORDS   12
 +#define R600_MAX_DRAW_CS_DWORDS    34
 +#define R600_TRACE_CS_DWORDS       7

  /* these flags are used in register flags and added into block flags */
  #define REG_FLAG_NEED_BO 1
 diff --git a/src/gallium/drivers/r600/r600_pipe.c 
 b/src/gallium/drivers/r600/r600_pipe.c
 index e497744..7990400 100644
 --- a/src/gallium/drivers/r600/r600_pipe.c
 +++ b/src/gallium/drivers/r600/r600_pipe.c
 @@ -723,6 +723,12 @@ static void r600_destroy_screen(struct pipe_screen

Re: [Mesa-dev] [PATCH 2/2] r600g: texture buffer object + glsl 1.40 enable support

2012-12-19 Thread Jerome Glisse
On Wed, Dec 19, 2012 at 12:33 PM, Tom Stellard t...@stellard.net wrote:
 On Sun, Dec 16, 2012 at 08:33:23PM +1000, Dave Airlie wrote:
 From: Dave Airlie airl...@redhat.com

 This adds TBO support to r600g, and with GLSL 1.40 enabled,
 we now get 3.1 core profiles advertised for r600g.

 This code is evergreen only so far, but I don't think there is
 much to make it work on r600/700/cayman other than testing.

 a) buffer txq is broken like cube map txq, this sucks, fix it the
 exact same way.

 b) buffer fetches are done with a vertex clause,

 c) vertex swizzling offsets are different than texture swizzles,
 but we still need to use the combiner, so make it configurable.

 d) add implementation of UCMP.

 TODO: r600/700/cayman testing
 Signed-off-by: Dave Airlie airl...@redhat.com
 ---
  src/gallium/drivers/r600/evergreen_state.c   | 55 
  src/gallium/drivers/r600/r600_asm.c  |  2 +-
  src/gallium/drivers/r600/r600_asm.h  |  2 +
  src/gallium/drivers/r600/r600_pipe.c |  4 +-
  src/gallium/drivers/r600/r600_pipe.h | 10 +++-
  src/gallium/drivers/r600/r600_shader.c   | 75 
 
  src/gallium/drivers/r600/r600_shader.h   |  1 +
  src/gallium/drivers/r600/r600_state_common.c | 58 +
  src/gallium/drivers/r600/r600_texture.c  | 16 --
  9 files changed, 204 insertions(+), 19 deletions(-)


 [snip]

 diff --git a/src/gallium/drivers/r600/r600_shader.c 
 b/src/gallium/drivers/r600/r600_shader.c
 index feb7001..60667e7 100644
 --- a/src/gallium/drivers/r600/r600_shader.c
 +++ b/src/gallium/drivers/r600/r600_shader.c
 @@ -3819,6 +3819,71 @@ static inline unsigned tgsi_tex_get_src_gpr(struct 
 r600_shader_ctx *ctx,
   return ctx->file_offset[inst->Src[index].Register.File] + 
 inst->Src[index].Register.Index;
  }

 +static int do_vtx_fetch_inst(struct r600_shader_ctx *ctx, boolean 
 src_requires_loading)
 +{
 + struct r600_bytecode_vtx vtx;
 + struct r600_bytecode_alu alu;
 + struct tgsi_full_instruction *inst = 
 &ctx->parse.FullToken.FullInstruction;
 + int src_gpr, r, i;
 +
 + src_gpr = tgsi_tex_get_src_gpr(ctx, 0);
 + if (src_requires_loading) {
 + for (i = 0; i < 4; i++) {
 + memset(&alu, 0, sizeof(struct r600_bytecode_alu));
 + alu.inst = 
 CTX_INST(V_SQ_ALU_WORD1_OP2_SQ_OP2_INST_MOV);
 + r600_bytecode_src(&alu.src[0], &ctx->src[0], i);
 + alu.dst.sel = ctx->temp_reg;
 + alu.dst.chan = i;
 + if (i == 3)
 + alu.last = 1;
 + alu.dst.write = 1;
 + r = r600_bytecode_add_alu(ctx->bc, &alu);
 + if (r)
 + return r;
 + }
 + src_gpr = ctx->temp_reg;
 + }
 +
 + memset(&vtx, 0, sizeof(vtx));
 + vtx.inst = 0;
 + vtx.buffer_id = tgsi_tex_get_src_gpr(ctx, 1) + R600_MAX_CONST_BUFFERS;;
 + vtx.fetch_type = 2; /* VTX_FETCH_NO_INDEX_OFFSET */
 + vtx.src_gpr = src_gpr;
 + vtx.mega_fetch_count = 16;
 + vtx.dst_gpr = ctx->file_offset[inst->Dst[0].Register.File] + 
 inst->Dst[0].Register.Index;
 + vtx.dst_sel_x = (inst->Dst[0].Register.WriteMask & 1) ? 0 : 7; 
  /* SEL_X */
 + vtx.dst_sel_y = (inst->Dst[0].Register.WriteMask & 2) ? 1 : 7; 
  /* SEL_Y */
 + vtx.dst_sel_z = (inst->Dst[0].Register.WriteMask & 4) ? 2 : 7; 
  /* SEL_Z */
 + vtx.dst_sel_w = (inst->Dst[0].Register.WriteMask & 8) ? 3 : 7; 
  /* SEL_W */
 + vtx.use_const_fields = 1;
 + vtx.srf_mode_all = 1;   /* SRF_MODE_NO_ZERO */
 +

 According to the docs, srf_mode_all will be ignored if use_const_fields
 is set.  However, based on my tests while running compute shaders, other
 fields like data_format, which are supposed to be ignored weren't being
 ignored unless they were set to zero.  So, I think it would be safer
 here to set srf_mode_all to zero and make sure that bit gets set on
 the resource.


 + if ((r = r600_bytecode_add_vtx(ctx->bc, &vtx)))
 + return r;
 + return 0;
 +}
 +

 Otherwise, this code for vtx fetch looks good to me.  One problem I ran into
 with vtx fetch instructions while working on compute shaders was that
 the GPU will hang if you write to vtx.src_gpr in the
 instruction group following the vtx fetch.  Here is a simple example:

 %T2_X<def> = MOV %ZERO
 %T3_X<def> = VTX_READ_eg %T2_X<kill>, 24
 %T2_X<def> = MOV %ZERO

 I'm not sure if this happens on all GPU variants, but I was able to
 consistently reproduce this on my SUMO.  You may want to keep an eye
 out for this in case you run into any unexplainable hangs.


Did the vtx fetch group have the barrier flag set?

Cheers,
Jerome
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/2] r600g: rework flusing and synchronization pattern v4

2012-12-08 Thread Jerome Glisse
On Sat, Dec 8, 2012 at 7:27 PM, Marek Olšák mar...@gmail.com wrote:
 Hi Jerome,

 I'm okay with the simplification of r600_flush_emit, I'm not so okay
 with some other things. There's also some cruft unrelated to flushing.

 1) R600_CONTEXT_FLUSH could have a better name, because it's not clear
 what it does. (it looks like it only flushed read-only bindings)

GPU_FLUSH ?

 2) Don't use magic numbers when setting cp_coher_cntl unless you want
 to hide something from us / obfuscating the code. :)

 3) The definition of R600_MAX_FLUSH_CS_DWORDS should be updated.

Yes, I haven't recomputed the worst case.

 4) SURFACE_BASE_UPDATE is emitted twice in emit_framebuffer_state. I
 don't think splitting one packet into two packets doing the same thing
 is needed.

It's needed: without it, a couple of r6xx/r7xx GPUs will lock up after a
couple of hours of stress testing. I wasn't seeing lockups with it.

 5) RS780 and RS880 don't need SURFACE_BASE_UPDATE for streamout. Their
 streamout hardware was actually copied from R700. Doing  CHIP_RS780
 instead of  CHIP_RV770 was correct. The same for r600_flush_emit.

fglrx mostly does the same on r7xx and r6xx for streamout. As I am not
sure I have any stress test for that, I side with fglrx.

 6) In r600_context_flush, don't remove the comment about flushing
 framebuffer caches, because it's still done there.

 7) Masking out R600_CONTEXT_FLUSH in r600_context_emit_fence is not
 correct. We should still flush the caches later if they're dirty and
 even if the fence was emitted. You can't see this regression in
 piglit, because we don't have a test for that.

True

 8) There's some inconsistent flushing between graphics and compute
 colorbuffer bindings. For graphics, you use (WAIT_IDLE |
 FLUSH_AND_INV), which makes sense. For compute, you use
 R600_CONTEXT_FLUSH (which is used for vertex buffers and the like
 elsewhere, but not colorbuffers).

I haven't paid much attention to the compute side; I should probably look at it.

 And one question:

 Why do you set both FLUSH_AND_INV and STREAMOUT_FLUSH on
 Evergreen, while r600 only gets FLUSH_AND_INV? Did you overlook this?

No, just matching the fglrx pattern. I don't think I tested without that
change, but it definitely matches fglrx.

Cheers,
Jerome

 Marek

 On Thu, Dec 6, 2012 at 8:51 PM,  j.gli...@gmail.com wrote:
 From: Jerome Glisse jgli...@redhat.com

 This brings r600g almost in line with the closed source driver when
 it comes to the flushing and synchronization pattern.

 Signed-off-by: Jerome Glisse jgli...@redhat.com
 ---
  src/gallium/drivers/r600/evergreen_compute.c   |   8 +-
  .../drivers/r600/evergreen_compute_internal.c  |   4 +-
  src/gallium/drivers/r600/evergreen_state.c |   4 +-
  src/gallium/drivers/r600/r600.h|  16 +--
  src/gallium/drivers/r600/r600_hw_context.c | 154 
 -
  src/gallium/drivers/r600/r600_state.c  |  18 ++-
  src/gallium/drivers/r600/r600_state_common.c   |  19 ++-
  7 files changed, 61 insertions(+), 162 deletions(-)

 diff --git a/src/gallium/drivers/r600/evergreen_compute.c 
 b/src/gallium/drivers/r600/evergreen_compute.c
 index 44831a7..33a5910 100644
 --- a/src/gallium/drivers/r600/evergreen_compute.c
 +++ b/src/gallium/drivers/r600/evergreen_compute.c
 @@ -98,7 +98,7 @@ static void evergreen_cs_set_vertex_buffer(

 /* The vertex instructions in the compute shaders use the texture 
 cache,
  * so we need to invalidate it. */
 -   rctx->flags |= R600_CONTEXT_TEX_FLUSH;
 +   rctx->flags |= R600_CONTEXT_FLUSH;
 state->enabled_mask |= 1 << vb_index;
 state->dirty_mask |= 1 << vb_index;
 state->atom.dirty = true;
 @@ -329,7 +329,7 @@ static void compute_emit_cs(struct r600_context *ctx, 
 const uint *block_layout,
  */
 r600_emit_command_buffer(ctx->cs, &ctx->start_compute_cs_cmd);

 -   ctx->flags |= R600_CONTEXT_CB_FLUSH;
 +   ctx->flags |= R600_CONTEXT_FLUSH;
 r600_flush_emit(ctx);

 /* Emit colorbuffers. */
 @@ -409,7 +409,7 @@ static void compute_emit_cs(struct r600_context *ctx, 
 const uint *block_layout,

 /* XXX evergreen_flush_emit() hardcodes the CP_COHER_SIZE to 
 0x
  */
 -   ctx->flags |= R600_CONTEXT_CB_FLUSH;
 +   ctx->flags |= R600_CONTEXT_FLUSH;
 r600_flush_emit(ctx);

  #if 0
 @@ -468,7 +468,7 @@ void evergreen_emit_cs_shader(
 r600_write_value(cs, r600_context_bo_reloc(rctx, kernel->code_bo,
 RADEON_USAGE_READ));

 -   rctx->flags |= R600_CONTEXT_SHADERCONST_FLUSH;
 +   rctx->flags |= R600_CONTEXT_FLUSH;
  }

  static void evergreen_launch_grid(
 diff --git a/src/gallium/drivers/r600/evergreen_compute_internal.c 
 b/src/gallium/drivers/r600/evergreen_compute_internal.c
 index 7bc7fb4..187bcf1 100644
 --- a/src/gallium/drivers/r600/evergreen_compute_internal.c
 +++ b/src/gallium/drivers/r600/evergreen_compute_internal.c
 @@ -538,7 +538,7 @@ void

Re: [Mesa-dev] Proposal: allow hidden security bugs on Mesa's Bugzilla

2012-11-30 Thread Jerome Glisse
On Fri, Nov 30, 2012 at 7:43 AM, Benoit Jacob bja...@mozilla.com wrote:
 On 12-11-23 02:21 PM, Benoit Jacob wrote:
 On 12-11-21 12:48 PM, Chad Versace wrote:
 On 11/20/2012 09:29 AM, Benoit Jacob wrote:

 Any questions?
 Do you support or oppose me asking FD.o admins to allow hidden bugs on
 Mesa's bugzilla?

 Benoit
 I support this. It seems a sensible proposal for addressing security bugs.

 Thanks. I have just sent the request to FD.o admins.

 Benoit

 This option is now turned on on Bugzilla.

 See the new checkbox: Mesa Security Group

 Thanks!
 Benoit


How does one get into the security group?

Cheers,
Jerome


Re: [Mesa-dev] [PATCH] r600g: fix abysmal performance in Reaction Quake

2012-11-09 Thread Jerome Glisse
On Thu, Nov 01, 2012 at 03:13:31AM +0100, Marek Olšák wrote:
 On Thu, Nov 1, 2012 at 2:13 AM, Alex Deucher alexdeuc...@gmail.com wrote:
  On Wed, Oct 31, 2012 at 8:05 PM, Marek Olšák mar...@gmail.com wrote:
  The problem was we set VRAM|GTT for relocations of STATIC resources.
  Setting just VRAM increases the framerate 4 times on my machine.
 
  I rewrote the switch statement and adjusted the domains for window
  framebuffers too.
 
  Reviewed-by: Alex Deucher alexander.deuc...@amd.com
 
  Stable branches?
 
 Yes, good idea.
 
 Marek

Btw, as a follow-up on this, I did some experiments with ttm and eviction.
Blocking any vram eviction improves average fps (20-30%) and minimum fps
(40-60%), but it diminishes maximum fps (100%). Overall, blocking eviction
just makes the framerate more consistent.

I then tried several heuristics on the eviction process (not evicting a
buffer if it was used in the last 1ms, 10ms, 20ms ...; sorting the lru
differently between buffers used for rendering and auxiliary buffers used by
the kernel; ...). None of those heuristics improved anything. I also removed
the bo wait in the eviction pipeline, but still no improvement. I haven't had
time to look further, but anyway the bottom line is that some benchmarks are
memory tight and constant eviction hurts.

(I used Unigine Heaven and Reaction Quake for the benchmarks.)

Cheers,
Jerome
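
For readers unfamiliar with the "recently used" heuristic mentioned above, here is a minimal sketch of the idea, not the TTM code: walk the LRU from its head and skip any buffer used within a grace period. struct toy_bo, toy_pick_victim and the millisecond timestamps are all hypothetical; the toy keeps list position and last-use time as separate things, so the head is not necessarily the oldest entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical buffer record; real TTM objects look nothing like this. */
struct toy_bo {
    uint64_t last_use_ms;   /* time of the last GPU use */
};

/* Walk the LRU from the head (index 0) and return the index of the first
 * buffer whose last use is at least grace_ms old, or -1 if every buffer
 * was used too recently to evict. */
static int toy_pick_victim(const struct toy_bo *lru, size_t count,
                           uint64_t now_ms, uint64_t grace_ms)
{
    size_t i;

    for (i = 0; i < count; i++) {
        assert(now_ms >= lru[i].last_use_ms);
        if (now_ms - lru[i].last_use_ms >= grace_ms)
            return (int)i;
    }
    return -1;
}
```

A return of -1 means the allocation would have to wait or fail rather than evict, which is one plausible reason such a heuristic does not help when a benchmark is memory tight.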


Re: [Mesa-dev] R600 tiling halves the frame rate

2012-10-31 Thread Jerome Glisse
On Tue, Oct 30, 2012 at 8:49 PM, Tzvetan Mikov tmi...@jupiter.com wrote:
 On 10/30/2012 05:20 PM, Tzvetan Mikov wrote:

 Thanks a lot! I reproduced the same results here and I think I have
 figured out what the problem is. The frame buffer is always created in
 linear mode. The temporary hack included below doubles the performance
 for me with EGL.

 Could you please check if it has the same result for you?

 If it does, what would be the next step to address this? I guess I could
 try to prepare a real patch to fix this, as soon as I figure the right
 way to do it... :-) I am new to Mesa, but I am making my way through the
 code base.

 regards,
 Tzvetan


 commit 10bb3497caba1655022a53a3a04c81be6e122faa
 Author: Tzvetan Mikov tmi...@jupiter.com
 Date:   Tue Oct 30 17:12:42 2012 -0700

  r600_texture.c: HACK to enforce tiling in the default case

 diff --git a/src/gallium/drivers/r600/r600_texture.c
 b/src/gallium/drivers/r600/r600_texture.c
 index 85e4e0c..f415de3 100644
 --- a/src/gallium/drivers/r600/r600_texture.c
 +++ b/src/gallium/drivers/r600/r600_texture.c
 @@ -450,7 +450,7 @@ struct pipe_resource *r600_texture_create(struct
 pipe_screen *screen,
   {
   struct r600_screen *rscreen = (struct r600_screen*)screen;
   struct radeon_surface surface;
 -unsigned array_mode = 0;
 +unsigned array_mode = V_038000_ARRAY_1D_TILED_THIN1;
   int r;

   if (!(templ->flags & R600_RESOURCE_FLAG_TRANSFER)) {



 I just noticed that with this hack the display doesn't look quite right, so
 while it hopefully points in the right direction, the real fix is likely to
 be much more involved. My enthusiasm may have been premature :-)

 regards,
 Tzvetan

For it to look right, we need mesa to call into the kernel to tell the
kernel what the bo tiling format is. We should do that for scanout
buffers. This will fix your issue, and you probably want 2d tiled, not 1d,
for scanout.

Cheers,
Jerome


Re: [Mesa-dev] [PATCH] r600g: fix abysmal performance in Reaction Quake

2012-10-31 Thread Jerome Glisse
On Wed, Oct 31, 2012 at 8:05 PM, Marek Olšák mar...@gmail.com wrote:
 The problem was we set VRAM|GTT for relocations of STATIC resources.
 Setting just VRAM increases the framerate 4 times on my machine.

 I rewrote the switch statement and adjusted the domains for window
 framebuffers too.

Reviewed-by: Jerome Glisse jgli...@redhat.com
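
The placement policy described above can be summarized in a small standalone sketch. The toy_ enum and macros are hypothetical stand-ins for the PIPE_USAGE_* and RADEON_DOMAIN_* names, with illustrative values only; the mapping itself follows the patch below.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for PIPE_USAGE_*; not the Gallium definitions. */
enum toy_usage {
    TOY_USAGE_DEFAULT,
    TOY_USAGE_STATIC,
    TOY_USAGE_IMMUTABLE,
    TOY_USAGE_DYNAMIC,
    TOY_USAGE_STREAM,
    TOY_USAGE_STAGING
};

/* Hypothetical domain bits; the values are illustrative only. */
#define TOY_DOMAIN_GTT  0x2u
#define TOY_DOMAIN_VRAM 0x4u

struct toy_placement {
    uint32_t initial_domain; /* where the buffer is first created */
    uint32_t domains;        /* where relocations allow it to live */
};

static struct toy_placement toy_choose_placement(enum toy_usage usage)
{
    struct toy_placement p;

    switch (usage) {
    case TOY_USAGE_STAGING:
        /* Upload/download helpers stay in GTT. */
        p.initial_domain = TOY_DOMAIN_GTT;
        p.domains = TOY_DOMAIN_GTT;
        break;
    case TOY_USAGE_DYNAMIC:
    case TOY_USAGE_STREAM:
        /* Start in GTT, but let the memory manager move it to VRAM. */
        p.initial_domain = TOY_DOMAIN_GTT;
        p.domains = TOY_DOMAIN_GTT | TOY_DOMAIN_VRAM;
        break;
    default:
        /* DEFAULT, STATIC, IMMUTABLE: VRAM only, GTT deliberately not listed. */
        p.initial_domain = TOY_DOMAIN_VRAM;
        p.domains = TOY_DOMAIN_VRAM;
        break;
    }
    /* The initial domain must be one of the allowed domains. */
    assert((p.domains & p.initial_domain) != 0);
    return p;
}
```

The key change is the last case: listing only VRAM keeps static resources from being placed in GTT.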

 ---
  src/gallium/drivers/r600/r600_buffer.c  |   42 
 ---
  src/gallium/drivers/r600/r600_texture.c |3 ++-
  2 files changed, 24 insertions(+), 21 deletions(-)

 diff --git a/src/gallium/drivers/r600/r600_buffer.c 
 b/src/gallium/drivers/r600/r600_buffer.c
 index f4566ee..116ab51 100644
 --- a/src/gallium/drivers/r600/r600_buffer.c
 +++ b/src/gallium/drivers/r600/r600_buffer.c
 @@ -206,29 +206,31 @@ bool r600_init_resource(struct r600_screen *rscreen,
  {
 uint32_t initial_domain, domains;

 -   /* Staging resources particpate in transfers and blits only
 -* and are used for uploads and downloads from regular
 -* resources.  We generate them internally for some transfers.
 -*/
 -   if (usage == PIPE_USAGE_STAGING) {
 +   switch(usage) {
 +   case PIPE_USAGE_STAGING:
 +   /* Staging resources participate in transfers, i.e. are used
 +* for uploads and downloads from regular resources.
 +* We generate them internally for some transfers.
 +*/
 +   initial_domain = RADEON_DOMAIN_GTT;
 domains = RADEON_DOMAIN_GTT;
 +   break;
 +   case PIPE_USAGE_DYNAMIC:
 +   case PIPE_USAGE_STREAM:
 +   /* Default to GTT, but allow the memory manager to move it to 
 VRAM. */
 initial_domain = RADEON_DOMAIN_GTT;
 -   } else {
 domains = RADEON_DOMAIN_GTT | RADEON_DOMAIN_VRAM;
 -
 -   switch(usage) {
 -   case PIPE_USAGE_DYNAMIC:
 -   case PIPE_USAGE_STREAM:
 -   case PIPE_USAGE_STAGING:
 -   initial_domain = RADEON_DOMAIN_GTT;
 -   break;
 -   case PIPE_USAGE_DEFAULT:
 -   case PIPE_USAGE_STATIC:
 -   case PIPE_USAGE_IMMUTABLE:
 -   default:
 -   initial_domain = RADEON_DOMAIN_VRAM;
 -   break;
 -   }
 +   break;
 +   case PIPE_USAGE_DEFAULT:
 +   case PIPE_USAGE_STATIC:
 +   case PIPE_USAGE_IMMUTABLE:
 +   default:
 +   /* Don't list GTT here, because the memory manager would put 
 some
 +* resources to GTT no matter what the initial domain is.
 +* Not listing GTT in the domains improves performance a lot. 
 */
 +   initial_domain = RADEON_DOMAIN_VRAM;
 +   domains = RADEON_DOMAIN_VRAM;
 +   break;
 }

 res->buf = rscreen->ws->buffer_create(rscreen->ws, size, alignment, 
 bind, initial_domain);
 diff --git a/src/gallium/drivers/r600/r600_texture.c 
 b/src/gallium/drivers/r600/r600_texture.c
 index 785eeff..2df390d 100644
 --- a/src/gallium/drivers/r600/r600_texture.c
 +++ b/src/gallium/drivers/r600/r600_texture.c
 @@ -421,9 +421,10 @@ r600_texture_create_object(struct pipe_screen *screen,
 return NULL;
 }
 } else if (buf) {
 +   /* This is usually the window framebuffer. We want it in 
 VRAM, always. */
 resource->buf = buf;
 resource->cs_buf = rscreen->ws->buffer_get_cs_handle(buf);
 -   resource->domains = RADEON_DOMAIN_GTT | RADEON_DOMAIN_VRAM;
 +   resource->domains = RADEON_DOMAIN_VRAM;
 }

 if (rtex->cmask_size) {
 --
 1.7.9.5



Re: [Mesa-dev] R600 tiling halves the frame rate

2012-10-30 Thread Jerome Glisse
On Tue, Oct 30, 2012 at 10:43 AM, Tzvetan Mikov tmi...@jupiter.com wrote:
 On 10/30/2012 07:12 AM, Patrick Baggett wrote:
 Is your screen refresh rate 70 Hz? Because if so, that means that it's
 syncing to the vblank on Mesa, and not doing so on the proprietary one.

 Unfortunately no. In fact the Gallium EGL/R600 doesn't support flip on
 vsync at all - eglSwapInterval is always 0. The output is a standard
 60Hz LCD, plus I do get different, (but still low in absolute terms)
 frame rates with different chips. Off the top of my head:
 - HD5430 - 120 FPS
 - HD6450 - 140 FPS
 - HD6460 -  70 FPS
 - HD6750 - 400 FPS
 - HD6760 - 240 FPS

 I do think there is something fishy with the page flip though, which I
 am planning to investigate today. It is way too slow - a render loop
 which does nothing but a eglSwapBuffers() (no actual rendering
 whatsoever) runs at only 350 FPS. It should be either 60FPS, or thousands.

 regards,
 Tzvetan


So I tested it: it's something inside egl that leads to this. With the same
program as yours using glut on X11 with 2d tiling enabled, 2d color tiling
has a slight advantage, 140fps vs 137fps (windowed, so there is a blit,
which would account for a huge chunk of the perf diff with fglrx).

However, using egl I got 70fps with color tiling and 74fps without. So
something in egl is slowing things down.

Cheers,
Jerome


Re: [Mesa-dev] R600 tiling halves the frame rate

2012-10-26 Thread Jerome Glisse
On Fri, Oct 26, 2012 at 8:07 PM, Tzvetan Mikov tmi...@jupiter.com wrote:
 Hi,
 I have been running tests with Mesa 9.0 and Radeon R600 (Radeon HD 6460) and I 
 accidentally noticed that a small hack I did to disable texture tiling, 
 actually *doubles* the frame rate. With different chips (e.g. 6750) the 
 difference is less pronounced, but in all cases texture tiling decreased the 
 performance noticeably in my tests.

 Can anyone shed some light on this? Is this by design - e.g. is this a case 
 of "we know that tiling is currently slower than linear but the huge payoff 
 is scheduled to arrive in a future revision"?

 Thanks!
 Tzvetan

No, in all the benchmarks I made on various GPUs from hd2xxx to hd6xxx,
tiling always gave a performance boost of between 5% and 20%.

Cheers,
Jerome


Re: [Mesa-dev] [PATCH] r600g: avoid shader needing too many gpr to lockup the gpu

2012-10-26 Thread Jerome Glisse
On Fri, Oct 26, 2012 at 10:01 PM,  j.gli...@gmail.com wrote:
 From: Jerome Glisse jgli...@redhat.com

 On r6xx/r7xx, shader resource management needs to make sure that the
 shader does not go over the gpr register limit. Each specific
 asic has a maximum number of registers that can be split between shader
 stages. For each stage, the shader must not use more registers than the
 limit programmed.

 Signed-off-by: Jerome Glisse jgli...@redhat.com

I haven't yet fully tested it on a wide range of GPUs, but it fixes the
piglit cases that were locking up, so one can directly use quick-drivers. I
would mostly like feedback on whether we should print a warning when we
discard a draw command because a shader exceeds the limit.

Note that with this patch the tests that were locking up fail, but with
a simple patch on top of that (decreasing clause temp gpr to 2) they
pass.

Regards,
Jerome

 ---
  src/gallium/drivers/r600/r600_pipe.h |  1 +
  src/gallium/drivers/r600/r600_state.c| 60 
 +++-
  src/gallium/drivers/r600/r600_state_common.c | 22 +-
  3 files changed, 55 insertions(+), 28 deletions(-)

 diff --git a/src/gallium/drivers/r600/r600_pipe.h 
 b/src/gallium/drivers/r600/r600_pipe.h
 index ff2a5fd..2045af3 100644
 --- a/src/gallium/drivers/r600/r600_pipe.h
 +++ b/src/gallium/drivers/r600/r600_pipe.h
 @@ -363,6 +363,7 @@ struct r600_context {
 enum chip_class chip_class;
 boolean has_vertex_cache;
 boolean keep_tiling_flags;
 +   bool discard_draw;
 unsigned default_ps_gprs, default_vs_gprs;
 unsigned r6xx_num_clause_temp_gprs;
 unsigned backend_mask;
 diff --git a/src/gallium/drivers/r600/r600_state.c 
 b/src/gallium/drivers/r600/r600_state.c
 index 7d07008..43af934 100644
 --- a/src/gallium/drivers/r600/r600_state.c
 +++ b/src/gallium/drivers/r600/r600_state.c
 @@ -2189,30 +2189,54 @@ void r600_init_state_functions(struct r600_context 
 *rctx)
  /* Adjust GPR allocation on R6xx/R7xx */
  void r600_adjust_gprs(struct r600_context *rctx)
  {
 -   unsigned num_ps_gprs = rctx->default_ps_gprs;
 -   unsigned num_vs_gprs = rctx->default_vs_gprs;
 +   unsigned num_ps_gprs = rctx->ps_shader->current->shader.bc.ngpr;
 +   unsigned num_vs_gprs = rctx->vs_shader->current->shader.bc.ngpr;
 +   unsigned new_num_ps_gprs = num_ps_gprs;
 +   unsigned new_num_vs_gprs = num_vs_gprs;
 +   unsigned cur_num_ps_gprs = 
 G_008C04_NUM_PS_GPRS(rctx->config_state.sq_gpr_resource_mgmt_1);
 +   unsigned cur_num_vs_gprs = 
 G_008C04_NUM_VS_GPRS(rctx->config_state.sq_gpr_resource_mgmt_1);
 +   unsigned def_num_ps_gprs = rctx->default_ps_gprs;
 +   unsigned def_num_vs_gprs = rctx->default_vs_gprs;
 +   unsigned def_num_clause_temp_gprs = rctx->r6xx_num_clause_temp_gprs;
 +   /* hardware will reserve twice num_clause_temp_gprs */
 +   unsigned max_gprs = def_num_ps_gprs + def_num_vs_gprs + 
 def_num_clause_temp_gprs * 2;
 unsigned tmp;
 -   int diff;

 -   if (rctx->ps_shader->current->shader.bc.ngpr > rctx->default_ps_gprs) 
 {
 -   diff = rctx->ps_shader->current->shader.bc.ngpr - 
 rctx->default_ps_gprs;
 -   num_vs_gprs -= diff;
 -   num_ps_gprs += diff;
 +   /* the sum of all SQ_GPR_RESOURCE_MGMT*.NUM_*_GPRS must <= to 
 max_gprs */
 +   if (new_num_ps_gprs > cur_num_ps_gprs || new_num_vs_gprs > 
 cur_num_vs_gprs) {
 +   /* try to use switch back to default */
 +   if (new_num_ps_gprs > def_num_ps_gprs || new_num_vs_gprs > 
 def_num_vs_gprs) {
 +   /* always privilege vs stage so that at worst we have 
 the
 +* pixel stage producing wrong output (not the vertex
 +* stage) */
 +   new_num_ps_gprs = max_gprs - (new_num_vs_gprs + 
 def_num_clause_temp_gprs * 2);
 +   new_num_vs_gprs = num_vs_gprs;
 +   } else {
 +   new_num_ps_gprs = def_num_ps_gprs;
 +   new_num_vs_gprs = def_num_vs_gprs;
 +   }
 +   } else {
 +   rctx->discard_draw = false;
 +   return;
 }

 -   if (rctx->vs_shader->current->shader.bc.ngpr > rctx->default_vs_gprs)
 -   {
 -   diff = rctx->vs_shader->current->shader.bc.ngpr - 
 rctx->default_vs_gprs;
 -   num_ps_gprs -= diff;
 -   num_vs_gprs += diff;
 +   /* SQ_PGM_RESOURCES_*.NUM_GPRS must always be program to a value <=
 +* SQ_GPR_RESOURCE_MGMT*.NUM_*_GPRS otherwise the GPU will lockup
 +* Also if a shader use more gpr than SQ_GPR_RESOURCE_MGMT*.NUM_*_GPRS
 +* it will lockup. So in this case just discard the draw command
 +* and don't change the current gprs repartitions.
 +*/
 +   rctx->discard_draw = false

Re: [Mesa-dev] R600 tiling halves the frame rate

2012-10-26 Thread Jerome Glisse
On Fri, Oct 26, 2012 at 10:26 PM, Tzvetan Mikov tmi...@jupiter.com wrote:
 -Original Message-
 From: Jerome Glisse

  Can anyone shed some light on this? Is this by design - e.g. is
  this a case of "we know that tiling is currently slower than linear
  but the huge payoff is scheduled to arrive in a future revision"?
 
  Thanks!
  Tzvetan

 No, in all the benchmarks I made on various GPUs from hd2xxx to hd6xxx,
 tiling always gave a performance boost of between 5% and 20%.

 This is interesting. All I am doing is rotating a big texture on the
 screen. I am using EGL+Gallium, so it is as simple as it gets.

 The hack I am using to disable texture tiling is also extremely simple
 (see below). It speeds up the FPS measurably, up to the extreme
 case of doubling it on HD6460.

 What am I missing?

 Regards,
 Tzvetan


Could you provide a simple gl demo, or point to one, that shows the same
behavior with your patch? That way I have something to tell whether I am
reproducing it or not.

Cheers,
Jerome


Re: [Mesa-dev] [PATCH 01/14] r600g: remove the atom variable from r600_command_buffer

2012-10-09 Thread Jerome Glisse
On Sun, Oct 07, 2012 at 08:08:03PM +0200, Marek Olšák wrote:
 r600_command_buffer is not an atom.
 
 The atoms have evolved into state slots (or groups of state slots) where
 you can bind states. There is a fixed amount of atoms (state slots)
 in the context.
 
 The command buffers are nothing like that. They represent states, not state
 slots.
 
 We could probably give r600_atom a better name someday.

For the series:
Reviewed-by: Jerome Glisse jgli...@redhat.com
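
To illustrate the distinction drawn above, a command buffer is just a prebuilt run of dwords that gets copied into the CS, with no dirty tracking or slot attached. A minimal standalone sketch follows, assuming toy types and a toy CS size; toy_cs, toy_command_buffer and TOY_MAX_CS_DWORDS are hypothetical stand-ins for the r600g and winsys structures.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX_CS_DWORDS 64u   /* hypothetical CS capacity */

/* Toy command stream: a dword array plus a write cursor. */
struct toy_cs {
    uint32_t buf[TOY_MAX_CS_DWORDS];
    unsigned cdw;               /* dwords written so far */
};

/* Toy command buffer: prebuilt GPU commands that never change. */
struct toy_command_buffer {
    uint32_t *buf;
    unsigned num_dw;            /* dwords stored so far */
    unsigned max_num_dw;        /* capacity of buf */
};

/* Append one dword while building the command buffer. */
static void toy_store_value(struct toy_command_buffer *cb, uint32_t value)
{
    assert(cb->num_dw < cb->max_num_dw);
    cb->buf[cb->num_dw++] = value;
}

/* Copy the whole prebuilt buffer into the CS and advance the cursor. */
static void toy_emit_command_buffer(struct toy_cs *cs,
                                    const struct toy_command_buffer *cb)
{
    assert(cs->cdw + cb->num_dw <= TOY_MAX_CS_DWORDS);
    memcpy(cs->buf + cs->cdw, cb->buf, sizeof(uint32_t) * cb->num_dw);
    cs->cdw += cb->num_dw;
}
```

Because the buffer is immutable once built, emitting it again at the start of each new CS is a single memcpy.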

 ---
  src/gallium/drivers/r600/evergreen_compute.c |4 +--
  src/gallium/drivers/r600/evergreen_state.c   |4 +--
  src/gallium/drivers/r600/r600_hw_context.c   |4 +--
  src/gallium/drivers/r600/r600_pipe.h |   44 
 +++---
  src/gallium/drivers/r600/r600_state.c|2 +-
  src/gallium/drivers/r600/r600_state_common.c |   13 +---
  6 files changed, 34 insertions(+), 37 deletions(-)
 
 diff --git a/src/gallium/drivers/r600/evergreen_compute.c 
 b/src/gallium/drivers/r600/evergreen_compute.c
 index b7c7345..abd5b3c 100644
 --- a/src/gallium/drivers/r600/evergreen_compute.c
 +++ b/src/gallium/drivers/r600/evergreen_compute.c
 @@ -329,7 +329,7 @@ static void compute_emit_cs(struct r600_context *ctx, 
 const uint *block_layout,
* See evergreen_init_atom_start_compute_cs() in this file for the list
* of registers initialized by the start_compute_cs_cmd atom.
*/
 - r600_emit_atom(ctx, &ctx->start_compute_cs_cmd.atom);
 + r600_emit_command_buffer(ctx->cs, &ctx->start_compute_cs_cmd);
  
   ctx->flags |= R600_CONTEXT_CB_FLUSH;
   r600_flush_emit(ctx);
 @@ -625,7 +625,7 @@ void evergreen_init_atom_start_compute_cs(struct 
 r600_context *ctx)
   /* since all required registers are initialised in the
* start_compute_cs_cmd atom, we can EMIT_EARLY here.
*/
 - r600_init_command_buffer(ctx, cb, 1, 256);
 + r600_init_command_buffer(cb, 256);
   cb->pkt_flags = RADEON_CP_PACKET3_COMPUTE_MODE;
  
   switch (ctx->family) {
 diff --git a/src/gallium/drivers/r600/evergreen_state.c 
 b/src/gallium/drivers/r600/evergreen_state.c
 index e35314f..a073021 100644
 --- a/src/gallium/drivers/r600/evergreen_state.c
 +++ b/src/gallium/drivers/r600/evergreen_state.c
 @@ -2373,7 +2373,7 @@ static void cayman_init_atom_start_cs(struct 
 r600_context *rctx)
  {
   struct r600_command_buffer *cb = &rctx->start_cs_cmd;
  
 - r600_init_command_buffer(rctx, cb, 0, 256);
 + r600_init_command_buffer(cb, 256);
  
   /* This must be first. */
   r600_store_value(cb, PKT3(PKT3_CONTEXT_CONTROL, 1, 0));
 @@ -2774,7 +2774,7 @@ void evergreen_init_atom_start_cs(struct r600_context 
 *rctx)
   return;
   }
  
 - r600_init_command_buffer(rctx, cb, 0, 256);
 + r600_init_command_buffer(cb, 256);
  
   /* This must be first. */
   r600_store_value(cb, PKT3(PKT3_CONTEXT_CONTROL, 1, 0));
 diff --git a/src/gallium/drivers/r600/r600_hw_context.c 
 b/src/gallium/drivers/r600/r600_hw_context.c
 index 8245059..723039a 100644
 --- a/src/gallium/drivers/r600/r600_hw_context.c
 +++ b/src/gallium/drivers/r600/r600_hw_context.c
 @@ -815,7 +815,7 @@ void r600_context_flush(struct r600_context *ctx, 
 unsigned flags)
  {
   struct radeon_winsys_cs *cs = ctx->cs;
  
 - if (cs->cdw == ctx->start_cs_cmd.atom.num_dw)
 + if (cs->cdw == ctx->start_cs_cmd.num_dw)
   return;
  
   ctx->timer_queries_suspended = false;
 @@ -875,7 +875,7 @@ void r600_begin_new_cs(struct r600_context *ctx)
   ctx->flags = 0;
  
   /* Begin a new CS. */
 - r600_emit_atom(ctx, &ctx->start_cs_cmd.atom);
 + r600_emit_command_buffer(ctx->cs, &ctx->start_cs_cmd);
  
   /* Re-emit states. */
   r600_atom_dirty(ctx, &ctx->alphatest_state.atom);
 diff --git a/src/gallium/drivers/r600/r600_pipe.h 
 b/src/gallium/drivers/r600/r600_pipe.h
 index 607116f..be7b891 100644
 --- a/src/gallium/drivers/r600/r600_pipe.h
 +++ b/src/gallium/drivers/r600/r600_pipe.h
 @@ -59,8 +59,8 @@ struct r600_atom {
  /* This is an atom containing GPU commands that never change.
   * This is supposed to be copied directly into the CS. */
  struct r600_command_buffer {
 - struct r600_atom atom;
   uint32_t *buf;
 + unsigned num_dw;
   unsigned max_num_dw;
   unsigned pkt_flags;
  };
 @@ -504,6 +504,14 @@ struct r600_context {
   int last_start_instance;
  };
  
 +static INLINE void r600_emit_command_buffer(struct radeon_winsys_cs *cs,
 + struct r600_command_buffer *cb)
 +{
 + assert(cs->cdw + cb->num_dw <= RADEON_MAX_CMDBUF_DWORDS);
 + memcpy(cs->buf + cs->cdw, cb->buf, 4 * cb->num_dw);
 + cs->cdw += cb->num_dw;
 +}
 +
  static INLINE void r600_emit_atom(struct r600_context *rctx, struct 
 r600_atom *atom)
  {
   atom->emit(rctx, atom);
 @@ -696,15 +704,15 @@ unsigned r600_tex_compare(unsigned compare);
  
  static INLINE void r600_store_value(struct r600_command_buffer *cb, unsigned

Re: [Mesa-dev] [PATCH] r600g: add in-place DB decompression and texturing with DB tiling

2012-10-04 Thread Jerome Glisse
On Wed, Oct 3, 2012 at 5:50 PM, Marek Olšák mar...@gmail.com wrote:
 The decompression is done in-place and only the compressed tiles are
 decompressed. Note: R6xx-R7xx can do that only with Z16 and Z32F.

 The texture unit is programmed to use non-displayable tiling and depth
 ordering of samples, so that it can fetch the texture in the native DB format.

 The latest version of the libdrm surface allocator is required for stencil
 texturing to work. The old one didn't create the mipmap tree correctly.
 We need a separate mipmap tree for stencil, because the stencil mipmap
 offsets are not really depth offsets/4.

 The DB-CB copy is still used for transfers.
 ---

 I sent the libdrm patches a few minutes ago. I guess I will have to make 
 another libdrm release.

 What's good about this is that it improves performance by 4-5% with the 
 1024x768 resolution in Lightsmark on Evergreen. However, the larger the 
 resolution, the smaller the improvement is (something else becomes the 
 bottleneck). It also reduces the memory requirements for depth textures by 
 50%, because the flushed depth texture isn't needed anymore.

 The catch is fetching the 4th stencil mipmap level gives wrong pixels in one 
 not-yet-committed test. What's weird is that all the other mipmaps (both 
 smaller and larger) are fetched correctly. That bug has yet to be fixed, but 
 who is using a stencil buffer with mipmaps anyway? :)

This 4th level might be the usual switching point between 2D tiled and 1D
tiled, i.e. we think the hw is still using 2D while it has switched to 1D
(or the other way around).
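
A rough sketch of that idea: if a 2D tiled surface falls back to 1D tiling once
a mip level becomes smaller than one macro tile, the switch level can be
computed from the base size. The fallback rule, the macro tile dimensions and
the function names below are all assumptions for illustration; the real rule
lives in the surface allocator and in the hw, and the two may well disagree,
which is exactly the kind of mismatch suspected here.

```c
#include <assert.h>

/* Size of mip level `level` for a given base dimension, clamped to 1. */
static unsigned sketch_minify(unsigned base, unsigned level)
{
    unsigned v = base >> level;
    return v ? v : 1;
}

/* Return the first level whose width or height is smaller than the
 * (assumed) macro tile size, i.e. the level where a 2D tiled surface would
 * fall back to 1D tiling under this hypothetical rule.
 * Returns num_levels if no level is that small. */
static unsigned sketch_first_1d_level(unsigned width, unsigned height,
                                      unsigned num_levels,
                                      unsigned macro_w, unsigned macro_h)
{
    for (unsigned l = 0; l < num_levels; l++) {
        if (sketch_minify(width, l) < macro_w ||
            sketch_minify(height, l) < macro_h)
            return l;
    }
    return num_levels;
}
```

If the driver and the hw compute this level with different macro tile sizes,
exactly one level would be fetched with the wrong tiling mode while the
neighbouring levels look fine.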

Otherwise reviewed

Cheers,
Jerome


  src/gallium/auxiliary/util/u_blitter.c |3 +-
  .../drivers/r600/evergreen_compute_internal.c  |6 +-
  src/gallium/drivers/r600/evergreen_state.c |   92 
 +++-
  src/gallium/drivers/r600/evergreend.h  |   10 ++-
  src/gallium/drivers/r600/r600_blit.c   |   89 ---
  src/gallium/drivers/r600/r600_pipe.h   |1 +
  src/gallium/drivers/r600/r600_resource.h   |   10 ++-
  src/gallium/drivers/r600/r600_state.c  |   13 +--
  src/gallium/drivers/r600/r600_texture.c|   60 -
  9 files changed, 216 insertions(+), 68 deletions(-)

 diff --git a/src/gallium/auxiliary/util/u_blitter.c 
 b/src/gallium/auxiliary/util/u_blitter.c
 index 4ad7a6b..86109f0 100644
 --- a/src/gallium/auxiliary/util/u_blitter.c
 +++ b/src/gallium/auxiliary/util/u_blitter.c
 @@ -1602,7 +1602,8 @@ void util_blitter_custom_depth_stencil(struct 
 blitter_context *blitter,
 blitter_disable_render_cond(ctx);

 /* bind states */
 -   pipe->bind_blend_state(pipe, ctx->blend[PIPE_MASK_RGBA]);
 +   pipe->bind_blend_state(pipe, cbsurf ? ctx->blend[PIPE_MASK_RGBA] :
 + ctx->blend[0]);
 pipe->bind_depth_stencil_alpha_state(pipe, dsa_stage);
 ctx->bind_fs_state(pipe, blitter_get_fs_col(ctx, 0, FALSE));
 pipe->bind_vertex_elements_state(pipe, ctx->velem_state);
 diff --git a/src/gallium/drivers/r600/evergreen_compute_internal.c 
 b/src/gallium/drivers/r600/evergreen_compute_internal.c
 index 496d099..b937135 100644
 --- a/src/gallium/drivers/r600/evergreen_compute_internal.c
 +++ b/src/gallium/drivers/r600/evergreen_compute_internal.c
 @@ -480,7 +480,7 @@ void evergreen_set_tex_resource(

 unsigned format, endian;
 uint32_t word4 = 0, yuv_format = 0, pitch = 0;
 -   unsigned char swizzle[4], array_mode = 0, tile_type = 0;
 +   unsigned char swizzle[4], array_mode = 0, non_disp_tiling = 0;
 unsigned height, depth;

 swizzle[0] = 0;
 @@ -503,7 +503,7 @@ void evergreen_set_tex_resource(
 pitch = align(tmp->surface.level[0].nblk_x *
 util_format_get_blockwidth(tmp->resource.b.b.format), 8);
 array_mode = tmp->array_mode[0];
 -   tile_type = tmp->tile_type;
 +   non_disp_tiling = tmp->non_disp_tiling;

 assert(view->base.texture->target != PIPE_TEXTURE_1D_ARRAY);
 assert(view->base.texture->target != PIPE_TEXTURE_2D_ARRAY);
 @@ -513,7 +513,7 @@ void evergreen_set_tex_resource(
 evergreen_emit_raw_value(res,
 
 (S_03_DIM(r600_tex_dim(view-base.texture-target)) |
 S_03_PITCH((pitch / 8) - 1) |
 -   S_03_NON_DISP_TILING_ORDER(tile_type) |
 +   
 S_03_NON_DISP_TILING_ORDER(non_disp_tiling) |
 S_03_TEX_WIDTH(view-base.texture-width0 
 - 1)));
 evergreen_emit_raw_value(res, (S_030004_TEX_HEIGHT(height - 1) |
 S_030004_TEX_DEPTH(depth - 1) |
 diff --git a/src/gallium/drivers/r600/evergreen_state.c 
 b/src/gallium/drivers/r600/evergreen_state.c
 index c126e7d..5a14934 100644
 --- a/src/gallium/drivers/r600/evergreen_state.c
 +++ b/src/gallium/drivers/r600/evergreen_state.c
 @@ 

Re: [Mesa-dev] [PATCH 19/19] r600g: convert the remnants of VGT state into immediate register writes

2012-09-13 Thread Jerome Glisse
On Wed, Sep 12, 2012 at 5:24 PM, Jerome Glisse j.gli...@gmail.com wrote:
 On Tue, Sep 11, 2012 at 2:29 PM, Marek Olšák mar...@gmail.com wrote:
 On Tue, Sep 11, 2012 at 7:41 PM, Jerome Glisse j.gli...@gmail.com wrote:
 On Tue, Sep 11, 2012 at 1:10 PM, Marek Olšák mar...@gmail.com wrote:
 Please provide information about the GPU and the test which locks up. I'd
 like to reproduce it. Also please explain what's the cause of the
 lockup if you know it (which registers are not emitted in the correct
 order and how it can be fixed).

 Marek


 For instance
 http://people.freedesktop.org/~glisse/registerposition/lockup-longprim.sh

 will lockup probably any r6xx/r7xx (definitely rv670  rv770)

 I know that the whole vgt register order is picky and that most of
 them need to be emitted before ta_cntl_aux and before cb/db. But the
 ordering relative to pa is kind of weird and moving when looking at
 fglrx.

 I tested RS880, which is very similar to RV670, and it didn't hang. I
 can test RV670 later and if there's any issue, I'll fix it. I'd like
 this patch to be fixed instead of dropped, that's why I'm asking and I
 still haven't got a definitive answer how to change the patch, so that
 it can be pushed. Besides that...

 Has it ever occurred to you that the register ordering is changing in
 fglrx, because the ordering doesn't matter at all, just like Alex
 said, and the closed driver devs wrote it that way because they didn't
 care about the ordering either?

 I think the lockups you are seeing on r600-r700 are actually caused by
 something entirely different and it confuses you. See this thread from
 the comment #9 onwards:
 https://bugs.freedesktop.org/show_bug.cgi?id=50655#c9

 Marek

 This modified version is fine (rv670,rv770, caicos)
 http://people.freedesktop.org/~glisse/0001-r600g-convert-the-remnants-of-VGT-state-into-immedia.patch

 Cheers,
 Jerome

This one also works

http://people.freedesktop.org/~glisse/0001-r600g-convert-the-remnants-of-VGT-state-into-immedia.patch

Cheers,
Jerome
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/2] r600g: add htile support v9

2012-09-12 Thread Jerome Glisse
On Tue, Jul 17, 2012 at 1:58 PM,  j.gli...@gmail.com wrote:
 From: Jerome Glisse jgli...@redhat.com

 htile is used for HiZ and HiS support and fast Z/S clears.
 This commit just adds the htile setup and Fast Z clear.
 We don't take full advantage of HiS with that patch.

 v2 Really use fast clear; still a random issue with some tiles, need to
    try more flush combinations. Fix depth/stencil texture decompression.
 v3 Fix random issue on r6xx/r7xx.
 v4 Rebase on top of latest mesa, disable CB export when clearing
    htile surface to avoid wasting bandwidth.
 v5 Resummarize htile surface when uploading z value. Fix z/stencil
    decompression; the custom blitter with custom dsa is no longer
    needed.
 v6 Reorganize render control/override update mechanism, fixing more
    issues in the process.
 v7 Add nop after depth surface base update to work around some htile
    flushing issue. Force htile to 8x8 on r6xx/r7xx as other combinations
    have issues. Do not enable hyperz when flushing/uncompressing
    depth buffer.
 v8 Fix htile surface, preload and prefetch setup. Only set preload
    and prefetch on htile surface clear like fglrx. Record depth
    clear value per level. Support several levels for the htile
    surface. First depth clear can't be a fast clear.
 v9 Fix comments, properly account for new registers in emit function,
    disable fast zclear if clearing different layers of a texture
    array to different values.

 Signed-off-by: Pierre-Eric Pelloux-Prayer pell...@gmail.com
 Signed-off-by: Alex Deucher alexander.deuc...@amd.com
 Signed-off-by: Jerome Glisse jgli...@redhat.com

Btw, the v11 version against newer mesa is at:
http://people.freedesktop.org/~glisse/0001-r600g-add-htile-support-v11.patch

Cheers,
Jerome

 ---
  src/gallium/drivers/r600/evergreen_hw_context.c |6 +
  src/gallium/drivers/r600/evergreen_state.c  |  102 -
  src/gallium/drivers/r600/evergreend.h   |4 +
  src/gallium/drivers/r600/r600_blit.c|   38 +++
  src/gallium/drivers/r600/r600_hw_context.c  |   25 +
  src/gallium/drivers/r600/r600_pipe.c|8 ++
  src/gallium/drivers/r600/r600_pipe.h|   13 ++-
  src/gallium/drivers/r600/r600_resource.h|7 ++
  src/gallium/drivers/r600/r600_state.c   |  133 
 ---
  src/gallium/drivers/r600/r600_texture.c |  103 ++
  src/gallium/drivers/r600/r600d.h|6 +
  11 files changed, 399 insertions(+), 46 deletions(-)

 diff --git a/src/gallium/drivers/r600/evergreen_hw_context.c 
 b/src/gallium/drivers/r600/evergreen_hw_context.c
 index 081701f..546c884 100644
 --- a/src/gallium/drivers/r600/evergreen_hw_context.c
 +++ b/src/gallium/drivers/r600/evergreen_hw_context.c
 @@ -62,6 +62,9 @@ static const struct r600_reg evergreen_context_reg_list[] = 
 {
 {GROUP_FORCE_NEW_BLOCK, 0, 0},
 {R_028058_DB_DEPTH_SIZE, 0, 0},
 {R_02805C_DB_DEPTH_SLICE, 0, 0},
 +   {R_02802C_DB_DEPTH_CLEAR, 0, 0},
 +   {R_028ABC_DB_HTILE_SURFACE, 0, 0},
 +   {R_028AC8_DB_PRELOAD_CONTROL, 0, 0},
 {R_028204_PA_SC_WINDOW_SCISSOR_TL, 0, 0},
 {R_028208_PA_SC_WINDOW_SCISSOR_BR, 0, 0},
 {R_028234_PA_SU_HARDWARE_SCREEN_OFFSET, 0, 0},
 @@ -319,6 +322,9 @@ static const struct r600_reg cayman_context_reg_list[] = {
 {GROUP_FORCE_NEW_BLOCK, 0, 0},
 {R_028058_DB_DEPTH_SIZE, 0, 0},
 {R_02805C_DB_DEPTH_SLICE, 0, 0},
 +   {R_02802C_DB_DEPTH_CLEAR, 0, 0},
 +   {R_028ABC_DB_HTILE_SURFACE, 0, 0},
 +   {R_028AC8_DB_PRELOAD_CONTROL, 0, 0},
 {R_028204_PA_SC_WINDOW_SCISSOR_TL, 0, 0},
 {R_028208_PA_SC_WINDOW_SCISSOR_BR, 0, 0},
 {R_028234_PA_SU_HARDWARE_SCREEN_OFFSET, 0, 0},
 diff --git a/src/gallium/drivers/r600/evergreen_state.c 
 b/src/gallium/drivers/r600/evergreen_state.c
 index a66387b..214d76b 100644
 --- a/src/gallium/drivers/r600/evergreen_state.c
 +++ b/src/gallium/drivers/r600/evergreen_state.c
 @@ -710,13 +710,15 @@ static void *evergreen_create_blend_state(struct 
 pipe_context *ctx,
 }
 blend->cb_target_mask = target_mask;

 -   if (target_mask)
 +   if (target_mask) {
 color_control |= S_028808_MODE(V_028808_CB_NORMAL);
 -   else
 +   } else {
 color_control |= S_028808_MODE(V_028808_CB_DISABLE);
 +   }

 r600_pipe_state_add_reg(rstate, R_028808_CB_COLOR_CONTROL,
 color_control);
 +
 /* only have dual source on MRT0 */
 blend-dual_src_blend = util_blend_state_is_dual(state, 0);
 for (int i = 0; i < 8; i++) {
 @@ -1668,6 +1670,26 @@ static void evergreen_db(struct r600_context *rctx, 
 struct r600_pipe_state *rsta
 }
 }

 +   /* hyperz */
 +   if (rtex->hyperz) {
 +   uint64_t htile_offset = 
 rtex->hyperz->surface.level[level].offset;
 +
 +   rctx->db_misc_state.hyperz = true

Re: [Mesa-dev] [PATCH 00/19] r600g refactoring and cleanups

2012-09-11 Thread Jerome Glisse
On Mon, Sep 10, 2012 at 7:16 PM, Marek Olšák mar...@gmail.com wrote:
 Nothing too exciting. Besides cleanups, there are fine-grained sampler state 
 updates (it emits only the samplers which changed), support for geometry 
 shader resources (because it was easy; I am not working on GS right now), 
 atomization of some states, some fixes and a major cleanup in r600_draw_vbo.
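
For readers unfamiliar with the "emit only the samplers which changed" idea,
it can be sketched with a dirty bitmask: setting a sampler marks its slot
dirty only if the value really changed, and emission walks the mask and then
clears it. The names (sketch_sampler_state, sketch_set_sampler,
sketch_emit_dirty) and the single-word sampler state are assumptions for
illustration and are not taken from the patches.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NUM_SAMPLERS 16

/* Hypothetical per-shader sampler bookkeeping. */
struct sketch_sampler_state {
    uint32_t values[SKETCH_NUM_SAMPLERS]; /* one packed state word per slot */
    uint32_t dirty_mask;                  /* bit i set: slot i needs re-emit */
};

/* Bind a sampler value; the slot becomes dirty only if the value changed. */
static void sketch_set_sampler(struct sketch_sampler_state *s,
                               unsigned slot, uint32_t value)
{
    if (s->values[slot] == value)
        return;
    s->values[slot] = value;
    s->dirty_mask |= 1u << slot;
}

/* "Emit" the dirty slots: record their indices in out[] (which must hold
 * SKETCH_NUM_SAMPLERS entries), clear the mask, return how many were emitted. */
static unsigned sketch_emit_dirty(struct sketch_sampler_state *s, unsigned *out)
{
    unsigned n = 0;
    for (unsigned i = 0; i < SKETCH_NUM_SAMPLERS; i++) {
        if (s->dirty_mask & (1u << i))
            out[n++] = i;
    }
    s->dirty_mask = 0;
    return n;
}
```

Rebinding an unchanged sampler then costs no CS space at all, which is where
the saving comes from.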

 Tested on RS880 and REDWOOD.

 Please review.

For the first 18 patches:
Reviewed-by: Jerome Glisse jgli...@redhat.com

NAK for patch 19, see my other reply.


 Marek Olšák (19):
   r600g: consolidate initialization of common state functions
   r600g: cleanup state function names
   r600g: put constant buffer state into an array indexed by shader type
   r600g: consolidate set_sampler_views functions
   r600g: consolidate set_viewport_state functions
   r600g: do fine-grained sampler state updates
   r600g: put sampler states and views into an array indexed by shader type
   r600g: add support for geometry shader samplers and constant buffers
   r600g: initialize the first CS just like any other CS
   r600g: remove unused state ID definitions
   r600g: atomize stencil ref state
   r600g: atomize viewport state
   r600g: atomize blend color
   r600g: atomize clip state
   r600g: fix the number of CS dwords of cb_misc_state
   r600g: fix computing how much space is needed for a draw command
   r600g: add clip_misc_state for clip registers emitted in draw_vbo
   r600g: emit the primitive type and associated regs only if the type is 
 changed
   r600g: convert the remnants of VGT state into immediate register writes

  src/gallium/drivers/r600/evergreen_hw_context.c |  108 +
  src/gallium/drivers/r600/evergreen_state.c  |  191 +++-
  src/gallium/drivers/r600/evergreend.h   |2 +
  src/gallium/drivers/r600/r600.h |8 +-
  src/gallium/drivers/r600/r600_blit.c|   16 +-
  src/gallium/drivers/r600/r600_buffer.c  |   31 +-
  src/gallium/drivers/r600/r600_hw_context.c  |  133 +++---
  src/gallium/drivers/r600/r600_hw_context_priv.h |3 +-
  src/gallium/drivers/r600/r600_pipe.c|6 +-
  src/gallium/drivers/r600/r600_pipe.h|  169 
  src/gallium/drivers/r600/r600_shader.c  |3 +-
  src/gallium/drivers/r600/r600_shader.h  |1 -
  src/gallium/drivers/r600/r600_state.c   |  211 +++--
  src/gallium/drivers/r600/r600_state_common.c|  526 
 ++-
  src/gallium/drivers/r600/r600d.h|2 +
  15 files changed, 615 insertions(+), 795 deletions(-)

 Marek


Re: [Mesa-dev] [PATCH 19/19] r600g: convert the remnants of VGT state into immediate register writes

2012-09-11 Thread Jerome Glisse
On Mon, Sep 10, 2012 at 7:16 PM, Marek Olšák mar...@gmail.com wrote:

NAK, this one introduces lockups. As I said in another email, register
grouping/order matters, and with this patch I get a 100% lockup rate in some
test cases, for instance the test case I referenced in my other email.

 ---
  src/gallium/drivers/r600/evergreen_hw_context.c |   16 ---
  src/gallium/drivers/r600/r600.h |7 -
  src/gallium/drivers/r600/r600_hw_context.c  |   15 ++
  src/gallium/drivers/r600/r600_hw_context_priv.h |2 +-
  src/gallium/drivers/r600/r600_pipe.h|8 +++---
  src/gallium/drivers/r600/r600_state_common.c|   34 
 ---
  6 files changed, 26 insertions(+), 56 deletions(-)

 diff --git a/src/gallium/drivers/r600/evergreen_hw_context.c 
 b/src/gallium/drivers/r600/evergreen_hw_context.c
 index 483021f..0c2159a 100644
 --- a/src/gallium/drivers/r600/evergreen_hw_context.c
 +++ b/src/gallium/drivers/r600/evergreen_hw_context.c
 @@ -32,10 +32,6 @@ static const struct r600_reg cayman_config_reg_list[] = {
 {R_00913C_SPI_CONFIG_CNTL_1, REG_FLAG_ENABLE_ALWAYS | 
 REG_FLAG_FLUSH_CHANGE, 0},
  };

 -static const struct r600_reg evergreen_ctl_const_list[] = {
 -   {R_03CFF4_SQ_VTX_START_INST_LOC, 0, 0},
 -};
 -
  static const struct r600_reg evergreen_context_reg_list[] = {
 {R_028008_DB_DEPTH_VIEW, 0, 0},
 {R_028010_DB_RENDER_OVERRIDE2, 0, 0},
 @@ -63,10 +59,6 @@ static const struct r600_reg evergreen_context_reg_list[] 
 = {
 {R_028254_PA_SC_VPORT_SCISSOR_0_BR, 0, 0},
 {R_028350_SX_MISC, 0, 0},
 {GROUP_FORCE_NEW_BLOCK, 0, 0},
 -   {R_028408_VGT_INDX_OFFSET, 0, 0},
 -   {R_02840C_VGT_MULTI_PRIM_IB_RESET_INDX, 0, 0},
 -   {R_028A94_VGT_MULTI_PRIM_IB_RESET_EN, 0, 0},
 -   {GROUP_FORCE_NEW_BLOCK, 0, 0},
 {R_02861C_SPI_VS_OUT_ID_0, 0, 0},
 {R_028620_SPI_VS_OUT_ID_1, 0, 0},
 {R_028624_SPI_VS_OUT_ID_2, 0, 0},
 @@ -353,10 +345,6 @@ static const struct r600_reg cayman_context_reg_list[] = 
 {
 {R_028254_PA_SC_VPORT_SCISSOR_0_BR, 0, 0},
 {R_028350_SX_MISC, 0, 0},
 {GROUP_FORCE_NEW_BLOCK, 0, 0},
 -   {R_028408_VGT_INDX_OFFSET, 0, 0},
 -   {R_02840C_VGT_MULTI_PRIM_IB_RESET_INDX, 0, 0},
 -   {R_028A94_VGT_MULTI_PRIM_IB_RESET_EN, 0, 0},
 -   {GROUP_FORCE_NEW_BLOCK, 0, 0},
 {R_02861C_SPI_VS_OUT_ID_0, 0, 0},
 {R_028620_SPI_VS_OUT_ID_1, 0, 0},
 {R_028624_SPI_VS_OUT_ID_2, 0, 0},
 @@ -664,10 +652,6 @@ int evergreen_context_init(struct r600_context *ctx)

 Elements(evergreen_context_reg_list), PKT3_SET_CONTEXT_REG, 
 EVERGREEN_CONTEXT_REG_OFFSET);
 if (r)
 goto out_err;
 -   r = r600_context_add_block(ctx, evergreen_ctl_const_list,
 -  Elements(evergreen_ctl_const_list), 
 PKT3_SET_CTL_CONST, EVERGREEN_CTL_CONST_OFFSET);
 -   if (r)
 -   goto out_err;

 /* PS loop const */
 evergreen_loop_const_init(ctx, 0);
 diff --git a/src/gallium/drivers/r600/r600.h b/src/gallium/drivers/r600/r600.h
 index 6363a03..83d21a4 100644
 --- a/src/gallium/drivers/r600/r600.h
 +++ b/src/gallium/drivers/r600/r600.h
 @@ -228,11 +228,4 @@ void _r600_pipe_state_add_reg(struct r600_context *ctx,
  #define r600_pipe_state_add_reg_bo(state, offset, value, bo, usage) 
 _r600_pipe_state_add_reg_bo(rctx, state, offset, value, CTX_RANGE_ID(offset), 
 CTX_BLOCK_ID(offset), bo, usage)
  #define r600_pipe_state_add_reg(state, offset, value) 
 _r600_pipe_state_add_reg(rctx, state, offset, value, CTX_RANGE_ID(offset), 
 CTX_BLOCK_ID(offset))

 -static inline void r600_pipe_state_mod_reg(struct r600_pipe_state *state,
 -  uint32_t value)
 -{
 -   state->regs[state->nregs].value = value;
 -   state->nregs++;
 -}
 -
  #endif
 diff --git a/src/gallium/drivers/r600/r600_hw_context.c 
 b/src/gallium/drivers/r600/r600_hw_context.c
 index 57dcc7e..122f878 100644
 --- a/src/gallium/drivers/r600/r600_hw_context.c
 +++ b/src/gallium/drivers/r600/r600_hw_context.c
 @@ -233,10 +233,6 @@ static const struct r600_reg r600_config_reg_list[] = {
 {R_008C04_SQ_GPR_RESOURCE_MGMT_1, REG_FLAG_ENABLE_ALWAYS | 
 REG_FLAG_FLUSH_CHANGE, 0},
  };

 -static const struct r600_reg r600_ctl_const_list[] = {
 -   {R_03CFF4_SQ_VTX_START_INST_LOC, 0, 0},
 -};
 -
  static const struct r600_reg r600_context_reg_list[] = {
 {R_028A4C_PA_SC_MODE_CNTL, 0, 0},
 {GROUP_FORCE_NEW_BLOCK, 0, 0},
 @@ -461,9 +457,6 @@ static const struct r600_reg r600_context_reg_list[] = {
 {GROUP_FORCE_NEW_BLOCK, 0, 0},
 {R_028850_SQ_PGM_RESOURCES_PS, 0, 0},
 {R_028854_SQ_PGM_EXPORTS_PS, 0, 0},
 -   {R_028408_VGT_INDX_OFFSET, 0, 0},
 -   {R_02840C_VGT_MULTI_PRIM_IB_RESET_INDX, 0, 0},
 -   {R_028A94_VGT_MULTI_PRIM_IB_RESET_EN, 0, 0},
 {R_028C1C_PA_SC_AA_SAMPLE_LOCS_MCTX, 

Re: [Mesa-dev] [PATCH 19/19] r600g: convert the remnants of VGT state into immediate register writes

2012-09-11 Thread Jerome Glisse
On Tue, Sep 11, 2012 at 1:10 PM, Marek Olšák mar...@gmail.com wrote:
 Please provide information about the GPU and the test which locks up. I'd
 like to reproduce it. Also please explain what's the cause of the
 lockup if you know it (which registers are not emitted in the correct
 order and how it can be fixed).

 Marek


For instance
http://people.freedesktop.org/~glisse/registerposition/lockup-longprim.sh

will probably lock up any r6xx/r7xx (definitely rv670 and rv770)

I know that the whole vgt register order is picky and that most of
them need to be emitted before ta_cntl_aux and before cb/db. But the
ordering relative to pa is kind of weird and keeps changing when looking at
fglrx.

Cheers,
Jerome

 On Tue, Sep 11, 2012 at 6:48 PM, Jerome Glisse j.gli...@gmail.com wrote:
 On Mon, Sep 10, 2012 at 7:16 PM, Marek Olšák mar...@gmail.com wrote:

 NAK this one introduce lockup. As i said in another email register
 group/order matter and with this patch i get 100% lockup rate in some
 test case for instance the test case i reference in my other email

 ---
  src/gallium/drivers/r600/evergreen_hw_context.c |   16 ---
  src/gallium/drivers/r600/r600.h |7 -
  src/gallium/drivers/r600/r600_hw_context.c  |   15 ++
  src/gallium/drivers/r600/r600_hw_context_priv.h |2 +-
  src/gallium/drivers/r600/r600_pipe.h|8 +++---
  src/gallium/drivers/r600/r600_state_common.c|   34 
 ---
  6 files changed, 26 insertions(+), 56 deletions(-)

 diff --git a/src/gallium/drivers/r600/evergreen_hw_context.c 
 b/src/gallium/drivers/r600/evergreen_hw_context.c
 index 483021f..0c2159a 100644
 --- a/src/gallium/drivers/r600/evergreen_hw_context.c
 +++ b/src/gallium/drivers/r600/evergreen_hw_context.c
 @@ -32,10 +32,6 @@ static const struct r600_reg cayman_config_reg_list[] = {
 {R_00913C_SPI_CONFIG_CNTL_1, REG_FLAG_ENABLE_ALWAYS | 
 REG_FLAG_FLUSH_CHANGE, 0},
  };

 -static const struct r600_reg evergreen_ctl_const_list[] = {
 -   {R_03CFF4_SQ_VTX_START_INST_LOC, 0, 0},
 -};
 -
  static const struct r600_reg evergreen_context_reg_list[] = {
 {R_028008_DB_DEPTH_VIEW, 0, 0},
 {R_028010_DB_RENDER_OVERRIDE2, 0, 0},
 @@ -63,10 +59,6 @@ static const struct r600_reg 
 evergreen_context_reg_list[] = {
 {R_028254_PA_SC_VPORT_SCISSOR_0_BR, 0, 0},
 {R_028350_SX_MISC, 0, 0},
 {GROUP_FORCE_NEW_BLOCK, 0, 0},
 -   {R_028408_VGT_INDX_OFFSET, 0, 0},
 -   {R_02840C_VGT_MULTI_PRIM_IB_RESET_INDX, 0, 0},
 -   {R_028A94_VGT_MULTI_PRIM_IB_RESET_EN, 0, 0},
 -   {GROUP_FORCE_NEW_BLOCK, 0, 0},
 {R_02861C_SPI_VS_OUT_ID_0, 0, 0},
 {R_028620_SPI_VS_OUT_ID_1, 0, 0},
 {R_028624_SPI_VS_OUT_ID_2, 0, 0},
 @@ -353,10 +345,6 @@ static const struct r600_reg cayman_context_reg_list[] 
 = {
 {R_028254_PA_SC_VPORT_SCISSOR_0_BR, 0, 0},
 {R_028350_SX_MISC, 0, 0},
 {GROUP_FORCE_NEW_BLOCK, 0, 0},
 -   {R_028408_VGT_INDX_OFFSET, 0, 0},
 -   {R_02840C_VGT_MULTI_PRIM_IB_RESET_INDX, 0, 0},
 -   {R_028A94_VGT_MULTI_PRIM_IB_RESET_EN, 0, 0},
 -   {GROUP_FORCE_NEW_BLOCK, 0, 0},
 {R_02861C_SPI_VS_OUT_ID_0, 0, 0},
 {R_028620_SPI_VS_OUT_ID_1, 0, 0},
 {R_028624_SPI_VS_OUT_ID_2, 0, 0},
 @@ -664,10 +652,6 @@ int evergreen_context_init(struct r600_context *ctx)

 Elements(evergreen_context_reg_list), PKT3_SET_CONTEXT_REG, 
 EVERGREEN_CONTEXT_REG_OFFSET);
 if (r)
 goto out_err;
 -   r = r600_context_add_block(ctx, evergreen_ctl_const_list,
 -  Elements(evergreen_ctl_const_list), 
 PKT3_SET_CTL_CONST, EVERGREEN_CTL_CONST_OFFSET);
 -   if (r)
 -   goto out_err;

 /* PS loop const */
 evergreen_loop_const_init(ctx, 0);
 diff --git a/src/gallium/drivers/r600/r600.h 
 b/src/gallium/drivers/r600/r600.h
 index 6363a03..83d21a4 100644
 --- a/src/gallium/drivers/r600/r600.h
 +++ b/src/gallium/drivers/r600/r600.h
 @@ -228,11 +228,4 @@ void _r600_pipe_state_add_reg(struct r600_context *ctx,
  #define r600_pipe_state_add_reg_bo(state, offset, value, bo, usage) 
 _r600_pipe_state_add_reg_bo(rctx, state, offset, value, 
 CTX_RANGE_ID(offset), CTX_BLOCK_ID(offset), bo, usage)
  #define r600_pipe_state_add_reg(state, offset, value) 
 _r600_pipe_state_add_reg(rctx, state, offset, value, CTX_RANGE_ID(offset), 
 CTX_BLOCK_ID(offset))

 -static inline void r600_pipe_state_mod_reg(struct r600_pipe_state *state,
 -  uint32_t value)
 -{
 -   state->regs[state->nregs].value = value;
 -   state->nregs++;
 -}
 -
  #endif
 diff --git a/src/gallium/drivers/r600/r600_hw_context.c 
 b/src/gallium/drivers/r600/r600_hw_context.c
 index 57dcc7e..122f878 100644
 --- a/src/gallium/drivers/r600/r600_hw_context.c
 +++ b/src/gallium/drivers/r600/r600_hw_context.c
 @@ -233,10 +233,6 @@ static const struct r600_reg

Re: [Mesa-dev] [PATCH 19/19] r600g: convert the remnants of VGT state into immediate register writes

2012-09-11 Thread Jerome Glisse
On Tue, Sep 11, 2012 at 2:29 PM, Marek Olšák mar...@gmail.com wrote:
 On Tue, Sep 11, 2012 at 7:41 PM, Jerome Glisse j.gli...@gmail.com wrote:
 On Tue, Sep 11, 2012 at 1:10 PM, Marek Olšák mar...@gmail.com wrote:
 Please provide information about the GPU and the test which locks up. I'd
 like to reproduce it. Also please explain what's the cause of the
 lockup if you know it (which registers are not emitted in the correct
 order and how it can be fixed).

 Marek


 For instance
 http://people.freedesktop.org/~glisse/registerposition/lockup-longprim.sh

 will lockup probably any r6xx/r7xx (definitely rv670  rv770)

 I know that the whole vgt register order is picky and that most of
 them need to be emitted before ta_cntl_aux and before cb/db. But the
 ordering relative to pa is kind of weird and moving when looking at
 fglrx.

 I tested RS880, which is very similar to RV670, and it didn't hang. I
 can test RV670 later and if there's any issue, I'll fix it. I'd like
 this patch to be fixed instead of dropped, that's why I'm asking and I
 still haven't got a definitive answer how to change the patch, so that
 it can be pushed. Besides that...

 Has it ever occurred to you that the register ordering is changing in
 fglrx, because the ordering doesn't matter at all, just like Alex
 said, and the closed driver devs wrote it that way because they didn't
 care about the ordering either?

 I think the lockups you are seeing on r600-r700 are actually caused by
 something entirely different and it confuses you. See this thread from
 the comment #9 onwards:
 https://bugs.freedesktop.org/show_bug.cgi?id=50655#c9

 Marek

It's simple: without that patch there is no lockup, with it there is a lockup
all the time. It's just a hard fact. I am not confused about anything, I know
for a fact that reg grouping/order matters somehow. I ran several automated
tools that compare register values at draw call time between r600g and
fglrx while doing hyperz and there was no difference at all, down to the
last bit. One was locking up, the other not.
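
To make the distinction concrete: a tool that compares register values at draw
time sees only the final state, so two streams that write the same values in a
different order look identical to it. A minimal sketch, with hypothetical
helper names and a small fixed register table (both are assumptions for
illustration; this is not one of the tools mentioned above):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One register write in a command stream (hypothetical representation). */
struct sketch_reg_write {
    uint32_t reg;
    uint32_t value;
};

#define SKETCH_MAX_REGS 8 /* max distinct registers this sketch can track */

/* Replay a stream into a reg -> last value table; returns the entry count. */
static size_t sketch_final_state(const struct sketch_reg_write *w, size_t n,
                                 struct sketch_reg_write *out)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        size_t j;
        for (j = 0; j < count; j++)
            if (out[j].reg == w[i].reg)
                break;
        if (j == count) {
            assert(count < SKETCH_MAX_REGS);
            out[count++].reg = w[i].reg;
        }
        out[j].value = w[i].value; /* later writes override earlier ones */
    }
    return count;
}

/* 1 if both streams leave every register with the same final value. */
static int sketch_same_final_state(const struct sketch_reg_write *a, size_t na,
                                   const struct sketch_reg_write *b, size_t nb)
{
    struct sketch_reg_write sa[SKETCH_MAX_REGS], sb[SKETCH_MAX_REGS];
    size_t ca = sketch_final_state(a, na, sa);
    size_t cb = sketch_final_state(b, nb, sb);

    if (ca != cb)
        return 0;
    for (size_t i = 0; i < ca; i++) {
        int found = 0;
        for (size_t j = 0; j < cb; j++) {
            if (sa[i].reg == sb[j].reg) {
                if (sa[i].value != sb[j].value)
                    return 0;
                found = 1;
                break;
            }
        }
        if (!found)
            return 0;
    }
    return 1;
}

/* 1 if both streams contain the same writes in the same order. */
static int sketch_same_order(const struct sketch_reg_write *a, size_t na,
                             const struct sketch_reg_write *b, size_t nb)
{
    if (na != nb)
        return 0;
    for (size_t i = 0; i < na; i++)
        if (a[i].reg != b[i].reg || a[i].value != b[i].value)
            return 0;
    return 1;
}
```

Two streams can pass the first check and fail the second, which is why a
state comparison alone cannot show whether ordering is involved.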

Cheers,
Jerome


Re: [Mesa-dev] [PATCH 19/19] r600g: convert the remnants of VGT state into immediate register writes

2012-09-11 Thread Jerome Glisse
On Tue, Sep 11, 2012 at 3:00 PM, Jerome Glisse j.gli...@gmail.com wrote:
 On Tue, Sep 11, 2012 at 2:29 PM, Marek Olšák mar...@gmail.com wrote:
 On Tue, Sep 11, 2012 at 7:41 PM, Jerome Glisse j.gli...@gmail.com wrote:
 On Tue, Sep 11, 2012 at 1:10 PM, Marek Olšák mar...@gmail.com wrote:
 Please provide information about the GPU and the test which locks up. I'd
 like to reproduce it. Also please explain what's the cause of the
 lockup if you know it (which registers are not emitted in the correct
 order and how it can be fixed).

 Marek


 For instance
 http://people.freedesktop.org/~glisse/registerposition/lockup-longprim.sh

 will lockup probably any r6xx/r7xx (definitely rv670  rv770)

 I know that the whole vgt register order is picky and that most of
 them need to be emitted before ta_cntl_aux and before cb/db. But the
 ordering relative to pa is kind of weird and moving when looking at
 fglrx.

 I tested RS880, which is very similar to RV670, and it didn't hang. I
 can test RV670 later and if there's any issue, I'll fix it. I'd like
 this patch to be fixed instead of dropped, that's why I'm asking and I
 still haven't got a definitive answer how to change the patch, so that
 it can be pushed. Besides that...

 Has it ever occurred to you that the register ordering is changing in
 fglrx, because the ordering doesn't matter at all, just like Alex
 said, and the closed driver devs wrote it that way because they didn't
 care about the ordering either?

 I think the lockups you are seeing on r600-r700 are actually caused by
 something entirely different and it confuses you. See this thread from
 the comment #9 onwards:
 https://bugs.freedesktop.org/show_bug.cgi?id=50655#c9

 Marek

 It's simple without that patch no lockup, with it lockup all the time.
 It's just a hard fact, i am not confused about anything, i know for a
 fact that reg grouping/order matter somehow. I run several automated
 tools that compare register value at draw call time btw r600g and
 fglrx while doing hyperz and there was no difference at all, down the
 last bit. One was locking up the other not.

 Cheers,
 Jerome

And if you're curious, the good and bad r600g command streams and the diff
between bad and good are at:
http://people.freedesktop.org/~glisse/longprim/

If it's the bad one that is emitted before the fbo-stencil test then it
locks up; if it's the good one then there is no lockup.

Cheers,
Jerome


Re: [Mesa-dev] [PATCH 19/19] r600g: convert the remnants of VGT state into immediate register writes

2012-09-11 Thread Jerome Glisse
On Tue, Sep 11, 2012 at 2:29 PM, Marek Olšák mar...@gmail.com wrote:
 On Tue, Sep 11, 2012 at 7:41 PM, Jerome Glisse j.gli...@gmail.com wrote:
 On Tue, Sep 11, 2012 at 1:10 PM, Marek Olšák mar...@gmail.com wrote:
 Please provide information about the GPU and the test which locks up. I'd
 like to reproduce it. Also please explain what's the cause of the
 lockup if you know it (which registers are not emitted in the correct
 order and how it can be fixed).

 Marek


 For instance
 http://people.freedesktop.org/~glisse/registerposition/lockup-longprim.sh

 will lockup probably any r6xx/r7xx (definitely rv670  rv770)

 I know that the whole vgt register order is picky and that most of
 them need to be emitted before ta_cntl_aux and before cb/db. But the
 ordering relative to pa is kind of weird and moving when looking at
 fglrx.

 I tested RS880, which is very similar to RV670, and it didn't hang. I
 can test RV670 later and if there's any issue, I'll fix it. I'd like
 this patch to be fixed instead of dropped, that's why I'm asking and I
 still haven't got a definitive answer how to change the patch, so that
 it can be pushed. Besides that...

 Has it ever occurred to you that the register ordering is changing in
 fglrx, because the ordering doesn't matter at all, just like Alex
 said, and the closed driver devs wrote it that way because they didn't
 care about the ordering either?

fglrx definitely emits registers according to certain groupings. The thing is
that there is a bunch of registers that are emitted in 2, 3 or at most 4
different groups from what i have seen. All other registers are
_always_ emitted as part of the same group, with the whole group being
emitted. The issue i have is understanding those registers that are
emitted in a few different ways and how fglrx chooses between those
different ones.
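
For illustration, a minimal sketch of the "always emit the whole group" behaviour described above. The struct names, register offsets and the flat command stream are assumptions made up for this example; they are not taken from fglrx or r600g.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_GROUP_REGS 4
#define MAX_CS_DW      64

/* Hypothetical register group: registers that are always written together. */
struct reg_group {
    uint32_t offsets[MAX_GROUP_REGS]; /* register offsets, illustrative only */
    uint32_t values[MAX_GROUP_REGS];  /* shadow copy of the last value set */
    unsigned count;                   /* number of registers in the group */
    int      dirty;                   /* set when any member changed */
};

/* Toy command stream: a flat array of (offset, value) pairs. */
struct cmd_stream {
    uint32_t dw[MAX_CS_DW];
    unsigned cdw;
};

/* Update one register in the shadow copy and mark the whole group dirty. */
static void group_set_reg(struct reg_group *g, unsigned idx, uint32_t value)
{
    assert(idx < g->count);
    if (g->values[idx] != value) {
        g->values[idx] = value;
        g->dirty = 1;
    }
}

/* Emit every register of a dirty group, not only the one that changed. */
static void emit_group(struct cmd_stream *cs, struct reg_group *g)
{
    if (!g->dirty)
        return;
    assert(cs->cdw + 2 * g->count <= MAX_CS_DW);
    for (unsigned i = 0; i < g->count; i++) {
        cs->dw[cs->cdw++] = g->offsets[i];
        cs->dw[cs->cdw++] = g->values[i];
    }
    g->dirty = 0;
}
```

Changing a single member and calling emit_group() writes all members of the group, which is the pattern the reply above says fglrx follows for most registers.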


 I think the lockups you are seeing on r600-r700 are actually caused by
 something entirely different and it confuses you. See this thread from
 the comment #9 onwards:
 https://bugs.freedesktop.org/show_bug.cgi?id=50655#c9

 Marek


Re: [Mesa-dev] [PATCH] r600g: simplify flushing

2012-09-10 Thread Jerome Glisse
On Sun, Sep 9, 2012 at 1:03 AM, Marek Olšák mar...@gmail.com wrote:
 Based on the patch called simplify and fix flushing and synchronization
 by Jerome Glisse.

 Rebased, removed unneeded code, simplified more and cleaned up.

 Also, SH_ACTION_ENA is not set when changing shaders (hw doesn't seem
 to need it). It's only used to flush constant buffers.

Looks good. I would still like to do some stress testing, i will try to do
that today.
Reviewed-by: Jerome Glisse jgli...@redhat.com
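
For illustration, a minimal sketch of the pattern this patch moves to: state setters only OR a flag into the context, and one function turns the accumulated flags into a flush. The flag names and bit values below are hypothetical stand-ins (the real R600_CONTEXT_* values are not shown in this thread), and "emitting" is reduced to a counter.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits, not the real R600_CONTEXT_* values. */
enum {
    FLUSH_VTX         = 1u << 0,
    FLUSH_TEX         = 1u << 1,
    FLUSH_SHADERCONST = 1u << 2,
    FLUSH_CB          = 1u << 3,
};

struct toy_context {
    uint32_t flags;            /* pending flushes, accumulated by state setters */
    int      has_vertex_cache; /* chips without one read vertices via the texture cache */
    unsigned syncs_emitted;    /* how many sync packets were "emitted" */
    uint32_t last_sync_mask;   /* flags covered by the last sync */
};

/* State setters only record what needs flushing; nothing is emitted here. */
static void set_vertex_buffer(struct toy_context *ctx)
{
    ctx->flags |= ctx->has_vertex_cache ? FLUSH_VTX : FLUSH_TEX;
}

static void set_const_buffer(struct toy_context *ctx)
{
    ctx->flags |= FLUSH_SHADERCONST;
}

/* One place turns the accumulated flags into a single sync and clears them. */
static void flush_emit(struct toy_context *ctx)
{
    assert(ctx);
    if (!ctx->flags)
        return;
    ctx->last_sync_mask = ctx->flags;
    ctx->syncs_emitted++;
    ctx->flags = 0;
}
```

Several state changes between two draws then cost one flush instead of one per change.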

 ---
  src/gallium/drivers/r600/evergreen_compute.c   |   20 +-
  .../drivers/r600/evergreen_compute_internal.c  |4 +-
  src/gallium/drivers/r600/evergreen_state.c |7 +-
  src/gallium/drivers/r600/evergreend.h  |7 +-
  src/gallium/drivers/r600/r600.h|   18 +-
  src/gallium/drivers/r600/r600_hw_context.c |  218 
 +---
  src/gallium/drivers/r600/r600_hw_context_priv.h|3 +-
  src/gallium/drivers/r600/r600_pipe.c   |2 -
  src/gallium/drivers/r600/r600_pipe.h   |4 -
  src/gallium/drivers/r600/r600_state.c  |   21 +-
  src/gallium/drivers/r600/r600_state_common.c   |   76 ++-
  src/gallium/drivers/r600/r600d.h   |   12 ++
  12 files changed, 210 insertions(+), 182 deletions(-)

 diff --git a/src/gallium/drivers/r600/evergreen_compute.c 
 b/src/gallium/drivers/r600/evergreen_compute.c
 index 3533312..1fb63d6 100644
 --- a/src/gallium/drivers/r600/evergreen_compute.c
 +++ b/src/gallium/drivers/r600/evergreen_compute.c
 @@ -96,7 +96,7 @@ static void evergreen_cs_set_vertex_buffer(
 vb->buffer = buffer;
 vb->user_buffer = NULL;

 -   r600_inval_vertex_cache(rctx);
 +   rctx->flags |= rctx->has_vertex_cache ? R600_CONTEXT_VTX_FLUSH : R600_CONTEXT_TEX_FLUSH;
 state->enabled_mask |= 1 << vb_index;
 state->dirty_mask |= 1 << vb_index;
 r600_atom_dirty(rctx, &state->atom);
 @@ -332,8 +332,11 @@ static void compute_emit_cs(struct r600_context *ctx, const uint *block_layout,
  */
 r600_emit_atom(ctx, &ctx->start_compute_cs_cmd.atom);

 +   ctx->flags |= R600_CONTEXT_CB_FLUSH;
 +   r600_flush_emit(ctx);
 +
 /* Emit cb_state */
 -cb_state = ctx->states[R600_PIPE_STATE_FRAMEBUFFER];
 +   cb_state = ctx->states[R600_PIPE_STATE_FRAMEBUFFER];
 r600_context_pipe_state_emit(ctx, cb_state, RADEON_CP_PACKET3_COMPUTE_MODE);

 /* Set CB_TARGET_MASK  XXX: Use cb_misc_state */
 @@ -384,15 +387,10 @@ static void compute_emit_cs(struct r600_context *ctx, 
 const uint *block_layout,
 /* Emit dispatch state and dispatch packet */
 evergreen_emit_direct_dispatch(ctx, block_layout, grid_layout);

 -   /* r600_flush_framebuffer() updates the cb_flush_flags and then
 -* calls r600_emit_atom() on the ctx->surface_sync_cmd.atom, which emits
 -* a SURFACE_SYNC packet via r600_emit_surface_sync().
 -*
 -* XXX r600_emit_surface_sync() hardcodes the CP_COHER_SIZE to
 -* 0x, so we will need to add a field to struct
 -* r600_surface_sync_cmd if we want to manually set this value.
 +   /* XXX evergreen_flush_emit() hardcodes the CP_COHER_SIZE to 0x
  */
 -   r600_flush_framebuffer(ctx, true /* Flush now */);
 +   ctx->flags |= R600_CONTEXT_CB_FLUSH;
 +   r600_flush_emit(ctx);

  #if 0
 COMPUTE_DBG("cdw: %i\n", cs->cdw);
 @@ -444,7 +442,7 @@ void evergreen_emit_cs_shader(
 r600_write_value(cs, r600_context_bo_reloc(rctx, shader->shader_code_bo,
 RADEON_USAGE_READ));

 -   r600_inval_shader_cache(rctx);
 +   rctx->flags |= R600_CONTEXT_SHADERCONST_FLUSH;
  }

  static void evergreen_launch_grid(
 diff --git a/src/gallium/drivers/r600/evergreen_compute_internal.c 
 b/src/gallium/drivers/r600/evergreen_compute_internal.c
 index 50a60d3..dc95732 100644
 --- a/src/gallium/drivers/r600/evergreen_compute_internal.c
 +++ b/src/gallium/drivers/r600/evergreen_compute_internal.c
 @@ -562,7 +562,7 @@ void evergreen_set_tex_resource(
  
 util_format_get_blockwidth(tmp->resource.b.b.format) *
  view->base.texture->width0*height*depth;

 -   r600_inval_texture_cache(pipe->ctx);
 +   pipe->ctx->flags |= R600_CONTEXT_TEX_FLUSH;

 evergreen_emit_force_reloc(res);
 evergreen_emit_force_reloc(res);
 @@ -621,7 +621,7 @@ void evergreen_set_const_cache(
 res->usage = RADEON_USAGE_READ;
 res->coher_bo_size = size;

 -   r600_inval_shader_cache(pipe->ctx);
 +   pipe->ctx->flags |= R600_CONTEXT_SHADERCONST_FLUSH;
  }

  struct r600_resource* r600_compute_buffer_alloc_vram(
 diff --git a/src/gallium/drivers/r600/evergreen_state.c 
 b/src/gallium/drivers/r600/evergreen_state.c
 index 9a5183e..2a7a35f 100644
 --- a/src/gallium/drivers/r600/evergreen_state.c
 +++ b/src/gallium

Re: [Mesa-dev] [PATCH] r600g: order atom emission

2012-09-07 Thread Jerome Glisse
On Thu, Sep 6, 2012 at 11:32 AM, Alex Deucher alexdeuc...@gmail.com wrote:
 On Thu, Sep 6, 2012 at 10:54 AM, Jerome Glisse j.gli...@gmail.com wrote:
 On Thu, Sep 6, 2012 at 6:20 AM, Dave Airlie airl...@gmail.com wrote:
 On Thu, Sep 6, 2012 at 5:21 PM, Philipp Klaus Krause p...@spth.de wrote:
 On 06.09.2012 07:35, j.gli...@gmail.com wrote:
 From: Jerome Glisse jgli...@redhat.com

 To avoid GPU lockup registers must be emitted in a specific order
 (no kidding ...). This patch reworks atom emission so the order in which
 atoms are emitted with respect to each other is always the same. We
 don't have any information on what the correct order is, so the order
 will need to be inferred from the fglrx command stream.

 Shouldn't this be stated in comments, so the next person who comes along
 and makes a change in this code doesn't inadvertently change the order?

 Also a comment on what ordering matters most, like I suspect this is
 just hiding a real issue.

 Dave.

 No it's not hiding an issue, afaict it's how the hw works. The hw does
 what some amd documents call state validation. So here is how i
 understand how things happen, and i can be completely wrong. The hw processes
 register writes in the order it receives them, and to avoid postponing state
 validation the hw does state validation while processing registers. That
 means if writing register A triggers a state validation that uses some
 field of register B, the hw might not redo the state validation when
 register B is later written, i.e. only some registers trigger the state
 validation, no matter what they depend on. I believe state
 validation is only used as a pipeline optimization by the hw, so the hw
 knows it can take some shortcuts. But in some rare cases, if shortcuts
 are taken for the wrong reasons, we end up with a GPU lockup.

 No matter if my guess is right or wrong, i know for a fact that
 register order is important in some situations, that's the hard bottom
 line, no matter what the reasons inside the hw are.

 This patch is far from having all the order right, it's just a first
 step, i am atomizing everything and it's what is needed to go forward
 without regression.

 I've talked to the internal hw and sw guys and they said there isn't
 any specific ordering required and the closed driver doesn't impose
 any specific order.  The pipeline doesn't get kicked off until a draw
 command is issued, so I don't see why the state update order would
 matter.  It's possible there are subtle ordering requirements and the
 closed driver just happened to get it right.  There are dependencies
 and hw bug workarounds however.  E.g., some blocks snoop registers
 from other blocks so you need to make sure those dependent registers
 have been initialized before drawing.  I don't know if it's the
 ordering so much as making sure we emit all the necessary state when
 needed.  The closed driver tends to update a lot more state than is
 minimally required for a lot of things.  That said, it probably
 wouldn't hurt to mirror the closed driver more closely.

 Alex

I don't know what the reasons are, but which registers are emitted and
along with which other registers definitely matters. All the files i am
talking about in this mail are located at:
http://people.freedesktop.org/~glisse/registerposition/

So if you apply:
0001-r600g-FORCE-LOCKUP-BY-EMITTING-OR-NOT-REGISTER.patch

and run the piglit tests like in lockup-longprim.sh you will lock up the GPU
(i only tested on r6xx, r7xx so far).

I double checked through automated tools that no register that was
written by command stream from longprim piglit test are reprogram
properly by the fbo test (if you have my constant buffer size patch i
sent earlier).

The only difference between the command streams is that in one
R_02881C_PA_CL_VS_OUT_CNTL is emitted with each draw and in the other only
once; when it is emitted with each draw it locks up.

bad command stream r600g-long-prim-simple-b.txt
good one r600g-long-prim-simple-g.txt
diff r600g-long-prim-simple-d.txt

Given that the bad one emits more registers, some draw commands are moved to
the second cs.

Emitting some other registers along with PA_CL_VS_OUT_CNTL fixes the lockup
(i don't have a short list) but many other registers behave the same as
PA_CL_VS_OUT_CNTL. So if order does not matter then register grouping
definitely does. I really wish that the hw were less picky about how
command streams are supposed to be formatted. Anyhow, given that we have
no information on which registers need to be emitted together, mimicking
fglrx sounds like the way to go.

Cheers,
Jerome
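
For illustration, a toy model of the state validation guess quoted above (writing register A latches a value computed from the current register B, writing B does not redo it). This is purely hypothetical: the struct, the two registers and the "validated" value are invented to show why write order could matter under that guess, and say nothing about how the real hw works.

```c
#include <assert.h>
#include <stdint.h>

/* Toy hardware model, hypothetical and not based on any hw documentation. */
struct toy_hw {
    uint32_t reg_a;
    uint32_t reg_b;
    uint32_t validated; /* value latched when reg_a was last written */
};

/* Writing A triggers "state validation", which reads the current value of B. */
static void write_a(struct toy_hw *hw, uint32_t v)
{
    hw->reg_a = v;
    hw->validated = hw->reg_a + hw->reg_b;
}

/* Writing B does not trigger validation, so the latched value can go stale. */
static void write_b(struct toy_hw *hw, uint32_t v)
{
    hw->reg_b = v;
}

/* At draw time the latched value is only consistent if it matches both registers. */
static int draw_is_consistent(const struct toy_hw *hw)
{
    assert(hw);
    return hw->validated == hw->reg_a + hw->reg_b;
}
```

In this model, writing B then A leaves a consistent state, while writing A then B leaves a stale latched value even though the final register contents are identical. That matches the observation that two command streams with the same register values at draw time can behave differently.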


Re: [Mesa-dev] [PATCH] r600g: order atom emission

2012-09-06 Thread Jerome Glisse

Yeah, it's possible that it's also related to some registers needing to be
re-emitted. I often see fglrx re-emitting a register even
if it emitted it with the same value just before, and some registers are
emitted several times around other register blocks.

Anyhow this patch is a first step to atomize everything and match
the fglrx register pattern more closely.

Cheers,
Jerome


Re: [Mesa-dev] [PATCH] r600g: order atom emission v2

2012-09-06 Thread Jerome Glisse
On Thu, Sep 6, 2012 at 4:10 PM, Marek Olšák mar...@gmail.com wrote:
 On Thu, Sep 6, 2012 at 8:34 PM, Jerome Glisse j.gli...@gmail.com wrote:
 On Thu, Sep 6, 2012 at 2:29 PM, Marek Olšák mar...@gmail.com wrote:
 This looks good to me. It's funny to see the r300g architecture being
 re-implemented in r600g. :)

 There's one optimization that r300g has that this patch doesn't. r300g
 keeps the index of the first and the last dirty atom and the loops
 over the list of atoms look like this:
 for (i = first_dirty; i <= last_dirty; i++)

 And after emission:
 first_dirty = some large number;
 last_dirty= 0;

 The atoms should be ordered according to how frequently they are
 updated (except when the ordering is required by the hw). But most
 importantly, if there are no state changes, the loops are trivially
 skipped.

 Marek

 Don't think this optimization is worth it, there won't be much more
 than 32 atoms in the end and they definitely can't be ordered from most
 frequent to least frequent, as some of the stuff needs to be emitted
 last and is frequently updated (primitive type for instance).

 I didn't say all atoms *must* be sorted. I meant that some (most?)
 atoms can be sorted, i.e. you can have some atoms at fixed positions
 (like the primitive type or the seamless cubemap state), but you have
 always at least *some* freedom where you put the rest. The ordering I
 had in mind was actually from the least frequent to the most frequent,
 in other words, from the framebuffer (least frequent) to shaders to
 textures to constant buffers to vertex buffers (most frequent).

 Of course, the code should document which atoms must have fixed
 positions along with an explanation. The comment that all atom
 positions must not be changed isn't enough, because it's not true.

 Marek

I won't try to find which atoms can have a completely floating position, i
am just grouping together registers that are always emitted together in
fglrx and then i position these groups relative to each other according
to the fglrx positions. That means all atoms are always emitted in a
specific order. So there won't be any freedom. The only freedom i can
think of is between 2 position-forced atoms, and that makes the sorting
completely useless and complicated.

Cheers,
Jerome
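
For illustration, a minimal sketch that combines the two points discussed in this thread: atoms sit at fixed positions in an array (so emission order never changes), and a first_dirty/last_dirty range lets the emit loop be skipped when nothing changed. The names and the array-of-flags layout are assumptions for this example, not the r300g or r600g code.

```c
#include <assert.h>
#include <limits.h>

#define NUM_ATOMS 8

/* Hypothetical atom list: the array index is the fixed emission position. */
struct atom_list {
    int      dirty[NUM_ATOMS];
    unsigned first_dirty;        /* lowest dirty index, UINT_MAX when nothing is dirty */
    unsigned last_dirty;         /* highest dirty index, 0 when nothing is dirty */
    unsigned emitted[NUM_ATOMS]; /* indices emitted by the last atoms_emit() call */
    unsigned num_emitted;
};

static void atoms_init(struct atom_list *l)
{
    for (unsigned i = 0; i < NUM_ATOMS; i++)
        l->dirty[i] = 0;
    l->first_dirty = UINT_MAX;
    l->last_dirty = 0;
    l->num_emitted = 0;
}

/* Mark one atom dirty and widen the dirty range to include it. */
static void atom_mark_dirty(struct atom_list *l, unsigned id)
{
    assert(id < NUM_ATOMS);
    l->dirty[id] = 1;
    if (id < l->first_dirty)
        l->first_dirty = id;
    if (id > l->last_dirty)
        l->last_dirty = id;
}

/* Emit dirty atoms in array order, then reset the range to "empty". */
static void atoms_emit(struct atom_list *l)
{
    l->num_emitted = 0;
    /* With nothing dirty, first_dirty > last_dirty and the loop is skipped. */
    for (unsigned i = l->first_dirty; i <= l->last_dirty; i++) {
        if (l->dirty[i]) {
            l->emitted[l->num_emitted++] = i;
            l->dirty[i] = 0;
        }
    }
    l->first_dirty = UINT_MAX;
    l->last_dirty = 0;
}
```

Atoms are emitted in index order regardless of the order in which they were marked dirty, so the range tracking does not conflict with a fixed emission order.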


Re: [Mesa-dev] [PATCH 0/7] MSAA on R700 and improvements for Evergreen

2012-08-23 Thread Jerome Glisse
On Wed, Aug 22, 2012 at 9:54 PM, Marek Olšák mar...@gmail.com wrote:
 This series adds R700 MSAA support along with compression of MSAA 
 colorbuffers for R700 and Evergreen, which should save a lot of bandwidth 
 with MSAA. There are also some minor fixes.

 Please review.

 Marek Olšák (7):
   gallium/u_blitter: initialize sample mask in resolve
   r600g: set CB_TARGET_MASK to 0xf and not 0xff for resolve on evergreen
   r600g: fix evergreen 8x MSAA sample positions
   r600g: cleanup names around depth decompression
   r600g: implement compression for MSAA colorbuffers for evergreen
   r600g: change programming of CB_SHADER_MASK on r600-r700
   r600g: implement MSAA for r700

For the series:
Reviewed-by: Jerome Glisse jgli...@redhat.com

What's wrong with r6xx ?


  src/gallium/auxiliary/util/u_blitter.c  |   46 
  src/gallium/auxiliary/util/u_blitter.h  |5 +
  src/gallium/drivers/r600/evergreen_hw_context.c |   64 ++
  src/gallium/drivers/r600/evergreen_state.c  |   87 ++--
  src/gallium/drivers/r600/evergreend.h   |   76 ++-
  src/gallium/drivers/r600/r600_blit.c|   97 -
  src/gallium/drivers/r600/r600_hw_context.c  |   16 ++
  src/gallium/drivers/r600/r600_pipe.c|6 +
  src/gallium/drivers/r600/r600_pipe.h|   16 +-
  src/gallium/drivers/r600/r600_resource.h|   14 +-
  src/gallium/drivers/r600/r600_state.c   |  262 
 +++
  src/gallium/drivers/r600/r600_state_common.c|   45 +++-
  src/gallium/drivers/r600/r600_texture.c |  116 +-
  src/gallium/drivers/r600/r600d.h|   20 ++
  14 files changed, 770 insertions(+), 100 deletions(-)

 Marek


Re: [Mesa-dev] [PATCH 1/2] winsys/radeon: fix VA allocation

2012-08-03 Thread Jerome Glisse
On Fri, Aug 3, 2012 at 11:06 AM, Christian König
deathsim...@vodafone.de wrote:
 Wait for VA use to end before reusing it.

 Should fix: https://bugs.freedesktop.org/show_bug.cgi?id=45018

 Signed-off-by: Christian König deathsim...@vodafone.de
 ---
  src/gallium/winsys/radeon/drm/radeon_drm_bo.c |   64 
 +
  1 file changed, 43 insertions(+), 21 deletions(-)

 diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c 
 b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
 index 2626586..0c94461 100644
 --- a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
 +++ b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
 @@ -102,6 +102,7 @@ static INLINE struct radeon_bo *radeon_bo(struct 
 pb_buffer *bo)

  struct radeon_bo_va_hole {
  struct list_head list;
 +uint32_t handle;
  uint64_t offset;
  uint64_t size;
  };
 @@ -204,7 +205,30 @@ static uint64_t radeon_bomgr_find_va(struct radeon_bomgr 
 *mgr, uint64_t size, ui
  pipe_mutex_lock(mgr->bo_va_mutex);
  /* first look for a hole */
  LIST_FOR_EACH_ENTRY_SAFE(hole, n, &mgr->va_holes, list) {
 +if (hole->handle) {
 +struct drm_radeon_gem_busy busy_args;
 +struct drm_gem_close close_args;
 +
 +memset(&busy_args, 0, sizeof(busy_args));
 +busy_args.handle = hole->handle;
 +if (drmCommandWriteRead(mgr->rws->fd, DRM_RADEON_GEM_BUSY,
 +&busy_args, sizeof(busy_args)) != 0) {
 +continue;
 +}
 +
 +memset(&close_args, 0, sizeof(close_args));
 +close_args.handle = hole->handle;
 +drmIoctl(mgr->rws->fd, DRM_IOCTL_GEM_CLOSE, &close_args);
 +
 +hole->handle = 0;
 +}
  offset = hole->offset;
 +   if ((offset + hole->size) == mgr->va_offset) {
 +mgr->va_offset = offset;
 +list_del(&hole->list);
 +FREE(hole);
 +continue;
 +   }
  waste = 0;
  if (alignment) {
  waste = offset % alignment;
 @@ -280,23 +304,21 @@ static void radeon_bomgr_force_va(struct radeon_bomgr 
 *mgr, uint64_t va, uint64_
  pipe_mutex_unlock(mgr->bo_va_mutex);
  }

 -static void radeon_bomgr_free_va(struct radeon_bomgr *mgr, uint64_t va, uint64_t size)
 +static void radeon_bomgr_free_va(struct radeon_bomgr *mgr, uint64_t va,
 + uint64_t size, uint32_t handle)
  {
 +struct radeon_bo_va_hole *hole;
  pipe_mutex_lock(mgr->bo_va_mutex);
 -if ((va + size) == mgr->va_offset) {
 -mgr->va_offset = va;
 -} else {
 -struct radeon_bo_va_hole *hole;

 -/* FIXME on allocation failure we just lose virtual address space
 - * maybe print a warning
 - */
 -hole = CALLOC_STRUCT(radeon_bo_va_hole);
 -if (hole) {
 -hole->size = size;
 -hole->offset = va;
 -list_add(&hole->list, &mgr->va_holes);
 -}
 +/* FIXME on allocation failure we just lose virtual address space
 + * maybe print a warning
 + */
 +hole = CALLOC_STRUCT(radeon_bo_va_hole);
 +if (hole) {
 +hole->handle = handle;
 +hole->size = size;
 +hole->offset = va;
 +list_add(&hole->list, &mgr->va_holes);
  }
  pipe_mutex_unlock(mgr->bo_va_mutex);
  }
 @@ -320,12 +342,12 @@ static void radeon_bo_destroy(struct pb_buffer *_buf)
  os_munmap(bo->ptr, bo->base.size);

  if (mgr->va) {
 -radeon_bomgr_free_va(mgr, bo->va, bo->va_size);
 +radeon_bomgr_free_va(mgr, bo->va, bo->va_size, bo->handle);
 +} else {
 +/* Close object. */
 +args.handle = bo->handle;
 +drmIoctl(bo->rws->fd, DRM_IOCTL_GEM_CLOSE, &args);
  }
 -
 -/* Close object. */
 -args.handle = bo->handle;
 -drmIoctl(bo->rws->fd, DRM_IOCTL_GEM_CLOSE, &args);
  pipe_mutex_destroy(bo->map_mutex);
  FREE(bo);
  }
 @@ -540,7 +562,7 @@ static struct pb_buffer *radeon_bomgr_create_bo(struct 
 pb_manager *_mgr,
  return NULL;
  }
  if (va.operation == RADEON_VA_RESULT_VA_EXIST) {
 -radeon_bomgr_free_va(mgr, bo->va, bo->va_size);
 +radeon_bomgr_free_va(mgr, bo->va, bo->va_size, 0);
  bo->va = va.offset;
  radeon_bomgr_force_va(mgr, bo->va, bo->va_size);
  }
 @@ -865,7 +887,7 @@ done:
  return NULL;
  }
  if (va.operation == RADEON_VA_RESULT_VA_EXIST) {
 -radeon_bomgr_free_va(mgr, bo->va, bo->va_size);
 +radeon_bomgr_free_va(mgr, bo->va, bo->va_size, 0);
  bo->va = va.offset;
  radeon_bomgr_force_va(mgr, bo->va, bo->va_size);
  }
 --
 1.7.9.5



As i said in the bug report this is not needed. As soon as you call in
kernel it should work (given you 

Re: [Mesa-dev] [PATCH 1/2] winsys/radeon: fix VA allocation

2012-08-03 Thread Jerome Glisse
On Fri, Aug 3, 2012 at 11:06 AM, Christian König
deathsim...@vodafone.de wrote:
 Wait for VA use to end before reusing it.

 Should fix: https://bugs.freedesktop.org/show_bug.cgi?id=45018

 Signed-off-by: Christian König deathsim...@vodafone.de

Actually you're right, mesa can't free the va right away, it needs to wait
until the kernel is done. But the kernel was severely buggy too, it never
cleared the pagetable when freeing an object. I attached kernel patches. I am
in the process of testing them.

Cheers,
Jerome
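
For illustration, a minimal sketch of the idea in the patch: a freed virtual address range is recorded as a hole together with the handle of the buffer that used it, and the hole is only reused once that buffer is no longer busy. Everything here is an assumption for the example: the fixed-size hole array replaces the driver's list, `is_busy` is a stand-in for the kernel busy query, and the sketch neither closes handles nor splits or merges holes.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_HOLES 16

/* Hypothetical hole record, loosely following the fields used in the patch. */
struct va_hole {
    uint32_t handle; /* non-zero while the old buffer may still be in use */
    uint64_t offset;
    uint64_t size;
    int      used;   /* slot is occupied */
};

struct va_mgr {
    struct va_hole holes[MAX_HOLES];
    uint64_t va_offset;              /* top of the allocated range */
    int (*is_busy)(uint32_t handle); /* stand-in for a kernel busy query */
};

/* Demo busy query: exactly one handle, stored in busy_handle, counts as busy. */
static uint32_t busy_handle;
static int toy_is_busy(uint32_t handle) { return handle == busy_handle; }

/* Freeing never makes the range reusable immediately; it only records a hole. */
static void va_free(struct va_mgr *m, uint64_t va, uint64_t size, uint32_t handle)
{
    for (unsigned i = 0; i < MAX_HOLES; i++) {
        if (!m->holes[i].used) {
            m->holes[i] = (struct va_hole){ handle, va, size, 1 };
            return;
        }
    }
    /* No free slot: the range is simply lost in this sketch. */
}

/* Reuse a hole only once its old buffer is idle; otherwise grow at the top. */
static uint64_t va_alloc(struct va_mgr *m, uint64_t size)
{
    assert(size > 0);
    for (unsigned i = 0; i < MAX_HOLES; i++) {
        struct va_hole *h = &m->holes[i];
        if (!h->used || h->size < size)
            continue;
        if (h->handle && m->is_busy(h->handle))
            continue; /* still in flight, skip this hole for now */
        h->used = 0;
        return h->offset; /* leftover space in the hole is dropped here */
    }
    uint64_t va = m->va_offset;
    m->va_offset += size;
    return va;
}
```

While the old buffer is busy the allocator hands out fresh address space instead of the hole, which is the "wait for VA use to end before reusing it" behaviour the patch description asks for.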


Re: [Mesa-dev] [PATCH 1/2] r600g: add htile support v9

2012-07-30 Thread Jerome Glisse
On Sun, Jul 29, 2012 at 1:50 PM, Marek Olšák mar...@gmail.com wrote:
 On Tue, Jul 17, 2012 at 7:58 PM,  j.gli...@gmail.com wrote:
 From: Jerome Glisse jgli...@redhat.com

 htile is used for HiZ and HiS support and fast Z/S clears.
 This commit just adds the htile setup and Fast Z clear.
 We don't take full advantage of HiS with that patch.

 v2 really use fast clear, still random issues with some tiles,
    need to try more flush combinations, fix depth/stencil
    texture decompression
 v3 fix random issues on r6xx/r7xx
 v4 rebase on top of latest mesa, disable CB export when clearing
    htile surface to avoid wasting bandwidth
 v5 resummarize htile surface when uploading z value. Fix z/stencil
    decompression, the custom blitter with custom dsa is no longer
    needed.
 v6 Reorganize render control/override update mechanism, fixing more
    issues in the process.
 v7 Add nop after depth surface base update to work around some htile
    flushing issue. Force htile to 8x8 on r6xx/r7xx as other combinations
    have issues. Do not enable hyperz when flushing/uncompressing
    depth buffer.
 v8 Fix htile surface, preload and prefetch setup. Only set preload
    and prefetch on htile surface clear like fglrx. Record depth
    clear value per level. Support several levels for the htile
    surface. First depth clear can't be a fast clear.
 v9 Fix comments, properly account for new registers in emit function,
    disable fast zclear if clearing different layers of a texture
    array to different values

 Signed-off-by: Pierre-Eric Pelloux-Prayer pell...@gmail.com
 Signed-off-by: Alex Deucher alexander.deuc...@amd.com
 Signed-off-by: Jerome Glisse jgli...@redhat.com
 ---
  src/gallium/drivers/r600/evergreen_hw_context.c |6 +
  src/gallium/drivers/r600/evergreen_state.c  |  102 -
  src/gallium/drivers/r600/evergreend.h   |4 +
  src/gallium/drivers/r600/r600_blit.c|   38 +++
  src/gallium/drivers/r600/r600_hw_context.c  |   25 +
  src/gallium/drivers/r600/r600_pipe.c|8 ++
  src/gallium/drivers/r600/r600_pipe.h|   13 ++-
  src/gallium/drivers/r600/r600_resource.h|7 ++
  src/gallium/drivers/r600/r600_state.c   |  133 
 ---
  src/gallium/drivers/r600/r600_texture.c |  103 ++
  src/gallium/drivers/r600/r600d.h|6 +
  11 files changed, 399 insertions(+), 46 deletions(-)

 diff --git a/src/gallium/drivers/r600/evergreen_hw_context.c 
 b/src/gallium/drivers/r600/evergreen_hw_context.c
 index 081701f..546c884 100644
 --- a/src/gallium/drivers/r600/evergreen_hw_context.c
 +++ b/src/gallium/drivers/r600/evergreen_hw_context.c
 @@ -62,6 +62,9 @@ static const struct r600_reg evergreen_context_reg_list[] 
 = {
 {GROUP_FORCE_NEW_BLOCK, 0, 0},
 {R_028058_DB_DEPTH_SIZE, 0, 0},
 {R_02805C_DB_DEPTH_SLICE, 0, 0},
 +   {R_02802C_DB_DEPTH_CLEAR, 0, 0},
 +   {R_028ABC_DB_HTILE_SURFACE, 0, 0},
 +   {R_028AC8_DB_PRELOAD_CONTROL, 0, 0},
 {R_028204_PA_SC_WINDOW_SCISSOR_TL, 0, 0},
 {R_028208_PA_SC_WINDOW_SCISSOR_BR, 0, 0},
 {R_028234_PA_SU_HARDWARE_SCREEN_OFFSET, 0, 0},
 @@ -319,6 +322,9 @@ static const struct r600_reg cayman_context_reg_list[] = 
 {
 {GROUP_FORCE_NEW_BLOCK, 0, 0},
 {R_028058_DB_DEPTH_SIZE, 0, 0},
 {R_02805C_DB_DEPTH_SLICE, 0, 0},
 +   {R_02802C_DB_DEPTH_CLEAR, 0, 0},
 +   {R_028ABC_DB_HTILE_SURFACE, 0, 0},
 +   {R_028AC8_DB_PRELOAD_CONTROL, 0, 0},
 {R_028204_PA_SC_WINDOW_SCISSOR_TL, 0, 0},
 {R_028208_PA_SC_WINDOW_SCISSOR_BR, 0, 0},
 {R_028234_PA_SU_HARDWARE_SCREEN_OFFSET, 0, 0},
 diff --git a/src/gallium/drivers/r600/evergreen_state.c 
 b/src/gallium/drivers/r600/evergreen_state.c
 index a66387b..214d76b 100644
 --- a/src/gallium/drivers/r600/evergreen_state.c
 +++ b/src/gallium/drivers/r600/evergreen_state.c
 @@ -710,13 +710,15 @@ static void *evergreen_create_blend_state(struct 
 pipe_context *ctx,
 }
 blend->cb_target_mask = target_mask;

 -   if (target_mask)
 +   if (target_mask) {
 color_control |= S_028808_MODE(V_028808_CB_NORMAL);
 -   else
 +   } else {
 color_control |= S_028808_MODE(V_028808_CB_DISABLE);
 +   }

 r600_pipe_state_add_reg(rstate, R_028808_CB_COLOR_CONTROL,
 color_control);
 +
 /* only have dual source on MRT0 */
 blend->dual_src_blend = util_blend_state_is_dual(state, 0);
 for (int i = 0; i < 8; i++) {
 @@ -1668,6 +1670,26 @@ static void evergreen_db(struct r600_context *rctx, 
 struct r600_pipe_state *rsta
 }
 }

 +   /* hyperz */
 +   if (rtex->hyperz) {
 +   uint64_t htile_offset = rtex->hyperz->surface.level[level].offset;
 +
 +   rctx->db_misc_state.hyperz = true;
 +   rctx->db_misc_state.db_htile_surface_mask = 0x

Re: [Mesa-dev] [PATCH 2/2] r600g: simplify and fix flushing and synchronization v2

2012-07-23 Thread Jerome Glisse
On Sun, Jul 22, 2012 at 8:58 PM, Marek Olšák mar...@gmail.com wrote:
 On Fri, Jul 20, 2012 at 4:54 AM, Jerome Glisse j.gli...@gmail.com wrote:
 On Thu, Jul 19, 2012 at 10:25 PM, Marek Olšák mar...@gmail.com wrote:
 On Fri, Jul 20, 2012 at 4:17 AM, Jerome Glisse j.gli...@gmail.com wrote:
 On Thu, Jul 19, 2012 at 10:06 PM, Marek Olšák mar...@gmail.com wrote:
 I actually care a lot about lockups. Well, you are complaining about
 lockups, yet you have quite obvious bugs in your hyperz code, so let's
 fix them first. (I wouldn't even try and run the hyperz code in its
 current state. Please don't take that personally.) Then, if the
 lockups persist, we can start looking into *what* fixes them. You seem
 to think that this patch helps a lot, but you don't say why. Aren't
 you interested in what sequence of GPU commands helps? If I am
 counting correctly, there are 7 changes in behavior in this patch. It
 should be pretty easy to nail down the few that help, document them
 (like /* these two lines fix a lockup with hyperz */), and discard the
 rest. The documenting part is very important, so that the other
 developers won't break your code accidentally.

 Marek


 You haven't even tried hyperz and you say I have an obvious bug; that's
 kind of funny, but you would not know why. I tried pretty much all of

 Oh come on, I already told you about all the bugs I found in the
 hyperz patch. You now know them too, and so does everybody else
 reading mesa-dev.

 Marek


 None of the issues you pointed out showed up in piglit, and none of them
 had an impact on things like openarena, nexuiz, doomIII, lightmark, ...
 so no, the issues you pointed out do not cripple the hyperz patch; it's
 working quite well for many things. Before you extrapolate: yes, the
 issues you pointed out have an impact in backward use of GL, but
 nonetheless I addressed them, and I can tell you it does help a bit
 with lockups.

 I have no doubt that it helps with your lockups and I also have no
 doubt that the piece of code that helps can be bisected. I have
 mentioned 7 changes in the patch which are questionable, so the
 bisection should ideally take 3 steps. After we find the change which
 helps (and document it), we can discard the rest. That should give us
 the same stability as this patch does, but without unnecessary code
 which does cost GPU cycles (regardless of whether it is measurable on
 a particular machine or not).

 By the way, in draw_vbo, the emit functions should be called after
 r600_need_cs_space. Otherwise the command stream may overflow.

 Marek

Again, I haven't found any combination that helps more than the full
patch does. So be my guest: bisect on rv610, rv635, rv670, rv710,
rv740, rv770.

Cheers,
Jerome
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
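
The "3 steps" estimate in the exchange above follows from halving 7
candidate changes, and it only holds if a single change is responsible.
The reply argues that the changes only help in combination, in which case
the search space is every on/off combination of the changes. A minimal
sketch of both counts, using hypothetical helper names (bisect_steps and
combination_count are illustrations, not Mesa or git functions):

```c
#include <stdint.h>

/* Halving steps needed to isolate ONE culprit among n candidate changes.
 * Assumption: exactly one change matters and each test is reliable. */
static unsigned bisect_steps(unsigned n)
{
    unsigned steps = 0;
    unsigned span = 1;

    while (span < n) {   /* each step doubles what we can tell apart */
        span *= 2;
        steps++;
    }
    return steps;
}

/* Number of on/off combinations of n changes, i.e. the worst case if
 * the changes interact and no single one explains the improvement.
 * Only valid for n < 64. */
static uint64_t combination_count(unsigned n)
{
    return (uint64_t)1 << n;
}
```

For the 7 changes discussed here, bisect_steps(7) is 3 while
combination_count(7) is 128, which is roughly the gap between the two
positions in this thread.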


Re: [Mesa-dev] [PATCH 2/2] r600g: simplify and fix flushing and synchronization v2

2012-07-23 Thread Jerome Glisse
On Mon, Jul 23, 2012 at 5:28 PM, Marek Olšák mar...@gmail.com wrote:
 On Mon, Jul 23, 2012 at 4:25 PM, Jerome Glisse j.gli...@gmail.com wrote:
 On Sun, Jul 22, 2012 at 8:58 PM, Marek Olšák mar...@gmail.com wrote:
 On Fri, Jul 20, 2012 at 4:54 AM, Jerome Glisse j.gli...@gmail.com wrote:
 On Thu, Jul 19, 2012 at 10:25 PM, Marek Olšák mar...@gmail.com wrote:
 On Fri, Jul 20, 2012 at 4:17 AM, Jerome Glisse j.gli...@gmail.com wrote:
 On Thu, Jul 19, 2012 at 10:06 PM, Marek Olšák mar...@gmail.com wrote:
 I actually care a lot about lockups. Well, you are complaining about
 lockups, yet you have quite obvious bugs in your hyperz code, so let's
 fix them first. (I wouldn't even try and run the hyperz code in its
 current state. Please don't take that personally.) Then, if the
 lockups persist, we can start looking into *what* fixes them. You seem
 to think that this patch helps a lot, but you don't say why. Aren't
 you interested in what sequence of GPU commands helps? If I am
 counting correctly, there are 7 changes in behavior in this patch. It
 should be pretty easy to nail down the few that help, document them
 (like /* these two lines fix a lockup with hyperz */), and discard the
 rest. The documenting part is very important, so that the other
 developers won't break your code accidentally.

 Marek


 You haven't even tried hyperz and you say I have an obvious bug; that's
 kind of funny, but you would not know why. I tried pretty much all of

 Oh come on, I already told you about all the bugs I found in the
 hyperz patch. You now know them too, and so does everybody else
 reading mesa-dev.

 Marek


 None of the issues you pointed out showed up in piglit, and none of them
 had an impact on things like openarena, nexuiz, doomIII, lightmark, ...
 so no, the issues you pointed out do not cripple the hyperz patch; it's
 working quite well for many things. Before you extrapolate: yes, the
 issues you pointed out have an impact in backward use of GL, but
 nonetheless I addressed them, and I can tell you it does help a bit
 with lockups.

 I have no doubt that it helps with your lockups and I also have no
 doubt that the piece of code that helps can be bisected. I have
 mentioned 7 changes in the patch which are questionable, so the
 bisection should ideally take 3 steps. After we find the change which
 helps (and document it), we can discard the rest. That should give us
 the same stability as this patch does, but without unnecessary code
 which does cost GPU cycles (regardless of whether it is measurable on
 a particular machine or not).

 By the way, in draw_vbo, the emit functions should be called after
 r600_need_cs_space. Otherwise the command stream may overflow.

 Marek

 Again, I haven't found any combination that helps more than the full
 patch does. So be my guest: bisect on rv610, rv635, rv670, rv710,
 rv740, rv770.

 So your patch doesn't fix any issue with evergreen? That's great.
 Thanks for keeping that to yourself. It's always a pleasure working
 with you. :) Now that we know the truth, the questionable changes to
 the evergreen code can be discarded freely.

No, it helps on evergreen too; redwood, juniper, turks and barts are the
only ones I tested with. Evergreen is in a slightly better position, but
when it comes to lockups there are no good metrics.

 Concerning older chipsets, I can do the bisection only on rs880, rv670
 and rv730. That will have to suffice. One way or another, every single
 change must be done for a *reason* and that reason should be
 documented if it's not obvious. Please give me all the necessary
 information, so that I can start bisecting. That is what lockups your
 patch fixes and where (name apps or tests, a specific place in a game,
 etc.) on what chipsets and whether hyperz is enabled.

Sorry, no such thing. It just helps: pick something, test with and
without, and you will see that with it things lock up less often. I did
not make any of the changes in isolation to fix a single case; it's just
that with all the changes it helps. But of course you assume that I am
dumb, that I spent no time testing, and that I just put together some
random things.

 It is very likely that all the changes I questioned in my first email
 do not make any difference with regard to lockups, because there are
 also other changes in your patch which may help too and which I fully
 agree with.

 Marek

As I said, it's a package deal: I did not find a solution, but I did
find something that improved things overall.

Cheers,
Jerome
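
The draw_vbo remark quoted above (call the emit functions only after
r600_need_cs_space) is a reserve-then-emit rule. A minimal sketch of that
ordering follows; the struct, the buffer size and every function name are
hypothetical stand-ins, since the real r600_need_cs_space signature and
behavior are not shown in this thread:

```c
#include <assert.h>
#include <stdint.h>

#define CS_MAX_DW 16            /* tiny hypothetical command buffer */

struct cmd_stream {
    uint32_t buf[CS_MAX_DW];
    unsigned cdw;               /* dwords currently used */
    unsigned flushes;           /* how many times we had to submit */
};

/* Submit the buffer and start over (stand-in for a real flush). */
static void cs_flush(struct cmd_stream *cs)
{
    cs->cdw = 0;
    cs->flushes++;
}

/* Stand-in for r600_need_cs_space: make room for ndw dwords up front. */
static void cs_need_space(struct cmd_stream *cs, unsigned ndw)
{
    assert(ndw <= CS_MAX_DW);
    if (cs->cdw + ndw > CS_MAX_DW)
        cs_flush(cs);
}

/* Emit one dword; only safe once space has been reserved. */
static void cs_emit(struct cmd_stream *cs, uint32_t dw)
{
    assert(cs->cdw < CS_MAX_DW);    /* would be an overflow otherwise */
    cs->buf[cs->cdw++] = dw;
}

/* Reserve space for state AND draw packets first, then emit both. */
static void draw(struct cmd_stream *cs, unsigned state_dw, unsigned draw_dw)
{
    unsigned i;

    cs_need_space(cs, state_dw + draw_dw);
    for (i = 0; i < state_dw; i++)
        cs_emit(cs, 0);             /* placeholder state dwords */
    for (i = 0; i < draw_dw; i++)
        cs_emit(cs, 0);             /* placeholder draw dwords */
}
```

If the state dwords were emitted before the reservation, a nearly full
buffer could either overflow or be flushed between the state and the draw
that depends on it, which is the failure the remark warns about.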


Re: [Mesa-dev] [PATCH 2/2] r600g: simplify and fix flushing and synchronization v2

2012-07-23 Thread Jerome Glisse
On Mon, Jul 23, 2012 at 5:28 PM, Marek Olšák mar...@gmail.com wrote:
 On Mon, Jul 23, 2012 at 4:25 PM, Jerome Glisse j.gli...@gmail.com wrote:
 On Sun, Jul 22, 2012 at 8:58 PM, Marek Olšák mar...@gmail.com wrote:
 On Fri, Jul 20, 2012 at 4:54 AM, Jerome Glisse j.gli...@gmail.com wrote:
 On Thu, Jul 19, 2012 at 10:25 PM, Marek Olšák mar...@gmail.com wrote:
 On Fri, Jul 20, 2012 at 4:17 AM, Jerome Glisse j.gli...@gmail.com wrote:
 On Thu, Jul 19, 2012 at 10:06 PM, Marek Olšák mar...@gmail.com wrote:
 I actually care a lot about lockups. Well, you are complaining about
 lockups, yet you have quite obvious bugs in your hyperz code, so let's
 fix them first. (I wouldn't even try and run the hyperz code in its
 current state. Please don't take that personally.) Then, if the
 lockups persist, we can start looking into *what* fixes them. You seem
 to think that this patch helps a lot, but you don't say why. Aren't
 you interested in what sequence of GPU commands helps? If I am
 counting correctly, there are 7 changes in behavior in this patch. It
 should be pretty easy to nail down the few that help, document them
 (like /* these two lines fix a lockup with hyperz */), and discard the
 rest. The documenting part is very important, so that the other
 developers won't break your code accidentally.

 Marek


 You haven't even tried hyperz and you say I have an obvious bug; that's
 kind of funny, but you would not know why. I tried pretty much all of

 Oh come on, I already told you about all the bugs I found in the
 hyperz patch. You now know them too, and so does everybody else
 reading mesa-dev.

 Marek


 None of the issues you pointed out showed up in piglit, and none of them
 had an impact on things like openarena, nexuiz, doomIII, lightmark, ...
 so no, the issues you pointed out do not cripple the hyperz patch; it's
 working quite well for many things. Before you extrapolate: yes, the
 issues you pointed out have an impact in backward use of GL, but
 nonetheless I addressed them, and I can tell you it does help a bit
 with lockups.

 I have no doubt that it helps with your lockups and I also have no
 doubt that the piece of code that helps can be bisected. I have
 mentioned 7 changes in the patch which are questionable, so the
 bisection should ideally take 3 steps. After we find the change which
 helps (and document it), we can discard the rest. That should give us
 the same stability as this patch does, but without unnecessary code
 which does cost GPU cycles (regardless of whether it is measurable on
 a particular machine or not).

 By the way, in draw_vbo, the emit functions should be called after
 r600_need_cs_space. Otherwise the command stream may overflow.

 Marek

 Again, I haven't found any combination that helps more than the full
 patch does. So be my guest: bisect on rv610, rv635, rv670, rv710,
 rv740, rv770.

 So your patch doesn't fix any issue with evergreen? That's great.
 Thanks for keeping that to yourself. It's always a pleasure working
 with you. :) Now that we know the truth, the questionable changes to
 the evergreen code can be discarded freely.

As usual you make the worst assumption about me.

Cheers,
Jerome

 Concerning older chipsets, I can do the bisection only on rs880, rv670
 and rv730. That will have to suffice. One way or another, every single
 change must be done for a *reason* and that reason should be
 documented if it's not obvious. Please give me all the necessary
 information, so that I can start bisecting. That is what lockups your
 patch fixes and where (name apps or tests, a specific place in a game,
 etc.) on what chipsets and whether hyperz is enabled.

 It is very likely that all the changes I questioned in my first email
 do not make any difference with regard to lockups, because there are
 also other changes in your patch which may help too and which I fully
 agree with.

 Marek


Re: [Mesa-dev] [PATCH 2/2] r600g: simplify and fix flushing and synchronization v2

2012-07-19 Thread Jerome Glisse
On Thu, Jul 19, 2012 at 9:00 PM, Marek Olšák mar...@gmail.com wrote:
 On Fri, Jul 20, 2012 at 1:34 AM, Jerome Glisse j.gli...@gmail.com wrote:
 On Thu, Jul 19, 2012 at 6:07 PM, Marek Olšák mar...@gmail.com wrote:
 I have these issues with the patch:

 1) On GPUs without a vertex cache, you flush the texture cache every
 draw operation. Are you kidding?

 Show me one app with perf regression due to that ? Or just go look at
 what fglrx is doing.

 I don't believe that fglrx unconditionally emits SURFACE_SYNC with
 TC_ACTION_ENA before every DRAW packet. I just don't buy that. It's
 too stupid to be true. And considering that it wasn't needed before,
 it's not needed now either. Please give me some other argument than
 just fglrx.

No, fglrx doesn't set it for each draw; fglrx sets it if any of a bunch
of regs is touched. Given that right now we pretty much always touch one
of those regs between draws, it just turns out that my patch triggers
the flush between each draw.


 2) All colorbuffers / streamout buffers are flushed, even those which
 are not enabled. E.g. instead of flushing only CB0 when there is only
 one, this code flushes all of them. Why? This either needs an
 explanation or it should only flush the buffers which are enabled
 (like the old code did).

 fglrx + no perf regression ...

 The no perf regression argument doesn't apply here, because it just
 might not be the bottleneck now. I'm willing to step aside from this
 one issue though.

I am just trying to stick to the fglrx pattern.



 3) Please explain:
 - why you added PS_PARTIAL_FLUSH in r600_texture_barrier and
 r600_set_framebuffer_state.

 fglrx is doing something similar

 But not exactly the same thing, right? So there's no reason for it to be 
 there.

It's hard to do as fglrx does, as the pattern is evading me: no matter
how many different app command streams I look at, I always find an
exception to the rule I am formulating.


 - why you added CACHE_FLUSH_AND_INV_EVENT in set_framebuffer_state for
 R700 and evergreen.

 fglrx ...

 - why you applied the CB flush workarounds meant for RV6xx to all R600
 and R700 chipsets.

 fglrx ...

 - why the streamout workaround for RV6xx (S_0085F0_DEST_BASE_0_ENA) is
 applied to all R600, R700, and evergreen chipsets.

 didn't hurt, though fglrx is not using that at all, but I did not
 want to remove it

 Well, you didn't remove it. You added it for those other chipsets.
 That's a difference. You don't even know what you did there, do you?
 :) All the things I mentioned are either half-assed or added for no
 reason. Fglrx might do all sorts of stupid things or for its own
 reasons, but that doesn't mean it's automatically good for us. Besides
 that, it's almost impossible to figure out why a CS was built up
 exactly the way it was without access to the driver code and to its 
 developers.

Oh yeah, I don't have a fucking clue, I am fucking clueless, I am just a
fool who writes fucking random lines of code and has no fucking idea
of what I am doing. Of course you know better, please enlighten me.

I am totally on board with fglrx doing stupid things, and yet it does
not lock up ... so one of those stupid things is important, and until
someone figures out which one, I would rather do more stupid things and
not lock up than try to pretend that flushing is a bottleneck in the
driver right now.


 - why R600_CONTEXT_FLUSH_AND_INV emits SURFACE_SYNC on evergreen,
 resulting in emission of SURFACE_SYNC twice in a row in most
 situations.

 fglrx is doing that and without that lockup ...

 Hm, now you're talking. So do you need:

 FLUSH_AND_INV +
 SURFACE_SYNC (COHER_CNTL = ~0)

 or do you need:

 FLUSH_AND_INV +
 SURFACE_SYNC (COHER_CNTL = ~0) +
 SURFACE_SYNC (COHER_CNTL = according to flags)

 for it not to lock up?

Flush inv is always followed by a surface sync, with a few exceptions (on
which I am not clear), but there is always a surface sync before a draw
after a flush inv.
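
The observation above can be stated as an invariant on the packet order:
once a flush-and-invalidate has been emitted, a surface sync must appear
before the next draw. A minimal sketch of a checker for that invariant
follows. The enum and function are hypothetical illustrations, not the
r600g packet encoding, and the sketch says nothing about which COHER_CNTL
value the sync needs:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical packet kinds, just enough to express the ordering rule. */
enum pkt {
    PKT_FLUSH_AND_INV,
    PKT_SURFACE_SYNC,
    PKT_DRAW,
    PKT_OTHER
};

/* Returns true if every draw that follows a flush-and-invalidate is
 * preceded by at least one surface sync emitted after that flush. */
static bool sync_before_draw(const enum pkt *seq, size_t n)
{
    bool sync_pending = false;
    size_t i;

    for (i = 0; i < n; i++) {
        switch (seq[i]) {
        case PKT_FLUSH_AND_INV:
            sync_pending = true;    /* a sync is now owed */
            break;
        case PKT_SURFACE_SYNC:
            sync_pending = false;   /* debt paid */
            break;
        case PKT_DRAW:
            if (sync_pending)
                return false;       /* draw reached without a sync */
            break;
        default:
            break;
        }
    }
    return true;
}
```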


 Flushing has always worked without all the changes (1, 2, 3) mentioned
 above, so please if you don't have a reasonable explanation, revert to
 the old behavior.

 Well if you have a better solution please show me ...

 I already showed you in the first reply. If you are unwilling to
 change your patches even a little bit, I'll happily take them over
 from you.

 Marek

Oh, I will change them, just not the way you like. I am trying to avoid
lockups; you obviously don't give a shit about that.

Cheers,
Jerome


Re: [Mesa-dev] [PATCH 2/2] r600g: simplify and fix flushing and synchronization v2

2012-07-19 Thread Jerome Glisse
On Thu, Jul 19, 2012 at 10:06 PM, Marek Olšák mar...@gmail.com wrote:
 I actually care a lot about lockups. Well, you are complaining about
 lockups, yet you have quite obvious bugs in your hyperz code, so let's
 fix them first. (I wouldn't even try and run the hyperz code in its
 current state. Please don't take that personally.) Then, if the
 lockups persist, we can start looking into *what* fixes them. You seem
 to think that this patch helps a lot, but you don't say why. Aren't
 you interested in what sequence of GPU commands helps? If I am
 counting correctly, there are 7 changes in behavior in this patch. It
 should be pretty easy to nail down the few that help, document them
 (like /* these two lines fix a lockup with hyperz */), and discard the
 rest. The documenting part is very important, so that the other
 developers won't break your code accidentally.

 Marek


You haven't even tried hyperz and you say I have an obvious bug; that's
kind of funny, but you would not know why. I tried pretty much all of
the things my patch does, in isolation and in combination with each
other, and the only way I got an improvement is with something similar
to this patch. Remove one thing and I can find programs that are more
likely to lock up.

Cheers,
Jerome


Re: [Mesa-dev] [PATCH 2/2] r600g: simplify and fix flushing and synchronization v2

2012-07-19 Thread Jerome Glisse
On Thu, Jul 19, 2012 at 10:25 PM, Marek Olšák mar...@gmail.com wrote:
 On Fri, Jul 20, 2012 at 4:17 AM, Jerome Glisse j.gli...@gmail.com wrote:
 On Thu, Jul 19, 2012 at 10:06 PM, Marek Olšák mar...@gmail.com wrote:
 I actually care a lot about lockups. Well, you are complaining about
 lockups, yet you have quite obvious bugs in your hyperz code, so let's
 fix them first. (I wouldn't even try and run the hyperz code in its
 current state. Please don't take that personally.) Then, if the
 lockups persist, we can start looking into *what* fixes them. You seem
 to think that this patch helps a lot, but you don't say why. Aren't
 you interested in what sequence of GPU commands helps? If I am
 counting correctly, there are 7 changes in behavior in this patch. It
 should be pretty easy to nail down the few that help, document them
 (like /* these two lines fix a lockup with hyperz */), and discard the
 rest. The documenting part is very important, so that the other
 developers won't break your code accidentally.

 Marek


 You haven't even tried hyperz and you say I have an obvious bug; that's
 kind of funny, but you would not know why. I tried pretty much all of

 Oh come on, I already told you about all the bugs I found in the
 hyperz patch. You now know them too, and so does everybody else
 reading mesa-dev.

 Marek


None of the issues you pointed out showed up in piglit, and none of them
had an impact on things like openarena, nexuiz, doomIII, lightmark, ...
so no, the issues you pointed out do not cripple the hyperz patch; it's
working quite well for many things. Before you extrapolate: yes, the
issues you pointed out have an impact in backward use of GL, but
nonetheless I addressed them, and I can tell you it does help a bit
with lockups.

Cheers,
Jerome


Re: [Mesa-dev] r600g: hyperz

2012-07-14 Thread Jerome Glisse
On Sat, Jul 14, 2012 at 9:56 AM, Alex Deucher alexdeuc...@gmail.com wrote:
 On Fri, Jul 13, 2012 at 8:11 PM, Jerome Glisse j.gli...@gmail.com wrote:
 On Fri, Jul 13, 2012 at 8:08 PM, Marek Olšák mar...@gmail.com wrote:
 Hi Jerome,

 I couldn't open the patch, because freedesktop.org doesn't seem to
 work for me today, it always times out.

 Anyway, non-working code shouldn't be merged into Mesa master, because
 it decreases the quality of the driver and is a pain to maintain. As
 I said in another email, merging non-working code on purpose is a
 very bad idea. Please don't do it.

 Marek

 Code works, no regressions, but if you enable hyperz, get ready to
 experience lockups; the likelihood depends on what you are doing.

 So no, I don't consider this non-working code. It does work and
 doesn't regress.

 Is it just 6xx/7xx that locks or also evergreen?  Also even if we
 don't turn on hyperz, it probably makes sense to always have an htile
 buffer bound as the htile cache (and backing htile buffer) is used for
 Z/S compression, culling, fast ops, etc. in addition to HiZ/S if a Z
 or S buffer is bound.

 Alex

Just enabling the htile surface is enough to trigger the lockup, thus we
can't bind the htile buffer. Quite frankly, I don't know how much of an
issue evergreen is; I pretty much stuck with r6xx/r7xx as they were
always locking up with my test case. Though I have been able to
lock up evergreen, I did have the feeling that it was a lot less
likely to happen.

Basically, to trigger the lockup you have to switch between a lot of
depth surfaces/htile surfaces; if you just have a single depth buffer
you will be fine. Thus most use cases will just work properly.

Cheers,
Jerome


Re: [Mesa-dev] r600g: hyperz

2012-07-13 Thread Jerome Glisse
On Fri, Jul 13, 2012 at 8:08 PM, Marek Olšák mar...@gmail.com wrote:
 Hi Jerome,

 I couldn't open the patch, because freedesktop.org doesn't seem to
 work for me today, it always times out.

 Anyway, non-working code shouldn't be merged into Mesa master, because
 it decreases the quality of the driver and is a pain to maintain. As
 I said in another email, merging non-working code on purpose is a
 very bad idea. Please don't do it.

 Marek

Code works, no regressions, but if you enable hyperz, get ready to
experience lockups; the likelihood depends on what you are doing.

So no, I don't consider this non-working code. It does work and
doesn't regress.

Cheers,
Jerome


Re: [Mesa-dev] [PATCH] r600g: improve flushed depth texture handling v2

2012-07-10 Thread Jerome Glisse
On Tue, Jul 10, 2012 at 2:10 PM, Marek Olšák mar...@gmail.com wrote:
 On Tue, Jul 10, 2012 at 6:40 AM, Vadim Girlin vadimgir...@gmail.com wrote:
 On Sat, 2012-07-07 at 01:48 +0200, Marek Olšák wrote:
 On Wed, Jun 27, 2012 at 1:34 AM, Vadim Girlin vadimgir...@gmail.com wrote:
  Use r600_resource_texture::flushed_depth_texture for GPU access, and
  allocate it in the VRAM. For transfers we'll allocate untiled texture in 
  the
  GTT and store it in the r600_transfer::staging.
 
  Improves performance when flushed depth texture is frequently used by the
  GPU (about 30% for Lightsmark).
 
  Signed-off-by: Vadim Girlin vadimgir...@gmail.com
  ---
 
  Fixes fbo-clear-formats, fbo-generatemipmap-formats, no regressions on
  evergreen

 Hi,

 is there any reason this patch hasn't been committed yet?


 Hi,

 I have some doubts because it was benchmarked by phoronix and there were
 regressions, though I suspect that something is wrong with the results:

 http://www.phoronix.com/scan.php?page=article&item=amd_r600g_texdepth&num=4

 I was going to look into it but had no time yet. I'd like to be sure
 that there are no regressions before committing.

 Well, there's nothing wrong with your patch. I wouldn't trust
 benchmarks run with the Unity desktop so much. I myself had to switch
 from Unity 2D to Xfce just to get consistent results when testing
 performance.

 Now that your patch separates flushing for texturing and transfers, I
 think we could make it a little bit faster by implementing an in-place
 flush for texturing (that is without having to allocate another
 resource).

 Marek

In-place flushes are useful for the case where you know you won't reuse
the depth buffer as a depth buffer, or if you know the next operation
will be a glClear on the depth buffer. What I am worried about is that
recompression might not work in place; for it to work you need to have
the db decompressed into db tiling format and not cb tiling format.

Cheers,
Jerome


Re: [Mesa-dev] [PATCH] r600g: improve flushed depth texture handling v2

2012-07-10 Thread Jerome Glisse
On Tue, Jul 10, 2012 at 5:16 PM, Marek Olšák mar...@gmail.com wrote:
 On Tue, Jul 10, 2012 at 10:00 PM, Jerome Glisse j.gli...@gmail.com wrote:
 On Tue, Jul 10, 2012 at 2:10 PM, Marek Olšák mar...@gmail.com wrote:
 On Tue, Jul 10, 2012 at 6:40 AM, Vadim Girlin vadimgir...@gmail.com wrote:
 On Sat, 2012-07-07 at 01:48 +0200, Marek Olšák wrote:
 On Wed, Jun 27, 2012 at 1:34 AM, Vadim Girlin vadimgir...@gmail.com 
 wrote:
  Use r600_resource_texture::flushed_depth_texture for GPU access, and
  allocate it in the VRAM. For transfers we'll allocate untiled texture 
  in the
  GTT and store it in the r600_transfer::staging.
 
  Improves performance when flushed depth texture is frequently used by 
  the
  GPU (about 30% for Lightsmark).
 
  Signed-off-by: Vadim Girlin vadimgir...@gmail.com
  ---
 
  Fixes fbo-clear-formats, fbo-generatemipmap-formats, no regressions on
  evergreen

 Hi,

 is there any reason this patch hasn't been committed yet?


 Hi,

 I have some doubts because it was benchmarked by phoronix and there were
 regressions, though I suspect that something is wrong with the results:

 http://www.phoronix.com/scan.php?page=article&item=amd_r600g_texdepth&num=4

 I was going to look into it but had no time yet. I'd like to be sure
 that there are no regressions before committing.

 Well, there's nothing wrong with your patch. I wouldn't trust
 benchmarks run with the Unity desktop so much. I myself had to switch
 from Unity 2D to Xfce just to get consistent results when testing
 performance.

 Now that your patch separates flushing for texturing and transfers, I
 think we could make it a little bit faster by implementing an in-place
 flush for texturing (that is without having to allocate another
 resource).

 Marek

 In-place flushes are useful for the case where you know you won't reuse
 the depth buffer as a depth buffer, or if you know the next operation
 will be a glClear on the depth buffer. What I am worried about is that
 recompression might not work in place; for it to work you need to have
 the db decompressed into db tiling format and not cb tiling format.

 The case where the depth is not reused is the most common one. It
 might even be the only one in practice. Depth textures are most
 commonly used for shadow mapping, which is the not-reusing case. They
 can also be used to implement deferred rendering (though that's not
 very common), which means the same as shadow mapping for us. Actually,
 no graphics algorithm comes to mind that would do write-texture-write
 with the same depth buffer.

 Marek

I am not saying it's not the most common one; I am saying that
recompressing might be more complex (recompress to a different buffer
then copy back to the original buffer, or copy the buffer and uncompress
from the copy).

Cheers,
Jerome


Re: [Mesa-dev] [PATCH 0/2] [RFC] r600g: improve handling of the shader exports

2012-06-26 Thread Jerome Glisse
On Tue, Jun 26, 2012 at 5:45 AM, Vadim Girlin vadimgir...@gmail.com wrote:
 On Fri, 2012-06-22 at 14:24 -0400, Jerome Glisse wrote:
 On Fri, Jun 22, 2012 at 10:02 AM, Vadim Girlin vadimgir...@gmail.com wrote:
   r600g: avoid unnecessary shader exports
   r600g: enable DUAL_EXPORT mode when possible
 
  First patch fixes the lockups with DUAL_EXPORT mode for me, also AFAICS it
  fixes some depth/stencil tests, though I'm not sure why, haven't looked
  into it (possibly unexpected color exports were written over the depth
  exports).
 
  Second patch enables DUAL_EXPORT mode when possible, giving about 40%
  improvement with the results of the fill demo (on juniper). Also it sets
  DB_SOURCE_FORMAT to the EXPORT_DB_TWO when in DUAL_EXPORT mode, though I'm 
  not sure yet if it has any effect on performance.
 
  I haven't tried to implement the same for pre-evergreen cards - I can't 
  test it
  anyway without r600 hw, but I guess it shouldn't be hard. AFAIK there will 
  be
  additional requirements for DUAL_EXPORT mode for r6xx (it's documented in 
  the
  R6xx_3D_Registers.pdf).
 
  There are no regressions with piglit on evergreen (juniper).

 r6xx/r7xx version WIP not working (well not improving perf)
 http://people.freedesktop.org/~glisse/0003-r600g-enable-DUAL_EXPORT-mode-when-possible-on-r6xx-.patch

 AFAIK you've fixed that already, do you have any regressions with dual
 export on r6xx/7xx? There are some issues reported on rv770 with patch 1
 - http://lists.freedesktop.org/archives/mesa-dev/2012-June/023229.html

 Vadim


Yeah, I have updated patches here that fix the regression. Will send
them shortly once I am confident.

Cheers,
Jerome


Re: [Mesa-dev] [PATCH 1/2] r600g: avoid unnecessary shader exports

2012-06-22 Thread Jerome Glisse
On Fri, Jun 22, 2012 at 10:02 AM, Vadim Girlin vadimgir...@gmail.com wrote:
 In some cases TGSI shader has more color outputs than the number of CBs,
 so it seems we need to limit the number of color exports. This requires
 different shader variants depending on the nr_cbufs, but on the other hand
 we are doing less exports, which are very costly.

 Signed-off-by: Vadim Girlin vadimgir...@gmail.com
Reviewed-by: Jerome Glisse jgli...@redhat.com
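
The rule described in the commit message above (never export more colors
than there are bound color buffers) amounts to a clamp. A minimal sketch,
assuming a hypothetical helper name and ignoring the fs_write_all case
that the patch handles separately:

```c
/* Number of pixel-shader color exports to emit.
 * shader_color_outputs: color outputs declared by the TGSI shader.
 * nr_cbufs: color buffers currently bound.
 * Hypothetical helper for illustration; not a function from the patch. */
static unsigned color_export_count(unsigned shader_color_outputs,
                                   unsigned nr_cbufs)
{
    return shader_color_outputs < nr_cbufs ? shader_color_outputs
                                           : nr_cbufs;
}
```

Because the result depends on nr_cbufs, the same TGSI shader can need a
different compiled variant per framebuffer configuration, which is the
trade-off the commit message mentions.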

 ---
  src/gallium/drivers/r600/evergreen_state.c   |   10 +++---
  src/gallium/drivers/r600/r600_shader.c       |   25 ++---
  src/gallium/drivers/r600/r600_shader.h       |    7 ++-
  src/gallium/drivers/r600/r600_state_common.c |    4 ++--
  4 files changed, 33 insertions(+), 13 deletions(-)

 diff --git a/src/gallium/drivers/r600/evergreen_state.c 
 b/src/gallium/drivers/r600/evergreen_state.c
 index b618ca8..3fe95e1 100644
 --- a/src/gallium/drivers/r600/evergreen_state.c
 +++ b/src/gallium/drivers/r600/evergreen_state.c
 @@ -2641,18 +2641,14 @@ void evergreen_pipe_shader_ps(struct pipe_context 
 *ctx, struct r600_pipe_shader
                db_shader_control |= S_02880C_KILL_ENABLE(1);

        exports_ps = 0;
 -       num_cout = 0;
         for (i = 0; i < rshader->noutput; i++) {
                 if (rshader->output[i].name == TGSI_SEMANTIC_POSITION ||
                     rshader->output[i].name == TGSI_SEMANTIC_STENCIL)
                         exports_ps |= 1;
 -               else if (rshader->output[i].name == TGSI_SEMANTIC_COLOR) {
 -                       if (rshader->fs_write_all)
 -                               num_cout = rshader->nr_cbufs;
 -                       else
 -                               num_cout++;
 -               }
        }
 +
 +       num_cout = rshader->nr_ps_color_exports;
 +
        exports_ps |= S_02884C_EXPORT_COLORS(num_cout);
        if (!exports_ps) {
                /* always at least export 1 component per pixel */
 diff --git a/src/gallium/drivers/r600/r600_shader.c 
 b/src/gallium/drivers/r600/r600_shader.c
 index 63b9a03..782113b 100644
 --- a/src/gallium/drivers/r600/r600_shader.c
 +++ b/src/gallium/drivers/r600/r600_shader.c
 @@ -801,6 +801,12 @@ static int tgsi_declaration(struct r600_shader_ctx *ctx)
                                 ctx->cv_output = i;
                                 break;
                         }
 +               } else if (ctx->type == TGSI_PROCESSOR_FRAGMENT) {
 +                       switch (d->Semantic.Name) {
 +                       case TGSI_SEMANTIC_COLOR:
 +                               ctx->shader->nr_ps_max_color_exports++;
 +                               break;
 +                       }
                }
                break;
        case TGSI_FILE_CONSTANT:
 @@ -1153,8 +1159,10 @@ static int r600_shader_from_tgsi(struct r600_context * 
 rctx, struct r600_pipe_sh
        ctx.colors_used = 0;
        ctx.clip_vertex_write = 0;

 +       shader->nr_ps_color_exports = 0;
 +       shader->nr_ps_max_color_exports = 0;
 +
         shader->two_side = (ctx.type == TGSI_PROCESSOR_FRAGMENT) && 
 rctx->two_side;
 -       shader->nr_cbufs = rctx->nr_cbufs;

        /* register allocations */
        /* Values [0,127] correspond to GPR[0..127].
 @@ -1289,6 +1297,9 @@ static int r600_shader_from_tgsi(struct r600_context * 
 rctx, struct r600_pipe_sh
                }
        }

 +       if (shader->fs_write_all && rctx->chip_class >= EVERGREEN)
 +               shader->nr_ps_max_color_exports = 8;
 +
        if (ctx.fragcoord_input >= 0) {
                if (ctx.bc->chip_class == CAYMAN) {
                        for (j = 0 ; j < 4; j++) {
 @@ -1528,10 +1539,17 @@ static int r600_shader_from_tgsi(struct r600_context 
 * rctx, struct r600_pipe_sh
                        break;
                case TGSI_PROCESSOR_FRAGMENT:
                        if (shader->output[i].name == TGSI_SEMANTIC_COLOR) {
 +                               /* never export more colors than the number 
 of CBs */
 +                               if (next_pixel_base >= rctx->nr_cbufs) {
 +                                       /* skip export */
 +                                       j--;
 +                                       continue;
 +                               }
                                output[j].array_base = next_pixel_base++;
                                output[j].type = 
 V_SQ_CF_ALLOC_EXPORT_WORD0_SQ_EXPORT_PIXEL;
 +                               shader->nr_ps_color_exports++;
                                if (shader->fs_write_all && (rctx->chip_class 
 >= EVERGREEN)) {
 -                                       for (k = 1; k < shader->nr_cbufs; 
 k++) {
 +                                       for (k = 1; k < rctx->nr_cbufs; k++) {
                                                j++;
                                                memset(&output[j], 0, 
 sizeof(struct r600_bytecode_output));
                                                output[j].gpr = 
 shader->output[i].gpr;
 @@ -1545,6 +1563,7 @@ static int

Re: [Mesa-dev] [PATCH 2/2] r600g: enable DUAL_EXPORT mode when possible

2012-06-22 Thread Jerome Glisse
On Fri, Jun 22, 2012 at 10:02 AM, Vadim Girlin vadimgir...@gmail.com wrote:
 It seems DUAL_EXPORT on evergreen may be enabled when all CBs use 16-bit 
 export
 mode (EXPORT_4C_16BPC), also there should be at least one CB, and the PS
 shouldn't export depth/stencil.
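
 The three conditions listed above can be sketched as a standalone predicate.
 This is only an illustration under stated assumptions, not the driver code:
 the struct fb_export_state and the function can_dual_export are hypothetical
 names; the field names simply mirror the ones the patch uses (export_16bpc,
 nr_cbufs, ps_depth_export).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical container for the state the decision depends on. */
struct fb_export_state {
    bool export_16bpc;     /* true if every bound CB uses EXPORT_4C_16BPC */
    unsigned nr_cbufs;     /* number of bound color buffers */
    bool ps_depth_export;  /* true if the PS exports depth or stencil */
};

/* Returns true when DUAL_EXPORT may be enabled: all CBs in 16-bit export
 * mode, at least one CB bound, and no depth/stencil export from the PS. */
static bool can_dual_export(const struct fb_export_state *s)
{
    assert(s != NULL);
    return s->export_16bpc && s->nr_cbufs > 0 && !s->ps_depth_export;
}
```

 A caller would recompute this whenever the framebuffer state or the bound
 pixel shader changes, since either one can flip the result.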

 Signed-off-by: Vadim Girlin vadimgir...@gmail.com
Reviewed-by: Jerome Glisse jgli...@redhat.com
 ---
  src/gallium/drivers/r600/evergreen_state.c   |   46 
 ++
  src/gallium/drivers/r600/evergreend.h        |    7 
  src/gallium/drivers/r600/r600_pipe.h         |    5 +++
  src/gallium/drivers/r600/r600_state_common.c |    3 ++
  4 files changed, 55 insertions(+), 6 deletions(-)

 diff --git a/src/gallium/drivers/r600/evergreen_state.c 
 b/src/gallium/drivers/r600/evergreen_state.c
 index 3fe95e1..bddb67e 100644
 --- a/src/gallium/drivers/r600/evergreen_state.c
 +++ b/src/gallium/drivers/r600/evergreen_state.c
 @@ -1458,7 +1458,6 @@ static void evergreen_cb(struct r600_context *rctx, 
 struct r600_pipe_state *rsta
             (desc->channel[i].size < 17 &&
              desc->channel[i].type == UTIL_FORMAT_TYPE_FLOAT))) {
                color_info |= S_028C70_SOURCE_FORMAT(V_028C70_EXPORT_4C_16BPC);
 -               rctx->export_16bpc = true;
        } else {
                rctx->export_16bpc = false;
        }
 @@ -1661,6 +1660,7 @@ static void evergreen_set_framebuffer_state(struct 
 pipe_context *ctx,
        struct r600_context *rctx = (struct r600_context *)ctx;
        struct r600_pipe_state *rstate = CALLOC_STRUCT(r600_pipe_state);
        uint32_t tl, br;
 +       int i;

        if (rstate == NULL)
                return;
 @@ -1674,10 +1674,16 @@ static void evergreen_set_framebuffer_state(struct 
 pipe_context *ctx,

        /* build states */
        rctx->have_depth_fb = 0;
 +       rctx->export_16bpc = true;
        rctx->nr_cbufs = state->nr_cbufs;
 -       for (int i = 0; i < state->nr_cbufs; i++) {
 +       for (i = 0; i < state->nr_cbufs; i++) {
                evergreen_cb(rctx, rstate, state, i);
        }
 +
 +       for (; i < 8 ; i++) {
 +               r600_pipe_state_add_reg(rstate, R_028C70_CB_COLOR0_INFO + i * 
 0x3C, 0);
 +       }
 +
        if (state->zsbuf) {
                evergreen_db(rctx, rstate, state);
        }
 @@ -2585,6 +2591,7 @@ void evergreen_pipe_shader_ps(struct pipe_context *ctx, 
 struct r600_pipe_shader
        int ninterp = 0;
        boolean have_linear = FALSE, have_centroid = FALSE, have_perspective = 
 FALSE;
        unsigned spi_baryc_cntl, sid, tmp, idx = 0;
 +       unsigned z_export = 0, stencil_export = 0;

        rstate->nregs = 0;

 @@ -2633,13 +2640,16 @@ void evergreen_pipe_shader_ps(struct pipe_context 
 *ctx, struct r600_pipe_shader

        for (i = 0; i < rshader->noutput; i++) {
                if (rshader->output[i].name == TGSI_SEMANTIC_POSITION)
 -                       db_shader_control |= S_02880C_Z_EXPORT_ENABLE(1);
 +                       z_export = 1;
                if (rshader->output[i].name == TGSI_SEMANTIC_STENCIL)
 -                       db_shader_control |= 
 S_02880C_STENCIL_EXPORT_ENABLE(1);
 +                       stencil_export = 1;
        }
        if (rshader->uses_kill)
                db_shader_control |= S_02880C_KILL_ENABLE(1);

 +       db_shader_control |= S_02880C_Z_EXPORT_ENABLE(z_export);
 +       db_shader_control |= S_02880C_STENCIL_EXPORT_ENABLE(stencil_export);
 +
        exports_ps = 0;
        for (i = 0; i < rshader->noutput; i++) {
                if (rshader->output[i].name == TGSI_SEMANTIC_POSITION ||
 @@ -2711,8 +2721,9 @@ void evergreen_pipe_shader_ps(struct pipe_context *ctx, 
 struct r600_pipe_shader
        r600_pipe_state_add_reg(rstate,
                                R_02884C_SQ_PGM_EXPORTS_PS,
                                exports_ps);
 -       r600_pipe_state_add_reg(rstate, R_02880C_DB_SHADER_CONTROL,
 -                               db_shader_control);
 +
 +       shader->db_shader_control = db_shader_control;
 +       shader->ps_depth_export = z_export | stencil_export;

        shader->sprite_coord_enable = rctx->sprite_coord_enable;
        if (rctx->rasterizer)
 @@ -2798,3 +2809,26 @@ void *evergreen_create_db_flush_dsa(struct 
 r600_context *rctx)
        /* Don't set the 'is_flush' flag in r600_pipe_dsa, evergreen doesn't 
 need it. */
        return rstate;
  }
 +
 +void evergreen_update_dual_export_state(struct r600_context * rctx)
 +{
 +       unsigned dual_export = rctx->export_16bpc && rctx->nr_cbufs && 
 +                       !rctx->ps_shader->ps_depth_export;
 +
 +       unsigned db_source_format = dual_export ? V_02880C_EXPORT_DB_TWO :
 +                       V_02880C_EXPORT_DB_FULL;
 +
 +       unsigned db_shader_control = rctx->ps_shader->db_shader_control |
 +                       S_02880C_DUAL_EXPORT_ENABLE(dual_export) |
 +                       S_02880C_DB_SOURCE_FORMAT(db_source_format);
 +
 +       if (db_shader_control != rctx->db_shader_control) {
 +               struct r600_pipe_state rstate

Re: [Mesa-dev] [PATCH 0/2] [RFC] r600g: improve handling of the shader exports

2012-06-22 Thread Jerome Glisse
On Fri, Jun 22, 2012 at 10:02 AM, Vadim Girlin vadimgir...@gmail.com wrote:
  r600g: avoid unnecessary shader exports
  r600g: enable DUAL_EXPORT mode when possible

 First patch fixes the lockups with DUAL_EXPORT mode for me, also AFAICS it
 fixes some depth/stencil tests, though I'm not sure why, haven't looked
 into it (possibly unexpected color exports were written over the depth
 exports).

 Second patch enables DUAL_EXPORT mode when possible, giving about 40%
 improvement with the results of the fill demo (on juniper). Also it sets
 DB_SOURCE_FORMAT to the EXPORT_DB_TWO when in DUAL_EXPORT mode, though I'm 
 not sure yet if it has any effect on performance.

 I haven't tried to implement the same for pre-evergreen cards - I can't test 
 it
 anyway without r600 hw, but I guess it shouldn't be hard. AFAIK there will be
 additional requirements for DUAL_EXPORT mode for r6xx (it's documented in the
 R6xx_3D_Registers.pdf).

 There are no regressions with piglit on evergreen (juniper).

A WIP r6xx/r7xx version, not working yet (well, not improving perf):
http://people.freedesktop.org/~glisse/0003-r600g-enable-DUAL_EXPORT-mode-when-possible-on-r6xx-.patch

Cheers,
Jerome


  src/gallium/drivers/r600/evergreen_state.c   |   56 
 --
  src/gallium/drivers/r600/evergreend.h        |    7 
  src/gallium/drivers/r600/r600_pipe.h         |    5 +++
  src/gallium/drivers/r600/r600_shader.c       |   25 ++--
  src/gallium/drivers/r600/r600_shader.h       |    7 +++-
  src/gallium/drivers/r600/r600_state_common.c |    7 +++-
  6 files changed, 88 insertions(+), 19 deletions(-)

 --
 1.7.10.4

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] r600g: Unify SURFACE_SYNC packet emission for 3D and compute

2012-06-19 Thread Jerome Glisse
On Tue, Jun 19, 2012 at 2:06 PM, Tom Stellard thomas.stell...@amd.com wrote:
 On Tue, Jun 19, 2012 at 07:57:50PM +0200, Marek Olšák wrote:
 Hi Tom,

 This adds new calls to r600_inval_xxx_cache, which just sets the
 dirty flag in the atom surface_sync_cmd to true, but I couldn't find
 where the compute code calls r600_emit_atom. The proper way to emit
 dirty atoms is in r600_state_common.c:843-845.


 The compute code is calling r600_flush_framebuffer() from
 compute_emit_cs, which is what calls r600_emit_atom() for
 surface_sync_cmd.

 -Tom

I am heavily refactoring all this for HyperZ, but I can rebase once I
have it working.

Cheers,
Jerome


Re: [Mesa-dev] [PATCH 1/4] mesa: Add support for GL_ARB_base_instance

2012-06-19 Thread Jerome Glisse
On Mon, Jun 18, 2012 at 8:33 PM, Fredrik Höglund fred...@kde.org wrote:
 On Tuesday 19 June 2012, Brian Paul wrote:
 On 06/18/2012 02:50 PM, Fredrik Höglund wrote:
  Reviewed-by: Brian Paul bri...@vmware.com
  ---
 
  v2: Change baseinstance to base_instance in _mesa_prims
       and to baseInstance in the vbo_exec functions.
 
    src/mapi/glapi/gen/ARB_base_instance.xml |   40 +++
    src/mapi/glapi/gen/Makefile              |    1 +
    src/mapi/glapi/gen/gl_API.xml            |    3 +-
    src/mesa/main/dd.h                       |   10 +++
    src/mesa/main/dlist.c                    |   45 
    src/mesa/main/extensions.c               |    1 +
    src/mesa/main/mtypes.h                   |    1 +
    src/mesa/main/vtxfmt.c                   |    3 +
    src/mesa/vbo/vbo.h                       |    1 +
    src/mesa/vbo/vbo_exec_api.c              |    1 +
    src/mesa/vbo/vbo_exec_array.c            |  114 
  +++---
    src/mesa/vbo/vbo_save_api.c              |    2 +
    src/mesa/vbo/vbo_split_inplace.c         |    6 +-
    13 files changed, 216 insertions(+), 12 deletions(-)
    create mode 100644 src/mapi/glapi/gen/ARB_base_instance.xml

 Looks good.  Do you need me to commit/push these for you?

 Yeah, I don't have commit access, so please do.

 Fredrik


This breaks the gallium drivers; nothing renders with it.

Cheers,
Jerome


Re: [Mesa-dev] [PATCH 1/4] mesa: Add support for GL_ARB_base_instance

2012-06-19 Thread Jerome Glisse
On Tue, Jun 19, 2012 at 4:46 PM, Jerome Glisse j.gli...@gmail.com wrote:
 On Mon, Jun 18, 2012 at 8:33 PM, Fredrik Höglund fred...@kde.org wrote:
 On Tuesday 19 June 2012, Brian Paul wrote:
 On 06/18/2012 02:50 PM, Fredrik Höglund wrote:
  Reviewed-by: Brian Paul bri...@vmware.com
  ---
 
  v2: Change baseinstance to base_instance in _mesa_prims
       and to baseInstance in the vbo_exec functions.
 
    src/mapi/glapi/gen/ARB_base_instance.xml |   40 +++
    src/mapi/glapi/gen/Makefile              |    1 +
    src/mapi/glapi/gen/gl_API.xml            |    3 +-
    src/mesa/main/dd.h                       |   10 +++
    src/mesa/main/dlist.c                    |   45 
    src/mesa/main/extensions.c               |    1 +
    src/mesa/main/mtypes.h                   |    1 +
    src/mesa/main/vtxfmt.c                   |    3 +
    src/mesa/vbo/vbo.h                       |    1 +
    src/mesa/vbo/vbo_exec_api.c              |    1 +
    src/mesa/vbo/vbo_exec_array.c            |  114 
  +++---
    src/mesa/vbo/vbo_save_api.c              |    2 +
    src/mesa/vbo/vbo_split_inplace.c         |    6 +-
    13 files changed, 216 insertions(+), 12 deletions(-)
    create mode 100644 src/mapi/glapi/gen/ARB_base_instance.xml

 Looks good.  Do you need me to commit/push these for you?

 Yeah, I don't have commit access, so please do.

 Fredrik


 This breaks the gallium drivers; nothing renders with it.

 Cheers,
 Jerome

Well, never mind: git clean -fdX did the trick. Sorry for the noise.

Cheers,
Jerome


Re: [Mesa-dev] Clarifications w.r.t MSAA

2012-06-12 Thread Jerome Glisse
On Tue, Jun 12, 2012 at 8:39 AM, Christoph Bumiller
e0425...@student.tuwien.ac.at wrote:
 On 06/12/2012 02:25 PM, Olivier Galibert wrote:
 On Tue, Jun 12, 2012 at 01:50:08PM +0200, Christoph Bumiller wrote:
 First question: how many depths should be computed, and for which
 coordinates? Which of these values is associated with which sample?

 One for each sample point. The depth buffer will be multisampled as well.
 Coverage sampling (CSAA) where you have extra coverage samples that do
 NOT (necessarily) correspond to color sample locations are not covered
 by the GL spec, it's vendor-specific.

 Ok.  So that means that if the shader writes z, you have to do full
 supersampling then.


 No, I don't think that's the case. You get per-sample depth values if
 you use fixed-pipe depth, but shader-computed depth should simply be
 replicated (to all samples covered by the shader invocation), like color
 outputs.

I don't think that's how it works; each sample will have its color and
depth value, whether fixed pipeline or not. When resolving the
MSAA surface, you only use the samples that cover the surface to compute
the average.

Anyway that's my understanding.

Cheers,
Jerome


Re: [Mesa-dev] Clarifications w.r.t MSAA

2012-06-12 Thread Jerome Glisse
On Tue, Jun 12, 2012 at 1:34 PM, Christoph Bumiller
e0425...@student.tuwien.ac.at wrote:
 On 06/12/2012 06:52 PM, Jerome Glisse wrote:
 On Tue, Jun 12, 2012 at 8:39 AM, Christoph Bumiller
 e0425...@student.tuwien.ac.at wrote:
 On 06/12/2012 02:25 PM, Olivier Galibert wrote:
 On Tue, Jun 12, 2012 at 01:50:08PM +0200, Christoph Bumiller wrote:
 First question: how many depths should be computed, and for which
 coordinates? Which of these values is associated with which sample?

 One for each sample point. The depth buffer will be multisampled as well.
 Coverage sampling (CSAA) where you have extra coverage samples that do
 NOT (necessarily) correspond to color sample locations are not covered
 by the GL spec, it's vendor-specific.

 Ok.  So that means that if the shader writes z, you have to do full
 supersampling then.


 No, I don't think that's the case. You get per-sample depth values if
 you use fixed-pipe depth, but shader-computed depth should simply be
 replicated (to all samples covered by the shader invocation), like color
 outputs.

 I don't think that's how it works; each sample will have its color and
 depth value, whether fixed pipeline or not. When resolving the

 Sorry, fixed-pipe was misleading, I meant the z-value from the
 rasterizer (which can be regarded as fixed functionality), not without
 (custom) shaders.

 If the shader is only invoked once for each fragment (i.e.
 MinSampleShading == 1), all the samples that belong to that fragment
 will share the same color and depth values.


So I think we agree, but according to the spec, MinSampleShading=1 means the
fragment shader is run once for each sample. The MinSampleShading value
(MIN_SAMPLE_SHADING_VALUE_ARB) is a fraction of the sample count, so if you
have an 8-sample surface and you set MinSampleShading to 0.5, you will get the
fragment shader invoked for 4 samples. Note that, according to the spec, an
implementation may ignore the fraction and only cover the case
MinSampleShading==1.
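
For reference, the arithmetic above can be sketched in a few lines of C. This
is a minimal illustration, assuming the ARB_sample_shading rule that a fragment
gets at least max(ceil(MIN_SAMPLE_SHADING_VALUE_ARB * samples), 1) unique sets
of sample values; the helper name min_shaded_samples is made up for this
example and is not part of any GL or Mesa API.

```c
#include <assert.h>

/* Minimum number of shaded samples per fragment for a given
 * MIN_SAMPLE_SHADING_VALUE_ARB fraction and surface sample count:
 * max(ceil(fraction * samples), 1). Hypothetical helper, illustration only. */
static unsigned min_shaded_samples(float min_sample_shading, unsigned samples)
{
    assert(samples > 0);
    assert(min_sample_shading >= 0.0f && min_sample_shading <= 1.0f);

    float x = min_sample_shading * (float)samples;
    unsigned n = (unsigned)x;   /* truncate toward zero */
    if ((float)n < x)           /* round up when there is a remainder */
        n++;
    return n > 1 ? n : 1;       /* never fewer than one invocation */
}
```

With an 8-sample surface, a fraction of 0.5 gives 4 and a fraction of 1.0 gives
8; a fraction of 0.0 still gives 1, which is the ordinary once-per-fragment
case.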

Cheers,
Jerome


Re: [Mesa-dev] Four questions about DRI1 drivers

2012-03-01 Thread Jerome Glisse
On Thu, 2012-03-01 at 13:56 -0600, Patrick Baggett wrote:
 Now I'm curious. Is it the case that every DRI1 driver could be a DRI2
 driver with enough effort? Not talking about emulating hardware
 features.
 
 
 Patrick

DRI2 imposes nothing on hw capabilities, so any hw can do DRI2, even hw
without a 3D engine (see virtual GEM, for instance).

Cheers,
Jerome




Re: [Mesa-dev] [PATCH 00/12] R600g: cleanups and rework of queries

2012-02-22 Thread Jerome Glisse
On Tue, Feb 21, 2012 at 7:55 PM, Marek Olšák mar...@gmail.com wrote:

 Hi everyone,

 Besides the cleanups, there are fixes for create_context fail paths and
 rework of queries. The rework is the most important, because it eliminates
 buffer_map calls (and therefore buffer_wait) in begin_query.

 There are no piglit regressions on Evergreen.

 Please review.


Reviewed.

Do you test with 2D tiling on or off?

Cheers,
Jerome


[Mesa-dev] [PATCH] r600g tiling final

2012-02-03 Thread Jerome Glisse
Hi,

So the tiling work is, I believe, done. I have run piglit across a wide range
of hw and sw combinations. The bottom line is that new mesa on top of either
an old kernel or an old ddx won't regress anything. New mesa on top of a proper
kernel will get you 2D tiling for textures and anything allocated by
mesa, and if you have a proper DDX with the option ColorTiling2D enabled you
will also get 2D tiling for the front buffer and the depth/stencil buffer.

For libdrm you need the latest master. I will do a libdrm release
on Monday. Afterward I will commit mesa and ddx with proper autoconf
voodoo to check for the new libdrm.

kernel patches :
http://people.freedesktop.org/~glisse/tiling/0001-drm-radeon-kms-add-support-for-streamout-v7.patch
http://people.freedesktop.org/~glisse/tiling/0001-drm-radeon-add-support-for-evergreen-ni-tiling-infor.patch

mesa patch:
http://people.freedesktop.org/~glisse/tiling/0001-r600g-add-support-for-common-surface-allocator-for-t.patch

ddx patch:
http://people.freedesktop.org/~glisse/tiling/0001-r600-evergreen-use-common-surface-allocator-for-tili.patch

Link to piglit test:
http://people.freedesktop.org/~glisse/tiling/cayman/changes.html
http://people.freedesktop.org/~glisse/tiling/cedar/changes.html
http://people.freedesktop.org/~glisse/tiling/redwood/changes.html
http://people.freedesktop.org/~glisse/tiling/juniper/changes.html
http://people.freedesktop.org/~glisse/tiling/fusion/changes.html
http://people.freedesktop.org/~glisse/tiling/rv770/changes.html
http://people.freedesktop.org/~glisse/tiling/rv710/changes.html
http://people.freedesktop.org/~glisse/tiling/rv635/changes.html
http://people.freedesktop.org/~glisse/tiling/rv610/changes.html

The first column (GPU name) is unpatched mesa, unpatched ddx, unpatched kernel.

The second column (surf0-ddx0) is patched mesa, patched ddx with 2D tiling
disabled and the new mesa code path disabled (basically a check that nothing
regresses in the old code path).

The third column is patched mesa, unpatched ddx, using the new mesa code path.
This checks that mesa on top of old userspace doesn't break anything.

The fourth column is patched mesa, patched ddx, unpatched kernel. This checks
that new mesa on top of an old kernel works properly.

The fifth column is everything patched, with 2D tiling enabled everywhere.


Note that a few tests just randomly switch between pass and fail (fbo-sys-blit*,
read-front, ...).

I also extensively tested the old userspace on top of the new kernel for
evergreen to make sure that the command checker doesn't regress anything. While
it does reject some command streams, those were wrong and never completed
successfully anyway, so this leads to no regression in piglit (basically the
second column).

Fusion doesn't have an unpatched-kernel run, as things keep locking up for me
with the unpatched kernel.

Cheers,
Jerome


Re: [Mesa-dev] [PATCH 00/15] R600g cleanup and rework of cache flushing

2012-01-30 Thread Jerome Glisse
On Mon, Jan 30, 2012 at 09:23:03PM +0100, Marek Olšák wrote:
 Hi everyone,
 
 This patch series is a follow-up to the previous one (Remove all uses of the 
 register mask). First, it cleans up some code and merges r600_context into 
 r600_pipe_context. The split of functionality between the two contexts made 
 absolutely no sense.
 
 Next, it adds a new mechanism for emitting states. It's largely inspired by 
 r300g and it's really simple, yet robust. (some people should seriously learn 
 what polymorphism means and how it's used to write software before even 
 writing drivers, because I feel like I am the only one making use of it in 
 r600g, which is really a shame /rant) It can be used to schedule *any* 
 commands for execution before the next draw operation, not just register 
 updates. We'll use that more often in the future. For now, it's only used for 
 cache flushes.
 
 Finally, this series completely reworks cache flushes. The problem with the 
 old code was that the flags last_flush and binding, which were stored in 
 resource structs, were possible causes of race conditions. Not only does this 
 new code fix that, it also simplifies the whole thing. The flushes are done 
 explicitly when states are changed according to this scheme:
 bind_shader -> r600_inval_shader_cache
 set_constant_buffer -> r600_inval_shader_cache
 bind_vertex_elements -> r600_inval_shader_cache (for the fetch shader)
 bind_vertex_buffers -> r600_inval_vertex_cache
 bind_sampler_views -> r600_inval_texture_cache
 set_framebuffer -> r600_flush_framebuffer
 flush -> r600_flush_framebuffer
 
 Besides that, SURFACE_SYNC is called at most once between draw operations and 
 flushes the whole memory range. The inval/flush functions only accumulate the 
 flush flags.
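
 The accumulate-then-emit-once scheme described above can be sketched as
 follows. This is a minimal illustration under stated assumptions, not the
 r600g implementation: the flag names, struct sync_atom, request_flush and
 emit_sync_before_draw are all hypothetical, and the "emit" step only counts
 packets instead of writing a SURFACE_SYNC packet to a command stream.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flush flags, one bit per cache. */
enum {
    FLUSH_SHADER_CACHE  = 1u << 0,
    FLUSH_VERTEX_CACHE  = 1u << 1,
    FLUSH_TEXTURE_CACHE = 1u << 2,
    FLUSH_FRAMEBUFFER   = 1u << 3
};

/* Hypothetical atom holding the pending flush state. */
struct sync_atom {
    uint32_t flush_flags;  /* flags accumulated since the last emit */
    bool dirty;            /* true if a sync must be emitted before drawing */
    unsigned emitted;      /* number of sync packets emitted so far */
};

/* State-change hooks only accumulate flags; nothing is emitted here. */
static void request_flush(struct sync_atom *a, uint32_t flags)
{
    assert(a != NULL);
    a->flush_flags |= flags;
    a->dirty = true;
}

/* Called once before each draw: emits at most one sync covering every
 * accumulated flag and returns the flags it covered (0 if nothing pending). */
static uint32_t emit_sync_before_draw(struct sync_atom *a)
{
    assert(a != NULL);
    if (!a->dirty)
        return 0;

    uint32_t flags = a->flush_flags;
    a->flush_flags = 0;
    a->dirty = false;
    a->emitted++;
    return flags;
}
```

 However many state changes happen between two draws, the draw path sees a
 single dirty atom and emits one sync with the union of the requested flags.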
 
 The rework also fixes flushes on RV670. The fbo-drawbuffers test no longer 
 causes issues. Flushing CB1_DEST_BASE was not enough, DEST_BASE_0 must be 
 flushed as well. This fixes 21 piglit tests on RV670. The flushing seems to 
 be fixed finally, but the piglit results are not yet up to par with RV730.
 
 All this code has been tested on RV670, RV730, and REDWOOD.
 

It makes no sense and it's over-engineered only if you forget the initial
design decision, which was made for a new kernel API that closely matched
what r600g had.

But I agree that against the cs ioctl this design is just painful.

Anyway, it looks good from a quick review.

Cheers,
Jerome


Re: [Mesa-dev] [RFC] r600-r800 2D tiling

2012-01-16 Thread Jerome Glisse
On Mon, Jan 16, 2012 at 12:08:17PM +, Simon Farnsworth wrote:
 (resending due to my inability to work my e-mail client - I neither cc'd
 Jerome, nor used the correct identity, so the original appears to be held in
 moderation).
 
 On Thursday 12 January 2012, Jerome Glisse j.gli...@gmail.com wrote:
  Hi,
  
  I don't cross post as i am pretty sure all interested people are reading
  this mailing-list.
  
  Attached is kernel, libdrm, ddx, mesa/r600g patches to enable 2D tiling
  on r600 to cayman. I haven't yet done a full regression testing but 2D
  tiling seems to work ok. I would like to get feedback on 2 things :
  
  - the kernel API
 
 I notice that you don't expose all the available Evergreen parameters to
 user control (TILE_SPLIT_BYTES, NUM_BANKS are both currently fixed by the
 kernel). Is this deliberate?
 
 It looks like it's leftovers from a previous attempt to force Evergreen's
 flexible 2D tiling to behave like R600's fixed-by-hardware 2D tiling.

I need to add tile split to the kernel API; num banks is not a surface parameter.
Well, it is, but it needs to be set to the same value as the global one. I think
it might only be useful in the multi-GPU case with different GPUs (but that's
just a wild guess).

 
  - using libdrm/radeon as common place for surface allocation
  
  The second question especially impacts the layering/abstraction between
  gallium and winsys, as it makes the libdrm/radeon_surface API a part of the
  winsys. The ddx doesn't need as much knowledge as mesa (pretty much the whole
  mipmap tree is pointless to the ddx). So does anyone have strong feelings
  about moving the whole mipmap tree computation to this common code?
 
 I'm in favour - it means that all the code relating to the details of how
 modern Radeons tile surfaces is in one place.
 
 I've looked at the API you introduce to handle this, and it should be very
 easy to port to a non-libdrm platform - the only element of the API that's
 currently tied to libdrm is radeon_surface_manager_new, so a new platform
 shouldn't struggle to adapt it.

I am in the process of reworking the API a bit, but it will be very close, and
only the surface manager creator will have drm-specific code.

 I do have one question; how are you intending to handle passing the tiling
 parameters from the DDX to Mesa for GLX_EXT_texture_from_pixmap? Right now,
 it works because the DDX uses the surface manager's defaults for tiling, as
 does Mesa; I would expect Mesa to read out the parameters as set in the
 kernel and use those.
 
 At a future date, I can envisage the DDX wanting to choose a different
 tiling layout for DRI2 buffers, or XComposite backing pixmaps (e.g. because
 someone's benchmarked it and found that choosing something beyond the bare
 minimum that meets constraints improves performance); it would be a shame if
 we can't do this because Mesa's not flexible enough.

We don't use dri2 to communicate tiling info; we go through the kernel for that.
So the ddx calls the set_tiling ioctl and mesa calls get_tiling. I haven't
hooked up the mesa side to extract the various eg values yet; right now it
works because both ddx and mesa use the same surface allocator params, so they
end up with the same values for the various eg fields. Again, I am working on
this. Hopefully it should be completely done this week.

Cheers,
Jerome


Re: [Mesa-dev] [RFC] r600-r800 2D tiling

2012-01-13 Thread Jerome Glisse
On Fri, Jan 13, 2012 at 11:59:28AM +0100, Michel Dänzer wrote:
 On Don, 2012-01-12 at 14:50 -0500, Jerome Glisse wrote: 
  
  Attached is kernel, libdrm, ddx, mesa/r600g patches to enable 2D tiling
  on r600 to cayman. I haven't yet done a full regression testing but 2D
  tiling seems to work ok. I would like to get feedback on 2 things :
  
  - the kernel API
  - using libdrm/radeon as common place for surface allocation
 
 I generally like the idea of centralizing this in libdrm_radeon.
 
 
  The second question especially impacts the layering/abstraction between
  gallium and winsys, as it makes the libdrm/radeon_surface API a part of the winsys.
 
 That's unfortunate, but then again the Radeon Gallium drivers have never
 been very clean in this regard. I guess the first one to want to use
 them on a non-DRM platform gets to clean that up. :)
 
 
  To test you need to set ColorTiling2D to true in your xorg.conf, plan
  is to get mesa 8.0 and newer with proper support for 2D tiling and
  in 1 year, to move ColorTiling2D default value from false to true.
  (assumption is that by then we could assume that someone with a working
  ddx would also have a supported mesa)
 
 Sounds good.
 
 Note that the Mesa and X driver changes need to either continue building
 and working with older libdrm_radeon, or bump the libdrm_radeon version
 requirement in configure.ac.

The plan is to release an updated libdrm before committing to mesa, at which
point I will try to dust off my configure.ac foo.

I updated the patches; they are now at:
http://people.freedesktop.org/~glisse/tiling/

For them to work you need the ddx option, and for mesa you need to set
R600_TILING=1 and R600_SURF=1. I will remove this once I am confident that
it works across various GPUs without regression.

Cheers,
Jerome


Re: [Mesa-dev] [PATCH 0/1] Delete i965g

2011-11-29 Thread Jerome Glisse
On Tue, Nov 29, 2011 at 10:12 AM, Jose Fonseca jfons...@vmware.com wrote:
 The bulk is there but there are a few places missing.

 I'll update those, do some sanity checks and commit.

 Jose

Is there a good reason to delete i965g? Maybe some people are interested in it.

Cheers,
Jerome


Re: [Mesa-dev] Why failover module are not used?

2011-09-23 Thread Jerome Glisse
On Fri, Sep 23, 2011 at 3:18 AM,  jaco...@viatech.com.cn wrote:
 Hi all,

    In our mesa code, there is a pipe driver named failover which is not used
 at all.  I think the failover pipe driver is a good solution for hardware
 without the full capability to support GL2.0. So why was it discarded? Is it
 because a fallback solution isn’t needed for almost all hardware, or because
 there is a critical bug that prevents using it?

    Any answer will be appreciated.



 Thanks.

 Best Regards,

 Jacob He

I think it was decided that it's better not to lie about hw capabilities
and to have the hw driver reject unsupported shaders/features.

Regards,
Jerome


Re: [Mesa-dev] [PATCH 3/3] r600g: implement fragment and vertex color clamp

2011-06-27 Thread Jerome Glisse
On Mon, Jun 27, 2011 at 8:38 AM, Roland Scheidegger srol...@vmware.com wrote:
 Am 25.06.2011 00:22, schrieb Vadim Girlin:
 On 06/24/2011 11:38 PM, Jerome Glisse wrote:
 On Fri, Jun 24, 2011 at 12:29 PM, Vadim Girlin vadimgir...@gmail.com
 wrote:
 Fixes https://bugs.freedesktop.org/show_bug.cgi?id=38440

 Signed-off-by: Vadim Girlin vadimgir...@gmail.com

 As discussed previously, there is a better way to handle this. I think the
 best solution is to always add the instructions and to conditionally execute
 them thanks to the boolean constant. If this turns out to have too big an
 impact on the shader, the other solution I see is adding a cf block with those
 instructions, enabling or disabling that block (cf_nop), and reuploading the
 shader; that would avoid a rebuild.

 I know its not optimal to do a full rebuild, but rebuild is needed only
 when the application will use the same shader in different clamping
 states. It won't be a problem if the application doesn't change clamping
 state or if it changes the state but uses each shader in one state only.
 So assuming that typical app will not use one shader in both states, it
 shouldn't be a problem. Is this assumption wrong? I'm not really sure
 because I don't have much experience with this. But if it's wrong then it's
 probably better for performance to build and cache both versions.
 I tend to think you're right apps probably don't want to use the same
 shader both with and without clamping.

Well, if boolean blocks (see the COND field set to SQ_CF_COND_BOOL in
SQ_CF_WORD1) are free from a perf point of view, then I think it's best
to have one shader with the clamp instructions inside the boolean-enabled
block. Only a benchmark can tell.

Cheers,
Jerome


Re: [Mesa-dev] [PATCH 3/3] r600g: implement fragment and vertex color clamp

2011-06-24 Thread Jerome Glisse
On Fri, Jun 24, 2011 at 12:29 PM, Vadim Girlin vadimgir...@gmail.com wrote:
 Fixes https://bugs.freedesktop.org/show_bug.cgi?id=38440

 Signed-off-by: Vadim Girlin vadimgir...@gmail.com

As discussed previously, there is a better way to handle this. I think the best
solution is to always add the instructions and to conditionally execute
them thanks to the boolean constant. If this turns out to have too big
an impact on the shader, the other solution I see is adding a CF block with those
instructions, enabling or disabling that block (cf_nop) and reuploading
the shader; that would avoid a rebuild.

But as a meantime solution I think this patch is OK.

Cheers,
Jerome

 ---
  src/gallium/drivers/r600/evergreen_state.c   |    2 +
  src/gallium/drivers/r600/r600_pipe.c         |    2 +-
  src/gallium/drivers/r600/r600_pipe.h         |    7 +++-
  src/gallium/drivers/r600/r600_shader.c       |   52 
 +++---
  src/gallium/drivers/r600/r600_shader.h       |    1 +
  src/gallium/drivers/r600/r600_state.c        |    2 +
  src/gallium/drivers/r600/r600_state_common.c |   30 ++-
  7 files changed, 87 insertions(+), 9 deletions(-)

 diff --git a/src/gallium/drivers/r600/evergreen_state.c 
 b/src/gallium/drivers/r600/evergreen_state.c
 index f86e4d4..dfe7896 100644
 --- a/src/gallium/drivers/r600/evergreen_state.c
 +++ b/src/gallium/drivers/r600/evergreen_state.c
 @@ -256,6 +256,8 @@ static void *evergreen_create_rs_state(struct 
 pipe_context *ctx,
        }

        rstate = &rs->rstate;
 +       rs->clamp_vertex_color = state->clamp_vertex_color;
 +       rs->clamp_fragment_color = state->clamp_fragment_color;
        rs->flatshade = state->flatshade;
        rs->sprite_coord_enable = state->sprite_coord_enable;

 diff --git a/src/gallium/drivers/r600/r600_pipe.c 
 b/src/gallium/drivers/r600/r600_pipe.c
 index 38801d6..12599bf 100644
 --- a/src/gallium/drivers/r600/r600_pipe.c
 +++ b/src/gallium/drivers/r600/r600_pipe.c
 @@ -377,6 +377,7 @@ static int r600_get_param(struct pipe_screen* pscreen, 
 enum pipe_cap param)
        case PIPE_CAP_TGSI_FS_COORD_PIXEL_CENTER_HALF_INTEGER:
        case PIPE_CAP_SM3:
        case PIPE_CAP_SEAMLESS_CUBE_MAP:
 +       case PIPE_CAP_FRAGMENT_COLOR_CLAMP_CONTROL:
                return 1;

        /* Supported except the original R600. */
 @@ -392,7 +393,6 @@ static int r600_get_param(struct pipe_screen* pscreen, 
 enum pipe_cap param)
        /* Unsupported features. */
        case PIPE_CAP_STREAM_OUTPUT:
        case PIPE_CAP_PRIMITIVE_RESTART:
 -       case PIPE_CAP_FRAGMENT_COLOR_CLAMP_CONTROL:
        case PIPE_CAP_TGSI_INSTANCEID:
        case PIPE_CAP_TGSI_FS_COORD_ORIGIN_LOWER_LEFT:
        case PIPE_CAP_TGSI_FS_COORD_PIXEL_CENTER_INTEGER:
 diff --git a/src/gallium/drivers/r600/r600_pipe.h 
 b/src/gallium/drivers/r600/r600_pipe.h
 index 63ddd39..dc9aad0 100644
 --- a/src/gallium/drivers/r600/r600_pipe.h
 +++ b/src/gallium/drivers/r600/r600_pipe.h
 @@ -88,6 +88,8 @@ struct r600_pipe_sampler_view {

  struct r600_pipe_rasterizer {
        struct r600_pipe_state          rstate;
 +       boolean                         clamp_vertex_color;
 +       boolean                         clamp_fragment_color;
        boolean                         flatshade;
        unsigned                        sprite_coord_enable;
        float                           offset_units;
 @@ -125,6 +127,7 @@ struct r600_pipe_shader {
        struct r600_bo                  *bo;
        struct r600_bo                  *bo_fetch;
        struct r600_vertex_element      vertex_elements;
 +       struct tgsi_token               *tokens;
  };

  struct r600_pipe_sampler_state {
 @@ -202,6 +205,8 @@ struct r600_pipe_context {
        struct pipe_query               *saved_render_cond;
        unsigned                        saved_render_cond_mode;
        /* shader information */
 +       boolean                         clamp_vertex_color;
 +       boolean                         clamp_fragment_color;
        boolean                         spi_dirty;
        unsigned                        sprite_coord_enable;
        boolean                         flatshade;
 @@ -265,7 +270,7 @@ void r600_init_query_functions(struct r600_pipe_context 
 *rctx);
  void r600_init_context_resource_functions(struct r600_pipe_context *r600);

  /* r600_shader.c */
 -int r600_pipe_shader_create(struct pipe_context *ctx, struct 
 r600_pipe_shader *shader, const struct tgsi_token *tokens);
 +int r600_pipe_shader_create(struct pipe_context *ctx, struct 
 r600_pipe_shader *shader);
  void r600_pipe_shader_destroy(struct pipe_context *ctx, struct 
 r600_pipe_shader *shader);
  int r600_find_vs_semantic_index(struct r600_shader *vs,
                                struct r600_shader *ps, int id);
 diff --git a/src/gallium/drivers/r600/r600_shader.c 
 b/src/gallium/drivers/r600/r600_shader.c
 index 904cc69..2e5d4a6 100644
 --- a/src/gallium/drivers/r600/r600_shader.c
 +++ b/src/gallium/drivers/r600/r600_shader.c
 @@ -118,9 +118,9 @@ static 

Re: [Mesa-dev] [PATCH 0/3] r600g patches

2011-06-24 Thread Jerome Glisse
On Fri, Jun 24, 2011 at 12:29 PM, Vadim Girlin vadimgir...@gmail.com wrote:
 #1 fixes slots order for x & y writes in the LIT implementation.
 Without this patch fp-lit-mask piglit test fails after patch 3. It seems
 wrong order causes wrong PV.* values for the next instruction.

 #2 reduces unneeded calls to r600_spi_update.

 #3 implements color clamping in shaders by adding MOV_SAT R,R
 instructions for each color output before export. Shaders are rebuilt when
 clamping state changes.

 Fixes https://bugs.freedesktop.org/show_bug.cgi?id=38440

 There are no regressions with r600.tests on evergreen with these patches.

  r600g: LIT: fix xy slots order
  r600g: optimize spi update
  r600g: implement fragment and vertex color clamp

  src/gallium/drivers/r600/evergreen_state.c   |    2 +
  src/gallium/drivers/r600/r600_pipe.c         |    2 +-
  src/gallium/drivers/r600/r600_pipe.h         |    8 +++-
  src/gallium/drivers/r600/r600_shader.c       |   74 
 --
  src/gallium/drivers/r600/r600_shader.h       |    1 +
  src/gallium/drivers/r600/r600_state.c        |    2 +
  src/gallium/drivers/r600/r600_state_common.c |   40 --
  7 files changed, 106 insertions(+), 23 deletions(-)


Pushed the series thanks

Cheers,
Jerome


Re: [Mesa-dev] [PATCH] linker: Reject shaders that use too many varyings

2011-06-23 Thread Jerome Glisse
On Wed, Jun 22, 2011 at 10:49 PM, Alex Deucher alexdeuc...@gmail.com wrote:
 On Wed, Jun 22, 2011 at 10:12 PM, Roland Scheidegger srol...@vmware.com 
 wrote:
 Am 21.06.2011 20:59, schrieb Sven Arvidsson:
 This change broke a whole lot of stuff on r600g, for example Unigine
 Heaven:

       shader uses too many varying components (36 > 32)

 It looks like the r600g driver claims to only support 10 varyings, which
 the state tracker reduces to 8 (as it subtracts the supposedly included
 color varyings).
 At first sight I can't quite see why it's limited to 10, all r600 chips
 should be able to handle 32 (dx10 requirement) but of course the driver
 might not (mesa itself is limited to 16 it seems). If it worked just
 fine before that suggests it indeed works just fine with more...
 Someone more familiar with the driver should be able to tell if it's
 safe to increase the limit to 32 (the state tracker will cap it to 16).

 The hardware definitely supports 32.  I'm not sure why it's currently
 set to 10; I don't see any limitations in the code off hand.

 Alex

IIRC it's just cut & paste from r300g; it can be safely bumped.
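
The numbers in the error message follow from the arithmetic Roland describes. A small sketch in C, assuming four components per varying (vec4), that the state tracker subtracts the two color varyings before applying Mesa's cap of 16, and using made-up helper names (`usable_varyings`, `varyings_fit`) that do not exist in Mesa:

```c
#define COMPONENTS_PER_VARYING 4   /* assumption: each varying is a vec4 */
#define NUM_COLOR_VARYINGS     2   /* subtracted by the state tracker */
#define MESA_MAX_VARYING       16  /* Mesa's own cap mentioned above */

/* Generic varyings left after the color subtraction and Mesa's cap. */
static unsigned usable_varyings(unsigned driver_limit)
{
    unsigned n = driver_limit > NUM_COLOR_VARYINGS
               ? driver_limit - NUM_COLOR_VARYINGS : 0;
    return n > MESA_MAX_VARYING ? MESA_MAX_VARYING : n;
}

/* Returns 1 if a shader using `used` components would link, else 0. */
static int varyings_fit(unsigned used, unsigned driver_limit)
{
    unsigned max = usable_varyings(driver_limit) * COMPONENTS_PER_VARYING;
    return used <= max;
}
```

With a driver limit of 10 this gives 8 varyings, i.e. 32 components, so a shader using 36 is rejected; with the limit raised to 32 the cap of 16 applies and the same shader fits.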

Cheers,
Jerome


Re: [Mesa-dev] [PATCH] linker: Reject shaders that use too many varyings

2011-06-23 Thread Jerome Glisse
On Thu, Jun 23, 2011 at 10:38 AM, Roland Scheidegger srol...@vmware.com wrote:
 Am 23.06.2011 16:09, schrieb Jerome Glisse:
 On Wed, Jun 22, 2011 at 10:49 PM, Alex Deucher alexdeuc...@gmail.com wrote:
 On Wed, Jun 22, 2011 at 10:12 PM, Roland Scheidegger srol...@vmware.com 
 wrote:
 Am 21.06.2011 20:59, schrieb Sven Arvidsson:
 This change broke a whole lot of stuff on r600g, for example Unigine
 Heaven:

       shader uses too many varying components (36 > 32)

 It looks like the r600g driver claims to only support 10 varyings, which
 the state tracker reduces to 8 (as it subtracts the supposedly included
 color varyings).
 At first sight I can't quite see why it's limited to 10, all r600 chips
 should be able to handle 32 (dx10 requirement) but of course the driver
 might not (mesa itself is limited to 16 it seems). If it worked just
 fine before that suggests it indeed works just fine with more...
 Someone more familiar with the driver should be able to tell if it's
 safe to increase the limit to 32 (the state tracker will cap it to 16).

 The hardware definitely supports 32.  I'm not sure why it's currently
 set to 10; I don't see any limitations in the code off hand.

 Alex

 IIRC it's just cut & paste from r300g; it can be safely bumped.

 Ok Marek bumped it to 34. That seems to be lying too I don't think it
 could handle 32 generic inputs and 2 colors. But there's no way to
 really express that right now.

 Roland


Also, IIRC r6xx/r7xx needs special code to handle more than 16 varyings;
I can't remember if we had proper code for that.

Cheers,
Jerome


Re: [Mesa-dev] Status of the GLSL-TGSI translator

2011-06-16 Thread Jerome Glisse
On Thu, Jun 16, 2011 at 10:08 AM, Brian Paul bri...@vmware.com wrote:
 On 06/15/2011 03:38 PM, Bryan Cain wrote:

 My work on the GLSL IR to TGSI translator I announced on the list this
 April is now at the point where I think it is ready to be merged into
 Mesa.  It is stable and doesn't regress any piglit tests on softpipe or
 nv50.

 It adds native integer support as required by GLSL 1.30, although it is
 currently disabled for all drivers since GLSL 1.30 support is not
 complete yet and most Gallium drivers haven't implemented the TGSI
 integer opcodes.  (This would be a good time for Gallium driver
 developers to add support for TGSI's integer opcodes, which are
 currently only implemented in softpipe.)

 Developing this necessitated significant changes elsewhere in Mesa, and
 some small changes in Gallium.  This means that some of the commits in
 my branch probably need to be reviewed by the developers of those
 components.

 If I had commit access to Mesa, I would create a branch for this work in
 the main Mesa repository.  But since I am still waiting on my
 freedesktop.org account to be created, I have pushed the latest version
 to the glsl-to-tgsi branch of my personal Mesa repository on GitHub:

 Git clone URL: git://github.com/Plombo/mesa.git
 Web interface for viewing commits:
 https://github.com/Plombo/mesa/commits/glsl-to-tgsi

 Hopefully my freedesktop.org account will be created soon (I have
 already had my account request approved), so that I can push this to a
 branch in the central repository.

 Looks like nice work, Bryan.

 Just a few minor questions/comments for now:

 1. The st_fragment/vertex/geometry_program structs now have a glsl_to_tgsi
 field.  I did a grep, but I couldn't find where that field is assigned.  Can
 you clue me in?

 2. The above mentioned program structs contains an old Mesa instruction
 program AND/OR(?) a GLSL IR.  Do both types of representations co-exist
 sometimes?  Perhaps you could update the comments on those structs to
 explain that.

 3. Kind of a follow-on: for glDrawPixels and glBitmap we take the original
 program code (in Mesa form) and prepend extra instructions for fetching the
 fragment color or doing the fragment kill.  Do we always have the Mesa
 instructions for this?  It seems we don't normally want to generate Mesa
 instructions all the time but we still need them sometimes.

I must be missing something, but why do we need to take the original program
for those?

Cheers,
Jerome


Re: [Mesa-dev] GLSL IR int-to-float pass

2011-05-25 Thread Jerome Glisse
On Tue, May 24, 2011 at 8:09 PM, Bryan Cain bryanca...@gmail.com wrote:
 Hi,

 In the past few days, I've been working on native integer support in my
 GLSL to TGSI translator.  Something that's come to my attention is that
 supporting Gallium targets with and without integer support using a
 single GLSL IR backend will more or less require a GLSL IR pass to
 convert int, uint, and possibly bool variables and operations to floats.

 Currently, this is done directly in the backend, in both ir_to_mesa and
 st_glsl_to_tgsi.  However, the mod_to_fract and div_to_mul_rcp lowering
 passes for GLSL IR need to know whether to lower integer modulus and
 division operations to their corresponding float operations.  (They both
 do this in Mesa master without asking the backend, but that will be easy
 to change later.)  So a GLSL IR pass will be needed to do the type lowering.

 Such a pass would also have the advantage of less duplicated
 functionality between backends, since ir_to_mesa could also take
 advantage of the pass to eliminate some code.

 I'm more than willing to try writing such a pass myself if no one else
 is interested in doing it, but I figure I should make sure there are no
 objections before starting on it.

 Bryan

TGSI needs to grow type support (int, uint and possibly int8,16,32..)
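
For readers unfamiliar with the lowering Bryan mentions, here is a rough sketch in C of what emulating integer division and modulus with float operations looks like. This is not the Mesa pass; `lowered_idiv`, `lowered_imod` and `fract` are illustrative names, the modulus assumes a positive divisor, and float emulation is only exact for a limited range of values, which is part of why native integer types are attractive.

```c
/* div_to_mul_rcp style: a / b becomes a * (1.0 / b), converted back. */
static int lowered_idiv(int a, int b)
{
    float rcp = 1.0f / (float)b;
    return (int)((float)a * rcp);   /* float-to-int conversion truncates */
}

/* fract(x) = x - floor(x), written without libm. */
static float fract(float x)
{
    float whole = (float)(int)x;    /* truncates toward zero */
    if (whole > x)
        whole -= 1.0f;              /* turn truncation into floor for negatives */
    return x - whole;
}

/* mod_to_fract style: a mod b becomes b * fract(a / b).
 * Assumes b > 0; the 0.5 rounds away small float errors. */
static int lowered_imod(int a, int b)
{
    return (int)((float)b * fract((float)a / (float)b) + 0.5f);
}
```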

Jerome


Re: [Mesa-dev] GLSL IR int-to-float pass

2011-05-25 Thread Jerome Glisse
On Wed, May 25, 2011 at 9:41 AM, Keith Whitwell kei...@vmware.com wrote:
 On Wed, 2011-05-25 at 09:32 -0400, Jerome Glisse wrote:
 On Tue, May 24, 2011 at 8:09 PM, Bryan Cain bryanca...@gmail.com wrote:
  Hi,
 
  In the past few days, I've been working on native integer support in my
  GLSL to TGSI translator.  Something that's come to my attention is that
  supporting Gallium targets with and without integer support using a
  single GLSL IR backend will more or less require a GLSL IR pass to
  convert int, uint, and possibly bool variables and operations to floats.
 
  Currently, this is done directly in the backend, in both ir_to_mesa and
  st_glsl_to_tgsi.  However, the mod_to_fract and div_to_mul_rcp lowering
  passes for GLSL IR need to know whether to lower integer modulus and
  division operations to their corresponding float operations.  (They both
  do this in Mesa master without asking the backend, but that will be easy
  to change later.)  So a GLSL IR pass will be needed to do the type 
  lowering.
 
  Such a pass would also have the advantage of less duplicated
  functionality between backends, since ir_to_mesa could also take
  advantage of the pass to eliminate some code.
 
  I'm more than willing to try writing such a pass myself if no one else
  is interested in doing it, but I figure I should make sure there are no
  objections before starting on it.
 
  Bryan

 TGSI needs to grow type support (int, uint and possibly int8,16,32..)

 Or go away entirely...

 I'm not trying to impose a direction on this, but it seems like the GLSL
 IR-TGSI converter (once running) could be pushed down into the
 individual drivers and GLSL IR or a close cousin of it could become the
 gallium-level interface.  Then individual drivers could be modified to
 consume IR directly.

 Keith



I am also in favor of getting rid of TGSI. I would prefer having the
driver call back into Mesa to set the information Mesa needs from the
shader. For instance, that would allow drivers to pick where they put
attributes (might be a huge win on hw like r6xx or newer), and a few
other things like that.

Cheers,
Jerome


Re: [Mesa-dev] shader-db, and justifying an i965 compiler optimization.

2011-05-18 Thread Jerome Glisse
On Tue, May 17, 2011 at 11:22 PM, Eric Anholt e...@anholt.net wrote:
 One of the pain points of working on compiler optimizations has been
 justifying them -- sometimes I come up with something I think is
 useful and spend a day or two on it, but the value doesn't show up as
 fps in the application that suggested the optimization to me.  Then I
 wonder if this transformation of the code is paying off in general,
 and thus if I should push it.  If I don't push it, I end up bringing
 that patch out on every application I look at that it could affect, to
 see if now I finally have justification to get it out of a private
 branch.

 At a conference this week, we heard about how another team is
 using a database of (assembly) shaders, which they run through their
 compiler and count resulting instructions for testing purposes.  This
 sounded like a fun idea, so I threw one together.  Patch #1 is good in
 general (hey, link errors, finally!), but also means that a quick hack
 to glslparsertest makes it link a passing compile shader and therefore
 generate assembly that gets dumped under INTEL_DEBUG=wm.  Patch #2 I
 used for automatic scraping of shaders in every application I could
 find on my system at the time.  The open-source ones I pushed to:

 http://cgit.freedesktop.org/~anholt/shader-db

 And finally, patch #3 is something I built before but couldn't really
 justify until now.  However, given that it reduced fragment shader
 instructions 0.3% across 831 shaders (affecting 52 of them including
 yofrankie, warsow, norsetto, and gstreamer) and didn't increase
 instructions anywhere, I'm a lot happier now.

 Hopefully we hook up EXT_timer_query to apitrace soon so I can do more
 targeted optimizations and need this less :) In the meantime, I hope
 this can prove useful to others -- if you want to contribute
 appropriately-licensed shaders to the database so we track those, or
 if you want to make the analysis work on your hardware backend, feel
 free.


I have been thinking of doing something slightly different. Sadly,
instruction count is not necessarily the best metric to evaluate
the optimizations performed by a shader compiler. Hiding the texture fetch
latency of a shader can improve performance a lot more than saving 2
instructions. So my idea was to write a GL app that renders the same
shader into a framebuffer a thousand times. The point of using an FBO is
to keep things like swapbuffers from playing a role, since we are
solely interested in shader performance. Also use an FBO as big as
possible so the fragment shader has a lot of pixels to go through, and I
believe we should disable things like blending, zbuffer, ... so that no
other part of the pipeline impacts the shader in any way.

Other things might play a role. For instance, if we provide a small
dummy texture we might just hide the gain a texture fetch optimization
might give, as the GPU might be able to keep the texture in cache and
thus have very low latency on each texture fetch. The same goes if we use
the same texture for all units: the texture cache might hide latency that
a real application would otherwise face. So I think we need big
enough dummy textures, like 512*512, with a different one for each unit,
and we should also try to provide random u,v for the texture fetches so
that the texture cache doesn't hide too much of the latency.
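
A minimal sketch in C of preparing such inputs on the CPU side, before any GL calls. The helper names (`fill_dummy_texture`, `random_uv`), the RGBA8 layout and the byte pattern are assumptions made for illustration; only the 512*512 size, the per-unit difference and the random u,v come from the idea above.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TEX_SIZE 512  /* dummy texture edge length suggested above */

/* Fill one RGBA8 texture of TEX_SIZE x TEX_SIZE texels. Mixing in
 * `unit` makes the data differ between texture units. */
static void fill_dummy_texture(uint8_t *rgba, unsigned unit)
{
    for (size_t i = 0; i < (size_t)TEX_SIZE * TEX_SIZE * 4; i++)
        rgba[i] = (uint8_t)((i * 31u + unit * 97u) & 0xff);
}

/* Fill `count` (u, v) pairs with pseudo-random values in [0, 1] so
 * that consecutive fetches do not land on neighbouring texels. */
static void random_uv(float *uv, size_t count, unsigned seed)
{
    srand(seed);
    for (size_t i = 0; i < count * 2; i++)
        uv[i] = (float)rand() / (float)RAND_MAX;
}
```

Each buffer would then be uploaded as the texture for one unit, and the u,v pairs used as texture coordinates for the geometry drawn into the FBO.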

I am sure I am missing other factors that we should try to minimize
while testing shader performance.

I think such a thing isn't a good fit for piglit, but it could still be
added as a subtool (so that we don't add yet another repository).

Thanks a lot for extracting all those shaders. I am sure we can get
some people to write us shaders with somewhat advanced math under an
acceptable license.

Cheers,
Jerome


Re: [Mesa-dev] shader-db, and justifying an i965 compiler optimization.

2011-05-18 Thread Jerome Glisse
On Wed, May 18, 2011 at 3:16 PM, Eric Anholt e...@anholt.net wrote:
 On Wed, 18 May 2011 11:05:39 -0400, Jerome Glisse j.gli...@gmail.com wrote:
 On Tue, May 17, 2011 at 11:22 PM, Eric Anholt e...@anholt.net wrote:
  One of the pain points of working on compiler optimizations has been
  justifying them -- sometimes I come up with something I think is
  useful and spend a day or two on it, but the value doesn't show up as
  fps in the application that suggested the optimization to me.  Then I
  wonder if this transformation of the code is paying off in general,
  and thus if I should push it.  If I don't push it, I end up bringing
  that patch out on every application I look at that it could affect, to
  see if now I finally have justification to get it out of a private
  branch.
 
  At a conference this week, we heard about how another team is
  using a database of (assembly) shaders, which they run through their
  compiler and count resulting instructions for testing purposes.  This
  sounded like a fun idea, so I threw one together.  Patch #1 is good in
  general (hey, link errors, finally!), but also means that a quick hack
  to glslparsertest makes it link a passing compile shader and therefore
  generate assembly that gets dumped under INTEL_DEBUG=wm.  Patch #2 I
  used for automatic scraping of shaders in every application I could
  find on my system at the time.  The open-source ones I pushed to:
 
  http://cgit.freedesktop.org/~anholt/shader-db
 
  And finally, patch #3 is something I built before but couldn't really
  justify until now.  However, given that it reduced fragment shader
  instructions 0.3% across 831 shaders (affecting 52 of them including
  yofrankie, warsow, norsetto, and gstreamer) and didn't increase
  instructions anywhere, I'm a lot happier now.
 
  Hopefully we hook up EXT_timer_query to apitrace soon so I can do more
  targeted optimizations and need this less :) In the meantime, I hope
  this can prove useful to others -- if you want to contribute
  appropriately-licensed shaders to the database so we track those, or
  if you want to make the analysis work on your hardware backend, feel
  free.
 

 I have been thinking of doing something slightly different. Sadly,
 instruction count is not necessarily the best metric to evaluate
 the optimizations performed by a shader compiler. Hiding the texture fetch
 latency of a shader can improve performance a lot more than saving 2
 instructions. So my idea was to write a GL app that renders the same
 shader into a framebuffer a thousand times. The point of using an FBO is
 to keep things like swapbuffers from playing a role, since we are
 solely interested in shader performance. Also use an FBO as big as
 possible so the fragment shader has a lot of pixels to go through, and I
 believe we should disable things like blending, zbuffer, ... so that no
 other part of the pipeline impacts the shader in any way.

 You might take a look at mesa-demos/src/perf for that.  I haven't had
 success using them for performance work due to the noisiness of the
 results.

 More generally, imo, the problem with that plan is you have to build the
 shaders yourself and justify to yourself why that shader you wrote is
 representative, and you spend all your time on building the tests when
 you just wanted to know if an instruction-reduction optimization did
 anything.  shader-db took me one evening to build and collect for all
 applications I had (I've got a personal branch for all the closed-source
 stuff :/ )

A shader is a bunch of inputs, so for each shader collected the issue is
to provide proper inputs. Textures could be dummy textures unless the
shader has some dependency on the texture data (like if the fetched
texture data determines the number of iterations or is used to kill a
fragment, ...). Well, it's all about going through known shaders and
building a reasonable set of inputs for each of them. It's time
consuming, but I believe it brings a lot more from a testing point of
view.

 For actual performance testing of apps without idsoftware-style
 timedemos, I'm way more excited by the potential of using apitrace with
 EXT_timer_query to decide which shaders I should be analyzing, and then
 I'd know afterward whether I impacted a real application by replaying
 the trace.  That is, assuming I didn't increase CPU costs in the
 process, which is where an apitrace replay would not be representative.

 Our perspective is: if we are driving the hardware anywhere below what
 is possible, that is a bug that we should fix.  Analyzing the costs of
 instructions, scheduling impacts, CPU overhead impacts, etc. may be out
 of scope for shader-db, but does make some types of analysis quick and
 easy (test all shaders you have ever seen of in a couple minutes).

 I agree that shader-db provides a useful tool, I am just convinced
that the number of instructions in a complex shader is a bad metric, especially
when considering things like r6xx and newer classes of hw where texture
fetch and instruction can run

Re: [Mesa-dev] [PATCH] r600g: add support for anisotropic filtering

2011-05-06 Thread Jerome Glisse
Please resend with the patch attached rather than pasted inline.

On Fri, May 6, 2011 at 4:53 PM, Carl-Philip Haensch
carl-philip.haen...@mailbox.tu-dresden.de wrote:
 From b5ad4e6fb399203afcfe2a5ccb35bb8ccad28b65 Mon Sep 17 00:00:00 2001
 From: Carl-Philip Haensch carli@carli-laptop.(none)
 Date: Fri, 6 May 2011 22:48:08 +0200
 Subject: [PATCH] r600g: add support for anisotropic filtering

 ---
  src/gallium/drivers/r600/r600_state.c |   20 +---
  src/gallium/drivers/r600/r600d.h      |    9 +
  2 files changed, 26 insertions(+), 3 deletions(-)

 diff --git a/src/gallium/drivers/r600/r600_state.c
 b/src/gallium/drivers/r600/r600_state.c
 index 3f979cf..aeffb9e 100644
 --- a/src/gallium/drivers/r600/r600_state.c
 +++ b/src/gallium/drivers/r600/r600_state.c
 @@ -364,6 +364,17 @@ static void *r600_create_rs_state(struct pipe_context
 *ctx,
        return rstate;
  }

 +
 +
 +static inline unsigned r600_tex_aniso_filter(unsigned filter)
 +{
 +       if (filter <= 1)   return 0;
 +       if (filter <= 2)   return 1;
 +       if (filter <= 4)   return 2;
 +       if (filter <= 8)   return 3;
 +        /* else */        return 4;
 +}
 +
  static void *r600_create_sampler_state(struct pipe_context *ctx,
                                        const struct pipe_sampler_state
 *state)
  {
 @@ -376,13 +387,15 @@ static void *r600_create_sampler_state(struct
 pipe_context *ctx,

        rstate->id = R600_PIPE_STATE_SAMPLER;
        util_pack_color(state->border_color, PIPE_FORMAT_B8G8R8A8_UNORM,
 &uc);
 +       unsigned aniso_flag_offset = state->max_anisotropy > 1 ? 4 : 0;
        r600_pipe_state_add_reg(rstate, R_03C000_SQ_TEX_SAMPLER_WORD0_0,
                        S_03C000_CLAMP_X(r600_tex_wrap(state->wrap_s)) |
                        S_03C000_CLAMP_Y(r600_tex_wrap(state->wrap_t)) |
                        S_03C000_CLAMP_Z(r600_tex_wrap(state->wrap_r)) |
 -
 S_03C000_XY_MAG_FILTER(r600_tex_filter(state->mag_img_filter)) |
 -
 S_03C000_XY_MIN_FILTER(r600_tex_filter(state->min_img_filter)) |
 +
 S_03C000_XY_MAG_FILTER(r600_tex_filter(state->mag_img_filter) |
 aniso_flag_offset) |
 +
 S_03C000_XY_MIN_FILTER(r600_tex_filter(state->min_img_filter) |
 aniso_flag_offset) |

  S_03C000_MIP_FILTER(r600_tex_mipfilter(state->min_mip_filter)) |
 +
 S_03C000_ANISO(r600_tex_aniso_filter(state->max_anisotropy)) |

  S_03C000_DEPTH_COMPARE_FUNCTION(r600_tex_compare(state->compare_func)) |
                        S_03C000_BORDER_COLOR_TYPE(uc.ui ?
 V_03C000_SQ_TEX_BORDER_COLOR_REGISTER : 0), 0x, NULL);
        r600_pipe_state_add_reg(rstate, R_03C004_SQ_TEX_SAMPLER_WORD1_0,
 @@ -492,7 +505,8 @@ static struct pipe_sampler_view
 *r600_create_sampler_view(struct pipe_context *c
                                S_038014_BASE_ARRAY(state->u.tex.first_layer)
 |
                                S_038014_LAST_ARRAY(state->u.tex.last_layer),
 0x, NULL);
        r600_pipe_state_add_reg(rstate, R_038018_RESOURCE0_WORD6,
 -
 S_038018_TYPE(V_038010_SQ_TEX_VTX_VALID_TEXTURE), 0x, NULL);
 +
 S_038018_TYPE(V_038010_SQ_TEX_VTX_VALID_TEXTURE) |
 +                               S_038018_ANISO(4 /* max 16 samples */),
 0x, NULL);

        return &resource->base;
  }
 diff --git a/src/gallium/drivers/r600/r600d.h
 b/src/gallium/drivers/r600/r600d.h
 index 8296b52..c997462 100644
 --- a/src/gallium/drivers/r600/r600d.h
 +++ b/src/gallium/drivers/r600/r600d.h
 @@ -1012,6 +1012,9 @@
  #define   S_038018_MPEG_CLAMP(x)                       (((x) & 0x3) << 0)
  #define   G_038018_MPEG_CLAMP(x)                       (((x) >> 0) & 0x3)
  #define   C_038018_MPEG_CLAMP                          0xFFFC
 +#define   S_038018_ANISO(x)                            (((x) & 0x7) << 2)
 +#define   G_038018_ANISO(x)                            (((x) >> 2) & 0x7)
 +#define   C_038018_ANISO                               0xFFE3
  #define   S_038018_PERF_MODULATION(x)                  (((x) & 0x7) << 5)
  #define   G_038018_PERF_MODULATION(x)                  (((x) >> 5) & 0x7)
  #define   C_038018_PERF_MODULATION                     0xFF1F
 @@ -1090,6 +1093,9 @@
  #define   S_03C000_MIP_FILTER(x)                       (((x) & 0x3) << 17)
  #define   G_03C000_MIP_FILTER(x)                       (((x) >> 17) & 0x3)
  #define   C_03C000_MIP_FILTER                          0xFFF9
 +#define   S_03C000_ANISO(x)                            (((x) & 0x7) << 19)
 +#define   G_03C000_ANISO(x)                            (((x) >> 19) & 0x7)
 +#define   C_03C000_ANISO                               0xFFB7
  #define   S_03C000_BORDER_COLOR_TYPE(x)                (((x) & 0x3) << 22)
  #define   G_03C000_BORDER_COLOR_TYPE(x)                (((x) >> 22) & 0x3)
  #define   C_03C000_BORDER_COLOR_TYPE                   0xFF3F
 @@ -1152,6 +1158,9 @@
  #define   S_03C008_PERF_Z(x)                           (((x) & 0x3) << 18)
  #define   G_03C008_PERF_Z(x)                           (((x) >> 18) & 0x3)
  #define   C_03C008_PERF_Z                              0xFFF3
 +#define   
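
For reference, the core of the patch can be reduced to a small standalone sketch in C: map `max_anisotropy` to a 3-bit field value and pack it into a register word with set/get/clear helpers in the style of r600d.h. The names below (`tex_aniso_filter`, `S_ANISO`, `G_ANISO`, `C_ANISO`) are shortened stand-ins, not the driver's identifiers, and the bit position is taken from the sampler word in the quoted patch.

```c
#include <stdint.h>

/* Map max_anisotropy (1..16) to the hardware field: log2 of the
 * sample count, rounded up to the next power of two, capped at 4. */
static unsigned tex_aniso_filter(unsigned filter)
{
    if (filter <= 1) return 0;
    if (filter <= 2) return 1;
    if (filter <= 4) return 2;
    if (filter <= 8) return 3;
    return 4;
}

/* Field helpers: a 3-bit field starting at bit 19. */
#define ANISO_SHIFT 19
#define ANISO_MASK  0x7u
#define S_ANISO(x)  ((((uint32_t)(x)) & ANISO_MASK) << ANISO_SHIFT)  /* set   */
#define G_ANISO(x)  ((((uint32_t)(x)) >> ANISO_SHIFT) & ANISO_MASK)  /* get   */
#define C_ANISO     (~(ANISO_MASK << ANISO_SHIFT))                   /* clear */
```

Deriving the clear mask from the shift and width, as `C_ANISO` does here, keeps it consistent with the set and get macros by construction.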

Re: [Mesa-dev] KWin and Mesa

2011-04-20 Thread Jerome Glisse
On Wed, Apr 20, 2011 at 8:01 AM, Martin Gräßlin mgraess...@kde.org wrote:
 On Wed, 20 Apr 2011 04:32:25 +0200, Henri Verbeet hverb...@gmail.com
 wrote:

 On 19 April 2011 16:52, Martin Gräßlin mgraess...@kde.org wrote:

 Hi Mesa-devs,

 yesterday I published a rant about Mesa breaking KWin and given some
 comments on Phoronix Forums it seems like there is the wish for more
 communication between our development groups and so I want to start it.
 Please
 note that I am not subscribed to this mailing list, so please keep me in
 CC (I
 might not be able to reply this week at all). It is my wish to never have
 to
 rant about the state of Linux drivers any more and that I never have to
 see
 Mesa breaking KWin again.


 I think there are a couple of points here, some of them already made
 by others. Note that the following is mostly just how I personally see
 things, not necessarily what anyone else thinks.

 Thanks for your mail. This is really constructive and a reply in the kind of
 I hoped to receive. A good starting point to fix the mess we are currently
 in :-)

 First, there's the specific issue your blog post talks about. While I
 understand the issue, and can sympathize somewhat, I essentially think
 you're just wrong there. (Yeah, I can be direct too.) It's perhaps
 unfortunate that this change happened on a minor release, but the
 basic issues are that blacklisting / whitelisting drivers is just a
 bad idea, and you can't depend on renderer strings being stable. If
 you do it anyway, it's going to break, you get to keep all the pieces,
 and you can't blame the drivers.

 Actually I agree with you and all other who wrote it: it is a hack and it
 should not be there. It was added to make KWin at least work around the
 4.5 release. As a matter of fact and that question might sound stupid, where
 do I find information on additional API provided by Mesa than not parsing
 the renderer/version string? In the response to the blog post I received
 replies that we should use DRI2QueryVersion. That was the first time that I
 heard this thing existed. Where is that documented? How can we find out
 about it? I seriously have never ever heard about it or read about it in any
 documentation I have read so far.

 In the more general case, I think hacking around driver bugs is about
 the worst way to deal with driver bugs in GL applications. In the best
 case you're just removing an incentive to fix the bug, but it's more
 likely you just end up creating fragile code or depending on the bug
 somehow.

 The problem is that at the time we release it has to work. Our users do not
 care about whether it is the driver or not. It just has to work. A big
 problem in that regard is as you noticed yourself the distributions. They do
 not ship updates to the drivers, so we need to make it work with the driver
 version out there and not with the next bug fix release. Our work would be
 much easier if we could just tell the users to update their drivers ;-)

Your issue is right there. gnome-shell has been successful in dealing
with that because they target a particular Mesa version and they set a
lower bar for the GL features they need. Your issue is that you want to
enable features that use GL functionality too advanced for the open source
drivers; GLSL wasn't that good before Mesa 7.7 (or even 7.8, I can't
remember). What you should do is decide what the lowest Mesa version
you are ready to support is, and then use that to decide which GL features
you can safely use. If you want to support Debian, that would more than
likely mean dropping GLSL. Trying to enable features one by one is a
really bad idea; again, I believe gnome-shell took the right
approach here.

Cheers,
Jerome

