Re: [Mesa-dev] [PATCH] r600g: don't reserve more stack space than required v4

2013-04-01 Thread Vincent Lejeune
Btw, where can I find more info on stack_size?
I assumed it represents the maximum number of stacked exec_mask levels,
but it looks like it is possible to have many more manually pushed exec_mask
levels than nstack reports (iiuc a push counts as 1/4 of a loop level).




___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] r600g: don't reserve more stack space than required v4

2013-04-01 Thread Vadim Girlin

On 04/02/2013 12:48 AM, Vincent Lejeune wrote:

Btw, where can I find more info on stack_size?
I assumed it represents the maximum number of stacked exec_mask levels,
but it looks like it is possible to have many more manually pushed exec_mask
levels than nstack reports (iiuc a push counts as 1/4 of a loop level).


Yes, different instructions consume different amounts of stack space.
There is an explanation in the ISA docs, section 3.6.5 "Stack
Allocation". It's basically correct, but don't expect it to be precise
regarding the special cases (e.g. in the Cayman ISA doc, the comments in
table 3.6 look like a copy-paste from the r600/r700 docs rather than
Cayman-specific comments). I've added the additional info that I have
about the special cases for each chip generation, along with my notes, as
comments in the patch (see the callstack_update_max_depth function).
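
To make the accounting concrete, here is a minimal sketch in C. It is not the code from the patch. It assumes that each loop level and each WQM push level occupies a whole stack entry, while a plain push occupies a single sub-entry, which matches the "1/4 of a loop level" observation when entry_size is 4. The chip-specific special cases mentioned above are deliberately left out, and the names stack_entries_needed and stack_update_max are hypothetical.

```c
#include <assert.h>

/* Mirrors the fields of struct r600_stack_info from the patch. */
struct stack_info {
    int push;        /* current level of non-WQM PUSH operations */
    int push_wqm;    /* current level of WQM PUSH operations */
    int loop;        /* current loop level */
    int max_entries; /* required depth, in whole entries */
    int entry_size;  /* sub-entries per entry (4 or 8 in the patch) */
};

/* Hypothetical helper: number of whole entries needed for the current
 * nesting. Assumption: loops and WQM pushes cost a full entry each,
 * plain pushes cost one sub-entry each. Special cases are omitted. */
static int stack_entries_needed(const struct stack_info *s)
{
    int elements;

    assert(s->entry_size > 0);
    elements = (s->loop + s->push_wqm) * s->entry_size + s->push;

    /* Round up to a whole number of entries. */
    return (elements + s->entry_size - 1) / s->entry_size;
}

/* Hypothetical helper: track the deepest requirement seen so far. */
static void stack_update_max(struct stack_info *s)
{
    int entries = stack_entries_needed(s);

    if (entries > s->max_entries)
        s->max_entries = entries;
}
```

With entry_size set to 4, one loop level plus three plain pushes gives 7 sub-entries, which rounds up to 2 entries. Calling the update helper after every push or loop begin, and decrementing the counters on the matching pop or loop end, would leave the peak requirement in max_entries.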


Vadim








Re: [Mesa-dev] [PATCH] r600g: don't reserve more stack space than required v4

2013-03-31 Thread Vincent Lejeune
Hi Vadim,

Does this patch work? (It still hasn't been pushed.)
I'm working on native control flow for LLVM and intend to port your patch
for the control flow stack reservation.

Vincent





Re: [Mesa-dev] [PATCH] r600g: don't reserve more stack space than required v4

2013-03-31 Thread Vadim Girlin

On 04/01/2013 12:00 AM, Vincent Lejeune wrote:

Hi Vadim,

Does this patch work ? (It's still not pushed)


It works for me on evergreen, but I'm not sure about other chip
generations. I wanted to ask somebody to test it, but the problem is
that the piglit coverage for this is insufficient (e.g. the initial version
of this patch had no regressions with piglit but resulted in artifacts with
Heaven). I thought about adding more control flow tests but haven't
written them yet. The same algorithm seemingly works in my r600-sb
branch with other chips, but the test coverage with that branch is even
lower due to the if-conversion that eliminates most of the conditional
control flow.

I usually prefer not to push any patches until I'm sure that they don't
break anything. But in this case it's probably easier to simply push it
and wait for bug reports. I'll check whether it needs rebasing and push
it in a day or two if there are no objections.


Vadim


I'm working on doing native control flow for llvm and intend to port your patch 
on the control flow reservation.

Vincent




[Mesa-dev] [PATCH] r600g: don't reserve more stack space than required v4

2013-02-21 Thread Vadim Girlin
v4: implement exact computation taking into account wavefront size

Signed-off-by: Vadim Girlin vadimgir...@gmail.com
---
 src/gallium/drivers/r600/r600_asm.c|  44 +--
 src/gallium/drivers/r600/r600_asm.h|  24 --
 src/gallium/drivers/r600/r600_shader.c | 131 ++---
 3 files changed, 142 insertions(+), 57 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_asm.c b/src/gallium/drivers/r600/r600_asm.c
index 3632aa5..f041e27 100644
--- a/src/gallium/drivers/r600/r600_asm.c
+++ b/src/gallium/drivers/r600/r600_asm.c
@@ -86,6 +86,38 @@ static struct r600_bytecode_tex *r600_bytecode_tex(void)
        return tex;
 }
 
+static unsigned stack_entry_size(enum radeon_family chip) {
+       /* Wavefront size:
+        *   64: R600/RV670/RV770/Cypress/R740/Barts/Turks/Caicos/
+        *       Aruba/Sumo/Sumo2/redwood/juniper
+        *   32: R630/R730/R710/Palm/Cedar
+        *   16: R610/Rs780
+        *
+        * Stack row size:
+        *      Wavefront Size                        16  32  48  64
+        *      Columns per Row (R6xx/R7xx/R8xx only)  8   8   4   4
+        *      Columns per Row (R9xx+)                8   4   4   4 */
+
+       switch (chip) {
+       /* FIXME: are some chips missing here? */
+       /* wavefront size 16 */
+       case CHIP_RV610:
+       case CHIP_RS780:
+       /* wavefront size 32 */
+       case CHIP_RV630:
+       case CHIP_RV635:
+       case CHIP_RV730:
+       case CHIP_RV710:
+       case CHIP_PALM:
+       case CHIP_CEDAR:
+               return 8;
+
+       /* wavefront size 64 */
+       default:
+               return 4;
+       }
+}
+
 void r600_bytecode_init(struct r600_bytecode *bc,
                        enum chip_class chip_class,
                        enum radeon_family family,
@@ -103,6 +135,7 @@ void r600_bytecode_init(struct r600_bytecode *bc,
        LIST_INITHEAD(&bc->cf);
        bc->chip_class = chip_class;
        bc->msaa_texture_mode = msaa_texture_mode;
+       bc->stack.entry_size = stack_entry_size(family);
 }
 
 static int r600_bytecode_add_cf(struct r600_bytecode *bc)
@@ -1524,8 +1557,8 @@ int r600_bytecode_build(struct r600_bytecode *bc)
        unsigned addr;
        int i, r;
 
-       if (bc->callstack[0].max > 0)
-               bc->nstack = ((bc->callstack[0].max + 3) >> 2) + 2;
+       bc->nstack = bc->stack.max_entries;
+
        if (bc->type == TGSI_PROCESSOR_VERTEX && !bc->nstack) {
                bc->nstack = 1;
        }
@@ -1826,8 +1859,8 @@ void r600_bytecode_disasm(struct r600_bytecode *bc)
                chip = '6';
                break;
        }
-       fprintf(stderr, "bytecode %d dw -- %d gprs -\n",
-               bc->ndw, bc->ngpr);
+       fprintf(stderr, "bytecode %d dw -- %d gprs -- %d nstack -\n",
+               bc->ndw, bc->ngpr, bc->nstack);
        fprintf(stderr, "shader %d -- %c\n", index++, chip);
 
        LIST_FOR_EACH_ENTRY(cf, &bc->cf, list) {
@@ -2105,7 +2138,8 @@ void r600_bytecode_dump(struct r600_bytecode *bc)
                chip = '6';
                break;
        }
-       fprintf(stderr, "bytecode %d dw -- %d gprs -\n", bc->ndw, bc->ngpr);
+       fprintf(stderr, "bytecode %d dw -- %d gprs -- %d nstack -\n",
+               bc->ndw, bc->ngpr, bc->nstack);
        fprintf(stderr, "     %c\n", chip);
 
        LIST_FOR_EACH_ENTRY(cf, &bc->cf, list) {
diff --git a/src/gallium/drivers/r600/r600_asm.h b/src/gallium/drivers/r600/r600_asm.h
index 03cd238..5a9869d 100644
--- a/src/gallium/drivers/r600/r600_asm.h
+++ b/src/gallium/drivers/r600/r600_asm.h
@@ -173,16 +173,25 @@ struct r600_cf_stack_entry {
 };
 
 #define SQ_MAX_CALL_DEPTH 0x0020
-struct r600_cf_callstack {
-       unsigned                        fc_sp_before_entry;
-       int                             sub_desc_index;
-       int                             current;
-       int                             max;
-};
 
 #define AR_HANDLE_NORMAL 0
 #define AR_HANDLE_RV6XX 1 /* except RV670 */
 
+struct r600_stack_info {
+       /* current level of non-WQM PUSH operations
+        * (PUSH, PUSH_ELSE, ALU_PUSH_BEFORE) */
+       int push;
+       /* current level of WQM PUSH operations
+        * (PUSH, PUSH_ELSE, PUSH_WQM) */
+       int push_wqm;
+       /* current loop level */
+       int loop;
+
+       /* required depth */
+       int max_entries;
+       /* subentries per entry */
+       int entry_size;
+};
 
 struct r600_bytecode {
        enum chip_class                 chip_class;
@@ -199,8 +208,7 @@ struct r600_bytecode {
        uint32_t                        *bytecode;
        uint32_t                        fc_sp;
        struct r600_cf_stack_entry      fc_stack[32];
-       unsigned                        call_sp;
-       struct r600_cf_callstack        callstack[SQ_MAX_CALL_DEPTH];
+       struct r600_stack_info          stack;
        unsigned                        ar_loaded;
        unsigned                        ar_reg;
        unsigned                        ar_chan;
diff --git 

Re: [Mesa-dev] [PATCH] r600g: don't reserve more stack space than required v4

2013-02-21 Thread Alex Deucher
On Thu, Feb 21, 2013 at 6:52 PM, Vadim Girlin vadimgir...@gmail.com wrote:
 v4: implement exact computation taking into account wavefront size

 Signed-off-by: Vadim Girlin vadimgir...@gmail.com
 ---
  src/gallium/drivers/r600/r600_asm.c|  44 +--
  src/gallium/drivers/r600/r600_asm.h|  24 --
  src/gallium/drivers/r600/r600_shader.c | 131 
 ++---
  3 files changed, 142 insertions(+), 57 deletions(-)

 diff --git a/src/gallium/drivers/r600/r600_asm.c b/src/gallium/drivers/r600/r600_asm.c
 index 3632aa5..f041e27 100644
 --- a/src/gallium/drivers/r600/r600_asm.c
 +++ b/src/gallium/drivers/r600/r600_asm.c
 @@ -86,6 +86,38 @@ static struct r600_bytecode_tex *r600_bytecode_tex(void)
         return tex;
  }

 +static unsigned stack_entry_size(enum radeon_family chip) {
 +       /* Wavefront size:
 +        *   64: R600/RV670/RV770/Cypress/R740/Barts/Turks/Caicos/
 +        *       Aruba/Sumo/Sumo2/redwood/juniper
 +        *   32: R630/R730/R710/Palm/Cedar
 +        *   16: R610/Rs780
 +        *
 +        * Stack row size:
 +        *      Wavefront Size                        16  32  48  64
 +        *      Columns per Row (R6xx/R7xx/R8xx only)  8   8   4   4
 +        *      Columns per Row (R9xx+)                8   4   4   4 */
 +
 +       switch (chip) {
 +       /* FIXME: are some chips missing here? */
 +       /* wavefront size 16 */
 +       case CHIP_RV610:
 +       case CHIP_RS780:

RV620
RS880

Should be 16 as well.


Re: [Mesa-dev] [PATCH] r600g: don't reserve more stack space than required v4

2013-02-21 Thread Vadim Girlin

On 02/22/2013 04:23 AM, Alex Deucher wrote:

On Thu, Feb 21, 2013 at 6:52 PM, Vadim Girlin vadimgir...@gmail.com wrote:

v4: implement exact computation taking into account wavefront size

Signed-off-by: Vadim Girlin vadimgir...@gmail.com
---
  src/gallium/drivers/r600/r600_asm.c|  44 +--
  src/gallium/drivers/r600/r600_asm.h|  24 --
  src/gallium/drivers/r600/r600_shader.c | 131 ++---
  3 files changed, 142 insertions(+), 57 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_asm.c b/src/gallium/drivers/r600/r600_asm.c
index 3632aa5..f041e27 100644
--- a/src/gallium/drivers/r600/r600_asm.c
+++ b/src/gallium/drivers/r600/r600_asm.c
@@ -86,6 +86,38 @@ static struct r600_bytecode_tex *r600_bytecode_tex(void)
        return tex;
 }

+static unsigned stack_entry_size(enum radeon_family chip) {
+       /* Wavefront size:
+        *   64: R600/RV670/RV770/Cypress/R740/Barts/Turks/Caicos/
+        *       Aruba/Sumo/Sumo2/redwood/juniper
+        *   32: R630/R730/R710/Palm/Cedar
+        *   16: R610/Rs780
+        *
+        * Stack row size:
+        *      Wavefront Size                        16  32  48  64
+        *      Columns per Row (R6xx/R7xx/R8xx only)  8   8   4   4
+        *      Columns per Row (R9xx+)                8   4   4   4 */
+
+       switch (chip) {
+       /* FIXME: are some chips missing here? */
+       /* wavefront size 16 */
+       case CHIP_RV610:
+       case CHIP_RS780:


RV620
RS880

Should be 16 as well.


Thanks, I'll add them.

Vadim
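
For reference, a sketch of how the lookup might read once the two chips from the review are added. This is an illustration only, not the revised patch. The enum below is a hypothetical stand-in that lists just the families needed here; the real enum radeon_family in Mesa has more members and its own values. The check_stack_entry_size function is likewise hypothetical.

```c
#include <assert.h>

/* Hypothetical stand-in for the relevant part of enum radeon_family. */
enum radeon_family {
    CHIP_R600, CHIP_RV610, CHIP_RV620, CHIP_RV630, CHIP_RV635,
    CHIP_RV670, CHIP_RS780, CHIP_RS880, CHIP_RV710, CHIP_RV730,
    CHIP_RV770, CHIP_CEDAR, CHIP_CYPRESS, CHIP_PALM
};

/* Sub-entries per stack entry, following the table in the patch:
 * wavefront sizes 16 and 32 use 8, wavefront size 64 uses 4. */
static unsigned stack_entry_size(enum radeon_family chip)
{
    switch (chip) {
    /* wavefront size 16 (RV620 and RS880 added per the review) */
    case CHIP_RV610:
    case CHIP_RV620:
    case CHIP_RS780:
    case CHIP_RS880:
    /* wavefront size 32 */
    case CHIP_RV630:
    case CHIP_RV635:
    case CHIP_RV730:
    case CHIP_RV710:
    case CHIP_PALM:
    case CHIP_CEDAR:
        return 8;

    /* wavefront size 64 */
    default:
        return 4;
    }
}

/* Small usage check: the newly listed chips now get 8 instead of 4. */
void check_stack_entry_size(void)
{
    assert(stack_entry_size(CHIP_RV620) == 8);
    assert(stack_entry_size(CHIP_RS880) == 8);
    assert(stack_entry_size(CHIP_CYPRESS) == 4);
}
```

The wavefront size 16 and 32 cases intentionally share one return, since both map to 8 columns per row on these generations in the table quoted in the patch.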



