Re: [Qemu-devel] [PATCH 16/17] s390x: translate engine for s390x CPU

2011-03-31 Thread Peter Maydell
On 29 March 2011 09:55, Alexander Graf ag...@suse.de wrote:
 On 28.03.2011, at 17:40, Peter Maydell wrote:
 Doesn't this take you over MAX_OP_PER_INSTR for some cases?

 I haven't encountered any case where it does.

This untested patch against your v2 ought to make it print
a warning if you do go over the limit:

diff --git a/target-s390x/translate.c b/target-s390x/translate.c
index 6756b84..35a2351 100644
--- a/target-s390x/translate.c
+++ b/target-s390x/translate.c
@@ -5040,7 +5040,7 @@ static inline void
gen_intermediate_code_internal(CPUState *env,
 DisasContext dc;
 target_ulong pc_start;
 uint64_t next_page_start;
-uint16_t *gen_opc_end;
+uint16_t *gen_opc_end, *gen_opc_insnstart;
 int j, lj = -1;
 int num_insns, max_insns;
 CPUBreakpoint *bp;
@@ -5070,6 +5070,8 @@ static inline void
gen_intermediate_code_internal(CPUState *env,
 gen_icount_start();

 do {
+gen_opc_insnstart = gen_opc_ptr;
+
 if (unlikely(!QTAILQ_EMPTY(env-breakpoints))) {
 QTAILQ_FOREACH(bp, env-breakpoints, entry) {
 if (bp-pc == dc.pc) {
@@ -5104,6 +5106,14 @@ static inline void
gen_intermediate_code_internal(CPUState *env,
 if (env-singlestep_enabled) {
 gen_debug(dc);
 }
+
+if (gen_opc_ptr  gen_opc_insnstart + MAX_OP_PER_INSTR) {
+/* Codegen bug, but not a fatal one -- if we really overran the
+ * TCG buffer we will assert later. So just warn.
+ */
+fprintf(stderr, Too many TCG ops for insn before 
+TARGET_FMT_lx  \n, dc.pc);
+}
 } while (!dc.is_jmp  gen_opc_ptr  gen_opc_end  dc.pc  next_page_start
   num_insns  max_insns  !env-singlestep_enabled
   !singlestep);

I'm not yet sure what the right way to do this generically would be.

-- PMM



Re: [Qemu-devel] [PATCH 16/17] s390x: translate engine for s390x CPU

2011-03-29 Thread Alexander Graf

On 28.03.2011, at 17:40, Peter Maydell wrote:

 On 24 March 2011 15:58, Alexander Graf ag...@suse.de wrote:
 diff --git a/target-s390x/translate.c b/target-s390x/translate.c
 
 +typedef struct DisasContext DisasContext;
 +struct DisasContext {
 +uint64_t pc;
 +int is_jmp;
 +enum cc_op cc_op;
 +CPUS390XState *env;
 +struct TranslationBlock *tb;
 +};
 
 I don't think anything actually uses dc-env, does it?
 (I like the way that almost none of the translate.c code
 gets a CPUState pointer, makes it hard to accidentally write
 buggy code that relies on things not in the tb_flags.)

Good idea :)

 
 +static char cpu_reg_names[10*3 + 6*4];
 
 I can see code ins390x_translate_init() which sets this up, but
 I can't see anything which uses it?

It's used during the setup (and needs to be kept alive after that - hence the 
global):

p = cpu_reg_names;
for (i = 0; i  16; i++) {
snprintf(p, cpu_reg_names_size, r%d, i);
regs[i] = tcg_global_mem_new(TCG_AREG0,
 offsetof(CPUState, regs[i]), p);
p += (i  10) ? 3 : 4;
cpu_reg_names_size -= (i  10) ? 3 : 4;
}


 
 +#if 0  /* reads four when it should read only 3 */
 +case 2:
 
 Is there any point having #if'd out broken code?
 Either fix it and enable it, or just have a comment
 to the effect that we could have optimised versions
 for cases 2, 4, 5, 6 but currently do not.

I wanted to keep it in as documentation. But yeah, might make sense to just 
remove it.

 
 +case 0x4:  /* LMG  R1,R3,D2(B2) [RSE] */
 +case 0x24: /* STMG R1,R3,D2(B2) [RSE] */
 +case 0x26: /* STMH R1,R3,D2(B2) [RSE] */
 +case 0x96: /* LMH  R1,R3,D2(B2) [RSE] */
 +/* Apparently, unrolling lmg/stmg of any size gains performance -
 +   even for very long ones... */
 
 Doesn't this take you over MAX_OP_PER_INSTR for some cases?

I haven't encountered any case where it does.

 
 +tmp2 = tcg_const_i64uint64_t)i2)  48) | 
 0xULL);
 
 This line is over 80 chars, as are a handful of others in this file.

Yeah, I generally see the 80 char limit as soft limit and make it hard on ~90. 
If a line is only over it by very little, readability doesn't improve by 
breaking it up. So far, everyone agreed to that approach :).

 
 +case 0xa: /* SVCI [RR] */
 +insn = ld_code2(s-pc);
 +debug_insn(insn);
 +i = insn  0xff;
 +#ifdef CONFIG_USER_ONLY
 +s-pc += 2;
 +#endif
 +update_psw_addr(s);
 +gen_op_calc_cc(s);
 
 Why do we only need to update s-pc if CONFIG_USER_ONLY?
 Not saying it's wrong, but it could use an explanatory comment...

The user code needs to know where it jumps back to, while the exception 
generation code needs to get the exact position it was in to generate some more 
metadata. I'm not sure a comment really would be helpful here - it's an 
implementation detail that is hard to explain properly in a few words.


Alex




Re: [Qemu-devel] [PATCH 16/17] s390x: translate engine for s390x CPU

2011-03-29 Thread Peter Maydell
On 29 March 2011 09:55, Alexander Graf ag...@suse.de wrote:

 On 28.03.2011, at 17:40, Peter Maydell wrote:

 On 24 March 2011 15:58, Alexander Graf ag...@suse.de wrote:
 diff --git a/target-s390x/translate.c b/target-s390x/translate.c
 +    case 0x4:  /* LMG      R1,R3,D2(B2)     [RSE] */
 +    case 0x24: /* STMG     R1,R3,D2(B2)     [RSE] */
 +    case 0x26: /* STMH     R1,R3,D2(B2)     [RSE] */
 +    case 0x96: /* LMH      R1,R3,D2(B2)     [RSE] */
 +        /* Apparently, unrolling lmg/stmg of any size gains performance -
 +           even for very long ones... */

 Doesn't this take you over MAX_OP_PER_INSTR for some cases?

 I haven't encountered any case where it does.

Really? MAX_OP_PER_INSTR's only 96, so if you have 16 registers
in your loop then it only needs 6 ops per register to hit that,
and the op 0x96 case looks like it must generate more than that.

I have an item on my todo list to see if I can add an assert()
check for this limit, because there are cases for Neon load/stores
that apparently hit it.

 +            tmp2 = tcg_const_i64uint64_t)i2)  48) | 
 0xULL);

 This line is over 80 chars, as are a handful of others in this file.

 Yeah, I generally see the 80 char limit as soft limit and make it
 hard on ~90. If a line is only over it by very little, readability
 doesn't improve by breaking it up. So far, everyone agreed to that
 approach :).

80 chars reduces readability for me because I have emacs configured
to make long lines look very ugly so I don't write them :-)

Also, if we want the standard to be 'soft 80, hard 90' we should
say so in CODING_STYLE...

 +    case 0xa: /* SVC    I         [RR] */
 +        insn = ld_code2(s-pc);
 +        debug_insn(insn);
 +        i = insn  0xff;
 +#ifdef CONFIG_USER_ONLY
 +        s-pc += 2;
 +#endif
 +        update_psw_addr(s);
 +        gen_op_calc_cc(s);

 Why do we only need to update s-pc if CONFIG_USER_ONLY?
 Not saying it's wrong, but it could use an explanatory comment...

 The user code needs to know where it jumps back to, while the
 exception generation code needs to get the exact position it was
 in to generate some more metadata.

Ah. For ARM we do this by advancing env-regs[15] in linux-user/main.c
cpu_loop() when we get an EXCP_SWI. It looks like we do it that way
for MIPS and SPARC at least too, so I guess it would be better for
s390 to follow that pattern.

-- PMM



Re: [Qemu-devel] [PATCH 16/17] s390x: translate engine for s390x CPU

2011-03-29 Thread Alexander Graf

On 29.03.2011, at 11:17, Peter Maydell wrote:

 On 29 March 2011 09:55, Alexander Graf ag...@suse.de wrote:
 
 On 28.03.2011, at 17:40, Peter Maydell wrote:
 
 On 24 March 2011 15:58, Alexander Graf ag...@suse.de wrote:
 diff --git a/target-s390x/translate.c b/target-s390x/translate.c
 +case 0x4:  /* LMG  R1,R3,D2(B2) [RSE] */
 +case 0x24: /* STMG R1,R3,D2(B2) [RSE] */
 +case 0x26: /* STMH R1,R3,D2(B2) [RSE] */
 +case 0x96: /* LMH  R1,R3,D2(B2) [RSE] */
 +/* Apparently, unrolling lmg/stmg of any size gains performance -
 +   even for very long ones... */
 
 Doesn't this take you over MAX_OP_PER_INSTR for some cases?
 
 I haven't encountered any case where it does.
 
 Really? MAX_OP_PER_INSTR's only 96, so if you have 16 registers
 in your loop then it only needs 6 ops per register to hit that,
 and the op 0x96 case looks like it must generate more than that.
 
 I have an item on my todo list to see if I can add an assert()
 check for this limit, because there are cases for Neon load/stores
 that apparently hit it.

Hrm - might be useful to increase MAX_OP_PER_INSTR then, no?

 
 +tmp2 = tcg_const_i64uint64_t)i2)  48) | 
 0xULL);
 
 This line is over 80 chars, as are a handful of others in this file.
 
 Yeah, I generally see the 80 char limit as soft limit and make it
 hard on ~90. If a line is only over it by very little, readability
 doesn't improve by breaking it up. So far, everyone agreed to that
 approach :).
 
 80 chars reduces readability for me because I have emacs configured
 to make long lines look very ugly so I don't write them :-)

Heh, I have vi configured to color in lines 80 chars as well, so I usually 
only keep them there 

 Also, if we want the standard to be 'soft 80, hard 90' we should
 say so in CODING_STYLE...

*shrug* so far CODING_STYLE has only brought badness to qemu. The new style is 
less readable than the common dominator that was there before (Fabrice's coding 
style) and resulted in man-years of wasted time on rejected patches for the 
sake of braces. I'd rather want to remove the file than patching it (which 
again would create a mail thread of 300 mails, waste 5 man-years of 
productivity and bring us no gain in the end).

 
 +case 0xa: /* SVCI [RR] */
 +insn = ld_code2(s-pc);
 +debug_insn(insn);
 +i = insn  0xff;
 +#ifdef CONFIG_USER_ONLY
 +s-pc += 2;
 +#endif
 +update_psw_addr(s);
 +gen_op_calc_cc(s);
 
 Why do we only need to update s-pc if CONFIG_USER_ONLY?
 Not saying it's wrong, but it could use an explanatory comment...
 
 The user code needs to know where it jumps back to, while the
 exception generation code needs to get the exact position it was
 in to generate some more metadata.
 
 Ah. For ARM we do this by advancing env-regs[15] in linux-user/main.c
 cpu_loop() when we get an EXCP_SWI. It looks like we do it that way
 for MIPS and SPARC at least too, so I guess it would be better for
 s390 to follow that pattern.

Unfortunately, it's not that easy as there are 2 different ways of issuing an 
SVC (actual SVC and EXECUTE instruction), both of which having different 
instruction lengths. So we really need to keep the information in the 
instruction decoder :(


Alex




Re: [Qemu-devel] [PATCH 16/17] s390x: translate engine for s390x CPU

2011-03-29 Thread Peter Maydell
On 29 March 2011 10:25, Alexander Graf ag...@suse.de wrote:
 On 29.03.2011, at 11:17, Peter Maydell wrote:
 On 29 March 2011 09:55, Alexander Graf ag...@suse.de wrote:
 On 28.03.2011, at 17:40, Peter Maydell wrote:
 Doesn't this take you over MAX_OP_PER_INSTR for some cases?

 I haven't encountered any case where it does.

 Really? MAX_OP_PER_INSTR's only 96, so if you have 16 registers
 in your loop then it only needs 6 ops per register to hit that,
 and the op 0x96 case looks like it must generate more than that.

 I have an item on my todo list to see if I can add an assert()
 check for this limit, because there are cases for Neon load/stores
 that apparently hit it.

 Hrm - might be useful to increase MAX_OP_PER_INSTR then, no?

The TCG README suggests that 20 ops is the point at which it's
better to use a helper function. I don't know if that advice
still holds true or not.

 The user code needs to know where it jumps back to, while the
 exception generation code needs to get the exact position it was
 in to generate some more metadata.

 Ah. For ARM we do this by advancing env-regs[15] in linux-user/main.c
 cpu_loop() when we get an EXCP_SWI. It looks like we do it that way
 for MIPS and SPARC at least too, so I guess it would be better for
 s390 to follow that pattern.

 Unfortunately, it's not that easy as there are 2 different ways of
 issuing an SVC (actual SVC and EXECUTE instruction), both of which
 having different instruction lengths. So we really need to keep the
 information in the instruction decoder :(

Does a Linux kernel running on a real s/390 system have to deal with
the distinction in its SVC entrypoint?

-- PMM



Re: [Qemu-devel] [PATCH 16/17] s390x: translate engine for s390x CPU

2011-03-29 Thread Alexander Graf

On 29.03.2011, at 11:56, Peter Maydell wrote:

 On 29 March 2011 10:25, Alexander Graf ag...@suse.de wrote:
 On 29.03.2011, at 11:17, Peter Maydell wrote:
 On 29 March 2011 09:55, Alexander Graf ag...@suse.de wrote:
 On 28.03.2011, at 17:40, Peter Maydell wrote:
 Doesn't this take you over MAX_OP_PER_INSTR for some cases?
 
 I haven't encountered any case where it does.
 
 Really? MAX_OP_PER_INSTR's only 96, so if you have 16 registers
 in your loop then it only needs 6 ops per register to hit that,
 and the op 0x96 case looks like it must generate more than that.
 
 I have an item on my todo list to see if I can add an assert()
 check for this limit, because there are cases for Neon load/stores
 that apparently hit it.
 
 Hrm - might be useful to increase MAX_OP_PER_INSTR then, no?
 
 The TCG README suggests that 20 ops is the point at which it's
 better to use a helper function. I don't know if that advice
 still holds true or not.

Well, the amount of stores / loads stays the same throughout a helper or not. 
The main reason to not use a heavy helper is that all the register values 
need to be flushed out, adding quite substantial overhead for the stores/loads 
on those. When flushing them, tcg also loses its ability to predict which 
values are used, so dead code elimination is moot.

So yes, complex operations that could be optimized by a C compiler should go 
into helpers - preferably if they only use input and output values that get 
passed as registers. As soon as they touch memory or env though, the world 
isn't as great anymore :(.

 
 The user code needs to know where it jumps back to, while the
 exception generation code needs to get the exact position it was
 in to generate some more metadata.
 
 Ah. For ARM we do this by advancing env-regs[15] in linux-user/main.c
 cpu_loop() when we get an EXCP_SWI. It looks like we do it that way
 for MIPS and SPARC at least too, so I guess it would be better for
 s390 to follow that pattern.
 
 Unfortunately, it's not that easy as there are 2 different ways of
 issuing an SVC (actual SVC and EXECUTE instruction), both of which
 having different instruction lengths. So we really need to keep the
 information in the instruction decoder :(
 
 Does a Linux kernel running on a real s/390 system have to deal with
 the distinction in its SVC entrypoint?

An s390 kernel gets the size information as interrupt parameter, so it can 
simply calculate the NIP from that.


Alex




Re: [Qemu-devel] [PATCH 16/17] s390x: translate engine for s390x CPU

2011-03-28 Thread Peter Maydell
On 24 March 2011 15:58, Alexander Graf ag...@suse.de wrote:
 diff --git a/target-s390x/translate.c b/target-s390x/translate.c

 +typedef struct DisasContext DisasContext;
 +struct DisasContext {
 +    uint64_t pc;
 +    int is_jmp;
 +    enum cc_op cc_op;
 +    CPUS390XState *env;
 +    struct TranslationBlock *tb;
 +};

I don't think anything actually uses dc-env, does it?
(I like the way that almost none of the translate.c code
gets a CPUState pointer, makes it hard to accidentally write
buggy code that relies on things not in the tb_flags.)

 +static char cpu_reg_names[10*3 + 6*4];

I can see code ins390x_translate_init() which sets this up, but
I can't see anything which uses it?

 +#if 0  /* reads four when it should read only 3 */
 +    case 2:

Is there any point having #if'd out broken code?
Either fix it and enable it, or just have a comment
to the effect that we could have optimised versions
for cases 2, 4, 5, 6 but currently do not.

 +    case 0x4:  /* LMG      R1,R3,D2(B2)     [RSE] */
 +    case 0x24: /* STMG     R1,R3,D2(B2)     [RSE] */
 +    case 0x26: /* STMH     R1,R3,D2(B2)     [RSE] */
 +    case 0x96: /* LMH      R1,R3,D2(B2)     [RSE] */
 +        /* Apparently, unrolling lmg/stmg of any size gains performance -
 +           even for very long ones... */

Doesn't this take you over MAX_OP_PER_INSTR for some cases?

 +            tmp2 = tcg_const_i64uint64_t)i2)  48) | 
 0xULL);

This line is over 80 chars, as are a handful of others in this file.

 +case 0xa: /* SVCI [RR] */
 +insn = ld_code2(s-pc);
 +debug_insn(insn);
 +i = insn  0xff;
 +#ifdef CONFIG_USER_ONLY
 +s-pc += 2;
 +#endif
 +update_psw_addr(s);
 +gen_op_calc_cc(s);

Why do we only need to update s-pc if CONFIG_USER_ONLY?
Not saying it's wrong, but it could use an explanatory comment...

-- PMM