Re: [Qemu-devel] [PATCH 16/17] s390x: translate engine for s390x CPU
On 29 March 2011 09:55, Alexander Graf ag...@suse.de wrote: On 28.03.2011, at 17:40, Peter Maydell wrote: Doesn't this take you over MAX_OP_PER_INSTR for some cases? I haven't encountered any case where it does. This untested patch against your v2 ought to make it print a warning if you do go over the limit: diff --git a/target-s390x/translate.c b/target-s390x/translate.c index 6756b84..35a2351 100644 --- a/target-s390x/translate.c +++ b/target-s390x/translate.c @@ -5040,7 +5040,7 @@ static inline void gen_intermediate_code_internal(CPUState *env, DisasContext dc; target_ulong pc_start; uint64_t next_page_start; -uint16_t *gen_opc_end; +uint16_t *gen_opc_end, *gen_opc_insnstart; int j, lj = -1; int num_insns, max_insns; CPUBreakpoint *bp; @@ -5070,6 +5070,8 @@ static inline void gen_intermediate_code_internal(CPUState *env, gen_icount_start(); do { +gen_opc_insnstart = gen_opc_ptr; + if (unlikely(!QTAILQ_EMPTY(env-breakpoints))) { QTAILQ_FOREACH(bp, env-breakpoints, entry) { if (bp-pc == dc.pc) { @@ -5104,6 +5106,14 @@ static inline void gen_intermediate_code_internal(CPUState *env, if (env-singlestep_enabled) { gen_debug(dc); } + +if (gen_opc_ptr gen_opc_insnstart + MAX_OP_PER_INSTR) { +/* Codegen bug, but not a fatal one -- if we really overran the + * TCG buffer we will assert later. So just warn. + */ +fprintf(stderr, Too many TCG ops for insn before +TARGET_FMT_lx \n, dc.pc); +} } while (!dc.is_jmp gen_opc_ptr gen_opc_end dc.pc next_page_start num_insns max_insns !env-singlestep_enabled !singlestep); I'm not yet sure what the right way to do this generically would be. -- PMM
Re: [Qemu-devel] [PATCH 16/17] s390x: translate engine for s390x CPU
On 28.03.2011, at 17:40, Peter Maydell wrote: On 24 March 2011 15:58, Alexander Graf ag...@suse.de wrote: diff --git a/target-s390x/translate.c b/target-s390x/translate.c +typedef struct DisasContext DisasContext; +struct DisasContext { +uint64_t pc; +int is_jmp; +enum cc_op cc_op; +CPUS390XState *env; +struct TranslationBlock *tb; +}; I don't think anything actually uses dc-env, does it? (I like the way that almost none of the translate.c code gets a CPUState pointer, makes it hard to accidentally write buggy code that relies on things not in the tb_flags.) Good idea :) +static char cpu_reg_names[10*3 + 6*4]; I can see code ins390x_translate_init() which sets this up, but I can't see anything which uses it? It's used during the setup (and needs to be kept alive after that - hence the global): p = cpu_reg_names; for (i = 0; i 16; i++) { snprintf(p, cpu_reg_names_size, r%d, i); regs[i] = tcg_global_mem_new(TCG_AREG0, offsetof(CPUState, regs[i]), p); p += (i 10) ? 3 : 4; cpu_reg_names_size -= (i 10) ? 3 : 4; } +#if 0 /* reads four when it should read only 3 */ +case 2: Is there any point having #if'd out broken code? Either fix it and enable it, or just have a comment to the effect that we could have optimised versions for cases 2, 4, 5, 6 but currently do not. I wanted to keep it in as documentation. But yeah, might make sense to just remove it. +case 0x4: /* LMG R1,R3,D2(B2) [RSE] */ +case 0x24: /* STMG R1,R3,D2(B2) [RSE] */ +case 0x26: /* STMH R1,R3,D2(B2) [RSE] */ +case 0x96: /* LMH R1,R3,D2(B2) [RSE] */ +/* Apparently, unrolling lmg/stmg of any size gains performance - + even for very long ones... */ Doesn't this take you over MAX_OP_PER_INSTR for some cases? I haven't encountered any case where it does. +tmp2 = tcg_const_i64uint64_t)i2) 48) | 0xULL); This line is over 80 chars, as are a handful of others in this file. Yeah, I generally see the 80 char limit as soft limit and make it hard on ~90. If a line is only over it by very little, readability doesn't improve by breaking it up. So far, everyone agreed to that approach :). +case 0xa: /* SVCI [RR] */ +insn = ld_code2(s-pc); +debug_insn(insn); +i = insn 0xff; +#ifdef CONFIG_USER_ONLY +s-pc += 2; +#endif +update_psw_addr(s); +gen_op_calc_cc(s); Why do we only need to update s-pc if CONFIG_USER_ONLY? Not saying it's wrong, but it could use an explanatory comment... The user code needs to know where it jumps back to, while the exception generation code needs to get the exact position it was in to generate some more metadata. I'm not sure a comment really would be helpful here - it's an implementation detail that is hard to explain properly in a few words. Alex
Re: [Qemu-devel] [PATCH 16/17] s390x: translate engine for s390x CPU
On 29 March 2011 09:55, Alexander Graf ag...@suse.de wrote: On 28.03.2011, at 17:40, Peter Maydell wrote: On 24 March 2011 15:58, Alexander Graf ag...@suse.de wrote: diff --git a/target-s390x/translate.c b/target-s390x/translate.c + case 0x4: /* LMG R1,R3,D2(B2) [RSE] */ + case 0x24: /* STMG R1,R3,D2(B2) [RSE] */ + case 0x26: /* STMH R1,R3,D2(B2) [RSE] */ + case 0x96: /* LMH R1,R3,D2(B2) [RSE] */ + /* Apparently, unrolling lmg/stmg of any size gains performance - + even for very long ones... */ Doesn't this take you over MAX_OP_PER_INSTR for some cases? I haven't encountered any case where it does. Really? MAX_OP_PER_INSTR's only 96, so if you have 16 registers in your loop then it only needs 6 ops per register to hit that, and the op 0x96 case looks like it must generate more than that. I have an item on my todo list to see if I can add an assert() check for this limit, because there are cases for Neon load/stores that apparently hit it. + tmp2 = tcg_const_i64uint64_t)i2) 48) | 0xULL); This line is over 80 chars, as are a handful of others in this file. Yeah, I generally see the 80 char limit as soft limit and make it hard on ~90. If a line is only over it by very little, readability doesn't improve by breaking it up. So far, everyone agreed to that approach :). 80 chars reduces readability for me because I have emacs configured to make long lines look very ugly so I don't write them :-) Also, if we want the standard to be 'soft 80, hard 90' we should say so in CODING_STYLE... + case 0xa: /* SVC I [RR] */ + insn = ld_code2(s-pc); + debug_insn(insn); + i = insn 0xff; +#ifdef CONFIG_USER_ONLY + s-pc += 2; +#endif + update_psw_addr(s); + gen_op_calc_cc(s); Why do we only need to update s-pc if CONFIG_USER_ONLY? Not saying it's wrong, but it could use an explanatory comment... The user code needs to know where it jumps back to, while the exception generation code needs to get the exact position it was in to generate some more metadata. Ah. For ARM we do this by advancing env-regs[15] in linux-user/main.c cpu_loop() when we get an EXCP_SWI. It looks like we do it that way for MIPS and SPARC at least too, so I guess it would be better for s390 to follow that pattern. -- PMM
Re: [Qemu-devel] [PATCH 16/17] s390x: translate engine for s390x CPU
On 29.03.2011, at 11:17, Peter Maydell wrote: On 29 March 2011 09:55, Alexander Graf ag...@suse.de wrote: On 28.03.2011, at 17:40, Peter Maydell wrote: On 24 March 2011 15:58, Alexander Graf ag...@suse.de wrote: diff --git a/target-s390x/translate.c b/target-s390x/translate.c +case 0x4: /* LMG R1,R3,D2(B2) [RSE] */ +case 0x24: /* STMG R1,R3,D2(B2) [RSE] */ +case 0x26: /* STMH R1,R3,D2(B2) [RSE] */ +case 0x96: /* LMH R1,R3,D2(B2) [RSE] */ +/* Apparently, unrolling lmg/stmg of any size gains performance - + even for very long ones... */ Doesn't this take you over MAX_OP_PER_INSTR for some cases? I haven't encountered any case where it does. Really? MAX_OP_PER_INSTR's only 96, so if you have 16 registers in your loop then it only needs 6 ops per register to hit that, and the op 0x96 case looks like it must generate more than that. I have an item on my todo list to see if I can add an assert() check for this limit, because there are cases for Neon load/stores that apparently hit it. Hrm - might be useful to increase MAX_OP_PER_INSTR then, no? +tmp2 = tcg_const_i64uint64_t)i2) 48) | 0xULL); This line is over 80 chars, as are a handful of others in this file. Yeah, I generally see the 80 char limit as soft limit and make it hard on ~90. If a line is only over it by very little, readability doesn't improve by breaking it up. So far, everyone agreed to that approach :). 80 chars reduces readability for me because I have emacs configured to make long lines look very ugly so I don't write them :-) Heh, I have vi configured to color in lines 80 chars as well, so I usually only keep them there Also, if we want the standard to be 'soft 80, hard 90' we should say so in CODING_STYLE... *shrug* so far CODING_STYLE has only brought badness to qemu. The new style is less readable than the common dominator that was there before (Fabrice's coding style) and resulted in man-years of wasted time on rejected patches for the sake of braces. I'd rather want to remove the file than patching it (which again would create a mail thread of 300 mails, waste 5 man-years of productivity and bring us no gain in the end). +case 0xa: /* SVCI [RR] */ +insn = ld_code2(s-pc); +debug_insn(insn); +i = insn 0xff; +#ifdef CONFIG_USER_ONLY +s-pc += 2; +#endif +update_psw_addr(s); +gen_op_calc_cc(s); Why do we only need to update s-pc if CONFIG_USER_ONLY? Not saying it's wrong, but it could use an explanatory comment... The user code needs to know where it jumps back to, while the exception generation code needs to get the exact position it was in to generate some more metadata. Ah. For ARM we do this by advancing env-regs[15] in linux-user/main.c cpu_loop() when we get an EXCP_SWI. It looks like we do it that way for MIPS and SPARC at least too, so I guess it would be better for s390 to follow that pattern. Unfortunately, it's not that easy as there are 2 different ways of issuing an SVC (actual SVC and EXECUTE instruction), both of which having different instruction lengths. So we really need to keep the information in the instruction decoder :( Alex
Re: [Qemu-devel] [PATCH 16/17] s390x: translate engine for s390x CPU
On 29 March 2011 10:25, Alexander Graf ag...@suse.de wrote: On 29.03.2011, at 11:17, Peter Maydell wrote: On 29 March 2011 09:55, Alexander Graf ag...@suse.de wrote: On 28.03.2011, at 17:40, Peter Maydell wrote: Doesn't this take you over MAX_OP_PER_INSTR for some cases? I haven't encountered any case where it does. Really? MAX_OP_PER_INSTR's only 96, so if you have 16 registers in your loop then it only needs 6 ops per register to hit that, and the op 0x96 case looks like it must generate more than that. I have an item on my todo list to see if I can add an assert() check for this limit, because there are cases for Neon load/stores that apparently hit it. Hrm - might be useful to increase MAX_OP_PER_INSTR then, no? The TCG README suggests that 20 ops is the point at which it's better to use a helper function. I don't know if that advice still holds true or not. The user code needs to know where it jumps back to, while the exception generation code needs to get the exact position it was in to generate some more metadata. Ah. For ARM we do this by advancing env-regs[15] in linux-user/main.c cpu_loop() when we get an EXCP_SWI. It looks like we do it that way for MIPS and SPARC at least too, so I guess it would be better for s390 to follow that pattern. Unfortunately, it's not that easy as there are 2 different ways of issuing an SVC (actual SVC and EXECUTE instruction), both of which having different instruction lengths. So we really need to keep the information in the instruction decoder :( Does a Linux kernel running on a real s/390 system have to deal with the distinction in its SVC entrypoint? -- PMM
Re: [Qemu-devel] [PATCH 16/17] s390x: translate engine for s390x CPU
On 29.03.2011, at 11:56, Peter Maydell wrote: On 29 March 2011 10:25, Alexander Graf ag...@suse.de wrote: On 29.03.2011, at 11:17, Peter Maydell wrote: On 29 March 2011 09:55, Alexander Graf ag...@suse.de wrote: On 28.03.2011, at 17:40, Peter Maydell wrote: Doesn't this take you over MAX_OP_PER_INSTR for some cases? I haven't encountered any case where it does. Really? MAX_OP_PER_INSTR's only 96, so if you have 16 registers in your loop then it only needs 6 ops per register to hit that, and the op 0x96 case looks like it must generate more than that. I have an item on my todo list to see if I can add an assert() check for this limit, because there are cases for Neon load/stores that apparently hit it. Hrm - might be useful to increase MAX_OP_PER_INSTR then, no? The TCG README suggests that 20 ops is the point at which it's better to use a helper function. I don't know if that advice still holds true or not. Well, the amount of stores / loads stays the same throughout a helper or not. The main reason to not use a heavy helper is that all the register values need to be flushed out, adding quite substantial overhead for the stores/loads on those. When flushing them, tcg also loses its ability to predict which values are used, so dead code elimination is moot. So yes, complex operations that could be optimized by a C compiler should go into helpers - preferably if they only use input and output values that get passed as registers. As soon as they touch memory or env though, the world isn't as great anymore :(. The user code needs to know where it jumps back to, while the exception generation code needs to get the exact position it was in to generate some more metadata. Ah. For ARM we do this by advancing env-regs[15] in linux-user/main.c cpu_loop() when we get an EXCP_SWI. It looks like we do it that way for MIPS and SPARC at least too, so I guess it would be better for s390 to follow that pattern. Unfortunately, it's not that easy as there are 2 different ways of issuing an SVC (actual SVC and EXECUTE instruction), both of which having different instruction lengths. So we really need to keep the information in the instruction decoder :( Does a Linux kernel running on a real s/390 system have to deal with the distinction in its SVC entrypoint? An s390 kernel gets the size information as interrupt parameter, so it can simply calculate the NIP from that. Alex
Re: [Qemu-devel] [PATCH 16/17] s390x: translate engine for s390x CPU
On 24 March 2011 15:58, Alexander Graf ag...@suse.de wrote: diff --git a/target-s390x/translate.c b/target-s390x/translate.c +typedef struct DisasContext DisasContext; +struct DisasContext { + uint64_t pc; + int is_jmp; + enum cc_op cc_op; + CPUS390XState *env; + struct TranslationBlock *tb; +}; I don't think anything actually uses dc-env, does it? (I like the way that almost none of the translate.c code gets a CPUState pointer, makes it hard to accidentally write buggy code that relies on things not in the tb_flags.) +static char cpu_reg_names[10*3 + 6*4]; I can see code ins390x_translate_init() which sets this up, but I can't see anything which uses it? +#if 0 /* reads four when it should read only 3 */ + case 2: Is there any point having #if'd out broken code? Either fix it and enable it, or just have a comment to the effect that we could have optimised versions for cases 2, 4, 5, 6 but currently do not. + case 0x4: /* LMG R1,R3,D2(B2) [RSE] */ + case 0x24: /* STMG R1,R3,D2(B2) [RSE] */ + case 0x26: /* STMH R1,R3,D2(B2) [RSE] */ + case 0x96: /* LMH R1,R3,D2(B2) [RSE] */ + /* Apparently, unrolling lmg/stmg of any size gains performance - + even for very long ones... */ Doesn't this take you over MAX_OP_PER_INSTR for some cases? + tmp2 = tcg_const_i64uint64_t)i2) 48) | 0xULL); This line is over 80 chars, as are a handful of others in this file. +case 0xa: /* SVCI [RR] */ +insn = ld_code2(s-pc); +debug_insn(insn); +i = insn 0xff; +#ifdef CONFIG_USER_ONLY +s-pc += 2; +#endif +update_psw_addr(s); +gen_op_calc_cc(s); Why do we only need to update s-pc if CONFIG_USER_ONLY? Not saying it's wrong, but it could use an explanatory comment... -- PMM