Re: [Qemu-devel] TCG broken in system mode (was TCG assertion with qemu-system-mipsel)

Aurélien Jarno Thu, 21 Mar 2013 15:12:55 -0700

On Thu, Mar 21, 2013 at 04:04:44PM +0900, Yeongkyoon Lee wrote:
> On 03/18/2013 07:27 AM, Aurélien Jarno wrote:
> >On Wed, Mar 06, 2013 at 07:10:17AM +0100, Aurélien Jarno wrote:
> >>On Wed, Mar 06, 2013 at 11:05:15AM +0900, Yeongkyoon Lee wrote:
> >>>On 03/05/2013 11:18 PM, Aurélien Jarno wrote:
> >>>>On Mon, Mar 04, 2013 at 05:37:31PM +0100, Aurélien Jarno wrote:
> >>>>>Hi,
> >>>>>
> >>>>>On Sat, Feb 23, 2013 at 11:10:18PM +0100, Stefan Weil wrote:
> >>>>>>This assertion occurred with latest git master:
> >>>>>>
> >>>>>>qemu-system-mipsel: /src/qemu/tcg/tcg-op.h:2589:
> >>>>>>  tcg_gen_goto_tb: Assertion `(tcg_ctx.goto_tb_issue_mask & (1 << idx))
> >>>>>>== 0' failed.
> >>>>>>Aborted
> >>>>>>
> >>>>>>QEMU was built with --enable-debug and running a Debian MIPS Lenny (NFS
> >>>>>>root).
> >>>>>>The assertion happened when running "apt-get update" in the guest.
> >>>>>>
> >>>>>Is it something reproducible or more or less random? Have you Cc:ed
> >>>>>Richard because it's related to the latest patches?
> >>>>>
> >>>>>On my side I am experiencing random segfaults in various guests (at
> >>>>>least PowerPC, MIPS, SH4 and ARM). I have found a way to bisect it, even
> >>>>>if it is quite long (building Perl + the testsuite). Currently I know
> >>>>>that 1.3 is affected, while 1.2 is not.
> >>>>>
> >>>>I have found that the issue comes from the following commits, which
> >>>>unfortunately are not bisectable one by one (though it won't change the
> >>>>results a lot):
> >>>>
> >>>>     commit b76f0d8c2e3eac94bc7fd90a510cb7426b2a2699
> >>>>     Author: Yeongkyoon Lee <yeongkyoon....@samsung.com>
> >>>>     Date:   Wed Oct 31 16:04:25 2012 +0900
> >>>>         tcg: Optimize qemu_ld/st by generating slow paths at the end of a block
> >>>>         Add optimized TCG qemu_ld/st generation which locates the code of TLB miss
> >>>>         cases at the end of a block after generating the other IRs.
> >>>>         Currently, this optimization supports only i386 and x86_64 hosts.
> >>>>         Signed-off-by: Yeongkyoon Lee <yeongkyoon....@samsung.com>
> >>>>         Signed-off-by: Blue Swirl <blauwir...@gmail.com>
> >>>>     commit fdbb84d1332ae0827d60f1a2ca03c7d5678c6edd
> >>>>     Author: Yeongkyoon Lee <yeongkyoon....@samsung.com>
> >>>>     Date:   Wed Oct 31 16:04:24 2012 +0900
> >>>>         tcg: Add extended GETPC mechanism for MMU helpers with ldst optimization
> >>>>         Add GETPC_EXT which is used by MMU helpers to selectively calculate the code
> >>>>         address of accessing guest memory when called from a qemu_ld/st optimized code
> >>>>         or a C function. Currently, it supports only i386 and x86-64 hosts.
> >>>>         Signed-off-by: Yeongkyoon Lee <yeongkyoon....@samsung.com>
> >>>>         Signed-off-by: Blue Swirl <blauwir...@gmail.com>
> >>>>     commit 32761257c0b9fa7ee04d2871a6e48a41f119c469
> >>>>     Author: Yeongkyoon Lee <yeongkyoon....@samsung.com>
> >>>>     Date:   Wed Oct 31 16:04:23 2012 +0900
> >>>>         configure: Add CONFIG_QEMU_LDST_OPTIMIZATION for TCG qemu_ld/st optimization
> >>>>         Enable CONFIG_QEMU_LDST_OPTIMIZATION for TCG qemu_ld/st optimization only when
> >>>>         a host is i386 or x86_64.
> >>>>         Signed-off-by: Yeongkyoon Lee <yeongkyoon....@samsung.com>
> >>>>         Signed-off-by: Blue Swirl <blauwir...@gmail.com>
> >>>>
> >>>>I will try to understand why.
> >>>>
> >>>>
> >>>Hi Aurélien,
> >>>Do you mean that those random segfaults occurred only when
> >>>configured with "--enable-debug"?
> >>>Although I cannot see how my commits affect debug built image at a
> >>>glance, I'll do double-check.
> >>>Thanks.
> >>The problem is there even without configuring QEMU with --enable-debug.
> >>It just doesn't happen very often, and very randomly. The only way to
> >>reproduce it each time is to launch a big task in the guest (for me
> >>building Perl) and see if it completes or not. It can take up to one
> >>hour until it happens.
> >>
> >>I should clarify that the segfault is on the guest side.
> >>
> >>I have tried to look at your patches, and so far I haven't found the
> >>issue. It seems the two first patches are fine, ie I have verified the
> >>return address is always correctly computed.
> >>
> >I still haven't found the issue, but on the other hand I can't find any
> >problem in your code, after reading it dozens of times. I also tried to
> >modify it as little as possible while issuing the slow path back inside
> >the TB and it fixes the problem. So it really looks like it is due to
> >the slow path being at the end of the TB, and not to a bug in the code
> >generating it. After adding various checks, I am also convinced the
> >address computed in GETPC_EXT() is always correct. I have to say I am
> >running out of ideas.
> >
> >One way to reproduce the issue more easily is to reduce the size of the
> >generated code buffer, for example by setting it to 512kB for both
> >MIN_CODE_GEN_BUFFER_SIZE and MAX_CODE_GEN_BUFFER_SIZE in
> >translate-all.c. That way booting an ARM guest triggers plenty of
> >segmentation faults or other strange issues with your patch but not
> >without.
> >
> >OTOH increasing this size makes the issue almost disappear even when
> >building perl including the testsuite (for that it has to be at least
> >512MB).
> >
> 
> Although I have not managed to reproduce the problem, I've found a
> suspicious piece of code in the boundary check for generated code
> (is_tcg_gen_code() in translate-all.c).
> 
> The code is supposed to be changed as follows.
> Before:
>     return (tc_ptr >= (uintptr_t)tcg_ctx.code_gen_buffer &&
>                 tc_ptr < (uintptr_t)(tcg_ctx.code_gen_buffer +
>                 tcg_ctx.code_gen_buffer_max_size));
> After:
>     return (tc_ptr >= (uintptr_t)tcg_ctx.code_gen_buffer &&
>                 tc_ptr < (uintptr_t)(tcg_ctx.code_gen_buffer +
>                 tcg_ctx.code_gen_buffer_size));
> 
> The reason is that the current check can miss generated code located
> in the last "(TCG_MAX_OP_SIZE * OPC_BUF_SIZE)" bytes of the buffer.
> See code_gen_alloc() in translate-all.c:
>     tcg_ctx.code_gen_buffer_max_size = tcg_ctx.code_gen_buffer_size
>         - (TCG_MAX_OP_SIZE * OPC_BUF_SIZE)
>


Very good catch! Thanks. This fixes the issue I observed.

To give more details, code_gen_buffer_max_size is the threshold past
which all TBs are flushed before code generation continues. This means
that the last TB generated before the flush can extend beyond it by
anything from a few bytes up to (TCG_MAX_OP_SIZE * OPC_BUF_SIZE) bytes,
which is the maximum size of a generated TB.
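
To illustrate the gap, here is a minimal standalone sketch. It is not
the QEMU code: struct sketch_ctx, the in_buffer_old/in_buffer_new
helpers and the sizes (4096 and 1024 bytes) are made up for the example,
only the two comparisons mirror the Before/After versions quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant tcg_ctx fields. */
struct sketch_ctx {
    uint8_t *code_gen_buffer;        /* start of the code buffer */
    size_t code_gen_buffer_size;     /* full size of the buffer */
    size_t code_gen_buffer_max_size; /* flush threshold */
};

/* "Before" check: the upper bound is the flush threshold. */
static bool in_buffer_old(const struct sketch_ctx *c, uintptr_t tc_ptr)
{
    uintptr_t start = (uintptr_t)c->code_gen_buffer;
    return tc_ptr >= start &&
           tc_ptr < start + c->code_gen_buffer_max_size;
}

/* "After" check: the upper bound is the real end of the buffer. */
static bool in_buffer_new(const struct sketch_ctx *c, uintptr_t tc_ptr)
{
    uintptr_t start = (uintptr_t)c->code_gen_buffer;
    return tc_ptr >= start &&
           tc_ptr < start + c->code_gen_buffer_size;
}

/* Returns true when the old check rejects an address that lies in a
 * TB generated just past the threshold, while the new one accepts it. */
static bool sketch_shows_gap(void)
{
    static uint8_t buf[4096];   /* illustrative buffer size */
    const size_t max_tb = 1024; /* stands in for TCG_MAX_OP_SIZE *
                                   OPC_BUF_SIZE */
    struct sketch_ctx c = { buf, sizeof(buf), sizeof(buf) - max_tb };
    uintptr_t p;

    assert(c.code_gen_buffer_max_size < c.code_gen_buffer_size);

    /* An address 16 bytes beyond the flush threshold. */
    p = (uintptr_t)buf + c.code_gen_buffer_max_size + 16;
    return !in_buffer_old(&c, p) && in_buffer_new(&c, p);
}
```

With these made-up sizes, any address in the last 1024 bytes of the
buffer is treated as "not generated code" by the old check, which is the
kind of misclassification that would only show up once the buffer fills
up to the threshold.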

Could you please send a proper patch to fix that? I think it should also
be fixed in the next 1.3.x and 1.4.x releases (1.2.x releases are not
affected), so please Cc: qemu-stable (even if the patch will have to be
slightly tweaked).

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurel...@aurel32.net                 http://www.aurel32.net
