Hi Peter,

A more detailed explanation is below.

> -----Original Message-----
> From: Peter Maydell [mailto:peter.mayd...@linaro.org]
> Sent: Friday, March 24, 2017 6:06 PM
> To: Wangjintang
> Cc: Pranith Kumar; Shlomo Pongratz (A); Wanghaibin (Benjamin); qemu-arm;
> qemu-devel; Ori Chalak (A)
> Subject: Re: [Qemu-arm] [patch 1/1]about armv8's prefetch decode
>
> No, these changes look wrong. PRFM instructions do not need to
> do anything and should definitely not be emitting any intermediate
> code. In particular if you let execution fall through and try
> do_gpr_ld() then it will really do a load, which might cause
> an exception -- this is specifically forbidden for PRFM.
> Architecturally the ARM ARM says "it is valid for the PE to
> treat any or all prefetch instructions as a NOP", which is
> what QEMU does.
>
> The existing code is correct. In general you should not
> expect to be able to deduce the guest instructions from
> the intermediate code representation.
>
"it is valid for the PE to treat any or all prefetch instructions as a NOP", from software view, it's right. the patch regard the prefetch as load instruction, at the same time don't affect rm/rt register. Only the PRFM instruction been emitted to intermediate code and do a really load, then we can get the memory address relative to the prefetch instruction. Because the rm/rt register don't been modified, so the application can run correctly. BTW, the new added code default is disable. So for the common user, have no affect to them. In our case, we need all the instruction trace & ld/st instruction's access memory address, the trace as the input for chip cycle-accurate model. Similar with flexus + qemu. Current code that skip generate prefetch instructions' intermediate code, So we can get prefetch instruction, but can't get the prefetch instruction relative memory address. We have tested that the ratio of prefetch instructions is about 2%~3% during run Dhrystone in system mode. The ratio is high. ________________ ________________ | | | | | | | | | Qemu | | chip | | | instruction trace | cycle-accurate | | | -----------------> | model | | | memory trace | | |________________| |________________| Ori Chalak's explain this as below: " Indeed, prefetch instruction affects only the micro architecture, and hence not needed for running correctly the generated code. However, we developed a performance simulator for a detailed ARMv8 CPU model, and use Qemu to resolve the functionality. And for this purpose we need to translate all instructions that may affect the pipeline behavior, caches, etc. This is not the major usage of Qemu, however there may be others doing this and it may help them. http://www.linux-kvm.org/images/4/45/01x09-Christopher_Covington-Using_Upstream_QEMU_for_CASS.pdf " Best Regards, Wang jintang / Jed Huawei Technologies Co., Ltd. Email: wangjint...@huawei.com http://www.huawei.com