On Fri, Jun 15, 2012 at 12:30 AM, Lluís Vilanova <vilan...@ac.upc.edu> wrote:
[...]
> Now that I think of it, you will have problems generating code to surround each
> qemu_ld/st with a lightweight mechanism to get the time. In x86 it would be
> rdtsc, but you want to generate a host rdtsc instruction inside the code
> generated by QEMU's TCG, so you should also have to hack TCG (or the code
> generation pointers) to issue an rdtsc instruction.

Even rdtsc would introduce enough noise that it wouldn't be reliable
for such a micro-measurement: as far as I understand it, rdtsc is not a
serializing instruction and can be reordered with the instructions
around it, so you need to flush the pipeline before issuing it.

Intel has a document about that:
download.intel.com/embedded/software/IA/324264.pdf
The overhead of the method they propose is so high that it would
likely take longer than executing the fast path itself.

IMHO, a mix of YeongKyoon Lee's way of counting ld/st operations and a
comparison between user mode and softmmu still seems to be the best
approach (well, unless you have access to a cycle-accurate simulator :-).
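As I read that suggestion, the two measurements could be combined roughly as below. This assumes the same guest workload runs in both modes and that the extra time is dominated by the softmmu ld/st path; other system-mode costs also end up in the difference. T_softmmu and T_user are the run times in the two modes and N_ldst is the count of guest loads and stores (notation introduced here for illustration).

```latex
\bar{t}_{\mathrm{ld/st}} \approx \frac{T_{\mathrm{softmmu}} - T_{\mathrm{user}}}{N_{\mathrm{ld/st}}}
```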


Laurent
