Dear all:
My platform is:
Intel Pentium 4 CPU
OpenSolaris B74, built by myself
Sun Studio 11
In my program I use asm("rdtsc") to measure the cycles elapsed between two points in the code.
For example:
int some_func(...)
{
    long long time1, time2;
    int i = 3198, j = 324;
    /* "=A" returns the 64-bit counter in edx:eax on 32-bit x86 */
    asm volatile("rdtsc" : "=A" (time1));
    ....
    i = i + j * i / j;
    asm volatile("rdtsc" : "=A" (time2));
    return i;
}
int main(...)
{
    ....
    some_func();
    ....
}
When I compile this program using "cc example.c" and disassemble a.out
with dis, the program logic is OK. The output is:
some_func()
main+0x36: 0f 31 rdtsc
main+0x38: 89 45 f4 movl %eax,-0xc(%ebp)
main+0x3b: 89 55 f8 movl %edx,-0x8(%ebp)
main+0x3e: 8b 45 e8 movl -0x18(%ebp),%eax
main+0x41: 03 45 e4 addl -0x1c(%ebp),%eax
main+0x44: 89 45 e8 movl %eax,-0x18(%ebp)
main+0x47: 8b 45 e8 movl -0x18(%ebp),%eax
main+0x4a: 0f af 45 e4 imull -0x1c(%ebp),%eax
main+0x4e: 89 45 e8 movl %eax,-0x18(%ebp)
main+0x51: 8b 45 e8 movl -0x18(%ebp),%eax
main+0x54: 99 cltd
main+0x55: f7 7d e4 idivl -0x1c(%ebp)
main+0x58: 8b d0 movl %eax,%edx
main+0x5a: 89 55 e8 movl %edx,-0x18(%ebp)
main+0x5d: 0f 31 rdtsc
main+0x5f: 89 45 ec movl %eax,-0x14(%ebp)
main+0x62: 89 55 f0 movl %edx,-0x10(%ebp)
When I compile this program using "cc -xO5", the dis output is
some_func()
main+0x7: 0f 31 rdtsc
main+0x9: 89 45 e8 movl %eax,-0x18(%ebp)
main+0xc: 89 55 ec movl %edx,-0x14(%ebp)
main+0xf: 0f 31 rdtsc
main+0x11: 89 45 f0 movl %eax,-0x10(%ebp)
main+0x14: 89 55 f4 movl %edx,-0xc(%ebp)
main+0x17: 8b 5d f0 movl -0x10(%ebp),%ebx
main+0x1a: 8b 45 f4 movl -0xc(%ebp),%eax
main+0x1d: 8b 4d e8 movl -0x18(%ebp),%ecx
main+0x20: 8b 55 ec movl -0x14(%ebp),%edx
main+0x23: 2b d9 subl %ecx,%ebx
main+0x25: 1b c2 sbbl %edx,%eax
main+0x27: 89 5d e0 movl %ebx,-0x20(%ebp)
main+0x2a: 89 45 e4 movl %eax,-0x1c(%ebp)
Now the program logic is wrong! Sun cc treats the two rdtsc
instructions as unrelated to the rest of some_func, so it moves the
second asm("rdtsc") ahead of the computation and the two reads end up
back to back.
In this case, I can't measure the time cost.
How can I stop Sun cc from moving code across these two asm
statements while still compiling the whole program with -xO5?
I mean that the second rdtsc must be placed strictly after the
statement i = i + j * i / j. (I know that an x86 CPU executes
instructions out of order, so the result may not be very precise, but
it is good enough for my purposes.)
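One approach I am considering, as a minimal sketch only: declare the
operands volatile and add a "memory" clobber to the asm, so that the
compiler has to perform the loads and stores and should not move them
across the rdtsc. This assumes a compiler that accepts GCC-style
extended asm; I have not confirmed that Sun Studio 11 honors the
clobber at -xO5. The names read_ticks and timed_func are made up for
illustration, and the clock() branch is only a fallback for builds
that are not x86.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: read a timestamp.
 * The "memory" clobber asks the compiler not to move memory accesses
 * across the asm statement. The macro test below uses GCC/Clang names;
 * other compilers may need different ones. */
static inline uint64_t read_ticks(void)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi) : : "memory");
    return ((uint64_t)hi << 32) | lo;
#else
    /* Assumption: fallback for non-x86 builds, much coarser than rdtsc. */
    return (uint64_t)clock();
#endif
}

/* Hypothetical variant of some_func that also reports the elapsed ticks. */
int timed_func(uint64_t *elapsed)
{
    /* volatile: every read and write of i and j must really be performed,
     * so the expression cannot be folded away at compile time. */
    volatile int i = 3198, j = 324;
    uint64_t t1, t2;

    t1 = read_ticks();
    i = i + j * i / j;
    t2 = read_ticks();

    *elapsed = t2 - t1;
    return i;
}
```

Calling timed_func(&elapsed) returns the computed value and stores
t2 - t1 in elapsed. The drawback is that volatile changes the code
being timed, because every access to i and j goes through memory, so
this measures something closer to the unoptimized form of the
expression.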
Any good ideas?
TIA
Regards,
TJ