On Thu, 18 Mar 2010, stephane eranian wrote:
> What if you pin your thread and run it at real-time prio? Make sure
> it is non-blocking, with minimal syscalls. Compare cat /proc/interrupts
> before and after for that CPU.
>
> But I think what we are after is the number of transitions in and out of priv
> level 3. Could be interrupts, could be syscalls, traps. I believe the walker
> runs at the current priv level.
I did some more tests with the attached assembly language program, which
executes exactly 10 billion instructions. This is on a Core2 machine running
Linux 2.6.32 with perf_events.
$ perf stat -e instructions:u,cycles:u,faults:u -- ./ten_billion

 Performance counter stats for './ten_billion':

     10000000506  instructions   #  2.000 IPC
      5000523251  cycles
               1  page-faults

     1.689113069  seconds time elapsed
This test makes no data memory accesses at all and is less than 4 KB in
size, hence the single page fault to bring in the executable.
I ran cat /proc/interrupts before and after the test (in a script). While
the test ran, there were:
   13  USB interrupts
   49  ethernet interrupts
   15  hard drive interrupts
    6  NMIs
  423  timer ticks
+   6  performance counter interrupts
  ===
  512  interrupts in total
The retired instruction counter reported 506 extra instructions. That is
exactly 512 minus 6, so possibly either the NMIs or the performance counter
interrupts are not counted (or they are the same events). This suggests
that much of the "non-determinism" can be attributed to interrupts alone.
It's a shame there is no easy way (that I can find) to get this count on a
per-process basis.
Next I should write a memory-heavy test to see how that changes things.
Vince
# ten_billion by Vince Weaver
# needs a 64-bit x86 system to run
# Compile with:
# as -o ten_billion.o ten_billion.s ; ld -o ten_billion ten_billion.o
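# Executes exactly 10 billion instructions:
#   2                          xor/mov setting up %edx
# + ((inside) + 2) * 10000     outer loop: inside, plus dec/jnz of %edx
# + 2                          xor/mov setting up %ecx
# + (9996 * 2)                 final_loop: dec/jnz
# + 4                          nop/xor/mov/syscall at exit
# where inside = ( 2 + (499997 * 2)) = 999996   (xor/mov, then inside_loop)
# total = 2 + ((999996+2) * 10000) + 2 + 19992 + 4
# total = 10 billion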
.globl _start
_start:
        xor     %edx,%edx       # not needed, pads the count
        mov     $10000,%edx     # outer loop counter

outside_loop:
        xor     %ecx,%ecx       # not needed, pads total to 1M
        mov     $499997,%ecx    # load inner loop counter
inside_loop:
        dec     %ecx            # repeat count times
        jnz     inside_loop

        dec     %edx            # repeat outer loop 10000 times
        jnz     outside_loop

        # above leaves 19,994 instructions to go (plus 4 for exit)
        xor     %ecx,%ecx       # not needed, pads the count
        mov     $9996,%ecx      # final loop counter
final_loop:
        dec     %ecx
        jnz     final_loop

#================================
# Exit
#================================
exit:
        nop                     # make it an even number of insn
        xor     %rdi,%rdi       # exit status 0
        mov     $60,%rax        # 60 = exit syscall number on x86-64
        syscall
_______________________________________________
perfmon2-devel mailing list
perfmon2-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/perfmon2-devel