On Thu, 18 Mar 2010, stephane eranian wrote:

> What about you pin your thread and run it at real-time prio.  Make sure
> it is non-blocking, minimal syscalls. Compare cat /proc/interrupts
> before and after for that CPU.
> 
> But I think what we are after is the number of transitions in and out of priv
> level 3. Could be interrupts, could be syscalls, traps. I believe the walker
> runs at the current priv level.

I did some more tests with the attached assembly-language program, which
loops for 10 billion instructions.  This is on a Core2 machine running
Linux 2.6.32 with perf_events.
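
As a sanity check on the 10 billion figure, here is a small Python sketch
that redoes the count from the header comment of the attached program.  The
constants are copied from the listing; the function and variable names are
just for illustration.

```python
# Instruction-count model of ten_billion.s.
# Constants are copied from the assembly listing; names are illustrative.
OUTER_ITERS = 10000      # mov $10000,%edx
INNER_ITERS = 499997     # mov $499997,%ecx
FINAL_ITERS = 9996       # mov $9996,%ecx


def expected_instructions():
    setup = 2                            # xor + mov before outside_loop
    inner = 2 + INNER_ITERS * 2          # xor + mov, then dec + jnz per pass
    outer = (inner + 2) * OUTER_ITERS    # inner block plus outer dec + jnz
    final = 2 + FINAL_ITERS * 2          # xor + mov, then dec + jnz per pass
    exit_seq = 4                         # nop, xor, mov, syscall
    return setup + outer + final + exit_seq


print(expected_instructions())
```

Anything the counter reports above this value should come from outside the
program itself.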


$ perf stat -e instructions:u,cycles:u,faults:u -- ./ten_billion

 Performance counter stats for './ten_billion':

    10000000506  instructions             #      2.000 IPC  
     5000523251  cycles                  
              1  page-faults             

    1.689113069  seconds time elapsed


This test makes no data memory accesses at all and is smaller than 4 KB
(one page), hence the single page fault to bring in the executable.

I ran cat /proc/interrupts before and after (in a script).  In the time 
the test ran, there were
   13 USB interrupts
   49 ethernet interrupts
   15 hard drive interrupts
    6 NMI
  423 timer ticks
+   6 Performance counter interrupts
=====
  512 interrupts
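
For anyone wanting to repeat this, a minimal Python sketch of a before/after
comparison follows.  It is not the script used for the numbers above; it
assumes the usual /proc/interrupts layout (a header row of CPU names, then
one "label: count count ... description" row per source), and the function
names are made up for illustration.

```python
def parse_interrupts(text):
    """Map each interrupt label to its list of per-CPU counts."""
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())        # header row: CPU0 CPU1 ...
    counts = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        per_cpu = []
        # Only the first ncpus fields can be counts; rows such as ERR
        # carry fewer, and the description text follows the counts.
        for field in rest.split()[:ncpus]:
            if not field.isdigit():
                break
            per_cpu.append(int(field))
        counts[label.strip()] = per_cpu
    return counts


def interrupt_delta(before, after, cpu):
    """Return {label: increase} for one CPU, omitting unchanged sources."""
    old = parse_interrupts(before)
    new = parse_interrupts(after)
    delta = {}
    for label, values in new.items():
        if cpu >= len(values):
            continue                     # row has no column for this CPU
        previous = old.get(label, [])
        start = previous[cpu] if cpu < len(previous) else 0
        if values[cpu] != start:
            delta[label] = values[cpu] - start
    return delta
```

Read /proc/interrupts into a string just before and just after the run, then
call interrupt_delta(before, after, cpu) for the CPU the test was pinned to.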

The retired instruction counter reported 506 extra instructions, 6 fewer
than the 512 interrupts, so possibly the NMI or perf counter interrupts
don't count (or are the same thing).  This makes it look like much of the
"non-determinism" can be attributed solely to interrupts.  It's a shame I
can't find an easy way to get this interrupt count on a per-process basis.
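
The arithmetic behind that guess, as a short Python check using only the
numbers reported above (the dictionary keys are just labels for this
sketch):

```python
# Interrupt counts observed while the test ran (from the list above).
interrupts = {
    "USB": 13,
    "ethernet": 49,
    "hard drive": 15,
    "NMI": 6,
    "timer": 423,
    "perf counter": 6,
}

total = sum(interrupts.values())
overcount = 10000000506 - 10_000_000_000   # measured minus expected

# Which single source, if left out, makes interrupts match the overcount?
candidates = [name for name, n in interrupts.items() if total - n == overcount]
print(total, overcount, candidates)
```

Both the NMI and the perf counter rows show 6, so this data alone cannot
tell which of the two (if either) goes uncounted.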

Next I should make a memory-heavy test to see how that changes things.

Vince

# ten_billion by Vince Weaver
# needs a 64-bit x86 system to run
# Compile with:
#     as -o ten_billion.o ten_billion.s ; ld -o ten_billion ten_billion.o

             # count for 10 billion instructions
             #   total is 
             #    2 + (( (inside) + 2) * 10000) + 2 + (9996 * 2) + 4
             #    inside = ( 2 + (499997 * 2)) = 999996
             #    total = 2 + ((999996+2) * 10000) + 2 + 19992 + 4
             #    total = 10 billion
             
        .globl _start   
_start:

        xor     %edx,%edx               # outer counter
        mov     $10000,%edx             # 
outside_loop:

        xor     %ecx,%ecx               # not needed, pads total to 1M
        mov     $499997,%ecx            # load counter
inside_loop:    
        dec     %ecx                    # repeat count times
        jnz     inside_loop

        dec     %edx
        jnz     outside_loop
        
        # above gets us to within 19,994
        
        xor     %ecx,%ecx
        mov     $9996,%ecx
final_loop:
        dec     %ecx
        jnz     final_loop
        
        

        #================================
        # Exit
        #================================

exit:
        nop                             # make it an even number of insn
        xor     %rdi,%rdi               # we return 0
        mov     $60,%rax
        syscall

_______________________________________________
perfmon2-devel mailing list
perfmon2-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/perfmon2-devel
