Re: [perf-discuss] Single-thread performance on Niagara

Steve Sistare Thu, 15 May 2008 10:14:47 -0700

Some instructions stall the hardware thread for a few cycles. In your
simple test, the likely culprits are taken branches and loads (even
loads that hit in the L1 cache).  Disassemble your loop and it will
probably be obvious.


When multiple threads run on the core, the stall cycles for one thread
are consumed by useful work performed by another thread.
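
To make the effect concrete, here is a toy model of one core, offered as
a rough sketch rather than a description of the real T1 pipeline. It
assumes the core issues at most one instruction per cycle, picks threads
round-robin, and that every instruction is followed by 3 stall cycles.
The 3-cycle figure is an assumption chosen only because it reproduces
the 250 million instructions per second reported at 1 GHz; on real
hardware only some instructions stall. The names simulate and core_ipc
are hypothetical.

```python
def simulate(num_threads, stall_cycles, total_cycles):
    """Toy core: at most one instruction issued per cycle, round-robin.

    After a thread issues, it cannot issue again for `stall_cycles`
    cycles (assumed uniform here; real stalls depend on the instruction).
    Returns the number of instructions issued by each thread.
    """
    ready_at = [0] * num_threads  # first cycle each thread may issue again
    issued = [0] * num_threads
    next_thread = 0               # round-robin starting point
    for cycle in range(total_cycles):
        for offset in range(num_threads):
            t = (next_thread + offset) % num_threads
            if ready_at[t] <= cycle:
                issued[t] += 1
                ready_at[t] = cycle + 1 + stall_cycles
                next_thread = (t + 1) % num_threads
                break  # only one issue slot per cycle
    return issued


def core_ipc(num_threads, stall_cycles=3, total_cycles=1000):
    """Instructions per cycle for the whole core in the toy model."""
    return sum(simulate(num_threads, stall_cycles, total_cycles)) / total_cycles


if __name__ == "__main__":
    clock_hz = 1_000_000_000  # 1 GHz, as in the T1000 under discussion
    for n in (1, 2, 4):
        ipc = core_ipc(n)
        print(f"{n} thread(s): IPC {ipc:.2f}, "
              f"{ipc * clock_hz / 1e6:.0f} M instructions/s")
```

Under these assumptions one thread reaches an IPC of 0.25 (250 million
instructions per second at 1 GHz), two threads reach 0.5, and four
threads fill every issue slot.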

- Steve Sistare

Elad Lahav wrote:
> I am toying around with a T1000 machine (T1 1GHz processor, 8 cores,
> 4 threads per core, 8GB RAM). I was unable to saturate a single Gigabit NIC
> with netperf, so I started investigating with the help of performance
> counters. It turns out that even a simple for loop that only increments a
> counter can do at most 250 million instructions per second (hardly any
> cache/TLB misses, as expected). From my understanding of the Niagara
> architecture, a single thread executing on a core should be able to fully
> utilise it (1 billion instructions per second in my case).
> 
> What am I missing?
> 
> Thanks, Elad
> 
> P.S. I am tracking performance with cputrack -c Instr_cnt,sys
> 
> 
> This message posted from opensolaris.org 
> _______________________________________________ perf-discuss mailing list 
> perf-discuss@opensolaris.org

