Hello. I am investigating the performance characteristics of some processes
which, taken as a group, don't perform as well as hoped.
  Basic characteristics: These processes are 32-bit apps, multithreaded to a 
greater or lesser degree, and share a fairly large (~2GB) dataset via mmap'd 
files. The processes are compiled with Studio 8 with no optimization 
whatsoever. The representative target machine is a 1280 with 4 to 12 CPUs.
  Using collect from Studio 10, I use hardware counters to collect the 
instruction count and IC_miss numbers for a representative interval. I observe 
that the ratio between the two is ~9:1, i.e. over an interval in which ~54 
million instructions are completed, approximately 6 million IC_misses are 
reported. Similarly, capturing clock cycles vs. instructions, I see a net 
instruction rate of about 233 million instructions/sec on 900 MHz CPUs, or 
roughly 0.26 instructions per cycle. Since these guys have two integer 
pipelines, that's pretty poor.
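
  For anyone who wants to check the arithmetic, here is a minimal sketch in 
Python. The four input figures are the ones quoted above; the function names 
are just illustrative, and the "peak of 2 instructions per cycle" is only my 
assumption based on the two integer pipelines, not a measured or documented 
limit.

```python
# Figures quoted in the post (approximate, from the hardware counters).
INSTRUCTIONS = 54_000_000      # instructions completed in the interval
IC_MISSES = 6_000_000          # IC_miss events in the same interval
INSTR_PER_SEC = 233_000_000    # observed net instruction rate
CLOCK_HZ = 900_000_000         # 900 MHz CPU clock

# Assumption: two integer pipelines taken as a peak of 2 instructions/cycle.
ASSUMED_PEAK_IPC = 2.0


def instructions_per_miss(instructions, misses):
    """Average number of instructions completed per IC_miss event."""
    return instructions / misses


def ipc(instr_per_sec, clock_hz):
    """Instructions completed per clock cycle."""
    return instr_per_sec / clock_hz


def cpi(instr_per_sec, clock_hz):
    """Clock cycles spent per completed instruction."""
    return clock_hz / instr_per_sec


if __name__ == "__main__":
    observed_ipc = ipc(INSTR_PER_SEC, CLOCK_HZ)
    print("instructions per IC_miss: %.1f"
          % instructions_per_miss(INSTRUCTIONS, IC_MISSES))
    print("IPC: %.3f" % observed_ipc)
    print("CPI: %.3f" % cpi(INSTR_PER_SEC, CLOCK_HZ))
    print("fraction of assumed peak: %.1f%%"
          % (100.0 * observed_ipc / ASSUMED_PEAK_IPC))
```

  In other words, one IC_miss for roughly every 9 instructions, and close to 4 
cycles spent per instruction.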
  So the first question is: is IC_miss reporting only the on-chip instruction 
cache stats, and if so, is there a way to determine how many of those misses 
also missed (or hit... I can do the math) in the external cache?
  Also, since the instruction stream is pretty predictable once initiated, is 
there a way for us to do explicit prefetches of "future functions" that might 
help us reduce this phenomenon?  The limited references I've seen to using 
prefetch all discuss prefetching data but not instructions.
  It's possible I'm totally barking up the wrong tree, but that's the nice 
thing about starting from the beginning... there are opportunities 
everywhere :-)
This message posted from opensolaris.org