Re: [discuss] User-space locks slow on Opteron 6200?

2012-04-13 Thread Bob Friesenhahn
On Thu, 12 Apr 2012, Hans Rosenfeld wrote: I was hoping to investigate GCC's bdver1 output (which does try to address L1 instruction cache issues) on Illumos but I discovered that Illumos is not currently capable of executing this code (illegal instruction). Did you test this with the latest

Re: [discuss] User-space locks slow on Opteron 6200?

2012-04-13 Thread Hans Rosenfeld
On Fri, Apr 13, 2012 at 08:57:21AM -0500, Bob Friesenhahn wrote: On Thu, 12 Apr 2012, Hans Rosenfeld wrote: I was hoping to investigate GCC's bdver1 output (which does try to address L1 instruction cache issues) on Illumos but I discovered that Illumos is not currently capable of executing

Re: [discuss] User-space locks slow on Opteron 6200?

2012-04-13 Thread Bob Friesenhahn
On Fri, 13 Apr 2012, Hans Rosenfeld wrote: These messages appear in 'cpustat -h' output on Opteron 62XX: CPU performance counter interface: AMD Family 15h (unsupported) See BIOS and Kernel Developer's Guide (BKDG) For AMD Family 15h Processors. (Note that this pcbe does not explicitly

Re: [discuss] User-space locks slow on Opteron 6200?

2012-04-13 Thread Bob Friesenhahn
As a follow-up to this discussion, one reason why my application shows that locks are held for a long time is that it currently only uses simple mutex locks. It should also be using condition variables to handle the case of waiting for work to do (rather than access locking). Regardless,

Re: [discuss] User-space locks slow on Opteron 6200?

2012-04-12 Thread Hans Rosenfeld
On Wed, Apr 11, 2012 at 02:18:14PM -0400, Richard Lowe wrote: Different application algorithms show different high-runners but the high-runner locks are usually not called very often but are held for an abnormally long time.  For some algorithms the high-runners are in libc (e.g. malloc)

Re: [discuss] User-space locks slow on Opteron 6200?

2012-04-12 Thread Bob Friesenhahn
On Thu, 12 Apr 2012, Hans Rosenfeld wrote: FYI, there is a white paper about the L1I cache aliasing issue on family 0x15 here: http://developer.amd.com/Assets/SharedL1InstructionCacheonAMD15hCPU.pdf I don't know whether this applies to Illumos or this particular problem at all, but it was the

Re: [discuss] User-space locks slow on Opteron 6200?

2012-04-12 Thread Hans Rosenfeld
On Thu, Apr 12, 2012 at 09:39:23AM -0500, Bob Friesenhahn wrote: My OpenMP-based application definitely fits the description of a potentially problematic application because it does execute the same code in tight loops in both cores of a compute unit. That is its whole purpose. The

Re: [discuss] User-space locks slow on Opteron 6200?

2012-04-12 Thread Richard Lowe
The problem you had with SIGILL from AVX code is very likely because you're on AMD chips and are running bits without the changes to support that. I'm not sure whether anyone is currently shipping that code. -- Rich

Re: [discuss] User-space locks slow on Opteron 6200?

2012-04-12 Thread Bob Friesenhahn
On Thu, 12 Apr 2012, Hans Rosenfeld wrote: On Thu, Apr 12, 2012 at 09:39:23AM -0500, Bob Friesenhahn wrote: My OpenMP-based application definitely fits the description of a potentially problematic application because it does execute the same code in tight loops in both cores of a compute unit.

Re: [discuss] User-space locks slow on Opteron 6200?

2012-04-11 Thread Bob Friesenhahn
On Wed, 11 Apr 2012, Rich wrote: You neglect to mention your test platform's kernel or userland version - are you running the latest illumos head, the stock kernel+userland provided by OpenIndiana/Nexenta of some version, etc, etc? The OS was installed by someone else and the testing is on a

Re: [discuss] User-space locks slow on Opteron 6200?

2012-04-11 Thread Bob Friesenhahn
On Wed, 11 Apr 2012, Bob Friesenhahn wrote: On Wed, 11 Apr 2012, Rich wrote: You neglect to mention your test platform's kernel or userland version - are you running the latest illumos head, the stock kernel+userland provided by OpenIndiana/Nexenta of some version, etc, etc? The OS was

Re: [discuss] User-space locks slow on Opteron 6200?

2012-04-11 Thread Richard Lowe
Upfront: This is not my area, neither of expertise nor code. You may also have more luck on developer@ First things first, I'd wonder whether we were losing time to contention, or to the locking primitives just being unperformant for some reason. What does plockstat report regarding

Re: [discuss] User-space locks slow on Opteron 6200?

2012-04-11 Thread Bob Friesenhahn
On Wed, 11 Apr 2012, Richard Lowe wrote: Upfront: This is not my area, neither of expertise nor code. You may also have more luck on developer@ Ok. First things first, I'd wonder whether we were losing time to contention, or to the locking primitives just being unperformant for some

Re: [discuss] User-space locks slow on Opteron 6200?

2012-04-11 Thread Bob Friesenhahn
On Wed, 11 Apr 2012, Dan McDonald wrote: One other thing you may wish to try is use libumem's version of malloc() instead. You can run libumem w/o any recompilation by doing stupid environment tricks: LD_PRELOAD=libumem.so will be enough to make libumem's version of malloc be used

Re: [discuss] User-space locks slow on Opteron 6200?

2012-04-11 Thread Richard Lowe
Different application algorithms show different high-runners but the high-runner locks are usually not called very often but are held for an abnormally long time.  For some algorithms the high-runners are in libc (e.g. malloc) and OpenMP but in some others it is my application's explicit

Re: [discuss] User-space locks slow on Opteron 6200?

2012-04-11 Thread Alasdair Lumsden
On 11/04/2012 15:26, Bob Friesenhahn wrote: $ uname -a SunOS openindiana 5.11 oi_151a2 i86pc i386 i86pc Solaris This would pre-date the release of Bulldozer-based CPUs and it is a wonder that the system works at all. oi_151a2 is actually pre-stable 2 which came out 4 weeks ago. As such it

Re: [discuss] User-space locks slow on Opteron 6200?

2012-04-11 Thread Richard Lowe
cputrack is likely to be causing some of that impact. The command is giving you I$ misses in general, I$ misses that hit in L2, and I$ misses that went to memory. Given that apparently bulldozer shares the I$ to some degree What's plockstat -a showing? (all events, not just contention). I'm not

Re: [discuss] User-space locks slow on Opteron 6200?

2012-04-11 Thread Richard Lowe
I'd also be interested in: dtrace -n 'profile-97hz /pid == $target/ { @[ustack()] = count(); }' -o foo.log -c '... your app ...' foo.log is perhaps going to be large. -- Rich

Re: [discuss] User-space locks slow on Opteron 6200?

2012-04-10 Thread Rich
You neglect to mention your test platform's kernel or userland version - are you running the latest illumos head, the stock kernel+userland provided by OpenIndiana/Nexenta of some version, etc, etc? I know that the Bulldozer family is a strange beast, and a few commits related to them have made