On Thu, 12 Apr 2012, Hans Rosenfeld wrote:
I was hoping to investigate GCC's bdver1 output (which does try to
address L1 instruction cache issues) on Illumos but I discovered that
Illumos is not currently capable of executing this code (illegal
instruction).
Did you test this with the latest
On Fri, Apr 13, 2012 at 08:57:21AM -0500, Bob Friesenhahn wrote:
On Thu, 12 Apr 2012, Hans Rosenfeld wrote:
I was hoping to investigate GCC's bdver1 output (which does try to
address L1 instruction cache issues) on Illumos but I discovered that
Illumos is not currently capable of executing this code (illegal
instruction).
On Fri, 13 Apr 2012, Hans Rosenfeld wrote:
These messages appear in 'cpustat -h' output on Opteron 62XX:
CPU performance counter interface: AMD Family 15h (unsupported)
See BIOS and Kernel Developer's Guide (BKDG) For AMD Family 15h
Processors. (Note that this pcbe does not explicitly
As a follow-up to this discussion, one reason why my application shows
that locks are held for a long time is that it currently only uses
simple mutex locks. It should also be using condition variables to
handle the case of waiting for work to do (rather than access
locking).
Regardless,
On Wed, Apr 11, 2012 at 02:18:14PM -0400, Richard Lowe wrote:
Different application algorithms show different high-runners but the
high-runner locks are usually
not called very often but are held for an abnormally long time. For some
algorithms the high-runners are in libc (e.g. malloc)
On Thu, 12 Apr 2012, Hans Rosenfeld wrote:
FYI, there is a white paper about the L1I cache aliasing issue on family
0x15 here:
http://developer.amd.com/Assets/SharedL1InstructionCacheonAMD15hCPU.pdf
I don't know whether this applies to Illumos or this particular problem
at all, but it was the
On Thu, Apr 12, 2012 at 09:39:23AM -0500, Bob Friesenhahn wrote:
My OpenMP-based application definitely fits the description of a
potentially problematic application because it does execute the same
code in tight loops in both cores of a compute unit. That is its
whole purpose. The
The problem you had with SIGILL from AVX code is very likely because you're
on AMD chips and are running bits without the changes to support that. I'm
not sure whether anyone is currently shipping that code.
-- Rich
---
illumos-discuss
Archives:
On Thu, 12 Apr 2012, Hans Rosenfeld wrote:
On Thu, Apr 12, 2012 at 09:39:23AM -0500, Bob Friesenhahn wrote:
My OpenMP-based application definitely fits the description of a
potentially problematic application because it does execute the same
code in tight loops in both cores of a compute unit.
On Wed, 11 Apr 2012, Rich wrote:
You neglect to mention your test platform's kernel or userland version
- are you running the latest illumos head, the stock kernel+userland
provided by OpenIndiana/Nexenta of some version, etc, etc?
The OS was installed by someone else and the testing is on a
On Wed, 11 Apr 2012, Bob Friesenhahn wrote:
On Wed, 11 Apr 2012, Rich wrote:
You neglect to mention your test platform's kernel or userland version
- are you running the latest illumos head, the stock kernel+userland
provided by OpenIndiana/Nexenta of some version, etc, etc?
The OS was
Up front: this is neither my area of expertise nor my code. You may
also have more luck on developer@
First things first, I'd wonder whether we were losing time to
contention, or to the locking primitives just being unperformant for
some reason. What does plockstat report regarding
On Wed, 11 Apr 2012, Richard Lowe wrote:
Up front: this is neither my area of expertise nor my code. You may
also have more luck on developer@
Ok.
First things first, I'd wonder whether we were losing time to
contention, or to the locking primitives just being unperformant for
some
On Wed, 11 Apr 2012, Dan McDonald wrote:
One other thing you may wish to try is use libumem's version of malloc()
instead. You can run libumem w/o any recompilation by doing stupid
environment tricks:
LD_PRELOAD=libumem.so
will be enough to make libumem's version of malloc be used
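Concretely, Dan's suggestion looks like this (`./my_app` is a placeholder for your own binary; libumem ships with illumos):

```shell
# Run one command with libumem's malloc/free interposed (no recompile):
LD_PRELOAD=libumem.so ./my_app --its-usual-args

# Or set it for everything started from this shell:
export LD_PRELOAD=libumem.so
./my_app --its-usual-args
unset LD_PRELOAD
```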
Different application algorithms show different high-runners but the
high-runner locks are usually
not called very often but are held for an abnormally long time. For some
algorithms the high-runners are in libc (e.g. malloc) and OpenMP but in some
others it is my application's explicit
On 11/04/2012 15:26, Bob Friesenhahn wrote:
$ uname -a
SunOS openindiana 5.11 oi_151a2 i86pc i386 i86pc Solaris
This would pre-date the release of Bulldozer-based CPUs and it is a
wonder that the system works at all.
oi_151a2 is actually pre-stable 2, which came out 4 weeks ago.
cputrack is likely to be causing some of that impact.
The command is giving you I$ misses in general, I$ misses that hit in
L2, and I$ misses that went to memory. Given that apparently
Bulldozer shares the I$ to some degree
What's plockstat -a showing? (all events, not just contention).
I'm not
I'd also be interested in:
dtrace -n 'profile-97hz /pid == $target/ { @[ustack()] = count(); }' -o foo.log -c '... your app ...'
foo.log is perhaps going to be large.
-- Rich
You neglect to mention your test platform's kernel or userland version
- are you running the latest illumos head, the stock kernel+userland
provided by OpenIndiana/Nexenta of some version, etc, etc?
I know that the Bulldozer family is a strange beast, and a few commits
related to them have made