Hello,
To check whether kstat is able to report the psrset definitions, I
defined a set consisting of 2 CPUs, CPU1 and CPU2 (psrset -c 1-2). The
remaining CPUs (CPU0, CPU3..CPU23) were left unassigned.
On that machine, running the "kstat" command prints (among
thousands of lines) the following info:
module: unix                            instance: 0
name:   pset                            class:    misc
        avenrun_15min                   70
        avenrun_1min                    53
        avenrun_5min                    47
        crtime                          0
        ncpus                           22
        runnable                        1146912
        snaptime                        80083.491239257
        updates                         790784
        waiting                         0

module: unix                            instance: 1
name:   pset                            class:    misc
        avenrun_15min                   0
        avenrun_1min                    0
        avenrun_5min                    0
        crtime                          79983.070416351
        ncpus                           2
        runnable                        0
        snaptime                        80083.595839172
        updates                         1005
        waiting                         0
This is not very comprehensive and doesn't even tell which CPUs are
part of a particular set, but it could probably be used to at least warn
about the existence of a CPU set and avoid the (not very intuitive)
error message and subsequent abort.
However, running the same command on the machine without the pset
defined gives:
module: unix                            instance: 0
name:   pset                            class:    misc
        avenrun_15min                   50
        avenrun_1min                    38
        avenrun_5min                    41
        crtime                          0
        ncpus                           24
        runnable                        1163866
        snaptime                        81105.346688035
        updates                         801003
        waiting                         0
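So the (only) processor set then encompasses all 24 (virtual) cores. This
may be the key to check for: either additional pset instances, or an
ncpus value for instance 0 that is smaller than the machine's CPU count.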
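
A minimal sketch of such a check, in Python, working on the text that
kstat prints. It assumes (based only on the two outputs above) that
instance 0 is the default set and that any other pset instance is
user-defined. The function names and the shortened sample strings are
made up for illustration and are not part of hwloc or Solaris:

```python
import re

# Shortened copies of the two kstat outputs quoted above (only the lines
# the parser looks at are kept).
WITH_PSET = """\
module: unix instance: 0
name: pset class: misc
ncpus 22
module: unix instance: 1
name: pset class: misc
ncpus 2
"""

WITHOUT_PSET = """\
module: unix instance: 0
name: pset class: misc
ncpus 24
"""


def pset_cpu_counts(kstat_text):
    """Return {instance: ncpus} for every unix/pset block in kstat output."""
    counts = {}
    instance = None   # instance number of the current "module: unix" block
    in_pset = False   # True while inside a "name: pset" block
    for line in kstat_text.splitlines():
        header = re.match(r"\s*module:\s*(\S+)\s+instance:\s*(\d+)", line)
        if header:
            instance = int(header.group(2)) if header.group(1) == "unix" else None
            in_pset = False
            continue
        name = re.match(r"\s*name:\s*(\S+)\s+class:\s*(\S+)", line)
        if name:
            in_pset = instance is not None and name.group(1) == "pset"
            continue
        stat = re.match(r"\s*ncpus\s+(\d+)\s*$", line)
        if stat and in_pset:
            counts[instance] = int(stat.group(1))
    return counts


def user_psets_defined(kstat_text):
    """Assumption: any pset instance other than 0 is a user-defined set."""
    return any(inst != 0 for inst in pset_cpu_counts(kstat_text))


print(pset_cpu_counts(WITH_PSET))        # {0: 22, 1: 2}
print(user_psets_defined(WITH_PSET))     # True
print(user_psets_defined(WITHOUT_PSET))  # False
```

This only tells us that a set exists and how many CPUs it holds, not
which CPUs they are, so it would be good for a warning but not for
fixing up the topology.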
The C API for querying processor sets is available through the libpool
library, which covers more resource pool configuration than just
processor sets, but can probably act as an abstraction layer across
different Solaris flavors...
Matthias
Hello
So processor sets are not taken into account when Solaris reports
topology information in kstat etc.
Do you know if hwloc can query processor sets from the C interface?
If so, we could apply the processor set mask to hwloc object cpusets
during discovery to avoid your error.
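
To illustrate what applying such a mask would mean, here is a small
Python sketch with cpusets modelled as plain integer bitmasks. It is not
hwloc code; the object names and their cpusets are hypothetical, and the
allowed mask assumes the calling process sits in the default set of a
24-CPU machine where CPU1 and CPU2 were moved into a separate psrset:

```python
def cpus_to_mask(cpu_ids):
    """Build an integer bitmask with one bit set per CPU id."""
    mask = 0
    for cpu in cpu_ids:
        mask |= 1 << cpu
    return mask


def restrict_cpusets(cpusets, allowed_mask):
    """AND every object's cpuset with allowed_mask; drop objects left empty."""
    restricted = {}
    for name, cpuset in cpusets.items():
        kept = cpuset & allowed_mask
        if kept:
            restricted[name] = kept
    return restricted


all_cpus = cpus_to_mask(range(24))       # 0xffffff
user_pset = cpus_to_mask([1, 2])         # CPUs taken away by psrset
allowed = all_cpus & ~user_pset          # what the default set still has

# Hypothetical objects as discovery might see them before masking.
objects = {
    "core_a": cpus_to_mask([0, 12]),     # untouched by the psrset
    "core_b": cpus_to_mask([1, 13]),     # loses CPU1, keeps CPU13
    "core_c": cpus_to_mask([1, 2]),      # entirely inside the psrset
}

for name, cpuset in restrict_cpusets(objects, allowed).items():
    print(name, hex(cpuset))
# core_a 0x1001
# core_b 0x2000
```

Objects whose cpuset becomes empty (core_c here) are simply dropped in
this sketch; whether discovery should drop or keep them is a separate
design question.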
Brice
On 05/01/2016 10:18, Karl Behler wrote:
There was a processor set defined (command psrset) on this machine.
After removing the psrset, hwloc-info produces a result without error
messages:
hwloc-info -v
depth 0: 1 Machine (type #1)
depth 1: 2 NUMANode (type #2)
depth 2: 2 Package (type #3)
depth 3: 12 Core (type #5)
depth 4: 24 PU (type #6)
It seems that defining a psrset conflicts with what hwloc and/or
openmpi expect or allow.
On 04.01.16 18:16, Karl Behler wrote:
We used to run our MPI application with the SUNWhpc implementation
from Sun/Oracle. (This was derived from openmpi 1.5.)
However, the Oracle HPC implementation fails on the new Solaris 11.3
platform.
So we downloaded and built openmpi 1.10.1 on this platform from scratch.
All seems fine, and a simple test application runs without problems.
However, with the real application we are running into a hwloc problem.
So we also downloaded and built the hwloc package 1.11.2.
Now, when examining hardware locality, we get the following error:
hwloc-info -v --whole-io
****************************************************************************
* hwloc 1.11.2 has encountered what looks like an error from the operating system.
*
* Core (P#0 cpuset 0x00001001) intersects with NUMANode (P#1 cpuset 0x0003c001) without inclusion!
* Error occurred in topology.c line 1046
*
* The following FAQ entry in the hwloc documentation may help:
* What should I do when hwloc reports "operating system" warnings?
* Otherwise please report this error message to the hwloc user's mailing list,
* along with any relevant topology information from your platform.
****************************************************************************
depth 0: 1 Machine (type #1)
depth 1: 2 Package (type #3)
depth 2: 2 NUMANode (type #2)
depth 3: 1 Core (type #5)
depth 4: 24 PU (type #6)
Since I could not find the mentioned FAQ topic, I'm asking the list
for advice.
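
For reference, the condition named in the error message ("intersects
... without inclusion") can be reproduced from the two cpuset values it
prints. The following Python sketch treats the cpusets as integer
bitmasks; it illustrates the kind of consistency check involved and is
not the actual code from topology.c:

```python
def set_bits(mask):
    """Return the list of bit positions (CPU ids) set in mask."""
    return [i for i in range(mask.bit_length()) if mask >> i & 1]


def relation(a, b):
    """Classify how two cpuset bitmasks relate to each other."""
    common = a & b
    if common == 0:
        return "disjoint"
    if common == a and common == b:
        return "equal"
    if common == a:
        return "first included in second"
    if common == b:
        return "second included in first"
    return "intersects without inclusion"


core = 0x00001001   # Core P#0 cpuset from the error message
numa = 0x0003c001   # NUMANode P#1 cpuset from the error message

print(set_bits(core))        # [0, 12]
print(set_bits(numa))        # [0, 14, 15, 16, 17]
print(relation(core, numa))  # intersects without inclusion
```

The two sets share only CPU 0, so neither contains the other, and a
tree-shaped topology cannot place the Core inside the NUMANode or vice
versa.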
Our system runs Oracle Solaris 11.3 (latest patch level) on an
Intel hardware platform from Sun.
output of uname -a -> SunOS sxaug28 5.11 11.3 i86pc i386 i86pc
output of psrinfo -v ->
Status of virtual processor 0 as of: 01/04/2016 17:10:17
on-line since 01/04/2016 14:44:28.
The i386 processor operates at 1600 MHz,
and has an i387 compatible floating point processor.
Status of virtual processor 1 as of: 01/04/2016 17:10:17
on-line since 01/04/2016 14:45:10.
The i386 processor operates at 1600 MHz,
and has an i387 compatible floating point processor.
.
. (similar lines removed)
.
Status of virtual processor 23 as of: 01/04/2016 17:10:17
on-line since 01/04/2016 14:45:11.
The i386 processor operates at 1600 MHz,
and has an i387 compatible floating point processor.
The following script was used to build hwloc (compiler: Sunstudio
12.4; see config.log as bz2 attachment):
setenv CFLAGS "-m64 -xtarget=generic -xarch=sse2 -xprefetch -xprefetch_level=2 -xvector=simd -xdepend=yes -xbuiltin=%all -xO5"
setenv CXXFLAGS "$CFLAGS"
setenv FCFLAGS "-m64 -xtarget=generic -xarch=sse2 -xprefetch -xprefetch_level=2 -xvector=simd -stackvar -xO5"
setenv FFLAGS "$FCFLAGS"
setenv PREFIX /usr/openmpi/hwloc-1.11.2
./configure --prefix=$PREFIX --disable-debug
dmake -j 12
# as root: make install
# : cp -p config.status $PREFIX/config.status
Any advice much appreciated.
Karl
_______________________________________________
hwloc-users mailing list
hwloc-users_at_[hidden]
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
Searchable archives:
http://www.open-mpi.org/community/lists/hwloc-users/2016/01/1236.php
--
Dr. Karl Behler
CODAC & IT services ASDEX Upgrade
phon +49 89 3299-1351 fax 3299-961351
Link to this post:
http://www.open-mpi.org/community/lists/hwloc-users/2016/01/1236.php