Re: [hwloc-users] error from the operating system - Solaris 11.3 - SOLVED
Thanks, I copied useful information from this thread and some links to
https://github.com/open-mpi/hwloc/issues/143

However, I'm not sure I'll have time to look at this in the near future :/

Brice


On 07/01/2016 09:03, Matthias Reich wrote:
> Hello,
>
> To check whether kstat is able to report the psrset definitions, I
> defined a set consisting of 2 CPUs (psrset -c 1-2), CPU1 and CPU2. The
> remaining CPUs (CPU0, CPU3..CPU23) were left undefined.
>
> On the machine, we can execute the "kstat" command and receive (among
> 1000s of lines) the following info:
>
> module: unix                            instance: 0
> name:   pset                            class:    misc
>         avenrun_15min                   70
>         avenrun_1min                    53
>         avenrun_5min                    47
>         crtime                          0
>         ncpus                           22
>         runnable                        1146912
>         snaptime                        80083.491239257
>         updates                         790784
>         waiting                         0
>
> module: unix                            instance: 1
> name:   pset                            class:    misc
>         avenrun_15min                   0
>         avenrun_1min                    0
>         avenrun_5min                    0
>         crtime                          79983.070416351
>         ncpus                           2
>         runnable                        0
>         snaptime                        80083.595839172
>         updates                         1005
>         waiting                         0
>
> which is not very comprehensive and doesn't even tell which CPUs are
> part of the particular set, but could probably be used to at least warn
> about the existence of a CPU set and prevent the (not very intuitive)
> error message and consequent abort.
>
> However, doing the same on the machine without the pset defined, we get:
>
> module: unix                            instance: 0
> name:   pset                            class:    misc
>         avenrun_15min                   50
>         avenrun_1min                    38
>         avenrun_5min                    41
>         crtime                          0
>         ncpus                           24
>         runnable                        1163866
>         snaptime                        81105.346688035
>         updates                         801003
>         waiting                         0
>
> so the (only) processor set encompasses all 24 (virtual) cores. This
> may be the key to check for.
>
> The C API to check for processor set(s) is available through the
> libpool library, which allows more resource pool configuration than
> just processor sets, but can probably act as an abstraction layer for
> different Solaris flavors...
>
> Matthias
>
>> Hello
>> So processor sets are not taken into account when Solaris reports
>> topology information in kstat etc.
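For readers unfamiliar with the "intersects ... without inclusion" wording in the error reported in this thread, the following is a minimal sketch of that condition on plain integer bitmasks. It is an illustration only, not hwloc's actual code; the function names `intersects_without_inclusion` and `cpus` are made up for this example. The two cpuset values are the ones printed in the error message.

```python
def intersects_without_inclusion(a, b):
    """Return True if cpusets a and b overlap but neither contains the other."""
    overlap = a & b
    if overlap == 0:
        return False  # disjoint sets are fine
    # Nested sets are fine too: the overlap then equals one of the two sets.
    return overlap != a and overlap != b


def cpus(mask):
    """List the CPU indexes whose bits are set in mask."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


if __name__ == "__main__":
    core = 0x1001          # Core P#0 cpuset from the error message
    numanode = 0x0003c001  # NUMANode P#1 cpuset from the error message
    print("Core CPUs:    ", cpus(core))
    print("NUMANode CPUs:", cpus(numanode))
    print("Conflict:     ", intersects_without_inclusion(core, numanode))
```

The core shares one CPU with the NUMA node but also claims a CPU outside it, so the core can be placed neither inside nor beside the node in a tree-shaped topology.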
Re: [hwloc-users] error from the operating system - Solaris 11.3 - SOLVED
Hello,

To check whether kstat is able to report the psrset definitions, I
defined a set consisting of 2 CPUs (psrset -c 1-2), CPU1 and CPU2. The
remaining CPUs (CPU0, CPU3..CPU23) were left undefined.

On the machine, we can execute the "kstat" command and receive (among
1000s of lines) the following info:

module: unix                            instance: 0
name:   pset                            class:    misc
        avenrun_15min                   70
        avenrun_1min                    53
        avenrun_5min                    47
        crtime                          0
        ncpus                           22
        runnable                        1146912
        snaptime                        80083.491239257
        updates                         790784
        waiting                         0

module: unix                            instance: 1
name:   pset                            class:    misc
        avenrun_15min                   0
        avenrun_1min                    0
        avenrun_5min                    0
        crtime                          79983.070416351
        ncpus                           2
        runnable                        0
        snaptime                        80083.595839172
        updates                         1005
        waiting                         0

which is not very comprehensive and doesn't even tell which CPUs are
part of the particular set, but could probably be used to at least warn
about the existence of a CPU set and prevent the (not very intuitive)
error message and consequent abort.

However, doing the same on the machine without the pset defined, we get:

module: unix                            instance: 0
name:   pset                            class:    misc
        avenrun_15min                   50
        avenrun_1min                    38
        avenrun_5min                    41
        crtime                          0
        ncpus                           24
        runnable                        1163866
        snaptime                        81105.346688035
        updates                         801003
        waiting                         0

so the (only) processor set encompasses all 24 (virtual) cores. This
may be the key to check for.

The C API to check for processor set(s) is available through the
libpool library, which allows more resource pool configuration than
just processor sets, but can probably act as an abstraction layer for
different Solaris flavors...

Matthias

> Hello
> So processor sets are not taken into account when Solaris reports
> topology information in kstat etc.
> Do you know if hwloc can query processor sets from the C interface?
> If so, we could apply the processor set mask to hwloc object cpusets
> during discovery to avoid your error.
> Brice
>
> On 05/01/2016 10:18, Karl Behler wrote:
>> There was a processor set defined (command psrset) on this machine.
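The kstat-based check described in this message (warn when more than one pset entry exists) could be sketched roughly as follows. This is only an illustration under assumptions: it parses text in the default kstat layout as reconstructed from the output quoted in the thread, the function names `pset_cpu_counts` and `has_user_psets` are hypothetical, and the sample text keeps only the `ncpus` statistic from the quoted output.

```python
import re

# Trimmed sample in the default kstat layout; ncpus values are from the thread.
SAMPLE_WITH_PSET = """\
module: unix                            instance: 0
name:   pset                            class:    misc
        ncpus                           22

module: unix                            instance: 1
name:   pset                            class:    misc
        ncpus                           2
"""


def pset_cpu_counts(kstat_text):
    """Map pset instance number to its ncpus value."""
    counts = {}
    module = None
    instance = None
    in_pset = False
    for line in kstat_text.splitlines():
        header = re.match(r"module:\s+(\S+)\s+instance:\s+(\d+)", line)
        if header:
            module, instance = header.group(1), int(header.group(2))
            in_pset = False
            continue
        name = re.match(r"name:\s+(\S+)\s+class:\s+(\S+)", line)
        if name:
            # Only unix:*:pset records are of interest here.
            in_pset = module == "unix" and name.group(1) == "pset"
            continue
        fields = line.split()
        if in_pset and len(fields) == 2 and fields[0] == "ncpus":
            counts[instance] = int(fields[1])
    return counts


def has_user_psets(kstat_text):
    """More than one pset record suggests a user-defined processor set."""
    return len(pset_cpu_counts(kstat_text)) > 1


if __name__ == "__main__":
    print(pset_cpu_counts(SAMPLE_WITH_PSET))
    print(has_user_psets(SAMPLE_WITH_PSET))
```

On a real system the text would come from the `kstat` command itself. As noted in the message, this only reveals that a set exists and how many CPUs it holds, not which CPUs belong to it.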
Re: [hwloc-users] error from the operating system - Solaris 11.3 - SOLVED
Hello

So processor sets are not taken into account when Solaris reports
topology information in kstat etc.
Do you know if hwloc can query processor sets from the C interface?
If so, we could apply the processor set mask to hwloc object cpusets
during discovery to avoid your error.

Brice


On 05/01/2016 10:18, Karl Behler wrote:
> There was a processor set defined (command psrset) on this machine.
> Having removed the psrset, hwloc-info produces a result without error
> messages:
>
> hwloc-info -v
> depth 0:        1 Machine (type #1)
>  depth 1:       2 NUMANode (type #2)
>   depth 2:      2 Package (type #3)
>    depth 3:     12 Core (type #5)
>     depth 4:    24 PU (type #6)
>
> It seems the concept of defining a psrset is in contradiction to what
> hwloc and/or openmpi expects/allows.
>
>
> On 04.01.16 18:16, Karl Behler wrote:
>> We used to run our MPI application with the SUNWhpc implementation
>> from Sun/Oracle. (This was derived from openmpi 1.5.)
>> However, the Oracle HPC implementation fails for the new Solaris 11.3
>> platform.
>> So we downloaded and made openmpi 1.10.1 on this platform from scratch.
>>
>> All seems fine and a simple test application runs fine.
>> However, with the real application we are running into a hwloc problem.
>>
>> So we also downloaded and made the hwloc package 1.11.2.
>>
>> Now examining hardware locality we get the following error:
>>
>> hwloc-info -v --whole-io
>>
>>
>> * hwloc 1.11.2 has encountered what looks like an error from the operating system.
>> *
>> * Core (P#0 cpuset 0x1001) intersects with NUMANode (P#1 cpuset 0x0003c001) without inclusion!
>> * Error occurred in topology.c line 1046
>> *
>> * The following FAQ entry in the hwloc documentation may help:
>> *   What should I do when hwloc reports "operating system" warnings?
>> * Otherwise please report this error message to the hwloc user's mailing list,
>> * along with any relevant topology information from your platform.
>>
>>
>> depth 0:        1 Machine (type #1)
>>  depth 1:       2 Package (type #3)
>>   depth 2:      2 NUMANode (type #2)
>>    depth 3:     1 Core (type #5)
>>     depth 4:    24 PU (type #6)
>>
>> Since I could not find the mentioned FAQ topic I'm asking the list
>> for advice.
>>
>> Our system is an Oracle/ Solaris 11.3 (latest patch level) on an
>> Intel hardware platform from Sun.
>>
>> output of uname -a -> SunOS sxaug28 5.11 11.3 i86pc i386 i86pc
>> output of psrinfo -v ->
>>
>> Status of virtual processor 0 as of: 01/04/2016 17:10:17
>>   on-line since 01/04/2016 14:44:28.
>>   The i386 processor operates at 1600 MHz,
>>         and has an i387 compatible floating point processor.
>> Status of virtual processor 1 as of: 01/04/2016 17:10:17
>>   on-line since 01/04/2016 14:45:10.
>>   The i386 processor operates at 1600 MHz,
>>         and has an i387 compatible floating point processor.
>> .
>> . (similar lines removed)
>> .
>> Status of virtual processor 23 as of: 01/04/2016 17:10:17
>>   on-line since 01/04/2016 14:45:11.
>>   The i386 processor operates at 1600 MHz,
>>         and has an i387 compatible floating point processor.
>>
>> Following comes the script which was used to make hwloc: (used
>> compiler: Sunstudio 12.4, see config.log as bz2 attachment)
>>
>> setenv CFLAGS "-m64 -xtarget=generic -xarch=sse2 -xprefetch -xprefetch_level=2 -xvector=simd -xdepend=yes -xbuiltin=%all -xO5"
>> setenv CXXFLAGS "$CFLAGS"
>> setenv FCFLAGS "-m64 -xtarget=generic -xarch=sse2 -xprefetch -xprefetch_level=2 -xvector=simd -stackvar -xO5"
>> setenv FFLAGS "$FCFLAGS"
>> setenv PREFIX /usr/openmpi/hwloc-1.11.2
>> ./configure --prefix=$PREFIX --disable-debug
>> dmake -j 12
>> # as root: make install
>> #: cp -p config.status $PREFIX/config.status
>>
>> Any advice much appreciated.
>>
>> Karl
>>
>>
>> ___
>> hwloc-users mailing list
>> hwloc-us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
>> Searchable archives:
>> http://www.open-mpi.org/community/lists/hwloc-users/2016/01/1236.php
>
>
> --
> Dr. Karl Behler
> CODAC & IT services ASDEX Upgrade
> phon +49 89 3299-1351 fax 3299-961351
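The idea raised in this thread of applying the processor set mask to object cpusets during discovery can be pictured with plain integer bitmasks. The sketch below is not hwloc code and makes several assumptions: the function name `restrict_cpusets` and the three per-core cpusets are hypothetical, and the "allowed" mask is taken to be the CPUs left outside the user-defined set, which is the view of a process not bound to that set.

```python
def restrict_cpusets(objects, allowed):
    """AND each object's cpuset with the allowed mask; drop objects left empty."""
    restricted = {}
    for name, cpuset in objects.items():
        masked = cpuset & allowed
        if masked:
            restricted[name] = masked
    return restricted


ALL_CPUS = (1 << 24) - 1       # 24 virtual processors, as on the machine in the thread
PSET = 0b110                   # CPU1 and CPU2, as created by `psrset -c 1-2`
ALLOWED = ALL_CPUS & ~PSET     # the 22 CPUs outside the set

# Hypothetical per-core cpusets, for illustration only.
OBJECTS = {
    "Core A": 0b000011,  # CPU0 and CPU1
    "Core B": 0b000110,  # CPU1 and CPU2
    "Core C": 0b011000,  # CPU3 and CPU4
}

if __name__ == "__main__":
    for name, cpuset in restrict_cpusets(OBJECTS, ALLOWED).items():
        print(name, hex(cpuset))
```

With these inputs "Core A" shrinks to CPU0 only, "Core B" disappears because both of its CPUs are in the set, and "Core C" is unchanged. Whether such masking is enough to avoid the reported error on a real system is not established in the thread.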