Re: [hwloc-users] error from the operating system - Solaris 11.3 - SOLVED

2016-01-07 Thread Brice Goglin
Thanks, I copied the useful information and some links from this thread to
https://github.com/open-mpi/hwloc/issues/143

However, I'm not sure I'll have time to look at this in the near future :/

Brice




On 07/01/2016 09:03, Matthias Reich wrote:
> Hello,
>
> To check whether kstat is able to report the psrset definitions, I
> defined a set consisting of 2 CPUs, CPU1 and CPU2 (psrset -c 1-2). The
> remaining CPUs (CPU0, CPU3..CPU23) were left unassigned.
>
> Running the "kstat" command on that machine returns, among thousands of
> lines, the following information:
>
> module: unix                            instance: 0
> name:   pset                            class:    misc
>         avenrun_15min                   70
>         avenrun_1min                    53
>         avenrun_5min                    47
>         crtime                          0
>         ncpus                           22
>         runnable                        1146912
>         snaptime                        80083.491239257
>         updates                         790784
>         waiting                         0
>
>
> module: unix                            instance: 1
> name:   pset                            class:    misc
>         avenrun_15min                   0
>         avenrun_1min                    0
>         avenrun_5min                    0
>         crtime                          79983.070416351
>         ncpus                           2
>         runnable                        0
>         snaptime                        80083.595839172
>         updates                         1005
>         waiting                         0
>
> which is not very comprehensive and does not even tell which CPUs are
> part of the particular set, but it could probably be used at least to
> warn about the existence of a processor set and to avoid the (not very
> intuitive) error message and the subsequent abort.
>
> However, doing the same on the machine without a processor set defined gives:
>
> module: unix                            instance: 0
> name:   pset                            class:    misc
>         avenrun_15min                   50
>         avenrun_1min                    38
>         avenrun_5min                    41
>         crtime                          0
>         ncpus                           24
>         runnable                        1163866
>         snaptime                        81105.346688035
>         updates                         801003
>         waiting                         0
>
> so the (only) processor set encompasses all 24 (virtual) cores. This
> may be the key to check for.
>
> The C API for checking processor set(s) is available through the
> libpool library, which supports more resource pool configuration than
> just processor sets but could probably act as an abstraction layer
> across different Solaris flavors...
>
> Matthias

Re: [hwloc-users] error from the operating system - Solaris 11.3 - SOLVED

2016-01-07 Thread Matthias Reich

Hello,

To check whether kstat is able to report the psrset definitions, I
defined a set consisting of 2 CPUs, CPU1 and CPU2 (psrset -c 1-2). The
remaining CPUs (CPU0, CPU3..CPU23) were left unassigned.

Running the "kstat" command on that machine returns, among thousands of
lines, the following information:

module: unix                            instance: 0
name:   pset                            class:    misc
        avenrun_15min                   70
        avenrun_1min                    53
        avenrun_5min                    47
        crtime                          0
        ncpus                           22
        runnable                        1146912
        snaptime                        80083.491239257
        updates                         790784
        waiting                         0


module: unix                            instance: 1
name:   pset                            class:    misc
        avenrun_15min                   0
        avenrun_1min                    0
        avenrun_5min                    0
        crtime                          79983.070416351
        ncpus                           2
        runnable                        0
        snaptime                        80083.595839172
        updates                         1005
        waiting                         0

which is not very comprehensive and does not even tell which CPUs are
part of the particular set, but it could probably be used at least to
warn about the existence of a processor set and to avoid the (not very
intuitive) error message and the subsequent abort.

However, doing the same on the machine without a processor set defined gives:

module: unix                            instance: 0
name:   pset                            class:    misc
        avenrun_15min                   50
        avenrun_1min                    38
        avenrun_5min                    41
        crtime                          0
        ncpus                           24
        runnable                        1163866
        snaptime                        81105.346688035
        updates                         801003
        waiting                         0

so the (only) processor set encompasses all 24 (virtual) cores. This may 
be the key to check for.
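The comparison suggested above (one default set covering every CPU versus a reduced default set plus extra sets) could be automated roughly as follows. This is a minimal sketch, assuming the statistics are captured in the parseable `kstat -p` format (one `module:instance:name:statistic` key, whitespace, then the value per line), that instance 0 is the default set as in the listings above, and that the total CPU count is obtained separately (for instance from psrinfo). The function name `pset_restriction_detected` is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper. Scans text in the parseable `kstat -p` format,
 * one statistic per line ("unix:<instance>:pset:ncpus<TAB><value>"),
 * and reports whether a processor set besides the default one is active.
 * Returns 1 if one is, 0 if not, and -1 if no pset statistics were found. */
static int pset_restriction_detected(const char *kstat_text, unsigned total_cpus)
{
    const char *line = kstat_text;
    int found_default = 0;

    assert(kstat_text != NULL);
    while (line != NULL && *line != '\0') {
        int instance;
        unsigned ncpus;

        /* %*[ \t] consumes the whitespace between the key and the value */
        if (sscanf(line, "unix:%d:pset:ncpus%*[ \t]%u", &instance, &ncpus) == 2) {
            if (instance != 0)
                return 1;      /* a second pset instance exists */
            found_default = 1;
            if (ncpus < total_cpus)
                return 1;      /* the default set has given CPUs away */
        }
        line = strchr(line, '\n');   /* jump to the next line, if any */
        if (line != NULL)
            line++;
    }
    return found_default ? 0 : -1;
}
```

For the state with the 2-CPU set (default set at 22 CPUs, set 1 at 2 CPUs, 24 CPUs total) this returns 1; for the single 24-CPU default set it returns 0. As noted above, it can only warn that a set exists, not say which CPUs it contains.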


The C API for checking processor set(s) is available through the libpool
library, which supports more resource pool configuration than just
processor sets but could probably act as an abstraction layer across
different Solaris flavors...

Matthias


Re: [hwloc-users] error from the operating system - Solaris 11.3 - SOLVED

2016-01-05 Thread Brice Goglin
Hello
So processor sets are not taken into account when Solaris reports
topology information in kstat etc.
Do you know if hwloc can query processor sets from the C interface?
If so, we could apply the processor set mask to hwloc object cpusets
during discovery to avoid your error.
Brice
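The consistency check behind the error, and the masking proposed above, can be illustrated with plain bit masks. This is a minimal sketch, assuming at most 64 PUs so that a cpuset fits in a `uint64_t`; real hwloc code works on `hwloc_bitmap_t` with functions such as `hwloc_bitmap_intersects()`, `hwloc_bitmap_isincluded()` and `hwloc_bitmap_and()`. The type and function names below are hypothetical, and this is not hwloc's actual discovery code.

```c
#include <stdint.h>

/* Hypothetical stand-in for an hwloc cpuset: bit i set means PU i is in the
 * object. Only valid for machines with at most 64 PUs (24 here). */
typedef uint64_t pu_mask;

/* Nonzero if the two sets share at least one PU. */
static int masks_intersect(pu_mask a, pu_mask b)
{
    return (a & b) != 0;
}

/* Nonzero if every PU of `sub` is also in `super`. */
static int mask_included(pu_mask sub, pu_mask super)
{
    return (sub & ~super) == 0;
}

/* The situation reported as "intersects ... without inclusion": the sets
 * overlap but neither contains the other, so they cannot nest in one tree. */
static int masks_conflict(pu_mask a, pu_mask b)
{
    return masks_intersect(a, b) && !mask_included(a, b) && !mask_included(b, a);
}

/* The suggestion above, sketched: clip an object's cpuset to the
 * processor-set mask during discovery. */
static pu_mask apply_pset_mask(pu_mask obj, pu_mask pset)
{
    return obj & pset;
}
```

With the values from Karl's error, Core 0x1001 (PU 0 and PU 12) and NUMANode 0x0003c001 (PU 0 and PUs 14 to 17) share PU 0, yet each contains PUs the other lacks, so `masks_conflict(0x1001, 0x3c001)` is nonzero. For Matthias's example set of CPU1 and CPU2 the mask would be 0x6. Whether clipping every object with the processor-set mask would make the Solaris-reported sets consistent was not established in this thread.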




On 05/01/2016 10:18, Karl Behler wrote:
> A processor set was defined (command psrset) on this machine.
> After removing the processor set, hwloc-info produces a result without
> error messages:
>
> hwloc-info -v
> depth 0:        1 Machine (type #1)
>  depth 1:       2 NUMANode (type #2)
>   depth 2:      2 Package (type #3)
>    depth 3:     12 Core (type #5)
>     depth 4:    24 PU (type #6)
>
> It seems that defining a psrset contradicts what hwloc and/or Open MPI
> expects or allows.
>
>
> On 04.01.16 18:16, Karl Behler wrote:
>> We used to run our MPI application with the SUNWhpc implementation
>> from Sun/Oracle (derived from Open MPI 1.5). However, the Oracle HPC
>> implementation fails on the new Solaris 11.3 platform, so we downloaded
>> Open MPI 1.10.1 and built it from scratch on this platform.
>>
>> Everything seems fine, and a simple test application runs correctly.
>> However, with the real application we run into a hwloc problem.
>>
>> So we also downloaded and built hwloc 1.11.2.
>>
>> Now, when examining the hardware locality, we get the following error:
>>
>> hwloc-info -v --whole-io
>> 
>>
>> * hwloc 1.11.2 has encountered what looks like an error from the operating system.
>> *
>> * Core (P#0 cpuset 0x1001) intersects with NUMANode (P#1 cpuset 0x0003c001) without inclusion!
>> * Error occurred in topology.c line 1046
>> *
>> * The following FAQ entry in the hwloc documentation may help:
>> *   What should I do when hwloc reports "operating system" warnings?
>> * Otherwise please report this error message to the hwloc user's mailing list,
>> * along with any relevant topology information from your platform.
>> 
>>
>> depth 0:        1 Machine (type #1)
>>  depth 1:       2 Package (type #3)
>>   depth 2:      2 NUMANode (type #2)
>>    depth 3:     1 Core (type #5)
>>     depth 4:    24 PU (type #6)
>>
>> Since I could not find the mentioned FAQ topic, I'm asking the list
>> for advice.
>>
>> Our system is Oracle Solaris 11.3 (latest patch level) on an
>> Intel hardware platform from Sun.
>>
>> output of uname -a -> SunOS sxaug28 5.11 11.3 i86pc i386 i86pc
>> output of psrinfo -v ->
>>
>> Status of virtual processor 0 as of: 01/04/2016 17:10:17
>>   on-line since 01/04/2016 14:44:28.
>>   The i386 processor operates at 1600 MHz,
>> and has an i387 compatible floating point processor.
>> Status of virtual processor 1 as of: 01/04/2016 17:10:17
>>   on-line since 01/04/2016 14:45:10.
>>   The i386 processor operates at 1600 MHz,
>> and has an i387 compatible floating point processor.
>> .
>> . (similar lines removed)
>> .
>> Status of virtual processor 23 as of: 01/04/2016 17:10:17
>>   on-line since 01/04/2016 14:45:11.
>>   The i386 processor operates at 1600 MHz,
>> and has an i387 compatible floating point processor.
>>
>> The following script was used to build hwloc (compiler: Sun Studio
>> 12.4; config.log is attached as a bz2 file):
>>
>> setenv CFLAGS "-m64 -xtarget=generic -xarch=sse2 -xprefetch -xprefetch_level=2 -xvector=simd -xdepend=yes -xbuiltin=%all -xO5"
>> setenv CXXFLAGS "$CFLAGS"
>> setenv FCFLAGS "-m64 -xtarget=generic -xarch=sse2 -xprefetch -xprefetch_level=2 -xvector=simd -stackvar -xO5"
>> setenv FFLAGS "$FCFLAGS"
>> setenv PREFIX /usr/openmpi/hwloc-1.11.2
>> ./configure --prefix=$PREFIX --disable-debug
>> dmake -j 12
>> # as root: make install
>> #: cp -p config.status $PREFIX/config.status
>>
>> Any advice much appreciated.
>>
>> Karl
>>
>>
>> ___
>> hwloc-users mailing list
>> hwloc-us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
>> Searchable archives: 
>> http://www.open-mpi.org/community/lists/hwloc-users/2016/01/1236.php
>
>
> -- 
> Dr. Karl Behler   
> CODAC & IT services ASDEX Upgrade
> phon +49 89 3299-1351 fax 3299-961351
>