Brice -- I know this started on the hwloc list and then bounced over here, but 
we're running out of ideas.

Got any clue what is happening here?  From the OMPI config logs that Fabian 
sent, it looks like hwloc built with libnuma support properly...?
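
For reference, the checks suggested further down the thread (lstopo, the linked libraries, binding to sockets, hwloc-ps) could be run together roughly as follows. This is only a sketch: the run_if_present helper and the OMPI_BIN variable are hypothetical names introduced here, the install prefix is the one from Fabian's commands, and --report-bindings is assumed to be available in this Open MPI build.

```shell
#!/bin/sh
# Run a diagnostic only if its executable can be found; otherwise report a skip.
# (run_if_present is a hypothetical helper, not part of Open MPI or hwloc.)
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "== $*"
        "$@"
    else
        echo "skip: $1 not found in PATH"
    fi
}

# Assumed install prefix, taken from the commands quoted in this thread.
OMPI_BIN=${OMPI_BIN:-/opt/openmpi-1.10.0-gcc/bin}
MPIEXEC="$OMPI_BIN/mpiexec"

# 1. Show the topology hwloc detects; check whether any NUMA node objects appear.
run_if_present lstopo --of console

# 2. List the hwloc/libnuma shared libraries mpiexec resolves at run time.
#    Open MPI may use its embedded hwloc copy, in which case no libhwloc is listed.
if [ -x "$MPIEXEC" ]; then
    ldd "$MPIEXEC" | grep -i -E 'hwloc|numa' || echo "no hwloc/libnuma shared library listed"
fi

# 3. Try binding to sockets instead of numa and print the resulting bindings.
run_if_present "$MPIEXEC" --report-bindings -bind-to socket -n 4 hostname

# 4. Check the binding independently of what Open MPI reports.
run_if_present "$MPIEXEC" -n 4 hwloc-ps
```

If step 1 shows no NUMA node objects, a failure of "-bind-to numa" would be consistent with Dave's guess that the level simply does not exist on leo.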



> On Oct 30, 2015, at 4:34 PM, Ralph Castain <r...@open-mpi.org> wrote:
> 
> I honestly have no ideas. As best I can see, hwloc decides that it cannot 
> perform that operation and returns an error.
> 
> 
>> On Oct 30, 2015, at 1:31 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> 
>> wrote:
>> 
>> He's building and running on a single server (leo).  From the configure 
>> output, all the numa libs and headers are available on this leo server.
>> 
>> 
>>> On Oct 30, 2015, at 11:09 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>> 
>>> I think Dave has probably hit the problem - that node may well not have a 
>>> “numa” object. You might also check that node “leo” has libnuma on it.
>>> 
>>> 
>>>> On Oct 30, 2015, at 6:48 AM, Dave Love <d.l...@liverpool.ac.uk> wrote:
>>>> 
>>>> Fabian Wein <fabian.w...@fau.de> writes:
>>>> 
>>>>> Is this a valid test?
>>>>> 
>>>>> 
>>>>> /opt/openmpi-1.10.0-gcc/bin/mpiexec -n 4 hostname
>>>>> leo
>>>>> leo
>>>>> leo
>>>>> leo
>>>> 
>>>> So, unless you turned off the default binding -- to socket? check the
>>>> mpirun man page -- it worked, but the "numa" level failed.  I don't know
>>>> if that level has to exist, and there have been bugs in that area
>>>> before.  Running lstopo might be useful, and checking that you're
>>>> picking up the right hwloc dynamic library.
>>>> 
>>>> What happens if you try to bind to sockets, assuming you don't want to
>>>> bind to cores?  [I don't understand why the default isn't to cores when
>>>> you have only one process per core.]
>>>> 
>>>>> /opt/openmpi-1.10.0-gcc/bin/mpiexec -bind-to numa -n 4 hostname
>>>>> --------------------------------------------------------------------------
>>>>> A request was made to bind a process, but at least one node does NOT
>>>>> support binding processes to cpus.
>>>>> 
>>>>> Node:  leo
>>>>> This usually is due to not having libnumactl and libnumactl-devel
>>>>> installed on the node.
>>>> 
>>>> By the way, you can check the binding done, independently of what
>>>> openmpi says, with
>>>> mpirun ... hwloc-ps
>>>> _______________________________________________
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>> Link to this post: 
>>>> http://www.open-mpi.org/community/lists/users/2015/10/27957.php
>>> 
>>> Link to this post: 
>>> http://www.open-mpi.org/community/lists/users/2015/10/27958.php
>> 
>> 
>> -- 
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to: 
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>> 
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/users/2015/10/27961.php
> 
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2015/10/27962.php


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
