Re: [hwloc-users] Travis CI unit tests failing with HW "operating system" error

2018-09-14 Thread Madhu, Kavitha Tiptur
We will upgrade the hwloc submodule used in MPICH ASAP. IIRC, we have suppressed 
hwloc warnings as well. I will double-check this.

Kavitha

On Sep 14, 2018, at 12:36 AM, Brice Goglin <brice.gog...@inria.fr> wrote:


If lstopo fails there, run "hwloc-gather-topology foo" and send foo.tar.bz2

As a workaround for ARMCI, you may try setting HWLOC_COMPONENTS=no_os,stop in 
the environment so that hwloc behaves as if the operating system had no 
topology support.
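
For programs that cannot easily change their launch environment, the same workaround can be applied from code. The following is a minimal sketch, assuming hwloc reads HWLOC_COMPONENTS when the topology is loaded, so the call has to happen before that point (for an MPI program, presumably before MPI_Init). The helper name is hypothetical and not part of hwloc.

```c
#define _POSIX_C_SOURCE 200112L  /* needed for setenv() under strict C modes */
#include <stdlib.h>

/*
 * Hypothetical helper: ask hwloc to behave as if the operating system
 * had no topology support, using the value suggested in this thread.
 * Assumption: hwloc reads the variable while loading the topology, so
 * this must be called earlier. Returns 0 on success, -1 on failure.
 */
static int disable_hwloc_os_discovery(void)
{
    /* overwrite = 1 replaces any value inherited from the CI environment */
    return setenv("HWLOC_COMPONENTS", "no_os,stop", 1);
}
```

In a CI job it is usually simpler to export the variable in the job's environment instead of changing the program.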

Brice


On 14/09/2018 at 06:11, Jeff Hammond wrote:
All of the job failures have this warning, so I am inclined to think they are 
related.  I don't know what I should expect from lstopo inside of AWS, but I 
guess I'll try it.

I was using the hwloc shipped with mpich-3.3b1.  Talk to the MPICH team if 
you want them to upgrade :-)

Jeff

On Thu, Sep 13, 2018 at 8:42 AM, Brice Goglin <brice.gog...@inria.fr> wrote:

This is actually just a warning. Usually it causes the topology to be wrong 
(like a missing object), but it shouldn't prevent the program from working. Are 
you sure your programs are failing because of hwloc? Do you have a way to run 
lstopo on that node?

By the way, you shouldn't use hwloc 2.0.0rc2, at least because it's old, it has 
a broken ABI, and it's a RC :)

Brice


On 13/09/2018 at 16:12, Jeff Hammond wrote:
I am running ARMCI-MPI over MPICH in a Travis CI Linux instance and topology is 
causing it to fail.  I do not care about topology in a virtualized environment.  
How do I fix this?


* hwloc 2.0.0rc2-git has encountered what looks like an error from the 
operating system.
*
* Group0 (cpuset 0x,0x) intersects with L3 (cpuset 
0x1000,0x0212) without inclusion!
* Error occurred in topology.c line 1384
*
* The following FAQ entry in the hwloc documentation may help:
*   What should I do when hwloc reports "operating system" warnings?
* Otherwise please report this error message to the hwloc user's mailing list
* along with the files generated by the hwloc-gather-topology script.


https://travis-ci.org/jeffhammond/armci-mpi/jobs/425342479 has all of the 
details.

Jeff


--
Jeff Hammond
jeff.scie...@gmail.com
http://jeffhammond.github.io/



___
hwloc-users mailing list
hwloc-users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/hwloc-users



Re: [hwloc-users] Netloc integration with hwloc

2018-04-04 Thread Madhu, Kavitha Tiptur

>> 
>> — I tried building older netloc with hwloc 2.0 and it throws compiler 
>> errors. Note that netloc was cloned from its git repo.
> 
> My guess is that the "map" part that joins netloc's info about the
> fabric with hwloc's info about the nodes doesn't like hwloc 2.0. But
> that should be easy to disable in the Makefiles and/or to update for
> hwloc 2.0.
> 

—We do need the map functionality since we need to identify which processor 
core is mapped to which network node (from my understanding of the 
documentation and the definition of mapping; please correct me if I am wrong 
here). My other concern is that, in the older version of netloc, 
netloc_ib_gather_raw does not list any subnets on the cluster, whereas the 
newer version built within hwloc reports some. I compared the Perl scripts and 
there doesn’t seem to be much difference between the two, other than the newer 
version adding some pattern matching for hfi.


>>>> The plan should rather be to tell us what you need from netloc so that
>>>> we can reenable it with a good API. We hear lots of people saying they
>>>> are interested in netloc, but *nobody* ever told us anything about what
>>>> they want to do for real. And I am not even sure anybody ever played
>>>> with the old API. This software cannot go forward unless we know where
>>>> it's going. There are many ways to design the netloc API.
>> — At this point, our requirement is to expose graph construction from raw 
>> topology xml and mapping and traversal at best.
>> I see some of these already defined in private/hwloc.h in the newer version. 
>> Our problem here is that we couldn’t build it in embedded mode, which is how 
>> we are using hwloc.
> 
> Can't you hack your build system to build hwloc in standalone instead of
> embedded mode for testing? Or use an external hwloc instead of your
> embedded one?

— We can do this, shouldn’t be a major concern. But we can only make this work 
if we use the newer hwloc version and expose some of the functions as I 
mentioned. 

> I'd like to get feedback about private/netloc.h before making some of it
> public.
> 
> I'll look at making libnetloc embeddable in 2.1.
> 
> Brice
> 

Re: [hwloc-users] Netloc integration with hwloc

2018-04-04 Thread Madhu, Kavitha Tiptur
Hi
Chiming in on this conversation, we have a few questions/concerns with some of 
the responses we received from you.
>> 
>> If you really want the old netloc API now, you could try hwloc 2.x with
>> the old netloc. But that's certainly not maintained anymore, and that
>> only works for IB while the new netloc should have OPA and Cray support
>> soon.

— I tried building older netloc with hwloc 2.0 and it throws compiler errors. 
Note that netloc was cloned from its git repo.

 
>> 
>> The plan should rather be to tell us what you need from netloc so that
>> we can reenable it with a good API. We hear lots of people saying they
>> are interested in netloc, but *nobody* ever told us anything about what
>> they want to do for real. And I am not even sure anybody ever played
>> with the old API. This software cannot go forward unless we know where
>> it's going. There are many ways to design the netloc API.

— At this point, our requirement is to expose graph construction from raw 
topology xml and mapping and traversal at best.
I see some of these already defined in private/hwloc.h in the newer version. 
Our problem here is that we couldn’t build it in embedded mode, which is how we 
are using hwloc.




> On Apr 4, 2018, at 9:13 AM, Balaji, Pavan <bal...@anl.gov> wrote:
> 
> Brice,
> 
> We don't actually care if it is a graph or a different API.  We'll anyway 
> simply parse the graph and create our own internal structures that we can map 
> to our internal algorithms.  We simply need some model (any model) to 
> retrieve the network topology.  That's it.  We'll take care of everything 
> else in MPICH.
> 
>  -- Pavan
> 
>> On Apr 4, 2018, at 12:46 AM, Brice Goglin <brice.gog...@inria.fr> wrote:
>> 
>> If you really want the old netloc API now, you could try hwloc 2.x with
>> the old netloc. But that's certainly not maintained anymore, and that
>> only works for IB while the new netloc should have OPA and Cray support
>> soon.
>> 
>> The plan should rather be to tell us what you need from netloc so that
>> we can reenable it with a good API. We hear lots of people saying they
>> are interested in netloc, but *nobody* ever told us anything about what
>> they want to do for real. And I am not even sure anybody ever played
>> with the old API. This software cannot go forward unless we know where
>> it's going. There are many ways to design the netloc API.
>> 
>> * We had an explicit graph API in the old netloc but that API implied
>> expensive graph algorithmics in the runtimes using it. It seemed
>> unusable for taking decisions at runtime anyway, but again nobody ever
>> tried. Also it was rather strange to expose the full graph when you know
>> the fabric is a 3D dragonfly on Cray, etc.
>> 
>> * In the new netloc, we're thinking of having higher-level implicit
>> topologies for each class of fabric (dragon-fly, fat-tree, clos-network,
>> etc) that require more work on the netloc side and easier work in the
>> runtime using it. However that's less portable than exposing the full
>> graph. Not sure which one is best, or if both are needed.
>> 
>> * There are also issues regarding nodes/links failure etc. How do we
>> expose topology changes at runtime? Do we have a daemon running as root
>> in the background, etc?
>> 
>> Lots of questions that need to be discussed before we expose a new API in
>> the wild. Unfortunately, we lost several years because of the lack of
>> users' feedback. I don't want to invest time and rush for a new API if
>> MPICH never actually uses it like other people did in the past.
>> 
>> Brice
>> 
>> 
>> 
>> 
>> On 04/04/2018 at 01:36, Balaji, Pavan wrote:
>>> Brice,
>>> 
>>> We want to use both hwloc and netloc in mpich.  What are our options here?  
>>> Move back to hwloc-1.x?  That’d be a bummer because we already invested a 
>>> lot of effort to migrate to hwloc-2.x.
>>> 
>>> — Pavan
>>> 
>>> Sent from my iPhone
>>> 
>>>> On Apr 3, 2018, at 6:19 PM, Brice Goglin <brice.gog...@inria.fr> wrote:
>>>> 
>>>> It's not possible now but that would certainly be considered whenever
>>>> people start using the API and linking against libnetloc.
>>>> 
>>>> Brice
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> On 03/04/2018 at 21:34, Madhu, Kavitha Tiptur wrote:
>>>>> Hi
>>>>> A follow up question, is it possible to build netloc along with hwloc in 
>>>>> embedd

Re: [hwloc-users] Netloc integration with hwloc

2018-04-03 Thread Madhu, Kavitha Tiptur
Hi
A follow-up question: is it possible to build netloc along with hwloc in 
embedded mode?


> On Mar 30, 2018, at 1:34 PM, Brice Goglin <brice.gog...@inria.fr> wrote:
> 
> Hello
> 
> In 2.0, netloc is still highly experimental. Hopefully, a large rework
> will be merged in git master next month for being released in hwloc 2.1.
> 
> Most of the API from the old standalone netloc was made private when
> integrated in hwloc because there wasn't any actual user. The API was
> quite large (things for traversing the graph of both the fabric and the
> servers' internals). We didn't want to expose such a large API before
> getting actual user feedback.
> 
> In short, if you need features, please let us know, so that we can
> discuss what to expose in the public headers and how.
> 
> Brice
> 
> 
> 
> 
> On 30/03/2018 at 20:14, Madhu, Kavitha Tiptur wrote:
>> Hi
>> 
>> I need some info on the status of netloc integration with hwloc. I see the 
>> include/netloc.h header is almost empty in hwloc 2.0 and lots of 
>> functionality missing compared to the previous standalone netloc release, 
>> even in private/netloc.h. Am I missing something here?
>> 
>> Thanks
>> Kavitha
>> 
> 

Re: [hwloc-users] NUMA, io and miscellaneous object depths

2018-03-14 Thread Madhu, Kavitha Tiptur
Thanks for the response.
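
For reference, the parent-walking loop suggested in the quoted reply below can be sketched as follows. This is only an illustration: the types are small stand-ins so that it compiles without hwloc installed, and every name prefixed with sketch_ is hypothetical. In real code, obj would be an hwloc_obj_t and the predicate would be hwloc_obj_type_is_normal(), as written in the reply.

```c
#include <stddef.h>

/* Stand-in object types; these are NOT hwloc's definitions. */
enum sketch_type {
    SKETCH_MACHINE,
    SKETCH_PACKAGE,
    SKETCH_NUMANODE,
    SKETCH_PCI_DEVICE
};

/* Stand-in for an hwloc object: a type plus a link to its parent. */
struct sketch_obj {
    enum sketch_type type;
    struct sketch_obj *parent;   /* NULL at the root */
};

/* Plays the role of hwloc_obj_type_is_normal(): true for CPU-side objects. */
static int sketch_type_is_normal(enum sketch_type t)
{
    return t == SKETCH_MACHINE || t == SKETCH_PACKAGE;
}

/*
 * Start at obj and walk up the parent links until a normal object is
 * found. Returns obj itself if it is already normal, or NULL if the
 * chain ends without reaching one.
 */
static struct sketch_obj *first_normal_ancestor(struct sketch_obj *obj)
{
    while (obj != NULL && !sketch_type_is_normal(obj->type))
        obj = obj->parent;
    return obj;
}
```

The NULL check is an addition for safety in the sketch; the loop in the reply relies on every I/O object eventually having a normal ancestor.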

> On Mar 14, 2018, at 4:28 PM, Brice Goglin <brice.gog...@inria.fr> wrote:
> 
> Good point. In theory, that's possible because we only look at cpusets
> (NUMA nodes have cpusets, I/O don't). So the name of the function still
> matches its behavior.
> 
> However it won't happen in practice with the current code because I/O
> are always attached to CPU objects. But it may change in the future with
> things like processing-in-memory etc.
> 
> Instead of calling this function, you could do a while
> (!hwloc_obj_type_is_normal(obj->type)) obj = obj->parent;
> 
> I'll update the doc too. Thanks.
> 
> Brice
> 
> 
> 
> On 14/03/2018 at 22:16, Madhu, Kavitha Tiptur wrote:
>> A follow-up question: can the call to hwloc_get_non_io_ancestor_obj() return 
>> a NUMA object? 
>> 
>>> On Mar 14, 2018, at 3:09 PM, Madhu, Kavitha Tiptur <kma...@anl.gov> wrote:
>>> 
>>> Hi
>>> Hydra previously used this function to query the depth of hardware objects of 
>>> a certain type, in order to bind processes to objects at that depth or above. 
>>> As you pointed out, the functionality makes no sense with NUMA/IO objects 
>>> possibly being at different depths or for objects.
>>> 
>>>> On Mar 14, 2018, at 3:00 PM, Brice Goglin <brice.gog...@inria.fr> wrote:
>>>> 
>>>> Hello
>>>> 
>>>> I can fix the documentation to say that the function always succeeds and
>>>> returns the virtual depth for NUMA/IO/Misc.
>>>> 
>>>> I don't understand your third sentence. If by "actual depth", you mean
>>>> the depth of a (normal) parent where NUMA are attached (for instance the
>>>> depth of Package if NUMAs are attached to Packages), see
>>>> hwloc_get_memory_parents_depth(). However, you may have NUMA/IO/Misc
>>>> attached to parents at different depths, so it doesn't make much sense
>>>> in the general case.
>>>> 
>>>> What do you use this function for? I thought of removing it from 2.0
>>>> because it's hard to define a "usual" order for object types (for
>>>> instance L3 can be above or below NUMA for different modern platforms).
>>>> 
>>>> Brice
>>>> 
>>>> 
>>>> 
>>>> On 14/03/2018 at 20:24, Madhu, Kavitha Tiptur wrote:
>>>>> Hello folks,
>>>>> 
>>>>> The function hwloc_get_type_or_above_depth() is supposed to return the 
>>>>> depth of objects of type “type” or above. It internally calls 
>>>>> hwloc_get_type_depth(), which returns virtual depths for NUMA, I/O and 
>>>>> Misc objects. In order to retrieve the actual depth of these objects, one 
>>>>> needs to call hwloc_get_obj_depth() with the virtual depth. Can the 
>>>>> documentation be updated to cover this? Or are there plans to change 
>>>>> this behavior?
>>>>> 
>>>>> Thanks
>>>>> Kavitha

Re: [hwloc-users] NUMA, io and miscellaneous object depths

2018-03-14 Thread Madhu, Kavitha Tiptur
A follow-up question: can the call to hwloc_get_non_io_ancestor_obj() return a 
NUMA object? 

> On Mar 14, 2018, at 3:09 PM, Madhu, Kavitha Tiptur <kma...@anl.gov> wrote:
> 
> Hi
> Hydra previously used this function to query the depth of hardware objects of a 
> certain type, in order to bind processes to objects at that depth or above. As 
> you pointed out, the functionality makes no sense with NUMA/IO objects 
> possibly being at different depths or for objects.
> 
>> On Mar 14, 2018, at 3:00 PM, Brice Goglin <brice.gog...@inria.fr> wrote:
>> 
>> Hello
>> 
>> I can fix the documentation to say that the function always succeeds and
>> returns the virtual depth for NUMA/IO/Misc.
>> 
>> I don't understand your third sentence. If by "actual depth", you mean
>> the depth of a (normal) parent where NUMA are attached (for instance the
>> depth of Package if NUMAs are attached to Packages), see
>> hwloc_get_memory_parents_depth(). However, you may have NUMA/IO/Misc
>> attached to parents at different depths, so it doesn't make much sense
>> in the general case.
>> 
>> What do you use this function for? I thought of removing it from 2.0
>> because it's hard to define a "usual" order for object types (for
>> instance L3 can be above or below NUMA for different modern platforms).
>> 
>> Brice
>> 
>> 
>> 
>> On 14/03/2018 at 20:24, Madhu, Kavitha Tiptur wrote:
>>> Hello folks,
>>> 
>>> The function hwloc_get_type_or_above_depth() is supposed to return the 
>>> depth of objects of type “type” or above. It internally calls 
>>> hwloc_get_type_depth(), which returns virtual depths for NUMA, I/O and Misc 
>>> objects. In order to retrieve the actual depth of these objects, one needs 
>>> to call hwloc_get_obj_depth() with the virtual depth. Can the documentation 
>>> be updated to cover this? Or are there plans to change this behavior?
>>> 
>>> Thanks
>>> Kavitha

Re: [hwloc-users] NUMA, io and miscellaneous object depths

2018-03-14 Thread Madhu, Kavitha Tiptur
Hi
Hydra previously used this function to query the depth of hardware objects of a 
certain type, in order to bind processes to objects at that depth or above. As 
you pointed out, the functionality makes no sense with NUMA/IO objects possibly 
being at different depths or for objects.

> On Mar 14, 2018, at 3:00 PM, Brice Goglin <brice.gog...@inria.fr> wrote:
> 
> Hello
> 
> I can fix the documentation to say that the function always succeeds and
> returns the virtual depth for NUMA/IO/Misc.
> 
> I don't understand your third sentence. If by "actual depth", you mean
> the depth of a (normal) parent where NUMA are attached (for instance the
> depth of Package if NUMAs are attached to Packages), see
> hwloc_get_memory_parents_depth(). However, you may have NUMA/IO/Misc
> attached to parents at different depths, so it doesn't make much sense
> in the general case.
> 
> What do you use this function for? I thought of removing it from 2.0
> because it's hard to define a "usual" order for object types (for
> instance L3 can be above or below NUMA for different modern platforms).
> 
> Brice
> 
> 
> 
> On 14/03/2018 at 20:24, Madhu, Kavitha Tiptur wrote:
>> Hello folks,
>> 
>> The function hwloc_get_type_or_above_depth() is supposed to return the depth 
>> of objects of type “type” or above. It internally calls 
>> hwloc_get_type_depth(), which returns virtual depths for NUMA, I/O and Misc 
>> objects. In order to retrieve the actual depth of these objects, one needs to 
>> call hwloc_get_obj_depth() with the virtual depth. Can the documentation be 
>> updated to cover this? Or are there plans to change this behavior?
>> 
>> Thanks
>> Kavitha

[hwloc-users] NUMA, io and miscellaneous object depths

2018-03-14 Thread Madhu, Kavitha Tiptur
Hello folks,

The function hwloc_get_type_or_above_depth() is supposed to return the depth of 
objects of type “type” or above. It internally calls hwloc_get_type_depth(), 
which returns virtual depths for NUMA, I/O and Misc objects. In order to 
retrieve the actual depth of these objects, one needs to call 
hwloc_get_obj_depth() with the virtual depth. Can the documentation be updated 
to cover this? Or are there plans to change this behavior?

Thanks
Kavitha

Re: [hwloc-users] Machine nodes in hwloc topology

2018-02-05 Thread Madhu, Kavitha Tiptur
Hi

Thanks for the response. Could you also confirm whether an hwloc topology object 
would have only one machine node?

Thanks,
Kavitha



On Feb 5, 2018, at 4:14 PM, Brice Goglin <brice.gog...@inria.fr> wrote:

Hello,

Oops, sorry, this sentence is obsolete, I am removing it from the doc right now.

We don't support the assembly of multiple machines in a single hwloc topology 
anymore. For the record, this feature was a very small corner case and it had 
important limitations (you couldn't bind things or use cpusets unless you were 
very careful about which host you were talking about), and it made the core 
hwloc code much more complex.

Thanks for the report
Brice


On 05/02/2018 at 23:02, Madhu, Kavitha Tiptur wrote:
Hi

I have a question on topology query. The hwloc 2.0.0 documentation states that 
“Additionally it may assemble the topologies of multiple machines into a single 
one so as to let applications consult the topology of an entire fabric or 
cluster at once.” Since the “system” object type has been removed from hwloc, 
does this statement mean that multiple “machine” nodes in the topology object 
would be combined into one? I can see in the function “hwloc_topology_check” 
that the machine node is at depth 0 and there are no machine nodes at any other 
depth. Can anyone confirm this?

Thanks
Kavitha




[hwloc-users] Machine nodes in hwloc topology

2018-02-05 Thread Madhu, Kavitha Tiptur
Hi

I have a question on topology query. The hwloc 2.0.0 documentation states that 
“Additionally it may assemble the topologies of multiple machines into a single 
one so as to let applications consult the topology of an entire fabric or 
cluster at once.” Since the “system” object type has been removed from hwloc, 
does this statement mean that multiple “machine” nodes in the topology object 
would be combined into one? I can see in the function “hwloc_topology_check” 
that the machine node is at depth 0 and there are no machine nodes at any other 
depth. Can anyone confirm this?

Thanks
Kavitha