Re: [OMPI users] Error when using Einstein toolkit

2018-04-04 Thread Jeff Squyres (jsquyres)
Greetings, and welcome to the wonderful world of MPI.  :-)

First thing to note is that there are multiple different software packages that 
implement the MPI specification.  Open MPI -- the mailing list that you sent to 
-- is one of them.  MPICH, from Argonne National Labs, is another.

From the error message you included, it looks like you're using MPICH.  You'll 
need to contact them for any specific assistance with MPICH (they're an 
entirely different group from us).

That being said, the error that you display is usually indicative of an error 
in your program: i.e., you passed a bad communicator argument to 
MPI_Comm_rank().  Double check your source code and make sure that the 
communicator parameter value that you're passing to MPI_Comm_rank() is 
initialized / valid / etc.
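For what it's worth, the usual minimal pattern looks like this (a standalone C sketch, not Einstein Toolkit code, shown only to illustrate where a valid communicator comes from):

```c
#include <mpi.h>
#include <stdio.h>

/* MPI_Comm_rank() is only valid after MPI_Init() has run, and only with
 * an initialized communicator such as the predefined MPI_COMM_WORLD.
 * Passing an uninitialized or garbage MPI_Comm value is what typically
 * produces the "Invalid communicator" error shown above. */
int main(int argc, char *argv[]) {
    int rank;
    MPI_Init(&argc, &argv);                 /* must come before any other MPI call */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* valid, predefined communicator */
    printf("Hello from rank %d\n", rank);
    MPI_Finalize();
    return 0;
}
```

Build and run it with the same MPI installation you use for the Toolkit (e.g. `mpicc hello.c && mpiexec -n 2 ./a.out`) to confirm your MPI setup itself is sane.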

Sidenote: As a general rule, we won't do homework for students on this list -- 
so don't just send us your code and say "please fix this for me" (I only 
mention this because unbelievably, some people do exactly this :-( ).  We're 
happy to help with general MPI questions, but you need to do your assignments 
yourself.


> On Apr 4, 2018, at 4:59 PM, Swarnim Shashank wrote:
> 
> Hello,
> 
> I am an undergraduate student. I have just started learning MPI and I have to 
> use the Einstein Toolkit code which uses MPI for my project.
> I get this error when I try to check the output of my simulation:
> 
> 
> Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
> PMPI_Comm_rank(110): MPI_Comm_rank(comm=0xe0cc28e0, rank=0x7ffd2aa27d38) 
> failed
> PMPI_Comm_rank(68).: Invalid communicator
> [unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=873024773
> :
> system msg for write_line failure : Bad file descriptor
> 
> 
> Please let me know what the problem is and how to solve it.
> 
> Regards
> Swarnim Shashank


-- 
Jeff Squyres
jsquy...@cisco.com

___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

[OMPI users] Error when using Einstein toolkit

2018-04-04 Thread Swarnim Shashank
Hello,

I am an undergraduate student. I have just started learning MPI and I have
to use the Einstein Toolkit code which uses MPI for my project.
I get this error when I try to check the output of my simulation:


Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
PMPI_Comm_rank(110): MPI_Comm_rank(comm=0xe0cc28e0, rank=0x7ffd2aa27d38) failed
PMPI_Comm_rank(68).: Invalid communicator
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=873024773
:
system msg for write_line failure : Bad file descriptor


Please let me know what the problem is and how to solve it.

Regards
Swarnim Shashank

Re: [OMPI users] Linkage problem

2018-04-04 Thread Jeff Squyres (jsquyres)
On Apr 4, 2018, at 12:58 PM, Quentin Faure  wrote:
> 
> Sorry, I did not see my autocorrect changed some word.
> 
> I added the -l and it did not change anything. Also the mpicxx —showme does 
> not work. It says that the option —showme does not exist

If 'mpicxx --showme' (with 2 dashes) does not work, then you are not using Open 
MPI's mpicxx.  You should check to make sure you are testing what you think you 
are testing.

Note, too, that Nathan was pointing out a missing capital "i" (as in "include") 
not a missing capital "l" (as in "link").  Depending on the font in your mail 
client, it can be difficult to tell the two apart.

He is correct that what you showed was not an *error* -- it was a warning that 
the C++ compiler was telling you that it ignored an argument on the command 
line.  Specifically, you did:

 mpicxx -g -O3  -DLAMMPS_GZIP -DLMP_USER_INTEL -DLMP_MPIIO  
 /usr/lib/openmpi/include -pthread -DFFT_FFTW3 -DFFT_SINGLE   
 -I../../lib/molfile   -c ../create_atoms.cpp

But you missed the -I (capital I, as in indigo, not capital L, as in Llama).  
It should have been:

 mpicxx -g -O3  -DLAMMPS_GZIP -DLMP_USER_INTEL -DLMP_MPIIO -I 
 /usr/lib/openmpi/include -pthread -DFFT_FFTW3 -DFFT_SINGLE   
 -I../../lib/molfile   -c ../create_atoms.cpp

That being said, you shouldn't need to mention /usr/lib/openmpi/include at all 
(even with the -I), because mpicxx will automatically insert that for you.  
Specifically: mpicxx is not a compiler itself -- it's just a "wrapper" around 
the underlying C++ compiler.  All mpicxx does is add some additional command 
line arguments and then invoke the underlying C++ compiler.  When you run 
"mpicxx --showme" (with Open MPI's mpicxx command), it will show you the 
underlying C++ compiler command that it would have invoked.

Similarly, the "ompi_info: error while loading shared libraries: libmpi.so.40: 
cannot open shared object file: No such file or directory" error means that you 
do not have Open MPI's libmpi at the front of your searchable library path.  
See https://www.open-mpi.org/faq/?category=running#adding-ompi-to-path.
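For example, something like the following in your shell startup file (a sketch: the /usr/lib/openmpi prefix is an assumption; substitute whatever --prefix your Open MPI was actually configured with):

```shell
# Hypothetical install prefix -- use your actual Open MPI prefix.
export PATH=/usr/lib/openmpi/bin:$PATH
export LD_LIBRARY_PATH=/usr/lib/openmpi/lib:$LD_LIBRARY_PATH
```

After that, `which mpicxx` and `ompi_info` should both resolve to the installation you intend to test.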

-- 
Jeff Squyres
jsquy...@cisco.com


Re: [OMPI users] Linkage problem

2018-04-04 Thread Quentin Faure
Sorry, I did not see my autocorrect changed some word.

I added the -l and it did not change anything. Also the mpicxx —showme does not 
work. It says that the option —showme does not exist

Quentin

On Apr 4, 2018, at 10:45, Quentin Faure wrote:

I had the -l but it did not change anything. Also the mlicxx —showme does not 
work, it says that the option —showmen does not exist.

Quentin

On Apr 3, 2018, at 14:25, Nathan Hjelm wrote:

I guess I should point out the reason the compiler thought you had linker input 
was a missing -I on /usr/lib/openmpi/include . Though that include shouldn't be 
needed as the wrapper will do that for you. You can see what the wrapper passes 
to gcc by running: mpicxx --showme.

-Nathan

On Apr 03, 2018, at 02:16 PM, Quentin Faure wrote:

Hello,


Date: Fri, 30 Mar 2018 14:29:57 +
From: "Jeff Squyres (jsquyres)"
To: "Open MPI User's List"
Subject: Re: [OMPI users] Linkage problem
Content-Type: text/plain; charset="utf-8"

On Mar 29, 2018, at 11:19 AM, Quentin Faure wrote:

I would like to use openmpi with a software called LAMMPS. I know it is 
possible when compiling the software to indicate it to use it with openmpi. 
However when I do that I have a warning message telling me that the linkage 
could not have been done (I specified the path for openmpi library and name 
like it is done in LAMMPS manual).

What error message are you getting?

The error I get is:
 /usr/lib/openmpi/include: linker input file unused because linking not done
mpicxx -g -O3  -DLAMMPS_GZIP -DLMP_USER_INTEL -DLMP_MPIIO  
/usr/lib/openmpi/include -pthread -DFFT_FFTW3 -DFFT_SINGLE   
-I../../lib/molfile   -c ../create_atoms.cpp



I tried to reinstall openmpi in two different ways (following the advice of 
people who had LAMMPS and openmpi working together) but it still does not work. 
Also I don't know if this is part of my problem or not but, the option —showme 
does not work (command: mpicc —showme).

What error message are you getting?  It's quite possible that you're using a 
different mpicc (e.g., from a different MPI installation on your same machine), 
and not using the mpicc from the Open MPI that you just installed.

I do not have any errors when I install openmpi, I just have an error when I 
tried to compile my other software with openmpi. Concerning the mpicc command, 
normally there is no other MPI software installed on this computer.


Can you send all the information listed here: 
https://www.open-mpi.org/community/help/

I am enclosing the config log file. I tried the command ompi_info —all and I 
got: ompi_info: error while loading shared libraries: libmpi.so.40: cannot open 
shared object file: No such file or directory.

I tried to install the software hwloc and used the command lstopo -v but I got 
an error: lstopo: error while loading shared libraries: libhwloc.so.15: cannot 
open shared object file: No such file or directory.

One thing that I did not specify about the computer is that it was built and 
all its software was installed manually; it is not a pre-built computer.







--
Jeff Squyres
jsquy...@cisco.com


Quentin


Re: [OMPI users] running mpi program between my PC and an ARM-architektur raspberry

2018-04-04 Thread George Reeke
On Wed, 2018-04-04 at 11:57 -0400, George Bosilca wrote:
> We can always build complicated solutions, but in some cases sane and
> simple solutions exist. Let me clear up some of the misinformation in
> this thread.
> 
Oh, well, when I wrote the stuff I described earlier, it was before
MPI existed, or at least before I had heard of it, and I had access
only to a proprietary message passing library that just transmitted
byte strings.  Clearly you are right about simplicity for most cases.
My solution might still occasionally be useful for its ability to
collect a whole tree of setup information (for example, for a
multi-layer neural network) with compile-time code analysis into a
"memory pool" and broadcast the whole thing with a single call
when changes occur at various stages of a computation, as this was
my original purpose (and I still use it after replacing the original
message passing calls with corresponding mpi calls even though all
the processors are now equivalent Intel cpus).  Did I mention:
pointers within the data tree are maintained on each processor with
likely different values.  Oh, and if anybody wants this, you have to
accept GPL licensing.
George Reeke 




Re: [OMPI users] Linkage problem

2018-04-04 Thread Quentin Faure
I had the -l but it did not change anything. Also the mlicxx —showme does not 
work, it says that the option —showmen does not exist.

Quentin

On Apr 3, 2018, at 14:25, Nathan Hjelm wrote:

I guess I should point out the reason the compiler thought you had linker input 
was a missing -I on /usr/lib/openmpi/include . Though that include shouldn't be 
needed as the wrapper will do that for you. You can see what the wrapper passes 
to gcc by running: mpicxx --showme.

-Nathan

On Apr 03, 2018, at 02:16 PM, Quentin Faure wrote:

Hello,


Date: Fri, 30 Mar 2018 14:29:57 +
From: "Jeff Squyres (jsquyres)"
To: "Open MPI User's List"
Subject: Re: [OMPI users] Linkage problem
Content-Type: text/plain; charset="utf-8"

On Mar 29, 2018, at 11:19 AM, Quentin Faure wrote:

I would like to use openmpi with a software called LAMMPS. I know it is 
possible when compiling the software to indicate it to use it with openmpi. 
However when I do that I have a warning message telling me that the linkage 
could not have been done (I specified the path for openmpi library and name 
like it is done in LAMMPS manual).

What error message are you getting?

The error I get is:
 /usr/lib/openmpi/include: linker input file unused because linking not done
mpicxx -g -O3  -DLAMMPS_GZIP -DLMP_USER_INTEL -DLMP_MPIIO  
/usr/lib/openmpi/include -pthread -DFFT_FFTW3 -DFFT_SINGLE   
-I../../lib/molfile   -c ../create_atoms.cpp



I tried to reinstall openmpi in two different ways (following the advice of 
people who had LAMMPS and openmpi working together) but it still does not work. 
Also I don't know if this is part of my problem or not but, the option —showme 
does not work (command: mpicc —showme).

What error message are you getting?  It's quite possible that you're using a 
different mpicc (e.g., from a different MPI installation on your same machine), 
and not using the mpicc from the Open MPI that you just installed.

I do not have any errors when I install openmpi, I just have an error when I 
tried to compile my other software with openmpi. Concerning the mpicc command, 
normally there is no other MPI software installed on this computer.


Can you send all the information listed here: 
https://www.open-mpi.org/community/help/

I am enclosing the config log file. I tried the command ompi_info —all and I 
got: ompi_info: error while loading shared libraries: libmpi.so.40: cannot open 
shared object file: No such file or directory.

I tried to install the software hwloc and used the command lstopo -v but I got 
an error: lstopo: error while loading shared libraries: libhwloc.so.15: cannot 
open shared object file: No such file or directory.

One thing that I did not specify about the computer is that it was built and 
all its software was installed manually; it is not a pre-built computer.







--
Jeff Squyres
jsquy...@cisco.com


Quentin


Re: [hwloc-users] Netloc integration with hwloc

2018-04-04 Thread Madhu, Kavitha Tiptur

>> 
>> — I tried building older netloc with hwloc 2.0 and it throws compiler 
>> errors. Note that netloc was cloned from its git repo.
> 
> My guess is that the "map" part that joins netloc's info about the
> fabric with hwloc's info about the nodes doesn't like hwloc 2.0. But
> that should be easy to disable in the Makefiles and/or to update for
> hwloc 2.0.
> 

—We do need the map functionality, since we need to identify which processor 
core is mapped to which network node (from my understanding of the 
documentation and the definition of mapping; please correct me if I am wrong 
here). My other concern is that in the older version of netloc, 
netloc_ib_gather_raw does not list any subnets on the cluster where the newer 
version built within hwloc reports some. I compared the perl scripts and there 
doesn't seem to be much difference between the two other than the newer 
version adding some pattern matching for hfi.


 The plan should rather be to tell us what you need from netloc so that
 we can reenable it with a good API. We hear lots of people saying they
 are interested in netloc, but *nobody* ever told us anything about what
 they want to do for real. And I am not even sure anybody ever played
 with the old API. This software cannot go forward unless we know where
 it's going. There are many ways to design the netloc API.
>> — At this point, our requirement is to expose graph construction from raw 
>> topology xml and mapping and traversal at best.
>> I see some of these already defined in private/hwloc.h in the newer version. 
>> Our problem here is that we couldn't build it in embedded mode, which is how 
>> we are using hwloc.
> 
> Can't you hack your build system to build hwloc in standalone instead of
> embedded mode for testing? Or use an external hwloc instead of your
> embedded one?

— We can do this, shouldn’t be a major concern. But we can only make this work 
if we use the newer hwloc version and expose some of the functions as I 
mentioned. 

> I'd like to get feedback about private/netloc.h before making some of it
> public.
> 
> I'll look at making libnetloc embeddable in 2.1.
> 
> Brice
> 


Re: [OMPI users] running mpi program between my PC and an ARM-architektur raspberry

2018-04-04 Thread George Bosilca
We can always build complicated solutions, but in some cases sane and
simple solutions exist. Let me clear up some of the misinformation in this
thread.

The MPI standard is clear about what type of conversion is allowed and how it
should be done (for more info read Chapter 4): no type conversion is
allowed (don't send a long and expect a short); for everything else,
truncation to a sane value is the rule. This is nothing new: the rules are
similar to other data conversion standards such as XDR. Thus, if you send
an MPI_LONG from a machine where long is 8 bytes to an MPI_LONG on a
machine where it is 4 bytes, you will get a valid number when possible,
otherwise [MAX|MIN]_LONG on the target machine. For floating point data the
rules are more complicated due to potential exponent and mantissa length
mismatch, but in general if the data is representable on the target
architecture a sane value is obtained. Otherwise, the data will be replaced
with one of the extremes. This also applies to file operations for as long
as the correct external32 type is used.

The datatype engine in Open MPI supports all these conversions, for as long
as the source and target machine are correctly identified. This
identification is only enabled when OMPI is compiled with support for
heterogeneous architectures.

  George.


On Wed, Apr 4, 2018 at 11:35 AM, George Reeke wrote:

> Dear colleagues,
>FWIW, years ago I was looking at this problem and developed my
> own solution (for C programs) with this structure:
> --Be sure your code that works with ambiguous-length types like
> 'long' can handle different sizes.  I have replacement unambiguous
> typedef names like 'si32', 'ui64' etc. for the usual signed and
> unsigned fixed-point numbers.
> --Run your source code through a utility that analyzes a specified
> set of variables, structures, and unions that will be used in
> messages and builds tables giving their included types.  Include
> these tables in your makefiles.
> --Replace malloc, calloc, realloc, free with my own versions,
> where you pass a type argument pointing into to this table along
> with number of items, etc.  There are separate memory pools for
> items that will be passed often, rarely, or never, just to make
> things more efficient.
> --Do all these calls on the rank 0 processor at program startup and
> call a special broadcast routine that sets up data structures on
> all the other processors to manage the conversions.
> --Replace mpi message passing and broadcast calls with new routines
> that use the type information (stored by malloc, calloc, etc.) to
> determine what variables to lengthen or shorten or swap on arrival
> at the destination.  Regular mpi message passing is used inside
> these routines and can be used natively for variables that do not
> ever need length changes or byte swapping (i.e. text).  I have a
> simple set of routines to gather statistics across nodes with sum,
> max, etc. operations, but not too fancy.  I do not have versions of
> any of the mpi operations that collect or distribute matrices, etc.
> --A little routine must be written for every union.  This is called
> from the package when a union is received to determine which
> member is present so the right conversion can be done.
> --There was a hook to handle IBM (hex exponent) vs IEEE floating
> point, but the code never got written.
>Because this is all very complicated and demanding on the
> programmer, I am not making it publicly available, but will be
> glad to send it privately to anyone who really thinks they can
> use it and is willing to get their hands dirty.
>George Reeke (private email: re...@rockefeller.edu)
>
>
>
>
>
>
> On Tue, 2018-04-03 at 23:39 +, Jeff Squyres (jsquyres) wrote:
> > On Apr 2, 2018, at 1:39 PM, dpchoudh .  wrote:
> > >
> > > Sorry for a pedantic follow up:
> > >
> > > Is this (heterogeneous cluster support) something that is specified by
> > > the MPI standard (perhaps as an optional component)?
> >
> > The MPI standard states that if you send a message, you should receive
> the same values at the receiver.  E.g., if you sent int=3, you should
> receive int=3, even if one machine is big endian and the other machine is
> little endian.
> >
> > It does not specify what happens when data sizes are different (e.g., if
> type X is 4 bits on one side and 8 bits on the other) -- there's no good
> answers on what to do there.
> >
> > > Do people know if
> > > MPICH. MVAPICH, Intel MPI etc support it? (I do realize this is an
> > > OpenMPI forum)
>
>

Re: [OMPI users] running mpi program between my PC and an ARM-architektur raspberry

2018-04-04 Thread George Reeke
Dear colleagues,
   FWIW, years ago I was looking at this problem and developed my
own solution (for C programs) with this structure:
--Be sure your code that works with ambiguous-length types like
'long' can handle different sizes.  I have replacement unambiguous
typedef names like 'si32', 'ui64' etc. for the usual signed and
unsigned fixed-point numbers.
--Run your source code through a utility that analyzes a specified
set of variables, structures, and unions that will be used in
messages and builds tables giving their included types.  Include
these tables in your makefiles.
--Replace malloc, calloc, realloc, free with my own versions,
where you pass a type argument pointing into to this table along
with number of items, etc.  There are separate memory pools for
items that will be passed often, rarely, or never, just to make
things more efficient.
--Do all these calls on the rank 0 processor at program startup and
call a special broadcast routine that sets up data structures on
all the other processors to manage the conversions.
--Replace mpi message passing and broadcast calls with new routines
that use the type information (stored by malloc, calloc, etc.) to
determine what variables to lengthen or shorten or swap on arrival
at the destination.  Regular mpi message passing is used inside
these routines and can be used natively for variables that do not
ever need length changes or byte swapping (i.e. text).  I have a
simple set of routines to gather statistics across nodes with sum,
max, etc. operations, but not too fancy.  I do not have versions of
any of the mpi operations that collect or distribute matrices, etc.
--A little routine must be written for every union.  This is called
from the package when a union is received to determine which
member is present so the right conversion can be done.
--There was a hook to handle IBM (hex exponent) vs IEEE floating
point, but the code never got written.
   Because this is all very complicated and demanding on the
programmer, I am not making it publicly available, but will be
glad to send it privately to anyone who really thinks they can
use it and is willing to get their hands dirty.
   George Reeke (private email: re...@rockefeller.edu)






On Tue, 2018-04-03 at 23:39 +, Jeff Squyres (jsquyres) wrote:
> On Apr 2, 2018, at 1:39 PM, dpchoudh .  wrote:
> > 
> > Sorry for a pedantic follow up:
> > 
> > Is this (heterogeneous cluster support) something that is specified by
> > the MPI standard (perhaps as an optional component)?
> 
> The MPI standard states that if you send a message, you should receive the 
> same values at the receiver.  E.g., if you sent int=3, you should receive 
> int=3, even if one machine is big endian and the other machine is little 
> endian.
> 
> It does not specify what happens when data sizes are different (e.g., if type 
> X is 4 bits on one side and 8 bits on the other) -- there's no good answers 
> on what to do there.
> 
> > Do people know if
> > MPICH. MVAPICH, Intel MPI etc support it? (I do realize this is an
> > OpenMPI forum)




Re: [hwloc-users] Netloc integration with hwloc

2018-04-04 Thread Brice Goglin
Le 04/04/2018 à 16:49, Madhu, Kavitha Tiptur a écrit :
>
> — I tried building older netloc with hwloc 2.0 and it throws compiler errors. 
> Note that netloc was cloned from its git repo.

My guess is that the "map" part that joins netloc's info about the
fabric with hwloc's info about the nodes doesn't like hwloc 2.0. But
that should be easy to disable in the Makefiles and/or to update for
hwloc 2.0.

>>> The plan should rather be to tell us what you need from netloc so that
>>> we can reenable it with a good API. We hear lots of people saying they
>>> are interested in netloc, but *nobody* ever told us anything about what
>>> they want to do for real. And I am not even sure anybody ever played
>>> with the old API. This software cannot go forward unless we know where
>>> it's going. There are many ways to design the netloc API.
> — At this point, our requirement is to expose graph construction from raw 
> topology xml and mapping and traversal at best.
> I see some of these already defined in private/hwloc.h in the newer version. 
> Our problem here is that we couldn't build it in embedded mode, which is how 
> we are using hwloc.

Can't you hack your build system to build hwloc in standalone instead of
embedded mode for testing? Or use an external hwloc instead of your
embedded one?
I'd like to get feedback about private/netloc.h before making some of it
public.

I'll look at making libnetloc embeddable in 2.1.

Brice


Re: [hwloc-users] Netloc integration with hwloc

2018-04-04 Thread Madhu, Kavitha Tiptur
Hi
Chiming in on this conversation, we have a few questions/concerns with some of 
the responses we received from you.
>> 
>> If you really want the old netloc API now, you could try hwloc 2.x with
>> the old netloc. But that's certainly not maintained anymore, and that
>> only works for IB while the new netloc should have OPA and Cray support
>> soon.

— I tried building older netloc with hwloc 2.0 and it throws compiler errors. 
Note that netloc was cloned from its git repo.

 
>> 
>> The plan should rather be to tell us what you need from netloc so that
>> we can reenable it with a good API. We hear lots of people saying they
>> are interested in netloc, but *nobody* ever told us anything about what
>> they want to do for real. And I am not even sure anybody ever played
>> with the old API. This software cannot go forward unless we know where
>> it's going. There are many ways to design the netloc API.

— At this point, our requirement is to expose graph construction from raw 
topology xml and mapping and traversal at best.
I see some of these already defined in private/hwloc.h in the newer version. 
Our problem here is that we couldn't build it in embedded mode, which is how we 
are using hwloc.




> On Apr 4, 2018, at 9:13 AM, Balaji, Pavan  wrote:
> 
> Brice,
> 
> We don't actually care if it is a graph or a different API.  We'll anyway 
> simply parse the graph and create our own internal structures that we can map 
> to our internal algorithms.  We simply need some model (any model) to 
> retrieve the network topology.  That's it.  We'll take care of everything 
> else in MPICH.
> 
>  -- Pavan
> 
>> On Apr 4, 2018, at 12:46 AM, Brice Goglin  wrote:
>> 
>> If you really want the old netloc API now, you could try hwloc 2.x with
>> the old netloc. But that's certainly not maintained anymore, and that
>> only works for IB while the new netloc should have OPA and Cray support
>> soon.
>> 
>> The plan should rather be to tell us what you need from netloc so that
>> we can reenable it with a good API. We hear lots of people saying they
>> are interested in netloc, but *nobody* ever told us anything about what
>> they want to do for real. And I am not even sure anybody ever played
>> with the old API. This software cannot go forward unless we know where
>> it's going. There are many ways to design the netloc API.
>> 
>> * We had an explicit graph API in the old netloc but that API implied
>> expensive graph algorithmics in the runtimes using it. It seemed
>> unusable for taking decisions at runtime anyway, but again nobody ever
>> tried. Also it was rather strange to expose the full graph when you know
>> the fabric is a 3D dragonfly on Cray, etc.
>> 
>> * In the new netloc, we're thinking of having higher-level implicit
>> topologies for each class of fabric (dragon-fly, fat-tree, clos-network,
>> etc) that require more work on the netloc side and easier work in the
>> runtime using it. However that's less portable than exposing the full
>> graph. Not sure which one is best, or if both are needed.
>> 
>> * There are also issues regarding nodes/links failure etc. How do we
>> expose topology changes at runtime? Do we have a daemon running as root
>> in the background, etc?
>> 
>> Lots of questions need to be discussed before we expose a new API in
>> the wild. Unfortunately, we lost several years because of the lack of
>> users' feedback. I don't want to invest time and rush for a new API if
>> MPICH never actually uses it like other people did in the past.
>> 
>> Brice
>> 
>> 
>> 
>> 
>> Le 04/04/2018 à 01:36, Balaji, Pavan a écrit :
>>> Brice,
>>> 
>>> We want to use both hwloc and netloc in mpich.  What are our options here?  
>>> Move back to hwloc-1.x?  That’d be a bummer because we already invested a 
>>> lot of effort to migrate to hwloc-2.x.
>>> 
>>> — Pavan
>>> 
>>> Sent from my iPhone
>>> 
 On Apr 3, 2018, at 6:19 PM, Brice Goglin  wrote:
 
 It's not possible now but that would certainly be considered whenever
 people start using the API and linking against libnetloc.
 
 Brice
 
 
 
 
> Le 03/04/2018 à 21:34, Madhu, Kavitha Tiptur a écrit :
> Hi
> A follow up question, is it possible to build netloc along with hwloc in 
> embedded mode?
> 
> 
>> On Mar 30, 2018, at 1:34 PM, Brice Goglin  wrote:
>> 
>> Hello
>> 
>> In 2.0, netloc is still highly experimental. Hopefully, a large rework
>> will be merged in git master next month for being released in hwloc 2.1.
>> 
>> Most of the API from the old standalone netloc was made private when
>> integrated in hwloc because there wasn't any actual user. The API was
>> quite large (things for traversing the graph of both the fabric and the
>> servers' internals). We didn't want to expose such a large API before
>> getting actual user feedback.
>> 
>> 

Re: [hwloc-users] Netloc integration with hwloc

2018-04-04 Thread Balaji, Pavan
Brice,

We don't actually care if it is a graph or a different API.  We'll anyway 
simply parse the graph and create our own internal structures that we can map 
to our internal algorithms.  We simply need some model (any model) to retrieve 
the network topology.  That's it.  We'll take care of everything else in MPICH.

  -- Pavan

> On Apr 4, 2018, at 12:46 AM, Brice Goglin  wrote:
> 
> If you really want the old netloc API now, you could try hwloc 2.x with
> the old netloc. But that's certainly not maintained anymore, and that
> only works for IB while the new netloc should have OPA and Cray support
> soon.
> 
> The plan should rather be to tell us what you need from netloc so that
> we can reenable it with a good API. We hear lots of people saying they
> are interested in netloc, but *nobody* ever told us anything about what
> they want to do for real. And I am not even sure anybody ever played
> with the old API. This software cannot go forward unless we know where
> it's going. There are many ways to design the netloc API.
> 
> * We had an explicit graph API in the old netloc, but that API implied
> expensive graph algorithms in the runtimes using it. It seemed
> unusable for making decisions at runtime anyway, but again, nobody ever
> tried. Also, it was rather strange to expose the full graph when you know
> the fabric is a 3D dragonfly on a Cray, etc.
> 
> * In the new netloc, we're thinking of having higher-level implicit
> topologies for each class of fabric (dragonfly, fat-tree, Clos network,
> etc.) that require more work on the netloc side but less work in the
> runtime using it. However, that's less portable than exposing the full
> graph. Not sure which one is best, or whether both are needed.
> 
> * There are also issues regarding node/link failures, etc. How do we
> expose topology changes at runtime? Do we have a daemon running as root
> in the background, etc.?
> 
> Lots of questions need to be discussed before we expose a new API in
> the wild. Unfortunately, we lost several years because of the lack of
> user feedback. I don't want to invest time and rush out a new API if
> MPICH never actually uses it, as other people did in the past.
> 
> Brice
> 
> 
> 
> 
> Le 04/04/2018 à 01:36, Balaji, Pavan a écrit :
>> Brice,
>> 
>> We want to use both hwloc and netloc in mpich.  What are our options here?  
>> Move back to hwloc-1.x?  That’d be a bummer because we already invested a 
>> lot of effort to migrate to hwloc-2.x.
>> 
>>  — Pavan
>> 
>> Sent from my iPhone
>> 
>>> On Apr 3, 2018, at 6:19 PM, Brice Goglin  wrote:
>>> 
>>> It's not possible now but that would certainly be considered whenever
>>> people start using the API and linking against libnetloc.
>>> 
>>> Brice
>>> 
>>> 
>>> 
>>> 
 Le 03/04/2018 à 21:34, Madhu, Kavitha Tiptur a écrit :
 Hi
 A follow up question, is it possible to build netloc along with hwloc in 
 embedded mode?
 
 
> On Mar 30, 2018, at 1:34 PM, Brice Goglin  wrote:
> 
> Hello
> 
> In 2.0, netloc is still highly experimental. Hopefully, a large rework
> will be merged in git master next month for being released in hwloc 2.1.
> 
> Most of the API from the old standalone netloc was made private when
> integrated in hwloc because there wasn't any actual user. The API was
> quite large (things for traversing the graph of both the fabric and the
> servers' internals). We didn't want to expose such a large API before
> getting actual user feedback.
> 
> In short, if you need features, please let us know, so that we can
> discuss what to expose in the public headers and how.
> 
> Brice
> 
> 
> 
> 
>> Le 30/03/2018 à 20:14, Madhu, Kavitha Tiptur a écrit :
>> Hi
>> 
>> I need some info on the status of netloc integration with hwloc. I see 
>> the include/netloc.h header is almost empty in hwloc 2.0 and lots of 
>> functionality missing compared to the previous standalone netloc 
>> release, even in private/netloc.h. Am I missing something here?
>> 
>> Thanks
>> Kavitha
>> 
> ___
> hwloc-users mailing list
> hwloc-users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/hwloc-users

Re: [hwloc-users] Netloc integration with hwloc

2018-04-04 Thread Josh Hursey
I'm also interested in re-igniting the discussion, if others are. I'm
curious about the current state of netloc, and what we all want it to look
like. Maybe we should move the discussion to the devel list or have a
teleconf or something to kick things off?

On Wed, Apr 4, 2018 at 12:46 AM, Brice Goglin  wrote:

> If you really want the old netloc API now, you could try hwloc 2.x with
> the old netloc. But that's certainly not maintained anymore, and that
> only works for IB while the new netloc should have OPA and Cray support
> soon.
>
> The plan should rather be to tell us what you need from netloc so that
> we can re-enable it with a good API. We hear lots of people saying they
> are interested in netloc, but *nobody* ever told us anything about what
> they want to do for real. And I am not even sure anybody ever played
> with the old API. This software cannot go forward unless we know where
> it's going. There are many ways to design the netloc API.
>
> * We had an explicit graph API in the old netloc, but that API implied
> expensive graph algorithms in the runtimes using it. It seemed
> unusable for making decisions at runtime anyway, but again, nobody ever
> tried. Also, it was rather strange to expose the full graph when you know
> the fabric is a 3D dragonfly on a Cray, etc.
>
> * In the new netloc, we're thinking of having higher-level implicit
> topologies for each class of fabric (dragonfly, fat-tree, Clos network,
> etc.) that require more work on the netloc side but less work in the
> runtime using it. However, that's less portable than exposing the full
> graph. Not sure which one is best, or whether both are needed.
>
> * There are also issues regarding node/link failures, etc. How do we
> expose topology changes at runtime? Do we have a daemon running as root
> in the background, etc.?
>
> Lots of questions need to be discussed before we expose a new API in
> the wild. Unfortunately, we lost several years because of the lack of
> user feedback. I don't want to invest time and rush out a new API if
> MPICH never actually uses it, as other people did in the past.
>
> Brice
>
>
>
>
> Le 04/04/2018 à 01:36, Balaji, Pavan a écrit :
> > Brice,
> >
> > We want to use both hwloc and netloc in mpich.  What are our options
> here?  Move back to hwloc-1.x?  That’d be a bummer because we already
> invested a lot of effort to migrate to hwloc-2.x.
> >
> >   — Pavan
> >
> > Sent from my iPhone
> >
> >> On Apr 3, 2018, at 6:19 PM, Brice Goglin  wrote:
> >>
> >> It's not possible now but that would certainly be considered whenever
> >> people start using the API and linking against libnetloc.
> >>
> >> Brice
> >>
> >>
> >>
> >>
> >>> Le 03/04/2018 à 21:34, Madhu, Kavitha Tiptur a écrit :
> >>> Hi
> >>> A follow up question, is it possible to build netloc along with hwloc
> in embedded mode?
> >>>
> >>>
>  On Mar 30, 2018, at 1:34 PM, Brice Goglin 
> wrote:
> 
>  Hello
> 
>  In 2.0, netloc is still highly experimental. Hopefully, a large rework
>  will be merged in git master next month for being released in hwloc
> 2.1.
> 
>  Most of the API from the old standalone netloc was made private when
>  integrated in hwloc because there wasn't any actual user. The API was
>  quite large (things for traversing the graph of both the fabric and
> the
>  servers' internals). We didn't want to expose such a large API before
>  getting actual user feedback.
> 
>  In short, if you need features, please let us know, so that we can
>  discuss what to expose in the public headers and how.
> 
>  Brice
> 
> 
> 
> 
> > Le 30/03/2018 à 20:14, Madhu, Kavitha Tiptur a écrit :
> > Hi
> >
> > I need some info on the status of netloc integration with hwloc. I
> see the include/netloc.h header is almost empty in hwloc 2.0 and lots of
> functionality missing compared to the previous standalone netloc release,
> even in private/netloc.h. Am I missing something here?
> >
> > Thanks
> > Kavitha
> >