On Wed, Jul 24, 2019 at 09:46:13PM +0000, Jeff Squyres (jsquyres) wrote:
> On Jul 24, 2019, at 5:16 PM, Ralph Castain via users wrote:
> >
> > It doesn't work that way, as you discovered. You need to add this
> > information at the same place where vader currently calls modex send, and
> > then retrieve it at the same place vader currently calls modex recv. Those
> > macros don't do an immediate send/recv like you are thinking - the send
> > simply posts the data so that peers can retrieve it later.

Just add it to the existing modex.
-Nathan
> On Jul 22, 2019, at 12:20 PM, Adrian Reber via users wrote:
>
> I have most of the code ready, but I still have troubles doing
> OPAL_MODEX_RECV. I am using the following lines, based on the code from
> orte/test/mpi/pmix.c:
>
>     OPAL_MODEX_SEND_VALUE(rc, OPAL_PMIX_LOCAL, "user_ns_id", &user_ns_id, OPAL_INT);
>
> This sets rc to 0. For receiving:
>
>     OPAL_MODEX_RECV_VALUE(rc,
If that works, then it might be possible to include the namespace ID in the
job-info provided by PMIx at startup - would have to investigate, so please
confirm that the modex option works first.
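For reference, a complete send/receive pair might look like the sketch below. This is only an illustration: the variable and function names are hypothetical, and the macros are used the same way as in orte/test/mpi/pmix.c (header opal/mca/pmix/pmix.h in the v3.x/v4.x trees):

    /* Sketch only (hypothetical names): exchange the user namespace ID
     * via the modex. */
    #include <stdlib.h>
    #include "opal/mca/pmix/pmix.h"

    /* At the place vader already does its modex send, also post our
     * namespace ID. The send only publishes the value; nothing is
     * transferred until a peer asks for it. */
    static int publish_user_ns_id(int user_ns_id)
    {
        int rc;
        OPAL_MODEX_SEND_VALUE(rc, OPAL_PMIX_LOCAL, "user_ns_id",
                              &user_ns_id, OPAL_INT);
        return rc;
    }

    /* At the place vader already does its modex recv, fetch the value a
     * given peer (its opal_proc_t) posted and compare it with ours. */
    static bool peer_in_same_user_ns(opal_proc_t *proc, int user_ns_id)
    {
        int rc;
        int *peer_ns_id = NULL;
        bool same = false;

        OPAL_MODEX_RECV_VALUE(rc, "user_ns_id", &proc->proc_name,
                              (void **)&peer_ns_id, OPAL_INT);
        if (OPAL_SUCCESS == rc && NULL != peer_ns_id) {
            same = (*peer_ns_id == user_ns_id);
            free(peer_ns_id);
        }
        return same;
    }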
On Jul 22, 2019, at 1:22 AM, Gilles Gouaillardet via users wrote:

Adrian,

An option is to involve the modex: each task would OPAL_MODEX_SEND() its own
namespace ID, and then OPAL_MODEX_RECV() the one from its peers and decide
whether CMA support can be enabled.
Cheers,
Gilles
On 7/22/2019 4:53 PM, Adrian Reber via users wrote:
I had a look at it and I am not sure it really makes sense.
In btl_vader_{put,get}.c it would be easy to check the user namespace ID of
the other process, but the function would then just return OPAL_ERROR a bit
earlier instead of as a result of process_vm_{read,write}v() failing. Nothing
would really change.
Patches are always welcome. What would be great is a nice big warning that CMA
support is disabled because the processes are in different user namespaces.
Ideally all MPI processes should be in the same namespace to ensure the best
performance.
-Nathan
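Such a warning could go through Open MPI's usual help-message facility. A minimal sketch, assuming a hypothetical topic in the vader help file (the file and topic names are made up here; opal_show_help() itself is the real API):

    #include "opal/util/show_help.h"

    /* Hypothetical: warn loudly when CMA gets disabled because the
     * processes live in different user namespaces. */
    static void warn_cma_disabled_ns_mismatch(const char *peer_host)
    {
        opal_show_help("help-btl-vader.txt", "cma-different-user-namespaces",
                       true /* print error header */, peer_host);
    }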
On Jul 21, 2019, at 2:53 PM, Adrian Reber via users wrote:
For completeness I am mentioning my results also here. Mounting file systems
in the container can only work if user namespaces are used, and even if the
user IDs are all the same (in each container and on the host), the kernel also
checks whether the processes are in the same user namespace before allowing
ptrace.
Gilles,
thanks again. Adding '--mca btl_vader_single_copy_mechanism none' helps
indeed.
The default seems to be 'cma', which uses process_vm_readv() and
process_vm_writev(). Those calls require CAP_SYS_PTRACE, but telling Podman to
give the process CAP_SYS_PTRACE with '--cap-add=SYS_PTRACE' was not enough.
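For context, the CMA path boils down to calls like the one sketched below (a standalone illustration, not the vader source). process_vm_readv() copies directly out of the peer's address space, and the kernel refuses it with EPERM unless the caller has ptrace permission over the target, which for processes in different user namespaces requires CAP_SYS_PTRACE in the target's user namespace:

    #define _GNU_SOURCE
    #include <errno.h>
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/uio.h>

    /* Copy len bytes from remote_addr in process pid into local_buf. */
    static int cma_read(pid_t pid, void *remote_addr, void *local_buf,
                        size_t len)
    {
        struct iovec local  = { .iov_base = local_buf,   .iov_len = len };
        struct iovec remote = { .iov_base = remote_addr, .iov_len = len };

        if (process_vm_readv(pid, &local, 1, &remote, 1, 0) < 0) {
            if (EPERM == errno) {
                fprintf(stderr, "CMA read failed: no ptrace permission "
                        "over pid %d (different user namespace?)\n",
                        (int) pid);
            }
            return -1;
        }
        return 0;
    }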
Adrian,
Can you try
mpirun --mca btl_vader_copy_mechanism none ...
Please double check the MCA parameter name, I am AFK
IIRC, the default copy mechanism used by vader directly accesses the remote
process address space, and this requires some permission (ptrace?) that might
be dropped by the container runtime.
So upstream Podman was really fast and merged a PR which makes my
wrapper unnecessary:
Add support for --env-host : https://github.com/containers/libpod/pull/3557
As commented in the PR I can now start mpirun with Podman without a
wrapper:
$ mpirun --hostfile ~/hosts --mca orte_tmpdir_base
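By analogy with the wrapper-based command quoted later in this thread, a full invocation might look like this (the tmpdir path is a placeholder I chose, not necessarily the one from the original mail):

    $ mpirun --hostfile ~/hosts --mca orte_tmpdir_base /tmp/podman-mpirun \
          podman run --env-host -v /tmp:/tmp --userns=keep-id --net=host \
          mpi-test /home/mpi/hello

Setting orte_tmpdir_base is presumably needed so the session directory mpirun uses on the host is reachable at the same path inside the container, which is also why /tmp is volume-mounted.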
Not really a relevant reply, however Nomad has task drivers for Docker and
Singularity:

https://www.hashicorp.com/blog/singularity-and-hashicorp-nomad-a-perfect-fit

I'm not sure if it would be easier to set up an MPI environment with Nomad,
though.
On Thu, 11 Jul 2019 at 11:08, Adrian Reber via users wrote:
Gilles,
thanks for pointing out the environment variables. I quickly created a
wrapper which tells Podman to re-export all OMPI_ and PMIX_ variables
(grep "\(PMIX\|OMPI\)"). Now it works:
$ mpirun --hostfile ~/hosts ./wrapper -v /tmp:/tmp --userns=keep-id --net=host
mpi-test /home/mpi/hello
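For the record, such a wrapper only needs a few lines of shell. A hypothetical sketch of the idea (not the actual script from the mail):

    #!/bin/sh
    # Forward every OMPI_* and PMIX_* variable that mpirun/orted put in
    # our environment into the container, then hand off to podman run.
    env_args=""
    for name in $(env | grep "^\(OMPI\|PMIX\)_" | cut -d= -f1); do
        env_args="$env_args -e $name"
    done
    exec podman run $env_args "$@"

With "-e NAME" (no value), podman run copies the variable's value from the host environment, so the container sees exactly what mpirun set.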
Adrian,
the MPI application relies on some environment variables (they typically
start with OMPI_ and PMIX_).
The MPI application internally uses a PMIx client that must be able to
contact a PMIx server
(that is included in mpirun and the orted daemon(s) spawned on the
remote hosts).
I did a quick test to see if I can use Podman in combination with Open
MPI:
[test@test1 ~]$ mpirun --hostfile ~/hosts podman run
quay.io/adrianreber/mpi-test /home/mpi/hello
Hello, world (1 procs total)
--> Process # 0 of 1 is alive. ->789b8fb622ef
Hello, world (1 procs total)