Re: [OMPI users] Using OpenMPI / ORTE as cluster aware GNU Parallel

2017-02-28 Thread Mark Santcroos
Hi Brock, Angel, Reuti,



You might want to look at a tool we developed:
http://radical-cybertools.github.io/radical-pilot/index.html

This was actually one of the drivers for isolating the persistent ORTE DVM 
that's being discussed in this thread.

With RADICAL-Pilot you can use a Python API to launch an ORTE DVM on a 
computational resource and then run tasks on top of that.

Happy to answer questions off-list.



Regards,

Mark
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users


Re: [OMPI users] Using OpenMPI / ORTE as cluster aware GNU Parallel

2017-02-27 Thread r...@open-mpi.org

> On Feb 27, 2017, at 9:39 AM, Reuti  wrote:
> 
> 
>> On 27.02.2017 at 18:24, Angel de Vicente wrote:
>> 
>> […]
>> 
>> For a small group of users if the DVM can run with my user and there is
>> no restriction on who can use it or if I somehow can authorize others to
>> use it (via an authority file or similar) that should be enough.
> 
> AFAICS there is no user authorization at all. Anyone who knows the URI can 
> hijack a running DVM. The only problem might be that all processes run under 
> the account of the user who started the DVM. That is, output files have to go 
> to the home directory of that user, because other users can no longer write 
> to their own directories this way.

We can add some authorization protection, at least at the user/group level. One 
can resolve the directory issue by creating some place that has group 
authorities, and then requesting that to be the working directory.
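
A minimal Python sketch of the group-writable working directory idea above. It only creates the directory and builds (does not run) a command line. The helper names and the permission mode are illustrative assumptions, and `--wdir` is used here on the assumption that the installed mpirun accepts it for setting the working directory:

```python
import os


def make_shared_workdir(path):
    """Create a directory that the owner and its group can read, write and enter."""
    os.makedirs(path, exist_ok=True)
    # rwx for owner and group, nothing for others. Which group owns the
    # directory is left to the administrator (for example via chgrp).
    os.chmod(path, 0o770)
    return path


def submit_in_workdir(workdir, app_argv):
    """Build (not run) an mpirun command line that uses workdir as working directory."""
    return ["mpirun", "--wdir", workdir] + list(app_argv)
```

Users in that group could then write their output there even though the processes run under the DVM owner's account.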

> 
> Running the DVM as root might help, but this carries a high risk that a 
> faulty script writes to a place where sensitive system information is 
> stored and leaves the machine unusable afterwards.
> 

I would advise against that

> My first attempts at using the DVM often led to a terminated DVM once a 
> process returned a non-zero exit code. But once the DVM is gone, I fear the 
> queued jobs might be lost too. I wish the DVM were more forgiving (or that 
> its behavior on a non-zero exit code were adjustable).

We just fixed that issue the other day :-)
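
For illustration, the more forgiving behavior asked for above amounts to recording a non-zero exit code and moving on to the next job. The following is a minimal Python sketch of that policy only, not of how the DVM implements it; `run_all` is a hypothetical helper:

```python
import subprocess


def run_all(commands):
    """Run each command in order and record its exit code; never stop early."""
    results = []
    for argv in commands:
        completed = subprocess.run(argv)
        # A non-zero exit code is recorded, not treated as fatal for the queue.
        results.append((argv, completed.returncode))
    return results
```

The caller can inspect the recorded codes afterwards and decide which jobs to resubmit.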

> 
> -- Reuti

Re: [OMPI users] Using OpenMPI / ORTE as cluster aware GNU Parallel

2017-02-27 Thread Angel de Vicente
Hi,

Reuti  writes:
> At first I thought you wanted to run a queuing system inside a queuing
> system, but it looks like you want to replace the resource manager.

yes, if this could work reasonably well, we could do without the
resource manager.

> Under which user account will the DVM daemons run? Are all users using the 
> same account?

Well, if this worked for only one user, it would still be useful,
since I could use it the way I now use GNU Parallel or a private Condor
system: submit hundreds of jobs and be sure they get executed without
oversubscribing.

For a small group of users, it should be enough if the DVM can run under
my user account and either there is no restriction on who can use it or I
can somehow authorize others to use it (via an authority file or similar).

Thanks,
-- 
Ángel de Vicente
http://www.iac.es/galeria/angelv/  
-
ADVERTENCIA: Sobre la privacidad y cumplimiento de la Ley de Protección de 
Datos, acceda a http://www.iac.es/disclaimer.php
WARNING: For more information on privacy and fulfilment of the Law concerning 
the Protection of Data, consult http://www.iac.es/disclaimer.php?lang=en


Re: [OMPI users] Using OpenMPI / ORTE as cluster aware GNU Parallel

2017-02-27 Thread Reuti

> On 27.02.2017 at 18:24, Angel de Vicente wrote:
> 
> […]
> 
> For a small group of users if the DVM can run with my user and there is
> no restriction on who can use it or if I somehow can authorize others to
> use it (via an authority file or similar) that should be enough.

AFAICS there is no user authorization at all. Anyone who knows the URI can 
hijack a running DVM. The only problem might be that all processes run under 
the account of the user who started the DVM. That is, output files have to go 
to the home directory of that user, because other users can no longer write 
to their own directories this way.

Running the DVM as root might help, but this carries a high risk that a 
faulty script writes to a place where sensitive system information is 
stored and leaves the machine unusable afterwards.

My first attempts at using the DVM often led to a terminated DVM once a 
process returned a non-zero exit code. But once the DVM is gone, I fear the 
queued jobs might be lost too. I wish the DVM were more forgiving (or that 
its behavior on a non-zero exit code were adjustable).

-- Reuti



Re: [OMPI users] Using OpenMPI / ORTE as cluster aware GNU Parallel

2017-02-27 Thread Reuti
Hi,

> On 27.02.2017 at 14:33, Angel de Vicente wrote:
> 
> Hi,
> 
> "r...@open-mpi.org"  writes:
>>> With the DVM, is it possible to keep these jobs in some sort of queue,
>>> so that they will be executed when the cores get free?
>> 
>> It wouldn’t be hard to do so - as long as it was just a simple FIFO 
>> scheduler. I wouldn’t want it to get too complex.
> 
> a simple FIFO should be probably enough. This can be useful as a simple
> way to make a multi-core machine accessible to a small group of (friendly)
> users, making sure that they don't oversubscribe the machine, but
> without going the full route of installing/maintaining a full resource
> manager.

At first I thought you wanted to run a queuing system inside a queuing system, 
but it looks like you want to replace the resource manager.

Under which user account will the DVM daemons run? Are all users using the same 
account?

-- Reuti



Re: [OMPI users] Using OpenMPI / ORTE as cluster aware GNU Parallel

2017-02-27 Thread Angel de Vicente
Hi,

"r...@open-mpi.org"  writes:
>> With the DVM, is it possible to keep these jobs in some sort of queue,
>> so that they will be executed when the cores get free?
>
> It wouldn’t be hard to do so - as long as it was just a simple FIFO 
> scheduler. I wouldn’t want it to get too complex.

a simple FIFO should be probably enough. This can be useful as a simple
way to make a multi-core machine accessible to a small group of (friendly)
users, making sure that they don't oversubscribe the machine, but
without going the full route of installing/maintaining a full resource
manager. 

Cheers,
-- 
Ángel de Vicente
http://www.iac.es/galeria/angelv/  

Re: [OMPI users] Using OpenMPI / ORTE as cluster aware GNU Parallel

2017-02-27 Thread r...@open-mpi.org

> On Feb 27, 2017, at 4:58 AM, Angel de Vicente  wrote:
> 
> Hi,
> 
> "r...@open-mpi.org"  writes:
>> You might want to try using the DVM (distributed virtual machine)
>> mode in ORTE. You can start it on an allocation using the “orte-dvm”
> cmd, and then submit jobs to it with “mpirun --hnp <foo>”, where foo
>> is either the contact info printed out by orte-dvm, or the name of
>> the file you told orte-dvm to put that info in. You’ll need to take
>> it from OMPI master at this point.
> 
> this question looked interesting so I gave it a try. In a cluster with
> Slurm I had no problem submitting a job which launched an orte-dvm
> -report-uri ... and then use that file to launch jobs onto that virtual
> machine via orte-submit. 
> 
> To be useful to us at this point, I should be able to start executing
> jobs if there are cores available and just hold them in a queue if the
> cores are already filled. At this point this is not happening, and if I
> try to submit a second job while the previous one has not finished, I
> get a message like:
> 
> ,
> | DVM ready
> | --
> | All nodes which are allocated for this job are already filled.
> | --
> `
> 
> With the DVM, is it possible to keep these jobs in some sort of queue,
> so that they will be executed when the cores get free?

It wouldn’t be hard to do so - as long as it was just a simple FIFO scheduler. 
I wouldn’t want it to get too complex.
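
To make the idea concrete, a simple FIFO queue of the kind discussed above could look roughly like the following. This is a minimal Python sketch of the scheduling policy only, not ORTE code; the class and method names are hypothetical, and jobs are described just by an id and a core count:

```python
from collections import deque


class FifoCoreQueue:
    """Hold jobs in arrival order and start them only when enough cores are free."""

    def __init__(self, total_cores):
        self.total_cores = total_cores
        self.free_cores = total_cores
        self.pending = deque()   # (job_id, cores) waiting, oldest first
        self.running = {}        # job_id -> cores currently held

    def submit(self, job_id, cores):
        """Queue a job; return the ids of any jobs that can start right now."""
        if cores > self.total_cores:
            raise ValueError("job needs more cores than the machine has")
        self.pending.append((job_id, cores))
        return self._start_ready()

    def finish(self, job_id):
        """Release a finished job's cores; return the ids of jobs started as a result."""
        self.free_cores += self.running.pop(job_id)
        return self._start_ready()

    def _start_ready(self):
        started = []
        # Strict FIFO: stop at the first job that does not fit, even if a
        # later, smaller job would.
        while self.pending and self.pending[0][1] <= self.free_cores:
            job_id, cores = self.pending.popleft()
            self.free_cores -= cores
            self.running[job_id] = cores
            started.append(job_id)
        return started
```

With this policy a second submission waits in the queue until cores are released instead of being rejected because all nodes are filled. The strict ordering keeps the logic simple at the cost of occasionally leaving cores idle behind a large job.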

> 
> Thanks,
> -- 
> Ángel de Vicente
> http://www.iac.es/galeria/angelv/  

Re: [OMPI users] Using OpenMPI / ORTE as cluster aware GNU Parallel

2017-02-27 Thread Angel de Vicente
Hi,

"r...@open-mpi.org"  writes:
> You might want to try using the DVM (distributed virtual machine)
> mode in ORTE. You can start it on an allocation using the “orte-dvm”
> cmd, and then submit jobs to it with “mpirun --hnp <foo>”, where foo
> is either the contact info printed out by orte-dvm, or the name of
> the file you told orte-dvm to put that info in. You’ll need to take
> it from OMPI master at this point.

This question looked interesting, so I gave it a try. On a cluster with
Slurm I had no problem submitting a job that launched an orte-dvm
-report-uri ... and then using that file to launch jobs onto that virtual
machine via orte-submit. 

To be useful to us at this point, I should be able to start executing
jobs if there are cores available and just hold them in a queue if the
cores are already filled. At this point this is not happening, and if I
try to submit a second job while the previous one has not finished, I
get a message like:

,
| DVM ready
| --
| All nodes which are allocated for this job are already filled.
| --
`

With the DVM, is it possible to keep these jobs in some sort of queue,
so that they will be executed when the cores get free?

Thanks,
-- 
Ángel de Vicente
http://www.iac.es/galeria/angelv/  

Re: [OMPI users] Using OpenMPI / ORTE as cluster aware GNU Parallel

2017-02-23 Thread r...@open-mpi.org
You might want to try using the DVM (distributed virtual machine) mode in ORTE. 
You can start it on an allocation using the “orte-dvm” cmd, and then submit 
jobs to it with “mpirun --hnp <foo>”, where foo is either the contact info 
printed out by orte-dvm, or the name of the file you told orte-dvm to put that 
info in. You’ll need to take it from OMPI master at this point.
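
The workflow described above can be sketched as a pair of command lines. This minimal Python sketch only builds the argument lists and does not launch anything. The `--report-uri` option is taken from the orte-dvm invocation mentioned elsewhere in this thread and `--hnp` from the text above; the file name `dvm.uri`, the task name, and the helper names are made up for illustration, and the exact option spelling (for example whether a file name needs a prefix) may differ between versions:

```python
def dvm_start_command(uri_file):
    """Command line that starts the DVM and writes its contact info to uri_file."""
    return ["orte-dvm", "--report-uri", uri_file]


def dvm_submit_command(contact, app_argv):
    """Command line that submits one job to an already running DVM.

    `contact` is either the contact info printed by orte-dvm or the name of
    the file it was told to write (assumption: passed through unchanged).
    """
    return ["mpirun", "--hnp", contact] + list(app_argv)


if __name__ == "__main__":
    print(" ".join(dvm_start_command("dvm.uri")))
    print(" ".join(dvm_submit_command("dvm.uri", ["./serial_task", "input1"])))
```

The DVM is started once per allocation, and the submit command is then repeated for each task.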

Alternatively, you can get just the DVM bits by downloading the PMIx Reference 
Server (https://github.com/pmix/pmix-reference-server). It’s just ORTE, but with it 
locked to the DVM operation. So a simple “psrvr” starts the machine, and then 
“prun” executes cmds (supports all the orterun options, doesn’t need to be told 
how to contact psrvr).

Both will allow you to run serial as well as parallel codes (so long as they 
are built against OMPI master). We are working on providing cross-version PMIx 
support - at that time, you’ll be able to run OMPI v2.0 and above against 
either one as well.

HTH
Ralph

> On Feb 23, 2017, at 1:41 PM, Brock Palen  wrote:
> 
> Is it possible to use mpirun / orte as a load balancer for running serial
> jobs in parallel similar to GNU Parallel?
> https://www.biostars.org/p/63816/ 
> 
> Reason is on any major HPC system you normally want to use a resource
> manager launcher (TM, slurm etc)  and not ssh like gnu parallel.
> 
> I recall there being a way to give OMPI a stack of work todo from the talk
> at SC this year, but I can't figure it out if it does what I think it
> should do.
> 
> Thanks,
> 
> Brock Palen
> www.umich.edu/~brockp 
> Director Advanced Research Computing - TS
> XSEDE Campus Champion
> bro...@umich.edu 
> (734)936-1985

[OMPI users] Using OpenMPI / ORTE as cluster aware GNU Parallel

2017-02-23 Thread Brock Palen
Is it possible to use mpirun / orte as a load balancer for running serial
jobs in parallel similar to GNU Parallel?
https://www.biostars.org/p/63816/

Reason is on any major HPC system you normally want to use a resource
manager launcher (TM, slurm etc)  and not ssh like gnu parallel.

I recall there being a way to give OMPI a stack of work to do from the talk
at SC this year, but I can't figure out whether it does what I think it
should do.

Thanks,

Brock Palen
www.umich.edu/~brockp 
Director Advanced Research Computing - TS
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985