Re: [OMPI users] Performing partial calculation on a single node in an MPI job

2016-10-16 Thread George Bosilca
Vahid,

You cannot use Fortran's vector subscripts with MPI. Are you certain that
the arrays used in your bcast are contiguous? If not, you would either need
to move the data first into a single-dimension array (which will then have
the elements contiguous in memory), or define specialized datatypes to
match the memory layout of your array subscript.

  George.
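
A minimal Fortran sketch of the first option above (pack the vector-subscripted
selection into a contiguous buffer, broadcast it, unpack it). The array names,
sizes, and index selection are illustrative assumptions, not code from EPW:

program bcast_packed
  use mpi
  implicit none
  integer, parameter :: npts = 10
  double precision   :: x(3, 100), buf(3, npts)
  integer            :: idx(npts), i, rank, ierr

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

  idx = (/ (2*i, i = 1, npts) /)   ! a non-contiguous selection of points
  if (rank == 0) x = 1.0d0         ! root owns the data to broadcast

  if (rank == 0) buf = x(:, idx)   ! pack: contiguous copy of the selection
  call MPI_Bcast(buf, 3*npts, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)
  x(:, idx) = buf                  ! unpack into place on every rank

  call MPI_Finalize(ierr)
end program bcast_packed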


On Sun, Oct 16, 2016 at 6:55 PM, Vahid Askarpour  wrote:

> Hello,
>
> I am attempting to modify a relatively large code (Quantum Espresso/EPW)
> and here I will try to summarize the problem in general terms.
>
> I am using an OPENMPI-compiled fortran 90 code in which, midway through
> the code, say 10 points x(3,10) are broadcast  across say 4 nodes. The
> index 3 refers to x,y,z. For each point, a number of calculations are done
> and an array, B(3,20,n) is generated. The integer n depends on the symmetry
> of the system and so varies from node to node.
>
> When I run this code serially, I can print all the correct B values to
> file, so I know the algorithm works.  When I run it in parallel, I get
> numbers that are meaningless. Collecting the points would not help because
> I need to collect the B values. I have tried to run that section of the
> code on one node by setting the processor index “mpime" equal to “ionode"
> or “root” using the following IF statement:
>
> IF (mpime .eq. root ) THEN
> do the calculation and print B
> ENDIF
>
> Neither ionode nor root returns the correct B array.
>
> What would be the best way to extract the B array?
>
> Thank you,
>
> Vahid
>
>

[OMPI users] Performing partial calculation on a single node in an MPI job

2016-10-16 Thread Vahid Askarpour
Hello,

I am attempting to modify a relatively large code (Quantum Espresso/EPW) and 
here I will try to summarize the problem in general terms.

I am using an Open MPI-compiled Fortran 90 code in which, midway through the
code, say 10 points x(3,10) are broadcast across say 4 nodes. The index 3
refers to x,y,z. For each point, a number of calculations are done and an
array B(3,20,n) is generated. The integer n depends on the symmetry of the
system and so varies from node to node.

When I run this code serially, I can print all the correct B values to file, so
I know the algorithm works. When I run it in parallel, I get numbers that are
meaningless. Collecting the points would not help because I need to collect the
B values. I have tried to run that section of the code on one node by setting
the processor index "mpime" equal to "ionode" or "root" using the following IF
statement:

IF (mpime .eq. root ) THEN
do the calculation and print B
ENDIF

Neither ionode nor root returns the correct B array.

What would be the best way to extract the B array?

Thank you,

Vahid
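
One way to collect a per-rank B(3,20,n) whose last dimension differs between
ranks is to gather the per-rank element counts first and then use MPI_Gatherv.
The sketch below is illustrative only; the stand-in values for n and B and all
names apart from B are assumptions, not code from EPW:

program gather_b
  use mpi
  implicit none
  integer :: rank, nranks, n, nelem, i, ierr
  integer, allocatable :: counts(:), displs(:)
  double precision, allocatable :: B(:,:,:), Ball(:)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nranks, ierr)

  n = rank + 1                     ! stand-in for the symmetry-dependent n
  allocate(B(3, 20, n))
  B = dble(rank)                   ! stand-in for the computed values
  nelem = 3*20*n                   ! elements this rank contributes

  allocate(counts(nranks), displs(nranks))
  call MPI_Gather(nelem, 1, MPI_INTEGER, counts, 1, MPI_INTEGER, &
                  0, MPI_COMM_WORLD, ierr)

  if (rank == 0) then
     displs(1) = 0
     do i = 2, nranks
        displs(i) = displs(i-1) + counts(i-1)
     end do
     allocate(Ball(sum(counts)))   ! flat buffer holding every rank's B
  else
     allocate(Ball(1))             ! dummy; recv arguments ignored off-root
  end if

  call MPI_Gatherv(B, nelem, MPI_DOUBLE_PRECISION, &
                   Ball, counts, displs, MPI_DOUBLE_PRECISION, &
                   0, MPI_COMM_WORLD, ierr)

  call MPI_Finalize(ierr)
end program gather_b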



Re: [OMPI users] mpirun works with cmd line call , but not with app context file arg

2016-10-16 Thread MM
On 16 October 2016 at 14:50, Gilles Gouaillardet
 wrote:
> Out of curiosity, why do you specify both --hostfile and -H ?
> Do you observe the same behavior without --hostfile ~/.mpihosts ?

When I specify only -H like so:

mpirun -H localhost -np 1 prog1 : -H A.lan -np 4 prog2 : -H B.lan -np 4 prog2

I get the same error:
"There are not enough slots available in the system to satisfy the 4 slots
that were requested by the application:
  prog2
Either request fewer slots for your application, or make more slots
available for use".

> Also, do you have at least 4 cores on both A.lan and B.lan ?
Yes, both A and B have exactly 4 cores each.
>
> Cheers,
>
> Gilles
>
>
> On Sunday, October 16, 2016, MM  wrote:
>>
>> Hi,
>>
>> openmpi 1.10.3
>>
>> this call:
>>
>> mpirun --hostfile ~/.mpihosts -H localhost -np 1 prog1 : -H A.lan -np
>> 4 prog2 : -H B.lan -np 4 prog2
>>
>> works, yet this one:
>>
>> mpirun --hostfile ~/.mpihosts --app ~/.mpiapp
>>
>> doesn't.  where ~/.mpiapp
>>
>> -H localhost -np 1 prog1
>> -H A.lan -np 4 prog2
>> -H B.lan -np 4 prog2
>>
>> it says
>>
>> "There are not enough slots available in the system to satisfy the 4 slots
>> that were requested by the application:
>>   prog2
>> Either request fewer slots for your application, or make more slots
>> available
>> for use".


Re: [OMPI users] Low CPU utilization

2016-10-16 Thread Reuti
Hi,

Am 16.10.2016 um 20:34 schrieb Mahmood Naderan:

> Hi,
> I am running two softwares that use OMPI-2.0.1. Problem is that the CPU 
> utilization is low on the nodes.
> 
> 
> For example, see the process information below
> 
> [root@compute-0-1 ~]# ps aux | grep siesta
> mahmood  14635  0.0  0.0 108156  1300 ?   S    21:58   0:00 /bin/bash 
> /share/apps/chemistry/siesta-4.0-mpi201/spar/siesta.p1 A.fdf
> mahmood  14636  0.0  0.0 108156  1300 ?   S    21:58   0:00 /bin/bash 
> /share/apps/chemistry/siesta-4.0-mpi201/spar/siesta.p1 A.fdf
> mahmood  14637 61.6  0.2 372076 158220 ?   Rl   21:58   0:38 
> /share/apps/chemistry/siesta-4.0-mpi201/spar/siesta
> mahmood  14639 59.6  0.2 365992 154228 ?   Rl   21:58   0:37 
> /share/apps/chemistry/siesta-4.0-mpi201/spar/siesta
> 
> 
> Note that the cpu utilization is the third column. The "siesta.pl" script is
> 
> #!/bin/bash
> BENCH=$1
> export OMP_NUM_THREADS=1
> /share/apps/chemistry/siesta-4.0-mpi201/spar/siesta < $BENCH
> 
> 
> 
> 
> I also saw a similar behavior from Gromacs which has been discussed at 
> https://mailman-1.sys.kth.se/pipermail/gromacs.org_gmx-users/2016-October/108939.html
> 
> It seems that there is a tricky thing with OMPI. Any idea is welcomed.

Sounds like the two jobs are using the same cores due to automatic core binding, as 
one instance doesn't know anything about the other. As a first test you can 
start both with "mpiexec --bind-to none ..." and check whether you see 
different behavior.
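
As a concrete illustration, something along these lines (the exact launch line
is an assumption pieced together from the ps output above, not a command that
was posted):

  mpiexec --bind-to none -np 2 /share/apps/chemistry/siesta-4.0-mpi201/spar/siesta.pl A.fdf

Adding --report-bindings to the usual run should also show whether the two jobs
were indeed pinned to the same cores.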

`man mpiexec` mentions some hints about threads in applications.

-- Reuti

> 
> 
> Regards,
> Mahmood
> 
> 


[OMPI users] Low CPU utilization

2016-10-16 Thread Mahmood Naderan
Hi,
I am running two programs that use Open MPI 2.0.1. The problem is that the CPU
utilization is low on the nodes.


For example, see the process information below

[root@compute-0-1 ~]# ps aux | grep siesta
mahmood  14635  0.0  0.0 108156  1300 ?   S    21:58   0:00 /bin/bash
/share/apps/chemistry/siesta-4.0-mpi201/spar/siesta.p1 A.fdf
mahmood  14636  0.0  0.0 108156  1300 ?   S    21:58   0:00 /bin/bash
/share/apps/chemistry/siesta-4.0-mpi201/spar/siesta.p1 A.fdf
mahmood  14637 61.6  0.2 372076 158220 ?   Rl   21:58   0:38
/share/apps/chemistry/siesta-4.0-mpi201/spar/siesta
mahmood  14639 59.6  0.2 365992 154228 ?   Rl   21:58   0:37
/share/apps/chemistry/siesta-4.0-mpi201/spar/siesta


Note that the CPU utilization is the third column. The "siesta.pl" script is:

#!/bin/bash
BENCH=$1
export OMP_NUM_THREADS=1
/share/apps/chemistry/siesta-4.0-mpi201/spar/siesta < $BENCH




I also saw similar behavior from Gromacs, which has been discussed at
https://mailman-1.sys.kth.se/pipermail/gromacs.org_gmx-users/2016-October/108939.html

It seems that there is something tricky going on with Open MPI. Any ideas are welcome.


Regards,
Mahmood

Re: [OMPI users] How to yield CPU more when not computing (was curious behavior during wait for broadcast: 100% cpu)

2016-10-16 Thread Jeff Hammond
If you want to keep long-waiting MPI processes from clogging your CPU
pipeline and heating up your machines, you can turn blocking MPI
collectives into nicer ones by implementing them in terms of MPI-3
nonblocking collectives using something like the following.

I typed this code straight into this email, so you should validate it
carefully.

Jeff

#include <mpi.h>

#ifdef HAVE_UNISTD_H
#include <unistd.h>
const int myshortdelay = 1; /* microseconds */
const int mylongdelay = 1; /* seconds */
#else
#define USE_USLEEP 0
#define USE_SLEEP 0
#endif

#ifdef HAVE_SCHED_H
#include <sched.h>
#else
#define USE_YIELD 0
#endif

int MPI_Bcast( void *buffer, int count, MPI_Datatype datatype, int root,
               MPI_Comm comm )
{
  MPI_Request request;
  {
    int rc = PMPI_Ibcast(buffer, count, datatype, root, comm, &request);
    if (rc!=MPI_SUCCESS) return rc;
  }
  int flag = 0;
  while (!flag)
  {
    int rc = PMPI_Test(&request, &flag, MPI_STATUS_IGNORE);
    if (rc!=MPI_SUCCESS) return rc;

/* pick one of these... */
#if USE_YIELD
    sched_yield();
#elif USE_USLEEP
    usleep(myshortdelay);
#elif USE_SLEEP
    sleep(mylongdelay);
#elif USE_CPU_RELAX
    cpu_relax(); /*
http://linux-kernel.2935.n7.nabble.com/x86-cpu-relax-why-nop-vs-pause-td398656.html
*/
#else
#warning Hard polling may not be the best idea...
#endif
  }
  return MPI_SUCCESS;
}
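
The wrapper works through the MPI profiling interface: it defines MPI_Bcast
itself and only calls the PMPI_ entry points, so when it is compiled into the
application (or linked ahead of the MPI library) it intercepts the
application's MPI_Bcast calls without any source changes. The same pattern can
be repeated for other blocking calls that have MPI-3 nonblocking counterparts.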

On Sun, Oct 16, 2016 at 2:24 AM, MM  wrote:
>
> I would like to see if there are any updates re this thread back from
> 2010:
>
> https://mail-archive.com/users@lists.open-mpi.org/msg15154.html
>
> I've got 3 boxes at home, a laptop and 2 other quadcore nodes . When the
> CPU is at 100% for a long time, the fans make quite some noise:-)
>
> The laptop runs the UI, and the 2 other boxes are the compute nodes.
> The user triggers compute tasks at random times... In between those times
> when no parallelized compute is done, the user does analysis, looks at data
> and so on.
> This does not involve any MPI compute.
> At that point, the nodes are blocked in a mpi_broadcast with each of the
> 4 processes on each of the nodes polling at 100%, triggering the cpu fan:-)
>
> homogeneous openmpi 1.10.3  linux 4.7.5
>
> Nowadays, are there any more options than the yield_when_idle mentioned
> in that initial thread?
>
> The model I have used for so far is really a master/slave model where the
> master sends the jobs (which take substantially longer than the MPI
> communication itself), so in this model I would want the mpi nodes to be
> really idle and i can sacrifice the latency while there's nothing to do.
> if there are no other options, is it possible to somehow start all the
> processes outside of the mpi world, then only start the mpi framework once
> it's needed?
>
> Regards,
>




--
Jeff Hammond
jeff.scie...@gmail.com
http://jeffhammond.github.io/

[MTT users] MTT Server Downtime - Tues., Oct. 18, 2016

2016-10-16 Thread Josh Hursey
I will announce this on the Open MPI developer's teleconf on Tuesday,
before the move.

Geoff - Please add this item to the agenda.


Short version:
---
MTT server (mtt.open-mpi.org) will be going down for maintenance on
Tuesday, Oct. 18, 2016 from 2-5 pm US Eastern. During this time the MTT
Reporter and the MTT client submission interface will not be accessible. I
will send an email out when the service is back online.


Longer version:
---
We need to move the MTT Server/Database from the IU server to the AWS
server. This move will be completely transparent to users submitting to the
database, except for a window of downtime to move the database.

I estimate that moving the database will take about two hours. So I have
blocked off three hours to give us time to test, and redirect the DNS
record.

Once the service comes back online, you should be able to access MTT using
the mtt.open-mpi.org URL. No changes are needed in your MTT client setup,
and all permalinks are expected to still work after the move.


Let me know if you have any questions or concerns about the move.


-- 
Josh Hursey
IBM Spectrum MPI Developer

Re: [OMPI users] mpirun works with cmd line call , but not with app context file arg

2016-10-16 Thread Gilles Gouaillardet
Out of curiosity, why do you specify both --hostfile and -H ?
Do you observe the same behavior without --hostfile ~/.mpihosts ?

Also, do you have at least 4 cores on both A.lan and B.lan ?

Cheers,

Gilles

On Sunday, October 16, 2016, MM  wrote:

> Hi,
>
> openmpi 1.10.3
>
> this call:
>
> mpirun --hostfile ~/.mpihosts -H localhost -np 1 prog1 : -H A.lan -np
> 4 prog2 : -H B.lan -np 4 prog2
>
> works, yet this one:
>
> mpirun --hostfile ~/.mpihosts --app ~/.mpiapp
>
> doesn't.  where ~/.mpiapp
>
> -H localhost -np 1 prog1
> -H A.lan -np 4 prog2
> -H B.lan -np 4 prog2
>
> it says
>
> "There are not enough slots available in the system to satisfy the 4 slots
> that were requested by the application:
>   prog2
> Either request fewer slots for your application, or make more slots
> available
> for use".

[OMPI users] mpirun works with cmd line call , but not with app context file arg

2016-10-16 Thread MM
Hi,

openmpi 1.10.3

this call:

mpirun --hostfile ~/.mpihosts -H localhost -np 1 prog1 : -H A.lan -np
4 prog2 : -H B.lan -np 4 prog2

works, yet this one:

mpirun --hostfile ~/.mpihosts --app ~/.mpiapp

doesn't, where ~/.mpiapp contains:

-H localhost -np 1 prog1
-H A.lan -np 4 prog2
-H B.lan -np 4 prog2

it says

"There are not enough slots available in the system to satisfy the 4 slots
that were requested by the application:
  prog2
Either request fewer slots for your application, or make more slots available
for use".


[OMPI users] How to yield CPU more when not computing (was curious behavior during wait for broadcast: 100% cpu)

2016-10-16 Thread MM
I would like to see if there are any updates re this thread back from 2010:

https://mail-archive.com/users@lists.open-mpi.org/msg15154.html

I've got 3 boxes at home, a laptop and 2 other quadcore nodes. When the
CPU is at 100% for a long time, the fans make quite some noise :-)

The laptop runs the UI, and the 2 other boxes are the compute nodes.
The user triggers compute tasks at random times... In between those times
when no parallelized compute is done, the user does analysis, looks at data
and so on.
This does not involve any MPI compute.
At that point, the nodes are blocked in an mpi_broadcast with each of the 4
processes on each of the nodes polling at 100%, triggering the cpu fan :-)

homogeneous openmpi 1.10.3, linux 4.7.5

Nowadays, are there any more options than the yield_when_idle mentioned in
that initial thread?

The model I have used so far is really a master/slave model where the
master sends the jobs (which take substantially longer than the MPI
communication itself), so in this model I would want the mpi nodes to be
really idle and I can sacrifice the latency while there's nothing to do.
If there are no other options, is it possible to somehow start all the
processes outside of the mpi world, then only start the mpi framework once
it's needed?

Regards,
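
For reference, the yield_when_idle behavior mentioned above can be requested at
run time, e.g. (assuming the 1.10-series MCA parameter name):

  mpirun --mca mpi_yield_when_idle 1 ...

This only makes the progress loop yield the processor instead of spinning flat
out, so it mostly helps when cores are oversubscribed rather than making the
ranks truly idle.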