Re: [OMPI users] (no subject)

2013-11-02 Thread San B
Yes, MM... but here a single node has 16 cores, not 64 cores.
The first two jobs were with OMPI-1.4.5:
  16 cores on a single node - 3692.403 seconds
  16 cores across two nodes (8 cores per node) - 12338.809 seconds

The next two jobs were with OMPI-1.6.5:
  16 cores on a single node - 3547.879 seconds
  16 cores across two nodes (8 cores per node) - 5527.320 seconds

  As others said, the single-node job runs faster because of shared-memory
communication, but I was only expecting a slight difference between 1 and
2 nodes - here the two-node run takes about 60% more time.
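
(One thing I still plan to rule out on my side is process placement and
binding. With the 1.6 series, something like the line below should show how
the ranks are laid out - this is only a sketch, and "./app" is a placeholder
for the real executable:

  mpirun -np 16 -npernode 8 --bind-to-core --report-bindings ./app

If ranks end up unbound or all packed onto one socket, the two-node run can
look much worse than it needs to be.)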



On Thu, Oct 31, 2013 at 8:19 PM, Ralph Castain  wrote:

> Yes, though the degree of impact obviously depends on the messaging
> pattern of the app.
>
> On Oct 31, 2013, at 2:50 AM, MM  wrote:
>
> Of course, by this you mean, with the same total number of processes, e.g.
> 64 processes on 1 node using shared mem, vs 64 processes spread over 2 nodes
> (32 each, for example)?
>
>
> On 29 October 2013 14:37, Ralph Castain  wrote:
>
>> As someone previously noted, apps will always run slower on multiple
>> nodes vs everything on a single node due to the shared memory vs IB
>> differences. Nothing you can do about that one.
>>


Re: [OMPI users] (no subject)

2013-10-29 Thread San B
  As discussed earlier, the executable compiled with
OpenMPI-1.4.5 performed very poorly, taking 12338.809 seconds when the job was
executed on two nodes (8 cores per node). The same job run on a single
node (all 16 cores) finished in just 3692.403 seconds. I have now compiled the
application with OpenMPI-1.6.5, and the two-node run finished in 5527.320
seconds.

 Is this a genuine performance gain of OMPI-1.6.5 over OMPI-1.4.5, or does it
point to an issue in OpenMPI itself?
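
One more check I can do is to confirm that the inter-node traffic really goes
over InfiniBand rather than falling back to TCP. A sketch (MCA parameter names
as in the 1.6 series; "./app" is a placeholder for the real executable):

  # fail fast if the openib BTL cannot be used
  mpirun -np 16 -npernode 8 --mca btl openib,sm,self ./app

  # or let Open MPI print which BTL components it selects
  mpirun -np 16 -npernode 8 --mca btl_base_verbose 30 ./app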


On Tue, Oct 15, 2013 at 5:32 PM, San B <forum@gmail.com> wrote:

> Hi,
>
>  As per your instruction, I did the profiling of the application with
> mpiP. Following is the difference between the two runs:
>
> Run 1: 16 mpi processes on single node
>
> @--- MPI Time (seconds) ---
> ---
> Task     AppTime    MPITime    MPI%
>    0    3.61e+03        661   18.32
>    1    3.61e+03        627   17.37
>    2    3.61e+03        700   19.39
>    3    3.61e+03        665   18.41
>    4    3.61e+03        702   19.45
>    5    3.61e+03        703   19.48
>    6    3.61e+03        740   20.50
>    7    3.61e+03        763   21.14
> ...
> ...
>
> Run 2: 16 mpi processes on two nodes - 8 mpi processes per node
>
> @--- MPI Time (seconds) ---
> ---
> Task     AppTime    MPITime    MPI%
>    0    1.27e+04   1.06e+04   84.14
>    1    1.27e+04   1.07e+04   84.34
>    2    1.27e+04   1.07e+04   84.20
>    3    1.27e+04   1.07e+04   84.20
>    4    1.27e+04   1.07e+04   84.22
>    5    1.27e+04   1.07e+04   84.25
>    6    1.27e+04   1.06e+04   84.02
>    7    1.27e+04   1.07e+04   84.35
>    8    1.27e+04   1.07e+04   84.29
>
>
>   The time spent in MPI functions in run 1 is less than 20%,
> whereas it is more than 80% in run 2. For more details, I've attached
> both output files. Please go through these files and suggest what
> optimizations we can do with OpenMPI or Intel MKL.
>
> Thanks
>
>
> On Mon, Oct 7, 2013 at 12:15 PM, San B <forum@gmail.com> wrote:
>
>> Hi,
>>
>> I'm facing a performance issue with a scientific application (Fortran).
>> The issue is that it runs fast on a single node but very slowly on multiple
>> nodes. For example, a 16-core job on a single node finishes in 1 hr 2 min,
>> but the same job on two nodes (i.e. 8 cores per node, with the remaining 8
>> cores kept free) takes 3 hr 20 min. The code is compiled with ifort-13.1.1,
>> openmpi-1.4.5 and the Intel MKL libraries - LAPACK, BLAS, ScaLAPACK, BLACS &
>> FFTW. What could the problem be here? Is it possible to do any tuning in
>> OpenMPI? FYI, more info: the cluster has Intel Sandy Bridge processors
>> (E5-2670) and InfiniBand, and Hyper-Threading is enabled. Jobs are submitted
>> through the LSF scheduler.
>>
>> Could HyperThreading be causing any problem here?
>>
>>
>> Thanks
>>
>
>
>


Re: [OMPI users] (no subject)

2013-10-15 Thread San B
Hi,

 As per your instruction, I did the profiling of the application with
mpiP. Following is the difference between the two runs:

Run 1: 16 mpi processes on single node

@--- MPI Time (seconds) ---
---
Task     AppTime    MPITime    MPI%
   0    3.61e+03        661   18.32
   1    3.61e+03        627   17.37
   2    3.61e+03        700   19.39
   3    3.61e+03        665   18.41
   4    3.61e+03        702   19.45
   5    3.61e+03        703   19.48
   6    3.61e+03        740   20.50
   7    3.61e+03        763   21.14
...
...

Run 2: 16 mpi processes on two nodes - 8 mpi processes per node

@--- MPI Time (seconds) ---
---
Task     AppTime    MPITime    MPI%
   0    1.27e+04   1.06e+04   84.14
   1    1.27e+04   1.07e+04   84.34
   2    1.27e+04   1.07e+04   84.20
   3    1.27e+04   1.07e+04   84.20
   4    1.27e+04   1.07e+04   84.22
   5    1.27e+04   1.07e+04   84.25
   6    1.27e+04   1.06e+04   84.02
   7    1.27e+04   1.07e+04   84.35
   8    1.27e+04   1.07e+04   84.29


  The time spent in MPI functions in run 1 is less than 20%, whereas
it is more than 80% in run 2. For more details, I've attached both
output files. Please go through these files and suggest what optimizations we
can do with OpenMPI or Intel MKL.
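
(For reference, the profiles above came from linking the application against
mpiP. A typical link line looks roughly like the one below; the exact library
list and the $MPIP_HOME path depend on the local mpiP build.)

  mpif90 -O2 -o app *.o -L$MPIP_HOME/lib -lmpiP -lbfd -liberty -lunwind -lm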

Thanks


On Mon, Oct 7, 2013 at 12:15 PM, San B <forum@gmail.com> wrote:

> Hi,
>
> I'm facing a performance issue with a scientific application (Fortran). The
> issue is that it runs fast on a single node but very slowly on multiple
> nodes. For example, a 16-core job on a single node finishes in 1 hr 2 min,
> but the same job on two nodes (i.e. 8 cores per node, with the remaining 8
> cores kept free) takes 3 hr 20 min. The code is compiled with ifort-13.1.1,
> openmpi-1.4.5 and the Intel MKL libraries - LAPACK, BLAS, ScaLAPACK, BLACS &
> FFTW. What could the problem be here? Is it possible to do any tuning in
> OpenMPI? FYI, more info: the cluster has Intel Sandy Bridge processors
> (E5-2670) and InfiniBand, and Hyper-Threading is enabled. Jobs are submitted
> through the LSF scheduler.
>
> Could HyperThreading be causing any problem here?
>
>
> Thanks
>


mpi-App-profile-1node-16perNode.mpiP
Description: Binary data


mpi-App-profile-2Nodes-8perNode.mpiP
Description: Binary data


[OMPI users] (no subject)

2013-10-07 Thread San B
Hi,

I'm facing a performance issue with a scientific application (Fortran). The
issue is that it runs fast on a single node but very slowly on multiple
nodes. For example, a 16-core job on a single node finishes in 1 hr 2 min, but
the same job on two nodes (i.e. 8 cores per node, with the remaining 8 cores
kept free) takes 3 hr 20 min. The code is compiled with ifort-13.1.1,
openmpi-1.4.5 and the Intel MKL libraries - LAPACK, BLAS, ScaLAPACK, BLACS &
FFTW. What could the problem be here? Is it possible to do any tuning in
OpenMPI? FYI, more info: the cluster has Intel Sandy Bridge processors
(E5-2670) and InfiniBand, and Hyper-Threading is enabled. Jobs are submitted
through the LSF scheduler.

Could HyperThreading be causing any problem here?
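
(One thing I can check locally is whether LSF is handing out hardware threads
as if they were cores. A quick look at the node topology, for example:

  lscpu | egrep 'Thread|Core|Socket'
  cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list

If "Thread(s) per core" is 2 and two ranks end up sharing a sibling pair, that
alone could explain part of the slowdown.)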


Thanks


[OMPI users] OpenMPI-1.6.1: Warning - registering physical memory for MPI jobs

2012-09-05 Thread San B
   OpenMPI-1.6.1 is installed on a Rocks-5.5 Linux cluster with Intel
compilers and OFED-1.5.3. A sample Hello World MPI program gives the following
warning message:


/mpi/openmpi/1.6.1/intel/bin/mpirun -np 4 ./mpi
--
WARNING: It appears that your OpenFabrics subsystem is configured to only
allow registering part of your physical memory.  This can cause MPI jobs to
run with erratic performance, hang, and/or crash.

This may be caused by your OpenFabrics vendor limiting the amount of
physical memory that can be registered.  You should investigate the
relevant Linux kernel module parameters that control how much physical
memory can be registered, and increase them to allow registering all
physical memory on your machine.

See this Open MPI FAQ item for more information on these Linux kernel module
parameters:

http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages

  Local host:  masternode
  Registerable memory: 4096 MiB
  Total memory:32151 MiB
--
Greetings: 1 of 4 from the node masternode
Greetings: 2 of 4 from the node masternode
Greetings: 3 of 4 from the node masternode
Greetings: 0 of 4 from the node masternode
[masternode:29820] 3 more processes have sent help message
help-mpi-btl-openib.txt / reg mem limit low
[masternode:29820] Set MCA parameter "orte_base_help_aggregate" to 0 to see
all help / error messages

The ulimit parameters are also set to unlimited:

]# ulimit -a
core file size  (blocks, -c) 0
data seg size   (kbytes, -d) unlimited
scheduling priority (-e) 0
file size   (blocks, -f) unlimited
pending signals (-i) 278528
max locked memory   (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files  (-n) 1024
pipe size(512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority  (-r) 0
stack size  (kbytes, -s) unlimited
cpu time   (seconds, -t) unlimited
max user processes  (-u) 278528
virtual memory  (kbytes, -v) unlimited
file locks  (-x) unlimited


The file /etc/security/limits.conf contains the following lines:

* soft memlock unlimited
* hard memlock unlimited

But why is OpenMPI still throwing this warning message about registered memory?
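
If the HCA is a Mellanox mlx4 device, the FAQ page referenced in the warning
says the limit comes from the kernel module parameters, not from the memlock
ulimit. A sketch of the change (the values below are only an example sized for
~32 GiB of RAM, and the exact file name and driver-reload step depend on the
distribution):

  # /etc/modprobe.d/mlx4_core.conf
  # registerable memory ~= 2^log_num_mtt * 2^log_mtts_per_seg * PAGE_SIZE
  # 2^21 * 2^3 * 4096 bytes = 64 GiB, i.e. roughly twice the 32 GiB of RAM
  options mlx4_core log_num_mtt=21 log_mtts_per_seg=3

  # then reload the driver (or reboot), e.g.:
  #   /etc/init.d/openibd restart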

Thanks in advance