Re: [OMPI users] (no subject)
Yes MM... but here a single node has 16 cores, not 64 cores.

The first two jobs were with OMPI-1.4.5:
  16 cores on a single node          - 3692.403 s
  16 cores on two nodes (8 per node) - 12338.809 s

The next two jobs were with OMPI-1.6.5:
  16 cores on a single node          - 3547.879 s
  16 cores on two nodes (8 per node) - 5527.320 s

As others said, the single-node job runs faster due to shared-memory communication, but I was expecting only a slight difference between 1 and 2 nodes - here the two-node run takes about 56% more time.

On Thu, Oct 31, 2013 at 8:19 PM, Ralph Castain wrote:
> Yes, though the degree of impact obviously depends on the messaging
> pattern of the app.
>
> On Oct 31, 2013, at 2:50 AM, MM wrote:
>
> Of course, by this you mean: with the same total number of processes,
> e.g. 64 processes on 1 node using shared memory, vs 64 processes spread
> over 2 nodes (32 each)?
>
> On 29 October 2013 14:37, Ralph Castain wrote:
>
>> As someone previously noted, apps will always run slower on multiple
>> nodes vs everything on a single node due to the shared memory vs IB
>> differences. Nothing you can do about that one.
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
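[A quick back-of-the-envelope check of the timings quoted above; this is just a sketch using the reported wall-clock numbers, not anything from the application itself:]

```python
# Reported wall-clock times (seconds) from the four runs above.
single_145, two_145 = 3692.403, 12338.809   # OMPI-1.4.5
single_165, two_165 = 3547.879, 5527.320    # OMPI-1.6.5

# Slowdown factor of the two-node run relative to the single-node run.
slowdown_145 = two_145 / single_145
slowdown_165 = two_165 / single_165

print(round(slowdown_145, 2))  # ~3.34x slower with 1.4.5
print(round(slowdown_165, 2))  # ~1.56x slower with 1.6.5, i.e. ~56% more time
```

So 1.6.5 cuts the off-node penalty from roughly 3.3x down to roughly 1.6x for this job.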
Re: [OMPI users] (no subject)
As discussed earlier, the executable compiled with OpenMPI-1.4.5 gave very low performance, 12338.809 seconds, when the job was executed on two nodes (8 cores per node). The same job run on a single node (all 16 cores) finished in just 3692.403 seconds. Now I compiled the application with OpenMPI-1.6.5 and the two-node job finished in 5527.320 seconds. Is this a performance gain of OMPI-1.6.5 over OMPI-1.4.5, or an issue with OpenMPI itself?

On Tue, Oct 15, 2013 at 5:32 PM, San B <forum@gmail.com> wrote:
> Hi,
>
> As per your instruction, I did the profiling of the application with
> mpiP. Following is the difference between the two runs:
>
> Run 1: 16 MPI processes on a single node
>
> @--- MPI Time (seconds) ---
>
> Task   AppTime   MPITime   MPI%
>    0  3.61e+03       661  18.32
>    1  3.61e+03       627  17.37
>    2  3.61e+03       700  19.39
>    3  3.61e+03       665  18.41
>    4  3.61e+03       702  19.45
>    5  3.61e+03       703  19.48
>    6  3.61e+03       740  20.50
>    7  3.61e+03       763  21.14
> ...
>
> Run 2: 16 MPI processes on two nodes - 8 MPI processes per node
>
> @--- MPI Time (seconds) ---
>
> Task   AppTime   MPITime   MPI%
>    0  1.27e+04  1.06e+04  84.14
>    1  1.27e+04  1.07e+04  84.34
>    2  1.27e+04  1.07e+04  84.20
>    3  1.27e+04  1.07e+04  84.20
>    4  1.27e+04  1.07e+04  84.22
>    5  1.27e+04  1.07e+04  84.25
>    6  1.27e+04  1.06e+04  84.02
>    7  1.27e+04  1.07e+04  84.35
>    8  1.27e+04  1.07e+04  84.29
>
> The time spent in MPI functions in run 1 is less than 20%,
> whereas it is more than 80% in run 2. For more details, I've attached
> both output files. Please go through these files and suggest what
> optimization we can do with OpenMPI or Intel MKL.
>
> Thanks
>
> On Mon, Oct 7, 2013 at 12:15 PM, San B <forum@gmail.com> wrote:
>
>> Hi,
>>
>> I'm facing a performance issue with a scientific application (Fortran).
>> The issue is: it runs fast on a single node but very slowly on multiple
>> nodes. For example, a 16-core job on a single node finishes in 1 hr
>> 2 min, but the same job on two nodes (i.e. 8 cores per node, with the
>> remaining 8 cores kept free) takes 3 hr 20 min.
>> The code is compiled with ifort-13.1.1,
>> openmpi-1.4.5 and Intel MKL libraries - lapack, blas, scalapack, blacs &
>> fftw. What could be the problem here?
>> Is it possible to do any tuning in OpenMPI? FYI, more info: the cluster
>> has Intel Sandy Bridge processors (E5-2670) and InfiniBand, and
>> HyperThreading is enabled. Jobs are submitted through the LSF scheduler.
>>
>> Is HyperThreading causing any problem here?
>>
>> Thanks
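[For reference, the MPI% column in the quoted mpiP output is simply MPITime/AppTime. A small sketch reproducing task 0 of each run; the small mismatch with the printed 84.14 comes from AppTime/MPITime being rounded to three significant figures in the report:]

```python
def mpi_pct(app_time, mpi_time):
    """mpiP's MPI% column: percentage of wall time spent inside MPI calls."""
    return 100.0 * mpi_time / app_time

# Task 0, single node: ~18.3% of time in MPI.
print(round(mpi_pct(3.61e3, 661), 2))
# Task 0, two nodes: ~83.5% of time in MPI (report shows 84.14 before rounding).
print(round(mpi_pct(1.27e4, 1.06e4), 2))
```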
Re: [OMPI users] (no subject)
Hi,

As per your instruction, I did the profiling of the application with mpiP. Following is the difference between the two runs:

Run 1: 16 MPI processes on a single node

@--- MPI Time (seconds) ---

Task   AppTime   MPITime   MPI%
   0  3.61e+03       661  18.32
   1  3.61e+03       627  17.37
   2  3.61e+03       700  19.39
   3  3.61e+03       665  18.41
   4  3.61e+03       702  19.45
   5  3.61e+03       703  19.48
   6  3.61e+03       740  20.50
   7  3.61e+03       763  21.14
...

Run 2: 16 MPI processes on two nodes - 8 MPI processes per node

@--- MPI Time (seconds) ---

Task   AppTime   MPITime   MPI%
   0  1.27e+04  1.06e+04  84.14
   1  1.27e+04  1.07e+04  84.34
   2  1.27e+04  1.07e+04  84.20
   3  1.27e+04  1.07e+04  84.20
   4  1.27e+04  1.07e+04  84.22
   5  1.27e+04  1.07e+04  84.25
   6  1.27e+04  1.06e+04  84.02
   7  1.27e+04  1.07e+04  84.35
   8  1.27e+04  1.07e+04  84.29

The time spent in MPI functions in run 1 is less than 20%, whereas it is more than 80% in run 2. For more details, I've attached both output files. Please go through these files and suggest what optimization we can do with OpenMPI or Intel MKL.

Thanks

On Mon, Oct 7, 2013 at 12:15 PM, San B <forum@gmail.com> wrote:
> Hi,
>
> I'm facing a performance issue with a scientific application (Fortran).
> The issue is: it runs fast on a single node but very slowly on multiple
> nodes. For example, a 16-core job on a single node finishes in 1 hr
> 2 min, but the same job on two nodes (i.e. 8 cores per node, with the
> remaining 8 cores kept free) takes 3 hr 20 min. The code is compiled
> with ifort-13.1.1, openmpi-1.4.5 and Intel MKL libraries - lapack, blas,
> scalapack, blacs & fftw. What could be the problem here?
> Is it possible to do any tuning in OpenMPI? FYI, more info: the cluster
> has Intel Sandy Bridge processors (E5-2670) and InfiniBand, and
> HyperThreading is enabled. Jobs are submitted through the LSF scheduler.
>
> Is HyperThreading causing any problem here?
>
> Thanks

Attachment: mpi-App-profile-1node-16perNode.mpiP (binary data)
Attachment: mpi-App-profile-2Nodes-8perNode.mpiP (binary data)
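[One way to read the two profiles above: the per-rank compute portion (AppTime minus MPITime) stays in the same ballpark, while the time inside MPI calls grows by more than an order of magnitude off-node. A rough decomposition, using typical per-rank values read off the tables:]

```python
# Typical per-rank values from the two mpiP reports above.
app_1node, mpi_1node = 3.61e3, 700.0    # single node
app_2node, mpi_2node = 1.27e4, 1.07e4   # two nodes

# Compute time = total wall time minus time inside MPI.
compute_1node = app_1node - mpi_1node   # ~2900 s
compute_2node = app_2node - mpi_2node   # ~2000 s, same order of magnitude

# MPI (communication/wait) time grows ~15x when going off-node.
print(round(mpi_2node / mpi_1node, 2))
```

That pattern (flat compute, exploding MPI time) points at communication or load imbalance across the interconnect rather than at the numerical kernels.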
[OMPI users] (no subject)
Hi,

I'm facing a performance issue with a scientific application (Fortran). The issue is: it runs fast on a single node but very slowly on multiple nodes. For example, a 16-core job on a single node finishes in 1 hr 2 min, but the same job on two nodes (i.e. 8 cores per node, with the remaining 8 cores kept free) takes 3 hr 20 min. The code is compiled with ifort-13.1.1, openmpi-1.4.5 and Intel MKL libraries - lapack, blas, scalapack, blacs & fftw. What could be the problem here? Is it possible to do any tuning in OpenMPI?

FYI, more info: the cluster has Intel Sandy Bridge processors (E5-2670) and InfiniBand, and HyperThreading is enabled. Jobs are submitted through the LSF scheduler.

Is HyperThreading causing any problem here?

Thanks
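[In round numbers, the two-node run described above is a bit more than 3x slower than the single-node run; a trivial conversion of the quoted wall times:]

```python
def to_minutes(hours, minutes):
    """Convert an 'X hr Y min' wall time to minutes."""
    return 60 * hours + minutes

single_node = to_minutes(1, 2)    # 1 hr 2 min  = 62 min
two_nodes   = to_minutes(3, 20)   # 3 hr 20 min = 200 min

print(round(two_nodes / single_node, 2))  # ~3.23x slower on two nodes
```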
[OMPI users] OpenMPI-1.6.1: Warning - registering physical memory for mpi jobs
OpenMPI-1.6.1 is installed on a Rocks-5.5 Linux cluster with Intel compilers and OFED-1.5.3. A sample Hello World MPI program gives the following warning message:

/mpi/openmpi/1.6.1/intel/bin/mpirun -np 4 ./mpi
--
WARNING: It appears that your OpenFabrics subsystem is configured to only
allow registering part of your physical memory. This can cause MPI jobs to
run with erratic performance, hang, and/or crash.

This may be caused by your OpenFabrics vendor limiting the amount of
physical memory that can be registered. You should investigate the
relevant Linux kernel module parameters that control how much physical
memory can be registered, and increase them to allow registering all
physical memory on your machine.

See this Open MPI FAQ item for more information on these Linux kernel
module parameters:

    http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages

  Local host:           masternode
  Registerable memory:  4096 MiB
  Total memory:         32151 MiB
--
Greetings: 1 of 4 from the node masternode
Greetings: 2 of 4 from the node masternode
Greetings: 3 of 4 from the node masternode
Greetings: 0 of 4 from the node masternode
[masternode:29820] 3 more processes have sent help message help-mpi-btl-openib.txt / reg mem limit low
[masternode:29820] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

The ulimit parameters are also set to unlimited:

]# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 278528
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 278528
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

The file /etc/security/limits.conf contains the following lines:

* soft memlock unlimited
* hard memlock unlimited

But why is OpenMPI still throwing the warning message about registered memory? Thanks in advance.
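[The FAQ item linked in the warning explains that with the mlx4 driver the registerable memory is bounded by kernel module parameters, not by ulimit, via max_reg_mem = (2^log_num_mtt) * (2^log_mtts_per_seg) * PAGE_SIZE. A sketch of that formula (the concrete parameter values below are illustrative assumptions, chosen to match the 4096 MiB / 32151 MiB figures in the warning; check your driver's actual module options):]

```python
def max_reg_mem_mib(log_num_mtt, log_mtts_per_seg, page_size=4096):
    """Registerable memory for mlx4, per the Open MPI OpenFabrics FAQ:
    max_reg_mem = (2^log_num_mtt) * (2^log_mtts_per_seg) * PAGE_SIZE."""
    return (2 ** log_num_mtt) * (2 ** log_mtts_per_seg) * page_size // (1024 * 1024)

# The reported 4096 MiB limit is consistent with e.g. log_num_mtt=20 on
# 4 KiB pages:
print(max_reg_mem_mib(20, 0))   # 4096 MiB

# The FAQ suggests allowing registration of about 2x physical RAM; to
# cover 2 * 32151 MiB, something like log_num_mtt=24 would suffice:
print(max_reg_mem_mib(24, 0))   # 65536 MiB
```

The parameters would then be raised via the module options described in the FAQ (e.g. an `options mlx4_core log_num_mtt=24` line under /etc/modprobe.d/, followed by a driver reload), which is why changing limits.conf alone has no effect here.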