[OMPI users] (no subject)
Hi all,

Running a CUDA+MPI application on a node with 2 K80 GPUs, I get the following warnings:

--
WARNING: There is at least non-excluded one OpenFabrics device found,
but there are no active ports detected (or Open MPI was unable to use
them). This is most certainly not what you wanted. Check your cables,
subnet manager configuration, etc. The openib BTL will be ignored for
this job.

  Local host: gpu01
--
[gpu01:107262] 1 more process has sent help message help-mpi-btl-openib.txt / no active ports found
[gpu01:107262] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

Any idea of what is going on and how I can fix this? I am using OpenMPI 3.1.2.
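If the InfiniBand ports on that node really are unused (e.g., the job only needs shared memory and TCP), one way to make the warning go away is to exclude the openib BTL explicitly. A minimal sketch, with the application name as a placeholder:

mpirun --mca btl ^openib -np 2 ./my_cuda_app

The "^" prefix means "every BTL except the listed ones". Alternatively, if openib should stay available on machines where it does work, setting the MCA parameter btl_base_warn_component_unused to 0 merely suppresses this class of warning.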
Re: [OMPI users] (no subject)
That's pretty weird. I notice that you're using 3.1.0rc2. Does the same thing happen with Open MPI 3.1.3?

> On Oct 31, 2018, at 9:08 PM, Dmitry N. Mikushin wrote:
>
> Dear all,
>
> ompi_info reports that pml components are available:
>
> $ /usr/mpi/gcc/openmpi-3.1.0rc2/bin/ompi_info -a | grep pml
> MCA pml: v (MCA v2.1.0, API v2.0.0, Component v3.1.0)
> MCA pml: monitoring (MCA v2.1.0, API v2.0.0, Component v3.1.0)
> MCA pml: yalla (MCA v2.1.0, API v2.0.0, Component v3.1.0)
> MCA pml: cm (MCA v2.1.0, API v2.0.0, Component v3.1.0)
> MCA pml: ob1 (MCA v2.1.0, API v2.0.0, Component v3.1.0)
> MCA pml: ucx (MCA v2.1.0, API v2.0.0, Component v3.1.0)
>
> However, when I try to use them, mpirun gives back:
>
> --
> No components were able to be opened in the pml framework.
>
> This typically means that either no components of this type were
> installed, or none of the installed components can be loaded.
> Sometimes this means that shared libraries required by these
> components are unable to be found/loaded.
>
>   Host: cloudgpu6
>   Framework: pml
> --
>
> With strace I can see that the libraries
> /usr/mpi/gcc/openmpi-3.1.0rc2/lib64/openmpi/mca_pml_* are opened by mpirun,
> and ldd does not show any unresolved dependencies for them.
>
> How else could it be that pml is not found?
>
> Thanks,
> - Dmitry.

--
Jeff Squyres
jsquy...@cisco.com
[OMPI users] (no subject)
Dear all,

ompi_info reports that pml components are available:

$ /usr/mpi/gcc/openmpi-3.1.0rc2/bin/ompi_info -a | grep pml
MCA pml: v (MCA v2.1.0, API v2.0.0, Component v3.1.0)
MCA pml: monitoring (MCA v2.1.0, API v2.0.0, Component v3.1.0)
MCA pml: yalla (MCA v2.1.0, API v2.0.0, Component v3.1.0)
MCA pml: cm (MCA v2.1.0, API v2.0.0, Component v3.1.0)
MCA pml: ob1 (MCA v2.1.0, API v2.0.0, Component v3.1.0)
MCA pml: ucx (MCA v2.1.0, API v2.0.0, Component v3.1.0)

However, when I try to use them, mpirun gives back:

--
No components were able to be opened in the pml framework.

This typically means that either no components of this type were
installed, or none of the installed components can be loaded.
Sometimes this means that shared libraries required by these
components are unable to be found/loaded.

  Host: cloudgpu6
  Framework: pml
--

With strace I can see that the libraries /usr/mpi/gcc/openmpi-3.1.0rc2/lib64/openmpi/mca_pml_* are opened by mpirun, and ldd does not show any unresolved dependencies for them.

How else could it be that pml is not found?

Thanks,
- Dmitry.
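When a component shows up in ompi_info but refuses to open at run time, Open MPI can usually be made to print the underlying dlopen error. A hedged sketch of two diagnostics (the application name is a placeholder; both MCA parameters exist in the 3.x series):

mpirun --mca mca_base_component_show_load_errors 1 --mca pml_base_verbose 100 -np 2 ./app

The first parameter makes the component framework report why each mca_pml_* plugin failed to load (typically a missing symbol in a dependent library, which ldd on the plugin alone will not always reveal); the second traces the pml selection logic itself.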
[OMPI users] (no subject)
Hi all,

I encountered a problem when I tested the performance of Open MPI over 100 Gbps RoCE. I have two servers connected with Mellanox 100 Gbps ConnectX-4 RoCE NICs. I used the Intel MPI Benchmarks to test the performance of Open MPI (1.10.3) over RDMA. I found that the bandwidth of the PingPong benchmark (2 ranks, one rank per server) could reach only 6 GB/s (with the openib BTL). With the OSU MPI benchmarks, the bandwidth could reach only 6.5 GB/s. However, when I start two benchmarks at the same time (two ranks per server), the total bandwidth can reach about 11 GB/s.

It seems that the CPU is the bottleneck. Obviously, the bottleneck is not memcpy, and RDMA itself ought not to consume too many CPU resources, since the ib_write_bw perftest can reach 11 GB/s easily.

Is this bandwidth limit normal? Does anyone know what the real bottleneck is? Thanks for your kind help in advance.

Regards,
Zhaogeng
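For what it's worth, a typical way to pin such a measurement to the openib BTL in the 1.10 series looks like this (hostnames are placeholders):

mpirun -np 2 -host node1,node2 --mca btl openib,self,sm ./osu_bw

Single-stream RDMA bandwidth is also sensitive to where the rank is bound relative to the HCA's NUMA node, so comparing runs with --bind-to core and --report-bindings, placing the rank on the socket closest to the NIC, may be worth trying before concluding the CPU itself is the limit.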
Re: [OMPI users] (no subject)
: ...
--
2 total processes failed to start
[se01.grid.tuc.gr:19607] mca: base: close: component mmap closed
[se01.grid.tuc.gr:19607] mca: base: close: unloading component mmap

jb

-Original Message-
From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of gil...@rist.or.jp
Sent: Monday, May 15, 2017 1:47 PM
To: Open MPI Users <users@lists.open-mpi.org>
Subject: Re: [OMPI users] (no subject)

Ioannis,

### What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git branch name and hash, etc.)

### Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

### Please describe the system on which you are running
* Operating system/version:
* Computer hardware:
* Network type:

also, what if you
mpirun --mca shmem_base_verbose 100 ...

Cheers,
Gilles

- Original Message -

Hi

I am trying to run the following simple demo on a cluster of two nodes

--
#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(NULL, NULL);

    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processor_name, &name_len);

    printf("Hello world from processor %s, rank %d out of %d processors\n",
           processor_name, world_rank, world_size);

    MPI_Finalize();
}
--

i get always the message

--
It looks like opal_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_shmem_base_select failed
  --> Returned value -1 instead of OPAL_SUCCESS
--

any hint?

Ioannis Botsis
Re: [OMPI users] (no subject)
-Original Message-
From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of gil...@rist.or.jp
Sent: Monday, May 15, 2017 1:47 PM
To: Open MPI Users <users@lists.open-mpi.org>
Subject: Re: [OMPI users] (no subject)

Ioannis,

### What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git branch name and hash, etc.)

### Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

### Please describe the system on which you are running
* Operating system/version:
* Computer hardware:
* Network type:

also, what if you
mpirun --mca shmem_base_verbose 100 ...

Cheers,
Gilles

- Original Message -
> Hi
>
> I am trying to run the following simple demo on a cluster of two nodes
>
> --
> #include <stdio.h>
> #include <mpi.h>
>
> int main(int argc, char** argv) {
>     MPI_Init(NULL, NULL);
>
>     int world_size;
>     MPI_Comm_size(MPI_COMM_WORLD, &world_size);
>
>     int world_rank;
>     MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
>
>     char processor_name[MPI_MAX_PROCESSOR_NAME];
>     int name_len;
>     MPI_Get_processor_name(processor_name, &name_len);
>
>     printf("Hello world from processor %s, rank %d out of %d processors\n",
>            processor_name, world_rank, world_size);
>
>     MPI_Finalize();
> }
> --
>
> i get always the message
>
> --
> It looks like opal_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during opal_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
>   opal_shmem_base_select failed
>   --> Returned value -1 instead of OPAL_SUCCESS
> --
>
> any hint?
>
> Ioannis Botsis
Re: [OMPI users] (no subject)
Ioannis,

### What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git branch name and hash, etc.)

### Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

### Please describe the system on which you are running
* Operating system/version:
* Computer hardware:
* Network type:

also, what if you
mpirun --mca shmem_base_verbose 100 ...

Cheers,
Gilles

- Original Message -
> Hi
>
> I am trying to run the following simple demo on a cluster of two nodes
>
> --
> #include <stdio.h>
> #include <mpi.h>
>
> int main(int argc, char** argv) {
>     MPI_Init(NULL, NULL);
>
>     int world_size;
>     MPI_Comm_size(MPI_COMM_WORLD, &world_size);
>
>     int world_rank;
>     MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
>
>     char processor_name[MPI_MAX_PROCESSOR_NAME];
>     int name_len;
>     MPI_Get_processor_name(processor_name, &name_len);
>
>     printf("Hello world from processor %s, rank %d out of %d processors\n",
>            processor_name, world_rank, world_size);
>
>     MPI_Finalize();
> }
> --
>
> i get always the message
>
> --
> It looks like opal_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during opal_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
>   opal_shmem_base_select failed
>   --> Returned value -1 instead of OPAL_SUCCESS
> --
>
> any hint?
>
> Ioannis Botsis
[OMPI users] (no subject)
Hi

I am trying to run the following simple demo on a cluster of two nodes

--
#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(NULL, NULL);

    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processor_name, &name_len);

    printf("Hello world from processor %s, rank %d out of %d processors\n",
           processor_name, world_rank, world_size);

    MPI_Finalize();
}
--

i get always the message

--
It looks like opal_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_shmem_base_select failed
  --> Returned value -1 instead of OPAL_SUCCESS
--

any hint?

Ioannis Botsis
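Since opal_shmem_base_select fails before MPI_Init completes, the usual suspects are the temporary/session directory rather than the program itself. A hedged sketch of checks worth running on both nodes (the hostfile name is a placeholder):

mpirun --mca shmem_base_verbose 100 -np 2 -hostfile hosts ./a.out
df -h /tmp     # /tmp must not be full on either node
ls -ld /tmp    # and should be world-writable with the sticky bit (drwxrwxrwt)

Stale openmpi-sessions-* directories left under /tmp by previously crashed jobs are also worth removing before retrying.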
[OMPI users] (no subject)
Hi,

Does anyone have any idea about the following error? On that node, there are 15 empty cores.

Regards,
Mahmood
Re: [OMPI users] (no subject)
Hi,

it seems that your ompi was compiled with ofed ver X but running on ofed ver Y. X and Y are incompatible.

On Mon, Feb 22, 2016 at 8:18 PM, Mark Potter wrote:
> I am usually able to find the answer to my problems by searching the
> archive but I've run up against one that I can't suss out.
>
> bison-opt: relocation error:
> /home/pbme002/opt/gcc-4.8.2-tpls/openmpi-1.8.4/lib/libmpi.so.1: symbol
> rdma_get_src_port, version RDMACM_1.0 not defined in file librdmacm.so.1
> with link time reference
>
> There is the error I am getting; the problem is that it's not consistent.
> This happens to a random few jobs in a series of the same job on different
> data sets. The ones that fail and produce the error run fine when a second
> attempt is made. I am the admin for this cluster and the user is using
> their own compiled OpenMPI and not the system OpenMPI so I can't say for
> certain that it was compiled correctly, but it strikes me as odd that jobs
> would fail with the above error but run perfectly fine when a second
> attempt is made.
>
> I'm looking for any help sussing out what could be causing this issue.
>
> Regards,
>
> Mark L. Potter

--
Kind Regards,
M.
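One quick way to confirm an OFED/librdmacm mismatch is to check which librdmacm.so.1 each node resolves and whether it actually exports the versioned symbol. A hedged sketch (the libmpi path follows the error message above; the system library location is an assumption):

ldd /home/pbme002/opt/gcc-4.8.2-tpls/openmpi-1.8.4/lib/libmpi.so.1 | grep rdmacm
nm -D /usr/lib64/librdmacm.so.1 | grep rdma_get_src_port

If a subset of nodes resolves an older librdmacm that lacks rdma_get_src_port@RDMACM_1.0, that would explain why only jobs scheduled onto those nodes fail and why a second attempt (landing elsewhere) succeeds.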
[OMPI users] (no subject)
I am usually able to find the answer to my problems by searching the archive but I've run up against one that I can't suss out. bison-opt: relocation error: /home/pbme002/opt/gcc-4.8.2-tpls/openmpi-1.8.4/lib/libmpi.so.1: symbol rdma_get_src_port, version RDMACM_1.0 not defined in file librdmacm.so.1 with link time reference There is the error I am getting, the problem is that it's not consistent. This happens to a random few jobs in a series of the same job on different data sets. The ones that fail and produce the error run fine when a second attempt is made. I am the admin for this cluster and the user is using their own compiled OpenMPI and not the system OpenMPI so I can't say for certain that it was compiled correctly but it strikes me as odd that jobs would fail with the above error but run perfectly fine when a second attempt is made. I'm looking for any help sussing out what could be causing this issue. Regards, Mark L. Potter
[OMPI users] (no subject)
I had openmpi-1.6.5 installed and decided to install a newer version to get Java support in Open MPI, so I chose openmpi-1.8.2 and configured it as follows:

$ ./configure --enable-mpi-java --with-jdk-bindir=/usr/jdk7/bin --with-jdk-headers=/usr/jdk6/include --prefix=/usr/openmpi-1.8.2

This finishes with no errors, but when I install using "make all install" I get the error in the attached file.

error.docx
Description: MS-Word 2007 document
Re: [OMPI users] (no subject)
As I said, the degree of impact depends on the messaging pattern. If rank A typically sends/recvs with rank A+1, then you won't see much difference. However, if rank A typically sends/recvs with rank N-A, where N = #ranks in the job, then you'll see a very large difference.

You might try simply changing the mapping pattern - e.g., add -bynode to your cmd line. This would make it run faster if it followed the latter example.

On Nov 2, 2013, at 12:40 AM, San B wrote:

> Yes MM... But here a single node has 16 cores, not 64 cores.
>
> The first two jobs were with OMPI-1.4.5:
> 16 cores of single node - 3692.403
> 16 cores on two nodes (8 cores per node) - 12338.809
>
> The next two jobs were with OMPI-1.6.5:
> 16 cores of single node - 3547.879
> 16 cores on two nodes (8 cores per node) - 5527.320
>
> As others said, due to shared memory communication the single node job
> is running faster, but I was expecting a slight difference between 1 & 2
> nodes - which is taking 60% more time here.
>
> On Thu, Oct 31, 2013 at 8:19 PM, Ralph Castain wrote:
> Yes, though the degree of impact obviously depends on the messaging pattern
> of the app.
>
> On Oct 31, 2013, at 2:50 AM, MM wrote:
>> Of course, by this you mean, with the same total number of nodes, for e.g.
>> 64 processes on 1 node using shared mem, vs 64 processes spread over 2 nodes
>> (32 each for e.g.)?
>>
>> On 29 October 2013 14:37, Ralph Castain wrote:
>> As someone previously noted, apps will always run slower on multiple nodes
>> vs everything on a single node due to the shared memory vs IB differences.
>> Nothing you can do about that one.
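For concreteness, the mapping change suggested above is just a launch-line flag; in the 1.4/1.6 series it might look like this (the application name is a placeholder):

mpirun -np 16 -bynode ./app

The default -byslot mapping puts ranks 0-7 on the first node and 8-15 on the second; -bynode round-robins ranks across nodes instead, so which mapping wins depends on whether neighboring ranks (A and A+1) or distant ranks (A and N-A) exchange the most traffic.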
Re: [OMPI users] (no subject)
Yes MM... But here a single node has 16 cores, not 64 cores.

The first two jobs were with OMPI-1.4.5:
16 cores of single node - 3692.403
16 cores on two nodes (8 cores per node) - 12338.809

The next two jobs were with OMPI-1.6.5:
16 cores of single node - 3547.879
16 cores on two nodes (8 cores per node) - 5527.320

As others said, due to shared memory communication the single node job is running faster, but I was expecting a slight difference between 1 & 2 nodes - which is taking 60% more time here.

On Thu, Oct 31, 2013 at 8:19 PM, Ralph Castain wrote:

> Yes, though the degree of impact obviously depends on the messaging
> pattern of the app.
>
> On Oct 31, 2013, at 2:50 AM, MM wrote:
>
> Of course, by this you mean, with the same total number of nodes, for e.g.
> 64 processes on 1 node using shared mem, vs 64 processes spread over 2 nodes
> (32 each for e.g.)?
>
> On 29 October 2013 14:37, Ralph Castain wrote:
>
>> As someone previously noted, apps will always run slower on multiple
>> nodes vs everything on a single node due to the shared memory vs IB
>> differences. Nothing you can do about that one.
Re: [OMPI users] (no subject)
Yes, though the degree of impact obviously depends on the messaging pattern of the app.

On Oct 31, 2013, at 2:50 AM, MM wrote:

> Of course, by this you mean, with the same total number of nodes, for e.g. 64
> processes on 1 node using shared mem, vs 64 processes spread over 2 nodes (32
> each for e.g.)?
>
> On 29 October 2013 14:37, Ralph Castain wrote:
> As someone previously noted, apps will always run slower on multiple nodes vs
> everything on a single node due to the shared memory vs IB differences.
> Nothing you can do about that one.
Re: [OMPI users] (no subject)
Of course, by this you mean, with the same total number of nodes, for e.g. 64 processes on 1 node using shared mem, vs 64 processes spread over 2 nodes (32 each for e.g.)?

On 29 October 2013 14:37, Ralph Castain wrote:

> As someone previously noted, apps will always run slower on multiple nodes
> vs everything on a single node due to the shared memory vs IB differences.
> Nothing you can do about that one.
Re: [OMPI users] (no subject)
I don't think it's a bug in OMPI, but more likely reflects improvements in the default collective algorithms. If you want to further improve performance, you should bind your processes to a core (if your application isn't threaded) or to a socket (if threaded).

As someone previously noted, apps will always run slower on multiple nodes vs everything on a single node due to the shared memory vs IB differences. Nothing you can do about that one.

On Oct 28, 2013, at 10:36 PM, San B wrote:

> As discussed earlier, the executable which was compiled with OpenMPI-1.4.5
> gave very low performance of 12338.809 seconds when the job executed on two
> nodes (8 cores per node). The same job run on a single node (all 16 cores)
> got executed in just 3692.403 seconds. Now I compiled the application with
> OpenMPI-1.6.5 and it got executed in 5527.320 seconds on two nodes.
>
> Is this a performance gain with OMPI-1.6.5 over OMPI-1.4.5, or an issue
> with OpenMPI itself?
>
> On Tue, Oct 15, 2013 at 5:32 PM, San B wrote:
> Hi,
>
> As per your instruction, I did the profiling of the application with mpiP.
> Following is the difference between the two runs:
>
> Run 1: 16 mpi processes on single node
>
> @--- MPI Time (seconds) ---
>    Task    AppTime    MPITime    MPI%
>       0   3.61e+03        661   18.32
>       1   3.61e+03        627   17.37
>       2   3.61e+03        700   19.39
>       3   3.61e+03        665   18.41
>       4   3.61e+03        702   19.45
>       5   3.61e+03        703   19.48
>       6   3.61e+03        740   20.50
>       7   3.61e+03        763   21.14
> ...
>
> Run 2: 16 mpi processes on two nodes - 8 mpi processes per node
>
> @--- MPI Time (seconds) ---
>    Task    AppTime    MPITime    MPI%
>       0   1.27e+04   1.06e+04   84.14
>       1   1.27e+04   1.07e+04   84.34
>       2   1.27e+04   1.07e+04   84.20
>       3   1.27e+04   1.07e+04   84.20
>       4   1.27e+04   1.07e+04   84.22
>       5   1.27e+04   1.07e+04   84.25
>       6   1.27e+04   1.06e+04   84.02
>       7   1.27e+04   1.07e+04   84.35
>       8   1.27e+04   1.07e+04   84.29
>
> The time spent in MPI functions in run 1 is less than 20%, whereas it is
> more than 80% in run 2. For more details, I've attached both output files.
> Please go thru these files and suggest what optimization we can do with
> OpenMPI or Intel MKL.
>
> Thanks
>
> On Mon, Oct 7, 2013 at 12:15 PM, San B wrote:
> Hi,
>
> I'm facing a performance issue with a scientific application (Fortran). The
> issue is, it runs faster on a single node but runs very slow on multiple
> nodes. For example, a 16 core job on a single node finishes in 1hr 2mins,
> but the same job on two nodes (i.e. 8 cores per node & remaining 8 cores
> kept free) takes 3hr 20mins. The code is compiled with ifort-13.1.1,
> openmpi-1.4.5 and Intel MKL libraries - lapack, blas, scalapack, blacs &
> fftw. What could be the problem here?
>
> Is it possible to do any tuning in OpenMPI? FYI, more info: The cluster has
> Intel Sandybridge processors (E5-2670), InfiniBand, and Hyperthreading is
> enabled. Jobs are submitted thru the LSF scheduler.
>
> Does HyperThreading cause any problem here?
>
> Thanks
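For reference, in the 1.6 series the binding suggested above is done with mpirun flags; a hedged sketch (the application name is a placeholder):

mpirun -np 16 -bind-to-core ./app       # non-threaded app: one core per rank
mpirun -np 16 -bind-to-socket ./app     # threaded app: confine each rank to a socket

Adding --report-bindings prints where each rank actually landed; in Open MPI 1.8 and later the same thing is spelled --bind-to core / --bind-to socket.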
Re: [OMPI users] (no subject)
As discussed earlier, the executable which was compiled with OpenMPI-1.4.5 gave very low performance of 12338.809 seconds when the job executed on two nodes (8 cores per node). The same job run on a single node (all 16 cores) got executed in just 3692.403 seconds. Now I compiled the application with OpenMPI-1.6.5 and it got executed in 5527.320 seconds on two nodes.

Is this a performance gain with OMPI-1.6.5 over OMPI-1.4.5, or an issue with OpenMPI itself?

On Tue, Oct 15, 2013 at 5:32 PM, San B wrote:

> Hi,
>
> As per your instruction, I did the profiling of the application with mpiP.
> Following is the difference between the two runs:
>
> Run 1: 16 mpi processes on single node
>
> @--- MPI Time (seconds) ---
>    Task    AppTime    MPITime    MPI%
>       0   3.61e+03        661   18.32
>       1   3.61e+03        627   17.37
>       2   3.61e+03        700   19.39
>       3   3.61e+03        665   18.41
>       4   3.61e+03        702   19.45
>       5   3.61e+03        703   19.48
>       6   3.61e+03        740   20.50
>       7   3.61e+03        763   21.14
> ...
>
> Run 2: 16 mpi processes on two nodes - 8 mpi processes per node
>
> @--- MPI Time (seconds) ---
>    Task    AppTime    MPITime    MPI%
>       0   1.27e+04   1.06e+04   84.14
>       1   1.27e+04   1.07e+04   84.34
>       2   1.27e+04   1.07e+04   84.20
>       3   1.27e+04   1.07e+04   84.20
>       4   1.27e+04   1.07e+04   84.22
>       5   1.27e+04   1.07e+04   84.25
>       6   1.27e+04   1.06e+04   84.02
>       7   1.27e+04   1.07e+04   84.35
>       8   1.27e+04   1.07e+04   84.29
>
> The time spent in MPI functions in run 1 is less than 20%, whereas it is
> more than 80% in run 2. For more details, I've attached both output files.
> Please go thru these files and suggest what optimization we can do with
> OpenMPI or Intel MKL.
>
> Thanks
>
> On Mon, Oct 7, 2013 at 12:15 PM, San B wrote:
>
>> Hi,
>>
>> I'm facing a performance issue with a scientific application (Fortran).
>> The issue is, it runs faster on a single node but runs very slow on multiple
>> nodes. For example, a 16 core job on a single node finishes in 1hr 2mins,
>> but the same job on two nodes (i.e. 8 cores per node & remaining 8 cores
>> kept free) takes 3hr 20mins. The code is compiled with ifort-13.1.1,
>> openmpi-1.4.5 and Intel MKL libraries - lapack, blas, scalapack, blacs &
>> fftw. What could be the problem here?
>> Is it possible to do any tuning in OpenMPI? FYI, more info: The cluster has
>> Intel Sandybridge processors (E5-2670), InfiniBand, and Hyperthreading is
>> enabled. Jobs are submitted thru the LSF scheduler.
>>
>> Does HyperThreading cause any problem here?
>>
>> Thanks
Re: [OMPI users] (no subject)
Hi,

As per your instruction, I did the profiling of the application with mpiP. Following is the difference between the two runs:

Run 1: 16 mpi processes on single node

@--- MPI Time (seconds) ---
   Task    AppTime    MPITime    MPI%
      0   3.61e+03        661   18.32
      1   3.61e+03        627   17.37
      2   3.61e+03        700   19.39
      3   3.61e+03        665   18.41
      4   3.61e+03        702   19.45
      5   3.61e+03        703   19.48
      6   3.61e+03        740   20.50
      7   3.61e+03        763   21.14
...

Run 2: 16 mpi processes on two nodes - 8 mpi processes per node

@--- MPI Time (seconds) ---
   Task    AppTime    MPITime    MPI%
      0   1.27e+04   1.06e+04   84.14
      1   1.27e+04   1.07e+04   84.34
      2   1.27e+04   1.07e+04   84.20
      3   1.27e+04   1.07e+04   84.20
      4   1.27e+04   1.07e+04   84.22
      5   1.27e+04   1.07e+04   84.25
      6   1.27e+04   1.06e+04   84.02
      7   1.27e+04   1.07e+04   84.35
      8   1.27e+04   1.07e+04   84.29

The time spent in MPI functions in run 1 is less than 20%, whereas it is more than 80% in run 2. For more details, I've attached both output files. Please go thru these files and suggest what optimization we can do with OpenMPI or Intel MKL.

Thanks

On Mon, Oct 7, 2013 at 12:15 PM, San B wrote:

> Hi,
>
> I'm facing a performance issue with a scientific application (Fortran).
> The issue is, it runs faster on a single node but runs very slow on multiple
> nodes. For example, a 16 core job on a single node finishes in 1hr 2mins,
> but the same job on two nodes (i.e. 8 cores per node & remaining 8 cores
> kept free) takes 3hr 20mins. The code is compiled with ifort-13.1.1,
> openmpi-1.4.5 and Intel MKL libraries - lapack, blas, scalapack, blacs &
> fftw. What could be the problem here?
> Is it possible to do any tuning in OpenMPI? FYI, more info: The cluster has
> Intel Sandybridge processors (E5-2670), InfiniBand, and Hyperthreading is
> enabled. Jobs are submitted thru the LSF scheduler.
>
> Does HyperThreading cause any problem here?
>
> Thanks

mpi-App-profile-1node-16perNode.mpiP
Description: Binary data

mpi-App-profile-2Nodes-8perNode.mpiP
Description: Binary data
Re: [OMPI users] (no subject)
Hi,

When all processes run on the same node they communicate via shared memory, which delivers both high bandwidth and low latency. InfiniBand is slower and more latent than shared memory. Your parallel algorithm might simply be very latency sensitive, and you should profile it with something like mpiP or Vampir/VampirTrace in order to find out why, and only then try to further tune Open MPI.

Hope that helps,
Hristo

From: users [mailto:users-boun...@open-mpi.org] On Behalf Of San B
Sent: Monday, October 07, 2013 8:46 AM
To: OpenMPI ML
Subject: [OMPI users] (no subject)

Hi,

I'm facing a performance issue with a scientific application (Fortran). The issue is, it runs faster on a single node but runs very slow on multiple nodes. For example, a 16 core job on a single node finishes in 1hr 2mins, but the same job on two nodes (i.e. 8 cores per node & remaining 8 cores kept free) takes 3hr 20mins. The code is compiled with ifort-13.1.1, openmpi-1.4.5 and Intel MKL libraries - lapack, blas, scalapack, blacs & fftw. What could be the problem here?

Is it possible to do any tuning in OpenMPI? FYI, more info: The cluster has Intel Sandybridge processors (E5-2670), InfiniBand, and Hyperthreading is enabled. Jobs are submitted thru the LSF scheduler.

Does HyperThreading cause any problem here?

Thanks

--
Hristo Iliev, PhD
High Performance Computing Team
RWTH Aachen University, Center for Computing and Communication
Rechen- und Kommunikationszentrum der RWTH Aachen
Seffenter Weg 23, D 52074 Aachen (Germany)
Phone: +49 241 80 24367
Fax/UMS: +49 241 80 624367
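For what it's worth, mpiP needs no source changes; it is typically enabled at link time. A hedged sketch of how that commonly looks for a Fortran code (the mpiP install prefix and its support libraries vary by build and are assumptions here):

mpif90 app.f90 -o app -L$MPIP_HOME/lib -lmpiP -lbfd -liberty -lunwind -lm

Running the instrumented binary under mpirun then writes a *.mpiP report containing the per-rank AppTime/MPITime breakdown of the kind quoted later in this thread.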
Re: [OMPI users] (no subject)
Hi,

On 07.10.2013 at 08:45, San B wrote:

> I'm facing a performance issue with a scientific application (Fortran). The
> issue is, it runs faster on a single node but runs very slow on multiple
> nodes. For example, a 16 core job on a single node finishes in 1hr 2mins,
> but the same job on two nodes (i.e. 8 cores per node & remaining 8 cores
> kept free) takes 3hr 20mins. The code is compiled with ifort-13.1.1,
> openmpi-1.4.5 and Intel MKL libraries - lapack, blas, scalapack, blacs &
> fftw. What could be the problem here?

How do you provide a list of hosts it should use to the application? Maybe it's now just running on only one machine - and/or can make use only of local OpenMP inside MKL (yes, OpenMP here, which is bound to run on a single machine only).

-- Reuti

PS: Do you have 16 real cores or 8 plus Hyperthreading?

> Is it possible to do any tuning in OpenMPI? FYI, more info: The cluster has
> Intel Sandybridge processors (E5-2670), InfiniBand, and Hyperthreading is
> enabled. Jobs are submitted thru the LSF scheduler.
>
> Does HyperThreading cause any problem here?
>
> Thanks
[OMPI users] (no subject)
Hi,

I'm facing a performance issue with a scientific application (Fortran). The issue is, it runs faster on a single node but runs very slow on multiple nodes. For example, a 16 core job on a single node finishes in 1hr 2mins, but the same job on two nodes (i.e. 8 cores per node & remaining 8 cores kept free) takes 3hr 20mins. The code is compiled with ifort-13.1.1, openmpi-1.4.5 and Intel MKL libraries - lapack, blas, scalapack, blacs & fftw. What could be the problem here?

Is it possible to do any tuning in OpenMPI? FYI, more info: The cluster has Intel Sandybridge processors (E5-2670), InfiniBand, and Hyperthreading is enabled. Jobs are submitted thru the LSF scheduler.

Does HyperThreading cause any problem here?

Thanks
Re: [OMPI users] (no subject)
Pramoda,

That paper was exploring an application of a proposed extension to the MPI standard for fault tolerance purposes. By default this proposed interface is not provided by Open MPI. We have created a prototype version of Open MPI that includes this extension, and it can be found at the following website:

http://fault-tolerance.org/

You should look at the interfaces in the new proposal (ULFM Specification), since MPI_Comm_validate_rank is no longer part of the proposal. You can get the same functionality through some of the new interfaces that replace it. There are some examples on that website, and in the proposal, that should help you as well.

Best,
Josh

On Mon, Nov 19, 2012 at 8:59 AM, sri pramoda wrote:

> Dear Sir,
> I am Pramoda, a PG scholar from Jadavpur University, India.
> I've gone through the paper "Building a Fault Tolerant MPI Application:
> A Ring Communication Example". In this I found the MPI_Comm_validate_rank
> command, but I didn't find this command in MPI. Hence I request you to
> please send me the implementation of this command.
> Thank you,
> Pramoda.

--
Joshua Hursey
Assistant Professor of Computer Science
University of Wisconsin-La Crosse
http://cs.uwlax.edu/~jjhursey
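To give a flavor of the replacement interfaces: in the ULFM prototype, a process failure surfaces as an error class on a communication call, after which the survivors can repair the communicator. A minimal hedged sketch against the prototype's MPIX_* API (this is not stock Open MPI, and error handling is abbreviated):

#include <mpi.h>
#include <mpi-ext.h>   /* ULFM prototype extensions (MPIX_*) */
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Comm world = MPI_COMM_WORLD, shrunk;
    int rc, rank;

    MPI_Init(&argc, &argv);
    /* Return errors instead of aborting, so failures can be handled. */
    MPI_Comm_set_errhandler(world, MPI_ERRORS_RETURN);

    rc = MPI_Barrier(world);
    if (rc == MPIX_ERR_PROC_FAILED) {
        /* Acknowledge the failure, invalidate the old communicator,
           and build a new one containing only the survivors. */
        MPIX_Comm_failure_ack(world);
        MPIX_Comm_revoke(world);
        MPIX_Comm_shrink(world, &shrunk);
        world = shrunk;
    }

    MPI_Comm_rank(world, &rank);
    printf("rank %d still alive\n", rank);
    MPI_Finalize();
    return 0;
}

This roughly fills the role that the per-rank validation of MPI_Comm_validate_rank played in the paper's earlier prototype.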
[OMPI users] (no subject)
Dear Sir,

I am Pramoda, a PG scholar from Jadavpur University, India.
I've gone through the paper "Building a Fault Tolerant MPI Application: A Ring Communication Example". In this I found the MPI_Comm_validate_rank command, but I didn't find this command in MPI. Hence I request you to please send me the implementation of this command.

Thank you,
Pramoda.
[OMPI users] (no subject)
Hello!

I am having some problems. My environment: BLCR 0.8.4, Open MPI 1.5.5, OS Ubuntu 11.04.
I have 2 nodes: cuda05 (master, exporting an NFS file system) and cuda07 (slave, mounting the master).

I have also set in ~/.openmpi/mca-params.conf:

crs_base_snapshot_dir=/root/kidd_openMPI/Tmp
snapc_base_global_snapshot_dir=/root/kidd_openMPI/checkpoints

My configure line:

./configure --prefix=/root/kidd_openMPI --with-ft=cr --enable-ft-thread --with-blcr=/usr/local/BLCR --with-blcr-libdir=/usr/local/BLCR/lib --enable-mpirun-prefix-by-default --enable-static --enable-shared --enable-opal-multi-threads

Problem 1: ompi-restart on multiple nodes

command 01: mpirun -hostfile Hosts -am ft-enable-cr -x LD_LIBRARY_PATH -np 2 ./TEST
command 02: ompi-restart ompi_global_snapshot_2892.ckpt

-> I can checkpoint 2 processes on multiple nodes, but when restarting, it can only restart on the master node.

command 03: ompi-restart -hostfile Hosts ompi_global_snapshot_2892.ckpt

-> Error message below. I have made sure BLCR is OK.

--
root@cuda05:~/kidd_openMPI/checkpoints# ompi-restart -hostfile Hosts ompi_global_snapshot_2892.ckpt/
--
Error: BLCR was not able to restart the process because exec failed.
Check the installation of BLCR on all of the machines in your system.
The following information may be of help:
  Return Code : -1
  BLCR Restart Command : cr_restart
  Restart Command Line : cr_restart /root/kidd_openMPI/checkpoints/ompi_global_snapshot_2892.ckpt/0/opal_snapshot_1.ckpt/ompi_blcr_context.2704
--
--
Error: Unable to obtain the proper restart command to restart from the
checkpoint file (opal_snapshot_1.ckpt). Returned -1.
Check the installation of the blcr checkpoint/restart service on all of
the machines in your system.
--

Problem 2: ompi-migrate

I can't find it. How do I use ompi-migrate?
Re: [OMPI users] (no subject)
Harini,

you can install the OpenMPI which is packaged for your distribution of Linux; for example, on SuSE use

zypper install openmpi

or the equivalent on Redhat/Ubuntu. You probably will not get the most up to date OpenMPI version, but you will get the library paths set up in /etc/ld.so.conf.d/ and the MPI chooser installed. Once you have this version of OpenMPI working properly you should compile and install your own latest version.

I just checked - the latest version for SuSE 12.1 in the repository science/openSUSE is 1.4.5

On 16/03/2012, Gustavo Correa <g...@ldeo.columbia.edu> wrote:
>
> On Mar 16, 2012, at 8:51 AM, Addepalli, Srirangam V wrote:
>
>> This usually means your library path is not updated to find the MPI
>> libraries. You can fix this many ways; the basic two steps are
>>
>> 1. Identify the location of your libraries (use locate, find)
>> 2. Add it to your library path (export LD_LIBRARY_PATH, or make changes
>> in .bashrc or /etc/ld.so.conf)
>>
>> Rangam
>
> Hi Harini
>
> Rangam is right. Indeed there is even an FAQ specific for this:
>
> http://www.open-mpi.org/faq/?category=running#adding-ompi-to-path
>
> By the way, the FAQ are the best documentation around. The README file is
> also helpful. Worth reading both, to avoid mistakes and waste of time.
>
> If using bash, on .profile or equivalent add these lines:
> export PATH=/my/path/to/openmpi/bin:$PATH
> export LD_LIBRARY_PATH=/my/path/to/openmpi/lib:$LD_LIBRARY_PATH
>
> If using [t]csh, on .[t]cshrc add these lines:
> setenv PATH /my/path/to/openmpi/bin:$PATH
> setenv LD_LIBRARY_PATH /my/path/to/openmpi/lib:$LD_LIBRARY_PATH
>
> with your actual path to openmpi replaced above, of course.
>
> I hope this helps,
> Gus Correa
>
>> From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] On Behalf Of
>> jody [jody@gmail.com]
>> Sent: Friday, March 16, 2012 4:04 AM
>> To: Open MPI Users
>> Subject: Re: [OMPI users] (no subject)
>>
>> Hi
>>
>> Did you run your program with mpirun? For example:
>> mpirun -np 4 ./a.out
>>
>> jody
>>
>> On Fri, Mar 16, 2012 at 7:24 AM, harini.s .. <hharin...@gmail.com> wrote:
>>> Hi,
>>>
>>> I am very new to openMPI and I just installed openMPI 1.4.5 on a Linux
>>> platform. Now I am trying to run the examples in the folder that got
>>> downloaded. But when I run, I got this:
>>>
>>>>> a.out: error while loading shared libraries: libmpi.so.0: cannot open
>>>>> shared object file: No such file or directory
>>>
>>> I got a.out when I compiled hello_c.c using the mpicc command.
>>> Please help me to resolve this problem.
Re: [OMPI users] (no subject)
On Mar 16, 2012, at 8:51 AM, Addepalli, Srirangam V wrote:

> This usually means your library path is not updated to find the MPI
> libraries. You can fix this many ways; the basic two steps are
>
> 1. Identify the location of your libraries (use locate, find)
> 2. Add it to your library path (export LD_LIBRARY_PATH, or make changes in
> .bashrc or /etc/ld.so.conf)
>
> Rangam

Hi Harini

Rangam is right. Indeed there is even an FAQ specific for this:

http://www.open-mpi.org/faq/?category=running#adding-ompi-to-path

By the way, the FAQ are the best documentation around. The README file is also helpful. Worth reading both, to avoid mistakes and waste of time.

If using bash, on .profile or equivalent add these lines:
export PATH=/my/path/to/openmpi/bin:$PATH
export LD_LIBRARY_PATH=/my/path/to/openmpi/lib:$LD_LIBRARY_PATH

If using [t]csh, on .[t]cshrc add these lines:
setenv PATH /my/path/to/openmpi/bin:$PATH
setenv LD_LIBRARY_PATH /my/path/to/openmpi/lib:$LD_LIBRARY_PATH

with your actual path to openmpi replaced above, of course.

I hope this helps,
Gus Correa

> From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] On Behalf Of
> jody [jody@gmail.com]
> Sent: Friday, March 16, 2012 4:04 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] (no subject)
>
> Hi
>
> Did you run your program with mpirun? For example:
> mpirun -np 4 ./a.out
>
> jody
>
> On Fri, Mar 16, 2012 at 7:24 AM, harini.s .. <hharin...@gmail.com> wrote:
>> Hi,
>>
>> I am very new to openMPI and I just installed openMPI 1.4.5 on a Linux
>> platform. Now I am trying to run the examples in the folder that got
>> downloaded. But when I run, I got this:
>>
>>>> a.out: error while loading shared libraries: libmpi.so.0: cannot open
>>>> shared object file: No such file or directory
>>
>> I got a.out when I compiled hello_c.c using the mpicc command.
>> Please help me to resolve this problem.
Re: [OMPI users] (no subject)
This usually means your library path is not updated to find the MPI libraries. You can fix this many ways; the basic two steps are

1. Identify the location of your libraries (use locate, find)
2. Add it to your library path (export LD_LIBRARY_PATH, or make changes in .bashrc or /etc/ld.so.conf)

Rangam

From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] On Behalf Of jody [jody@gmail.com]
Sent: Friday, March 16, 2012 4:04 AM
To: Open MPI Users
Subject: Re: [OMPI users] (no subject)

Hi

Did you run your program with mpirun? For example:
mpirun -np 4 ./a.out

jody

On Fri, Mar 16, 2012 at 7:24 AM, harini.s .. <hharin...@gmail.com> wrote:
> Hi,
>
> I am very new to openMPI and I just installed openMPI 1.4.5 on a Linux
> platform. Now I am trying to run the examples in the folder that got
> downloaded. But when I run, I got this:
>
>>> a.out: error while loading shared libraries: libmpi.so.0: cannot open
>>> shared object file: No such file or directory
>
> I got a.out when I compiled hello_c.c using the mpicc command.
> Please help me to resolve this problem.
Re: [OMPI users] (no subject)
Hi

Did you run your program with mpirun? For example:

mpirun -np 4 ./a.out

jody

On Fri, Mar 16, 2012 at 7:24 AM, harini.s .. wrote:
> Hi,
>
> I am very new to openMPI and I just installed openMPI 1.4.5 on a Linux
> platform. Now I am trying to run the examples in the folder that got
> downloaded. But when I run, I got this:
>
>>> a.out: error while loading shared libraries: libmpi.so.0: cannot open
>>> shared object file: No such file or directory
>
> I got a.out when I compiled hello_c.c using the mpicc command.
> Please help me to resolve this problem.
[OMPI users] (no subject)
Hi,

I am very new to openMPI and I just installed openMPI 1.4.5 on a Linux platform. Now I am trying to run the examples in the folder that got downloaded. But when I run, I got this:

>> a.out: error while loading shared libraries: libmpi.so.0: cannot open shared object file: No such file or directory

I got a.out when I compiled hello_c.c using the mpicc command. Please help me to resolve this problem.
Re: [OMPI users] (no subject)
This type of error message *usually* means that you haven't set your LD_LIBRARY_PATH to point to the intel library. Further, this *usually* means that you aren't sourcing the iccvars.sh file in your shell startup file on remote nodes (or iccvars.csh, depending on your shell). Remember that the LD_LIBRARY_PATH has to be set to include the location of the intel libraries on *all* nodes -- and since mpirun launches on remote nodes, you need to set this in your shell startup files (e.g., $HOME/.bashrc if you are using bash). On Feb 13, 2011, at 12:38 PM, lagoun brahim wrote: > hi every one > i need your help > i have a dual core machine with os linux opensuse 10.3 64bits > i configure openmpi with ifort and icc (icpc) > i compiled a wien2k code but when i run the parralel version of it i gut the > follow error message > /home/wien/lapw1_mpi: symbol lookup error: /usr/local/lib/libopen-pal.so.0: > undefined symbol: __intel_sse2_strcpy > /home/wien/lapw1_mpi: symbol lookup error: /usr/local/lib/libopen-pal.so.0: > undefined symbol: __intel_sse2_strcpy > cat: Pas de correspondance. > any suggestion > and thanks in advance > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
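For concreteness, a hedged sketch of what that looks like for bash (the Intel install prefix varies by version; /opt/intel is an assumption):

# in ~/.bashrc on every node, before any early exit for non-interactive shells
source /opt/intel/bin/iccvars.sh intel64

The .bashrc placement matters because mpirun launches remote ranks through non-interactive shells, which read .bashrc but not .bash_profile; a line that only runs for login shells will leave LD_LIBRARY_PATH unset exactly where the error occurs.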
[OMPI users] (no subject)
Hi everyone,

I need your help. I have a dual-core machine running Linux openSUSE 10.3 64-bit. I configured OpenMPI with ifort and icc (icpc). I compiled the WIEN2k code, but when I run the parallel version of it I get the following error message:

/home/wien/lapw1_mpi: symbol lookup error: /usr/local/lib/libopen-pal.so.0: undefined symbol: __intel_sse2_strcpy
/home/wien/lapw1_mpi: symbol lookup error: /usr/local/lib/libopen-pal.so.0: undefined symbol: __intel_sse2_strcpy
cat: Pas de correspondance.

Any suggestion? Thanks in advance.
Re: [OMPI users] (no subject)
On Friday 11 June 2010, asmae.elbahlo...@mpsa.com wrote:
> Hello
> i have a problem with paraFoam: when i type paraFoam in the terminal, it
> launches nothing, but in the terminal i have:

This is the OpenMPI mailing list, not OpenFoam. I suggest you contact the team behind OpenFoam. I also suggest that you post plain text to mailing lists in the future and not HTML (and while you're at it, do use a descriptive subject line).

/Peter

> tta201@linux-qv31:/media/OpenFoam/FOAMpro/FOAMpro-1.5-2.2/FOAM-1.5-2.2/tutorials/icoFoam/cavity> paraFoam
> Xlib: extension "GLX" missing on display ":0.0". [repeated 8 times]
> ERROR: In /home/kitware/Dashboard/MyTests/ParaView-3-8/ParaView-3.8/ParaView/VTK/Rendering/vtkXOpenGLRenderWindow.cxx, line 404
> vtkXOpenGLRenderWindow (0x117b3d0): Could not find a decent visual
> Xlib: extension "GLX" missing on display ":0.0". [repeated 8 times]
> ERROR: In /home/kitware/Dashboard/MyTests/ParaView-3-8/ParaView-3.8/ParaView/VTK/Rendering/vtkXOpenGLRenderWindow.cxx, line 404
> vtkXOpenGLRenderWindow (0x117b3d0): Could not find a decent visual
> Xlib: extension "GLX" missing on display ":0.0".
> ERROR: In /home/kitware/Dashboard/MyTests/ParaView-3-8/ParaView-3.8/ParaView/VTK/Rendering/vtkXOpenGLRenderWindow.cxx, line 611
> vtkXOpenGLRenderWindow (0x117b3d0): GLX not found. Aborting.
>
> /media/OpenFoam/FOAMpro/FOAMpro-1.5-2.2/FOAM-1.5-2.2/bin/paraFoam: line 81: 15497 Aborted paraview --data=$caseFile
>
> I don't understand the problem, can someone help me please?
> thanks

--
Peter Kjellström | E-mail: c...@nsc.liu.se
National Supercomputer Centre | Sweden | http://www.nsc.liu.se
Re: [OMPI users] (no subject)
The functionality of the checkpoint operation is not tied to CPU utilization.

Are you running with the C/R thread enabled? If not, then the checkpoint might be waiting until the process enters the MPI library. Does the system emit an error message describing the error that it encountered?

The C/R support does require that all processes be between MPI_INIT and MPI_FINALIZE. It is difficult to guarantee that the job is between these two functions globally (there are race conditions to worry about). This might be causing the problem as well, since if some of the processes have not passed through MPI_INIT then some of the support services might not be properly initialized.

Let me know what you find, and we can start looking at what might be causing this problem.

-- Josh

On May 11, 2010, at 5:35 PM, wrote:

> Hi
>
> I am using open-mpi 1.3.4 with BLCR. Sometimes I am running into a strange
> problem with the ompi-checkpoint command. Even though I see that all MPI
> processes (equal to the np argument) are running, the ompi-checkpoint
> command fails at times. I have seen this failure always when the MPI
> processes spawned are not fully running, i.e., these processes are not
> running above 90% CPU utilization. How do I ensure that the MPI processes
> are fully up and running before I issue ompi-checkpoint, since dynamically
> detecting whether the processes are utilizing above 90% CPU resources is
> not easy? Are there any MCA parameters I can use to overcome this issue?
>
> Thanks
> Ananda
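For reference, a hedged sketch of enabling the C/R thread, assuming a build configured for checkpoint/restart like the ones shown elsewhere in this thread (the application name is a placeholder):

./configure --with-ft=cr --enable-ft-thread ...
mpirun -np 4 -am ft-enable-cr -mca opal_cr_use_thread 1 ./app

With the thread enabled, checkpoint requests can be serviced even while ranks sit in long compute phases outside the MPI library, which matches the symptom of ompi-checkpoint stalling until the processes enter MPI calls.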
[OMPI users] (no subject)
Ralph,

When you say manually, do you mean setting these parameters in the command line while calling mpirun, ompi-restart, and ompi-checkpoint? Or is there another way to set these parameters?

Thanks
Ananda

==
Subject: Re: [OMPI users] opal_cr_tmp_dir
From: Ralph Castain

You shouldn't have to, but there may be a bug in the system. Try manually setting both envars and see if it fixes the problem.

On May 12, 2010, at 3:59 PM, <ananda.mudar_at_[hidden]> wrote:

> Ralph
>
> I have these parameters set in the ~/.openmpi/mca-params.conf file:
>
> $ cat ~/.openmpi/mca-params.conf
> orte_tmpdir_base = /home/ananda/ORTE
> opal_cr_tmp_dir = /home/ananda/OPAL
> $
>
> Should I be setting OMPI_MCA_opal_cr_tmp_dir?
>
> FYI, I am using openmpi 1.3.4 with blcr 0.8.2
>
> Thanks
> Ananda
>
> =
>
> Subject: Re: [OMPI users] opal_cr_tmp_dir
> From: Ralph Castain
>
> ompi-restart just does a fork/exec of the mpirun, so it should get the param
> if it is in your environ. How are you setting it? Have you tried adding
> OMPI_MCA_opal_cr_tmp_dir= to your environment?
>
> On May 12, 2010, at 12:45 PM, <ananda.mudar_at_[hidden]> wrote:
>
>> Thanks Ralph.
>>
>> Another question. Even though I am setting opal_cr_tmp_dir to a directory
>> other than /tmp while calling the ompi-restart command, this setting is
>> not getting passed to the mpirun command that gets generated by
>> ompi-restart. How do I overcome this constraint?
>>
>> ==
>>
>> Subject: Re: [OMPI users] opal_cr_tmp_dir
>> From: Ralph Castain
>>
>> It's a different MCA param: orte_tmpdir_base
>>
>> On May 12, 2010, at 12:33 PM, <ananda.mudar_at_[hidden]> wrote:
>>
>>> I am setting the MCA parameter "opal_cr_tmp_dir" to a directory other
>>> than /tmp while calling the "mpirun", "ompi-restart", and
>>> "ompi-checkpoint" commands so that I don't fill up the /tmp filesystem.
>>> But I see that the openmpi-sessions* directory is still getting created
>>> under /tmp. How do I overcome this problem so that the openmpi-sessions*
>>> directory also gets created under the same directory I have defined for
>>> "opal_cr_tmp_dir"?
>>>
>>> Is there a way to clean up these temporary files after their requirement
>>> is over?
>>>
>>> Thanks
>>> Ananda
Re: [OMPI users] (no subject)
Jeff Squyres wrote:
> On Feb 21, 2010, at 10:25 AM, Rodolfo Chua wrote:
>> I used openMPI compiled with the GNU (gcc) compiler to run the GULP code
>> in parallel. But when I try to input "mpirun -np 2 gulp", GULP did not run
>> on two processors. Can you give me any suggestion on how to compile the
>> GULP code exactly with openMPI.
>
> I'm afraid that I don't know the GULP code in particular, but their advice
> is sound: adding -DMPI sounds like something specific to their code (e.g.,
> to activate the MPI code sections). But using mpif77 / mpif90 as your
> compiler name in their build process is probably the Right thing to do
> (e.g., instead of ifort / gfortran / pgf77 / whatever). This should build
> their executable with Open MPI's support libraries linked in, etc.

What Jeff said sounds right (as usual). But, I'm intrigued about one point. Even if one did not compile for MPI, if you launch with "mpirun -np 2 gulp", I would think you would still see two processes. They would not be two processes of the same MPI job, but two replicas of the same serial job. So, I'm curious what Rodolfo's second sentence ("But when I try ...") means.
Re: [OMPI users] (no subject)
On Feb 21, 2010, at 10:25 AM, Rodolfo Chua wrote:

> I used openMPI compiled with the GNU (gcc) compiler to run the GULP code in
> parallel. But when I try to input "mpirun -np 2 gulp", GULP did not run on
> two processors. Can you give me any suggestion on how to compile the GULP
> code exactly with openMPI.
>
> Below is the instruction from the GULP code manual.
> "If you wish to run the program in parallel using MPI then you will need to
> alter the file "getmachine" accordingly. The usual changes would be to add
> the "-DMPI" option and in some cases change the compiler name (for example
> to mpif77/mpif90) or include the MPI libraries in the link stage."

I'm afraid that I don't know the GULP code in particular, but their advice is sound: adding -DMPI sounds like something specific to their code (e.g., to activate the MPI code sections). But using mpif77 / mpif90 as your compiler name in their build process is probably the Right thing to do (e.g., instead of ifort / gfortran / pgf77 / whatever). This should build their executable with Open MPI's support libraries linked in, etc.

--
Jeff Squyres
jsquy...@cisco.com
[OMPI users] (no subject)
Hi!

I used openMPI compiled with the GNU (gcc) compiler to run the GULP code in parallel. But when I try to input "mpirun -np 2 gulp", GULP did not run on two processors. Can you give me any suggestion on how to compile the GULP code exactly with openMPI.

Below is the instruction from the GULP code manual:
"If you wish to run the program in parallel using MPI then you will need to alter the file "getmachine" accordingly. The usual changes would be to add the "-DMPI" option and in some cases change the compiler name (for example to mpif77/mpif90) or include the MPI libraries in the link stage."
Re: [OMPI users] (no subject)
Hi Konstantinos, list

If you want "qsub" you need to install the resource manager / queue system on your PC. Assuming your PC is a Linux box, if your resource manager is Torque/PBS, on some Linux distributions it can be installed from an rpm through yum (or an equivalent mechanism), for instance. I am not sure, but I would guess SGE and SLURM may also be available through rpms. Or you can install the resource manager from source. We have workstations/PCs here running Torque (installed through yum and rpm), for the convenience of submitting jobs as in a cluster, and letting the queue control them.

You could also use just "mpiexec" directly. This doesn't require a resource manager, but you have to be the resource manager yourself, baby-sitting the jobs, submitting one at a time, waiting for completion, etc.

On another related issue, let's say your 2 processors are dual core, for a total of 4 cores. Then you can count on submitting "mpiexec" with a number of processes up to 4 ("-n 4" or "-np 4"). If you use more than 4, say "-np 6", you are oversubscribing the physical cores. Linux will have to make the 6 processes take turns using the 4 cores. (Some resource managers won't let you do this.) Oversubscription can work for lightweight MPI jobs, but in my experience it eventually hangs for heavier computation/communication codes.

Also, note that any interactive work that you may be doing on your PC, concurrently with the MPI jobs, will have an impact on performance, and may even take the MPI jobs to a halt. We had this experience here, when the user of the aforementioned workstation insisted on running Matlab, browsing the web, watching streaming video, and listening to music while the MPI jobs were running. :)

I hope this helps.
Gus Correa

-
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
-

Konstantinos Angelopoulos wrote:

> good part of the day,
>
> I am trying to run a parallel program (that used to run on a cluster) on my
> dual-core PC. Could openmpi simulate the distribution of the parallel jobs
> to my 2 processors, meaning will qsub work even if it is not a real cluster?
>
> thank you for reading my message and for any answer.
>
> Konstantinos Angelopoulos
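To make the Torque route concrete, a hedged sketch of a minimal submission script for a dual-core PC (job name, script name, and binary are placeholders):

#!/bin/bash
#PBS -N mpitest
#PBS -l nodes=1:ppn=2
cd $PBS_O_WORKDIR
mpiexec -n 2 ./a.out

Submitted with "qsub script.pbs", this requests 2 slots on the single node; if Open MPI was built with Torque support, mpiexec reads the allocation (the $PBS_NODEFILE) automatically, so no hostfile is needed.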
Re: [OMPI users] (no subject)
On Friday 30 October 2009, Konstantinos Angelopoulos wrote:
> good part of the day,
> I am trying to run a parallel program (that used to run in a cluster) in my double core pc. Could openmpi simulate the distribution of the parallel jobs to my 2 processors

If your program is an MPI program then yes, Open MPI on your PC would allow you to use both cores (assuming your job can fit on the PC, of course).

> meaning will qsub work even if it is not a real cluster?

qsub has nothing to do with MPI; it belongs to the workload management (batch queue) system. You could install this on your PC as well (see for example Torque, SGE or SLURM).

/Peter

> thank you for reading my message and for any answer.
> Konstantinos Angelopoulos
[OMPI users] (no subject)
good part of the day,

I am trying to run a parallel program (that used to run in a cluster) on my dual-core PC. Could openmpi simulate the distribution of the parallel jobs to my 2 processors? In other words, will qsub work even if it is not a real cluster? Thank you for reading my message and for any answer.

Konstantinos Angelopoulos Post-Graduate Student Brunel University School of Engineering and Design Uxbridge, Middlesex UB8 3PH UK Contact emails: mepgk...@brunel.ac.uk
[OMPI users] (no subject)
Hi All,

I compiled Open MPI on Windows Server 2003, both through Cygwin and through CMake with Visual Studio. Both methods compiled successfully. In Cygwin I configured with the following command:

./configure --enable-mca-no-build=timer-windows,memory_mallopt,maffinity,paffinity

Without these flags I was getting errors. I get the same error while running mpirun.exe/orterun.exe. Can anyone help me rectify these errors?

C:\openmpi_sln\debug>orterun.exe -np 2 ipconfig
[8puq2akbo:07476] mca: base: component_find: "mca_paffinity_windows" does not appear to be a valid paffinity MCA dynamic component (ignored): The specified module could not be found.
--
It looks like opal_init failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during opal_init; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer):
opal_paffinity_base_select failed
--> Returned value -13 instead of OPAL_SUCCESS
--
[8puq2akbo:07476] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file ..\..\Linpack\Source\orte\runtime\orte_init.c at line 79
[8puq2akbo:07476] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file ..\..\..\..\Linpack\Source\orte\tools\orterun\orterun.c at line 570

Thanks, Basant
[OMPI users] (no subject)
dear sir

I am sending the details as follows:

1. I am using openmpi-1.3.3 and blcr 0.8.2.
2. I installed blcr 0.8.2 first, under /root/MS.
3. Then I installed openmpi-1.3.3 under /root/MS.
4. I configured and installed Open MPI as follows:

# ./configure --with-ft=cr --enable-mpi-threads --with-blcr=/usr/local/bin --with-blcr-libdir=/usr/local/lib
# make
# make install

Then I added the following to the .bash_profile under the home directory (I went to the home directory by doing cd ~):

/sbin/insmod /usr/local/lib/blcr/2.6.23.1-42.fc8/blcr_imports.ko
/sbin/insmod /usr/local/lib/blcr/2.6.23.1-42.fc8/blcr.ko
PATH=$PATH:/usr/local/bin
MANPATH=$MANPATH:/usr/local/man
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib

Then I compiled and ran the file arr_add.c as follows:

[root@localhost examples]# mpicc -o res arr_add.c
[root@localhost examples]# mpirun -np 2 -am ft-enable-cr ./res
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
--
Error: The process with PID 5790 is not checkpointable. This could be due to one of the following:
- An application with this PID doesn't currently exist
- The application with this PID isn't checkpointable
- The application with this PID isn't an OPAL application.
We were looking for the named files:
/tmp/opal_cr_prog_write.5790
/tmp/opal_cr_prog_read.5790
--
[localhost.localdomain:05788] local) Error: Unable to initiate the handshake with peer [[7788,1],1]. -1
[localhost.localdomain:05788] [[7788,0],0] ORTE_ERROR_LOG: Error in file snapc_full_global.c at line 567
[localhost.localdomain:05788] [[7788,0],0] ORTE_ERROR_LOG: Error in file snapc_full_global.c at line 1054
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2

NOTE: the PID of mpirun is 5788. I gave the following command for taking the checkpoint:

[root@localhost examples]# ompi-checkpoint -s 5788

I got the following output, but it was hanging like this:

[localhost.localdomain:05796] Requested - Global Snapshot Reference: (null)
[localhost.localdomain:05796] Pending - Global Snapshot Reference: (null)
[localhost.localdomain:05796] Running - Global Snapshot Reference: (null)

Kindly rectify it.

with regards
mallikarjuna shastry
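One thing worth noting in the setup above (whether it is the actual cause here is not confirmed): the .bash_profile lines assign PATH, MANPATH, and LD_LIBRARY_PATH but never export them, so child processes such as mpirun and the BLCR utilities may not inherit them. A minimal sketch of the same settings with exports:

    # ~/.bash_profile -- load the BLCR modules and export the paths.
    # Module paths assume the BLCR 0.8.2 install described above.
    /sbin/insmod /usr/local/lib/blcr/2.6.23.1-42.fc8/blcr_imports.ko
    /sbin/insmod /usr/local/lib/blcr/2.6.23.1-42.fc8/blcr.ko
    export PATH=$PATH:/usr/local/bin
    export MANPATH=$MANPATH:/usr/local/man
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib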
[OMPI users] (no subject)
Hi,

I found that a subroutine call inside a loop does not return the correct value after a certain number of iterations. In order to simplify the problem, the inputs to the subroutine are chosen to be constant, so the output should be the same for every iteration on every computing node. It is a Fortran program; after the initialization the program goes like this:

    do i = 1, N
       call my_sub(A, B, C, re)
       print *, mypn, A, B, C, re
    end do

where re is the output value of my_sub and A, B, C are inputs to my_sub. 570 is the number of correct iterations: if the combined number of instances does not exceed 570, the output is fine. For example, if I requested 10 computing nodes and N were 40, giving 10*40 = 400 instances, the output would be fine. But if the combined instances exceed 570, the first 570 are fine and the rest return NaN values. For example, if the number of computing nodes were 20 and N were 40, which gives 20*40 = 800 instances, then the first 570 are fine but the rest are NaN. Does someone know what might cause the problem? I googled it, but can't find a clue where to start. Please also let me know what else you need to debug the problem. Thanks.

Julia
Re: [OMPI users] (no subject)
The MPI standard does not define any functions for taking checkpoints from the application. The checkpoint/restart work in Open MPI is a command-line-driven, transparent solution: the application does not have to change in any way, and the user (or scheduler) must initiate the checkpoint from the command line (on the same node as the mpirun process).

We have experimented with adding Open MPI-specific checkpoint/restart interfaces in the context of the MPI Forum. These prototypes have not made it to the Open MPI trunk. Some information about that particular development is at the link below:
https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/Quiescence

Best, Josh

On Jul 6, 2009, at 12:07 AM, Mallikarjuna Shastry wrote:
dear sir/madam, what are the MPI functions used for taking a checkpoint and restarting within the application in MPI programs, and where do I get these functions from? with regards mallikarjuna shastry
___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
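As a concrete illustration of that command-line-driven workflow, here is a sketch, assuming a C/R-enabled build like the ones discussed elsewhere in this digest (the application name and PID are made up):

    # Start the job with the checkpoint/restart AMCA profile:
    mpirun -np 4 -am ft-enable-cr ./my_app

    # From the node running mpirun, checkpoint the job by mpirun's
    # PID (12345 here):
    ompi-checkpoint 12345

    # Later, restart from the global snapshot reference that
    # ompi-checkpoint printed; the name below only illustrates the
    # usual naming scheme:
    ompi-restart ompi_global_snapshot_12345.ckpt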
[OMPI users] (no subject)
dear sir/madam

What are the MPI functions used for taking a checkpoint and restarting within the application in MPI programs, and where do I get these functions from?

with regards
mallikarjuna shastry
Re: [OMPI users] (no subject)
Hi,

Sorry, my mistake. Attached is the config.log file.

> make install
> no rule to make target 'VERSION', needed by Makefile.in STOP
> ompi_info --all
> ompi_info: command not found

Thanks, Cami

-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Jeff Squyres
Sent: Thursday, May 14, 2009 3:02 PM
To: Open MPI Users
Subject: Re: [OMPI users] (no subject)

Please send all the information listed here: http://www.open-mpi.org/community/help/

On May 14, 2009, at 1:20 AM, Camelia Avram wrote:
> Hi,
> I'm new to MPI. I'm trying to install OpenMPI and I got some errors. I use the command ./configure --prefix=/usr/local, with no problem.
> But after that, "make all install" gives the message: "no rule to make target 'VERSION', needed by Makefile.in STOP"
> What should I do?
> Thanks,
> Cami
> ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users

-- Jeff Squyres Cisco Systems
___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users

[attachment: config.log.tar.gz]
Re: [OMPI users] (no subject)
Please send all the information listed here: http://www.open-mpi.org/community/help/

On May 14, 2009, at 1:20 AM, Camelia Avram wrote:
Hi, I'm new to MPI. I'm trying to install OpenMPI and I got some errors. I use the command ./configure --prefix=/usr/local, with no problem. But after that, "make all install" gives the message: "no rule to make target 'VERSION', needed by Makefile.in STOP". What should I do? Thanks, Cami
___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users

-- Jeff Squyres Cisco Systems
[OMPI users] (no subject)
Hi,
I'm new to MPI. I'm trying to install OpenMPI and I got some errors. I use the command ./configure --prefix=/usr/local, with no problem. But after that, "make all install" gives the message: "no rule to make target 'VERSION', needed by Makefile.in STOP". What should I do?
Thanks, Cami
[OMPI users] (no subject)
Hello! How can I integrate my own collective communication algorithm into Open MPI with MCA?
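For orientation: collective algorithms live in Open MPI's "coll" MCA framework, implemented under ompi/mca/coll/<name>/ in the source tree, usually by copying the layout of an existing component such as "basic". The standard commands below show how coll components are listed and selected at run time; the component name "mycoll" is hypothetical:

    # List the collective components compiled into this installation:
    ompi_info | grep coll

    # Show the MCA parameters the coll framework understands:
    ompi_info --param coll all

    # Once a component named "mycoll" is built and installed, request
    # it explicitly (with basic/self available as fallbacks):
    mpirun --mca coll mycoll,basic,self -np 4 ./app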
Re: [OMPI users] (no subject)
On May 27, 2008, at 9:33 AM, Gabriele Fatigati wrote:
> Great, it works! Thank you very very much. But can you explain to me how this parameter works?

You might want to have a look at this short video for a little background on some relevant OpenFabrics concepts:
http://www.open-mpi.org/video/?category=openfabrics#openfabrics-concepts

In v1.2, for short messages, OMPI will sometimes copy your message to a pre-posted receive buffer and immediately mark the MPI request as "complete". Depending on the timing and current network resource usage, the message may or may not have been given to the network stack yet (e.g., if we're out of flow control credits to send to this particular peer). If your application keeps dipping down into the MPI layer frequently, this situation will almost certainly resolve itself once the receiver becomes active or other events occur to free up available resources. As such, the early completion optimization pretty much depends on frequent calls to MPI. Without them, since OMPI currently has no independent progression (e.g., a progress thread), your message will wait until OMPI's internal progress engine is tripped again.

Hope that helps.

On Thu, 15 May 2008 21:40:45 -0400, Jeff Squyres said:
Sorry, this message escaped for so long that it got buried in my INBOX. The problem you're seeing might be related to one we just answered about a similar situation:
http://www.open-mpi.org/community/lists/users/2008/05/5657.php
See if using the pml_ob1_use_early_completion flag works for you.

On Apr 30, 2008, at 7:05 AM, Gabriele FATIGATI wrote:
Hi, I tried to run the SkaMPI benchmark on an IBM-BladeCenterLS21-BCX system with 256 processors, but the test stopped in the "AlltoAll-length" routine, with count=8192, for some reason. I launched the test with:
--mca btl_openib_eager_limit 1024
The same tests with 128 processors or fewer finished successfully. Different values of the eager limit don't solve the problem. Thanks in advance.
-- Gabriele Fatigati CINECA Systems & Technologies Department Supercomputing Group Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy www.cineca.it Tel: +39 051 6171722 g.fatig...@cineca.it
___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users

-- Jeff Squyres Cisco Systems

-- Gabriele Fatigati CINECA Systems & Technologies Department Supercomputing Group Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy www.cineca.it Tel: +39 051 6171722 g.fatig...@cineca.it
___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users

-- Jeff Squyres Cisco Systems
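For reference, the flag discussed here is an MCA parameter of the ob1 PML, so it can be set on the mpirun command line. A sketch, assuming an Open MPI 1.2.x build where this parameter exists (setting it to 0 disables the early-completion optimization):

    mpirun --mca pml_ob1_use_early_completion 0 -np 256 ./skampi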
Re: [OMPI users] (no subject)
Great, it works! Thank you very very much. But can you explain to me how this parameter works?

On Thu, 15 May 2008 21:40:45 -0400, Jeff Squyres said:
> Sorry, this message escaped for so long that it got buried in my INBOX. The problem you're seeing might be related to one we just answered about a similar situation:
> http://www.open-mpi.org/community/lists/users/2008/05/5657.php
> See if using the pml_ob1_use_early_completion flag works for you.
>
> On Apr 30, 2008, at 7:05 AM, Gabriele FATIGATI wrote:
> > Hi, I tried to run the SkaMPI benchmark on an IBM-BladeCenterLS21-BCX system with 256 processors, but the test stopped in the "AlltoAll-length" routine, with count=8192, for some reason.
> > I launched the test with:
> > --mca btl_openib_eager_limit 1024
> > The same tests with 128 processors or fewer finished successfully.
> > Different values of the eager limit don't solve the problem. Thanks in advance.
> > -- Gabriele Fatigati CINECA Systems & Technologies Department Supercomputing Group Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy www.cineca.it Tel: +39 051 6171722 g.fatig...@cineca.it
> > ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> -- Jeff Squyres Cisco Systems

-- Gabriele Fatigati CINECA Systems & Technologies Department Supercomputing Group Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy www.cineca.it Tel: +39 051 6171722 g.fatig...@cineca.it
Re: [OMPI users] (no subject)
Sorry, this message escaped for so long that it got buried in my INBOX. The problem you're seeing might be related to one we just answered about a similar situation:
http://www.open-mpi.org/community/lists/users/2008/05/5657.php
See if using the pml_ob1_use_early_completion flag works for you.

On Apr 30, 2008, at 7:05 AM, Gabriele FATIGATI wrote:
Hi, I tried to run the SkaMPI benchmark on an IBM-BladeCenterLS21-BCX system with 256 processors, but the test stopped in the "AlltoAll-length" routine, with count=8192, for some reason. I launched the test with:
--mca btl_openib_eager_limit 1024
The same tests with 128 processors or fewer finished successfully. Different values of the eager limit don't solve the problem. Thanks in advance.
-- Gabriele Fatigati CINECA Systems & Technologies Department Supercomputing Group Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy www.cineca.it Tel: +39 051 6171722 g.fatig...@cineca.it
___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users

-- Jeff Squyres Cisco Systems
Re: [OMPI users] (no subject)
I can think of several advantages that using blocking or signals to reduce the CPU load would have:

- Reduced energy consumption
- Running additional background programs could be done far more efficiently
- It would be much simpler to examine the load balance

It may depend on the type of program and the computational environment, but there are certainly many cases in which putting the system in idle mode would be advantageous. This is especially true for programs with low network traffic and/or high load imbalances.

The "spin for a while and then block" method that you mentioned earlier seems to be a good compromise: poll for some time that is long compared to the corresponding system call, and then go to sleep if nothing happens. In this way the latency would be only marginally increased, while less CPU time is wasted in the polling loops, and I would be much happier.

Jeff Squyres schrieb:
> On Apr 23, 2008, at 3:49 PM, Danesh Daroui wrote:
>> Do you really mean that Open-MPI uses a busy loop in order to handle incoming calls? It seems to be incorrect, since spinning is a very bad and inefficient technique for this purpose.
>
> It depends on what you're optimizing for. :-) We're optimizing for minimum message passing latency on hosts that are not oversubscribed; polling is very good at that. Polling is much better than blocking, particularly if the blocking involves a system call (which will be "slow"). Note that in a compute-heavy environment, the nodes are going to be running at 100% CPU anyway.
>
> Also keep in mind that you're only going to have "waste" spinning in MPI if you have a loosely/poorly synchronized application. Granted, some applications are this way by nature, but we have not chosen to optimize spare CPU cycles for them. As I said in a prior mail, adding a blocking strategy is on the to-do list, but it's fairly low in priority right now. Someone may care to improve the message passing engine to include blocking, but it hasn't happened yet. Want to work on it? :-)
>
> And for reference: almost all MPIs do busy polling to minimize latency. Some of them will shift to blocking if nothing happens for a "long" time. This second piece is what OMPI is lacking.
>
>> Why don't you use blocking and/or signals instead of that? I think the priority of this task is very high, because polling just wastes resources of the system.
>
> FWIW: I mentioned this in my other mail -- latency is quite definitely negatively impacted when you use such mechanisms. Blocking and signals are "slow" (in comparison to polling).
>
> In production HPC environments, the entire resource is dedicated to the MPI app anyway, so there's nothing else that really needs it. So we allow them to busy-spin.
>
> There is a mode to call yield() in the middle of every OMPI progress loop, but it's only helpful for loosely/poorly synchronized MPI apps and ones that use TCP or shared memory. Low latency networks such as IB or Myrinet won't be as friendly to this setting because they're busy polling (i.e., they call yield() much less frequently, if at all).
>
>> On the other hand, what Alberto claims is not reasonable to me.
>>
>> Alberto,
>> - Are you oversubscribing one node, which means that you are running your code on a single-processor machine, pretending to have four CPUs?
>> - Did you compile Open-MPI or install it from an RPM?
>>
>> Receiving a message shouldn't be that expensive.
>>
>> Regards,
>> Danesh

[snip: remainder of quoted thread; the same messages appear in full below]
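The "spin for a while and then block" compromise discussed above is straightforward to sketch in plain C. This is not Open MPI code, just an illustration of the general technique applied to a file descriptor with poll(2); the spin count is an arbitrary choice:

    #include <poll.h>
    #include <stdbool.h>

    /* Wait for data on fd: busy-poll for a bounded number of
     * iterations (low latency if data arrives quickly), then fall
     * back to a blocking poll() so no CPU is burned on long waits. */
    static bool wait_for_data(int fd, long spin_iters)
    {
        struct pollfd p = { .fd = fd, .events = POLLIN };

        /* Phase 1: spin with a zero timeout. */
        for (long i = 0; i < spin_iters; ++i) {
            if (poll(&p, 1, 0) > 0 && (p.revents & POLLIN))
                return true;        /* data arrived while spinning */
        }

        /* Phase 2: give up the CPU and block until data arrives. */
        return poll(&p, 1, -1) > 0 && (p.revents & POLLIN);
    }

In a real MPI progress engine the spin phase would poll shared-memory queues and network completion queues rather than a single descriptor, but the latency/CPU trade-off is the same.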
Re: [OMPI users] (no subject)
Do you really mean that Open-MPI uses a busy loop in order to handle incoming calls? That seems wrong to me, since spinning is a very bad and inefficient technique for this purpose. Why don't you use blocking and/or signals instead? I think the priority of this task is very high, because polling just wastes resources of the system.

On the other hand, what Alberto reports does not seem reasonable to me.

Alberto,
- Are you oversubscribing one node, i.e., running your code on a single-processor machine while pretending to have four CPUs?
- Did you compile Open-MPI or install it from an RPM?

Receiving a message shouldn't be that expensive.

Regards,
Danesh

Jeff Squyres skrev:
Because on-node communication typically uses shared memory, so we currently have to poll. Additionally, when using mixed on/off-node communication, we have to alternate between polling shared memory and polling the network.
Additionally, we actively poll because it's the best way to lower latency. MPI implementations are almost always first judged on their latency, not [usually] their CPU utilization. Going to sleep in a blocking system call will definitely negatively impact latency.
We have plans for implementing the "spin for a while and then block" technique (as has been used in other MPIs and middleware layers), but it hasn't been a high priority.

On Apr 23, 2008, at 12:19 PM, Alberto Giannetti wrote:
Thanks Torje. I wonder what is the benefit of looping on the incoming message-queue socket rather than using system I/O signals, like read() or select().

On Apr 23, 2008, at 12:10 PM, Torje Henriksen wrote:
Hi Alberto,
The blocked processes are in fact spin-waiting. While they don't have anything better to do (waiting for that message), they will check their incoming message-queues in a loop. So the MPI_Recv() operation is blocking, but it doesn't mean that the processes are blocked by the OS scheduler. I hope that made some sense :)
Best regards, Torje

On Apr 23, 2008, at 5:34 PM, Alberto Giannetti wrote:
I have a simple MPI program that sends data to processor rank 0. The communication works well, but when I run the program on more than 2 processors (-np 4) the extra receivers waiting for data run at > 90% CPU load. I understand MPI_Recv() is a blocking operation, but why does it consume so much CPU compared to a regular system read()?

[snip: quoted test program; it appears in full in the original message at the end of this digest]

___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] (no subject)
OMPI doesn't use SYSV shared memory; it uses mmapped files.

ompi_info will tell you all about the components installed. If you see a BTL component named "sm", then shared memory support is installed. I do not believe that we conditionally install sm on Linux or OS X systems -- it should always be installed.

ompi_info | grep btl

On Apr 23, 2008, at 2:55 PM, Alberto Giannetti wrote:
I am running the test program on Darwin 8.11.1, 1.83 GHz Intel dual core. My Open MPI install is 1.2.4. I can't see any allocated shared memory segment on my system (ipcs -m), although the receiver opens a couple of TCP sockets in listening mode. It looks like my implementation does not use shared memory. Is this a configuration issue?

a.out 5628 albertogiannetti 3u unix R,W,NB 0x380b198 0t0 ->0x41ced48
a.out 5628 albertogiannetti 4u unix R,W 0x41ced48 0t0 ->0x380b198
a.out 5628 albertogiannetti 5u IPv4 R,W,NB 0x3d4d920 0t0 TCP *:50969 (LISTEN)
a.out 5628 albertogiannetti 6u IPv4 R,W,NB 0x3e62394 0t0 TCP 192.168.0.10:50970->192.168.0.10:50962 (ESTABLISHED)
a.out 5628 albertogiannetti 7u IPv4 R,W,NB 0x422d228 0t0 TCP *:50973 (LISTEN)
a.out 5628 albertogiannetti 8u IPv4 R,W,NB 0x2dfd394 0t0 TCP 192.168.0.10:50969->192.168.0.10:50975 (ESTABLISHED)

[snip: remainder of quoted thread, including the test program; it appears in full in the messages below]

___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users

-- Jeff Squyres
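Acting on the suggestion above, a quick sketch of checking for and explicitly requesting the shared-memory BTL (standard ompi_info/mpirun usage; ./a.out stands in for the test program from this thread):

    # Is the sm BTL installed?
    ompi_info | grep btl

    # Request shared memory explicitly for on-node traffic; the job
    # will fail to start rather than silently fall back to TCP if the
    # sm BTL cannot be used:
    mpirun --mca btl sm,self -np 2 ./a.out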
Re: [OMPI users] (no subject)
I am running the test program on Darwin 8.11.1, 1.83 GHz Intel dual core. My Open MPI install is 1.2.4. I can't see any allocated shared memory segment on my system (ipcs -m), although the receiver opens a couple of TCP sockets in listening mode. It looks like my implementation does not use shared memory. Is this a configuration issue?

a.out 5628 albertogiannetti 3u unix R,W,NB 0x380b198 0t0 ->0x41ced48
a.out 5628 albertogiannetti 4u unix R,W 0x41ced48 0t0 ->0x380b198
a.out 5628 albertogiannetti 5u IPv4 R,W,NB 0x3d4d920 0t0 TCP *:50969 (LISTEN)
a.out 5628 albertogiannetti 6u IPv4 R,W,NB 0x3e62394 0t0 TCP 192.168.0.10:50970->192.168.0.10:50962 (ESTABLISHED)
a.out 5628 albertogiannetti 7u IPv4 R,W,NB 0x422d228 0t0 TCP *:50973 (LISTEN)
a.out 5628 albertogiannetti 8u IPv4 R,W,NB 0x2dfd394 0t0 TCP 192.168.0.10:50969->192.168.0.10:50975 (ESTABLISHED)

On Apr 23, 2008, at 12:34 PM, Jeff Squyres wrote:
Because on-node communication typically uses shared memory, so we currently have to poll. Additionally, when using mixed on/off-node communication, we have to alternate between polling shared memory and polling the network.
Additionally, we actively poll because it's the best way to lower latency. MPI implementations are almost always first judged on their latency, not [usually] their CPU utilization. Going to sleep in a blocking system call will definitely negatively impact latency.
We have plans for implementing the "spin for a while and then block" technique (as has been used in other MPIs and middleware layers), but it hasn't been a high priority.

[snip: remainder of quoted thread, including the test program; it appears in full in the original message below]

___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
[OMPI users] (no subject)
I have a simple MPI program that sends data to processor rank 0. The communication works well, but when I run the program on more than 2 processors (-np 4) the extra receivers waiting for data run at > 90% CPU load. I understand MPI_Recv() is a blocking operation, but why does it consume so much CPU compared to a regular system read()?

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <mpi.h>

void process_sender(int);
void process_receiver(int);

int main(int argc, char* argv[])
{
  int rank;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  printf("Processor %d (%d) initialized\n", rank, getpid());

  if( rank == 1 )
    process_sender(rank);
  else
    process_receiver(rank);

  MPI_Finalize();
}

void process_sender(int rank)
{
  int i, j, size;
  float data[100];
  MPI_Status status;

  printf("Processor %d initializing data...\n", rank);
  for( i = 0; i < 100; ++i )
    data[i] = i;

  MPI_Comm_size(MPI_COMM_WORLD, &size);
  printf("Processor %d sending data...\n", rank);
  MPI_Send(data, 100, MPI_FLOAT, 0, 55, MPI_COMM_WORLD);
  printf("Processor %d sent data\n", rank);
}

void process_receiver(int rank)
{
  int count;
  float value[200];
  MPI_Status status;

  printf("Processor %d waiting for data...\n", rank);
  MPI_Recv(value, 200, MPI_FLOAT, MPI_ANY_SOURCE, 55, MPI_COMM_WORLD, &status);
  printf("Processor %d Got data from processor %d\n", rank, status.MPI_SOURCE);
  MPI_Get_count(&status, MPI_FLOAT, &count);
  printf("Processor %d, Got %d elements\n", rank, count);
}
[OMPI users] (no subject)
I'm having some difficulty getting the Open MPI checkpoint/restart fault tolerance working. I have compiled Open MPI with the "--with-ft=cr" flag, but when I attempt to run my test program (ring), the ompi-checkpoint command fails. I have verified that the test program works fine without fault tolerance enabled. Here are the details:

[me@dev1 ~]$ mpirun -np 4 -am ft-enable-cr ring
[me@dev1 ~]$ ps -efa | grep mpirun
me 3052 2820 1 08:25 pts/2 00:00:00 mpirun -np 4 -am ft-enable-cr ring
[me@dev1 ~]$ ompi-checkpoint 3052
[dev1.acme.local:03060] [NO-NAME] ORTE_ERROR_LOG: Unknown error: 5854512 in file sds_singleton_module.c at line 50
[dev1.acme.local:03060] [NO-NAME] ORTE_ERROR_LOG: Unknown error: 5854512 in file runtime/orte_init.c at line 311
--
It looks like orte_init failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during orte_init; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer):
orte_sds_base_set_name failed
--> Returned value Unknown error: 5854512 (5854512) instead of ORTE_SUCCESS
--

Any help would be appreciated. Thanks.

[attachments: ompi_info.txt.gz, config.log.gz]
Re: [OMPI users] (no subject)
Hi,

I wrote the information below in my hostfile:

192.168.1.1 4 slots
192.168.1.2 4 slots

and I entered the command below in the directory which contains my hostfile (my_host):

~Administrator/PCal$ mpirun -np 8 -hostfile my_host --byslot hello

Then the following information was returned:

bash: line 1: orted: command not found.
[Apple1.local:00516][0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
...
[Apple1.local:00516] ERROR: A daemon on node 192.168.1.2 failed to start as expected.
...
[Apple1.local:00516][0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1187.
...
mpirun was unable to cleanly terminate the daemons for this job. Returned value Timeout instead of ORTE_SUCCESS.

Should there be an SSI (Single System Image) on both of my Apple PCs? What can I do? Thank you.

In your letter you mentioned:
>From: "Götz Waschk" <goetz.was...@gmail.com>
>Reply-To: Open MPI Users <us...@open-mpi.org>
>To: "Open MPI Users" <us...@open-mpi.org>
>Subject: Re: [OMPI users] (no subject)
>Date: Wed, 4 Apr 2007 13:28:15 +0200
>
>On 4/4/07, JiaXing Cai <ca...@mail.ustc.edu.cn> wrote:
>> I want to do a parallel job on 2 Apple PowerPCs (Power Mac G5 Quad) which run on Mac OS X 10.4.8. How can I make them communicate with each other using open-mpi? I have tried, but failed. An error related to daemons has occurred.
>
>Hi,
>could you please tell us what exactly you have tried and please include the complete error message as well.
>
>Regards, Götz Waschk
>
>--
>AL I:40: Do what thou wilt shall be the whole of the Law.
>
>___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
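A note on the first error line: "orted: command not found" means the shell that ssh starts on the remote node cannot find Open MPI's orted daemon, i.e., PATH (and typically LD_LIBRARY_PATH) are not set for non-interactive logins there. No single-system image is needed; plain ssh connectivity between the machines is enough. A common remedy is mpirun's standard --prefix option (the install prefix below is an assumption):

    # Tell mpirun where Open MPI is installed on the remote nodes, so
    # it sets PATH and LD_LIBRARY_PATH for the orted it launches:
    mpirun --prefix /usr/local -np 8 -hostfile my_host --byslot hello

Alternatively, export PATH and LD_LIBRARY_PATH for non-interactive shells (e.g., in ~/.bashrc) on every node.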
[OMPI users] (no subject)
Help! I want to do a parallel job on 2 Apple PowerPCs (Power Mac G5 Quad) which run Mac OS X 10.4.8. How can I make them communicate with each other using open-mpi? I have tried, but failed. An error related to daemons has occurred.
Re: [OMPI users] (no subject)
Check out "Windows Compute Cluster Server 2003", http://www.microsoft.com/windowsserver2003/ccs/default.mspx. From the FAQ: "Windows Compute Cluster Server 2003 comes with the Microsoft Message Passing Interface (MS MPI), an MPI stack based on the MPICH2 implementation from Argonne National Labs." I have no experience with it, just sharing the link. Jonathan usha devi regadi wrote: hello I'll be glad to know if an MPI is available On WINDOWS Platform. Regards usha ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users