[OMPI users] Fix the use of hostfiles when a username is supplied in v1.5 ?

2010-10-22 Thread Olivier Riff
Hello,

There was a bug in the use of hostfiles when a username is supplied, which
was fixed in Open MPI v1.4.2.
I have just installed v1.5 and the bug seems to have come back: only
the first username provided in the machinefile is taken into account.

See mails below for the history.

My configuration:
OpenMPI 1.5
Linux Mandriva 2008 x86_64 and Linux RHE x86_64
machinefile example:
or985966@is209898 slots=1
realtime@is206022 slots=8
realtime@is206025 slots=8

Best regards,

Olivier

-- Forwarded message --
From: Ralph Castain <r...@open-mpi.org>
List-Post: users@lists.open-mpi.org
Date: 2010/3/11
Subject: Re: [OMPI users] OPenMpi: How to specify login name in machinefile
passed to mpirun
To: Open MPI Users <us...@open-mpi.org>


Yeah, it was a bug in the parser - fix scheduled for 1.4.2 release.

Thanks!
Ralph

On Mar 11, 2010, at 4:32 AM, Olivier Riff wrote:

Hello Ralph,

Thanks for your quick reply.
Sorry, I did not mention the version: it is v1.4 (which indeed is not
the very latest one).
I would appreciate it if you could run a short test.

Thanks and Regards,

Olivier

2010/3/10 Ralph Castain <r...@open-mpi.org>

> Probably a bug - I don't recall if/when anyone actually tested that code
> path. I'll have a look...probably in the hostfile parser.
>
> What version are you using?
>
> On Mar 10, 2010, at 8:26 AM, Olivier Riff wrote:
>
> Oops sorry I made the test too fast: it still does not work properly with
> several logins:
>
> I start on user1's machine:
> mpirun -np 2 --machinefile machinefile.txt MyProgram
>
> with machinefile:
> user1@machine1 slots=1
> user2@machine2 slots=1
>
> and I got:
> a password prompt for user1@machine2 ?! (there is no user1 account on
> machine2...)
>
> My problem is still open... why is there a connection attempt to machine2
> with user1?
> Does anybody have an explanation?
>
> Thanks,
>
> Olivier
>
>
> 2010/3/10 Olivier Riff <olir...@googlemail.com>
>
>> OK, it works now thanks. I forgot to add the slots information in the
>> machinefile.
>>
>> Cheers,
>>
>> Olivier
>>
>>
>>
>> 2010/3/10 Ralph Castain <r...@open-mpi.org>
>>
>> It is the exact same syntax inside of the machinefile:
>>>
>>> user1@machine1 slots=4
>>> user2@machine2 slots=3
>>> 
>>>
>>>
>>> On Mar 10, 2010, at 5:41 AM, Olivier Riff wrote:
>>>
>>> > Hello,
>>> >
>>> > I am using openmpi on several machines which have different user
>>> accounts and I cannot find a way to specify the login for each machine in
>>> the machinefile passed to mpirun.
>>> > The only solution I found is to use the -host argument of mpirun, such
>>> as:
>>> > mpirun -np 2 --host user1@machine1,user2@machine2 MyProgram
>>> > which is very inconvenient with a lot of machines.
>>> >
>>> > Is there a way to do the same using a machinefile text? :
>>> > mpirun -np 2 -machinefile machinefile.txt MyProgram
>>> >
>>> > I cannot find the appropriate syntax for specifying a user in
>>> machinefile.txt...
>>> >
>>> > Thanks in advance,
>>> >
>>> > Olivier
>>> >
>>> > ___
>>> > users mailing list
>>> > us...@open-mpi.org
>>> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>>
>>>
>>
>>
>
>
>
>






Re: [OMPI users] Display in terminal of error message using throw std::runtime_error on distant node...

2010-09-24 Thread Olivier Riff
That is already an answer that makes sense. I understand that it is really
not a trivial issue. I have seen other recent threads about "running on
crashed nodes", and that the Open MPI team is working hard on it. We will
wait, and will be glad to test the first versions when they are released
(I understand it will take some time).

Thanks for this quick reply,

Olivier

2010/9/24 Jeff Squyres <jsquy...@cisco.com>

> Open MPI's fault tolerance is still somewhat rudimentary; it's a complex
> topic within the entire scope of MPI.  There has been much research into MPI
> and fault tolerance over the years; the MPI Forum itself is grappling with
> terms and definitions that make sense.  It's by no means a "solved" problem.
>
> It's unfortunately unsurprising that Open MPI may hang in the case of a
> node crash.  I wish that I had a better answer for you, but I don't.  :-\
>
>
> On Sep 24, 2010, at 3:36 AM, Olivier Riff wrote:
>
> > Hello,
> >
> > My question concerns the display of error message generated by a throw
> std::runtime_error("Explicit error message").
> > I am launching on a terminal an openMPI program on several machines
> using:
> > mpirun -v -machinefile MyMachineFile.txt MyProgram.
> > I am wondering why I cannot see an error message displayed on the
> terminal when one of my distant node (meaning not the node where the
> terminal is used) is crashing. I was expecting that following try catch
> could also generates a display in the terminal:
> > try { ...my code where a crash happens... }
> > catch (...) {
> >   throw std::runtime_error( "Explicit error message" );
> > }
> >
> > Generally, my problem is that one of the node crashes and the global
> application waits forever data from this node. On the terminal, nothing is
> displayed indicating that the node has crashed and generated a useful
> information of the crash nature.
> >
> > ( I don't think these information are relevant here, but just in case: I
> am using openMPI 1.4.2, on a Mandriva 2008 system )
> >
> > Thanks in advance for any help/info/advice.
> >
> > Olivier
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
>


[OMPI users] Display in terminal of error message using throw std::runtime_error on distant node...

2010-09-24 Thread Olivier Riff
Hello,

My question concerns the display of error messages generated by throwing
std::runtime_error("Explicit error message").
I launch an Open MPI program on several machines from a terminal using:
mpirun -v -machinefile MyMachineFile.txt MyProgram.
I am wondering why I cannot see an error message displayed in the terminal
when one of my remote nodes (meaning not the node where the terminal is
used) crashes. I was expecting that the following try/catch would also
produce output in the terminal:
try { ...my code where a crash happens... }
catch (...) {
  throw std::runtime_error( "Explicit error message" );
}

Generally, my problem is that one of the nodes crashes and the global
application waits forever for data from that node. Nothing is displayed in
the terminal indicating that the node has crashed, nor any useful
information about the nature of the crash.

( I don't think this information is relevant here, but just in case: I am
using Open MPI 1.4.2 on a Mandriva 2008 system )

Thanks in advance for any help/info/advice.

Olivier


Re: [OMPI users] General question on the implementation ofa"scheduler" on client side...

2010-05-23 Thread Olivier Riff
2010/5/21 Jeff Squyres <jsquy...@cisco.com>

> On May 21, 2010, at 3:13 AM, Olivier Riff wrote:
>
> > -> That is what I was thinking about implementing. As you mentioned, and
> specifically for my case where I mainly send short messages, there might not
> be much of a win. By the way, are there any benchmarks comparing sequential
> MPI_Isend against MPI_Bcast, for instance? The aim is to determine above
> which buffer size an MPI_Bcast is worth using in my case. You can answer
> that the test is easy to do and that I can test it by myself :)
>
> "It depends".  :-)
>
> You're probably best writing a benchmark yourself that mirrors what your
> application is going to do.
>


OK, you're right I'll do it. Thanks again for your support.


Regards,

Olivier



>
>
>
>


Re: [OMPI users] General question on the implementation of a"scheduler" on client side...

2010-05-21 Thread Olivier Riff
Hello Jeff,

thanks for your detailed answer.

2010/5/20 Jeff Squyres <jsquy...@cisco.com>

> You're basically talking about implementing some kind of
> application-specific protocol.  A few tips that may help in your design:
>
> 1. Look into MPI_Isend / MPI_Irecv for non-blocking sends and receives.
>  These may be particularly useful in the server side, so that it can do
> other stuff while sends and receives are progressing.
>

-> You are definitely right, I have to switch to non-blocking sends and
receives.


>
> 2. You probably already noticed that collective operations (broadcasts and
> the link) need to be invoked by all members of the communicator.  So if you
> want to do a broadcast, everyone needs to know.  That being said, you can
> send a short message to everyone alerting them that a longer broadcast is
> coming -- then they can execute MPI_BCAST, etc.  That works best if your
> broadcasts are large messages (i.e., you benefit from scalable
> implementations of broadcast) -- otherwise you're individually sending short
> messages followed by a short broadcast.  There might not be much of a "win"
> there.
>

-> That is what I was thinking about implementing. As you mentioned, and
specifically for my case where I mainly send short messages, there might not
be much of a win. By the way, are there any benchmarks comparing sequential
MPI_Isend against MPI_Bcast, for instance? The aim is to determine above
which buffer size an MPI_Bcast is worth using in my case. You can answer
that the test is easy to do and that I can test it by myself :)


>
> 3. FWIW, the MPI Forum has introduced the concept of non-blocking
> collective operations for the upcoming MPI-3 spec.  These may help; google
> for libnbc for a (non-optimized) implementation that may be of help to you.
>  MPI implementations (like Open MPI) will feature non-blocking collectives
> someday in the future.
>
>
-> Interesting to know and to keep in mind. Sometimes the future is really
near.

Thanks again for your answer and info.

Olivier



> On May 20, 2010, at 5:30 AM, Olivier Riff wrote:
>
> > Hello,
> >
> > I have a general question about the best way to implement an openmpi
> application, i.e the design of the application.
> >
> > A machine (I call it the "server") should send to a cluster containing a
> lot of processors (the "clients") regularly task to do (byte buffers from
> very various size).
> > The server should send to each client a different buffer, then wait for
> each client answers (buffer sent by each client after some processing), and
> retrieve the result data.
> >
> > First I made something looking like this.
> > On the server side: Send sequentially to each client buffers using
> MPI_Send.
> > On each client side: loop which waits a buffer using MPI_Recv, then
> process the buffer and sends the result using MPI_Send
> > This is really not efficient because a lot of time is lost due to the
> fact that the server sends and receives sequentially the buffers.
> > It only has the advantage to have on the client size a pretty easy
> scheduler:
> > Wait for buffer (MPI_Recv) -> Analyse it -> Send result (MPI_Send)
> >
> > My wish is to mix MPI_Send/MPI_Recv and other mpi functions like
> MPI_BCast/MPI_Scatter/MPI_Gather... (like I imagine every mpi application
> does).
> > The problem is that I cannot find a easy solution in order that each
> client knows which kind of mpi function is currently called by the server.
> If the server calls MPI_BCast the client should do the same. Sending at each
> time a first message to indicate the function the server will call next does
> not look very nice. Though I do not see an easy/best way to implement an
> "adaptative" scheduler on the client side.
> >
> > Any tip, advice, help would be appreciate.
> >
> >
> > Thanks,
> >
> > Olivier
> >
>
>
>
>
>


Re: [OMPI users] Buffer size limit and memory consumption problem on heterogeneous (32 bit / 64 bit) machines

2010-05-20 Thread Olivier Riff
I have done the test with v1.4.2 and indeed it fixes the problem.
Thanks, Nysal.
Thank you also, Terry, for your help. With the fix I no longer need to use
a huge value of btl_tcp_eager_limit (I keep the default value), which
considerably decreases the memory consumption I had before. Everything
works fine now.

Regards,

Olivier

2010/5/20 Olivier Riff <olir...@googlemail.com>

>
>
> 2010/5/20 Nysal Jan <jny...@gmail.com>
>
> This probably got fixed in https://svn.open-mpi.org/trac/ompi/ticket/2386
>> Can you try 1.4.2, the fix should be in there.
>>
>>
>
> I will test it soon (it takes some time to install the new version on each
> node). It would be perfect if it fixes it.
> I will tell you the result asap.
>
> Thanks.
>
> Olivier
>
>
>
>
>
>
>
>> Regards
>> --Nysal
>>
>>
>> On Thu, May 20, 2010 at 2:02 PM, Olivier Riff <olir...@googlemail.com>wrote:
>>
>>> Hello,
>>>
>>> I assume this question has been already discussed many times, but I can
>>> not find on Internet a solution to my problem.
>>> It is about buffer size limit of MPI_Send and MPI_Recv with heterogeneous
>>> system (32 bit laptop / 64 bit cluster).
>>> My configuration is :
>>> open mpi 1.4, configured with: --without-openib --enable-heterogeneous
>>> --enable-mpi-threads
>>> Program is launched a laptop (32 bit Mandriva 2008) which distributes
>>> tasks to do to a cluster of 70 processors  (64 bit RedHat Entreprise
>>> distribution):
>>> I have to send various buffer size from few bytes till 30Mo.
>>>
>>> I tested following commands:
>>> 1) mpirun -v -machinefile machinefile.txt MyMPIProgram
>>> -> crash on client side ( 64 bit RedHat Entreprise ) when sent buffer
>>> size > 65536.
>>> 2) mpirun --mca btl_tcp_eager_limit 3000 -v -machinefile
>>> machinefile.txt MyMPIProgram
>>> -> works but has the effect of generating gigantic memory consumption on
>>> 32 bit machine side after MPI_Recv. Memory consumption goes from 800Mo to
>>> 2,1Go after receiving about 20ko from each 70 clients ( a total of about 1.4
>>> Mo ).  This makes my program crash later because I have no more memory to
>>> allocate new structures. I read in a openmpi forum thread that setting
>>> btl_tcp_eager_limit to a huge value explains this huge memory consumption
>>> when a message sent does not have a preposted ready recv. Also after all
>>> messages have been received and there is no more traffic activity : the
>>> memory consumed remains at 2.1go... and I do not understand why.
>>>
>>> What is the best way to do in order to have a working program which also
>>> has a small memory consumption (the speed performance can be lower) ?
>>> I tried to play with mca paramters btl_tcp_sndbuf and mca btl_tcp_rcvbuf,
>>> but without success.
>>>
>>> Thanks in advance for you help.
>>>
>>> Best regards,
>>>
>>> Olivier
>>>
>>>
>>
>>
>>
>
>


Re: [OMPI users] Buffer size limit and memory consumption problem on heterogeneous (32 bit / 64 bit) machines

2010-05-20 Thread Olivier Riff
2010/5/20 Nysal Jan <jny...@gmail.com>

> This probably got fixed in https://svn.open-mpi.org/trac/ompi/ticket/2386
> Can you try 1.4.2, the fix should be in there.
>
>

I will test it soon (it takes some time to install the new version on each
node). It would be perfect if it fixes it.
I will tell you the result asap.

Thanks.

Olivier







> Regards
> --Nysal
>
>
> On Thu, May 20, 2010 at 2:02 PM, Olivier Riff <olir...@googlemail.com>wrote:
>
>> Hello,
>>
>> I assume this question has been already discussed many times, but I can
>> not find on Internet a solution to my problem.
>> It is about buffer size limit of MPI_Send and MPI_Recv with heterogeneous
>> system (32 bit laptop / 64 bit cluster).
>> My configuration is :
>> open mpi 1.4, configured with: --without-openib --enable-heterogeneous
>> --enable-mpi-threads
>> Program is launched a laptop (32 bit Mandriva 2008) which distributes
>> tasks to do to a cluster of 70 processors  (64 bit RedHat Entreprise
>> distribution):
>> I have to send various buffer size from few bytes till 30Mo.
>>
>> I tested following commands:
>> 1) mpirun -v -machinefile machinefile.txt MyMPIProgram
>> -> crash on client side ( 64 bit RedHat Entreprise ) when sent buffer size
>> > 65536.
>> 2) mpirun --mca btl_tcp_eager_limit 3000 -v -machinefile
>> machinefile.txt MyMPIProgram
>> -> works but has the effect of generating gigantic memory consumption on
>> 32 bit machine side after MPI_Recv. Memory consumption goes from 800Mo to
>> 2,1Go after receiving about 20ko from each 70 clients ( a total of about 1.4
>> Mo ).  This makes my program crash later because I have no more memory to
>> allocate new structures. I read in a openmpi forum thread that setting
>> btl_tcp_eager_limit to a huge value explains this huge memory consumption
>> when a message sent does not have a preposted ready recv. Also after all
>> messages have been received and there is no more traffic activity : the
>> memory consumed remains at 2.1go... and I do not understand why.
>>
>> What is the best way to do in order to have a working program which also
>> has a small memory consumption (the speed performance can be lower) ?
>> I tried to play with mca paramters btl_tcp_sndbuf and mca btl_tcp_rcvbuf,
>> but without success.
>>
>> Thanks in advance for you help.
>>
>> Best regards,
>>
>> Olivier
>>
>>
>
>
>


Re: [OMPI users] Buffer size limit and memory consumption problem on heterogeneous (32 bit / 64 bit) machines

2010-05-20 Thread Olivier Riff
Hello Terry,

Thanks for your answer.

2010/5/20 Terry Dontje <terry.don...@oracle.com>

>  Olivier Riff wrote:
>
> Hello,
>
> I assume this question has been already discussed many times, but I can not
> find on Internet a solution to my problem.
> It is about buffer size limit of MPI_Send and MPI_Recv with heterogeneous
> system (32 bit laptop / 64 bit cluster).
> My configuration is :
> open mpi 1.4, configured with: --without-openib --enable-heterogeneous
> --enable-mpi-threads
> Program is launched a laptop (32 bit Mandriva 2008) which distributes tasks
> to do to a cluster of 70 processors  (64 bit RedHat Entreprise
> distribution):
> I have to send various buffer size from few bytes till 30Mo.
>
>  You really want to get your program running without the tcp_eager_limit
> set if you want a better usage of memory.  I believe the crash has something
> to do with the rendezvous protocol in OMPI.  Have you narrowed this failure
> down to a simple MPI program?  Also I noticed that you're configuring with
> --enable-mpi-threads, have you tried configuring without that option?
>
>
-> No, unfortunately I have not narrowed this behaviour down to a simple
MPI program. I think I will have to do it if I do not find a solution in
the next few days.
I will also run the test without the --enable-mpi-threads configuration.


> I tested following commands:
> 1) mpirun -v -machinefile machinefile.txt MyMPIProgram
> -> crash on client side ( 64 bit RedHat Entreprise ) when sent buffer size
> > 65536.
> 2) mpirun --mca btl_tcp_eager_limit 3000 -v -machinefile
> machinefile.txt MyMPIProgram
> -> works but has the effect of generating gigantic memory consumption on 32
> bit machine side after MPI_Recv. Memory consumption goes from 800Mo to 2,1Go
> after receiving about 20ko from each 70 clients ( a total of about 1.4 Mo
> ).  This makes my program crash later because I have no more memory to
> allocate new structures. I read in a openmpi forum thread that setting
> btl_tcp_eager_limit to a huge value explains this huge memory consumption
> when a message sent does not have a preposted ready recv. Also after all
> messages have been received and there is no more traffic activity : the
> memory consumed remains at 2.1go... and I do not understand why.
>
> Are the 70 clients all on different nodes?  I am curious if the 2.1GB is
> due to the SM BTL or possibly a leak in the TCP BTL.
>

No, the 70 clients are only on 9 nodes. In fact there are 72 clients: nine
8-processor machines.
The 2.1 GB memory consumption appears when I sequentially try to read the
result from each of the 72 clients (a for loop from 1 to 72 calling
MPI_Recv). I assume that many clients have already sent their result
whereas the server has not yet called MPI_Recv for the corresponding rank.


>
> What is the best way to do in order to have a working program which also
> has a small memory consumption (the speed performance can be lower) ?
> I tried to play with mca paramters btl_tcp_sndbuf and mca btl_tcp_rcvbuf,
> but without success.
>
> Thanks in advance for you help.
>
> Best regards,
>
> Olivier
>
>
>
> --
> Terry D. Dontje | Principal Software Engineer
> Developer Tools Engineering | +1.650.633.7054
> Oracle - Performance Technologies
> 95 Network Drive, Burlington, MA 01803
> Email terry.don...@oracle.com
>
>
>


[OMPI users] General question on the implementation of a "scheduler" on client side...

2010-05-20 Thread Olivier Riff
Hello,

I have a general question about the best way to implement an openmpi
application, i.e the design of the application.

A machine (I call it the "server") should regularly send tasks to do (byte
buffers of widely varying size) to a cluster containing a lot of processors
(the "clients").
The server should send each client a different buffer, then wait for each
client's answer (a buffer sent back by each client after some processing),
and retrieve the result data.

First I made something looking like this:
On the server side: send buffers sequentially to each client using MPI_Send.
On each client side: a loop that waits for a buffer using MPI_Recv, then
processes the buffer and sends the result back using MPI_Send.
This is really not efficient, because a lot of time is lost due to the fact
that the server sends and receives the buffers sequentially.
Its only advantage is a pretty simple scheduler on the client side:
wait for buffer (MPI_Recv) -> analyse it -> send result (MPI_Send)

My wish is to mix MPI_Send/MPI_Recv with other MPI functions like
MPI_Bcast/MPI_Scatter/MPI_Gather... (as I imagine every MPI application
does).
The problem is that I cannot find an easy way for each client to know which
MPI function is currently being called by the server. If the server calls
MPI_Bcast, the clients should do the same. Sending a first message each
time to indicate which function the server will call next does not look
very nice, though I do not see an easier/better way to implement an
"adaptive" scheduler on the client side.

Any tip, advice, or help would be appreciated.


Thanks,

Olivier


[OMPI users] Buffer size limit and memory consumption problem on heterogeneous (32 bit / 64 bit) machines

2010-05-20 Thread Olivier Riff
Hello,

I assume this question has already been discussed many times, but I cannot
find a solution to my problem on the Internet.
It is about the buffer size limit of MPI_Send and MPI_Recv on a
heterogeneous system (32-bit laptop / 64-bit cluster).
My configuration is:
Open MPI 1.4, configured with: --without-openib --enable-heterogeneous
--enable-mpi-threads
The program is launched on a laptop (32-bit Mandriva 2008) which
distributes tasks to a cluster of 70 processors (64-bit RedHat Enterprise
distribution).
I have to send various buffer sizes, from a few bytes up to 30 MB.

I tested the following commands:
1) mpirun -v -machinefile machinefile.txt MyMPIProgram
-> crashes on the client side (64-bit RedHat Enterprise) when the sent
buffer size is > 65536.
2) mpirun --mca btl_tcp_eager_limit 3000 -v -machinefile machinefile.txt
MyMPIProgram
-> works, but generates gigantic memory consumption on the 32-bit machine
side after MPI_Recv. Memory consumption goes from 800 MB to 2.1 GB after
receiving about 20 kB from each of the 70 clients (a total of about
1.4 MB). This makes my program crash later because I have no more memory to
allocate new structures. I read in an Open MPI forum thread that setting
btl_tcp_eager_limit to a huge value explains this huge memory consumption
when a sent message does not have a preposted ready recv. Also, after all
messages have been received and there is no more traffic activity, the
memory consumed remains at 2.1 GB... and I do not understand why.

What is the best way to get a working program that also has a small memory
consumption (the speed performance can be lower)?
I tried to play with the MCA parameters btl_tcp_sndbuf and btl_tcp_rcvbuf,
but without success.

Thanks in advance for your help.

Best regards,

Olivier
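[Editor's note: for readers experimenting with the MCA parameters discussed in this thread, they do not have to be repeated on every mpirun command line. Open MPI also reads them from a per-user parameter file or from OMPI_MCA_-prefixed environment variables; the values below are placeholders for experimentation, not recommendations.]

```shell
# Per-user parameter file read by Open MPI at startup:
#   $HOME/.openmpi/mca-params.conf
# containing lines such as:
#   btl_tcp_eager_limit = 65536
#   btl_tcp_sndbuf = 131072

# Equivalent environment-variable form (OMPI_MCA_ prefix):
export OMPI_MCA_btl_tcp_eager_limit=65536
mpirun -v -machinefile machinefile.txt MyMPIProgram
```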


Re: [OMPI users] Question on virtual memory allocated

2010-05-15 Thread Olivier Riff
Thank you, Jeff, for your explanation. It is much clearer now.

Best regards.

Olivier

2010/5/15 Jeff Squyres <jsquy...@cisco.com>

> On May 12, 2010, at 8:19 AM, Olivier Riff wrote:
>
> > What I do not understand is where the value of 2104m for the virtual
> memory comes from.
> > When I add the value of Mem used (777848k) to the value of the cache
> (339184k) : the amount is by far inferior to the Virtual memory (2104m).
> > Are here part of the memory allocated by the clients taken into account ?
>
> No, top only shows the data from one machine.
>
> > Where are physically allocated these 2104m of data ?
>
> They may be in physical memory and may also be swapped out on disk.
>
> Keep in mind that the virtual memory encompasses *all* memory for an
> application -- its code and its data.  Hence, this also includes shared
> libraries (which may be shared amongst several processes on the same
> machine), process-specific instructions, process-specific data, and shared
> process data.
>
>
>
>


[OMPI users] Question on virtual memory allocated

2010-05-12 Thread Olivier Riff
Hello,

My question is about the virtual memory allocated by an Open MPI program. I
am not familiar with memory management, and I would be grateful if you
could explain what I am observing when I launch my Open MPI program on
several machines.

My program is started on a server machine which communicates with 72 client
machines.
When I do a "top" in the Linux shell of the server machine, I observe:

Mem:   2074468k total,   777848k used,  1296628k free,     4224k buffers
Swap:  4192924k total,       52k used,  4192872k free,   339184k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM   TIME+   COMMAND
28211 realtime  20   0 2104m 158m  29m S  100  4.6  1:04.14  MyOpenMPIProgram

What I do not understand is where the value of 2104m for the virtual memory
comes from.
When I add the value of Mem used (777848k) to the value of the cache
(339184k), the total is far below the virtual memory (2104m).
Is part of the memory allocated by the clients taken into account here?
Where are these 2104m of data physically allocated?
I was assuming that a process cannot allocate more than 2 GB of RAM on a
32-bit machine, which would mean that part of these 2104m is located on
disk or somewhere else...

My configuration is:
OpenMPI 1.4 on Mandriva 2008 (32-bit)
The program is started using: mpirun --mca btl_tcp_eager_limit 5000 -v
-machinefile machinefile.txt MyOpenMPIProgram

Thanks in advance for any help/tips (and sorry if this is not completely
related to Open MPI).

Olivier


Re: [OMPI users] OPenMpi: How to specify login name in machinefile passed to mpirun

2010-03-11 Thread Olivier Riff
2010/3/11 Ralph Castain <r...@open-mpi.org>

> Yeah, it was a bug in the parser - fix scheduled for 1.4.2 release.
>
> Thanks!
> Ralph
>


OK, thanks Ralph for the test and the quick analysis.

Regards,

Olivier




>
> On Mar 11, 2010, at 4:32 AM, Olivier Riff wrote:
>
> Hello Ralph,
>
> Thanks for your quick reply.
> Sorry, I did not mention the version: it is v1.4 (which indeed is not
> the very latest one).
> I would appreciate it if you could run a short test.
>
> Thanks and Regards,
>
> Olivier
>
> 2010/3/10 Ralph Castain <r...@open-mpi.org>
>
>> Probably a bug - I don't recall if/when anyone actually tested that code
>> path. I'll have a look...probably in the hostfile parser.
>>
>> What version are you using?
>>
>> On Mar 10, 2010, at 8:26 AM, Olivier Riff wrote:
>>
>> Oops sorry I made the test too fast: it still does not work properly with
>> several logins:
>>
>> I start on user1's machine:
>> mpirun -np 2 --machinefile machinefile.txt MyProgram
>>
>> with machinefile:
>> user1@machine1 slots=1
>> user2@machine2 slots=1
>>
>> and I got :
>> user1@machine2 password prompt ?! (there is no user1 account on
>> machine2...)
>>
>> My problem is still open... why is there a connection attempt to machine2
>> with user1 ...
>> Has somebody an explanation ?
>>
>> Thanks,
>>
>> Olivier
>>
>>
>> 2010/3/10 Olivier Riff <olir...@googlemail.com>
>>
>>> OK, it works now thanks. I forgot to add the slots information in the
>>> machinefile.
>>>
>>> Cheers,
>>>
>>> Olivier
>>>
>>>
>>>
>>> 2010/3/10 Ralph Castain <r...@open-mpi.org>
>>>
>>> It is the exact same syntax inside of the machinefile:
>>>>
>>>> user1@machine1 slots=4
>>>> user2@machine2 slots=3
>>>> 
>>>>
>>>>
>>>> On Mar 10, 2010, at 5:41 AM, Olivier Riff wrote:
>>>>
>>>> > Hello,
>>>> >
>>>> > I am using openmpi on several machines which have different user
>>>> accounts and I cannot find a way to specify the login for each machine in
>>>> the machinefile passed to mpirun.
>>>> > The only solution I found is to use the -host argument of mpirun, such
>>>> as:
>>>> > mpirun -np 2 --host user1@machine1,user2@machine2 MyProgram
>>>> > which is very inconvenient with a lot of machines.
>>>> >
>>>> > Is there a way to do the same using a machinefile text? :
>>>> > mpirun -np 2 -machinefile machinefile.txt MyProgram
>>>> >
>>>> > I cannot find the appropriate syntax for specifying a user in
>>>> machinefile.txt...
>>>> >
>>>> > Thanks in advance,
>>>> >
>>>> > Olivier
>>>> >
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
>>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>


Re: [OMPI users] OpenMPI: How to specify login name in machinefile passed to mpirun

2010-03-11 Thread Olivier Riff
Hello Ralph,

Thanks for your quick reply.
Sorry, I did not mention the version: it is v1.4 (which indeed is not the
very latest).
I would appreciate it if you could run a short test.

Thanks and Regards,

Olivier



Re: [OMPI users] OpenMPI: How to specify login name in machinefile passed to mpirun

2010-03-10 Thread Olivier Riff
Oops, sorry, I ran the test too quickly: it still does not work properly with
several logins.

I start on user1's machine:
mpirun -np 2 --machinefile machinefile.txt MyProgram

with machinefile:
user1@machine1 slots=1
user2@machine2 slots=1

and I get a password prompt for user1@machine2?! (there is no user1 account
on machine2...)

My problem is still open: why is there a connection attempt to machine2
with user1?
Does somebody have an explanation?

Thanks,

Olivier
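[Editor's note] For illustration only (this is not Open MPI's code): the symptom above, user1 being reused for machine2, is what you would see if the hostfile parser kept only the first username it read. A correct per-line parse, sketched in Python with a hypothetical `parse_machinefile` helper, must keep a separate username for each entry:

```python
import re

# Hypothetical sketch of a per-line machinefile parse; not Open MPI's
# actual hostfile parser. Each line has the form "user@host slots=N",
# where both "user@" and "slots=N" are optional.
LINE_RE = re.compile(
    r"^(?:(?P<user>[^@\s]+)@)?(?P<host>\S+)(?:\s+slots=(?P<slots>\d+))?\s*$"
)

def parse_machinefile(text):
    """Return one (user, host, slots) tuple per non-empty line."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        m = LINE_RE.match(line)
        if m is None:
            raise ValueError("unparseable machinefile line: %r" % line)
        # The buggy behavior corresponds to reusing the first entry's
        # user for every host; a correct parse keeps the per-line user.
        entries.append((m["user"], m["host"], int(m["slots"] or 1)))
    return entries

print(parse_machinefile("user1@machine1 slots=1\nuser2@machine2 slots=1"))
# → [('user1', 'machine1', 1), ('user2', 'machine2', 1)]
```

Each launch connection should then use the user from its own tuple; a password prompt for user1@machine2 means the second entry's username was ignored.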




Re: [OMPI users] OpenMPI: How to specify login name in machinefile passed to mpirun

2010-03-10 Thread Olivier Riff
OK, it works now, thanks. I forgot to add the slots information in the
machinefile.

Cheers,

Olivier



2010/3/10 Ralph Castain <r...@open-mpi.org>

> It is the exact same syntax inside of the machinefile:
>
> user1@machine1 slots=4
> user2@machine2 slots=3


[OMPI users] OpenMPI: How to specify login name in machinefile passed to mpirun

2010-03-10 Thread Olivier Riff
Hello,

I am using Open MPI on several machines which have different user accounts,
and I cannot find a way to specify the login for each machine in the
machinefile passed to mpirun.
The only solution I found is to use the -host argument of mpirun, such as:
mpirun -np 2 --host user1@machine1,user2@machine2 MyProgram
which is very inconvenient with a lot of machines.

Is there a way to do the same using a machinefile?:
mpirun -np 2 -machinefile machinefile.txt MyProgram

I cannot find the appropriate syntax for specifying a user in
machinefile.txt...

Thanks in advance,

Olivier
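[Editor's note] The --host workaround above can be scripted so it scales to many machines. The helper below is a hypothetical sketch (not part of Open MPI) that expands machinefile-style lines into the --host value; it assumes the common mpirun behavior that repeating a host in --host assigns it one slot per repetition:

```python
def hosts_arg_from_machinefile(text):
    """Build the comma-separated value for mpirun --host from
    machinefile-style lines ("user@host slots=N"), repeating each
    entry once per slot so slot counts are preserved."""
    parts = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        fields = line.split()
        target = fields[0]  # "user@host" (or bare "host")
        slots = 1
        for field in fields[1:]:
            if field.startswith("slots="):
                slots = int(field.split("=", 1)[1])
        parts.extend([target] * slots)
    return ",".join(parts)

machinefile = "user1@machine1 slots=1\nuser2@machine2 slots=2\n"
print(hosts_arg_from_machinefile(machinefile))
# → user1@machine1,user2@machine2,user2@machine2
```

One could then run, for example, mpirun -np 3 --host "$(python expand_hosts.py)" MyProgram, where expand_hosts.py is a hypothetical wrapper around this helper.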