Re: [OMPI users] Question on staging in checkpoint

2010-09-13 Thread Joshua Hursey
Adjust the 'filem_rsh_max_incomming' parameter:
 http://osl.iu.edu/research/ft/ompi-cr/api.php#mca-filem_rsh_max_incomming

I defaulted this MCA parameter to 10 since, depending on how big each 
individual checkpoint is, sending them all at once is often worse than 
sending only a window of them at a time. I would recommend trying a few 
different values for this parameter and seeing the impact it has both on 
checkpoint overhead (additional application overhead) and checkpoint 
latency (the time it takes for the checkpoint to completely finish).
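
For your case of launching all 128 transfers at once, you could set the 
window equal to the number of images, either on the mpirun command line or 
persistently in an MCA parameter file. A sketch (the -am ft-enable-cr 
switch is the usual way to enable checkpoint/restart support in the 1.5 
series; adjust names and paths to your setup):

  mpirun -am ft-enable-cr -mca filem_rsh_max_incomming 128 -np 128 ./your_app

  # or, persistently, add to $HOME/.openmpi/mca-params.conf:
  filem_rsh_max_incomming = 128

Whether that actually helps is exactly the overhead/latency trade-off 
described above, so measure before settling on a value.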

-- Josh

On Sep 13, 2010, at 7:42 PM, Ananda Mudar wrote:

> Hi
>  
> I was trying out the staging option in checkpoint, where I save the checkpoint 
> image on the local file system and have the image transferred to the global 
> file system in the background. As part of the background process I see that 
> the "scp" command is launched to transfer the images from the local file 
> system to the global file system. I am using openmpi-1.5rc6 with BLCR 0.8.2.
>  
> In my experiment, I had about 128 cores save their respective checkpoint 
> images on the local file system. During the background process, I see that 
> only 10 "scp" requests are sent at a time. Is this a configurable parameter? 
> Since these commands run on their respective nodes, how can I launch all 128 
> scp requests (to take care of all 128 images in my experiment) simultaneously?
>  
> Thanks
> Ananda


Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://www.cs.indiana.edu/~jjhursey




[OMPI users] Question on staging in checkpoint

2010-09-13 Thread ananda.mudar
Hi



I was trying out the staging option in checkpoint, where I save the
checkpoint image on the local file system and have the image transferred to
the global file system in the background. As part of the background process I
see that the "scp" command is launched to transfer the images from the local
file system to the global file system. I am using openmpi-1.5rc6 with BLCR
0.8.2.



In my experiment, I had about 128 cores save their respective
checkpoint images on the local file system. During the background process, I
see that only 10 "scp" requests are sent at a time. Is this a
configurable parameter? Since these commands run on their respective
nodes, how can I launch all 128 scp requests (to take care of all 128
images in my experiment) simultaneously?



Thanks

Ananda




Re: [OMPI users] latency #2

2010-09-13 Thread Eugene Loh

Georges Markomanolis wrote:


Dear all,

Hi again. Using MPI_Ssend seems to be what I was looking for, but I 
would like to know more about MPI_Send.


For example, sending 1 byte with MPI_Send takes 8.69 microsec, but 
with MPI_Ssend it takes 152.9 microsec. I understand the difference, 
but it seems that from a certain message size onward the difference 
is not so big; for example, for 518400 bytes it needs 3515.78 
microsec with MPI_Send and 3584.1 microsec with MPI_Ssend. So is 
there any rule to figure out (of course it depends on the hardware) 
the threshold size after which the difference between the timings 
of MPI_Send and MPI_Ssend is not so big, or at least how to find it 
for my hardware? 


Most MPI implementations choose one strategy for passing short messages 
(sender sends, MPI implementation buffers the message, receiver 
receives) and another for long messages (sender alerts receiver, receiver 
replies, then sender and receiver coordinate the transfer).  The first 
style is "eager" (the sender sends eagerly without waiting for the receiver 
to coordinate) while the second style is "rendezvous" (sender and receiver 
meet).  The message size at which the crossover occurs can be queried 
or changed.  In OMPI, it depends on the BTL.  E.g., try "ompi_info -a | 
grep eager".


Try the OMPI FAQ at http://www.open-mpi.org/faq/ and look at the 
"Tuning" categories.


Re: [OMPI users] computing the latency with OpenMpi

2010-09-13 Thread Eugene Loh

Georges Markomanolis wrote:

I have some questions about the duration of communication with 
MPI_Send and MPI_Recv. I am using either SkaMPI or my own 
implementation to measure the ping-pong (MPI_Send and MPI_Recv) time 
between two nodes for 1 byte and more. The timing of the ping-pong is 
106.8 microseconds, but if I measure only the ping part of the message 
(only the MPI_Send), the time is ~20 microseconds. Could anyone explain 
to me why it is not half? I would like to understand the 
difference inside Open MPI between MPI_Send and MPI_Recv.


The time for the MPI_Send is the time to move the data out of the user's 
send buffer.  It is quite possible that the data has not yet gotten to 
the destination.  If the message is short, it could be buffered 
somewhere by the MPI implementation.


The time for MPI_Recv probably includes some amount of waiting time.
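
Since there is no clock you can stop at the moment the data arrives, the 
number usually quoted as "latency" is half of a many-iteration round trip. 
A bare-bones sketch of that measurement (mine, not from this thread; the 
iteration count is arbitrary):

#include <mpi.h>
#include <stdio.h>

enum { REPS = 1000 };

int main(int argc, char **argv)
{
    int rank;
    char byte = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < REPS; i++) {
        if (rank == 0) {                /* ping ... */
            MPI_Send(&byte, 1, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&byte, 1, MPI_BYTE, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {         /* ... pong */
            MPI_Recv(&byte, 1, MPI_BYTE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(&byte, 1, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
        }
    }
    if (rank == 0)
        printf("one-way latency (half round trip): %.2f microsec\n",
               (MPI_Wtime() - t0) * 1e6 / (2.0 * REPS));
    MPI_Finalize();
    return 0;
}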



More analytically, the timings for a ping-pong between two nodes with a 
simple ping-pong application, timings only for rank 0 (almost the same 
for rank 1):

      size (bytes)   MPI_Send (microsec)   MPI_Recv (microsec)
                 1                  9                    86.4
              1600                 14.7                 197.07
              3200                 19.73                227.6
            518400               3536.5                5739.6
           1049760               8020.33              10287


So the duration of the MPI_Send is until the buffer reaches the queue 
of the destination, without the message necessarily being saved in 
memory, or something like this, right? 


It is possible that the data has gone not to the destination but only 
to some intermediate buffer; in any case, yes, the message may not have 
made it all the way to the receive buffer by the time the MPI_Send 
has finished.


So if I want to know the real time of sending one message to another 
node (taking half of the ping-pong time seems not to be right)


It is not clear to me what "the real time" is.  I don't think there is 
any well-defined answer.  It depends on what you're really looking for, 
and that is unclear to me.  You could issue many sends to many receivers 
and see how fast a process can emit sends.  You could use a profiler to 
see how the MPI implementation spends its time;  I've had some success 
using Oracle Studio Performance Analyzer on OMPI.  You could use 
the PERUSE instrumentation inside of OMPI to get timestamps on 
particular internal events.  You could try designing other experiments.  
But which one is "right" could be debated.


Why does it matter?  What are you really looking for?


should I use a program with other calls like MPI_Win_fence, MPI_Put, etc.?


Those are a different set of calls (one-sided operations) that could be 
more or less efficient than Send/Recv.  It varies.
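
For reference, a minimal one-sided sketch (an illustration of the standard 
MPI-2 calls, not something tuned for performance or taken from this thread) 
that moves one int from rank 0 to rank 1 with MPI_Put bracketed by 
MPI_Win_fence:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, x = 42, landing = 0;
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* every rank exposes 'landing' as a one-sided window */
    MPI_Win_create(&landing, sizeof(int), sizeof(int), MPI_INFO_NULL,
                   MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);              /* open the access epoch */
    if (rank == 0)
        MPI_Put(&x, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
    MPI_Win_fence(0, win);              /* close it; data is now visible */

    if (rank == 1)
        printf("rank 1 received %d\n", landing);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}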


Or is there any flag for when I execute the application so that MPI_Send 
behaves as I would expect? According to the MPI standard, what does 
timing MPI_Send actually measure? If there is any article which explains 
all this, please let me know. 


MPI_Send completes when the data has left the send buffer and that 
buffer can be reused by the application.  There are many implementation 
choices.  Specifically, it is possible that the MPI_Send will complete 
even before the MPI_Recv has started.  But it is also possible that the 
MPI_Send will not complete until after the MPI_Recv has completed.  It 
depends on the implementation, which may choose a strategy based on the 
message size, the interconnect, and other factors.
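
One way to observe this directly (a sketch; it assumes a 4-byte message 
falls under the eager limit of your BTL, which is typical but not 
guaranteed) is to delay the receiver: MPI_Send usually returns almost 
immediately, while swapping in MPI_Ssend makes the sender wait out most of 
the receiver's delay:

#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int rank, x = 42;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        double t0 = MPI_Wtime();
        /* swap in MPI_Ssend here and the call waits ~2 s for the match */
        MPI_Send(&x, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        printf("send returned after %.3f s\n", MPI_Wtime() - t0);
    } else if (rank == 1) {
        sleep(2);                       /* receiver posts its receive late */
        MPI_Recv(&x, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
    MPI_Finalize();
    return 0;
}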


Re: [OMPI users] latency #2

2010-09-13 Thread Ashley Pittman

On 13 Sep 2010, at 12:20, Georges Markomanolis wrote:

> Dear all,
> 
> Hi again. Using MPI_Ssend seems to be what I was looking for, but I 
> would like to know more about MPI_Send.
> 
> For example, sending 1 byte with MPI_Send takes 8.69 microsec, but with 
> MPI_Ssend it takes 152.9 microsec. I understand the difference, but it seems 
> that from a certain message size onward the difference is not so big; for 
> example, for 518400 bytes it needs 3515.78 microsec with MPI_Send and 
> 3584.1 microsec with MPI_Ssend.

It sounds like you are measuring send overhead rather than latency. In fact, 
as far as I know it's impossible to measure one-way send latency, as you have 
no way of knowing when to 'stop the clock'; this is why ping-pong latency is 
always quoted.  I suspect the underlying latency of the two sends is very 
similar in practice.

> So is there any rule to figure out (of course it depends on the hardware) 
> the threshold size after which the difference between the timings of 
> MPI_Send and MPI_Ssend is not so big, or at least how to find it for my 
> hardware?

Yes, there is, but I'm not familiar enough with OMPI to be able to tell you; 
I'm sure somebody can, though.  If my suspicion above is correct, I doubt 
that knowing what this value is would help you much in terms of application 
performance.

Ashley.

-- 

Ashley Pittman, Bath, UK.

Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk




[OMPI users] latency #2

2010-09-13 Thread Georges Markomanolis

Dear all,

Hi again. Using MPI_Ssend seems to be what I was looking for, but I 
would like to know more about MPI_Send.


For example, sending 1 byte with MPI_Send takes 8.69 microsec, but with 
MPI_Ssend it takes 152.9 microsec. I understand the difference, but it 
seems that from a certain message size onward the difference is not so 
big; for example, for 518400 bytes it needs 3515.78 microsec with 
MPI_Send and 3584.1 microsec with MPI_Ssend. So is there any rule to 
figure out (of course it depends on the hardware) the threshold size 
after which the difference between the timings of MPI_Send and 
MPI_Ssend is not so big, or at least how to find it for my hardware?


Thanks a lot,
Best regards,
Georges


[OMPI users] computing the latency with OpenMpi

2010-09-13 Thread Georges Markomanolis

Dear all,

I have some questions about the duration of communication with 
MPI_Send and MPI_Recv. I am using either SkaMPI or my own implementation 
to measure the ping-pong (MPI_Send and MPI_Recv) time between two nodes 
for 1 byte and more. The timing of the ping-pong is 106.8 microseconds, 
but if I measure only the ping part of the message (only the MPI_Send), 
the time is ~20 microseconds. Could anyone explain to me why it is not 
half? I would like to understand the difference inside Open MPI between 
MPI_Send and MPI_Recv.


More analytically, the timings for a ping-pong between two nodes with a 
simple ping-pong application, timings only for rank 0 (almost the same for 
rank 1):

      size (bytes)   MPI_Send (microsec)   MPI_Recv (microsec)
                 1                  9                    86.4
              1600                 14.7                 197.07
              3200                 19.73                227.6
            518400               3536.5                5739.6
           1049760               8020.33              10287


So the duration of the MPI_Send is until the buffer reaches the queue of 
the destination, without the message necessarily being saved in memory, 
or something like this, right?  So if I want to know the real time of 
sending one message to another node (taking half of the ping-pong time 
seems not to be right), should I use a program with other calls like 
MPI_Win_fence, MPI_Put, etc.? Or is there any flag for when I execute the 
application so that MPI_Send behaves as I would expect? According to the 
MPI standard, what does timing MPI_Send actually measure? If there is any 
article which explains all this, please let me know.


Thanks a lot,
Best regards,
George Markomanolis


Re: [OMPI users] send message twice

2010-09-13 Thread Srikanth Raju
2010/9/13 김효한 

> Hi all.
>
> I have a problem with sending messages. I want to send 2 messages to
> each node.
>
> for example, send 2 messages to 2 nodes,
>
> if (rank == 0) {
> for (dest = 1; dest < numProcs; dest++) {
> MPI_Send(&a, 1, MPI_INT, dest, 1, MPI_COMM_WORLD);
> MPI_Send(&b, 1, MPI_INT, dest, 2, MPI_COMM_WORLD);
> }
>
> } else {
> MPI_Recv(&a_recv, 1, MPI_INT, 0, 1, MPI_COMM_WORLD, &status);
> MPI_Recv(&b_recv, 1, MPI_INT, 0, 2, MPI_COMM_WORLD, &status);
> }
>
>
On a slightly different note, MPI_Bcast seems to be the right function
to use here.
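
A sketch of that substitution (my own illustration, reusing the variable 
names from the quoted code; note that every rank makes the identical calls 
and, on the non-root ranks, the values arrive directly in a and b rather 
than in a_recv and b_recv):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, a = 0, b = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) { a = 1; b = 2; }    /* root fills in the values */

    /* root 0 sends; all other ranks receive straight into a and b */
    MPI_Bcast(&a, 1, MPI_INT, 0, MPI_COMM_WORLD);
    MPI_Bcast(&b, 1, MPI_INT, 0, MPI_COMM_WORLD);

    printf("rank %d has a=%d b=%d\n", rank, a, b);
    MPI_Finalize();
    return 0;
}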


> but it doesn't work well. Only the first message (sending data "a" to node1)
> is sent successfully; the remaining 3 transmissions (sending data "b" to
> node1 and sending data "a" and "b" to node2) get no response, which seems to
> be a deadlock. There is no runtime error.
>
> Version 1.4.1 has been used.
>
>
> best regards,
> hyo
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>



-- 
Regards,
Srikanth Raju


Re: [OMPI users] send message twice

2010-09-13 Thread Trent Creekmore
I find issues like this are often related to security: a firewall, insufficient
access privileges, SELinux, etc.







From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
Behalf Of 김효한
Sent: Monday, September 13, 2010 12:04 AM
To: us...@open-mpi.org
Subject: [OMPI users] send message twice



Hi all.

I have a problem with sending messages. I want to send 2 messages to
each node.

for example, send 2 messages to 2 nodes,

if (rank == 0) {
for (dest = 1; dest < numProcs; dest++) {
MPI_Send(&a, 1, MPI_INT, dest, 1, MPI_COMM_WORLD);
MPI_Send(&b, 1, MPI_INT, dest, 2, MPI_COMM_WORLD);
}

} else {
MPI_Recv(&a_recv, 1, MPI_INT, 0, 1, MPI_COMM_WORLD, &status);
MPI_Recv(&b_recv, 1, MPI_INT, 0, 2, MPI_COMM_WORLD, &status);
}

but it doesn't work well. Only the first message (sending data "a" to node1)
is sent successfully; the remaining 3 transmissions (sending data "b" to
node1 and sending data "a" and "b" to node2) get no response, which seems to
be a deadlock. There is no runtime error.

Version 1.4.1 has been used.


best regards,
hyo


 



[OMPI users] send message twice

2010-09-13 Thread 김효한
Hi all.
I have a problem with sending messages. I want to send 2 messages to each 
node.
for example, send 2 messages to 2 nodes,
 if (rank == 0) {
 for (dest = 1; dest < numProcs; dest++) {
 MPI_Send(&a, 1, MPI_INT, dest, 1, MPI_COMM_WORLD);
 MPI_Send(&b, 1, MPI_INT, dest, 2, MPI_COMM_WORLD);
 }
 } else {
 MPI_Recv(&a_recv, 1, MPI_INT, 0, 1, MPI_COMM_WORLD, &status);
 MPI_Recv(&b_recv, 1, MPI_INT, 0, 2, MPI_COMM_WORLD, &status);
 }
but it doesn't work well. Only the first message (sending data "a" to node1) 
is sent successfully; the remaining 3 transmissions (sending data "b" to 
node1 and sending data "a" and "b" to node2) get no response, which seems to 
be a deadlock. There is no runtime error.
Version 1.4.1 has been used.
best regards,
hyo