Re: [OMPI users] File locking in ADIO, OpenMPI 1.6.4

2014-09-18 Thread Rob Latham
On 09/18/2014 04:56 PM, Beichuan Yan wrote: Rob, Thank you very much for the suggestion. There are two independent scenarios using parallel IO in my code: 1. MPI processes conditionally print, i.e., some processes print in current loop (but may not print in next loop), some processes do

Re: [OMPI users] About debugging and asynchronous communication

2014-09-18 Thread XingFENG
Thanks for your advice. I added tags to the messages in ascending order, but that didn't work either. For example, after 103043 communications, the sender side sends an int 78 with tag 206086, followed by 78 bytes of data with tag 206087. On the receiver side, it receives an int 41 with
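The tag values quoted in this message (206086 and 206087 after 103043 communications) are consistent with a scheme that uses tag 2n for the length message and 2n+1 for the payload of the n-th exchange. That scheme is an assumption here; the post does not show the code. The sketch below is a hypothetical helper, not part of MPI, that computes such a pair while keeping it inside the tag range. The MPI standard only guarantees that the MPI_TAG_UB attribute is at least 32767, so the helper wraps around once the range is exhausted.

```cpp
#include <cassert>

// Hypothetical helper (not an MPI API): the two tags used by one exchange.
struct TagPair {
    int length_tag;   // tag of the message that carries the payload size
    int payload_tag;  // tag of the message that carries the payload bytes
};

// n      : zero-based index of the exchange between one sender/receiver pair
// tag_ub : value of the MPI_TAG_UB attribute; the default is the minimum
//          the MPI standard guarantees (32767), an assumption for portability
inline TagPair tags_for_exchange(long long n, int tag_ub = 32767) {
    assert(n >= 0 && tag_ub >= 1);
    // Number of complete (even, odd) tag pairs that fit in [0, tag_ub].
    const long long pairs = (static_cast<long long>(tag_ub) + 1) / 2;
    // Wrap around so the tags never exceed tag_ub.
    const long long slot = n % pairs;
    return TagPair{static_cast<int>(2 * slot), static_cast<int>(2 * slot + 1)};
}
```

Wrapping is only safe while fewer than `pairs` exchanges between the same two processes are outstanding at once; otherwise two in-flight messages could share a tag again.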

Re: [OMPI users] File locking in ADIO, OpenMPI 1.6.4

2014-09-18 Thread Beichuan Yan
Rob, Thank you very much for the suggestion. There are two independent scenarios using parallel IO in my code: 1. MPI processes conditionally print, i.e., some processes print in current loop (but may not print in next loop), some processes do not print in current loop (but may print next

Re: [OMPI users] About debugging and asynchronous communication

2014-09-18 Thread Gus Correa
There is no guarantee that the messages will be received in the same order that they were sent. Use tags or another mechanism to match the messages on send and recv ends. On 09/18/2014 10:42 AM, XingFENG wrote: I have found something strange. Basically, in my code, processes send and receive

Re: [OMPI users] File locking in ADIO, OpenMPI 1.6.4

2014-09-18 Thread Rob Latham
On 09/17/2014 05:46 PM, Beichuan Yan wrote: Hi Rob, As you pointed out in April, there are many cases that could cause an ADIOI_Set_lock error. My code writes to a file at a location specified by a shared file pointer (it is a blocking and collective call):

Re: [OMPI users] About debugging and asynchronous communication

2014-09-18 Thread Nick Papior Andersen
If all your send/recv are asynchronous you should ensure different tags to decipher between messages. It could be that you have the same tag for two different asynchronous sends, in which case there is a race condition for the receiving end. Also, if you know the upper bound of your messages I

Re: [OMPI users] About debugging and asynchronous communication

2014-09-18 Thread XingFENG
I have found something strange. Basically, in my code, processes send and receive variable-length messages to/from each other asynchronously. When sending a message, a process first sends the length of the message and then its content. When receiving, a process would first
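The protocol described in this message uses two messages per logical send, so the receiver has to pair each length with the right payload. One alternative, which nobody in the thread proposed and which is sketched here only as an assumption, is to place the length header and the payload in a single buffer and send that buffer as one message (for example as MPI_BYTE, with the receiver sizing its buffer via MPI_Probe and MPI_Get_count). The helpers below are hypothetical and use no MPI calls; they only show the framing and the consistency check.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helpers: a frame is a 4-byte length header (host byte order,
// which assumes all ranks share the same endianness) followed by the payload.
inline std::vector<char> pack_frame(const std::string& payload) {
    assert(payload.size() <= UINT32_MAX);  // header can only describe 32-bit sizes
    const std::uint32_t len = static_cast<std::uint32_t>(payload.size());
    std::vector<char> frame(sizeof(len) + payload.size());
    std::memcpy(frame.data(), &len, sizeof(len));
    if (!payload.empty()) {
        std::memcpy(frame.data() + sizeof(len), payload.data(), payload.size());
    }
    return frame;
}

// Recovers the payload and rejects frames whose header disagrees with the
// number of bytes that actually arrived.
inline std::string unpack_frame(const std::vector<char>& frame) {
    std::uint32_t len = 0;
    if (frame.size() < sizeof(len)) {
        throw std::runtime_error("frame shorter than its header");
    }
    std::memcpy(&len, frame.data(), sizeof(len));
    if (frame.size() - sizeof(len) != len) {
        throw std::runtime_error("length header does not match payload size");
    }
    return std::string(frame.data() + sizeof(len), len);
}
```

Because the length travels with the data, a mismatch shows up as an exception on the receiving side rather than as a truncation error inside the MPI library.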

Re: [OMPI users] How does binding option affect network traffic?

2014-09-18 Thread McGrattan, Kevin B. Dr.
Yes and no. When I ran a single job that uses 16 MPI processes, and I mapped by socket and used 8 nodes, 2 ppn, the job ran 30% faster than the same job mapped by core on 2 nodes. Each process was fairly CPU intensive compared to the communication, so I suspect that the speed up was due to the

Re: [OMPI users] Mpirun 1.5.4 problems when request > 28 slots (updated findings)

2014-09-18 Thread Jeff Squyres (jsquyres)
Lane -- Can you confirm that adding numactl-devel and using --hetero-nodes fixed your problem? On Sep 2, 2014, at 5:08 PM, Ralph Castain wrote: > Argh - yeah, I got confused as things context switched a few too many times. > The 1.8.2 release should certainly understand

Re: [OMPI users] How does binding option affect network traffic?

2014-09-18 Thread Jeff Squyres (jsquyres)
On Sep 5, 2014, at 11:49 PM, Ralph Castain wrote: > It would be about the worst thing you can do, to be honest. Reason is that > each socket is typically a separate NUMA region, and so the shared memory > system would be sub-optimized in that configuration. It would be much

Re: [hwloc-users] problem with X11 using Solaris

2014-09-18 Thread Siegmar Gross
Hi Brice, > I just pushed a fix. Can you verify that this tarball enables X > automatically and properly? Yes, it works fine. Thank you very much for your help. Kind regards Siegmar https://ci.inria.fr/hwloc/job/master-0-tarball/lastSuccessfulBuild/artifact/hwlo

Re: [OMPI users] About debugging and asynchronous communication

2014-09-18 Thread XingFENG
Thank you for your reply! I am still working on my code. I will update the post when I fix the bugs. On Thu, Sep 18, 2014 at 9:48 PM, Nick Papior Andersen wrote: > I just checked, if the tests return "Received" for all messages it will > not go into memory burst. > Hence

Re: [OMPI users] About debugging and asynchronous communication

2014-09-18 Thread Nick Papior Andersen
I just checked, if the tests return "Received" for all messages it will not go into memory burst. Hence doing MPI_Test will be enough. :) Hence, if at any time the mpi-layer is notified about the success of a send/recv it will clean up. This makes sense :) See the updated code. 2014-09-18 13:39

Re: [OMPI users] About debugging and asynchronous communication

2014-09-18 Thread Tobias Kloeffel
OK, I have to wait until tomorrow, they have some problems with the network... On 09/18/2014 01:27 PM, Nick Papior Andersen wrote: I am not sure whether test will cover this... You should check it... I attach here my example script, which shows two working cases and one not working (you

Re: [OMPI users] About debugging and asynchronous communication

2014-09-18 Thread Nick Papior Andersen
I am not sure whether test will cover this... You should check it... I attach here my example script, which shows two working cases and one not working (you can check the memory usage simultaneously and see that the first two work, while the last one goes ballistic in memory). Just check it with

Re: [OMPI users] About debugging and asynchronous communication

2014-09-18 Thread XingFENG
Thanks very much for your reply! To Sir Jeff Squyres: I think it fails due to truncation errors. I am now logging information about each send and receive to find out the reason. To Sir Nick Papior Andersen: Oh, wait (mpi_wait) is never called in my code. What I do is call MPI_Irecv once.

Re: [OMPI users] About debugging and asynchronous communication

2014-09-18 Thread Nick Papior Andersen
In complement to Jeff, I would add that using asynchronous messages REQUIRES that you wait (mpi_wait) for all messages at some point. Even though this might not seem obvious, it is due to memory allocations "behind the scenes", which are only de-allocated upon completion through a wait statement.

Re: [OMPI users] About debugging and asynchronous communication

2014-09-18 Thread Jeff Squyres (jsquyres)
On Sep 18, 2014, at 2:43 AM, XingFENG wrote: > a. How to get more information about errors? I got errors like below. This > says that program exited abnormally in function MPI_Test(). But is there a > way to know more about the error? > > *** An error occurred in

Re: [hwloc-users] problem with X11 using Solaris

2014-09-18 Thread Brice Goglin
Thanks, I just pushed a fix. Can you verify that this tarball enables X automatically and properly? https://ci.inria.fr/hwloc/job/master-0-tarball/lastSuccessfulBuild/artifact/hwloc-master-20140918.1131.git005a7e8.tar.gz I am looking at the warnings and make check failures you sent. Brice Le

Re: [OMPI users] About debugging and asynchronous communication

2014-09-18 Thread XingFENG
By the way, I am using Open MPI 1.6.5 and programming with C++. On Thu, Sep 18, 2014 at 4:43 PM, XingFENG wrote: > Dear all, > > I am new to MPI. Please forgive me if I ask a redundant question. > > I am now programming about graph processing using MPI. I get two

[OMPI users] About debugging and asynchronous communication

2014-09-18 Thread XingFENG
Dear all, I am new to MPI. Please forgive me if I ask a redundant question. I am now writing a graph processing program using MPI. I have two problems, as described below. a. How can I get more information about errors? I got errors like the one below. This says that the program exited abnormally in function