Re: [OMPI users] OpenMPI fails to run with -np larger than 10

2012-04-24 Thread Gutierrez, Samuel K
Hi Ralph, Yes, you are absolutely correct. A user can suppress the warning, however, by simply setting shmem_mmap_enable_nfs_warning to 0. For what it's worth, I just verified that the warning shows itself on Panasas and NFS. Looks like Lustre and GPFS will behave similarly. Sam On Apr 24,
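
For anyone else hitting the same warning, the parameter named above is normally set either on the mpirun command line or in an MCA parameter file; a minimal sketch (the executable name and process count are placeholders):

    shell$ mpirun --mca shmem_mmap_enable_nfs_warning 0 -np 16 ./my_app
    # or persistently, in ~/.openmpi/mca-params.conf:
    shmem_mmap_enable_nfs_warning = 0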

Re: [OMPI users] MPI doesn't recognize multiple cores available on multicore machines

2012-04-24 Thread Kyle Boe
Right, I tried using a hostfile, and it made no difference. This is running OpenMPI 1.4.4 on CentOS 5.x machines. The original issue was an error trap built into my code, where it said one of the cores was asking for information it already owned. I'm sorry to be vague, but I can't share anything

Re: [OMPI users] OpenMPI fails to run with -np larger than 10

2012-04-24 Thread Ralph Castain
I thought we had code in the 1.5 series that would "bark" if the tmp dir was on a network mount? Is that not true? On Apr 24, 2012, at 3:20 PM, Gutierrez, Samuel K wrote: > Hi, > > I just wanted to record the behind the scenes resolution to this particular > issue. For more info, take a look

Re: [OMPI users] MPI doesn't recognize multiple cores available on multicore machines

2012-04-24 Thread Ralph Castain
You don't need a hostfile to run multiple procs on the localhost. What version of OMPI are you using? What was the original issue? On Apr 24, 2012, at 4:07 PM, Jingcha Joba wrote: > Try using slots in hostfile ? > > -- > Sent from my iPhone > > On Apr 24, 2012, at 2:52 PM, Kyle Boe

Re: [OMPI users] MPI doesn't recognize multiple cores available on multicore machines

2012-04-24 Thread Jingcha Joba
Try using slots in hostfile ? -- Sent from my iPhone On Apr 24, 2012, at 2:52 PM, Kyle Boe wrote: > I'm having a problem trying to use OpenMPI on some multicore machines I have. > The code I am running was giving me errors which suggested that MPI was > assigning multiple
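
For reference, a hostfile that spells out the available cores per node typically looks something like the sketch below (hostnames, slot counts, and the executable name are made up):

    # myhosts -- one line per node, declaring how many MPI slots it offers
    node01 slots=4
    node02 slots=4

    shell$ mpirun -np 8 --hostfile myhosts ./my_app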

[OMPI users] MPI doesn't recognize multiple cores available on multicore machines

2012-04-24 Thread Kyle Boe
I'm having a problem trying to use OpenMPI on some multicore machines I have. The code I am running was giving me errors which suggested that MPI was assigning multiple processes to the same core (which I do not want). So, I tried launching my job using the -nooversubscribe option, and I get this

Re: [OMPI users] OpenMPI fails to run with -np larger than 10

2012-04-24 Thread Gutierrez, Samuel K
Hi, I just wanted to record the behind the scenes resolution to this particular issue. For more info, take a look at: https://svn.open-mpi.org/trac/ompi/ticket/3076 It seems as if the problem stems from /tmp being mounted as an NFS space that is shared between the compute nodes. This
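
A rough sketch of the usual workaround for this class of problem is to point the session directory at a node-local filesystem instead of the NFS-mounted /tmp; the orte_tmpdir_base parameter is assumed here, and /local/scratch and the executable name are placeholders:

    shell$ mpirun --mca orte_tmpdir_base /local/scratch -np 16 ./my_app
    # or put "orte_tmpdir_base = /local/scratch" in ~/.openmpi/mca-params.conf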

Re: [OMPI users] Optimal 3-D Cartesian processor mapping

2012-04-24 Thread Jeffrey Squyres
On Apr 24, 2012, at 3:33 PM, Tom Rosmond wrote: > Yes, I would be interested in such a plugin. But be advised that I am > strictly a fortran programmer, so if it requires any C/C++ talent, I > would be in trouble. So maybe, before jumping into that, I would like > to be able to look at what

Re: [OMPI users] MPI_Allreduce hangs

2012-04-24 Thread Brock Palen
To throw in my $0.02, though it is worth less. Were you running this on verbs-based InfiniBand? We see a problem that we have a workaround for, even with the newest 1.4.5, only on IB; we can reproduce it with IMB. You can find an old thread from me about it. Your problem might not be the

Re: [OMPI users] Optimal 3-D Cartesian processor mapping

2012-04-24 Thread Tom Rosmond
Will do. My machine is currently quite busy, so it will be a while before I get answers. Stay tuned. T. Rosmond On Tue, 2012-04-24 at 13:36 -0600, Ralph Castain wrote: > Add --display-map to your mpirun cmd line > > On Apr 24, 2012, at 1:33 PM, Tom Rosmond wrote: > > > Jeff, > > > > Yes, I

Re: [OMPI users] Optimal 3-D Cartesian processor mapping

2012-04-24 Thread Ralph Castain
Add --display-map to your mpirun cmd line On Apr 24, 2012, at 1:33 PM, Tom Rosmond wrote: > Jeff, > > Yes, I would be interested in such a plugin. But be advised that I am > strictly a fortran programmer, so if it requires any C/C++ talent, I > would be in trouble. So maybe, before jumping
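
In other words, something along these lines (process count and executable name are placeholders); mpirun then prints the rank-to-node assignment before launching:

    shell$ mpirun -np 12 --display-map ./my_app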

Re: [OMPI users] Optimal 3-D Cartesian processor mapping

2012-04-24 Thread Tom Rosmond
Jeff, Yes, I would be interested in such a plugin. But be advised that I am strictly a fortran programmer, so if it requires any C/C++ talent, I would be in trouble. So maybe, before jumping into that, I would like to be able to look at what processor/node mapping Open-mpi is actually giving

Re: [OMPI users] Segmentation fault during MPI initialization

2012-04-24 Thread Jeffrey Squyres
That's very odd, indeed -- it's listed as being inside MPI_INIT, but we don't get any further details from there. :-\ Any chance you could try upgrading to OMPI 1.4.5 and/or 1.5.5? On Apr 24, 2012, at 1:57 PM, Jeffrey A Cummings wrote: > I've been having an intermittent failure during MPI

Re: [OMPI users] MPI_Allreduce hangs

2012-04-24 Thread Jeffrey Squyres
Could you repeat your tests with 1.4.5 and/or 1.5.5? On Apr 23, 2012, at 1:32 PM, Martin Siegert wrote: > Hi, > > I am debugging a program that hangs in MPI_Allreduce (openmpi-1.4.3). > An strace of one of the processes shows: > > Process 10925 attached with 3 threads - interrupt to quit >

[OMPI users] Optimal 3-D Cartesian processor mapping

2012-04-24 Thread Tom Rosmond
We have a large ensemble-based atmospheric data assimilation system that does a 3-D Cartesian partitioning of the 'domain' using MPI_DIMS_CREATE, MPI_CART_CREATE, etc. Two of the dimensions are spatial, i.e. latitude and longitude; the third is an 'ensemble' dimension, across which subsets of
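
For readers less familiar with the calls named above, a minimal, generic C sketch of a 3-D decomposition along those lines (the periodicity, the printout, and the program itself are made up; this is not the poster's actual code):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int nprocs, rank, coords[3];
        int dims[3] = {0, 0, 0};      /* 0 = let MPI choose each extent */
        int periods[3] = {0, 0, 0};   /* non-periodic in all three dims */
        MPI_Comm cart;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* Factor nprocs into a 3-D grid (e.g. lat x lon x ensemble). */
        MPI_Dims_create(nprocs, 3, dims);

        /* Build the Cartesian communicator; reorder=1 allows the MPI
           implementation to permute ranks for better placement. */
        MPI_Cart_create(MPI_COMM_WORLD, 3, dims, periods, 1, &cart);

        MPI_Comm_rank(cart, &rank);
        MPI_Cart_coords(cart, rank, 3, coords);
        printf("rank %d -> (%d,%d,%d) of %dx%dx%d\n",
               rank, coords[0], coords[1], coords[2],
               dims[0], dims[1], dims[2]);

        MPI_Comm_free(&cart);
        MPI_Finalize();
        return 0;
    }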

Re: [OMPI users] Ompi-restart failed and process migration

2012-04-24 Thread Josh Hursey
The ~/.openmpi/mca-params.conf file should contain the same information on all nodes. You can install Open MPI as root. However, we do not recommend that you run Open MPI as root. If the user $HOME directory is NFS mounted, then you can use an NFS mounted directory to store your files. With this

Re: [OMPI users] OpenMPI fails to run with -np larger than 10

2012-04-24 Thread Seyyed Mohtadin Hashemi
Hi, I ran those commands and have posted the outputs at: https://svn.open-mpi.org/trac/ompi/ticket/3076 "-mca shmem posix" worked for all -np values (even when oversubscribing); however, sysv did not work for any -np. On Tue, Apr 24, 2012 at 5:36 PM, Gutierrez, Samuel K wrote: > Hi, > >
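
For anyone working around the same issue, forcing the posix shmem component looks roughly like this (process count and executable name are placeholders):

    shell$ mpirun --mca shmem posix -np 24 ./my_app
    # or persistently, in ~/.openmpi/mca-params.conf:
    shmem = posix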

Re: [OMPI users] Segmentation fault during MPI initialization

2012-04-24 Thread Gus Correa
Hi Jeffrey, Assuming you are on Linux, a frequent cause of out-of-nowhere segfaults is a limited/small stack size. They can happen if you [ab]use big automatic arrays, etc. You can set the stack size bigger/unlimited with the ulimit/limit command, or edit /etc/security/limits.conf. Of
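
A rough sketch of the two approaches mentioned above (the limits.conf entries are only an example; csh users would use "limit stacksize unlimited" instead):

    shell$ ulimit -s unlimited        # bash/sh: current shell and everything it launches
    # or system-wide, in /etc/security/limits.conf:
    *    soft    stack    unlimited
    *    hard    stack    unlimited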

[OMPI users] Segmentation fault during MPI initialization

2012-04-24 Thread Jeffrey A Cummings
I've been having an intermittent failure during MPI initialization (v 1.4.3) for several months. It comes and goes as I make changes to my application, that is, changes unrelated to MPI calls. Even when I have a version of my app which shows the problem, it doesn't happen on every submittal.

Re: [OMPI users] Ompi-restart failed and process migration

2012-04-24 Thread kidd
Hi, thank you for your reply. I have some problems. Q1: I set two kinds of settings in mca-params.conf: (1) crs_base_snapshot_dir=/root/kidd_openMPI/Tmp snapc_base_global_snapshot_dir=/root/kidd_openMPI/checkpoints. On my master, /root/kidd_openMPI is my Open MPI install dir

Re: [OMPI users] Ompi-restart failed and process migration

2012-04-24 Thread kidd
Hi, thank you for your reply, but I still failed. I must add -x LD_LIBRARY_PATH. This is my full setup: 1) Master node (cuda07) & slave node (cuda08): Configure: ./configure --prefix=/root/kidd_openMPI --with-ft=cr --enable-ft-thread --with-blcr=/usr/local/BLCR

Re: [OMPI users] MPI and CUDA

2012-04-24 Thread Rolf vandeVaart
I am not sure about everything that is going wrong, but there are at least two issues I found. First, you are skipping the first line that you read from integers.txt. Maybe something like this instead. while(fgets(line, sizeof line, fp)!= NULL){ sscanf(line,"%d",[k]); sum = sum +
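
A self-contained sketch of a read loop along those lines; the array name, its size, and the error handling are made up, and integers.txt is the file mentioned above:

    #include <stdio.h>

    #define MAXN 1000

    int main(void)
    {
        FILE *fp = fopen("integers.txt", "r");
        char line[64];
        int data[MAXN];               /* hypothetical array name */
        int k = 0, sum = 0;

        if (fp == NULL)
            return 1;

        /* Read every line, including the first one. */
        while (k < MAXN && fgets(line, sizeof line, fp) != NULL) {
            sscanf(line, "%d", &data[k]);
            sum = sum + data[k];
            k++;
        }
        fclose(fp);
        printf("read %d integers, sum = %d\n", k, sum);
        return 0;
    }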

Re: [OMPI users] Regarding the problem while connecting to nodes present in a cluster

2012-04-24 Thread Jeffrey Squyres
It looks like you are using LAM/MPI. This list is for supporting Open MPI, a wholly different MPI software implementation. However, speaking as one of the core LAM/MPI developers, I'll tell you that you should uninstall LAM and install Open MPI instead. We abandoned LAM/MPI several years

Re: [OMPI users] HRM problem

2012-04-24 Thread Syed Ahsan Ali
I am not familiar with attaching a debugger to the processes. The other things you asked are as follows: Is this the first time you've run it (with Open MPI? with any MPI?) *No. We have been running this and other models, but this problem has arisen only now.* How many processes is the job using? Are you

Re: [OMPI users] HRM problem

2012-04-24 Thread TERRY DONTJE
To determine if an MPI process is waiting for a message, do what Rayson suggested and attach a debugger to the processes to see if any of them are stuck in MPI, either internally in an MPI_Recv or MPI_Wait call or looping on an MPI_Test call. Other things to consider: Is this the first time
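
For anyone unfamiliar with the procedure, attaching a debugger to a running rank looks roughly like this (node name, binary name, and PID are placeholders):

    shell$ ssh node01                  # log onto the node running the suspect rank
    shell$ ps -ef | grep my_app        # find the PID of the MPI process
    shell$ gdb -p 12345                # attach to that PID
    (gdb) thread apply all bt          # look for MPI_Recv / MPI_Wait / MPI_Test in the backtraces
    (gdb) detach
    (gdb) quit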

[OMPI users] Regarding the problem while connecting to nodes present in a cluster

2012-04-24 Thread seshendra seshu
Hi, I have installed MPI, and when I tried to run MPI in parallel on all the nodes, I got the following error while MPI was trying to establish connections: "*ERROR: LAM/MPI unexpectedly received the following on stderr:Permission denied (publickey,gssapi-with-mic)." so any one
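
As an aside, the "Permission denied (publickey,gssapi-with-mic)" part usually indicates that passwordless SSH between the nodes is not set up, independent of which MPI is used; a common, generic way to set it up (user and host names are placeholders):

    shell$ ssh-keygen -t rsa              # accept the defaults, empty passphrase
    shell$ ssh-copy-id user@node01        # repeat for every compute node
    shell$ ssh user@node01 hostname       # should now succeed without a password prompt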

[OMPI users] MPI and CUDA

2012-04-24 Thread Rohan Deshpande
I am combining MPI and CUDA, trying to find the sum of array elements using CUDA and using MPI to distribute the array. My CUDA code: #include __global__ void add(int *devarray, int *devsum) { int index = blockIdx.x * blockDim.x + threadIdx.x; *devsum = *devsum +
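
As quoted, every thread updates *devsum with an unsynchronized read-modify-write, which is a race; a minimal, generic sketch of one common fix is to accumulate with atomicAdd (the added length argument n and the host-side launch details are assumptions, not the poster's code):

    __global__ void add(const int *devarray, int n, int *devsum)
    {
        int index = blockIdx.x * blockDim.x + threadIdx.x;
        if (index < n)
            atomicAdd(devsum, devarray[index]);   /* serialized, race-free accumulation */
    }

    /* Host side (error checking omitted):
       cudaMemset(devsum, 0, sizeof(int));
       add<<<blocks, threadsPerBlock>>>(devarray, n, devsum);
       cudaMemcpy(&sum, devsum, sizeof(int), cudaMemcpyDeviceToHost);  */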

Re: [OMPI users] HRM problem

2012-04-24 Thread Syed Ahsan Ali
Dear Rayson, It is a numerical model written by the national weather service of a country. The logs of the model show every detail about the simulation progress. I have checked on the remote nodes as well; the application binary is running, but the logs show no progress, it is just waiting at

Re: [OMPI users] HRM problem

2012-04-24 Thread Rayson Ho
Seems like there's a bug in the application. Did you or someone else write it, or did you get it from an ISV?? You can log onto one of the nodes, attach a debugger, and see if the MPI task is waiting for a message (looping in one of the MPI receive functions)... Rayson

[OMPI users] HRM problem

2012-04-24 Thread Syed Ahsan Ali
Dear All, I am having a problem running an application on a Dell cluster. The model starts well, but no further progress is shown; it just gets stuck. I have checked the systems, and no apparent hardware error is there. Other Open MPI applications are running well on the same cluster. I have tried