Re: [OMPI users] random error bugging me..

2014-01-19 Thread George Bosilca
Thomas, Here is a quick way to see how a function gets called after MPI_Finalize. In the following I will use gdb scripting, but with little effort you can adapt this to work with your preferred debugger (lldb, for example). The idea is to break on the function generating the error you get on t
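The script from the original message is cut off in this preview, so what follows is only a minimal sketch of the general idea, not George's actual script. It generates a gdb command file that sets a breakpoint on a named function and prints a backtrace each time it is hit. The helper name `build_gdb_script` and the breakpoint target `MPI_Abort` are hypothetical placeholders; substitute whichever function actually reports the error in your run.

```python
def build_gdb_script(break_on):
    """Return gdb commands that print a backtrace whenever `break_on` is hit.

    `break_on` is a placeholder: the real function to break on is not
    recoverable from the truncated message.
    """
    lines = [
        "set pagination off",         # do not pause long output
        "set breakpoint pending on",  # allow symbols from not-yet-loaded libraries
        f"break {break_on}",
        "commands",                   # run these each time the breakpoint is hit
        "  backtrace",
        "  continue",
        "end",
        "run",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical target; replace with the function that emits your error.
    with open("trace.gdb", "w") as handle:
        handle.write(build_gdb_script("MPI_Abort"))
```

Assuming gdb is available on the nodes, the resulting file could be passed to each rank with something like `mpirun -np 2 gdb -batch -x trace.gdb ./your_app`, where `./your_app` stands in for the real executable.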

Re: [OMPI users] random error bugging me..

2014-01-19 Thread Ralph Castain
Hard to say what could be the cause of the problem without a better understanding of the code, but the root cause appears to be some code path that allows you to call an MPI function after you called MPI_Finalize. From your description, it appears you have a race condition in the code that activ

Re: [OMPI users] random error bugging me..

2014-01-19 Thread thomas . forde
Yes. It's a shared NFS partition on the nodes. Sent from my iPhone > On 19 Jan 2014, at 13:29, "Reuti" wrote: > > Hi, > > On 18.01.2014 at 22:43, thomas.fo...@ulstein.com wrote: > > > I have had a cluster running well for a while, and 2 days ago we decided to upgrade it from 128 to 2

Re: [OMPI users] random error bugging me..

2014-01-19 Thread Reuti
Hi, On 18.01.2014 at 22:43, thomas.fo...@ulstein.com wrote: > I have had a cluster running well for a while, and 2 days ago we > decided to upgrade it from 128 to 256 cores. > > Most of my deployment of nodes goes through cobbler and scripting, and it has > worked fine before on the fi

[OMPI users] random error bugging me..

2014-01-18 Thread thomas . forde
Hi, I have had a cluster running well for a while, and 2 days ago we decided to upgrade it from 128 to 256 cores. Most of my deployment of nodes goes through cobbler and scripting, and it has worked fine before on the first 8 nodes. But after adding new nodes, everything is fucked up and