Re: [OMPI users] Problem with implementation of Fox algorithm
Hi,

At line 171 you have:

    MPI_Gather([i*matrixSize], blockSize, MPI_DOUBLE, 0,
               tmpVar[i*matrixSize], MPI_DOUBLE, 0, rowComm);

but per the man page the signature is:

    int MPI_Gather(const void *sendbuf, int sendcount, MPI_Datatype sendtype,
                   void *recvbuf, int recvcount, MPI_Datatype recvtype,
                   int root, MPI_Comm comm)

so you have recvbuf = 0 (!) and recvcount = tmpVar[i*matrixSize]. I guess you meant to have recvcount = blockSize. That being said, tmpVar[i*matrixSize] is an int, and it should likely be a double *.

Cheers,

Gilles

On 9/24/2015 8:13 AM, Surivinta Surivinta wrote:
> Hi everybody! I am trying to implement the Fox algorithm with MPI, but I get some errors (see below). Can someone explain how to fix them, or point me in the right direction? The source code is attached to this letter. The errors:
>
> [estri_mobile:6337] *** An error occurred in MPI_Gather
> [estri_mobile:6337] *** reported by process [1826816001,0]
> [estri_mobile:6337] *** on communicator MPI COMMUNICATOR 4 SPLIT FROM 3
> [estri_mobile:6337] *** MPI_ERR_COUNT: invalid count argument
> [estri_mobile:6337] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> [estri_mobile:6337] *** and potentially your MPI job)
>
> Best regards.

___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: http://www.open-mpi.org/community/lists/users/2015/09/27656.php
Re: [OMPI users] Problem using Open MPI 1.10.0 built with Intel compilers 16.0.0
Fabrice,

I do not fully understand the root cause of this error, and you might want to ask the Intel folks to comment on that. That being said, since this compiler does support Fortran 2008, I strongly encourage you to use mpi_f08 instead of use mpi. A happy feature/side effect is that your program compiles and runs just fine if you use the mpi_f08 module (!)

Cheers,

Gilles

On 9/24/2015 1:00 AM, Fabrice Roy wrote:
> program testmpi
>   use mpi
>   implicit none
>   integer :: pid
>   integer :: ierr
>   integer :: tok
>   call mpi_init(ierr)
>   call mpi_comm_rank(mpi_comm_world, pid, ierr)
>   if (pid == 0) then
>     tok = 1
>   else
>     tok = 0
>   end if
>   call mpi_bcast(tok, 1, mpi_integer, 0, mpi_comm_world, ierr)
>   call mpi_finalize(ierr)
> end program testmpi
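For reference, a sketch of what the suggested change looks like: with the mpi_f08 module the ierror argument becomes optional and the choice-buffer interfaces accept a scalar tok, which is why the program then compiles. This is the same test program rewritten under that assumption, not a build I have verified with this compiler.

```fortran
program testmpi
  use mpi_f08           ! instead of "use mpi"
  implicit none
  integer :: pid
  integer :: tok
  call MPI_Init()       ! ierror is optional with mpi_f08
  call MPI_Comm_rank(MPI_COMM_WORLD, pid)
  if (pid == 0) then
    tok = 1
  else
    tok = 0
  end if
  ! scalar tok is accepted by the mpi_f08 choice-buffer interface
  call MPI_Bcast(tok, 1, MPI_INTEGER, 0, MPI_COMM_WORLD)
  call MPI_Finalize()
end program testmpi
```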
[OMPI users] Problem with implementation of Fox algorithm
Hi everybody! I am trying to implement the Fox algorithm with MPI, but I get some errors (see below). Can someone explain how to fix them, or point me in the right direction? The source code is attached to this letter. The errors:

[estri_mobile:6337] *** An error occurred in MPI_Gather
[estri_mobile:6337] *** reported by process [1826816001,0]
[estri_mobile:6337] *** on communicator MPI COMMUNICATOR 4 SPLIT FROM 3
[estri_mobile:6337] *** MPI_ERR_COUNT: invalid count argument
[estri_mobile:6337] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[estri_mobile:6337] *** and potentially your MPI job)

Best regards.

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include "somehead.h"

int size;              // number of processes
int rank;              // rank of the current process
int gridSz;            // grid size (number of processes must be gridSz*gridSz)
int gridCoord[2];      // coordinates of this process in the grid
double *aMatrix;
double *bMatrix;
double *cMatrix;
double *aMatrixBlock;  // block of matrix A used as a buffer
double *aBufProc;      // block of matrix A on the current process
double *bBufProc;      // block of matrix B on the current process
double *cBufProc;      // block of matrix C on the current process
static MPI_Comm gridComm;
static MPI_Comm rowComm;
static MPI_Comm colComm;

// initialize the matrices with data
void dataInit(double *aMatrix, double *bMatrix, int matrixSize)
{
    int value = 1;
    int i, j;
    srand(value);
    for (i = 0; i < matrixSize; i++) {
        for (j = 0; j < matrixSize; j++) {
            aMatrix[i * matrixSize + j] = 1.0 + rand() % 5;
            bMatrix[i * matrixSize + j] = 1.0 + rand() % 7;
        }
    }
}

// create a communicator with a 2D grid topology,
// find the coordinates of this process in the grid,
// and build communicators for its row and column (MPI_Cart_create)
void gridCommCr()
{
    int dimSize[2]; // number of processes in each grid dimension
    int period[2];  // 1 - periodic dimension, 0 - not
    int subDim[2];  // 1 - dimension belongs to the subgrid, 0 - not
    dimSize[0] = gridSz;
    dimSize[1] = gridSz;
    period[0] = 0;
    period[1] = 0;
    MPI_Dims_create(size, 2, dimSize);
    MPI_Cart_create(MPI_COMM_WORLD, 2, dimSize, period, 1, &gridComm);
    MPI_Cart_coords(gridComm, rank, 2, gridCoord);
    subDim[0] = 0;
    subDim[1] = 1;
    MPI_Cart_sub(gridComm, subDim, &rowComm);
    subDim[0] = 1;
    subDim[1] = 0;
    MPI_Cart_sub(gridComm, subDim, &colComm);
    printf("Comm created!");
}

void printMa(double *curMatrix, int numbRow, int numbCol)
{
    int i, j;
    for (i = 0; i < numbRow; i++) {
        for (j = 0; j < numbCol; j++) {
            printf("%7.4f ", curMatrix[i * numbCol + j]);
        }
        printf("\n");
    }
}

// distribute the data
void delivData(double *aMatrix, double *bMatrix, double *aMatrixBlock,
               double *bBufProc, int matrixSize, int blockSize)
{
    matrixScatter(aMatrix, aMatrixBlock, matrixSize, blockSize);
    matrixScatter(bMatrix, bBufProc, matrixSize, blockSize);
}

void matrixScatter(double *curMatrix, double *curBufBlock, int maSize, int blockSize)
{
    int i;
    double *tempMaRow = (double *) malloc((blockSize * maSize) * sizeof(double));
    if (gridCoord[1] == 0) {
        MPI_Scatter(curMatrix, blockSize * maSize, MPI_DOUBLE,
                    tempMaRow, blockSize * maSize, MPI_DOUBLE, 0, colComm);
    }
    for (i = 0; i < blockSize; i++) {
        MPI_Scatter(&(tempMaRow[i * maSize]), blockSize, MPI_DOUBLE,
                    &(curBufBlock[i * blockSize]), blockSize, MPI_DOUBLE, 0, rowComm);
    }
    free(tempMaRow);
}

// parallel computation
void calcParal(double *aMatrix, double *aMatrixBlock, double *bBufProc,
               double *cBufProc, int blockSize)
{
    int iter;
    for (iter = 0; iter < gridSz; iter++) {
        blockAbroadcast(iter, aMatrix, aMatrixBlock, blockSize);
        blMulti(aMatrix, bBufProc, cBufProc, blockSize);
        bBlSendRecv(bBufProc, blockSize);
    }
}

void blockAbroadcast(int iter, double *aBufProc, double *aMatrixBlock, int blockSize)
{
    int i;
    int tmpVar = (gridCoord[0] + iter) % gridSz;
    if (gridCoord[1] == tmpVar) {
        for (i = 0; i < blockSize * blockSize; i++) {
            aBufProc[i] = aMatrixBlock[i];
        }
    }
    MPI_Bcast(aBufProc, blockSize * blockSize, MPI_DOUBLE, tmpVar, rowComm);
}

void bBlSendRecv(double *bBufProc, int blockSize)
{
    MPI_Status status;
    int nextProc = gridCoord[0] + 1;
    if (gridCoord[0] == gridSz - 1) {
        nextProc = 0;
    }
    int pervProc = gridCoord[0] - 1;
    if (gridCoord[0] == 0) {
        pervProc = gridSz - 1;
    }
    MPI_Sendrecv_replace(bBufProc, blockSize * blockSize, MPI_DOUBLE,
                         nextProc, 0, pervProc, 0, colComm, &status);
}

void blMulti(double *aBlock, double *bBlock, double *cBlock, int matrixSize)
{
[OMPI users] Problem using Open MPI 1.10.0 built with Intel compilers 16.0.0
Hello,

I have built Open MPI 1.10.0 using Intel compilers 16.0.0. When I try to compile the following test code:

program testmpi
  use mpi
  implicit none
  integer :: pid
  integer :: ierr
  integer :: tok
  call mpi_init(ierr)
  call mpi_comm_rank(mpi_comm_world, pid, ierr)
  if (pid == 0) then
    tok = 1
  else
    tok = 0
  end if
  call mpi_bcast(tok, 1, mpi_integer, 0, mpi_comm_world, ierr)
  call mpi_finalize(ierr)
end program testmpi

I get the following error message:

testmpi.f90(21): error #6285: There is no matching specific subroutine for this generic subroutine call.   [MPI_BCAST]
call mpi_bcast(tok,1,mpi_integer,0,mpi_comm_world,ierr)
-^
compilation aborted for testmpi.f90 (code 1)

The compilation and execution succeed if I declare tok as

integer, dimension(1) :: tok

I have also built Open MPI 1.10.0 with the GNU 5.2.0 compilers, and both versions of the test code (with tok declared as an integer or as an integer, dimension(1)) compile and execute. Open MPI was configured with the same options with both compilers. Do you have any idea how I could solve this problem?

Thanks,

Fabrice Roy

--
Fabrice Roy
Scientific computing engineer
LUTH - CNRS / Observatoire de Paris
5 place Jules Janssen
92190 Meudon
Tel.: 01 45 07 71 20
Re: [OMPI users] OpenMPI 1.8.5 build question
Thank you

-----Original Message-----
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Mark Santcroos
Sent: Wednesday, September 23, 2015 6:52 AM
To: Open MPI Users
Subject: Re: [OMPI users] OpenMPI 1.8.5 build question

> On 23 Sep 2015, at 13:49, Kumar, Sudhir wrote:
> I have a version of OpenMPI 1.8.5 installed. Is there any way of knowing with which version of gcc it was compiled?

ompi_info | grep -i compiler

Link to this post: http://www.open-mpi.org/community/lists/users/2015/09/27652.php
Re: [OMPI users] OpenMPI 1.8.5 build question
Hi,

This may answer your question:

mpicc -showme

Jalel

On 23/09/2015 13:49, Kumar, Sudhir wrote:
> Hi
> I have a version of OpenMPI 1.8.5 installed. Is there any way of knowing with which version of gcc it was compiled?
> Thanks

Link to this post: http://www.open-mpi.org/community/lists/users/2015/09/27651.php

--
**
Jalel CHERGUI, LIMSI-CNRS, Bât. 508 - BP 133, 91403 Orsay cedex, FRANCE
Tel: (33 1) 69 85 81 27 ; Fax: (33 1) 69 85 80 88
Email: jalel.cher...@limsi.fr ; Reference: http://perso.limsi.fr/chergui
**
Re: [OMPI users] OpenMPI 1.8.5 build question
> On 23 Sep 2015, at 13:49, Kumar, Sudhir wrote:
> I have a version of OpenMPI 1.8.5 installed. Is there any way of knowing with which version of gcc it was compiled?

ompi_info | grep -i compiler
[OMPI users] OpenMPI 1.8.5 build question
Hi

I have a version of OpenMPI 1.8.5 installed. Is there any way of knowing with which version of gcc it was compiled?

Thanks
Re: [OMPI users] OpenMPI-1.10.0 bind-to core error
I’m really puzzled by that one - we very definitely will report an error and exit if the user specifies that MCA param and we don’t find the given agent. Could you please send us the actual cmd line plus the hostfile you gave, and verify that the MCA param was set?

> On Sep 21, 2015, at 8:42 AM, Gilles Gouaillardet wrote:
>
> Patrick,
>
> thanks for the report.
>
> can you confirm what happened was
> - you defined OMPI_MCA_plm_rsh_agent=oarshmost
> - oarshmost was not in the $PATH
> - mpirun silently ignored the remote nodes
>
> if that is correct, then i think mpirun should have reported an error
> (oarshmost not found, or cannot remote start orted)
> instead of this silent behaviour
>
> Cheers,
>
> Gilles
>
> On Mon, Sep 21, 2015 at 11:43 PM, Patrick Begou wrote:
>> Hi Gilles,
>>
>> I've made a big mistake! While compiling the patched version of OpenMPI and creating a new module, I forgot to add the path to the oarshmost command while OMPI_MCA_plm_rsh_agent=oarshmost was set.
>> OpenMPI was silently ignoring the oarshmost command, as it was unable to find it, and so only one node was available!
>>
>> The good thing is that with your patch, oversubscribing no longer occurs on the nodes; it seems to solve the problem we had efficiently.
>> I'll keep this patched version in production for the users, as the previous one allowed 2 processes on some cores from time to time, with haphazardly bad code performance in these cases.
>>
>> Yes, this computer is the biggest one of the CIMENT mesocenter; it is called... froggy, and all the nodes are little frogs :-)
>> https://ciment.ujf-grenoble.fr/wiki-pub/index.php/Hardware:Froggy
>>
>> I was using $OAR_NODEFILE and frog.txt to check different syntaxes: one with a list of nodes (one line with a node name for each available core) and the second with one line per node and the "slots" information for the number of cores.
>> E.g.:
>>
>> [begou@frog7 MPI_TESTS]$ cat $OAR_NODEFILE
>> frog7
>> frog7
>> frog7
>> frog7
>> frog8
>> frog8
>> frog8
>> frog8
>>
>> [begou@frog7 MPI_TESTS]$ cat frog.txt
>> frog7 slots=4
>> frog8 slots=4
>>
>> Thanks again for the patch and your help.
>>
>> Patrick
>>
>> Gilles Gouaillardet wrote:
>>
>> Thanks Patrick,
>>
>> could you please try again with the --hetero-nodes mpirun option ?
>> (I am afk, and not 100% sure about the syntax)
>>
>> could you also submit a job with 2 nodes and 4 cores on each node, that does
>> cat /proc/self/status
>> oarshmost cat /proc/self/status
>>
>> btw, is there any reason why you use a machine file (frog.txt) instead of using $OAR_NODEFILE directly ?
>> /* not to mention I am surprised a French supercomputer is called "frog" ;-) */
>>
>> Cheers,
>>
>> Gilles
>>
>> On Friday, September 18, 2015, Patrick Begou wrote:
>>>
>>> Gilles Gouaillardet wrote:
>>>
>>> Patrick,
>>>
>>> by the way, this will work when running on a single node.
>>>
>>> i do not know what will happen when you run on multiple nodes ...
>>> since there is no OAR integration in openmpi, i guess you are using ssh to start orted on the remote nodes
>>> (unless you instructed ompi to use an OARified version of ssh)
>>>
>>> Yes, OMPI_MCA_plm_rsh_agent=oarshmost
>>> This also exports the needed environment instead of multiple -x options, to be as similar as possible to the environments on the French national supercomputers.
>>>
>>> my concern is the remote orted might not run within the cpuset that was created by OAR for this job,
>>> so you might end up using all the cores on the remote nodes.
>>>
>>> The oar environment does this. With older OpenMPI versions everything works fine.
>>>
>>> please let us know how that works for you
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On 9/18/2015 5:02 PM, Gilles Gouaillardet wrote:
>>>
>>> Patrick,
>>>
>>> i just filed PR 586 https://github.com/open-mpi/ompi-release/pull/586 for the v1.10 series
>>>
>>> this is only a three line patch.
>>> could you please give it a try ?
>>>
>>> This patch solves the problem when OpenMPI uses one node, but now I'm unable to use more than one node.
>>> On one node, with 4 cores in the cpuset:
>>>
>>> mpirun --bind-to core --hostfile $OAR_NODEFILE ./location.exe | grep 'thread is now running on PU' | sort
>>> (process 0) thread is now running on PU logical index 0 (OS/physical index 12) on system frog26
>>> (process 1) thread is now running on PU logical index 1 (OS/physical index 13) on system frog26
>>> (process 2) thread is now running on PU logical index 2 (OS/physical index 14) on system frog26
>>> (process 3) thread is now running on PU logical index 3 (OS/physical index 15) on system frog26
>>>
>>> [begou@frog26 MPI_TESTS]$ mpirun -np 5 --bind-to core --hostfile $OAR_NODEFILE ./location.exe
>>>