Re: [OMPI users] qsub error
yep, runs well now.

On Sat, Feb 16, 2013 at 6:50 AM, Jeff Squyres (jsquyres) wrote:
> Glad you got it working!

--
Erik Nelson
Howard Hughes Medical Institute
6001 Forest Park Blvd., Room ND10.124
Dallas, Texas 75235-9050
p : 214 645 5981
f : 214 645 5948

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] qsub error
Glad you got it working!

On Feb 15, 2013, at 6:53 PM, Erik Nelson wrote:
> I may have deleted any responses to this message. In either case, we appear
> to have fixed the problem by installing a more current version of openmpi.

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI users] qsub error
I may have deleted any responses to this message. In either case, we appear to have fixed the problem by installing a more current version of openmpi.

--
Erik Nelson
[OMPI users] qsub error
I'm encountering an error using qsub that none of us can figure out. MPI C++ programs seem to run fine when executed from the command line, but for some reason when I submit them through the queue I get a strange error message:

[compute-3-12.local][[58672,1],0][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect] connect() to 2002:8170:6c2f:b:21d:9ff:fefd:7d94 failed: Permission denied (13)

The compute node 3-12 doesn't matter (the error can come from any of the nodes, and I'm guessing that 3-12 is the parent node here).

To check if there was some problem with my own code, I created a simple 'hello world' program (see attached files). Again, the program runs fine from the command line but fails in qsub with the same sort of error message.

I have included (i) the code, (ii) the job script for qsub, and (iii) the ".o" file from qsub for the "hello world" program.

These don't look like MPI errors, but rather some conflict with, maybe, secure communication across nodes.

Is there something simple I can do to fix this?

Thanks,
Erik Nelson
Howard Hughes Medical Institute
6001 Forest Park Blvd., Room ND10.124
Dallas, Texas 75235-9050
p : 214 645 5981
f : 214 645 5948

#include <stdio.h>   // for printf
#include "/opt/openmpi/include/mpi.h"

#define bufdim 128

int main(int argc, char *argv[])
{
    char buffer[bufdim];
    char id_str[32];

    // mpi :
    MPI::Init(argc,argv);
    MPI::Status status;

    int size;
    int rank;
    int tag;

    size=MPI::COMM_WORLD.Get_size();
    rank=MPI::COMM_WORLD.Get_rank();
    tag=0;

    if (rank==0) {
        printf("%d: we have %d processors\n",rank,size);
        int i;
        i=1;
        for ( ;i