Re: [OMPI users] qsub error

2013-02-16 Thread Erik Nelson
yep, runs well now.

On Sat, Feb 16, 2013 at 6:50 AM, Jeff Squyres (jsquyres)  wrote:

> Glad you got it working!



-- 
Erik Nelson

Howard Hughes Medical Institute
6001 Forest Park Blvd., Room ND10.124
Dallas, Texas 75235-9050

p : 214 645 5981
f : 214 645 5948


Re: [OMPI users] qsub error

2013-02-16 Thread Jeff Squyres (jsquyres)
Glad you got it working!



-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] qsub error

2013-02-15 Thread Erik Nelson
I may have deleted any responses to this message. In any case, we appear
to have fixed the problem by installing a more current version of Open MPI.







[OMPI users] qsub error

2013-02-14 Thread Erik Nelson
I'm encountering an error using qsub that none of us can figure out. MPI
C++ programs seem to run fine when executed from the command line, but for
some reason when I submit them through the queue I get a strange error
message:

[compute-3-12.local][[58672,1],0][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect] connect() to 2002:8170:6c2f:b:21d:9ff:fefd:7d94 failed: Permission denied (13)

The compute node 3-12 doesn't matter (the error can come from any of the
nodes, and I'm guessing that 3-12 is the parent node here).

To check whether there was some problem with my own code, I created a
simple 'hello world' program (see attached files).

Again, the program runs fine from the command line but fails in qsub with
the same sort of error message.

I have included (i) the code, (ii) the job script for qsub, and (iii) the
".o" file from qsub for the "hello world" program.

These don't look like MPI errors, but rather some conflict with, maybe,
secure communication across nodes.

Is there something simple I can do to fix this?

Thanks,

Erik Nelson

#include <stdio.h>
#include "/opt/openmpi/include/mpi.h"

#define bufdim 128

int main(int argc, char *argv[])
{
char buffer[bufdim];
char id_str[32];

//  mpi :
MPI::Init(argc,argv);
MPI::Status status;

int size;
int rank;
int tag;

size=MPI::COMM_WORLD.Get_size();
rank=MPI::COMM_WORLD.Get_rank();
tag=0;

if (rank==0) {
	printf("%d: we have %d processors\n",rank,size);
	int i;
	i=1;
	for ( ;i