Re: [OMPI users] Segmentation fault in mca_btl_tcp

2010-04-15 Thread Werner Van Geit
>
> We sometimes see mysterious crashes like this one. At least some of them
> are caused by port scanners, i.e. unexpected non-mpi related packets
> coming in on the sockets will sometimes cause havoc.
>
Port scanners etc I don't really see happening on our cluster, since the nodes are well s

Re: [OMPI users] Segmentation fault in mca_btl_tcp

2010-04-15 Thread Jeff Squyres
On Apr 15, 2010, at 3:18 AM, Ake Sandgren wrote:
> We sometimes see mysterious crashes like this one. At least some of them
> are caused by port scanners, i.e. unexpected non-mpi related packets
> coming in on the sockets will sometimes cause havoc.
Ooohhh... ouch.
> We've been getting http traf
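
The failure mode described above (non-MPI traffic such as a port scan or a stray HTTP request arriving on a socket that expects an MPI peer) can be illustrated with a small sketch. This is not Open MPI's wire format or its TCP BTL code: the `MAGIC` tag, the 8-byte handshake layout, and the helper names `recv_exact`, `accept_peer`, and `demo` are all assumptions made up for illustration. The idea is only that a listener which checks an expected header before trusting a connection can drop unexpected traffic instead of misinterpreting it.

```python
import socket
import struct

# Hypothetical 8-byte handshake: a 4-byte magic tag plus a 4-byte peer rank.
# Illustrative only; this is NOT the format Open MPI uses on the wire.
MAGIC = b"MPI!"
HANDSHAKE = struct.Struct("!4sI")


def recv_exact(conn, nbytes):
    """Read exactly nbytes from conn, or return None if the peer closes early."""
    chunks = []
    remaining = nbytes
    while remaining > 0:
        chunk = conn.recv(remaining)
        if not chunk:
            return None  # connection closed before the full header arrived
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def accept_peer(conn):
    """Return the peer rank if the handshake is valid, otherwise None."""
    raw = recv_exact(conn, HANDSHAKE.size)
    if raw is None:
        return None
    magic, rank = HANDSHAKE.unpack(raw)
    if magic != MAGIC:
        return None  # e.g. an HTTP request or a port-scan probe
    return rank


def demo():
    """Send one valid handshake and one HTTP request; collect the results."""
    results = []
    for payload in (HANDSHAKE.pack(MAGIC, 7), b"GET / HTTP/1.0\r\n\r\n"):
        sender, receiver = socket.socketpair()
        try:
            sender.sendall(payload)
            sender.close()
            results.append(accept_peer(receiver))
        finally:
            receiver.close()
    return results


if __name__ == "__main__":
    print(demo())
```

In this sketch the valid handshake yields the peer rank, while the HTTP request is rejected because its first four bytes do not match the tag. Whether the crash discussed in this thread has that cause is not established by the messages shown here.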

Re: [OMPI users] Segmentation fault in mca_btl_tcp

2010-04-15 Thread Werner Van Geit
Hi,
Yeah, I understand that would be handy. It's a bit difficult, but I'll see if I can make a simple test case. The problem (sorry, I forgot to mention this) is that the segmentation fault only seems to happen after the code has been running for a couple of hours (on 10-20 8-core nodes). And

Re: [OMPI users] Segmentation fault in mca_btl_tcp

2010-04-15 Thread Jeff Squyres (jsquyres)
Can you send a small program that reproduces the problem, perchance?
-jms
Sent from my PDA. No type good.
- Original Message -
From: users-boun...@open-mpi.org
To: us...@open-mpi.org
Sent: Thu Apr 15 01:57:10 2010
Subject: [OMPI users] Segmentation fault in mca_btl_tcp
Hi,
We are usi

Re: [OMPI users] Segmentation fault in mca_btl_tcp

2010-04-15 Thread Ake Sandgren
On Thu, 2010-04-15 at 15:57 +0900, Werner Van Geit wrote:
> Hi,
>
> We are using openmpi 1.4.1 on our cluster computer (in conjunction with
> Torque). One of our users has a problem with his jobs generating a
> segmentation fault on one of the slaves, this is the backtrace:
>
> [cstone-00613:28