[OMPI users] problem starting a ompi job in a mix BE/LE cluster

2015-06-01 Thread Steve Wise
Hello, I'm seeing an error trying to run a simple OMPI job on a 2 node cluster where one node is a PPC64 BE byte order and the other is a X86_64 LE byte order node. OMPI 1.8.4 is configured with --enable-heterogeneous: ./configure --with-openib=/usr CC=gcc CXX=g++ F77=gfortran FC=gfortran

Re: [OMPI users] problem starting a ompi job in a mix BE/LE cluster

2015-06-01 Thread Steve Wise
been so long since someone tried this that I’d have to look to remember what it does. On Jun 1, 2015, at 7:28 AM, Steve Wise <sw...@opengridcomputing.com> wrote: Hello, I'm seeing an error trying to run a simple OMPI job on a 2 node cluster where one node is a PPC64 BE byte

Re: [OMPI users] problem starting a ompi job in a mix BE/LE cluster

2015-06-01 Thread Steve Wise
one of the settings that were printed out: P,128,256,192,128:S,2048,1024,1008,64:S,12288,1024,1008,64:S,65536,1024,1008,64 or P,65536,64 -Nathan On Mon, Jun 01, 2015 at 09:28:28AM -0500, Steve Wise wrote: Hello, I'm seeing an error trying to run a simple OMPI job on a 2 node cluster where one

Re: [OMPI users] problem starting a ompi job in a mix BE/LE cluster

2015-06-01 Thread Steve Wise
On 6/1/2015 9:53 AM, Ralph Castain wrote: Well, I checked and it looks to me like --hetero-apps is a stale option in the master at least - I don’t see where it gets used. Looking at the code, I would suspect that something didn’t get configured correctly - either the --enable-heterogeneous flag

Re: [OMPI users] problem starting a ompi job in a mix BE/LE cluster

2015-06-02 Thread Steve Wise
cannot handle when different MPI processes specify different receive queue specifications. You mentioned that the device ID is being incorrectly identified: is that OMPI's fault, or something wrong with the device itself? On Jun 1, 2015, at 6:06 PM, Steve Wise <sw...@opengridcomputing.com>

Re: [OMPI users] problem starting a ompi job in a mix BE/LE cluster

2015-06-02 Thread Steve Wise
On 6/2/2015 10:04 AM, Ralph Castain wrote: On Jun 2, 2015, at 7:10 AM, Steve Wise <sw...@opengridcomputing.com> wrote: On 6/1/2015 9:51 PM, Ralph Castain wrote: I’m wondering if it is also possible that the error message is simply prin

Re: [OMPI users] problem starting a ompi job in a mix BE/LE cluster

2015-06-02 Thread Steve Wise
bootstrapping). :-) On Jun 2, 2015, at 10:04 AM, Ralph Castain <r...@open-mpi.org> wrote: On Jun 2, 2015, at 7:10 AM, Steve Wise <sw...@opengridcomputing.com> wrote: On 6/1/2015 9:51 PM, Ralph Castain wrote: I’m wondering if it is also possible that the error message is simply print

Re: [OMPI users] problem starting a ompi job in a mix BE/LE cluster

2015-06-02 Thread Steve Wise
tain <r...@open-mpi.org> wrote: On Jun 2, 2015, at 7:10 AM, Steve Wise <sw...@opengridcomputing.com> wrote: On 6/1/2015 9:51 PM, Ralph Castain wrote: I’m wondering if i

Re: [OMPI users] Default value of btl_openib_memalign_threshold

2015-06-10 Thread Steve Wise
c: Nathan Hjelm; Steve Wise Subject: Re: [OMPI users] Default value of btl_openib_memalign_threshold Nathan / Steve -- you guys are nominally the owners of the openib BTL: can you please investigate? On Jun 10, 2015, at 4:15 PM, Ralph Castain <r...@

Re: [OMPI users] Default value of btl_openib_memalign_threshold

2015-06-11 Thread Steve Wise
FYI: I opened: https://github.com/open-mpi/ompi/issues/638 to track this. Steve. On 6/10/2015 4:07 PM, Ralph Castain wrote: Done On Jun 10, 2015, at 1:55 PM, Steve Wise <sw...@opengridcomputing.com> wrote: If you're trying to rele

Re: [OMPI users] Default value of btl_openib_memalign_threshold

2015-06-11 Thread Steve Wise
Hey Jeff, what did you run to generate the memory corruption? Can you run the same test with --mca btl_openib_memalign_threshold 12288 and see if you get the same corruption? I'm not hitting any corruption over iw_cxgb4 with a simple test. On 6/10/2015 2:39 PM, Jeff Squyres (jsquyres)

[OMPI users] padb and openmpi

2010-08-17 Thread Steve Wise
Hi, I'm trying to use padb 3.0 to get stack traces on open-mpi / IMB1 runs. While the job is running, I do run this, but get an error: [ompi@hpc-hn1 ~]$ padb --show-jobs --config-option rmgr=orte 65427 [ompi@hpc-hn1 ~]$ padb --all --proc-summary --config-option rmgr=orte Warning, failed to

[OMPI users] delivering SIGUSR2 to an ompi process

2010-08-25 Thread Steve Wise
Hey Open MPI wizards, I'm trying to debug something in my library that gets loaded into my mpi processes when they are started via mpirun. With other MPIs, I've been able to deliver SIGUSR2 to the process and trigger some debug code I have in my library that sets up a handler for SIGUSR2.

[OMPI users] newbie question

2007-05-10 Thread Steve Wise
I'm trying to run a job specifically over tcp and the eth1 interface. It seems to be barfing on trying to listen via ipv6. I don't want ipv6. How can I disable it? Here's my mpirun line: [root@vic12-10g ~]# mpirun --n 2 --host vic12,vic20 --mca btl self,tcp -mca btl_tcp_if_include eth1

Re: [OMPI users] newbie question

2007-05-10 Thread Steve Wise
On Thu, 2007-05-10 at 20:07 -0400, Jeff Squyres wrote: > Brian -- > > Didn't you add something to fix exactly this problem recently? I > have a dim recollection of seeing a commit go by about this...? > > (I advised Steve in IM to use --disable-ipv6 in the meantime) > Yes, disabling it

Re: [OMPI users] TCP Latency

2008-08-17 Thread Steve Wise
With OpenMPI 1.3 / iWARP you should get around 8us latency using mpi pingpong tests. Andy Georgi wrote: Thanks again for all the answers. It seems that there was a bug in the driver in combination with Suse Linux Enterprise Server 10. It was fixed with version 1.0.146. Now we have 12us with

Re: [OMPI users] TCP Bandwidth

2008-08-17 Thread Steve Wise
Andy Georgi wrote: Hello again ;), after getting acceptable latency on our Chelsio S320E-CXA adapters we now want to check if we can also tune the bandwidth. On TCP level (measured via iperf) we get 1.15 GB/s, on MPI level (measured via MPI-Ping-Pong) just 930 MB/s. We already set

Re: [OMPI users] TCP Bandwidth

2008-08-18 Thread Steve Wise
Jon Mason wrote: On Mon, Aug 18, 2008 at 10:00:24AM +0200, Andy Georgi wrote: Steve Wise wrote: Are you using Chelsio's TOE drivers? Or just a driver from the distro? We use the Chelsio TOE drivers. Steve Wise wrote: Ok. Did you run their perftune.sh script

Re: [OMPI users] TCP Bandwidth

2008-08-18 Thread Steve Wise
Andy Georgi wrote: Steve Wise wrote: Are you using Chelsio's TOE drivers? Or just a driver from the distro? We use the Chelsio TOE drivers. Steve Wise wrote: Ok. Did you run their perftune.sh script? Yes, if not we wouldn't get the 1.15 GB/s on the TCP level. We had ~800 MB/s before

Re: [OMPI users] QP creation failure on iWARP adapter

2016-02-06 Thread Steve Wise
On 2/5/2016 2:38 AM, dpchoudh . wrote: Dear all This is a slightly off-topic post, and hopefully people won't mind helping me out. I have a very simple setup with two PCs, both with identical Chelsio 10GE iWARP adapter connected back-to-back. With this setup, the TCP channel works fine