You can explicitly specify the type of buffering
that you want with the setvbuf() C function.
A stream can be fully (block) buffered, line-buffered, or unbuffered.
stdout is line-buffered by default when it is connected to a terminal,
and fully buffered otherwise.
To make it unbuffered, you need something like this:
setvbuf(stdout, NULL, _IONBF, 0);
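A minimal sketch showing where the call goes (the printf lines are just
placeholders for your application's output):

#include <stdio.h>

int main(void)
{
    /* Disable buffering before anything is written to stdout. */
    setvbuf(stdout, NULL, _IONBF, 0);

    /* Every printf now reaches the OS immediately, even without '\n'. */
    printf("progress: %d%%", 50);
    printf(" ... done\n");
    return 0;
}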
-- YK
On 30-Mar-11
Michael,
Could you try to run this again with the "--mca mpi_leave_pinned 0" parameter?
I suspect that this might be due to a message size problem - MPI
tries to do RDMA with a message bigger than what the HCA supports.
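For example (a sketch - the process count, host list and "./app" binary
are placeholders):

mpirun --mca mpi_leave_pinned 0 -np 2 -host node1,node2 ./app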
-- YK
On 11-Apr-11 7:44 PM, Michael Di Domenico wrote:
> Here's a chunk of code
Gretchen,
Could you please send a stack trace of the processes when it hangs
(with padb/gdb)?
Does the same problem persist at small scale (2-3 nodes)?
What is the minimal setup that reproduces the problem?
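If it helps, gdb can dump all thread stacks of a hung process
non-interactively (<PID> is a placeholder for the rank's process ID on
that node):

gdb -batch -ex "thread apply all bt" -p <PID>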
-- YK
>
> -- Forwarded message --
> From: *Gretchen*
Hi Bill,
On 08-Jul-11 7:59 PM, Bill Johnstone wrote:
> Hello, and thanks for the reply.
>
>
>
> - Original Message -
>> From: Jeff Squyres
>> Sent: Thursday, July 7, 2011 5:14 PM
>> Subject: Re: [OMPI users] InfiniBand, different OpenFabrics transport types
>>
>> On
Hi Yiguang,
On 08-Jul-11 4:38 PM, ya...@adina.com wrote:
> Hi all,
>
> The message says :
>
> [[17549,1],0][btl_openib_component.c:3224:handle_wc] from
> gulftown to: gulftown error polling LP CQ with status LOCAL
> LENGTH ERROR status number 1 for wr_id 492359816 opcode
> 32767 vendor error
On 11-Jul-11 5:23 PM, Bill Johnstone wrote:
> Hi Yevgeny and list,
>
> - Original Message -
>
>> From: Yevgeny Kliteynik<klit...@dev.mellanox.co.il>
>
>> I'll check the MCA_BTL_OPENIB_TRANSPORT_UNKNOWN thing and get back to you.
Hi,
Please try running OMPI with XRC:
mpirun --mca btl openib... --mca btl_openib_receive_queues
X,128,256,192,128:X,2048,256,128,32:X,12288,256,128,32:X,65536,256,128,32 ...
XRC (eXtended Reliable Connection) decreases the memory consumption
of Open MPI by decreasing the number of QPs per machine.
> Application Performance Tools Group
> Computer Science and Math Division
> Oak Ridge National Laboratory
>
>
>
>
>
>
> On Aug 1, 2011, at 11:41 AM, Yevgeny Kliteynik wrote:
>
>> Hi,
>>
>> Please try running OMPI with XRC:
>>
>> m
On 30-Aug-11 4:50 PM, Michael Shuey wrote:
> I'm using RoCE (or rather, attempting to) and need to select a
> non-default GID to get my traffic properly classified.
You probably saw it, but just making sure:
http://www.open-mpi.org/faq/?category=openfabrics#ompi-over-roce
> Both 1.4.4rc2
> and
This means that you have some problem on that node,
and it's probably unrelated to Open MPI.
Bad cable? Bad port? FW/driver in some bad state?
Do other IB performance tests work OK on this node?
Try rebooting the node.
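For instance, a quick point-to-point check with ibv_rc_pingpong
(the host name is a placeholder):

on the suspect node:   ibv_rc_pingpong
on any other node:     ibv_rc_pingpong <suspect-node>

ibv_devinfo and ibstat on that node are also worth a look - the port
state should be Active (PORT_ACTIVE).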
-- YK
On 12-Sep-11 7:52 AM, Ahsan Ali wrote:
> Hello all
>
> I am getting
On 14-Sep-11 12:59 PM, Jeff Squyres wrote:
> On Sep 13, 2011, at 6:33 PM, kevin.buck...@ecs.vuw.ac.nz wrote:
>
>> there have been two runs of jobs that invoked the mpirun using these
>> OpenMPI parameter setting flags (basically, these mimic what I have
>> in the global config file)
>>
>> -mca
Hi Sébastien,
If I understand you correctly, you are running your application on two
different MPIs on two different clusters with two different IB vendors.
Could you make the comparison more "apples to apples"-ish?
For instance:
- run the same version of Open MPI on both clusters
- run the same
On 22-Sep-11 12:09 AM, Jeff Squyres wrote:
> On Sep 21, 2011, at 4:24 PM, Sébastien Boisvert wrote:
>
>>> What happens if you run 2 ibv_rc_pingpong's on each node? Or N
>>> ibv_rc_pingpongs?
>>
>> With 11 ibv_rc_pingpong's
>>
>> http://pastebin.com/85sPcA47
>>
>> Code to do that =>
On 26-Sep-11 11:27 AM, Yevgeny Kliteynik wrote:
> On 22-Sep-11 12:09 AM, Jeff Squyres wrote:
>> On Sep 21, 2011, at 4:24 PM, Sébastien Boisvert wrote:
>>
>>>> What happens if you run 2 ibv_rc_pingpong's on each node? Or N
>>>> ibv_rc_pingpongs?
>>
Jeff,
On 01-Oct-11 1:01 AM, Konz, Jeffrey (SSA Solution Centers) wrote:
> Encountered a problem when trying to run OpenMPI 1.5.4 with RoCE over 10GbE
> fabric.
>
> Got this run time error:
>
> An invalid CPC name was specified via the btl_openib_cpc_include MCA
> parameter.
>
>Local host:
On 05-Oct-11 3:15 PM, Jeff Squyres wrote:
>> You shouldn't use the "--enable-openib-rdmacm" option - rdmacm
>> support is enabled by default, providing librdmacm is found on
>> the machine.
>
> Actually, this might be a configure bug. We have lots of other configure
> options that, even if
On 05-Oct-11 3:41 PM, Jeff Squyres wrote:
> On Oct 5, 2011, at 9:35 AM, Yevgeny Kliteynik wrote:
>
>>> Yevgeny -- can you check that out?
>>
>> Yep, indeed - configure doesn't abort when "--enable-openib-rdmacm"
>> is provided and "rdma/rdma_cma.
Hi,
> By any chance is it a particular node (or pair of nodes) this seems to
> happen with?
No. I've got 40 nodes total with this hardware configuration, and the
problem has been seen on most/all nodes at one time or another. It
doesn't seem, based on the limited
Hi,
Does OMPI with IMB work OK on the official OFED release?
Do the usual ibv performance tests (ibv_rc_*) work on your customized OFED?
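For example, a basic IMB point-to-point run would look like this
(a sketch - host names and the path to the IMB binary are placeholders):

mpirun -np 2 -host node1,node2 ./IMB-MPI1 PingPong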
-- YK
On 29-Dec-11 9:34 AM, Venkateswara Rao Dokku wrote:
> Hi,
> We tried running the Intel Benchmarks(IMB_3.2) on the customized
> OFED(that was build
On 13-Jan-12 12:23 AM, Nathan Hjelm wrote:
> I would start by adjusting btl_openib_receive_queues . The default uses
> a per-peer QP which can eat up a lot of memory. I recommend using no
> per-peer and several shared receive queues.
> We use S,4096,1024:S,12288,512:S,65536,512
And here's the FAQ
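For reference, the suggested queue specification would be passed like
this (a sketch - process count, hosts and the "./app" binary are
placeholders):

mpirun --mca btl_openib_receive_queues S,4096,1024:S,12288,512:S,65536,512 \
       -np 2 -host node1,node2 ./app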
On 24-Jan-12 5:59 PM, Ronald Heerema wrote:
> I was wondering if anyone can comment on the current state of support for the
> openib btl when MPI_THREAD_MULTIPLE is enabled.
Short version - it's not supported.
Longer version - no one has really spent time testing it and fixing all
the places
Hi,
I just noticed that my previous mail bounced,
but it doesn't matter. Please ignore it if
you got it anyway - I re-read the thread and
there is a much simpler way to do it.
If you want to check whether LID L is reachable
through HCA H from port P, you can run this command:
smpquery --Ca H
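A complete invocation might look like this (assuming the standard
infiniband-diags options, where H, P and L stand for the HCA name, the
local port number and the target LID):

smpquery --Ca H --Port P nodeinfo L

If the NodeInfo query gets an answer, LID L is reachable from that port.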
Randolph,
Some clarification on the setup:
"Melanox III HCA 10G cards" - are those ConnectX 3 cards configured to Ethernet?
That is, when you're using openib BTL, you mean RoCE, right?
Also, have you had a chance to try some newer OMPI release?
Any 1.6.x would do.
-- YK
On 8/31/2012 10:53
On 8/30/2012 10:28 PM, Yong Qin wrote:
> On Thu, Aug 30, 2012 at 5:12 AM, Jeff Squyres wrote:
>> On Aug 29, 2012, at 2:25 PM, Yong Qin wrote:
>>
>>> This issue has been observed on OMPI 1.6 and 1.6.1 with openib btl but
>>> not on 1.4.5 (tcp btl is always fine). The
On 9/4/2012 7:21 PM, Yong Qin wrote:
> On Tue, Sep 4, 2012 at 5:42 AM, Yevgeny Kliteynik
> <klit...@dev.mellanox.co.il> wrote:
>> On 8/30/2012 10:28 PM, Yong Qin wrote:
>>> On Thu, Aug 30, 2012 at 5:12 AM, Jeff Squyres<jsquy...@cisco.com> wrote:
>>>>
------
Randolph,
On 9/7/2012 7:43 AM, Randolph Pullen wrote:
> Yevgeny,
> The ibstat results:
> CA 'mthca0'
> CA type: MT25208 (MT23108 compat mode)
What you have is an InfiniHost III HCA, which is a 4x SDR card.
This card has a theoretical signaling rate of 10 Gb/s; after the 8b/10b
IB bit coding that leaves 8 Gb/s of data, i.e. about 1 GB/s.
> And more
-
> *Fr
On 11/28/2012 10:52 AM, Pavel Mezentsev wrote:
> You can try downloading and installing a fresher version of MXM from mellanox
> web site. There was a thread on the list with the same problem, you can
> search for it.
Indeed, that OFED version comes with an older version of MXM.
You can get the
Joseph,
On 11/29/2012 11:50 PM, Joseph Farran wrote:
> make[2]: Entering directory
> `/data/apps/sources/openmpi-1.6.3/ompi/mca/mtl/mxm'
> CC mtl_mxm.lo
> CC mtl_mxm_cancel.lo
> CC mtl_mxm_component.lo
> CC mtl_mxm_endpoint.lo
> CC mtl_mxm_probe.lo
> CC mtl_mxm_recv.lo
> CC mtl_mxm_send.lo
>
On 11/30/2012 12:47 AM, Joseph Farran wrote:
> I'll assume: /etc/modprobe.d/mlx4_en.conf
Add these to /etc/modprobe.d/mofed.conf:
options mlx4_core log_num_mtt=24
options mlx4_core log_mtts_per_seg=1
And then restart the driver.
You need to do it on all the machines.
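For example, with Mellanox OFED the driver restart is typically done with
the openibd init script (assuming it is installed):

/etc/init.d/openibd restart

With log_num_mtt=24 and log_mtts_per_seg=1 the registerable memory is
roughly 2^24 * 2^1 * 4 KB = 128 GB (assuming 4 KB pages), which should
cover the physical RAM on most nodes.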
-- YK
>
> On 11/29/2012
You can also set these parameters in /etc/modprobe.conf:
options mlx4_core log_num_mtt=24 log_mtts_per_seg=1
-- YK
On 11/30/2012 2:12 AM, Yevgeny Kliteynik wrote:
> On 11/30/2012 12:47 AM, Joseph Farran wrote:
>> I'll assume: /etc/modprobe.d/mlx4_en.conf
>
> Add these to
Joseph,
Indeed, there was a problem in the MXM rpm.
The fixed MXM has been published at the same location:
http://mellanox.com/downloads/hpc/mxm/v1.1/mxm-latest.tar
-- YK
On 12/4/2012 9:20 AM, Joseph Farran wrote:
> Hi Mike.
>
> Removed the old mxm, downloaded and installed:
>
>