by 0x834F418: Uintah::AMRSimulationController::run()
(AMRSimulationController.cc:117)
==22736==by 0x4089AE: main (sus.cc:629)
Are these problems with Open MPI, and are there any known workarounds?
Thanks,
Justin
in ompi_request_default_wait_some () from
/opt/apps/intel10_1/openmpi/1.3//lib/libmpi.so.0
#4 0x2b2ded109e34 in PMPI_Waitsome () from
/opt/apps/intel10_1/openmpi/1.3//lib/libmpi.so.0
Thanks,
Justin
that will turn off buffering?
Thanks,
Justin
Brock Palen wrote:
Whenever this happens, we have found the code to have a deadlock. Users
never saw it until they crossed the eager->rendezvous threshold.
Yes you can disable shared memory with:
mpirun --mca btl ^sm
Or you can try increasing the eager limit.
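For reference, a couple of command-line sketches (the parameter name `btl_sm_eager_limit` is the shared-memory eager limit in the 1.x series; confirm what your install supports with `ompi_info --param btl sm`):

```shell
# Disable the shared-memory BTL entirely:
mpirun --mca btl ^sm -np 16 ./my_app

# Or raise the shared-memory eager limit (value in bytes):
mpirun --mca btl_sm_eager_limit 65536 -np 16 ./my_app
```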
reproducible. In addition, we might be able to lower the
number of processors. Right now, determining which processor
deadlocks when we are using 8K cores and each processor has hundreds of
messages outstanding would be quite difficult.
Thanks for your suggestions,
Justin
Brock Palen wrote
that might alleviate these deadlocks I would be
grateful.
Thanks,
Justin
Rolf Vandevaart wrote:
The current version of Open MPI installed on ranger is 1.3a1r19685
which is from early October. This version has a fix for ticket
#1378. Ticket #1449 is not an issue in this case because each node
?
Thanks,
Justin
Jeff Squyres wrote:
George --
Is this the same issue that you're working on?
(we have a "blocker" bug for v1.3 about deadlock at heavy messaging
volume -- on Tuesday, it looked like a bug in our freelist...)
On Dec 9, 2008, at 10:28 AM, Justin wrote:
I have tried
Hi, has this deadlock been fixed in the 1.3 source yet?
Thanks,
Justin
Jeff Squyres wrote:
On Dec 11, 2008, at 5:30 PM, Justin wrote:
The more I look at this bug, the more I'm convinced it is with Open MPI
and not our code. Here is why: Our code generates a
communication/execution
to update it but it would be a lot easier to request an actual release.
What is the current schedule for the 1.3 release?
Justin
Jeff Squyres wrote:
Justin --
Could you actually give your code a whirl with 1.3rc3 to ensure that
it fixes the problem for you?
http://www.open-mpi.org
My guess would be that your count argument is overflowing. Is the count
a signed 32-bit integer? If so, it will overflow around 2 GB. Try
outputting the size that you are sending and see if you get a large
negative number.
Justin
Vittorio wrote:
Hi! I'm doing a test to measure the transfer
double&) (SimulationController.cc:352)
==3629==by 0x89A8568: Uintah::AMRSimulationController::run()
(AMRSimulationController.cc:126)
==3629==by 0x408B9F: main (sus.cc:622)
This is then followed by a segfault.
Justin
Jeff Squyres wrote:
On Feb 26, 2009, at 7:03 PM, Justin wrote:
I'm trying to
Also, the stable version of Open MPI on Debian is 1.2.7rc2. Are there any
known issues with this version and valgrind?
Thanks,
Justin
Justin wrote:
Are there any tricks to getting it to work? When we run with valgrind
we get segfaults, valgrind reports errors in different MPI functions
Have you tried MPI_Probe?
Justin
Shaun Jackman wrote:
Is there a function similar to MPI_Test that doesn't deallocate the
MPI_Request object? I would like to test if a message has been
received (MPI_Irecv), check its tag, and dispatch the MPI_Request to
another function based on that tag.
that there is no
message waiting to be received? The message has already been received
by the MPI_Irecv. It's the MPI_Request object of the MPI_Irecv call
that needs to be probed, but MPI_Test has the side effect of also
deallocating the MPI_Request object.
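One API worth a look here is MPI_Request_get_status (MPI-2), which tests a request for completion and fills in a status without deallocating the request. A sketch, not a drop-in fix: the tags and handler functions are hypothetical placeholders for the dispatch logic described above.

```c
#include <mpi.h>

/* Hypothetical tags and handlers -- placeholders for the dispatch logic. */
enum { TAG_WORK = 1, TAG_DONE = 2 };
void handle_work(MPI_Request *req);
void handle_done(MPI_Request *req);

/* Test a pending MPI_Irecv request without freeing it.  Unlike MPI_Test,
 * MPI_Request_get_status leaves the request allocated, so it can still be
 * passed to another function after its tag has been inspected. */
void dispatch(MPI_Request *req)
{
    int flag;
    MPI_Status status;
    MPI_Request_get_status(*req, &flag, &status);
    if (!flag)
        return;                 /* no message received yet */
    switch (status.MPI_TAG) {
    case TAG_WORK: handle_work(req); break;
    case TAG_DONE: handle_done(req); break;
    }
    /* The owner must eventually complete the request, e.g. with MPI_Wait. */
}
```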
Cheers,
Shaun
Justin wrote:
Have you
if (getenv("FAKEROOTKEY") != NULL ||
-getenv("FAKED_MODE") != NULL) {
+getenv("FAKED_MODE") != NULL ||
+getenv("SANDBOX_PID") != NULL ) {
return;
}
--
1.8.1.5
--
Justin Bronder
don't get any segfaults.
-Justin.
On 07/26/2011 05:49 PM, Ralph Castain wrote:
I don't believe we ever got anywhere with this due to lack of response. If you
get some info on what happened to tm_init, please pass it along.
Best guess: something changed in a recent PBS Pro release. Since none of us
Cluster hangs/shows error while executing simple MPI program in C
I am trying to run a simple MPI program (multiple array addition); it runs
perfectly on my PC, but it simply hangs or shows the following error on the
cluster.
I am using Open MPI and the following command to execute:
mpirun
ionController.cc:243)
==22736==by 0x834F418: Uintah::AMRSimulationController::run()
(AMRSimulationController.cc:117)
==22736==by 0x4089AE: main (sus.cc:629)
Are these problems with Open MPI, and are there any known workarounds?
Thanks,
Justin
Why not do something like this:
double **A=new double*[N];
double *A_data = new double[N*N];
for(int i=0;i
use
>> MPI_Type_create_struct to create an MPI datatype (
>> http://web.mit.edu/course/13/13.715/OldFiles/build/mpich2-1.0.6p1/www/www3/MPI_Type_create_struct.html
>> )
>> using MPI_BOTTOM as the original displacement.
>>
>> On Oct 29, 2009, at 15:31 , Justin Luitje
OpenMPI:
jbronder@mejis ~ $ which mpicc
/usr/lib64/mpi/mpi-openmpi/usr/bin/mpicc
jbronder@mejis ~ $ mpicc -showme:compile -I/bleh
-I/usr/lib64/mpi/mpi-openmpi/usr/include/openmpi -pthread -I/bleh
Thanks,
--
Justin Bronder
/win32/CMakeModules/setup_f77.cmake:26
(OMPI_F77_FIND_EXT_SYMBOL_CONVENTION)
contrib/platform/win32/CMakeModules/ompi_configure.cmake:1113 (INCLUDE)
CMakeLists.txt:87 (INCLUDE)
Configuring incomplete, errors occurred!
Has anyone had success in building with a similar configuration?
Justin K. Watson
within the context of a module as well?
I have been getting different result using different compilers.
I have tried Lahey and Intel and they both show signs of not handling this
properly. I have attach a small test problem that mimics what I am doing in
the large code.
Justin K
/bin/mpicc
MPI_INCLUDE=/programs/openmpi/include/
MPI_LIB=mpi
MPI_LIBDIR=/programs/openmpi/lib/
MPI_LINKERFORPROGRAMS=/programs/openmpi/bin/mpicxx
Any clue? The directory /programs is NFS mounted on the nodes.
Many thanks again,
JO
--- On Thu, 3/5/09, justin oppenheim <j
gt;
List-Post: users@lists.open-mpi.org
Date: Saturday, March 14, 2009, 9:15 AM
Sorry for the delay in replying; this week unexpectedly turned exceptionally
hectic for several us...
On Mar 9, 2009, at 2:53 PM, justin oppenheim wrote:
> Yes. As I indicated earlier, I did use these options
MCA topo: basic (MCA v2.0.0, API v2.1.0, Component v1.10.2)
MCA vprotocol: pessimist (MCA v2.0.0, API v2.0.0, Component
v1.10.2)
Thanks,
Justin
---
This email message is for
We have figured this out. It turns out that the first call to each
MPI_Isend/Irecv is staged through the host but subsequent calls are not.
Thanks,
Justin
From: Justin Luitjens
Sent: Wednesday, March 30, 2016 9:37 AM
To: us...@open-mpi.org
Subject: CUDA IPC/RDMA Not Working
Hello,
I have
Fork call location:
https://github.com/open-mpi/ompi-release/blob/v2.x/orte/mca/plm/rsh/plm_rsh_module.c#L911-921
BR Justin
On 07/14/2016 03:12 PM, larkym wrote:
Where in the code does the tree based launch via ssh occur in open-mpi?
I have read a few articles, but would like to understand
Brian Barrett wrote:
> On May 27, 2006, at 10:01 AM, Justin Bronder wrote:
>
>
>> I've attached the required logs. Essentially the problem seems to
>> be that the XL Compilers fail to recognize "__asm__ __volatile__" in
>> opal/include/sys/powerpc/atom
On 5/30/06, Brian Barrett <brbar...@open-mpi.org> wrote:
On May 28, 2006, at 8:48 AM, Justin Bronder wrote:
> Brian Barrett wrote:
>> On May 27, 2006, at 10:01 AM, Justin Bronder wrote:
>>
>>
>>> I've attached the required logs. Essentially the problem s
l/.libs/libopal.so
../../../opal/.libs/libopal.so -ldl -lm -lutil -lnsl --rpath
/usr/local/ompi-xl/lib -lpthread
ld: warning: cannot find entry symbol _start; defaulting to 10013ed8
Of course, I've been told that directly linking with ld isn't such a great
idea in the first
place. Ideas?
Thanks,
Justin.
sr/src/openmpi-1.1 jbronder$
My thanks for any help in advance,
Justin Bronder.
As far as the nightly builds go, I'm still seeing what I believe to be
this problem in both r10670 and r10652. This is happening with
both Linux and OS X. Below are the systems and ompi_info for the
newest revision 10670.
As an example of the error, when running HPL with Myrinet I get the
Disregard the failure on Linux; a rebuild from scratch of HPL and Open MPI
seems to have resolved the issue. At least I'm not getting the errors
during
the residual checks.
However, this is persisting under OS X.
Thanks,
Justin.
On 7/6/06, Justin Bronder <jsbron...@gmail.com> wrote:
Fo
yrinet (GM)? If so, I'd
love to hear
the configure arguments and various versions you are using. Bonus points if
you are
using the IBM XL compilers.
Thanks,
Justin.
On 7/6/06, Justin Bronder <jsbron...@gmail.com> wrote:
Yes, that output was actually cut and pasted from an OS X run. I
the build with the standard gcc compilers that are
included
with OS X. This is powerpc-apple-darwin8-gcc-4.0.1.
Thanks,
Justin.
Jeff Squyres (jsquyres) wrote:
> Justin --
>
> Can we eliminate some variables so that we can figure out where the
> error is originating?
>
> - Can
On a number of my Linux machines, /usr/local/lib is not searched by
ldconfig, and hence, is
not going to be found by gcc. You can fix this by adding /usr/local/lib to
/etc/ld.so.conf and
running ldconfig (add the -v flag if you want to see the output).
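The steps above as a shell sketch (run as root; paths assume the usual glibc layout):

```shell
echo "/usr/local/lib" >> /etc/ld.so.conf  # make the directory known to the loader
ldconfig -v | grep local                  # rebuild the cache; -v prints what was found
```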
-Justin.
On 10/19/06, Durga Choudhury
owing fails:
/usr/local/ompi-gnu/bin/mpirun -np 4 -mca btl gm --host node84,node83 ./xhpl
I've attached gzipped files as suggested in the "Getting Help" section of the
website and the output from the failed mpirun. Both nodes are known good
Myrinet nodes, using FMA to map.
Thanks
arwin15.6.0
Thread model: posix
InstalledDir:
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
I tested Hello World with both mpicc and mpif90, and they still work
despite showing those two error/warning messages.
Thanks,
Justin
prior to that error indicates that you have some cruft
> sitting in your tmpdir. You just need to clean it out - look for something
> that starts with “openmpi”
>
>
>> On Sep 22, 2016, at 1:45 AM, Justin Chang <jychan...@gmail.com> wrote:
>>
>> Dear all,
>>
Oh, so setting this in my ~/.profile
export TMPDIR=/tmp
in fact solves my problem completely! Not sure why this is the case, but thanks!
Justin
On Thu, Sep 22, 2016 at 7:33 AM, Gilles Gouaillardet
<gilles.gouaillar...@gmail.com> wrote:
> Justin,
>
> i do not see this err
Thank you, using the default $TMPDIR works now.
On Fri, Sep 30, 2016 at 7:32 AM, Gilles Gouaillardet <
gilles.gouaillar...@gmail.com> wrote:
> Justin and all,
>
> the root cause is indeed a bug i fixed in
> https://github.com/open-mpi/ompi/pull/2135
> i also had this patch
to change to get around this error?
Thanks,
Justin
I'd suggest updating the configure/make scripts to look for nvml there
and link in the stubs. This way the build does not depend on the driver being
installed, only on the toolkit.
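A hedged sketch of doing that by hand rather than through the build scripts, assuming the toolkit's usual stub location (on typical installs `$CUDA_HOME/lib64/stubs` holds `libnvidia-ml.so`):

```shell
# Point the linker at the NVML stub shipped with the CUDA toolkit,
# so a driver need not be present at build time.
./configure --with-cuda=$CUDA_HOME \
            LDFLAGS="-L$CUDA_HOME/lib64/stubs" \
            LIBS="-lnvidia-ml"
make -j && make install
```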
Thanks,
Justin
From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Justin
Luitjens
Sent: Tuesday
of what I could try to work around this issue?
Thanks,
Justin
I'm trying to build OpenMPI on Ubuntu 16.04.3 and I'm getting an error.
Here is how I configure and build:
./configure --with-cuda=$CUDA_HOME --prefix=$MPI_HOME && make clean && make -j
&& make install
Here is the error I see:
make[2]: Entering directory
them look at why it was failing to do the tm_init. Does anyone have an update
to this, and has anyone been able to run successfully using recent versions of
PBSPro? I've also contacted our rep at Altair, but he hasn't responded yet.
Thanks, Justin.
Justin Wood
Systems Engineer
FNMOC | SAIC
7
That is not guaranteed to work. There is no streaming concept in the MPI
standard. The fundamental issue here is that MPI is only asynchronous on the
completion, not the initiation, of the send/recv.
It would be nice if the next version of MPI added something like a
triggered send