[OMPI users] OpenMPI 3.0.0 Failing To Compile

2018-02-28 Thread Justin Luitjens
I'm trying to build OpenMPI on Ubuntu 16.04.3 and I'm getting an error. Here is how I configure and build: ./configure --with-cuda=$CUDA_HOME --prefix=$MPI_HOME && make clean && make -j && make install Here is the error I see: make[2]: Entering directory

[OMPI users] Crash in libopen-pal.so

2017-06-19 Thread Justin Luitjens
of what I could try to work around this issue? Thanks, Justin --- This email message is for the sole use of the intended recipient(s) and may contain confidential information. Any unauthorized review, use

Re: [OMPI users] Problem building OpenMPI with CUDA 8.0

2016-10-18 Thread Justin Luitjens
I'd suggest updating the configure/make scripts to look for nvml there and link in the stubs. This way the build depends only on the toolkit, not on the driver being installed. Thanks, Justin From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Justin Luitjens Sent: Tuesday

[OMPI users] Problem building OpenMPI with CUDA 8.0

2016-10-18 Thread Justin Luitjens
to change to get around this error? Thanks, Justin

Re: [OMPI users] Strange errors when running mpirun

2016-09-30 Thread Justin Chang
Thank you, using the default $TMPDIR works now. On Fri, Sep 30, 2016 at 7:32 AM, Gilles Gouaillardet < gilles.gouaillar...@gmail.com> wrote: > Justin and all, > > the root cause is indeed a bug i fixed in > https://github.com/open-mpi/ompi/pull/2135 > i also had this patch

Re: [OMPI users] Strange errors when running mpirun

2016-09-22 Thread Justin Chang
Oh, so setting this in my ~/.profile export TMPDIR=/tmp in fact solves my problem completely! Not sure why this is the case, but thanks! Justin On Thu, Sep 22, 2016 at 7:33 AM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote: > Justin, > > i do not see this err

Re: [OMPI users] Strange errors when running mpirun

2016-09-22 Thread Justin Chang
I tried that and also deleted everything inside $TMPDIR. The error still persists On Thu, Sep 22, 2016 at 4:21 AM, r...@open-mpi.org <r...@open-mpi.org> wrote: > Try removing the “pmix” entries as well > >> On Sep 22, 2016, at 2:19 AM, Justin Chang <jychan...@gmail.com> w

Re: [OMPI users] Strange errors when running mpirun

2016-09-22 Thread Justin Chang
prior to that error indicates that you have some cruft > sitting in your tmpdir. You just need to clean it out - look for something > that starts with “openmpi” > > >> On Sep 22, 2016, at 1:45 AM, Justin Chang <jychan...@gmail.com> wrote: >> >> Dear all, >>

[OMPI users] Strange errors when running mpirun

2016-09-22 Thread Justin Chang
arwin15.6.0 Thread model: posix InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin I tested Hello World with both mpicc and mpif90, and they still work despite showing those two error/warning messages. Thanks, Justin

Re: [OMPI users] Ssh launch code

2016-07-14 Thread Justin Cinkelj
Fork call location: https://github.com/open-mpi/ompi-release/blob/v2.x/orte/mca/plm/rsh/plm_rsh_module.c#L911-921 BR Justin On 07/14/2016 03:12 PM, larkym wrote: Where in the code does the tree based launch via ssh occur in open-mpi? I have read a few articles, but would like to understand

Re: [OMPI users] CUDA IPC/RDMA Not Working

2016-03-30 Thread Justin Luitjens
We have figured this out. It turns out that the first call to each MPI_Isend/Irecv is staged through the host but subsequent calls are not. Thanks, Justin From: Justin Luitjens Sent: Wednesday, March 30, 2016 9:37 AM To: us...@open-mpi.org Subject: CUDA IPC/RDMA Not Working Hello, I have

[OMPI users] CUDA IPC/RDMA Not Working

2016-03-30 Thread Justin Luitjens
MCA topo: basic (MCA v2.0.0, API v2.1.0, Component v1.10.2) MCA vprotocol: pessimist (MCA v2.0.0, API v2.0.0, Component v1.10.2) Thanks, Justin

[OMPI users] [PATCH] hooks: disable malloc override inside of Gentoo sandbox

2013-07-02 Thread Justin Bronder
if (getenv("FAKEROOTKEY") != NULL || -getenv("FAKED_MODE") != NULL) { +getenv("FAKED_MODE") != NULL || +getenv("SANDBOX_PID") != NULL ) { return; } -- 1.8.1.5 -- Justin Bronder

[OMPI users] Cluster hangs/shows error while executing simple MPI program in C

2013-03-05 Thread Justin Joseph
Cluster hangs/shows error while executing simple MPI program in C I am trying to run a simple MPI program (multiple array addition). It runs perfectly on my PC but simply hangs or shows the following error on the cluster. I am using Open MPI and the following command to execute: mpirun

Re: [OMPI users] Seg fault with PBS Pro 10.4

2011-07-27 Thread Justin Wood
don't get any segfaults. -Justin. On 07/26/2011 05:49 PM, Ralph Castain wrote: I don't believe we ever got anywhere with this due to lack of response. If you get some info on what happened to tm_init, please pass it along. Best guess: something changed in a recent PBS Pro release. Since none of us

[OMPI users] Seg fault with PBS Pro 10.4

2011-07-26 Thread Wood, Justin Contractor, SAIC
them look at why it was failing to do the tm_init. Does anyone have an update to this, and has anyone been able to run successfully using recent versions of PBSPro? I've also contacted our rep at Altair, but he hasn't responded yet. Thanks, Justin. Justin Wood Systems Engineer FNMOC | SAIC 7

[OMPI users] Problem with private variables in modules

2010-03-10 Thread Justin Watson
within the context of a module as well? I have been getting different results using different compilers. I have tried Lahey and Intel and they both show signs of not handling this properly. I have attached a small test problem that mimics what I am doing in the large code. Justin K

[OMPI users] building OpenMPI on Windows XP 64 using Visual Studio 6 and Compaq Visual Fortran

2010-01-28 Thread Justin Watson
/win32/CMakeModules/setup_f77.cmake:26 (OMPI_F77_FIND_EXT_SYMBOL_CONVENTION) contrib/platform/win32/CMakeModules/ompi_configure.cmake:1113 (INCLUDE) CMakeLists.txt:87 (INCLUDE) Configuring incomplete, errors occurred! Has anyone had success in building with a similar configuration? Justin K. Watson

Re: [OMPI users] Wrappers should put include path *after* user args

2010-01-19 Thread Justin Bronder
OpenMPI: jbronder@mejis ~ $ which mpicc /usr/lib64/mpi/mpi-openmpi/usr/bin/mpicc jbronder@mejis ~ $ mpicc -showme:compile -I/bleh -I/usr/lib64/mpi/mpi-openmpi/usr/include/openmpi -pthread -I/bleh Thanks, -- Justin Bronder

Re: [OMPI users] MPI-Send for entire entire matrix when allocating memory dynamically

2009-10-31 Thread Justin Luitjens
use >> MPI_Type_create_struct to create an MPI datatype ( >> http://web.mit.edu/course/13/13.715/OldFiles/build/mpich2-1.0.6p1/www/www3/MPI_Type_create_struct.html >> ) >> using MPI_BOTTOM as the original displacement. >> >> On Oct 29, 2009, at 15:31 , Justin Luitje

Re: [OMPI users] MPI-Send for entire entire matrix when allocating memory dynamically

2009-10-29 Thread Justin Luitjens
Why not do something like this: double **A=new double*[N]; double *A_data = new double [N*N]; for(int i=0;i

Re: [OMPI users] Segfault when using valgrind

2009-07-09 Thread Justin Luitjens
. Thanks, Justin On Thu, Jul 9, 2009 at 5:16 AM, Jeff Squyres <jsquy...@cisco.com> wrote: > On Jul 7, 2009, at 11:47 AM, Justin wrote: > > (Sorry if this is posted twice, I sent the same email yesterday but it >> never appeared on the list). >> >> > Sorry for

[OMPI users] Segfault when using valgrind

2009-07-07 Thread Justin
by 0x834F418: Uintah::AMRSimulationController::run() (AMRSimulationController.cc:117) ==22736== by 0x4089AE: main (sus.cc:629) Are these problems with openmpi, and are there any known workarounds? Thanks, Justin

[OMPI users] Segfault when using valgrind

2009-07-06 Thread Justin Luitjens
ionController.cc:243) ==22736== by 0x834F418: Uintah::AMRSimulationController::run() (AMRSimulationController.cc:117) ==22736== by 0x4089AE: main (sus.cc:629) Are these problems with openmpi, and are there any known workarounds? Thanks, Justin

Re: [OMPI users] MPI_Test without deallocation

2009-03-25 Thread Justin
that there is no message waiting to be received? The message has already been received by the MPI_Irecv. It's the MPI_Request object of the MPI_Irecv call that needs to be probed, but MPI_Test has the side effect of also deallocating the MPI_Request object. Cheers, Shaun Justin wrote: Have you

Re: [OMPI users] MPI_Test without deallocation

2009-03-25 Thread Justin
Have you tried MPI_Probe? Justin Shaun Jackman wrote: Is there a function similar to MPI_Test that doesn't deallocate the MPI_Request object? I would like to test if a message has been received (MPI_Irecv), check its tag, and dispatch the MPI_Request to another function based on that tag

Re: [OMPI users] Run-time problem

2009-03-16 Thread justin oppenheim
> List-Post: users@lists.open-mpi.org Date: Saturday, March 14, 2009, 9:15 AM Sorry for the delay in replying; this week unexpectedly turned exceptionally hectic for several of us... On Mar 9, 2009, at 2:53 PM, justin oppenheim wrote: > Yes. As I indicated earlier, I did use these options

Re: [OMPI users] Run-time problem

2009-03-09 Thread justin oppenheim
ph On Mar 6, 2009, at 11:02 AM, justin oppenheim wrote: Please let me go over it again, and maybe it helps clarify things a bit better. All the OS involved are Suse 10.3. I have a place for the installed programs, say /programs. In /programs I have installed openmpi and my mpi program, say my

Re: [OMPI users] Run-time problem

2009-03-06 Thread justin oppenheim
/bin/mpicc MPI_INCLUDE=/programs/openmpi/include/ MPI_LIB=mpi MPI_LIBDIR=/programs/openmpi/lib/ MPI_LINKERFORPROGRAMS=/programs/openmpi/bin/mpicxx Any clue? The directory /programs is NFS mounted on the nodes. Many thanks again, JO --- On Thu, 3/5/09, justin oppenheim <j

[OMPI users] Run-time problem

2009-03-05 Thread justin oppenheim
Hi: When I execute something like mpirun -machinefile machinefile my_mpi_executable I get something like this my_mpi_executable symbol lookup error: remote_openmpi/lib/libmpi_cxx.so.0: undefined symbol: ompi_registered_datareps where both my_mpi_executable and remote_openmpi are installed

Re: [OMPI users] valgrind problems

2009-02-26 Thread Justin
Also the stable version of openmpi on Debian is 1.2.7rc2. Are there any known issues with this version and valgrind? Thanks, Justin Justin wrote: Are there any tricks to getting it to work? When we run with valgrind we get segfaults, valgrind reports errors in different MPI functions

Re: [OMPI users] valgrind problems

2009-02-26 Thread Justin
double&) (SimulationController.cc:352) ==3629== by 0x89A8568: Uintah::AMRSimulationController::run() (AMRSimulationController.cc:126) ==3629== by 0x408B9F: main (sus.cc:622) This is then followed by a segfault. Justin Jeff Squyres wrote: On Feb 26, 2009, at 7:03 PM, Justin wrote: I'm trying to

Re: [OMPI users] MPI_Send over 2 GB

2009-02-18 Thread Justin
My guess would be that your count argument is overflowing. Is the count a signed 32-bit integer? If so it will overflow around 2GB. Try outputting the size that you are sending and see if you get a large negative number. Justin Vittorio wrote: Hi! I'm doing a test to measure the transfer

Re: [OMPI users] Deadlock on large numbers of processors

2009-01-12 Thread Justin
to update it but it would be a lot easier to request an actual release. What is the current schedule for the 1.3 release? Justin Jeff Squyres wrote: Justin -- Could you actually give your code a whirl with 1.3rc3 to ensure that it fixes the problem for you? http://www.open-mpi.org

Re: [OMPI users] Deadlock on large numbers of processors

2009-01-12 Thread Justin
Hi, has this deadlock been fixed in the 1.3 source yet? Thanks, Justin Jeff Squyres wrote: On Dec 11, 2008, at 5:30 PM, Justin wrote: The more I look at this bug the more I'm convinced it is with openMPI and not our code. Here is why: Our code generates a communication/execution

Re: [OMPI users] Deadlock on large numbers of processors

2008-12-11 Thread Justin
? Thanks, Justin Jeff Squyres wrote: George -- Is this the same issue that you're working on? (we have a "blocker" bug for v1.3 about deadlock at heavy messaging volume -- on Tuesday, it looked like a bug in our freelist...) On Dec 9, 2008, at 10:28 AM, Justin wrote: I have tried

Re: [OMPI users] Deadlock on large numbers of processors

2008-12-09 Thread Justin
that might alleviate these deadlocks I would be grateful. Thanks, Justin Rolf Vandevaart wrote: The current version of Open MPI installed on ranger is 1.3a1r19685 which is from early October. This version has a fix for ticket #1378. Ticket #1449 is not an issue in this case because each node

Re: [OMPI users] Deadlock on large numbers of processors

2008-12-05 Thread Justin
reproducible. In addition we might be able to lower the number of processors. Right now determining which processor is deadlocked when we are using 8K cores and each processor has hundreds of messages sent out would be quite difficult. Thanks for your suggestions, Justin Brock Palen wrote

Re: [OMPI users] Deadlock on large numbers of processors

2008-12-05 Thread Justin
that will turn off buffering? Thanks, Justin Brock Palen wrote: Whenever this happens we found the code to have a deadlock. Users never saw it until they crossed the eager->rendezvous threshold. Yes you can disable shared memory with: mpirun --mca btl ^sm Or you can try increasing the eager li

[OMPI users] Deadlock on large numbers of processors

2008-12-05 Thread Justin
in ompi_request_default_wait_some () from /opt/apps/intel10_1/openmpi/1.3//lib/libmpi.so.0 #4 0x2b2ded109e34 in PMPI_Waitsome () from /opt/apps/intel10_1/openmpi/1.3//lib/libmpi.so.0 Thanks, Justin

[OMPI users] Open-MPI 1.2 and GM

2007-03-27 Thread Justin Bronder
owing fails: /usr/local/ompi-gnu/bin/mpirun -np 4 -mca btl gm --host node84,node83 ./xhpl I've attached gzipped files as suggested in the "Getting Help" section of the website and the output from the failed mpirun. Both nodes are known good Myrinet nodes, using FMA to map. Thanks

Re: [OMPI users] how do i link to .la library files?

2006-10-27 Thread Justin Bronder
If you just add this to your .bashrc you should be fine. The other option, assuming root access, is to just add the lib directory to /etc/ld.so.conf and rerun ldconfig on all machines. This will have the same effect, albeit for all users. -Justin. On 10/27/06, shane kennedy <kennedy_sh...@yahoo

Re: [OMPI users] problem abut openmpi running

2006-10-19 Thread Justin Bronder
On a number of my Linux machines, /usr/local/lib is not searched by ldconfig, and hence, is not going to be found by gcc. You can fix this by adding /usr/local/lib to /etc/ld.so.conf and running ldconfig ( add the -v flag if you want to see the output ). -Justin. On 10/19/06, Durga Choudhury

Re: [OMPI users] Problem with Openmpi 1.1

2006-07-08 Thread Justin Bronder
the build with the standard gcc compilers that are included with OS X. This is powerpc-apple-darwin8-gcc-4.0.1. Thanks, Justin. Jeff Squyres (jsquyres) wrote: > Justin -- > > Can we eliminate some variables so that we can figure out where the > error is originating? > > - Can

Re: [OMPI users] Problem with Openmpi 1.1

2006-07-06 Thread Justin Bronder
yrinet (GM)? If so, I'd love to hear the configure arguments and various versions you are using. Bonus points if you are using the IBM XL compilers. Thanks, Justin. On 7/6/06, Justin Bronder <jsbron...@gmail.com> wrote: Yes, that output was actually cut and pasted from an OS X run. I

Re: [OMPI users] Problem with Openmpi 1.1

2006-07-06 Thread Justin Bronder
Yes, that output was actually cut and pasted from an OS X run. I'm about to test against 1.0.3a1r10670. Justin. On 7/6/06, Galen M. Shipman <gship...@lanl.gov> wrote: Justin, Is the OS X run showing the same residual failure? - Galen On Jul 6, 2006, at 10:49 AM, Justin Bronder

Re: [OMPI users] Problem with Openmpi 1.1

2006-07-06 Thread Justin Bronder
Disregard the failure on Linux, a rebuild from scratch of HPL and OpenMPI seems to have resolved the issue. At least I'm not getting the errors during the residual checks. However, this is persisting under OS X. Thanks, Justin. On 7/6/06, Justin Bronder <jsbron...@gmail.com> wrote: Fo

Re: [OMPI users] Problem with Openmpi 1.1

2006-07-06 Thread Justin Bronder
As far as the nightly builds go, I'm still seeing what I believe to be this problem in both r10670 and r10652. This is happening with both Linux and OS X. Below are the systems and ompi_info for the newest revision 10670. As an example of the error, when running HPL with Myrinet I get the

Re: [OMPI users] OpenMpi 1.1 and Torque 2.1.1

2006-06-30 Thread Justin Bronder
know. Thanks, Justin Bronder. On 6/30/06, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote: There was a bug in early Torque 2.1.x versions (I'm afraid I don't remember which one) that -- I think -- had something to do with a faulty poll() implementation. Whatever the problem was, it

[OMPI users] OpenMpi 1.1 and Torque 2.1.1

2006-06-29 Thread Justin Bronder
sr/src/openmpi-1.1 jbronder$ My thanks for any help in advance, Justin Bronder.

Re: [OMPI users] [PMX:VIRUS] Re: OpenMPI 1.0.3a1r10002 Fails to build with IBM XL Compilers.

2006-05-31 Thread Justin Bronder
l/.libs/libopal.so ../../../opal/.libs/libopal.so -ldl -lm -lutil -lnsl --rpath /usr/local/ompi-xl/lib -lpthread ld: warning: cannot find entry symbol _start; defaulting to 10013ed8 Of course, I've been told that directly linking with ld isn't such a great idea in the first place. Ideas? Thanks, Justin.

Re: [OMPI users] [PMX:VIRUS] Re: OpenMPI 1.0.3a1r10002 Fails to build with IBM XL Compilers.

2006-05-31 Thread Justin Bronder
On 5/30/06, Brian Barrett <brbar...@open-mpi.org> wrote: On May 28, 2006, at 8:48 AM, Justin Bronder wrote: > Brian Barrett wrote: >> On May 27, 2006, at 10:01 AM, Justin Bronder wrote: >> >> >>> I've attached the required logs. Essentially the problem s

[OMPI users] [PMX:VIRUS] Re: OpenMPI 1.0.3a1r10002 Fails to build with IBM XL Compilers.

2006-05-28 Thread Justin Bronder
Brian Barrett wrote: > On May 27, 2006, at 10:01 AM, Justin Bronder wrote: > > >> I've attached the required logs. Essentially the problem seems to >> be that the XL Compilers fail to recognize "__asm__ __volatile__" in >> opal/include/sys/powerpc/atom