Hi there,
I'm running into an issue where mpirun isn't terminating when my executable
has a nonzero exit status - instead it's hanging indefinitely. I'm
attaching my process tree, the error message from the application, and the
messages printed to stderr. Please let me know what I can do.
What version of Open MPI are you using?
On Sep 21, 2014, at 6:08 AM, Lee-Ping Wang <leep...@stanford.edu> wrote:
Hi there,
I'm running into an issue where mpirun isn't terminating when my executable
has a nonzero exit status - instead it's hanging indefinitely. I'm
attaching my process tree, the error message from the application, and the
messages printed to stderr.
Just to be clear: is your program returning a non-zero status and then
exiting, or is it segfaulting?
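For reference, a plain POSIX-shell way to tell the two cases apart (independent of MPI): a normal nonzero return reports that value to the parent, while death by signal reports 128 plus the signal number, so a segfault (SIGSEGV = 11) shows up as 139.

```shell
# Exit status 1 vs. death-by-SIGSEGV, as seen by the calling shell.
sh -c 'exit 1'; plain=$?
sh -c 'kill -s SEGV $$'; segv=$?
echo "plain=$plain segv=$segv"   # plain=1 segv=139
```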
On Sep 21, 2014, at 8:22 AM, Lee-Ping Wang <leep...@stanford.edu> wrote:
I'm using version 1.8.1.
Thanks,
- Lee-Ping
From: users [mailto:users-boun...@open-mpi.org]
On Sep 21, 2014, at 10:02 AM, Lee-Ping Wang <leep...@stanford.edu> wrote:
My program isn't segfaulting - it's returning a non-zero status and then
exiting.
Thanks,
- Lee-Ping
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Sunday, September 21, 2014
in the 1.8 series, but I can't replicate it now with the nightly 1.8
tarball that is about to be released as 1.8.3.
http://www.open-mpi.org/nightly/v1.8/
On Sep 21, 2014, at 12:25 PM, Lee-Ping Wang <leep...@stanford.edu> wrote:
Hmm, I didn't know those were segfault reports. It could
Hi there,
I'm building OpenMPI 1.8.3 on a system where I explicitly don't want any of the
BTL components (they tend to break my single node jobs).
./configure CC=gcc CXX=g++ F77=gfortran FC=gfortran
--prefix=$QC_EXT_LIBS/openmpi --enable-static --enable-mca-no-build=btl
Building gives me linker errors like:
../../../ompi/.libs/libmpi.so: undefined reference to `ibv_get_device_list'
collect2: error: ld returned 1 exit status
Thanks,
- Lee-Ping
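One thing that may be worth trying while this gets tracked down (a sketch, not a confirmed fix: `--without-verbs` is a standard Open MPI configure option that disables the InfiniBand verbs support entirely, so no `ibv_*` symbols should be referenced):

```shell
# Same configure line as above, plus --without-verbs to keep any
# InfiniBand-related code from being built at all.
./configure CC=gcc CXX=g++ F77=gfortran FC=gfortran \
    --prefix=$QC_EXT_LIBS/openmpi --enable-static \
    --enable-mca-no-build=btl --without-verbs
```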
On Sep 29, 2014, at 5:27 AM, Lee-Ping Wang <leep...@stanford.edu> wrote:
> Hi there,
>
> I'm building OpenMPI 1.8.3 on a system where I explicitly don't want any of
> the BTL components (they tend to break my single node jobs).
>
>> ../../../ompi/.libs/libmpi.so: undefined reference to `ibv_get_device_list'
>> ../../../ompi/.libs/libmpi.so: undefined reference to `ibv_get_device_name'
>> ../../../ompi/.libs/libmpi.so: undefined reference to `ibv_destroy_cq'
>> collect2: error: ld returned 1 exit status
>>
>> Thanks,
>>
Hi there,
My application uses MPI to run parallel jobs on a single node, so I have no
need of any support for communication between nodes. However, when I use
mpirun to launch my application I see strange errors such as:
It seems that something isn't freeing up some system
resource as it should. Is there something that needs to be corrected in the
code?
Thanks,
- Lee-Ping
On Sep 29, 2014, at 5:12 PM, Lee-Ping Wang <leep...@stanford.edu> wrote:
> Hi there,
>
> My application uses MPI to run parallel jobs on a single node, so
Here's another data point that might be useful: the error message is much
rarer if I run my application on 4 cores instead of 8.
Thanks,
- Lee-Ping
On Sep 29, 2014, at 5:38 PM, Lee-Ping Wang <leep...@stanford.edu> wrote:
> Sorry for my last email - I think I spoke too quickly. I
>> here we find missing library issues. It looks like
>> someone has left incorrect configure logic in the system such that we always
>> attempt to build Infiniband-related code, but without linking against the
>> library.
>>
>> We'll have to try and track it down.
> It is possible that you are trying to open
> statically defined ports, which means that running the job again too soon
> could leave the OS thinking the socket is already busy. It takes awhile for
> the OS to release a socket resource.
>
>
> On Sep 29, 2014, at 5:49 PM, Lee-Ping Wang <leep...@stanford.edu> wrote:
The first execution on a given compute node works fine - it's only
subsequent executions that give the error messages.
Thanks,
- Lee-Ping
On Sep 30, 2014, at 11:05 AM, Ralph Castain <r...@open-mpi.org> wrote:
>
> On Sep 30, 2014, at 10:49 AM, Lee-Ping Wang <leep...@stanford.edu> wrote:
>
>> Hi Ralph,
Hi Ralph,
Thanks. I'll add some print statements to the code and try to figure out
precisely where the failure is happening.
- Lee-Ping
On Sep 30, 2014, at 12:06 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
> On Sep 30, 2014, at 11:19 AM, Lee-Ping Wang <leep...@stanford.edu> wrote:
with the fix. :)
Thanks,
- Lee-Ping
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Lee-Ping Wang
Sent: Tuesday, September 30, 2014 1:15 PM
To: Open MPI Users
Subject: Re: [OMPI users] General question about running single-node jobs.
Hi Ralph,
Thanks. I'll add some
> clusters, it is sometimes preferable to write the temporary files to a
> disk local to the node. QCLOCALSCR specifies this directory. The
> temporary files will be copied to QCSCRATCH at the end of the job,
> unless the job is terminated abnormally. I
>
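The scheme in the manual excerpt above might be set up along these lines (all paths hypothetical; on a real cluster QCSCRATCH would normally live on a shared filesystem and QCLOCALSCR on node-local disk):

```shell
# Working files on node-local disk; results copied to QCSCRATCH at job end.
export QCLOCALSCR=/tmp/${USER:-qchem}/qclocal
export QCSCRATCH=/tmp/${USER:-qchem}/qcscratch   # shared FS in practice
mkdir -p "$QCLOCALSCR" "$QCSCRATCH"
```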
Is there a way to configure Open MPI so that it always thinks it's running
on a single node, regardless of the type of job I submit to the batch system?
Thank you,
- Lee-Ping Wang (Postdoc in Dept. of Chemistry, Stanford
University)
[compute-1-1.local:10910] [[42010,0],0] ORTE_ERROR_LOG: File open failure in
file
mpirun -np 16 ./my-Q-chem-executable
I hope this helps,
Gus Correa
On Aug 10, 2013, at 1:51 PM, Lee-Ping Wang wrote:
> Hi there,
>
> Recently, I've begun some calculations on a cluster where I submit a
multiple-node job to the Torque batch system, and the job executes multiple
single-node parallel tasks.
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Gustavo Correa
Sent: Saturday, August 10, 2013 12:39 PM
To: Open MPI Users
Subject: Re: [OMPI users] Error launching single-node tasks from
multiple-node job.
Hi Lee-Ping
On Aug 10, 2013, at 3:15 PM, Lee-Ping Wang wrote:
> Hi Gus,
>
> Thank you for your
-
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Lee-Ping Wang
Sent: Saturday, August 10, 2013 12:51 PM
To: 'Open MPI Users'
Subject: Re: [OMPI users] Error launching single-node tasks from
multiple-node job.
Hi Gus,
Thank you. You gave me many helpful suggestions, which I
-
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Lee-Ping Wang
Sent: Saturday, August 10, 2013 3:07 PM
To: 'Open MPI Users'
Subject: Re: [OMPI users] Error launching single-node tasks from
multiple-node job.
Hi Gus,
I tried your suggestions. Here is the command line which executes
Check these environment
variables:
echo $PBS_JOBID
echo $PBS_NODEFILE
ls -l $PBS_NODEFILE
cat $PBS_NODEFILE
cat $PBS_JOBID [this one should fail, because that is not a file, but may
work if the PBS variables were messed up along the way]
I hope this helps,
Gus Correa
On Aug 10, 2013, at 6:39 PM, Lee-Ping Wang wrote:
Regardless, you can always deselect Torque support at runtime. Just put the
following in your environment:
OMPI_MCA_ras=^tm
That will tell ORTE to ignore the Torque allocation module and it should
then look at the machinefile.
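In script form (the hostfile path and launch line are hypothetical; only the environment variable itself comes from the advice above):

```shell
# Skip the Torque (tm) allocation module so ORTE consults the
# machinefile even inside a PBS/Torque allocation.
export OMPI_MCA_ras=^tm
# Then launch as usual, e.g.:
#   mpirun -machinefile ./hostfile -np 8 ./my_app
```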
On Aug 10, 2013, at 4:18 PM, "Lee-Ping Wang" <leep...@stanford.edu> wrote:
You could instead pass mpirun this argument: -machinefile /scratch/leeping/pbs_nodefile.$HOSTNAME
This will leave the PBS_NODEFILE variable intact, and have the same net
effect as your workflow.
Anyway, congratulations for sorting things out and making it work!
Gus Correa
On Aug 10, 2013, at 7:40 PM, Lee-Ping Wang wrote:
Hi there,
On a few clusters I am running into an issue where a temporary directory cannot
be created due to the root filesystem being full, causing mpirun to crash.
Would it be possible to change the location where this directory is being
created?
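Assuming the session directory is what is filling up the root filesystem, one way to move it (a sketch: the path is hypothetical, and `orte_tmpdir_base` is the MCA parameter governing the session-directory root in the 1.x series):

```shell
# Point Open MPI's session/temporary files at a roomier filesystem.
export TMPDIR=/tmp/${USER:-mpi}-ompi-tmp   # node-local scratch in practice
mkdir -p "$TMPDIR"
# Per-job alternative:
#   mpirun --mca orte_tmpdir_base "$TMPDIR" -np 8 ./my_app
```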
[compute-109-4.local:12055]
On Sep 4, 2013, at 10:13 AM, Lee-Ping Wang <leep...@stanford.edu> wrote:
Hi there,
On a few clusters I am running into an issue where a temporary directory cannot
be created due to the root filesystem being full, causing mpirun to crash.
Would it be possible to change the location where this directory is being
created?