Open MPI IB Gurus,
I have some slightly older InfiniBand-equipped nodes which have less
RAM than we'd like, and on which we tend to run jobs that can span
16-32 nodes of this type. The jobs themselves tend to be on the heavy
side in terms of their own memory requirements.
When we used
responded to the firmware part of this earlier:
http://www.open-mpi.org/community/lists/users/2011/12/18014.php
Thank you,
V. Ram
--
http://www.fastmail.fm - Access your email from home and the web
Thank you.
V. Ram
> On Dec 15, 2011, at 7:24 PM, V. Ram wrote:
>
> > Hi Terry,
> >
> > Thanks so much for the response. My replies are in-line below.
> >
> > On Thu, Dec 15, 2011, at 07:00 AM, TERRY DONTJE wrote:
> >> IIRC, RNR's are usu
ted number of observable parameters I'm
aware of, to be dependent on the number of nodes involved.
It is an intermittent problem, but when it happens, it happens at job
launch, and it does occur most of the time.
Thanks,
V. Ram
> --td
> >
> > Open MPI InfiniBand gurus and/or Mellanox:
Open MPI InfiniBand gurus and/or Mellanox: could I please get some
assistance with this? Any suggestions on tunables or debugging
parameters to try?
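For what it's worth, a sketch of the kind of openib BTL tunables and verbosity knobs sometimes suggested for RNR-type problems. Parameter names are from the 1.x-era openib BTL and `my_app` is a placeholder; please verify names and defaults against `ompi_info` on your own build:

```shell
# Confirm the parameter names/defaults on this build first
ompi_info --param btl openib | grep -i rnr

# Re-run with extra BTL verbosity; ib_rnr_retry 7 means "retry indefinitely"
mpirun --mca btl openib,self \
       --mca btl_base_verbose 30 \
       --mca btl_openib_ib_rnr_retry 7 \
       -np 64 ./my_app
```

The verbose output at least shows which BTLs and interfaces each process actually selected, which helps rule out fabric-selection surprises.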
Thank you very much.
On Mon, Dec 12, 2011, at 10:42 AM, V. Ram wrote:
> Hello,
>
> We are running a cluster that has a good number of ol
use the same InfiniBand fabric
continuously without any issue, so I don't think it's the fabric/switch.
I'm at a loss for what to do next to try to find the root cause of the
issue. I suspect something having to do with the mthca
support/drivers, but how can I track this down further?
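Not an answer, but when chasing mthca-level suspicions, the standard OFED diagnostics (assuming they are installed on the nodes) can at least confirm firmware/driver versions and surface port-level errors:

```shell
ibv_devinfo             # HCA model, firmware version, port state
ibstat                  # per-port summary (state, rate, LID)
dmesg | grep -i mthca   # driver messages from module load / error time
perfquery               # port counters: look for RcvErrors, XmtDiscards
```

Comparing firmware versions and error counters across the older and newer nodes is a quick way to see whether the mthca nodes stand out.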
Terry Frankcombe wrote:
> Isn't it up to the OS scheduler what gets run where?
I was under the impression that the processor affinity API was designed
to let the OS (at least Linux) know how a given task preferred to be
bound in terms of the system topology.
--
V. Ram
v_r_...@fastmail
ve any easy way to tell that without a
hostfile, etc.
--
V. Ram
v_r_...@fastmail.fm
e sockets (all 4 cores) active at a time on this job.
Does this make more sense?
--
V. Ram
v_r_...@fastmail.fm
such functionality is technically possible via PLPA. Is
there in fact a way to specify such a thing with 1.2.8, and if not, will
1.3 support these kinds of arguments?
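As a sketch of what the two release series offer (worth double-checking against the documentation for your exact version): 1.2.x only has the blunt `mpi_paffinity_alone` switch, while 1.3 added a rankfile mapper for per-rank socket/core placement. Hostnames and the application name below are placeholders:

```shell
# 1.2.x: bind each process to a core, with no control over which core
mpirun --mca mpi_paffinity_alone 1 -np 8 ./my_app

# 1.3: per-rank placement via a rankfile (slot = socket:core)
cat > myrankfile <<'EOF'
rank 0=node01 slot=0:0
rank 1=node01 slot=1:0
rank 2=node02 slot=0:0
rank 3=node02 slot=1:0
EOF
mpirun --rf myrankfile -np 4 ./my_app
```

The rankfile form would, for example, let you keep only one core per socket busy, which sounds like the placement being discussed here.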
Thank you.
--
V. Ram
v_r_...@fastmail.fm
nyone else experiencing the same issues.
Thanks Leonardo!
OMPI devs: does this imply bug(s) in the e1000 driver/chip? Should I
contact the driver authors?
On Fri, 10 Oct 2008 12:42:19 -0400, "V. Ram" <v_r_...@fastmail.fm> said:
> Leonardo,
>
> These nodes are all usi
> > "eth0, eth1". You should
> > try to restrict Open MPI to use only one of the available networks by
> > using the --mca btl_tcp_if_include ethx parameter to mpirun, where x
> > is the network interface that is always connected to the same logical
> > and physical
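Concretely, the suggestion above looks something like this; `eth0`, the hostfile name, and `my_app` are placeholders for whichever interface is consistently on the same subnet on every node and your own files:

```shell
mpirun --mca btl tcp,self \
       --mca btl_tcp_if_include eth0 \
       -np 16 --hostfile hosts ./my_app
```

Restricting the TCP BTL to a single interface avoids Open MPI striping traffic across mismatched networks.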
> > Are there any suggestions on how to figure out if it's a problem with
> > the code or the OMPI installation/software on the system? We have
> > tried
> > "--debug-daemons" with no new/interesting information being revealed.
> > Is there a way to trap segfault messages or more detailed MPI
> > transaction information or anything else that could help diagnose
> > this?
> >
> > Thanks.
> > --
> > V. Ram
> > v_r_...@fastmail.fm
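One generic approach to the segfault question, independent of Open MPI itself, is to enable core dumps and inspect any resulting core files. This assumes the nodes' limits configuration permits core files and that the launching shell's limits propagate to the remote processes; `my_app` and the hostfile are placeholders:

```shell
# allow core files in the shell that launches the job
ulimit -c unlimited
mpirun -np 16 --hostfile hosts ./my_app

# afterwards, on the node that crashed (core file name varies by system):
gdb ./my_app core        # then use: bt
```

A backtrace from the core file usually distinguishes a crash inside the application from one inside the MPI library.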
--
V. Ram
v_r_...@fastmail.fm