On Apr 27, 2011, at 1:31 PM, Sindhi, Waris PW wrote:
> No, we do not have a firewall turned on. I can run smaller 96-slave cases
> with ln10 and ln13 included in the slave list.
>
> Could there be another reason for this to fail?
What is in "procgroup"? Is it a single application?
Offhand,
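For context, the --app option to mpirun takes an application context file
(which is what the "procgroup" name suggests here); the actual file contents
were not posted, so the sketch below is purely illustrative:

    # app context file: one application per line, written as mpirun arguments
    -np 1 -host master ./master_prog
    -np 238 -hostfile slavelist ./slave_prog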
On Thu, Apr 21, 2011 at 06:35:16PM -0400, Jeff Squyres wrote:
> It's normal and expected for there to be lots of errors in config.log.
>
> There's a bunch of tests in configure that are designed to succeed on some
> systems and fail on others.
>
> So don't read anything into the failures
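A quick way to tell whether configure itself succeeded, as opposed to an
individual probe, is to check the end of the log; a sketch using standard
tools (exact messages vary by Autoconf version):

    tail -20 config.log            # a successful run ends with "configure: exit 0"
    grep -c 'error:' config.log    # often dozens of hits even on a successful run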
On Apr 27, 2011, at 3:39 PM, Ralph Castain wrote:
> Nope, nope, nope... in this mode of operation, we are using -static- ports.
Er.. right. Sorry -- my bad for not reading the full context here... ignore
what I said...
--
Jeff Squyres
jsquy...@cisco.com
No, we do not have a firewall turned on. I can run smaller 96-slave cases
with ln10 and ln13 included in the slave list.
Could there be another reason for this to fail?
Sincerely,
Waris Sindhi
High Performance Computing, TechApps
Pratt & Whitney, UTC
(860)-565-8486
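Since a firewall was the first suspect, a quick reachability check between the
mpirun node and the slaves in question might look like this (a sketch; the
hostnames come from the report above, and the port number is illustrative,
since the OOB TCP channel uses dynamically assigned ports):

    ssh ln10 hostname            # confirm passwordless ssh to a suspect slave
    nc -z ln13 1024 && echo ok   # probe a representative TCP port with netcat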
On Apr 27, 2011, at 2:46 PM, Ralph Castain wrote:
> Actually, I understood you correctly. I'm just saying that I find no evidence
> in the code that we try three times before giving up. What I see is a single
> attempt to bind the port - if it fails, then we abort. There is no parameter
> to
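For anyone following along, the oob tcp parameters that do exist (and their
defaults) can be listed with ompi_info; a sketch:

    # Lists the TCP out-of-band parameters for this Open MPI build; in the
    # 1.2/1.4 series this shows connection-retry knobs such as
    # oob_tcp_peer_retries, but nothing that retries a failed bind().
    ompi_info --param oob tcp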
On Apr 27, 2011, at 10:09 AM, Michael Di Domenico wrote:
> Was this ever committed to the OMPI src as something not having to be
> run outside of OpenMPI, but as part of the PSM setup that OpenMPI
> does?
Not that I know of - I don't think the PSM developers ever looked at it.
>
> I'm having
Hi,
I am getting an "oob-tcp: Communication retries exceeded" error
message when I run a 238-slave MPI code:
/opt/openmpi/i386/bin/mpirun -mca btl_openib_verbose 1 --mca btl ^tcp
--mca pls_ssh_agent ssh -mca oob_tcp_peer_retries 1000 --prefix
/usr/lib/openmpi/1.2.8-gcc/bin -np 239 --app
Was this ever committed to the OMPI src as something not having to be
run outside of OpenMPI, but as part of the PSM setup that OpenMPI
does?
I'm having some trouble getting Slurm/OpenMPI to play nice with the
setup of this key. Namely, with Slurm you cannot export variables
from the --prolog of
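One possible workaround: Slurm's TaskProlog, unlike the regular Prolog, can
inject variables into each task's environment by printing "export NAME=value"
lines to stdout. A sketch only; PSM_JOB_KEY and the key file path are
placeholder names, not the real variable:

    #!/bin/sh
    # Slurm TaskProlog: lines printed as "export NAME=value" are added to the
    # task's environment (ordinary Prolog output is not).
    echo "export PSM_JOB_KEY=$(cat /etc/psm_job_key)"   # hypothetical names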
FWIW, my ARM contact tells me that he uses a native ARM Linux distro explicitly
to avoid all the complexities of cross-compiling... :-\
On Apr 25, 2011, at 11:29 AM, Jeff Squyres wrote:
> There's some extra special mojo that needs to be supplied when
> cross-compiling Open MPI (e.g., a file
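For reference, the generic Autoconf shape of such a build looks like the
sketch below; the Open MPI-specific file Jeff mentions is cut off above, and
the preset cache value is only an example of the kind of answer that must be
supplied when a test cannot run on the build machine:

    # cross-compiling for ARM from an x86_64 build host
    ./configure --build=x86_64-pc-linux-gnu --host=arm-linux-gnueabi \
        CC=arm-linux-gnueabi-gcc \
        ac_cv_sizeof_long=4   # preseeded answer for a run-time probe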
Argh, our messed-up environment with three generations of InfiniBand bit us.
Setting openib_cpc_include to rdmacm causes IB to not be used on some of our
hosts with the old DDR IB. Note that jobs will never run across both our old
DDR IB and our new QDR hardware, where rdmacm does work.
I am doing some
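One way to keep the old DDR hosts away from rdmacm is the per-installation MCA
parameter file, so that only those machines fall back to the default connect
method (a sketch; the path depends on the install prefix):

    # $prefix/etc/openmpi-mca-params.conf on the old DDR hosts only
    btl_openib_cpc_include = oob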