+1
If I remember correctly, all the interfaces are scanned, so there should
be some room to display a user-friendly message (on Linux and impacted
architectures) such as
"there is no loopback interface, you will likely run into some trouble".
Gilles
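A rough sketch of the check suggested above, not Open MPI's actual code. It assumes a Linux host where each interface exposes its flags as a hex string under /sys/class/net/<name>/flags. The function names are hypothetical, and treating a loopback interface that exists but is down the same as a missing one is also an assumption.

```python
import os

# Flag values from Linux <net/if.h>.
IFF_UP = 0x1
IFF_LOOPBACK = 0x8


def read_interface_flags(sysfs_root="/sys/class/net"):
    """Return {interface name: flags} by scanning sysfs (Linux only)."""
    flags = {}
    for name in os.listdir(sysfs_root):
        try:
            with open(os.path.join(sysfs_root, name, "flags")) as fh:
                flags[name] = int(fh.read().strip(), 16)
        except (OSError, ValueError):
            # Not an interface directory, or the flags file is unreadable.
            continue
    return flags


def loopback_warning(flags_by_iface):
    """Return a warning string unless some loopback interface is up."""
    for flags in flags_by_iface.values():
        if flags & IFF_LOOPBACK and flags & IFF_UP:
            return None
    return "there is no loopback interface, you will likely run into some trouble"
```

On a Linux machine, `loopback_warning(read_interface_flags())` returns None when "lo" is up and the warning text otherwise.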
On 2014/12/03 13:50, Paul Hargrove wrote:
>
IMHO the lack of a loopback interface should be a very uncommon occurrence.
So, I believe that improving the error message to mention that possibility
would help a great deal.
-Paul
On Tue, Dec 2, 2014 at 8:28 PM, Ralph Castain wrote:
> We talked about this on the weekly
We talked about this on the weekly conference call, and adding the usock
component to 1.8 is just not within our procedures. It would involve bringing
over much more of the OOB revisions (we’d have to handle the transfer of
messages between components, if nothing else), and that involves a lot
> On Nov 25, 2014, at 6:15 PM, Gilles Gouaillardet
> wrote:
>
> Ralph and Paul,
>
> On 2014/11/26 10:37, Ralph Castain wrote:
>> So it looks like the issue isn’t so much with our code as it is with the OS
>> stack, yes? We aren’t requiring that the loopback be
Ralph and Paul,
On 2014/11/26 10:37, Ralph Castain wrote:
> So it looks like the issue isn't so much with our code as it is with the OS
> stack, yes? We aren't requiring that the loopback be "up", but the stack is
> in order to establish the connection, even when we are trying a non-lo
>
No need - looks like I just need to fail faster and add this possibility to the
error message.
Thanks!
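To illustrate the "fail faster and add this possibility to the error message" idea, here is a minimal sketch in Python rather than the OOB code itself. The helper name `connect_or_explain`, the default timeout, and the wording of the hint are all assumptions made for illustration.

```python
import socket


def connect_or_explain(host, port, timeout=5.0, connect=socket.create_connection):
    """Try to connect with a bounded timeout; on failure, raise an error
    that mentions a missing or down loopback interface as one possible cause.

    `connect` is injectable so the behaviour can be exercised without a network.
    """
    try:
        return connect((host, port), timeout=timeout)
    except OSError as exc:  # includes timeouts and refused connections
        raise ConnectionError(
            f"could not connect to {host}:{port} within {timeout}s ({exc}); "
            "one possible cause is that the loopback interface is down or missing"
        ) from exc
```

The bounded timeout is what makes the failure fast; the extra sentence in the exception is only a hint, since other causes (such as a firewall) produce the same symptom.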
> On Nov 25, 2014, at 4:50 PM, Paul Hargrove wrote:
>
> Ralph,
>
> I had a look at the problem via "mpirun -np 1 strace -o trace -ff ./hello"
> I find that there is an
On Tue, Nov 25, 2014 at 5:37 PM, Ralph Castain wrote:
> So it looks like the issue isn't so much with our code as it is with the
> OS stack, yes? We aren't requiring that the loopback be "up", but the stack
> is in order to establish the connection, even when we are trying a
I would never doubt you about the firewall, Paul :-)
So it looks like the issue isn’t so much with our code as it is with the OS
stack, yes? We aren’t requiring that the loopback be “up”, but the stack is in
order to establish the connection, even when we are trying a non-lo interface.
I can
Ralph,
I had a look at the problem via "mpirun -np 1 strace -o trace -ff ./hello"
I find that there is an attempt (by a secondary thread) to establish a TCP
socket from the rank process to the eth0 address of localhost (I am
guessing to reach the orted/mpirun).
However, when the "lo" interface is
Allan,
I am glad things are working for you now.
I can confirm (on a QEMU-emulated Versatile Express A9 board running Ubuntu
14.04) that disabling the "lo" interface reproduces the problem.
I imagine this is true on other architectures, though I did not attempt to
verify.
Ralph,
If oob:tcp
I think I have found the problem. After inspecting the output with
"-mca state_base_verbose 10 -mca odls_base_verbose 10 -mca oob_base_verbose 100"
on both the old system and the new system, I noticed there is one line
that is different: on the old system where it works
Following Larry's suggestion to use /proc/config.gz, Allan sent me kernel
configs for the old (3.8) and new (3.15) kernels.
While there were more changes than I expected, none relates to removing an
API/feature that Open MPI is likely to be using.
-Paul
On Tue, Nov 25, 2014 at 11:28 AM, Larry
Thanks Ralph!
I did not compile my openmpi with --enable-debug, and I am compiling it
now. But your suggested command already provided
some output, which I attached to this email.
It seems the process was stuck on the line:
"[fpga2:00962] [[44848,1],0] waiting for connect completion to
Allan,
If you can still boot the old embedded system, a lot of times the config
parameters are saved as /proc/config.gz. You can at least then compare the two
configs.
Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov
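For comparing the two configs, a small script along these lines may help. It is a sketch that assumes the usual kernel .config text format ("CONFIG_FOO=y" lines and "# CONFIG_FOO is not set" comments); the function names are hypothetical.

```python
import gzip


def parse_kernel_config(text):
    """Map each CONFIG_ option to its value; unset options map to 'n'."""
    options = {}
    suffix = " is not set"
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
        elif line.startswith("# CONFIG_") and line.endswith(suffix):
            options[line[2:-len(suffix)]] = "n"
    return options


def diff_kernel_configs(old, new):
    """Return {option: (old value, new value)} for options that differ.
    A value of None means the option does not appear in that config."""
    changes = {}
    for name in sorted(set(old) | set(new)):
        before, after = old.get(name), new.get(name)
        if before != after:
            changes[name] = (before, after)
    return changes


def load_config_gz(path="/proc/config.gz"):
    """Read and parse a gzipped kernel config such as /proc/config.gz."""
    with gzip.open(path, "rt") as fh:
        return parse_kernel_config(fh.read())
```

Copy each system's /proc/config.gz to one machine, load both with `load_config_gz`, and pass the results to `diff_kernel_configs`. The output lists every changed option, so it still needs a manual pass to pick out the ones relevant to networking.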
On 25 Nov 2014, at 11:11 AM, Allan Wu wrote:
> Thanks Paul!
This is all running on a single node, correct? If so, did you configure OMPI
with --enable-debug?
If you can do that, or already have, then let’s add the following to the mpirun
cmd line:
-mca state_base_verbose 10 -mca odls_base_verbose 10 -mca oob_base_verbose 10
You’ll get a bunch of
Thanks Paul! Unfortunately '/boot' is not available in my embedded Linux,
and I do not have the configuration file for the old kernel since it is
provided as is. However, I have the new kernel configuration since I
compiled it myself. Would it be helpful if I provide you the .config file
when I
Allan,
A likely possibility is that some important kernel feature (that Open MPI
assumes is present) is missing.
That includes not only "kernel modules" as you mention, but also features
configured in (or out of) the base kernel.
For instance, some embedded kernels omit UNIX-domain sockets and
I'm sorry I forgot to change the subject when I replied to the digest
issue. Please find my original email below.
Regards,
Di
On Tue, Nov 25, 2014 at 10:19 AM, Allan Wu wrote:
> Thanks Ralph for the reply. Sorry about the log file, I think I forgot to
> put an extension to
Thanks - no idea why it was trying to execute on my machine, but I’ve learned
to be far less trusting.
Looks like it was just a complete output of ompi_info, which doesn’t really
help here anyway. Will need to hear the answers to my questions before
suggesting a next step.
> On Nov 25, 2014,
Ralph,
I downloaded the attachment and found it to be a gzipped tar file
containing a single text file "log".
I have attached the bzipped (not tarred) log file.
-Paul
On Tue, Nov 25, 2014 at 7:29 AM, Ralph Castain wrote:
> I don't know what you put in that log file, but it
I don’t know what you put in that log file, but it was an executable and I’m
not feeling that trusting :-)
I’m afraid there isn’t enough debug output there to really tell anything. From
what little I can see, I’m guessing that the application ran fine and you got
the usual “hello” output and