Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at execution on an embedded ARM Linux kernel version 3.15.0

2014-12-02 Thread Gilles Gouaillardet
+1 if i remember correctly, all the interfaces are scanned, so there should be some room to display a user-friendly message (on Linux and impacted architectures) such as "there is no loopback interface, you will likely run into some trouble" Gilles On 2014/12/03 13:50, Paul Hargrove wrote: >
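The check suggested above could look roughly like the following. This is a minimal sketch, not Open MPI code: it assumes a Linux system that exposes per-interface flags under `/sys/class/net/<name>/flags`, and the function names `loopback_is_up` and `loopback_warning` are hypothetical. The warning text is the one proposed in the message.

```python
import os

IFF_UP = 0x1        # interface is administratively up (value from linux/if.h)
IFF_LOOPBACK = 0x8  # interface is a loopback device (value from linux/if.h)


def loopback_is_up(sysfs_net="/sys/class/net"):
    """Return True if any interface under sysfs_net is a loopback that is up.

    sysfs_net is a parameter only so the scan can be pointed at a test tree.
    """
    try:
        names = os.listdir(sysfs_net)
    except OSError:
        return False
    for name in names:
        try:
            with open(os.path.join(sysfs_net, name, "flags")) as f:
                flags = int(f.read().strip(), 16)  # file holds e.g. "0x9"
        except (OSError, ValueError):
            continue  # not an interface directory, or unreadable flags
        if flags & IFF_LOOPBACK and flags & IFF_UP:
            return True
    return False


def loopback_warning(sysfs_net="/sys/class/net"):
    """Return a user-facing warning string, or None if a loopback is up."""
    if loopback_is_up(sysfs_net):
        return None
    return ("there is no loopback interface, "
            "you will likely run into some trouble")
```

Calling `loopback_warning()` once at startup and printing any non-None result would be enough to surface the condition before a launch hangs.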

Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at execution on an embedded ARM Linux kernel version 3.15.0

2014-12-02 Thread Paul Hargrove
IMHO the lack of a loopback interface should be a very uncommon occurrence. So, I believe that improving the error message to mention that possibility would help a great deal. -Paul On Tue, Dec 2, 2014 at 8:28 PM, Ralph Castain wrote: > We talked about this on the weekly

Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at execution on an embedded ARM Linux kernel version 3.15.0

2014-12-02 Thread Ralph Castain
We talked about this on the weekly conference call, and adding the usock component to 1.8 is just not within our procedures. It would involve bringing over much more of the OOB revisions (we’d have to handle the transfer of messages between components, if nothing else), and that involves a lot

Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at execution on an embedded ARM Linux kernel version 3.15.0

2014-11-25 Thread Ralph Castain
> On Nov 25, 2014, at 6:15 PM, Gilles Gouaillardet > wrote: > > Ralph and Paul, > > On 2014/11/26 10:37, Ralph Castain wrote: >> So it looks like the issue isn’t so much with our code as it is with the OS >> stack, yes? We aren’t requiring that the loopback be

Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at execution on an embedded ARM Linux kernel version 3.15.0

2014-11-25 Thread Gilles Gouaillardet
Ralph and Paul, On 2014/11/26 10:37, Ralph Castain wrote: > So it looks like the issue isn't so much with our code as it is with the OS > stack, yes? We aren't requiring that the loopback be "up", but the stack is > in order to establish the connection, even when we are trying a non-lo >

Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at execution on an embedded ARM Linux kernel version 3.15.0

2014-11-25 Thread Ralph Castain
No need - looks like I just need to fail faster and add this possibility to the error message. Thanks! > On Nov 25, 2014, at 4:50 PM, Paul Hargrove wrote: > > Ralph, > > I had a look at the problem via "mpirun -np 1 strace -o trace -ff ./hello" > I find that there is an
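One way to picture "fail faster and add this possibility to the error message" is a connection attempt bounded by a timeout. The sketch below is illustrative only and is not how oob:tcp is implemented: `try_connect` is a hypothetical helper, the 5-second default is an arbitrary assumption, and the message wording is invented for the example.

```python
import socket


def try_connect(host, port, timeout=5.0):
    """Attempt a TCP connection and return (ok, message) instead of hanging.

    Any failure, including a timeout, is reported with a hint about the
    loopback interface, since a missing or down "lo" is one possible cause.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except OSError as exc:  # covers timeouts and refused connections
        return False, (
            f"connect to {host}:{port} failed ({exc}); "
            "if the loopback interface is down or missing, "
            "local connections may not complete"
        )
```

The point of the design is that the caller always gets an answer within `timeout` seconds, together with a hint about the less obvious cause.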

Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at execution on an embedded ARM Linux kernel version 3.15.0

2014-11-25 Thread Paul Hargrove
On Tue, Nov 25, 2014 at 5:37 PM, Ralph Castain wrote: > So it looks like the issue isn't so much with our code as it is with the > OS stack, yes? We aren't requiring that the loopback be "up", but the stack > is in order to establish the connection, even when we are trying a

Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at execution on an embedded ARM Linux kernel version 3.15.0

2014-11-25 Thread Ralph Castain
I would never doubt you about the firewall, Paul :-) So it looks like the issue isn’t so much with our code as it is with the OS stack, yes? We aren’t requiring that the loopback be “up”, but the stack is in order to establish the connection, even when we are trying a non-lo interface. I can

Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at execution on an embedded ARM Linux kernel version 3.15.0

2014-11-25 Thread Paul Hargrove
Ralph, I had a look at the problem via "mpirun -np 1 strace -o trace -ff ./hello" I find that there is an attempt (by a secondary thread) to establish a TCP socket from the rank process to the eth0 address of localhost (I am guessing to reach the orted/mpirun). However, when the "lo" interface is

Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at execution on an embedded ARM Linux kernel version 3.15.0

2014-11-25 Thread Paul Hargrove
Allan, I am glad things are working for you now. I can confirm (on a QEMU-emulated Versatile Express A9 board running Ubuntu 14.04) that disabling the "lo" interface reproduces the problem. I imagine this is true on other architectures, though I did not attempt to verify. Ralph, If oob:tcp

Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at execution on an embedded ARM Linux kernel version 3.15.0

2014-11-25 Thread Allan Wu
I think I have found the problem. After inspecting the output with "-mca state_base_verbose 10 -mca odls_base_verbose 10 -mca oob_base_verbose 10 0" on both the old system and the new system, I noticed there is one line that is different: on the old system where it works

Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at execution on an embedded ARM Linux kernel version 3.15.0

2014-11-25 Thread Paul Hargrove
Following Larry's suggestion to use /proc/config.gz, Allan sent me kernel configs for the old (3.8) and new (3.15) kernels. While there were more changes than I expected, none relates to removing an API/feature that Open MPI is likely to be using. -Paul On Tue, Nov 25, 2014 at 11:28 AM, Larry

Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at execution on an embedded ARM Linux kernel version 3.15.0

2014-11-25 Thread Allan Wu
Thanks Ralph! I did not compile my openmpi with --enable-debug, and I am compiling it now. But your suggested command already provided some output, which I have attached to this email. It seems the process was stuck on the line: "[fpga2:00962] [[44848,1],0] waiting for connect completion to

Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at execution on an embedded ARM Linux kernel version 3.15.0

2014-11-25 Thread Larry Baker
Allan, If you can still boot the old embedded system, a lot of times the config parameters are saved as /proc/config.gz. You can at least then compare the two configs. Larry Baker US Geological Survey 650-329-5608 ba...@usgs.gov On 25 Nov 2014, at 11:11 AM, Allan Wu wrote: > Thanks Paul!
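Comparing two saved kernel configs as suggested above can be scripted. The following is a rough sketch under stated assumptions: both files use the usual `CONFIG_NAME=value` and `# CONFIG_NAME is not set` line forms, and the helper names `read_kernel_config` and `diff_kernel_configs` are hypothetical. An option that is explicitly "not set" is recorded as `"n"`, while an option absent from a file shows up as `None`.

```python
import gzip


def read_kernel_config(path):
    """Parse CONFIG_* settings from a plain or gzipped kernel config file."""
    opener = gzip.open if path.endswith(".gz") else open
    settings = {}
    with opener(path, "rt") as f:
        for line in f:
            line = line.strip()
            if line.startswith("CONFIG_") and "=" in line:
                key, value = line.split("=", 1)
                settings[key] = value
            elif line.startswith("# CONFIG_") and line.endswith(" is not set"):
                # "# CONFIG_FOO is not set" -> CONFIG_FOO disabled
                settings[line[2:-len(" is not set")]] = "n"
    return settings


def diff_kernel_configs(old_path, new_path):
    """Return {option: (old_value, new_value)} for options that differ."""
    old = read_kernel_config(old_path)
    new = read_kernel_config(new_path)
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(set(old) | set(new))
        if old.get(key) != new.get(key)
    }
```

For example, `diff_kernel_configs("old-config.gz", "new-config.gz")` (hypothetical file names, copied from each system's /proc/config.gz) returns only the options whose values changed between the two kernels.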

Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at execution on an embedded ARM Linux kernel version 3.15.0

2014-11-25 Thread Ralph Castain
This is all running on a single node, correct? If so, did you configure OMPI with --enable-debug? If you can do that, or already have, then let’s add the following to the mpirun cmd line: -mca state_base_verbose 10 -mca odls_base_verbose 10 -mca oob_base_verbose 10 You’ll get a bunch of

Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at execution on an embedded ARM Linux kernel version 3.15.0

2014-11-25 Thread Allan Wu
Thanks Paul! Unfortunately '/boot' is not available in my embedded linux, and I do not have the configuration file for the old kernel since it is provided as is. However, I have the new kernel configuration since I compiled it myself. Would it be helpful if I provide you the .config file when I

Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at execution on an embedded ARM Linux kernel version 3.15.0

2014-11-25 Thread Paul Hargrove
Allan, A likely possibility is that some important kernel feature (that Open MPI assumes is present) is missing. That includes not only "kernel modules" as you mention, but also features configured in (or out) of the base kernel. For instance, some embedded kernels omit UNIX-domain sockets and

Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at execution on an embedded ARM Linux kernel version 3.15.0

2014-11-25 Thread Allan Wu
I'm sorry, I forgot to change the subject when I replied to the digest. Please find my original email below. Regards, Di On Tue, Nov 25, 2014 at 10:19 AM, Allan Wu wrote: > Thanks Ralph for the reply. Sorry about the log file, I think I forgot to > put an extension to

Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at execution on an embedded ARM Linux kernel version 3.15.0

2014-11-25 Thread Ralph Castain
Thanks - no idea why it was trying to execute on my machine, but I’ve learned to be far less trusting. Looks like it was just a complete output of ompi_info, which doesn’t really help here anyway. Will need to hear the answers to my questions before suggesting a next step. > On Nov 25, 2014,

Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at execution on an embedded ARM Linux kernel version 3.15.0

2014-11-25 Thread Paul Hargrove
Ralph, I downloaded the attachment and found it to be a gzipped tar file containing a single text file "log". I have attached the bzipped (not tarred) log file. -Paul On Tue, Nov 25, 2014 at 7:29 AM, Ralph Castain wrote: > I don't know what you put in that log file, but it

Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at execution on an embedded ARM Linux kernel version 3.15.0

2014-11-25 Thread Ralph Castain
I don’t know what you put in that log file, but it was an executable and I’m not feeling that trusting :-) I’m afraid there isn’t enough debug output there to really tell anything. From what little I can see, I’m guessing that the application ran fine and you got the usual “hello” output and