I'm afraid I don't know what the difference is in systemd between ssh.socket and 
ssh.service, or why that would change Open MPI's behavior.
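
If it would help to compare the two setups, systemctl can at least show which 
unit is active and enabled on a given node (inspection only; I don't know the 
Raspbian specifics):

  systemctl status ssh.service ssh.socket
  systemctl is-enabled ssh.service ssh.socket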

One other thing to try is to mpirun non-MPI programs, like "hostname", and see 
if that works.  This will help distinguish between problems with Open MPI's 
run-time / launching system and the MPI communications layer (such problems are 
usually in the former, so running non-MPI programs can help verify that you get 
hangs in that situation, too).
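
For example, using the same "machines" hostfile from your mail:

  mpirun --hostfile machines hostname

If that hangs too, the problem is in the launch path rather than in the MPI 
communications layer.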

If you still care...

Try running "hostname" instead of "cpi" in the problematic scenarios (or even 
mpirun "sleep 10000" so that you have plenty of time to login to each machine, 
run ps, etc.).  Do you see an orted running on each pi?  Do you see the 
hostname (or "sleep 10000") running?  Can you see if there was a successful 
inbound ssh connection (check the /var logs)?
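
Something along these lines (the auth.log path is my assumption for Raspbian; 
adjust to wherever sshd logs on your systems):

  mpirun --hostfile machines sleep 10000 &
  ssh pj 'ps -ef | grep -E "orted|sleep"'
  ssh pj 'grep sshd /var/log/auth.log | tail'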



> On May 14, 2016, at 5:44 PM, Andrew Reid <andrew.ce.r...@gmail.com> wrote:
> 
> I think I might have fixed this, but I still don't really understand it.
> 
> In setting up the RPi machines, I followed a config guide that suggested 
> switching the SSH service in systemd to "ssh.socket" instead of 
> "ssh.service". It's supposed to be lighter weight and get you cleaner 
> shut-downs, and I've used this trick on other machines, without really 
> knowing the implications.
> 
> For completeness of my config audit while trying to figure this out, I backed 
> this out, restoring the "ssh.service" link and removing the "ssh.socket" one. 
> Now MPI works, and I also get clean disconnections at exit time, so apparently 
> there's no reason at all to do this.
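> 
> For the record, the restore was roughly the following (reconstructing from 
> memory, so treat the exact commands as approximate; the original switch was 
> the same with the two units swapped):
> 
> > sudo systemctl disable ssh.socket
> > sudo systemctl stop ssh.socket
> > sudo systemctl enable ssh.service
> > sudo systemctl start ssh.service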
> 
> This behavior has survived two reboot cycles, so I think it's real. Not sure 
> if this behavior is relevant to just Raspbian, or if it's in all 
> architectures of Debian Jessie, or all systemd init systems, or what.
> 
>          -- A.
> 
> On Sat, May 14, 2016 at 3:27 PM Andrew Reid <andrew.ce.r...@gmail.com> wrote:
> Hi all --
> 
> I am having a weird problem on a cluster of Raspberry Pi model 2 machines 
> running the Debian/Raspbian version of OpenMPI, 1.6.5.
> 
> I apologize for the length of this message; I am trying to include all the 
> pertinent details, but of course can't reliably tell the pertinent ones from 
> the irrelevant ones.
> 
> I am actually a fairly long-time user of OpenMPI in various environments, and 
> have never had any trouble with it, but in configuring my "toy" cluster, this 
> came up.  
> 
> The basic issue is that a sample MPI executable runs with "mpirun -d" or under 
> a "slurm" resource allocation, but not directly from the command line -- in the 
> direct command-line case, it just hangs, apparently forever.
> 
> What is even weirder is that, earlier today, while backing out a private 
> domain configuration (see below), it actually started working for a while, 
> but after reboots, the problem behavior has returned.
> 
> It seems overwhelmingly likely that this is some kind of network transport 
> configuration problem, but it eludes me.
> 
> More details about the problem:
> 
> The Pis are all quad-core, and are named pi (head node), pj, pk, and pl (work 
> nodes).  They're connected by ethernet.  They all have a single 
> non-privileged user, named "pi".
> 
> There's a directory on my account containing an MPI executable, the "cpi" 
> example from the OpenMPI package, and a list of machines to run on, named 
> "machines", with the following contents:
> 
> > pj slots=4
> > pk slots=4
> > pl slots=4
> 
> 
> > mpirun --hostfile machines ./cpi
> 
>   ... hangs forever, but
> 
> > mpirun -d --hostfile machines ./cpi 
> 
>   ... runs correctly, if somewhat verbosely.
> 
> Also:
> 
> > salloc -n 12 /bin/bash   
> > mpirun ./cpi
> 
>    ... also runs correctly.  The "salloc" command is the slurm directive 
> that allocates CPU resources and starts an interactive shell with a bunch of 
> environment variables set to give mpirun the clues it needs.  The work CPUs 
> are allocated correctly on my "work" nodes when salloc is run from the head 
> node.
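> 
> (The relevant variables are things like SLURM_JOBID, SLURM_NODELIST, and 
> SLURM_TASKS_PER_NODE; as I understand it, mpirun's slurm support reads these 
> to discover the allocation.)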
> 
>   Config details and diagnostic efforts:
> 
> The outputs of the ompi_info runs are attached.
> 
> The cluster of four Raspberry Pi model 2 computers runs the Jessie 
> distribution of Raspbian, which is essentially Debian.  They differ a bit: the 
> "head node", creatively named "pi", has an older static network config, with 
> everything specified in /etc/network/interfaces.  The "cluster nodes", equally 
> creatively named pj, pk, and pl, all have the newer DHCPCD client daemon 
> configured for static interfaces, via /etc/dhcpcd.conf.  (NB: this is *not* 
> the DHCP *server*; these machines do not use DHCP services.)  The dhcpcd 
> configuration tool is the new scheme for Raspbian, and has been modified from 
> the "as-shipped" set-up to have a static IPv4 address on eth0, and to remove 
> some IPv6 functionality (router solicitation) that pollutes the log files.
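> 
> For concreteness, the static stanza in /etc/dhcpcd.conf on pj looks roughly 
> like this (quoting from memory, so treat the exact lines as approximate):
> 
> > interface eth0
> > static ip_address=192.168.0.12/24
> > static routers=192.168.0.1
> > static domain_name_servers=8.8.8.8 8.8.4.4
> > noipv6rs
> 
> The "noipv6rs" option is the dhcpcd knob that disables router solicitation.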
> 
> 
> MDNS is turned off in /etc/nsswitch.conf; "hosts" are resolved via "files", 
> then "dns".  The DNS name servers are statically configured to be 8.8.8.8 and 
> 8.8.4.4.  None of the machines involved in the OpenMPI operation are in DNS.
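> 
> That is, the hosts line in /etc/nsswitch.conf on each machine reads:
> 
> > hosts: files dns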
> 
> For slightly complicated reasons, all four machines were initially configured 
> as members of a local, non-DNS-resolvable domain, named ".gb".  This was done 
> because slurm requires e-mail, and my first crack at e-mail config seemed to 
> require a domain.  All the hostnames were statically configured through 
> /etc/hosts.  I realized later that I had misunderstood the mail config, and 
> have backed out the domain configuration; the machines all have non-dotted 
> names now.
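> 
> So /etc/hosts on each machine now contains essentially just:
> 
> > 192.168.0.11 pi
> > 192.168.0.12 pj
> > 192.168.0.13 pk
> > 192.168.0.14 pl
> 
> with the addresses matching the ifconfig output below.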
> 
> This seemed to briefly change the behavior: it worked several times after 
> this, but then, on reboot, stopped working again, making me think I am perhaps 
> losing my mind.
> 
> The system is *not* running nscd, so some kind of name-service cache is not a 
> good explanation here.
> 
> 
> The whole cluster is set up for host-based SSH authentication for the default 
> user, "pi".  This works for all possible host pairs, tested via:
> 
> > ssh -o PreferredAuthentications=hostbased pi@<target>
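> 
> A compact sketch for re-testing this (run from each machine in turn):
> 
> > for dst in pi pj pk pl; do ssh -o PreferredAuthentications=hostbased pi@$dst hostname; done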
> 
> The network config looks OK.  I can ping and ssh every way I want to, and it 
> all works.  The pis are all wired to the same Netgear 10/100 switch, which in 
> turn goes to my household switch, which in turn goes to my cable modem.  
> "ifconfig" shows eth0 and lo configured. "ifconfig -a" does not show any 
> additional unconfigured interfaces.
>   
> Ifconfig output is, in order for pi, pj, pk, and pl:
> 
> 
> 
> eth0      Link encap:Ethernet  HWaddr b8:27:eb:16:0a:70  
>           inet addr:192.168.0.11  Bcast:192.168.0.255  Mask:255.255.255.0
>           inet6 addr: ::ba27:ebff:fe16:a70/64 Scope:Global
>           inet6 addr: fe80::ba27:ebff:fe16:a70/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:164 errors:0 dropped:23 overruns:0 frame:0
>           TX packets:133 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000 
>           RX bytes:15733 (15.3 KiB)  TX bytes:13756 (13.4 KiB)
> 
> lo        Link encap:Local Loopback  
>           inet addr:127.0.0.1  Mask:255.0.0.0
>           inet6 addr: ::1/128 Scope:Host
>           UP LOOPBACK RUNNING  MTU:65536  Metric:1
>           RX packets:7 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:7 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0 
>           RX bytes:616 (616.0 B)  TX bytes:616 (616.0 B)
> 
> 
> 
> 
> eth0      Link encap:Ethernet  HWaddr b8:27:eb:27:4d:17  
>           inet addr:192.168.0.12  Bcast:192.168.0.255  Mask:255.255.255.0
>           inet6 addr: ::4c5c:1329:f1b6:1169/64 Scope:Global
>           inet6 addr: fe80::6594:bfad:206:1191/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:237 errors:0 dropped:31 overruns:0 frame:0
>           TX packets:131 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000 
>           RX bytes:28966 (28.2 KiB)  TX bytes:18841 (18.3 KiB)
> 
> lo        Link encap:Local Loopback  
>           inet addr:127.0.0.1  Mask:255.0.0.0
>           inet6 addr: ::1/128 Scope:Host
>           UP LOOPBACK RUNNING  MTU:65536  Metric:1
>           RX packets:136 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:136 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0 
>           RX bytes:11664 (11.3 KiB)  TX bytes:11664 (11.3 KiB)
> 
> 
> 
>  
> eth0      Link encap:Ethernet  HWaddr b8:27:eb:f4:ec:03  
>           inet addr:192.168.0.13  Bcast:192.168.0.255  Mask:255.255.255.0
>           inet6 addr: fe80::ba08:3c9:67c3:a2a1/64 Scope:Link
>           inet6 addr: ::8e5a:32a5:ab50:d955/64 Scope:Global
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:299 errors:0 dropped:57 overruns:0 frame:0
>           TX packets:138 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000 
>           RX bytes:34334 (33.5 KiB)  TX bytes:19909 (19.4 KiB)
> 
> lo        Link encap:Local Loopback  
>           inet addr:127.0.0.1  Mask:255.0.0.0
>           inet6 addr: ::1/128 Scope:Host
>           UP LOOPBACK RUNNING  MTU:65536  Metric:1
>           RX packets:136 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:136 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0 
>           RX bytes:11664 (11.3 KiB)  TX bytes:11664 (11.3 KiB)
> 
> 
> 
> eth0      Link encap:Ethernet  HWaddr b8:27:eb:da:c6:7f  
>           inet addr:192.168.0.14  Bcast:192.168.0.255  Mask:255.255.255.0
>           inet6 addr: ::a8db:7245:458f:2342/64 Scope:Global
>           inet6 addr: fe80::3c5f:7092:578a:6c10/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:369 errors:0 dropped:76 overruns:0 frame:0
>           TX packets:165 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000 
>           RX bytes:38040 (37.1 KiB)  TX bytes:22788 (22.2 KiB)
> 
> lo        Link encap:Local Loopback  
>           inet addr:127.0.0.1  Mask:255.0.0.0
>           inet6 addr: ::1/128 Scope:Host
>           UP LOOPBACK RUNNING  MTU:65536  Metric:1
>           RX packets:136 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:136 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0 
>           RX bytes:11664 (11.3 KiB)  TX bytes:11664 (11.3 KiB)
> 
> 
> 
> 
> There are no firewalls on any of the machines.  I checked this via 
> "iptables-save", which dumps the system firewall state in a way that allows 
> it to be re-loaded by a script, and the output is reasonably human-readable.  
> It shows all tables with no rules and a default "accept" state.
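> 
> Concretely, the iptables-save output is essentially just the empty default 
> tables; the filter table, for example, looks like:
> 
> > *filter
> > :INPUT ACCEPT [0:0]
> > :FORWARD ACCEPT [0:0]
> > :OUTPUT ACCEPT [0:0]
> > COMMIT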
> 
> 
> The OpenMPI installation is the current Raspbian version, freshly installed 
> (via "apt-get install openmpi-bin libopenmpi-dev") from the repos.  The 
> OpenMPI version is 1.6.5; the package version is 1.6.5-9.1+rpi1.  No 
> configuration options have been modified.
> 
> There is no ".openmpi" directory on the pi user account on any of the 
> machines.
> 
> When I run the problem case, I can sometimes catch the "orted" daemon 
> spinning up on the pj machine; it looks something like this (the port number 
> in the tcp URI varies from run to run): 
> 
> > 1 S pi        4895     1  0  80   0 -  1945 poll_s 20:23 ?        00:00:00 
> > orted --daemonize -mca ess env -mca orte_ess_jobid 1646002176 -mca 
> > orte_ess_vpid 1 -mca orte_ess_num_procs 4 --hnp-uri 
> > 1646002176.0;tcp://192.168.0.11:59646 -mca plm rsh
> 
> (192.168.0.11 is indeed the correct address of the launching machine, 
> hostname pi.  The first "pi" in column 3 is the name of the user who owns the 
> process.)
> 
> If I run "telnet 192.168.0.11 59646", it connects.  I can send some garbage 
> into the connection, but this does not cause the orted to exit, nor does it 
> immedately blow up the launching process on the launch machine.  I have not 
> investigated in detail, but it seems that if you molest the TCP connection in 
> this way, the launching process eventually reports an error, but if you 
> don't, it will hang forever.
> 
> One additional oddity: when I run the job in "debug" mode, the clients 
> generate the following dmesg traffic:
> 
> > [ 1002.404021] sctp: [Deprecated]: cpi (pid 13770) Requested 
> > SCTP_SNDRCVINFO event.
> > Use SCTP_RCVINFO through SCTP_RECVRCVINFO option instead.
> > [ 1002.412423] sctp: [Deprecated]: cpi (pid 13772) Requested 
> > SCTP_SNDRCVINFO event.
> > Use SCTP_RCVINFO through SCTP_RECVRCVINFO option instead.
> > [ 1002.427621] sctp: [Deprecated]: cpi (pid 13771) Requested 
> > SCTP_SNDRCVINFO event.
> > Use SCTP_RCVINFO through SCTP_RECVRCVINFO option instead.
> 
> 
> 
>   I have tried:
> 
>  - Adding or removing the domain suffix from the hosts in the machines file.
>  - Checking that the clocks on all four machines match.
>  - Changing the host names in the machines file to invalid names -- this 
> causes the expected failure, reassuring me that the file is being read.  Note 
> that the hanging behavior also occurs with the "-H" option in place of a 
> machine file.
>  - Running with "-mca btl tcp,self -mca btl_tcp_if_include eth0" in case it's 
> having device problems.  When I do this, I see this argument echoed on the 
> orted process on pj, but the behavior is the same, it still hangs.
>  - Removing the "slots=" directive from the machines file.
>  - Disabling IPv6 (via sysctl; see the sketch after this list).
>  - Turning off the SLURM daemons (via systemctl, not by uninstalling them.)
>  - Different host combinations in the machines file.  This changes things in 
> weird ways, which I have not systematically explored.
>    It seems that if pk is first in line, then the thing eventually times out, 
> but if pj or pl is first, it hangs forever.  The willingness of orted to 
> appear in the client process table seems inconsistent, but it may be that it 
> always runs and I am just not consistently catching it.
>  - Adding/removing "multi on" from /etc/host.conf.
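> 
> The IPv6 sysctl sketch promised above (reconstructed from memory, so 
> approximate):
> 
> > sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1
> > sudo sysctl -w net.ipv6.conf.default.disable_ipv6=1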
> 
> None of these have changed the behavior, except, as noted, briefly after 
> backing out the private domain configuration (which involves editing the 
> hosts file, and motivates the focus on DNS in some of this).
> 
> 
> All configurations work with "-d", with "--debug-daemons", or with no 
> arguments inside a slurm allocation, but hang in the "ordinary" case.
> 
> I am stumped.  I am totally willing to believe I have mucked up the network 
> config, but where? How? What's different about debug mode?
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2016/05/29204.php


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
