Justin and all,

The root cause is indeed a bug I fixed in
https://github.com/open-mpi/ompi/pull/2135
I also had this patch applied to Homebrew, so if you reinstall
open-mpi, you should be fine.

Cheers,

Gilles

For those who want to know more:
- Open MPI uses two Unix sockets, one used by oob/usock and one used by pmix.
- To keep things simple, the oob/usock Unix socket path is built from
  $TMPDIR, the hostname, and quite a few more characters.
  The OS X default $TMPDIR is not short, so when we append the FQDN (which
  might not be short either) and other path components, the total length
  may exceed the maximum path length allowed for a Unix socket (104 bytes
  on Yosemite). The path is currently truncated silently, so bad and
  hard-to-diagnose things can happen. The patch disqualifies oob/usock
  instead of silently truncating the path.
  A simple workaround is to
    export TMPDIR=/tmp
  A better workaround is to
    mpirun --mca oob ^usock ...
  or you can add to your environment
    export OMPI_MCA_oob=^usock
  and then use mpirun as usual.
- The pmix Unix socket path is based only on $TMPDIR plus a few extra
  characters.

Bottom line: unless your $TMPDIR is insanely long, you should be fine
with one of these workarounds, the patch available at
https://github.com/open-mpi/ompi/pull/2135.patch, or the latest
open-mpi from Homebrew.

On Fri, Sep 23, 2016 at 11:15 AM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
> Justin,
>
>
> The root cause could be the length of $TMPDIR, which might cause some paths
> to be truncated.
>
> You can check that by simply using a custom $TMPDIR that has the same length
> as the original one.
>
>
> Which version of OS X are you running?
>
> This might explain why neither Nathan nor I were able to reproduce the issue,
> and I'd like to understand why this issue went undetected by Open MPI.
>
>
> Cheers,
>
>
> Gilles
>
>
>
> On 9/23/2016 3:12 AM, Justin Chang wrote:
>>
>> Oh, so setting this in my ~/.profile
>>
>> export TMPDIR=/tmp
>>
>> in fact solves my problem completely! Not sure why this is the case, but
>> thanks!
>>
>> Justin
>>
>> On Thu, Sep 22, 2016 at 7:33 AM, Gilles Gouaillardet
>> <gilles.gouaillar...@gmail.com> wrote:
>>>
>>> Justin,
>>>
>>> I do not see this error on my laptop.
>>>
>>> Which version of OS X are you running?
>>>
>>> Can you try
>>> TMPDIR=/tmp mpirun -n 1
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On Thu, Sep 22, 2016 at 7:21 PM, Nathan Hjelm <hje...@me.com> wrote:
>>>>
>>>> FWIW it works fine for me on my MacBook Pro running 10.12 with Open MPI
>>>> 2.0.1 installed through Homebrew:
>>>>
>>>> ✗ brew -v
>>>> Homebrew 1.0.0 (git revision c3105; last commit 2016-09-22)
>>>> Homebrew/homebrew-core (git revision 227e; last commit 2016-09-22)
>>>>
>>>> ✗ brew info openmpi
>>>>
>>>> open-mpi: stable 2.0.1 (bottled), HEAD
>>>> High performance message passing library
>>>> https://www.open-mpi.org/
>>>> Conflicts with: lcdf-typetools, mpich
>>>> /usr/local/Cellar/open-mpi/2.0.1 (688 files, 8.3M) *
>>>>    Poured from bottle on 2016-09-22 at 03:53:35
>>>> From:
>>>> https://github.com/Homebrew/homebrew-core/blob/master/Formula/open-mpi.rb
>>>> ==> Dependencies
>>>> Required: libevent ✔
>>>> ==> Options
>>>> --c++11
>>>>          Build using C++11 mode
>>>> --with-cxx-bindings
>>>>          Enable C++ MPI bindings (deprecated as of MPI-3.0)
>>>> --with-java
>>>>          Build with java support
>>>> --with-mpi-thread-multiple
>>>>          Enable MPI_THREAD_MULTIPLE
>>>> --without-fortran
>>>>          Build without fortran support
>>>> --HEAD
>>>>          Install HEAD version
>>>>
>>>> ✗ type -p mpicc
>>>> mpicc is /usr/local/bin/mpicc
>>>>
>>>> ✗ mpirun --version
>>>> mpirun (Open MPI) 2.0.1
>>>>
>>>> Report bugs to http://www.open-mpi.org/community/help/
>>>>
>>>>
>>>> ✗ mpirun ./ring_c
>>>> Process 0 sending 10 to 1, tag 201 (4 processes in ring)
>>>> Process 0 sent to 1
>>>> Process 0 decremented value: 9
>>>> Process 0 decremented value: 8
>>>> Process 0 decremented value: 7
>>>> Process 0 decremented value: 6
>>>> Process 0 decremented value: 5
>>>> Process 0 decremented value: 4
>>>> Process 0 decremented value: 3
>>>> Process 0 decremented value: 2
>>>> Process 0 decremented value: 1
>>>> Process 0 decremented value: 0
>>>> Process 0 exiting
>>>> Process 1 exiting
>>>> Process 2 exiting
>>>> Process 3 exiting
>>>>
>>>>
>>>> -Nathan
>>>>
>>>>> On Sep 22, 2016, at 3:31 AM, Justin Chang <jychan...@gmail.com> wrote:
>>>>>
>>>>> I tried that and also deleted everything inside $TMPDIR. The error
>>>>> still persists.
>>>>>
>>>>> On Thu, Sep 22, 2016 at 4:21 AM, r...@open-mpi.org <r...@open-mpi.org>
>>>>> wrote:
>>>>>>
>>>>>> Try removing the “pmix” entries as well
>>>>>>
>>>>>>> On Sep 22, 2016, at 2:19 AM, Justin Chang <jychan...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> "mpirun -n 1" was just to demonstrate that I get those error
>>>>>>> messages.
>>>>>>> I ran a simple helloworld.c and it still gives those two messages.
>>>>>>>
>>>>>>> I did delete openmpi-sessions-* from my $TMPDIR but it doesn't solve
>>>>>>> the problem. Here's my $TMPDIR:
>>>>>>>
>>>>>>> ~ cd $TMPDIR
>>>>>>> ~ pwd
>>>>>>> /var/folders/jd/qh5zn6jn5kz_byz9gxz5kl2m0000gn/T
>>>>>>> ~ ls
>>>>>>> MediaCache
>>>>>>> TemporaryItems
>>>>>>> com.apple.AddressBook.ContactsAccountsService
>>>>>>> com.apple.AddressBook.InternetAccountsBridge
>>>>>>> com.apple.AirPlayUIAgent
>>>>>>> com.apple.BKAgentService
>>>>>>> com.apple.CalendarAgent
>>>>>>> com.apple.CalendarAgent.CalNCService
>>>>>>> com.apple.CloudPhotosConfiguration
>>>>>>> com.apple.DataDetectorsDynamicData
>>>>>>> com.apple.ICPPhotoStreamLibraryService
>>>>>>> com.apple.InputMethodKit.TextReplacementService
>>>>>>> com.apple.PhotoIngestService
>>>>>>> com.apple.Preview
>>>>>>> com.apple.Safari
>>>>>>> com.apple.SocialPushAgent
>>>>>>> com.apple.WeatherKitService
>>>>>>> com.apple.cloudphotosd
>>>>>>> com.apple.dt.XCDocumenter.XCDocumenterExtension
>>>>>>> com.apple.dt.XcodeBuiltInExtensions
>>>>>>> com.apple.geod
>>>>>>> com.apple.iCal.CalendarNC
>>>>>>> com.apple.lateragent
>>>>>>> com.apple.ncplugin.stocks
>>>>>>> com.apple.ncplugin.weather
>>>>>>> com.apple.notificationcenterui.WeatherSummary
>>>>>>> com.apple.photolibraryd
>>>>>>> com.apple.photomoments
>>>>>>> com.apple.quicklook.ui.helper
>>>>>>> com.apple.soagent
>>>>>>> com.getdropbox.dropbox.garcon
>>>>>>> icdd501
>>>>>>> ics21406
>>>>>>> openmpi-sessions-501@Justins-MacBook-Pro-2_0
>>>>>>> pmix-12195
>>>>>>> pmix-12271
>>>>>>> pmix-12289
>>>>>>> pmix-12295
>>>>>>> pmix-12304
>>>>>>> pmix-12313
>>>>>>> pmix-12367
>>>>>>> pmix-12397
>>>>>>> pmix-12775
>>>>>>> pmix-12858
>>>>>>> pmix-17118
>>>>>>> pmix-1754
>>>>>>> pmix-20632
>>>>>>> pmix-20793
>>>>>>> pmix-20849
>>>>>>> pmix-21019
>>>>>>> pmix-22316
>>>>>>> pmix-8129
>>>>>>> pmix-8494
>>>>>>> xcrun_db
>>>>>>> ~ rm -rf openmpi-sessions-501@Justins-MacBook-Pro-2_0
>>>>>>> ~ mpirun -n 1
>>>>>>> [Justins-MacBook-Pro-2.local:22527] [[12992,0],0] bind() failed on
>>>>>>> error Address already in use (48)
>>>>>>> [Justins-MacBook-Pro-2.local:22527] [[12992,0],0] ORTE_ERROR_LOG:
>>>>>>> Error in file oob_usock_component.c at line 228
>>>>>>>
>>>>>>> --------------------------------------------------------------------------
>>>>>>> No executable was specified on the mpirun command line.
>>>>>>>
>>>>>>> Aborting.
>>>>>>>
>>>>>>> --------------------------------------------------------------------------
>>>>>>>
>>>>>>> and when I type "ls" the directory
>>>>>>> "openmpi-sessions-501@Justins-MacBook-Pro-2_0" reappeared. Unless
>>>>>>> there's a different directory I need to look for?
>>>>>>>
>>>>>>> On Thu, Sep 22, 2016 at 4:08 AM, r...@open-mpi.org <r...@open-mpi.org>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Maybe I’m missing something, but “mpirun -n 1” doesn’t include the
>>>>>>>> name of an application to execute.
>>>>>>>>
>>>>>>>> The error message prior to that error indicates that you have some
>>>>>>>> cruft sitting in your tmpdir. You just need to clean it out: look for
>>>>>>>> something that starts with “openmpi”.
>>>>>>>>
>>>>>>>>
>>>>>>>>> On Sep 22, 2016, at 1:45 AM, Justin Chang <jychan...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Dear all,
>>>>>>>>>
>>>>>>>>> So I upgraded/updated my Homebrew on my MacBook and installed Open MPI
>>>>>>>>> 2.0.1 using "brew install openmpi". However, when I open up a terminal
>>>>>>>>> and type "mpirun -n 1" I get the following messages:
>>>>>>>>>
>>>>>>>>> ~ mpirun -n 1
>>>>>>>>> [Justins-MacBook-Pro-2.local:20793] [[13318,0],0] bind() failed on
>>>>>>>>> error Address already in use (48)
>>>>>>>>> [Justins-MacBook-Pro-2.local:20793] [[13318,0],0] ORTE_ERROR_LOG:
>>>>>>>>> Error in file oob_usock_component.c at line 228
>>>>>>>>>
>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>> No executable was specified on the mpirun command line.
>>>>>>>>>
>>>>>>>>> Aborting.
>>>>>>>>>
>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I have never seen anything like the first two lines. I also installed
>>>>>>>>> python and mpi4py via pip, and I still get the same messages:
>>>>>>>>>
>>>>>>>>> ~ python -c "from mpi4py import MPI"
>>>>>>>>> [Justins-MacBook-Pro-2.local:20871] [[13496,0],0] bind() failed on
>>>>>>>>> error Address already in use (48)
>>>>>>>>> [Justins-MacBook-Pro-2.local:20871] [[13496,0],0] ORTE_ERROR_LOG:
>>>>>>>>> Error in file oob_usock_component.c at line 228
>>>>>>>>>
>>>>>>>>> But now if I add "mpirun -n 1" I get the following:
>>>>>>>>>
>>>>>>>>> ~ mpirun -n 1 python -c "from mpi4py import MPI"
>>>>>>>>> [Justins-MacBook-Pro-2.local:20935] [[13560,0],0] bind() failed on
>>>>>>>>> error Address already in use (48)
>>>>>>>>> [Justins-MacBook-Pro-2.local:20935] [[13560,0],0] ORTE_ERROR_LOG:
>>>>>>>>> Error in file oob_usock_component.c at line 228
>>>>>>>>> [Justins-MacBook-Pro-2.local:20936] [[13560,1],0]
>>>>>>>>> usock_peer_send_blocking: send() to socket 17 failed: Socket is not
>>>>>>>>> connected (57)
>>>>>>>>> [Justins-MacBook-Pro-2.local:20936] [[13560,1],0] ORTE_ERROR_LOG:
>>>>>>>>> Unreachable in file oob_usock_connection.c at line 315
>>>>>>>>> [Justins-MacBook-Pro-2.local:20936] [[13560,1],0]
>>>>>>>>> orte_usock_peer_try_connect: usock_peer_send_connect_ack to proc
>>>>>>>>> [[13560,0],0] failed: Unreachable (-12)
>>>>>>>>> [Justins-MacBook-Pro-2:20936] *** Process received signal ***
>>>>>>>>> [Justins-MacBook-Pro-2:20936] Signal: Segmentation fault: 11 (11)
>>>>>>>>> [Justins-MacBook-Pro-2:20936] Signal code:  (0)
>>>>>>>>> [Justins-MacBook-Pro-2:20936] Failing at address: 0x0
>>>>>>>>> -------------------------------------------------------
>>>>>>>>> Primary job  terminated normally, but 1 process returned
>>>>>>>>> a non-zero exit code.. Per user-direction, the job has been
>>>>>>>>> aborted.
>>>>>>>>> -------------------------------------------------------
>>>>>>>>>
>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>> mpirun detected that one or more processes exited with non-zero
>>>>>>>>> status, thus causing
>>>>>>>>> the job to be terminated. The first process to do so was:
>>>>>>>>>
>>>>>>>>> Process name: [[13560,1],0]
>>>>>>>>> Exit code:    1
>>>>>>>>>
>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>
>>>>>>>>> Clearly something is wrong here. I already tried things like
>>>>>>>>> "rm -rf $TMPDIR/openmpi-sessions-*", but that directory keeps
>>>>>>>>> reappearing and the error persists. Why does this happen and how do
>>>>>>>>> I fix it? For what it's worth, here's some other information that
>>>>>>>>> may help:
>>>>>>>>>
>>>>>>>>> ~ mpicc --version
>>>>>>>>> Apple LLVM version 8.0.0 (clang-800.0.38)
>>>>>>>>> Target: x86_64-apple-darwin15.6.0
>>>>>>>>> Thread model: posix
>>>>>>>>> InstalledDir:
>>>>>>>>> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
>>>>>>>>>
>>>>>>>>> I tested Hello World with both mpicc and mpif90, and they still work
>>>>>>>>> despite showing those two error/warning messages.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Justin
>>>>>>>>> _______________________________________________
>>>>>>>>> users mailing list
>>>>>>>>> users@lists.open-mpi.org
>>>>>>>>> https://rfd.newmexicoconsortium.org/mailman/listinfo/users