Thank you, using the default $TMPDIR works now.

On Fri, Sep 30, 2016 at 7:32 AM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
> Justin and all,
>
> the root cause is indeed a bug I fixed in
> https://github.com/open-mpi/ompi/pull/2135
> I also had this patch applied to Homebrew, so if you re-install
> open-mpi, you should be fine.
>
> Cheers,
>
> Gilles
>
> for those who want to know more:
> - Open MPI uses two Unix sockets, one by oob/usock and one by pmix
> - to keep things simple, the oob/usock Unix socket path is based on $TMPDIR,
>   the hostname and quite a few more characters.
>   The OS X default $TMPDIR is not short, so when we append the FQDN (which
>   might not be short either) and other paths, the size may exceed the
>   maximum path length allowed for a Unix socket (104 bytes on Yosemite).
>   This path is currently silently truncated, so bad/non-understandable
>   things can happen. The patch disqualifies oob/usock instead of
>   silently truncating the path.
>   a simple workaround is to
>       export TMPDIR=/tmp
>   a better workaround is to
>       mpirun --mca oob ^usock ...
>   or you can add to your environment
>       export OMPI_MCA_oob=^usock
>   and then use mpirun as usual
> - the pmix Unix socket path is only based on $TMPDIR plus a few extra
>   characters
>
> bottom line: unless your $TMPDIR is insanely long, you should be
> fine with one of these workarounds, or the patch available at
> https://github.com/open-mpi/ompi/pull/2135.patch, or by using the
> latest open-mpi from Homebrew.
>
> On Fri, Sep 23, 2016 at 11:15 AM, Gilles Gouaillardet <gil...@rist.or.jp>
> wrote:
> > Justin,
> >
> > the root cause could be the length of $TMPDIR, which might cause some path
> > to be truncated.
> >
> > you can check that by simply using a custom $TMPDIR that has the same size
> > as the original one
> >
> > which version of OS X are you running ?
> >
> > this might explain why neither Nathan nor I was able to reproduce the issue,
> > and I'd like to understand why this issue went undetected by Open MPI
> >
> > Cheers,
> >
> > Gilles
> >
> > On 9/23/2016 3:12 AM, Justin Chang wrote:
> >>
> >> Oh, so setting this in my ~/.profile
> >>
> >> export TMPDIR=/tmp
> >>
> >> in fact solves my problem completely! Not sure why this is the case, but
> >> thanks!
> >>
> >> Justin
> >>
> >> On Thu, Sep 22, 2016 at 7:33 AM, Gilles Gouaillardet
> >> <gilles.gouaillar...@gmail.com> wrote:
> >>>
> >>> Justin,
> >>>
> >>> I do not see this error on my laptop
> >>>
> >>> which version of OS X are you running ?
> >>>
> >>> can you try to
> >>> TMPDIR=/tmp mpirun -n 1
> >>>
> >>> Cheers,
> >>>
> >>> Gilles
> >>>
> >>> On Thu, Sep 22, 2016 at 7:21 PM, Nathan Hjelm <hje...@me.com> wrote:
> >>>>
> >>>> FWIW it works fine for me on my MacBook Pro running 10.12 with Open MPI
> >>>> 2.0.1 installed through homebrew:
> >>>>
> >>>> ✗ brew -v
> >>>> Homebrew 1.0.0 (git revision c3105; last commit 2016-09-22)
> >>>> Homebrew/homebrew-core (git revision 227e; last commit 2016-09-22)
> >>>>
> >>>> ✗ brew info openmpi
> >>>>
> >>>> open-mpi: stable 2.0.1 (bottled), HEAD
> >>>> High performance message passing library
> >>>> https://www.open-mpi.org/
> >>>> Conflicts with: lcdf-typetools, mpich
> >>>> /usr/local/Cellar/open-mpi/2.0.1 (688 files, 8.3M) *
> >>>>   Poured from bottle on 2016-09-22 at 03:53:35
> >>>> From:
> >>>> https://github.com/Homebrew/homebrew-core/blob/master/Formula/open-mpi.rb
> >>>> ==> Dependencies
> >>>> Required: libevent ✔
> >>>> ==> Options
> >>>> --c++11
> >>>>     Build using C++11 mode
> >>>> --with-cxx-bindings
> >>>>     Enable C++ MPI bindings (deprecated as of MPI-3.0)
> >>>> --with-java
> >>>>     Build with java support
> >>>> --with-mpi-thread-multiple
> >>>>     Enable MPI_THREAD_MULTIPLE
> >>>> --without-fortran
> >>>>     Build without fortran support
> >>>> --HEAD
> >>>>     Install HEAD version
> >>>>
> >>>> ✗ type -p mpicc
> >>>> mpicc is /usr/local/bin/mpicc
> >>>>
> >>>> ✗ mpirun --version
> >>>> mpirun (Open MPI) 2.0.1
> >>>>
> >>>> Report bugs to http://www.open-mpi.org/community/help/
> >>>>
> >>>>
> >>>> ✗ mpirun ./ring_c
> >>>> Process 0 sending 10 to 1, tag 201 (4 processes in ring)
> >>>> Process 0 sent to 1
> >>>> Process 0 decremented value: 9
> >>>> Process 0 decremented value: 8
> >>>> Process 0 decremented value: 7
> >>>> Process 0 decremented value: 6
> >>>> Process 0 decremented value: 5
> >>>> Process 0 decremented value: 4
> >>>> Process 0 decremented value: 3
> >>>> Process 0 decremented value: 2
> >>>> Process 0 decremented value: 1
> >>>> Process 0 decremented value: 0
> >>>> Process 0 exiting
> >>>> Process 1 exiting
> >>>> Process 2 exiting
> >>>> Process 3 exiting
> >>>>
> >>>>
> >>>> -Nathan
> >>>>
> >>>>> On Sep 22, 2016, at 3:31 AM, Justin Chang <jychan...@gmail.com> wrote:
> >>>>>
> >>>>> I tried that and also deleted everything inside $TMPDIR. The error
> >>>>> still persists
> >>>>>
> >>>>> On Thu, Sep 22, 2016 at 4:21 AM, r...@open-mpi.org <r...@open-mpi.org>
> >>>>> wrote:
> >>>>>>
> >>>>>> Try removing the “pmix” entries as well
> >>>>>>
> >>>>>>> On Sep 22, 2016, at 2:19 AM, Justin Chang <jychan...@gmail.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>> "mpirun -n 1" was just to demonstrate that I get those error
> >>>>>>> messages.
> >>>>>>> I ran a simple helloworld.c and it still gives those two messages.
> >>>>>>>
> >>>>>>> I did delete openmpi-sessions-* from my $TMPDIR but it doesn't solve
> >>>>>>> the problem.
> >>>>>>> Here's my $TMPDIR:
> >>>>>>>
> >>>>>>> ~ cd $TMPDIR
> >>>>>>> ~ pwd
> >>>>>>> /var/folders/jd/qh5zn6jn5kz_byz9gxz5kl2m0000gn/T
> >>>>>>> ~ ls
> >>>>>>> MediaCache
> >>>>>>> TemporaryItems
> >>>>>>> com.apple.AddressBook.ContactsAccountsService
> >>>>>>> com.apple.AddressBook.InternetAccountsBridge
> >>>>>>> com.apple.AirPlayUIAgent
> >>>>>>> com.apple.BKAgentService
> >>>>>>> com.apple.CalendarAgent
> >>>>>>> com.apple.CalendarAgent.CalNCService
> >>>>>>> com.apple.CloudPhotosConfiguration
> >>>>>>> com.apple.DataDetectorsDynamicData
> >>>>>>> com.apple.ICPPhotoStreamLibraryService
> >>>>>>> com.apple.InputMethodKit.TextReplacementService
> >>>>>>> com.apple.PhotoIngestService
> >>>>>>> com.apple.Preview
> >>>>>>> com.apple.Safari
> >>>>>>> com.apple.SocialPushAgent
> >>>>>>> com.apple.WeatherKitService
> >>>>>>> com.apple.cloudphotosd
> >>>>>>> com.apple.dt.XCDocumenter.XCDocumenterExtension
> >>>>>>> com.apple.dt.XcodeBuiltInExtensions
> >>>>>>> com.apple.geod
> >>>>>>> com.apple.iCal.CalendarNC
> >>>>>>> com.apple.lateragent
> >>>>>>> com.apple.ncplugin.stocks
> >>>>>>> com.apple.ncplugin.weather
> >>>>>>> com.apple.notificationcenterui.WeatherSummary
> >>>>>>> com.apple.photolibraryd
> >>>>>>> com.apple.photomoments
> >>>>>>> com.apple.quicklook.ui.helper
> >>>>>>> com.apple.soagent
> >>>>>>> com.getdropbox.dropbox.garcon
> >>>>>>> icdd501
> >>>>>>> ics21406
> >>>>>>> openmpi-sessions-501@Justins-MacBook-Pro-2_0
> >>>>>>> pmix-12195
> >>>>>>> pmix-12271
> >>>>>>> pmix-12289
> >>>>>>> pmix-12295
> >>>>>>> pmix-12304
> >>>>>>> pmix-12313
> >>>>>>> pmix-12367
> >>>>>>> pmix-12397
> >>>>>>> pmix-12775
> >>>>>>> pmix-12858
> >>>>>>> pmix-17118
> >>>>>>> pmix-1754
> >>>>>>> pmix-20632
> >>>>>>> pmix-20793
> >>>>>>> pmix-20849
> >>>>>>> pmix-21019
> >>>>>>> pmix-22316
> >>>>>>> pmix-8129
> >>>>>>> pmix-8494
> >>>>>>> xcrun_db
> >>>>>>> ~ rm -rf openmpi-sessions-501@Justins-MacBook-Pro-2_0
> >>>>>>> ~ mpirun -n 1
> >>>>>>> [Justins-MacBook-Pro-2.local:22527] [[12992,0],0] bind() failed on error Address already in use (48)
> >>>>>>> [Justins-MacBook-Pro-2.local:22527] [[12992,0],0] ORTE_ERROR_LOG: Error in file oob_usock_component.c at line 228
> >>>>>>>
> >>>>>>> --------------------------------------------------------------------------
> >>>>>>> No executable was specified on the mpirun command line.
> >>>>>>>
> >>>>>>> Aborting.
> >>>>>>>
> >>>>>>> --------------------------------------------------------------------------
> >>>>>>>
> >>>>>>> and when I type "ls" the directory
> >>>>>>> "openmpi-sessions-501@Justins-MacBook-Pro-2_0" reappeared. Unless
> >>>>>>> there's a different directory I need to look for?
> >>>>>>>
> >>>>>>> On Thu, Sep 22, 2016 at 4:08 AM, r...@open-mpi.org <r...@open-mpi.org>
> >>>>>>> wrote:
> >>>>>>>>
> >>>>>>>> Maybe I’m missing something, but “mpirun -n 1” doesn’t include the
> >>>>>>>> name of an application to execute.
> >>>>>>>>
> >>>>>>>> The error message prior to that error indicates that you have some
> >>>>>>>> cruft sitting in your tmpdir. You just need to clean it out - look for
> >>>>>>>> something that starts with “openmpi”
> >>>>>>>>
> >>>>>>>>> On Sep 22, 2016, at 1:45 AM, Justin Chang <jychan...@gmail.com>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>> Dear all,
> >>>>>>>>>
> >>>>>>>>> So I upgraded/updated my Homebrew on my Macbook and installed Open MPI
> >>>>>>>>> 2.0.1 using "brew install openmpi".
> >>>>>>>>> However, when I open up a terminal
> >>>>>>>>> and type "mpirun -n 1" I get the following messages:
> >>>>>>>>>
> >>>>>>>>> ~ mpirun -n 1
> >>>>>>>>> [Justins-MacBook-Pro-2.local:20793] [[13318,0],0] bind() failed on error Address already in use (48)
> >>>>>>>>> [Justins-MacBook-Pro-2.local:20793] [[13318,0],0] ORTE_ERROR_LOG: Error in file oob_usock_component.c at line 228
> >>>>>>>>>
> >>>>>>>>> --------------------------------------------------------------------------
> >>>>>>>>> No executable was specified on the mpirun command line.
> >>>>>>>>>
> >>>>>>>>> Aborting.
> >>>>>>>>>
> >>>>>>>>> --------------------------------------------------------------------------
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> I have never seen anything like the first two lines. I also installed
> >>>>>>>>> python and mpi4py via pip, and I still get the same messages:
> >>>>>>>>>
> >>>>>>>>> ~ python -c "from mpi4py import MPI"
> >>>>>>>>> [Justins-MacBook-Pro-2.local:20871] [[13496,0],0] bind() failed on error Address already in use (48)
> >>>>>>>>> [Justins-MacBook-Pro-2.local:20871] [[13496,0],0] ORTE_ERROR_LOG: Error in file oob_usock_component.c at line 228
> >>>>>>>>>
> >>>>>>>>> But now if I add "mpirun -n 1" I get the following:
> >>>>>>>>>
> >>>>>>>>> ~ mpirun -n 1 python -c "from mpi4py import MPI"
> >>>>>>>>> [Justins-MacBook-Pro-2.local:20935] [[13560,0],0] bind() failed on error Address already in use (48)
> >>>>>>>>> [Justins-MacBook-Pro-2.local:20935] [[13560,0],0] ORTE_ERROR_LOG: Error in file oob_usock_component.c at line 228
> >>>>>>>>> [Justins-MacBook-Pro-2.local:20936] [[13560,1],0] usock_peer_send_blocking: send() to socket 17 failed: Socket is not connected (57)
> >>>>>>>>> [Justins-MacBook-Pro-2.local:20936] [[13560,1],0] ORTE_ERROR_LOG: Unreachable in file oob_usock_connection.c at line 315
> >>>>>>>>> [Justins-MacBook-Pro-2.local:20936] [[13560,1],0] orte_usock_peer_try_connect: usock_peer_send_connect_ack to proc [[13560,0],0] failed: Unreachable (-12)
> >>>>>>>>> [Justins-MacBook-Pro-2:20936] *** Process received signal ***
> >>>>>>>>> [Justins-MacBook-Pro-2:20936] Signal: Segmentation fault: 11 (11)
> >>>>>>>>> [Justins-MacBook-Pro-2:20936] Signal code: (0)
> >>>>>>>>> [Justins-MacBook-Pro-2:20936] Failing at address: 0x0
> >>>>>>>>> -------------------------------------------------------
> >>>>>>>>> Primary job terminated normally, but 1 process returned
> >>>>>>>>> a non-zero exit code.. Per user-direction, the job has been
> >>>>>>>>> aborted.
> >>>>>>>>> -------------------------------------------------------
> >>>>>>>>>
> >>>>>>>>> --------------------------------------------------------------------------
> >>>>>>>>> mpirun detected that one or more processes exited with non-zero
> >>>>>>>>> status, thus causing
> >>>>>>>>> the job to be terminated. The first process to do so was:
> >>>>>>>>>
> >>>>>>>>>   Process name: [[13560,1],0]
> >>>>>>>>>   Exit code:    1
> >>>>>>>>>
> >>>>>>>>> --------------------------------------------------------------------------
> >>>>>>>>>
> >>>>>>>>> Clearly something is wrong here. I already tried things like
> >>>>>>>>> "rm -rf $TMPDIR/openmpi-sessions-*" but said directory keeps reappearing
> >>>>>>>>> and the error persists. Why does this happen and how do I fix it?
> >>>>>>>>> For what it's worth, here's some other information that may help:
> >>>>>>>>>
> >>>>>>>>> ~ mpicc --version
> >>>>>>>>> Apple LLVM version 8.0.0 (clang-800.0.38)
> >>>>>>>>> Target: x86_64-apple-darwin15.6.0
> >>>>>>>>> Thread model: posix
> >>>>>>>>> InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
> >>>>>>>>>
> >>>>>>>>> I tested Hello World with both mpicc and mpif90, and they still work
> >>>>>>>>> despite showing those two error/warning messages.
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> Justin
> >>>>>>>>> _______________________________________________
> >>>>>>>>> users mailing list
> >>>>>>>>> users@lists.open-mpi.org
> >>>>>>>>> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
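For anyone who wants to check whether their own $TMPDIR is long enough to trigger the truncation Gilles describes, here is a minimal sketch in Python. It only measures a candidate path against the 104-byte limit quoted in the thread. The function name `socket_path_fits`, the reserved byte for the C string terminator, and the example suffix are assumptions made for illustration; the suffix is not the actual path layout Open MPI uses.

```python
import os

# Limit quoted in the thread for OS X Yosemite. This constant is copied from
# the message above, not queried from the operating system.
MAX_UNIX_SOCKET_PATH = 104


def socket_path_fits(tmpdir, suffix, limit=MAX_UNIX_SOCKET_PATH):
    """Return True if tmpdir/suffix fits in `limit` bytes.

    Hypothetical helper: one byte is conservatively reserved for the
    trailing NUL of a C string (an assumption, not taken from the thread).
    """
    path = os.path.join(tmpdir, suffix)
    # The limit is expressed in bytes, so measure the encoded path.
    return len(os.fsencode(path)) + 1 <= limit


if __name__ == "__main__":
    # Hypothetical suffix: the session directory name comes from the thread,
    # the components after it are made up for the example.
    suffix = "openmpi-sessions-501@Justins-MacBook-Pro-2_0/12992/0/usock"
    for tmpdir in (os.environ.get("TMPDIR", "/tmp"), "/tmp"):
        print(tmpdir, socket_path_fits(tmpdir, suffix))
```

A short $TMPDIR such as /tmp leaves far more room for the hostname and the other path components, which is consistent with why `export TMPDIR=/tmp` works as a workaround.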