Thank you, using the default $TMPDIR works now.

On Fri, Sep 30, 2016 at 7:32 AM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:

> Justin and all,
>
> the root cause is indeed a bug I fixed in
> https://github.com/open-mpi/ompi/pull/2135
> I also had this patch applied to Homebrew, so if you re-install
> open-mpi, you should be fine.
>
> Cheers,
>
> Gilles
>
> For those who want to know more:
> - Open MPI uses two Unix sockets, one created by oob/usock and one by pmix.
> - To keep things simple, the oob/usock Unix socket path is built from $TMPDIR,
>   the hostname, and quite a few more characters.
>   The default OS X $TMPDIR is not short, so when we append the FQDN (which
>   might not be short either) and other path components, the result may
>   exceed the maximum path length allowed for a Unix socket (104 bytes on
>   Yosemite). This path is currently silently truncated, so
>   bad and hard-to-understand things can happen. The patch disqualifies
>   oob/usock instead of silently truncating the path.
>   A simple workaround is to
>       export TMPDIR=/tmp
>   A better workaround is to
>       mpirun --mca oob ^usock ...
>   or you can add to your environment
>       export OMPI_MCA_oob=^usock
>   and then use mpirun as usual.
> - The pmix Unix socket path is based only on $TMPDIR plus a few extra
>   characters.
> Bottom line: unless your $TMPDIR is insanely long, you should be
> fine with one of these workarounds, the patch available at
> https://github.com/open-mpi/ompi/pull/2135.patch, or the
> latest open-mpi from Homebrew.
>
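To make the length problem concrete, here is a minimal Python sketch that checks whether a candidate socket path fits in the 104-byte limit mentioned above. The $TMPDIR and session directory names are taken from this thread; the trailing components in `suffix` are a hypothetical stand-in, since the thread does not spell out the exact oob/usock path layout.

```python
import os

# sun_path capacity quoted in this thread for OS X Yosemite.
SUN_PATH_MAX = 104


def socket_path_fits(path, limit=SUN_PATH_MAX):
    """Return True if path plus its terminating NUL byte fits in sun_path."""
    return len(os.fsencode(path)) + 1 <= limit


# Values copied from the $TMPDIR listing further down in this thread.
tmpdir = "/var/folders/jd/qh5zn6jn5kz_byz9gxz5kl2m0000gn/T"
session = "openmpi-sessions-501@Justins-MacBook-Pro-2_0"

# Hypothetical trailing components, for illustration only; the real
# oob/usock layout is not given in the thread.
suffix = "12992/0/usock"

long_path = "/".join([tmpdir, session, suffix])
short_path = "/".join(["/tmp", session, suffix])

for candidate in (long_path, short_path):
    print(len(candidate), socket_path_fits(candidate), candidate)
```

With the default $TMPDIR the session directory alone already uses most of the budget, which is why switching to /tmp leaves enough room.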
> On Fri, Sep 23, 2016 at 11:15 AM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
> > Justin,
> >
> >
> > the root cause could be the length of $TMPDIR, which might cause some path
> > to be truncated.
> >
> > you can check that by simply using a custom $TMPDIR that has the same
> > length as the original one
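One way to run the check suggested here is to build a directory path that has exactly the same length as the original $TMPDIR but lives somewhere else. The sketch below only computes such a path; the base directory `/tmp/ompi-test` and the padding character are arbitrary choices for illustration, not something from the thread.

```python
import os


def same_length_dir(original, base):
    """Return a path under base whose length equals len(original).

    The final component is padded with 'x' characters.
    """
    pad = len(original) - len(base) - 1  # 1 for the path separator
    if pad < 1:
        raise ValueError("base is too long to pad to the original length")
    return base + "/" + "x" * pad


# $TMPDIR as reported later in this thread.
original = "/var/folders/jd/qh5zn6jn5kz_byz9gxz5kl2m0000gn/T"

# Arbitrary base directory chosen for this example.
candidate = same_length_dir(original, "/tmp/ompi-test")
print(candidate, len(candidate) == len(original))
```

After creating that directory (for example with `os.makedirs(candidate)`), one could run `mpirun` with TMPDIR set to it and see whether the same bind() error shows up.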
> >
> >
> > which version of OSX are you running ?
> >
> > this might explain why neither Nathan nor I were able to reproduce the issue, and
> > I'd like to understand why this issue went undetected by Open MPI
> >
> >
> > Cheers,
> >
> >
> > Gilles
> >
> >
> >
> > On 9/23/2016 3:12 AM, Justin Chang wrote:
> >>
> >> Oh, so setting this in my ~/.profile
> >>
> >> export TMPDIR=/tmp
> >>
> >> in fact solves my problem completely! Not sure why this is the case, but
> >> thanks!
> >>
> >> Justin
> >>
> >> On Thu, Sep 22, 2016 at 7:33 AM, Gilles Gouaillardet
> >> <gilles.gouaillar...@gmail.com> wrote:
> >>>
> >>> Justin,
> >>>
> >>> i do not see this error on my laptop
> >>>
> >>> which version of OS X are you running ?
> >>>
> >>> can you try to
> >>> TMPDIR=/tmp mpirun -n 1
> >>>
> >>> Cheers,
> >>>
> >>> Gilles
> >>>
> >>> On Thu, Sep 22, 2016 at 7:21 PM, Nathan Hjelm <hje...@me.com> wrote:
> >>>>
> >>>> FWIW it works fine for me on my MacBook Pro running 10.12 with Open MPI
> >>>> 2.0.1 installed through homebrew:
> >>>>
> >>>> ✗ brew -v
> >>>> Homebrew 1.0.0 (git revision c3105; last commit 2016-09-22)
> >>>> Homebrew/homebrew-core (git revision 227e; last commit 2016-09-22)
> >>>>
> >>>> ✗ brew info openmpi
> >>>>
> >>>> open-mpi: stable 2.0.1 (bottled), HEAD
> >>>> High performance message passing library
> >>>> https://www.open-mpi.org/
> >>>> Conflicts with: lcdf-typetools, mpich
> >>>> /usr/local/Cellar/open-mpi/2.0.1 (688 files, 8.3M) *
> >>>>    Poured from bottle on 2016-09-22 at 03:53:35
> >>>> From:
> >>>> https://github.com/Homebrew/homebrew-core/blob/master/Formula/open-mpi.rb
> >>>> ==> Dependencies
> >>>> Required: libevent ✔
> >>>> ==> Options
> >>>> --c++11
> >>>>          Build using C++11 mode
> >>>> --with-cxx-bindings
> >>>>          Enable C++ MPI bindings (deprecated as of MPI-3.0)
> >>>> --with-java
> >>>>          Build with java support
> >>>> --with-mpi-thread-multiple
> >>>>          Enable MPI_THREAD_MULTIPLE
> >>>> --without-fortran
> >>>>          Build without fortran support
> >>>> --HEAD
> >>>>          Install HEAD version
> >>>>
> >>>> ✗ type -p mpicc
> >>>> mpicc is /usr/local/bin/mpicc
> >>>>
> >>>> ✗ mpirun --version
> >>>> mpirun (Open MPI) 2.0.1
> >>>>
> >>>> Report bugs to http://www.open-mpi.org/community/help/
> >>>>
> >>>>
> >>>> ✗ mpirun ./ring_c
> >>>> Process 0 sending 10 to 1, tag 201 (4 processes in ring)
> >>>> Process 0 sent to 1
> >>>> Process 0 decremented value: 9
> >>>> Process 0 decremented value: 8
> >>>> Process 0 decremented value: 7
> >>>> Process 0 decremented value: 6
> >>>> Process 0 decremented value: 5
> >>>> Process 0 decremented value: 4
> >>>> Process 0 decremented value: 3
> >>>> Process 0 decremented value: 2
> >>>> Process 0 decremented value: 1
> >>>> Process 0 decremented value: 0
> >>>> Process 0 exiting
> >>>> Process 1 exiting
> >>>> Process 2 exiting
> >>>> Process 3 exiting
> >>>>
> >>>>
> >>>> -Nathan
> >>>>
> >>>>> On Sep 22, 2016, at 3:31 AM, Justin Chang <jychan...@gmail.com> wrote:
> >>>>>
> >>>>> I tried that and also deleted everything inside $TMPDIR. The error
> >>>>> still persists
> >>>>>
> >>>>> On Thu, Sep 22, 2016 at 4:21 AM, r...@open-mpi.org <r...@open-mpi.org>
> >>>>> wrote:
> >>>>>>
> >>>>>> Try removing the “pmix” entries as well
> >>>>>>
> >>>>>>> On Sep 22, 2016, at 2:19 AM, Justin Chang <jychan...@gmail.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>> "mpirun -n 1" was just to demonstrate that I get those error
> >>>>>>> messages.
> >>>>>>> I ran a simple helloworld.c and it still gives those two messages.
> >>>>>>>
> >>>>>>> I did delete openmpi-sessions-* from my $TMPDIR but it doesn't solve
> >>>>>>> the problem. Here's my $TMPDIR:
> >>>>>>>
> >>>>>>> ~ cd $TMPDIR
> >>>>>>> ~ pwd
> >>>>>>> /var/folders/jd/qh5zn6jn5kz_byz9gxz5kl2m0000gn/T
> >>>>>>> ~ ls
> >>>>>>> MediaCache
> >>>>>>> TemporaryItems
> >>>>>>> com.apple.AddressBook.ContactsAccountsService
> >>>>>>> com.apple.AddressBook.InternetAccountsBridge
> >>>>>>> com.apple.AirPlayUIAgent
> >>>>>>> com.apple.BKAgentService
> >>>>>>> com.apple.CalendarAgent
> >>>>>>> com.apple.CalendarAgent.CalNCService
> >>>>>>> com.apple.CloudPhotosConfiguration
> >>>>>>> com.apple.DataDetectorsDynamicData
> >>>>>>> com.apple.ICPPhotoStreamLibraryService
> >>>>>>> com.apple.InputMethodKit.TextReplacementService
> >>>>>>> com.apple.PhotoIngestService
> >>>>>>> com.apple.Preview
> >>>>>>> com.apple.Safari
> >>>>>>> com.apple.SocialPushAgent
> >>>>>>> com.apple.WeatherKitService
> >>>>>>> com.apple.cloudphotosd
> >>>>>>> com.apple.dt.XCDocumenter.XCDocumenterExtension
> >>>>>>> com.apple.dt.XcodeBuiltInExtensions
> >>>>>>> com.apple.geod
> >>>>>>> com.apple.iCal.CalendarNC
> >>>>>>> com.apple.lateragent
> >>>>>>> com.apple.ncplugin.stocks
> >>>>>>> com.apple.ncplugin.weather
> >>>>>>> com.apple.notificationcenterui.WeatherSummary
> >>>>>>> com.apple.photolibraryd
> >>>>>>> com.apple.photomoments
> >>>>>>> com.apple.quicklook.ui.helper
> >>>>>>> com.apple.soagent
> >>>>>>> com.getdropbox.dropbox.garcon
> >>>>>>> icdd501
> >>>>>>> ics21406
> >>>>>>> openmpi-sessions-501@Justins-MacBook-Pro-2_0
> >>>>>>> pmix-12195
> >>>>>>> pmix-12271
> >>>>>>> pmix-12289
> >>>>>>> pmix-12295
> >>>>>>> pmix-12304
> >>>>>>> pmix-12313
> >>>>>>> pmix-12367
> >>>>>>> pmix-12397
> >>>>>>> pmix-12775
> >>>>>>> pmix-12858
> >>>>>>> pmix-17118
> >>>>>>> pmix-1754
> >>>>>>> pmix-20632
> >>>>>>> pmix-20793
> >>>>>>> pmix-20849
> >>>>>>> pmix-21019
> >>>>>>> pmix-22316
> >>>>>>> pmix-8129
> >>>>>>> pmix-8494
> >>>>>>> xcrun_db
> >>>>>>> ~ rm -rf openmpi-sessions-501@Justins-MacBook-Pro-2_0
> >>>>>>> ~ mpirun -n 1
> >>>>>>> [Justins-MacBook-Pro-2.local:22527] [[12992,0],0] bind() failed on error Address already in use (48)
> >>>>>>> [Justins-MacBook-Pro-2.local:22527] [[12992,0],0] ORTE_ERROR_LOG: Error in file oob_usock_component.c at line 228
> >>>>>>>
> >>>>>>> --------------------------------------------------------------------------
> >>>>>>> No executable was specified on the mpirun command line.
> >>>>>>>
> >>>>>>> Aborting.
> >>>>>>>
> >>>>>>> --------------------------------------------------------------------------
> >>>>>>>
> >>>>>>> and when I type "ls" the directory
> >>>>>>> "openmpi-sessions-501@Justins-MacBook-Pro-2_0" reappeared. Unless
> >>>>>>> there's a different directory I need to look for?
> >>>>>>>
> >>>>>>> On Thu, Sep 22, 2016 at 4:08 AM, r...@open-mpi.org <r...@open-mpi.org> wrote:
> >>>>>>>>
> >>>>>>>> Maybe I’m missing something, but “mpirun -n 1” doesn’t include the
> >>>>>>>> name of an application to execute.
> >>>>>>>>
> >>>>>>>> The error message prior to that error indicates that you have some
> >>>>>>>> cruft sitting in your tmpdir. You just need to clean it out - look for
> >>>>>>>> something that starts with “openmpi”
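As a rough sketch of that cleanup step, the following Python snippet lists leftover entries in the temporary directory without deleting anything. The two prefixes come from the directory listing and the later "pmix" suggestion in this thread; the helper name `stale_entries` is made up for this example.

```python
import os
import tempfile

# Name prefixes seen in the $TMPDIR listing in this thread.
STALE_PREFIXES = ("openmpi-sessions-", "pmix-")


def stale_entries(directory):
    """Return the sorted names in directory that start with a known prefix."""
    return sorted(
        name for name in os.listdir(directory)
        if name.startswith(STALE_PREFIXES)
    )


if __name__ == "__main__":
    # tempfile.gettempdir() honours $TMPDIR when it is set.
    tmpdir = tempfile.gettempdir()
    for name in stale_entries(tmpdir):
        print(os.path.join(tmpdir, name))
```

Listing first and removing by hand afterwards avoids deleting the session directory of a job that is still running. As the rest of the thread shows, cleaning up did not resolve this particular error, because the cause was the path length rather than stale files.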
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> On Sep 22, 2016, at 1:45 AM, Justin Chang <jychan...@gmail.com>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>> Dear all,
> >>>>>>>>>
> >>>>>>>>> So I upgraded/updated my Homebrew on my MacBook and installed Open MPI
> >>>>>>>>> 2.0.1 using "brew install openmpi". However, when I open up a terminal
> >>>>>>>>> and type "mpirun -n 1" I get the following messages:
> >>>>>>>>>
> >>>>>>>>> ~ mpirun -n 1
> >>>>>>>>> [Justins-MacBook-Pro-2.local:20793] [[13318,0],0] bind() failed on error Address already in use (48)
> >>>>>>>>> [Justins-MacBook-Pro-2.local:20793] [[13318,0],0] ORTE_ERROR_LOG: Error in file oob_usock_component.c at line 228
> >>>>>>>>>
> >>>>>>>>> --------------------------------------------------------------------------
> >>>>>>>>> No executable was specified on the mpirun command line.
> >>>>>>>>>
> >>>>>>>>> Aborting.
> >>>>>>>>>
> >>>>>>>>> --------------------------------------------------------------------------
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> I have never seen anything like the first two lines. I also installed
> >>>>>>>>> python and mpi4py via pip, and I still get the same messages:
> >>>>>>>>>
> >>>>>>>>> ~ python -c "from mpi4py import MPI"
> >>>>>>>>> [Justins-MacBook-Pro-2.local:20871] [[13496,0],0] bind() failed on error Address already in use (48)
> >>>>>>>>> [Justins-MacBook-Pro-2.local:20871] [[13496,0],0] ORTE_ERROR_LOG: Error in file oob_usock_component.c at line 228
> >>>>>>>>>
> >>>>>>>>> But now if I add "mpirun -n 1" I get the following:
> >>>>>>>>>
> >>>>>>>>> ~ mpirun -n 1 python -c "from mpi4py import MPI"
> >>>>>>>>> [Justins-MacBook-Pro-2.local:20935] [[13560,0],0] bind() failed on error Address already in use (48)
> >>>>>>>>> [Justins-MacBook-Pro-2.local:20935] [[13560,0],0] ORTE_ERROR_LOG: Error in file oob_usock_component.c at line 228
> >>>>>>>>> [Justins-MacBook-Pro-2.local:20936] [[13560,1],0] usock_peer_send_blocking: send() to socket 17 failed: Socket is not connected (57)
> >>>>>>>>> [Justins-MacBook-Pro-2.local:20936] [[13560,1],0] ORTE_ERROR_LOG: Unreachable in file oob_usock_connection.c at line 315
> >>>>>>>>> [Justins-MacBook-Pro-2.local:20936] [[13560,1],0] orte_usock_peer_try_connect: usock_peer_send_connect_ack to proc [[13560,0],0] failed: Unreachable (-12)
> >>>>>>>>> [Justins-MacBook-Pro-2:20936] *** Process received signal ***
> >>>>>>>>> [Justins-MacBook-Pro-2:20936] Signal: Segmentation fault: 11 (11)
> >>>>>>>>> [Justins-MacBook-Pro-2:20936] Signal code:  (0)
> >>>>>>>>> [Justins-MacBook-Pro-2:20936] Failing at address: 0x0
> >>>>>>>>> -------------------------------------------------------
> >>>>>>>>> Primary job  terminated normally, but 1 process returned
> >>>>>>>>> a non-zero exit code.. Per user-direction, the job has been aborted.
> >>>>>>>>> -------------------------------------------------------
> >>>>>>>>>
> >>>>>>>>> --------------------------------------------------------------------------
> >>>>>>>>> mpirun detected that one or more processes exited with non-zero status, thus causing
> >>>>>>>>> the job to be terminated. The first process to do so was:
> >>>>>>>>>
> >>>>>>>>> Process name: [[13560,1],0]
> >>>>>>>>> Exit code:    1
> >>>>>>>>>
> >>>>>>>>> --------------------------------------------------------------------------
> >>>>>>>>>
> >>>>>>>>> Clearly something is wrong here. I already tried things like
> >>>>>>>>> "rm -rf $TMPDIR/openmpi-sessions-*" but said directory keeps reappearing
> >>>>>>>>> and the error persists. Why does this happen and how do I fix it? For what
> >>>>>>>>> it's worth, here's some other information that may help:
> >>>>>>>>>
> >>>>>>>>> ~ mpicc --version
> >>>>>>>>> Apple LLVM version 8.0.0 (clang-800.0.38)
> >>>>>>>>> Target: x86_64-apple-darwin15.6.0
> >>>>>>>>> Thread model: posix
> >>>>>>>>> InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
> >>>>>>>>>
> >>>>>>>>> I tested Hello World with both mpicc and mpif90, and they still work
> >>>>>>>>> despite showing those two error/warning messages.
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> Justin
> >>>>>>>>> _______________________________________________
> >>>>>>>>> users mailing list
> >>>>>>>>> users@lists.open-mpi.org
> >>>>>>>>> https://rfd.newmexicoconsortium.org/mailman/listinfo/users