Understood - but I was wondering if that was true for OMPI as well.

On Jul 9, 2013, at 11:30 AM, "Daniels, Marcus G" <mdani...@lanl.gov> wrote:
> The Intel MPI implementation does this. The performance between the
> accelerators and the host is poor though: about 20 MB/sec in my ping/pong
> test. Intra-MIC communication is about 1 GB/sec, whereas intra-host is
> about 6 GB/sec. Latency is higher (i.e. worse) for the intra-MIC
> communication too (vs. intra-host), by about the same factor.
>
> Thanks Tim for the hints on building OpenMPI.
> ________________________________________
> From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] on behalf of
> Ralph Castain [r...@open-mpi.org]
> Sent: Tuesday, July 09, 2013 12:14 PM
> To: Tim Carlson; Open MPI Users
> Subject: Re: [OMPI users] Support for CUDA and GPU-direct with OpenMPI 1.6.5
> and 1.7.2
>
> Hi Tim
>
> Quick question: can the procs on the MIC communicate with procs on (a) the
> local host, (b) other hosts, and (c) MICs on other hosts?
>
> The last two would depend on having direct access to one or more network
> transports.
>
>
> On Jul 9, 2013, at 10:18 AM, Tim Carlson <tim.carl...@pnl.gov> wrote:
>
>> On Mon, 8 Jul 2013, Tim Carlson wrote:
>>
>> Now that I have gone through this process, I'll report that it works, with
>> the caveat that you can't use the Open MPI wrappers for compiling. Recall
>> that the Phi card does not have either the GNU or Intel compilers installed.
>> While you could build up a tool chain for the GNU compilers, you're not
>> going to get a native Intel compiler unless Intel decides to support it.
>>
>> Here is the process, end to end, to get Open MPI to build a native Phi
>> application.
>>
>> export PATH=/usr/linux-k1om-4.7/bin:$PATH
>> . /share/apps/intel/composer_xe_2013.3.163/bin/iccvars.sh intel64
>> export CC="icc -mmic"
>> export CXX="icpc -mmic"
>>
>> cd ~
>> tar zxf openmpi-1.6.4.tar.gz
>> cd openmpi-1.6.4
>> ./configure --prefix=/people/tim/mic/openmpi/intel \
>> --build=x86_64-unknown-linux-gnu --host=x86_64-k1om-linux \
>> --disable-mpi-f77 \
>> AR=x86_64-k1om-linux-ar RANLIB=x86_64-k1om-linux-ranlib \
>> LD=x86_64-k1om-linux-ld
>> make
>> make install
>>
>> That leaves me with a native build of Open MPI in
>> /people/tim/mic/openmpi/intel
>>
>> It is of course tempting to just do a
>> export PATH=/people/tim/mic/openmpi/intel/bin:$PATH
>> and start using mpicc to build my code, but that does not work because:
>>
>> 1) If I try this on the host system I am going to get "wrong architecture"
>> because mpicc was built for the Phi and not for the x86_64 host.
>>
>> 2) If I try running it on the Phi, I don't have access to "icc" because I
>> can't run the compiler directly on the Phi.
>>
>> I can "cheat" and see what the mpicc command really does by using "mpicc
>> --show" for another installation of Open MPI and munge the paths correctly.
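For reference, the wrapper's "show" options (Open MPI also accepts the "-showme"
spelling) just print the underlying compiler command instead of running it, so any
working host-side install can be used to recover the flags. A minimal illustration;
the prefix below is a placeholder, not a path from this thread, and the output shown
is only an example of the general shape:

    mpicc -showme:compile   # e.g.  -I<prefix>/include -pthread
    mpicc -showme:link      # e.g.  -L<prefix>/lib -lmpi -ldl -lm -Wl,--export-dynamic -lrt -lnsl -lutil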
>> In this case:
>>
>> icc -mmic cpi.c -I/people/tim/mic/openmpi/intel/include -pthread \
>> -L/people/tim/mic/openmpi/intel/lib -lmpi -ldl -lm -Wl,--export-dynamic \
>> -lrt -lnsl -lutil -lm -ldl -o cpi.x
>>
>> That leaves me with a Phi-native version of cpi.x, which I can then execute
>> on the Phi:
>>
>> $ ssh phi002-mic0
>>
>> (I have NFS mounts on the Phi for all the bits I need.)
>>
>> ~ $ export PATH=/people/tim/mic/openmpi/intel/bin/:$PATH
>> ~ $ export LD_LIBRARY_PATH=/share/apps/intel/composer_xe_2013.3.163/compiler/lib/mic/
>> ~ $ export LD_LIBRARY_PATH=/people/tim/mic/openmpi/intel/lib:$LD_LIBRARY_PATH
>> ~ $ cd mic
>> ~/mic $ mpirun -np 12 cpi.x
>> Process 7 on phi002-mic0.local
>> Process 10 on phi002-mic0.local
>> Process 2 on phi002-mic0.local
>> Process 9 on phi002-mic0.local
>> Process 1 on phi002-mic0.local
>> Process 3 on phi002-mic0.local
>> Process 11 on phi002-mic0.local
>> Process 5 on phi002-mic0.local
>> Process 8 on phi002-mic0.local
>> Process 4 on phi002-mic0.local
>> Process 6 on phi002-mic0.local
>> Process 0 on phi002-mic0.local
>> pi is approximately 3.1416009869231245, Error is 0.0000083333333314
>> wall clock time = 0.001766
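Since the MIC-side mpicc can't be run on the host and icc can't be run on the card, one
way to avoid munging the paths by hand each time is a small host-side wrapper around the
cross-compile line above. This is only a sketch built from the flags shown in this thread;
the "mpicc-mic" name is made up, and the prefix and library list are specific to this
particular build:

    #!/bin/sh
    # mpicc-mic: host-side stand-in for the Phi build's mpicc (sketch, not part of Open MPI).
    # Combines the Intel cross compiler (-mmic) with the flags reported by "mpicc --show".
    PREFIX=/people/tim/mic/openmpi/intel
    exec icc -mmic "$@" \
        -I$PREFIX/include -pthread \
        -L$PREFIX/lib -lmpi -ldl -lm -Wl,--export-dynamic \
        -lrt -lnsl -lutil -lm -ldl

Used as, e.g., "./mpicc-mic cpi.c -o cpi.x" on the host, with the resulting binary run on
the card exactly as in the session above.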
>>> On Mon, 8 Jul 2013, Elken, Tom wrote:
>>>
>>> My mistake on the OFED bits. The host I was installing on did not have all
>>> of the MPSS software installed (my cluster admin node and not one of the
>>> compute nodes). Adding the intel-mic-ofed-card RPM fixed the problem with
>>> compiling the btl:openib bits with both the GNU and Intel compilers using
>>> the cross-compiler route (-mmic on the Intel side).
>>>
>>> Still working on getting the resulting mpicc wrapper working on the MIC
>>> side. When I get a working example I'll post the results.
>>>
>>> Thanks!
>>>
>>> Tim
>>>
>>>> Hi Tim,
>>>>
>>>> Well, in general, and not on MIC, I usually build the MPI stacks using the
>>>> Intel compiler set. Have you run into s/w that requires GCC instead of the
>>>> Intel compilers (besides Nvidia CUDA)? Did you try to use the Intel
>>>> compiler to produce MIC-native code (the OpenMPI stack, for that matter)?
>>>>
>>>> [Tom]
>>>> Good idea, Michael. With the Intel compiler, I would use the -mmic flag to
>>>> build MIC code.
>>>>
>>>> Tim wrote: "My first pass at doing a cross-compile with the GNU compilers
>>>> failed to produce something with OFED support (not surprising)
>>>>
>>>> export PATH=/usr/linux-k1om-4.7/bin:$PATH
>>>> ./configure --build=x86_64-unknown-linux-gnu --host=x86_64-k1om-linux \
>>>> --disable-mpi-f77
>>>>
>>>> checking if MCA component btl:openib can compile... no"
>>>>
>>>> Regarding a GNU cross compiler, this worked for one of our engineers
>>>> building for True Scale HCAs and PSM/infinipath, but it may give useful
>>>> tips for building for btl:openib as well:
>>>>
>>>> 3. How to configure/compile OpenMPI:
>>>> a). Untar the openmpi tarball.
>>>> b). Edit configure in the top directory, add '-linfinipath' after
>>>>     '-lpsm_infinipath' (not necessary for messages, only for command lines).
>>>> c). Run the following script:
>>>>
>>>> #!/bin/sh
>>>> ./configure \
>>>>   --host=x86_64-k1om-linux \
>>>>   --enable-mpi-f77=no --enable-mpi-f90=no \
>>>>   --with-psm=/…/psm-7.6 \
>>>>   --prefix=/…/openmpi \
>>>>   CC=x86_64-k1om-linux-gcc CXX=x86_64-k1om-linux-g++ \
>>>>   AR=x86_64-k1om-linux-ar RANLIB=x86_64-k1om-linux-ranlib
>>>>
>>>> d). Run 'make' and 'make install'.
>>>>
>>>> OK, I see that they did not configure for mpi-f77 & mpi-f90, but perhaps
>>>> this is still helpful, if the AR and RANLIB flags are important.
>>>>
>>>> -Tom
>>>>
>>>> regards
>>>> Michael
>>>>
>>>> On Mon, Jul 8, 2013 at 4:30 PM, Tim Carlson <tim.carl...@pnl.gov> wrote:
>>>>
>>>> On Mon, 8 Jul 2013, Elken, Tom wrote:
>>>>
>>>> It isn't quite so easy.
>>>>
>>>> Out of the box, there is no gcc on the Phi card. You can use the cross
>>>> compiler on the host, but you don't get gcc on the Phi by default.
>>>> See this post: http://software.intel.com/en-us/forums/topic/382057
>>>> I really think you would need to build and install gcc on the Phi first.
>>>>
>>>> My first pass at doing a cross-compile with the GNU compilers failed to
>>>> produce something with OFED support (not surprising)
>>>>
>>>> export PATH=/usr/linux-k1om-4.7/bin:$PATH
>>>> ./configure --build=x86_64-unknown-linux-gnu --host=x86_64-k1om-linux \
>>>> --disable-mpi-f77
>>>>
>>>> checking if MCA component btl:openib can compile... no
>>>>
>>>> Tim
>>>>
>>>> Thanks Tom, that sounds good. I will give it a try as soon as our Phi host
>>>> here gets installed.
>>>>
>>>> I assume that all the prerequisite libs and bins on the Phi side are
>>>> available when we download the Phi s/w stack from Intel's site, right?
>>>>
>>>> [Tom]
>>>> Right. When you install Intel's MPSS (Manycore Platform Software Stack),
>>>> including following the section on "OFED Support" in the readme file, you
>>>> should have all the prerequisite libs and bins. Note that I have not built
>>>> Open MPI for Xeon Phi for your interconnect, but it seems to me that it
>>>> should work.
>>>>
>>>> -Tom
>>>>
>>>> Cheers
>>>> Michael
>>>>
>>>> On Mon, Jul 8, 2013 at 12:10 PM, Elken, Tom <tom.el...@intel.com> wrote:
>>>>
>>>> Do you guys have any plan to support Intel Phi in the future? That is,
>>>> running MPI code on the Phi cards or across the multicore and Phi, as
>>>> Intel MPI does?
>>>>
>>>> [Tom]
>>>> Hi Michael,
>>>>
>>>> Because a Xeon Phi card acts a lot like a Linux host with an x86
>>>> architecture, you can build your own Open MPI libraries to serve this
>>>> purpose.
>>>>
>>>> Our team has used existing (an older 1.4.3 version of) Open MPI source to
>>>> build an Open MPI for running MPI code on Intel Xeon Phi cards over
>>>> Intel's (formerly QLogic's) True Scale InfiniBand fabric, and it works
>>>> quite well. We have not released a pre-built Open MPI as part of any Intel
>>>> software release. But I think if you have a compiler for Xeon Phi (Intel
>>>> Compiler or GCC) and an interconnect for it, you should be able to build
>>>> an Open MPI that works on Xeon Phi.
>>>>
>>>> Cheers,
>>>> Tom Elken
>>>>
>>>> thanks...
>>>> Michael
>>>>
>>>> On Sat, Jul 6, 2013 at 2:36 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>
>>>> Rolf will have to answer the question on level of support. The CUDA code
>>>> is not in the 1.6 series, as it was developed after that series went
>>>> "stable". It is in the 1.7 series, although the level of support will
>>>> likely be incrementally increasing as that "feature" series continues to
>>>> evolve.
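For the 1.7 series, the CUDA-aware path is enabled at configure time with the
--with-cuda option. A minimal sketch, assuming CUDA is installed under /usr/local/cuda
and an install prefix of /opt/openmpi-1.7.2 (both placeholder paths, not taken from this
thread):

    ./configure --prefix=/opt/openmpi-1.7.2 --with-cuda=/usr/local/cuda
    make && make install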
>>>> On Jul 6, 2013, at 12:06 PM, Michael Thomadakis <drmichaelt7...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hello OpenMPI,
>>>>>
>>>>> I am wondering what level of support there is for CUDA and GPUDirect on
>>>>> OpenMPI 1.6.5 and 1.7.2.
>>>>>
>>>>> I saw the ./configure --with-cuda=CUDA_DIR option in the FAQ. However, it
>>>>> seems that with configure v1.6.5 it was ignored.
>>>>>
>>>>> Can you identify GPU memory and send messages from it directly, without
>>>>> copying to host memory first?
>>>>>
>>>>> Or in general, what level of CUDA support is there on 1.6.5 and 1.7.2? Do
>>>>> you support SDK 5.0 and above?
>>>>>
>>>>> Cheers ...
>>>>> Michael
>>
>> --
>> -------------------------------------------
>> Tim Carlson, PhD
>> Senior Research Scientist
>> Environmental Molecular Sciences Laboratory
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users