On Feb 12, 2013, at 7:43 PM, Karl Rupp wrote:

> Hi Jed,
>
>> Which crossover are you referring to? The CPU versus GTX285 at about 20k
>> dofs, but with only very small gains for another order of magnitude?
>>
>> I assume it's the cross-over of Xeon Phi vs. Xeon,
>>
>> MIC is slower than Xeon by more than an order of magnitude at 10k dofs.
>
> Tim was referring to the cross-over at >10k...
>
>> but almost all cross-overs happen in the 10k-100k region and are due
>> to PCI-Express latency.
>>
>> Why is PCI-Express latency important here? Can't the MIC code run
>> entirely on the device?
>
> Almost-all (OpenCL, CUDA). Native mode ought to be the exception, but it's
> the OpenMP overhead which limits then. Single-core on the MIC is not really
> an option either...
>
> It would be interesting to play with a pthreads-threadpool implementation on
> the MIC to see how much performance can really be obtained for smallish
> problems.

You can try running the example threadcomm/examples/tutorials/ex5.c. It gives a measure of the overhead in launching kernels with OpenMP and pthread.

>
> Best regards,
> Karli
>
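
For anyone who just wants to see the measurement pattern without building the example, here is a rough sketch in Python. It is not the code in ex5.c, and Python threads say nothing about absolute OpenMP or pthreads numbers on the MIC. It only illustrates the idea: launch an empty kernel many times and average the time, once with threads created per launch and once with a persistent pool. The names empty_kernel, time_spawn_per_launch and time_pool_per_launch are made up for this illustration.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def empty_kernel():
    """Stand-in for a kernel that does no work, so only launch cost is timed."""
    pass


def time_spawn_per_launch(nthreads, nlaunches):
    """Average seconds per launch when fresh threads are created every time."""
    start = time.perf_counter()
    for _ in range(nlaunches):
        threads = [threading.Thread(target=empty_kernel) for _ in range(nthreads)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    return (time.perf_counter() - start) / nlaunches


def time_pool_per_launch(nthreads, nlaunches):
    """Average seconds per launch when work is handed to a persistent pool."""
    with ThreadPoolExecutor(max_workers=nthreads) as pool:
        # Untimed warm-up round; the executor creates its workers lazily.
        for f in [pool.submit(empty_kernel) for _ in range(nthreads)]:
            f.result()
        start = time.perf_counter()
        for _ in range(nlaunches):
            futures = [pool.submit(empty_kernel) for _ in range(nthreads)]
            for f in futures:
                f.result()  # wait for every task, like a barrier after the kernel
        elapsed = time.perf_counter() - start
    return elapsed / nlaunches


if __name__ == "__main__":
    for name, fn in (("spawn", time_spawn_per_launch), ("pool", time_pool_per_launch)):
        print(f"{name:5s}: {fn(4, 200) * 1e6:8.1f} us per launch")
```

The per-launch time with an empty kernel is a lower bound on what any real kernel pays, which is why it matters for the smallish problem sizes discussed above.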
