It would really be a mystery if you got any speedup on any kind of processor for MKL sparse triangular solves, because according to this: http://software.intel.com/en-us/articles/parallelism-in-the-intel-math-kernel-library they are not threaded at all. Quote: "For sparse matrices, all level 2 operations except for the sparse triangular solvers are threaded."
Anton

On 11/13/12 7:11 PM, Paul Mullowney wrote:
> Sparse solves. MKL has an option for using multiple CPU cores for
> their sparse triangular solve with:
>
> mkl_set_num_threads()
>
> Under the hood, the MKL implementation uses the level-scheduler
> algorithm for extracting some amount of parallelism. We've tested this
> on many matrices and never seen scalability on a Sandy Bridge. I don't
> know the reason for this. For some matrices, the level-scheduler
> algorithm has a modest amount of parallelism and I would expect some
> benefit going to multiple cores.
>
> -Paul
>
>> On 11/13/12 2:54 AM, Paul Mullowney wrote:
>>> Every test we've done shows that the MKL triangular solve doesn't
>>> scale at all on a Sandy Bridge multi-core. I doubt it will be any
>>> different on the Xeon Phi.
>>>
>>> -Paul
>> Do you mean sparse or dense solves? Sparse triangular solves are
>> sequential in MKL. PARDISO also does it sequentially.
>>
>> Anton
>>
>>>>>> In terms of raw numbers, $2,649 for 320 GB/sec and 8 GB of memory
>>>>>> is quite a lot compared to the $500 of a Radeon HD 7970 GHz
>>>>>> Edition at 288 GB/sec and 3 GB memory. My hope is that Xeon Phi
>>>>>> can do better than GPUs in kernels requiring frequent global
>>>>>> synchronizations, e.g. ILU substitutions.
>>>>>
>>>>> But, but, but it runs the Intel instruction set, that is
>>>>> clearly worth 5+ times the price :-)
>>>>
>>>> I'm tempted to say 'yes', but on second thought I'm not so sure
>>>> whether any of us is actually programming in x86 assembly (again).
>>>> Part of the GPU/accelerator hype is arguably due to a rediscovery
>>>> of programming close to hardware, even though it was/is non-x86.
>>>> With Xeon Phi we might now observe some sort of compiler war
>>>> instead of low-level kernel tuning; is this what we want?
>>>>
>>>> Best regards,
>>>> Karli
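For anyone unfamiliar with the level-scheduler idea Paul mentions, here is a minimal sketch in Python. It is not MKL's implementation (its internals aren't described in this thread), and the function names are hypothetical. It assumes a lower-triangular matrix in CSR form (indptr, indices, data) with the diagonal stored in every row. Each row gets a level one higher than the deepest row it depends on; rows sharing a level are independent, so the number of levels relative to n bounds how much a threaded solve can gain, and every level boundary is a synchronization point.

```python
# Minimal sketch of level scheduling for a sparse lower-triangular solve L x = b.
# Assumption: L is lower triangular, stored in CSR form (indptr, indices, data),
# with the diagonal entry present in every row. Illustration only, not MKL's code.

def compute_levels(n, indptr, indices):
    """Return level[i] for each row; rows in the same level only depend on
    rows in earlier levels, so they could be solved concurrently."""
    level = [0] * n
    for i in range(n):
        lev = 0
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j < i:
                lev = max(lev, level[j] + 1)
        level[i] = lev
    return level


def level_scheduled_solve(n, indptr, indices, data, b):
    """Forward substitution processed level by level; returns (x, num_levels)."""
    level = compute_levels(n, indptr, indices)
    num_levels = max(level) + 1 if n > 0 else 0
    # Group rows by level; each group is an independent batch of work.
    buckets = [[] for _ in range(num_levels)]
    for i, lev in enumerate(level):
        buckets[lev].append(i)
    x = [0.0] * n
    for rows in buckets:
        # Rows in this batch don't depend on each other: a threaded solver would
        # split them across cores, with a barrier before the next level.
        for i in rows:
            s = b[i]
            diag = None
            for k in range(indptr[i], indptr[i + 1]):
                j = indices[k]
                if j == i:
                    diag = data[k]
                else:
                    s -= data[k] * x[j]
            x[i] = s / diag
    return x, num_levels


# Usage: the 4x4 lower-triangular matrix
#   [2 0 0 0]
#   [1 2 0 0]
#   [0 0 2 0]
#   [0 0 1 2]
indptr = [0, 1, 3, 4, 6]
indices = [0, 0, 1, 2, 2, 3]
data = [2.0, 1.0, 2.0, 2.0, 1.0, 2.0]
b = [2.0, 3.0, 2.0, 3.0]
x, num_levels = level_scheduled_solve(4, indptr, indices, data, b)
print(x, num_levels)  # prints [1.0, 1.0, 1.0, 1.0] 2
```

Here the four rows fall into two levels of two rows each. For a bidiagonal matrix every row depends on the previous one, so there are n levels and nothing to run in parallel; the fewer and wider the levels, the more room a threaded solve has.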
