Sparse solves. MKL has an option for using multiple CPU cores in its
sparse triangular solve via mkl_set_num_threads(), e.g.:
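As a minimal, self-contained sketch (not from the original thread): it
uses MKL's NIST-style CSR routine mkl_dcsrtrsv, the usual entry point
for sparse triangular solves in MKL of that era; the matrix, thread
count, and build line are illustrative assumptions.

  /*
   * Minimal sketch: multi-threaded MKL sparse triangular solve.
   * Assumes the (since-deprecated) NIST-style routine mkl_dcsrtrsv
   * with one-based CSR indices. Build line varies by MKL version,
   * e.g.: icc -mkl trsv.c
   */
  #include <stdio.h>
  #include <mkl.h>

  int main(void)
  {
      /* Lower-triangular 3x3 example matrix in one-based CSR:
       *     L = [ 2 0 0
       *           1 3 0
       *           0 4 5 ]                                 */
      MKL_INT m    = 3;
      double  a[]  = { 2.0, 1.0, 3.0, 4.0, 5.0 };
      MKL_INT ja[] = { 1, 1, 2, 2, 3 };
      MKL_INT ia[] = { 1, 2, 4, 6 };

      double x[] = { 2.0, 7.0, 18.0 };   /* right-hand side     */
      double y[3];                       /* solution of L*y = x */

      /* Ask MKL for multiple cores; whether the triangular solve
       * then actually scales is exactly the question here.     */
      mkl_set_num_threads(4);

      /* uplo = "L" (lower), transa = "N", diag = "N" (non-unit) */
      mkl_dcsrtrsv("L", "N", "N", &m, a, ia, ja, x, y);

      printf("y = %g %g %g\n", y[0], y[1], y[2]);  /* expect 1 2 2 */
      return 0;
  }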
Under the hood, the MKL implementation uses the level-scheduler
algorithm to extract some amount of parallelism (a sketch of the idea
follows below the quoted thread). We've tested this on many matrices
and never seen scalability on a Sandy Bridge; I don't know the reason
for this. For some matrices, the level-scheduler algorithm exposes a
modest amount of parallelism, and there I would expect some benefit
from going to multiple cores.

-Paul

> On 11/13/12 2:54 AM, Paul Mullowney wrote:
>> Every test we've done shows that the MKL triangular solve doesn't
>> scale at all on a Sandy Bridge multi-core. I doubt it will be any
>> different on the Xeon Phi.
>>
>> -Paul
>
> Do you mean sparse or dense solves? Sparse triangular solves are
> sequential in MKL. PARDISO also does them sequentially.
>
> Anton
>
>>>>> In terms of raw numbers, $2,649 for 320 GB/sec and 8 GB of memory
>>>>> is quite a lot compared to the $500 of a Radeon HD 7970 GHz
>>>>> Edition at 288 GB/sec and 3 GB of memory. My hope is that Xeon Phi
>>>>> can do better than GPUs in kernels requiring frequent global
>>>>> synchronizations, e.g. ILU substitutions.
>>>>
>>>> But, but, but it runs the Intel instruction set; that is
>>>> clearly worth 5+ times the price :-)
>>>
>>> I'm tempted to say 'yes', but on second thought I'm not so sure
>>> whether any of us is actually programming in x86 assembly (again).
>>> Part of the GPU/accelerator hype is arguably due to a rediscovery of
>>> programming close to the hardware, even though it was/is non-x86.
>>> With Xeon Phi we might now observe some sort of compiler war instead
>>> of low-level kernel tuning - is this what we want?
>>>
>>> Best regards,
>>> Karli
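For reference, a hedged sketch of the level-scheduling idea mentioned
above, assuming a lower-triangular matrix in zero-based CSR; the
helper compute_levels and the example matrix are illustrative, not
MKL code:

  /*
   * Level scheduling for a lower-triangular CSR matrix: row i
   * depends on every row j < i that it references. Rows sharing a
   * level are mutually independent and can be solved in parallel,
   * so the number of levels bounds the available parallelism.
   */
  #include <stdio.h>

  /* Fill level[i] for each row; return the number of levels. */
  static int compute_levels(int m, const int *ia, const int *ja,
                            int *level)
  {
      int nlevels = 0;
      for (int i = 0; i < m; ++i) {
          int lev = 0;
          for (int k = ia[i]; k < ia[i + 1]; ++k) {
              int j = ja[k];              /* column of L(i,j)  */
              if (j < i && level[j] + 1 > lev)
                  lev = level[j] + 1;     /* must follow row j */
          }
          level[i] = lev;
          if (lev + 1 > nlevels)
              nlevels = lev + 1;
      }
      return nlevels;
  }

  int main(void)
  {
      /* Same 3x3 pattern as above, zero-based this time. */
      int ia[] = { 0, 1, 3, 5 };
      int ja[] = { 0, 0, 1, 1, 2 };
      int level[3];
      int n = compute_levels(3, ia, ja, level);
      printf("%d levels: %d %d %d\n", n, level[0], level[1], level[2]);
      /* Prints "3 levels: 0 1 2": a pure dependency chain, i.e.
       * one row per level and no parallelism at all.           */
      return 0;
  }

Matrices with few, wide levels (many independent rows per level) are
the ones where the modest parallelism described above could pay off; a
long dependency chain like this toy example yields one row per level
and hence no speedup, whatever the core count.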
