But for FDTD-EM, for example, we would be doing many tridiagonal solves at once. So, modulo memory bandwidth (i.e., for small enough problems), AVX should do fine?
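
To make the "many tridiagonal solves at once" case concrete, here is a minimal sketch in C, not taken from any FDTD code. The function name batched_thomas and the interleaved storage layout (element i of system s at index i*nsys + s) are assumptions made for illustration. With that layout the inner loop over systems is unit-stride and has no loop-carried dependence, so a compiler can in principle vectorize it with AVX. Whether it pays off still depends on memory bandwidth.

```c
#include <stddef.h>

/* Thomas algorithm applied to nsys independent tridiagonal systems of
 * size n (n >= 1). Hypothetical layout: entry i of system s lives at
 * index i*nsys + s in every array.
 *   a : sub-diagonal   (row 0 entries are not used)
 *   b : main diagonal
 *   c : super-diagonal (last-row entries are read but do not affect x)
 *   d : right-hand side on entry, solution on exit
 *   cp: scratch array of n*nsys doubles
 * No pivoting is done, so this assumes e.g. diagonally dominant systems. */
void batched_thomas(size_t n, size_t nsys,
                    const double *restrict a, const double *restrict b,
                    const double *restrict c,
                    double *restrict d, double *restrict cp)
{
    /* Forward elimination, first row of every system. */
    for (size_t s = 0; s < nsys; ++s) {
        cp[s] = c[s] / b[s];
        d[s]  = d[s] / b[s];
    }
    /* Forward elimination, remaining rows. The recurrence runs over i;
     * the loop over s is the one that can be vectorized. */
    for (size_t i = 1; i < n; ++i) {
        const size_t r = i * nsys, p = (i - 1) * nsys;
        for (size_t s = 0; s < nsys; ++s) {
            const double m = 1.0 / (b[r + s] - a[r + s] * cp[p + s]);
            cp[r + s] = c[r + s] * m;
            d[r + s]  = (d[r + s] - a[r + s] * d[p + s]) * m;
        }
    }
    /* Back substitution, from row n-2 down to row 0. */
    for (size_t i = n - 1; i-- > 0; ) {
        const size_t r = i * nsys, q = (i + 1) * nsys;
        for (size_t s = 0; s < nsys; ++s)
            d[r + s] -= cp[r + s] * d[q + s];
    }
}
```

The dependence chain of each individual solve is still sequential in i; the parallelism comes only from having many systems side by side.
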
Is this the case you are talking about, Paul?

Thx....John

On 11/13/12 11:11 AM, Paul Mullowney wrote:
> Sparse solves. MKL has an option for using multiple CPU cores for their
> sparse triangular solve with:
>
> mkl_set_num_threads()
>
> Under the hood, the MKL implementation uses the level-scheduler algorithm for
> extracting some amount of parallelism. We've tested this on many matrices and
> never seen scalability on a sandy bridge. I don't know the reason for this.
> For some matrices, the level-scheduler algorithm has a modest amount of
> parallelism and I would expect some benefit going to multiple cores.
>
> -Paul
>
>
>> On 11/13/12 2:54 AM, Paul Mullowney wrote:
>>> Every test we've done shows that the MKL triangular solve doesn't scale at
>>> all on a sandy bridge multi-core. I doubt it will be any different on the
>>> Xeon Phi.
>>>
>>> -Paul
>> Do you mean sparse or dense solves? Sparse triangular solves are sequential
>> in MKL. PARDISO also does it sequentially.
>>
>> Anton
>>
>>>>>
>>>>>>
>>>>>> In terms of raw numbers, $2,649 for 320 GB/sec and 8 GB of memory is
>>>>>> quite a lot compared to the $500 of a Radeon HD 7970 GHz Edition at 288
>>>>>> GB/sec and 3 GB memory. My hope is that Xeon Phi can do better than GPUs
>>>>>> in kernels requiring frequent global synchronizations, e.g.
>>>>>> ILU-substitutions.
>>>>>
>>>>> But, but, but it runs the Intel instruction set, that is clearly
>>>>> worth 5+ times the price :-)
>>>>
>>>> I'm tempted to say 'yes', but at a second thought I'm not so sure whether
>>>> any of us is actually programming in x86 assembly (again)?
>>>> Part of the GPU/accelerator hype is arguably due to a rediscovery of
>>>> programming close to hardware, even though it was/is non-x86. With Xeon
>>>> Phi we might now observe some sort of compiler war instead of low-level
>>>> kernel tuning - is this what we want?
>>>>
>>>> Best regards,
>>>> Karli
>>>>
>>>
>>>
>
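
For reference, the level-scheduling idea mentioned in Paul's message can be sketched generically as follows. This is not MKL's implementation, which is not shown anywhere in this thread; the function name compute_levels and the CSR argument names are assumptions for illustration. Each row of a sparse lower-triangular matrix gets a level one greater than the highest level among the earlier rows it depends on. Rows that share a level have no dependencies on each other and could be solved concurrently, with a synchronization between levels.

```c
/* Level assignment for a sparse lower-triangular matrix stored in CSR
 * (hypothetical argument names: row_ptr has n+1 entries, col_idx holds
 * the column index of each stored nonzero, the diagonal may be stored).
 * On exit level[i] is the level of row i; the return value is the number
 * of levels, i.e. the number of sequential steps a level-scheduled
 * triangular solve would need. */
int compute_levels(int n, const int *row_ptr, const int *col_idx, int *level)
{
    int nlevels = 0;
    for (int i = 0; i < n; ++i) {
        int lev = 0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k) {
            const int j = col_idx[k];
            /* Row i needs x[j] for every off-diagonal entry j < i,
             * so it must come after row j's level. */
            if (j < i && level[j] + 1 > lev)
                lev = level[j] + 1;
        }
        level[i] = lev;
        if (lev + 1 > nlevels)
            nlevels = lev + 1;
    }
    return nlevels;
}
```

The ratio of n to the returned level count gives a rough upper bound on the average parallelism available. A bidiagonal matrix yields n levels (fully sequential), while a diagonal matrix yields a single level.
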
