Hi Jose,

I reran ex43 on the next branch and confirmed that vectors of size 10k are simply too small: at that size the runs are in the latency-limited regime. The evaluation should instead be based on vectors of size 50k or 100k.
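A rough back-of-the-envelope check of the latency claim, as a minimal sketch. The call count and time are taken from the VecDot line for next with -vec_type cusp in the quoted log below. The peak memory bandwidth of 144 GB/s for a Tesla C2050 and the 8-byte real scalars are assumptions, not measurements from this thread.

```python
# Compare the measured time per VecDot call with a pure bandwidth-bound estimate.
# Call count and total time come from the quoted -log_summary output
# (next, -vec_type cusp). Bandwidth and scalar size are ASSUMPTIONS.

N = 10_000                    # vector length (-n 10000)
CALLS = 398_000               # VecDot call count from the log
TOTAL_TIME_S = 2.8953e+01     # VecDot time from the log, in seconds
BYTES_PER_SCALAR = 8          # assumed: double precision, real arithmetic
PEAK_BW_BYTES_PER_S = 144e9   # assumed: Tesla C2050 peak memory bandwidth


def measured_time_per_call(total_time_s, calls):
    """Average wall time of one call, in seconds."""
    return total_time_s / calls


def streaming_time(n, bytes_per_scalar, bandwidth):
    """Time to read two vectors of length n once at the given bandwidth."""
    return 2 * n * bytes_per_scalar / bandwidth


measured = measured_time_per_call(TOTAL_TIME_S, CALLS)
ideal = streaming_time(N, BYTES_PER_SCALAR, PEAK_BW_BYTES_PER_S)
overhead_ratio = measured / ideal

print(f"measured per call:        {measured * 1e6:.1f} us")
print(f"bandwidth-bound estimate: {ideal * 1e6:.2f} us")
print(f"ratio:                    {overhead_ratio:.0f}x")
```

Under these assumptions each dot product takes tens of microseconds, while streaming the data would take on the order of one microsecond, so fixed per-call overhead (kernel launches, returning the result to the host) dominates at n = 10000.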

Best regards,
Karli


On 03/26/2013 10:39 AM, Jose E. Roman wrote:
>
> On 26/03/2013, at 02:41, Karl Rupp wrote:
>
>> Hi Jose, Paul, and others,
>>
>> I worked today on VecMDot and came up with an implementation which is
>> faster than an iterated application of the standard cusp::blas::dot()
>> (which, if I'm not mistaken, just forwards to CUBLAS) if enough vectors
>> (>~6) are involved. For complex arithmetic, an iterated application of
>> cusp::blas::dotc() is used, since passing complex types to CUDA kernels is
>> fairly tricky within PETSc. Jose, any performance feedback from within SLEPc
>> is appreciated :-)
>>
>> The new implementation is based on custom kernels, only allocates a little
>> scratchpad memory and is thus more memory efficient than the old version.
>> Also, any unnecessary copying of data is avoided. This should speed up GMRES
>> quite a bit, yet I haven't run any dedicated GMRES benchmarks. Paul, I guess
>> you have some samples at hand, don't you?
>>
>> Best regards,
>> Karli
>
> In my tests, the new implementation is actually slower. I tried
> src/vec/vec/examples/tests/ex43.c with 200 vectors of length 10000. Time
> increases from 4.1 s to 7.2 s. Can anyone try to repeat the tests below?
>
> I have an Intel Core i7 with two Tesla C2050.
>
> Jose
>
>
> master
> ---------------
>
> $ ./ex43 -n 10000 -k 200 -mdot -log_summary
>
> VecMDot 3980 1.0 3.6485e+00 1.0 7.96e+09 1.0 0.0e+00 0.0e+00 0.0e+00 11100 0 0 0 11100 0 0 0 2182
>
> $ ./ex43 -n 10000 -k 200 -mdot -log_summary -vec_type cusp
>
> VecMDot 3980 1.0 4.1368e+00 1.0 7.96e+09 1.0 0.0e+00 0.0e+00 0.0e+00 40100 0 0 0 40100 0 0 0 1924
>
> $ ./ex43 -n 10000 -k 200 -log_summary
>
> VecDot 398000 1.0 2.1585e+01 1.0 7.96e+09 1.0 0.0e+00 0.0e+00 0.0e+00 78100 0 0 0 78100 0 0 0 369
>
> $ ./ex43 -n 10000 -k 200 -log_summary -vec_type cusp
>
> VecDot 398000 1.0 2.9228e+01 1.0 7.96e+09 1.0 0.0e+00 0.0e+00 0.0e+00 82100 0 0 0 82100 0 0 0 272
>
>
> next
> ---------------
>
> $ ./ex43 -n 10000 -k 200 -mdot -log_summary
>
> VecMDot 3980 1.0 3.6899e+00 1.0 7.96e+09 1.0 0.0e+00 0.0e+00 0.0e+00 39100 0 0 0 39100 0 0 0 2157
>
> $ ./ex43 -n 10000 -k 200 -mdot -log_summary -vec_type cusp
>
> VecMDot 3980 1.0 7.1823e+00 1.0 7.96e+09 1.0 0.0e+00 0.0e+00 0.0e+00 54100 0 0 0 54100 0 0 0 1108
>
> $ ./ex43 -n 10000 -k 200 -log_summary
>
> VecDot 398000 1.0 2.1702e+01 1.0 7.96e+09 1.0 0.0e+00 0.0e+00 0.0e+00 79100 0 0 0 79100 0 0 0 367
>
> $ ./ex43 -n 10000 -k 200 -log_summary -vec_type cusp
>
> VecDot 398000 1.0 2.8953e+01 1.0 7.96e+09 1.0 0.0e+00 0.0e+00 0.0e+00 82100 0 0 0 82100 0 0 0 275
>
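
For anyone repeating the tests, the quoted numbers can be cross-checked with a short script. This is only a sketch: the times, flop counts and Mflop/s values are copied from the log lines above, while the dictionary layout and names are hypothetical and chosen just for illustration. It recomputes Mflop/s as flops / time and reports the cusp VecMDot slowdown from master to next.

```python
# Cross-check of the quoted -log_summary lines.
# Each entry: (time in seconds, flops, reported Mflop/s), copied from the log.
# The key layout (branch, event, vec type) is a hypothetical convenience.

runs = {
    ("master", "VecMDot", "default"): (3.6485e+00, 7.96e+09, 2182),
    ("master", "VecMDot", "cusp"):    (4.1368e+00, 7.96e+09, 1924),
    ("master", "VecDot",  "default"): (2.1585e+01, 7.96e+09, 369),
    ("master", "VecDot",  "cusp"):    (2.9228e+01, 7.96e+09, 272),
    ("next",   "VecMDot", "default"): (3.6899e+00, 7.96e+09, 2157),
    ("next",   "VecMDot", "cusp"):    (7.1823e+00, 7.96e+09, 1108),
    ("next",   "VecDot",  "default"): (2.1702e+01, 7.96e+09, 367),
    ("next",   "VecDot",  "cusp"):    (2.8953e+01, 7.96e+09, 275),
}


def mflops(time_s, flops):
    """Flop rate in Mflop/s for a given time and flop count."""
    return flops / time_s / 1e6


# Recompute the flop rate for every run and compare with the logged value.
computed = {key: mflops(t, f) for key, (t, f, _) in runs.items()}
for key, (t, f, reported) in runs.items():
    print(f"{key}: computed {computed[key]:.0f} Mflop/s, reported {reported}")

# Slowdown of the cusp VecMDot path when moving from master to next.
slowdown = runs[("next", "VecMDot", "cusp")][0] / runs[("master", "VecMDot", "cusp")][0]
print(f"cusp VecMDot slowdown (next / master): {slowdown:.2f}x")
```

The recomputed rates agree with the logged Mflop/s column to within rounding, and the time ratio matches the increase from about 4.1 s to 7.2 s reported above.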
