On 26/03/2013, at 02:41, Karl Rupp wrote:

> Hi Jose, Paul, and others,
>
> I worked today on VecMDot and came up with an implementation which is faster
> than an iterated application of the standard cusp::blas::dot() (which, if I'm
> not mistaken, just forwards to CUBLAS) if enough vectors (>~6) are involved.
> For complex arithmetic, an iterated application of cusp::blas::dotc() is
> used, since passing complex types to CUDA kernels is fairly tricky within
> PETSc. Jose, any performance feedback from within SLEPc is appreciated :-)
>
> The new implementation is based on custom kernels, only allocates a little
> scratchpad memory and is thus more memory efficient than the old version.
> Also, any unnecessary copying of data is avoided. This should speed up GMRES
> quite a bit, yet I haven't run any dedicated GMRES benchmarks. Paul, I guess
> you have some samples at hand, don't you?
>
> Best regards,
> Karli
In my tests, the new implementation is actually slower. I tried
src/vec/vec/examples/tests/ex43.c with 200 vectors of length 10000: the
VecMDot time with -vec_type cusp increases from 4.1 s (master) to 7.2 s
(next). Can anyone try to repeat the tests below? My machine has an Intel
Core i7 and two Tesla C2050 GPUs.

Jose

master
---------------
$ ./ex43 -n 10000 -k 200 -mdot -log_summary
VecMDot 3980 1.0 3.6485e+00 1.0 7.96e+09 1.0 0.0e+00 0.0e+00 0.0e+00 11100 0 0 0 11100 0 0 0 2182

$ ./ex43 -n 10000 -k 200 -mdot -log_summary -vec_type cusp
VecMDot 3980 1.0 4.1368e+00 1.0 7.96e+09 1.0 0.0e+00 0.0e+00 0.0e+00 40100 0 0 0 40100 0 0 0 1924

$ ./ex43 -n 10000 -k 200 -log_summary
VecDot 398000 1.0 2.1585e+01 1.0 7.96e+09 1.0 0.0e+00 0.0e+00 0.0e+00 78100 0 0 0 78100 0 0 0 369

$ ./ex43 -n 10000 -k 200 -log_summary -vec_type cusp
VecDot 398000 1.0 2.9228e+01 1.0 7.96e+09 1.0 0.0e+00 0.0e+00 0.0e+00 82100 0 0 0 82100 0 0 0 272

next
---------------
$ ./ex43 -n 10000 -k 200 -mdot -log_summary
VecMDot 3980 1.0 3.6899e+00 1.0 7.96e+09 1.0 0.0e+00 0.0e+00 0.0e+00 39100 0 0 0 39100 0 0 0 2157

$ ./ex43 -n 10000 -k 200 -mdot -log_summary -vec_type cusp
VecMDot 3980 1.0 7.1823e+00 1.0 7.96e+09 1.0 0.0e+00 0.0e+00 0.0e+00 54100 0 0 0 54100 0 0 0 1108

$ ./ex43 -n 10000 -k 200 -log_summary
VecDot 398000 1.0 2.1702e+01 1.0 7.96e+09 1.0 0.0e+00 0.0e+00 0.0e+00 79100 0 0 0 79100 0 0 0 367

$ ./ex43 -n 10000 -k 200 -log_summary -vec_type cusp
VecDot 398000 1.0 2.8953e+01 1.0 7.96e+09 1.0 0.0e+00 0.0e+00 0.0e+00 82100 0 0 0 82100 0 0 0 275
