Hi Jose,

I reran ex43 in next and confirmed that vectors of size 10k are simply 
too small (latency-limited regime). The evaluation should instead be based 
on vectors of size 50k or 100k.
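
As a rough sanity check on the latency argument, the sketch below takes the VecDot numbers for -vec_type cusp from the "next" log quoted further down and converts them into a time per call and an effective memory bandwidth. It assumes real double-precision scalars (8 bytes) and counts only the two input vectors as memory traffic; the helper names are made up for illustration. The logged time covers the whole VecDot event, so this is an estimate, not a kernel measurement.

```python
# Numbers copied from the quoted "next" log for:
#   ./ex43 -n 10000 -k 200 -log_summary -vec_type cusp
n = 10_000               # vector length (-n)
calls = 398_000          # VecDot count column
total_time = 2.8953e+01  # VecDot time column, in seconds

BYTES_PER_SCALAR = 8     # assumption: real double-precision build


def time_per_call(total_seconds, count):
    """Average wall time of a single logged call."""
    return total_seconds / count


def effective_bandwidth(length, seconds_per_call, bytes_per_scalar=BYTES_PER_SCALAR):
    """Bytes read per second, counting the two input vectors of one dot product."""
    return 2 * length * bytes_per_scalar / seconds_per_call


t = time_per_call(total_time, calls)
bw = effective_bandwidth(n, t)
print(f"time per VecDot:     {t * 1e6:.1f} us")
print(f"effective bandwidth: {bw / 1e9:.2f} GB/s")
```

Under these assumptions each VecDot takes on the order of 70 microseconds and moves only a few GB/s, far below the nominal peak of roughly 144 GB/s quoted by the vendor for a Tesla C2050. That is what one would expect if fixed per-call overhead dominates rather than memory throughput.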

Best regards,
Karli



On 03/26/2013 10:39 AM, Jose E. Roman wrote:
>
> On 26/03/2013, at 02:41, Karl Rupp wrote:
>
>> Hi Jose, Paul, and others,
>>
>> I worked today on VecMDot and came up with an implementation which is 
>> faster than an iterated application of the standard cusp::blas::dot() 
>> (which, if I'm not mistaken, just forwards to CUBLAS) if enough vectors 
>> (>~6) are involved. For complex arithmetic, an iterated application of 
>> cusp::blas::dotc() is used, since passing complex types to CUDA kernels is 
>> fairly tricky within PETSc. Jose, any performance feedback from within SLEPc 
>> is appreciated :-)
>>
>> The new implementation is based on custom kernels, only allocates a little 
>> scratchpad memory and is thus more memory efficient than the old version. 
>> Also, any unnecessary copying of data is avoided. This should speed up GMRES 
>> quite a bit, yet I haven't run any dedicated GMRES benchmarks. Paul, I guess 
>> you have some samples at hand, don't you?
>>
>> Best regards,
>> Karli
>
> In my tests, the new implementation is actually slower. I tried 
> src/vec/vec/examples/tests/ex43.c with 200 vectors of length 10000. Time 
> increases from 4.1 s to 7.2 s. Can anyone try to repeat the tests below?
>
> I have an Intel Core i7 with two Tesla C2050.
>
> Jose
>
>
> master
> ---------------
>
> $ ./ex43 -n 10000 -k 200 -mdot -log_summary
>
> VecMDot             3980 1.0 3.6485e+00 1.0 7.96e+09 1.0 0.0e+00 0.0e+00 
> 0.0e+00 11100  0  0  0  11100  0  0  0  2182
>
> $ ./ex43 -n 10000 -k 200 -mdot -log_summary -vec_type cusp
>
> VecMDot             3980 1.0 4.1368e+00 1.0 7.96e+09 1.0 0.0e+00 0.0e+00 
> 0.0e+00 40100  0  0  0  40100  0  0  0  1924
>
> $ ./ex43 -n 10000 -k 200 -log_summary
>
> VecDot            398000 1.0 2.1585e+01 1.0 7.96e+09 1.0 0.0e+00 0.0e+00 
> 0.0e+00 78100  0  0  0  78100  0  0  0   369
>
> $ ./ex43 -n 10000 -k 200 -log_summary -vec_type cusp
>
> VecDot            398000 1.0 2.9228e+01 1.0 7.96e+09 1.0 0.0e+00 0.0e+00 
> 0.0e+00 82100  0  0  0  82100  0  0  0   272
>
>
> next
> ---------------
>
> $ ./ex43 -n 10000 -k 200 -mdot -log_summary
>
> VecMDot             3980 1.0 3.6899e+00 1.0 7.96e+09 1.0 0.0e+00 0.0e+00 
> 0.0e+00 39100  0  0  0  39100  0  0  0  2157
>
> $ ./ex43 -n 10000 -k 200 -mdot -log_summary -vec_type cusp
>
> VecMDot             3980 1.0 7.1823e+00 1.0 7.96e+09 1.0 0.0e+00 0.0e+00 
> 0.0e+00 54100  0  0  0  54100  0  0  0  1108
>
> $ ./ex43 -n 10000 -k 200 -log_summary
>
> VecDot            398000 1.0 2.1702e+01 1.0 7.96e+09 1.0 0.0e+00 0.0e+00 
> 0.0e+00 79100  0  0  0  79100  0  0  0   367
>
> $ ./ex43 -n 10000 -k 200 -log_summary -vec_type cusp
>
> VecDot            398000 1.0 2.8953e+01 1.0 7.96e+09 1.0 0.0e+00 0.0e+00 
> 0.0e+00 82100  0  0  0  82100  0  0  0   275
>
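
For anyone re-reading the tables above: the last column is Mflop/s, which can be reproduced as flops divided by time. The sketch below recomputes it for the four VecMDot lines and derives the slowdown of the cusp VecMDot between master and next. All numbers are copied from the quoted logs; the dictionary keys and the function name are only labels chosen for this illustration. The flop count is also consistent with 398000 dot products of length 10000 at 2n-1 flops each, though that is just an arithmetic observation, not a statement about how ex43 is written.

```python
# (time in seconds, flops, reported Mflop/s) copied from the quoted -log_summary lines
rows = {
    "master VecMDot cpu":  (3.6485e+00, 7.96e+09, 2182),
    "master VecMDot cusp": (4.1368e+00, 7.96e+09, 1924),
    "next VecMDot cpu":    (3.6899e+00, 7.96e+09, 2157),
    "next VecMDot cusp":   (7.1823e+00, 7.96e+09, 1108),
}


def mflops(seconds, flops):
    """Flop rate in Mflop/s from a total flop count and a wall time."""
    return flops / seconds / 1e6


for name, (seconds, flops, reported) in rows.items():
    print(f"{name:22s} computed {mflops(seconds, flops):7.0f}  reported {reported}")

# Slowdown of the cusp VecMDot when going from master to next
slowdown = rows["next VecMDot cusp"][0] / rows["master VecMDot cusp"][0]
print(f"cusp VecMDot slowdown, next vs master: {slowdown:.2f}x")

# Flop count implied by 398000 dot products of length n = 10000 at 2n-1 flops each
n = 10_000
implied_flops = 398_000 * (2 * n - 1)
print(f"implied flops: {implied_flops:.3e}")
```

The recomputed rates agree with the reported column to within rounding of the printed flop count, and the time ratio matches the "4.1 to 7.2" figure mentioned above.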
