Hi Paul,

> I agree. Just thought I would share something that works easily now for
> the small nv case.

It still serves as a nice comparison for benchmarking :-)

Best regards,
Karli


>> thanks, this should work well for nv = 1, 2, maybe 3. However, it
>> won't help Jose a lot. Clearly, for nv >> 1, there are a bunch of
>> unnecessary loads of xarray. Thus, a two-kernel approach is necessary
>> to handle both extremes (nv \approx 1 and nv >> 1).
>>
>> Best regards,
>> Karli
>>
>>>> Hi Jose,
>>>>
>>>>>> Since I just stumbled over VecMDot_SeqCUSP() when interfacing
>>>>>> ViennaCL: Do you know what was the reason why the 'old' version was
>>>>>> replaced by this expensive call to gemv() including the creation of
>>>>>> temporaries, etc.? Just writing a custom kernel with one work group
>>>>>> per dot-product should do the job perfectly, shouldn't it?
>>>>>
>>>>> My fault:
>>>>> https://bitbucket.org/petsc/petsc-hg/commits/ec7a7de2acd477e5edd24cc5a3af441ce7a68a36
>>>>>
>>>>> The motivation was that the previous version was even worse for me
>>>>> (VecMDot is used a lot in SLEPc and GPU performance was really bad).
>>>>> At that time I did not have the time to write a custom kernel. If you
>>>>> write one, I could help in testing and measuring performance.
>>>>
>>>> Thanks for providing the context. It makes sense to me now, because
>>>> for eigenvalue computations you typically have a lot more vectors
>>>> taking part in mdot as compared to GMRES. This looks like an
>>>> archetypal example for using two different kernels: The first is
>>>> suitable for 'small' numbers of vectors (GMRES), while the second is
>>>> more gemv-like and good for larger vector counts (SLEPc). I'll let you
>>>> know as soon as it's ready for testing.
>>>>
>>>> Thanks and best regards,
>>>> Karli
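For reference, a minimal CUDA sketch of the "one work group per
dot-product" idea discussed above. This is not the actual PETSc or
ViennaCL code: the kernel name, the power-of-two block size, and the
assumption that the nv y-vectors are packed contiguously as y[v*n + i]
are all illustrative choices.

/* One block per dot product: block v computes x . y_v with a
 * shared-memory tree reduction.  Assumes blockDim.x is a power of
 * two (<= 256) and that the nv y-vectors are stored contiguously,
 * y_v[i] == y[v*n + i] -- illustrative layout, not PETSc's. */
__global__ void mdot_small_nv(const double *x, const double *y,
                              double *results, int n)
{
    __shared__ double sdata[256];
    const double *yv = y + (size_t)blockIdx.x * n;

    double sum = 0.0;
    for (int i = threadIdx.x; i < n; i += blockDim.x)
        sum += x[i] * yv[i];          /* every block re-reads x */
    sdata[threadIdx.x] = sum;
    __syncthreads();

    /* tree reduction within the block */
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s)
            sdata[threadIdx.x] += sdata[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0)
        results[blockIdx.x] = sdata[0];
}

/* launch: one block per dot product, e.g.
 * mdot_small_nv<<<nv, 256>>>(d_x, d_y, d_results, n); */

Note that every block re-reads all of x from global memory, which is
exactly the redundant xarray traffic Karli points out for nv >> 1.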

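And a sketch of the direction the second, more gemv-like kernel could
take: process the y-vectors in fixed-size groups (four here) so that
each load of x[i] is reused across the whole group. Again purely
illustrative -- the name, the group size, and the per-block partials
finished by a trivial second reduction (or on the host) are
assumptions, not anything from the repository.

/* Processes four y-vectors per pass, so each x[i] is loaded once and
 * used four times; a driver loop over v = 0, 4, 8, ... covers general
 * nv, and the 1-3 left-over vectors can fall back to the small-nv
 * kernel.  Assumes blockDim.x == 128 (power of two). */
__global__ void mdot4_partials(const double *x,
                               const double *y0, const double *y1,
                               const double *y2, const double *y3,
                               double *partial, int n)
{
    __shared__ double s[4][128];
    double a0 = 0.0, a1 = 0.0, a2 = 0.0, a3 = 0.0;

    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x) {
        const double xi = x[i];       /* one x load, four uses */
        a0 += xi * y0[i];
        a1 += xi * y1[i];
        a2 += xi * y2[i];
        a3 += xi * y3[i];
    }
    s[0][threadIdx.x] = a0; s[1][threadIdx.x] = a1;
    s[2][threadIdx.x] = a2; s[3][threadIdx.x] = a3;
    __syncthreads();

    /* reduce all four accumulators in one tree */
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride)
            for (int k = 0; k < 4; ++k)
                s[k][threadIdx.x] += s[k][threadIdx.x + stride];
        __syncthreads();
    }
    if (threadIdx.x == 0)
        for (int k = 0; k < 4; ++k)
            partial[k * gridDim.x + blockIdx.x] = s[k][0];
}

/* launch: mdot4_partials<<<64, 128>>>(d_x, d_y0, d_y1, d_y2, d_y3,
 *                                     d_partial, n);
 * then reduce the 64 per-block partials for each vector. */

Dispatching on nv between the two kernels (small nv: one block per dot
product; larger nv: grouped passes like the above) would give the
two-kernel approach described in the thread.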