[sage-devel] Re: FLINT/NTL benchmarks

Victor Shoup Mon, 05 Oct 2015 05:55:45 -0700

Actually, I'll correct myself: SIMD can also be used to speed up the CRT 
operations in the
multi-modular algorithm as well, at least up to a few thousand bits.  So 
even in the grand and glorious
SIMD future, there will probably be room to explore a variety of 
algorithmic approaches.


On Monday, October 5, 2015 at 7:13:23 AM UTC-4, Victor Shoup wrote:
>
> Yes, I saw a paper by the mathemagix people recently...I believe they use 
> the floating point SIMD, and some 
> tricks with "fused multiply and add" to very nice effect.  
> They are basically implementing integer mod p arithmetic using floating 
> point.
> I may want to play around with those ideas at some point.
> Although, I may want to just wait a couple of years until Canonlakes 
> support the "IFMA52" instructions, which will
> support everything we need for 52-bit prime FFT's very nicely.  There will 
> be a tradeoff, of course, going back
> to 52-bits from 60.  Too bad Intel has no plans to support 64-bit MulHi in 
> SIMD anytime soon.
>
> Applying the mathemagix approach directly to NTL's multi-modular approach 
> for ZZ_pX mul may not
> be that helpful, because of the cost of CRT on coefficients.  But maybe 
> this will tip the scales
> in favor of Kronecker substitution by way of 3-prime FFT large integer mul 
> in SIMD....
> especially if we get to AVX1024 someday :-)
>
> I hesitate somewhat to get involved in SIMD game, as all the assembly code 
> / intrinsics stuff is a huge time sink that 
> will yield code that will probably be obsolete in 10 years.  Multicore, on 
> the other hand, seems like a better
> investment, especially since C++11 (and C11) now standardize many aspects 
> of it, so one can write portable
> code now.
>
>
> On Monday, October 5, 2015 at 5:33:29 AM UTC-4, Bill Hart wrote:
>>
>>
>>
>> On Monday, 5 October 2015 04:18:27 UTC+2, Victor Shoup wrote:
>>>
>>> Thanks for the feedback, Bill!  A bit of friendly competition is always 
>>> a good thing :-)
>>>
>>
>> I agree. Software development is more interesting when there are people 
>> working on similar things.
>>  
>>
>>>
>>> I've been looking into SIMD for small prime FFT's...unfortunately, there 
>>> is currently
>>> no CPU out there that supports SIMD 64x64 -> high order 64 bits of 
>>> product.
>>> So, it does not look very promising.
>>> In a couple of years, Intel will eventually release a machine with SIMD 
>>> 52x52 -> high order 52 bits of product.
>>> That may be interesting.
>>>
>>
>> At a recent conference I saw some interesting work out of the Mathemagix 
>> group along these lines. They have papers on their website that give very 
>> impressive timings.
>>  
>>
>>>
>>> Also, now that NTL is threadsafe, I'm looking at making some of the 
>>> low-level routines thread enhanced.
>>> I'm currently finishing up makeing NTL's multi-modular ZZ_pX arithmetic 
>>> thread enhanced.
>>> For up to 4 threads, I get close to linear speedup...but after that, it 
>>> starts to degrade: at 16 threads
>>> I only get 8x speedup.  I've also made the Brent/Kung modular 
>>> composition thread enhanced.
>>> This gives much better multicore performance, as the bottleneck is a 
>>> rectangular matrix mul
>>> which is trivial to parallelize.  I'll hopefully make a release of all 
>>> this soonish.
>>>
>>
>> Bill.
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"sage-devel" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sage-devel+unsubscr...@googlegroups.com.
To post to this group, send email to sage-devel@googlegroups.com.
Visit this group at http://groups.google.com/group/sage-devel.
For more options, visit https://groups.google.com/d/optout.

[sage-devel] Re: FLINT/NTL benchmarks

Reply via email to