Actually, I'll correct myself: SIMD can also be used to speed up the CRT operations in the multi-modular algorithm as well, at least up to a few thousand bits. So even in the grand and glorious SIMD future, there will probably be room to explore a variety of algorithmic approaches.
On Monday, October 5, 2015 at 7:13:23 AM UTC-4, Victor Shoup wrote: > > Yes, I saw a paper by the mathemagix people recently...I believe they use > the floating point SIMD, and some > tricks with "fused multiply and add" to very nice effect. > They are basically implementing integer mod p arithmetic using floating > point. > I may want to play around with those ideas at some point. > Although, I may want to just wait a couple of years until Canonlakes > support the "IFMA52" instructions, which will > support everything we need for 52-bit prime FFT's very nicely. There will > be a tradeoff, of course, going back > to 52-bits from 60. Too bad Intel has no plans to support 64-bit MulHi in > SIMD anytime soon. > > Applying the mathemagix approach directly to NTL's multi-modular approach > for ZZ_pX mul may not > be that helpful, because of the cost of CRT on coefficients. But maybe > this will tip the scales > in favor of Kronecker substitution by way of 3-prime FFT large integer mul > in SIMD.... > especially if we get to AVX1024 someday :-) > > I hesitate somewhat to get involved in SIMD game, as all the assembly code > / intrinsics stuff is a huge time sink that > will yield code that will probably be obsolete in 10 years. Multicore, on > the other hand, seems like a better > investment, especially since C++11 (and C11) now standardize many aspects > of it, so one can write portable > code now. > > > On Monday, October 5, 2015 at 5:33:29 AM UTC-4, Bill Hart wrote: >> >> >> >> On Monday, 5 October 2015 04:18:27 UTC+2, Victor Shoup wrote: >>> >>> Thanks for the feedback, Bill! A bit of friendly competition is always >>> a good thing :-) >>> >> >> I agree. Software development is more interesting when there are people >> working on similar things. >> >> >>> >>> I've been looking into SIMD for small prime FFT's...unfortunately, there >>> is currently >>> no CPU out there that supports SIMD 64x64 -> high order 64 bits of >>> product. >>> So, it does not look very promising. >>> In a couple of years, Intel will eventually release a machine with SIMD >>> 52x52 -> high order 52 bits of product. >>> That may be interesting. >>> >> >> At a recent conference I saw some interesting work out of the Mathemagix >> group along these lines. They have papers on their website that give very >> impressive timings. >> >> >>> >>> Also, now that NTL is threadsafe, I'm looking at making some of the >>> low-level routines thread enhanced. >>> I'm currently finishing up makeing NTL's multi-modular ZZ_pX arithmetic >>> thread enhanced. >>> For up to 4 threads, I get close to linear speedup...but after that, it >>> starts to degrade: at 16 threads >>> I only get 8x speedup. I've also made the Brent/Kung modular >>> composition thread enhanced. >>> This gives much better multicore performance, as the bottleneck is a >>> rectangular matrix mul >>> which is trivial to parallelize. I'll hopefully make a release of all >>> this soonish. >>> >> >> Bill. >> > -- You received this message because you are subscribed to the Google Groups "sage-devel" group. To unsubscribe from this group and stop receiving emails from it, send an email to sage-devel+unsubscr...@googlegroups.com. To post to this group, send email to sage-devel@googlegroups.com. Visit this group at http://groups.google.com/group/sage-devel. For more options, visit https://groups.google.com/d/optout.