Re: Montgomery multiplication on nVidia GPUs

Andy Polyakov Tue, 28 Aug 2007 08:01:12 -0700

I've put together some code to do parallel 512-bit montgomery
multiplication for nVidia GPUs. On my 8800GTX I get about 12k of these per
second in 2k batches,

What does "12k in 2k" mean? I mean, given that 512 bits is 64 bytes, does 2k mean that you process 32 vectors at once, and 12k mean that it takes 1/(12000/32) seconds to perform 32 private key operations?

so enough to do 6k 1024-bit RSA private decrypts.

Why do you think that it will be 6k? Complexity is n^2, so it should be 12/4, not 12/2.
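
A rough sketch of the n^2 argument, assuming schoolbook multiplication on 32-bit words (the word size and the helper names are illustrative assumptions, not details of the GPU code being discussed). Doubling the operand length quadruples the number of word multiplications, so a rate measured at 512 bits would be divided by 4 at 1024 bits:

```python
def word_mults(bits, word_bits=32):
    """Word multiplications in one schoolbook multiply of two `bits`-bit operands."""
    words = bits // word_bits
    return words * words


def scaled_rate(rate_at_512, bits):
    """Scale a rate measured on 512-bit operands to `bits`-bit operands,
    assuming cost grows with the number of word multiplications."""
    return rate_at_512 * word_mults(512) / word_mults(bits)


print(word_mults(512), word_mults(1024))   # 256 1024
print(scaled_rate(12000, 1024))            # 3000.0
```

This only models the cost of a single multiplication; it says nothing about how many multiplications a full private key operation needs.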

On this basis it looks like GPUs make pretty competitive cheap
crypto accelerators :)

I don't have the time or expertise to integrate this as a patch. Is there
anyone who would be interested in taking this on? A significant challenge
would be to provide a sensible interface for batch processing of requests;
the current interface doesn't look well suited to this task.

The trouble with crypto, especially asymmetric crypto, is that there is no guaranteed data flow. I mean there is no guarantee that there will be 32 operations every 32/12 milliseconds, and given a lower average request rate, batching 32 requests will incur longer average service times than you might be willing to tolerate: when a request is posted it's expected to be served instantly, not to wait until 31 more are posted. So you'd have to be adaptive, i.e. grab as many requests as are currently available and process sometimes 1, sometimes 5, 2, 15, 9, etc. operations at once. Now what if the average load is less than 12000/32 requests per second? Then you'd go with a single request at a time practically all the time. But single operation performance is *far* from impressive. So you'd have to play even smarter: grab the currently outstanding requests and decide whether it would pay off to pack them together and send them to the GPU, or just process them on the CPU. Such dispatching per se implies certain requirements on application design (as it would be hardly appropriate to delegate this role to, say, an OpenSSL engine), so it's likely to be impossible to "just use GPU with arbitrary application."
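
The adaptive dispatching described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a proposed OpenSSL interface: `GPU_MIN_BATCH` is a hypothetical break-even point that would have to be measured, and `cpu_op` and `gpu_batch_op` are placeholder callables standing in for the real CPU and GPU code paths.

```python
from collections import deque

GPU_MIN_BATCH = 8    # hypothetical break-even batch size; must be measured
GPU_MAX_BATCH = 32   # batch width used in the discussion above


def drain(pending, limit=GPU_MAX_BATCH):
    """Take whatever is queued right now, up to `limit`, without waiting."""
    batch = []
    while pending and len(batch) < limit:
        batch.append(pending.popleft())
    return batch


def dispatch(pending, cpu_op, gpu_batch_op, min_batch=GPU_MIN_BATCH):
    """Serve the currently outstanding requests on the GPU or the CPU.

    Returns a (path, results) pair, where path is "gpu", "cpu" or "idle".
    """
    batch = drain(pending)
    if not batch:
        return "idle", []
    if len(batch) >= min_batch:
        # Enough work is outstanding to amortise the GPU round trip.
        return "gpu", gpu_batch_op(batch)
    # Too few requests: serve them one by one on the CPU.
    return "cpu", [cpu_op(request) for request in batch]


# Stand-ins for the real operations (assumptions for illustration only).
def cpu_op(x):
    return pow(x, 17, 3233)


def gpu_batch_op(batch):
    return [cpu_op(x) for x in batch]


light = deque(range(3))
heavy = deque(range(40))
print(dispatch(light, cpu_op, gpu_batch_op)[0])   # cpu
print(dispatch(heavy, cpu_op, gpu_batch_op)[0])   # gpu
print(len(heavy))                                 # 8
```

Note that the decision needs a view of all outstanding requests at once, which is why it sits more naturally in the application than behind a per-request call.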

Please note that my remark does not mean that I'm condemning the idea. It only means that this class of problems is more complicated than commonly presented. General-purpose computing on graphics processors is a compelling idea, but normally you'd have to have a guaranteed data flow (say, X GB of seismic data to process in the shortest possible time) to justify the effort.

A.

______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       openssl-dev@openssl.org
Automated List Manager                           [EMAIL PROTECTED]
