David Miller writes:
>> So compared to add_n, you just get an additional xor with -1 in the loop
>> (and not on the loop's critical path). I can't guess whether or not that
>> will be visible in the execution time.
>
> Thanks I'll give this a try!
And on second thought, there's no need to handle
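For readers following the thread, the "add_n plus an xor with -1" idea can be sketched in C as below. This is only an illustration under stated assumptions: the function name sub_n_via_add is hypothetical (it is not GMP's mpn_sub_n), limbs are assumed to be 64 bits, and the carry is tracked by hand where a SPARC version would presumably use the addxc/addxcc carry chain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not GMP's mpn_sub_n: computes rp = ap - bp over n
   64-bit limbs using only additions. Each bp limb is complemented
   (xor with -1) and the carry chain starts at 1, so the loop evaluates
   a + ~b + 1 = a - b. Returns the borrow out (0 or 1). */
static uint64_t sub_n_via_add(uint64_t *rp, const uint64_t *ap,
                              const uint64_t *bp, size_t n)
{
    uint64_t carry = 1;                     /* the "+1" of two's complement */
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        uint64_t nb = bp[i] ^ UINT64_MAX;   /* xor with -1, off the carry path */
        uint64_t s = ap[i] + nb;
        uint64_t c1 = s < nb;               /* carry out of a + ~b */
        uint64_t r = s + carry;
        uint64_t c2 = r < s;                /* carry out of adding the carry-in */
        rp[i] = r;
        carry = c1 | c2;                    /* at most one of c1, c2 is set */
    }
    return carry ^ 1;                       /* final carry 1 means no borrow */
}
```

Only the add-with-carry steps depend on the previous limb; the xor depends only on the loaded operand, which matches the remark that it is not on the loop's critical path.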
From: ni...@lysator.liu.se (Niels Möller)
Date: Fri, 04 Jan 2013 22:29:58 +0100
> David Miller writes:
>
>> If it's needed for sub_n, then yes that's a bit difficult. I was
>> trying to figure out ways to fabricate the needed calculations
>> using just subcc and addxc/addxcc but haven't come up
David Miller writes:
> If it's needed for sub_n, then yes that's a bit difficult. I was
> trying to figure out ways to fabricate the needed calculations
> using just subcc and addxc/addxcc but haven't come up with anything
> just yet.
You could always do the two's complement of one of the opera
From: Torbjorn Granlund
Date: Fri, 04 Jan 2013 15:17:11 +0100
> I expect them to add 3n/2 to 3n cycles, depending on the pipeline
> characteristics.
Each load can issue in 1 cycle, there is a 4 cycle latency, the
loads will fully pipeline. Therefore the overhead is around 3n.
> The Oracle manu
From: Torbjorn Granlund
Date: Fri, 04 Jan 2013 14:54:15 +0100
> (For modexp, I assume one can stay in registers, making this
> overhead small when using a large exponent, such as RSA
> signing/decryption.)
The montmul and montsqr instructions are meant to be used in
a sort of byte-code'ish way.
From: Torbjorn Granlund
Date: Fri, 04 Jan 2013 14:54:15 +0100
> Did you add umulxhi use in your patch from a few days ago?
Yes I did use mulx/umulxhi (both T3 and T4 have umulxhi) and yes the
multiplies do pipeline on T4 (they don't on T3), and it gets about 4
cycles per limb in a two-way unroll
From: bodr...@mail.dm.unipi.it
Date: Fri, 4 Jan 2013 14:12:23 +0100 (CET)
> On Fri, 4 January 2013 10:07 am, David Miller wrote:
>> mpmul 3 ! The immediate field is "N - 1"
>
> Does the immediate mean that, to write e.g. sqr_basecase (it should be
> far simpler
bodr...@mail.dm.unipi.it writes:
Does the immediate mean that, to write e.g. sqr_basecase (it should be
far simpler than writing mul_basecase), you need a branch for each
different N?
Since you have to preload a (weird) set of hardwired registers, one will
really need special code for ev
I took a brief look at the definition of these instructions. It is
clear that they did not consult an expert in the area. They also added
DES instructions now (in 2012).
They added a few useful instructions, addxc/addxcc and umulxhi. The
former is a 64-bit addition with useful carry in and out
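As a rough illustration of what these instructions provide, here is a portable C model. It is a sketch, not code from the thread: umulhi64 and mul_1_sketch are hypothetical names, umulhi64 stands in for umulxhi (the high 64 bits of an unsigned 64x64-bit product), the plain 64-bit multiply stands in for mulx, and the explicit carry comparison stands in for the carry handling that addxc/addxcc would give in assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable model of the value umulxhi returns: the high 64 bits of the
   unsigned 64x64-bit product, built from four 32x32-bit partial products. */
static uint64_t umulhi64(uint64_t a, uint64_t b)
{
    uint64_t a0 = a & 0xffffffffu, a1 = a >> 32;
    uint64_t b0 = b & 0xffffffffu, b1 = b >> 32;
    uint64_t p00 = a0 * b0, p01 = a0 * b1, p10 = a1 * b0, p11 = a1 * b1;
    /* Sum of everything that lands in bits 32..63, kept for its carry. */
    uint64_t mid = (p00 >> 32) + (p01 & 0xffffffffu) + (p10 & 0xffffffffu);
    return p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
}

/* Hypothetical mul_1-style loop: rp = up * v over n limbs, returning the
   most significant limb of the product. */
static uint64_t mul_1_sketch(uint64_t *rp, const uint64_t *up, size_t n,
                             uint64_t v)
{
    uint64_t carry = 0;
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        uint64_t lo = up[i] * v;            /* low half, as mulx would give */
        uint64_t hi = umulhi64(up[i], v);   /* high half, as umulxhi would give */
        lo += carry;
        hi += lo < carry;                   /* hi <= 2^64 - 2, so this cannot wrap */
        rp[i] = lo;
        carry = hi;
    }
    return carry;
}
```

The two multiplies per limb are independent of the carry, so only the additions form a serial chain from limb to limb.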
Ciao,
On Fri, 4 January 2013 10:07 am, David Miller wrote:
> mpmul 3 ! The immediate field is "N - 1"
Does the immediate mean that, to write e.g. sqr_basecase (it should be
far simpler than writing mul_basecase), you need a branch for each
different N?
> The c
From: ni...@lysator.liu.se (Niels Möller)
Date: Fri, 04 Jan 2013 09:10:30 +0100
> David Miller writes:
>
>> That's why realistically I'll probably only use mpmul for 3x3 and
>> larger.
>
> So, e.g., an mpn_addmul_4 would make sense (and up to mpn_addmul_32, if
> you want to make maximal use of
David Miller writes:
> That's why realistically I'll probably only use mpmul for 3x3 and
> larger.
So, e.g., an mpn_addmul_4 would make sense (and up to mpn_addmul_32, if
you want to make maximal use of mpmul...)? I don't know anything about
these sparc instructions beyond what you're explaining
From: ni...@lysator.liu.se (Niels Möller)
Date: Fri, 04 Jan 2013 08:48:21 +0100
> David Miller writes:
>
>> Just FYI, I'm also working on an mpn_mul_basecase that makes use of
>> the T4 'mpmul' instruction which can do NxN 64-bit limb multiplies
>> for values of N from 1 to 32.
>
> It might mak