Re: RFR 8181594: Efficient and constant-time modular arithmetic

2018-03-20 Thread Xuelei Fan

I have no other comments.

Thanks,
Xuelei

On 3/20/2018 7:50 AM, Adam Petcher wrote:

Latest webrev: http://cr.openjdk.java.net/~apetcher/8181594/webrev.02/

Comments inline below.

In addition, I also changed the name of IntegerModuloP_Base to 
IntegerModuloP, and IntegerModuloP to ImmutableIntegerModuloP.



On 3/11/2018 12:04 PM, Xuelei Fan wrote:

On 2/26/2018 10:39 AM, Adam Petcher wrote:


On 2/23/2018 12:46 PM, Xuelei Fan wrote:


ArrayUtil.java:
===
I'm not very sure how widely this utilities will be used in the 
future. Looks like only BigIntegerModuloP uses this classes.  I may 
prefer to define private methods for byte array swap in 
BigIntegerModuloP.


It is also used by XDHPublicKeyImpl (in the XDH code review). XDH 
public keys are represented as BigInteger, and I use the array 
reverse method to convert encoded keys to BigInteger.


If it is not widely used by other classes, please have these methods 
in the class where is get called.   The sun.security.util is exported 
to other modules as well, we may not want to add stuff into this 
package unless it is really necessary.


Okay. I put these methods in BigIntegerModuloP and removed the ArrayUtil 
class. This array reverse code will be duplicated where it is used by 
XDH public keys (in the other code review).






MutableIntegerModuloP.java
==
void conditionalSwapWith(MutableIntegerModuloP b, int swap);
As the 'swap' parameter can only be 0 or 1, could it be a boolean 
parameter?


I couldn't come up with a way to implement this without branching 
when the swap parameter is boolean. See 
IntegerPolynomial.conditionalSwap to see how this is implemented in 
arithmetic with an int swap argument. If you (or anyone) can think of 
a way to do this with boolean, let me know.


I added a sentence to the comment above conditionalSwapWith that 
describes why it is an int instead of a boolean.


I did not get the point about the need to avoid branching.  Can you 
have more details?


The goal is to avoid things like if(secret){...}, in order to prevent 
the secrets from leaking into side channels (mostly timing and cache). 
The way this method is used by XDH, the swap parameter is a single bit 
of the private key. By storing this bit as an integer, and then doing 
the swap using only integer arithmetic, we can avoid branching which may 
leak the bits of the key.






Except the conditionalSwapWith() method, I did not get the points 
why we need a mutable version.  Would you please have more 
description of this requirement?


The comment above the class definition has this sentence:

"This interface can be used to improve performance and avoid the 
allocation of a large number of temporary objects."


Do you need more information than this in the comments? The 
performance motivation is so that a.add(b).multiply(c)... can be done 
without allocating a new buffer for each operation. For example, 
without mutable field elements, an X25519 point multiplication would 
allocate around 4,300 temporary arrays totaling 350,000 bytes. If I 
remember correctly, switching the X25519 implementation to mutable 
field elements reduced the point multiplication time by about half.



I see your point.  The benefits is obviously.

OK, why you need the immutable version then? Sounds like the mutable 
version interface is sufficient, including performance. If an 
immutable version is really needed, we can have the implementation 
making the decision.  Accordingly, the conditionalSwapWith() can be 
defined as optional method, if it is not required to be implemented in 
immutable implementation.


It's confusing to me that the immutable and mutable and the base 
versions/interfaces mixed together.  It would be nice if we can 
simplify the interface a little bit.


For internal APIs, sometimes we don't want the same quality level as 
public APIs.  I think this set of class will be widely used by new EC 
curves, ChaCha20/Poly1305, or more in the future.  It would be nice if 
we could do it good at the beginning.


The mutable version adds the conditional swap as well as mutable 
versions of many of the basic operations. The XDH implementation uses 
both the mutable and immutable versions. The immutable version allows me 
to simplify the client code because I don't have to worry about whether 
some value has been modified. For example, the XDH code keeps a 
representation of 0, 1, and the constant that defines the curve as 
immutable values.


So I prefer to have both. It complicates this API a bit, but it allows 
simpler and more robust code in the client of this API.







IntegerModuloP_Base.java

default byte[] addModPowerTwo(IntegerModuloP_Base b, int len)
void addModPowerTwo(IntegerModuloP_Base b, byte[] result);

For the first sign of the method names, I thought it is to calculate 
as "(this + b) ^ 2 mod m". 


To be precise, it calculates "((this % p) + (b % p)) % 2^m" (where p 
is the prime that 

Re: RFR 8181594: Efficient and constant-time modular arithmetic

2018-03-20 Thread Adam Petcher

Latest webrev: http://cr.openjdk.java.net/~apetcher/8181594/webrev.02/

Comments inline below.

In addition, I also changed the name of IntegerModuloP_Base to 
IntegerModuloP, and IntegerModuloP to ImmutableIntegerModuloP.



On 3/11/2018 12:04 PM, Xuelei Fan wrote:

On 2/26/2018 10:39 AM, Adam Petcher wrote:


On 2/23/2018 12:46 PM, Xuelei Fan wrote:


ArrayUtil.java:
===
I'm not very sure how widely this utilities will be used in the 
future. Looks like only BigIntegerModuloP uses this classes.  I may 
prefer to define private methods for byte array swap in 
BigIntegerModuloP.


It is also used by XDHPublicKeyImpl (in the XDH code review). XDH 
public keys are represented as BigInteger, and I use the array 
reverse method to convert encoded keys to BigInteger.


If it is not widely used by other classes, please have these methods 
in the class where is get called.   The sun.security.util is exported 
to other modules as well, we may not want to add stuff into this 
package unless it is really necessary.


Okay. I put these methods in BigIntegerModuloP and removed the ArrayUtil 
class. This array reverse code will be duplicated where it is used by 
XDH public keys (in the other code review).






MutableIntegerModuloP.java
==
void conditionalSwapWith(MutableIntegerModuloP b, int swap);
As the 'swap' parameter can only be 0 or 1, could it be a boolean 
parameter?


I couldn't come up with a way to implement this without branching 
when the swap parameter is boolean. See 
IntegerPolynomial.conditionalSwap to see how this is implemented in 
arithmetic with an int swap argument. If you (or anyone) can think of 
a way to do this with boolean, let me know.


I added a sentence to the comment above conditionalSwapWith that 
describes why it is an int instead of a boolean.


I did not get the point about the need to avoid branching.  Can you 
have more details?


The goal is to avoid things like if(secret){...}, in order to prevent 
the secrets from leaking into side channels (mostly timing and cache). 
The way this method is used by XDH, the swap parameter is a single bit 
of the private key. By storing this bit as an integer, and then doing 
the swap using only integer arithmetic, we can avoid branching which may 
leak the bits of the key.






Except the conditionalSwapWith() method, I did not get the points 
why we need a mutable version.  Would you please have more 
description of this requirement?


The comment above the class definition has this sentence:

"This interface can be used to improve performance and avoid the 
allocation of a large number of temporary objects."


Do you need more information than this in the comments? The 
performance motivation is so that a.add(b).multiply(c)... can be done 
without allocating a new buffer for each operation. For example, 
without mutable field elements, an X25519 point multiplication would 
allocate around 4,300 temporary arrays totaling 350,000 bytes. If I 
remember correctly, switching the X25519 implementation to mutable 
field elements reduced the point multiplication time by about half.



I see your point.  The benefits is obviously.

OK, why you need the immutable version then? Sounds like the mutable 
version interface is sufficient, including performance. If an 
immutable version is really needed, we can have the implementation 
making the decision.  Accordingly, the conditionalSwapWith() can be 
defined as optional method, if it is not required to be implemented in 
immutable implementation.


It's confusing to me that the immutable and mutable and the base 
versions/interfaces mixed together.  It would be nice if we can 
simplify the interface a little bit.


For internal APIs, sometimes we don't want the same quality level as 
public APIs.  I think this set of class will be widely used by new EC 
curves, ChaCha20/Poly1305, or more in the future.  It would be nice if 
we could do it good at the beginning.


The mutable version adds the conditional swap as well as mutable 
versions of many of the basic operations. The XDH implementation uses 
both the mutable and immutable versions. The immutable version allows me 
to simplify the client code because I don't have to worry about whether 
some value has been modified. For example, the XDH code keeps a 
representation of 0, 1, and the constant that defines the curve as 
immutable values.


So I prefer to have both. It complicates this API a bit, but it allows 
simpler and more robust code in the client of this API.







IntegerModuloP_Base.java

default byte[] addModPowerTwo(IntegerModuloP_Base b, int len)
void addModPowerTwo(IntegerModuloP_Base b, byte[] result);

For the first sign of the method names, I thought it is to calculate 
as "(this + b) ^ 2 mod m". 


To be precise, it calculates "((this % p) + (b % p)) % 2^m" (where p 
is the prime that defines the field, and m is the desired length, in 
bits). Note that the addition here is 

Re: RFR 8181594: Efficient and constant-time modular arithmetic

2018-03-11 Thread Xuelei Fan

On 2/26/2018 10:39 AM, Adam Petcher wrote:


http://cr.openjdk.java.net/~apetcher/8181594/webrev.01/

See inline below.

On 2/23/2018 12:46 PM, Xuelei Fan wrote:


ArrayUtil.java:
===
I'm not very sure how widely this utilities will be used in the 
future. Looks like only BigIntegerModuloP uses this classes.  I may 
prefer to define private methods for byte array swap in 
BigIntegerModuloP.


It is also used by XDHPublicKeyImpl (in the XDH code review). XDH public 
keys are represented as BigInteger, and I use the array reverse method 
to convert encoded keys to BigInteger.


If it is not widely used by other classes, please have these methods in 
the class where is get called.   The sun.security.util is exported to 
other modules as well, we may not want to add stuff into this package 
unless it is really necessary.




MutableIntegerModuloP.java
==
void conditionalSwapWith(MutableIntegerModuloP b, int swap);
As the 'swap' parameter can only be 0 or 1, could it be a boolean 
parameter?


I couldn't come up with a way to implement this without branching when 
the swap parameter is boolean. See IntegerPolynomial.conditionalSwap to 
see how this is implemented in arithmetic with an int swap argument. If 
you (or anyone) can think of a way to do this with boolean, let me know.


I added a sentence to the comment above conditionalSwapWith that 
describes why it is an int instead of a boolean.


I did not get the point about the need to avoid branching.  Can you have 
more details?




Except the conditionalSwapWith() method, I did not get the points why 
we need a mutable version.  Would you please have more description of 
this requirement?


The comment above the class definition has this sentence:

"This interface can be used to improve performance and avoid the 
allocation of a large number of temporary objects."


Do you need more information than this in the comments? The performance 
motivation is so that a.add(b).multiply(c)... can be done without 
allocating a new buffer for each operation. For example, without mutable 
field elements, an X25519 point multiplication would allocate around 
4,300 temporary arrays totaling 350,000 bytes. If I remember correctly, 
switching the X25519 implementation to mutable field elements reduced 
the point multiplication time by about half.



I see your point.  The benefits is obviously.

OK, why you need the immutable version then? Sounds like the mutable 
version interface is sufficient, including performance.  If an immutable 
version is really needed, we can have the implementation making the 
decision.  Accordingly, the conditionalSwapWith() can be defined as 
optional method, if it is not required to be implemented in immutable 
implementation.


It's confusing to me that the immutable and mutable and the base 
versions/interfaces mixed together.  It would be nice if we can simplify 
the interface a little bit.


For internal APIs, sometimes we don't want the same quality level as 
public APIs.  I think this set of class will be widely used by new EC 
curves, ChaCha20/Poly1305, or more in the future.  It would be nice if 
we could do it good at the beginning.





IntegerModuloP_Base.java

default byte[] addModPowerTwo(IntegerModuloP_Base b, int len)
void addModPowerTwo(IntegerModuloP_Base b, byte[] result);

For the first sign of the method names, I thought it is to calculate 
as "(this + b) ^ 2 mod m". 


To be precise, it calculates "((this % p) + (b % p)) % 2^m" (where p is 
the prime that defines the field, and m is the desired length, in bits). 
Note that the addition here is normal integer addition (not addition in 
GF(p)).


This operation is not used in XDH, but it is used in Poly1305 to add the 
AES encryption of a nonce to a field element. So you can get more 
information about this operation by reading the Poly1305 paper/RFC.


I was not meant to say the function of the method.  I meant that the 
method name is a little bit misleading, not very straightforward to me.



Besides, what's the benefits of the two methods?  Could we just use:
  this.add(b).asByteArray()


No, because that would calculate "((this + b) mod p) mod 2^m". The value 
of (this + b) can be larger than p, so this would not produce the 
desired result.

 >>
I guess, but not very sure, it is for constant time calculation. If 
the function is required, could it be renamed as:


  // the result is inside of the size range
  IntegerModuloP addModSize(IntegerModuloP_Base b, int size)
Or
  // the result is wrapped if outside of the size range
  IntegerModuloP addOnWrap(IntegerModuloP_Base b, int size)

and the use may look like:
  this.addModSize(b, size).asByteArray()



Any attempt to perform the addition in IntegerModuloP and then pull out 
the byte array will not work.
Does it mean if I perform a addition, and cannot get the byte array in 
the following step?

 that = this.add(b);
 byte[] 

Re: RFR 8181594: Efficient and constant-time modular arithmetic

2018-02-26 Thread Adam Petcher

Thanks for the initial feedback. Here is the latest webrev:

http://cr.openjdk.java.net/~apetcher/8181594/webrev.01/

See inline below.

On 2/23/2018 12:46 PM, Xuelei Fan wrote:


ArrayUtil.java:
===
I'm not very sure how widely this utilities will be used in the 
future. Looks like only BigIntegerModuloP uses this classes.  I may 
prefer to define private methods for byte array swap in 
BigIntegerModuloP.


It is also used by XDHPublicKeyImpl (in the XDH code review). XDH public 
keys are represented as BigInteger, and I use the array reverse method 
to convert encoded keys to BigInteger.




BigIntegerModuloP.java:
===
As this is a class for testing or ptototype purpose,  it might not be 
a part of JDK products, like JRE.  Would you mind move it to a test 
package if you want to keep it?


Good idea. I moved it into the same directory as TestIntegerModuloP.java.



IntegerModuloP, IntegerModuloP_Base and MutableIntegerModuloP
=
In the security package/context, it may make sense to use 
"IntegerModulo" for the referring to "integers modulo a prime value". 


In ECC, we also have curves over binary fields, and some methods in 
these interfaces/classes will not work correctly if the number of field 
elements is not prime (e.g. Fermat's little theorem is used to calculate 
a multiplicative inverse). So I would prefer that the names of these 
interfaces contain some string (e.g. "Prime" or "P") to make it clear 
that they only represent prime fields.


The class name of "IntegerModuloP_Base" is not a general Java coding 
style.  I may prefer a little bit name changes like:

 IntegerModuloP_Base   -> IntegerModulo
 IntegerModuloP    -> ImmutableIntegerModulo
 MutableIntegerModuloP -> MutableIntegerModulo

 IntegerFieldModuloP   -> IntegerModuloField (?)



I'm not fond of the current names, either. My goal with the naming was 
to indicate that "IntegerModuloP_Base" shouldn't be used in most cases, 
because you don't know whether it is mutable or not. So the typical 
interface to use is the immutable one, and the mutable one should only 
be used when mutability is necessary. Your suggested naming is more 
conventional, but I think it loses this aspect of my naming system.


It's only an internal API, so I'm not too picky about the names, and I 
have no problem changing it as you suggest. But I'd like to see if 
anyone else has any suggestions about the names, first.




MutableIntegerModuloP.java
==
void conditionalSwapWith(MutableIntegerModuloP b, int swap);
As the 'swap' parameter can only be 0 or 1, could it be a boolean 
parameter?


I couldn't come up with a way to implement this without branching when 
the swap parameter is boolean. See IntegerPolynomial.conditionalSwap to 
see how this is implemented in arithmetic with an int swap argument. If 
you (or anyone) can think of a way to do this with boolean, let me know.


I added a sentence to the comment above conditionalSwapWith that 
describes why it is an int instead of a boolean.




Except the conditionalSwapWith() method, I did not get the points why 
we need a mutable version.  Would you please have more description of 
this requirement?


The comment above the class definition has this sentence:

"This interface can be used to improve performance and avoid the 
allocation of a large number of temporary objects."


Do you need more information than this in the comments? The performance 
motivation is so that a.add(b).multiply(c)... can be done without 
allocating a new buffer for each operation. For example, without mutable 
field elements, an X25519 point multiplication would allocate around 
4,300 temporary arrays totaling 350,000 bytes. If I remember correctly, 
switching the X25519 implementation to mutable field elements reduced 
the point multiplication time by about half.





IntegerModuloP_Base.java

default byte[] addModPowerTwo(IntegerModuloP_Base b, int len)
void addModPowerTwo(IntegerModuloP_Base b, byte[] result);

For the first sign of the method names, I thought it is to calculate 
as "(this + b) ^ 2 mod m". 


To be precise, it calculates "((this % p) + (b % p)) % 2^m" (where p is 
the prime that defines the field, and m is the desired length, in bits). 
Note that the addition here is normal integer addition (not addition in 
GF(p)).


This operation is not used in XDH, but it is used in Poly1305 to add the 
AES encryption of a nonce to a field element. So you can get more 
information about this operation by reading the Poly1305 paper/RFC.



Besides, what's the benefits of the two methods?  Could we just use:
  this.add(b).asByteArray()


No, because that would calculate "((this + b) mod p) mod 2^m". The value 
of (this + b) can be larger than p, so this would not produce the 
desired result.




I guess, but not very sure, it is for constant time calculation. If 
the 

Re: RFR 8181594: Efficient and constant-time modular arithmetic

2018-02-23 Thread Xuelei Fan

ArrayUtil.java:
===
I'm not very sure how widely this utilities will be used in the future. 
Looks like only BigIntegerModuloP uses this classes.  I may prefer to 
define private methods for byte array swap in BigIntegerModuloP.



BigIntegerModuloP.java:
===
As this is a class for testing or ptototype purpose,  it might not be a 
part of JDK products, like JRE.  Would you mind move it to a test 
package if you want to keep it?



IntegerModuloP, IntegerModuloP_Base and MutableIntegerModuloP
=
In the security package/context, it may make sense to use 
"IntegerModulo" for the referring to "integers modulo a prime value". 
The class name of "IntegerModuloP_Base" is not a general Java coding 
style.  I may prefer a little bit name changes like:

 IntegerModuloP_Base   -> IntegerModulo
 IntegerModuloP-> ImmutableIntegerModulo
 MutableIntegerModuloP -> MutableIntegerModulo

 IntegerFieldModuloP   -> IntegerModuloField (?)


MutableIntegerModuloP.java
==
void conditionalSwapWith(MutableIntegerModuloP b, int swap);
As the 'swap' parameter can only be 0 or 1, could it be a boolean parameter?

Except the conditionalSwapWith() method, I did not get the points why we 
need a mutable version.  Would you please have more description of this 
requirement?



IntegerModuloP_Base.java

default byte[] addModPowerTwo(IntegerModuloP_Base b, int len)
void addModPowerTwo(IntegerModuloP_Base b, byte[] result);

For the first sign of the method names, I thought it is to calculate as 
"(this + b) ^ 2 mod m".  Besides, what's the benefits of the two 
methods?  Could we just use:

  this.add(b).asByteArray()

I guess, but not very sure, it is for constant time calculation.  If the 
function is required, could it be renamed as:


  // the result is inside of the size range
  IntegerModuloP addModSize(IntegerModuloP_Base b, int size)
Or
  // the result is wrapped if outside of the size range
  IntegerModuloP addOnWrap(IntegerModuloP_Base b, int size)

and the use may look like:
  this.addModSize(b, size).asByteArray()


Will review the rest when I understand more about the interfaces design.

Thanks,
Xuelei

On 1/30/2018 8:52 AM, Adam Petcher wrote:

+core-libs-dev


On 1/26/2018 4:06 PM, Adam Petcher wrote:

JBS: https://bugs.openjdk.java.net/browse/JDK-8181594
Webrev: http://cr.openjdk.java.net/~apetcher/8181594/webrev.00/

This is a code review for the field arithmetic that will be used in 
implementations of X25519/X448 key agreement, the Poly1305 
authenticator, and EdDSA signatures. I believe that the library has 
all the features necessary for X25519/X448 and Poly1305, and I expect 
at most a couple of minor enhancements will be required to support 
EdDSA. There is no public API for this library, so we can change it in 
the future to suit the needs of new algorithms without breaking 
compatibility with external code. Still, I made an attempt to clearly 
structure and document the (internal) API, and I want to make sure it 
is understandable and easy to use.


This is not a general-purpose modular arithmetic library. It will only 
work well in circumstances where the sequence of operations is 
restricted, and where the prime that defines the field has some useful 
structure. Moreover, each new field will require some field-specific 
code that takes into account the structure of the prime and the way 
the field is used in the application. The initial implementation 
includes a field for Poly1305 and the fields for X25519/X448 which 
should also work for EdDSA.


The benefits of using this library are that it is much more efficient 
than using similar operations in BigInteger. Also, many operations are 
branch-free, making them suitable for use in a side-channel resistant 
implementation that does not branch on secrets.


To provide some context, I have attached a code snippet describing how 
this library can be used. The snippet is the constant-time Montgomery 
ladder from my X25519/X448 implementation, which I expect to be out 
for review soon. X25519/X448 only uses standard arithmetic operations, 
and the more unusual features (e.g. add modulo a power of 2) are 
needed by Poly1305.


The field arithmetic (for all fields) is implemented using a 32-bit 
representation similar to the one described in the Ed448 paper[1] (in 
the "Implementation on 32-bit platforms" section). Though my 
implementation uses signed limbs, and grade-school multiplication 
instead of Karatsuba. The argument for correctness is essentially the 
same for all three fields: the magnitude of each 64-bit limb is at 
most 2^(k-1) after reduction, except for the last limb which may have 
a magnitude of up to 2^k. The values of k are between 26 to 28 
(depending on the field), and we can calculate that the maximum 
magnitude for any limb during an add-multiply-carry-reduce 

Re: RFR 8181594: Efficient and constant-time modular arithmetic

2018-01-30 Thread Adam Petcher

+core-libs-dev


On 1/26/2018 4:06 PM, Adam Petcher wrote:

JBS: https://bugs.openjdk.java.net/browse/JDK-8181594
Webrev: http://cr.openjdk.java.net/~apetcher/8181594/webrev.00/

This is a code review for the field arithmetic that will be used in 
implementations of X25519/X448 key agreement, the Poly1305 
authenticator, and EdDSA signatures. I believe that the library has 
all the features necessary for X25519/X448 and Poly1305, and I expect 
at most a couple of minor enhancements will be required to support 
EdDSA. There is no public API for this library, so we can change it in 
the future to suit the needs of new algorithms without breaking 
compatibility with external code. Still, I made an attempt to clearly 
structure and document the (internal) API, and I want to make sure it 
is understandable and easy to use.


This is not a general-purpose modular arithmetic library. It will only 
work well in circumstances where the sequence of operations is 
restricted, and where the prime that defines the field has some useful 
structure. Moreover, each new field will require some field-specific 
code that takes into account the structure of the prime and the way 
the field is used in the application. The initial implementation 
includes a field for Poly1305 and the fields for X25519/X448 which 
should also work for EdDSA.


The benefits of using this library are that it is much more efficient 
than using similar operations in BigInteger. Also, many operations are 
branch-free, making them suitable for use in a side-channel resistant 
implementation that does not branch on secrets.


To provide some context, I have attached a code snippet describing how 
this library can be used. The snippet is the constant-time Montgomery 
ladder from my X25519/X448 implementation, which I expect to be out 
for review soon. X25519/X448 only uses standard arithmetic operations, 
and the more unusual features (e.g. add modulo a power of 2) are 
needed by Poly1305.


The field arithmetic (for all fields) is implemented using a 32-bit 
representation similar to the one described in the Ed448 paper[1] (in 
the "Implementation on 32-bit platforms" section). Though my 
implementation uses signed limbs, and grade-school multiplication 
instead of Karatsuba. The argument for correctness is essentially the 
same for all three fields: the magnitude of each 64-bit limb is at 
most 2^(k-1) after reduction, except for the last limb which may have 
a magnitude of up to 2^k. The values of k are between 26 to 28 
(depending on the field), and we can calculate that the maximum 
magnitude for any limb during an add-multiply-carry-reduce sequence is 
always less than 2^63. Therefore, no overflow occurs and all 
operations are correct.


Process note: this enhancement is part of JEP 324 (Key Agreement with 
Curve25519 and Curve448). When this code review is complete, nothing 
will happen until all other work for this JEP is complete, and the JEP 
is accepted as part of some release. This means that this code will be 
pushed to the repo along with the X25519/X448 code that uses it.


[1] https://eprint.iacr.org/2015/625.pdf





private IntegerModuloP_Base pointMultiply(byte[] k, IntegerModuloP u){

IntegerModuloP x_1 = u;
MutableIntegerModuloP x_2 = one.mutable();
MutableIntegerModuloP z_2 = zero.mutable();
MutableIntegerModuloP x_3 = u.mutable();
MutableIntegerModuloP z_3 = one.mutable();
int swap = 0;

// Variables below are reused to avoid unnecessary allocation
// They will be assigned in the loop, so initial value doesn't matter
MutableIntegerModuloP m1 = zero.mutable();
MutableIntegerModuloP DA = zero.mutable();
MutableIntegerModuloP E = zero.mutable();
MutableIntegerModuloP a24_times_E = zero.mutable();

for(int t = params.getBits() - 1; t >= 0; t--){
int k_t = bitAt(k, t);
swap = swap ^ k_t;
x_2.conditionalSwapWith(x_3, swap);
z_2.conditionalSwapWith(z_3, swap);
swap = k_t;

// A(m1) = x_2 + z_2
m1.setValue(x_2).setSum(z_2);
// D = x_3 - z_3
// DA = D * A(m1)
DA.setValue(x_3).setDifference(z_3).setProduct(m1);
// AA(m1) = A(m1)^2
m1.setSquare();
// B(x_2) = x_2 - z_2
x_2.setDifference(z_2);
// C = x_3 + z_3
// CB(x_3) = C * B(x_2)
x_3.setSum(z_3).setProduct(x_2);
// BB(x_2) = B^2
x_2.setSquare();
// E = AA(m1) - BB(x_2)
E.setValue(m1).setDifference(x_2);
// compute a24 * E using SmallValue
a24_times_E.setValue(E);
a24_times_E.setProduct(a24);

// assign results to x_3, z_3, x_2, z_2
// x_2 = AA(m1) * BB
x_2.setProduct(m1);
// z_2 = E * (AA(m1) + a24 * E)