> This is what I was looking for. Running the benchmark on two very
> different systems was revealing: on my Pentium G620 the ntor
> server-side time was ~300 uSec, an Allwinner A20 system completed the
> server-side code in ~10600 uSec.

One of the things on my TODO list is to use NEON for the X25519 scalar
multiplication on ARM targets that support it, since it's a decent
performance increase, at least on 32-bit ARM.
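
For what it's worth, the "capable of such" part could be decided at
runtime on Linux along these lines. This is only a rough sketch, not
anything that exists in the tree: cpu_has_neon(), x25519_impl_t and
pick_x25519_impl() are names made up for illustration, and it assumes a
glibc new enough to provide getauxval(). On anything that is not
Linux/ARM it just falls back to the generic code.

```c
#if defined(__linux__) && (defined(__arm__) || defined(__aarch64__))
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP */
#include <asm/hwcap.h>  /* HWCAP_NEON (32-bit ARM), HWCAP_ASIMD (AArch64) */
#endif

/* Hypothetical helper: returns 1 if the kernel reports NEON / Advanced
 * SIMD support for this CPU, 0 otherwise (including on non-ARM hosts). */
int cpu_has_neon(void)
{
#if defined(__linux__) && defined(__arm__)
  return (getauxval(AT_HWCAP) & HWCAP_NEON) != 0;
#elif defined(__linux__) && defined(__aarch64__)
  return (getauxval(AT_HWCAP) & HWCAP_ASIMD) != 0;
#else
  return 0;
#endif
}

/* Hypothetical selector for which scalar mult implementation to use. */
typedef enum {
  X25519_IMPL_GENERIC, /* portable C implementation */
  X25519_IMPL_NEON     /* NEON-accelerated implementation */
} x25519_impl_t;

x25519_impl_t pick_x25519_impl(void)
{
  return cpu_has_neon() ? X25519_IMPL_NEON : X25519_IMPL_GENERIC;
}
```

The idea would be to call pick_x25519_impl() once at startup and cache
the result, so the handshake path never pays for the check.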

One day I will also get an AArch64 target and figure out optimization
there, since it's the way of the future.


Yawning Angel

