I decided to play a bit with Neon, but instead of doing something hard
like addmul_k, I wrote an mpn_popcount. :-)
The code runs well on A15 at about 0.56 c/l, but much worse on A9 at
about 2.8 c/l. (The inner loop's hard whacking on q8 is a problem on A9;
using q8 and q9 alternately shaves
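For anyone wanting to check a Neon popcount against something simple, here is a minimal portable C sketch of a reference routine. It is not the code from the message above. The names `limb_t`, `limb_popcount` and `ref_popcount` are made up for illustration, and the 32-bit limb is an assumption (GMP's real limb type depends on the ABI).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limb type for this sketch; assumed to be 32 bits. */
typedef uint32_t limb_t;

/* Count the set bits of one limb with the classic parallel-sum trick:
   sum adjacent 1-bit fields, then 2-bit fields, then 4-bit fields, and
   finally add the four byte counts together with a multiply. */
static unsigned limb_popcount(limb_t x)
{
    x = x - ((x >> 1) & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x + (x >> 4)) & 0x0f0f0f0fu;
    return (unsigned)((limb_t)(x * 0x01010101u) >> 24);
}

/* Reference popcount over n limbs starting at p. Plain scalar loop,
   intended only as a correctness oracle for a vectorized routine. */
static unsigned long ref_popcount(const limb_t *p, size_t n)
{
    assert(n == 0 || p != NULL);
    unsigned long total = 0;
    for (size_t i = 0; i < n; i++)
        total += limb_popcount(p[i]);
    return total;
}
```

Comparing the result of this loop with the assembly routine's result over random operands of varying length is probably the cheapest way to catch tail-handling bugs.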
On 2013-02-27 13:27, Torbjorn Granlund wrote:
Specific questions:
* I completely ignore alignment. Is that bad?
I'm not sure about that. It's something that perhaps we should
experiment with. As written, the code will work, as the chip will
handle totally unaligned data. What I don't
On 2013-02-27 14:33, Torbjorn Granlund wrote:
vld1.32 { q1, q2 }, [r0@128]!
As specified in section A.3.2.1, if you specify the alignment it will
also be checked, so you'll get SIGBUS if it's not right.
I wanted to experiment, but I cannot find any syntax which is accepted
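On the `@128` qualifier quoted above: since the alignment is checked when it is specified, the caller has to know the pointer really is 16-byte aligned before taking that path. A minimal C sketch of such a check follows. The helper names `is_aligned` and `bytes_to_align16` are hypothetical and do not come from GMP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return nonzero if p is aligned to `align` bytes.
   `align` must be a nonzero power of two. */
static int is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Number of leading bytes to handle separately before p reaches a
   16-byte (128-bit) boundary; 0 if p is already aligned. */
static size_t bytes_to_align16(const void *p)
{
    return (size_t)(-(uintptr_t)p & 15u);
}
```

A routine could process `bytes_to_align16(p)` bytes with unqualified loads and only then enter a loop that uses the alignment-qualified form. Whether that is worth the extra code is exactly the thing that would need measuring.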
Several times over the past week, as I debugged my Neon routines, it has
become painfully apparent (as I accidentally single-step into the
dynamic linker) that the shared libgmp could use some help in
modernizing its internal linkage.
Most important is arranging for calls within GMP to go through
Richard Henderson r...@twiddle.net writes:
On 2013-02-27 13:27, Torbjorn Granlund wrote:
* Can one read four 128-bit values using just one insn (for inner loop)?
No. We can only read 4 64-bit values. I didn't actually realize the
assembler would accept Q registers in the list grammar