ni...@lysator.liu.se (Niels Möller) writes:
What about vldm? Like
vldm up!, {q0,q1,q2,q3}
As far as I understand the manual, it supports a larger number of
registers. The registers must be consecutive, but that's no problem
here.
I added a long list of things to try.
I decided to play a bit with Neon, but instead of doing something hard
like addmul_k, I wrote an mpn_popcount. :-)
The code runs well on A15, at about 0.56 c/l, but much worse on A9, at
about 2.8 c/l. (The inner loop's hard whacking on q8 is a problem on A9;
using q8 and q9 alternately shaves
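For readers unfamiliar with vectorized popcount, the C sketch below models one common strategy. It counts the bits in each byte, which is what `vcnt.8` does per lane. It then adds adjacent byte counts into wider 16-bit accumulators, which is what `vpadal.u8` does, so that the narrow counters never overflow. This is a portable illustration under stated assumptions and is not the Neon code discussed in this thread. It assumes 32-bit limbs, as on 32-bit ARM. `popcount_ref` and `popcount_bytewise` are hypothetical names and are not GMP's `mpn_popcount`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t limb_t;   /* assumption: 32-bit limbs, as on 32-bit ARM */

/* Reference: count set bits one limb at a time. */
uint64_t popcount_ref(const limb_t *up, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t x = up[i];
        while (x) {
            x &= x - 1;    /* clear the lowest set bit */
            total++;
        }
    }
    return total;
}

/* Bit count of one byte: the scalar analogue of one vcnt.8 lane. */
static uint8_t byte_popcount(uint8_t b)
{
    uint8_t c = 0;
    while (b) {
        b &= (uint8_t)(b - 1);
        c++;
    }
    return c;
}

/* Model of the vector strategy: 16 bytes (one Q register, 4 limbs) per
   step.  Byte counts are added pairwise into eight 16-bit accumulators,
   mimicking vpadal.u8.  Each lane gains at most 16 per step, so the lanes
   are flushed to a 64-bit total every 4095 steps (4095 * 16 <= 65535). */
uint64_t popcount_bytewise(const limb_t *up, size_t n)
{
    uint64_t total = 0;
    uint16_t acc[8] = {0};
    unsigned steps = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        uint8_t bytes[16];
        memcpy(bytes, up + i, sizeof bytes);  /* no alignment requirement */
        for (int lane = 0; lane < 8; lane++)
            acc[lane] += (uint16_t)(byte_popcount(bytes[2 * lane]) +
                                    byte_popcount(bytes[2 * lane + 1]));
        if (++steps == 4095) {
            for (int lane = 0; lane < 8; lane++) {
                total += acc[lane];
                acc[lane] = 0;
            }
            steps = 0;
        }
    }
    for (int lane = 0; lane < 8; lane++)
        total += acc[lane];

    /* Leftover limbs (n not a multiple of 4) are counted one at a time. */
    return total + popcount_ref(up + i, n - i);
}
```

A cheap way to validate such a model before translating it to Neon is to compare `popcount_bytewise` with `popcount_ref` on random inputs and on lengths that are not multiples of 4.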
On 2013-02-27 13:27, Torbjorn Granlund wrote:
Specific questions:
* I completely ignore alignment. Is that bad?
I'm not sure about that. It's something that perhaps we should
experiment with. As written, the code will work, as the chip will
handle totally unaligned data. What I don't
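If one wanted to experiment with exploiting alignment, for example to use an alignment-hinted load in the main loop, a common approach is a three-part split. First, peel off leading limbs until the pointer reaches a 16-byte boundary. Next, run the main loop over whole aligned 16-byte blocks. Finally, finish with a short tail. The C sketch below only computes that split. `split_for_alignment` and `struct split` are hypothetical names. It assumes 32-bit limbs, a 16-byte (128-bit) boundary matching one Q register, and a limb-aligned input pointer.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;   /* assumption: 32-bit limbs */

struct split {
    size_t head;   /* limbs before the first 16-byte boundary */
    size_t body;   /* limbs in whole aligned 16-byte blocks (multiple of 4) */
    size_t tail;   /* limbs left after the last full block */
};

/* Split {up, n} so that up + head is 16-byte aligned whenever n allows it.
   Assumes up is at least 4-byte aligned, so head is 0..3 limbs. */
struct split split_for_alignment(const limb_t *up, size_t n)
{
    const size_t limbs_per_block = 16 / sizeof(limb_t);      /* 4 */
    size_t misalign = (size_t)((uintptr_t)up & 15);          /* 0, 4, 8 or 12 */
    size_t head = misalign ? (16 - misalign) / sizeof(limb_t) : 0;
    struct split s;

    if (head > n)
        head = n;
    s.head = head;
    s.body = (n - head) / limbs_per_block * limbs_per_block;
    s.tail = n - head - s.body;
    return s;
}
```

With this split, only the `body` part would be processed with aligned loads. The head and tail would go through unaligned (or scalar) code, so an alignment fault can never be triggered by the main loop.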
On 2013-02-27 14:33, Torbjorn Granlund wrote:
vld1.32 { q1, q2 }, [r0@128]!
As specified in section A.3.2.1, if you specify the alignment it will
also be checked, so you'll get SIGBUS if it's not right.
I wanted to experiment, but I cannot find any syntax which is accepted
Richard Henderson r...@twiddle.net writes:
On 2013-02-27 13:27, Torbjorn Granlund wrote:
* Can one read four 128-bit values using just one insn (for inner loop)?
No. We can only read 4 64-bit values. I didn't actually realize the
assembler would accept Q registers in the list grammar