Re: ARM Neon popcount

2013-02-28 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: What about vldm? Like vldm up!, {q0,q1,q2,q3} As far as I understand the manual, it supports a larger number of registers. The registers must be consecutive, but that's no problem here. I added a long list of things to try.

Re: ARM Neon popcount

2013-02-27 Thread Richard Henderson
On 2013-02-27 13:27, Torbjorn Granlund wrote: Specific questions: * I completely ignore alignment. Is that bad? I'm not sure about that. It's something that perhaps we should experiment with. As written, the code will work, as the chip will handle totally unaligned data. What I don't

Re: ARM Neon popcount

2013-02-27 Thread Richard Henderson
On 2013-02-27 14:33, Torbjorn Granlund wrote: vld1.32 { q1, q2 }, [r0@128]! As specified in section A.3.2.1, if you specify the alignment it will also be checked, so you'll get SIGBUS if it's not right. I wanted to experiment, but I cannot find any syntax which is accepted
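
To illustrate the alignment point: a load qualified with a 128-bit alignment, as in the `[r0@128]` form quoted above, needs an address that is a multiple of 16 bytes, so a caller would have to check or establish that before taking such a path. The following is only a minimal sketch in C, not code from this thread; the helper names `is_aligned` and `words_until_128bit_aligned` are hypothetical, and it assumes the operand is an array of 32-bit words that is at least 4-byte aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if p is a multiple of `align` bytes.
   `align` must be a power of two. */
static int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Hypothetical helper: number of leading 32-bit words to handle with
   scalar code before p reaches a 16-byte (128-bit) boundary.
   Assumes p is at least 4-byte aligned. */
static size_t words_until_128bit_aligned(const uint32_t *p)
{
    size_t misalign = (uintptr_t)p & 15;   /* byte offset within a 16-byte block */
    return misalign ? (16 - misalign) / 4 : 0;
}
```

A loop could process `words_until_128bit_aligned(up)` words one at a time and only then switch to alignment-qualified vector loads; whether that is faster than plain unaligned loads is exactly the kind of thing the thread suggests measuring.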

Re: ARM Neon popcount

2013-02-27 Thread Niels Möller
Richard Henderson r...@twiddle.net writes: On 2013-02-27 13:27, Torbjorn Granlund wrote: * Can one read four 128-bit values using just one insn (for inner loop)? No. We can only read 4 64-bit values. I didn't actually realize the assembler would accept Q registers in the list grammar