On Mon, 2 Aug 2021, Henry Rich wrote:
> 1a. What's an example that is greatly sped up by bitarrays?

Boolean ops (and/or/xor), which can process a whole machine word of
atoms at a time.
Perhaps more importantly, lower cache usage improves overall program
locality. (Though the reverse holds, twice over, every time you have to
promote a bitarray to a bytearray.)
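
To illustrate both halves of that tradeoff, here is a minimal C sketch.
It is not JE code: bits_and and bits_to_bytes are hypothetical names,
and packing 64 booleans per uint64_t, least significant bit first, is
an assumption. One word-wide AND handles 64 atoms, whereas promotion to
a bytearray has to touch every atom individually.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: z = x AND y over nbits packed booleans.
   Each uint64_t holds 64 atoms, so one AND covers 64 of them. */
static void bits_and(uint64_t *z, const uint64_t *x, const uint64_t *y,
                     size_t nbits)
{
    assert(z && x && y);
    size_t nwords = (nbits + 63) / 64;  /* round up to whole words */
    for (size_t i = 0; i < nwords; i++)
        z[i] = x[i] & y[i];
}

/* Hypothetical promotion to one byte per atom. This is the step that
   gives back the space and locality advantage. */
static void bits_to_bytes(uint8_t *dst, const uint64_t *src, size_t nbits)
{
    assert(dst && src);
    for (size_t i = 0; i < nbits; i++)
        dst[i] = (uint8_t)((src[i / 64] >> (i % 64)) & 1u);
}
```

The promoted copy takes eight times the memory of the packed one, which
is where the locality argument turns around.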

> 1b. J uses bitarrays internally in dyad i. when the arguments are large
> and integral.

That is 'third class' in vi.c? I see--somewhat. I will study that code
more closely.

> 1d. With bitarrays, a single pointer is not enough to designate an atom
> of an array.

Yes, and you cannot associate a size with an atom either. I have been
tentatively supporting them in my own (nascent) implementation and there
are a lot of special cases to support. Hence the initial question.
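
As a concrete picture of the problem, a reference to one atom of a
bitarray might look like the following C sketch. The bitref type and
the bit_at, bit_get and bit_set helpers are hypothetical names made up
for illustration, and 64-bit words with least-significant-bit-first
numbering are an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical atom reference: a word pointer plus a bit position.
   A bare pointer can only name the containing word. */
typedef struct {
    uint64_t *word;
    unsigned bit;   /* 0..63 within *word */
} bitref;

/* Build a reference to atom i of the packed array starting at base. */
static bitref bit_at(uint64_t *base, size_t i)
{
    bitref r;
    r.word = base + i / 64;
    r.bit = (unsigned)(i % 64);
    return r;
}

/* Read the referenced atom as 0 or 1. */
static int bit_get(bitref r)
{
    assert(r.bit < 64);
    return (int)((*r.word >> r.bit) & 1u);
}

/* Write the referenced atom; needs a read-modify-write of the word. */
static void bit_set(bitref r, int v)
{
    assert(r.bit < 64);
    uint64_t mask = (uint64_t)1 << r.bit;
    *r.word = v ? (*r.word | mask) : (*r.word & ~mask);
}
```

Every primitive that otherwise walks an array with a plain pointer and
an element size needs a variant along these lines, which is where the
special cases come from.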
Obviously it's much easier to design a system from the ground up to
support such a feature than to add it after the fact; I certainly wasn't
proposing JE add such a feature, just curious in general about the
tradeoffs there.

> compelling example.

Single instruction 8x8 bitmatrix transpose on powerpc ;)

> 2a. Padding to what boundary? The boundary that would satisfy all needs
> is 32 or 64 bytes, and that would make an nx2 character array pretty
> big.

Good point. I guess you could do it heuristically, when rows are large
enough that the space overhead is negligible.
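
One way such a heuristic could look, as a C sketch under assumptions:
padded_stride is a hypothetical name, and the percentage threshold is
an arbitrary tuning knob rather than anything measured.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical heuristic: return the row stride in bytes. Rows are
   rounded up to a multiple of align only when the wasted bytes are at
   most max_overhead_pct percent of the unpadded row. */
static size_t padded_stride(size_t row_bytes, size_t align,
                            size_t max_overhead_pct)
{
    /* align must be a nonzero power of two */
    assert(align != 0 && (align & (align - 1)) == 0);
    size_t padded = (row_bytes + align - 1) & ~(align - 1);
    size_t waste = padded - row_bytes;
    if (waste * 100 <= row_bytes * max_overhead_pct)
        return padded;      /* overhead is negligible: pad */
    return row_bytes;       /* small rows stay tightly packed */
}
```

With a 64-byte boundary and a 5 percent limit, a 2-byte row (the nx2
character array) stays at 2 bytes, while a 1000-byte row is padded to
1024.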

> 2f. As it happens, I tried rewriting (x |: y) recently to execute in
> place & had to scrap the project because the unaligned stores were too
> expensive. But that's the ONLY case I've found where alignment caused
> trouble.

Here's the paper I was referring to:
https://www.researchgate.net/publication/273912700_In-Place_Matrix_Transposition_on_GPUs
(It refers specifically to monadic transpose; I'm not sure whether it
can be applied to the dyadic form.)
It wasn't actually to do with alignment, but rather with getting more
composite array dimensions. I believe that those aspects of the technique
are still useful on the CPU; their primary focus is on parallelism, but
they also improve locality. Particularly:

    the dimensions of an M × N matrix are factorized as M' × m × N' × n or a
    4D array, where M = M' × m and N = N' × n. This factorization defines a
    blocked format on the matrix

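
To make the factorization concrete, here is a small C sketch of the
index arithmetic. It is my own illustration, not code from the paper:
the blocking struct and both function names are hypothetical, and the
tile-contiguous layout (M' × N' × m × n) is shown only as one blocked
ordering that these four dimensions make expressible.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of the factorization M = Mp*m, N = Np*n. */
typedef struct {
    size_t Mp, m, Np, n;
} blocking;

/* Offset of element (i, j) when the row-major M x N matrix is viewed
   as a 4D array Mp x m x Np x n. Same memory, so this equals i*N + j. */
static size_t offset_4d(blocking b, size_t i, size_t j)
{
    assert(i < b.Mp * b.m && j < b.Np * b.n);
    size_t I = i / b.m, ii = i % b.m;   /* block row, row within block */
    size_t J = j / b.n, jj = j % b.n;   /* block col, col within block */
    return ((I * b.m + ii) * b.Np + J) * b.n + jj;
}

/* Offset of (i, j) after reordering to Mp x Np x m x n, so that each
   m x n tile occupies contiguous memory. */
static size_t offset_tiled(blocking b, size_t i, size_t j)
{
    assert(i < b.Mp * b.m && j < b.Np * b.n);
    size_t I = i / b.m, ii = i % b.m;
    size_t J = j / b.n, jj = j % b.n;
    return ((I * b.Np + J) * b.m + ii) * b.n + jj;
}
```

The locality benefit comes from the second form: all m × n elements of
a tile sit next to each other, whereas in the plain row-major view the
rows of a tile are N elements apart.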
-E