The recent work on bit arrays in Dyalog APL is the source of numerous
serious and subtle bugs.  I am not speaking on any official capacity.



On Mon, Aug 2, 2021 at 2:56 PM Marshall Lochbaum <[email protected]>
wrote:

> Is (64-bit) J still using 8 bytes for all its non-boolean integers and
> floats? If so I think that mostly explains the difference. You can't
> gain that much performance by turning 8 bits to 1 when you're still
> regularly dealing with 64-bit values. The biggest advantage for bit
> arrays in Dyalog was usually for sections of the program that work
> mainly with booleans. These sections become free, essentially. This is
> relevant when the other parts are dealing with 1- and 2-byte values, not
> 8-byte ones.
>
> For (x # y), looks like you pack x to booleans and then find the next 1
> bit at each iteration? Assuming I'm reading right, this method isn't
> fast in general. I used something similar for the sparse case in Dyalog
> but switch over when ((+/%#) x) is more than 1/16 or so. A lookup table
> converting bytes to lists of indices is much better for dense cases. I
> describe it in my notes here:
>
> https://mlochbaum.github.io/BQN/implementation/primitive/replicate.html
>
> There's more improvement to be had with SIMD if y consists of 1- or
> 2-byte elements: in those cases it's worthwhile to skip the index
> generation and use shuffle instructions to filter data directly.
>
> Marshall
>
> On Mon, Aug 02, 2021 at 04:55:06PM -0400, Henry Rich wrote:
> > 1a. What's an example that is greatly sped up by bitarrays?
> >
> > 1b. J uses bitarrays internally in dyad i. when the arguments are large
> and
> > integral.
> >
> > 1c. I haven't seen enough advantage to suggest that bitarrays are worth
> > implementing.  (x # y), perhaps the most important case, is dominated by
> the
> > time to move y, not x; and JE does its best to handle long runs of 0s
> > quickly.
> >
> > 1d. With bitarrays, a single pointer is not enough to designate an atom
> of
> > an array.
> >
> > 1e. For internal structures (as in u;.1), JE uses run-length-encoded
> tables,
> > which IMO are better than bitarrays.
> >
> > 1f. I am open to persuasion on this point but a new datatype is a great
> deal
> > of work & I need to to hear a compelling example.
> >
> >
> > 2a. Padding to what boundary?  The boundary that would satisfy all needs
> is
> > 32 or 64 bytes, and that would make an nx2 character array pretty big.
> >
> > 2b. Given J's frequent use of variable-length subarrays, you would not
> want
> > unaligned operations to be much slower than aligned ones.
> >
> > 2c. Packing the atoms together makes the loops that iterate over atoms
> > somewhat faster.
> >
> > 2d. Alignment is only one of many factors affecting performance.
> Operation
> > in place is usually more important.
> >
> > 2e. As an old hardware designer, I expect that unaligned stores are more
> > expensive than unaligned loads.  From my measurements, this is the
> case.  JE
> > executes to avoid unaligned stores on large arguments.
> >
> > 2f. As it happens, I tried rewriting (x |: y) recently to execute in
> place &
> > had to scrap the project because the unaligned stores were too
> expensive.
> > But that's the ONLY case I've found where alignment caused trouble.
> >
> > Henry Rich
> >
> >
> >
> > On 8/2/2021 4:01 PM, Elijah Stone wrote:
> > > Apologies if this is the wrong place for this.
> > >
> > > First: what's the deal with bitarrays?  I have heard APL implementors
> > > state with great conviction that they are worthwhile and have led to
> > > great performance increases; and I have heard J implementors state with
> > > equal conviction that they are not worth their while at all.  I don't
> > > know whom to believe!
> > >
> > > Second: is it worth it to add padding for >1-rank arrays?  It seems
> > > convenient for major cells (and perhaps more minor cells as well) to be
> > > aligned, and I know of at least one high-performance in-place high-rank
> > > transposition algorithm which requires some degree of
> padding/alignment.
> > > J's array structure doesn't seem to make any allowance for more than
> one
> > > shape.  Has this approach been tried and found lacking, or is it just
> > > untrodden ground?
> > >
> > >  -E
> > > ----------------------------------------------------------------------
> > > For information about J forums see http://www.jsoftware.com/forums.htm
> >
> >
> > --
> > This email has been checked for viruses by AVG.
> > https://www.avg.com
> >
> > ----------------------------------------------------------------------
> > For information about J forums see http://www.jsoftware.com/forums.htm
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
>
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Reply via email to