Re: [Jprogramming] Two implementation questions

Marshall Lochbaum Mon, 02 Aug 2021 18:04:28 -0700

I wrote maybe 50k lines of code over two years for Dyalog 18.0. Yes,
there are bugs. There were plenty of bugs when I joined too. Frankly I
think Dyalog's response to 18.0 says as much about its relationship with
clients as about the nature of the release.


Marshall

On Mon, Aug 02, 2021 at 05:30:43PM -0700, Roger Hui wrote:
> The recent work on bit arrays in Dyalog APL is the source of numerous
> serious and subtle bugs.  I am not speaking on any official capacity.
> 
> 
> 
> On Mon, Aug 2, 2021 at 2:56 PM Marshall Lochbaum <[email protected]>
> wrote:
> 
> > Is (64-bit) J still using 8 bytes for all its non-boolean integers and
> > floats? If so I think that mostly explains the difference. You can't
> > gain that much performance by turning 8 bits to 1 when you're still
> > regularly dealing with 64-bit values. The biggest advantage for bit
> > arrays in Dyalog was usually for sections of the program that work
> > mainly with booleans. These sections become free, essentially. This is
> > relevant when the other parts are dealing with 1- and 2-byte values, not
> > 8-byte ones.
> >
> > For (x # y), looks like you pack x to booleans and then find the next 1
> > bit at each iteration? Assuming I'm reading right, this method isn't
> > fast in general. I used something similar for the sparse case in Dyalog
> > but switch over when ((+/%#) x) is more than 1/16 or so. A lookup table
> > converting bytes to lists of indices is much better for dense cases. I
> > describe it in my notes here:
> >
> > https://mlochbaum.github.io/BQN/implementation/primitive/replicate.html
> >
> > There's more improvement to be had with SIMD if y consists of 1- or
> > 2-byte elements: in those cases it's worthwhile to skip the index
> > generation and use shuffle instructions to filter data directly.
> >
> > Marshall
> >
> > On Mon, Aug 02, 2021 at 04:55:06PM -0400, Henry Rich wrote:
> > > 1a. What's an example that is greatly sped up by bitarrays?
> > >
> > > 1b. J uses bitarrays internally in dyad i. when the arguments are large
> > and
> > > integral.
> > >
> > > 1c. I haven't seen enough advantage to suggest that bitarrays are worth
> > > implementing.  (x # y), perhaps the most important case, is dominated by
> > the
> > > time to move y, not x; and JE does its best to handle long runs of 0s
> > > quickly.
> > >
> > > 1d. With bitarrays, a single pointer is not enough to designate an atom
> > of
> > > an array.
> > >
> > > 1e. For internal structures (as in u;.1), JE uses run-length-encoded
> > tables,
> > > which IMO are better than bitarrays.
> > >
> > > 1f. I am open to persuasion on this point but a new datatype is a great
> > deal
> > > of work & I need to to hear a compelling example.
> > >
> > >
> > > 2a. Padding to what boundary?  The boundary that would satisfy all needs
> > is
> > > 32 or 64 bytes, and that would make an nx2 character array pretty big.
> > >
> > > 2b. Given J's frequent use of variable-length subarrays, you would not
> > want
> > > unaligned operations to be much slower than aligned ones.
> > >
> > > 2c. Packing the atoms together makes the loops that iterate over atoms
> > > somewhat faster.
> > >
> > > 2d. Alignment is only one of many factors affecting performance.
> > Operation
> > > in place is usually more important.
> > >
> > > 2e. As an old hardware designer, I expect that unaligned stores are more
> > > expensive than unaligned loads.  From my measurements, this is the
> > case.  JE
> > > executes to avoid unaligned stores on large arguments.
> > >
> > > 2f. As it happens, I tried rewriting (x |: y) recently to execute in
> > place &
> > > had to scrap the project because the unaligned stores were too
> > expensive.
> > > But that's the ONLY case I've found where alignment caused trouble.
> > >
> > > Henry Rich
> > >
> > >
> > >
> > > On 8/2/2021 4:01 PM, Elijah Stone wrote:
> > > > Apologies if this is the wrong place for this.
> > > >
> > > > First: what's the deal with bitarrays?  I have heard APL implementors
> > > > state with great conviction that they are worthwhile and have led to
> > > > great performance increases; and I have heard J implementors state with
> > > > equal conviction that they are not worth their while at all.  I don't
> > > > know whom to believe!
> > > >
> > > > Second: is it worth it to add padding for >1-rank arrays?  It seems
> > > > convenient for major cells (and perhaps more minor cells as well) to be
> > > > aligned, and I know of at least one high-performance in-place high-rank
> > > > transposition algorithm which requires some degree of
> > padding/alignment.
> > > > J's array structure doesn't seem to make any allowance for more than
> > one
> > > > shape.  Has this approach been tried and found lacking, or is it just
> > > > untrodden ground?
> > > >
> > > >  -E
> > > > ----------------------------------------------------------------------
> > > > For information about J forums see http://www.jsoftware.com/forums.htm
> > >
> > >
> > > --
> > > This email has been checked for viruses by AVG.
> > > https://www.avg.com
> > >
> > > ----------------------------------------------------------------------
> > > For information about J forums see http://www.jsoftware.com/forums.htm
> > ----------------------------------------------------------------------
> > For information about J forums see http://www.jsoftware.com/forums.htm
> >
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Re: [Jprogramming] Two implementation questions

Reply via email to