My 2c: I worked on an implementation of Sharp APL on HP-UX, written in C with 
Arthur Whitney in the 1980s (a 2-3 person project).

To fully replicate Sharp APL, full bit storage was implemented (up to 32 
booleans were packed into a 32-bit unsigned integer), so indexing into that 
list entailed a bit shift by the offset and masking out the value, and setting 
a bit was equally obfuscated.
RISC architectures (which had optimised bitwise operations) made this handling 
feasible at the time, but when dealing with rank-n arrays stored in ravel 
order, the overhead of managing the arrays outweighed any benefit from the 
bitwise operations.
The “bitwise overhead” was baggage repeated throughout the code, and the 
reduced storage advantage was offset by two things: (1) code complexity and 
(2) that repeated overhead.  A couple of considerations we encountered:
How to store boolean arrays, and whether to align on dimensions.
For example, do you store a 5 x 13 array in “full ravel order”, requiring 65 
contiguous bits (plus header), or do you “break” so that each row starts in a 
new integer?  (We chose the former, for maximum space saving.)
Array operations, e.g.:
- indexing into bit arrays stored in ravel order, for ad hoc indexing
- indexing by columns
- transposing a matrix
- matrix inverse (where one needs to pivot on rows/columns) … and for use 
cases, think of a connectivity matrix and transitive closure operations...
… all of these become heavily bloated with “bit handling”.  (This could be 
tidier if functions were used to encapsulate the datatype, but then you pay 
function-call overhead on every bit index … way too slow!)
It was left in the code to replicate Sharp APL, but in review we wished we had 
removed it to simplify the interpreter.
In Arthur’s later projects boolean bit storage was avoided; as an implementer 
Arthur leaned heavily toward a simpler core design, with which I agree.
I share Roger and Henry's view: it adds (fairly unnecessarily) to complexity, 
hurts reliability, and would entail a lot of work to retrofit.

The above is what I consider the “trade-offs”, and my advice would be: don’t 
implement it; it isn’t worth it.
Keep the J core lean and simple.  I think the J approach of 1 boolean per byte 
is a reasonable trade-off.

Regards Rob


> On 5 Aug 2021, at 12:35 pm, Henry Rich <[email protected]> wrote:
> 
> Your analysis is essentially correct, but perhaps improved compiler 
> technology might help a bit.
> 
> I faced the same problem recently (around 9.03 beta-i) with the J atomic 
> primitives.  I had discovered what has to be done to get the operation to run 
> at the full memory speed.  It required rewriting all the arithmetic monads 
> and dyads.  An alluring thought was: why not just convert to the highest 
> precision: then you would need only one routine per precision?  But a quick 
> analysis showed that the excess memory bandwidth required by the conversion 
> would be intolerable.
> 
> Instead, I wrote a #define macro that understands the main datatypes 
> (boolean/int/float).  This macro loops through the inputs, converting as 
> required, and (crucially) unrolling and aligning as needed.  A small bit of 
> code supplied with each invocation performs the operation on the homogeneous 
> arguments resulting from the conversion.  The macro is controlled by a 
> parameter describing the arguments, and bristles with tests of the parameter, 
> but the modern C compiler is brilliant at moving tests from runtime to 
> compiletime, and the generated code is lean.
> 
> So yes, it's a quadratic problem, but the quadratic part can be localized to 
> the conversion step, which is a small piece of the overall code.
> 
> You would still have to handle the non-atomic primitives.  For them, 
> conversion to the higher type is reasonable.
> 
> All that said, I still have not been convinced that a bit-boolean datatype is 
> worth the trouble.
> 
> Henry Rich
> 
> 
> On 8/4/2021 8:59 PM, Devon McCormick wrote:
>> Once, when I asked Roger why J did not have true bit arrays, he told me I
>> was welcome to fund it.  These days that would mean modifying the source
>> myself.
>> 
>> However, early APL implementers noted that the size of an interpreter is
>> proportional to the square of the number of data-types it supports.  In
>> practical terms, this means that if I wanted to put bit arrays into J, I
>> would have to modify every piece of code where there is interaction between
>> any two arrays of different type and figure out things like: can I make the
>> other array a bit array or do I have to promote my bit array to the more
>> complex type?
>> 
>> So, I can see that that would be more work than I am willing to put in for
>> the marginal utility it would give me.
>> 
>> 
>> On Tue, Aug 3, 2021 at 5:37 PM Raul Miller <[email protected]> wrote:
>> 
>>> I am not sure why you are talking about unicode, floats and arbitrary DLLs
>>> here.
>>> 
>>> The example I showed did not involve any of those.
>>> 
>>> And, especially if you are working with fixed width 1 dimensional
>>> arrays, in many cases you can use the b. derived verbs directly. It's
>>> only when you're not using array-at-a-time operations that you need
>>> concern yourself with packing/unpacking and/or padding operations.
>>> 
>>> Take care,
>>> 
>>> --
>>> Raul
>>> 
>>> 
>>> On Tue, Aug 3, 2021 at 5:00 PM greg heil <[email protected]> wrote:
>>>> Raul
>>>> 
>>>> indeed i can make up anything
>>>> using various b.
>>>> storing seems to be reliable
>>>> only as 8b characters
>>>> i think some 16b Unicode's
>>>> are illegal
>>>> so i cannot store and manipulate
>>>> 16 bit floats that way
>>>> to do 12b floats
>>>> i might store pairs
>>>> in 3 bytes
>>>> 
>>>> i suppose one could
>>>> communicate with the many
>>>> DLL's that access the
>>>> BFloat16 hiding in every AVX machine
>>>> by sifting in and out of
>>>> that format
>>>> though bytes are rather inconvenient
>>>> due to the vagaries
>>>> of how sign, multiplier and multiplicand
>>>> bits are stored
>>>> 
>>>> one would prefer that that was all handled
>>>> and could check for positivity with
>>>> a simple >0
>>>> rather than make a DLL call
>>>> or another sift
>>>> 
>>>> ~greg heil
>>>> https://picsrp.github.io
>>>> 
>>>> --
>>>> 
>>>> from: Raul Miller <[email protected]>
>>>> to: Programming forum <[email protected]>
>>>> date: Aug 2, 2021, 10:12 PM
>>>> subject: Re: [Jprogramming] Two implementation questions
>>>> 
>>>> https://www.jsoftware.com/help/dictionary/dbdotn.htm
>>>> 
>>>>    #~.(i.1024) 32 b. 1
>>>> 64
>>>> 
>>>>> Or, on my machine, the bitwise operations give me 64 bits packed in a J
>>> atom.
>>>>> Constructing mechanisms which make an array of such things appear to be
>>> a bit array of arbitrary dimensions should be straightforward for someone
>>> with your background. (Though you could also use them "as is" with an
>>> implicit trailing dimension of 64.)
>>>> Thanks,
>>>> 
>>>> --
>>>> Raul
>>>> 
>>>> --
>>>> 
>>>> from: greg heil <[email protected]>
>>>> to: Programming forum <[email protected]>
>>>> date: Aug 2, 2021, 8:33 PM
>>>> subject: Re: [Jprogramming] Two implementation questions
>>>> 
>>>> Henry
>>>> 
>>>> Hmm, how would one create an array of bits?
>>>> use packed 8-bit characters?
>>>> would one use a similar strategy
>>>> for 16-bit floats?
>>>> (which i am more interested in,
>>>> and is directly supported in most hardware
>>>> -for its utility in Deep Learning)
>>>> 
>>>> Seems a serious amount of
>>>> packing unpacking
>>>> indexing chopping and inverting
>>>> just for matrix multiplication
>>>> i could be wrong
>>>> but i do not see the algorithm
>>>> 
>>>> ~greg heil
>>>> https://picsrp.github.io
>>>> 
>>>> --
>>>> 
>>>> from: Henry Rich <[email protected]>
>>>> to: [email protected]
>>>> date: Aug 2, 2021, 8:07 PM
>>>> subject: Re: [Jprogramming] Two implementation questions
>>>> 
>>>>> Rather than having a new datatype, you could create arrays of bits and
>>> use bitwise booleans on them.
>>>>> Can you list a set of operations on bit arrays that would suffice to
>>> solve the problems you were working on?
>>>> Henry Rich
>>>> 
>>>> --
>>>> 
>>>> from: greg heil <[email protected]>
>>>> to: Programming forum <[email protected]>
>>>> date: Aug 2, 2021, 7:07 PM
>>>> subject: Re: [Jprogramming] Two implementation questions
>>>> 
>>>> 
>>>> 
>>>> i did my undergraduate and doctoral thesis
>>>> on the categories of Graphs
>>>> with multiplication (and/or) as the operator
>>>> both were done using APL's bit-boolean's
>>>> 
>>>> much more than just multiplication
>>>> were done on these graphs
>>>> homomorphism, isomorphism
>>>> clustering cliquing
>>>> transitive/closures
>>>> all benefited from the simpler
>>>> bit-boolean's
>>>> 
>>>> as categories of graphs
>>>> run the full gamut
>>>> from 0-100% full
>>>> one cannot use
>>>> any form of linked list
>>>> 
>>>> a size multiplication by 8
>>>> would (at least in those days)
>>>> have caused a LOT of cache misses
>>>> 
>>>> even now caching would
>>>> be a lot happier
>>>> because the Social Networks
>>>> under investigation
>>>> (the object of the studies)
>>>> have grown a LOT bigger!-)
>>>> 
>>>> ~greg heil
>>>> https://picsrp.github.io
>>>> ----------------------------------------------------------------------
>>>> For information about J forums see http://www.jsoftware.com/forums.htm
>>> 
>> 
> 
> 
> 

