Re: [Jprogramming] Generating Lookup Tables (LUTs)

Michal Dobrogost Thu, 13 Oct 2016 20:23:15 -0700

Thanks Mike!

I'm scared of diving off the deep end here, nevertheless, what you describe
is exactly how I solve "one side" of the problem (in C++, J is doing the
design work for now. A pipe dream would be to write my own mini-J-to-C
compiler with only the operations required here - thus the desire to
minimize the number of different operations used).


So in the full problem we have:

17      = $X1
1024    = $X2
17 1024 = $Y

and we want to compute both X1 (+./@:*.) Y and X2 (+./@:*.) |:Y many many
times with Y being static.

For X2 your approach works great, however for X1 it's better to take
advantage of the bit-level parallelism (like you demonstrated for X2) which
on modern SIMD machines can be 256 bits (or even 512 on the latest and
greatest). I guess in J my suggested non-LUT approach for X1 would go
something like:

   Y=. 1 > ? 17 1024 $ 10
   X=. ? 17 $ 2

   YDense=. _32 #.\"1 Y
   ResultDense=. X ((23 b.)/ @: *) YDense
   Result=. , #: ResultDense

   NB. Everything checks out
   ~. Result = (X (+./@:*.) Y)
1

For X1, the LUT version of ((0~:(17 b.))&(#.@|:) requires 1024 ANDs, 1024
compares and then 1024 of the associated bit shuffling operations to get
those results into contiguous register values.

For X1, ((23 b.)/@:*) requires 68 ORs and 17 bit shuffling operations (to
extract 0 or 1 from the bits of XDense).

The LUT method I was originally suggesting goes one step further and
precomputes ORs between adjacent rows of Y. The 3-bit version would
precompute 3 rows at a time. The number of bits in the lut, linearly
decreases the number of ORs and bit shuffling ops required (and memory
accesses) at the cost of increased memory use.

Cheers,

Mike

On Thu, Oct 13, 2016 at 2:09 AM, 'Mike Day' via Programming <
programm...@jsoftware.com> wrote:

> While I'm also not too clear what you're really trying to achieve,
> it might be worth offering this solution to your particular example
> knowing your required solution:
>
>    x  ((0~:(17 b.))&(#.@|:)) y
> 1 1 0 0 1 1
>
> It uses the boolean and operation (17 b.) on bits at word level.
> 0~: could be 0< if you're sure the sign bit isn't involved.
> Inducing |: on x is redundant,  but it makes for a neater expression.
>
> If you were doing lots of these,  it would be better to
> set up your "look-up" vector as
>
>     [LU =: #.|:y
> 16 13 1 9 10 7
>
> and then do
>
>     LU(0~:(17 b.)) #.x
> 1 1 0 0 1 1
>
> For repeated operation on several x s,  perhaps define a verb:
>
>    lu =: LU &(0 ~: 17 b.)@#."1
>
> eg
>
>     lu x,:|.x
> 1 1 0 0 1 1
> 0 1 1 1 1 1
>
> Binding LU into lu might be excessive for spatially large arrays of y.
>
> This approach should work well for fewer that 32 or 64 bits.  Otherwise,
> you'd need to break the arrays up into word-sized chunks. Perhaps that
> is what you were getting at?
>
> Any help,
>
> Mike
>
>
>
> On 13/10/2016 08:11, Michal Dobrogost wrote:
>
>> Sorry for spamming.
>>
>> In fact my paranoid use of @: can be replaced with @ everywhere. Giving:
>>
>> L3=. _3((|."1@#:@i.@(2^#)) (+./@#) ])\Y
>>
>> So I should also say that the real goal is to minimize the number of
>> different operations used. Right now this stands at: #: i. 2^ # +. / @ # \
>> if we assume #: is implemented in reverse.
>>
>> On Thu, Oct 13, 2016 at 12:05 AM, Michal Dobrogost <
>> michal.dobrog...@gmail.com> wrote:
>>
>> And +./"2@:# can be rewritten as +./@#.
>>>
>>> Anything else?
>>>
>>> On Wed, Oct 12, 2016 at 11:55 PM, Michal Dobrogost <
>>> michal.dobrog...@gmail.com> wrote:
>>>
>>> You can rewrite *."0 1"1 _ as #
>>>>
>>>>     ]L3=. _3((|."1@:#:@:i.@:(2^#)) (+./"2 @: #) ])\Y
>>>> 0 0 0 0 0 NB. zero
>>>> 0 0 0 0 1 NB.         0 { Y
>>>> 0 0 0 1 0 NB.         1 { Y
>>>> 0 0 0 1 1 NB. +./   0 1 { Y
>>>> 0 0 1 0 0 NB.         2 { y
>>>> 0 0 1 0 1 NB.       0 2 { Y
>>>> 0 0 1 1 0 NB. +./   0 1 { Y
>>>> 0 0 1 1 1 NB. +./ 0 1 2 { Y
>>>>
>>>> 0 0 0 0 0 NB. zero
>>>> 0 1 0 0 0 NB.       3 { Y
>>>> 1 0 0 0 0 NB.       4 { Y
>>>> 1 1 0 0 0 NB. +./ 3 4 { Y
>>>> 0 0 0 0 0
>>>> 0 0 0 0 0
>>>> 0 0 0 0 0
>>>> 0 0 0 0 0
>>>>
>>>> On Wed, Oct 12, 2016 at 11:34 PM, Michal Dobrogost <
>>>> michal.dobrog...@gmail.com> wrote:
>>>>
>>>> I've tried to clean up the code to be more explanatory. All we really
>>>>> care about is generating L2 given an arbitrary Y. Rewriting this has
>>>>> also
>>>>> helped me catch a subtle bug with arrays going left-to-right but #.
>>>>> and #:
>>>>> interpreting numbers right-to-left.
>>>>>
>>>>>     NB. X and Y are completely arbitary and are chosen to be
>>>>>     NB. easy to trace.
>>>>>     ] X =. 0 1 1 0 1
>>>>> 0 1 1 0 1
>>>>>     ] Y =. |.=/~i.5
>>>>> 0 0 0 0 1
>>>>> 0 0 0 1 0
>>>>> 0 0 1 0 0
>>>>> 0 1 0 0 0
>>>>> 1 0 0 0 0
>>>>>
>>>>>     NB. This is the final result but we are actually trying to
>>>>>     NB. generate lookup tables for highly optimized C++ code.
>>>>>     X (+./@:*.) Y
>>>>> 1 0 1 1 0
>>>>>
>>>>>     NB. Break up x into 2-bit chunks.
>>>>>     NB. Note: reversed with |. because #. below interprets right-most
>>>>>     NB.       digit as most significant but J's array has first
>>>>>     NB.       item as left-most.
>>>>>     X
>>>>> 0 1 1 0 1
>>>>>     _2<@|.\X
>>>>> ┌───┬───┬─┐
>>>>> │1 0│0 1│1│
>>>>> └───┴───┴─┘
>>>>>
>>>>>     NB. We convert 2-bit chunks of x into indices.
>>>>>     _2#.@|.\X
>>>>> 2 1 1
>>>>>
>>>>>     ]L2=. _2((|."1@:#:@:i.@:(2^#)) (+./"2 @: (*."0 1"1 _)) ])\Y
>>>>> 0 0 0 0 0 NB. zero
>>>>> 0 0 0 0 1 NB. 0 { y
>>>>> 0 0 0 1 0 NB. 1 { y
>>>>> 0 0 0 1 1 NB. +./ 0 1 { y
>>>>>
>>>>> 0 0 0 0 0 NB. zero
>>>>> 0 0 1 0 0 NB. 2 { y
>>>>> 0 1 0 0 0 NB. 3 { y
>>>>> 0 1 1 0 0 NB. +./ 2 3 { y
>>>>>
>>>>> 0 0 0 0 0 NB. zero
>>>>> 1 0 0 0 0 NB. 4 { y
>>>>> 0 0 0 0 0
>>>>> 0 0 0 0 0
>>>>>
>>>>>     NB. Compute by 2-bit lookups (same result as +./@:*. above).
>>>>>     NB. We don't really care, this just demonstrates that the LUT
>>>>> works.
>>>>>     +./ (_2#.@|.\x) {"0 2 L2
>>>>> 1 0 1 1 0
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Mike
>>>>>
>>>>> On Wed, Oct 12, 2016 at 7:00 PM, Michal Dobrogost <
>>>>> michal.dobrog...@gmail.com> wrote:
>>>>>
>>>>> Hi Raul,
>>>>>>
>>>>>> What is the definition of *inds*?
>>>>>>
>>>>>>
>>>>>> I would describe the higher level operation as: select the rows of Y
>>>>>> where X is 1. Then take the column-wise OR of them.
>>>>>>
>>>>>> However the real thing I'm interested in is generating the lookup
>>>>>> tables which serve as an intermediate step in the higher level
>>>>>> operation.
>>>>>>
>>>>>> On Wed, Oct 12, 2016 at 9:35 AM, Raul Miller <rauldmil...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> I'm not quite sure I understand what you are doing here.
>>>>>>>
>>>>>>> Here's what I understand from your description:
>>>>>>>
>>>>>>> You're looking for rows where a bit set in X has a bit set in Y, and
>>>>>>> you want to split up X and Y into smaller pieces for your
>>>>>>> intermediate
>>>>>>> result, and you want to use indices rather than bit operations for
>>>>>>> your final operation.
>>>>>>>
>>>>>>> That gives me something like this.
>>>>>>>
>>>>>>>     X =: ? 5 $ 2
>>>>>>>     Y =: 30 > ? 5 6 $ 100
>>>>>>>     X
>>>>>>> 0 1 1 1 1
>>>>>>>     Y
>>>>>>> 0 0 0 0 0 0
>>>>>>> 1 0 0 0 0 0
>>>>>>> 1 1 1 0 0 1
>>>>>>> 1 0 0 1 0 0
>>>>>>> 0 0 0 0 0 1
>>>>>>>     (_2 {. X) inds _2 {. Y
>>>>>>> 0 3 5
>>>>>>>     (3 {. X) inds 3 {. Y
>>>>>>> 0 1 2 5
>>>>>>>     0 3 5 ([ -. -.) 0 1 2 5
>>>>>>> 0 5
>>>>>>>
>>>>>>> So maybe you would be looking at making an adverb along the lines of:
>>>>>>>
>>>>>>> L=: adverb define
>>>>>>> :
>>>>>>>     (m {. x) inds m {. y
>>>>>>> )
>>>>>>>
>>>>>>> Or, more concisely:
>>>>>>>
>>>>>>> L=: adverb define
>>>>>>>     inds&(m&{.)
>>>>>>> )
>>>>>>>
>>>>>>> But when I look at your calculations, I don't see anything like this.
>>>>>>>
>>>>>>> If I am off base here, could you describe how what you are looking
>>>>>>> for
>>>>>>> differs from what I am understanding?
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> --
>>>>>>> Raul
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Oct 12, 2016 at 12:14 PM, Michal Dobrogost
>>>>>>> <michal.dobrog...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> I've been mucking around generating look up tables in C++. Getting
>>>>>>>> frustrated, I wrote up this J single liner. Can you think of ways to
>>>>>>>> simplify the LUT computation (the expression we assign to L2 and L3
>>>>>>>>
>>>>>>> below)?
>>>>>>>
>>>>>>>> *Original Operator (no LUT)*
>>>>>>>>
>>>>>>>>     ] x =. ? 5 $ 2
>>>>>>>> 1 0 1 1 0
>>>>>>>>     ] y =. 30 > ? 5 6 $ 100
>>>>>>>> 1 0 0 0 0 0
>>>>>>>> 0 1 0 1 1 0
>>>>>>>> 0 1 0 0 0 1
>>>>>>>> 0 0 0 0 1 1
>>>>>>>> 0 1 1 1 0 1
>>>>>>>>     x (+./@:*.) y
>>>>>>>> 1 1 0 0 1 1
>>>>>>>>
>>>>>>>> *LUT Explanation*
>>>>>>>>
>>>>>>>> The idea is to break up x into smaller chunks (2-bits, 3-bits, etc)
>>>>>>>>
>>>>>>> and
>>>>>>>
>>>>>>>> precompute the operation for the corresponding chunks of y. Then we
>>>>>>>>
>>>>>>> just
>>>>>>>
>>>>>>>> convert the chunks into indices and look them up in the LUT.
>>>>>>>>
>>>>>>>>     _2<\x
>>>>>>>> ┌───┬───┬─┐
>>>>>>>> │1 0│1 1│0│
>>>>>>>> └───┴───┴─┘
>>>>>>>>     _2#.\x
>>>>>>>> 2 3 0
>>>>>>>>
>>>>>>>> *2-bit LUT*
>>>>>>>>
>>>>>>>>     ]L2=. _2((#:@:i.@:(2&^)@:#) (+./"2 @: (*."0 1"1 _)) ])\y
>>>>>>>> 0 0 0 0 0 0
>>>>>>>> 0 1 0 1 1 0
>>>>>>>> 1 0 0 0 0 0
>>>>>>>> 1 1 0 1 1 0
>>>>>>>>
>>>>>>>> 0 0 0 0 0 0
>>>>>>>> 0 0 0 0 1 1
>>>>>>>> 0 1 0 0 0 1
>>>>>>>> 0 1 0 0 1 1
>>>>>>>>
>>>>>>>> 0 0 0 0 0 0
>>>>>>>> 0 1 1 1 0 1
>>>>>>>> 0 0 0 0 0 0
>>>>>>>> 0 0 0 0 0 0
>>>>>>>>
>>>>>>>>     +./ (_2#.\x) {"0 2 L2    NB. Compute by 2-bit lookups
>>>>>>>> 1 1 0 0 1 1
>>>>>>>>
>>>>>>>> *3-Bit LUT*
>>>>>>>>
>>>>>>>>     ]L3=. _3((#:@:i.@:(2&^)@:#) (+./"2 @: (*."0 1"1 _)) ])\y
>>>>>>>> 0 0 0 0 0 0
>>>>>>>> 0 1 0 0 0 1
>>>>>>>> 0 1 0 1 1 0
>>>>>>>> 0 1 0 1 1 1
>>>>>>>> 1 0 0 0 0 0
>>>>>>>> 1 1 0 0 0 1
>>>>>>>> 1 1 0 1 1 0
>>>>>>>> 1 1 0 1 1 1
>>>>>>>>
>>>>>>>> 0 0 0 0 0 0
>>>>>>>> 0 1 1 1 0 1
>>>>>>>> 0 0 0 0 1 1
>>>>>>>> 0 1 1 1 1 1
>>>>>>>> 0 0 0 0 0 0
>>>>>>>> 0 0 0 0 0 0
>>>>>>>> 0 0 0 0 0 0
>>>>>>>> 0 0 0 0 0 0
>>>>>>>>
>>>>>>>>     +./ (_3#.\x) {"0 2 L3    NB. Compute by 3-bit lookups
>>>>>>>> 1 1 0 0 1 1
>>>>>>>> ------------------------------------------------------------
>>>>>>>>
>>>>>>> ----------
>>>>>>>
>>>>>>>> For information about J forums see http://www.jsoftware.com/forum
>>>>>>>>
>>>>>>> s.htm
>>>>>>> ------------------------------------------------------------
>>>>>>> ----------
>>>>>>> For information about J forums see http://www.jsoftware.com/forum
>>>>>>> s.htm
>>>>>>>
>>>>>>
>>>>>>
>>>>>> ------------------------------------------------------------
>> ----------
>> For information about J forums see http://www.jsoftware.com/forums.htm
>>
>
>
> ---
> This email has been checked for viruses by Avast antivirus software.
> https://www.avast.com/antivirus
>
>
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
>
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Re: [Jprogramming] Generating Lookup Tables (LUTs)

Reply via email to