Sorry, I should say I am assuming 256 bit SIMD registers when I say 68 ORs.

On Thu, Oct 13, 2016 at 8:22 PM, Michal Dobrogost <
michal.dobrog...@gmail.com> wrote:

> Thanks Mike!
>
> I'm scared of diving off the deep end here, nevertheless, what you
> describe is exactly how I solve "one side" of the problem (in C++, J is
> doing the design work for now. A pipe dream would be to write my own
> mini-J-to-C compiler with only the operations required here - thus the
> desire to minimize the number of different operations used).
>
> So in the full problem we have:
>
> 17      = $X1
> 1024    = $X2
> 17 1024 = $Y
>
> and we want to compute both X1 (+./@:*.) Y and X2 (+./@:*.) |:Y many many
> times with Y being static.
>
> For X2 your approach works great, however for X1 it's better to take
> advantage of the bit-level parallelism (like you demonstrated for X2) which
> on modern SIMD machines can be 256 bits (or even 512 on the latest and
> greatest). I guess in J my suggested non-LUT approach for X1 would go
> something like:
>
>    Y=. 1 > ? 17 1024 $ 10
>    X=. ? 17 $ 2
>
>    YDense=. _32 #.\"1 Y
>    ResultDense=. X ((23 b.)/ @: *) YDense
>    Result=. , #: ResultDense
>
>    NB. Everything checks out
>    ~. Result = (X (+./@:*.) Y)
> 1
>
> For X1, the LUT version of ((0~:(17 b.))&(#.@|:) requires 1024 ANDs, 1024
> compares and then 1024 of the associated bit shuffling operations to get
> those results into contiguous register values.
>
> For X1, ((23 b.)/@:*) requires 68 ORs and 17 bit shuffling operations (to
> extract 0 or 1 from the bits of XDense).
>
> The LUT method I was originally suggesting goes one step further and
> precomputes ORs between adjacent rows of Y. The 3-bit version would
> precompute 3 rows at a time. The number of bits in the lut, linearly
> decreases the number of ORs and bit shuffling ops required (and memory
> accesses) at the cost of increased memory use.
>
> Cheers,
>
> Mike
>
> On Thu, Oct 13, 2016 at 2:09 AM, 'Mike Day' via Programming <
> programm...@jsoftware.com> wrote:
>
>> While I'm also not too clear what you're really trying to achieve,
>> it might be worth offering this solution to your particular example
>> knowing your required solution:
>>
>>    x  ((0~:(17 b.))&(#.@|:)) y
>> 1 1 0 0 1 1
>>
>> It uses the boolean and operation (17 b.) on bits at word level.
>> 0~: could be 0< if you're sure the sign bit isn't involved.
>> Inducing |: on x is redundant,  but it makes for a neater expression.
>>
>> If you were doing lots of these,  it would be better to
>> set up your "look-up" vector as
>>
>>     [LU =: #.|:y
>> 16 13 1 9 10 7
>>
>> and then do
>>
>>     LU(0~:(17 b.)) #.x
>> 1 1 0 0 1 1
>>
>> For repeated operation on several x s,  perhaps define a verb:
>>
>>    lu =: LU &(0 ~: 17 b.)@#."1
>>
>> eg
>>
>>     lu x,:|.x
>> 1 1 0 0 1 1
>> 0 1 1 1 1 1
>>
>> Binding LU into lu might be excessive for spatially large arrays of y.
>>
>> This approach should work well for fewer that 32 or 64 bits.  Otherwise,
>> you'd need to break the arrays up into word-sized chunks. Perhaps that
>> is what you were getting at?
>>
>> Any help,
>>
>> Mike
>>
>>
>>
>> On 13/10/2016 08:11, Michal Dobrogost wrote:
>>
>>> Sorry for spamming.
>>>
>>> In fact my paranoid use of @: can be replaced with @ everywhere. Giving:
>>>
>>> L3=. _3((|."1@#:@i.@(2^#)) (+./@#) ])\Y
>>>
>>> So I should also say that the real goal is to minimize the number of
>>> different operations used. Right now this stands at: #: i. 2^ # +. / @ #
>>> \
>>> if we assume #: is implemented in reverse.
>>>
>>> On Thu, Oct 13, 2016 at 12:05 AM, Michal Dobrogost <
>>> michal.dobrog...@gmail.com> wrote:
>>>
>>> And +./"2@:# can be rewritten as +./@#.
>>>>
>>>> Anything else?
>>>>
>>>> On Wed, Oct 12, 2016 at 11:55 PM, Michal Dobrogost <
>>>> michal.dobrog...@gmail.com> wrote:
>>>>
>>>> You can rewrite *."0 1"1 _ as #
>>>>>
>>>>>     ]L3=. _3((|."1@:#:@:i.@:(2^#)) (+./"2 @: #) ])\Y
>>>>> 0 0 0 0 0 NB. zero
>>>>> 0 0 0 0 1 NB.         0 { Y
>>>>> 0 0 0 1 0 NB.         1 { Y
>>>>> 0 0 0 1 1 NB. +./   0 1 { Y
>>>>> 0 0 1 0 0 NB.         2 { y
>>>>> 0 0 1 0 1 NB.       0 2 { Y
>>>>> 0 0 1 1 0 NB. +./   0 1 { Y
>>>>> 0 0 1 1 1 NB. +./ 0 1 2 { Y
>>>>>
>>>>> 0 0 0 0 0 NB. zero
>>>>> 0 1 0 0 0 NB.       3 { Y
>>>>> 1 0 0 0 0 NB.       4 { Y
>>>>> 1 1 0 0 0 NB. +./ 3 4 { Y
>>>>> 0 0 0 0 0
>>>>> 0 0 0 0 0
>>>>> 0 0 0 0 0
>>>>> 0 0 0 0 0
>>>>>
>>>>> On Wed, Oct 12, 2016 at 11:34 PM, Michal Dobrogost <
>>>>> michal.dobrog...@gmail.com> wrote:
>>>>>
>>>>> I've tried to clean up the code to be more explanatory. All we really
>>>>>> care about is generating L2 given an arbitrary Y. Rewriting this has
>>>>>> also
>>>>>> helped me catch a subtle bug with arrays going left-to-right but #.
>>>>>> and #:
>>>>>> interpreting numbers right-to-left.
>>>>>>
>>>>>>     NB. X and Y are completely arbitary and are chosen to be
>>>>>>     NB. easy to trace.
>>>>>>     ] X =. 0 1 1 0 1
>>>>>> 0 1 1 0 1
>>>>>>     ] Y =. |.=/~i.5
>>>>>> 0 0 0 0 1
>>>>>> 0 0 0 1 0
>>>>>> 0 0 1 0 0
>>>>>> 0 1 0 0 0
>>>>>> 1 0 0 0 0
>>>>>>
>>>>>>     NB. This is the final result but we are actually trying to
>>>>>>     NB. generate lookup tables for highly optimized C++ code.
>>>>>>     X (+./@:*.) Y
>>>>>> 1 0 1 1 0
>>>>>>
>>>>>>     NB. Break up x into 2-bit chunks.
>>>>>>     NB. Note: reversed with |. because #. below interprets right-most
>>>>>>     NB.       digit as most significant but J's array has first
>>>>>>     NB.       item as left-most.
>>>>>>     X
>>>>>> 0 1 1 0 1
>>>>>>     _2<@|.\X
>>>>>> ┌───┬───┬─┐
>>>>>> │1 0│0 1│1│
>>>>>> └───┴───┴─┘
>>>>>>
>>>>>>     NB. We convert 2-bit chunks of x into indices.
>>>>>>     _2#.@|.\X
>>>>>> 2 1 1
>>>>>>
>>>>>>     ]L2=. _2((|."1@:#:@:i.@:(2^#)) (+./"2 @: (*."0 1"1 _)) ])\Y
>>>>>> 0 0 0 0 0 NB. zero
>>>>>> 0 0 0 0 1 NB. 0 { y
>>>>>> 0 0 0 1 0 NB. 1 { y
>>>>>> 0 0 0 1 1 NB. +./ 0 1 { y
>>>>>>
>>>>>> 0 0 0 0 0 NB. zero
>>>>>> 0 0 1 0 0 NB. 2 { y
>>>>>> 0 1 0 0 0 NB. 3 { y
>>>>>> 0 1 1 0 0 NB. +./ 2 3 { y
>>>>>>
>>>>>> 0 0 0 0 0 NB. zero
>>>>>> 1 0 0 0 0 NB. 4 { y
>>>>>> 0 0 0 0 0
>>>>>> 0 0 0 0 0
>>>>>>
>>>>>>     NB. Compute by 2-bit lookups (same result as +./@:*. above).
>>>>>>     NB. We don't really care, this just demonstrates that the LUT
>>>>>> works.
>>>>>>     +./ (_2#.@|.\x) {"0 2 L2
>>>>>> 1 0 1 1 0
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Mike
>>>>>>
>>>>>> On Wed, Oct 12, 2016 at 7:00 PM, Michal Dobrogost <
>>>>>> michal.dobrog...@gmail.com> wrote:
>>>>>>
>>>>>> Hi Raul,
>>>>>>>
>>>>>>> What is the definition of *inds*?
>>>>>>>
>>>>>>>
>>>>>>> I would describe the higher level operation as: select the rows of Y
>>>>>>> where X is 1. Then take the column-wise OR of them.
>>>>>>>
>>>>>>> However the real thing I'm interested in is generating the lookup
>>>>>>> tables which serve as an intermediate step in the higher level
>>>>>>> operation.
>>>>>>>
>>>>>>> On Wed, Oct 12, 2016 at 9:35 AM, Raul Miller <rauldmil...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> I'm not quite sure I understand what you are doing here.
>>>>>>>>
>>>>>>>> Here's what I understand from your description:
>>>>>>>>
>>>>>>>> You're looking for rows where a bit set in X has a bit set in Y, and
>>>>>>>> you want to split up X and Y into smaller pieces for your
>>>>>>>> intermediate
>>>>>>>> result, and you want to use indices rather than bit operations for
>>>>>>>> your final operation.
>>>>>>>>
>>>>>>>> That gives me something like this.
>>>>>>>>
>>>>>>>>     X =: ? 5 $ 2
>>>>>>>>     Y =: 30 > ? 5 6 $ 100
>>>>>>>>     X
>>>>>>>> 0 1 1 1 1
>>>>>>>>     Y
>>>>>>>> 0 0 0 0 0 0
>>>>>>>> 1 0 0 0 0 0
>>>>>>>> 1 1 1 0 0 1
>>>>>>>> 1 0 0 1 0 0
>>>>>>>> 0 0 0 0 0 1
>>>>>>>>     (_2 {. X) inds _2 {. Y
>>>>>>>> 0 3 5
>>>>>>>>     (3 {. X) inds 3 {. Y
>>>>>>>> 0 1 2 5
>>>>>>>>     0 3 5 ([ -. -.) 0 1 2 5
>>>>>>>> 0 5
>>>>>>>>
>>>>>>>> So maybe you would be looking at making an adverb along the lines
>>>>>>>> of:
>>>>>>>>
>>>>>>>> L=: adverb define
>>>>>>>> :
>>>>>>>>     (m {. x) inds m {. y
>>>>>>>> )
>>>>>>>>
>>>>>>>> Or, more concisely:
>>>>>>>>
>>>>>>>> L=: adverb define
>>>>>>>>     inds&(m&{.)
>>>>>>>> )
>>>>>>>>
>>>>>>>> But when I look at your calculations, I don't see anything like
>>>>>>>> this.
>>>>>>>>
>>>>>>>> If I am off base here, could you describe how what you are looking
>>>>>>>> for
>>>>>>>> differs from what I am understanding?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> --
>>>>>>>> Raul
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Oct 12, 2016 at 12:14 PM, Michal Dobrogost
>>>>>>>> <michal.dobrog...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi All,
>>>>>>>>>
>>>>>>>>> I've been mucking around generating look up tables in C++. Getting
>>>>>>>>> frustrated, I wrote up this J single liner. Can you think of ways
>>>>>>>>> to
>>>>>>>>> simplify the LUT computation (the expression we assign to L2 and L3
>>>>>>>>>
>>>>>>>> below)?
>>>>>>>>
>>>>>>>>> *Original Operator (no LUT)*
>>>>>>>>>
>>>>>>>>>     ] x =. ? 5 $ 2
>>>>>>>>> 1 0 1 1 0
>>>>>>>>>     ] y =. 30 > ? 5 6 $ 100
>>>>>>>>> 1 0 0 0 0 0
>>>>>>>>> 0 1 0 1 1 0
>>>>>>>>> 0 1 0 0 0 1
>>>>>>>>> 0 0 0 0 1 1
>>>>>>>>> 0 1 1 1 0 1
>>>>>>>>>     x (+./@:*.) y
>>>>>>>>> 1 1 0 0 1 1
>>>>>>>>>
>>>>>>>>> *LUT Explanation*
>>>>>>>>>
>>>>>>>>> The idea is to break up x into smaller chunks (2-bits, 3-bits, etc)
>>>>>>>>>
>>>>>>>> and
>>>>>>>>
>>>>>>>>> precompute the operation for the corresponding chunks of y. Then we
>>>>>>>>>
>>>>>>>> just
>>>>>>>>
>>>>>>>>> convert the chunks into indices and look them up in the LUT.
>>>>>>>>>
>>>>>>>>>     _2<\x
>>>>>>>>> ┌───┬───┬─┐
>>>>>>>>> │1 0│1 1│0│
>>>>>>>>> └───┴───┴─┘
>>>>>>>>>     _2#.\x
>>>>>>>>> 2 3 0
>>>>>>>>>
>>>>>>>>> *2-bit LUT*
>>>>>>>>>
>>>>>>>>>     ]L2=. _2((#:@:i.@:(2&^)@:#) (+./"2 @: (*."0 1"1 _)) ])\y
>>>>>>>>> 0 0 0 0 0 0
>>>>>>>>> 0 1 0 1 1 0
>>>>>>>>> 1 0 0 0 0 0
>>>>>>>>> 1 1 0 1 1 0
>>>>>>>>>
>>>>>>>>> 0 0 0 0 0 0
>>>>>>>>> 0 0 0 0 1 1
>>>>>>>>> 0 1 0 0 0 1
>>>>>>>>> 0 1 0 0 1 1
>>>>>>>>>
>>>>>>>>> 0 0 0 0 0 0
>>>>>>>>> 0 1 1 1 0 1
>>>>>>>>> 0 0 0 0 0 0
>>>>>>>>> 0 0 0 0 0 0
>>>>>>>>>
>>>>>>>>>     +./ (_2#.\x) {"0 2 L2    NB. Compute by 2-bit lookups
>>>>>>>>> 1 1 0 0 1 1
>>>>>>>>>
>>>>>>>>> *3-Bit LUT*
>>>>>>>>>
>>>>>>>>>     ]L3=. _3((#:@:i.@:(2&^)@:#) (+./"2 @: (*."0 1"1 _)) ])\y
>>>>>>>>> 0 0 0 0 0 0
>>>>>>>>> 0 1 0 0 0 1
>>>>>>>>> 0 1 0 1 1 0
>>>>>>>>> 0 1 0 1 1 1
>>>>>>>>> 1 0 0 0 0 0
>>>>>>>>> 1 1 0 0 0 1
>>>>>>>>> 1 1 0 1 1 0
>>>>>>>>> 1 1 0 1 1 1
>>>>>>>>>
>>>>>>>>> 0 0 0 0 0 0
>>>>>>>>> 0 1 1 1 0 1
>>>>>>>>> 0 0 0 0 1 1
>>>>>>>>> 0 1 1 1 1 1
>>>>>>>>> 0 0 0 0 0 0
>>>>>>>>> 0 0 0 0 0 0
>>>>>>>>> 0 0 0 0 0 0
>>>>>>>>> 0 0 0 0 0 0
>>>>>>>>>
>>>>>>>>>     +./ (_3#.\x) {"0 2 L3    NB. Compute by 3-bit lookups
>>>>>>>>> 1 1 0 0 1 1
>>>>>>>>> ------------------------------------------------------------
>>>>>>>>>
>>>>>>>> ----------
>>>>>>>>
>>>>>>>>> For information about J forums see http://www.jsoftware.com/forum
>>>>>>>>>
>>>>>>>> s.htm
>>>>>>>> ------------------------------------------------------------
>>>>>>>> ----------
>>>>>>>> For information about J forums see http://www.jsoftware.com/forum
>>>>>>>> s.htm
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ------------------------------------------------------------
>>> ----------
>>> For information about J forums see http://www.jsoftware.com/forums.htm
>>>
>>
>>
>> ---
>> This email has been checked for viruses by Avast antivirus software.
>> https://www.avast.com/antivirus
>>
>>
>> ----------------------------------------------------------------------
>> For information about J forums see http://www.jsoftware.com/forums.htm
>>
>
>
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Reply via email to