Sorry, I should say I am assuming 256 bit SIMD registers when I say 68 ORs.

## Advertising

On Thu, Oct 13, 2016 at 8:22 PM, Michal Dobrogost < michal.dobrog...@gmail.com> wrote: > Thanks Mike! > > I'm scared of diving off the deep end here, nevertheless, what you > describe is exactly how I solve "one side" of the problem (in C++, J is > doing the design work for now. A pipe dream would be to write my own > mini-J-to-C compiler with only the operations required here - thus the > desire to minimize the number of different operations used). > > So in the full problem we have: > > 17 = $X1 > 1024 = $X2 > 17 1024 = $Y > > and we want to compute both X1 (+./@:*.) Y and X2 (+./@:*.) |:Y many many > times with Y being static. > > For X2 your approach works great, however for X1 it's better to take > advantage of the bit-level parallelism (like you demonstrated for X2) which > on modern SIMD machines can be 256 bits (or even 512 on the latest and > greatest). I guess in J my suggested non-LUT approach for X1 would go > something like: > > Y=. 1 > ? 17 1024 $ 10 > X=. ? 17 $ 2 > > YDense=. _32 #.\"1 Y > ResultDense=. X ((23 b.)/ @: *) YDense > Result=. , #: ResultDense > > NB. Everything checks out > ~. Result = (X (+./@:*.) Y) > 1 > > For X1, the LUT version of ((0~:(17 b.))&(#.@|:) requires 1024 ANDs, 1024 > compares and then 1024 of the associated bit shuffling operations to get > those results into contiguous register values. > > For X1, ((23 b.)/@:*) requires 68 ORs and 17 bit shuffling operations (to > extract 0 or 1 from the bits of XDense). > > The LUT method I was originally suggesting goes one step further and > precomputes ORs between adjacent rows of Y. The 3-bit version would > precompute 3 rows at a time. The number of bits in the lut, linearly > decreases the number of ORs and bit shuffling ops required (and memory > accesses) at the cost of increased memory use. > > Cheers, > > Mike > > On Thu, Oct 13, 2016 at 2:09 AM, 'Mike Day' via Programming < > programm...@jsoftware.com> wrote: > >> While I'm also not too clear what you're really trying to achieve, >> it might be worth offering this solution to your particular example >> knowing your required solution: >> >> x ((0~:(17 b.))&(#.@|:)) y >> 1 1 0 0 1 1 >> >> It uses the boolean and operation (17 b.) on bits at word level. >> 0~: could be 0< if you're sure the sign bit isn't involved. >> Inducing |: on x is redundant, but it makes for a neater expression. >> >> If you were doing lots of these, it would be better to >> set up your "look-up" vector as >> >> [LU =: #.|:y >> 16 13 1 9 10 7 >> >> and then do >> >> LU(0~:(17 b.)) #.x >> 1 1 0 0 1 1 >> >> For repeated operation on several x s, perhaps define a verb: >> >> lu =: LU &(0 ~: 17 b.)@#."1 >> >> eg >> >> lu x,:|.x >> 1 1 0 0 1 1 >> 0 1 1 1 1 1 >> >> Binding LU into lu might be excessive for spatially large arrays of y. >> >> This approach should work well for fewer that 32 or 64 bits. Otherwise, >> you'd need to break the arrays up into word-sized chunks. Perhaps that >> is what you were getting at? >> >> Any help, >> >> Mike >> >> >> >> On 13/10/2016 08:11, Michal Dobrogost wrote: >> >>> Sorry for spamming. >>> >>> In fact my paranoid use of @: can be replaced with @ everywhere. Giving: >>> >>> L3=. _3((|."1@#:@i.@(2^#)) (+./@#) ])\Y >>> >>> So I should also say that the real goal is to minimize the number of >>> different operations used. Right now this stands at: #: i. 2^ # +. / @ # >>> \ >>> if we assume #: is implemented in reverse. >>> >>> On Thu, Oct 13, 2016 at 12:05 AM, Michal Dobrogost < >>> michal.dobrog...@gmail.com> wrote: >>> >>> And +./"2@:# can be rewritten as +./@#. >>>> >>>> Anything else? >>>> >>>> On Wed, Oct 12, 2016 at 11:55 PM, Michal Dobrogost < >>>> michal.dobrog...@gmail.com> wrote: >>>> >>>> You can rewrite *."0 1"1 _ as # >>>>> >>>>> ]L3=. _3((|."1@:#:@:i.@:(2^#)) (+./"2 @: #) ])\Y >>>>> 0 0 0 0 0 NB. zero >>>>> 0 0 0 0 1 NB. 0 { Y >>>>> 0 0 0 1 0 NB. 1 { Y >>>>> 0 0 0 1 1 NB. +./ 0 1 { Y >>>>> 0 0 1 0 0 NB. 2 { y >>>>> 0 0 1 0 1 NB. 0 2 { Y >>>>> 0 0 1 1 0 NB. +./ 0 1 { Y >>>>> 0 0 1 1 1 NB. +./ 0 1 2 { Y >>>>> >>>>> 0 0 0 0 0 NB. zero >>>>> 0 1 0 0 0 NB. 3 { Y >>>>> 1 0 0 0 0 NB. 4 { Y >>>>> 1 1 0 0 0 NB. +./ 3 4 { Y >>>>> 0 0 0 0 0 >>>>> 0 0 0 0 0 >>>>> 0 0 0 0 0 >>>>> 0 0 0 0 0 >>>>> >>>>> On Wed, Oct 12, 2016 at 11:34 PM, Michal Dobrogost < >>>>> michal.dobrog...@gmail.com> wrote: >>>>> >>>>> I've tried to clean up the code to be more explanatory. All we really >>>>>> care about is generating L2 given an arbitrary Y. Rewriting this has >>>>>> also >>>>>> helped me catch a subtle bug with arrays going left-to-right but #. >>>>>> and #: >>>>>> interpreting numbers right-to-left. >>>>>> >>>>>> NB. X and Y are completely arbitary and are chosen to be >>>>>> NB. easy to trace. >>>>>> ] X =. 0 1 1 0 1 >>>>>> 0 1 1 0 1 >>>>>> ] Y =. |.=/~i.5 >>>>>> 0 0 0 0 1 >>>>>> 0 0 0 1 0 >>>>>> 0 0 1 0 0 >>>>>> 0 1 0 0 0 >>>>>> 1 0 0 0 0 >>>>>> >>>>>> NB. This is the final result but we are actually trying to >>>>>> NB. generate lookup tables for highly optimized C++ code. >>>>>> X (+./@:*.) Y >>>>>> 1 0 1 1 0 >>>>>> >>>>>> NB. Break up x into 2-bit chunks. >>>>>> NB. Note: reversed with |. because #. below interprets right-most >>>>>> NB. digit as most significant but J's array has first >>>>>> NB. item as left-most. >>>>>> X >>>>>> 0 1 1 0 1 >>>>>> _2<@|.\X >>>>>> ┌───┬───┬─┐ >>>>>> │1 0│0 1│1│ >>>>>> └───┴───┴─┘ >>>>>> >>>>>> NB. We convert 2-bit chunks of x into indices. >>>>>> _2#.@|.\X >>>>>> 2 1 1 >>>>>> >>>>>> ]L2=. _2((|."1@:#:@:i.@:(2^#)) (+./"2 @: (*."0 1"1 _)) ])\Y >>>>>> 0 0 0 0 0 NB. zero >>>>>> 0 0 0 0 1 NB. 0 { y >>>>>> 0 0 0 1 0 NB. 1 { y >>>>>> 0 0 0 1 1 NB. +./ 0 1 { y >>>>>> >>>>>> 0 0 0 0 0 NB. zero >>>>>> 0 0 1 0 0 NB. 2 { y >>>>>> 0 1 0 0 0 NB. 3 { y >>>>>> 0 1 1 0 0 NB. +./ 2 3 { y >>>>>> >>>>>> 0 0 0 0 0 NB. zero >>>>>> 1 0 0 0 0 NB. 4 { y >>>>>> 0 0 0 0 0 >>>>>> 0 0 0 0 0 >>>>>> >>>>>> NB. Compute by 2-bit lookups (same result as +./@:*. above). >>>>>> NB. We don't really care, this just demonstrates that the LUT >>>>>> works. >>>>>> +./ (_2#.@|.\x) {"0 2 L2 >>>>>> 1 0 1 1 0 >>>>>> >>>>>> Cheers, >>>>>> >>>>>> Mike >>>>>> >>>>>> On Wed, Oct 12, 2016 at 7:00 PM, Michal Dobrogost < >>>>>> michal.dobrog...@gmail.com> wrote: >>>>>> >>>>>> Hi Raul, >>>>>>> >>>>>>> What is the definition of *inds*? >>>>>>> >>>>>>> >>>>>>> I would describe the higher level operation as: select the rows of Y >>>>>>> where X is 1. Then take the column-wise OR of them. >>>>>>> >>>>>>> However the real thing I'm interested in is generating the lookup >>>>>>> tables which serve as an intermediate step in the higher level >>>>>>> operation. >>>>>>> >>>>>>> On Wed, Oct 12, 2016 at 9:35 AM, Raul Miller <rauldmil...@gmail.com> >>>>>>> wrote: >>>>>>> >>>>>>> I'm not quite sure I understand what you are doing here. >>>>>>>> >>>>>>>> Here's what I understand from your description: >>>>>>>> >>>>>>>> You're looking for rows where a bit set in X has a bit set in Y, and >>>>>>>> you want to split up X and Y into smaller pieces for your >>>>>>>> intermediate >>>>>>>> result, and you want to use indices rather than bit operations for >>>>>>>> your final operation. >>>>>>>> >>>>>>>> That gives me something like this. >>>>>>>> >>>>>>>> X =: ? 5 $ 2 >>>>>>>> Y =: 30 > ? 5 6 $ 100 >>>>>>>> X >>>>>>>> 0 1 1 1 1 >>>>>>>> Y >>>>>>>> 0 0 0 0 0 0 >>>>>>>> 1 0 0 0 0 0 >>>>>>>> 1 1 1 0 0 1 >>>>>>>> 1 0 0 1 0 0 >>>>>>>> 0 0 0 0 0 1 >>>>>>>> (_2 {. X) inds _2 {. Y >>>>>>>> 0 3 5 >>>>>>>> (3 {. X) inds 3 {. Y >>>>>>>> 0 1 2 5 >>>>>>>> 0 3 5 ([ -. -.) 0 1 2 5 >>>>>>>> 0 5 >>>>>>>> >>>>>>>> So maybe you would be looking at making an adverb along the lines >>>>>>>> of: >>>>>>>> >>>>>>>> L=: adverb define >>>>>>>> : >>>>>>>> (m {. x) inds m {. y >>>>>>>> ) >>>>>>>> >>>>>>>> Or, more concisely: >>>>>>>> >>>>>>>> L=: adverb define >>>>>>>> inds&(m&{.) >>>>>>>> ) >>>>>>>> >>>>>>>> But when I look at your calculations, I don't see anything like >>>>>>>> this. >>>>>>>> >>>>>>>> If I am off base here, could you describe how what you are looking >>>>>>>> for >>>>>>>> differs from what I am understanding? >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> -- >>>>>>>> Raul >>>>>>>> >>>>>>>> >>>>>>>> On Wed, Oct 12, 2016 at 12:14 PM, Michal Dobrogost >>>>>>>> <michal.dobrog...@gmail.com> wrote: >>>>>>>> >>>>>>>>> Hi All, >>>>>>>>> >>>>>>>>> I've been mucking around generating look up tables in C++. Getting >>>>>>>>> frustrated, I wrote up this J single liner. Can you think of ways >>>>>>>>> to >>>>>>>>> simplify the LUT computation (the expression we assign to L2 and L3 >>>>>>>>> >>>>>>>> below)? >>>>>>>> >>>>>>>>> *Original Operator (no LUT)* >>>>>>>>> >>>>>>>>> ] x =. ? 5 $ 2 >>>>>>>>> 1 0 1 1 0 >>>>>>>>> ] y =. 30 > ? 5 6 $ 100 >>>>>>>>> 1 0 0 0 0 0 >>>>>>>>> 0 1 0 1 1 0 >>>>>>>>> 0 1 0 0 0 1 >>>>>>>>> 0 0 0 0 1 1 >>>>>>>>> 0 1 1 1 0 1 >>>>>>>>> x (+./@:*.) y >>>>>>>>> 1 1 0 0 1 1 >>>>>>>>> >>>>>>>>> *LUT Explanation* >>>>>>>>> >>>>>>>>> The idea is to break up x into smaller chunks (2-bits, 3-bits, etc) >>>>>>>>> >>>>>>>> and >>>>>>>> >>>>>>>>> precompute the operation for the corresponding chunks of y. Then we >>>>>>>>> >>>>>>>> just >>>>>>>> >>>>>>>>> convert the chunks into indices and look them up in the LUT. >>>>>>>>> >>>>>>>>> _2<\x >>>>>>>>> ┌───┬───┬─┐ >>>>>>>>> │1 0│1 1│0│ >>>>>>>>> └───┴───┴─┘ >>>>>>>>> _2#.\x >>>>>>>>> 2 3 0 >>>>>>>>> >>>>>>>>> *2-bit LUT* >>>>>>>>> >>>>>>>>> ]L2=. _2((#:@:i.@:(2&^)@:#) (+./"2 @: (*."0 1"1 _)) ])\y >>>>>>>>> 0 0 0 0 0 0 >>>>>>>>> 0 1 0 1 1 0 >>>>>>>>> 1 0 0 0 0 0 >>>>>>>>> 1 1 0 1 1 0 >>>>>>>>> >>>>>>>>> 0 0 0 0 0 0 >>>>>>>>> 0 0 0 0 1 1 >>>>>>>>> 0 1 0 0 0 1 >>>>>>>>> 0 1 0 0 1 1 >>>>>>>>> >>>>>>>>> 0 0 0 0 0 0 >>>>>>>>> 0 1 1 1 0 1 >>>>>>>>> 0 0 0 0 0 0 >>>>>>>>> 0 0 0 0 0 0 >>>>>>>>> >>>>>>>>> +./ (_2#.\x) {"0 2 L2 NB. Compute by 2-bit lookups >>>>>>>>> 1 1 0 0 1 1 >>>>>>>>> >>>>>>>>> *3-Bit LUT* >>>>>>>>> >>>>>>>>> ]L3=. _3((#:@:i.@:(2&^)@:#) (+./"2 @: (*."0 1"1 _)) ])\y >>>>>>>>> 0 0 0 0 0 0 >>>>>>>>> 0 1 0 0 0 1 >>>>>>>>> 0 1 0 1 1 0 >>>>>>>>> 0 1 0 1 1 1 >>>>>>>>> 1 0 0 0 0 0 >>>>>>>>> 1 1 0 0 0 1 >>>>>>>>> 1 1 0 1 1 0 >>>>>>>>> 1 1 0 1 1 1 >>>>>>>>> >>>>>>>>> 0 0 0 0 0 0 >>>>>>>>> 0 1 1 1 0 1 >>>>>>>>> 0 0 0 0 1 1 >>>>>>>>> 0 1 1 1 1 1 >>>>>>>>> 0 0 0 0 0 0 >>>>>>>>> 0 0 0 0 0 0 >>>>>>>>> 0 0 0 0 0 0 >>>>>>>>> 0 0 0 0 0 0 >>>>>>>>> >>>>>>>>> +./ (_3#.\x) {"0 2 L3 NB. Compute by 3-bit lookups >>>>>>>>> 1 1 0 0 1 1 >>>>>>>>> ------------------------------------------------------------ >>>>>>>>> >>>>>>>> ---------- >>>>>>>> >>>>>>>>> For information about J forums see http://www.jsoftware.com/forum >>>>>>>>> >>>>>>>> s.htm >>>>>>>> ------------------------------------------------------------ >>>>>>>> ---------- >>>>>>>> For information about J forums see http://www.jsoftware.com/forum >>>>>>>> s.htm >>>>>>>> >>>>>>> >>>>>>> >>>>>>> ------------------------------------------------------------ >>> ---------- >>> For information about J forums see http://www.jsoftware.com/forums.htm >>> >> >> >> --- >> This email has been checked for viruses by Avast antivirus software. >> https://www.avast.com/antivirus >> >> >> ---------------------------------------------------------------------- >> For information about J forums see http://www.jsoftware.com/forums.htm >> > > ---------------------------------------------------------------------- For information about J forums see http://www.jsoftware.com/forums.htm