/ and boxscan are indeed fold and reduce in other languages. The lisp implementations probably the easiest to understand.
The big difference between / and boxscan is that / applied to a list implies a requirement that all items in the list are the same shape. boxscan can use any list of arguments as the x, and appends any shape as the y, as long x u y results in a shape "compatible with y" (more accurately compatible with future applications of x u result . use of terms like "recursion with tail call optimization" come from lisp fold implementations where in order to avoid stack growth/exhaustion (/ avoids such stack abuse btw), you need to pass any accumulator (state between function calls) as a parameter to the function. boxscan basically allows the anything lisp can do, can be done with boxscan (and each/ at its core) thinking. A neat feature regarding tacit adverbs in general, is that I can type boxscan in source code, but the actual definition will be "compiled away" as if it were called with f. . using the term boxscan can help with readability if the intent of the function is to create a rightmost accumulator state. ----- Original Message ----- From: Raul Miller <[email protected]> To: Programming forum <[email protected]> Cc: Sent: Monday, August 25, 2014 1:16:58 PM Subject: Re: [Jprogramming] Adler-32 I have a nitpick with your phrasing here. Instead of saying "... use / recursively... (tail call optimization)" I think you should say something else. Mind you, I'm not quite sure what you should say. And, it's a nice clever bit of code. And here's another example of how I might use the concept of "use" - I am using the usual J convention where we indent the expression but not the result: (long sentence here) ((&.>)/)(>@:) >@:((long sentence here)&.>/) But I can't quite bring myself to say that this is "using / recursively". I think I recognize the thought process - the argument to boxscan is a binary operator or what we call a dyadic verb, and the underlying concept here (which might be called "reduce", "insert" or "fold") is often implemented recursively in other contexts. But / itself represents that same operation. So it's the implementation of / or the concept of / which would attach to the concept of "recursively", but that's something very different, in my mind, from "using / recursively". Using / recursively seems like it ought to repeatedly use / in the same context (instead of just once). Maybe the sentence should have been "make the use of / more closely match a tail call recursive list scan"? I'm not sure. (Should I take this kind of side thought to the chat forum?) Thanks, -- Raul On Mon, Aug 25, 2014 at 11:47 AM, 'Pascal Jasmin' via Programming <[email protected]> wrote: > I was curious in making a chunked version of running sum for the algorithmic > savings of doing 1 modulo expression per group, but also working with packet > groups, or "bytes so far". > > Randdata =. 1e7 ?@:$ 256 > > boxscan > =: ((&.>)/)(>@:) > > an explicit version for argument to boxscan: > > runsum =: 4 : 0 NB. x is data, y is accumulators > a=. +/\ x ,~ {: y > ({: ,~ ({. y) + +/@:}:) a > ) > > boxscan is a way to use / recursively (tail call optimized) where the right > most box can be different shape than the data, and the verb argument to / > processes x to update new values (accumulators) of y, which / will then pass > on to the next "iteration" of u/ > > (65521 | [: ( {: , <:@:+/) (65521 | {.@:] ({:@:] ,~ [ + +/@:}:@:]) [: +/\ >[,~{:@:]) boxscan) |. 0 0 ; _3 <\ 1, a. i. 'Wikipedia' > 920 4582 > (65521 | [: ( {: , <:@:+/) (65521 | runsum) boxscan) |. 0 0 ; _4 <\ 1, > a. i. 'Wikipedia' > 920 4582 > > timespacex '(65521 | [: ( {: , <:@:+/) (65521 | {.@:] ({:@:] ,~ [ + >+/@:}:@:]) [: +/\ [,~{:@:]) boxscan) |. 0 0 ; _4096 <\ 1, Randdata' > 0.274859 4.28927e8 > timespacex '(65521 | [: ( {: , <:@:+/) (65521 | {.@:] ({:@:] ,~ [ + >+/@:}:@:]) [: +/\ [,~{:@:]) boxscan) |. 0 0 ; _15000010 <\ 1, Randdata' > 0.412953 5.36889e8 > timespacex '(65521 | [: ( {: , <:@:+/) (65521 | {.@:] ({:@:] ,~ [ + >+/@:}:@:]) [: +/\ [,~{:@:]) boxscan) |. 0 0 ; _1500001 <\ 1, Randdata' > 0.414528 3.77498e8 > > 4096 to 32764 chunks are the fastest, and use the same memory. 4096 would > work best on 32 bit J for character data > > > > > ----- Original Message ----- > From: bill lam <[email protected]> > To: 'Pascal Jasmin' via Programming <[email protected]> > Cc: > Sent: Sunday, August 24, 2014 11:04:01 PM > Subject: Re: [Jprogramming] Adler-32 > > Even if it overflows to floating point, it still gets 52-bit > preceison. In zlib, adler32 checksum is calcualted for each > block, which I guess is 32K or 64K. Therefore the issue of > overflow might well be a non-issue in the context of zlib. > > On J32 > > 9!:11[16 > > 255**:65536 > 1095216660480 > 2^51x > 2251799813685248 > > > Вс, 24 авг 2014, jprogramming написал(а): >> I'm having a hard time getting a streaming version (one that works on parts >> of the data and then recombines. >> >> can only do 4096 bytes for 32 bit J, before it needs do call |.. but it is >> fast, and 64bit J can do 2^32 larger byte sizes (over 16TB) >> >> timespacex ' (65521 | [: ({: , [: <: +/) +/\) 1 ,~ Randdata' >> 0.187033 2.68442e8 >> >> the block I have with a streaming (chunk) version is how to get these numbers >> >> +/\ 1 87 105 107 105 112 101 100 105 97 >> 1 88 193 300 405 517 618 718 823 920 >> >> from numbers in this shape: >> _3 [\ 1, a. i. 'Wikipedia' >> 1 87 105 >> 107 105 112 >> 101 100 105 >> 97 0 0 >> >> , is not allowed. The key point about 1 88 193 300 405 517 618 718 823 920 >> is that it provides the basis for s2 >> >> <: +/ 1 88 193 300 405 517 618 718 823 920 >> 4582 >> >> >> >> ----- Original Message ----- >> From: 'Pascal Jasmin' via Programming <[email protected]> >> To: "[email protected]" <[email protected]> >> Cc: >> Sent: Sunday, August 24, 2014 11:48:23 AM >> Subject: Re: [Jprogramming] Adler-32 >> >> sorry that version doesn't work on a stream. will post one later. >> >> >> ----- Original Message ----- >> From: bill lam <[email protected]> >> To: 'Pascal Jasmin' via Programming <[email protected]> >> Cc: >> Sent: Sunday, August 24, 2014 11:15:57 AM >> Subject: Re: [Jprogramming] Adler-32 >> >> Did you mean _4]\ ? >> >> Anyway, from the wikipedia page, >> B = (1 + D[1]) + (1 + D[1] + D[2]) + ... + (1 + D[1] + D[2] + ... + D[n]) >> (mod 65521) >> = n×D[1] + (n−1)×D[2] + (n−2)×D[3] + ... + D[n] + n (mod 65521) >> >> if a parallel version is asked for, probably the formula in the >> 2nd line should be used, then the speed may increase directly >> proportional to the number of core. A good problem for Nvidia >> Jetson Tk1 board with 192 CUDA cores. >> >> * http://www.nvidia.com/object/jetson-tk1-embedded-dev-kit.html >> >> >> NB. ============================= >> NB. Raul Miller >> a32=: [: ({: (23 b.) 16&(33 b.)@{.) _1 0 + [: ((65521 | +)/ , {.) [: (65521 >> | +)/\. 1 ,~ a. i. |. >> >> NB. from zlib rfc >> adler32=: 3 : 0 >> 's1 s2'=. 1 0 >> for_a. a.i.y do. >> s1=. 65521 | s1 + a >> s2=. 65521 | s2 + s1 >> end. >> ({: (23 b.) 16&(33 b.)@{.) s2,s1 >> ) >> >> NB. formula from adler32 wikipedia >> adler32a=: 3 : 0 >> dat=. a.i.y >> s1=. (65521|+)/1,dat >> s2=. (#dat) (65521|+) (65521|+)/ (}:i.(-1+#dat)) (65521|*) dat >> ({: (23 b.) 16&(33 b.)@{.) s2,s1 >> ) >> >> f=: 1e6{.a. >> >> smoutput adler32 f >> smoutput adler32a f >> smoutput a32 f >> >> smoutput timespacex'adler32 f' >> smoutput timespacex'adler32a f' >> smoutput timespacex'a32 f' >> NB. ============================= >> >> result on J32 >> >> _121519079 >> _121519079 >> _121519079 >> 3.99429 4.19898e6 >> 1.03239 2.09744e7 >> 0.9106 8.38976e6 >> >> >> Вс, 24 авг 2014, jprogramming написал(а): >> > an approach that should be fast and also allows resuming where you left >> > off, and avoiding many | >> > >> > ([: ({: , [: <: +/) +/\) 1, a. i. 'Wikipedia' >> > 920 4582 >> > >> > works even when you split up stream into say 32k bytes, and do the mod >> > transform between each stream, and the single number reduction at the end. >> > >> > ([: ({: , [: <: +/) +/\)"1 ] _4 ]/ 1, a. i. 'Wikipedia' >> > 920 4582 >> > >> > >> > >> > >> > ----- Original Message ----- >> > From: bill lam <[email protected]> >> > To: Programming forum <[email protected]> >> > Cc: >> > Sent: Sunday, August 24, 2014 7:35:26 AM >> > Subject: Re: [Jprogramming] Adler-32 >> > >> > I realized that my loop translation solution was equally faulty in its >> > final step. It should be >> > ({: (23 b.) 16&(33 b.)@{.) s2,s1 >> > >> > >> > >> > On Aug 24, 2014 6:40 PM, "bill lam" <[email protected]> wrote: >> > >> > > J32 should give a 32-bit integer so that the last train >> > > should be >> > > [: ({: (23 b.) 16&(33 b.)@{.) >> > > instead of >> > > 65536#. >> > > >> > > This shoud be good enough. Thanks. >> > > >> > > Вс, 24 авг 2014, Jan-Pieter Jacobs написал(а): >> > > > You could incorporate the modulo operation in the summation directly: >> > > > >> > > > As per Raul's solution, with the modulo addition: >> > > > >> > > > a32_2=: 65536#. _1 0+ [:((65521|+)/ , {.) [: (65521|+)/\. 1,~ a.i. |. >> > > > NB. Test on 10M of random data >> > > > Randdata =: a.{~ ?10000000$256 >> > > > timespacex 'r1=:a32_2 Randdata' >> > > > 7.26671 2.68438e8 >> > > > >> > > > timespacex 'r1=:adler32 Randdata' >> > > > 36.2094 2.01333e8 >> > > > >> > > > Jan-Pieter >> > > > >> > > > 2014-08-24 11:14 GMT+02:00 bill lam <[email protected]>: >> > > > > Since s2 is quadratic, it overflows quickly in J32. However >> > > > > ieee floating points have 52 bits mantissa, so it can still >> > > > > give correct result for medium size data albeit overflowed. >> > > > > >> > > > > 9!:11[16 >> > > > > (3!:0;]) a32 1e6{.a. >> > > > > >> > > > > Вс, 24 авг 2014, Raul Miller написал(а): >> > > > >> Hmm... actually there might be a floating point overflow problem for >> > > > >> gigabyte length arguments. I do not have the patience, however, to >> > > > >> find what (for example) adler32 1e9#{:a. is. >> > > > >> >> > > > >> Thanks, >> > > > >> >> > > > >> -- >> > > > >> Raul >> > > > >> >> > > > >> >> > > > >> >> > > > >> On Sun, Aug 24, 2014 at 4:37 AM, Raul Miller <[email protected]> >> > > wrote: >> > > > >> > I believe this is equivalent, and a bit faster: >> > > > >> > >> > > > >> > a32=: 65536#. 65521| _1 0+ [:(+/ , {.) 65521| [:+/\. 1,~ a.i. |. >> > > > >> > >> > > > >> > a32 'Wikipedia' >> > > > >> > 300286872 >> > > > >> > a32 'The quick brown fox jumps over the lazy dog' >> > > > >> > 1541148634 >> > > > >> > >> > > > >> > Thanks, >> > > > >> > >> > > > >> > -- >> > > > >> > Raul >> > > > >> > >> > > > >> > >> > > > >> > On Sun, Aug 24, 2014 at 4:05 AM, bill lam <[email protected]> >> > > wrote: >> > > > >> >> Does anyone have a J-ish solution to adler32 checksum >> > > > >> >> and avoid overflow to floating points ? >> > > > >> >> >> > > > >> >> Adler-32 is composed of two sums accumulated per byte: s1 is >> > > > >> >> the sum of all bytes, s2 is the sum of all s1 values. Both sums >> > > > >> >> are done modulo 65521. s1 is initialized to 1, s2 to zero. The >> > > > >> >> Adler-32 checksum is stored as s2*65536 + s1 in most- >> > > > >> >> significant-byte first (network) order. >> > > > >> >> >> > > > >> >> * https://www.ietf.org/rfc/rfc1950.txt >> > > > >> >> * http://en.wikipedia.org/wiki/Adler-32 >> > > > >> >> >> > > > >> >> direct translation to J >> > > > >> >> >> > > > >> >> adler32=: 3 : 0 >> > > > >> >> 's1 s2'=. 1 0 >> > > > >> >> for_a. a.i.y do. >> > > > >> >> s1=. 65521 | s1 + a >> > > > >> >> s2=. 65521 | s2 + s1 >> > > > >> >> end. >> > > > >> >> s1 + 16 (33 b.) s2 >> > > > >> >> ) >> > > > >> >> >> > > > >> >> test data: >> > > > >> >> >> > > > >> >> adler32 'Wikipedia' >> > > > >> >> 300286872 >> > > > >> >> >> > > > >> >> adler32 'The quick brown fox jumps over the lazy dog' >> > > > >> >> 1541148634 >> > > > >> >> >> > > > >> >> -- >> > > > >> >> regards, >> > > > >> >> ==================================================== >> > > > >> >> GPG key 1024D/4434BAB3 2008-08-24 >> > > > >> >> gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3 >> > > > >> >> gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3 >> > > > >> >> >> > > ---------------------------------------------------------------------- >> > > > >> >> For information about J forums see >> > > http://www.jsoftware.com/forums.htm >> > > > >> ---------------------------------------------------------------------- >> > > > >> For information about J forums see >> > > http://www.jsoftware.com/forums.htm >> >> >> >> >> >> >> > > > > >> > > > > -- >> > > > > regards, >> > > > > ==================================================== >> > > > > GPG key 1024D/4434BAB3 2008-08-24 >> > > > > gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3 >> > > > > gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3 >> > > > > ---------------------------------------------------------------------- >> > > > > For information about J forums see >> > > > > http://www.jsoftware.com/forums.htm >> > > > ---------------------------------------------------------------------- >> > > > For information about J forums see http://www.jsoftware.com/forums.htm > > > >> > > >> > > -- >> > > regards, >> > > ==================================================== >> > > GPG key 1024D/4434BAB3 2008-08-24 >> > > gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3 >> > > gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3 >> > > >> > ---------------------------------------------------------------------- >> > For information about J forums see http://www.jsoftware.com/forums.htm >> > ---------------------------------------------------------------------- >> > For information about J forums see http://www.jsoftware.com/forums.htm >> >> -- >> regards, >> ==================================================== >> GPG key 1024D/4434BAB3 2008-08-24 >> gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3 >> gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3 >> ---------------------------------------------------------------------- >> For information about J forums see http://www.jsoftware.com/forums.htm >> ---------------------------------------------------------------------- >> For information about J forums see http://www.jsoftware.com/forums.htm >> ---------------------------------------------------------------------- >> For information about J forums see http://www.jsoftware.com/forums.htm > > -- > regards, > ==================================================== > GPG key 1024D/4434BAB3 2008-08-24 > gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3 > gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3 > ---------------------------------------------------------------------- > For information about J forums see http://www.jsoftware.com/forums.htm > ---------------------------------------------------------------------- > For information about J forums see http://www.jsoftware.com/forums.htm ---------------------------------------------------------------------- For information about J forums see http://www.jsoftware.com/forums.htm ---------------------------------------------------------------------- For information about J forums see http://www.jsoftware.com/forums.htm
