Re: [Jprogramming] Adler-32

'Pascal Jasmin' via Programming Mon, 25 Aug 2014 11:40:43 -0700

/ and boxscan are indeed fold and reduce in other languages.  The lisp 
implementations probably the easiest to understand.


The big difference between / and boxscan is that / applied to a list implies a 
requirement that all items in the list are the same shape.  boxscan can use any 
list of arguments as the x, and appends any shape as the y, as long x u y 
results in a shape "compatible with y" (more accurately compatible with future 
applications of x u result .

use of terms like "recursion with tail call optimization" come from lisp fold 
implementations where in order to avoid stack growth/exhaustion (/ avoids such 
stack abuse btw), you need to pass any accumulator (state between function 
calls) as a parameter to the function.  boxscan basically allows the anything 
lisp can do, can be done with boxscan (and each/ at its core) thinking.

A neat feature regarding tacit adverbs in general, is that I can type boxscan 
in source code, but the actual definition will be "compiled away" as if it were 
called with f. . using the term boxscan can help with readability if the intent 
of the function is to create a rightmost accumulator state.




----- Original Message -----
From: Raul Miller <[email protected]>
To: Programming forum <[email protected]>
Cc: 
Sent: Monday, August 25, 2014 1:16:58 PM
Subject: Re: [Jprogramming] Adler-32

I have a nitpick with your phrasing here.

Instead of saying "... use / recursively... (tail call optimization)"
I think you should say something else.

Mind you, I'm not quite sure what you should say.

And, it's a nice clever bit of code. And here's another example of how
I might use the concept of "use"  - I am using the usual J convention
where we indent the expression but not the result:

   (long sentence here)    ((&.>)/)(>@:)
>@:((long sentence here)&.>/)

But I can't quite bring myself to say that this is "using / recursively".

I think I recognize the thought process - the argument to boxscan is a
binary operator or what we call a dyadic verb, and the underlying
concept here (which might be called "reduce", "insert" or "fold") is
often implemented recursively in other contexts.

But / itself represents that same operation. So it's the
implementation of / or the concept of / which would attach to the
concept of "recursively", but that's something very different, in my
mind, from "using / recursively".

Using / recursively seems like it ought to repeatedly use / in the
same context (instead of just once).

Maybe the sentence should have been "make the use of / more closely
match a tail call recursive list scan"?

I'm not sure.

(Should I take this kind of side thought to the chat forum?)

Thanks,

-- 
Raul

On Mon, Aug 25, 2014 at 11:47 AM, 'Pascal Jasmin' via Programming
<[email protected]> wrote:
> I was curious in making a chunked version of running sum for the algorithmic 
> savings of doing 1 modulo expression per group, but also working with packet 
> groups, or "bytes so far".
>
> Randdata =. 1e7 ?@:$ 256
>
> boxscan
> =: ((&.>)/)(>@:)
>
> an explicit version for argument to boxscan:
>
> runsum =: 4 : 0 NB. x is data, y is accumulators
> a=. +/\  x ,~ {: y
> ({:  ,~ ({. y) + +/@:}:) a
> )
>
> boxscan is a way to use / recursively (tail call optimized) where the right 
> most box can be different shape than the data, and the verb argument to / 
> processes x to update new values (accumulators) of y, which / will then pass 
> on to the next "iteration" of u/
>
>    (65521 | [: ( {: , <:@:+/) (65521 | {.@:] ({:@:]  ,~ [ + +/@:}:@:]) [: +/\ 
>[,~{:@:]) boxscan) |. 0 0 ;   _3 <\  1,  a. i. 'Wikipedia'
> 920 4582
>     (65521 | [: ( {: , <:@:+/) (65521 | runsum) boxscan) |. 0 0 ;   _4 <\  1, 
> a. i. 'Wikipedia'
> 920 4582
>
>    timespacex '(65521 | [: ( {: , <:@:+/) (65521 | {.@:] ({:@:]  ,~ [ + 
>+/@:}:@:]) [: +/\ [,~{:@:]) boxscan) |. 0 0 ;   _4096 <\  1, Randdata'
> 0.274859 4.28927e8
>    timespacex '(65521 | [: ( {: , <:@:+/) (65521 | {.@:] ({:@:]  ,~ [ + 
>+/@:}:@:]) [: +/\ [,~{:@:]) boxscan) |. 0 0 ;   _15000010 <\  1, Randdata'
> 0.412953 5.36889e8
>    timespacex '(65521 | [: ( {: , <:@:+/) (65521 | {.@:] ({:@:]  ,~ [ + 
>+/@:}:@:]) [: +/\ [,~{:@:]) boxscan) |. 0 0 ;   _1500001 <\  1, Randdata'
> 0.414528 3.77498e8
>
> 4096 to 32764 chunks are the fastest, and use the same memory.  4096 would 
> work best on 32 bit J for character data
>
>
>
>
> ----- Original Message -----
> From: bill lam <[email protected]>
> To: 'Pascal Jasmin' via Programming <[email protected]>
> Cc:
> Sent: Sunday, August 24, 2014 11:04:01 PM
> Subject: Re: [Jprogramming] Adler-32
>
> Even if it overflows to floating point, it still gets 52-bit
> preceison. In zlib, adler32 checksum is calcualted for each
> block, which I guess is 32K or 64K. Therefore the issue of
> overflow might well be a non-issue in the context of zlib.
>
> On J32
>
>    9!:11[16
>
>    255**:65536
> 1095216660480
>    2^51x
> 2251799813685248
>
>
> Вс, 24 авг 2014, jprogramming написал(а):
>> I'm having a hard time getting a streaming version (one that works on parts 
>> of the data and then recombines.
>>
>> can only do 4096 bytes for 32 bit J, before it needs do call |.. but it is 
>> fast, and 64bit J can do 2^32 larger byte sizes (over 16TB)
>>
>>     timespacex ' (65521 | [: ({: , [: <: +/) +/\) 1 ,~ Randdata'
>> 0.187033 2.68442e8
>>
>> the block I have with a streaming (chunk) version is how to get these numbers
>>
>>    +/\ 1 87 105 107 105 112 101 100 105 97
>> 1 88 193 300 405 517 618 718 823 920
>>
>> from numbers in this shape:
>>    _3 [\ 1,  a. i. 'Wikipedia'
>>   1  87 105
>> 107 105 112
>> 101 100 105
>>  97   0   0
>>
>> , is not allowed.  The key point about 1 88 193 300 405 517 618 718 823 920 
>> is that it provides the basis for s2
>>
>>    <: +/ 1 88 193 300 405 517 618 718 823 920
>> 4582
>>
>>
>>
>> ----- Original Message -----
>> From: 'Pascal Jasmin' via Programming <[email protected]>
>> To: "[email protected]" <[email protected]>
>> Cc:
>> Sent: Sunday, August 24, 2014 11:48:23 AM
>> Subject: Re: [Jprogramming] Adler-32
>>
>> sorry that version doesn't work on a stream.  will post one later.
>>
>>
>> ----- Original Message -----
>> From: bill lam <[email protected]>
>> To: 'Pascal Jasmin' via Programming <[email protected]>
>> Cc:
>> Sent: Sunday, August 24, 2014 11:15:57 AM
>> Subject: Re: [Jprogramming] Adler-32
>>
>> Did you mean  _4]\  ?
>>
>> Anyway, from the wikipedia page,
>> B = (1 + D[1]) + (1 + D[1] + D[2]) + ... + (1 + D[1] + D[2] + ... + D[n]) 
>> (mod 65521)
>>    = n×D[1] + (n−1)×D[2] + (n−2)×D[3] + ... + D[n] + n (mod 65521)
>>
>> if a parallel version is asked for, probably the formula in the
>> 2nd line should be used, then the speed may increase directly
>> proportional to the number of core. A good problem for Nvidia
>> Jetson Tk1 board with 192 CUDA cores.
>>
>> * http://www.nvidia.com/object/jetson-tk1-embedded-dev-kit.html
>>
>>
>> NB. =============================
>> NB. Raul Miller
>> a32=: [: ({: (23 b.) 16&(33 b.)@{.) _1 0 + [: ((65521 | +)/ , {.) [: (65521 
>> | +)/\. 1 ,~ a. i. |.
>>
>> NB. from zlib rfc
>> adler32=: 3 : 0
>> 's1 s2'=. 1 0
>> for_a. a.i.y do.
>>   s1=. 65521 | s1 + a
>>   s2=. 65521 | s2 + s1
>> end.
>> ({: (23 b.) 16&(33 b.)@{.) s2,s1
>> )
>>
>> NB. formula from adler32 wikipedia
>> adler32a=: 3 : 0
>> dat=. a.i.y
>> s1=. (65521|+)/1,dat
>> s2=. (#dat) (65521|+) (65521|+)/ (}:i.(-1+#dat)) (65521|*) dat
>> ({: (23 b.) 16&(33 b.)@{.) s2,s1
>> )
>>
>> f=: 1e6{.a.
>>
>> smoutput adler32 f
>> smoutput adler32a f
>> smoutput a32 f
>>
>> smoutput timespacex'adler32 f'
>> smoutput timespacex'adler32a f'
>> smoutput timespacex'a32 f'
>> NB. =============================
>>
>> result on J32
>>
>> _121519079
>> _121519079
>> _121519079
>> 3.99429 4.19898e6
>> 1.03239 2.09744e7
>> 0.9106 8.38976e6
>>
>>
>> Вс, 24 авг 2014, jprogramming написал(а):
>> > an approach that should be fast and also allows resuming where you left 
>> > off, and avoiding many |
>> >
>> >    ([: ({: , [: <: +/) +/\) 1,  a. i. 'Wikipedia'
>> > 920 4582
>> >
>> > works even when you split up stream into say 32k bytes, and do the mod 
>> > transform between each stream, and the single number reduction at the end.
>> >
>> >     ([: ({: , [: <: +/) +/\)"1 ] _4 ]/   1,  a. i. 'Wikipedia'
>> > 920 4582
>> >
>> >
>> >
>> >
>> > ----- Original Message -----
>> > From: bill lam <[email protected]>
>> > To: Programming forum <[email protected]>
>> > Cc:
>> > Sent: Sunday, August 24, 2014 7:35:26 AM
>> > Subject: Re: [Jprogramming] Adler-32
>> >
>> > I realized that my loop translation solution was equally faulty in its
>> > final step. It should be
>> > ({: (23 b.) 16&(33 b.)@{.) s2,s1
>> >
>> >
>> >
>> > On Aug 24, 2014 6:40 PM, "bill lam" <[email protected]> wrote:
>> >
>> > > J32 should give a 32-bit integer so that the last train
>> > > should be
>> > >  [: ({: (23 b.) 16&(33 b.)@{.)
>> > > instead of
>> > >  65536#.
>> > >
>> > > This shoud be good enough. Thanks.
>> > >
>> > > Вс, 24 авг 2014, Jan-Pieter Jacobs написал(а):
>> > > > You could incorporate the modulo operation in the summation directly:
>> > > >
>> > > > As per Raul's solution, with the modulo addition:
>> > > >
>> > > > a32_2=: 65536#. _1 0+ [:((65521|+)/ , {.) [: (65521|+)/\. 1,~ a.i. |.
>> > > > NB. Test on 10M of random data
>> > > > Randdata =: a.{~ ?10000000$256
>> > > > timespacex 'r1=:a32_2 Randdata'
>> > > > 7.26671 2.68438e8
>> > > >
>> > > > timespacex 'r1=:adler32 Randdata'
>> > > > 36.2094 2.01333e8
>> > > >
>> > > > Jan-Pieter
>> > > >
>> > > > 2014-08-24 11:14 GMT+02:00 bill lam <[email protected]>:
>> > > > > Since s2 is quadratic, it overflows quickly in J32. However
>> > > > > ieee floating points have 52 bits mantissa, so it can still
>> > > > > give correct result for medium size data albeit overflowed.
>> > > > >
>> > > > > 9!:11[16
>> > > > > (3!:0;]) a32 1e6{.a.
>> > > > >
>> > > > > Вс, 24 авг 2014, Raul Miller написал(а):
>> > > > >> Hmm... actually there might be a floating point overflow problem for
>> > > > >> gigabyte length arguments. I do not have the patience, however, to
>> > > > >> find what (for example) adler32 1e9#{:a. is.
>> > > > >>
>> > > > >> Thanks,
>> > > > >>
>> > > > >> --
>> > > > >> Raul
>> > > > >>
>> > > > >>
>> > > > >>
>> > > > >> On Sun, Aug 24, 2014 at 4:37 AM, Raul Miller <[email protected]>
>> > > wrote:
>> > > > >> > I believe this is equivalent, and a bit faster:
>> > > > >> >
>> > > > >> > a32=: 65536#. 65521| _1 0+ [:(+/ , {.) 65521| [:+/\. 1,~ a.i. |.
>> > > > >> >
>> > > > >> >    a32 'Wikipedia'
>> > > > >> > 300286872
>> > > > >> >    a32 'The quick brown fox jumps over the lazy dog'
>> > > > >> > 1541148634
>> > > > >> >
>> > > > >> > Thanks,
>> > > > >> >
>> > > > >> > --
>> > > > >> > Raul
>> > > > >> >
>> > > > >> >
>> > > > >> > On Sun, Aug 24, 2014 at 4:05 AM, bill lam <[email protected]>
>> > > wrote:
>> > > > >> >> Does anyone have a J-ish solution to adler32 checksum
>> > > > >> >> and avoid overflow to floating points ?
>> > > > >> >>
>> > > > >> >> Adler-32 is composed of two sums accumulated per byte: s1 is
>> > > > >> >> the sum of all bytes, s2 is the sum of all s1 values.  Both sums
>> > > > >> >> are done modulo 65521. s1 is initialized to 1, s2 to zero.  The
>> > > > >> >> Adler-32 checksum is stored as s2*65536 + s1 in most-
>> > > > >> >> significant-byte first (network) order.
>> > > > >> >>
>> > > > >> >> * https://www.ietf.org/rfc/rfc1950.txt
>> > > > >> >> * http://en.wikipedia.org/wiki/Adler-32
>> > > > >> >>
>> > > > >> >> direct translation to J
>> > > > >> >>
>> > > > >> >> adler32=: 3 : 0
>> > > > >> >> 's1 s2'=. 1 0
>> > > > >> >> for_a. a.i.y do.
>> > > > >> >>   s1=. 65521 | s1 + a
>> > > > >> >>   s2=. 65521 | s2 + s1
>> > > > >> >> end.
>> > > > >> >> s1 + 16 (33 b.) s2
>> > > > >> >> )
>> > > > >> >>
>> > > > >> >> test data:
>> > > > >> >>
>> > > > >> >>  adler32 'Wikipedia'
>> > > > >> >> 300286872
>> > > > >> >>
>> > > > >> >>  adler32 'The quick brown fox jumps over the lazy dog'
>> > > > >> >> 1541148634
>> > > > >> >>
>> > > > >> >> --
>> > > > >> >> regards,
>> > > > >> >> ====================================================
>> > > > >> >> GPG key 1024D/4434BAB3 2008-08-24
>> > > > >> >> gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
>> > > > >> >> gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3
>> > > > >> >>
>> > > ----------------------------------------------------------------------
>> > > > >> >> For information about J forums see
>> > > http://www.jsoftware.com/forums.htm
>> > > > >> ----------------------------------------------------------------------
>> > > > >> For information about J forums see
>> > > http://www.jsoftware.com/forums.htm
>>
>>
>>
>>
>>
>>
>> > > > >
>> > > > > --
>> > > > > regards,
>> > > > > ====================================================
>> > > > > GPG key 1024D/4434BAB3 2008-08-24
>> > > > > gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
>> > > > > gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3
>> > > > > ----------------------------------------------------------------------
>> > > > > For information about J forums see 
>> > > > > http://www.jsoftware.com/forums.htm
>> > > > ----------------------------------------------------------------------
>> > > > For information about J forums see http://www.jsoftware.com/forums.htm



>
>
>
>> > >
>> > > --
>> > > regards,
>> > > ====================================================
>> > > GPG key 1024D/4434BAB3 2008-08-24
>> > > gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
>> > > gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3
>> > >
>> > ----------------------------------------------------------------------
>> > For information about J forums see http://www.jsoftware.com/forums.htm
>> > ----------------------------------------------------------------------
>> > For information about J forums see http://www.jsoftware.com/forums.htm
>>
>> --
>> regards,
>> ====================================================
>> GPG key 1024D/4434BAB3 2008-08-24
>> gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
>> gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3
>> ----------------------------------------------------------------------
>> For information about J forums see http://www.jsoftware.com/forums.htm
>> ----------------------------------------------------------------------
>> For information about J forums see http://www.jsoftware.com/forums.htm
>> ----------------------------------------------------------------------
>> For information about J forums see http://www.jsoftware.com/forums.htm
>
> --
> regards,
> ====================================================
> GPG key 1024D/4434BAB3 2008-08-24
> gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
> gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Re: [Jprogramming] Adler-32

Reply via email to