Re: [Jprogramming] Adler-32

bill lam Sun, 24 Aug 2014 20:04:51 -0700

Even if it overflows to floating point, it still gets 52-bit
preceison. In zlib, adler32 checksum is calcualted for each
block, which I guess is 32K or 64K. Therefore the issue of 
overflow might well be a non-issue in the context of zlib.


On J32

   9!:11[16

   255**:65536
1095216660480
   2^51x
2251799813685248


Вс, 24 авг 2014, jprogramming написал(а):
> I'm having a hard time getting a streaming version (one that works on parts 
> of the data and then recombines.
> 
> can only do 4096 bytes for 32 bit J, before it needs do call |.. but it is 
> fast, and 64bit J can do 2^32 larger byte sizes (over 16TB)
> 
>     timespacex ' (65521 | [: ({: , [: <: +/) +/\) 1 ,~ Randdata' 
> 0.187033 2.68442e8
> 
> the block I have with a streaming (chunk) version is how to get these numbers
> 
>    +/\ 1 87 105 107 105 112 101 100 105 97
> 1 88 193 300 405 517 618 718 823 920
> 
> from numbers in this shape:
>    _3 [\ 1,  a. i. 'Wikipedia' 
>   1  87 105 
> 107 105 112 
> 101 100 105 
>  97   0   0 
> 
> , is not allowed.  The key point about 1 88 193 300 405 517 618 718 823 920 
> is that it provides the basis for s2
> 
>    <: +/ 1 88 193 300 405 517 618 718 823 920 
> 4582 
> 
> 
> 
> ----- Original Message -----
> From: 'Pascal Jasmin' via Programming <[email protected]>
> To: "[email protected]" <[email protected]>
> Cc: 
> Sent: Sunday, August 24, 2014 11:48:23 AM
> Subject: Re: [Jprogramming] Adler-32
> 
> sorry that version doesn't work on a stream.  will post one later.
> 
> 
> ----- Original Message -----
> From: bill lam <[email protected]>
> To: 'Pascal Jasmin' via Programming <[email protected]>
> Cc: 
> Sent: Sunday, August 24, 2014 11:15:57 AM
> Subject: Re: [Jprogramming] Adler-32
> 
> Did you mean  _4]\  ?
> 
> Anyway, from the wikipedia page,
> B = (1 + D[1]) + (1 + D[1] + D[2]) + ... + (1 + D[1] + D[2] + ... + D[n]) 
> (mod 65521)
>    = n×D[1] + (n−1)×D[2] + (n−2)×D[3] + ... + D[n] + n (mod 65521)
> 
> if a parallel version is asked for, probably the formula in the
> 2nd line should be used, then the speed may increase directly
> proportional to the number of core. A good problem for Nvidia
> Jetson Tk1 board with 192 CUDA cores.
> 
> * http://www.nvidia.com/object/jetson-tk1-embedded-dev-kit.html
> 
> 
> NB. =============================
> NB. Raul Miller
> a32=: [: ({: (23 b.) 16&(33 b.)@{.) _1 0 + [: ((65521 | +)/ , {.) [: (65521 | 
> +)/\. 1 ,~ a. i. |.
> 
> NB. from zlib rfc
> adler32=: 3 : 0
> 's1 s2'=. 1 0
> for_a. a.i.y do.
>   s1=. 65521 | s1 + a
>   s2=. 65521 | s2 + s1
> end.
> ({: (23 b.) 16&(33 b.)@{.) s2,s1
> )
> 
> NB. formula from adler32 wikipedia
> adler32a=: 3 : 0
> dat=. a.i.y
> s1=. (65521|+)/1,dat
> s2=. (#dat) (65521|+) (65521|+)/ (}:i.(-1+#dat)) (65521|*) dat
> ({: (23 b.) 16&(33 b.)@{.) s2,s1
> )
> 
> f=: 1e6{.a.
> 
> smoutput adler32 f
> smoutput adler32a f
> smoutput a32 f
> 
> smoutput timespacex'adler32 f'
> smoutput timespacex'adler32a f'
> smoutput timespacex'a32 f'
> NB. =============================
> 
> result on J32
> 
> _121519079
> _121519079
> _121519079
> 3.99429 4.19898e6
> 1.03239 2.09744e7
> 0.9106 8.38976e6
> 
> 
> Вс, 24 авг 2014, jprogramming написал(а):
> > an approach that should be fast and also allows resuming where you left 
> > off, and avoiding many |
> > 
> >    ([: ({: , [: <: +/) +/\) 1,  a. i. 'Wikipedia' 
> > 920 4582 
> > 
> > works even when you split up stream into say 32k bytes, and do the mod 
> > transform between each stream, and the single number reduction at the end.
> > 
> >     ([: ({: , [: <: +/) +/\)"1 ] _4 ]/   1,  a. i. 'Wikipedia' 
> > 920 4582 
> > 
> > 
> > 
> > 
> > ----- Original Message -----
> > From: bill lam <[email protected]>
> > To: Programming forum <[email protected]>
> > Cc: 
> > Sent: Sunday, August 24, 2014 7:35:26 AM
> > Subject: Re: [Jprogramming] Adler-32
> > 
> > I realized that my loop translation solution was equally faulty in its
> > final step. It should be
> > ({: (23 b.) 16&(33 b.)@{.) s2,s1
> > 
> > 
> > 
> > On Aug 24, 2014 6:40 PM, "bill lam" <[email protected]> wrote:
> > 
> > > J32 should give a 32-bit integer so that the last train
> > > should be
> > >  [: ({: (23 b.) 16&(33 b.)@{.)
> > > instead of
> > >  65536#.
> > >
> > > This shoud be good enough. Thanks.
> > >
> > > Вс, 24 авг 2014, Jan-Pieter Jacobs написал(а):
> > > > You could incorporate the modulo operation in the summation directly:
> > > >
> > > > As per Raul's solution, with the modulo addition:
> > > >
> > > > a32_2=: 65536#. _1 0+ [:((65521|+)/ , {.) [: (65521|+)/\. 1,~ a.i. |.
> > > > NB. Test on 10M of random data
> > > > Randdata =: a.{~ ?10000000$256
> > > > timespacex 'r1=:a32_2 Randdata'
> > > > 7.26671 2.68438e8
> > > >
> > > > timespacex 'r1=:adler32 Randdata'
> > > > 36.2094 2.01333e8
> > > >
> > > > Jan-Pieter
> > > >
> > > > 2014-08-24 11:14 GMT+02:00 bill lam <[email protected]>:
> > > > > Since s2 is quadratic, it overflows quickly in J32. However
> > > > > ieee floating points have 52 bits mantissa, so it can still
> > > > > give correct result for medium size data albeit overflowed.
> > > > >
> > > > > 9!:11[16
> > > > > (3!:0;]) a32 1e6{.a.
> > > > >
> > > > > Вс, 24 авг 2014, Raul Miller написал(а):
> > > > >> Hmm... actually there might be a floating point overflow problem for
> > > > >> gigabyte length arguments. I do not have the patience, however, to
> > > > >> find what (for example) adler32 1e9#{:a. is.
> > > > >>
> > > > >> Thanks,
> > > > >>
> > > > >> --
> > > > >> Raul
> > > > >>
> > > > >>
> > > > >>
> > > > >> On Sun, Aug 24, 2014 at 4:37 AM, Raul Miller <[email protected]>
> > > wrote:
> > > > >> > I believe this is equivalent, and a bit faster:
> > > > >> >
> > > > >> > a32=: 65536#. 65521| _1 0+ [:(+/ , {.) 65521| [:+/\. 1,~ a.i. |.
> > > > >> >
> > > > >> >    a32 'Wikipedia'
> > > > >> > 300286872
> > > > >> >    a32 'The quick brown fox jumps over the lazy dog'
> > > > >> > 1541148634
> > > > >> >
> > > > >> > Thanks,
> > > > >> >
> > > > >> > --
> > > > >> > Raul
> > > > >> >
> > > > >> >
> > > > >> > On Sun, Aug 24, 2014 at 4:05 AM, bill lam <[email protected]>
> > > wrote:
> > > > >> >> Does anyone have a J-ish solution to adler32 checksum
> > > > >> >> and avoid overflow to floating points ?
> > > > >> >>
> > > > >> >> Adler-32 is composed of two sums accumulated per byte: s1 is
> > > > >> >> the sum of all bytes, s2 is the sum of all s1 values.  Both sums
> > > > >> >> are done modulo 65521. s1 is initialized to 1, s2 to zero.  The
> > > > >> >> Adler-32 checksum is stored as s2*65536 + s1 in most-
> > > > >> >> significant-byte first (network) order.
> > > > >> >>
> > > > >> >> * https://www.ietf.org/rfc/rfc1950.txt
> > > > >> >> * http://en.wikipedia.org/wiki/Adler-32
> > > > >> >>
> > > > >> >> direct translation to J
> > > > >> >>
> > > > >> >> adler32=: 3 : 0
> > > > >> >> 's1 s2'=. 1 0
> > > > >> >> for_a. a.i.y do.
> > > > >> >>   s1=. 65521 | s1 + a
> > > > >> >>   s2=. 65521 | s2 + s1
> > > > >> >> end.
> > > > >> >> s1 + 16 (33 b.) s2
> > > > >> >> )
> > > > >> >>
> > > > >> >> test data:
> > > > >> >>
> > > > >> >>  adler32 'Wikipedia'
> > > > >> >> 300286872
> > > > >> >>
> > > > >> >>  adler32 'The quick brown fox jumps over the lazy dog'
> > > > >> >> 1541148634
> > > > >> >>
> > > > >> >> --
> > > > >> >> regards,
> > > > >> >> ====================================================
> > > > >> >> GPG key 1024D/4434BAB3 2008-08-24
> > > > >> >> gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
> > > > >> >> gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3
> > > > >> >>
> > > ----------------------------------------------------------------------
> > > > >> >> For information about J forums see
> > > http://www.jsoftware.com/forums.htm
> > > > >> ----------------------------------------------------------------------
> > > > >> For information about J forums see
> > > http://www.jsoftware.com/forums.htm
> 
> 
> 
> 
> 
> 
> > > > >
> > > > > --
> > > > > regards,
> > > > > ====================================================
> > > > > GPG key 1024D/4434BAB3 2008-08-24
> > > > > gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
> > > > > gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3
> > > > > ----------------------------------------------------------------------
> > > > > For information about J forums see http://www.jsoftware.com/forums.htm
> > > > ----------------------------------------------------------------------
> > > > For information about J forums see http://www.jsoftware.com/forums.htm
> > >
> > > --
> > > regards,
> > > ====================================================
> > > GPG key 1024D/4434BAB3 2008-08-24
> > > gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
> > > gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3
> > >
> > ----------------------------------------------------------------------
> > For information about J forums see http://www.jsoftware.com/forums.htm
> > ----------------------------------------------------------------------
> > For information about J forums see http://www.jsoftware.com/forums.htm
> 
> -- 
> regards,
> ====================================================
> GPG key 1024D/4434BAB3 2008-08-24
> gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
> gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm

-- 
regards,
====================================================
GPG key 1024D/4434BAB3 2008-08-24
gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Re: [Jprogramming] Adler-32

Reply via email to