Re: [Jprogramming] zlib huffman coding

bill lam Thu, 11 Sep 2014 18:08:23 -0700

It seems everyone use huffman code in its own way, at least huffman coding
in jpeg is different from that in deflate.


The intent to is to replace bmp with png as the default image format in jal
addons for j803.  This is already almost fully done. With dynamic huffman
coding, png images can achieve better compression ratio. But even without
it, png files using fixed huffman coding can already compress about 90% to
99% for images output from viewmat.

The addon will use stock zlib binaries when they are  available, otherwise
falls back to J implementation.

This is independent of jqt so that it can work for jconsole and jhs.

I don't know the details but I think jhs can also take advantage to support
gzip/deflate encoding for transfer.
On Sep 12, 2014 8:33 AM, "Joe Bogner" <[email protected]> wrote:

> The bit widths are calculated from the huffman tree
>
> See
>
>
> http://stackoverflow.com/questions/759707/efficient-way-of-storing-huffman-tree
>
>
> http://www.siggraph.org/education/materials/HyperGraph/video/mpeg/mpegfaq/huffman_tutorial.html
>
> The timing is interesting considering we were talking about trees the other
> day:
>
> http://jsoftware.2058.n7.nabble.com/Ragged-Array-Shapes-are-Trees-td63207.html
>
> I was thinking to myself then how I hadn't used trees more than a few times
> in 18 years of programming.
>
> I am not sure how to apply your code to the problem.  I also am not
> completely sure what problem we are solving.  If it is creating a
> standalone J deflate implementation or PNG compression it may be a tall
> order. I would be curious why not just interface to a C library like what
> is done in the image3 addon:
> http://www.jsoftware.com/jwiki/Addons/media/image3
> On Sep 11, 2014 6:27 PM, "Raul Miller" <[email protected]> wrote:
>
> > Here's the code I came up with, with Bill's help:
> >
> > bl_count=:3 :0 NB. y is result of freqs
> >   0,}.<:#/.~(,~ [: i. 1 + >./)y
> > )
> >
> > start_vals=: +:@+/\.&.|.@}:@,~&0
> >
> > find_codes=:3 :0 NB. y is result of freqs
> >  b=. bl_count y
> >  v=. start_vals b
> >  n=. /:~ ~.y-.0
> >  o=. ;({./.~ /:~ (</. i.@#)) y
> >  c=. ;<"1&.>n (([#2:) #: ])&.> (*b)#v+&.>i.&.>b
> >  c /: o
> > )
> >
> > An alternate version of the result from find_codes would be given by:
> >
> > def_code=:3 :0
> >  b=. bl_count y
> >  v=. start_vals b
> >  n=. /:~ ~.y-.0
> >  o=. ;({./.~ /:~ (</. i.@#)) y
> >  c=. ;n,.&.>(*b)#v+&.>i.&.>b
> >  (,. i.@#)c /: o
> > )
> >
> > The argument to find_codes or def_code is the bit widths for each symbol.
> >
> > I have not been able to figure out, from rfc 1951, how the bit widths
> > are calculated.
> >
> > Thanks,
> >
> > --
> > Raul
> >
> >
> >
> > On Thu, Sep 11, 2014 at 4:47 PM, Joe Bogner <[email protected]> wrote:
> > > bill, I'd be interested in a solution but I don't think I can
> > > contribute any more on this. I played with
> > > https://code.google.com/p/miniz/ and became even more convinced of the
> > > complexity. It seems as though the compressor can decide whether to
> > > include the dictionary code table or not -- likely based on the size
> > > of the table.
> > >
> > >
> > > http://tools.ietf.org/html/rfc1950
> > >
> > > A preset dictionary is specially useful to compress short input
> > >       sequences. The compressor can take advantage of the dictionary
> > >       context to encode the input in a more compact manner.
> > >
> > >
> > > More links for anyone who is following and cares to go down the rabbit
> > hole too:
> > >
> > > http://en.wikipedia.org/wiki/Canonical_Huffman_code
> > >
> > >
> >
> http://stackoverflow.com/questions/759707/efficient-way-of-storing-huffman-tree
> > >
> > >
> > >
> > > On Thu, Sep 11, 2014 at 1:28 PM, bill lam <[email protected]> wrote:
> > >> This codes seemed invalid.
> > >>
> > >> 1 is a prefix of 11 which is a prefix of 111. Suppose there
> > >> is a bit pattern of 1 1 , it is ambiguous to mean
> > >> [68,'1']  [68,'1']
> > >> or [65,'11']
> > >>
> > >> The huffman code in rfc is canonical meaning there is exactly one
> > >> possible huffman codes for a given bit length vector. This is
> > >> important because the huffman code table itself will not be
> > >> stored inside the deflate stream. The decoder only gets the bit
> > >> length vector, if encoder and decoder use different huffman code
> > >> for the same bit length vectors, it will not work.
> > >>
> > >> Чт, 11 сен 2014, Joe Bogner написал(а):
> > >>> Ignore the pako.js example output... It was just outputting the
> binary
> > >>> representation of A-Z, not the huffman code
> > >>>
> > >>> This is what I meant to send
> > >>>
> > >>> For ABCD:
> > >>>
> > >>> [65,'11'],
> > >>> [66,'0'],
> > >>> [67,'10'],
> > >>> [68,'1'],
> > >>> [256,'111']
> > >>>
> > >>> It still doesn't seem to be sorting correctly lexographically, but
> I'm
> > >>> not really in my comfort zone of understanding:
> > >>>
> > >>> The RFC has this instead:
> > >>>
> > >>> Symbol  Code
> > >>>             ------  ----
> > >>>             A       10
> > >>>             B       0
> > >>>             C       110
> > >>>             D       111
> > >>>
> > >>> I don't really know if it has to match the RFC or if each
> > >>> implementation is able to do its own thing as long since it includes
> > >>> the distance/reverse lookup table (whatever it's called).
> > >>>
> > >>> FYI
> > >>>
> > >>>
> > >>> This is where I inserted my code:
> > >>>
> > >>> /*
> >
> ===========================================================================
> > >>>  * Generate the codes for a given tree and bit counts (which need not
> > be
> > >>>  * optimal).
> > >>>  * IN assertion: the array bl_count contains the bit length
> statistics
> > for
> > >>>  * the given tree and the field len is set for all tree elements.
> > >>>  * OUT assertion: the field code is set for all tree elements of non
> > >>>  *     zero code length.
> > >>>  */
> > >>> function gen_codes(tree, max_code, bl_count)
> > >>> //    ct_data *tree;             /* the tree to decorate */
> > >>> //    int max_code;              /* largest code with non zero
> > frequency */
> > >>> //    ushf *bl_count;            /* number of codes at each bit
> length
> > */
> > >>> {
> > >>>   var next_code = new Array(MAX_BITS+1); /* next code value for each
> > >>> bit length */
> > >>>   var code = 0;              /* running code value */
> > >>>   var bits;                  /* bit index */
> > >>>   var n;                     /* code index */
> > >>>
> > >>>   /* The distribution counts are first used to generate the code
> values
> > >>>    * without bit reversal.
> > >>>    */
> > >>>   for (bits = 1; bits <= MAX_BITS; bits++) {
> > >>>     next_code[bits] = code = (code + bl_count[bits-1]) << 1;
> > >>>   }
> > >>>   /* Check that the bit counts in bl_count are consistent. The last
> > code
> > >>>    * must be all ones.
> > >>>    */
> > >>>   //Assert (code + bl_count[MAX_BITS]-1 == (1<<MAX_BITS)-1,
> > >>>   //        "inconsistent bit counts");
> > >>>   //Tracev((stderr,"\ngen_codes: max_code %d ", max_code));
> > >>>
> > >>>   for (n = 0;  n <= max_code; n++) {
> > >>>     var len = tree[n*2 + 1]/*.Len*/;
> > >>>     if (len === 0) { continue; }
> > >>>     /* Now reverse the bits */
> > >>>     tree[n*2]/*.Code*/ = bi_reverse(next_code[len]++, len);
> > >>>
> > >>>       if (tree!=static_ltree) {
> > >>>           var v = tree[n*2];
> > >>>           console.log('[' + n + ",'" + v.toString(2) + "'],");
> > >>>       }
> > >>>     //Tracecv(tree != static_ltree, (stderr,"\nn %3d %c l %2d c %4x
> > (%x) ",
> > >>>     //     n, (isgraph(n) ? n : ' '), len, tree[n].Code,
> > next_code[len]-1));
> > >>>   }
> > >>>
> > >>> }
> > >>>
> > >>> On Thu, Sep 11, 2014 at 11:55 AM, Joe Bogner <[email protected]>
> > wrote:
> > >>> > I think the prefix coding looks OK, but the 2 rules does not:
> > >>> >
> > >>> > I modified the code[1] to allow passing in a string and outputting
> > the codes
> > >>> >
> > >>> > C:\temp>deflate ABCDEFGHIJKLMONPQRSTUVWXYZ
> > >>> > code 65 : 0 00000000000000000000000000000000
> > >>> > code 66 : 6 00000000000000000000000000000110
> > >>> > code 67 : 8 00000000000000000000000000001000
> > >>> > code 68 : 4 00000000000000000000000000000100
> > >>> > code 69 : 22 00000000000000000000000000010110
> > >>> > code 70 : 14 00000000000000000000000000001110
> > >>> > code 71 : 30 00000000000000000000000000011110
> > >>> > code 72 : 1 00000000000000000000000000000001
> > >>> > code 73 : 17 00000000000000000000000000010001
> > >>> > code 74 : 12 00000000000000000000000000001100
> > >>> > code 75 : 9 00000000000000000000000000001001
> > >>> > code 76 : 25 00000000000000000000000000011001
> > >>> > code 77 : 5 00000000000000000000000000000101
> > >>> > code 78 : 21 00000000000000000000000000010101
> > >>> > code 79 : 13 00000000000000000000000000001101
> > >>> > code 80 : 29 00000000000000000000000000011101
> > >>> > code 81 : 3 00000000000000000000000000000011
> > >>> > code 82 : 19 00000000000000000000000000010011
> > >>> > code 83 : 11 00000000000000000000000000001011
> > >>> > code 84 : 27 00000000000000000000000000011011
> > >>> > code 85 : 7 00000000000000000000000000000111
> > >>> > code 86 : 23 00000000000000000000000000010111
> > >>> > code 87 : 15 00000000000000000000000000001111
> > >>> > code 88 : 31 00000000000000000000000000011111
> > >>> > code 89 : 2 00000000000000000000000000000010
> > >>> > code 90 : 10 00000000000000000000000000001010
> > >>> >
> > >>> >
> > >>> > I think it violates the consecutive rule... Each letter has the
> same
> > >>> > frequency. ABCD have the same bit length. The order is off:
> > >>> >
> > >>> > If I sort it lexographically using javascript:
> > >>> >
> > >>> > JSON.stringify([['a','00000000000000000000000000000000'],
> > >>> > ['b','00000000000000000000000000000110'],
> > >>> > ['c','00000000000000000000000000001000'],
> > >>> > ['d','00000000000000000000000000000100']].sort(function(x,y) {
> return
> > >>> > x[1] - y[1] }))
> > >>> >
> > >>> >
> >
> "[["a","00000000000000000000000000000000"],["d","00000000000000000000000000000100"],["b","00000000000000000000000000000110"],["c","00000000000000000000000000001000"]]"
> > >>> >
> > >>> > As you can see, the order comes out a,d,b,c
> > >>> >
> > >>> > I played around with a javascript implementation, pako[2]. It seems
> > to
> > >>> > work correctly:
> > >>> >
> > >>> > As you can see, it sorts lexographically
> > >>> >
> > >>> > JSON.stringify([[65,'1000001'],
> > >>> > [66,'1000010'],
> > >>> > [67,'1000011'],
> > >>> > [68,'1000100'],
> > >>> > [69,'1000101'],
> > >>> > [70,'1000110'],
> > >>> > [71,'1000111'],
> > >>> > [72,'1001000'],
> > >>> > [73,'1001001'],
> > >>> > [74,'1001010'],
> > >>> > [75,'1001011'],
> > >>> > [76,'1001100'],
> > >>> > [77,'1001101'],
> > >>> > [78,'1001110'],
> > >>> > [79,'1001111'],
> > >>> > [80,'1010000'],
> > >>> > [81,'1010001'],
> > >>> > [82,'1010010'],
> > >>> > [83,'1010011'],
> > >>> > [84,'1010100'],
> > >>> > [85,'1010101'],
> > >>> > [86,'1010110'],
> > >>> > [87,'1010111'],
> > >>> > [88,'1011000'],
> > >>> > [89,'1011001'],
> > >>> > [90,'1011010']].sort(function(x,y) { return x[1] - y[1] }))
> > >>> >
> > >>> >
> >
> "[[65,"1000001"],[66,"1000010"],[67,"1000011"],[68,"1000100"],[69,"1000101"],[70,"1000110"],[71,"1000111"],[72,"1001000"],[73,"1001001"],[74,"1001010"],[75,"1001011"],[76,"1001100"],[77,"1001101"],[78,"1001110"],[79,"1001111"],[80,"1010000"],[81,"1010001"],[82,"1010010"],[83,"1010011"],[84,"1010100"],[85,"1010101"],[86,"1010110"],[87,"1010111"],[88,"1011000"],[89,"1011001"],[90,"1011010"]]"
> > >>> >
> > >>> > All the values are sorted correctly.
> > >>> >
> > >>> > Here it is with the same ABCD example:
> > >>> >
> > >>> > var pako = require('pako');
> > >>> > var binaryString = pako.deflate('ABCD', { to: 'string' });
> > >>> > console.log(binaryString);
> > >>> > var restored = pako.inflate(binaryString, { to: 'string' });
> > >>> > console.log(restored);
> > >>> >
> > >>> > It successfully deflates and inflates itself
> > >>> >
> > >>> > x?♣A☺☺   ? mcÿ7♣A♫☻?☺♂
> > >>> > ABCD
> > >>> >
> > >>> >
> > >>> > Hope this helps...
> > >>> >
> > >>> > [1] -
> > https://gist.github.com/joebo/a3c2932f0e5a7a0c3f07#file-deflate-c-L2613
> > >>> > [2] - https://rawgit.com/nodeca/pako/master/dist/pako.js
> > >>> >
> > >>> > On Thu, Sep 11, 2014 at 11:33 AM, bill lam <[email protected]>
> > wrote:
> > >>> >> This is strange since every author must had decode its own encoded
> > >>> >> data as a smoke test.
> > >>> >>
> > >>> >> Did you test if huffman code or bit lengths it produced was
> > >>> >> correct or not, ie it is a prefix coding and it satisfy the 2
> > >>> >> rules in rfc.
> > >>> >>
> > >>> >> Чт, 11 сен 2014, Joe Bogner написал(а):
> > >>> >>> unfortunately the dynamic coding in the putty fork doesn't seem
> to
> > work:
> > >>> >>>
> > >>> >>> deflate -c deflate.c > out
> > >>> >>> deflate -d out
> > >>> >>>
> > >>> >>> decoding error: incorrect data checksum
> > >>> >>>
> > >>> >>>
> > >>> >>> it works fine with static tables
> > >>> >>>
> > >>> >>> C:\temp>echo ABCD > ABCD
> > >>> >>>
> > >>> >>> C:\temp>deflate -c ABCD > out
> > >>> >>>
> > >>> >>> C:\temp>deflate -d out
> > >>> >>> ABCD
> > >>> >>>
> > >>> >>> I added some debugging code to determine that deflating deflate.c
> > >>> >>> would be a dynamic table...  Assuming it's broke, I probably
> > wouldn't
> > >>> >>> use it as a reference implementation after all
> > >>> >>>
> > >>> >>> On Thu, Sep 11, 2014 at 3:45 AM, bill lam <[email protected]>
> > wrote:
> > >>> >>> > the frequencies (guessing from bit lengths) should be something
> > like 2 3 1 1
> > >>> >>> >   (2 3 1 1) hcodes 'ABCD'
> > >>> >>> >
> > >>> >>> > the hard part is the inverse problem: how to get the huffman
> > code with
> > >>> >>> > prior knowing the bits for each symbol.  Your pointer to the
> > putty
> > >>> >>> > fork looks like helpful.  The comment is in lines 861 to 914,
> > the code
> > >>> >>> > itself in line 915 to 964. Do you know how to express it in J?
> > >>> >>> > Thanks.
> > >>> >>> >
> > >>> >>> > On Thu, Sep 11, 2014 at 2:57 PM, Joe Bogner <
> [email protected]>
> > wrote:
> > >>> >>> >> Here a few other links ... after reading through the RFC. Not
> > sure if
> > >>> >>> >> they help, but just sharing from my own research into
> assisting
> > on
> > >>> >>> >> this topic
> > >>> >>> >>
> > >>> >>> >> https://github.com/evegard/pngview/blob/master/huffman.c#L54
> > >>> >>> >>
> > >>> >>> >> And a fork of the putty version with dynamic huffman coding:
> > >>> >>> >>
> >
> http://rc.quest.com/viewvc/putty/trunk/halibut/deflate.c?diff_format=s&revision=2&view=markup
> > >>> >>> >>
> > >>> >>> >> Or just generally googling some of the code from the RFC:
> > >>> >>> >>
> >
> https://www.google.com/search?q=next_code%5Blen%5D%2B%2B%3B&oq=next_code%5Blen%5D%2B%2B%3B&aqs=chrome..69i57.387j0j7&sourceid=chrome&es_sm=93&ie=UTF-8#q=next_code%5Blen%5D%2B%2B%3B&start=20
> > >>> >>> >>
> > >>> >>> >>
> > >>> >>> >> Using the code from
> > >>> >>> >> http://www.jsoftware.com/jwiki/Essays/Huffman%20Coding, I got
> > stuck
> > >>> >>> >> trying to match a simple example to the binary tree in the
> RFC:
> > >>> >>> >>
> > >>> >>> >> From the RFC:
> > >>> >>> >>
> > >>> >>> >>                   /\              Symbol    Code
> > >>> >>> >>                          0  1             ------    ----
> > >>> >>> >>                         /    \                A      00
> > >>> >>> >>                        /\     B               B       1
> > >>> >>> >>                       0  1                    C     011
> > >>> >>> >>                      /    \                   D     010
> > >>> >>> >>                     A     /\
> > >>> >>> >>                          0  1
> > >>> >>> >>                         /    \
> > >>> >>> >>                        D      C
> > >>> >>> >>
> > >>> >>> >>
> > >>> >>> >>
> > >>> >>> >>    (4#1) hcodes 'ABCD'
> > >>> >>> >> ┌───┬───┬───┬───┐
> > >>> >>> >> │0 0│0 1│1 0│1 1│
> > >>> >>> >> └───┴───┴───┴───┘
> > >>> >>> >>
> > >>> >>> >> Per the RFC, ideally that should match this?
> > '00';'1';'011';'010'
> > >>> >>> >>
> > >>> >>> >>
> > >>> >>> >> From there, it seems like a pretty straightforward exercise to
> > >>> >>> >> transliterate the C code from the RFC into J code to recode
> the
> > >>> >>> >> example to:
> > >>> >>> >>
> > >>> >>> >>
> > >>> >>> >>             Symbol  Code
> > >>> >>> >>             ------  ----
> > >>> >>> >>             A       10
> > >>> >>> >>             B       0
> > >>> >>> >>             C       110
> > >>> >>> >>             D       111
> > >>> >>> >>
> > >>> >>> >>
> > >>> >>> >> I would probably start with a looping construct like what's in
> > the RFC
> > >>> >>> >> and then figure out a more J way to do it, but first I would
> > need to
> > >>> >>> >> figure out how to create the binary tree in that initial
> format.
> > >>> >>> >>
> > >>> >>> >> On Wed, Sep 10, 2014 at 7:41 PM, bill lam <
> [email protected]>
> > wrote:
> > >>> >>> >>> Thanks Joe,
> > >>> >>> >>> putty only use zlib static huffman for encoding so that it
> > does not build
> > >>> >>> >>> any huffman dictionary table.
> > >>> >>> >>>
> > >>> >>> >>> The zlib static huffman code does not care about individual
> > symbol's
> > >>> >>> >>> frequency, it just encode 0 to 286 into bits, see section
> > 3.2.6.
> > >>> >>> >>>  On Sep 11, 2014 1:26 AM, "Joe Bogner" <[email protected]>
> > wrote:
> > >>> >>> >>>
> > >>> >>> >>>> You've already likely considered this, but if it were me I
> > would compare
> > >>> >>> >>>> results to a working implementation. The one from putty
> seems
> > pretty clean
> > >>> >>> >>>> and standalone:
> > >>> >>> >>>>
> >
> https://raw.githubusercontent.com/grumpydev/PortablePuTTY/master/SSHZLIB.C
> > >>> >>> >>>> . I was able to compile it on windows no problem and I
> assume
> > it'd be fine
> > >>> >>> >>>> on linux as well.
> > >>> >>> >>>>
> > >>> >>> >>>> On Wed, Sep 10, 2014 at 1:00 PM, Raul Miller <
> > [email protected]>
> > >>> >>> >>>> wrote:
> > >>> >>> >>>>
> > >>> >>> >>>> > I think the use of the term "consecutive" rather than
> > "sequential" is
> > >>> >>> >>>> > telling.
> > >>> >>> >>>> >
> > >>> >>> >>>> > The described algorithm is: compute the huffman code
> > lengths:
> > >>> >>> >>>> >    #@>F1 hcodes A1
> > >>> >>> >>>> > 1 3 7 7 6 6 6 6 6 6 6 2
> > >>> >>> >>>> >
> > >>> >>> >>>> > Then assign ascending huffman codes first in length order
> > and then
> > >>> >>> >>>> > within codes of the same length.
> > >>> >>> >>>> >
> > >>> >>> >>>> > Taken literally, that might be something like this:
> > >>> >>> >>>> >
> > >>> >>> >>>> > H=: 4 :0
> > >>> >>> >>>> >   L=.#@> x hcodes y
> > >>> >>> >>>> >   U=.~.L
> > >>> >>> >>>> >   ;<@(({.{.U e.~i.&.<:@{.)<@:+"1-@{.{."1 #:@i.@#)/.~L
> > >>> >>> >>>> > )
> > >>> >>> >>>> >
> > >>> >>> >>>> >    ":@>F1 H A1
> > >>> >>> >>>> > 0
> > >>> >>> >>>> > 1 1 0
> > >>> >>> >>>> > 1 1 1 0 0 1 0
> > >>> >>> >>>> > 1 1 1 0 0 1 1
> > >>> >>> >>>> > 1 1 1 0 0 0
> > >>> >>> >>>> > 1 1 1 0 0 1
> > >>> >>> >>>> > 1 1 1 0 1 0
> > >>> >>> >>>> > 1 1 1 0 1 1
> > >>> >>> >>>> > 1 1 1 1 0 0
> > >>> >>> >>>> > 1 1 1 1 0 1
> > >>> >>> >>>> > 1 1 1 1 1 0
> > >>> >>> >>>> > 1 0
> > >>> >>> >>>> >
> > >>> >>> >>>> > But is this correct? Is it actually safe to leave the
> > results like
> > >>> >>> >>>> > this - with all codes of the same length being consecutive
> > to each
> > >>> >>> >>>> > other?
> > >>> >>> >>>> >
> > >>> >>> >>>> >    F (hcodes -:&:(#@>) H) A
> > >>> >>> >>>> > 0
> > >>> >>> >>>> >
> > >>> >>> >>>> > No.
> > >>> >>> >>>> >
> > >>> >>> >>>> > So... "consecutive" must refer only to the values used and
> > not their
> > >>> >>> >>>> > order within the result.
> > >>> >>> >>>> >
> > >>> >>> >>>> > Perhaps something like this:
> > >>> >>> >>>> >
> > >>> >>> >>>> > deflatecodes=:4 :0
> > >>> >>> >>>> >   L=.#@> x hcodes y
> > >>> >>> >>>> >   U=.~.L
> > >>> >>> >>>> >   R=. ;<@(({.{.U e.~i.&.<:@{.)<@:+"1-@{.{."1 #:@i.@#)/.~L
> > >>> >>> >>>> >   R/:;(</. i.@#)L
> > >>> >>> >>>> > )
> > >>> >>> >>>> >
> > >>> >>> >>>> >    F (hcodes -:&:(#@>) deflatecodes)  A
> > >>> >>> >>>> > 1
> > >>> >>> >>>> >
> > >>> >>> >>>> > There should be a better way of doing this, but this
> should
> > at least
> > >>> >>> >>>> > get you started.
> > >>> >>> >>>> >
> > >>> >>> >>>> > Thanks,
> > >>> >>> >>>> >
> > >>> >>> >>>> > --
> > >>> >>> >>>> > Raul
> > >>> >>> >>>> >
> > >>> >>> >>>> >
> > >>> >>> >>>> > On Wed, Sep 10, 2014 at 10:45 AM, bill lam <
> > [email protected]> wrote:
> > >>> >>> >>>> > > For huffman coding used in zlib:
> > >>> >>> >>>> > > https://www.ietf.org/rfc/rfc1951.txt section 3.2.2.
> > >>> >>> >>>> > >
> > >>> >>> >>>> > >  The Huffman codes used for each alphabet in the
> "deflate"
> > >>> >>> >>>> > >  format have two additional rules:
> > >>> >>> >>>> > >
> > >>> >>> >>>> > >   * All codes of a given bit length have
> lexicographically
> > >>> >>> >>>> > >   consecutive values, in the same order as the symbols
> > >>> >>> >>>> > >   they represent;
> > >>> >>> >>>> > >
> > >>> >>> >>>> > >   * Shorter codes lexicographically precede longer
> codes.
> > >>> >>> >>>> > > I tried jwiki hcodes in
> > >>> >>> >>>> > > I try Roger's essay
> > >>> >>> >>>> > > http://www.jsoftware.com/jwiki/Essays/Huffman%20Coding
> > >>> >>> >>>> > >
> > >>> >>> >>>> > > hc=: 4 : 0
> > >>> >>> >>>> > > if. 1=#x do. y
> > >>> >>> >>>> > > else. ((i{x),+/j{x) hc (i{y),<j{y [ i=. (i.#x) -. j=.
> > 2{./:x end.
> > >>> >>> >>>> > > )
> > >>> >>> >>>> > >
> > >>> >>> >>>> > > hcodes=: 4 : 0
> > >>> >>> >>>> > > assert. x -:&$ y           NB. weights and words have
> > same shape
> > >>> >>> >>>> > > assert. (0<:x) *. 1=#$x    NB. weights are non-negative
> > >>> >>> >>>> > > assert. 1 >: L.y           NB. words are boxed not more
> > than once
> > >>> >>> >>>> > > w=. ,&.> y                 NB. standardized words
> > >>> >>> >>>> > > assert. w -: ~.w           NB. words are unique
> > >>> >>> >>>> > > t=. 0 {:: x hc w           NB. minimal weight binary
> tree
> > >>> >>> >>>> > > ((< S: 0 t) i. w) { <@(1&=)@; S: 1 {:: t
> > >>> >>> >>>> > > )
> > >>> >>> >>>> > >
> > >>> >>> >>>> > > but the coding produced is malformed for zlib. eg,
> > >>> >>> >>>> > > this is what I ran into trouble
> > >>> >>> >>>> > >
> > >>> >>> >>>> > > f1=: 1 256 17 1 1 9 1
> > >>> >>> >>>> > > f2=: 2 1 0 1 255 0 1536
> > >>> >>> >>>> > > F=: ,/(f1#f2)
> > >>> >>> >>>> > > A=: i.286
> > >>> >>> >>>> > >
> > >>> >>> >>>> > > F hcodes A
> > >>> >>> >>>> > >
> > >>> >>> >>>> > > Or a shorter example
> > >>> >>> >>>> > >
> > >>> >>> >>>> > > A1=: i.12
> > >>> >>> >>>> > > F1=: 2 1 0 0 0 0 0 0 0 0 0 1
> > >>> >>> >>>> > >
> > >>> >>> >>>> > > F1 hcodes A1
> > >>> >>> >>>> > >
> > >>> >>> >>>> > > Any idea?
> > >>> >>> >>>> > >
> > >>> >>> >>>> > > --
> > >>> >>> >>>> > > regards,
> > >>> >>> >>>> > > ====================================================
> > >>> >>> >>>> > > GPG key 1024D/4434BAB3 2008-08-24
> > >>> >>> >>>> > > gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
> > >>> >>> >>>> > > gpg --keyserver subkeys.pgp.net --armor --export
> 4434BAB3
> > >>> >>> >>>> > >
> > ----------------------------------------------------------------------
> > >>> >>> >>>> > > For information about J forums see
> > http://www.jsoftware.com/forums.htm
> > >>> >>> >>>> >
> > ----------------------------------------------------------------------
> > >>> >>> >>>> > For information about J forums see
> > http://www.jsoftware.com/forums.htm
> > >>> >>> >>>> >
> > >>> >>> >>>>
> > ----------------------------------------------------------------------
> > >>> >>> >>>> For information about J forums see
> > http://www.jsoftware.com/forums.htm
> > >>> >>> >>>>
> > >>> >>> >>>
> > ----------------------------------------------------------------------
> > >>> >>> >>> For information about J forums see
> > http://www.jsoftware.com/forums.htm
> > >>> >>> >>
> > ----------------------------------------------------------------------
> > >>> >>> >> For information about J forums see
> > http://www.jsoftware.com/forums.htm
> > >>> >>> >
> > ----------------------------------------------------------------------
> > >>> >>> > For information about J forums see
> > http://www.jsoftware.com/forums.htm
> > >>> >>>
> > ----------------------------------------------------------------------
> > >>> >>> For information about J forums see
> > http://www.jsoftware.com/forums.htm
> > >>> >>
> > >>> >> --
> > >>> >> regards,
> > >>> >> ====================================================
> > >>> >> GPG key 1024D/4434BAB3 2008-08-24
> > >>> >> gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
> > >>> >> gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3
> > >>> >>
> > ----------------------------------------------------------------------
> > >>> >> For information about J forums see
> > http://www.jsoftware.com/forums.htm
> > >>>
> ----------------------------------------------------------------------
> > >>> For information about J forums see
> http://www.jsoftware.com/forums.htm
> > >>
> > >> --
> > >> regards,
> > >> ====================================================
> > >> GPG key 1024D/4434BAB3 2008-08-24
> > >> gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
> > >> gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3
> > >> ----------------------------------------------------------------------
> > >> For information about J forums see
> http://www.jsoftware.com/forums.htm
> > > ----------------------------------------------------------------------
> > > For information about J forums see http://www.jsoftware.com/forums.htm
> > ----------------------------------------------------------------------
> > For information about J forums see http://www.jsoftware.com/forums.htm
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Re: [Jprogramming] zlib huffman coding

Reply via email to