for zlib, all huffman code of the same bit length are in sequential, starting with a code an 0 bit in the end (even number), and no gaps in the block (consecutive). the symbols of these block represented are sequential but not consecutive (possibly gaps). So the zlib huffman code is slightly less efficient than the original huffman code but the advantage is simpler to store the table, just the bits used. section 3.2.6 gives an example of such table.
deflatecodes can satisfy the 2 rules _but_ its result is invalid because huffman code is a prefix coding. ,.F1 ( deflatecodes) A1 ┌─────────────┐ │0 │ ├─────────────┤ │1 1 0 │ ├─────────────┤ │1 1 1 0 0 1 0│ ├─────────────┤ │1 1 1 0 0 1 1│ ├─────────────┤ │1 1 1 0 0 0 │ ├─────────────┤ │1 1 1 0 0 1 │ ├─────────────┤ │1 1 1 0 1 0 │ ├─────────────┤ │1 1 1 0 1 1 │ ├─────────────┤ │1 1 1 1 0 0 │ ├─────────────┤ │1 1 1 1 0 1 │ ├─────────────┤ │1 1 1 1 1 0 │ ├─────────────┤ │1 0 │ └─────────────┘ 7 bit 1110010 is a prefix of 6 bit 111001 and this is illegal. Instead 7 bit code block should start with #.inv 2b11100 + 7 1 1 1 1 1 1 0 because there are 6 symbols in the 6 bit code. (algorithm given in section 3.2.2). Any idea to fix deflatecodes so that it can give valid huffman codes? Thanks. On Sep 11, 2014 1:01 AM, "Raul Miller" <[email protected]> wrote: > I think the use of the term "consecutive" rather than "sequential" is > telling. > > The described algorithm is: compute the huffman code lengths: > #@>F1 hcodes A1 > 1 3 7 7 6 6 6 6 6 6 6 2 > > Then assign ascending huffman codes first in length order and then > within codes of the same length. > > Taken literally, that might be something like this: > > H=: 4 :0 > L=.#@> x hcodes y > U=.~.L > ;<@(({.{.U e.~i.&.<:@{.)<@:+"1-@{.{."1 #:@i.@#)/.~L > ) > > ":@>F1 H A1 > 0 > 1 1 0 > 1 1 1 0 0 1 0 > 1 1 1 0 0 1 1 > 1 1 1 0 0 0 > 1 1 1 0 0 1 > 1 1 1 0 1 0 > 1 1 1 0 1 1 > 1 1 1 1 0 0 > 1 1 1 1 0 1 > 1 1 1 1 1 0 > 1 0 > > But is this correct? Is it actually safe to leave the results like > this - with all codes of the same length being consecutive to each > other? > > F (hcodes -:&:(#@>) H) A > 0 > > No. > > So... "consecutive" must refer only to the values used and not their > order within the result. > > Perhaps something like this: > > deflatecodes=:4 :0 > L=.#@> x hcodes y > U=.~.L > R=. ;<@(({.{.U e.~i.&.<:@{.)<@:+"1-@{.{."1 #:@i.@#)/.~L > R/:;(</. i.@#)L > ) > > F (hcodes -:&:(#@>) deflatecodes) A > 1 > > There should be a better way of doing this, but this should at least > get you started. > > Thanks, > > -- > Raul > > > On Wed, Sep 10, 2014 at 10:45 AM, bill lam <[email protected]> wrote: > > For huffman coding used in zlib: > > https://www.ietf.org/rfc/rfc1951.txt section 3.2.2. > > > > The Huffman codes used for each alphabet in the "deflate" > > format have two additional rules: > > > > * All codes of a given bit length have lexicographically > > consecutive values, in the same order as the symbols > > they represent; > > > > * Shorter codes lexicographically precede longer codes. > > I tried jwiki hcodes in > > I try Roger's essay > > http://www.jsoftware.com/jwiki/Essays/Huffman%20Coding > > > > hc=: 4 : 0 > > if. 1=#x do. y > > else. ((i{x),+/j{x) hc (i{y),<j{y [ i=. (i.#x) -. j=. 2{./:x end. > > ) > > > > hcodes=: 4 : 0 > > assert. x -:&$ y NB. weights and words have same shape > > assert. (0<:x) *. 1=#$x NB. weights are non-negative > > assert. 1 >: L.y NB. words are boxed not more than once > > w=. ,&.> y NB. standardized words > > assert. w -: ~.w NB. words are unique > > t=. 0 {:: x hc w NB. minimal weight binary tree > > ((< S: 0 t) i. w) { <@(1&=)@; S: 1 {:: t > > ) > > > > but the coding produced is malformed for zlib. eg, > > this is what I ran into trouble > > > > f1=: 1 256 17 1 1 9 1 > > f2=: 2 1 0 1 255 0 1536 > > F=: ,/(f1#f2) > > A=: i.286 > > > > F hcodes A > > > > Or a shorter example > > > > A1=: i.12 > > F1=: 2 1 0 0 0 0 0 0 0 0 0 1 > > > > F1 hcodes A1 > > > > Any idea? > > > > -- > > regards, > > ==================================================== > > GPG key 1024D/4434BAB3 2008-08-24 > > gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3 > > gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3 > > ---------------------------------------------------------------------- > > For information about J forums see http://www.jsoftware.com/forums.htm > ---------------------------------------------------------------------- > For information about J forums see http://www.jsoftware.com/forums.htm > ---------------------------------------------------------------------- For information about J forums see http://www.jsoftware.com/forums.htm
