unfortunately the dynamic coding in the putty fork doesn't seem to work: deflate -c deflate.c > out deflate -d out
decoding error: incorrect data checksum it works fine with static tables C:\temp>echo ABCD > ABCD C:\temp>deflate -c ABCD > out C:\temp>deflate -d out ABCD I added some debugging code to determine that deflating deflate.c would be a dynamic table... Assuming it's broke, I probably wouldn't use it as a reference implementation after all On Thu, Sep 11, 2014 at 3:45 AM, bill lam <[email protected]> wrote: > the frequencies (guessing from bit lengths) should be something like 2 3 1 1 > (2 3 1 1) hcodes 'ABCD' > > the hard part is the inverse problem: how to get the huffman code with > prior knowing the bits for each symbol. Your pointer to the putty > fork looks like helpful. The comment is in lines 861 to 914, the code > itself in line 915 to 964. Do you know how to express it in J? > Thanks. > > On Thu, Sep 11, 2014 at 2:57 PM, Joe Bogner <[email protected]> wrote: >> Here a few other links ... after reading through the RFC. Not sure if >> they help, but just sharing from my own research into assisting on >> this topic >> >> https://github.com/evegard/pngview/blob/master/huffman.c#L54 >> >> And a fork of the putty version with dynamic huffman coding: >> http://rc.quest.com/viewvc/putty/trunk/halibut/deflate.c?diff_format=s&revision=2&view=markup >> >> Or just generally googling some of the code from the RFC: >> https://www.google.com/search?q=next_code%5Blen%5D%2B%2B%3B&oq=next_code%5Blen%5D%2B%2B%3B&aqs=chrome..69i57.387j0j7&sourceid=chrome&es_sm=93&ie=UTF-8#q=next_code%5Blen%5D%2B%2B%3B&start=20 >> >> >> Using the code from >> http://www.jsoftware.com/jwiki/Essays/Huffman%20Coding, I got stuck >> trying to match a simple example to the binary tree in the RFC: >> >> From the RFC: >> >> /\ Symbol Code >> 0 1 ------ ---- >> / \ A 00 >> /\ B B 1 >> 0 1 C 011 >> / \ D 010 >> A /\ >> 0 1 >> / \ >> D C >> >> >> >> (4#1) hcodes 'ABCD' >> ┌───┬───┬───┬───┐ >> │0 0│0 1│1 0│1 1│ >> └───┴───┴───┴───┘ >> >> Per the RFC, ideally that should match this? '00';'1';'011';'010' >> >> >> From there, it seems like a pretty straightforward exercise to >> transliterate the C code from the RFC into J code to recode the >> example to: >> >> >> Symbol Code >> ------ ---- >> A 10 >> B 0 >> C 110 >> D 111 >> >> >> I would probably start with a looping construct like what's in the RFC >> and then figure out a more J way to do it, but first I would need to >> figure out how to create the binary tree in that initial format. >> >> On Wed, Sep 10, 2014 at 7:41 PM, bill lam <[email protected]> wrote: >>> Thanks Joe, >>> putty only use zlib static huffman for encoding so that it does not build >>> any huffman dictionary table. >>> >>> The zlib static huffman code does not care about individual symbol's >>> frequency, it just encode 0 to 286 into bits, see section 3.2.6. >>> On Sep 11, 2014 1:26 AM, "Joe Bogner" <[email protected]> wrote: >>> >>>> You've already likely considered this, but if it were me I would compare >>>> results to a working implementation. The one from putty seems pretty clean >>>> and standalone: >>>> https://raw.githubusercontent.com/grumpydev/PortablePuTTY/master/SSHZLIB.C >>>> . I was able to compile it on windows no problem and I assume it'd be fine >>>> on linux as well. >>>> >>>> On Wed, Sep 10, 2014 at 1:00 PM, Raul Miller <[email protected]> >>>> wrote: >>>> >>>> > I think the use of the term "consecutive" rather than "sequential" is >>>> > telling. >>>> > >>>> > The described algorithm is: compute the huffman code lengths: >>>> > #@>F1 hcodes A1 >>>> > 1 3 7 7 6 6 6 6 6 6 6 2 >>>> > >>>> > Then assign ascending huffman codes first in length order and then >>>> > within codes of the same length. >>>> > >>>> > Taken literally, that might be something like this: >>>> > >>>> > H=: 4 :0 >>>> > L=.#@> x hcodes y >>>> > U=.~.L >>>> > ;<@(({.{.U e.~i.&.<:@{.)<@:+"1-@{.{."1 #:@i.@#)/.~L >>>> > ) >>>> > >>>> > ":@>F1 H A1 >>>> > 0 >>>> > 1 1 0 >>>> > 1 1 1 0 0 1 0 >>>> > 1 1 1 0 0 1 1 >>>> > 1 1 1 0 0 0 >>>> > 1 1 1 0 0 1 >>>> > 1 1 1 0 1 0 >>>> > 1 1 1 0 1 1 >>>> > 1 1 1 1 0 0 >>>> > 1 1 1 1 0 1 >>>> > 1 1 1 1 1 0 >>>> > 1 0 >>>> > >>>> > But is this correct? Is it actually safe to leave the results like >>>> > this - with all codes of the same length being consecutive to each >>>> > other? >>>> > >>>> > F (hcodes -:&:(#@>) H) A >>>> > 0 >>>> > >>>> > No. >>>> > >>>> > So... "consecutive" must refer only to the values used and not their >>>> > order within the result. >>>> > >>>> > Perhaps something like this: >>>> > >>>> > deflatecodes=:4 :0 >>>> > L=.#@> x hcodes y >>>> > U=.~.L >>>> > R=. ;<@(({.{.U e.~i.&.<:@{.)<@:+"1-@{.{."1 #:@i.@#)/.~L >>>> > R/:;(</. i.@#)L >>>> > ) >>>> > >>>> > F (hcodes -:&:(#@>) deflatecodes) A >>>> > 1 >>>> > >>>> > There should be a better way of doing this, but this should at least >>>> > get you started. >>>> > >>>> > Thanks, >>>> > >>>> > -- >>>> > Raul >>>> > >>>> > >>>> > On Wed, Sep 10, 2014 at 10:45 AM, bill lam <[email protected]> wrote: >>>> > > For huffman coding used in zlib: >>>> > > https://www.ietf.org/rfc/rfc1951.txt section 3.2.2. >>>> > > >>>> > > The Huffman codes used for each alphabet in the "deflate" >>>> > > format have two additional rules: >>>> > > >>>> > > * All codes of a given bit length have lexicographically >>>> > > consecutive values, in the same order as the symbols >>>> > > they represent; >>>> > > >>>> > > * Shorter codes lexicographically precede longer codes. >>>> > > I tried jwiki hcodes in >>>> > > I try Roger's essay >>>> > > http://www.jsoftware.com/jwiki/Essays/Huffman%20Coding >>>> > > >>>> > > hc=: 4 : 0 >>>> > > if. 1=#x do. y >>>> > > else. ((i{x),+/j{x) hc (i{y),<j{y [ i=. (i.#x) -. j=. 2{./:x end. >>>> > > ) >>>> > > >>>> > > hcodes=: 4 : 0 >>>> > > assert. x -:&$ y NB. weights and words have same shape >>>> > > assert. (0<:x) *. 1=#$x NB. weights are non-negative >>>> > > assert. 1 >: L.y NB. words are boxed not more than once >>>> > > w=. ,&.> y NB. standardized words >>>> > > assert. w -: ~.w NB. words are unique >>>> > > t=. 0 {:: x hc w NB. minimal weight binary tree >>>> > > ((< S: 0 t) i. w) { <@(1&=)@; S: 1 {:: t >>>> > > ) >>>> > > >>>> > > but the coding produced is malformed for zlib. eg, >>>> > > this is what I ran into trouble >>>> > > >>>> > > f1=: 1 256 17 1 1 9 1 >>>> > > f2=: 2 1 0 1 255 0 1536 >>>> > > F=: ,/(f1#f2) >>>> > > A=: i.286 >>>> > > >>>> > > F hcodes A >>>> > > >>>> > > Or a shorter example >>>> > > >>>> > > A1=: i.12 >>>> > > F1=: 2 1 0 0 0 0 0 0 0 0 0 1 >>>> > > >>>> > > F1 hcodes A1 >>>> > > >>>> > > Any idea? >>>> > > >>>> > > -- >>>> > > regards, >>>> > > ==================================================== >>>> > > GPG key 1024D/4434BAB3 2008-08-24 >>>> > > gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3 >>>> > > gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3 >>>> > > ---------------------------------------------------------------------- >>>> > > For information about J forums see http://www.jsoftware.com/forums.htm >>>> > ---------------------------------------------------------------------- >>>> > For information about J forums see http://www.jsoftware.com/forums.htm >>>> > >>>> ---------------------------------------------------------------------- >>>> For information about J forums see http://www.jsoftware.com/forums.htm >>>> >>> ---------------------------------------------------------------------- >>> For information about J forums see http://www.jsoftware.com/forums.htm >> ---------------------------------------------------------------------- >> For information about J forums see http://www.jsoftware.com/forums.htm > ---------------------------------------------------------------------- > For information about J forums see http://www.jsoftware.com/forums.htm ---------------------------------------------------------------------- For information about J forums see http://www.jsoftware.com/forums.htm
