Re: [Jprogramming] zlib huffman coding

Joe Bogner Thu, 11 Sep 2014 07:29:48 -0700

unfortunately the dynamic coding in the putty fork doesn't seem to work:

deflate -c deflate.c > out
deflate -d out


decoding error: incorrect data checksum


it works fine with static tables

C:\temp>echo ABCD > ABCD

C:\temp>deflate -c ABCD > out

C:\temp>deflate -d out
ABCD

I added some debugging code to determine that deflating deflate.c
would be a dynamic table...  Assuming it's broke, I probably wouldn't
use it as a reference implementation after all

On Thu, Sep 11, 2014 at 3:45 AM, bill lam <[email protected]> wrote:
> the frequencies (guessing from bit lengths) should be something like 2 3 1 1
>   (2 3 1 1) hcodes 'ABCD'
>
> the hard part is the inverse problem: how to get the huffman code with
> prior knowing the bits for each symbol.  Your pointer to the putty
> fork looks like helpful.  The comment is in lines 861 to 914, the code
> itself in line 915 to 964. Do you know how to express it in J?
> Thanks.
>
> On Thu, Sep 11, 2014 at 2:57 PM, Joe Bogner <[email protected]> wrote:
>> Here a few other links ... after reading through the RFC. Not sure if
>> they help, but just sharing from my own research into assisting on
>> this topic
>>
>> https://github.com/evegard/pngview/blob/master/huffman.c#L54
>>
>> And a fork of the putty version with dynamic huffman coding:
>> http://rc.quest.com/viewvc/putty/trunk/halibut/deflate.c?diff_format=s&revision=2&view=markup
>>
>> Or just generally googling some of the code from the RFC:
>> https://www.google.com/search?q=next_code%5Blen%5D%2B%2B%3B&oq=next_code%5Blen%5D%2B%2B%3B&aqs=chrome..69i57.387j0j7&sourceid=chrome&es_sm=93&ie=UTF-8#q=next_code%5Blen%5D%2B%2B%3B&start=20
>>
>>
>> Using the code from
>> http://www.jsoftware.com/jwiki/Essays/Huffman%20Coding, I got stuck
>> trying to match a simple example to the binary tree in the RFC:
>>
>> From the RFC:
>>
>>                   /\              Symbol    Code
>>                          0  1             ------    ----
>>                         /    \                A      00
>>                        /\     B               B       1
>>                       0  1                    C     011
>>                      /    \                   D     010
>>                     A     /\
>>                          0  1
>>                         /    \
>>                        D      C
>>
>>
>>
>>    (4#1) hcodes 'ABCD'
>> ┌───┬───┬───┬───┐
>> │0 0│0 1│1 0│1 1│
>> └───┴───┴───┴───┘
>>
>> Per the RFC, ideally that should match this? '00';'1';'011';'010'
>>
>>
>> From there, it seems like a pretty straightforward exercise to
>> transliterate the C code from the RFC into J code to recode the
>> example to:
>>
>>
>>             Symbol  Code
>>             ------  ----
>>             A       10
>>             B       0
>>             C       110
>>             D       111
>>
>>
>> I would probably start with a looping construct like what's in the RFC
>> and then figure out a more J way to do it, but first I would need to
>> figure out how to create the binary tree in that initial format.
>>
>> On Wed, Sep 10, 2014 at 7:41 PM, bill lam <[email protected]> wrote:
>>> Thanks Joe,
>>> putty only use zlib static huffman for encoding so that it does not build
>>> any huffman dictionary table.
>>>
>>> The zlib static huffman code does not care about individual symbol's
>>> frequency, it just encode 0 to 286 into bits, see section 3.2.6.
>>>  On Sep 11, 2014 1:26 AM, "Joe Bogner" <[email protected]> wrote:
>>>
>>>> You've already likely considered this, but if it were me I would compare
>>>> results to a working implementation. The one from putty seems pretty clean
>>>> and standalone:
>>>> https://raw.githubusercontent.com/grumpydev/PortablePuTTY/master/SSHZLIB.C
>>>> . I was able to compile it on windows no problem and I assume it'd be fine
>>>> on linux as well.
>>>>
>>>> On Wed, Sep 10, 2014 at 1:00 PM, Raul Miller <[email protected]>
>>>> wrote:
>>>>
>>>> > I think the use of the term "consecutive" rather than "sequential" is
>>>> > telling.
>>>> >
>>>> > The described algorithm is: compute the huffman code lengths:
>>>> >    #@>F1 hcodes A1
>>>> > 1 3 7 7 6 6 6 6 6 6 6 2
>>>> >
>>>> > Then assign ascending huffman codes first in length order and then
>>>> > within codes of the same length.
>>>> >
>>>> > Taken literally, that might be something like this:
>>>> >
>>>> > H=: 4 :0
>>>> >   L=.#@> x hcodes y
>>>> >   U=.~.L
>>>> >   ;<@(({.{.U e.~i.&.<:@{.)<@:+"1-@{.{."1 #:@i.@#)/.~L
>>>> > )
>>>> >
>>>> >    ":@>F1 H A1
>>>> > 0
>>>> > 1 1 0
>>>> > 1 1 1 0 0 1 0
>>>> > 1 1 1 0 0 1 1
>>>> > 1 1 1 0 0 0
>>>> > 1 1 1 0 0 1
>>>> > 1 1 1 0 1 0
>>>> > 1 1 1 0 1 1
>>>> > 1 1 1 1 0 0
>>>> > 1 1 1 1 0 1
>>>> > 1 1 1 1 1 0
>>>> > 1 0
>>>> >
>>>> > But is this correct? Is it actually safe to leave the results like
>>>> > this - with all codes of the same length being consecutive to each
>>>> > other?
>>>> >
>>>> >    F (hcodes -:&:(#@>) H) A
>>>> > 0
>>>> >
>>>> > No.
>>>> >
>>>> > So... "consecutive" must refer only to the values used and not their
>>>> > order within the result.
>>>> >
>>>> > Perhaps something like this:
>>>> >
>>>> > deflatecodes=:4 :0
>>>> >   L=.#@> x hcodes y
>>>> >   U=.~.L
>>>> >   R=. ;<@(({.{.U e.~i.&.<:@{.)<@:+"1-@{.{."1 #:@i.@#)/.~L
>>>> >   R/:;(</. i.@#)L
>>>> > )
>>>> >
>>>> >    F (hcodes -:&:(#@>) deflatecodes)  A
>>>> > 1
>>>> >
>>>> > There should be a better way of doing this, but this should at least
>>>> > get you started.
>>>> >
>>>> > Thanks,
>>>> >
>>>> > --
>>>> > Raul
>>>> >
>>>> >
>>>> > On Wed, Sep 10, 2014 at 10:45 AM, bill lam <[email protected]> wrote:
>>>> > > For huffman coding used in zlib:
>>>> > > https://www.ietf.org/rfc/rfc1951.txt section 3.2.2.
>>>> > >
>>>> > >  The Huffman codes used for each alphabet in the "deflate"
>>>> > >  format have two additional rules:
>>>> > >
>>>> > >   * All codes of a given bit length have lexicographically
>>>> > >   consecutive values, in the same order as the symbols
>>>> > >   they represent;
>>>> > >
>>>> > >   * Shorter codes lexicographically precede longer codes.
>>>> > > I tried jwiki hcodes in
>>>> > > I try Roger's essay
>>>> > > http://www.jsoftware.com/jwiki/Essays/Huffman%20Coding
>>>> > >
>>>> > > hc=: 4 : 0
>>>> > > if. 1=#x do. y
>>>> > > else. ((i{x),+/j{x) hc (i{y),<j{y [ i=. (i.#x) -. j=. 2{./:x end.
>>>> > > )
>>>> > >
>>>> > > hcodes=: 4 : 0
>>>> > > assert. x -:&$ y           NB. weights and words have same shape
>>>> > > assert. (0<:x) *. 1=#$x    NB. weights are non-negative
>>>> > > assert. 1 >: L.y           NB. words are boxed not more than once
>>>> > > w=. ,&.> y                 NB. standardized words
>>>> > > assert. w -: ~.w           NB. words are unique
>>>> > > t=. 0 {:: x hc w           NB. minimal weight binary tree
>>>> > > ((< S: 0 t) i. w) { <@(1&=)@; S: 1 {:: t
>>>> > > )
>>>> > >
>>>> > > but the coding produced is malformed for zlib. eg,
>>>> > > this is what I ran into trouble
>>>> > >
>>>> > > f1=: 1 256 17 1 1 9 1
>>>> > > f2=: 2 1 0 1 255 0 1536
>>>> > > F=: ,/(f1#f2)
>>>> > > A=: i.286
>>>> > >
>>>> > > F hcodes A
>>>> > >
>>>> > > Or a shorter example
>>>> > >
>>>> > > A1=: i.12
>>>> > > F1=: 2 1 0 0 0 0 0 0 0 0 0 1
>>>> > >
>>>> > > F1 hcodes A1
>>>> > >
>>>> > > Any idea?
>>>> > >
>>>> > > --
>>>> > > regards,
>>>> > > ====================================================
>>>> > > GPG key 1024D/4434BAB3 2008-08-24
>>>> > > gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
>>>> > > gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3
>>>> > > ----------------------------------------------------------------------
>>>> > > For information about J forums see http://www.jsoftware.com/forums.htm
>>>> > ----------------------------------------------------------------------
>>>> > For information about J forums see http://www.jsoftware.com/forums.htm
>>>> >
>>>> ----------------------------------------------------------------------
>>>> For information about J forums see http://www.jsoftware.com/forums.htm
>>>>
>>> ----------------------------------------------------------------------
>>> For information about J forums see http://www.jsoftware.com/forums.htm
>> ----------------------------------------------------------------------
>> For information about J forums see http://www.jsoftware.com/forums.htm
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Re: [Jprogramming] zlib huffman coding

Reply via email to