Why do the words matter?  It seems the hcodes are independent  of the words.

    A
abcdef
   
   F
3 1 4 1 5 9
   
   F hcodes A
┌─────┬───────┬───┬───────┬───┬───┐
│1 0 1│1 0 0 0│0 0│1 0 0 1│0 1│1 1│
└─────┴───────┴───┴───────┴───┴───┘
   
   F hcodes (?6#256){a.
┌─────┬───────┬───┬───────┬───┬───┐
│1 0 1│1 0 0 0│0 0│1 0 0 1│0 1│1 1│
└─────┴───────┴───┴───────┴───┴───┘
 
I'm only beginning to look at this so I don't know how the story ends.
  
 Linda

  

-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of bill lam
Sent: Thursday, September 11, 2014 3:46 AM
To: Programming forum
Subject: Re: [Jprogramming] zlib huffman coding

the frequencies (guessing from bit lengths) should be something like 2 3 1 1
  (2 3 1 1) hcodes 'ABCD'

the hard part is the inverse problem: how to get the huffman code with
prior knowing the bits for each symbol.  Your pointer to the putty
fork looks like helpful.  The comment is in lines 861 to 914, the code
itself in line 915 to 964. Do you know how to express it in J?
Thanks.

On Thu, Sep 11, 2014 at 2:57 PM, Joe Bogner <[email protected]> wrote:
> Here a few other links ... after reading through the RFC. Not sure if
> they help, but just sharing from my own research into assisting on
> this topic
>
> https://github.com/evegard/pngview/blob/master/huffman.c#L54
>
> And a fork of the putty version with dynamic huffman coding:
> http://rc.quest.com/viewvc/putty/trunk/halibut/deflate.c?diff_format=s&revision=2&view=markup
>
> Or just generally googling some of the code from the RFC:
> https://www.google.com/search?q=next_code%5Blen%5D%2B%2B%3B&oq=next_code%5Blen%5D%2B%2B%3B&aqs=chrome..69i57.387j0j7&sourceid=chrome&es_sm=93&ie=UTF-8#q=next_code%5Blen%5D%2B%2B%3B&start=20
>
>
> Using the code from
> http://www.jsoftware.com/jwiki/Essays/Huffman%20Coding, I got stuck
> trying to match a simple example to the binary tree in the RFC:
>
> From the RFC:
>
>                   /\              Symbol    Code
>                          0  1             ------    ----
>                         /    \                A      00
>                        /\     B               B       1
>                       0  1                    C     011
>                      /    \                   D     010
>                     A     /\
>                          0  1
>                         /    \
>                        D      C
>
>
>
>    (4#1) hcodes 'ABCD'
> ┌───┬───┬───┬───┐
> │0 0│0 1│1 0│1 1│
> └───┴───┴───┴───┘
>
> Per the RFC, ideally that should match this? '00';'1';'011';'010'
>
>
> From there, it seems like a pretty straightforward exercise to
> transliterate the C code from the RFC into J code to recode the
> example to:
>
>
>             Symbol  Code
>             ------  ----
>             A       10
>             B       0
>             C       110
>             D       111
>
>
> I would probably start with a looping construct like what's in the RFC
> and then figure out a more J way to do it, but first I would need to
> figure out how to create the binary tree in that initial format.
>
> On Wed, Sep 10, 2014 at 7:41 PM, bill lam <[email protected]> wrote:
>> Thanks Joe,
>> putty only use zlib static huffman for encoding so that it does not build
>> any huffman dictionary table.
>>
>> The zlib static huffman code does not care about individual symbol's
>> frequency, it just encode 0 to 286 into bits, see section 3.2.6.
>>  On Sep 11, 2014 1:26 AM, "Joe Bogner" <[email protected]> wrote:
>>
>>> You've already likely considered this, but if it were me I would compare
>>> results to a working implementation. The one from putty seems pretty clean
>>> and standalone:
>>> https://raw.githubusercontent.com/grumpydev/PortablePuTTY/master/SSHZLIB.C
>>> . I was able to compile it on windows no problem and I assume it'd be fine
>>> on linux as well.
>>>
>>> On Wed, Sep 10, 2014 at 1:00 PM, Raul Miller <[email protected]>
>>> wrote:
>>>
>>> > I think the use of the term "consecutive" rather than "sequential" is
>>> > telling.
>>> >
>>> > The described algorithm is: compute the huffman code lengths:
>>> >    #@>F1 hcodes A1
>>> > 1 3 7 7 6 6 6 6 6 6 6 2
>>> >
>>> > Then assign ascending huffman codes first in length order and then
>>> > within codes of the same length.
>>> >
>>> > Taken literally, that might be something like this:
>>> >
>>> > H=: 4 :0
>>> >   L=.#@> x hcodes y
>>> >   U=.~.L
>>> >   ;<@(({.{.U e.~i.&.<:@{.)<@:+"1-@{.{."1 #:@i.@#)/.~L
>>> > )
>>> >
>>> >    ":@>F1 H A1
>>> > 0
>>> > 1 1 0
>>> > 1 1 1 0 0 1 0
>>> > 1 1 1 0 0 1 1
>>> > 1 1 1 0 0 0
>>> > 1 1 1 0 0 1
>>> > 1 1 1 0 1 0
>>> > 1 1 1 0 1 1
>>> > 1 1 1 1 0 0
>>> > 1 1 1 1 0 1
>>> > 1 1 1 1 1 0
>>> > 1 0
>>> >
>>> > But is this correct? Is it actually safe to leave the results like
>>> > this - with all codes of the same length being consecutive to each
>>> > other?
>>> >
>>> >    F (hcodes -:&:(#@>) H) A
>>> > 0
>>> >
>>> > No.
>>> >
>>> > So... "consecutive" must refer only to the values used and not their
>>> > order within the result.
>>> >
>>> > Perhaps something like this:
>>> >
>>> > deflatecodes=:4 :0
>>> >   L=.#@> x hcodes y
>>> >   U=.~.L
>>> >   R=. ;<@(({.{.U e.~i.&.<:@{.)<@:+"1-@{.{."1 #:@i.@#)/.~L
>>> >   R/:;(</. i.@#)L
>>> > )
>>> >
>>> >    F (hcodes -:&:(#@>) deflatecodes)  A
>>> > 1
>>> >
>>> > There should be a better way of doing this, but this should at least
>>> > get you started.
>>> >
>>> > Thanks,
>>> >
>>> > --
>>> > Raul
>>> >
>>> >
>>> > On Wed, Sep 10, 2014 at 10:45 AM, bill lam <[email protected]> wrote:
>>> > > For huffman coding used in zlib:
>>> > > https://www.ietf.org/rfc/rfc1951.txt section 3.2.2.
>>> > >
>>> > >  The Huffman codes used for each alphabet in the "deflate"
>>> > >  format have two additional rules:
>>> > >
>>> > >   * All codes of a given bit length have lexicographically
>>> > >   consecutive values, in the same order as the symbols
>>> > >   they represent;
>>> > >
>>> > >   * Shorter codes lexicographically precede longer codes.
>>> > > I tried jwiki hcodes in
>>> > > I try Roger's essay
>>> > > http://www.jsoftware.com/jwiki/Essays/Huffman%20Coding
>>> > >
>>> > > hc=: 4 : 0
>>> > > if. 1=#x do. y
>>> > > else. ((i{x),+/j{x) hc (i{y),<j{y [ i=. (i.#x) -. j=. 2{./:x end.
>>> > > )
>>> > >
>>> > > hcodes=: 4 : 0
>>> > > assert. x -:&$ y           NB. weights and words have same shape
>>> > > assert. (0<:x) *. 1=#$x    NB. weights are non-negative
>>> > > assert. 1 >: L.y           NB. words are boxed not more than once
>>> > > w=. ,&.> y                 NB. standardized words
>>> > > assert. w -: ~.w           NB. words are unique
>>> > > t=. 0 {:: x hc w           NB. minimal weight binary tree
>>> > > ((< S: 0 t) i. w) { <@(1&=)@; S: 1 {:: t
>>> > > )
>>> > >
>>> > > but the coding produced is malformed for zlib. eg,
>>> > > this is what I ran into trouble
>>> > >
>>> > > f1=: 1 256 17 1 1 9 1
>>> > > f2=: 2 1 0 1 255 0 1536
>>> > > F=: ,/(f1#f2)
>>> > > A=: i.286
>>> > >
>>> > > F hcodes A
>>> > >
>>> > > Or a shorter example
>>> > >
>>> > > A1=: i.12
>>> > > F1=: 2 1 0 0 0 0 0 0 0 0 0 1
>>> > >
>>> > > F1 hcodes A1
>>> > >
>>> > > Any idea?
>>> > >
>>> > > --
>>> > > regards,
>>> > > ====================================================
>>> > > GPG key 1024D/4434BAB3 2008-08-24
>>> > > gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
>>> > > gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3
>>> > > ----------------------------------------------------------------------
>>> > > For information about J forums see http://www.jsoftware.com/forums.htm
>>> > ----------------------------------------------------------------------
>>> > For information about J forums see http://www.jsoftware.com/forums.htm
>>> >
>>> ----------------------------------------------------------------------
>>> For information about J forums see http://www.jsoftware.com/forums.htm
>>>
>> ----------------------------------------------------------------------
>> For information about J forums see http://www.jsoftware.com/forums.htm
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Reply via email to