Re: [Jprogramming] zlib huffman coding

bill lam Sun, 14 Sep 2014 03:29:18 -0700

Thanks Raul and Joe,

The problem seemed now solved. It turns out that how to get bit length 
was a red herring. The main problem is huffman coding is in-applicable
to zero bit of information, ie the universe has only 0 or 1
symbol. eg, the result of bitlen 0 0 0  or bitlen 1 0 0 0 0 cannot
be correct (GIGO).  However such input actually happens in zlib
compression and section 4 of rfc,


  ...  If only one distance
  code is used, it is encoded using one bit, not zero bits; in
  this case there is a single code length of one, with one unused
  code.  One distance code of zero bits means that there are no
  distance codes used at all ...

Now arc/zlib addon enables dynamic huffman coding. Thank you
both of you for patience and help.

BTW I found the new bitlens has some problems, first it should
zero out bit length of zero frequency on output.

    1 0 1 bitlen i.3
1 0 1

    1 0 1 bitlens i.3
2 2 1

Actually 1 bit can encode 2 symbols, data encoded by it were
rejected during decoding with zlib.so even after masked off 
the zero frequency entries. 

What did you mean by "data is well formed"?


Вс, 14 сен 2014, Raul Miller написал(а):
> Here's a perhaps slightly more efficient way of computing bit lengths:
> 
> bitlens=: 4 :0
>   t=. 0{:: x hc w=. ,&.> y
>   ((<S:0 t)i. w) { #S:1 {::t
> )
> 
> This assumes that the data is well formed, but the big thing is that it
> does not compute a boxed intermediate result and instead finds the lengths
> earlier in the process.
> 
> Thanks,
> 
> -- 
> Raul
> 
> On Sat, Sep 13, 2014 at 3:41 PM, Joe Bogner <[email protected]> wrote:
> 
> > Raul, I just tried your code and confirmed that it returned the same
> > results as mine:
> >
> > find_codes (3, 3, 3, 3,3, 2, 4, 4)
> > ┌─────┬─────┬─────┬─────┬─────┬───┬───────┬───────┐
> > │0 1 0│0 1 1│1 0 0│1 0 1│1 1 0│0 0│1 1 1 0│1 1 1 1│
> > └─────┴─────┴─────┴─────┴─────┴───┴───────┴───────┘
> >
> > #: inv every find_codes (3, 3, 3, 3,3, 2, 4, 4)
> > 2 3 4 5 6 0 14 15
> >
> > tree=: makeFromLengths (3, 3, 3, 3,3, 2, 4, 4)
> > tree
> >
> > 2 3 4 5 6 0 14 15
> >
> > #: tree
> > 0 0 1 0
> > 0 0 1 1
> > 0 1 0 0
> > 0 1 0 1
> > 0 1 1 0
> > 0 0 0 0
> > 1 1 1 0
> > 1 1 1 1
> >
> > I like the looks of yours more. Mine was a transliteration from the
> > RFC. I will need to study yours as I don't understand it yet.
> >
> > I agree that using hcodes with either mine or yours should produce
> > what's needed per the RFC, at least the steps we were looking at in
> > the RFC. There is likely more solve still beyond that
> >
> >
> > On Sat, Sep 13, 2014 at 2:52 PM, Raul Miller <[email protected]>
> > wrote:
> > > If we use hcodes to get bit lengths, and another mechanism (I think the
> > > last one I posted is adequate) to determine the bit values, there should
> > be
> > > no reason to iterate further. The two extra constraints should never
> > > require that bit lengths be changed.
> > >
> > > Thanks,
> > >
> > > --
> > > Raul
> > >
> > > On Sat, Sep 13, 2014 at 12:43 PM, bill lam <[email protected]> wrote:
> > >
> > >> I try the putty fork, its seems fialed to uncompress it own
> > >> compressed data.  Perhaps its compressed data is malformed.
> > >>
> > >> I also look at the trees.c of the the official zlib. I do not
> > >> understand the code, too complex for me.  AFAICU it does not
> > >> generate huffman codes using any standard algorithm, but it
> > >> assigns bit lengths using some heuristics (trail and error).
> > >> I guess we can try using hcodes to get an initial guess of bit
> > >> lengths and then iterate to make it satisfy the 2 rules.
> > >>
> > >> Чт, 11 сен 2014, bill lam написал(а):
> > >> > the frequencies (guessing from bit lengths) should be something like
> > 2 3
> > >> 1 1
> > >> >   (2 3 1 1) hcodes 'ABCD'
> > >> >
> > >> > the hard part is the inverse problem: how to get the huffman code with
> > >> > prior knowing the bits for each symbol.  Your pointer to the putty
> > >> > fork looks like helpful.  The comment is in lines 861 to 914, the code
> > >> > itself in line 915 to 964. Do you know how to express it in J?
> > >> > Thanks.
> > >> >
> > >> > On Thu, Sep 11, 2014 at 2:57 PM, Joe Bogner <[email protected]>
> > wrote:
> > >> > > Here a few other links ... after reading through the RFC. Not sure
> > if
> > >> > > they help, but just sharing from my own research into assisting on
> > >> > > this topic
> > >> > >
> > >> > > https://github.com/evegard/pngview/blob/master/huffman.c#L54
> > >> > >
> > >> > > And a fork of the putty version with dynamic huffman coding:
> > >> > >
> > >>
> > http://rc.quest.com/viewvc/putty/trunk/halibut/deflate.c?diff_format=s&revision=2&view=markup
> > >> > >
> > >> > > Or just generally googling some of the code from the RFC:
> > >> > >
> > >>
> > https://www.google.com/search?q=next_code%5Blen%5D%2B%2B%3B&oq=next_code%5Blen%5D%2B%2B%3B&aqs=chrome..69i57.387j0j7&sourceid=chrome&es_sm=93&ie=UTF-8#q=next_code%5Blen%5D%2B%2B%3B&start=20
> > >> > >
> > >> > >
> > >> > > Using the code from
> > >> > > http://www.jsoftware.com/jwiki/Essays/Huffman%20Coding, I got stuck
> > >> > > trying to match a simple example to the binary tree in the RFC:
> > >> > >
> > >> > > From the RFC:
> > >> > >
> > >> > >                   /\              Symbol    Code
> > >> > >                          0  1             ------    ----
> > >> > >                         /    \                A      00
> > >> > >                        /\     B               B       1
> > >> > >                       0  1                    C     011
> > >> > >                      /    \                   D     010
> > >> > >                     A     /\
> > >> > >                          0  1
> > >> > >                         /    \
> > >> > >                        D      C
> > >> > >
> > >> > >
> > >> > >
> > >> > >    (4#1) hcodes 'ABCD'
> > >> > > ┌───┬───┬───┬───┐
> > >> > > │0 0│0 1│1 0│1 1│
> > >> > > └───┴───┴───┴───┘
> > >> > >
> > >> > > Per the RFC, ideally that should match this? '00';'1';'011';'010'
> > >> > >
> > >> > >
> > >> > > From there, it seems like a pretty straightforward exercise to
> > >> > > transliterate the C code from the RFC into J code to recode the
> > >> > > example to:
> > >> > >
> > >> > >
> > >> > >             Symbol  Code
> > >> > >             ------  ----
> > >> > >             A       10
> > >> > >             B       0
> > >> > >             C       110
> > >> > >             D       111
> > >> > >
> > >> > >
> > >> > > I would probably start with a looping construct like what's in the
> > RFC
> > >> > > and then figure out a more J way to do it, but first I would need to
> > >> > > figure out how to create the binary tree in that initial format.
> > >> > >
> > >> > > On Wed, Sep 10, 2014 at 7:41 PM, bill lam <[email protected]>
> > wrote:
> > >> > >> Thanks Joe,
> > >> > >> putty only use zlib static huffman for encoding so that it does not
> > >> build
> > >> > >> any huffman dictionary table.
> > >> > >>
> > >> > >> The zlib static huffman code does not care about individual
> > symbol's
> > >> > >> frequency, it just encode 0 to 286 into bits, see section 3.2.6.
> > >> > >>  On Sep 11, 2014 1:26 AM, "Joe Bogner" <[email protected]>
> > wrote:
> > >> > >>
> > >> > >>> You've already likely considered this, but if it were me I would
> > >> compare
> > >> > >>> results to a working implementation. The one from putty seems
> > pretty
> > >> clean
> > >> > >>> and standalone:
> > >> > >>>
> > >>
> > https://raw.githubusercontent.com/grumpydev/PortablePuTTY/master/SSHZLIB.C
> > >> > >>> . I was able to compile it on windows no problem and I assume it'd
> > >> be fine
> > >> > >>> on linux as well.
> > >> > >>>
> > >> > >>> On Wed, Sep 10, 2014 at 1:00 PM, Raul Miller <
> > [email protected]>
> > >> > >>> wrote:
> > >> > >>>
> > >> > >>> > I think the use of the term "consecutive" rather than
> > "sequential"
> > >> is
> > >> > >>> > telling.
> > >> > >>> >
> > >> > >>> > The described algorithm is: compute the huffman code lengths:
> > >> > >>> >    #@>F1 hcodes A1
> > >> > >>> > 1 3 7 7 6 6 6 6 6 6 6 2
> > >> > >>> >
> > >> > >>> > Then assign ascending huffman codes first in length order and
> > then
> > >> > >>> > within codes of the same length.
> > >> > >>> >
> > >> > >>> > Taken literally, that might be something like this:
> > >> > >>> >
> > >> > >>> > H=: 4 :0
> > >> > >>> >   L=.#@> x hcodes y
> > >> > >>> >   U=.~.L
> > >> > >>> >   ;<@(({.{.U e.~i.&.<:@{.)<@:+"1-@{.{."1 #:@i.@#)/.~L
> > >> > >>> > )
> > >> > >>> >
> > >> > >>> >    ":@>F1 H A1
> > >> > >>> > 0
> > >> > >>> > 1 1 0
> > >> > >>> > 1 1 1 0 0 1 0
> > >> > >>> > 1 1 1 0 0 1 1
> > >> > >>> > 1 1 1 0 0 0
> > >> > >>> > 1 1 1 0 0 1
> > >> > >>> > 1 1 1 0 1 0
> > >> > >>> > 1 1 1 0 1 1
> > >> > >>> > 1 1 1 1 0 0
> > >> > >>> > 1 1 1 1 0 1
> > >> > >>> > 1 1 1 1 1 0
> > >> > >>> > 1 0
> > >> > >>> >
> > >> > >>> > But is this correct? Is it actually safe to leave the results
> > like
> > >> > >>> > this - with all codes of the same length being consecutive to
> > each
> > >> > >>> > other?
> > >> > >>> >
> > >> > >>> >    F (hcodes -:&:(#@>) H) A
> > >> > >>> > 0
> > >> > >>> >
> > >> > >>> > No.
> > >> > >>> >
> > >> > >>> > So... "consecutive" must refer only to the values used and not
> > >> their
> > >> > >>> > order within the result.
> > >> > >>> >
> > >> > >>> > Perhaps something like this:
> > >> > >>> >
> > >> > >>> > deflatecodes=:4 :0
> > >> > >>> >   L=.#@> x hcodes y
> > >> > >>> >   U=.~.L
> > >> > >>> >   R=. ;<@(({.{.U e.~i.&.<:@{.)<@:+"1-@{.{."1 #:@i.@#)/.~L
> > >> > >>> >   R/:;(</. i.@#)L
> > >> > >>> > )
> > >> > >>> >
> > >> > >>> >    F (hcodes -:&:(#@>) deflatecodes)  A
> > >> > >>> > 1
> > >> > >>> >
> > >> > >>> > There should be a better way of doing this, but this should at
> > >> least
> > >> > >>> > get you started.
> > >> > >>> >
> > >> > >>> > Thanks,
> > >> > >>> >
> > >> > >>> > --
> > >> > >>> > Raul
> > >> > >>> >
> > >> > >>> >
> > >> > >>> > On Wed, Sep 10, 2014 at 10:45 AM, bill lam <[email protected]
> > >
> > >> wrote:
> > >> > >>> > > For huffman coding used in zlib:
> > >> > >>> > > https://www.ietf.org/rfc/rfc1951.txt section 3.2.2.
> > >> > >>> > >
> > >> > >>> > >  The Huffman codes used for each alphabet in the "deflate"
> > >> > >>> > >  format have two additional rules:
> > >> > >>> > >
> > >> > >>> > >   * All codes of a given bit length have lexicographically
> > >> > >>> > >   consecutive values, in the same order as the symbols
> > >> > >>> > >   they represent;
> > >> > >>> > >
> > >> > >>> > >   * Shorter codes lexicographically precede longer codes.
> > >> > >>> > > I tried jwiki hcodes in
> > >> > >>> > > I try Roger's essay
> > >> > >>> > > http://www.jsoftware.com/jwiki/Essays/Huffman%20Coding
> > >> > >>> > >
> > >> > >>> > > hc=: 4 : 0
> > >> > >>> > > if. 1=#x do. y
> > >> > >>> > > else. ((i{x),+/j{x) hc (i{y),<j{y [ i=. (i.#x) -. j=. 2{./:x
> > end.
> > >> > >>> > > )
> > >> > >>> > >
> > >> > >>> > > hcodes=: 4 : 0
> > >> > >>> > > assert. x -:&$ y           NB. weights and words have same
> > shape
> > >> > >>> > > assert. (0<:x) *. 1=#$x    NB. weights are non-negative
> > >> > >>> > > assert. 1 >: L.y           NB. words are boxed not more than
> > once
> > >> > >>> > > w=. ,&.> y                 NB. standardized words
> > >> > >>> > > assert. w -: ~.w           NB. words are unique
> > >> > >>> > > t=. 0 {:: x hc w           NB. minimal weight binary tree
> > >> > >>> > > ((< S: 0 t) i. w) { <@(1&=)@; S: 1 {:: t
> > >> > >>> > > )
> > >> > >>> > >
> > >> > >>> > > but the coding produced is malformed for zlib. eg,
> > >> > >>> > > this is what I ran into trouble
> > >> > >>> > >
> > >> > >>> > > f1=: 1 256 17 1 1 9 1
> > >> > >>> > > f2=: 2 1 0 1 255 0 1536
> > >> > >>> > > F=: ,/(f1#f2)
> > >> > >>> > > A=: i.286
> > >> > >>> > >
> > >> > >>> > > F hcodes A
> > >> > >>> > >
> > >> > >>> > > Or a shorter example
> > >> > >>> > >
> > >> > >>> > > A1=: i.12
> > >> > >>> > > F1=: 2 1 0 0 0 0 0 0 0 0 0 1
> > >> > >>> > >
> > >> > >>> > > F1 hcodes A1
> > >> > >>> > >
> > >> > >>> > > Any idea?
> > >> > >>> > >
> > >> > >>> > > --
> > >> > >>> > > regards,
> > >> > >>> > > ====================================================
> > >> > >>> > > GPG key 1024D/4434BAB3 2008-08-24
> > >> > >>> > > gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
> > >> > >>> > > gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3
> > >> > >>> > >
> > >> ----------------------------------------------------------------------
> > >> > >>> > > For information about J forums see
> > >> http://www.jsoftware.com/forums.htm
> > >> > >>> >
> > >> ----------------------------------------------------------------------
> > >> > >>> > For information about J forums see
> > >> http://www.jsoftware.com/forums.htm
> > >> > >>> >
> > >> > >>>
> > >> ----------------------------------------------------------------------
> > >> > >>> For information about J forums see
> > >> http://www.jsoftware.com/forums.htm
> > >> > >>>
> > >> > >>
> > ----------------------------------------------------------------------
> > >> > >> For information about J forums see
> > >> http://www.jsoftware.com/forums.htm
> > >> > >
> > ----------------------------------------------------------------------
> > >> > > For information about J forums see
> > http://www.jsoftware.com/forums.htm
> > >>
> > >> --
> > >> regards,
> > >> ====================================================
> > >> GPG key 1024D/4434BAB3 2008-08-24
> > >> gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
> > >> gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3
> > >> ----------------------------------------------------------------------
> > >> For information about J forums see http://www.jsoftware.com/forums.htm
> > >>
> > > ----------------------------------------------------------------------
> > > For information about J forums see http://www.jsoftware.com/forums.htm
> > ----------------------------------------------------------------------
> > For information about J forums see http://www.jsoftware.com/forums.htm
> >
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm

-- 
regards,
====================================================
GPG key 1024D/4434BAB3 2008-08-24
gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Re: [Jprogramming] zlib huffman coding

Reply via email to