Ah, good point.

 Here's another thing:

   +/ bits < #@> F hcodes A
31

In some cases the deflate codes have fewer bits than huffman coding.

So the problem is finding the algorithm that was used to define bits - it's
not the huffman coding algorithm.

I guess I need to spend some time reading rfc 1951.

Thanks,

-- 
Raul

On Wed, Sep 10, 2014 at 11:41 PM, bill lam <[email protected]> wrote:

> It refered to A and F (not A1 and F1) in my first email.
>
> I sent you an email off-list that included attachments.
>
>
> Ср, 10 сен 2014, Raul Miller написал(а):
> > Something is wrong here:
> >
> >    $bits
> > 286
> >    (0=bits)-:0=F1
> > 0
> >    +/0=F1
> > 9
> >    +/0=bits
> > 26
> >
> > 0 bits should mean that that symbol is not represented.
> >
> > But there are only 9 such symbols in F1.
> >
> > --
> > Raul
> >
> > On Wed, Sep 10, 2014 at 11:09 PM, bill lam <[email protected]> wrote:
> > > I guess the description of the algorithm itself is correct, but
> > > it does not mention how to get the bits of each symbol. The bit
> > > lengths for classic and zlib huffman can be different.
> > >
> > > for testing, the bits for A from libz.so are
> > >
> > >    ":&>_10<\bits
> > > 10 10 10 10 10 10 10 10 10 10
> > > 10 10 10 10 10 10 10 10 10 10
> > > 10 10 10 10 10 10 10 10 10 10
> > > 10 10 10 10 10 10 10 10 10 10
> > > 10 10 10 10 10 10 10 10 10 10
> > > 10 10 10 10 10 10 10 10 10 10
> > > 10 10 10 10 10 10 10 10 10 10
> > > 10 10 10 10 10 10 10 10 10 10
> > > 10 10 10 10 10 10 10 10 10 10
> > > 10 10 10 10 10 10 10 10 10 10
> > > 10 10 10 10 10 10 10 10 10 10
> > > 10 10 10 10 10 10 10 10 10 10
> > > 10 10 10 10 10 10 10 10 10 10
> > > 10 10 10 10 10 10 10 10 10 10
> > > 10 10 10 11 11 10 10 10 10 11
> > > 11 10 10 10 10 10 10 10 10 10
> > > 10 10 10 10 10 10 10 10 10 10
> > > 10 10 10 10 10 10 10 10 10 10
> > > 10 10 10 10 10 10 10 10 10 10
> > > 10 10 10 10 10 10 10 10 10 10
> > > 10 10 10 10 10 10 10 10 10 10
> > > 10 10 10 10 10 10 10 10 10 10
> > > 10 10 10 10 10 10 10 10 10 10
> > > 10 10 10 10 10 10 10 10 10 10
> > > 10 10 10 10 10 10 10 10 10 10
> > > 10 10 10 10 10 10 10 0 0 0
> > > 0 0 0 0 0 0 0 0 0 0
> > > 0 0 0 0 10 2 0 0 0 0
> > > 0 0 0 0 0 1
> > >
> > > this is the huffman dictionary stored in zlib stream and the actual
> > > huffman codes are computed using the algorithm mentioned during
> decoding.
> > >
> > > Ср, 10 сен 2014, Raul Miller написал(а):
> > >> I agree that there should not be any 2s.
> > >>
> > >> However, I think this means that there's something wrong with the
> > >> description of the algorithm, or at least my understanding of it.
> > >>
> > >> I'll need some time to digest this.
> > >>
> > >> Thanks,
> > >>
> > >> --
> > >> Raul
> > >>
> > >> On Wed, Sep 10, 2014 at 9:39 PM, bill lam <[email protected]>
> wrote:
> > >> > pardon me of forgetting telling another cavaet.
> > >> > zlib huffman code is suboptimal so that bit length of code
> > >> > for each symbol can be longer than that in classic huffman.
> > >> > eg in my orignal data.
> > >> >
> > >> >    10{. ":@>F deflatecodes2 A
> > >> > 1 1 0 0 0 0 0 0 0 0
> > >> > 1 1 1 1 1 1 1 1 1 1 0
> > >> > 1 1 1 1 1 1 1 1 1 1 1
> > >> > 1 1 1 1 1 1 1 1 1 2 0
> > >> > 1 1 1 1 1 1 1 1 1 2 1
> > >> > 1 1 1 1 1 1 1 1 2 1 0
> > >> > 1 1 0 0 0 0 0 0 0 1
> > >> > 1 1 0 0 0 0 0 0 1 0
> > >> > 1 1 0 0 0 0 0 0 1 1
> > >> > 1 1 0 0 0 0 0 1 0 0
> > >> >
> > >> > there shouldn't be any 2's.  It needs to increase bit length of
> > >> > some code.
> > >> >
> > >> > Ср, 10 сен 2014, Raul Miller написал(а):
> > >> >> Oops, sorry about that.
> > >> >>
> > >> >> Try it this way:
> > >> >>
> > >> >> deflatecodes2=:4 :0
> > >> >>   L=. #@> x hcodes y
> > >> >>   U=. 0,~.L
> > >> >>   R=. ;<@(({. >./@(>#])&U #1:)@{. <@:+"1-@{.{."1 #:@i.@#)/.~L
> > >> >>   R/:;(</. i.@#)L
> > >> >> )
> > >> >>
> > >> >>    ":@>F1 deflatecodes2 A1
> > >> >> 0
> > >> >> 1 1 0
> > >> >> 1 1 1 1 1 1 0
> > >> >> 1 1 1 1 1 1 1
> > >> >> 1 1 1 0 0 0
> > >> >> 1 1 1 0 0 1
> > >> >> 1 1 1 0 1 0
> > >> >> 1 1 1 0 1 1
> > >> >> 1 1 1 1 0 0
> > >> >> 1 1 1 1 0 1
> > >> >> 1 1 1 1 1 0
> > >> >> 1 0
> > >> >>
> > >> >> Thanks,
> > >> >>
> > >> >> --
> > >> >> Raul
> > >> >>
> > >> >>
> > >> >> On Wed, Sep 10, 2014 at 7:26 PM, bill lam <[email protected]>
> wrote:
> > >> >> > for zlib, all huffman code of the same bit length are in
> sequential,
> > >> >> > starting with a code an 0 bit in the end (even number), and no
> gaps in the
> > >> >> > block (consecutive). the symbols of these block represented are
> sequential
> > >> >> > but not consecutive (possibly gaps). So the zlib huffman code is
> slightly
> > >> >> > less efficient than the original huffman code but the advantage
> is simpler
> > >> >> > to store the table, just the bits used. section 3.2.6 gives an
> example of
> > >> >> > such table.
> > >> >> >
> > >> >> > deflatecodes can satisfy the 2 rules _but_ its result is invalid
> because
> > >> >> > huffman code is a prefix coding.
> > >> >> >
> > >> >> >    ,.F1 ( deflatecodes)  A1
> > >> >> > ┌─────────────┐
> > >> >> > │0            │
> > >> >> > ├─────────────┤
> > >> >> > │1 1 0        │
> > >> >> > ├─────────────┤
> > >> >> > │1 1 1 0 0 1 0│
> > >> >> > ├─────────────┤
> > >> >> > │1 1 1 0 0 1 1│
> > >> >> > ├─────────────┤
> > >> >> > │1 1 1 0 0 0  │
> > >> >> > ├─────────────┤
> > >> >> > │1 1 1 0 0 1  │
> > >> >> > ├─────────────┤
> > >> >> > │1 1 1 0 1 0  │
> > >> >> > ├─────────────┤
> > >> >> > │1 1 1 0 1 1  │
> > >> >> > ├─────────────┤
> > >> >> > │1 1 1 1 0 0  │
> > >> >> > ├─────────────┤
> > >> >> > │1 1 1 1 0 1  │
> > >> >> > ├─────────────┤
> > >> >> > │1 1 1 1 1 0  │
> > >> >> > ├─────────────┤
> > >> >> > │1 0          │
> > >> >> > └─────────────┘
> > >> >> >   7 bit 1110010 is a prefix of 6 bit 111001 and this is illegal.
> > >> >> > Instead 7 bit code block should start with
> > >> >> > #.inv 2b11100 +  7
> > >> >> > 1 1 1 1 1 1 0
> > >> >> > because there are 6 symbols in the 6 bit code.  (algorithm given
> in section
> > >> >> > 3.2.2).
> > >> >> >
> > >> >> > Any idea to fix deflatecodes so that it can give valid huffman
> codes?
> > >> >> > Thanks.
> > >> >> > On Sep 11, 2014 1:01 AM, "Raul Miller" <[email protected]>
> wrote:
> > >> >> >
> > >> >> >> I think the use of the term "consecutive" rather than
> "sequential" is
> > >> >> >> telling.
> > >> >> >>
> > >> >> >> The described algorithm is: compute the huffman code lengths:
> > >> >> >>    #@>F1 hcodes A1
> > >> >> >> 1 3 7 7 6 6 6 6 6 6 6 2
> > >> >> >>
> > >> >> >> Then assign ascending huffman codes first in length order and
> then
> > >> >> >> within codes of the same length.
> > >> >> >>
> > >> >> >> Taken literally, that might be something like this:
> > >> >> >>
> > >> >> >> H=: 4 :0
> > >> >> >>   L=.#@> x hcodes y
> > >> >> >>   U=.~.L
> > >> >> >>   ;<@(({.{.U e.~i.&.<:@{.)<@:+"1-@{.{."1 #:@i.@#)/.~L
> > >> >> >> )
> > >> >> >>
> > >> >> >>    ":@>F1 H A1
> > >> >> >> 0
> > >> >> >> 1 1 0
> > >> >> >> 1 1 1 0 0 1 0
> > >> >> >> 1 1 1 0 0 1 1
> > >> >> >> 1 1 1 0 0 0
> > >> >> >> 1 1 1 0 0 1
> > >> >> >> 1 1 1 0 1 0
> > >> >> >> 1 1 1 0 1 1
> > >> >> >> 1 1 1 1 0 0
> > >> >> >> 1 1 1 1 0 1
> > >> >> >> 1 1 1 1 1 0
> > >> >> >> 1 0
> > >> >> >>
> > >> >> >> But is this correct? Is it actually safe to leave the results
> like
> > >> >> >> this - with all codes of the same length being consecutive to
> each
> > >> >> >> other?
> > >> >> >>
> > >> >> >>    F (hcodes -:&:(#@>) H) A
> > >> >> >> 0
> > >> >> >>
> > >> >> >> No.
> > >> >> >>
> > >> >> >> So... "consecutive" must refer only to the values used and not
> their
> > >> >> >> order within the result.
> > >> >> >>
> > >> >> >> Perhaps something like this:
> > >> >> >>
> > >> >> >> deflatecodes=:4 :0
> > >> >> >>   L=.#@> x hcodes y
> > >> >> >>   U=.~.L
> > >> >> >>   R=. ;<@(({.{.U e.~i.&.<:@{.)<@:+"1-@{.{."1 #:@i.@#)/.~L
> > >> >> >>   R/:;(</. i.@#)L
> > >> >> >> )
> > >> >> >>
> > >> >> >>    F (hcodes -:&:(#@>) deflatecodes)  A
> > >> >> >> 1
> > >> >> >>
> > >> >> >> There should be a better way of doing this, but this should at
> least
> > >> >> >> get you started.
> > >> >> >>
> > >> >> >> Thanks,
> > >> >> >>
> > >> >> >> --
> > >> >> >> Raul
> > >> >> >>
> > >> >> >>
> > >> >> >> On Wed, Sep 10, 2014 at 10:45 AM, bill lam <[email protected]>
> wrote:
> > >> >> >> > For huffman coding used in zlib:
> > >> >> >> > https://www.ietf.org/rfc/rfc1951.txt section 3.2.2.
> > >> >> >> >
> > >> >> >> >  The Huffman codes used for each alphabet in the "deflate"
> > >> >> >> >  format have two additional rules:
> > >> >> >> >
> > >> >> >> >   * All codes of a given bit length have lexicographically
> > >> >> >> >   consecutive values, in the same order as the symbols
> > >> >> >> >   they represent;
> > >> >> >> >
> > >> >> >> >   * Shorter codes lexicographically precede longer codes.
> > >> >> >> > I tried jwiki hcodes in
> > >> >> >> > I try Roger's essay
> > >> >> >> > http://www.jsoftware.com/jwiki/Essays/Huffman%20Coding
> > >> >> >> >
> > >> >> >> > hc=: 4 : 0
> > >> >> >> > if. 1=#x do. y
> > >> >> >> > else. ((i{x),+/j{x) hc (i{y),<j{y [ i=. (i.#x) -. j=. 2{./:x
> end.
> > >> >> >> > )
> > >> >> >> >
> > >> >> >> > hcodes=: 4 : 0
> > >> >> >> > assert. x -:&$ y           NB. weights and words have same
> shape
> > >> >> >> > assert. (0<:x) *. 1=#$x    NB. weights are non-negative
> > >> >> >> > assert. 1 >: L.y           NB. words are boxed not more than
> once
> > >> >> >> > w=. ,&.> y                 NB. standardized words
> > >> >> >> > assert. w -: ~.w           NB. words are unique
> > >> >> >> > t=. 0 {:: x hc w           NB. minimal weight binary tree
> > >> >> >> > ((< S: 0 t) i. w) { <@(1&=)@; S: 1 {:: t
> > >> >> >> > )
> > >> >> >> >
> > >> >> >> > but the coding produced is malformed for zlib. eg,
> > >> >> >> > this is what I ran into trouble
> > >> >> >> >
> > >> >> >> > f1=: 1 256 17 1 1 9 1
> > >> >> >> > f2=: 2 1 0 1 255 0 1536
> > >> >> >> > F=: ,/(f1#f2)
> > >> >> >> > A=: i.286
> > >> >> >> >
> > >> >> >> > F hcodes A
> > >> >> >> >
> > >> >> >> > Or a shorter example
> > >> >> >> >
> > >> >> >> > A1=: i.12
> > >> >> >> > F1=: 2 1 0 0 0 0 0 0 0 0 0 1
> > >> >> >> >
> > >> >> >> > F1 hcodes A1
> > >> >> >> >
> > >> >> >> > Any idea?
> > >> >> >> >
> > >> >> >> > --
> > >> >> >> > regards,
> > >> >> >> > ====================================================
> > >> >> >> > GPG key 1024D/4434BAB3 2008-08-24
> > >> >> >> > gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
> > >> >> >> > gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3
> > >> >> >> >
> ----------------------------------------------------------------------
> > >> >> >> > For information about J forums see
> http://www.jsoftware.com/forums.htm
> > >> >> >>
> ----------------------------------------------------------------------
> > >> >> >> For information about J forums see
> http://www.jsoftware.com/forums.htm
> > >> >> >>
> > >> >> >
> ----------------------------------------------------------------------
> > >> >> > For information about J forums see
> http://www.jsoftware.com/forums.htm
> > >> >>
> ----------------------------------------------------------------------
> > >> >> For information about J forums see
> http://www.jsoftware.com/forums.htm
> > >> >
> > >> > --
> > >> > regards,
> > >> > ====================================================
> > >> > GPG key 1024D/4434BAB3 2008-08-24
> > >> > gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
> > >> > gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3
> > >> >
> ----------------------------------------------------------------------
> > >> > For information about J forums see
> http://www.jsoftware.com/forums.htm
> > >> ----------------------------------------------------------------------
> > >> For information about J forums see
> http://www.jsoftware.com/forums.htm
> > >
> > > --
> > > regards,
> > > ====================================================
> > > GPG key 1024D/4434BAB3 2008-08-24
> > > gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
> > > gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3
> > > ----------------------------------------------------------------------
> > > For information about J forums see http://www.jsoftware.com/forums.htm
> > ----------------------------------------------------------------------
> > For information about J forums see http://www.jsoftware.com/forums.htm
>
> --
> regards,
> ====================================================
> GPG key 1024D/4434BAB3 2008-08-24
> gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
> gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
>
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Reply via email to