I agree that there should not be any 2s.

However, I think this means that there's something wrong with the
description of the algorithm, or at least my understanding of it.

I'll need some time to digest this.

Thanks,

-- 
Raul

On Wed, Sep 10, 2014 at 9:39 PM, bill lam <[email protected]> wrote:
> pardon me of forgetting telling another cavaet.
> zlib huffman code is suboptimal so that bit length of code
> for each symbol can be longer than that in classic huffman.
> eg in my orignal data.
>
>    10{. ":@>F deflatecodes2 A
> 1 1 0 0 0 0 0 0 0 0
> 1 1 1 1 1 1 1 1 1 1 0
> 1 1 1 1 1 1 1 1 1 1 1
> 1 1 1 1 1 1 1 1 1 2 0
> 1 1 1 1 1 1 1 1 1 2 1
> 1 1 1 1 1 1 1 1 2 1 0
> 1 1 0 0 0 0 0 0 0 1
> 1 1 0 0 0 0 0 0 1 0
> 1 1 0 0 0 0 0 0 1 1
> 1 1 0 0 0 0 0 1 0 0
>
> there shouldn't be any 2's.  It needs to increase bit length of
> some code.
>
> Ср, 10 сен 2014, Raul Miller написал(а):
>> Oops, sorry about that.
>>
>> Try it this way:
>>
>> deflatecodes2=:4 :0
>>   L=. #@> x hcodes y
>>   U=. 0,~.L
>>   R=. ;<@(({. >./@(>#])&U #1:)@{. <@:+"1-@{.{."1 #:@i.@#)/.~L
>>   R/:;(</. i.@#)L
>> )
>>
>>    ":@>F1 deflatecodes2 A1
>> 0
>> 1 1 0
>> 1 1 1 1 1 1 0
>> 1 1 1 1 1 1 1
>> 1 1 1 0 0 0
>> 1 1 1 0 0 1
>> 1 1 1 0 1 0
>> 1 1 1 0 1 1
>> 1 1 1 1 0 0
>> 1 1 1 1 0 1
>> 1 1 1 1 1 0
>> 1 0
>>
>> Thanks,
>>
>> --
>> Raul
>>
>>
>> On Wed, Sep 10, 2014 at 7:26 PM, bill lam <[email protected]> wrote:
>> > for zlib, all huffman code of the same bit length are in sequential,
>> > starting with a code an 0 bit in the end (even number), and no gaps in the
>> > block (consecutive). the symbols of these block represented are sequential
>> > but not consecutive (possibly gaps). So the zlib huffman code is slightly
>> > less efficient than the original huffman code but the advantage is simpler
>> > to store the table, just the bits used. section 3.2.6 gives an example of
>> > such table.
>> >
>> > deflatecodes can satisfy the 2 rules _but_ its result is invalid because
>> > huffman code is a prefix coding.
>> >
>> >    ,.F1 ( deflatecodes)  A1
>> > ┌─────────────┐
>> > │0            │
>> > ├─────────────┤
>> > │1 1 0        │
>> > ├─────────────┤
>> > │1 1 1 0 0 1 0│
>> > ├─────────────┤
>> > │1 1 1 0 0 1 1│
>> > ├─────────────┤
>> > │1 1 1 0 0 0  │
>> > ├─────────────┤
>> > │1 1 1 0 0 1  │
>> > ├─────────────┤
>> > │1 1 1 0 1 0  │
>> > ├─────────────┤
>> > │1 1 1 0 1 1  │
>> > ├─────────────┤
>> > │1 1 1 1 0 0  │
>> > ├─────────────┤
>> > │1 1 1 1 0 1  │
>> > ├─────────────┤
>> > │1 1 1 1 1 0  │
>> > ├─────────────┤
>> > │1 0          │
>> > └─────────────┘
>> >   7 bit 1110010 is a prefix of 6 bit 111001 and this is illegal.
>> > Instead 7 bit code block should start with
>> > #.inv 2b11100 +  7
>> > 1 1 1 1 1 1 0
>> > because there are 6 symbols in the 6 bit code.  (algorithm given in section
>> > 3.2.2).
>> >
>> > Any idea to fix deflatecodes so that it can give valid huffman codes?
>> > Thanks.
>> > On Sep 11, 2014 1:01 AM, "Raul Miller" <[email protected]> wrote:
>> >
>> >> I think the use of the term "consecutive" rather than "sequential" is
>> >> telling.
>> >>
>> >> The described algorithm is: compute the huffman code lengths:
>> >>    #@>F1 hcodes A1
>> >> 1 3 7 7 6 6 6 6 6 6 6 2
>> >>
>> >> Then assign ascending huffman codes first in length order and then
>> >> within codes of the same length.
>> >>
>> >> Taken literally, that might be something like this:
>> >>
>> >> H=: 4 :0
>> >>   L=.#@> x hcodes y
>> >>   U=.~.L
>> >>   ;<@(({.{.U e.~i.&.<:@{.)<@:+"1-@{.{."1 #:@i.@#)/.~L
>> >> )
>> >>
>> >>    ":@>F1 H A1
>> >> 0
>> >> 1 1 0
>> >> 1 1 1 0 0 1 0
>> >> 1 1 1 0 0 1 1
>> >> 1 1 1 0 0 0
>> >> 1 1 1 0 0 1
>> >> 1 1 1 0 1 0
>> >> 1 1 1 0 1 1
>> >> 1 1 1 1 0 0
>> >> 1 1 1 1 0 1
>> >> 1 1 1 1 1 0
>> >> 1 0
>> >>
>> >> But is this correct? Is it actually safe to leave the results like
>> >> this - with all codes of the same length being consecutive to each
>> >> other?
>> >>
>> >>    F (hcodes -:&:(#@>) H) A
>> >> 0
>> >>
>> >> No.
>> >>
>> >> So... "consecutive" must refer only to the values used and not their
>> >> order within the result.
>> >>
>> >> Perhaps something like this:
>> >>
>> >> deflatecodes=:4 :0
>> >>   L=.#@> x hcodes y
>> >>   U=.~.L
>> >>   R=. ;<@(({.{.U e.~i.&.<:@{.)<@:+"1-@{.{."1 #:@i.@#)/.~L
>> >>   R/:;(</. i.@#)L
>> >> )
>> >>
>> >>    F (hcodes -:&:(#@>) deflatecodes)  A
>> >> 1
>> >>
>> >> There should be a better way of doing this, but this should at least
>> >> get you started.
>> >>
>> >> Thanks,
>> >>
>> >> --
>> >> Raul
>> >>
>> >>
>> >> On Wed, Sep 10, 2014 at 10:45 AM, bill lam <[email protected]> wrote:
>> >> > For huffman coding used in zlib:
>> >> > https://www.ietf.org/rfc/rfc1951.txt section 3.2.2.
>> >> >
>> >> >  The Huffman codes used for each alphabet in the "deflate"
>> >> >  format have two additional rules:
>> >> >
>> >> >   * All codes of a given bit length have lexicographically
>> >> >   consecutive values, in the same order as the symbols
>> >> >   they represent;
>> >> >
>> >> >   * Shorter codes lexicographically precede longer codes.
>> >> > I tried jwiki hcodes in
>> >> > I try Roger's essay
>> >> > http://www.jsoftware.com/jwiki/Essays/Huffman%20Coding
>> >> >
>> >> > hc=: 4 : 0
>> >> > if. 1=#x do. y
>> >> > else. ((i{x),+/j{x) hc (i{y),<j{y [ i=. (i.#x) -. j=. 2{./:x end.
>> >> > )
>> >> >
>> >> > hcodes=: 4 : 0
>> >> > assert. x -:&$ y           NB. weights and words have same shape
>> >> > assert. (0<:x) *. 1=#$x    NB. weights are non-negative
>> >> > assert. 1 >: L.y           NB. words are boxed not more than once
>> >> > w=. ,&.> y                 NB. standardized words
>> >> > assert. w -: ~.w           NB. words are unique
>> >> > t=. 0 {:: x hc w           NB. minimal weight binary tree
>> >> > ((< S: 0 t) i. w) { <@(1&=)@; S: 1 {:: t
>> >> > )
>> >> >
>> >> > but the coding produced is malformed for zlib. eg,
>> >> > this is what I ran into trouble
>> >> >
>> >> > f1=: 1 256 17 1 1 9 1
>> >> > f2=: 2 1 0 1 255 0 1536
>> >> > F=: ,/(f1#f2)
>> >> > A=: i.286
>> >> >
>> >> > F hcodes A
>> >> >
>> >> > Or a shorter example
>> >> >
>> >> > A1=: i.12
>> >> > F1=: 2 1 0 0 0 0 0 0 0 0 0 1
>> >> >
>> >> > F1 hcodes A1
>> >> >
>> >> > Any idea?
>> >> >
>> >> > --
>> >> > regards,
>> >> > ====================================================
>> >> > GPG key 1024D/4434BAB3 2008-08-24
>> >> > gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
>> >> > gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3
>> >> > ----------------------------------------------------------------------
>> >> > For information about J forums see http://www.jsoftware.com/forums.htm
>> >> ----------------------------------------------------------------------
>> >> For information about J forums see http://www.jsoftware.com/forums.htm
>> >>
>> > ----------------------------------------------------------------------
>> > For information about J forums see http://www.jsoftware.com/forums.htm
>> ----------------------------------------------------------------------
>> For information about J forums see http://www.jsoftware.com/forums.htm
>
> --
> regards,
> ====================================================
> GPG key 1024D/4434BAB3 2008-08-24
> gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
> gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Reply via email to