I think the prefix coding looks OK, but the 2 rules does not:
I modified the code[1] to allow passing in a string and outputting the codes
C:\temp>deflate ABCDEFGHIJKLMONPQRSTUVWXYZ
code 65 : 0 00000000000000000000000000000000
code 66 : 6 00000000000000000000000000000110
code 67 : 8 00000000000000000000000000001000
code 68 : 4 00000000000000000000000000000100
code 69 : 22 00000000000000000000000000010110
code 70 : 14 00000000000000000000000000001110
code 71 : 30 00000000000000000000000000011110
code 72 : 1 00000000000000000000000000000001
code 73 : 17 00000000000000000000000000010001
code 74 : 12 00000000000000000000000000001100
code 75 : 9 00000000000000000000000000001001
code 76 : 25 00000000000000000000000000011001
code 77 : 5 00000000000000000000000000000101
code 78 : 21 00000000000000000000000000010101
code 79 : 13 00000000000000000000000000001101
code 80 : 29 00000000000000000000000000011101
code 81 : 3 00000000000000000000000000000011
code 82 : 19 00000000000000000000000000010011
code 83 : 11 00000000000000000000000000001011
code 84 : 27 00000000000000000000000000011011
code 85 : 7 00000000000000000000000000000111
code 86 : 23 00000000000000000000000000010111
code 87 : 15 00000000000000000000000000001111
code 88 : 31 00000000000000000000000000011111
code 89 : 2 00000000000000000000000000000010
code 90 : 10 00000000000000000000000000001010
I think it violates the consecutive rule... Each letter has the same
frequency. ABCD have the same bit length. The order is off:
If I sort it lexographically using javascript:
JSON.stringify([['a','00000000000000000000000000000000'],
['b','00000000000000000000000000000110'],
['c','00000000000000000000000000001000'],
['d','00000000000000000000000000000100']].sort(function(x,y) { return
x[1] - y[1] }))
"[["a","00000000000000000000000000000000"],["d","00000000000000000000000000000100"],["b","00000000000000000000000000000110"],["c","00000000000000000000000000001000"]]"
As you can see, the order comes out a,d,b,c
I played around with a javascript implementation, pako[2]. It seems to
work correctly:
As you can see, it sorts lexographically
JSON.stringify([[65,'1000001'],
[66,'1000010'],
[67,'1000011'],
[68,'1000100'],
[69,'1000101'],
[70,'1000110'],
[71,'1000111'],
[72,'1001000'],
[73,'1001001'],
[74,'1001010'],
[75,'1001011'],
[76,'1001100'],
[77,'1001101'],
[78,'1001110'],
[79,'1001111'],
[80,'1010000'],
[81,'1010001'],
[82,'1010010'],
[83,'1010011'],
[84,'1010100'],
[85,'1010101'],
[86,'1010110'],
[87,'1010111'],
[88,'1011000'],
[89,'1011001'],
[90,'1011010']].sort(function(x,y) { return x[1] - y[1] }))
"[[65,"1000001"],[66,"1000010"],[67,"1000011"],[68,"1000100"],[69,"1000101"],[70,"1000110"],[71,"1000111"],[72,"1001000"],[73,"1001001"],[74,"1001010"],[75,"1001011"],[76,"1001100"],[77,"1001101"],[78,"1001110"],[79,"1001111"],[80,"1010000"],[81,"1010001"],[82,"1010010"],[83,"1010011"],[84,"1010100"],[85,"1010101"],[86,"1010110"],[87,"1010111"],[88,"1011000"],[89,"1011001"],[90,"1011010"]]"
All the values are sorted correctly.
Here it is with the same ABCD example:
var pako = require('pako');
var binaryString = pako.deflate('ABCD', { to: 'string' });
console.log(binaryString);
var restored = pako.inflate(binaryString, { to: 'string' });
console.log(restored);
It successfully deflates and inflates itself
x?♣A☺☺ ? mcÿ7♣A♫☻?☺♂
ABCD
Hope this helps...
[1] - https://gist.github.com/joebo/a3c2932f0e5a7a0c3f07#file-deflate-c-L2613
[2] - https://rawgit.com/nodeca/pako/master/dist/pako.js
On Thu, Sep 11, 2014 at 11:33 AM, bill lam <[email protected]> wrote:
> This is strange since every author must had decode its own encoded
> data as a smoke test.
>
> Did you test if huffman code or bit lengths it produced was
> correct or not, ie it is a prefix coding and it satisfy the 2
> rules in rfc.
>
> Чт, 11 сен 2014, Joe Bogner написал(а):
>> unfortunately the dynamic coding in the putty fork doesn't seem to work:
>>
>> deflate -c deflate.c > out
>> deflate -d out
>>
>> decoding error: incorrect data checksum
>>
>>
>> it works fine with static tables
>>
>> C:\temp>echo ABCD > ABCD
>>
>> C:\temp>deflate -c ABCD > out
>>
>> C:\temp>deflate -d out
>> ABCD
>>
>> I added some debugging code to determine that deflating deflate.c
>> would be a dynamic table... Assuming it's broke, I probably wouldn't
>> use it as a reference implementation after all
>>
>> On Thu, Sep 11, 2014 at 3:45 AM, bill lam <[email protected]> wrote:
>> > the frequencies (guessing from bit lengths) should be something like 2 3 1
>> > 1
>> > (2 3 1 1) hcodes 'ABCD'
>> >
>> > the hard part is the inverse problem: how to get the huffman code with
>> > prior knowing the bits for each symbol. Your pointer to the putty
>> > fork looks like helpful. The comment is in lines 861 to 914, the code
>> > itself in line 915 to 964. Do you know how to express it in J?
>> > Thanks.
>> >
>> > On Thu, Sep 11, 2014 at 2:57 PM, Joe Bogner <[email protected]> wrote:
>> >> Here a few other links ... after reading through the RFC. Not sure if
>> >> they help, but just sharing from my own research into assisting on
>> >> this topic
>> >>
>> >> https://github.com/evegard/pngview/blob/master/huffman.c#L54
>> >>
>> >> And a fork of the putty version with dynamic huffman coding:
>> >> http://rc.quest.com/viewvc/putty/trunk/halibut/deflate.c?diff_format=s&revision=2&view=markup
>> >>
>> >> Or just generally googling some of the code from the RFC:
>> >> https://www.google.com/search?q=next_code%5Blen%5D%2B%2B%3B&oq=next_code%5Blen%5D%2B%2B%3B&aqs=chrome..69i57.387j0j7&sourceid=chrome&es_sm=93&ie=UTF-8#q=next_code%5Blen%5D%2B%2B%3B&start=20
>> >>
>> >>
>> >> Using the code from
>> >> http://www.jsoftware.com/jwiki/Essays/Huffman%20Coding, I got stuck
>> >> trying to match a simple example to the binary tree in the RFC:
>> >>
>> >> From the RFC:
>> >>
>> >> /\ Symbol Code
>> >> 0 1 ------ ----
>> >> / \ A 00
>> >> /\ B B 1
>> >> 0 1 C 011
>> >> / \ D 010
>> >> A /\
>> >> 0 1
>> >> / \
>> >> D C
>> >>
>> >>
>> >>
>> >> (4#1) hcodes 'ABCD'
>> >> ┌───┬───┬───┬───┐
>> >> │0 0│0 1│1 0│1 1│
>> >> └───┴───┴───┴───┘
>> >>
>> >> Per the RFC, ideally that should match this? '00';'1';'011';'010'
>> >>
>> >>
>> >> From there, it seems like a pretty straightforward exercise to
>> >> transliterate the C code from the RFC into J code to recode the
>> >> example to:
>> >>
>> >>
>> >> Symbol Code
>> >> ------ ----
>> >> A 10
>> >> B 0
>> >> C 110
>> >> D 111
>> >>
>> >>
>> >> I would probably start with a looping construct like what's in the RFC
>> >> and then figure out a more J way to do it, but first I would need to
>> >> figure out how to create the binary tree in that initial format.
>> >>
>> >> On Wed, Sep 10, 2014 at 7:41 PM, bill lam <[email protected]> wrote:
>> >>> Thanks Joe,
>> >>> putty only use zlib static huffman for encoding so that it does not build
>> >>> any huffman dictionary table.
>> >>>
>> >>> The zlib static huffman code does not care about individual symbol's
>> >>> frequency, it just encode 0 to 286 into bits, see section 3.2.6.
>> >>> On Sep 11, 2014 1:26 AM, "Joe Bogner" <[email protected]> wrote:
>> >>>
>> >>>> You've already likely considered this, but if it were me I would compare
>> >>>> results to a working implementation. The one from putty seems pretty
>> >>>> clean
>> >>>> and standalone:
>> >>>> https://raw.githubusercontent.com/grumpydev/PortablePuTTY/master/SSHZLIB.C
>> >>>> . I was able to compile it on windows no problem and I assume it'd be
>> >>>> fine
>> >>>> on linux as well.
>> >>>>
>> >>>> On Wed, Sep 10, 2014 at 1:00 PM, Raul Miller <[email protected]>
>> >>>> wrote:
>> >>>>
>> >>>> > I think the use of the term "consecutive" rather than "sequential" is
>> >>>> > telling.
>> >>>> >
>> >>>> > The described algorithm is: compute the huffman code lengths:
>> >>>> > #@>F1 hcodes A1
>> >>>> > 1 3 7 7 6 6 6 6 6 6 6 2
>> >>>> >
>> >>>> > Then assign ascending huffman codes first in length order and then
>> >>>> > within codes of the same length.
>> >>>> >
>> >>>> > Taken literally, that might be something like this:
>> >>>> >
>> >>>> > H=: 4 :0
>> >>>> > L=.#@> x hcodes y
>> >>>> > U=.~.L
>> >>>> > ;<@(({.{.U e.~i.&.<:@{.)<@:+"1-@{.{."1 #:@i.@#)/.~L
>> >>>> > )
>> >>>> >
>> >>>> > ":@>F1 H A1
>> >>>> > 0
>> >>>> > 1 1 0
>> >>>> > 1 1 1 0 0 1 0
>> >>>> > 1 1 1 0 0 1 1
>> >>>> > 1 1 1 0 0 0
>> >>>> > 1 1 1 0 0 1
>> >>>> > 1 1 1 0 1 0
>> >>>> > 1 1 1 0 1 1
>> >>>> > 1 1 1 1 0 0
>> >>>> > 1 1 1 1 0 1
>> >>>> > 1 1 1 1 1 0
>> >>>> > 1 0
>> >>>> >
>> >>>> > But is this correct? Is it actually safe to leave the results like
>> >>>> > this - with all codes of the same length being consecutive to each
>> >>>> > other?
>> >>>> >
>> >>>> > F (hcodes -:&:(#@>) H) A
>> >>>> > 0
>> >>>> >
>> >>>> > No.
>> >>>> >
>> >>>> > So... "consecutive" must refer only to the values used and not their
>> >>>> > order within the result.
>> >>>> >
>> >>>> > Perhaps something like this:
>> >>>> >
>> >>>> > deflatecodes=:4 :0
>> >>>> > L=.#@> x hcodes y
>> >>>> > U=.~.L
>> >>>> > R=. ;<@(({.{.U e.~i.&.<:@{.)<@:+"1-@{.{."1 #:@i.@#)/.~L
>> >>>> > R/:;(</. i.@#)L
>> >>>> > )
>> >>>> >
>> >>>> > F (hcodes -:&:(#@>) deflatecodes) A
>> >>>> > 1
>> >>>> >
>> >>>> > There should be a better way of doing this, but this should at least
>> >>>> > get you started.
>> >>>> >
>> >>>> > Thanks,
>> >>>> >
>> >>>> > --
>> >>>> > Raul
>> >>>> >
>> >>>> >
>> >>>> > On Wed, Sep 10, 2014 at 10:45 AM, bill lam <[email protected]>
>> >>>> > wrote:
>> >>>> > > For huffman coding used in zlib:
>> >>>> > > https://www.ietf.org/rfc/rfc1951.txt section 3.2.2.
>> >>>> > >
>> >>>> > > The Huffman codes used for each alphabet in the "deflate"
>> >>>> > > format have two additional rules:
>> >>>> > >
>> >>>> > > * All codes of a given bit length have lexicographically
>> >>>> > > consecutive values, in the same order as the symbols
>> >>>> > > they represent;
>> >>>> > >
>> >>>> > > * Shorter codes lexicographically precede longer codes.
>> >>>> > > I tried jwiki hcodes in
>> >>>> > > I try Roger's essay
>> >>>> > > http://www.jsoftware.com/jwiki/Essays/Huffman%20Coding
>> >>>> > >
>> >>>> > > hc=: 4 : 0
>> >>>> > > if. 1=#x do. y
>> >>>> > > else. ((i{x),+/j{x) hc (i{y),<j{y [ i=. (i.#x) -. j=. 2{./:x end.
>> >>>> > > )
>> >>>> > >
>> >>>> > > hcodes=: 4 : 0
>> >>>> > > assert. x -:&$ y NB. weights and words have same shape
>> >>>> > > assert. (0<:x) *. 1=#$x NB. weights are non-negative
>> >>>> > > assert. 1 >: L.y NB. words are boxed not more than once
>> >>>> > > w=. ,&.> y NB. standardized words
>> >>>> > > assert. w -: ~.w NB. words are unique
>> >>>> > > t=. 0 {:: x hc w NB. minimal weight binary tree
>> >>>> > > ((< S: 0 t) i. w) { <@(1&=)@; S: 1 {:: t
>> >>>> > > )
>> >>>> > >
>> >>>> > > but the coding produced is malformed for zlib. eg,
>> >>>> > > this is what I ran into trouble
>> >>>> > >
>> >>>> > > f1=: 1 256 17 1 1 9 1
>> >>>> > > f2=: 2 1 0 1 255 0 1536
>> >>>> > > F=: ,/(f1#f2)
>> >>>> > > A=: i.286
>> >>>> > >
>> >>>> > > F hcodes A
>> >>>> > >
>> >>>> > > Or a shorter example
>> >>>> > >
>> >>>> > > A1=: i.12
>> >>>> > > F1=: 2 1 0 0 0 0 0 0 0 0 0 1
>> >>>> > >
>> >>>> > > F1 hcodes A1
>> >>>> > >
>> >>>> > > Any idea?
>> >>>> > >
>> >>>> > > --
>> >>>> > > regards,
>> >>>> > > ====================================================
>> >>>> > > GPG key 1024D/4434BAB3 2008-08-24
>> >>>> > > gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
>> >>>> > > gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3
>> >>>> > > ----------------------------------------------------------------------
>> >>>> > > For information about J forums see
>> >>>> > > http://www.jsoftware.com/forums.htm
>> >>>> > ----------------------------------------------------------------------
>> >>>> > For information about J forums see http://www.jsoftware.com/forums.htm
>> >>>> >
>> >>>> ----------------------------------------------------------------------
>> >>>> For information about J forums see http://www.jsoftware.com/forums.htm
>> >>>>
>> >>> ----------------------------------------------------------------------
>> >>> For information about J forums see http://www.jsoftware.com/forums.htm
>> >> ----------------------------------------------------------------------
>> >> For information about J forums see http://www.jsoftware.com/forums.htm
>> > ----------------------------------------------------------------------
>> > For information about J forums see http://www.jsoftware.com/forums.htm
>> ----------------------------------------------------------------------
>> For information about J forums see http://www.jsoftware.com/forums.htm
>
> --
> regards,
> ====================================================
> GPG key 1024D/4434BAB3 2008-08-24
> gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
> gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm