I seem to be missing a definition. tree=: makeFromLengths (3, 3, 3, 3, 3, 2, 4, 4) |value error: blct | prevCT=.(bll i.prevJ)}( blct,0) Linda
-----Original Message----- From: [email protected] [mailto:[email protected]] On Behalf Of Joe Bogner Sent: Friday, September 12, 2014 8:49 AM To: [email protected] Subject: Re: [Jprogramming] zlib huffman coding bill, here is a crude algorithm that seems to replicate the logic from section 3.2.2 It also replicates the resultfrom the rfc in TEST1 MAXBITLEN =: 15 makeFromLengths =: 3 : 0 NB. 1) Count the number of codes for each code length 'bll blct' =: |: ({.,#)/.~ /:~ y NB. 2) Find the numerical value of the smallest code for each code length: bls =: (MAXBITLEN+1) # 0 for_j. 1+i. MAXBITLEN do. prevJ =. <: j prevBL=. prevJ } bls prevCT =. (bll i. prevJ) } (blct,0) v=. 1 (33 b. ) (prevBL + prevCT) NB. smoutput j;v bls=: v j } bls end. NB. 3) Assign numerical values to all codes, using consecutive NB. values for all codes of the same length with the base NB. values determined at step 2. Codes that are never used NB. (which have a bit length of zero) must not be assigned a NB. value. out =. (#arr) # 0 for_j. i. # y do. bl=:(j}y) if. bl > 0 do. v=. bl } bls NB. smoutput v out=. v j } out bls=. (>: v) bl } bls end. end. out ) NB. TEST - from RFC tree=: makeFromLengths (3, 3, 3, 3, 3, 2, 4, 4) 0 : 0 Step 3 produces the following code values: Symbol Length Code ------ ------ ---- A 3 010 B 3 011 C 3 100 D 3 101 E 3 110 F 2 00 G 4 1110 H 4 1111 ) expected=:_4]\ (0 0 1 0),(0 0 1 1),(0 1 0 0),(0 1 0 1), (0 1 1 0), (0 0 0 0), (1 1 1 0), (1 1 1 1) assert (8 {. #: tree) -: expected NB. test 2 - ABCD arr=. 257 # 0 NB. bit lengths for 'ABCD' + end of data marker arr=.2 (65) } arr arr=.2 (66) } arr arr=.2 (67) } arr arr=.3 (68) } arr arr=.3 (256) } arr tree=: makeFromLengths arr assert 1008 -: (65 } tree) assert 1009 -: (66 } tree) assert 1010 -: (67 } tree) assert 2022 -: (68 } tree) assert 2023 -: (256 } tree) Does this help? On Fri, Sep 12, 2014 at 2:40 AM, bill lam <[email protected]> wrote: > Certainly this is correct and that why I asked I had misread. > for any given text, dynamic huffman coding always encoded with > exactly the same table and there is a one to one mapping between bit > length vector and huffman codes. > > I think hcodes is also stable in that it generates the same huffman > codes for a given A and F. But I am not sure if the original huffman > codes can be reconstructed using bit length vector only without > ambiguity. > > On Fri, Sep 12, 2014 at 1:45 PM, Raul Miller <[email protected]> wrote: >> I thought the bit lengths were encoded in the file, not recreated by >> an algorithm in the decoder? >> >> Thanks, >> >> -- >> Raul >> >> On Fri, Sep 12, 2014 at 1:27 AM, bill lam <[email protected]> wrote: >>> I think this matters a lot because output of our encoder will >>> also be decoded by other zlib compliant decoders such as zlib.so >>> itself. Or I had mis-read your question? >>> >>> Пт, 12 сен 2014, Raul Miller написал(а): >>>> If we define >>>> >>>> bitlen=:4 :0 >>>> b=. 0~:x >>>> b #inv #@>(b#x) hcodes b#y >>>> ) >>>> >>>> Does it matter that these numbers are sometimes different from those >>>> used in another implementation? >>>> >>>> Thanks, >>>> >>>> -- >>>> Raul >>>> >>>> >>>> On Fri, Sep 12, 2014 at 12:59 AM, bill lam <[email protected]> wrote: >>>> > Consider an example >>>> > >>>> > 1 0 0 2 1 hcodes_jzlib_ i.5 >>>> > +-----+-------+-------+-+---+ >>>> > |1 1 1|1 1 0 0|1 1 0 1|0|1 0| >>>> > +-----+-------+-------+-+---+ >>>> > #&> 1 0 0 2 1 hcodes_jzlib_ i.5 >>>> > 3 4 4 1 2 >>>> > (0~:1 0 0 2 1)* #&> 1 0 0 2 1 hcodes_jzlib_ i.5 >>>> > 3 0 0 1 2 >>>> > >>>> > classic huffman did not expect a 0 frequency and if there were >>>> > it assigns the longest bit length to them. I think this is >>>> > reasonable because it must have a non-zero bit length to encode >>>> > an entity. For zlib, it seems that there is a third rule: >>>> > For the bit length vector, bit lengths for zero frequency symbols >>>> > are zero. >>>> > >>>> > From discussion here, I think using hcodes directly >>>> > or indirectly for zlib is a wrong direction. >>>> > >>>> > Пт, 12 сен 2014, Raul Miller написал(а): >>>> >> That's what I thought at first, also. >>>> >> >>>> >> But, let's look at the example at >>>> >> http://www.jsoftware.com/pipermail/programming/2014-September/039299.html >>>> >> >>>> >> and the bit widths given at >>>> >> http://www.jsoftware.com/pipermail/programming/2014-September/039327.html >>>> >> >>>> >> Here's how it looks to me: >>>> >> >>>> >> bits -:#@>F hcodes A >>>> >> 0 >>>> >> >>>> >> Now.. is this a problem? >>>> >> >>>> >> I think it is. Consider: >>>> >> >>>> >> #0 -.~#@>F hcodes A >>>> >> 286 >>>> >> #0 -.~bits >>>> >> 260 >>>> >> >>>> >> Incidentally, I found a bug in my code, while trying to understand and >>>> >> express this concept. >>>> >> >>>> >> Fixed version here: >>>> >> >>>> >> bl_count=:3 :0 NB. y is result of freqs >>>> >> 0,}.<:#/.~(,~ [: i. 1 + >./)y >>>> >> ) >>>> >> >>>> >> start_vals=: +:@+/\.&.|.@}:@,~&0 >>>> >> >>>> >> find_codes=:3 :0 NB. y is result of freqs >>>> >> b=. bl_count y >>>> >> v=. start_vals b >>>> >> n=. /:~ ~.y-.0 >>>> >> o=. ;({./.~ /:~ (</. i.@#)) y-.0 >>>> >> c=. ;<"1&.>n (([#2:) #: ])&.> (*b)#v+&.>i.&.>b >>>> >> c /: o >>>> >> ) >>>> >> >>>> >> An alternate version of the result from find_codes would be given by: >>>> >> >>>> >> def_code=:3 :0 >>>> >> b=. bl_count y >>>> >> v=. start_vals b >>>> >> n=. /:~ ~.y-.0 >>>> >> o=. ;({./.~ /:~ (</. i.@#)) y-.0 >>>> >> c=. ;n,.&.>(*b)#v+&.>i.&.>b >>>> >> (,. i.@#)c /: o >>>> >> ) >>>> >> >>>> >> Thanks, >>>> >> >>>> >> -- >>>> >> Raul >>>> >> >>>> >> >>>> >> On Thu, Sep 11, 2014 at 8:33 PM, Joe Bogner <[email protected]> wrote: >>>> >> > The bit widths are calculated from the huffman tree >>>> >> > >>>> >> > See >>>> >> > >>>> >> > http://stackoverflow.com/questions/759707/efficient-way-of-storing-huffman-tree >>>> >> > >>>> >> > http://www.siggraph.org/education/materials/HyperGraph/video/mpeg/mpegfaq/huffman_tutorial.html >>>> >> > >>>> >> > The timing is interesting considering we were talking about trees the >>>> >> > other >>>> >> > day: >>>> >> > http://jsoftware.2058.n7.nabble.com/Ragged-Array-Shapes-are-Trees-td63207.html >>>> >> > >>>> >> > I was thinking to myself then how I hadn't used trees more than a few >>>> >> > times >>>> >> > in 18 years of programming. >>>> >> > >>>> >> > I am not sure how to apply your code to the problem. I also am not >>>> >> > completely sure what problem we are solving. If it is creating a >>>> >> > standalone J deflate implementation or PNG compression it may be a >>>> >> > tall >>>> >> > order. I would be curious why not just interface to a C library like >>>> >> > what >>>> >> > is done in the image3 addon: >>>> >> > http://www.jsoftware.com/jwiki/Addons/media/image3 >>>> >> > On Sep 11, 2014 6:27 PM, "Raul Miller" <[email protected]> wrote: >>>> >> > >>>> >> >> Here's the code I came up with, with Bill's help: >>>> >> >> >>>> >> >> bl_count=:3 :0 NB. y is result of freqs >>>> >> >> 0,}.<:#/.~(,~ [: i. 1 + >./)y >>>> >> >> ) >>>> >> >> >>>> >> >> start_vals=: +:@+/\.&.|.@}:@,~&0 >>>> >> >> >>>> >> >> find_codes=:3 :0 NB. y is result of freqs >>>> >> >> b=. bl_count y >>>> >> >> v=. start_vals b >>>> >> >> n=. /:~ ~.y-.0 >>>> >> >> o=. ;({./.~ /:~ (</. i.@#)) y >>>> >> >> c=. ;<"1&.>n (([#2:) #: ])&.> (*b)#v+&.>i.&.>b >>>> >> >> c /: o >>>> >> >> ) >>>> >> >> >>>> >> >> An alternate version of the result from find_codes would be given by: >>>> >> >> >>>> >> >> def_code=:3 :0 >>>> >> >> b=. bl_count y >>>> >> >> v=. start_vals b >>>> >> >> n=. /:~ ~.y-.0 >>>> >> >> o=. ;({./.~ /:~ (</. i.@#)) y >>>> >> >> c=. ;n,.&.>(*b)#v+&.>i.&.>b >>>> >> >> (,. i.@#)c /: o >>>> >> >> ) >>>> >> >> >>>> >> >> The argument to find_codes or def_code is the bit widths for each >>>> >> >> symbol. >>>> >> >> >>>> >> >> I have not been able to figure out, from rfc 1951, how the bit widths >>>> >> >> are calculated. >>>> >> >> >>>> >> >> Thanks, >>>> >> >> >>>> >> >> -- >>>> >> >> Raul >>>> >> >> >>>> >> >> >>>> >> >> >>>> >> >> On Thu, Sep 11, 2014 at 4:47 PM, Joe Bogner <[email protected]> >>>> >> >> wrote: >>>> >> >> > bill, I'd be interested in a solution but I don't think I can >>>> >> >> > contribute any more on this. I played with >>>> >> >> > https://code.google.com/p/miniz/ and became even more convinced of >>>> >> >> > the >>>> >> >> > complexity. It seems as though the compressor can decide whether to >>>> >> >> > include the dictionary code table or not -- likely based on the >>>> >> >> > size >>>> >> >> > of the table. >>>> >> >> > >>>> >> >> > >>>> >> >> > http://tools.ietf.org/html/rfc1950 >>>> >> >> > >>>> >> >> > A preset dictionary is specially useful to compress short input >>>> >> >> > sequences. The compressor can take advantage of the >>>> >> >> > dictionary >>>> >> >> > context to encode the input in a more compact manner. >>>> >> >> > >>>> >> >> > >>>> >> >> > More links for anyone who is following and cares to go down the >>>> >> >> > rabbit >>>> >> >> hole too: >>>> >> >> > >>>> >> >> > http://en.wikipedia.org/wiki/Canonical_Huffman_code >>>> >> >> > >>>> >> >> > >>>> >> >> http://stackoverflow.com/questions/759707/efficient-way-of-storing-huffman-tree >>>> >> >> > >>>> >> >> > >>>> >> >> > >>>> >> >> > On Thu, Sep 11, 2014 at 1:28 PM, bill lam <[email protected]> >>>> >> >> > wrote: >>>> >> >> >> This codes seemed invalid. >>>> >> >> >> >>>> >> >> >> 1 is a prefix of 11 which is a prefix of 111. Suppose there >>>> >> >> >> is a bit pattern of 1 1 , it is ambiguous to mean >>>> >> >> >> [68,'1'] [68,'1'] >>>> >> >> >> or [65,'11'] >>>> >> >> >> >>>> >> >> >> The huffman code in rfc is canonical meaning there is exactly one >>>> >> >> >> possible huffman codes for a given bit length vector. This is >>>> >> >> >> important because the huffman code table itself will not be >>>> >> >> >> stored inside the deflate stream. The decoder only gets the bit >>>> >> >> >> length vector, if encoder and decoder use different huffman code >>>> >> >> >> for the same bit length vectors, it will not work. >>>> >> >> >> >>>> >> >> >> Чт, 11 сен 2014, Joe Bogner написал(а): >>>> >> >> >>> Ignore the pako.js example output... It was just outputting the >>>> >> >> >>> binary >>>> >> >> >>> representation of A-Z, not the huffman code >>>> >> >> >>> >>>> >> >> >>> This is what I meant to send >>>> >> >> >>> >>>> >> >> >>> For ABCD: >>>> >> >> >>> >>>> >> >> >>> [65,'11'], >>>> >> >> >>> [66,'0'], >>>> >> >> >>> [67,'10'], >>>> >> >> >>> [68,'1'], >>>> >> >> >>> [256,'111'] >>>> >> >> >>> >>>> >> >> >>> It still doesn't seem to be sorting correctly lexographically, >>>> >> >> >>> but I'm >>>> >> >> >>> not really in my comfort zone of understanding: >>>> >> >> >>> >>>> >> >> >>> The RFC has this instead: >>>> >> >> >>> >>>> >> >> >>> Symbol Code >>>> >> >> >>> ------ ---- >>>> >> >> >>> A 10 >>>> >> >> >>> B 0 >>>> >> >> >>> C 110 >>>> >> >> >>> D 111 >>>> >> >> >>> >>>> >> >> >>> I don't really know if it has to match the RFC or if each >>>> >> >> >>> implementation is able to do its own thing as long since it >>>> >> >> >>> includes >>>> >> >> >>> the distance/reverse lookup table (whatever it's called). >>>> >> >> >>> >>>> >> >> >>> FYI >>>> >> >> >>> >>>> >> >> >>> >>>> >> >> >>> This is where I inserted my code: >>>> >> >> >>> >>>> >> >> >>> /* >>>> >> >> =========================================================================== >>>> >> >> >>> * Generate the codes for a given tree and bit counts (which >>>> >> >> >>> need not >>>> >> >> be >>>> >> >> >>> * optimal). >>>> >> >> >>> * IN assertion: the array bl_count contains the bit length >>>> >> >> >>> statistics >>>> >> >> for >>>> >> >> >>> * the given tree and the field len is set for all tree elements. >>>> >> >> >>> * OUT assertion: the field code is set for all tree elements of >>>> >> >> >>> non >>>> >> >> >>> * zero code length. >>>> >> >> >>> */ >>>> >> >> >>> function gen_codes(tree, max_code, bl_count) >>>> >> >> >>> // ct_data *tree; /* the tree to decorate */ >>>> >> >> >>> // int max_code; /* largest code with non zero >>>> >> >> frequency */ >>>> >> >> >>> // ushf *bl_count; /* number of codes at each bit >>>> >> >> >>> length >>>> >> >> */ >>>> >> >> >>> { >>>> >> >> >>> var next_code = new Array(MAX_BITS+1); /* next code value for >>>> >> >> >>> each >>>> >> >> >>> bit length */ >>>> >> >> >>> var code = 0; /* running code value */ >>>> >> >> >>> var bits; /* bit index */ >>>> >> >> >>> var n; /* code index */ >>>> >> >> >>> >>>> >> >> >>> /* The distribution counts are first used to generate the code >>>> >> >> >>> values >>>> >> >> >>> * without bit reversal. >>>> >> >> >>> */ >>>> >> >> >>> for (bits = 1; bits <= MAX_BITS; bits++) { >>>> >> >> >>> next_code[bits] = code = (code + bl_count[bits-1]) << 1; >>>> >> >> >>> } >>>> >> >> >>> /* Check that the bit counts in bl_count are consistent. The >>>> >> >> >>> last >>>> >> >> code >>>> >> >> >>> * must be all ones. >>>> >> >> >>> */ >>>> >> >> >>> //Assert (code + bl_count[MAX_BITS]-1 == (1<<MAX_BITS)-1, >>>> >> >> >>> // "inconsistent bit counts"); >>>> >> >> >>> //Tracev((stderr,"\ngen_codes: max_code %d ", max_code)); >>>> >> >> >>> >>>> >> >> >>> for (n = 0; n <= max_code; n++) { >>>> >> >> >>> var len = tree[n*2 + 1]/*.Len*/; >>>> >> >> >>> if (len === 0) { continue; } >>>> >> >> >>> /* Now reverse the bits */ >>>> >> >> >>> tree[n*2]/*.Code*/ = bi_reverse(next_code[len]++, len); >>>> >> >> >>> >>>> >> >> >>> if (tree!=static_ltree) { >>>> >> >> >>> var v = tree[n*2]; >>>> >> >> >>> console.log('[' + n + ",'" + v.toString(2) + "'],"); >>>> >> >> >>> } >>>> >> >> >>> //Tracecv(tree != static_ltree, (stderr,"\nn %3d %c l %2d c >>>> >> >> >>> %4x >>>> >> >> (%x) ", >>>> >> >> >>> // n, (isgraph(n) ? n : ' '), len, tree[n].Code, >>>> >> >> next_code[len]-1)); >>>> >> >> >>> } >>>> >> >> >>> >>>> >> >> >>> } >>>> >> >> >>> >>>> >> >> >>> On Thu, Sep 11, 2014 at 11:55 AM, Joe Bogner >>>> >> >> >>> <[email protected]> >>>> >> >> wrote: >>>> >> >> >>> > I think the prefix coding looks OK, but the 2 rules does not: >>>> >> >> >>> > >>>> >> >> >>> > I modified the code[1] to allow passing in a string and >>>> >> >> >>> > outputting >>>> >> >> the codes >>>> >> >> >>> > >>>> >> >> >>> > C:\temp>deflate ABCDEFGHIJKLMONPQRSTUVWXYZ >>>> >> >> >>> > code 65 : 0 00000000000000000000000000000000 >>>> >> >> >>> > code 66 : 6 00000000000000000000000000000110 >>>> >> >> >>> > code 67 : 8 00000000000000000000000000001000 >>>> >> >> >>> > code 68 : 4 00000000000000000000000000000100 >>>> >> >> >>> > code 69 : 22 00000000000000000000000000010110 >>>> >> >> >>> > code 70 : 14 00000000000000000000000000001110 >>>> >> >> >>> > code 71 : 30 00000000000000000000000000011110 >>>> >> >> >>> > code 72 : 1 00000000000000000000000000000001 >>>> >> >> >>> > code 73 : 17 00000000000000000000000000010001 >>>> >> >> >>> > code 74 : 12 00000000000000000000000000001100 >>>> >> >> >>> > code 75 : 9 00000000000000000000000000001001 >>>> >> >> >>> > code 76 : 25 00000000000000000000000000011001 >>>> >> >> >>> > code 77 : 5 00000000000000000000000000000101 >>>> >> >> >>> > code 78 : 21 00000000000000000000000000010101 >>>> >> >> >>> > code 79 : 13 00000000000000000000000000001101 >>>> >> >> >>> > code 80 : 29 00000000000000000000000000011101 >>>> >> >> >>> > code 81 : 3 00000000000000000000000000000011 >>>> >> >> >>> > code 82 : 19 00000000000000000000000000010011 >>>> >> >> >>> > code 83 : 11 00000000000000000000000000001011 >>>> >> >> >>> > code 84 : 27 00000000000000000000000000011011 >>>> >> >> >>> > code 85 : 7 00000000000000000000000000000111 >>>> >> >> >>> > code 86 : 23 00000000000000000000000000010111 >>>> >> >> >>> > code 87 : 15 00000000000000000000000000001111 >>>> >> >> >>> > code 88 : 31 00000000000000000000000000011111 >>>> >> >> >>> > code 89 : 2 00000000000000000000000000000010 >>>> >> >> >>> > code 90 : 10 00000000000000000000000000001010 >>>> >> >> >>> > >>>> >> >> >>> > >>>> >> >> >>> > I think it violates the consecutive rule... Each letter has >>>> >> >> >>> > the same >>>> >> >> >>> > frequency. ABCD have the same bit length. The order is off: >>>> >> >> >>> > >>>> >> >> >>> > If I sort it lexographically using javascript: >>>> >> >> >>> > >>>> >> >> >>> > JSON.stringify([['a','00000000000000000000000000000000'], >>>> >> >> >>> > ['b','00000000000000000000000000000110'], >>>> >> >> >>> > ['c','00000000000000000000000000001000'], >>>> >> >> >>> > ['d','00000000000000000000000000000100']].sort(function(x,y) { >>>> >> >> >>> > return >>>> >> >> >>> > x[1] - y[1] })) >>>> >> >> >>> > >>>> >> >> >>> > >>>> >> >> "[["a","00000000000000000000000000000000"],["d","00000000000000000000000000000100"],["b","00000000000000000000000000000110"],["c","00000000000000000000000000001000"]]" >>>> >> >> >>> > >>>> >> >> >>> > As you can see, the order comes out a,d,b,c >>>> >> >> >>> > >>>> >> >> >>> > I played around with a javascript implementation, pako[2]. It >>>> >> >> >>> > seems >>>> >> >> to >>>> >> >> >>> > work correctly: >>>> >> >> >>> > >>>> >> >> >>> > As you can see, it sorts lexographically >>>> >> >> >>> > >>>> >> >> >>> > JSON.stringify([[65,'1000001'], >>>> >> >> >>> > [66,'1000010'], >>>> >> >> >>> > [67,'1000011'], >>>> >> >> >>> > [68,'1000100'], >>>> >> >> >>> > [69,'1000101'], >>>> >> >> >>> > [70,'1000110'], >>>> >> >> >>> > [71,'1000111'], >>>> >> >> >>> > [72,'1001000'], >>>> >> >> >>> > [73,'1001001'], >>>> >> >> >>> > [74,'1001010'], >>>> >> >> >>> > [75,'1001011'], >>>> >> >> >>> > [76,'1001100'], >>>> >> >> >>> > [77,'1001101'], >>>> >> >> >>> > [78,'1001110'], >>>> >> >> >>> > [79,'1001111'], >>>> >> >> >>> > [80,'1010000'], >>>> >> >> >>> > [81,'1010001'], >>>> >> >> >>> > [82,'1010010'], >>>> >> >> >>> > [83,'1010011'], >>>> >> >> >>> > [84,'1010100'], >>>> >> >> >>> > [85,'1010101'], >>>> >> >> >>> > [86,'1010110'], >>>> >> >> >>> > [87,'1010111'], >>>> >> >> >>> > [88,'1011000'], >>>> >> >> >>> > [89,'1011001'], >>>> >> >> >>> > [90,'1011010']].sort(function(x,y) { return x[1] - y[1] })) >>>> >> >> >>> > >>>> >> >> >>> > >>>> >> >> "[[65,"1000001"],[66,"1000010"],[67,"1000011"],[68,"1000100"],[69,"1000101"],[70,"1000110"],[71,"1000111"],[72,"1001000"],[73,"1001001"],[74,"1001010"],[75,"1001011"],[76,"1001100"],[77,"1001101"],[78,"1001110"],[79,"1001111"],[80,"1010000"],[81,"1010001"],[82,"1010010"],[83,"1010011"],[84,"1010100"],[85,"1010101"],[86,"1010110"],[87,"1010111"],[88,"1011000"],[89,"1011001"],[90,"1011010"]]" >>>> >> >> >>> > >>>> >> >> >>> > All the values are sorted correctly. >>>> >> >> >>> > >>>> >> >> >>> > Here it is with the same ABCD example: >>>> >> >> >>> > >>>> >> >> >>> > var pako = require('pako'); >>>> >> >> >>> > var binaryString = pako.deflate('ABCD', { to: 'string' }); >>>> >> >> >>> > console.log(binaryString); >>>> >> >> >>> > var restored = pako.inflate(binaryString, { to: 'string' }); >>>> >> >> >>> > console.log(restored); >>>> >> >> >>> > >>>> >> >> >>> > It successfully deflates and inflates itself >>>> >> >> >>> > >>>> >> >> >>> > x?♣A☺☺ ? mcÿ7♣A♫☻?☺♂ >>>> >> >> >>> > ABCD >>>> >> >> >>> > >>>> >> >> >>> > >>>> >> >> >>> > Hope this helps... >>>> >> >> >>> > >>>> >> >> >>> > [1] - >>>> >> >> https://gist.github.com/joebo/a3c2932f0e5a7a0c3f07#file-deflate-c-L2613 >>>> >> >> >>> > [2] - https://rawgit.com/nodeca/pako/master/dist/pako.js >>>> >> >> >>> > >>>> >> >> >>> > On Thu, Sep 11, 2014 at 11:33 AM, bill lam >>>> >> >> >>> > <[email protected]> >>>> >> >> wrote: >>>> >> >> >>> >> This is strange since every author must had decode its own >>>> >> >> >>> >> encoded >>>> >> >> >>> >> data as a smoke test. >>>> >> >> >>> >> >>>> >> >> >>> >> Did you test if huffman code or bit lengths it produced was >>>> >> >> >>> >> correct or not, ie it is a prefix coding and it satisfy the 2 >>>> >> >> >>> >> rules in rfc. >>>> >> >> >>> >> >>>> >> >> >>> >> Чт, 11 сен 2014, Joe Bogner написал(а): >>>> >> >> >>> >>> unfortunately the dynamic coding in the putty fork doesn't >>>> >> >> >>> >>> seem to >>>> >> >> work: >>>> >> >> >>> >>> >>>> >> >> >>> >>> deflate -c deflate.c > out >>>> >> >> >>> >>> deflate -d out >>>> >> >> >>> >>> >>>> >> >> >>> >>> decoding error: incorrect data checksum >>>> >> >> >>> >>> >>>> >> >> >>> >>> >>>> >> >> >>> >>> it works fine with static tables >>>> >> >> >>> >>> >>>> >> >> >>> >>> C:\temp>echo ABCD > ABCD >>>> >> >> >>> >>> >>>> >> >> >>> >>> C:\temp>deflate -c ABCD > out >>>> >> >> >>> >>> >>>> >> >> >>> >>> C:\temp>deflate -d out >>>> >> >> >>> >>> ABCD >>>> >> >> >>> >>> >>>> >> >> >>> >>> I added some debugging code to determine that deflating >>>> >> >> >>> >>> deflate.c >>>> >> >> >>> >>> would be a dynamic table... Assuming it's broke, I probably >>>> >> >> wouldn't >>>> >> >> >>> >>> use it as a reference implementation after all >>>> >> >> >>> >>> >>>> >> >> >>> >>> On Thu, Sep 11, 2014 at 3:45 AM, bill lam >>>> >> >> >>> >>> <[email protected]> >>>> >> >> wrote: >>>> >> >> >>> >>> > the frequencies (guessing from bit lengths) should be >>>> >> >> >>> >>> > something >>>> >> >> like 2 3 1 1 >>>> >> >> >>> >>> > (2 3 1 1) hcodes 'ABCD' >>>> >> >> >>> >>> > >>>> >> >> >>> >>> > the hard part is the inverse problem: how to get the >>>> >> >> >>> >>> > huffman >>>> >> >> code with >>>> >> >> >>> >>> > prior knowing the bits for each symbol. Your pointer to >>>> >> >> >>> >>> > the >>>> >> >> putty >>>> >> >> >>> >>> > fork looks like helpful. The comment is in lines 861 to >>>> >> >> >>> >>> > 914, >>>> >> >> the code >>>> >> >> >>> >>> > itself in line 915 to 964. Do you know how to express it >>>> >> >> >>> >>> > in J? >>>> >> >> >>> >>> > Thanks. >>>> >> >> >>> >>> > >>>> >> >> >>> >>> > On Thu, Sep 11, 2014 at 2:57 PM, Joe Bogner >>>> >> >> >>> >>> > <[email protected]> >>>> >> >> wrote: >>>> >> >> >>> >>> >> Here a few other links ... after reading through the RFC. >>>> >> >> >>> >>> >> Not >>>> >> >> sure if >>>> >> >> >>> >>> >> they help, but just sharing from my own research into >>>> >> >> >>> >>> >> assisting >>>> >> >> on >>>> >> >> >>> >>> >> this topic >>>> >> >> >>> >>> >> >>>> >> >> >>> >>> >> https://github.com/evegard/pngview/blob/master/huffman.c#L54 >>>> >> >> >>> >>> >> >>>> >> >> >>> >>> >> And a fork of the putty version with dynamic huffman >>>> >> >> >>> >>> >> coding: >>>> >> >> >>> >>> >> >>>> >> >> http://rc.quest.com/viewvc/putty/trunk/halibut/deflate.c?diff_format=s&revision=2&view=markup >>>> >> >> >>> >>> >> >>>> >> >> >>> >>> >> Or just generally googling some of the code from the RFC: >>>> >> >> >>> >>> >> >>>> >> >> https://www.google.com/search?q=next_code%5Blen%5D%2B%2B%3B&oq=next_code%5Blen%5D%2B%2B%3B&aqs=chrome..69i57.387j0j7&sourceid=chrome&es_sm=93&ie=UTF-8#q=next_code%5Blen%5D%2B%2B%3B&start=20 >>>> >> >> >>> >>> >> >>>> >> >> >>> >>> >> >>>> >> >> >>> >>> >> Using the code from >>>> >> >> >>> >>> >> http://www.jsoftware.com/jwiki/Essays/Huffman%20Coding, I >>>> >> >> >>> >>> >> got >>>> >> >> stuck >>>> >> >> >>> >>> >> trying to match a simple example to the binary tree in >>>> >> >> >>> >>> >> the RFC: >>>> >> >> >>> >>> >> >>>> >> >> >>> >>> >> From the RFC: >>>> >> >> >>> >>> >> >>>> >> >> >>> >>> >> /\ Symbol Code >>>> >> >> >>> >>> >> 0 1 ------ ---- >>>> >> >> >>> >>> >> / \ A 00 >>>> >> >> >>> >>> >> /\ B B 1 >>>> >> >> >>> >>> >> 0 1 C 011 >>>> >> >> >>> >>> >> / \ D 010 >>>> >> >> >>> >>> >> A /\ >>>> >> >> >>> >>> >> 0 1 >>>> >> >> >>> >>> >> / \ >>>> >> >> >>> >>> >> D C >>>> >> >> >>> >>> >> >>>> >> >> >>> >>> >> >>>> >> >> >>> >>> >> >>>> >> >> >>> >>> >> (4#1) hcodes 'ABCD' >>>> >> >> >>> >>> >> ┌───┬───┬───┬───┐ >>>> >> >> >>> >>> >> │0 0│0 1│1 0│1 1│ >>>> >> >> >>> >>> >> └───┴───┴───┴───┘ >>>> >> >> >>> >>> >> >>>> >> >> >>> >>> >> Per the RFC, ideally that should match this? >>>> >> >> '00';'1';'011';'010' >>>> >> >> >>> >>> >> >>>> >> >> >>> >>> >> >>>> >> >> >>> >>> >> From there, it seems like a pretty straightforward >>>> >> >> >>> >>> >> exercise to >>>> >> >> >>> >>> >> transliterate the C code from the RFC into J code to >>>> >> >> >>> >>> >> recode the >>>> >> >> >>> >>> >> example to: >>>> >> >> >>> >>> >> >>>> >> >> >>> >>> >> >>>> >> >> >>> >>> >> Symbol Code >>>> >> >> >>> >>> >> ------ ---- >>>> >> >> >>> >>> >> A 10 >>>> >> >> >>> >>> >> B 0 >>>> >> >> >>> >>> >> C 110 >>>> >> >> >>> >>> >> D 111 >>>> >> >> >>> >>> >> >>>> >> >> >>> >>> >> >>>> >> >> >>> >>> >> I would probably start with a looping construct like >>>> >> >> >>> >>> >> what's in >>>> >> >> the RFC >>>> >> >> >>> >>> >> and then figure out a more J way to do it, but first I >>>> >> >> >>> >>> >> would >>>> >> >> need to >>>> >> >> >>> >>> >> figure out how to create the binary tree in that initial >>>> >> >> >>> >>> >> format. >>>> >> >> >>> >>> >> >>>> >> >> >>> >>> >> On Wed, Sep 10, 2014 at 7:41 PM, bill lam >>>> >> >> >>> >>> >> <[email protected]> >>>> >> >> wrote: >>>> >> >> >>> >>> >>> Thanks Joe, >>>> >> >> >>> >>> >>> putty only use zlib static huffman for encoding so that >>>> >> >> >>> >>> >>> it >>>> >> >> does not build >>>> >> >> >>> >>> >>> any huffman dictionary table. >>>> >> >> >>> >>> >>> >>>> >> >> >>> >>> >>> The zlib static huffman code does not care about >>>> >> >> >>> >>> >>> individual >>>> >> >> symbol's >>>> >> >> >>> >>> >>> frequency, it just encode 0 to 286 into bits, see section >>>> >> >> 3.2.6. >>>> >> >> >>> >>> >>> On Sep 11, 2014 1:26 AM, "Joe Bogner" >>>> >> >> >>> >>> >>> <[email protected]> >>>> >> >> wrote: >>>> >> >> >>> >>> >>> >>>> >> >> >>> >>> >>>> You've already likely considered this, but if it were >>>> >> >> >>> >>> >>>> me I >>>> >> >> would compare >>>> >> >> >>> >>> >>>> results to a working implementation. The one from putty >>>> >> >> >>> >>> >>>> seems >>>> >> >> pretty clean >>>> >> >> >>> >>> >>>> and standalone: >>>> >> >> >>> >>> >>>> >>>> >> >> https://raw.githubusercontent.com/grumpydev/PortablePuTTY/master/SSHZLIB.C >>>> >> >> >>> >>> >>>> . I was able to compile it on windows no problem and I >>>> >> >> >>> >>> >>>> assume >>>> >> >> it'd be fine >>>> >> >> >>> >>> >>>> on linux as well. >>>> >> >> >>> >>> >>>> >>>> >> >> >>> >>> >>>> On Wed, Sep 10, 2014 at 1:00 PM, Raul Miller < >>>> >> >> [email protected]> >>>> >> >> >>> >>> >>>> wrote: >>>> >> >> >>> >>> >>>> >>>> >> >> >>> >>> >>>> > I think the use of the term "consecutive" rather than >>>> >> >> "sequential" is >>>> >> >> >>> >>> >>>> > telling. >>>> >> >> >>> >>> >>>> > >>>> >> >> >>> >>> >>>> > The described algorithm is: compute the huffman code >>>> >> >> lengths: >>>> >> >> >>> >>> >>>> > #@>F1 hcodes A1 >>>> >> >> >>> >>> >>>> > 1 3 7 7 6 6 6 6 6 6 6 2 >>>> >> >> >>> >>> >>>> > >>>> >> >> >>> >>> >>>> > Then assign ascending huffman codes first in length >>>> >> >> >>> >>> >>>> > order >>>> >> >> and then >>>> >> >> >>> >>> >>>> > within codes of the same length. >>>> >> >> >>> >>> >>>> > >>>> >> >> >>> >>> >>>> > Taken literally, that might be something like this: >>>> >> >> >>> >>> >>>> > >>>> >> >> >>> >>> >>>> > H=: 4 :0 >>>> >> >> >>> >>> >>>> > L=.#@> x hcodes y >>>> >> >> >>> >>> >>>> > U=.~.L >>>> >> >> >>> >>> >>>> > ;<@(({.{.U e.~i.&.<:@{.)<@:+"1-@{.{."1 #:@i.@#)/.~L >>>> >> >> >>> >>> >>>> > ) >>>> >> >> >>> >>> >>>> > >>>> >> >> >>> >>> >>>> > ":@>F1 H A1 >>>> >> >> >>> >>> >>>> > 0 >>>> >> >> >>> >>> >>>> > 1 1 0 >>>> >> >> >>> >>> >>>> > 1 1 1 0 0 1 0 >>>> >> >> >>> >>> >>>> > 1 1 1 0 0 1 1 >>>> >> >> >>> >>> >>>> > 1 1 1 0 0 0 >>>> >> >> >>> >>> >>>> > 1 1 1 0 0 1 >>>> >> >> >>> >>> >>>> > 1 1 1 0 1 0 >>>> >> >> >>> >>> >>>> > 1 1 1 0 1 1 >>>> >> >> >>> >>> >>>> > 1 1 1 1 0 0 >>>> >> >> >>> >>> >>>> > 1 1 1 1 0 1 >>>> >> >> >>> >>> >>>> > 1 1 1 1 1 0 >>>> >> >> >>> >>> >>>> > 1 0 >>>> >> >> >>> >>> >>>> > >>>> >> >> >>> >>> >>>> > But is this correct? Is it actually safe to leave the >>>> >> >> results like >>>> >> >> >>> >>> >>>> > this - with all codes of the same length being >>>> >> >> >>> >>> >>>> > consecutive >>>> >> >> to each >>>> >> >> >>> >>> >>>> > other? >>>> >> >> >>> >>> >>>> > >>>> >> >> >>> >>> >>>> > F (hcodes -:&:(#@>) H) A >>>> >> >> >>> >>> >>>> > 0 >>>> >> >> >>> >>> >>>> > >>>> >> >> >>> >>> >>>> > No. >>>> >> >> >>> >>> >>>> > >>>> >> >> >>> >>> >>>> > So... "consecutive" must refer only to the values >>>> >> >> >>> >>> >>>> > used and >>>> >> >> not their >>>> >> >> >>> >>> >>>> > order within the result. >>>> >> >> >>> >>> >>>> > >>>> >> >> >>> >>> >>>> > Perhaps something like this: >>>> >> >> >>> >>> >>>> > >>>> >> >> >>> >>> >>>> > deflatecodes=:4 :0 >>>> >> >> >>> >>> >>>> > L=.#@> x hcodes y >>>> >> >> >>> >>> >>>> > U=.~.L >>>> >> >> >>> >>> >>>> > R=. ;<@(({.{.U e.~i.&.<:@{.)<@:+"1-@{.{."1 >>>> >> >> >>> >>> >>>> > #:@i.@#)/.~L >>>> >> >> >>> >>> >>>> > R/:;(</. i.@#)L >>>> >> >> >>> >>> >>>> > ) >>>> >> >> >>> >>> >>>> > >>>> >> >> >>> >>> >>>> > F (hcodes -:&:(#@>) deflatecodes) A >>>> >> >> >>> >>> >>>> > 1 >>>> >> >> >>> >>> >>>> > >>>> >> >> >>> >>> >>>> > There should be a better way of doing this, but this >>>> >> >> >>> >>> >>>> > should >>>> >> >> at least >>>> >> >> >>> >>> >>>> > get you started. >>>> >> >> >>> >>> >>>> > >>>> >> >> >>> >>> >>>> > Thanks, >>>> >> >> >>> >>> >>>> > >>>> >> >> >>> >>> >>>> > -- >>>> >> >> >>> >>> >>>> > Raul >>>> >> >> >>> >>> >>>> > >>>> >> >> >>> >>> >>>> > >>>> >> >> >>> >>> >>>> > On Wed, Sep 10, 2014 at 10:45 AM, bill lam < >>>> >> >> [email protected]> wrote: >>>> >> >> >>> >>> >>>> > > For huffman coding used in zlib: >>>> >> >> >>> >>> >>>> > > https://www.ietf.org/rfc/rfc1951.txt section 3.2.2. >>>> >> >> >>> >>> >>>> > > >>>> >> >> >>> >>> >>>> > > The Huffman codes used for each alphabet in the >>>> >> >> >>> >>> >>>> > > "deflate" >>>> >> >> >>> >>> >>>> > > format have two additional rules: >>>> >> >> >>> >>> >>>> > > >>>> >> >> >>> >>> >>>> > > * All codes of a given bit length have >>>> >> >> >>> >>> >>>> > > lexicographically >>>> >> >> >>> >>> >>>> > > consecutive values, in the same order as the >>>> >> >> >>> >>> >>>> > > symbols >>>> >> >> >>> >>> >>>> > > they represent; >>>> >> >> >>> >>> >>>> > > >>>> >> >> >>> >>> >>>> > > * Shorter codes lexicographically precede longer >>>> >> >> >>> >>> >>>> > > codes. >>>> >> >> >>> >>> >>>> > > I tried jwiki hcodes in >>>> >> >> >>> >>> >>>> > > I try Roger's essay >>>> >> >> >>> >>> >>>> > > http://www.jsoftware.com/jwiki/Essays/Huffman%20Coding >>>> >> >> >>> >>> >>>> > > >>>> >> >> >>> >>> >>>> > > hc=: 4 : 0 >>>> >> >> >>> >>> >>>> > > if. 1=#x do. y >>>> >> >> >>> >>> >>>> > > else. ((i{x),+/j{x) hc (i{y),<j{y [ i=. (i.#x) -. >>>> >> >> >>> >>> >>>> > > j=. >>>> >> >> 2{./:x end. >>>> >> >> >>> >>> >>>> > > ) >>>> >> >> >>> >>> >>>> > > >>>> >> >> >>> >>> >>>> > > hcodes=: 4 : 0 >>>> >> >> >>> >>> >>>> > > assert. x -:&$ y NB. weights and words >>>> >> >> >>> >>> >>>> > > have >>>> >> >> same shape >>>> >> >> >>> >>> >>>> > > assert. (0<:x) *. 1=#$x NB. weights are >>>> >> >> >>> >>> >>>> > > non-negative >>>> >> >> >>> >>> >>>> > > assert. 1 >: L.y NB. words are boxed not >>>> >> >> >>> >>> >>>> > > more >>>> >> >> than once >>>> >> >> >>> >>> >>>> > > w=. ,&.> y NB. standardized words >>>> >> >> >>> >>> >>>> > > assert. w -: ~.w NB. words are unique >>>> >> >> >>> >>> >>>> > > t=. 0 {:: x hc w NB. minimal weight >>>> >> >> >>> >>> >>>> > > binary tree >>>> >> >> >>> >>> >>>> > > ((< S: 0 t) i. w) { <@(1&=)@; S: 1 {:: t >>>> >> >> >>> >>> >>>> > > ) >>>> >> >> >>> >>> >>>> > > >>>> >> >> >>> >>> >>>> > > but the coding produced is malformed for zlib. eg, >>>> >> >> >>> >>> >>>> > > this is what I ran into trouble >>>> >> >> >>> >>> >>>> > > >>>> >> >> >>> >>> >>>> > > f1=: 1 256 17 1 1 9 1 >>>> >> >> >>> >>> >>>> > > f2=: 2 1 0 1 255 0 1536 >>>> >> >> >>> >>> >>>> > > F=: ,/(f1#f2) >>>> >> >> >>> >>> >>>> > > A=: i.286 >>>> >> >> >>> >>> >>>> > > >>>> >> >> >>> >>> >>>> > > F hcodes A >>>> >> >> >>> >>> >>>> > > >>>> >> >> >>> >>> >>>> > > Or a shorter example >>>> >> >> >>> >>> >>>> > > >>>> >> >> >>> >>> >>>> > > A1=: i.12 >>>> >> >> >>> >>> >>>> > > F1=: 2 1 0 0 0 0 0 0 0 0 0 1 >>>> >> >> >>> >>> >>>> > > >>>> >> >> >>> >>> >>>> > > F1 hcodes A1 >>>> >> >> >>> >>> >>>> > > >>>> >> >> >>> >>> >>>> > > Any idea? >>>> >> >> >>> >>> >>>> > > >>>> >> >> >>> >>> >>>> > > -- >>>> >> >> >>> >>> >>>> > > regards, >>>> >> >> >>> >>> >>>> > > ==================================================== >>>> >> >> >>> >>> >>>> > > GPG key 1024D/4434BAB3 2008-08-24 >>>> >> >> >>> >>> >>>> > > gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3 >>>> >> >> >>> >>> >>>> > > gpg --keyserver subkeys.pgp.net --armor --export >>>> >> >> >>> >>> >>>> > > 4434BAB3 >>>> >> >> >>> >>> >>>> > > >>>> >> >> ---------------------------------------------------------------------- >>>> >> >> >>> >>> >>>> > > For information about J forums see >>>> >> >> http://www.jsoftware.com/forums.htm >>>> >> >> >>> >>> >>>> > >>>> >> >> ---------------------------------------------------------------------- >>>> >> >> >>> >>> >>>> > For information about J forums see >>>> >> >> http://www.jsoftware.com/forums.htm >>>> >> >> >>> >>> >>>> > >>>> >> >> >>> >>> >>>> >>>> >> >> ---------------------------------------------------------------------- >>>> >> >> >>> >>> >>>> For information about J forums see >>>> >> >> http://www.jsoftware.com/forums.htm >>>> >> >> >>> >>> >>>> >>>> >> >> >>> >>> >>> >>>> >> >> ---------------------------------------------------------------------- >>>> >> >> >>> >>> >>> For information about J forums see >>>> >> >> http://www.jsoftware.com/forums.htm >>>> >> >> >>> >>> >> >>>> >> >> ---------------------------------------------------------------------- >>>> >> >> >>> >>> >> For information about J forums see >>>> >> >> http://www.jsoftware.com/forums.htm >>>> >> >> >>> >>> > >>>> >> >> ---------------------------------------------------------------------- >>>> >> >> >>> >>> > For information about J forums see >>>> >> >> http://www.jsoftware.com/forums.htm >>>> >> >> >>> >>> >>>> >> >> ---------------------------------------------------------------------- >>>> >> >> >>> >>> For information about J forums see >>>> >> >> http://www.jsoftware.com/forums.htm >>>> >> >> >>> >> >>>> >> >> >>> >> -- >>>> >> >> >>> >> regards, >>>> >> >> >>> >> ==================================================== >>>> >> >> >>> >> GPG key 1024D/4434BAB3 2008-08-24 >>>> >> >> >>> >> gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3 >>>> >> >> >>> >> gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3 >>>> >> >> >>> >> >>>> >> >> ---------------------------------------------------------------------- >>>> >> >> >>> >> For information about J forums see >>>> >> >> http://www.jsoftware.com/forums.htm >>>> >> >> >>> ---------------------------------------------------------------------- >>>> >> >> >>> For information about J forums see >>>> >> >> >>> http://www.jsoftware.com/forums.htm >>>> >> >> >> >>>> >> >> >> -- >>>> >> >> >> regards, >>>> >> >> >> ==================================================== >>>> >> >> >> GPG key 1024D/4434BAB3 2008-08-24 >>>> >> >> >> gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3 >>>> >> >> >> gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3 >>>> >> >> >> ---------------------------------------------------------------------- >>>> >> >> >> For information about J forums see >>>> >> >> >> http://www.jsoftware.com/forums.htm >>>> >> >> > ---------------------------------------------------------------------- >>>> >> >> > For information about J forums see >>>> >> >> > http://www.jsoftware.com/forums.htm >>>> >> >> ---------------------------------------------------------------------- >>>> >> >> For information about J forums see >>>> >> >> http://www.jsoftware.com/forums.htm >>>> >> > ---------------------------------------------------------------------- >>>> >> > For information about J forums see http://www.jsoftware.com/forums.htm >>>> >> ---------------------------------------------------------------------- >>>> >> For information about J forums see http://www.jsoftware.com/forums.htm >>>> > >>>> > -- >>>> > regards, >>>> > ==================================================== >>>> > GPG key 1024D/4434BAB3 2008-08-24 >>>> > gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3 >>>> > gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3 >>>> > ---------------------------------------------------------------------- >>>> > For information about J forums see http://www.jsoftware.com/forums.htm >>>> ---------------------------------------------------------------------- >>>> For information about J forums see http://www.jsoftware.com/forums.htm >>> >>> -- >>> regards, >>> ==================================================== >>> GPG key 1024D/4434BAB3 2008-08-24 >>> gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3 >>> gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3 >>> ---------------------------------------------------------------------- >>> For information about J forums see http://www.jsoftware.com/forums.htm >> ---------------------------------------------------------------------- >> For information about J forums see http://www.jsoftware.com/forums.htm > ---------------------------------------------------------------------- > For information about J forums see http://www.jsoftware.com/forums.htm ---------------------------------------------------------------------- For information about J forums see http://www.jsoftware.com/forums.htm ---------------------------------------------------------------------- For information about J forums see http://www.jsoftware.com/forums.htm
