Here's a perhaps slightly more efficient way of computing bit lengths:

bitlens=: 4 :0
  t=. 0{:: x hc w=. ,&.> y
  ((<S:0 t)i. w) { #S:1 {::t
)

This assumes that the data is well formed, but the big thing is that it
does not compute a boxed intermediate result and instead finds the lengths
earlier in the process.

Thanks,

-- 
Raul

On Sat, Sep 13, 2014 at 3:41 PM, Joe Bogner <[email protected]> wrote:

> Raul, I just tried your code and confirmed that it returned the same
> results as mine:
>
> find_codes (3, 3, 3, 3,3, 2, 4, 4)
> ┌─────┬─────┬─────┬─────┬─────┬───┬───────┬───────┐
> │0 1 0│0 1 1│1 0 0│1 0 1│1 1 0│0 0│1 1 1 0│1 1 1 1│
> └─────┴─────┴─────┴─────┴─────┴───┴───────┴───────┘
>
> #: inv every find_codes (3, 3, 3, 3,3, 2, 4, 4)
> 2 3 4 5 6 0 14 15
>
> tree=: makeFromLengths (3, 3, 3, 3,3, 2, 4, 4)
> tree
>
> 2 3 4 5 6 0 14 15
>
> #: tree
> 0 0 1 0
> 0 0 1 1
> 0 1 0 0
> 0 1 0 1
> 0 1 1 0
> 0 0 0 0
> 1 1 1 0
> 1 1 1 1
>
> I like the looks of yours more. Mine was a transliteration from the
> RFC. I will need to study yours as I don't understand it yet.
>
> I agree that using hcodes with either mine or yours should produce
> what's needed per the RFC, at least the steps we were looking at in
> the RFC. There is likely more solve still beyond that
>
>
> On Sat, Sep 13, 2014 at 2:52 PM, Raul Miller <[email protected]>
> wrote:
> > If we use hcodes to get bit lengths, and another mechanism (I think the
> > last one I posted is adequate) to determine the bit values, there should
> be
> > no reason to iterate further. The two extra constraints should never
> > require that bit lengths be changed.
> >
> > Thanks,
> >
> > --
> > Raul
> >
> > On Sat, Sep 13, 2014 at 12:43 PM, bill lam <[email protected]> wrote:
> >
> >> I try the putty fork, its seems fialed to uncompress it own
> >> compressed data.  Perhaps its compressed data is malformed.
> >>
> >> I also look at the trees.c of the the official zlib. I do not
> >> understand the code, too complex for me.  AFAICU it does not
> >> generate huffman codes using any standard algorithm, but it
> >> assigns bit lengths using some heuristics (trail and error).
> >> I guess we can try using hcodes to get an initial guess of bit
> >> lengths and then iterate to make it satisfy the 2 rules.
> >>
> >> Чт, 11 сен 2014, bill lam написал(а):
> >> > the frequencies (guessing from bit lengths) should be something like
> 2 3
> >> 1 1
> >> >   (2 3 1 1) hcodes 'ABCD'
> >> >
> >> > the hard part is the inverse problem: how to get the huffman code with
> >> > prior knowing the bits for each symbol.  Your pointer to the putty
> >> > fork looks like helpful.  The comment is in lines 861 to 914, the code
> >> > itself in line 915 to 964. Do you know how to express it in J?
> >> > Thanks.
> >> >
> >> > On Thu, Sep 11, 2014 at 2:57 PM, Joe Bogner <[email protected]>
> wrote:
> >> > > Here a few other links ... after reading through the RFC. Not sure
> if
> >> > > they help, but just sharing from my own research into assisting on
> >> > > this topic
> >> > >
> >> > > https://github.com/evegard/pngview/blob/master/huffman.c#L54
> >> > >
> >> > > And a fork of the putty version with dynamic huffman coding:
> >> > >
> >>
> http://rc.quest.com/viewvc/putty/trunk/halibut/deflate.c?diff_format=s&revision=2&view=markup
> >> > >
> >> > > Or just generally googling some of the code from the RFC:
> >> > >
> >>
> https://www.google.com/search?q=next_code%5Blen%5D%2B%2B%3B&oq=next_code%5Blen%5D%2B%2B%3B&aqs=chrome..69i57.387j0j7&sourceid=chrome&es_sm=93&ie=UTF-8#q=next_code%5Blen%5D%2B%2B%3B&start=20
> >> > >
> >> > >
> >> > > Using the code from
> >> > > http://www.jsoftware.com/jwiki/Essays/Huffman%20Coding, I got stuck
> >> > > trying to match a simple example to the binary tree in the RFC:
> >> > >
> >> > > From the RFC:
> >> > >
> >> > >                   /\              Symbol    Code
> >> > >                          0  1             ------    ----
> >> > >                         /    \                A      00
> >> > >                        /\     B               B       1
> >> > >                       0  1                    C     011
> >> > >                      /    \                   D     010
> >> > >                     A     /\
> >> > >                          0  1
> >> > >                         /    \
> >> > >                        D      C
> >> > >
> >> > >
> >> > >
> >> > >    (4#1) hcodes 'ABCD'
> >> > > ┌───┬───┬───┬───┐
> >> > > │0 0│0 1│1 0│1 1│
> >> > > └───┴───┴───┴───┘
> >> > >
> >> > > Per the RFC, ideally that should match this? '00';'1';'011';'010'
> >> > >
> >> > >
> >> > > From there, it seems like a pretty straightforward exercise to
> >> > > transliterate the C code from the RFC into J code to recode the
> >> > > example to:
> >> > >
> >> > >
> >> > >             Symbol  Code
> >> > >             ------  ----
> >> > >             A       10
> >> > >             B       0
> >> > >             C       110
> >> > >             D       111
> >> > >
> >> > >
> >> > > I would probably start with a looping construct like what's in the
> RFC
> >> > > and then figure out a more J way to do it, but first I would need to
> >> > > figure out how to create the binary tree in that initial format.
> >> > >
> >> > > On Wed, Sep 10, 2014 at 7:41 PM, bill lam <[email protected]>
> wrote:
> >> > >> Thanks Joe,
> >> > >> putty only use zlib static huffman for encoding so that it does not
> >> build
> >> > >> any huffman dictionary table.
> >> > >>
> >> > >> The zlib static huffman code does not care about individual
> symbol's
> >> > >> frequency, it just encode 0 to 286 into bits, see section 3.2.6.
> >> > >>  On Sep 11, 2014 1:26 AM, "Joe Bogner" <[email protected]>
> wrote:
> >> > >>
> >> > >>> You've already likely considered this, but if it were me I would
> >> compare
> >> > >>> results to a working implementation. The one from putty seems
> pretty
> >> clean
> >> > >>> and standalone:
> >> > >>>
> >>
> https://raw.githubusercontent.com/grumpydev/PortablePuTTY/master/SSHZLIB.C
> >> > >>> . I was able to compile it on windows no problem and I assume it'd
> >> be fine
> >> > >>> on linux as well.
> >> > >>>
> >> > >>> On Wed, Sep 10, 2014 at 1:00 PM, Raul Miller <
> [email protected]>
> >> > >>> wrote:
> >> > >>>
> >> > >>> > I think the use of the term "consecutive" rather than
> "sequential"
> >> is
> >> > >>> > telling.
> >> > >>> >
> >> > >>> > The described algorithm is: compute the huffman code lengths:
> >> > >>> >    #@>F1 hcodes A1
> >> > >>> > 1 3 7 7 6 6 6 6 6 6 6 2
> >> > >>> >
> >> > >>> > Then assign ascending huffman codes first in length order and
> then
> >> > >>> > within codes of the same length.
> >> > >>> >
> >> > >>> > Taken literally, that might be something like this:
> >> > >>> >
> >> > >>> > H=: 4 :0
> >> > >>> >   L=.#@> x hcodes y
> >> > >>> >   U=.~.L
> >> > >>> >   ;<@(({.{.U e.~i.&.<:@{.)<@:+"1-@{.{."1 #:@i.@#)/.~L
> >> > >>> > )
> >> > >>> >
> >> > >>> >    ":@>F1 H A1
> >> > >>> > 0
> >> > >>> > 1 1 0
> >> > >>> > 1 1 1 0 0 1 0
> >> > >>> > 1 1 1 0 0 1 1
> >> > >>> > 1 1 1 0 0 0
> >> > >>> > 1 1 1 0 0 1
> >> > >>> > 1 1 1 0 1 0
> >> > >>> > 1 1 1 0 1 1
> >> > >>> > 1 1 1 1 0 0
> >> > >>> > 1 1 1 1 0 1
> >> > >>> > 1 1 1 1 1 0
> >> > >>> > 1 0
> >> > >>> >
> >> > >>> > But is this correct? Is it actually safe to leave the results
> like
> >> > >>> > this - with all codes of the same length being consecutive to
> each
> >> > >>> > other?
> >> > >>> >
> >> > >>> >    F (hcodes -:&:(#@>) H) A
> >> > >>> > 0
> >> > >>> >
> >> > >>> > No.
> >> > >>> >
> >> > >>> > So... "consecutive" must refer only to the values used and not
> >> their
> >> > >>> > order within the result.
> >> > >>> >
> >> > >>> > Perhaps something like this:
> >> > >>> >
> >> > >>> > deflatecodes=:4 :0
> >> > >>> >   L=.#@> x hcodes y
> >> > >>> >   U=.~.L
> >> > >>> >   R=. ;<@(({.{.U e.~i.&.<:@{.)<@:+"1-@{.{."1 #:@i.@#)/.~L
> >> > >>> >   R/:;(</. i.@#)L
> >> > >>> > )
> >> > >>> >
> >> > >>> >    F (hcodes -:&:(#@>) deflatecodes)  A
> >> > >>> > 1
> >> > >>> >
> >> > >>> > There should be a better way of doing this, but this should at
> >> least
> >> > >>> > get you started.
> >> > >>> >
> >> > >>> > Thanks,
> >> > >>> >
> >> > >>> > --
> >> > >>> > Raul
> >> > >>> >
> >> > >>> >
> >> > >>> > On Wed, Sep 10, 2014 at 10:45 AM, bill lam <[email protected]
> >
> >> wrote:
> >> > >>> > > For huffman coding used in zlib:
> >> > >>> > > https://www.ietf.org/rfc/rfc1951.txt section 3.2.2.
> >> > >>> > >
> >> > >>> > >  The Huffman codes used for each alphabet in the "deflate"
> >> > >>> > >  format have two additional rules:
> >> > >>> > >
> >> > >>> > >   * All codes of a given bit length have lexicographically
> >> > >>> > >   consecutive values, in the same order as the symbols
> >> > >>> > >   they represent;
> >> > >>> > >
> >> > >>> > >   * Shorter codes lexicographically precede longer codes.
> >> > >>> > > I tried jwiki hcodes in
> >> > >>> > > I try Roger's essay
> >> > >>> > > http://www.jsoftware.com/jwiki/Essays/Huffman%20Coding
> >> > >>> > >
> >> > >>> > > hc=: 4 : 0
> >> > >>> > > if. 1=#x do. y
> >> > >>> > > else. ((i{x),+/j{x) hc (i{y),<j{y [ i=. (i.#x) -. j=. 2{./:x
> end.
> >> > >>> > > )
> >> > >>> > >
> >> > >>> > > hcodes=: 4 : 0
> >> > >>> > > assert. x -:&$ y           NB. weights and words have same
> shape
> >> > >>> > > assert. (0<:x) *. 1=#$x    NB. weights are non-negative
> >> > >>> > > assert. 1 >: L.y           NB. words are boxed not more than
> once
> >> > >>> > > w=. ,&.> y                 NB. standardized words
> >> > >>> > > assert. w -: ~.w           NB. words are unique
> >> > >>> > > t=. 0 {:: x hc w           NB. minimal weight binary tree
> >> > >>> > > ((< S: 0 t) i. w) { <@(1&=)@; S: 1 {:: t
> >> > >>> > > )
> >> > >>> > >
> >> > >>> > > but the coding produced is malformed for zlib. eg,
> >> > >>> > > this is what I ran into trouble
> >> > >>> > >
> >> > >>> > > f1=: 1 256 17 1 1 9 1
> >> > >>> > > f2=: 2 1 0 1 255 0 1536
> >> > >>> > > F=: ,/(f1#f2)
> >> > >>> > > A=: i.286
> >> > >>> > >
> >> > >>> > > F hcodes A
> >> > >>> > >
> >> > >>> > > Or a shorter example
> >> > >>> > >
> >> > >>> > > A1=: i.12
> >> > >>> > > F1=: 2 1 0 0 0 0 0 0 0 0 0 1
> >> > >>> > >
> >> > >>> > > F1 hcodes A1
> >> > >>> > >
> >> > >>> > > Any idea?
> >> > >>> > >
> >> > >>> > > --
> >> > >>> > > regards,
> >> > >>> > > ====================================================
> >> > >>> > > GPG key 1024D/4434BAB3 2008-08-24
> >> > >>> > > gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
> >> > >>> > > gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3
> >> > >>> > >
> >> ----------------------------------------------------------------------
> >> > >>> > > For information about J forums see
> >> http://www.jsoftware.com/forums.htm
> >> > >>> >
> >> ----------------------------------------------------------------------
> >> > >>> > For information about J forums see
> >> http://www.jsoftware.com/forums.htm
> >> > >>> >
> >> > >>>
> >> ----------------------------------------------------------------------
> >> > >>> For information about J forums see
> >> http://www.jsoftware.com/forums.htm
> >> > >>>
> >> > >>
> ----------------------------------------------------------------------
> >> > >> For information about J forums see
> >> http://www.jsoftware.com/forums.htm
> >> > >
> ----------------------------------------------------------------------
> >> > > For information about J forums see
> http://www.jsoftware.com/forums.htm
> >>
> >> --
> >> regards,
> >> ====================================================
> >> GPG key 1024D/4434BAB3 2008-08-24
> >> gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
> >> gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3
> >> ----------------------------------------------------------------------
> >> For information about J forums see http://www.jsoftware.com/forums.htm
> >>
> > ----------------------------------------------------------------------
> > For information about J forums see http://www.jsoftware.com/forums.htm
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
>
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Reply via email to