I think that the simpler version would be easier to optimize, since it does not depend on sequence properties.
Also, the important qualities are 1) reversibility and 2) compressibility. -- Raul On Sat, Feb 2, 2013 at 10:43 PM, Henry Rich <[email protected]> wrote: > I don't know jack about the BWT, but from the Wikipedia entry this looks > like an implementation: > > ({~ <:@/:) 'aBANANAz' > BNNaAAzA > > That works for their testcase, but it doesn't match yours. I think the > difference is that you sort on all characters rather than just the first. > Is this simpler version good enough? > > Henry Rich > > > On 2/2/2013 10:08 PM, Raul Miller wrote: >> >> Here's a slightly flawed implementation of Burrows-Wheeler Transform >> (which is useful when compressing text, because the result tends to be >> easy to compress): >> >> bwt0=: (_1 {"1 [: /:~ (|."0 1~ i.@#)) :. ibwt0 >> bwt0 'this is a test' >> ssa tt hiie ts >> ibwt0=: (_1 { ((,. /:~)^:(#@[) (#,0:) $ ''"_)) :. bwt0 >> ibwt0 bwt0 'this is a test' >> stthis is a te >> >> As you can see, the inverse does not quite work right -- it has been >> rotated. >> >> Here's a fixed implementation: >> >> bwt=: bwt0@,&EAV :.ibwt >> ibwt=: }.@(|.~ i.&EAV)@ibwt0 :.bwt >> ibwt bwt 'this is a test' >> this is a test >> >> This fixed implementation might fail if the original data contained EAV. >> >> It seems like we ought to be able to implement the transform without >> this flaw (perhaps using a boxed representation). >> >> Allowing arbitrary treatments of how the transformed result tracks the >> original location of the beginning/end of text, do we have a more >> efficient way of computing the inverse (or the transform)? >> >> Thanks, >> > ---------------------------------------------------------------------- > For information about J forums see http://www.jsoftware.com/forums.htm ---------------------------------------------------------------------- For information about J forums see http://www.jsoftware.com/forums.htm
