I don't know jack about the BWT, but from the Wikipedia entry this looks like an implementation:

   ({~ <:@/:) 'aBANANAz'
BNNaAAzA

That works for their testcase, but it doesn't match yours. I think the difference is that you sort on all characters rather than just the first. Is this simpler version good enough?

Henry Rich

On 2/2/2013 10:08 PM, Raul Miller wrote:
Here's a slightly flawed implementation of Burrows-Wheeler Transform
(which is useful when compressing text, because the result tends to be
easy to compress):

    bwt0=: (_1 {"1 [: /:~ (|."0 1~ i.@#)) :. ibwt0
    bwt0 'this is a test'
ssa tt hiie ts
    ibwt0=: (_1 { ((,. /:~)^:(#@[) (#,0:) $ ''"_)) :. bwt0
    ibwt0 bwt0 'this is a test'
stthis is a te

As you can see, the inverse does not quite work right -- it has been rotated.

Here's a fixed implementation:

    bwt=: bwt0@,&EAV :.ibwt
    ibwt=: }.@(|.~ i.&EAV)@ibwt0 :.bwt
    ibwt bwt 'this is a test'
this is a test

This fixed implementation might fail if the original data contained EAV.

It seems like we ought to be able to implement the transform without
this flaw (perhaps using a boxed representation).

Allowing arbitrary treatments of how the transformed result tracks the
original location of the beginning/end of text, do we have a more
efficient way of computing the inverse (or the transform)?

Thanks,

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Reply via email to