Re: [Jprogramming] Burrows-Wheeler Transform

Raul Miller Mon, 08 Apr 2013 08:00:31 -0700

In conversational contexts, bringing up old topics is bad because the
other participants are gone.


In informational contexts, the constraint should probably focus on
valid information rather than participants.

Anyways, that looks interesting (and similar to the paper William
Tanksley, Jr. referred to - though the paper you refer to here was
published 3 years earlier).  I think I have an inkling of what's
happening here, but I'd have to implement it to be sure.  In essence,
though, it looks like a time/space tradeoff - if I understand it
properly, this "sort on the first k letters" variant needs additional
storage (needs tie-breaker indices) but has reduced computational
requirements (only need to sort on prefix rather than on entire
string).

That said, we could instead break up the string we are compressing
into substrings of length k (or some other length). So, anyways... I'd
need a lot of testing before I had a solid understanding of which
approach was best.

Thanks,

-- 
Raul

On Mon, Apr 8, 2013 at 10:49 AM, Mike Müller <[email protected]> wrote:
> Sorry for dragging up such an old thread, but I just read this and wanted to 
> remark, that it's essential for the BWT to sort on all characters.
> There's a variant called Sort Transform (or sometimes Schindler Transform) 
> that only sorts the cyclic shifts according to the first k characters.
> See, for instance: http://arxiv.org/abs/0908.0239
> Inverting this one is more complicated though…
>
> Mike
>
> Am 03.02.2013 um 04:43 schrieb Henry Rich <[email protected]>:
>
>> I don't know jack about the BWT, but from the Wikipedia entry this looks 
>> like an implementation:
>>
>>   ({~ <:@/:) 'aBANANAz'
>> BNNaAAzA
>>
>> That works for their testcase, but it doesn't match yours.  I think the 
>> difference is that you sort on all characters rather than just the first.  
>> Is this simpler version good enough?
>>
>> Henry Rich
>>
>> On 2/2/2013 10:08 PM, Raul Miller wrote:
>>> Here's a slightly flawed implementation of Burrows-Wheeler Transform
>>> (which is useful when compressing text, because the result tends to be
>>> easy to compress):
>>>
>>>    bwt0=: (_1 {"1 [: /:~ (|."0 1~ i.@#)) :. ibwt0
>>>    bwt0 'this is a test'
>>> ssa tt hiie ts
>>>    ibwt0=: (_1 { ((,. /:~)^:(#@[) (#,0:) $ ''"_)) :. bwt0
>>>    ibwt0 bwt0 'this is a test'
>>> stthis is a te
>>>
>>> As you can see, the inverse does not quite work right -- it has been 
>>> rotated.
>>>
>>> Here's a fixed implementation:
>>>
>>>    bwt=: bwt0@,&EAV :.ibwt
>>>    ibwt=: }.@(|.~ i.&EAV)@ibwt0 :.bwt
>>>    ibwt bwt 'this is a test'
>>> this is a test
>>>
>>> This fixed implementation might fail if the original data contained EAV.
>>>
>>> It seems like we ought to be able to implement the transform without
>>> this flaw (perhaps using a boxed representation).
>>>
>>> Allowing arbitrary treatments of how the transformed result tracks the
>>> original location of the beginning/end of text, do we have a more
>>> efficient way of computing the inverse (or the transform)?
>>>
>>> Thanks,
>>>
>> ----------------------------------------------------------------------
>> For information about J forums see http://www.jsoftware.com/forums.htm
>
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Re: [Jprogramming] Burrows-Wheeler Transform

Reply via email to