On 28.08.2013 16:52, Alexandre Patry wrote:
> Hi,
>
> I use RUTA and I want to delete an annotation if it is within the
> first 50 tokens of a document. I came up with the following rules :
>
>    // Annotate the first token in the document
>    ANY{POSITION(Document, 1)-> Header};
>    // Appends the 49 following tokens
>    Header{->SHIFT(Header, 1, 2)} ANY[0,49];
>    // Delete the first ToDelete if it is within the header
>    ToDelete{POSITION(Header, 1) -> UNMARK(ToDelete)};
>
>
> These rules work as expected but they are *really* slow. Is there a
> faster way to achieve that?
>

Oh yes, the first rule is really slow. I have always missed a MARKFIRST
action (there is already a MARKLAST). I will add it today or tomorrow.
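
Once that action exists, the first rule could presumably be written along
these lines. This is only a sketch: it assumes MARKFIRST takes the type to
create, just as MARKLAST does, and marks the first token of the matched
annotation.

```ruta
// Assumed form of the planned action, mirroring MARKLAST(Type):
// mark the first token of the document as Header without iterating over ANY.
Document{-> MARKFIRST(Header)};
```

The other two rules would stay as they are.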

There are two reasons why the first rule is slow: ANY has to look at
every token, and POSITION is simply the slowest condition in Ruta.
 
For now you could use a rule like:
ANY{STARTSWITH(Document)-> Header};   
... which avoids at least the POSITION condition.

A simple test with a 200 W document:

...
ANY{POSITION(Document, 1)-> Header}; // [0.274s|93.52%]
Header{->SHIFT(Header, 1, 2)} ANY[0,49];  // [0.090s|3.07%]          
ToDelete{POSITION(Header, 1) -> UNMARK(ToDelete)}; // [0.030s|1.02%]

...
ANY{STARTSWITH(Document)-> Header};  // [0.047s|50.00%]          
Header{->SHIFT(Header, 1, 2)} ANY[0,49];  // [0.029s|30.85%]          
ToDelete{POSITION(Header, 1) -> UNMARK(ToDelete)}; // [0.011s|11.7%]

Well, that is still slow (in debug mode), and I actually wonder why the
other rules got faster too... but I hope the performance will be
improved soon :-)

Best,

Peter

> Thanks,
>
> Alexandre
>
