On 2013-08-28 11:25, Peter Klügl wrote:
On 28.08.2013 16:52, Alexandre Patry wrote:
Hi,
I use Ruta and want to delete an annotation if it lies within the
first 50 tokens of a document. I came up with the following rules:
ANY{POSITION(Document, 1)-> Header}; // Annotate the first token in the document
Header{->SHIFT(Header, 1, 2)} ANY[0,49]; // Extend Header over up to 49 following tokens
ToDelete{POSITION(Header, 1) -> UNMARK(ToDelete)}; // Delete the first ToDelete if it is within the header
These rules work as expected but they are *really* slow. Is there a
faster way to achieve that?
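For readers less familiar with Ruta, here is a plain-Python sketch of what the three rules compute. It is not how Ruta evaluates them. Tokens and annotations are modeled only as character offsets, and the names `Span`, `header_span` and `unmark_first_in_header` are hypothetical. Header covers the first token plus up to 49 following ones. Only the first ToDelete lying inside Header is removed, which mirrors POSITION(Header, 1).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Span:
    """A token or annotation, modeled only by its character offsets (hypothetical)."""
    begin: int
    end: int


def header_span(tokens: List[Span], size: int = 50) -> Optional[Span]:
    """Span from the first token to the size-th token (fewer if the text is short)."""
    if not tokens:
        return None
    window = tokens[:size]
    return Span(window[0].begin, window[-1].end)


def unmark_first_in_header(tokens: List[Span], to_delete: List[Span],
                           size: int = 50) -> List[Span]:
    """Return to_delete without its first annotation that lies inside the header."""
    result = list(to_delete)
    header = header_span(tokens, size)
    if header is None:
        return result
    inside = sorted(
        (a for a in to_delete if header.begin <= a.begin and a.end <= header.end),
        key=lambda a: (a.begin, a.end),
    )
    if inside:
        result.remove(inside[0])  # drop a single occurrence, like one UNMARK
    return result
```

With 100 one-character tokens at offsets 0, 2, 4, ..., the header ends at offset 99. Of the annotations at 10 and 20, only the one at 10 is removed. An annotation at 150 is never touched.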
Oh yes, the first rule is really slow. I keep missing an action MARKFIRST
(there is already a MARKLAST). I will add it today or tomorrow.
There are two reasons why the first rule is slow:
ANY has to look at all tokens, and POSITION is simply the slowest condition
in Ruta.
For now you could use a rule like:
ANY{STARTSWITH(Document)-> Header};
... which at least avoids the POSITION condition.
A simple test with a 200 W document:
...
ANY{POSITION(Document, 1)-> Header}; // [0.274s|93.52%]
Header{->SHIFT(Header, 1, 2)} ANY[0,49]; // [0.090s|3.07%]
ToDelete{POSITION(Header, 1) -> UNMARK(ToDelete)}; // [0.030s|1.02%]
...
ANY{STARTSWITH(Document)-> Header}; // [0.047s|50.00%]
Header{->SHIFT(Header, 1, 2)} ANY[0,49]; // [0.029s|30.85%]
ToDelete{POSITION(Header, 1) -> UNMARK(ToDelete)}; // [0.011s|11.7%]
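To put the two profiles side by side, you can back out the total profiled time from any row. This assumes each percentage is that rule's share of the whole script's profiled time; the elided "..." lines presumably hold the remaining rules. The other rows agree with these totals to within rounding. The helper name `implied_total` is hypothetical.

```python
def implied_total(rule_seconds: float, share_percent: float) -> float:
    """Total profiled time implied by one rule's time and its percentage share."""
    return rule_seconds / (share_percent / 100.0)


# First rule, taken from the two profiles above.
total_with_position = implied_total(0.274, 93.52)    # roughly 0.293 s
total_with_startswith = implied_total(0.047, 50.00)  # roughly 0.094 s
first_rule_speedup = 0.274 / 0.047                   # roughly 5.8x

print(f"{total_with_position:.3f} s -> {total_with_startswith:.3f} s, "
      f"first rule {first_rule_speedup:.1f}x faster")
```

This prints `0.293 s -> 0.094 s, first rule 5.8x faster`. Both profiles were taken in debug mode.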
Well, that's still slow (in debug mode), and I actually wonder why the
other rules got faster as well... but I hope that the performance will
be improved soon :-)
Just tried it and it is much better, thanks!
Many of my documents start with whitespace, so I had to update the rules to:
Document{-> ADDRETAINTYPE(SPACE, BREAK)};
ANY{STARTSWITH(Document) -> Header};
// if the first token is a space, use the first non-space following it
Header{IS({SPACE, BREAK}) -> UNMARK(Header)} ANY*?
ANY{-PARTOF({SPACE, BREAK}) -> MARK(Header)};
Document{-> REMOVERETAINTYPE(SPACE, BREAK)};
Header{->SHIFT(Header, 1, 2)} ANY[0,49];
ToDelete{POSITION(Header, 1) -> UNMARK(ToDelete)};
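The effect of the updated rules can be sketched in plain Python. This assumes that, once REMOVERETAINTYPE(SPACE, BREAK) runs, SPACE and BREAK tokens are filtered again. Under that assumption, Header starts at the first non-whitespace token and ANY[0,49] only counts visible tokens. The token model `(type, begin, end)`, the set `WHITESPACE_TYPES` and the function `header_span` are hypothetical illustrations, not Ruta API.

```python
from typing import List, Optional, Tuple

# Token types assumed to be invisible to ANY once REMOVERETAINTYPE has run.
WHITESPACE_TYPES = {"SPACE", "BREAK"}

Token = Tuple[str, int, int]  # (type name, begin offset, end offset)


def header_span(tokens: List[Token], size: int = 50) -> Optional[Tuple[int, int]]:
    """Offsets covering the first `size` non-whitespace tokens, or None if there are none."""
    visible = [t for t in tokens if t[0] not in WHITESPACE_TYPES]
    if not visible:
        return None
    window = visible[:size]
    return (window[0][1], window[-1][2])
```

For a document starting with a space and a line break followed by words, the header now begins at the first word rather than at offset 0.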
I will be happy to test-drive MARKFIRST once it is in trunk.
Alexandre
--
Alexandre Patry, Ph.D
Researcher
http://KeaText.com
Turn your documents into decision tools