Text traversal order

Nikolai Krot Tue, 04 Jun 2019 02:31:14 -0700

Hi all,

I have an example of rules that don't quite work, which leads me to the
realization that I don't understand how text is traversed in Ruta and how
rules are applied.


Below is a simplified example of what I'm doing.

Say I have a text that contains "words" like these:

1 aa+bb
2 aa / aa+bb
3 aa /aa /aa+bb

I want to annotate the tokens as follows:

1 FOUND
2 FOUND / FOUND
3 FOUND / FOUND / FOUND

There can also be longer sequences separated by slashes.

These are my rules:

"aa" "+" "bb"  {->MARK(FOUND,1,3)};
"aa" "/" FOUND {->MARK(FOUND, 1)};

In other words: the rightmost token of the sequence is annotated as FOUND
first, and this becomes evidence for annotating the preceding tokens as
FOUND as well.

The problem is that only cases 1 and 2 are fully annotated. Case 3 is
annotated only partially:

1 FOUND
2 FOUND / FOUND
3 aa / FOUND / FOUND

It seems that the second rule is applied only once, though I expected it to
be applied repeatedly, in a loop, as long as there is a match. Case 3 should
work as soon as case 2 has been annotated, because case 3 is an extension of
case 2.
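
To make this hypothesis concrete, here is a small Python toy model (not Ruta,
and not a claim about how Ruta is actually implemented). It assumes that each
rule scans the text once, left to right, and never revisits an earlier
position after a match. The helper names (tokenize, rule1, rule2_single_pass,
render, annotate) are made up for illustration. Under that assumption the
model reproduces exactly the partial result I see for case 3.

```python
def tokenize(text):
    # Split e.g. "aa /aa /aa+bb" into ["aa", "/", "aa", "/", "aa", "+", "bb"].
    for sep in "/+":
        text = text.replace(sep, f" {sep} ")
    return text.split()


def rule1(tokens, found):
    # Models: "aa" "+" "bb" {->MARK(FOUND,1,3)};
    # Records a FOUND span covering all three tokens.
    for i in range(len(tokens) - 2):
        if tokens[i:i + 3] == ["aa", "+", "bb"]:
            found[i] = i + 3  # span start -> span end (exclusive)


def rule2_single_pass(tokens, found):
    # Models: "aa" "/" FOUND {->MARK(FOUND, 1)};
    # ASSUMPTION: one left-to-right scan, no restart after a match.
    for i in range(len(tokens) - 2):
        if tokens[i] == "aa" and tokens[i + 1] == "/" and (i + 2) in found:
            found[i] = i + 1  # mark only the first element ("aa")


def render(tokens, found):
    # Replace every FOUND span by the word FOUND, keep other tokens as-is.
    out, i = [], 0
    while i < len(tokens):
        if i in found:
            out.append("FOUND")
            i = found[i]
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)


def annotate(text):
    tokens = tokenize(text)
    found = {}
    rule1(tokens, found)
    rule2_single_pass(tokens, found)
    return render(tokens, found)


for line in ["aa+bb", "aa / aa+bb", "aa /aa /aa+bb"]:
    print(annotate(line))
# FOUND
# FOUND / FOUND
# aa / FOUND / FOUND
```

In the third line the first "aa" is tested before the second "aa" has been
marked, so it fails, and the scan never comes back to it.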

Case 3 starts to work when the second rule is duplicated, which is not a
good solution in my opinion. My question is: is the above by design (rule
matching does not restart after a match), or is it a bug in Ruta? Or maybe
there is a configuration option to choose the behaviour?
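
To illustrate why duplicating the rule does not scale, here is a short Python
sketch (again a toy model, not Ruta). It assumes that one copy of the second
rule corresponds to one left-to-right pass over the text, and it counts how
many such passes are needed before nothing new gets marked. The names
rule2_pass and passes_needed are hypothetical.

```python
def rule2_pass(tokens, found):
    # One left-to-right pass of: "aa" "/" FOUND {->MARK(FOUND, 1)};
    # `found` holds the start indices of FOUND annotations.
    # Returns True if this pass marked at least one new token.
    changed = False
    for i in range(len(tokens) - 2):
        if (i not in found and tokens[i] == "aa"
                and tokens[i + 1] == "/" and (i + 2) in found):
            found.add(i)
            changed = True
    return changed


def passes_needed(n_slashes):
    # Build "aa / aa / ... / aa+bb" containing n_slashes slashes.
    tokens = ["aa", "/"] * n_slashes + ["aa", "+", "bb"]
    # Assume the first rule has already marked the trailing aa+bb.
    found = {len(tokens) - 3}
    passes = 0
    while rule2_pass(tokens, found):
        passes += 1
    return passes


for n in range(1, 5):
    print(n, passes_needed(n))
# 1 1
# 2 2
# 3 3
# 4 4
```

Under this assumption every additional slash needs one more copy of the rule,
so a fixed number of duplicates only covers sequences up to a fixed length.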

Thank you in advance and best regards,
Nikolai
