docbook: tools.ruta.language.anchoring.xml tools.ruta.language.syntax.xml tools.ruta.overview.xml

pkluegl Mon, 29 Jul 2013 06:54:15 -0700

Author: pkluegl
Date: Mon Jul 29 13:52:59 2013
New Revision: 1508071

URL: http://svn.apache.org/r1508071
Log:
UIMA-3071
- added some documentation


Modified:
    uima/sandbox/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.language.anchoring.xml
    uima/sandbox/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.language.syntax.xml
    uima/sandbox/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.overview.xml

Modified: 
uima/sandbox/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.language.anchoring.xml
URL: 
http://svn.apache.org/viewvc/uima/sandbox/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.language.anchoring.xml?rev=1508071&r1=1508070&r2=1508071&view=diff
==============================================================================
--- 
uima/sandbox/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.language.anchoring.xml
 (original)
+++ 
uima/sandbox/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.language.anchoring.xml
 Mon Jul 29 13:52:59 2013
@@ -17,7 +17,7 @@
   language governing permissions and limitations under the License. -->
 
 <section id="ugr.tools.ruta.language.anchoring">
-  <title>Matching order</title>
+  <title>Rule elements and their matching order</title>
   <para>
     If not specified otherwise, then the UIMA Ruta rules normally start the 
matching 
     process with their first rule element. The first rule element searches for 
possible positions for its matching
@@ -50,4 +50,44 @@ ANY @LastToken;]]></programlisting>
     the favorable rule element. This functionality can be activated in the 
<link linkend="ugr.tools.ruta.ae.basic.parameter">configuration 
parameters</link> of the analysis engine or 
     directly in the script file with the <link 
linkend="ugr.tools.ruta.language.actions.dynamicanchoring">DYNAMICANCHORING</link>
 action. 
   </para>
+  
+  <para>
+    A list of rule elements normally specifies a sequential pattern. The rule 
is able to match if the first rule element successfully matches 
+    and then the following rule element at the position after the match of the 
first rule element, and so on. There are three language constructs that break 
up that
+    sequential matching: <quote><![CDATA[&]]></quote>, <quote>|</quote> and 
<quote>%</quote>. A composed rule element where all inner rule elements are 
linked by the symbol <quote><![CDATA[&]]></quote>
+    matches only if all inner rule elements successfully match at the given 
position. A composed rule element with inner rule elements linked by the 
+    symbol <quote>|</quote> matches if at least one of the inner rule elements 
successfully matches. These composed rule elements therefore specify a 
conjunction (<quote>and</quote>) 
+    and a disjunction (<quote>or</quote>) of their inner rule elements at the given 
position. The symbol <quote>%</quote> serves a different use case.
+    Here, rules themselves are linked and they are only able to fire if each 
one of the linked rules successfully matched. In contrast to 
<quote><![CDATA[&]]></quote>, 
+    this linkage of rule elements does not introduce constraints for the 
matched positions. In the following, a few examples of these three language 
constructs are given.
+  </para>
+  <programlisting><![CDATA[(Token.posTag=="DET" & 
Lemma.value=="the");]]></programlisting>
+  <para>
+    This rule is fulfilled if there is a token whose feature 
<quote>posTag</quote> has the value <quote>DET</quote> and an annotation of the 
type <quote>Lemma</quote> whose feature <quote>value</quote> 
+    has the value <quote>the</quote>. Both rule elements need to be fulfilled 
at the same position.
+  </para>
+  <programlisting><![CDATA[NUM (W{REGEXP("Peter") -> Name} & ANY 
CW{PARTOF(Name)});]]></programlisting>
+  <para>
+    This rule matches on a number and then checks whether the next word is 
<quote>Peter</quote> and whether the next but one token is capitalized and part of an 
annotation of the type <quote>Name</quote>.
+    If all rule elements successfully matched, then a new annotation of the 
type <quote>Name</quote> will be created covering the largest match of the 
linked rule elements. In this example,
+    the new annotation also covers the token after the word 
<quote>Peter</quote> even though the action was specified at the rule element with 
the smaller match.
+  </para>
+  <programlisting><![CDATA[(W{REGEXP("Peter")} CW | "Mr" PERIOD CW){-> 
Name};]]></programlisting>
+  <para>
+    In this example, an annotation of the type <quote>Name</quote> will be 
created for the token <quote>Peter</quote> followed by a 
+    capitalized word or the word <quote>Mr</quote> followed by a period and a 
capitalized word.   
+  </para>
+  <programlisting><![CDATA[(Animal ((COMMA | "and") Animal)+){-> 
AnimalEnum};]]></programlisting>
+  <para>
+    This rule annotates enumerations of animal annotations in which the animal 
annotations are separated by either a comma or the word <quote>and</quote>.
+  </para>
+  <programlisting><![CDATA[BLOCK(forEach) Sentence{}{
+  CW NUM % SW NUM{-> MARK(Found, 1, 2)};
+}]]></programlisting>
+  <para>
+    Here, annotations of the type <quote>Found</quote> are created if a 
sentence contains a capitalized word followed by a number and a lowercase 
word followed by a number, 
+    regardless of where these annotations occur in the sentence.
+  </para>
+  
+  
 </section>
\ No newline at end of file
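
The matching behaviour documented in the anchoring section above can be modelled roughly as follows. This is only an illustrative Python sketch, not UIMA Ruta code and not how the Ruta engine is implemented; all names (`word`, `seq`, `conj`, `disj`, `linked`) are hypothetical, and tokens are simplified to plain strings.

```python
from typing import Callable, List, Optional

Tokens = List[str]
# A matcher takes the tokens and a start index and returns the end index
# (exclusive) of its match, or None if it does not match at that position.
Matcher = Callable[[Tokens, int], Optional[int]]


def word(text: str) -> Matcher:
    """Match a single token with exactly this text (hypothetical helper)."""
    def match(tokens: Tokens, pos: int) -> Optional[int]:
        if pos < len(tokens) and tokens[pos] == text:
            return pos + 1
        return None
    return match


def seq(*parts: Matcher) -> Matcher:
    """Sequential pattern: each part starts where the previous one ended."""
    def match(tokens: Tokens, pos: int) -> Optional[int]:
        end: Optional[int] = pos
        for part in parts:
            end = part(tokens, end)
            if end is None:
                return None
        return end
    return match


def conj(*parts: Matcher) -> Matcher:
    """Model of '&': all parts must match at the same start position.
    The resulting span is the largest of the individual matches."""
    def match(tokens: Tokens, pos: int) -> Optional[int]:
        ends = [part(tokens, pos) for part in parts]
        if any(end is None for end in ends):
            return None
        return max(end for end in ends if end is not None)
    return match


def disj(*parts: Matcher) -> Matcher:
    """Model of '|': succeed with the first part that matches at this position."""
    def match(tokens: Tokens, pos: int) -> Optional[int]:
        for part in parts:
            end = part(tokens, pos)
            if end is not None:
                return end
        return None
    return match


def linked(*rules: Matcher) -> Callable[[Tokens], bool]:
    """Model of '%': every linked rule must match somewhere in the window,
    with no constraint relating the matched positions to each other."""
    def fires(tokens: Tokens) -> bool:
        return all(
            any(rule(tokens, start) is not None for start in range(len(tokens)))
            for rule in rules
        )
    return fires


# Rough analogue of: PERIOD PERIOD % COMMA COMMA
two_periods_and_two_commas = linked(
    seq(word("."), word(".")),
    seq(word(","), word(",")),
)
```

In this model, `conj` and `disj` constrain a single position, whereas `linked` only requires that each rule matches somewhere, which mirrors the distinction the documentation draws between `&` and `%`.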

Modified: 
uima/sandbox/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.language.syntax.xml
URL: 
http://svn.apache.org/viewvc/uima/sandbox/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.language.syntax.xml?rev=1508071&r1=1508070&r2=1508071&view=diff
==============================================================================
--- 
uima/sandbox/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.language.syntax.xml 
(original)
+++ 
uima/sandbox/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.language.syntax.xml 
Mon Jul 29 13:52:59 2013
@@ -83,23 +83,26 @@ BlockDeclaration    -> "BLOCK" "(" Ident
                                                        "{" Statements 
"}"]]></programlisting>
 
     Syntax of statements and rule elements:
-    <programlisting><![CDATA[SimpleStatement        -> RuleElements ";" | 
RegExpRule ";"
+    <programlisting><![CDATA[SimpleStatement        -> SimpleRule | RegExpRule 
| ConjunctRules
+SimpleRule             -> RuleElements ";"
 RegExpRule             -> StringExpression "->" GroupAssignment 
-                          ("," GroupAssignment)*
+                          ("," GroupAssignment)* ";"
+ConjunctRules          -> RuleElements ("%" RuleElements)+ ";"
 GroupAssignment        -> TypeExpression 
                         | NumberExpression "=" TypeExpression
 RuleElements           -> RuleElement+
 RuleElement            -> RuleElementType | RuleElementLiteral
-                        | RuleElementComposed | RuleElementDisjunctive
-                        | RuleElementWildCard
+                        | RuleElementComposed | RuleElementWildCard
 RuleElementType        ->  TypeExpression QuantifierPart?
                                          ("{" Conditions?  Actions? "}")?
 RuleElementWithCA      ->  TypeExpression QuantifierPart?
                                             "{" Conditions?  Actions? "}"
 RuleElementLiteral     ->  SimpleStringExpression QuantifierPart?
                                           ("{" Conditions?  Actions? "}")?
-RuleElementComposed    -> "(" RuleElements ")" QuantifierPart?
-                                          ("{" Conditions?  Actions? "}")?
+RuleElementComposed    -> ( RuleElement ("&" RuleElement)+
+                          | RuleElement ("|" RuleElement)+
+                          | "(" RuleElements ")") 
+                          QuantifierPart? ("{" Conditions?  Actions? "}")?
 RuleElementDisjunctive -> "(" (TypeExpression | SimpleStringExpression)
                         ("|" (TypeExpression | SimpleStringExpression) )+
                         (")" QuantifierPart? "{" Conditions?  Actions? "}")?

Modified: 
uima/sandbox/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.overview.xml
URL: 
http://svn.apache.org/viewvc/uima/sandbox/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.overview.xml?rev=1508071&r1=1508070&r2=1508071&view=diff
==============================================================================
--- uima/sandbox/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.overview.xml 
(original)
+++ uima/sandbox/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.overview.xml 
Mon Jul 29 13:52:59 2013
@@ -243,6 +243,17 @@ Document{-> MARKFAST(Animal, AnimalsList
     <programlisting><![CDATA[(Animal (COMMA | SEMICOLON))+{-> 
MARK(AnimalEnum,1,2)} Animal;]]></programlisting>
 
     <para>
+      There are two more special symbols that can be used to link rule elements. 
If the symbol <quote>|</quote> is replaced by the
+      symbol <quote><![CDATA[&]]></quote> in the last example, then the token 
after the animal needs to be both a comma and a semicolon, which is of course not 
possible.
+      Another symbol with a special meaning is <quote>%</quote>, whose use is not 
limited to the inside of a composed rule element (parentheses).
+      This symbol can be interpreted as a global <quote>and</quote>: it links 
several rules, which only fire if all of them have successfully matched. 
+      In the following example, an annotation of the type 
<quote>FoundIt</quote> is created if the document contains two periods in a 
row and two commas in a row:
+    </para>
+    
+    <programlisting><![CDATA[PERIOD PERIOD % COMMA COMMA{-> 
FoundIt};]]></programlisting>
+
+
+    <para>
       There is a <quote>wild card</quote> rule element, which can be used to 
skip some text or annotations until the next rule element is able to match.
     </para>
