Author: pkluegl
Date: Mon Jul 29 13:52:59 2013
New Revision: 1508071

URL: http://svn.apache.org/r1508071
Log:
UIMA-3071 - added some documentation

Modified:
    uima/sandbox/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.language.anchoring.xml
    uima/sandbox/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.language.syntax.xml
    uima/sandbox/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.overview.xml

Modified: uima/sandbox/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.language.anchoring.xml
URL: http://svn.apache.org/viewvc/uima/sandbox/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.language.anchoring.xml?rev=1508071&r1=1508070&r2=1508071&view=diff
==============================================================================
--- uima/sandbox/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.language.anchoring.xml (original)
+++ uima/sandbox/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.language.anchoring.xml Mon Jul 29 13:52:59 2013
@@ -17,7 +17,7 @@
 language governing permissions and limitations under the License.
 -->
 <section id="ugr.tools.ruta.language.anchoring">
-  <title>Matching order</title>
+  <title>Rule elements and their matching order</title>
   <para>
     If not specified otherwise, then the UIMA Ruta rules normally start the matching process with their first rule element.
     The first rule element searches for possible positions for its matching
@@ -50,4 +50,44 @@ ANY @LastToken;]]></programlisting>
     the favorable rule element. This functionality can be activated in the <link linkend="ugr.tools.ruta.ae.basic.parameter">configuration parameters</link> of the analysis engine or directly in the script file with the <link linkend="ugr.tools.ruta.language.actions.dynamicanchoring">DYNAMICANCHORING</link> action.
   </para>
+
+  <para>
+    A list of rule elements normally specifies a sequential pattern. The rule matches if the first rule element successfully matches,
+    the following rule element then matches at the position directly after the match of the first one, and so on. Three language constructs break up this
+    sequential matching: <quote><![CDATA[&]]></quote>, <quote>|</quote> and <quote>%</quote>. A composed rule element whose inner rule elements are all linked by the symbol <quote><![CDATA[&]]></quote>
+    matches only if all inner rule elements successfully match at the given position. A composed rule element whose inner rule elements are linked by the
+    symbol <quote>|</quote> matches if one of the inner rule elements successfully matches. These composed rule elements therefore specify a conjunction (<quote>and</quote>)
+    and a disjunction (<quote>or</quote>) of their inner rule elements at the given position. The symbol <quote>%</quote> serves a different purpose:
+    it links whole rules, which are only able to fire if each of the linked rules has successfully matched. In contrast to <quote><![CDATA[&]]></quote>,
+    this linkage does not impose any constraints on the matched positions. The following examples illustrate these three language constructs.
+  </para>
+  <programlisting><![CDATA[(Token.posTag=="DET" & Lemma.value=="the");]]></programlisting>
+  <para>
+    This rule is fulfilled if there is a token whose feature <quote>posTag</quote> has the value <quote>DET</quote> and an annotation of the type <quote>Lemma</quote> whose feature <quote>value</quote>
+    has the value <quote>the</quote>. Both rule elements need to be fulfilled at the same position.
+  </para>
+  <programlisting><![CDATA[NUM (W{REGEXP("Peter") -> Name} & ANY CW{PARTOF(Name)});]]></programlisting>
+  <para>
+    This rule matches a number and then checks whether the next word is <quote>Peter</quote> and whether the token following it is capitalized and part of an annotation of the type <quote>Name</quote>.
+    If all rule elements successfully match, a new annotation of the type <quote>Name</quote> is created, covering the largest match of the linked rule elements. In this example,
+    the new annotation therefore also covers the token after the word <quote>Peter</quote>, even though the action was specified at the rule element with the smaller match.
+  </para>
+  <programlisting><![CDATA[(W{REGEXP("Peter")} CW | "Mr" PERIOD CW){-> Name};]]></programlisting>
+  <para>
+    In this example, an annotation of the type <quote>Name</quote> is created either for the word <quote>Peter</quote> followed by a
+    capitalized word, or for the word <quote>Mr</quote> followed by a period and a capitalized word.
+  </para>
+  <programlisting><![CDATA[(Animal ((COMMA | "and") Animal)+){-> AnimalEnum};]]></programlisting>
+  <para>
+    This rule annotates enumerations of <quote>Animal</quote> annotations in which consecutive animals are separated by either a comma or the word <quote>and</quote>.
+  </para>
+  <programlisting><![CDATA[BLOCK(forEach) Sentence{}{
+  CW NUM % SW NUM{-> MARK(Found, 1, 2)};
+}]]></programlisting>
+  <para>
+    Here, an annotation of the type <quote>Found</quote> covering a lowercase word and the following number is created if a sentence contains both a capitalized word followed by a number
+    and a lowercase word followed by a number, regardless of where these two matches occur in the sentence.
+  </para>
+
+
 </section>
\ No newline at end of file

Modified: uima/sandbox/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.language.syntax.xml
URL: http://svn.apache.org/viewvc/uima/sandbox/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.language.syntax.xml?rev=1508071&r1=1508070&r2=1508071&view=diff
==============================================================================
--- uima/sandbox/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.language.syntax.xml (original)
+++ uima/sandbox/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.language.syntax.xml Mon Jul 29 13:52:59 2013
@@ -83,23 +83,26 @@ BlockDeclaration -> "BLOCK" "(" Ident "{" Statements "}"]]></programlisting>
     Syntax of statements and rule elements:
-    <programlisting><![CDATA[SimpleStatement -> RuleElements ";" | RegExpRule ";"
+    <programlisting><![CDATA[SimpleStatement -> SimpleRule | RegExpRule | ConjunctRules
+SimpleRule -> RuleElements ";"
 RegExpRule -> StringExpression "->" GroupAssignment
-    ("," GroupAssignment)*
+    ("," GroupAssignment)* ";"
+ConjunctRules -> RuleElements ("%" RuleElements)+ ";"
 GroupAssignment -> TypeExpression | NumberExpression "=" TypeExpression
 RuleElements -> RuleElement+
 RuleElement -> RuleElementType | RuleElementLiteral
-    | RuleElementComposed | RuleElementDisjunctive
-    | RuleElementWildCard
+    | RuleElementComposed | RuleElementWildCard
 RuleElementType -> TypeExpression QuantifierPart?
     ("{" Conditions? Actions? "}")?
 RuleElementWithCA -> TypeExpression QuantifierPart?
     "{" Conditions? Actions? "}"
 RuleElementLiteral -> SimpleStringExpression QuantifierPart?
     ("{" Conditions? Actions? "}")?
-RuleElementComposed -> "(" RuleElements ")" QuantifierPart?
-    ("{" Conditions? Actions? "}")?
+RuleElementComposed -> ( RuleElement ("&" RuleElement)+
+    | RuleElement ("|" RuleElement)+
+    | "(" RuleElements ")")
+    QuantifierPart? ("{" Conditions? Actions? "}")?
 RuleElementDisjunctive -> "(" (TypeExpression | SimpleStringExpression)
     ("|" (TypeExpression | SimpleStringExpression) )+
     (")" QuantifierPart? "{" Conditions? Actions? "}")?

Modified: uima/sandbox/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.overview.xml
URL: http://svn.apache.org/viewvc/uima/sandbox/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.overview.xml?rev=1508071&r1=1508070&r2=1508071&view=diff
==============================================================================
--- uima/sandbox/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.overview.xml (original)
+++ uima/sandbox/ruta/trunk/ruta-docbook/src/docbook/tools.ruta.overview.xml Mon Jul 29 13:52:59 2013
@@ -243,6 +243,17 @@ Document{-> MARKFAST(Animal, AnimalsList
     <programlisting><![CDATA[(Animal (COMMA | SEMICOLON))+{-> MARK(AnimalEnum,1,2)} Animal;]]></programlisting>
     <para>
+      There are two more special symbols that can be used to link rule elements. If the symbol <quote>|</quote> is replaced by the
+      symbol <quote><![CDATA[&]]></quote> in the last example, then the token after the animal would need to be both a comma and a semicolon, which is of course not possible.
+      Another symbol with a special meaning is <quote>%</quote>. Unlike the other two, it does not link rule elements within a composed rule element (parentheses) but links whole rules.
+      It can be interpreted as a global <quote>and</quote>: the linked rules only fire if all of them have successfully matched.
+      In the following example, an annotation of the type <quote>FoundIt</quote> is created if the document contains two periods in a row and two commas in a row:
+    </para>
+
+    <programlisting><![CDATA[PERIOD PERIOD % COMMA COMMA{-> FoundIt};]]></programlisting>
+
+
+    <para>
       There is a <quote>wild card</quote> rule element, which can be used to skip some text or annotations until the next rule element is able to match.
     </para>
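The difference between the three constructs can be made concrete with a toy Python model of the matching semantics described above. This is only a sketch under simplifying assumptions and is not how UIMA Ruta is implemented. In the model, a document is a plain list of token dictionaries. Each rule element is a hypothetical function that returns the end index of its match, or None. Disjunction simply takes the first alternative that matches, and quantifiers, actions and backtracking are left out. All names (tok, seq, conj, disj, find_all, linked, and the NUM/CW/ANY/PERIOD/COMMA stand-ins) are made up for this illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Toy model only (hypothetical, not the UIMA Ruta implementation):
# a rule element tries to match at index i and returns the end index
# (exclusive) of its match, or None if it fails.
Token = Dict[str, str]
Element = Callable[[List[Token], int], Optional[int]]
Spans = List[Tuple[int, int]]


def tok(pred: Callable[[Token], bool]) -> Element:
    """Rule element that matches one token satisfying pred."""
    def match(doc: List[Token], i: int) -> Optional[int]:
        return i + 1 if i < len(doc) and pred(doc[i]) else None
    return match


def seq(*elements: Element) -> Element:
    """Plain sequence: each element starts where the previous one ended."""
    def match(doc: List[Token], i: int) -> Optional[int]:
        for element in elements:
            end = element(doc, i)
            if end is None:
                return None
            i = end
        return i
    return match


def conj(*elements: Element) -> Element:
    """'&': every inner element must match at the same position;
    the composed match covers the largest inner match."""
    def match(doc: List[Token], i: int) -> Optional[int]:
        ends = [element(doc, i) for element in elements]
        if any(end is None for end in ends):
            return None
        return max(ends)
    return match


def disj(*elements: Element) -> Element:
    """'|': matches if one inner element matches (simplified: first one wins)."""
    def match(doc: List[Token], i: int) -> Optional[int]:
        for element in elements:
            end = element(doc, i)
            if end is not None:
                return end
        return None
    return match


def find_all(rule: Element, doc: List[Token]) -> Spans:
    """All (start, end) spans at which the rule matches."""
    spans = []
    for i in range(len(doc)):
        end = rule(doc, i)
        if end is not None:
            spans.append((i, end))
    return spans


def linked(*rules: Element) -> Callable[[List[Token]], Optional[List[Spans]]]:
    """'%': the rules fire only if each of them matches somewhere in the
    window; their positions are not constrained relative to each other."""
    def apply(doc: List[Token]) -> Optional[List[Spans]]:
        results = [find_all(rule, doc) for rule in rules]
        return results if all(results) else None
    return apply


def word(text: str) -> Element:
    return tok(lambda t: t["text"] == text)


NUM = tok(lambda t: t["text"].isdigit())
CW = tok(lambda t: t["text"][:1].isupper())
ANY = tok(lambda t: True)
PERIOD, COMMA = word("."), word(",")

doc = [{"text": w} for w in ["In", "2013", "Peter", "Smith", "left", ".", ".", "Then", ",", ","]]

# NUM (W{REGEXP("Peter")} & ANY CW): the '&' part spans "Peter Smith"
peter = seq(NUM, conj(word("Peter"), seq(ANY, CW)))
print(find_all(peter, doc))  # [(1, 4)]

# (W{REGEXP("Peter")} CW | "Mr" PERIOD CW)
name = disj(seq(word("Peter"), CW), seq(word("Mr"), PERIOD, CW))
print(find_all(name, [{"text": w} for w in ["Mr", ".", "Smith", "and", "Peter", "Pan"]]))  # [(0, 3), (4, 6)]

# PERIOD PERIOD % COMMA COMMA
print(linked(seq(PERIOD, PERIOD), seq(COMMA, COMMA))(doc))  # [[(5, 7)], [(8, 10)]]
```

In this model, the `%` case returns the match lists only when every linked rule matched somewhere in the window. Here the window is the whole token list; inside a BLOCK over sentences it would be each sentence. Dropping the two commas from the token list makes it return None.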