Hi Peter,

I like the simplified regular expression rule syntax -- very handy. It's almost 
exactly what I wanted.  However, one thing I'm wondering is how to create an 
annotation with features using such rules. I have in mind something like the 
following:

"(regex string)" -> 1 = CREATE(FooType, "feat" = "bar");

Here's a possible variant of the above that I can imagine would be useful too:

"(regex) (string)" -> CREATE(FooType, "feat1" = GROUP(1), "feat2"=GROUP(2));
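
To make the intended meaning of GROUP(n) concrete, here is a rough plain-Java sketch of the behaviour I have in mind: for each match, every feature receives the text of the capturing group assigned to it. The class name, the method name and the use of a map as a stand-in for an annotation are made up for illustration; this is not Ruta code, and GROUP is only my proposed syntax.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: a map stands in for an annotation with features.
class GroupFeatureSketch {

    // For every match of the regex, build a map from feature name to the text
    // of the capturing group assigned to that feature (e.g. "feat1" -> group 1).
    static List<Map<String, String>> extract(String regex, String text,
                                             Map<String, Integer> featureToGroup) {
        // DOTALL and MULTILINE are the flags named in the documentation excerpt.
        Pattern pattern = Pattern.compile(regex, Pattern.DOTALL | Pattern.MULTILINE);
        Matcher matcher = pattern.matcher(text);
        List<Map<String, String>> annotations = new ArrayList<>();
        while (matcher.find()) {
            Map<String, String> features = new LinkedHashMap<>();
            for (Map.Entry<String, Integer> entry : featureToGroup.entrySet()) {
                features.put(entry.getKey(), matcher.group(entry.getValue()));
            }
            annotations.add(features);
        }
        return annotations;
    }

    public static void main(String[] args) {
        Map<String, Integer> mapping = new LinkedHashMap<>();
        mapping.put("feat1", 1);
        mapping.put("feat2", 2);
        System.out.println(extract("(regex) (string)", "a regex string here", mapping));
    }
}
```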

What are your thoughts on this?

Cheers,

Will

-----Original Message-----
From: William Karl Thompson 
Sent: Thursday, May 02, 2013 1:49 PM
To: [email protected]
Subject: RE: Extending TextMarker with new actions

Many thanks, I will try it out.

-----Original Message-----
From: Peter Klügl [mailto:[email protected]]
Sent: Thursday, May 02, 2013 12:42 PM
To: [email protected]
Subject: Re: Extending TextMarker with new actions

On 02.05.2013 19:16, William Karl Thompson wrote:
> I see you're way ahead of me! I'll take a look at this -- is it in the latest 
> on trunk?

Yes, and there is also a unit test (if you are interested in some ready-to-work 
examples): org.apache.uima.ruta.RegExpRuleTest.java(.ruta,
.txt)

Peter

> -----Original Message-----
> From: Peter Klügl [mailto:[email protected]]
> Sent: Thursday, May 02, 2013 12:14 PM
> To: [email protected]
> Subject: Re: Extending TextMarker with new actions
>
> Hi,
>
> oh, I am afraid I recently added something like that for the 2.0.1 
> release, not yet included in the 2.0.0 release. This does not mean 
> that I would not include the action in UIMA Ruta ;-)
>
> Here is the excerpt from the documentation:
>
> <section id="ugr.tools.ruta.language.regexprule">
>       <title>Simple Rules based on Regular Expressions</title>
>       <para>
>         The Ruta language includes, in addition to the normal rules, a 
> simplified rule syntax for processing regular expressions.
>         These simple rules consist of two parts separated by
> <quote>-></quote>: The left part is the regular expression
>         (flags: DOTALL and MULTILINE), which may contain capturing groups. 
> The right part defines, which kind of annotations
>         should be created for each match of the regular expression. If a type 
> is given without a group index, then an annotation of that type is
>         created for the complete regular expression match, which corresponds 
> to group 0. These simple rules can be restricted to match only within
>         certain annotations using the BLOCK construct, and ignore all 
> filtering settings.
>       </para>
>
>       <programlisting><![CDATA[
> RegExpRule      -> StringExpression "->" GroupAssignment
>                     ("," GroupAssignment)* ";"
> GroupAssignment -> TypeExpression | NumberExpression "=" TypeExpression ]]></programlisting>
>
>       <para>
>         The following example contains a simple rule, which is able to create 
> annotations of two different types. It creates an annotation
>         of the type <quote>T1</quote> for each match of the complete regular 
> expression and an annotation
>         of the type <quote>T2</quote> for each match of the first capturing 
> group.
>       </para>
>
>       <programlisting><![CDATA["A(.*?)C" -> T1, 1 = T2;]]></programlisting>
>
>
>     </section>
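
As a rough illustration of the example rule in the excerpt above, the following plain-Java sketch lists the spans one would expect it to annotate: T1 for the whole match (group 0) and T2 for the first capturing group. This is only an approximation with java.util.regex under the flags named in the documentation; the class and method names are invented, and the actual Ruta implementation may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of UIMA Ruta.
class RegExpRuleSketch {

    // Returns the spans that the rule  "A(.*?)C" -> T1, 1 = T2;
    // would be expected to cover, as half-open [begin,end) offsets.
    static List<String> spans(String text) {
        Pattern pattern = Pattern.compile("A(.*?)C", Pattern.DOTALL | Pattern.MULTILINE);
        Matcher matcher = pattern.matcher(text);
        List<String> result = new ArrayList<>();
        while (matcher.find()) {
            // Group 0: the complete match, annotated as T1.
            result.add("T1[" + matcher.start() + "," + matcher.end() + ")");
            // Group 1: the first capturing group, annotated as T2.
            result.add("T2[" + matcher.start(1) + "," + matcher.end(1) + ")");
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(spans("xxABCxxAZZCx"));
    }
}
```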
>
>
>
>
> On 02.05.2013 19:06, William Karl Thompson wrote:
>> I forgot to mention, the numeric argument in the proposed MARKREGEXP action 
>> indicates which capturing group is to be used from regular expression to 
>> generate the region for the annotation of the specified type.
>>
>> -----Original Message-----
>> From: William Karl Thompson
>> Sent: Thursday, May 02, 2013 12:02 PM
>> To: [email protected]
>> Subject: RE: Extending TextMarker with new actions
>>
>> Peter,
>>
>> Thanks for helping me to get going on this, it now works like a charm! Have 
>> been able to generate extensions and have them be recognized by the Eclipse 
>> IDE as per your instructions. Very nice!
>>
>> In the process of doing this, I do have an idea for a possibly useful action 
>> to be added to the current set. The basic idea is to implement functionality 
>> similar to that found in the RegularExpressionAnnotator that is one of the 
>> UIMA addons:
>>
>> http://uima.apache.org/sandbox.html#regex.annotator
>>
>> This allows you to define a set of regular expression matches, and to mark 
>> an annotation on the region covered by the match, restricted if desired by a 
>> capturing group within the regular expression. The way I implemented it 
>> experimentally was like the following:
>>
>> Sentence{->MARKREGEXP(TypeExpression, "(?i)(ascending colon) polyps", 1)};
>> NP{PARTOF(FindingsSection)->MARKREGEXP(TypeExpression, "(?i)tubular adenoma", 0)};
>>
>> The key thing is that the regular expression matching is using the 
>> equivalent of java.util.regex.Matcher.find(), unlike the current 
>> implementation of the REGEXP condition, which uses matches():
>>
>> http://docs.oracle.com/javase/1.5.0/docs/api/java/util/regex/Matcher.html#find()
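
The difference between the two java.util.regex.Matcher methods can be sketched as follows: matches() succeeds only if the entire input matches the pattern, whereas find() locates the pattern anywhere in the input. The class and helper names below are invented for illustration, and nothing here is taken from the TextMarker sources.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration of Matcher.matches() versus Matcher.find(); names are hypothetical.
class FindVersusMatches {

    // True only if the whole text matches the regex.
    static boolean wholeInputMatches(String regex, String text) {
        return Pattern.compile(regex).matcher(text).matches();
    }

    // Text of the given group for the first occurrence found anywhere
    // in the text, or null if the regex does not occur at all.
    static String firstFind(String regex, String text, int group) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        return matcher.find() ? matcher.group(group) : null;
    }

    public static void main(String[] args) {
        String regex = "(?i)(ascending colon) polyps";
        String sentence = "Multiple Ascending Colon polyps were noted.";
        System.out.println(wholeInputMatches(regex, sentence));
        System.out.println(firstFind(regex, sentence, 1));
    }
}
```

With a sentence-sized input, matches() would only succeed if the pattern were padded with something like .* on both sides, which is why find() seems the better fit for marking a region inside a larger annotation.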
>>
>> Anyway, thanks again for your help getting this all working.
>>
>> Cheers,
>>
>> Will
>>
>> ________________________________________
>> From: Peter Klügl [[email protected]]
>> Sent: Monday, April 29, 2013 4:20 PM
>> To: [email protected]
>> Subject: Re: Extending TextMarker with new actions
>>
>> Hi,
>>
>> On 29.04.2013 20:22, William Karl Thompson wrote:
>>> Hi Peter,
>>>
>>> I've updated and built the TextMarker projects, but now I'm spinning my 
>>> wheels a bit trying to install the updated TextMarker Workbench feature 
>>> from the projects. Could you give me a tip on how to do that? This isn't 
>>> something I've ever done before, and I'm not having much success at the 
>>> moment.
>> There are different ways. You could either just build the jars and put them 
>> in the dropins folder of your eclipse installation (with no textmarker 
>> installed) - not really recommended. Or, you could build the update site, 
>> which can be used to install the feature and plugins. The pom of the update 
>> site project (was textmarker-eclipse-update-site) has two important 
>> properties: item-maven-release-version and item-eclipse-release-version. If 
>> you want to build an update site using the SNAPSHOT artifacts, then you need 
>> to adapt these values, e.g., to 2.0.1-SNAPSHOT and 2.0.1.SNAPSHOT. The 
>> normal process is to install everything and then package the update site.
>>
>> You also have to include your extensions somehow, e.g., by extending the 
>> update site (and feature) or by copying the built plugin to the dropins 
>> folder.
>>
>> When I try new stuff, I always start an Eclipse Application using my 
>> textmarker workspace. Here, no installation is needed. I could also build a 
>> textmarker update site with the fixed extensions for you, but unfortunately 
>> not before Thursday.
>>
>> I am currently in the process of renaming all textmarker projects (the new 
>> name is UIMA Ruta). You have to be careful which revision you are using to 
>> build the projects right now, because I wasn't able to finish the renaming 
>> today, and I haven't tested the new update site yet. The renaming started 
>> with revision 1477012. Sorry for the bad timing.
>>
>> Best,
>>
>> Peter
>>
>>
>>> Many thanks,
>>>
>>> Will
>>>
>>> -----Original Message-----
>>> From: William Karl Thompson
>>> Sent: Friday, April 26, 2013 3:40 PM
>>> To: [email protected]
>>> Subject: RE: Extending TextMarker with new actions
>>>
>>> Hi Peter,
>>>
>>> Thanks very much, I will try this out!
>>>
>>> Best,
>>>
>>> Will
>>>
>>> -----Original Message-----
>>> From: Peter Klügl [mailto:[email protected]]
>>> Sent: Friday, April 26, 2013 4:30 AM
>>> To: [email protected]
>>> Subject: Re: Extending TextMarker with new actions
>>>
>>> Hi,
>>>
>>> On 25.04.2013 19:16, William Karl Thompson wrote:
>>>> Hi Peter,
>>>>
>>>>     Many thanks! I was just about to try it out before reading your latest 
>>>> email. Should I check out the latest trunk version from the svn repository 
>>>> tomorrow?
>>> I fixed most problems and committed the changes together with two 
>>> example projects (in
>>> https://svn.apache.org/repos/asf/uima/sandbox/textmarker/trunk/example-projects):
>>>
>>> textmarker-ep-example-extensions contains two parts: the implementation of 
>>> an action (ExampleAction) and the integration in the IDE. That is 
>>> why it is a Maven eclipse-plugin project.
>>>
>>> ExtensionsExample is a simple textmarker project, which uses the extension.
>>>
>>> The syntax check in the Workbench is not yet correctly integrated. It will 
>>> take a while until I will be able to write the documentation for the 
>>> extensions. Just let me know, if any problems occur.
>>>
>>> Best,
>>>
>>> Peter
>>>
>>> Btw: I am also involved in a project about information extraction in 
>>> clinical texts. That's a quite active area ;-)
>>>
>>>> In terms of feature requests, I appreciate your willingness to consider 
>>>> extensions. My strategy will be to try accomplishing a few tasks first, to 
>>>> see what can be abstracted that is of sufficient generality. As background 
>>>> info, I am creating some NLP applications for clinical text using cTAKES, 
>>>> and I think TextMarker is a nice option to have for rule-based 
>>>> alternatives to certain tasks (like relating two annotations to each 
>>>> other, DiseaseDisorder and AnatomicalLocation in the same sentence). The 
>>>> current cTAKES relation extractor is based on machine learning, and 
>>>> requires an annotated corpus for training, whereas sometimes it's just 
>>>> easier to create a set of rules.
>>>>
>>>> Cheers,
>>>>
>>>> Will
>>>>
>>>> -----Original Message-----
>>>> From: Peter Klügl [mailto:[email protected]]
>>>> Sent: Thursday, April 25, 2013 10:49 AM
>>>> To: [email protected]
>>>> Subject: Re: Extending TextMarker with new actions
>>>>
>>>> Hi,
>>>>
>>>> I checked the language extensions and unfortunately they do not work right 
>>>> now. There are some small bugs, but they will be fixed tomorrow.
>>>>
>>>> Best,
>>>>
>>>> Peter
>>>>
>>>> On 25.04.2013 11:37, Peter Klügl wrote:
>>>>> Hi,
>>>>>
>>>>> On 25.04.2013 03:29, William Karl Thompson wrote:
>>>>>> Hello,
>>>>>>
>>>>>> (My apologies, I mistakenly sent this to the dev list initially)
>>>>>>
>>>>>> I'm very interested in using the TextMarker project, but the 
>>>>>> current set of action types doesn't quite do what I need. I found 
>>>>>> references to an extension mechanism, and have also found the 
>>>>>> ITextMarkerActionExtension interface in the source code. I also 
>>>>>> found the antlr grammar and lexer files where the TextMarker 
>>>>>> language is defined, which appears to be where new action type 
>>>>>> names are to be added. So I surmise the steps to add new actions 
>>>>>> are to
>>>>>>
>>>>>>
>>>>>> 1.       Add the desired action signature to the antlr grammar
>>>>>>
>>>>>> 2.       Define an implementation of ITextMarkerActionExtension that
>>>>>> implements the functionality.
>>>>>>
>>>>>> Is there an easier way to do this? My concern is that I need to 
>>>>>> modify TextMarker source files (the grammar and lexer files), 
>>>>>> which would be overwritten on any updated version of TextMarker.
>>>>> This should be possible without changing any textmarker code.
>>>>>
>>>>> There is a generic parsing rule in the grammar, which creates an 
>>>>> external action using the set of ITextMarkerExtension mentioned in 
>>>>> the descriptor (parameter: additionalExtensions). There is no 
>>>>> default syntax check since the possible arguments are of course 
>>>>> not yet known by the engine. Syntax checks need to be implemented 
>>>>> in the ITextMarkerActionExtension.createAction(), which throws an 
>>>>> ANTLRException. The arguments of the action are delegated to this 
>>>>> method, which returns the action implementation, so there will 
>>>>> probably be many casts and "if instanceof" checks. Language 
>>>>> constructs like assignments ("feature" = Type) known by the CREATE 
>>>>> action, are not yet supported.
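
The kind of manual argument checking described above might look roughly like the sketch below. The signature of the real ITextMarkerActionExtension.createAction() is not shown in this thread, so every type here is a stand-in invented for illustration, and IllegalArgumentException is used in place of ANTLRException only to keep the example self-contained.

```java
import java.util.Arrays;
import java.util.List;

// Stand-in types for illustration; the real TextMarker/Ruta interfaces differ.
class ExternalActionSketch {

    // Hypothetical parsed action: annotate the given group of a regex with a type.
    static final class MarkRegExpAction {
        final String typeName;
        final String regex;
        final int group;

        MarkRegExpAction(String typeName, String regex, int group) {
            this.typeName = typeName;
            this.regex = regex;
            this.group = group;
        }
    }

    // The engine cannot check the arguments of an external action generically,
    // so the extension validates count and types itself before casting.
    static MarkRegExpAction createAction(String name, List<Object> args) {
        if (!"MARKREGEXP".equals(name)) {
            throw new IllegalArgumentException("Unknown action: " + name);
        }
        if (args.size() != 3) {
            throw new IllegalArgumentException("MARKREGEXP expects 3 arguments, got " + args.size());
        }
        Object type = args.get(0);
        Object regex = args.get(1);
        Object group = args.get(2);
        if (!(type instanceof String)) {
            throw new IllegalArgumentException("Argument 1 must be a type name");
        }
        if (!(regex instanceof String)) {
            throw new IllegalArgumentException("Argument 2 must be a regex string");
        }
        if (!(group instanceof Integer)) {
            throw new IllegalArgumentException("Argument 3 must be a group index");
        }
        return new MarkRegExpAction((String) type, (String) regex, (Integer) group);
    }

    public static void main(String[] args) {
        MarkRegExpAction action = createAction("MARKREGEXP",
                Arrays.<Object>asList("FooType", "(?i)tubular adenoma", 0));
        System.out.println(action.typeName + " " + action.regex + " " + action.group);
    }
}
```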
>>>>>
>>>>> Unfortunately, there is no automatic integration in the workbench yet.
>>>>> You have to modify the BasicEngine (add the extension) in the 
>>>>> textmarker project yourself. The implementation of the extension 
>>>>> of course also needs to be available to the workbench.
>>>>>
>>>>> I haven't used the language extensions since 2009 (it was a WordNet 
>>>>> integration) and they are not yet covered by unit tests. So, there 
>>>>> may be some bugs due to the changes after the contribution to 
>>>>> Apache UIMA. However, I will check the functionality, add a test 
>>>>> case and extend the documentation.
>>>>>
>>>>> Concerning the list of available actions: You are of course also 
>>>>> welcome to create feature requests for new actions. The current 
>>>>> set of actions is mainly based on my own requirements and I will 
>>>>> gladly add new reasonable/generic actions (within the limits of my 
>>>>> available time).
>>>>>
>>>>> Best,
>>>>>
>>>>> Peter
>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> Will Thompson
>>>>>>
