Hi, feature assignments for simple regexp rules are now part of the trunk. RegExpRuleTest.ruta contains an example and the documentation describes the exact syntax.
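For illustration, here is a minimal sketch in plain Java of what a rule of the form proposed in the quoted discussion below, `"(regexp) (string)" -> Type1, 1 = Type2 ("feat" = 2);`, roughly amounts to. It is not the Ruta implementation, and the committed syntax may differ from the proposal (the documentation is authoritative). The names `RegExpRuleSketch`, `Span` and `apply`, and the type names `T1`/`T2`, are assumptions made up for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: RegExpRuleSketch, Span and apply are hypothetical names,
// not part of the UIMA Ruta API.
final class RegExpRuleSketch {

    // Stand-in for an annotation: a type name, a character span, and one
    // optional string feature value (null when no feature is assigned).
    static final class Span {
        final String type;
        final int begin;
        final int end;
        final String feature;

        Span(String type, int begin, int end, String feature) {
            this.type = type;
            this.begin = begin;
            this.end = end;
            this.feature = feature;
        }

        @Override
        public String toString() {
            return type + "[" + begin + "," + end + "] feat=" + feature;
        }
    }

    // Emulates: "<regex>" -> T1, 1 = T2 ("feat" = 2);
    // The regex must define at least two capturing groups.
    static List<Span> apply(String regex, String text) {
        // The quoted documentation names DOTALL and MULTILINE as the active flags.
        Pattern pattern = Pattern.compile(regex, Pattern.DOTALL | Pattern.MULTILINE);
        Matcher m = pattern.matcher(text);
        List<Span> spans = new ArrayList<>();
        while (m.find()) {
            // A type without a group index covers the whole match (group 0).
            spans.add(new Span("T1", m.start(), m.end(), null));
            // "1 = T2" covers group 1; its feature takes the text of group 2.
            if (m.start(1) >= 0) {
                spans.add(new Span("T2", m.start(1), m.end(1), m.group(2)));
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        for (Span s : apply("A(.*?)C (\\w+)", "xx AbbC foo yy AcC bar")) {
            System.out.println(s);
        }
    }
}
```

The feature value is kept as a string here, which mirrors the remark in the thread that numeric feature values would not work without an auto-cast.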
Best,

Peter

On 08.05.2013 17:56, William Karl Thompson wrote:
> Many thanks!
>
> -----Original Message-----
> From: Peter Klügl [mailto:[email protected]]
> Sent: Wednesday, May 08, 2013 2:57 AM
> To: [email protected]
> Subject: Re: Extending TextMarker with new actions
>
> Hi,
>
> I will create a feature request and see when I find the time to implement it.
>
> Best,
>
> Peter
>
> On 08.05.2013 04:38, William Karl Thompson wrote:
>> Hi Peter,
>>
>> What you proposed would work fine for what I was trying to do!
>>
>> Cheers,
>>
>> Will
>>
>> -----Original Message-----
>> From: Peter Klügl [mailto:[email protected]]
>> Sent: Tuesday, May 07, 2013 3:42 AM
>> To: [email protected]
>> Subject: Re: Extending TextMarker with new actions
>>
>> Hi,
>>
>> On 06.05.2013 18:26, William Karl Thompson wrote:
>>> Hi Peter,
>>>
>>> I like the simplified regular expression rule syntax -- very handy. It's almost exactly what I wanted. However, one thing I'm wondering is how to create an annotation with features using such rules. I have in mind something like the following:
>>>
>>> "(regex string)" -> 1 = CREATE(FooType, "feat" = "bar");
>>>
>>> Here's a possible variant of the above that I can imagine would be useful too:
>>>
>>> "(regex) (string)" -> CREATE(FooType, "feat1" = GROUP(1), "feat2"=GROUP(2));
>>>
>>> What are your thoughts on this?
>> I think I won't be able to use the existing code of the CREATE action for this and it will also be problematic in the grammar without creating a new context.
>>
>> What about something like:
>>
>> "(regexp) (string)" -> Type1, 1 = Type2 ("feat" = 2);
>>
>> This will of course not work with numeric feature values, but there isn't an auto-cast anyway...
>>
>> Best,
>>
>> Peter
>>
>>
>>> Cheers,
>>>
>>> Will
>>>
>>> -----Original Message-----
>>> From: William Karl Thompson
>>> Sent: Thursday, May 02, 2013 1:49 PM
>>> To: [email protected]
>>> Subject: RE: Extending TextMarker with new actions
>>>
>>> Many thanks, I will try it out.
>>>
>>> -----Original Message-----
>>> From: Peter Klügl [mailto:[email protected]]
>>> Sent: Thursday, May 02, 2013 12:42 PM
>>> To: [email protected]
>>> Subject: Re: Extending TextMarker with new actions
>>>
>>> On 02.05.2013 19:16, William Karl Thompson wrote:
>>>> I see you're way ahead of me! I'll take a look at this -- is it in the latest on trunk?
>>> Yes, and there is also a unit test (if you are interested in some ready-to-work examples): org.apache.uima.ruta.RegExpRuleTest.java(.ruta, .txt)
>>>
>>> Peter
>>>
>>>> -----Original Message-----
>>>> From: Peter Klügl [mailto:[email protected]]
>>>> Sent: Thursday, May 02, 2013 12:14 PM
>>>> To: [email protected]
>>>> Subject: Re: Extending TextMarker with new actions
>>>>
>>>> Hi,
>>>>
>>>> oh, I am afraid I recently added something like that for the 2.0.1 release, not yet included in the 2.0.0 release. This does not mean that I would not include the action in UIMA Ruta ;-)
>>>>
>>>> Here is the excerpt of the documentation:
>>>>
>>>> <section id="ugr.tools.ruta.language.regexprule">
>>>> <title>Simple Rules based on Regular Expressions</title>
>>>> <para>
>>>> In addition to the normal rules, the Ruta language includes a simplified rule syntax for processing regular expressions. These simple rules consist of two parts separated by <quote>-></quote>: The left part is the regular expression (flags: DOTALL and MULTILINE), which may contain capturing groups. The right part defines which kinds of annotations should be created for each match of the regular expression.
>>>> If a type is given without a group index, then an annotation of that type is created for the complete regular expression match, which corresponds to group 0. These simple rules can be restricted to match only within certain annotations using the BLOCK construct, and ignore all filtering settings.
>>>> </para>
>>>>
>>>> <programlisting><![CDATA[
>>>> RegExpRule -> StringExpression "->" GroupAssignment ("," GroupAssignment)* ";"
>>>> GroupAssignment -> TypeExpression | NumberExpression "=" TypeExpression ]]></programlisting>
>>>>
>>>> <para>
>>>> The following example contains a simple rule, which is able to create annotations of two different types. It creates an annotation of the type <quote>T1</quote> for each match of the complete regular expression and an annotation of the type <quote>T2</quote> for each match of the first capturing group.
>>>> </para>
>>>>
>>>> <programlisting><![CDATA["A(.*?)C" -> T1, 1 = T2;]]></programlisting>
>>>>
>>>>
>>>> </section>
>>>>
>>>>
>>>> On 02.05.2013 19:06, William Karl Thompson wrote:
>>>>> I forgot to mention, the numeric argument in the proposed MARKREGEXP action indicates which capturing group from the regular expression is to be used to generate the region for the annotation of the specified type.
>>>>>
>>>>> -----Original Message-----
>>>>> From: William Karl Thompson
>>>>> Sent: Thursday, May 02, 2013 12:02 PM
>>>>> To: [email protected]
>>>>> Subject: RE: Extending TextMarker with new actions
>>>>>
>>>>> Peter,
>>>>>
>>>>> Thanks for helping me to get going on this, it now works like a charm! I have been able to generate extensions and have them be recognized by the Eclipse IDE as per your instructions. Very nice!
>>>>>
>>>>> In the process of doing this, I do have an idea for a possibly useful action to be added to the current set.
>>>>> The basic idea is to implement functionality similar to that found in the RegularExpressionAnnotator, which is one of the UIMA addons:
>>>>>
>>>>> http://uima.apache.org/sandbox.html#regex.annotator
>>>>>
>>>>> This allows you to define a set of regular expression matches, and to mark an annotation on the region covered by the match, restricted if desired by a capturing group within the regular expression. The way I implemented it experimentally was like the following:
>>>>>
>>>>> Sentence{->MARKREGEXP(TypeExpression, "(?i)(ascending colon) polyps", 1)};
>>>>> NP{PARTOF(FindingsSection)->MARKREGEXP(TypeExpression, "(?i)tubular adenoma", 0)};
>>>>>
>>>>> The key thing is that the regular expression matching is using the equivalent of java.util.regex.Matcher.find(), unlike the current implementation of the REGEXP condition, which uses matches():
>>>>>
>>>>> http://docs.oracle.com/javase/1.5.0/docs/api/java/util/regex/Matcher.html#find()
>>>>>
>>>>> Anyway, thanks again for your help getting this all working.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Will
>>>>>
>>>>> ________________________________________
>>>>> From: Peter Klügl [[email protected]]
>>>>> Sent: Monday, April 29, 2013 4:20 PM
>>>>> To: [email protected]
>>>>> Subject: Re: Extending TextMarker with new actions
>>>>>
>>>>> Hi,
>>>>>
>>>>> On 29.04.2013 20:22, William Karl Thompson wrote:
>>>>>> Hi Peter,
>>>>>>
>>>>>> I've updated and built the TextMarker projects, but now I'm spinning my wheels a bit trying to install the updated TextMarker Workbench feature from the projects. Could you give me a tip on how to do that? This isn't something I've ever done before, and I'm not having much success at the moment.
>>>>> There are different ways. You could either just build the jars and put them in the dropins folder of your eclipse installation (with no textmarker installed) - not really recommended.
>>>>> Or, you could build the update site, which can be used to install the feature and plugins. The pom of the update site project (was textmarker-eclipse-update-site) has two important properties: item-maven-release-version and item-eclipse-release-version. If you want to build an update site using the SNAPSHOT artifacts, then you need to adapt these values, e.g., to 2.0.1-SNAPSHOT and 2.0.1.SNAPSHOT. The normal process is to install everything and then package the update site.
>>>>>
>>>>> You also have to include your extensions somehow, e.g., by extending the update site (and feature) or by copying the built plugin to the dropins folder.
>>>>>
>>>>> When I try new stuff, I always start an Eclipse Application using my textmarker workspace. Here, no installation is needed. I could also build a textmarker update site with the fixed extensions for you, but unfortunately not before Thursday.
>>>>>
>>>>> I am currently in the process of renaming all textmarker projects (the new name is UIMA Ruta). You have to be careful which revision you are using to build the projects right now, because I wasn't able to finish the renaming today, and I haven't tested the new update site yet. The renaming started with revision 1477012. Sorry for the bad timing.
>>>>>
>>>>> Best,
>>>>>
>>>>> Peter
>>>>>
>>>>>
>>>>>> Many thanks,
>>>>>>
>>>>>> Will
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: William Karl Thompson
>>>>>> Sent: Friday, April 26, 2013 3:40 PM
>>>>>> To: [email protected]
>>>>>> Subject: RE: Extending TextMarker with new actions
>>>>>>
>>>>>> Hi Peter,
>>>>>>
>>>>>> Thanks very much, I will try this out!
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Will
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Peter Klügl [mailto:[email protected]]
>>>>>> Sent: Friday, April 26, 2013 4:30 AM
>>>>>> To: [email protected]
>>>>>> Subject: Re: Extending TextMarker with new actions
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> On 25.04.2013 19:16, William Karl Thompson wrote:
>>>>>>> Hi Peter,
>>>>>>>
>>>>>>> Many thanks! I was just about to try it out before reading your latest email. Should I check out the latest trunk version from the svn repository tomorrow?
>>>>>> I fixed most problems and committed the changes together with two example projects (in https://svn.apache.org/repos/asf/uima/sandbox/textmarker/trunk/example-projects):
>>>>>>
>>>>>> textmarker-ep-example-extensions contains two parts: the implementation of an action (ExampleAction) and the integration in the IDE. That's the reason why it is a Maven eclipse-plugin project.
>>>>>>
>>>>>> ExtensionsExample is a simple textmarker project, which uses the extension.
>>>>>>
>>>>>> The syntax check in the Workbench is not yet correctly integrated. It will take a while until I am able to write the documentation for the extensions. Just let me know if any problems occur.
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Peter
>>>>>>
>>>>>> Btw: I am also involved in a project about information extraction in clinical texts. That's a quite active area ;-)
>>>>>>
>>>>>>> In terms of feature requests, I appreciate your willingness to consider extensions. My strategy will be to try accomplishing a few tasks first, to see what can be abstracted that is of sufficient generality.
>>>>>>> As background info, I am creating some NLP applications for clinical text using cTAKES, and I think TextMarker is a nice option to have for rule-based alternatives to certain tasks (like relating two annotations to each other, DiseaseDisorder and AnatomicalLocation in the same sentence). The current cTAKES relation extractor is based on machine learning, and requires an annotated corpus for training, whereas sometimes it's just easier to create a set of rules.
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Will
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Peter Klügl [mailto:[email protected]]
>>>>>>> Sent: Thursday, April 25, 2013 10:49 AM
>>>>>>> To: [email protected]
>>>>>>> Subject: Re: Extending TextMarker with new actions
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I checked the language extensions and unfortunately they do not work right now. There are some small bugs, but they will be fixed tomorrow.
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> Peter
>>>>>>>
>>>>>>> On 25.04.2013 11:37, Peter Klügl wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> On 25.04.2013 03:29, William Karl Thompson wrote:
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> (My apologies, I mistakenly sent this to the dev list initially)
>>>>>>>>>
>>>>>>>>> I'm very interested in using the TextMarker project, but the current set of action types doesn't quite do what I need. I found references to an extension mechanism, and have also found the ITextMarkerActionExtension interface in the source code. I also found the antlr grammar and lexer files where the TextMarker language is defined, which appears to be where new action type names are to be added. So I surmise the steps to add new actions are to
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 1. Add the desired action signature to the antlr grammar
>>>>>>>>>
>>>>>>>>> 2. Define an implementation of ITextMarkerActionExtension that implements the functionality.
>>>>>>>>>
>>>>>>>>> Is there an easier way to do this? My concern is that I need to modify TextMarker source files (the grammar and lexer files), which would be overwritten on any updated version of TextMarker.
>>>>>>>> This should be possible without changing any textmarker code.
>>>>>>>>
>>>>>>>> There is a generic parsing rule in the grammar, which creates an external action using the set of ITextMarkerExtension mentioned in the descriptor (parameter: additionalExtensions). There is no default syntax check since the possible arguments are of course not yet known by the engine. Syntax checks need to be implemented in the ITextMarkerActionExtension.createAction(), which throws an ANTLRException. The arguments of the action are delegated to this method, which returns the action implementation, so there will probably be many casts and "if instanceof" checks. Language constructs like assignments ("feature" = Type) known by the CREATE action, are not yet supported.
>>>>>>>>
>>>>>>>> Unfortunately, there is no automatic integration in the workbench yet. You have to modify the BasicEngine (add the extension) in the textmarker project yourself. The implementation of the extension of course then also needs to be available to the workbench.
>>>>>>>>
>>>>>>>> I haven't used the language extensions since 2009 (it was a wordnet integration) and they are not yet covered by unit tests. So, there may be some bugs due to the changes after the contribution to Apache UIMA. However, I will check the functionality, add a test case and extend the documentation.
>>>>>>>>
>>>>>>>> Concerning the list of available actions: You are of course also welcome to create feature requests for new actions.
>>>>>>>> The current set of actions is mainly based on my own requirements and I will gladly add new reasonable/generic actions (within the limits of my available time).
>>>>>>>>
>>>>>>>> Best,
>>>>>>>>
>>>>>>>> Peter
>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>>
>>>>>>>>> Will Thompson
>>>>>>>>>
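
The find() versus matches() distinction raised in the MARKREGEXP proposal in this thread can be sketched in plain Java as follows. This only demonstrates the java.util.regex behaviour; it says nothing about how the REGEXP condition or the regexp rules are implemented in Ruta. The class and method names (`FindVersusMatches`, `wholeInputMatches`, `firstFind`) and the sample sentence are assumptions made up for this sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: hypothetical helper names, not part of UIMA Ruta.
final class FindVersusMatches {

    // matches() succeeds only if the regex covers the entire input.
    static boolean wholeInputMatches(String regex, String text) {
        return Pattern.compile(regex).matcher(text).matches();
    }

    // find() scans for the next occurrence anywhere in the input.
    // Returns the text of the requested group of the first hit, or null if there is none.
    static String firstFind(String regex, String text, int group) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        String regex = "(?i)(ascending colon) polyps";
        String sentence = "Two Ascending Colon polyps were removed.";
        System.out.println(wholeInputMatches(regex, sentence));
        System.out.println(firstFind(regex, sentence, 0));
        System.out.println(firstFind(regex, sentence, 1));
    }
}
```

With a sentence-sized input, the whole-input check fails while the scanning variant still locates the phrase, and the group index selects which part of the hit would become the annotated region.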
