Hi Peter, What you proposed would work fine for what I was trying to do!
Cheers,
Will

-----Original Message-----
From: Peter Klügl [mailto:[email protected]]
Sent: Tuesday, May 07, 2013 3:42 AM
To: [email protected]
Subject: Re: Extending TextMarker with new actions

Hi,

On 06.05.2013 18:26, William Karl Thompson wrote:
> Hi Peter,
>
> I like the simplified regular expression rule syntax -- very handy. It's
> almost exactly what I wanted. However, one thing I'm wondering is how to
> create an annotation with features using such rules. I have in mind something
> like the following:
>
> "(regex string)" -> 1 = CREATE(FooType, "feat" = "bar");
>
> Here's a possible variant of the above that I can imagine would be useful
> too:
>
> "(regex) (string)" -> CREATE(FooType, "feat1" = GROUP(1),
> "feat2"=GROUP(2));
>
> What are your thoughts on this?

I think I won't be able to use the existing code of the CREATE action for this, and it will also be problematic in the grammar without creating a new context. What about something like:

"(regexp) (string)" -> Type1, 1 = Type2 ("feat" = 2);

This will of course not work with numeric feature values, but there isn't an auto-cast anyway...

Best,

Peter

>
> Cheers,
>
> Will
>
> -----Original Message-----
> From: William Karl Thompson
> Sent: Thursday, May 02, 2013 1:49 PM
> To: [email protected]
> Subject: RE: Extending TextMarker with new actions
>
> Many thanks, I will try it.
>
> -----Original Message-----
> From: Peter Klügl [mailto:[email protected]]
> Sent: Thursday, May 02, 2013 12:42 PM
> To: [email protected]
> Subject: Re: Extending TextMarker with new actions
>
> On 02.05.2013 19:16, William Karl Thompson wrote:
>> I see you're way ahead of me! I'll take a look at this -- is it in the
>> latest on trunk?
> Yes, and there is also a unit test (if you are interested in some
> ready-to-work examples):
> org.apache.uima.ruta.RegExpRuleTest.java(.ruta, .txt)
>
> Peter
>
>> -----Original Message-----
>> From: Peter Klügl [mailto:[email protected]]
>> Sent: Thursday, May 02, 2013 12:14 PM
>> To: [email protected]
>> Subject: Re: Extending TextMarker with new actions
>>
>> Hi,
>>
>> oh, I am afraid I recently added something like that for the 2.0.1
>> release, not yet included in the 2.0.0 release. This does not mean
>> that I would not include the action in UIMA Ruta ;-)
>>
>> Here is the excerpt of the documentation:
>>
>> <section id="ugr.tools.ruta.language.regexprule">
>> <title>Simple Rules based on Regular Expressions</title>
>> <para>
>> The Ruta language includes, additionally to the normal rules, a
>> simplified rule syntax for processing regular expressions.
>> These simple rules consist of two parts separated by
>> <quote>-></quote>: The left part is the regular expression
>> (flags: DOTALL and MULTILINE), which may contain capturing groups.
>> The right part defines which kind of annotations
>> should be created for each match of the regular expression. If a
>> type is given without a group index, then an annotation of that type is
>> created for the complete regular expression match, which corresponds
>> to group 0. These simple rules can be restricted to match only within
>> certain annotations using the BLOCK construct, and ignore all
>> filtering settings.
>> </para>
>>
>> <programlisting><![CDATA[
>> RegExpRule -> StringExpression "->" GroupAssignment
>> ("," GroupAssignment)* ";"
>> GroupAssignment -> TypeExpression | NumberExpression "="
>> TypeExpression ]]></programlisting>
>>
>> <para>
>> The following example contains a simple rule, which is able to
>> create annotations of two different types. It creates an annotation
>> of the type <quote>T1</quote> for each match of the complete regular
>> expression and an annotation
>> of the type <quote>T2</quote> for each match of the first capturing
>> group.
>> </para>
>>
>> <programlisting><![CDATA["A(.*?)C" -> T1, 1 = T2;]]></programlisting>
>>
>>
>> </section>
>>
>>
>>
>>
>> On 02.05.2013 19:06, William Karl Thompson wrote:
>>> I forgot to mention, the numeric argument in the proposed MARKREGEXP action
>>> indicates which capturing group of the regular expression is to be used to
>>> generate the region for the annotation of the specified type.
>>>
>>> -----Original Message-----
>>> From: William Karl Thompson
>>> Sent: Thursday, May 02, 2013 12:02 PM
>>> To: [email protected]
>>> Subject: RE: Extending TextMarker with new actions
>>>
>>> Peter,
>>>
>>> Thanks for helping me to get going on this, it now works like a charm! I have
>>> been able to generate extensions and have them be recognized by the Eclipse
>>> IDE as per your instructions. Very nice!
>>>
>>> In the process of doing this, I do have an idea for a possibly useful
>>> action to be added to the current set. The basic idea is to implement
>>> functionality similar to that found in the RegularExpressionAnnotator that
>>> is one of the UIMA addons:
>>>
>>> http://uima.apache.org/sandbox.html#regex.annotator
>>>
>>> This allows you to define a set of regular expression matches, and to mark
>>> an annotation on the region covered by the match, restricted if desired by
>>> a capturing group within the regular expression. The way I implemented it
>>> experimentally was like the following:
>>>
>>> Sentence{->MARKREGEXP(TypeExpression, "(?i)(ascending colon) polyps", 1)};
>>> NP{PARTOF(FindingsSection)->MARKREGEXP(TypeExpression, "(?i)tubular adenoma", 0)};
>>>
>>> The key thing is that the regular expression matching is using the
>>> equivalent of java.util.regex.Matcher.find(), unlike the current
>>> implementation of the REGEXP condition, which uses matches():
>>>
>>> http://docs.oracle.com/javase/1.5.0/docs/api/java/util/regex/Matcher.html#find()
>>>
>>> Anyway, thanks again for your help getting this all working.
>>>
>>> Cheers,
>>>
>>> Will
>>>
>>> ________________________________________
>>> From: Peter Klügl [[email protected]]
>>> Sent: Monday, April 29, 2013 4:20 PM
>>> To: [email protected]
>>> Subject: Re: Extending TextMarker with new actions
>>>
>>> Hi,
>>>
>>> On 29.04.2013 20:22, William Karl Thompson wrote:
>>>> Hi Peter,
>>>>
>>>> I've updated and built the TextMarker projects, but now I'm spinning my
>>>> wheels a bit trying to install the updated TextMarker Workbench feature
>>>> from the projects. Could you give me a tip on how to do that? This isn't
>>>> something I've ever done before, and I'm not having much success at the
>>>> moment.
>>> There are different ways. You could either just build the jars and put them
>>> in the dropins folder of your eclipse installation (with no textmarker
>>> installed) - not really recommended. Or, you could build the update site,
>>> which can be used to install the feature and plugins. The pom of the update
>>> site project (was textmarker-eclipse-update-site) has two important
>>> properties: item-maven-release-version and item-eclipse-release-version. If
>>> you want to build an update site using the SNAPSHOT artifacts, then you
>>> need to adapt these values, e.g., to 2.0.1-SNAPSHOT and 2.0.1.SNAPSHOT. The
>>> normal process is to install everything and then package the update site.
>>>
>>> You also have to include your extensions somehow, e.g., by extending the
>>> update site (and feature) or by copying the built plugin to the dropins
>>> folder.
>>>
>>> When I try new stuff, I always start an Eclipse Application using my
>>> textmarker workspace. Here, no installation is needed. I could also build a
>>> textmarker update site with the fixed extensions for you, but unfortunately
>>> not before Thursday.
>>>
>>> I am currently in the process of renaming all textmarker projects (the new
>>> name is UIMA Ruta). You have to be careful which revision you are using to
>>> build the projects right now, because I wasn't able to finish the renaming
>>> today, and I haven't tested the new update site yet. The renaming started
>>> with revision 1477012. Sorry for the bad timing.
>>>
>>> Best,
>>>
>>> Peter
>>>
>>>
>>>> Many thanks,
>>>>
>>>> Will
>>>>
>>>> -----Original Message-----
>>>> From: William Karl Thompson
>>>> Sent: Friday, April 26, 2013 3:40 PM
>>>> To: [email protected]
>>>> Subject: RE: Extending TextMarker with new actions
>>>>
>>>> Hi Peter,
>>>>
>>>> Thanks very much, I will try this out!
>>>>
>>>> Best,
>>>>
>>>> Will
>>>>
>>>> -----Original Message-----
>>>> From: Peter Klügl [mailto:[email protected]]
>>>> Sent: Friday, April 26, 2013 4:30 AM
>>>> To: [email protected]
>>>> Subject: Re: Extending TextMarker with new actions
>>>>
>>>> Hi,
>>>>
>>>> On 25.04.2013 19:16, William Karl Thompson wrote:
>>>>> Hi Peter,
>>>>>
>>>>> Many thanks! I was just about to try it out before reading your
>>>>> latest email. Should I check out the latest trunk version from the svn
>>>>> repository tomorrow?
>>>> I fixed most problems and committed the changes together with two
>>>> example projects (in
>>>> https://svn.apache.org/repos/asf/uima/sandbox/textmarker/trunk/example-projects):
>>>>
>>>> textmarker-ep-example-extensions contains two parts: the implementation of
>>>> an action (ExampleAction) and the integration in the IDE. That's the
>>>> reason why it is a maven eclipse-plugin project.
>>>>
>>>> ExtensionsExample is a simple textmarker project, which uses the extension.
>>>>
>>>> The syntax check in the Workbench is not yet correctly integrated. It will
>>>> take a while until I am able to write the documentation for the
>>>> extensions. Just let me know if any problems occur.
>>>>
>>>> Best,
>>>>
>>>> Peter
>>>>
>>>> Btw: I am also involved in a project about information extraction
>>>> in clinical texts. That's quite an active area ;-)
>>>>
>>>>> In terms of feature requests, I appreciate your willingness to consider
>>>>> extensions. My strategy will be to try accomplishing a few tasks first,
>>>>> to see what can be abstracted that is of sufficient generality. As
>>>>> background info, I am creating some NLP applications for clinical text
>>>>> using cTAKES, and I think TextMarker is a nice option to have for
>>>>> rule-based alternatives to certain tasks (like relating two annotations
>>>>> to each other, DiseaseDisorder and AnatomicalLocation in the same
>>>>> sentence). The current cTAKES relation extractor is based on machine
>>>>> learning, and requires an annotated corpus for training, whereas
>>>>> sometimes it's just easier to create a set of rules.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Will
>>>>>
>>>>> -----Original Message-----
>>>>> From: Peter Klügl [mailto:[email protected]]
>>>>> Sent: Thursday, April 25, 2013 10:49 AM
>>>>> To: [email protected]
>>>>> Subject: Re: Extending TextMarker with new actions
>>>>>
>>>>> Hi,
>>>>>
>>>>> I checked the language extensions and unfortunately they do not work
>>>>> right now. There are some small bugs, but they will be fixed tomorrow.
>>>>>
>>>>> Best,
>>>>>
>>>>> Peter
>>>>>
>>>>> On 25.04.2013 11:37, Peter Klügl wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On 25.04.2013 03:29, William Karl Thompson wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> (My apologies, I mistakenly sent this to the dev list initially)
>>>>>>>
>>>>>>> I'm very interested in using the TextMarker project, but the
>>>>>>> current set of action types doesn't quite do what I need. I
>>>>>>> found references to an extension mechanism, and have also found the
>>>>>>> ITextMarkerActionExtension interface in the source code. I also
>>>>>>> found the antlr grammar and lexer files where the TextMarker
>>>>>>> language is defined, which appears to be where new action type
>>>>>>> names are to be added. So I surmise the steps to add new actions
>>>>>>> are to
>>>>>>>
>>>>>>>
>>>>>>> 1. Add the desired action signature to the antlr grammar
>>>>>>>
>>>>>>> 2. Define an implementation of ITextMarkerActionExtension that
>>>>>>> implements the functionality.
>>>>>>>
>>>>>>> Is there an easier way to do this? My concern is that I need to
>>>>>>> modify TextMarker source files (the grammar and lexer files),
>>>>>>> which would be overwritten on any updated version of TextMarker.
>>>>>> This should be possible without changing any textmarker code.
>>>>>>
>>>>>> There is a generic parsing rule in the grammar, which creates an
>>>>>> external action using the set of ITextMarkerExtension mentioned
>>>>>> in the descriptor (parameter: additionalExtensions). There is no
>>>>>> default syntax check since the possible arguments are of course
>>>>>> not yet known by the engine. Syntax checks need to be implemented
>>>>>> in ITextMarkerActionExtension.createAction(), which throws an
>>>>>> ANTLRException. The arguments of the action are delegated to this
>>>>>> method, which returns the action implementation, so there will
>>>>>> probably be many casts and "if instanceof" checks. Language
>>>>>> constructs like assignments ("feature" = Type) known by the
>>>>>> CREATE action are not yet supported.
>>>>>>
>>>>>> Unfortunately, there is no automatic integration in the workbench yet.
>>>>>> You have to modify the BasicEngine (add the extension) in the
>>>>>> textmarker project yourself. The implementation of the extension
>>>>>> of course then also needs to be available to the workbench.
>>>>>>
>>>>>> I haven't used the language extensions since 2009 (it was a
>>>>>> wordnet integration) and they are not yet covered by unit tests. So,
>>>>>> there are maybe some bugs due to the changes after the
>>>>>> contribution to Apache UIMA. However, I will check the
>>>>>> functionality, add a test case and extend the documentation.
>>>>>>
>>>>>> Concerning the list of available actions: You are of course also
>>>>>> welcome to create feature requests for new actions. The current
>>>>>> set of actions is mainly based on my own requirements and I will
>>>>>> gladly add new reasonable/generic actions (within the limits of my
>>>>>> available time).
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Peter
>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> Will Thompson
>>>>>>>
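
For readers following the thread's point about java.util.regex.Matcher.find() versus whole-input matching, here is a minimal Java sketch of the difference. It is not code from TextMarker/UIMA Ruta and not the MARKREGEXP implementation discussed above; the class name RegexFindDemo, the helper names groupSpans and matchesWhole, and the sample sentence are all hypothetical. It only shows how find() locates a pattern anywhere inside a covered text and how a capturing-group index (0 for the whole match, 1 for the first group) selects the region that an annotation could span.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of TextMarker / UIMA Ruta.
public class RegexFindDemo {

    // Collects {begin, end} offsets of the requested capturing group
    // for every occurrence located by Matcher.find().
    public static List<int[]> groupSpans(String regex, String text, int group) {
        Matcher m = Pattern.compile(regex).matcher(text);
        List<int[]> spans = new ArrayList<>();
        while (m.find()) {
            // start(group) is -1 when the group did not take part in the match
            if (m.start(group) >= 0) {
                spans.add(new int[] { m.start(group), m.end(group) });
            }
        }
        return spans;
    }

    // Matcher.matches() succeeds only if the entire input matches the pattern.
    public static boolean matchesWhole(String regex, String text) {
        return Pattern.compile(regex).matcher(text).matches();
    }

    public static void main(String[] args) {
        String sentence = "Two ascending colon polyps were removed.";
        String regex = "(?i)(ascending colon) polyps";

        // The pattern covers only part of the sentence, so matches() fails.
        System.out.println("matches(): " + matchesWhole(regex, sentence));

        // find() locates the occurrence; group 1 narrows the span.
        for (int[] span : groupSpans(regex, sentence, 1)) {
            System.out.println("group 1 span: " + span[0] + "-" + span[1]
                    + " -> " + sentence.substring(span[0], span[1]));
        }
    }
}
```

With group index 0 the same helper would return the span of the complete match ("ascending colon polyps"), which corresponds to the group 0 behaviour described in the documentation excerpt quoted earlier in the thread.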
