Many thanks!

-----Original Message-----
From: Peter Klügl [mailto:[email protected]]
Sent: Wednesday, May 08, 2013 2:57 AM
To: [email protected]
Subject: Re: Extending TextMarker with new actions

Hi,

I will create a feature request and see when I find the time to implement it.

Best,

Peter

On 08.05.2013 04:38, William Karl Thompson wrote:
> Hi Peter,
>
> What you proposed would work fine for what I was trying to do!
>
> Cheers,
>
> Will
>
> -----Original Message-----
> From: Peter Klügl [mailto:[email protected]]
> Sent: Tuesday, May 07, 2013 3:42 AM
> To: [email protected]
> Subject: Re: Extending TextMarker with new actions
>
> Hi,
>
> On 06.05.2013 18:26, William Karl Thompson wrote:
>> Hi Peter,
>>
>> I like the simplified regular expression rule syntax -- very handy. It's
>> almost exactly what I wanted. However, one thing I'm wondering is how to
>> create an annotation with features using such rules. I have in mind
>> something like the following:
>>
>> "(regex string)" -> 1 = CREATE(FooType, "feat" = "bar");
>>
>> Here's a possible variant of the above that I can imagine would be useful
>> too:
>>
>> "(regex) (string)" -> CREATE(FooType, "feat1" = GROUP(1), "feat2" = GROUP(2));
>>
>> What are your thoughts on this?
> I think I won't be able to use the existing code of the CREATE action for
> this and it will also be problematic in the grammar without creating a new
> context.
>
> What about something like:
>
> "(regexp) (string)" -> Type1, 1 = Type2 ("feat" = 2);
>
> This will of course not work with numeric feature values, but there isn't an
> auto-cast anyway...
>
> Best,
>
> Peter
>
>
>> Cheers,
>>
>> Will
>>
>> -----Original Message-----
>> From: William Karl Thompson
>> Sent: Thursday, May 02, 2013 1:49 PM
>> To: [email protected]
>> Subject: RE: Extending TextMarker with new actions
>>
>> Thank you very much, I will try it.
>>
>> -----Original Message-----
>> From: Peter Klügl [mailto:[email protected]]
>> Sent: Thursday, May 02, 2013 12:42 PM
>> To: [email protected]
>> Subject: Re: Extending TextMarker with new actions
>>
>> On 02.05.2013 19:16, William Karl Thompson wrote:
>>> I see you're way ahead of me!
>>> I'll take a look at this -- is it in the latest on trunk?
>> Yes, and there is also a unit test (if you are interested in some
>> ready-to-work examples):
>> org.apache.uima.ruta.RegExpRuleTest.java(.ruta, .txt)
>>
>> Peter
>>
>>> -----Original Message-----
>>> From: Peter Klügl [mailto:[email protected]]
>>> Sent: Thursday, May 02, 2013 12:14 PM
>>> To: [email protected]
>>> Subject: Re: Extending TextMarker with new actions
>>>
>>> Hi,
>>>
>>> oh, I am afraid I recently added something like that for the 2.0.1
>>> release, not yet included in the 2.0.0 release. This does not mean
>>> that I would not include the action in UIMA Ruta ;-)
>>>
>>> Here is the excerpt of the documentation:
>>>
>>> <section id="ugr.tools.ruta.language.regexprule">
>>> <title>Simple Rules based on Regular Expressions</title>
>>> <para>
>>> The Ruta language includes, in addition to the normal rules, a
>>> simplified rule syntax for processing regular expressions.
>>> These simple rules consist of two parts separated by
>>> <quote>-></quote>: The left part is the regular expression
>>> (flags: DOTALL and MULTILINE), which may contain capturing groups.
>>> The right part defines which kind of annotations
>>> should be created for each match of the regular expression. If a
>>> type is given without a group index, then an annotation of that type is
>>> created for the complete regular expression match, which
>>> corresponds to group 0. These simple rules can be restricted to match only
>>> within certain annotations using the BLOCK construct, and ignore all
>>> filtering settings.
>>> </para>
>>>
>>> <programlisting><![CDATA[
>>> RegExpRule -> StringExpression "->" GroupAssignment
>>> ("," GroupAssignment)* ";"
>>> GroupAssignment -> TypeExpression | NumberExpression "=" TypeExpression ]]></programlisting>
>>>
>>> <para>
>>> The following example contains a simple rule, which is able to
>>> create annotations of two different types.
>>> It creates an annotation
>>> of the type <quote>T1</quote> for each match of the complete
>>> regular expression and an annotation
>>> of the type <quote>T2</quote> for each match of the first capturing
>>> group.
>>> </para>
>>>
>>> <programlisting><![CDATA["A(.*?)C" -> T1, 1 = T2;]]></programlisting>
>>>
>>>
>>> </section>
>>>
>>>
>>>
>>>
>>> On 02.05.2013 19:06, William Karl Thompson wrote:
>>>> I forgot to mention, the numeric argument in the proposed MARKREGEXP
>>>> action indicates which capturing group of the regular expression is
>>>> used to generate the region for the annotation of the specified type.
>>>>
>>>> -----Original Message-----
>>>> From: William Karl Thompson
>>>> Sent: Thursday, May 02, 2013 12:02 PM
>>>> To: [email protected]
>>>> Subject: RE: Extending TextMarker with new actions
>>>>
>>>> Peter,
>>>>
>>>> Thanks for helping me to get going on this, it now works like a charm!
>>>> I have been able to generate extensions and have them be recognized by the
>>>> Eclipse IDE as per your instructions. Very nice!
>>>>
>>>> In the process of doing this, I do have an idea for a possibly useful
>>>> action to be added to the current set. The basic idea is to implement
>>>> functionality similar to that found in the RegularExpressionAnnotator that
>>>> is one of the UIMA addons:
>>>>
>>>> http://uima.apache.org/sandbox.html#regex.annotator
>>>>
>>>> This allows you to define a set of regular expression matches, and to mark
>>>> an annotation on the region covered by the match, restricted if desired by
>>>> a capturing group within the regular expression.
>>>> The way I implemented it experimentally was like the following:
>>>>
>>>> Sentence{->MARKREGEXP(TypeExpression, "(?i)(ascending colon) polyps", 1)};
>>>> NP{PARTOF(FindingsSection)->MARKREGEXP(TypeExpression, "(?i)tubular adenoma", 0)};
>>>>
>>>> The key thing is that the regular expression matching is using the
>>>> equivalent of java.util.regex.Matcher.find(), unlike the current
>>>> implementation of the REGEXP condition, which uses matches():
>>>>
>>>> http://docs.oracle.com/javase/1.5.0/docs/api/java/util/regex/Matcher.html#find()
>>>>
>>>> Anyway, thanks again for your help getting this all working.
>>>>
>>>> Cheers,
>>>>
>>>> Will
>>>>
>>>> ________________________________________
>>>> From: Peter Klügl [[email protected]]
>>>> Sent: Monday, April 29, 2013 4:20 PM
>>>> To: [email protected]
>>>> Subject: Re: Extending TextMarker with new actions
>>>>
>>>> Hi,
>>>>
>>>> On 29.04.2013 20:22, William Karl Thompson wrote:
>>>>> Hi Peter,
>>>>>
>>>>> I've updated and built the TextMarker projects, but now I'm spinning my
>>>>> wheels a bit trying to install the updated TextMarker Workbench feature
>>>>> from the projects. Could you give me a tip on how to do that? This isn't
>>>>> something I've ever done before, and I'm not having much success at the
>>>>> moment.
>>>> There are different ways. You could either just build the jars and put
>>>> them in the dropins folder of your eclipse installation (with no
>>>> textmarker installed) - not really recommended. Or, you could build the
>>>> update site, which can be used to install the feature and plugins. The pom
>>>> of the update site project (was textmarker-eclipse-update-site) has two
>>>> important properties: item-maven-release-version and
>>>> item-eclipse-release-version. If you want to build an update site using
>>>> the SNAPSHOT artifacts, then you need to adapt these values, e.g., to
>>>> 2.0.1-SNAPSHOT and 2.0.1.SNAPSHOT.
>>>> The normal process is to install
>>>> everything and then package the update site.
>>>>
>>>> You also have to include your extensions somehow, e.g., by extending the
>>>> update site (and feature) or by copying the built plugin to the dropins
>>>> folder.
>>>>
>>>> When I try new stuff, I always start an Eclipse Application using my
>>>> textmarker workspace. Here, no installation is needed. I could also build
>>>> a textmarker update site with the fixed extensions for you, but
>>>> unfortunately not before Thursday.
>>>>
>>>> I am currently in the process of renaming all textmarker projects (the new
>>>> name is UIMA Ruta). You have to be careful which revision you are using to
>>>> build the projects right now, because I wasn't able to finish the renaming
>>>> today, and I haven't tested the new update site yet. The renaming started
>>>> with revision 1477012. Sorry for the bad timing.
>>>>
>>>> Best,
>>>>
>>>> Peter
>>>>
>>>>
>>>>> Many thanks,
>>>>>
>>>>> Will
>>>>>
>>>>> -----Original Message-----
>>>>> From: William Karl Thompson
>>>>> Sent: Friday, April 26, 2013 3:40 PM
>>>>> To: [email protected]
>>>>> Subject: RE: Extending TextMarker with new actions
>>>>>
>>>>> Hi Peter,
>>>>>
>>>>> Thanks very much, I will try this out!
>>>>>
>>>>> Best,
>>>>>
>>>>> Will
>>>>>
>>>>> -----Original Message-----
>>>>> From: Peter Klügl [mailto:[email protected]]
>>>>> Sent: Friday, April 26, 2013 4:30 AM
>>>>> To: [email protected]
>>>>> Subject: Re: Extending TextMarker with new actions
>>>>>
>>>>> Hi,
>>>>>
>>>>> On 25.04.2013 19:16, William Karl Thompson wrote:
>>>>>> Hi Peter,
>>>>>>
>>>>>> Many thanks! I was just about to try it out before reading your
>>>>>> latest email. Should I check out the latest trunk version from the svn
>>>>>> repository tomorrow?
>>>>> I fixed most problems and committed the changes together with two
>>>>> example projects (in
>>>>> https://svn.apache.org/repos/asf/uima/sandbox/textmarker/trunk/example-projects):
>>>>>
>>>>> textmarker-ep-example-extensions contains two parts: the implementation
>>>>> of an action (ExampleAction) and the integration in the IDE. That is
>>>>> why it is a maven eclipse-plugin project.
>>>>>
>>>>> ExtensionsExample is a simple textmarker project, which uses the
>>>>> extension.
>>>>>
>>>>> The syntax check in the Workbench is not yet correctly integrated. It
>>>>> will take a while until I am able to write the documentation for the
>>>>> extensions. Just let me know if any problems occur.
>>>>>
>>>>> Best,
>>>>>
>>>>> Peter
>>>>>
>>>>> Btw: I am also involved in a project about information extraction
>>>>> in clinical texts. That's a quite active area ;-)
>>>>>
>>>>>> In terms of feature requests, I appreciate your willingness to consider
>>>>>> extensions. My strategy will be to try accomplishing a few tasks first,
>>>>>> to see what can be abstracted that is of sufficient generality. As
>>>>>> background info, I am creating some NLP applications for clinical text
>>>>>> using cTAKES, and I think TextMarker is a nice option to have for
>>>>>> rule-based alternatives to certain tasks (like relating two annotations
>>>>>> to each other, DiseaseDisorder and AnatomicalLocation in the same
>>>>>> sentence). The current cTAKES relation extractor is based on machine
>>>>>> learning, and requires an annotated corpus for training, whereas
>>>>>> sometimes it's just easier to create a set of rules.
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Will
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Peter Klügl [mailto:[email protected]]
>>>>>> Sent: Thursday, April 25, 2013 10:49 AM
>>>>>> To: [email protected]
>>>>>> Subject: Re: Extending TextMarker with new actions
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I checked the language extensions and unfortunately they do not work
>>>>>> right now. There are some small bugs, but they will be fixed tomorrow.
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Peter
>>>>>>
>>>>>> On 25.04.2013 11:37, Peter Klügl wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> On 25.04.2013 03:29, William Karl Thompson wrote:
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> (My apologies, I mistakenly sent this to the dev list
>>>>>>>> initially)
>>>>>>>>
>>>>>>>> I'm very interested in using the TextMarker project, but the
>>>>>>>> current set of action types doesn't quite do what I need. I
>>>>>>>> found references to an extension mechanism, and have also found the
>>>>>>>> ITextMarkerActionExtension interface in the source code. I also
>>>>>>>> found the antlr grammar and lexer files where the TextMarker
>>>>>>>> language is defined, which appears to be where new action type
>>>>>>>> names are to be added. So I surmise the steps to add new
>>>>>>>> actions are to
>>>>>>>>
>>>>>>>>
>>>>>>>> 1. Add the desired action signature to the antlr grammar
>>>>>>>>
>>>>>>>> 2. Define an implementation of ITextMarkerActionExtension that
>>>>>>>> implements the functionality.
>>>>>>>>
>>>>>>>> Is there an easier way to do this? My concern is that I need to
>>>>>>>> modify TextMarker source files (the grammar and lexer files),
>>>>>>>> which would be overwritten on any updated version of TextMarker.
>>>>>>> This should be possible without changing any textmarker code.
>>>>>>>
>>>>>>> There is a generic parsing rule in the grammar, which creates an
>>>>>>> external action using the set of ITextMarkerExtension mentioned
>>>>>>> in the descriptor (parameter: additionalExtensions).
>>>>>>> There is no
>>>>>>> default syntax check since the possible arguments are of course
>>>>>>> not yet known by the engine. Syntax checks need to be
>>>>>>> implemented in the ITextMarkerActionExtension.createAction(),
>>>>>>> which throws an ANTLRException. The arguments of the action are
>>>>>>> delegated to this method, which returns the action
>>>>>>> implementation, so there will probably be many casts and "if
>>>>>>> instanceof" checks. Language constructs like assignments
>>>>>>> ("feature" = Type) known by the CREATE action, are not yet supported.
>>>>>>>
>>>>>>> Unfortunately, there is no automatic integration in the workbench yet.
>>>>>>> You have to modify the BasicEngine (add the extension) in the
>>>>>>> textmarker project yourself. The implementation of the
>>>>>>> extension of course then also needs to be available to the workbench.
>>>>>>>
>>>>>>> I haven't used the language extensions since 2009 (it was a
>>>>>>> wordnet integration) and they are not yet covered by unit tests. So,
>>>>>>> there may be some bugs due to the changes after the
>>>>>>> contribution to Apache UIMA. However, I will check the
>>>>>>> functionality, add a test case and extend the documentation.
>>>>>>>
>>>>>>> Concerning the list of available actions: You are of course also
>>>>>>> welcome to create feature requests for new actions. The current
>>>>>>> set of actions is mainly based on my own requirements and I will
>>>>>>> gladly add new reasonable/generic actions (within the limits of my
>>>>>>> available time).
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> Peter
>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>> Will Thompson
>>>>>>>>
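
For readers following the thread, here is a minimal Java sketch of two points discussed above: the difference between Matcher.find() and Matcher.matches(), and how a simplified rule such as "A(.*?)C" -> T1, 1 = T2; relates group 0 and group 1 to character spans. This is not the UIMA Ruta implementation. The class RegexGroupSketch and its methods are hypothetical names made up for illustration, and spans are returned as plain strings instead of annotations. Only the java.util.regex calls are real API, and the DOTALL and MULTILINE flags are taken from the documentation excerpt quoted in the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of UIMA Ruta or TextMarker.
class RegexGroupSketch {

    // Flags as stated in the quoted documentation excerpt.
    private static final int FLAGS = Pattern.DOTALL | Pattern.MULTILINE;

    // Scans the text with find() and records one "type@begin-end" entry for
    // the whole match (group 0) and one for the requested capturing group.
    static List<String> apply(String text, String regex,
                              String wholeType, int group, String groupType) {
        Matcher m = Pattern.compile(regex, FLAGS).matcher(text);
        List<String> spans = new ArrayList<>();
        while (m.find()) {                      // find() locates matches anywhere in the text
            spans.add(wholeType + "@" + m.start() + "-" + m.end());
            if (m.start(group) >= 0) {          // -1 means the group did not participate
                spans.add(groupType + "@" + m.start(group) + "-" + m.end(group));
            }
        }
        return spans;
    }

    // matches() succeeds only if the entire input matches the pattern.
    static boolean matchesWhole(String text, String regex) {
        return Pattern.compile(regex, FLAGS).matcher(text).matches();
    }

    public static void main(String[] args) {
        String text = "xAbCyAddCz";
        System.out.println(apply(text, "A(.*?)C", "T1", 1, "T2"));
        System.out.println(matchesWhole(text, "A(.*?)C"));
        System.out.println(matchesWhole("AbC", "A(.*?)C"));
    }
}
```

With find(), a pattern can hit several regions inside a covering annotation such as a sentence, which is the behaviour the proposed MARKREGEXP action relies on. With matches(), the pattern has to cover the whole input, so the same expression fails on any text with leading or trailing content.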
