Thank you very much, I will try it.

-----Original Message-----
From: Peter Klügl [mailto:[email protected]]
Sent: Thursday, May 02, 2013 12:42 PM
To: [email protected]
Subject: Re: Extending TextMarker with new actions

On 02.05.2013 19:16, William Karl Thompson wrote:
> I see you're way ahead of me! I'll take a look at this -- is it in the latest
> on trunk?

Yes, and there is also a unit test (if you are interested in some
ready-to-work examples): org.apache.uima.ruta.RegExpRuleTest.java(.ruta, .txt)

Peter

> -----Original Message-----
> From: Peter Klügl [mailto:[email protected]]
> Sent: Thursday, May 02, 2013 12:14 PM
> To: [email protected]
> Subject: Re: Extending TextMarker with new actions
>
> Hi,
>
> oh, I am afraid I recently added something like that for the 2.0.1
> release, not yet included in the 2.0.0 release. This does not mean
> that I would not include the action in UIMA Ruta ;-)
>
> Here is the excerpt of the documentation:
>
> <section id="ugr.tools.ruta.language.regexprule">
> <title>Simple Rules based on Regular Expressions</title>
> <para>
> The Ruta language includes, in addition to the normal rules, a
> simplified rule syntax for processing regular expressions.
> These simple rules consist of two parts separated by
> <quote>-></quote>: The left part is the regular expression
> (flags: DOTALL and MULTILINE), which may contain capturing groups.
> The right part defines which kind of annotations
> should be created for each match of the regular expression. If a type
> is given without a group index, then an annotation of that type is
> created for the complete regular expression match, which corresponds
> to group 0. These simple rules can be restricted to match only within
> certain annotations using the BLOCK construct, and ignore all
> filtering settings.
> </para>
>
> <programlisting><![CDATA[
> RegExpRule -> StringExpression "->" GroupAssignment
> ("," GroupAssignment)* ";"
> GroupAssignment -> TypeExpression | NumberExpression "="
> TypeExpression ]]></programlisting>
>
> <para>
> The following example contains a simple rule, which is able to create
> annotations of two different types.
> It creates an annotation of the type <quote>T1</quote> for each match of the
> complete regular expression and an annotation of the type <quote>T2</quote>
> for each match of the first capturing group.
> </para>
>
> <programlisting><![CDATA["A(.*?)C" -> T1, 1 = T2;]]></programlisting>
>
> </section>
>
> On 02.05.2013 19:06, William Karl Thompson wrote:
>> I forgot to mention, the numeric argument in the proposed MARKREGEXP action
>> indicates which capturing group of the regular expression is used to
>> generate the region for the annotation of the specified type.
>>
>> -----Original Message-----
>> From: William Karl Thompson
>> Sent: Thursday, May 02, 2013 12:02 PM
>> To: [email protected]
>> Subject: RE: Extending TextMarker with new actions
>>
>> Peter,
>>
>> Thanks for helping me to get going on this, it now works like a charm! I have
>> been able to generate extensions and have them be recognized by the Eclipse
>> IDE as per your instructions. Very nice!
>>
>> In the process of doing this, I do have an idea for a possibly useful action
>> to be added to the current set. The basic idea is to implement functionality
>> similar to that found in the RegularExpressionAnnotator that is one of the
>> UIMA addons:
>>
>> http://uima.apache.org/sandbox.html#regex.annotator
>>
>> This allows you to define a set of regular expression matches, and to mark
>> an annotation on the region covered by the match, restricted if desired by a
>> capturing group within the regular expression.
>> The way I implemented it experimentally was like the following:
>>
>> Sentence{->MARKREGEXP(TypeExpression, "(?i)(ascending colon) polyps", 1)};
>> NP{PARTOF(FindingsSection)->MARKREGEXP(TypeExpression, "(?i)tubular adenoma", 0)};
>>
>> The key thing is that the regular expression matching uses the
>> equivalent of java.util.regex.Matcher.find(), unlike the current
>> implementation of the REGEXP condition, which uses matches():
>>
>> http://docs.oracle.com/javase/1.5.0/docs/api/java/util/regex/Matcher.html#find()
>>
>> Anyway, thanks again for your help getting this all working.
>>
>> Cheers,
>>
>> Will
>>
>> ________________________________________
>> From: Peter Klügl [[email protected]]
>> Sent: Monday, April 29, 2013 4:20 PM
>> To: [email protected]
>> Subject: Re: Extending TextMarker with new actions
>>
>> Hi,
>>
>> On 29.04.2013 20:22, William Karl Thompson wrote:
>>> Hi Peter,
>>>
>>> I've updated and built the TextMarker projects, but now I'm spinning my
>>> wheels a bit trying to install the updated TextMarker Workbench feature
>>> from the projects. Could you give me a tip on how to do that? This isn't
>>> something I've ever done before, and I'm not having much success at the
>>> moment.
>>
>> There are different ways. You could either just build the jars and put them
>> in the dropins folder of your Eclipse installation (with no textmarker
>> installed), which is not really recommended. Or you could build the update
>> site, which can be used to install the feature and plugins. The pom of the
>> update site project (was textmarker-eclipse-update-site) has two important
>> properties: item-maven-release-version and item-eclipse-release-version. If
>> you want to build an update site using the SNAPSHOT artifacts, then you need
>> to adapt these values, e.g., to 2.0.1-SNAPSHOT and 2.0.1.SNAPSHOT. The
>> normal process is to install everything and then package the update site.
>>
>> You also have to include your extensions somehow, e.g., by extending the
>> update site (and feature) or by copying the built plugin to the dropins
>> folder.
>>
>> When I try new stuff, I always start an Eclipse Application using my
>> textmarker workspace. Here, no installation is needed. I could also build a
>> textmarker update site with the fixed extensions for you, but unfortunately
>> not before Thursday.
>>
>> I am currently in the process of renaming all textmarker projects (the new
>> name is UIMA Ruta). You have to be careful which revision you are using to
>> build the projects right now, because I wasn't able to finish the renaming
>> today, and I haven't tested the new update site yet. The renaming started
>> with revision 1477012. Sorry for the bad timing.
>>
>> Best,
>>
>> Peter
>>
>>> Many thanks,
>>>
>>> Will
>>>
>>> -----Original Message-----
>>> From: William Karl Thompson
>>> Sent: Friday, April 26, 2013 3:40 PM
>>> To: [email protected]
>>> Subject: RE: Extending TextMarker with new actions
>>>
>>> Hi Peter,
>>>
>>> Thanks very much, I will try this out!
>>>
>>> Best,
>>>
>>> Will
>>>
>>> -----Original Message-----
>>> From: Peter Klügl [mailto:[email protected]]
>>> Sent: Friday, April 26, 2013 4:30 AM
>>> To: [email protected]
>>> Subject: Re: Extending TextMarker with new actions
>>>
>>> Hi,
>>>
>>> On 25.04.2013 19:16, William Karl Thompson wrote:
>>>> Hi Peter,
>>>>
>>>> Many thanks! I was just about to try it out before reading your latest
>>>> email. Should I check out the latest trunk version from the svn repository
>>>> tomorrow?
>>>
>>> I fixed most problems and committed the changes together with two
>>> example projects (in
>>> https://svn.apache.org/repos/asf/uima/sandbox/textmarker/trunk/example-projects):
>>>
>>> textmarker-ep-example-extensions contains two parts: the implementation of
>>> an action (ExampleAction) and the integration in the IDE. That is why it
>>> is a Maven eclipse-plugin project.
>>>
>>> ExtensionsExample is a simple textmarker project, which uses the extension.
>>>
>>> The syntax check in the Workbench is not yet correctly integrated. It will
>>> take a while until I am able to write the documentation for the
>>> extensions. Just let me know if any problems occur.
>>>
>>> Best,
>>>
>>> Peter
>>>
>>> Btw: I am also involved in a project about information extraction in
>>> clinical texts. That's quite an active area ;-)
>>>
>>>> In terms of feature requests, I appreciate your willingness to consider
>>>> extensions. My strategy will be to try accomplishing a few tasks first, to
>>>> see what can be abstracted that is of sufficient generality. As background
>>>> info, I am creating some NLP applications for clinical text using cTAKES,
>>>> and I think TextMarker is a nice option to have for rule-based
>>>> alternatives to certain tasks (like relating two annotations to each
>>>> other, DiseaseDisorder and AnatomicalLocation in the same sentence). The
>>>> current cTAKES relation extractor is based on machine learning, and
>>>> requires an annotated corpus for training, whereas sometimes it's just
>>>> easier to create a set of rules.
>>>>
>>>> Cheers,
>>>>
>>>> Will
>>>>
>>>> -----Original Message-----
>>>> From: Peter Klügl [mailto:[email protected]]
>>>> Sent: Thursday, April 25, 2013 10:49 AM
>>>> To: [email protected]
>>>> Subject: Re: Extending TextMarker with new actions
>>>>
>>>> Hi,
>>>>
>>>> I checked the language extensions and unfortunately they do not work right
>>>> now. There are some small bugs, but they will be fixed tomorrow.
>>>>
>>>> Best,
>>>>
>>>> Peter
>>>>
>>>> On 25.04.2013 11:37, Peter Klügl wrote:
>>>>> Hi,
>>>>>
>>>>> On 25.04.2013 03:29, William Karl Thompson wrote:
>>>>>> Hello,
>>>>>>
>>>>>> (My apologies, I mistakenly sent this to the dev list initially)
>>>>>>
>>>>>> I'm very interested in using the TextMarker project, but the
>>>>>> current set of action types doesn't quite do what I need.
>>>>>> I found references to an extension mechanism, and have also found the
>>>>>> ITextMarkerActionExtension interface in the source code. I also
>>>>>> found the antlr grammar and lexer files where the TextMarker
>>>>>> language is defined, which appears to be where new action type
>>>>>> names are to be added. So I surmise the steps to add new actions
>>>>>> are to
>>>>>>
>>>>>> 1. Add the desired action signature to the antlr grammar
>>>>>>
>>>>>> 2. Define an implementation of ITextMarkerActionExtension that
>>>>>> implements the functionality.
>>>>>>
>>>>>> Is there an easier way to do this? My concern is that I need to
>>>>>> modify TextMarker source files (the grammar and lexer files),
>>>>>> which would be overwritten on any updated version of TextMarker.
>>>>>
>>>>> This should be possible without changing any textmarker code.
>>>>>
>>>>> There is a generic parsing rule in the grammar, which creates an
>>>>> external action using the set of ITextMarkerExtension mentioned in
>>>>> the descriptor (parameter: additionalExtensions). There is no
>>>>> default syntax check since the possible arguments are of course
>>>>> not yet known by the engine. Syntax checks need to be implemented
>>>>> in ITextMarkerActionExtension.createAction(), which throws an
>>>>> ANTLRException. The arguments of the action are delegated to this
>>>>> method, which returns the action implementation, so there will
>>>>> probably be many casts and "instanceof" checks. Language
>>>>> constructs like assignments ("feature" = Type), as known from the
>>>>> CREATE action, are not yet supported.
>>>>>
>>>>> Unfortunately, there is no automatic integration in the workbench yet.
>>>>> You have to modify the BasicEngine (add the extension) in the
>>>>> textmarker project yourself. The implementation of the extension
>>>>> of course then also needs to be available to the workbench.
>>>>>
>>>>> I haven't used the language extensions since 2009 (it was a
>>>>> wordnet integration) and they are not yet covered by unit tests.
>>>>> So, there are maybe some bugs due to the changes after the
>>>>> contribution to Apache UIMA. However, I will check the
>>>>> functionality, add a test case and extend the documentation.
>>>>>
>>>>> Concerning the list of available actions: You are of course also
>>>>> welcome to create feature requests for new actions. The current
>>>>> set of actions is mainly based on my own requirements and I will
>>>>> gladly add new reasonable/generic actions (within the limits of my
>>>>> available time).
>>>>>
>>>>> Best,
>>>>>
>>>>> Peter
>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> Will Thompson
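
The thread turns on the difference between Matcher.find() and Matcher.matches(), and on how a rule such as "A(.*?)C" -> T1, 1 = T2; maps capturing groups to annotation spans. The following is a minimal sketch in plain java.util.regex, not the Ruta implementation. The class name RegExpRuleSketch and its two helper methods are made up for illustration, and the sample text is an assumption. It only shows which character offsets group 0 (T1 in the example rule) and group 1 (T2) would cover when the pattern is compiled with DOTALL and MULTILINE and applied with find().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of UIMA Ruta or TextMarker.
public class RegExpRuleSketch {

    // The flags named in the quoted documentation excerpt.
    private static final int FLAGS = Pattern.DOTALL | Pattern.MULTILINE;

    // Returns "begin-end" offsets of the given group for every find() hit.
    public static List<String> findSpans(String regex, String text, int group) {
        Matcher m = Pattern.compile(regex, FLAGS).matcher(text);
        List<String> spans = new ArrayList<>();
        while (m.find()) {
            // Skip groups that did not take part in the match.
            if (m.start(group) >= 0) {
                spans.add(m.start(group) + "-" + m.end(group));
            }
        }
        return spans;
    }

    // matches() succeeds only if the whole input is covered by the pattern.
    public static boolean matchesWhole(String regex, String text) {
        return Pattern.compile(regex, FLAGS).matcher(text).matches();
    }

    public static void main(String[] args) {
        String text = "xxAbbC yy AdC"; // assumed sample input
        // Group 0: spans that would receive T1 in the example rule.
        System.out.println("group 0: " + findSpans("A(.*?)C", text, 0));
        // Group 1: spans that would receive T2 in the example rule.
        System.out.println("group 1: " + findSpans("A(.*?)C", text, 1));
        // The same pattern with matches() fails, since the text has extra content.
        System.out.println("matches(): " + matchesWhole("A(.*?)C", text));
    }
}
```

With find(), the pattern locates every occurrence inside a larger covered text, which is what the proposed MARKREGEXP action and the simple regular expression rules rely on. With matches(), the same pattern only succeeds if the entire covered text fits the expression.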
