Hi,

Feature assignments for simple regexp rules are now part of the trunk.
RegExpRuleTest.ruta contains an example, and the documentation describes
the exact syntax.

Best,

Peter

On 08.05.2013 17:56, William Karl Thompson wrote:
> Many thanks!
>
> -----Original Message-----
> From: Peter Klügl [mailto:[email protected]] 
> Sent: Wednesday, May 08, 2013 2:57 AM
> To: [email protected]
> Subject: Re: Extending TextMarker with new actions
>
> Hi,
>
> I will create a feature request and see when I find the time to implement it.
>
> Best,
>
> Peter
>
> On 08.05.2013 04:38, William Karl Thompson wrote:
>> Hi Peter,
>>
>> What you proposed would work fine for what I was trying to do!
>>
>> Cheers,
>>
>> Will
>>
>> -----Original Message-----
>> From: Peter Klügl [mailto:[email protected]]
>> Sent: Tuesday, May 07, 2013 3:42 AM
>> To: [email protected]
>> Subject: Re: Extending TextMarker with new actions
>>
>> Hi,
>>
>> On 06.05.2013 18:26, William Karl Thompson wrote:
>>> Hi Peter,
>>>
>>> I like the simplified regular expression rule syntax -- very handy. It's 
>>> almost exactly what I wanted.  However, one thing I'm wondering is how to 
>>> create an annotation with features using such rules. I have in mind 
>>> something like the following:
>>>
>>> "(regex string)" -> 1 = CREATE(FooType, "feat" = "bar");
>>>
>>> Here's a possible variant of the above that I can imagine would be useful
>>> too:
>>>
>>> "(regex) (string)" -> CREATE(FooType, "feat1" = GROUP(1), "feat2" = GROUP(2));
>>>
>>> What are your thoughts on this?
>> I think I won't be able to use the existing code of the CREATE action for 
>> this and it will also be problematic in the grammar without creating a new 
>> context.
>>
>> What about something like:
>>
>> "(regexp) (string)" -> Type1, 1 = Type2 ("feat" = 2);
>>
>> This will of course not work with numeric feature values, but there isn't an 
>> auto-cast anyway...
>>
>> Best,
>>
>> Peter
>>  
>>
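The semantics of the proposed rule `"(regexp) (string)" -> Type1, 1 = Type2 ("feat" = 2);` can be illustrated with plain `java.util.regex`. This is a minimal sketch, not Ruta's implementation: the `Ann` class and the `apply` method are hypothetical stand-ins for UIMA annotations and the rule engine. `Type1` covers the whole match (group 0), `Type2` covers group 1, and its feature `feat` receives the text of group 2 as a String, since, as noted above, there is no auto-cast to numeric feature values.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical plain-Java model of the proposed rule
//   "(regexp) (string)" -> Type1, 1 = Type2 ("feat" = 2);
// "Ann" is an invented stand-in for a UIMA annotation, not a Ruta/UIMA class.
class FeatureAssignmentSketch {

    // Minimal annotation model: type name, offsets and string-valued features.
    static final class Ann {
        final String type;
        final int begin;
        final int end;
        final Map<String, String> features;

        Ann(String type, int begin, int end, Map<String, String> features) {
            this.type = type;
            this.begin = begin;
            this.end = end;
            this.features = features;
        }

        @Override
        public String toString() {
            return type + "[" + begin + "," + end + ")" + features;
        }
    }

    // Assumes groups 1 and 2 always participate in a match.
    static List<Ann> apply(String regex, String text) {
        // Same flags as documented for simple regexp rules.
        Pattern p = Pattern.compile(regex, Pattern.DOTALL | Pattern.MULTILINE);
        Matcher m = p.matcher(text);
        List<Ann> result = new ArrayList<>();
        while (m.find()) {
            // Type1 has no group index, so it covers the complete match (group 0).
            result.add(new Ann("Type1", m.start(), m.end(), new LinkedHashMap<>()));
            // 1 = Type2 ("feat" = 2): span of group 1, feature value is group 2's text.
            Map<String, String> feats = new LinkedHashMap<>();
            feats.put("feat", m.group(2)); // stays a String: no auto-cast
            result.add(new Ann("Type2", m.start(1), m.end(1), feats));
        }
        return result;
    }

    public static void main(String[] args) {
        for (Ann a : apply("(\\w+) (\\d+)", "dose 20")) {
            System.out.println(a);
        }
    }
}
```

For the input `dose 20`, the sketch yields `Type1[0,7){}` and `Type2[0,4){feat=20}`; a numeric feature would need an explicit conversion such as `Integer.parseInt`.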
>>> Cheers,
>>>
>>> Will
>>>
>>> -----Original Message-----
>>> From: William Karl Thompson
>>> Sent: Thursday, May 02, 2013 1:49 PM
>>> To: [email protected]
>>> Subject: RE: Extending TextMarker with new actions
>>>
>>> Vielen Dank, Ich werde es probieren.
>>>
>>> -----Original Message-----
>>> From: Peter Klügl [mailto:[email protected]]
>>> Sent: Thursday, May 02, 2013 12:42 PM
>>> To: [email protected]
>>> Subject: Re: Extending TextMarker with new actions
>>>
>>> Am 02.05.2013 19:16, schrieb William Karl Thompson:
>>>> I see you're way ahead of me! I'll take a look at this -- is it in the 
>>>> latest on trunk?
>>> Yes, and there is also a unit test (if you are interested in some
>>> ready-to-work examples):
>>> org.apache.uima.ruta.RegExpRuleTest.java (.ruta, .txt)
>>>
>>> Peter
>>>
>>>> -----Original Message-----
>>>> From: Peter Klügl [mailto:[email protected]]
>>>> Sent: Thursday, May 02, 2013 12:14 PM
>>>> To: [email protected]
>>>> Subject: Re: Extending TextMarker with new actions
>>>>
>>>> Hi,
>>>>
>>>> oh, I am afraid I recently added something like that for the 2.0.1 
>>>> release, not yet included in the 2.0.0 release. This does not mean 
>>>> that I would not include the action in UIMA Ruta ;-)
>>>>
>>>> Here is the excerpt from the documentation:
>>>>
>>>> <section id="ugr.tools.ruta.language.regexprule">
>>>>       <title>Simple Rules based on Regular Expressions</title>
>>>>       <para>
>>>>         The Ruta language includes, in addition to the normal rules, a
>>>>         simplified rule syntax for processing regular expressions.
>>>>         These simple rules consist of two parts separated by
>>>>         <quote>-></quote>: The left part is the regular expression
>>>>         (flags: DOTALL and MULTILINE), which may contain capturing groups.
>>>>         The right part defines which kind of annotations
>>>>         should be created for each match of the regular expression. If a
>>>>         type is given without a group index, then an annotation of that type
>>>>         is created for the complete regular expression match, which
>>>>         corresponds to group 0. These simple rules can be restricted to match
>>>>         only within certain annotations using the BLOCK construct, and they
>>>>         ignore all filtering settings.
>>>>       </para>
>>>>
>>>>       <programlisting><![CDATA[
>>>> RegExpRule      -> StringExpression "->" GroupAssignment
>>>>                     ("," GroupAssignment)* ";"
>>>> GroupAssignment -> TypeExpression | NumberExpression "=" TypeExpression]]></programlisting>
>>>>
>>>>       <para>
>>>>         The following example contains a simple rule that creates
>>>>         annotations of two different types. It creates an annotation
>>>>         of the type <quote>T1</quote> for each match of the complete
>>>>         regular expression and an annotation of the type <quote>T2</quote>
>>>>         for each match of the first capturing group.
>>>>       </para>
>>>>
>>>>       <programlisting><![CDATA["A(.*?)C" -> T1, 1 = T2;]]></programlisting>
>>>>
>>>>
>>>>     </section>
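The documented behaviour of `"A(.*?)C" -> T1, 1 = T2;` can be approximated in plain Java as a sketch, assuming only what the excerpt states: the pattern is compiled with DOTALL and MULTILINE, every match yields a `T1` over group 0 and a `T2` over group 1. Annotations are modelled here as `Type[begin,end)` strings, and `RegExpRuleSketch.apply` is a hypothetical helper; BLOCK restrictions are not modelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-Java illustration of the documented semantics of
//   "A(.*?)C" -> T1, 1 = T2;
// Annotations are modelled as "Type[begin,end)" strings; this is not Ruta's code.
class RegExpRuleSketch {

    static List<String> apply(String regex, String text) {
        // Documented flags for simple regexp rules: DOTALL and MULTILINE.
        Pattern p = Pattern.compile(regex, Pattern.DOTALL | Pattern.MULTILINE);
        Matcher m = p.matcher(text);
        List<String> spans = new ArrayList<>();
        while (m.find()) { // every match in the text, not one match of the whole text
            // T1 has no group index, so it covers group 0 (the complete match).
            spans.add("T1[" + m.start() + "," + m.end() + ")");
            // 1 = T2 covers capturing group 1.
            spans.add("T2[" + m.start(1) + "," + m.end(1) + ")");
        }
        return spans;
    }

    public static void main(String[] args) {
        // With DOTALL, ".*?" may cross the line break in the second match.
        System.out.println(apply("A(.*?)C", "xABC\nA\nxC"));
        // prints [T1[1,4), T2[2,3), T1[5,9), T2[6,8)]
    }
}
```

Without DOTALL, the second match (offsets 5 to 9) would not be found, because `.` would not match the line break.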
>>>>
>>>>
>>>>
>>>>
>>>> Am 02.05.2013 19:06, schrieb William Karl Thompson:
>>>>> I forgot to mention: the numeric argument in the proposed MARKREGEXP
>>>>> action indicates which capturing group of the regular expression is
>>>>> used to determine the region for the annotation of the specified
>>>>> type.
>>>>>
>>>>> -----Original Message-----
>>>>> From: William Karl Thompson
>>>>> Sent: Thursday, May 02, 2013 12:02 PM
>>>>> To: [email protected]
>>>>> Subject: RE: Extending TextMarker with new actions
>>>>>
>>>>> Peter,
>>>>>
>>>>> Thanks for helping me to get going on this, it now works like a charm! 
>>>>> Have been able to generate extensions and have them be recognized by the 
>>>>> Eclipse IDE as per your instructions. Very nice!
>>>>>
>>>>> In the process of doing this, I do have an idea for a possibly useful
>>>>> action to be added to the current set. The basic idea is to implement
>>>>> functionality similar to that found in the RegularExpressionAnnotator,
>>>>> which is one of the UIMA add-ons:
>>>>>
>>>>> http://uima.apache.org/sandbox.html#regex.annotator
>>>>>
>>>>> This allows you to define a set of regular expression matches, and to 
>>>>> mark an annotation on the region covered by the match, restricted if 
>>>>> desired by a capturing group within the regular expression. The way I 
>>>>> implemented it experimentally was like the following:
>>>>>
>>>>> Sentence{->MARKREGEXP(TypeExpression, "(?i)(ascending colon) polyps", 1)};
>>>>> NP{PARTOF(FindingsSection)->MARKREGEXP(TypeExpression, "(?i)tubular adenoma", 0)};
>>>>>
>>>>> The key thing is that the regular expression matching uses the
>>>>> equivalent of java.util.regex.Matcher.find(), unlike the current
>>>>> implementation of the REGEXP condition, which uses matches():
>>>>>
>>>>> http://docs.oracle.com/javase/1.5.0/docs/api/java/util/regex/Matcher.html#find()
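The difference between `Matcher.matches()` and `Matcher.find()` can be shown with the standard library alone. In this sketch the helper names `wholeTextMatches` and `findAll` are made up for illustration, and the sample sentence is an invented example; only the regular expression comes from the MARKREGEXP proposal. `matches()` succeeds only if the pattern covers the entire input, while `find()` locates each matching substring, whose capturing groups can then delimit an annotation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Contrasts Matcher.matches() (entire input must match) with
// Matcher.find() (scans the input for matching substrings).
class FindVersusMatches {

    // True only if the whole text matches the pattern.
    static boolean wholeTextMatches(String regex, String text) {
        return Pattern.compile(regex).matcher(text).matches();
    }

    // Text of the requested group for every match found anywhere in the text.
    static List<String> findAll(String regex, String text, int group) {
        Matcher m = Pattern.compile(regex).matcher(text);
        List<String> hits = new ArrayList<>();
        while (m.find()) {
            hits.add(m.group(group));
        }
        return hits;
    }

    public static void main(String[] args) {
        String sentence = "Multiple ascending colon polyps were removed."; // invented sample
        String regex = "(?i)(ascending colon) polyps";
        System.out.println(wholeTextMatches(regex, sentence)); // false
        System.out.println(findAll(regex, sentence, 1));       // [ascending colon]
        System.out.println(findAll(regex, sentence, 0));       // [ascending colon polyps]
    }
}
```

This is why a REGEXP condition based on `matches()` fails on a whole Sentence, while a `find()`-based action can still mark the phrase inside it.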
>>>>>
>>>>> Anyway, thanks again for your help getting this all working.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Will
>>>>>
>>>>> ________________________________________
>>>>> From: Peter Klügl [[email protected]]
>>>>> Sent: Monday, April 29, 2013 4:20 PM
>>>>> To: [email protected]
>>>>> Subject: Re: Extending TextMarker with new actions
>>>>>
>>>>> Hi,
>>>>>
>>>>> Am 29.04.2013 20:22, schrieb William Karl Thompson:
>>>>>> Hi Peter,
>>>>>>
>>>>>> I've updated and built the TextMarker projects, but now I'm spinning my 
>>>>>> wheels a bit trying to install the updated TextMarker Workbench feature 
>>>>>> from the projects. Could you give me a tip on how to do that? This isn't 
>>>>>> something I've ever done before, and I'm not having much success at the 
>>>>>> moment.
>>>>> There are different ways. You could either just build the jars and put 
>>>>> them in the dropins folder of your eclipse installation (with no 
>>>>> textmarker installed) - not really recommended. Or, you could build the 
>>>>> update site, which can be used to install the feature and plugins. The 
>>>>> pom of the update site project (was textmarker-eclipse-update-site) has 
>>>>> two important properties: item-maven-release-version and 
>>>>> item-eclipse-release-version. If you want to build an update site using 
>>>>> the SNAPSHOT artifacts, then you need to adapt these values, e.g., to 
>>>>> 2.0.1-SNAPSHOT and 2.0.1.SNAPSHOT. The normal process is to install 
>>>>> everything and then package the update site.
>>>>>
>>>>> You also have to include your extensions somehow, e.g., by extending the
>>>>> update site (and feature) or by copying the built plugin to the dropins
>>>>> folder.
>>>>>
>>>>> When I try new stuff, I always start an Eclipse Application using my 
>>>>> textmarker workspace. Here, no installation is needed. I could also build 
>>>>> a textmarker update site with the fixed extensions for you, but 
>>>>> unfortunately not before Thursday.
>>>>>
>>>>> I am currently in the process of renaming all textmarker projects (the 
>>>>> new name is UIMA Ruta). You have to be careful which revision you are 
>>>>> using to build the projects right now, because I wasn't able to finish 
>>>>> the renaming today, and I haven't tested the new update site yet. The 
>>>>> renaming started with revision 1477012. Sorry for the bad timing.
>>>>>
>>>>> Best,
>>>>>
>>>>> Peter
>>>>>
>>>>>
>>>>>> Many thanks,
>>>>>>
>>>>>> Will
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: William Karl Thompson
>>>>>> Sent: Friday, April 26, 2013 3:40 PM
>>>>>> To: [email protected]
>>>>>> Subject: RE: Extending TextMarker with new actions
>>>>>>
>>>>>> Hi Peter,
>>>>>>
>>>>>> Thanks very much, I will try this out!
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Will
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Peter Klügl [mailto:[email protected]]
>>>>>> Sent: Friday, April 26, 2013 4:30 AM
>>>>>> To: [email protected]
>>>>>> Subject: Re: Extending TextMarker with new actions
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> On 25.04.2013 19:16, William Karl Thompson wrote:
>>>>>>> Hi Peter,
>>>>>>>
>>>>>>>     Many thanks! I was just about to try it out before reading your 
>>>>>>> latest email. Should I check out the latest trunk version from the svn 
>>>>>>> repository tomorrow?
>>>>>> I fixed most problems and committed the changes together with two 
>>>>>> example projects (in
>>>>>> https://svn.apache.org/repos/asf/uima/sandbox/textmarker/trunk/example-projects):
>>>>>>
>>>>>> textmarker-ep-example-extensions contains two parts: the implementation
>>>>>> of an action (ExampleAction) and its integration in the IDE. That's
>>>>>> why it is a maven eclipse-plugin project.
>>>>>>
>>>>>> ExtensionsExample is a simple textmarker project, which uses the 
>>>>>> extension.
>>>>>>
>>>>>> The syntax check in the Workbench is not yet correctly integrated. It
>>>>>> will take a while before I am able to write the documentation for
>>>>>> the extensions. Just let me know if any problems occur.
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Peter
>>>>>>
>>>>>> Btw: I am also involved in a project about information extraction 
>>>>>> in clinical texts. That's a quite active area ;-)
>>>>>>
>>>>>>> In terms of feature requests, I appreciate your willingness to consider 
>>>>>>> extensions. My strategy will be to try accomplishing a few tasks first, 
>>>>>>> to see what can be abstracted that is of sufficient generality. As 
>>>>>>> background info, I am creating some NLP applications for clinical text 
>>>>>>> using cTAKES, and I think TextMarker is a nice option to have for 
>>>>>>> rule-based alternatives to certain tasks (like relating two annotations 
>>>>>>> to each other, DiseaseDisorder and AnatomicalLocation in the same 
>>>>>>> sentence). The current cTAKES relation extractor is based on machine 
>>>>>>> learning, and requires an annotated corpus for training, whereas 
>>>>>>> sometimes it's just easier to create a set of rules.
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Will
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Peter Klügl [mailto:[email protected]]
>>>>>>> Sent: Thursday, April 25, 2013 10:49 AM
>>>>>>> To: [email protected]
>>>>>>> Subject: Re: Extending TextMarker with new actions
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I checked the language extensions and unfortunately they do not work 
>>>>>>> right now. There are some small bugs, but they will be fixed tomorrow.
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> Peter
>>>>>>>
>>>>>>> Am 25.04.2013 11:37, schrieb Peter Klügl:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Am 25.04.2013 03:29, schrieb William Karl Thompson:
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> (My apologies, I mistakenly sent this to the dev list 
>>>>>>>>> initially)
>>>>>>>>>
>>>>>>>>> I'm very interested in using the TextMarker project, but the
>>>>>>>>> current set of action types doesn't quite do what I need. I
>>>>>>>>> found references to an extension mechanism and have also found the
>>>>>>>>> ITextMarkerActionExtension interface in the source code. I also
>>>>>>>>> found the antlr grammar and lexer files where the TextMarker
>>>>>>>>> language is defined, which appears to be where new action type
>>>>>>>>> names are to be added. So I surmise the steps to add new
>>>>>>>>> actions are to
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 1. Add the desired action signature to the antlr grammar.
>>>>>>>>>
>>>>>>>>> 2. Define an implementation of ITextMarkerActionExtension that
>>>>>>>>>    implements the functionality.
>>>>>>>>>
>>>>>>>>> Is there an easier way to do this? My concern is that I need to 
>>>>>>>>> modify TextMarker source files (the grammar and lexer files), 
>>>>>>>>> which would be overwritten on any updated version of TextMarker.
>>>>>>>> This should be possible without changing any textmarker code.
>>>>>>>>
>>>>>>>> There is a generic parsing rule in the grammar, which creates an
>>>>>>>> external action using the set of ITextMarkerExtension mentioned
>>>>>>>> in the descriptor (parameter: additionalExtensions). There is no
>>>>>>>> default syntax check, since the possible arguments are of course
>>>>>>>> not known to the engine in advance. Syntax checks need to be
>>>>>>>> implemented in ITextMarkerActionExtension.createAction(),
>>>>>>>> which throws an ANTLRException. The arguments of the action are
>>>>>>>> delegated to this method, which returns the action
>>>>>>>> implementation, so there will probably be many casts and
>>>>>>>> "instanceof" checks. Language constructs like assignments
>>>>>>>> ("feature" = Type), as known from the CREATE action, are not yet supported.
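The pattern described above, where untyped arguments are checked with `instanceof`, cast by hand, and either turned into an action or rejected with an error, might look roughly like this. It is a hypothetical, self-contained sketch: `ExtensionArgsSketch`, `Action`, and `createAction(String, List<Object>)` are invented names, type arguments are simplified to plain Strings, and the real `ITextMarkerActionExtension.createAction()` has a different signature and signals errors with an `ANTLRException` rather than an `IllegalArgumentException`.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of argument checking inside an action extension.
// All names are invented; the real ITextMarkerActionExtension.createAction()
// has a different signature and throws ANTLRException instead.
class ExtensionArgsSketch {

    // Stand-in for the action object an extension returns to the engine.
    interface Action {
        String describe();
    }

    // Validates untyped arguments for a MARKREGEXP-like action.
    static Action createAction(String name, List<Object> args) {
        if (!"MARKREGEXP".equals(name)) {
            throw new IllegalArgumentException("unknown action: " + name);
        }
        if (args.size() != 3) {
            throw new IllegalArgumentException(name + " expects 3 arguments, got " + args.size());
        }
        // No generic syntax check exists, so each argument is checked and cast by hand.
        if (!(args.get(0) instanceof String) || !(args.get(1) instanceof String)
                || !(args.get(2) instanceof Integer)) {
            throw new IllegalArgumentException(name + "(type, regex, group): wrong argument types");
        }
        String type = (String) args.get(0); // simplified: a type name instead of a type expression
        String regex = (String) args.get(1);
        int group = (Integer) args.get(2);
        return () -> name + " " + type + " on group " + group + " of " + regex;
    }

    public static void main(String[] args) {
        Action a = createAction("MARKREGEXP",
                Arrays.<Object>asList("Finding", "(?i)tubular adenoma", 0));
        System.out.println(a.describe());
    }
}
```

Rejecting bad arguments while the action is being created, rather than when it runs, mirrors the idea that syntax errors surface at parse time.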
>>>>>>>>
>>>>>>>> Unfortunately, there is no automatic integration in the workbench yet.
>>>>>>>> You have to modify the BasicEngine (add the extension) in the
>>>>>>>> textmarker project yourself. The implementation of the
>>>>>>>> extension must then, of course, also be available to the workbench.
>>>>>>>>
>>>>>>>> I haven't used the language extensions since 2009 (it was a
>>>>>>>> WordNet integration), and they are not yet covered by unit tests. So
>>>>>>>> there may be some bugs due to the changes made after the
>>>>>>>> contribution to Apache UIMA. However, I will check the
>>>>>>>> functionality, add a test case and extend the documentation.
>>>>>>>>
>>>>>>>> Concerning the list of available actions: You are of course also 
>>>>>>>> welcome to create feature requests for new actions. The current 
>>>>>>>> set of actions is mainly based on my own requirements and I will 
>>>>>>>> gladly add new reasonable/generic actions (within the limits of my 
>>>>>>>> available time).
>>>>>>>>
>>>>>>>> Best,
>>>>>>>>
>>>>>>>> Peter
>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>>
>>>>>>>>> Will Thompson
>>>>>>>>>
