Many thanks!

-----Original Message-----
From: Peter Klügl [mailto:[email protected]] 
Sent: Wednesday, May 08, 2013 2:57 AM
To: [email protected]
Subject: Re: Extending TextMarker with new actions

Hi,

I will create a feature request and see when I find the time to implement it.

Best,

Peter

On 08.05.2013 04:38, William Karl Thompson wrote:
> Hi Peter,
>
> What you proposed would work fine for what I was trying to do!
>
> Cheers,
>
> Will
>
> -----Original Message-----
> From: Peter Klügl [mailto:[email protected]]
> Sent: Tuesday, May 07, 2013 3:42 AM
> To: [email protected]
> Subject: Re: Extending TextMarker with new actions
>
> Hi,
>
> On 06.05.2013 18:26, William Karl Thompson wrote:
>> Hi Peter,
>>
>> I like the simplified regular expression rule syntax -- very handy. It's 
>> almost exactly what I wanted.  However, one thing I'm wondering is how to 
>> create an annotation with features using such rules. I have in mind 
>> something like the following:
>>
>> "(regex string)" -> 1 = CREATE(FooType, "feat" = "bar");
>>
>> Here's a possible variant of the above that I can imagine would be useful 
>> too:
>>
>> "(regex) (string)" -> CREATE(FooType, "feat1" = GROUP(1), 
>> "feat2"=GROUP(2));
>>
>> What are your thoughts on this?
> I think I won't be able to use the existing code of the CREATE action for 
> this and it will also be problematic in the grammar without creating a new 
> context.
>
> What about something like:
>
> "(regexp) (string)" -> Type1, 1 = Type2 ("feat" = 2);
>
> This will of course not work with numeric feature values, but there isn't an 
> auto-cast anyway...
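
The proposed rule form can be emulated in plain Java to see what it would produce. This is only a sketch under assumptions: the `Annotation` class, the `apply` method and the sample pattern are hypothetical stand-ins, not UIMA Ruta code. It creates Type1 for the whole match, Type2 for group 1, and stores the text of group 2 as a string feature, which also shows why numeric feature values would need an explicit conversion.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupFeatureSketch {

    /** Hypothetical stand-in for an annotation with string-valued features. */
    static final class Annotation {
        final String type;
        final int begin;
        final int end;
        final Map<String, String> features = new LinkedHashMap<>();

        Annotation(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    // Emulates a rule of the proposed shape:
    //   "(regex) (regex)" -> Type1, 1 = Type2 ("feat" = 2);
    static List<Annotation> apply(String text) {
        Matcher m = Pattern.compile("(\\w+) (\\d+)mg").matcher(text);
        List<Annotation> out = new ArrayList<>();
        while (m.find()) {
            // Whole match (group 0) becomes Type1.
            out.add(new Annotation("Type1", m.start(), m.end()));
            // Group 1 becomes Type2; group 2 is copied into the feature.
            Annotation t2 = new Annotation("Type2", m.start(1), m.end(1));
            t2.features.put("feat", m.group(2)); // a String, never a number
            out.add(t2);
        }
        return out;
    }

    public static void main(String[] args) {
        for (Annotation a : apply("aspirin 81mg daily")) {
            System.out.println(a.type + " [" + a.begin + "," + a.end + ") " + a.features);
        }
    }
}
```

The feature value stays a string ("81" here), so a numeric feature would have to be parsed separately.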
>
> Best,
>
> Peter
>  
>
>> Cheers,
>>
>> Will
>>
>> -----Original Message-----
>> From: William Karl Thompson
>> Sent: Thursday, May 02, 2013 1:49 PM
>> To: [email protected]
>> Subject: RE: Extending TextMarker with new actions
>>
>> Many thanks, I will try it.
>>
>> -----Original Message-----
>> From: Peter Klügl [mailto:[email protected]]
>> Sent: Thursday, May 02, 2013 12:42 PM
>> To: [email protected]
>> Subject: Re: Extending TextMarker with new actions
>>
>> On 02.05.2013 19:16, William Karl Thompson wrote:
>>> I see you're way ahead of me! I'll take a look at this -- is it in the 
>>> latest on trunk?
>> Yes, and there is also a unit test (if you are interested in some 
>> ready-to-work examples):
>> org.apache.uima.ruta.RegExpRuleTest.java(.ruta, .txt)
>>
>> Peter
>>
>>> -----Original Message-----
>>> From: Peter Klügl [mailto:[email protected]]
>>> Sent: Thursday, May 02, 2013 12:14 PM
>>> To: [email protected]
>>> Subject: Re: Extending TextMarker with new actions
>>>
>>> Hi,
>>>
>>> oh, I am afraid I recently added something like that for the 2.0.1 
>>> release, not yet included in the 2.0.0 release. This does not mean 
>>> that I would not include the action in UIMA Ruta ;-)
>>>
>>> Here is the excerpt of the documentation:
>>>
>>> <section id="ugr.tools.ruta.language.regexprule">
>>>       <title>Simple Rules based on Regular Expressions</title>
>>>       <para>
>>>         In addition to the normal rules, the Ruta language includes a 
>>> simplified rule syntax for processing regular expressions.
>>>         These simple rules consist of two parts separated by
>>> <quote>-></quote>: the left part is the regular expression
>>>         (flags: DOTALL and MULTILINE), which may contain capturing groups. 
>>> The right part defines which kind of annotations
>>>         should be created for each match of the regular expression. If a 
>>> type is given without a group index, then an annotation of that type is
>>>         created for the complete regular expression match, which 
>>> corresponds to group 0. These simple rules can be restricted to match only 
>>> within
>>>         certain annotations using the BLOCK construct, and they ignore all 
>>> filtering settings.
>>>       </para>
>>>
>>>       <programlisting><![CDATA[
>>> RegExpRule      -> StringExpression "->" GroupAssignment
>>>                     ("," GroupAssignment)* ";"
>>> GroupAssignment -> TypeExpression | NumberExpression "=" TypeExpression
>>> ]]></programlisting>
>>>
>>>       <para>
>>>         The following example contains a simple rule, which is able to 
>>> create annotations of two different types. It creates an annotation
>>>         of the type <quote>T1</quote> for each match of the complete 
>>> regular expression and an annotation
>>>         of the type <quote>T2</quote> for each match of the first capturing 
>>> group.
>>>       </para>
>>>
>>>       <programlisting><![CDATA["A(.*?)C" -> T1, 1 = T2;]]></programlisting>
>>>
>>>
>>>     </section>
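
As a rough illustration of what the example rule from the excerpt does, the following plain-Java sketch applies the same regular expression with the DOTALL and MULTILINE flags and reports one T1 span per complete match (group 0) and one T2 span per first capturing group. The class name, the `apply` method and the string output format are assumptions made for this sketch; it does not use the Ruta engine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExpRuleSketch {

    // Emulates the rule:  "A(.*?)C" -> T1, 1 = T2;
    // Each result is written as TYPE[begin,end] with an exclusive end offset.
    static List<String> apply(String text) {
        Pattern p = Pattern.compile("A(.*?)C", Pattern.DOTALL | Pattern.MULTILINE);
        Matcher m = p.matcher(text);
        List<String> spans = new ArrayList<>();
        while (m.find()) {
            // No group index given: the type covers the complete match (group 0).
            spans.add("T1[" + m.start() + "," + m.end() + "]");
            // "1 = T2": the type covers the first capturing group.
            spans.add("T2[" + m.start(1) + "," + m.end(1) + "]");
        }
        return spans;
    }

    public static void main(String[] args) {
        // DOTALL lets "." match the line break in the second occurrence.
        System.out.println(apply("AbC A\nC"));
        // prints [T1[0,3], T2[1,2], T1[4,7], T2[5,6]]
    }
}
```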
>>>
>>>
>>>
>>>
>>> On 02.05.2013 19:06, William Karl Thompson wrote:
>>>> I forgot to mention, the numeric argument in the proposed MARKREGEXP 
>>>> action indicates which capturing group of the regular expression is 
>>>> used to generate the region for the annotation of the specified type.
>>>>
>>>> -----Original Message-----
>>>> From: William Karl Thompson
>>>> Sent: Thursday, May 02, 2013 12:02 PM
>>>> To: [email protected]
>>>> Subject: RE: Extending TextMarker with new actions
>>>>
>>>> Peter,
>>>>
>>>> Thanks for helping me to get going on this, it now works like a charm! 
>>>> Have been able to generate extensions and have them be recognized by the 
>>>> Eclipse IDE as per your instructions. Very nice!
>>>>
>>>> In the process of doing this, I had an idea for a possibly useful 
>>>> action to be added to the current set. The basic idea is to implement 
>>>> functionality similar to that found in the RegularExpressionAnnotator, 
>>>> which is one of the UIMA addons:
>>>>
>>>> http://uima.apache.org/sandbox.html#regex.annotator
>>>>
>>>> This allows you to define a set of regular expression matches, and to mark 
>>>> an annotation on the region covered by the match, restricted if desired by 
>>>> a capturing group within the regular expression. The way I implemented it 
>>>> experimentally was like the following:
>>>>
>>>> Sentence{->MARKREGEXP(TypeExpression, "(?i)(ascending colon) polyps", 1)};
>>>> NP{PARTOF(FindingsSection)->MARKREGEXP(TypeExpression, "(?i)tubular adenoma", 0)};
>>>>
>>>> The key thing is that the regular expression matching uses the 
>>>> equivalent of java.util.regex.Matcher.find(), unlike the current 
>>>> implementation of the REGEXP condition, which uses matches():
>>>>
>>>> http://docs.oracle.com/javase/1.5.0/docs/api/java/util/regex/Matcher.html#find()
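
The difference between the two matching modes can be seen in a few lines of plain Java. This is a minimal sketch using only java.util.regex; the class and method names and the sample sentence are made up for illustration. `Matcher.find()` scans for a matching subsequence anywhere in the input, while `Matcher.matches()` succeeds only if the entire input matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindVsMatches {

    /** Collects the text of the given capturing group for every find() hit. */
    static List<String> findGroups(String regex, String text, int group) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) { // advances to the next matching subsequence
            hits.add(m.group(group));
        }
        return hits;
    }

    /** True only if the whole text matches the regular expression. */
    static boolean matchesWhole(String regex, String text) {
        return Pattern.compile(regex).matcher(text).matches();
    }

    public static void main(String[] args) {
        String sentence = "Two ascending colon polyps were removed.";
        String regex = "(?i)(ascending colon) polyps";
        System.out.println(matchesWhole(regex, sentence));  // prints false
        System.out.println(findGroups(regex, sentence, 1)); // prints [ascending colon]
    }
}
```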
>>>>
>>>> Anyway, thanks again for your help getting this all working.
>>>>
>>>> Cheers,
>>>>
>>>> Will
>>>>
>>>> ________________________________________
>>>> From: Peter Klügl [[email protected]]
>>>> Sent: Monday, April 29, 2013 4:20 PM
>>>> To: [email protected]
>>>> Subject: Re: Extending TextMarker with new actions
>>>>
>>>> Hi,
>>>>
>>>> On 29.04.2013 20:22, William Karl Thompson wrote:
>>>>> Hi Peter,
>>>>>
>>>>> I've updated and built the TextMarker projects, but now I'm spinning my 
>>>>> wheels a bit trying to install the updated TextMarker Workbench feature 
>>>>> from the projects. Could you give me a tip on how to do that? This isn't 
>>>>> something I've ever done before, and I'm not having much success at the 
>>>>> moment.
>>>> There are different ways. You could either just build the jars and put 
>>>> them in the dropins folder of your eclipse installation (with no 
>>>> textmarker installed) - not really recommended. Or, you could build the 
>>>> update site, which can be used to install the feature and plugins. The pom 
>>>> of the update site project (was textmarker-eclipse-update-site) has two 
>>>> important properties: item-maven-release-version and 
>>>> item-eclipse-release-version. If you want to build an update site using 
>>>> the SNAPSHOT artifacts, then you need to adapt these values, e.g., to 
>>>> 2.0.1-SNAPSHOT and 2.0.1.SNAPSHOT. The normal process is to install 
>>>> everything and then package the update site.
>>>>
>>>> You also have to include your extensions somehow, e.g., by extending the 
>>>> update site (and feature) or by copying the built plugin to the dropins 
>>>> folder.
>>>>
>>>> When I try new stuff, I always start an Eclipse Application using my 
>>>> textmarker workspace. Here, no installation is needed. I could also build 
>>>> a textmarker update site with the fixed extensions for you, but 
>>>> unfortunately not before Thursday.
>>>>
>>>> I am currently in the process of renaming all textmarker projects (the new 
>>>> name is UIMA Ruta). You have to be careful which revision you are using to 
>>>> build the projects right now, because I wasn't able to finish the renaming 
>>>> today, and I haven't tested the new update site yet. The renaming started 
>>>> with revision 1477012. Sorry for the bad timing.
>>>>
>>>> Best,
>>>>
>>>> Peter
>>>>
>>>>
>>>>> Many thanks,
>>>>>
>>>>> Will
>>>>>
>>>>> -----Original Message-----
>>>>> From: William Karl Thompson
>>>>> Sent: Friday, April 26, 2013 3:40 PM
>>>>> To: [email protected]
>>>>> Subject: RE: Extending TextMarker with new actions
>>>>>
>>>>> Hi Peter,
>>>>>
>>>>> Thanks very much, I will try this out!
>>>>>
>>>>> Best,
>>>>>
>>>>> Will
>>>>>
>>>>> -----Original Message-----
>>>>> From: Peter Klügl [mailto:[email protected]]
>>>>> Sent: Friday, April 26, 2013 4:30 AM
>>>>> To: [email protected]
>>>>> Subject: Re: Extending TextMarker with new actions
>>>>>
>>>>> Hi,
>>>>>
>>>>> On 25.04.2013 19:16, William Karl Thompson wrote:
>>>>>> Hi Peter,
>>>>>>
>>>>>>     Many thanks! I was just about to try it out before reading your 
>>>>>> latest email. Should I check out the latest trunk version from the svn 
>>>>>> repository tomorrow?
>>>>> I fixed most problems and committed the changes together with two 
>>>>> example projects (in
>>>>> https://svn.apache.org/repos/asf/uima/sandbox/textmarker/trunk/example-projects):
>>>>>
>>>>> textmarker-ep-example-extensions contains two parts: the implementation 
>>>>> of an action (ExampleAction) and the integration in the IDE. That is 
>>>>> why it is a Maven eclipse-plugin project.
>>>>>
>>>>> ExtensionsExample is a simple textmarker project, which uses the 
>>>>> extension.
>>>>>
>>>>> The syntax check in the Workbench is not yet correctly integrated. It 
>>>>> will take a while until I will be able to write the documentation for the 
>>>>> extensions. Just let me know, if any problems occur.
>>>>>
>>>>> Best,
>>>>>
>>>>> Peter
>>>>>
>>>>> Btw: I am also involved in a project about information extraction 
>>>>> in clinical texts. That's a quite active area ;-)
>>>>>
>>>>>> In terms of feature requests, I appreciate your willingness to consider 
>>>>>> extensions. My strategy will be to try accomplishing a few tasks first, 
>>>>>> to see what can be abstracted that is of sufficient generality. As 
>>>>>> background info, I am creating some NLP applications for clinical text 
>>>>>> using cTAKES, and I think TextMarker is a nice option to have for 
>>>>>> rule-based alternatives to certain tasks (like relating two annotations 
>>>>>> to each other, DiseaseDisorder and AnatomicalLocation in the same 
>>>>>> sentence). The current cTAKES relation extractor is based on machine 
>>>>>> learning, and requires an annotated corpus for training, whereas 
>>>>>> sometimes it's just easier to create a set of rules.
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Will
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Peter Klügl [mailto:[email protected]]
>>>>>> Sent: Thursday, April 25, 2013 10:49 AM
>>>>>> To: [email protected]
>>>>>> Subject: Re: Extending TextMarker with new actions
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I checked the language extensions and unfortunately they do not work 
>>>>>> right now. There are some small bugs, but they will be fixed tomorrow.
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Peter
>>>>>>
>>>>>> On 25.04.2013 11:37, Peter Klügl wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> On 25.04.2013 03:29, William Karl Thompson wrote:
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> (My apologies, I mistakenly sent this to the dev list 
>>>>>>>> initially)
>>>>>>>>
>>>>>>>> I'm very interested in using the TextMarker project, but the 
>>>>>>>> current set of action types doesn't quite do what I need. I 
>>>>>>>> found references to an extension mechanism, have also found the 
>>>>>>>> ITextMarkerActionExtension interface in the source code. I also 
>>>>>>>> found the antlr grammar and lexer files where the TextMarker 
>>>>>>>> language is defined, which appears to be where new action type 
>>>>>>>> names are to be added. So I surmise the steps to add new 
>>>>>>>> actions are to
>>>>>>>>
>>>>>>>>
>>>>>>>> 1.       Add the desired action signature to the antlr grammar
>>>>>>>>
>>>>>>>> 2.       Define an implementation of ITextMarkerActionExtension that
>>>>>>>> implements the functionality.
>>>>>>>>
>>>>>>>> Is there an easier way to do this? My concern is that I need to 
>>>>>>>> modify TextMarker source files (the grammar and lexer files), 
>>>>>>>> which would be overwritten on any updated version of TextMarker.
>>>>>>> This should be possible without changing any textmarker code.
>>>>>>>
>>>>>>> There is a generic parsing rule in the grammar, which creates an 
>>>>>>> external action using the set of ITextMarkerExtension mentioned 
>>>>>>> in the descriptor (parameter: additionalExtensions). There is no 
>>>>>>> default syntax check since the possible arguments are of course 
>>>>>>> not yet known by the engine. Syntax checks need to be 
>>>>>>> implemented in the ITextMarkerActionExtension.createAction(), 
>>>>>>> which throws an ANTLRException. The arguments of the action are 
>>>>>>> delegated to this method, which returns the action 
>>>>>>> implementation, so there will probably be many casts and 
>>>>>>> "instanceof" checks. Language constructs like assignments 
>>>>>>> ("feature" = Type), known from the CREATE action, are not yet supported.
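
The kind of argument checking described above might look roughly like the following. This is a self-contained sketch with stand-in types only: `SyntaxCheckException`, `ExampleAction`, the action name "EXAMPLE" and the `createAction` signature are all hypothetical and are not the real ITextMarkerActionExtension interface, whose method is said to throw an ANTLRException. It only illustrates validating untyped arguments with instanceof checks and casts before building the action.

```java
import java.util.Arrays;
import java.util.List;

public class ExtensionSketch {

    /** Stand-in for the parser exception used to report syntax errors. */
    static class SyntaxCheckException extends Exception {
        SyntaxCheckException(String message) {
            super(message);
        }
    }

    /** Hypothetical action that holds its validated arguments. */
    static final class ExampleAction {
        final String typeName;
        final int index;

        ExampleAction(String typeName, int index) {
            this.typeName = typeName;
            this.index = index;
        }
    }

    /** Hypothetical factory: checks the untyped arguments, then builds the action. */
    static ExampleAction createAction(String name, List<Object> args)
            throws SyntaxCheckException {
        if (!"EXAMPLE".equals(name)) {
            throw new SyntaxCheckException("Unknown action: " + name);
        }
        if (args.size() != 2) {
            throw new SyntaxCheckException(name + " expects 2 arguments, got " + args.size());
        }
        if (!(args.get(0) instanceof String)) {
            throw new SyntaxCheckException("Argument 1 must be a type name");
        }
        if (!(args.get(1) instanceof Integer)) {
            throw new SyntaxCheckException("Argument 2 must be an integer");
        }
        return new ExampleAction((String) args.get(0), (Integer) args.get(1));
    }

    public static void main(String[] args) throws SyntaxCheckException {
        ExampleAction ok = createAction("EXAMPLE", Arrays.<Object>asList("FooType", 1));
        System.out.println(ok.typeName + " " + ok.index);
        try {
            createAction("EXAMPLE", Arrays.<Object>asList("FooType", "oops"));
        } catch (SyntaxCheckException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```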
>>>>>>>
>>>>>>> Unfortunately, there is no automatic integration in the workbench yet.
>>>>>>> You have to modify the BasicEngine (add the extension) in the 
>>>>>>> textmarker project yourself. The implementation of the 
>>>>>>> extension must of course also be available to the workbench.
>>>>>>>
>>>>>>> I haven't used the language extensions since 2009 (it was a WordNet 
>>>>>>> integration) and they are not yet covered by unit tests. So there 
>>>>>>> may be some bugs due to the changes after the contribution to 
>>>>>>> Apache UIMA. However, I will check the functionality, add a test 
>>>>>>> case and extend the documentation.
>>>>>>>
>>>>>>> Concerning the list of available actions: You are of course also 
>>>>>>> welcome to create feature requests for new actions. The current 
>>>>>>> set of actions is mainly based on my own requirements and I will 
>>>>>>> gladly add new reasonable/generic actions (within the limits of my 
>>>>>>> available time).
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> Peter
>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>> Will Thompson
>>>>>>>>
