Hi,

On 06.05.2013 18:26, William Karl Thompson wrote:
> Hi Peter,
>
> I like the simplified regular expression rule syntax -- very handy. It's 
> almost exactly what I wanted.  However, one thing I'm wondering is how to 
> create an annotation with features using such rules. I have in mind something 
> like the following:
>
> "(regex string)" -> 1 = CREATE(FooType, "feat" = "bar");
>
> Here's a possible variant of the above that  I can imagine would be useful 
> too:
>
> "(regex) (string)" -> CREATE(FooType, "feat1" = GROUP(1), "feat2"=GROUP(2));
>
> What are your thoughts on this?

I think I won't be able to use the existing code of the CREATE action
for this and it will also be problematic in the grammar without creating
a new context.

What about something like:

"(regexp) (string)" -> Type1, 1 = Type2 ("feat" = 2);

This will of course not work with numeric feature values, but there
isn't an auto-cast anyway...
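
To make the intended semantics concrete, here is a minimal plain-Java sketch of one possible reading of the proposed rule: Type2 covers group 1, and its feature "feat" receives the text of group 2. The class name, method name, map keys, and sample text are all hypothetical and not part of UIMA Ruta; the sketch only illustrates why the feature value arrives as a string.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; none of these names exist in UIMA Ruta.
class GroupFeatureSketch {

    /**
     * Assumed reading of: "(regexp) (string)" -> Type1, 1 = Type2 ("feat" = 2);
     * For the first match, returns the text covered by Type2 (group 1) and the
     * string that would be stored in its feature "feat" (group 2).
     * Returns an empty map if the regex does not occur in the text.
     */
    static Map<String, String> type2WithFeature(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        Map<String, String> result = new LinkedHashMap<>();
        if (m.find()) {
            result.put("coveredText", m.group(1));
            result.put("feat", m.group(2)); // a capturing group is always a String
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> r = type2WithFeature("(dose) (\\d+)", "dose 20 mg");
        System.out.println(r);
        // Without an auto-cast, a numeric feature needs an explicit conversion:
        int asNumber = Integer.parseInt(r.get("feat"));
        System.out.println(asNumber + 1);
    }
}
```

The explicit Integer.parseInt call marks the step that an auto-cast would otherwise have to perform.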

Best,

Peter
 

>
> Cheers,
>
> Will
>
> -----Original Message-----
> From: William Karl Thompson 
> Sent: Thursday, May 02, 2013 1:49 PM
> To: [email protected]
> Subject: RE: Extending TextMarker with new actions
>
> Many thanks, I will try it.
>
> -----Original Message-----
> From: Peter Klügl [mailto:[email protected]]
> Sent: Thursday, May 02, 2013 12:42 PM
> To: [email protected]
> Subject: Re: Extending TextMarker with new actions
>
> On 02.05.2013 19:16, William Karl Thompson wrote:
>> I see you're way ahead of me! I'll take a look at this -- is it in the 
>> latest on trunk?
> Yes, and there is also a unit test (if you are interested in some 
> ready-to-work examples): org.apache.uima.ruta.RegExpRuleTest.java(.ruta,
> .txt)
>
> Peter
>
>> -----Original Message-----
>> From: Peter Klügl [mailto:[email protected]]
>> Sent: Thursday, May 02, 2013 12:14 PM
>> To: [email protected]
>> Subject: Re: Extending TextMarker with new actions
>>
>> Hi,
>>
>> oh, I am afraid I recently added something like that for the 2.0.1 
>> release, not yet included in the 2.0.0 release. This does not mean 
>> that I would not include the action in UIMA Ruta ;-)
>>
>> Here is the excerpt from the documentation:
>>
>> <section id="ugr.tools.ruta.language.regexprule">
>>       <title>Simple Rules based on Regular Expressions</title>
>>       <para>
>>         In addition to the normal rules, the Ruta language includes a 
>> simplified rule syntax for processing regular expressions.
>>         These simple rules consist of two parts separated by
>> <quote>-></quote>: The left part is the regular expression
>>         (flags: DOTALL and MULTILINE), which may contain capturing groups. 
>> The right part defines which kind of annotations
>>         should be created for each match of the regular expression. If a 
>> type is given without a group index, then an annotation of that type is
>>         created for the complete regular expression match, which corresponds 
>> to group 0. These simple rules can be restricted to match only within
>>         certain annotations using the BLOCK construct, and ignore all 
>> filtering settings.
>>       </para>
>>
>>       <programlisting><![CDATA[
>> RegExpRule      -> StringExpression "->" GroupAssignment
>>                     ("," GroupAssignment)* ";"
>> GroupAssignment -> TypeExpression | NumberExpression "=" 
>> TypeExpression ]]></programlisting>
>>
>>       <para>
>>         The following example contains a simple rule, which is able to 
>> create annotations of two different types. It creates an annotation
>>         of the type <quote>T1</quote> for each match of the complete regular 
>> expression and an annotation
>>         of the type <quote>T2</quote> for each match of the first capturing 
>> group.
>>       </para>
>>
>>       <programlisting><![CDATA["A(.*?)C" -> T1, 1 = T2;]]></programlisting>
>>
>>
>>     </section>
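
As a rough illustration of what the documented example rule does, the following plain-Java sketch computes the character spans that T1 (group 0) and T2 (group 1) would cover. It assumes only what the excerpt states (flags DOTALL and MULTILINE, one annotation per match); the class name, method name, and output format are hypothetical, and no UIMA classes are involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; this is not how Ruta implements the rule.
class RegExpRuleSketch {

    /**
     * Mimics the rule "A(.*?)C" -> T1, 1 = T2; on plain text.
     * Returns one entry per annotation that would be created,
     * written as "Type[begin,end]" with character offsets.
     */
    static List<String> apply(String text) {
        Pattern p = Pattern.compile("A(.*?)C", Pattern.DOTALL | Pattern.MULTILINE);
        Matcher m = p.matcher(text);
        List<String> spans = new ArrayList<>();
        while (m.find()) {
            // Group 0 is the complete match, group 1 the first capturing group.
            spans.add("T1[" + m.start() + "," + m.end() + "]");
            spans.add("T2[" + m.start(1) + "," + m.end(1) + "]");
        }
        return spans;
    }

    public static void main(String[] args) {
        // prints [T1[0,3], T2[1,2], T1[4,7], T2[5,6]]
        System.out.println(apply("ABC A\nC"));
    }
}
```

The second match spans a line break, which is only possible because DOTALL lets "." match line terminators.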
>>
>>
>>
>>
>> On 02.05.2013 19:06, William Karl Thompson wrote:
>>> I forgot to mention, the numeric argument in the proposed MARKREGEXP action 
>>> indicates which capturing group is to be used from regular expression to 
>>> generate the region for the annotation of the specified type.
>>>
>>> -----Original Message-----
>>> From: William Karl Thompson
>>> Sent: Thursday, May 02, 2013 12:02 PM
>>> To: [email protected]
>>> Subject: RE: Extending TextMarker with new actions
>>>
>>> Peter,
>>>
>>> Thanks for helping me to get going on this, it now works like a charm! Have 
>>> been able to generate extensions and have them be recognized by the Eclipse 
>>> IDE as per your instructions. Very nice!
>>>
>>> In the process of doing this, I do have an idea for a possibly useful 
>>> action to be added to the current set. The basic idea is to implement 
>>> functionality similar to that found in the RegularExpressionAnnotator that 
>>> is one of the UIMA addons:
>>>
>>> http://uima.apache.org/sandbox.html#regex.annotator
>>>
>>> This allows you to define a set of regular expression matches, and to mark 
>>> an annotation on the region covered by the match, restricted if desired by 
>>> a capturing group within the regular expression. The way I implemented it 
>>> experimentally was like the following:
>>>
>>> Sentence{->MARKREGEXP(TypeExpression, "(?i)(ascending colon) polyps", 1)};
>>> NP{PARTOF(FindingsSection)->MARKREGEXP(TypeExpression,
>>> "(?i)tubular adenoma", 0)};
>>>
>>> The key thing is that the regular expression matching is using the 
>>> equivalent of java.util.regex.Matcher.find(), unlike the current 
>>> implementation of the REGEXP condition, which uses matches():
>>>
>>> http://docs.oracle.com/javase/1.5.0/docs/api/java/util/regex/Matcher.html#find()
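
The difference between the two Matcher methods can be seen in a small standalone Java sketch. It uses the regular expression from the MARKREGEXP example above; the class name, helper names, and sample sentence are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only java.util.regex is real API here.
class FindVersusMatches {

    /** True only if the pattern covers the entire input (Matcher.matches()). */
    static boolean coversWholeInput(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    /**
     * Text of the given capturing group for the first hit found anywhere
     * in the input (Matcher.find()), or null if there is no hit.
     */
    static String firstGroup(String regex, String input, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        String sentence = "Two ascending colon polyps were removed.";
        String regex = "(?i)(ascending colon) polyps";

        System.out.println(coversWholeInput(regex, sentence)); // false
        System.out.println(firstGroup(regex, sentence, 1));    // ascending colon
        System.out.println(firstGroup(regex, sentence, 0));    // ascending colon polyps
    }
}
```

With matches(), the sentence would have to consist of nothing but the matched phrase; find() locates the phrase inside a longer sentence, and the group index then selects the region to annotate.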
>>>
>>> Anyway, thanks again for your help getting this all working.
>>>
>>> Cheers,
>>>
>>> Will
>>>
>>> ________________________________________
>>> From: Peter Klügl [[email protected]]
>>> Sent: Monday, April 29, 2013 4:20 PM
>>> To: [email protected]
>>> Subject: Re: Extending TextMarker with new actions
>>>
>>> Hi,
>>>
>>> On 29.04.2013 20:22, William Karl Thompson wrote:
>>>> Hi Peter,
>>>>
>>>> I've updated and built the TextMarker projects, but now I'm spinning my 
>>>> wheels a bit trying to install the updated TextMarker Workbench feature 
>>>> from the projects. Could you give me a tip on how to do that? This isn't 
>>>> something I've ever done before, and I'm not having much success at the 
>>>> moment.
>>> There are different ways. You could either just build the jars and put them 
>>> in the dropins folder of your eclipse installation (with no textmarker 
>>> installed) - not really recommended. Or, you could build the update site, 
>>> which can be used to install the feature and plugins. The pom of the update 
>>> site project (was textmarker-eclipse-update-site) has two important 
>>> properties: item-maven-release-version and item-eclipse-release-version. If 
>>> you want to build an update site using the SNAPSHOT artifacts, then you 
>>> need to adapt these values, e.g., to 2.0.1-SNAPSHOT and 2.0.1.SNAPSHOT. The 
>>> normal process is to install everything and then package the update site.
>>>
>>> You also have to include your extensions somehow, e.g., by extending the 
>>> update site (and feature) or by copying the built plugin to the dropins 
>>> folder.
>>>
>>> When I try new stuff, I always start an Eclipse Application using my 
>>> textmarker workspace. Here, no installation is needed. I could also build a 
>>> textmarker update site with the fixed extensions for you, but unfortunately 
>>> not before Thursday.
>>>
>>> I am currently in the process of renaming all textmarker projects (the new 
>>> name is UIMA Ruta). You have to be careful which revision you are using to 
>>> build the projects right now, because I wasn't able to finish the renaming 
>>> today, and I haven't tested the new update site yet. The renaming started 
>>> with revision 1477012. Sorry for the bad timing.
>>>
>>> Best,
>>>
>>> Peter
>>>
>>>
>>>> Many thanks,
>>>>
>>>> Will
>>>>
>>>> -----Original Message-----
>>>> From: William Karl Thompson
>>>> Sent: Friday, April 26, 2013 3:40 PM
>>>> To: [email protected]
>>>> Subject: RE: Extending TextMarker with new actions
>>>>
>>>> Hi Peter,
>>>>
>>>> Thanks very much, I will try this out!
>>>>
>>>> Best,
>>>>
>>>> Will
>>>>
>>>> -----Original Message-----
>>>> From: Peter Klügl [mailto:[email protected]]
>>>> Sent: Friday, April 26, 2013 4:30 AM
>>>> To: [email protected]
>>>> Subject: Re: Extending TextMarker with new actions
>>>>
>>>> Hi,
>>>>
>>>> On 25.04.2013 19:16, William Karl Thompson wrote:
>>>>> Hi Peter,
>>>>>
>>>>>     Many thanks! I was just about to try it out before reading your 
>>>>> latest email. Should I check out the latest trunk version from the svn 
>>>>> repository tomorrow?
>>>> I fixed most problems and committed the changes together with two 
>>>> example projects (in
>>>> https://svn.apache.org/repos/asf/uima/sandbox/textmarker/trunk/example-projects):
>>>>
>>>> textmarker-ep-example-extensions contains two parts: the implementation of 
>>>> an action (ExampleAction) and the integration in the IDE. That is 
>>>> why it is a maven eclipse-plugin project.
>>>>
>>>> ExtensionsExample is a simple textmarker project, which uses the extension.
>>>>
>>>> The syntax check in the Workbench is not yet correctly integrated. It will 
>>>> take a while until I will be able to write the documentation for the 
>>>> extensions. Just let me know, if any problems occur.
>>>>
>>>> Best,
>>>>
>>>> Peter
>>>>
>>>> Btw: I am also involved in a project about information extraction in 
>>>> clinical texts. That's a quite active area ;-)
>>>>
>>>>> In terms of feature requests, I appreciate your willingness to consider 
>>>>> extensions. My strategy will be to try accomplishing a few tasks first, 
>>>>> to see what can be abstracted that is of sufficient generality. As 
>>>>> background info, I am creating some NLP applications for clinical text 
>>>>> using cTAKES, and I think TextMarker is a nice option to have for 
>>>>> rule-based alternatives to certain tasks (like relating two annotations 
>>>>> to each other, DiseaseDisorder and AnatomicalLocation in the same 
>>>>> sentence). The current cTAKES relation extractor is based on machine 
>>>>> learning, and requires an annotated corpus for training, whereas 
>>>>> sometimes it's just easier to create a set of rules.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Will
>>>>>
>>>>> -----Original Message-----
>>>>> From: Peter Klügl [mailto:[email protected]]
>>>>> Sent: Thursday, April 25, 2013 10:49 AM
>>>>> To: [email protected]
>>>>> Subject: Re: Extending TextMarker with new actions
>>>>>
>>>>> Hi,
>>>>>
>>>>> I checked the language extensions and unfortunately they do not work 
>>>>> right now. There are some small bugs, but they will be fixed tomorrow.
>>>>>
>>>>> Best,
>>>>>
>>>>> Peter
>>>>>
>>>>> On 25.04.2013 11:37, Peter Klügl wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On 25.04.2013 03:29, William Karl Thompson wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> (My apologies, I mistakenly sent this to the dev list initially)
>>>>>>>
>>>>>>> I'm very interested in using the TextMarker project, but the 
>>>>>>> current set of action types doesn't quite do what I need. I found 
>>>>>>> references to an extension mechanism, and have also found the 
>>>>>>> ITextMarkerActionExtension interface in the source code. I also 
>>>>>>> found the antlr grammar and lexer files where the TextMarker 
>>>>>>> language is defined, which appears to be where new action type 
>>>>>>> names are to be added. So I surmise the steps to add new actions 
>>>>>>> are to
>>>>>>>
>>>>>>>
>>>>>>> 1.       Add the desired action signature to the antlr grammar
>>>>>>>
>>>>>>> 2.       Define an implementation of ITextMarkerActionExtension that
>>>>>>> implements the functionality.
>>>>>>>
>>>>>>> Is there an easier way to do this? My concern is that I need to 
>>>>>>> modify TextMarker source files (the grammar and lexer files), 
>>>>>>> which would be overwritten on any updated version of TextMarker.
>>>>>> This should be possible without changing any textmarker code.
>>>>>>
>>>>>> There is a generic parsing rule in the grammar, which creates an 
>>>>>> external action using the set of ITextMarkerExtension mentioned in 
>>>>>> the descriptor (parameter: additionalExtensions). There is no 
>>>>>> default syntax check since the possible arguments are of course 
>>>>>> not yet known by the engine. Syntax checks need to be implemented 
>>>>>> in ITextMarkerActionExtension.createAction(), which throws an 
>>>>>> ANTLRException. The arguments of the action are delegated to this 
>>>>>> method, which returns the action implementation, so there will 
>>>>>> probably be many casts and "instanceof" checks. Language 
>>>>>> constructs like assignments ("feature" = Type), known by the CREATE 
>>>>>> action, are not yet supported.
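
The kind of argument checking described here might look roughly like the following self-contained Java sketch. Everything in it is a hypothetical stand-in: the class names, the createAction signature, and the use of IllegalArgumentException in place of the parser exception are assumptions for illustration, not the actual TextMarker extension API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins only; the real interface and signatures may differ.
class ExtensionArgumentCheckSketch {

    /** Hypothetical action object holding already validated arguments. */
    static final class MarkRegExpAction {
        final String typeName;
        final String regex;
        final int group;

        MarkRegExpAction(String typeName, String regex, int group) {
            this.typeName = typeName;
            this.regex = regex;
            this.group = group;
        }
    }

    /**
     * Hypothetical factory: validates the untyped arguments handed over by a
     * parser. IllegalArgumentException stands in for the parser exception
     * that the real extension method is said to throw.
     */
    static MarkRegExpAction createAction(String name, List<Object> args) {
        if (!"MARKREGEXP".equals(name)) {
            throw new IllegalArgumentException("Unknown action: " + name);
        }
        if (args.size() != 3) {
            throw new IllegalArgumentException("MARKREGEXP expects 3 arguments");
        }
        if (!(args.get(0) instanceof String)
                || !(args.get(1) instanceof String)
                || !(args.get(2) instanceof Integer)) {
            throw new IllegalArgumentException("MARKREGEXP expects (type, regex, group)");
        }
        return new MarkRegExpAction(
                (String) args.get(0), (String) args.get(1), (Integer) args.get(2));
    }

    public static void main(String[] args) {
        MarkRegExpAction ok = createAction("MARKREGEXP",
                Arrays.<Object>asList("Finding", "(?i)tubular adenoma", 0));
        System.out.println(ok.typeName + " / group " + ok.group);
    }
}
```

Because the engine cannot know the argument types in advance, all validation ends up in the factory method, which is where the casts and instanceof checks accumulate.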
>>>>>>
>>>>>> Unfortunately, there is no automatic integration in the workbench yet.
>>>>>> You have to modify the BasicEngine (add the extension) in the 
>>>>>> textmarker project yourself. The implementation of the extension 
>>>>>> must then of course also be available to the workbench.
>>>>>>
>>>>>> I haven't used the language extensions since 2009 (it was a wordnet 
>>>>>> integration), and they are not yet covered by unit tests. So there 
>>>>>> may be some bugs due to the changes after the contribution to 
>>>>>> Apache UIMA. However, I will check the functionality, add a test 
>>>>>> case and extend the documentation.
>>>>>>
>>>>>> Concerning the list of available actions: You are of course also 
>>>>>> welcome to create feature requests for new actions. The current 
>>>>>> set of actions is mainly based on my own requirements and I will 
>>>>>> gladly add new reasonable/generic actions (within the limits of my 
>>>>>> available time).
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Peter
>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> Will Thompson
>>>>>>>
