Hi,
On 06.05.2013 18:26, William Karl Thompson wrote:
> Hi Peter,
>
> I like the simplified regular expression rule syntax -- very handy. It's
> almost exactly what I wanted. However, one thing I'm wondering is how to
> create an annotation with features using such rules. I have in mind something
> like the following:
>
> "(regex string)" -> 1 = CREATE(FooType, "feat" = "bar");
>
> Here's a possible variant of the above that I can imagine would be useful
> too:
>
> "(regex) (string)" -> CREATE(FooType, "feat1" = GROUP(1), "feat2"=GROUP(2));
>
> What are your thoughts on this?
I don't think I can reuse the existing code of the CREATE action
for this, and it would also be problematic in the grammar unless I
create a new context.
What about something like:
"(regexp) (string)" -> Type1, 1 = Type2 ("feat" = 2);
This will of course not work with numeric feature values, but there
isn't an auto-cast anyway...
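To illustrate the string-only point, here is a minimal Java sketch. It is not Ruta code; the class and method names are made up for this example. A capturing group always yields text, so a numeric feature would need an explicit conversion step.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: none of these names exist in UIMA Ruta.
class GroupFeatureSketch {
    /** Returns the text of the given capturing group for the first match, or null if nothing matches. */
    static String groupText(String regex, String text, int group) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        // Group 2 would be the value assigned to the feature in a rule like the one above.
        String value = groupText("(dose) (\\d+)", "dose 40 mg", 2);
        System.out.println(value);                       // prints 40 (a String)
        // A numeric feature would need an explicit conversion such as this one.
        System.out.println(Integer.parseInt(value) + 1); // prints 41
    }
}
```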
Best,
Peter
>
> Cheers,
>
> Will
>
> -----Original Message-----
> From: William Karl Thompson
> Sent: Thursday, May 02, 2013 1:49 PM
> To: [email protected]
> Subject: RE: Extending TextMarker with new actions
>
> Many thanks, I will try it out.
>
> -----Original Message-----
> From: Peter Klügl [mailto:[email protected]]
> Sent: Thursday, May 02, 2013 12:42 PM
> To: [email protected]
> Subject: Re: Extending TextMarker with new actions
>
> On 02.05.2013 19:16, William Karl Thompson wrote:
>> I see you're way ahead of me! I'll take a look at this -- is it in the
>> latest on trunk?
> Yes, and there is also a unit test (if you are interested in some
> ready-to-work examples): org.apache.uima.ruta.RegExpRuleTest.java(.ruta,
> .txt)
>
> Peter
>
>> -----Original Message-----
>> From: Peter Klügl [mailto:[email protected]]
>> Sent: Thursday, May 02, 2013 12:14 PM
>> To: [email protected]
>> Subject: Re: Extending TextMarker with new actions
>>
>> Hi,
>>
>> oh, I am afraid I recently added something like that for the 2.0.1
>> release, not yet included in the 2.0.0 release. This does not mean
>> that I would not include the action in UIMA Ruta ;-)
>>
>> Here is the excerpt from the documentation:
>>
>> <section id="ugr.tools.ruta.language.regexprule">
>> <title>Simple Rules based on Regular Expressions</title>
>> <para>
>> The Ruta language includes, in addition to the normal rules, a
>> simplified rule syntax for processing regular expressions.
>> These simple rules consist of two parts separated by
>> <quote>-></quote>: The left part is the regular expression
>> (flags: DOTALL and MULTILINE), which may contain capturing groups.
>> The right part defines which kinds of annotations
>> should be created for each match of the regular expression. If a
>> type is given without a group index, then an annotation of that type is
>> created for the complete regular expression match, which corresponds
>> to group 0. These simple rules can be restricted to match only within
>> certain annotations using the BLOCK construct, and ignore all
>> filtering settings.
>> </para>
>>
>> <programlisting><![CDATA[
>> RegExpRule -> StringExpression "->" GroupAssignment ("," GroupAssignment)* ";"
>> GroupAssignment -> TypeExpression | NumberExpression "=" TypeExpression ]]></programlisting>
>>
>> <para>
>> The following example contains a simple rule, which is able to
>> create annotations of two different types. It creates an annotation
>> of the type <quote>T1</quote> for each match of the complete regular
>> expression and an annotation
>> of the type <quote>T2</quote> for each match of the first capturing
>> group.
>> </para>
>>
>> <programlisting><![CDATA["A(.*?)C" -> T1, 1 = T2;]]></programlisting>
>>
>>
>> </section>
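As a rough sketch of what the documented rule does, the following plain Java collects the offsets that `"A(.*?)C" -> T1, 1 = T2;` would annotate: group 0 for T1 and group 1 for T2, compiled with DOTALL and MULTILINE as the excerpt states. The class and method names are assumptions for illustration; this is not the Ruta implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: not taken from the Ruta sources.
class RegExpRuleSketch {
    /** Returns the {begin, end} offsets of the given group for every match in the text. */
    static List<int[]> spans(String regex, String text, int group) {
        Pattern p = Pattern.compile(regex, Pattern.DOTALL | Pattern.MULTILINE);
        Matcher m = p.matcher(text);
        List<int[]> result = new ArrayList<>();
        while (m.find()) {
            // start(group) is -1 when the group did not take part in the match.
            if (m.start(group) >= 0) {
                result.add(new int[] {m.start(group), m.end(group)});
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "xAbbC AdC";
        for (int[] s : spans("A(.*?)C", text, 0)) {
            System.out.println("T1 " + s[0] + "-" + s[1]); // T1 1-5, then T1 6-9
        }
        for (int[] s : spans("A(.*?)C", text, 1)) {
            System.out.println("T2 " + s[0] + "-" + s[1]); // T2 2-4, then T2 7-8
        }
    }
}
```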
>>
>>
>>
>>
>> On 02.05.2013 19:06, William Karl Thompson wrote:
>>> I forgot to mention: the numeric argument in the proposed MARKREGEXP action
>>> indicates which capturing group of the regular expression is used to
>>> generate the region for the annotation of the specified type.
>>>
>>> -----Original Message-----
>>> From: William Karl Thompson
>>> Sent: Thursday, May 02, 2013 12:02 PM
>>> To: [email protected]
>>> Subject: RE: Extending TextMarker with new actions
>>>
>>> Peter,
>>>
>>> Thanks for helping me to get going on this, it now works like a charm! I have
>>> been able to generate extensions and have them recognized by the Eclipse
>>> IDE as per your instructions. Very nice!
>>>
>>> In the process of doing this, I came up with an idea for a possibly useful
>>> action to add to the current set. The basic idea is to implement
>>> functionality similar to that found in the RegularExpressionAnnotator, which
>>> is one of the UIMA addons:
>>>
>>> http://uima.apache.org/sandbox.html#regex.annotator
>>>
>>> This allows you to define a set of regular expression matches, and to mark
>>> an annotation on the region covered by the match, restricted if desired by
>>> a capturing group within the regular expression. The way I implemented it
>>> experimentally was like the following:
>>>
>>> Sentence{->MARKREGEXP(TypeExpression, "(?i)(ascending colon) polyps", 1)};
>>> NP{PARTOF(FindingsSection)->MARKREGEXP(TypeExpression, "(?i)tubular adenoma", 0)};
>>>
>>> The key thing is that the regular expression matching uses the
>>> equivalent of java.util.regex.Matcher.find(), unlike the current
>>> implementation of the REGEXP condition, which uses matches():
>>>
>>> http://docs.oracle.com/javase/1.5.0/docs/api/java/util/regex/Matcher.html#find()
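For reference, a small Java sketch of the difference between the two `java.util.regex.Matcher` methods: `matches()` succeeds only if the whole input matches, while `find()` locates a match anywhere in it. The sample sentence and the helper names are assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper names; only the java.util.regex calls are real API.
class FindVsMatches {
    /** True only if the entire text matches the pattern (Matcher.matches()). */
    static boolean wholeInputMatches(String regex, String text) {
        return Pattern.compile(regex).matcher(text).matches();
    }

    /** Offsets {begin, end} of the given group in the first match found anywhere, or null. */
    static int[] firstGroupSpan(String regex, String text, int group) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? new int[] {m.start(group), m.end(group)} : null;
    }

    public static void main(String[] args) {
        String regex = "(?i)(ascending colon) polyps";
        String sentence = "Ascending colon polyps were removed.";
        System.out.println(wholeInputMatches(regex, sentence)); // false
        int[] span = firstGroupSpan(regex, sentence, 1);
        System.out.println(span[0] + "-" + span[1]);            // 0-15
    }
}
```

With `matches()`, the rule would only fire when the covered text consists of nothing but the pattern, which is why `find()` is the better fit for marking a region inside a sentence.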
>>>
>>> Anyway, thanks again for your help getting this all working.
>>>
>>> Cheers,
>>>
>>> Will
>>>
>>> ________________________________________
>>> From: Peter Klügl [[email protected]]
>>> Sent: Monday, April 29, 2013 4:20 PM
>>> To: [email protected]
>>> Subject: Re: Extending TextMarker with new actions
>>>
>>> Hi,
>>>
>>> On 29.04.2013 20:22, William Karl Thompson wrote:
>>>> Hi Peter,
>>>>
>>>> I've updated and built the TextMarker projects, but now I'm spinning my
>>>> wheels a bit trying to install the updated TextMarker Workbench feature
>>>> from the projects. Could you give me a tip on how to do that? This isn't
>>>> something I've ever done before, and I'm not having much success at the
>>>> moment.
>>> There are different ways. You could either just build the jars and put them
>>> in the dropins folder of your eclipse installation (with no textmarker
>>> installed) - not really recommended. Or, you could build the update site,
>>> which can be used to install the feature and plugins. The pom of the update
>>> site project (was textmarker-eclipse-update-site) has two important
>>> properties: item-maven-release-version and item-eclipse-release-version. If
>>> you want to build an update site using the SNAPSHOT artifacts, then you
>>> need to adapt these values, e.g., to 2.0.1-SNAPSHOT and 2.0.1.SNAPSHOT. The
>>> normal process is to install everything and then package the update site.
>>>
>>> You also have to include your extensions somehow, e.g., by extending the
>>> update site (and feature) or by copying the built plugin to the dropins
>>> folder.
>>>
>>> When I try new stuff, I always start an Eclipse Application using my
>>> textmarker workspace. Here, no installation is needed. I could also build a
>>> textmarker update site with the fixed extensions for you, but unfortunately
>>> not before Thursday.
>>>
>>> I am currently in the process of renaming all textmarker projects (the new
>>> name is UIMA Ruta). You have to be careful which revision you are using to
>>> build the projects right now, because I wasn't able to finish the renaming
>>> today, and I haven't tested the new update site yet. The renaming started
>>> with revision 1477012. Sorry for the bad timing.
>>>
>>> Best,
>>>
>>> Peter
>>>
>>>
>>>> Many thanks,
>>>>
>>>> Will
>>>>
>>>> -----Original Message-----
>>>> From: William Karl Thompson
>>>> Sent: Friday, April 26, 2013 3:40 PM
>>>> To: [email protected]
>>>> Subject: RE: Extending TextMarker with new actions
>>>>
>>>> Hi Peter,
>>>>
>>>> Thanks very much, I will try this out!
>>>>
>>>> Best,
>>>>
>>>> Will
>>>>
>>>> -----Original Message-----
>>>> From: Peter Klügl [mailto:[email protected]]
>>>> Sent: Friday, April 26, 2013 4:30 AM
>>>> To: [email protected]
>>>> Subject: Re: Extending TextMarker with new actions
>>>>
>>>> Hi,
>>>>
>>>> On 25.04.2013 19:16, William Karl Thompson wrote:
>>>>> Hi Peter,
>>>>>
>>>>> Many thanks! I was just about to try it out before reading your
>>>>> latest email. Should I check out the latest trunk version from the svn
>>>>> repository tomorrow?
>>>> I fixed most problems and committed the changes together with two
>>>> example projects (in
>>>> https://svn.apache.org/repos/asf/uima/sandbox/textmarker/trunk/example-projects):
>>>>
>>>> textmarker-ep-example-extensions contains two parts: the implementation of
>>>> an action (ExampleAction) and the integration in the IDE. That is why it
>>>> is a Maven eclipse-plugin project.
>>>>
>>>> ExtensionsExample is a simple textmarker project, which uses the extension.
>>>>
>>>> The syntax check in the Workbench is not yet correctly integrated. It will
>>>> take a while until I will be able to write the documentation for the
>>>> extensions. Just let me know, if any problems occur.
>>>>
>>>> Best,
>>>>
>>>> Peter
>>>>
>>>> Btw: I am also involved in a project about information extraction in
>>>> clinical texts. That's a quite active area ;-)
>>>>
>>>>> In terms of feature requests, I appreciate your willingness to consider
>>>>> extensions. My strategy will be to try accomplishing a few tasks first,
>>>>> to see what can be abstracted that is of sufficient generality. As
>>>>> background info, I am creating some NLP applications for clinical text
>>>>> using cTAKES, and I think TextMarker is a nice option to have for
>>>>> rule-based alternatives to certain tasks (like relating two annotations
>>>>> to each other, DiseaseDisorder and AnatomicalLocation in the same
>>>>> sentence). The current cTAKES relation extractor is based on machine
>>>>> learning, and requires an annotated corpus for training, whereas
>>>>> sometimes it's just easier to create a set of rules.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Will
>>>>>
>>>>> -----Original Message-----
>>>>> From: Peter Klügl [mailto:[email protected]]
>>>>> Sent: Thursday, April 25, 2013 10:49 AM
>>>>> To: [email protected]
>>>>> Subject: Re: Extending TextMarker with new actions
>>>>>
>>>>> Hi,
>>>>>
>>>>> I checked the language extensions and unfortunately they do not work
>>>>> right now. There are some small bugs, but they will be fixed tomorrow.
>>>>>
>>>>> Best,
>>>>>
>>>>> Peter
>>>>>
>>>>> On 25.04.2013 11:37, Peter Klügl wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On 25.04.2013 03:29, William Karl Thompson wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> (My apologies, I mistakenly sent this to the dev list initially)
>>>>>>>
>>>>>>> I'm very interested in using the TextMarker project, but the
>>>>>>> current set of action types doesn't quite do what I need. I found
>>>>>>> references to an extension mechanism, have also found the
>>>>>>> ITextMarkerActionExtension interface in the source code. I also
>>>>>>> found the antlr grammar and lexer files where the TextMarker
>>>>>>> language is defined, which appears to be where new action type
>>>>>>> names are to be added. So I surmise the steps to add new actions
>>>>>>> are to
>>>>>>>
>>>>>>>
>>>>>>> 1. Add the desired action signature to the antlr grammar
>>>>>>>
>>>>>>> 2. Define an implementation of ITextMarkerActionExtension that
>>>>>>> implements the functionality.
>>>>>>>
>>>>>>> Is there an easier way to do this? My concern is that I need to
>>>>>>> modify TextMarker source files (the grammar and lexer files),
>>>>>>> which would be overwritten on any updated version of TextMarker.
>>>>>> This should be possible without changing any textmarker code.
>>>>>>
>>>>>> There is a generic parsing rule in the grammar, which creates an
>>>>>> external action using the set of ITextMarkerExtension mentioned in
>>>>>> the descriptor (parameter: additionalExtensions). There is no
>>>>>> default syntax check since the possible arguments are of course
>>>>>> not yet known by the engine. Syntax checks need to be implemented
>>>>>> in the ITextMarkerActionExtension.createAction(), which throws an
>>>>>> ANTLRException. The arguments of the action are delegated to this
>>>>>> method, which returns the action implementation, so there will
>>>>>> probably be many casts and "instanceof" checks. Language
>>>>>> constructs like assignments ("feature" = Type) known by the CREATE
>>>>>> action, are not yet supported.
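A rough, self-contained Java sketch of the kind of argument checking described above. Everything here is hypothetical: the class, the `MarkRegExpAction` type, and the simplified `createAction` signature are stand-ins, not the real ITextMarkerActionExtension API, and IllegalArgumentException is used in place of the ANTLRException mentioned above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for an action extension; not the TextMarker/Ruta API.
class ExtensionArgumentCheckSketch {
    /** Hypothetical action object holding the validated arguments. */
    static final class MarkRegExpAction {
        final String typeName;
        final Pattern pattern;
        final int group;

        MarkRegExpAction(String typeName, Pattern pattern, int group) {
            this.typeName = typeName;
            this.pattern = pattern;
            this.group = group;
        }
    }

    /** Checks untyped arguments with instanceof and casts them, rejecting anything unexpected. */
    static MarkRegExpAction createAction(String name, List<?> args) {
        if (!"MARKREGEXP".equals(name)) {
            throw new IllegalArgumentException("Unknown action: " + name);
        }
        if (args.size() != 3) {
            throw new IllegalArgumentException("MARKREGEXP expects 3 arguments, got " + args.size());
        }
        if (!(args.get(0) instanceof String) || !(args.get(1) instanceof String)) {
            throw new IllegalArgumentException("Arguments 1 and 2 must be strings");
        }
        if (!(args.get(2) instanceof Integer)) {
            throw new IllegalArgumentException("Argument 3 must be a group index");
        }
        // Pattern.compile throws PatternSyntaxException (an IllegalArgumentException) for a bad regex.
        Pattern pattern = Pattern.compile((String) args.get(1));
        int group = (Integer) args.get(2);
        if (group < 0 || group > pattern.matcher("").groupCount()) {
            throw new IllegalArgumentException("No capturing group " + group + " in " + pattern);
        }
        return new MarkRegExpAction((String) args.get(0), pattern, group);
    }

    public static void main(String[] args) {
        MarkRegExpAction action =
            createAction("MARKREGEXP", Arrays.asList("Finding", "(?i)tubular adenoma", 0));
        System.out.println(action.typeName + " group " + action.group); // Finding group 0
    }
}
```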
>>>>>>
>>>>>> Unfortunately, there is no automatic integration in the workbench yet.
>>>>>> You have to modify the BasicEngine (add the extension) in the
>>>>>> textmarker project yourself. The implementation of the extension
>>>>>> then of course also needs to be available to the workbench.
>>>>>>
>>>>>> I haven't used the language extensions since 2009 (it was a WordNet
>>>>>> integration) and they are not yet covered by unit tests. So, there
>>>>>> may be some bugs due to the changes after the contribution to
>>>>>> Apache UIMA. However, I will check the functionality, add a test
>>>>>> case and extend the documentation.
>>>>>>
>>>>>> Concerning the list of available actions: You are of course also
>>>>>> welcome to create feature requests for new actions. The current
>>>>>> set of actions is mainly based on my own requirements and I will
>>>>>> gladly add new reasonable/generic actions (within the limits of my
>>>>>> available time).
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Peter
>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> Will Thompson
>>>>>>>