Hi,
I am afraid I only recently added something like that for the 2.0.1
release, so it is not yet included in the 2.0.0 release. This does not
mean that I would not include the action in UIMA Ruta ;-)
Here is the excerpt from the documentation:
<section id="ugr.tools.ruta.language.regexprule">
<title>Simple Rules based on Regular Expressions</title>
<para>
In addition to the normal rules, the Ruta language includes a
simplified rule syntax for processing regular expressions.
These simple rules consist of two parts separated by
<quote>-></quote>: The left part is the regular expression
(flags: DOTALL and MULTILINE), which may contain capturing
groups. The right part defines which kind of annotations
should be created for each match of the regular expression. If a
type is given without a group index, then an annotation of that type is
created for the complete regular expression match, which
corresponds to group 0. These simple rules can be restricted to match
only within
certain annotations using the BLOCK construct, and ignore all
filtering settings.
</para>
<programlisting><![CDATA[
RegExpRule -> StringExpression "->" GroupAssignment
("," GroupAssignment)* ";"
GroupAssignment -> TypeExpression | NumberExpression "=" TypeExpression
]]></programlisting>
<para>
The following example contains a simple rule, which is able to
create annotations of two different types. It creates an annotation
of the type <quote>T1</quote> for each match of the complete
regular expression and an annotation
of the type <quote>T2</quote> for each match of the first
capturing group.
</para>
<programlisting><![CDATA["A(.*?)C" -> T1, 1 = T2;]]></programlisting>
</section>
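To make the group semantics a bit more concrete, here is a minimal Java sketch of what the example rule amounts to at the regex level. It only uses java.util.regex. The class name RegexRuleSketch, the method apply and the string form of the results are made up for illustration, and the use of find() to iterate over matches is an assumption. This is not the actual Ruta implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexRuleSketch {

    // Mimics the rule "A(.*?)C" -> T1, 1 = T2;
    // Group 0 (the complete match) becomes T1, group 1 becomes T2.
    // Results are returned as "Type[begin,end]" strings (illustrative only).
    public static List<String> apply(String text) {
        // The documentation states that DOTALL and MULTILINE are set.
        Pattern p = Pattern.compile("A(.*?)C", Pattern.DOTALL | Pattern.MULTILINE);
        Matcher m = p.matcher(text);
        List<String> spans = new ArrayList<>();
        while (m.find()) {
            spans.add("T1[" + m.start(0) + "," + m.end(0) + "]");
            spans.add("T2[" + m.start(1) + "," + m.end(1) + "]");
        }
        return spans;
    }

    public static void main(String[] args) {
        // Because of DOTALL, the second match spans the line break.
        for (String span : apply("xxABCxxA\nBCxx")) {
            System.out.println(span);
        }
    }
}
```

For the input above this yields T1[2,5], T2[3,4], T1[7,11] and T2[8,10]. The offsets are character positions, with the end offset exclusive.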
On 02.05.2013 19:06, William Karl Thompson wrote:
I forgot to mention, the numeric argument in the proposed MARKREGEXP action
indicates which capturing group of the regular expression is used to
generate the region for the annotation of the specified type.
-----Original Message-----
From: William Karl Thompson
Sent: Thursday, May 02, 2013 12:02 PM
To: [email protected]
Subject: RE: Extending TextMarker with new actions
Peter,
Thanks for helping me to get going on this, it now works like a charm! Have
been able to generate extensions and have them be recognized by the Eclipse IDE
as per your instructions. Very nice!
In the process of doing this, I do have an idea for a possibly useful action to
be added to the current set. The basic idea is to implement functionality similar
to that found in the RegularExpressionAnnotator that is one of the UIMA addons:
http://uima.apache.org/sandbox.html#regex.annotator
This allows you to define a set of regular expression matches, and to mark an
annotation on the region covered by the match, restricted if desired by a
capturing group within the regular expression. The way I implemented it
experimentally was like the following:
Sentence{->MARKREGEXP(TypeExpression, "(?i)(ascending colon) polyps", 1)};
NP{PARTOF(FindingsSection)->MARKREGEXP(TypeExpression, "(?i)tubular adenoma", 0)};
The key thing is that the regular expression matching uses the equivalent
of java.util.regex.Matcher.find(), unlike the current implementation of the
REGEXP condition, which uses matches():
http://docs.oracle.com/javase/1.5.0/docs/api/java/util/regex/Matcher.html#find()
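To illustrate the difference, here is a small Java sketch that contrasts the two Matcher methods on the first example pattern. The class FindVsMatches, its helper methods and the sample sentence are made up for this mail. It only shows the java.util.regex behavior, not how TextMarker evaluates REGEXP internally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindVsMatches {

    // True only if the pattern matches the entire input.
    public static boolean wholeMatch(String regex, String text) {
        return Pattern.compile(regex).matcher(text).matches();
    }

    // Returns the text of the given capturing group for the first match
    // found anywhere in the input, or null if there is no match.
    public static String firstGroup(String regex, String text, int group) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        String sentence = "Two Ascending Colon polyps were removed.";
        String regex = "(?i)(ascending colon) polyps";

        // matches() fails because the sentence contains more than the pattern.
        System.out.println(wholeMatch(regex, sentence));
        // find() locates the pattern inside the sentence; group 1 is the
        // region a MARKREGEXP call with argument 1 would cover.
        System.out.println(firstGroup(regex, sentence, 1));
        // Group 0 is the complete match.
        System.out.println(firstGroup(regex, sentence, 0));
    }
}
```

With matches() the whole covered text has to fit the pattern, so one would need to pad the expression with ".*" on both sides, whereas find() locates the expression anywhere inside the matched annotation.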
Anyway, thanks again for your help getting this all working.
Cheers,
Will
________________________________________
From: Peter Klügl [[email protected]]
Sent: Monday, April 29, 2013 4:20 PM
To: [email protected]
Subject: Re: Extending TextMarker with new actions
Hi,
On 29.04.2013 20:22, William Karl Thompson wrote:
Hi Peter,
I've updated and built the TextMarker projects, but now I'm spinning my wheels
a bit trying to install the updated TextMarker Workbench feature from the
projects. Could you give me a tip on how to do that? This isn't something I've
ever done before, and I'm not having much success at the moment.
There are different ways. You could either just build the jars and put them in
the dropins folder of your eclipse installation (with no textmarker installed)
- not really recommended. Or, you could build the update site, which can be
used to install the feature and plugins. The pom of the update site project
(was textmarker-eclipse-update-site) has two important properties:
item-maven-release-version and item-eclipse-release-version. If you want to
build an update site using the SNAPSHOT artifacts, then you need to adapt these
values, e.g., to 2.0.1-SNAPSHOT and 2.0.1.SNAPSHOT. The normal process is to
install everything and then package the update site.
You also have to include your extensions somehow, e.g., by extending the update
site (and feature) or by copying the built plugin to the dropins folder.
When I try new stuff, I always start an Eclipse Application using my textmarker
workspace. Here, no installation is needed. I could also build a textmarker
update site with the fixed extensions for you, but unfortunately not before
Thursday.
I am currently in the process of renaming all textmarker projects (the new name
is UIMA Ruta). You have to be careful which revision you are using to build the
projects right now, because I wasn't able to finish the renaming today, and I
haven't tested the new update site yet. The renaming started with revision
1477012. Sorry for the bad timing.
Best,
Peter
Many thanks,
Will
-----Original Message-----
From: William Karl Thompson
Sent: Friday, April 26, 2013 3:40 PM
To: [email protected]
Subject: RE: Extending TextMarker with new actions
Hi Peter,
Thanks very much, I will try this out!
Best,
Will
-----Original Message-----
From: Peter Klügl [mailto:[email protected]]
Sent: Friday, April 26, 2013 4:30 AM
To: [email protected]
Subject: Re: Extending TextMarker with new actions
Hi,
On 25.04.2013 19:16, William Karl Thompson wrote:
Hi Peter,
Many thanks! I was just about to try it out before reading your latest
email. Should I check out the latest trunk version from the svn repository
tomorrow?
I fixed most problems and committed the changes together with two
example projects (in
https://svn.apache.org/repos/asf/uima/sandbox/textmarker/trunk/example-projects):
textmarker-ep-example-extensions contains two parts: the implementation of an
action (ExampleAction) and the integration in the IDE. That is why
it is a Maven eclipse-plugin project.
ExtensionsExample is a simple textmarker project, which uses the extension.
The syntax check in the Workbench is not yet correctly integrated. It will take
a while until I will be able to write the documentation for the extensions.
Just let me know, if any problems occur.
Best,
Peter
Btw: I am also involved in a project about information extraction in
clinical texts. That's a quite active area ;-)
In terms of feature requests, I appreciate your willingness to consider
extensions. My strategy will be to try accomplishing a few tasks first, to see
what can be abstracted that is of sufficient generality. As background info, I
am creating some NLP applications for clinical text using cTAKES, and I think
TextMarker is a nice option to have for rule-based alternatives to certain
tasks (like relating two annotations to each other, DiseaseDisorder and
AnatomicalLocation in the same sentence). The current cTAKES relation extractor
is based on machine learning, and requires an annotated corpus for training,
whereas sometimes it's just easier to create a set of rules.
Cheers,
Will
-----Original Message-----
From: Peter Klügl [mailto:[email protected]]
Sent: Thursday, April 25, 2013 10:49 AM
To: [email protected]
Subject: Re: Extending TextMarker with new actions
Hi,
I checked the language extensions and unfortunately they do not work right now.
There are some small bugs, but they will be fixed tomorrow.
Best,
Peter
On 25.04.2013 11:37, Peter Klügl wrote:
Hi,
On 25.04.2013 03:29, William Karl Thompson wrote:
Hello,
(My apologies, I mistakenly sent this to the dev list initially)
I'm very interested in using the TextMarker project, but the
current set of action types doesn't quite do what I need. I found
references to an extension mechanism, have also found the
ITextMarkerActionExtension interface in the source code. I also
found the antlr grammar and lexer files where the TextMarker
language is defined, which appears to be where new action type
names are to be added. So I surmise the steps to add new actions
are to
1. Add the desired action signature to the antlr grammar
2. Define an implementation of ITextMarkerActionExtension that
implements the functionality.
Is there an easier way to do this? My concern is that I need to
modify TextMarker source files (the grammar and lexer files), which
would be overwritten on any updated version of TextMarker.
This should be possible without changing any textmarker code.
There is a generic parsing rule in the grammar, which creates an
external action using the set of ITextMarkerExtension mentioned in
the descriptor (parameter: additionalExtensions). There is no
default syntax check since the possible arguments are of course not
yet known by the engine. Syntax checks need to be implemented in
ITextMarkerActionExtension.createAction(), which throws an
ANTLRException. The arguments of the action are delegated to this
method, which returns the action implementation, so there will
probably be many casts and "instanceof" checks. Language constructs
like assignments ("feature" = Type), known from the CREATE action, are
not yet supported.
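As a rough illustration of the kind of checks meant here, the following Java sketch validates untyped arguments with instanceof before casting them. Everything in it is hypothetical: the Action interface, the createAction signature and the action name DEMO are stand-ins and not the actual ITextMarkerActionExtension API. IllegalArgumentException is used in place of ANTLRException to keep the example self-contained.

```java
import java.util.Arrays;
import java.util.List;

public class CreateActionSketch {

    // Hypothetical stand-in for the action object an extension returns.
    public interface Action {
        String describe();
    }

    // Hypothetical factory in the spirit of createAction(): the arguments
    // arrive untyped, so the extension has to do its own syntax check.
    public static Action createAction(String name, List<Object> args) {
        if (args.size() != 2) {
            throw new IllegalArgumentException(
                    name + " expects 2 arguments, got " + args.size());
        }
        Object first = args.get(0);
        Object second = args.get(1);
        if (!(first instanceof String)) {
            throw new IllegalArgumentException(name + ": argument 1 must be a type name");
        }
        if (!(second instanceof Integer)) {
            throw new IllegalArgumentException(name + ": argument 2 must be a number");
        }
        // The casts are safe only because of the instanceof checks above.
        final String type = (String) first;
        final int group = (Integer) second;
        return () -> name + "(" + type + ", " + group + ")";
    }

    public static void main(String[] args) {
        Action ok = createAction("DEMO", Arrays.<Object>asList("T1", 1));
        System.out.println(ok.describe());
        try {
            createAction("DEMO", Arrays.<Object>asList(1, "T1"));
        } catch (IllegalArgumentException e) {
            // Wrong argument order is rejected by the hand-written check.
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```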
Unfortunately, there is no automatic integration in the workbench yet.
You have to modify the BasicEngine (add the extension) in the
textmarker project yourself. The implementation of the extension
then of course also needs to be available to the workbench.
I haven't used the language extensions since 2009 (it was a wordnet
integration) and they are not yet covered by unit tests. So, there
may be some bugs due to the changes after the contribution to
Apache UIMA. However, I will check the functionality, add a test
case and extend the documentation.
Concerning the list of available actions: You are of course also
welcome to create feature requests for new actions. The current set
of actions is mainly based on my own requirements and I will gladly
add new reasonable/generic actions (within the limits of my available time).
Best,
Peter
Thanks!
Will Thompson