Hi all (and especially the good folks working on TextMarker in the sandbox),

1. I am interested in implementing the type of text simplification rules set 
out in this paper [1].
2. I would prefer to use TextMarker (and its language) natively in UIMA rather 
than use the UIMA<->GATE integration and JAPE rules.
3. I have cloned TextMarker from the repo and have configured an analysis 
engine descriptor to run TextMarkerEngine using custom rules.
4. I have switched off the TextMarkerEngine seed annotations, as I am testing 
on XMI files that have been pre-annotated with the ClearTK type systems (up to 
and including TreebankNodes; OpenNLP is used under the hood, if that's of 
interest).
5. Things are building and unit tests are running fine on simple rules. Yay! 
Good work, guys :)

Now I am focussing on customising the rules for the text simplification 
application. I have been studying the TextMarker language documentation here 
[2] as well as TextMarker's unit tests in the sandbox to get things working so 
far, but am now asking for your help to complete one of the example rules I'd 
like to implement. This is the example from [1]:

Input (original):
"The jury also commented on the Fulton court, which has been under fire for its 
practices in the appointment of appraisers, guardians and administrators."
Output (simplified):
"The jury also commented on the Fulton court." "The Fulton court has been under 
fire for its practices in the appointment of appraisers, guardians and 
administrators."

Rule I want to implement in the TextMarker language:
V W:NP_ant, Rel Clause(X:Rel Pr Y), Z.   ->   V W Z. W Y.
which can be interpreted as "If a sentence consists of any text V followed by 
the antecedent noun phrase W, then a relative clause enclosed in commas 
(consisting of a relative pronoun X and a sequence of words Y), and then a 
sequence of words Z, the embedded clause can be made into a new sentence with 
W as the subject NP".
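
To make the intended transformation concrete, here is a minimal sketch of the 
rule in plain Python. It is not TextMarker syntax and not a proposed 
implementation; it only illustrates the V/W/X/Y/Z split on the example above. 
The function names (split_relative_clause, render) and the index arguments are 
hypothetical, and the sketch assumes the token offsets of W and of the end of 
the relative clause are already known (e.g. from the TreebankNode annotations). 
It also treats a clause that runs to the end of the sentence as having an empty 
Z, which the example needs even though the rule is written with a closing comma.

```python
def render(words):
    """Join tokens into a sentence: no space before punctuation,
    capitalise the first letter, and end with a full stop.
    Assumes `words` is non-empty."""
    text = ""
    for tok in words:
        if tok in {",", ";", ":"} or not text:
            text += tok
        else:
            text += " " + tok
    return text[0].upper() + text[1:] + "."


def split_relative_clause(tokens, np_start, np_end, clause_end):
    """Apply  V W, X Y, Z.  ->  V W Z. W Y.  to one tokenised sentence.

    Assumed layout (all indices are hypothetical inputs):
      tokens[np_start:np_end]      antecedent noun phrase W
      tokens[np_end]               the comma opening the relative clause
      tokens[np_end + 1]           the relative pronoun X
      tokens[np_end + 2:clause_end] the words Y
      tokens[clause_end]           closing "," or the final "."
    """
    v = tokens[:np_start]
    w = tokens[np_start:np_end]
    y = tokens[np_end + 2:clause_end]
    # Z is whatever follows the closing comma, minus the final full stop;
    # it is empty when the clause runs to the end of the sentence.
    z = tokens[clause_end + 1:-1] if tokens[clause_end] == "," else []
    return [render(v + w + z), render(w + y)]


tokens = ("The jury also commented on the Fulton court , which has been "
          "under fire for its practices in the appointment of appraisers , "
          "guardians and administrators .").split()

# W = "the Fulton court" (tokens 5..7), relative clause ends at the final "."
result = split_relative_clause(tokens, np_start=5, np_end=8, clause_end=26)
for sentence in result:
    print(sentence)
```

In a TextMarker rule, the equivalent of these index arguments would have to 
come from the matched annotations rather than from hard-coded offsets.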

So far I have gotten to this in the TextMarker language (please see below the 
contents of the rules.tm file that I'm running through TextMarker). Please note 
that this is not an attempt at the final, complete rule; it is an intermediate 
step, the furthest I've been able to get on my own that still passes unit 
tests:

===============================================
PACKAGE org.cleartk.syntax.constituent.type;

(TreebankNode{FEATURE("nodeType","NP")} 
TerminalTreebankNode{FEATURE("nodeType",",")} 
TerminalTreebankNode{FEATURE("nodeType","WDT")} 
TreebankNode{FEATURE("nodeType","S")}){->MARK(com.sap.research.bd.ta.AdjectivalOrRelativeClause)};
===============================================

Can someone complete this rule to get me closer to the example above? I lack 
understanding of the TextMarker language, but I feel that if I had an example 
of a rule slightly more complex than those in the unit tests and 
documentation, I would be able to work out the rest of the rules I want to 
implement.

Thanks very much for reading, and for any help you can provide,

Fergal Monaghan
B.E., Ph.D.   |   Research Specialist   |   SAP Research
SAP (UK) Limited   |   The Concourse   |   Queen's Road   |   Belfast BT3 9DT
T:   +44 (0)28 9078-5705   |   M:   +44 (0)79 2076-6281   |   F:   +44 (0)28 9078-5777
mailto:[email protected]   |   www.sap.com/research

[1] http://homepages.abdn.ac.uk/advaith/pages/LEC02.pdf
[2] http://tmwiki.informatik.uni-wuerzburg.de/Wiki.jsp?page=Introduction
