Re: TextMarker language workthrough for text simplification example?

Peter Klügl Mon, 19 Nov 2012 06:57:13 -0800

Hi Fergal,

Thanks for your interest and for giving TextMarker a try.

Interesting task. I haven't used TextMarker for this yet; the closest was maybe anonymization or a parser for nominal phrases.

The first problem I see has more to do with UIMA than with TextMarker. The processed text may not be changed by the analysis engines. Therefore, you have to store the changes that should be performed somehow as feature structures/annotations in the CAS. Some other analysis engine can then create a new view with the changed document.
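To make that idea concrete, here is a minimal sketch in plain Python. It does not use the UIMA API; `Replacement` and `build_changed_text` are hypothetical names chosen only for illustration. The original text is never modified. Each intended change is recorded as a stand-off span with its replacement string, and a separate step builds the new document text, which in UIMA would go into a new view.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replacement:
    """Hypothetical stand-off record: replace original[begin:end] with text."""
    begin: int
    end: int
    text: str


def build_changed_text(original, replacements):
    """Return a new string with all replacements applied; original is untouched."""
    out = []
    cursor = 0
    for r in sorted(replacements, key=lambda r: r.begin):
        if r.begin < cursor:
            raise ValueError("overlapping replacements")
        out.append(original[cursor:r.begin])  # unchanged text before the span
        out.append(r.text)                    # replacement for the span
        cursor = r.end
    out.append(original[cursor:])             # unchanged tail
    return "".join(out)


original = "The court, which is old, sits here."
clause = ", which is old,"
begin = original.index(clause)
changes = [Replacement(begin, begin + len(clause), "")]
print(build_changed_text(original, changes))  # The court sits here.
```

Keeping the changes as spans over the unmodified text means several rules can contribute edits independently before anything is rewritten.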

I have to take a closer look at your example before I can give you some advice :-)


Best,

Peter

PS: I hope the state of the documentation will improve soon.

On 19.11.2012 13:45, Monaghan, Fergal wrote:

I've attached here the descriptor ("TextSimplifier.xml": configuration for TextMarkerEngine), the test input data ("random01.txt.xmi": Cleartk [OpenNLP] annotated), the rules file ("rules.tm": with 1 rule, my first partial attempt at the text simplification process) and the current output ("1.xmi": one additional tag has been created by the rule), if this helps,
Thanks again,

Fergal.

*From:*[email protected]
*Sent:* 19 November 2012 09:56
*To:* '[email protected]'
*Subject:* TextMarker language workthrough for text simplification example?
Hi all (and especially the good folks working on TextMarker in the sandbox),
1. I am interested in implementing the type of text simplification rules set out in this paper [1].
2. I would prefer to use TextMarker (and its language) natively in UIMA than use the UIMA<->GATE integration and JAPE rules.
3. I have cloned TextMarker from the repo and have configured an analysis engine descriptor to run TextMarkerEngine using custom rules.
4. I have switched off the TextMarkerEngine seed annotations as I am testing on pre-processed XMI files that have been pre-annotated with the Cleartk type systems (up to and including TreebankNodes... OpenNLP used under the hood if that's of interest).
5. Things are building and unit tests running fine on simple rules. Yay! Good work guys :)
Now I am focussing on customising the rules for the text simplification application. I have been studying the TextMarker language documentation here [2] as well as TextMarker's unit tests in the sandbox to get things working so far, but am now asking for your help to complete one of the example rules I'd like to implement. This is the example from [1]:
Input (original):
"The jury also commented on the Fulton court, which has been under fire for its practices in the appointment of appraisers, guardians and administrators."
Output (simplified):
"The jury also commented on the Fulton court." "The Fulton court has been under fire for its practices in the appointment of appraisers, guardians and administrators."
Rule I want to implement in the TextMarker language:

V W:NP_ant, Rel Clause(X:Rel Pr Y), Z.  ->  V W Z. W Y.
which can be interpreted as "If a sentence consists of any text V followed by the antecedent noun phrase W, a relative clause (consisting of a relative pronoun X and a sequence of words Y) enclosed in commas and a sequence of words Z, then the embedded clause can be made into a new sentence with W as the subject NP".
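The rewriting step of this rule can be sketched in a few lines of plain Python, assuming the pieces V, W, Y and Z have already been identified as strings (finding them is the hard part and is not shown here). The function name `split_relative_clause` is made up for illustration, and the capitalisation and full-stop handling are simplifying assumptions.

```python
def split_relative_clause(v, w, y, z=""):
    """Apply 'V W, Rel Clause(X Y), Z. -> V W Z. W Y.' to already-extracted pieces.

    The relative pronoun X is dropped, so it is not a parameter.
    """
    def sentence(s):
        s = s.strip()
        return s[0].upper() + s[1:] + "."  # capitalise first letter, add full stop

    main = " ".join(part for part in (v, w, z) if part)  # V W Z
    extracted = f"{w} {y}"                               # W Y
    return sentence(main), sentence(extracted)


first, second = split_relative_clause(
    v="The jury also commented on",
    w="the Fulton court",
    y="has been under fire for its practices in the appointment of "
      "appraisers, guardians and administrators",
)
print(first)
print(second)
```

In the example sentence the relative clause runs to the end of the sentence, so Z is empty and the first output sentence is just V followed by W.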
So far I have gotten to this in the TextMarker language (please see below the contents of my rules.tm file that I'm running through TextMarker). Please note this itself is not an attempt at the final complete rule, but some intermediate attempt that is the furthest I've been able to get on my own which still passes unit tests:
===============================================

PACKAGE org.cleartk.syntax.constituent.type;
(TreebankNode{FEATURE("nodeType","NP")}
 TerminalTreebankNode{FEATURE("nodeType",",")}
 TerminalTreebankNode{FEATURE("nodeType","WDT")}
 TreebankNode{FEATURE("nodeType","S")}){->MARK(com.sap.research.bd.ta.AdjectivalOrRelativeClause)};
===============================================
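For readers unfamiliar with sequence rules, the following plain-Python sketch emulates roughly what the rule above is meant to match: a run of adjacent annotations whose nodeType values are NP, ",", WDT and S, with the whole run marked as one span. This is only an approximation under my own assumptions, not TextMarker's actual matching algorithm; `Node` and `match_sequence` are hypothetical names, and only whitespace is allowed between neighbouring annotations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a TreebankNode annotation."""
    node_type: str
    begin: int
    end: int


def skip_ws(text, pos):
    """Advance pos past any whitespace in text."""
    while pos < len(text) and text[pos].isspace():
        pos += 1
    return pos


def match_sequence(text, nodes, pattern):
    """Return (begin, end) spans where nodes of the pattern's types are adjacent."""
    spans = []

    def extend(pos, idx, start):
        if idx == len(pattern):
            spans.append((start, pos))  # whole pattern matched
            return
        pos = skip_ws(text, pos)
        for n in nodes:
            if n.node_type == pattern[idx] and n.begin == pos:
                extend(n.end, idx + 1, start)

    for n in nodes:
        if n.node_type == pattern[0]:
            extend(n.end, 1, n.begin)
    return spans


text = "the Fulton court, which has been under fire"


def node(node_type, covered):
    begin = text.index(covered)
    return Node(node_type, begin, begin + len(covered))


nodes = [node("NP", "the Fulton court"), node(",", ","),
         node("WDT", "which"), node("S", "has been under fire")]
print(match_sequence(text, nodes, ["NP", ",", "WDT", "S"]))  # [(0, 43)]
```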
Can someone complete this rule to get me closer to the example above? I lack understanding of the TextMarker language, but I feel that if I had an example of a rule slightly more complex than those in the unit tests/documentation, I would be able to work it out for the rest of the rules I want to implement.
Thanks very much for reading, and for any help you can provide,

*Fergal Monaghan*
B.E., Ph.D.   |   Research Specialist   |   SAP Research
*SAP (UK) Limited* | The Concourse | Queen's Road | Belfast BT3 9DT
T: +44 (0)28 9078-5705 | M: +44 (0)79 2076-6281 | F: +44 (0)28 9078-5777
mailto:[email protected] | www.sap.com/research
[1] http://homepages.abdn.ac.uk/advaith/pages/LEC02.pdf
[2] http://tmwiki.informatik.uni-wuerzburg.de/Wiki.jsp?page=Introduction
