Re: abbreviation dictionary format

Joan Codina Thu, 19 Apr 2012 23:34:15 -0700

From this text:

in an Oct. 19 review of `` The Misanthrope '' at Chicago 's Goodman Theatre ( `` Revitalized Classics Take the Stage in Windy City , '' Leisure & Arts ) , the role of Celimene , played by Kim Cattrall , was mistakenly attributed to Christina Haag .


I get:

in an Oct. 19 review of ``The Misanthrope'' at Chicago's Goodman Theatre (``Revitalized Classics Take the Stage in Windy City,'' Leisure & Arts), the role of Celimene, played by Kim Cattrall, was mistakenly attributed to Christina Haag.

So the processing is correct, but the <SPLIT> tags are missing, for example at "Haag." or "Chicago's".

I wonder whether I am missing a parameter or need a different dictionary.
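Because the detokenized output above still contains every token, the missing <SPLIT> tags can in principle be recovered by aligning the original token sequence with the detokenized line: wherever two adjacent tokens are not separated by whitespace, a <SPLIT> belongs between them. The following is a minimal Python sketch of that alignment, not an OpenNLP API. The function name add_split_tags is hypothetical, and it assumes the tokens appear in order in the detokenized text with nothing but whitespace between them.

```python
def add_split_tags(tokens, detokenized):
    """Rebuild a <SPLIT>-annotated line from a token list and its detokenized text.

    Hypothetical helper: assumes every token occurs, in order, in `detokenized`,
    separated only by optional whitespace.
    """
    pos = 0
    out = []
    for i, tok in enumerate(tokens):
        # Skip any whitespace that precedes the next token.
        start = pos
        while start < len(detokenized) and detokenized[start].isspace():
            start += 1
        if not detokenized.startswith(tok, start):
            raise ValueError(f"token {tok!r} not found at offset {start}")
        if i > 0:
            # Whitespace in the detokenized text -> ordinary space;
            # tokens glued together -> <SPLIT>.
            out.append(" " if start > pos else "<SPLIT>")
        out.append(tok)
        pos = start + len(tok)
    return "".join(out)


tokens = "at Chicago 's Goodman Theatre".split()
print(add_split_tags(tokens, "at Chicago's Goodman Theatre"))
```

Run on the fragment above, it prints "at Chicago<SPLIT>'s Goodman Theatre". The alignment only works if the detokenizer never alters token text (for example, by normalizing quotes); otherwise the startswith check fails and the function raises.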


On 04/19/2012 07:11 PM, Jörn Kottmann wrote:

On 04/19/2012 06:20 PM, Joan Codina wrote:
Then, with the sentences having all tokens separated by spaces, I need to merge the words by adding <space>, but I don't know how to do it with the DictionaryDetokenizer:

./opennlp DictionaryDetokenizer ../models/en-detokenizer.xml < ../models/CoNLL2009-ST-English-train.sent

It merges the sentences but does not add the <space>.
It should insert <SPLIT> tags for certain spaces, so the tokenizer can learn
that there is something to split. Input should be one sentence per line.
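To make the expected training format concrete: tokenizer training data marks token boundaries that are not spaces with <SPLIT>, e.g. "Haag<SPLIT>.". The sketch below produces such lines from one-sentence-per-line, space-separated input using a tiny hard-coded rule table. The sets MERGE_LEFT and MERGE_RIGHT and the functions to_training_line and convert are hypothetical stand-ins for a real detokenizer dictionary such as en-detokenizer.xml. They do not reproduce its contents or OpenNLP's detokenization logic (for instance, straight double quotes that alternate between opening and closing are not handled).

```python
# Hypothetical rule table, illustrative only (not the contents of en-detokenizer.xml).
MERGE_LEFT = {".", ",", ")", "''", "'s", "?", "!", ":", ";"}  # attach to previous token
MERGE_RIGHT = {"(", "``"}                                     # attach to next token


def to_training_line(sentence):
    """Turn one whitespace-tokenized sentence into a <SPLIT>-annotated line."""
    tokens = sentence.split()
    out = []
    for i, tok in enumerate(tokens):
        if i > 0:
            prev = tokens[i - 1]
            # Glue this token to the previous one if either rule says so.
            attach = tok in MERGE_LEFT or prev in MERGE_RIGHT
            out.append("<SPLIT>" if attach else " ")
        out.append(tok)
    return "".join(out)


def convert(lines):
    """One sentence per input line in, one annotated line per sentence out."""
    return [to_training_line(line) for line in lines if line.strip()]


print(to_training_line("at Chicago 's Goodman Theatre ( `` Revitalized"))
```

This prints "at Chicago<SPLIT>'s Goodman Theatre (<SPLIT>``<SPLIT>Revitalized": the space before "(" survives, while "'s", "``" and the word after "``" are glued with <SPLIT>, which is the kind of annotation that was reported missing at "Chicago's" and "Haag.".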

What output do you get?

Jörn
