From this text:
"
in an Oct. 19 review of `` The Misanthrope '' at Chicago 's Goodman Theatre ( `` Revitalized Classics Take the Stage in Windy City , '' Leisure & Arts ) , the role of Celimene , played by Kim Cattrall , was mistakenly attributed to Christina Haag .
"

I get:
"
in an Oct. 19 review of ``The Misanthrope'' at Chicago's Goodman Theatre (``Revitalized Classics Take the Stage in Windy City,'' Leisure & Arts), the role of Celimene, played by Kim Cattrall, was mistakenly attributed to Christina Haag.
"

So the merging itself is correct, but the <SPLIT> tags are missing, for example at "Haag." or "Chicago's".
I wonder whether I am missing a parameter or need another dictionary.
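
To make the expected output concrete, here is a minimal Python sketch of a detokenizer that writes <SPLIT> wherever it removes a space. It is only an illustration: the rule sets MERGE_TO_LEFT and MERGE_TO_RIGHT and the function detokenize_with_split are hypothetical names, the rules are not the contents of en-detokenizer.xml, and this is not how OpenNLP implements the step.

```python
# Hypothetical merge rules for illustration only; they are NOT taken
# from en-detokenizer.xml.
MERGE_TO_LEFT = {".", ",", ")", "''", "'s"}   # attach to the previous token
MERGE_TO_RIGHT = {"(", "``"}                  # attach to the next token
SPLIT = "<SPLIT>"


def detokenize_with_split(tokens):
    """Join tokens, writing <SPLIT> where a space was removed."""
    out = []
    for i, tok in enumerate(tokens):
        if i > 0:
            prev = tokens[i - 1]
            # The space between prev and tok is dropped if either rule applies.
            merged = tok in MERGE_TO_LEFT or prev in MERGE_TO_RIGHT
            out.append(SPLIT if merged else " ")
        out.append(tok)
    return "".join(out)


line = "the role was mistakenly attributed to Christina Haag ."
print(detokenize_with_split(line.split()))
# prints: the role was mistakenly attributed to Christina Haag<SPLIT>.
```

With these assumed rules, "Chicago 's" would come out as "Chicago<SPLIT>'s", which is the kind of line the tokenizer trainer needs to learn where to split.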


On 04/19/2012 07:11 PM, Jörn Kottmann wrote:
On 04/19/2012 06:20 PM, Joan Codina wrote:


Then, with the sentences that have all tokens separated by spaces, I need to merge the words, adding <space>, but I don't know how to do it with the DictionaryDetokenizer:

./opennlp DictionaryDetokenizer ../models/en-detokenizer.xml <../models/CoNLL2009-ST-English-train.sent

It merges the sentences but does not add the <space>.

It should insert <SPLIT> tags for certain spaces, so the tokenizer can learn
that there is something to split. Input should be one sentence per line.

What output do you get?

Jörn
