From this text:
"
in an Oct. 19 review of `` The Misanthrope '' at Chicago 's Goodman
Theatre ( `` Revitalized Classics Take the Stage in Windy City , ''
Leisure & Arts ) , the role of Celimene , played by Kim Cattrall , was
mistakenly attributed to Christina Haag .
"
I get:
"
in an Oct. 19 review of ``The Misanthrope'' at Chicago's Goodman Theatre
(``Revitalized Classics Take the Stage in Windy City,'' Leisure & Arts),
the role of Celimene, played by Kim Cattrall, was mistakenly attributed
to Christina Haag.
"
So the processing is correct, but the <SPLIT> tags are missing, for
example at "Haag." or "Chicago's".
I wonder whether a parameter is missing or whether I need another dictionary.
On 04/19/2012 07:11 PM, Jörn Kottmann wrote:
On 04/19/2012 06:20 PM, Joan Codina wrote:
Then, with the sentences that have all tokens separated by spaces, I need to
merge the words, adding <space>, but I don't know how to do that with
the DictionaryDetokenizer
./opennlp DictionaryDetokenizer ../models/en-detokenizer.xml < ../models/CoNLL2009-ST-English-train.sent
as it merges the sentences but does not add the <space>.
It should insert <SPLIT> tags for certain spaces, so the tokenizer can
learn that there is something to split. Input should be one sentence per line.
What output do you get?
Jörn
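The idea behind the <SPLIT> tags is that tokenizer training data needs a marker wherever two tokens were written together in the original text. The following is a hypothetical, self-contained Java sketch of that idea. It is not OpenNLP's DictionaryDetokenizer: the operation names and the small rule map are assumptions that only cover the tokens in the example sentence, and a real dictionary such as en-detokenizer.xml defines its own rules. It applies the same rules twice, once with an empty marker (which produces merged text like the output above) and once with "<SPLIT>".

```java
import java.util.Map;

// Hypothetical sketch, NOT the OpenNLP API: shows how detokenization
// rules can mark merge points with a split marker.
public class SplitDetokenizer {

    // Simplified stand-ins for the operations a detokenizer dictionary
    // assigns to tokens (assumed names, for illustration only).
    enum Op { MERGE_TO_LEFT, MERGE_TO_RIGHT, NO_OPERATION }

    // Assumed rule set covering only the tokens of the example sentence.
    static final Map<String, Op> RULES = Map.of(
        ".", Op.MERGE_TO_LEFT,
        ",", Op.MERGE_TO_LEFT,
        ")", Op.MERGE_TO_LEFT,
        "'s", Op.MERGE_TO_LEFT,
        "''", Op.MERGE_TO_LEFT,
        "(", Op.MERGE_TO_RIGHT,
        "``", Op.MERGE_TO_RIGHT);

    static Op opFor(String token) {
        return RULES.getOrDefault(token, Op.NO_OPERATION);
    }

    // Joins the tokens of one sentence. Where a token attaches to its left
    // neighbour, or the left neighbour attaches to the right, splitMarker
    // is written instead of a space. Pass "" for plain detokenized text or
    // "<SPLIT>" for tokenizer training data.
    static String detokenize(String[] tokens, String splitMarker) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) {
                boolean attach = opFor(tokens[i]) == Op.MERGE_TO_LEFT
                        || opFor(tokens[i - 1]) == Op.MERGE_TO_RIGHT;
                sb.append(attach ? splitMarker : " ");
            }
            sb.append(tokens[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Input: one sentence per line, tokens separated by single spaces.
        String line = "at Chicago 's Goodman Theatre ( `` Revitalized";
        String[] tokens = line.split(" ");
        System.out.println(detokenize(tokens, ""));
        System.out.println(detokenize(tokens, "<SPLIT>"));
    }
}
```

Running main prints "at Chicago's Goodman Theatre (``Revitalized" and then "at Chicago<SPLIT>'s Goodman Theatre (<SPLIT>``<SPLIT>Revitalized". The only difference between the two lines is what gets written at each merge point, which is why merged output without markers cannot be used directly as training data.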