On 05/23/2013 02:56 PM, Яков Керанчук wrote:
Thanks for the suggestion about using my own model, I'll try it. I use the standard en-token.bin model, and the text contains mixed upper- and lower-case words.
For the English model you should use the SimpleTokenizer; the token output from the en-token.bin model is not compatible with the training data.

Jörn
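
To illustrate the difference in token output, here is a minimal, library-free sketch of character-class tokenization, which is roughly the approach SimpleTokenizer takes: a new token starts whenever the character class (letter, digit, other) changes, and whitespace only separates tokens. The class `CharClassTokenizerSketch` and its methods are hypothetical names made up for this example and are not part of OpenNLP; the boundary rules are an approximation and may differ from the library in edge cases. In OpenNLP itself the call would be `SimpleTokenizer.INSTANCE.tokenize(text)`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration only; not an OpenNLP class.
class CharClassTokenizerSketch {

    private enum CharClass { WHITESPACE, LETTER, DIGIT, OTHER }

    private static CharClass classify(char c) {
        if (Character.isWhitespace(c)) return CharClass.WHITESPACE;
        if (Character.isLetter(c)) return CharClass.LETTER;
        if (Character.isDigit(c)) return CharClass.DIGIT;
        return CharClass.OTHER;
    }

    // Splits text wherever the character class changes. Case is left
    // untouched, so mixed upper/lower-case words pass through as-is.
    static String[] tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int start = -1; // -1 means no token is currently open
        CharClass prev = CharClass.WHITESPACE;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            CharClass cur = classify(c);
            // Assumption: two different punctuation characters in a row
            // also form a boundary, so "?!" yields two tokens.
            boolean boundary = cur != prev
                    || (cur == CharClass.OTHER && c != text.charAt(i - 1));
            if (start >= 0 && boundary) {
                tokens.add(text.substring(start, i));
                start = -1;
            }
            if (start < 0 && cur != CharClass.WHITESPACE) {
                start = i;
            }
            prev = cur;
        }
        if (start >= 0) {
            tokens.add(text.substring(start));
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        for (String token : tokenize("Mr. Smith paid 42 USD.")) {
            System.out.println(token);
        }
    }
}
```

With this kind of tokenization "Mr." always becomes two tokens, "Mr" and ".", whereas a statistical tokenizer model can decide to keep an abbreviation together with its period. A mismatch of that sort between the tokenizer used at training time and the one used at run time is the incompatibility described above.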
