On 3/12/2013 10:22 AM, Andreas Niekler wrote:
> stehenge - blieben
> fre - undlicher
Andreas,

I'm not an expert on German, but in English the models are also trained to split contractions and other words into their root forms.

e.g.:  You'll  -split->  You 'll  -meaning->  You will
       Can't   -split->  Can 't  -meaning->  Can not

Other words may also get parsed and separated by the tokenizer.
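To illustrate what I mean, here is a minimal sketch of rule-based contraction splitting, assuming a simple regex approach (the function name and the rule set are just for illustration; a real tokenizer model learns these splits from training data rather than hard-coding them):

```python
import re

def split_contractions(text):
    # Split common English contractions off their host word,
    # e.g. "You'll" -> "You 'll", "Can't" -> "Can 't".
    # The clitic list here is a hypothetical, non-exhaustive example.
    return re.sub(r"\b([A-Za-z]+)('(?:ll|re|ve|d|s|m|t))\b",
                  r"\1 \2", text)

print(split_contractions("You'll see it can't hurt."))
# You 'll see it ca n't hurt.  (output depends on the rules chosen)
```

If the training data splits words differently than the rules above, the model will learn whatever convention the annotations follow, which is why consistency in the training set matters.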

Did you create the training data yourself? Or was this a clean set of data from another source?

James
