On 3/12/2013 10:22 AM, Andreas Niekler wrote:
> stehenge - blieben fre - undlicher
Andreas,
I'm not an expert on German, but in English the models are also trained to split contractions and other words into their base forms.
e.g.: You'll -split-> You 'll -meaning-> You will
Can't -split-> Can 't -meaning-> Can not
Other words may also get parsed and separated by the tokenizer.
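To illustrate, here is a minimal sketch of this kind of contraction splitting. It is not the tokenizer's actual implementation -- just a hypothetical regex-based version that splits at the apostrophe, matching the examples above:

```python
import re

# Common English clitics that tokenizers split off as separate tokens.
# (Hypothetical list for illustration, not an exhaustive one.)
CONTRACTION = re.compile(r"(\w+)('ll|'re|'ve|'s|'m|'d|'t)\b", re.IGNORECASE)

def tokenize(text):
    # Insert a space before the clitic, then split on whitespace,
    # so "You'll" becomes ["You", "'ll"] and "Can't" becomes ["Can", "'t"].
    text = CONTRACTION.sub(r"\1 \2", text)
    return text.split()

print(tokenize("You'll see that it can't fail"))
```

A model trained on data tokenized this way expects its input to be split the same way, which is why a mismatch between training data and runtime tokenization causes odd splits like the German examples above.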
Did you create the training data yourself? Or was this a clean set of
data from another source?
James
