Hello, it was a clean set that I just annotated with the <SPLIT> tags.
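For example, a line of the annotated data looks like this (a made-up sentence, assuming the usual OpenNLP TokenSample convention where <SPLIT> marks a token boundary that has no whitespace in the raw text):

    Wir sind stehengeblieben<SPLIT>, weil der Tag freundlicher wurde<SPLIT>.

Note that stehengeblieben and freundlicher carry no <SPLIT> tag themselves; only the punctuation is split off.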
And the German root bases are not right for the examples I posted. I used 500 iterations; could it be an overfitting problem? (A sketch of how I would retrain with fewer iterations is at the end of this mail.) Thanks for your help.

On 13.03.2013 02:38, James Kosin wrote:
> On 3/12/2013 10:22 AM, Andreas Niekler wrote:
>> stehenge - blieben
>> fre - undlicher
> Andreas,
>
> I'm not an expert on German, but in English the models are also trained
> on splitting contractions and other words into their root bases, i.e.:
>
>   You'll -split-> You 'll -meaning-> You will
>   Can't  -split-> Can 't  -meaning-> Can not
>
> Other words may also get parsed and separated by the tokenizer.
>
> Did you create the training data yourself? Or was this a clean set of
> data from another source?
>
> James

--
Andreas Niekler, Dipl. Ing. (FH)
NLP Group | Department of Computer Science
University of Leipzig
Johannisgasse 26 | 04103 Leipzig
mail: [email protected]
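For reference, here is a minimal sketch of retraining with a lower iteration count (assuming the OpenNLP 1.5 API; train.split and de-token.bin are made-up file names):

    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.InputStreamReader;
    import java.io.OutputStream;

    import opennlp.tools.tokenize.TokenSample;
    import opennlp.tools.tokenize.TokenSampleStream;
    import opennlp.tools.tokenize.TokenizerME;
    import opennlp.tools.tokenize.TokenizerModel;
    import opennlp.tools.util.ObjectStream;
    import opennlp.tools.util.PlainTextByLineStream;
    import opennlp.tools.util.TrainingParameters;

    public class TrainDeTokenizer {
        public static void main(String[] args) throws Exception {
            // One <SPLIT>-annotated sentence per line, UTF-8.
            ObjectStream<String> lines = new PlainTextByLineStream(
                    new InputStreamReader(new FileInputStream("train.split"), "UTF-8"));
            ObjectStream<TokenSample> samples = new TokenSampleStream(lines);

            // Fewer iterations and a higher feature cutoff make the model
            // generalize more; try 100/5 instead of 500 iterations.
            TrainingParameters params = new TrainingParameters();
            params.put(TrainingParameters.ITERATIONS_PARAM, Integer.toString(100));
            params.put(TrainingParameters.CUTOFF_PARAM, Integer.toString(5));

            TokenizerModel model = TokenizerME.train("de", samples, true, params);

            OutputStream out = new FileOutputStream("de-token.bin");
            model.serialize(out);
            out.close();
            samples.close();
        }
    }

If the odd splits persist even with far fewer iterations, overfitting is probably not the cause and the training data itself would be the next thing to check.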
