Hi, I'm fairly new to OpenNLP and I'm interested in the name finder functionality. The embedded organization model works reasonably well for me, but not well enough, so I decided to train my own. However, I can't get stable results. I would appreciate it if anybody could answer a few questions:
1) What are the characteristics of a good training data set? I have a training data generator that injects many different organizations into a set of predefined sentences.
2) I guess I need to implement adaptive feature generators? Is there good documentation on how to do that? Maybe even some books? A description of how the name finder works would definitely be useful.
3) Which characteristics should I base the choice of the number of iterations and the cutoff on?
4) Can I train a model for several languages at once?

Any other suggestions or pointers are highly appreciated.

Thanks a lot in advance,
Vyacheslav
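
P.S. To make question 1 concrete, here is a minimal sketch of the kind of template-based generator described there. It is not the actual generator: the class name, the sentence templates, and the organization names are all hypothetical placeholders. It assumes the name finder training format of one whitespace-tokenized sentence per line, with each entity wrapped in `<START:organization> ... <END>`.

```java
import java.util.ArrayList;
import java.util.List;

public class OrgTrainingDataSketch {
    // Hypothetical sentence templates; %s marks where an organization is injected.
    // Tokens (including the final period) are separated by single spaces.
    static final String[] TEMPLATES = {
        "%s announced quarterly results on Monday .",
        "She has worked at %s since 2010 ."
    };

    // Hypothetical organization names, for illustration only.
    static final String[] ORGANIZATIONS = { "Acme Corp", "Globex Inc" };

    // Builds one annotated training sentence per (template, organization) pair.
    static List<String> generate() {
        List<String> lines = new ArrayList<>();
        for (String template : TEMPLATES) {
            for (String org : ORGANIZATIONS) {
                String tagged = "<START:organization> " + org + " <END>";
                lines.add(String.format(template, tagged));
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Print one training sentence per line.
        generate().forEach(System.out::println);
    }
}
```

Each printed line is one training sentence, so two templates and two organizations yield four lines.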