On 1/18/12 8:35 PM, mark meiklejohn wrote:
James,

I agree the correct way is to ensure upper-case. But when you have no control over the input, things get a little more difficult.

So I may look into a training set. What is the recommended size for a training set?


In an annotation project I was doing lately, our models started to work after a couple of hundred news articles. Of course, it depends on your language, your domain, and the entities you want to detect.

To make training easier, I started working on UIMA-based annotation tooling. Let me know
if you would like to try it; any feedback is very welcome.

Jörn

