On 1/18/12 8:35 PM, mark meiklejohn wrote:
> James,
> I agree the correct way is to ensure upper-case. But when you have no
> control over input it makes things a little more difficult.
> So, I may look at a training set. What is the recommended size of a
> training set?
In an annotation project I worked on recently, our models started to work
after a couple of hundred news articles. Of course, it depends on your
language, domain, and the entities you want to detect.
To make training easier, I have started to work on UIMA-based annotation
tooling. Let me know if you would like to try it; any feedback is very
welcome.
Jörn