Re: Case Insensitive Name Finder - any ideas? - sorry missed the update - another ? though

mark meiklejohn Wed, 18 Jan 2012 13:13:53 -0800

Hi Jörn,

Thanks for your quick response.

Primarily the language is English, probably more American than European.

Domain-wise, apart from the NER 'date' detection, the input data is domain independent. The current implementation/model for NER date detection is very good; it is the odd edge case, such as lower-case days, that causes problems.

I could probably go to the lengths of writing a regex for it, but it would be better to have an NLP solution, as these are already scanning the input texts.
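For comparison, the regex route might look like the minimal sketch below. It assumes English day names written out in full (abbreviations such as "tues" are not handled), and the helper names `find_day_mentions` and `restore_day_case` are hypothetical. The second helper shows one way to keep the existing NER model in the loop: re-capitalize day names before the text reaches the name finder, rather than relying on the regex alone.

```python
import re

# Assumption: only full English day names are of interest.
DAY_NAMES = ("monday", "tuesday", "wednesday", "thursday",
             "friday", "saturday", "sunday")

# \b on both sides stops partial hits such as "sundays";
# IGNORECASE makes "monday", "Monday" and "MONDAY" all match.
DAY_PATTERN = re.compile(r"\b(?:" + "|".join(DAY_NAMES) + r")\b", re.IGNORECASE)


def find_day_mentions(text):
    """Hypothetical helper: return (start, end, matched_text) for each day name."""
    return [(m.start(), m.end(), m.group(0)) for m in DAY_PATTERN.finditer(text)]


def restore_day_case(text):
    """Hypothetical helper: rewrite each day name as "Monday"-style capitalization,
    so a case-sensitive model sees the form it was trained on."""
    return DAY_PATTERN.sub(lambda m: m.group(0).capitalize(), text)


if __name__ == "__main__":
    print(find_day_mentions("see you on monday or Friday"))
    print(restore_day_case("meet on tuesday, not SUNDAY"))
```

This only covers the seven day names; lower-case month names and other date expressions would still reach the model unchanged.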


Your UIMA-based annotation tooling sounds interesting and worth a look.

Thanks

Mark

On 18/01/2012 21:05, Jörn Kottmann wrote:

On 1/18/12 8:35 PM, mark meiklejohn wrote:

James,

I agree the correct way is to ensure upper-case. But when you have no
control over input it makes things a little more difficult.

So, I may look at a training set. What is the recommended size of a
training set?


In an annotation project I was doing lately, our models started to work
after a couple of hundred news articles. It of course depends on your
language, domain and the entities you want to detect.
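If you go the training-set route, one cheap way to make a model less sensitive to case is data augmentation: add a lower-cased copy of each annotated sentence. The sketch below assumes the OpenNLP name finder training format (one whitespace-tokenized sentence per line, entities wrapped in `<START:type> ... <END>`); the helper names are hypothetical, and whether the extra variants actually help depends on your data, so it is worth checking against a held-out set.

```python
import re

# Entity markup tokens in the assumed OpenNLP training format.
TAG = re.compile(r"^<START:[^>]+>$|^<END>$")


def lowercase_variant(line):
    """Hypothetical helper: lower-case every token except the <START:...>/<END> markup."""
    return " ".join(tok if TAG.match(tok) else tok.lower() for tok in line.split())


def augment(lines):
    """Hypothetical helper: emit each sentence, followed by its lower-cased copy
    when that copy differs. Blank lines are kept (assumed here to mark
    document boundaries)."""
    out = []
    for line in lines:
        tokens = line.split()
        if not tokens:
            out.append("")
            continue
        original = " ".join(tokens)
        out.append(original)
        lower = lowercase_variant(original)
        if lower != original:
            out.append(lower)
    return out


if __name__ == "__main__":
    for row in augment(["The meeting is on <START:date> Monday <END> .", "", "ok ."]):
        print(row)
```

Keeping the tags untouched matters: lower-casing `<START:date>` or `<END>` along with the words would break the annotation.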

To make training easier I started to work on UIMA-based annotation
tooling; let me know if you would like to try that. Any feedback is
very welcome.

Jörn
