Hello there,
I have a couple of questions (please feel free to point me to any resources
that explain these and that I might have missed):
1/ With regard to the feature cut-off parameter for namefinder model training,
what is the definition of a 'feature'? Is it the entire tagged string, including
all tokens (i.e. in the tag <START> John Smith <END>, the entity 'John Smith'
is the feature)? Or is it each token inside a tag (i.e. both 'John' and 'Smith'
are features)? Or neither?
2/ Is there a way to adjust the sensitivity of the named entity recognition so
as to favour precision over recall, or vice versa? Or does the algorithm
automatically tune the NER sensitivity to maximise the F-measure?
Again, if I've missed something and you know of reading material which might
explain this, please point me to it!
Thanks for your time,
Tom