On 8/12/11 4:25 AM, [email protected] wrote:
If the text I am processing contains any occurrence of a verb in the
present tense, second person singular, it will crash the tagger!
This should be fixed now: if there are any tags in the dict which are not
maxent model outcomes, the model package validation code will refuse to
load it. So now it at least fails fast.
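
To illustrate what such a fail-fast check amounts to, here is a minimal
sketch in plain Java. It does not use the OpenNLP API: the class
TagDictValidator, the method validate, and the map/set representation of
the tag dictionary and the model outcomes are all assumptions made for
illustration only.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of OpenNLP.
public class TagDictValidator {

    // Throws as soon as the dictionary contains a tag the model can never predict.
    static void validate(Map<String, List<String>> tagDict, Set<String> modelOutcomes) {
        for (Map.Entry<String, List<String>> entry : tagDict.entrySet()) {
            for (String tag : entry.getValue()) {
                if (!modelOutcomes.contains(tag)) {
                    throw new IllegalArgumentException(
                        "Tag '" + tag + "' for word '" + entry.getKey()
                        + "' is not a model outcome");
                }
            }
        }
    }

    public static void main(String[] args) {
        // Made-up tags: "V2S" stands in for a tag the model never saw in training.
        Set<String> outcomes = Set.of("N", "V");
        Map<String, List<String>> dict = Map.of("amas", List.of("V2S"));
        try {
            validate(dict, outcomes);
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Failing at load time like this is preferable to crashing later in the
middle of tagging, because the error points directly at the offending
dictionary entry.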
To fix that I am thinking about optionally filtering the dictionary entries
according to the known outcomes, which will only be available after the
model has been trained by our training tool or by the cross validator. So
after training we could iterate over the entries and remove the tags that
are unknown to the model. But I am not sure if it is the best approach.
You can easily iterate over the training data, and create a set which
contains all tags which are in the model and then use this set to
create/filter your tag dict.
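
A rough sketch of that idea in plain Java, without the OpenNLP classes.
It assumes the training data is available as lines of "word_tag" tokens
and that the tag dict can be viewed as a map from word to allowed tags;
TagDictFilter, collectTags and filter are hypothetical names, and the
tags in the example are made up.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of OpenNLP.
public class TagDictFilter {

    // Collects every tag that occurs in sentences of "word_tag word_tag ..." form.
    static Set<String> collectTags(List<String> sentences) {
        Set<String> tags = new HashSet<>();
        for (String sentence : sentences) {
            for (String token : sentence.trim().split("\\s+")) {
                int sep = token.lastIndexOf('_');
                // Skip tokens without both a word part and a tag part.
                if (sep > 0 && sep < token.length() - 1) {
                    tags.add(token.substring(sep + 1));
                }
            }
        }
        return tags;
    }

    // Returns a copy of the dictionary that keeps only tags seen in training.
    static Map<String, List<String>> filter(Map<String, List<String>> tagDict,
                                            Set<String> knownTags) {
        Map<String, List<String>> filtered = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : tagDict.entrySet()) {
            List<String> kept = new ArrayList<>();
            for (String tag : entry.getValue()) {
                if (knownTags.contains(tag)) {
                    kept.add(tag);
                }
            }
            // Assumption: a word left with no tags is dropped from the dictionary.
            if (!kept.isEmpty()) {
                filtered.put(entry.getKey(), kept);
            }
        }
        return filtered;
    }

    public static void main(String[] args) {
        Set<String> known = collectTags(List.of("the_DET dog_N barks_V"));
        Map<String, List<String>> dict = new LinkedHashMap<>();
        dict.put("barks", List.of("V", "V2S"));
        dict.put("amas", List.of("V2S"));
        System.out.println(filter(dict, known));
    }
}
```

Dropping a word whose tags are all unknown is a choice made in this
sketch; whether that or keeping an empty entry is right depends on how
the tagger treats words missing from the dictionary.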
Jörn