On 8/12/11 3:28 PM, [email protected] wrote:
> If you know the tags which are causing trouble you might just want to remove
> all tokens from your dictionary which contain them. Removing a few words will
> not make a big difference in accuracy anyway.
>
Doing it during training is not a good idea? I thought it would help other
people.
No, I don't think so, because it makes it difficult to understand what
is going on, and with the current system you really need enough training
data to cover all the tags.
If one tag is only mentioned 5 or 6 times, I doubt that accurate
detection is possible.
As said before, it might be possible to create a POS Tagger which can
deal better with less training data, but the one we have right now seems
to have its limits when you want to use a tag dict.
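
To make the suggestion concrete, here is a rough sketch of pruning the
dictionary outside of training. It is only an illustration: it assumes the
tag dict can be treated as a plain mapping from token to its allowed tags,
and the names rare_tags, prune_tag_dict and min_count are made up for this
example, not part of the OpenNLP API. The toy data is invented as well.

```python
from collections import Counter


def rare_tags(tagged_sentences, min_count):
    """Return the tags seen fewer than min_count times in the training data."""
    counts = Counter(tag for sentence in tagged_sentences for _, tag in sentence)
    return {tag for tag, n in counts.items() if n < min_count}


def prune_tag_dict(tag_dict, bad_tags):
    """Drop every token whose allowed-tag set contains one of bad_tags."""
    return {
        token: tags
        for token, tags in tag_dict.items()
        if not (set(tags) & bad_tags)
    }


# Toy training data and tag dict (hypothetical, for illustration only).
training = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("wow", "UH"), ("the", "DT"), ("bird", "NN"), ("sings", "VBZ")],
]
tag_dict = {"the": {"DT"}, "dog": {"NN"}, "wow": {"UH", "NN"}}

bad = rare_tags(training, min_count=2)   # tags with too little evidence
pruned = prune_tag_dict(tag_dict, bad)   # dictionary without those tokens
```

The pruned dictionary can then be written back in whatever format the
tagger expects. Doing this as a separate, visible step keeps it clear which
tokens were dropped and why.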
Jörn