On 3/13/11 12:54 AM, Radu Simionescu wrote:
Hello
I am building a POS tagger for Romanian for my dissertation. I want to be
able to restrict the outcomes even more than just using a dictionary. I want to
use some rules for disambiguation, based on the context. This would allow me to
use a smaller corpus, and also to fix consistent output mistakes.
So I want to be able to give the postagger the possible set of outcomes for
each word from the input, separately. So, since the training of a model doesn't
really use the POS dictionary, I figured I could make this tagger by making
small modifications to the API, because the dictionary can change from one
sentence/word to the other. Please let me know if I am wrong.
There is no out-of-the-box support for this, but I believe it should be
easy to implement. All you need to do is write a custom sequence validator
which does what you described above.
Just have a look at the POSTaggerME class; you need to modify the
constructor to give it a custom feature generator. We should open a Jira
issue and extend our API to pass in a sequence validator object.
Jörn
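
A minimal sketch of the per-token sequence validator idea discussed above, for illustration only. It does not use the OpenNLP API directly: the TagSequenceValidator interface below is a hypothetical stand-in, loosely modeled on the kind of callback a beam search would consult before accepting a candidate tag, and the names PerTokenTagValidator and AllowedTagsValidator, the tokens, and the tag names are all assumptions. Wiring such an object into POSTaggerME would still need the constructor change mentioned in the reply.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PerTokenTagValidator {

    // Hypothetical stand-in for a sequence validator callback (not the real OpenNLP type).
    public interface TagSequenceValidator {
        // i: index of the token being tagged; previousTags: tags already chosen for tokens 0..i-1.
        boolean validSequence(int i, String[] tokens, String[] previousTags, String candidateTag);
    }

    // Accepts a candidate tag only if it is in the allowed set for that token position.
    public static final class AllowedTagsValidator implements TagSequenceValidator {
        private final List<Set<String>> allowedTags;

        // allowedTags.get(i) holds the permitted tags for token i; null or empty means "no restriction".
        public AllowedTagsValidator(List<Set<String>> allowedTags) {
            this.allowedTags = allowedTags;
        }

        @Override
        public boolean validSequence(int i, String[] tokens, String[] previousTags, String candidateTag) {
            if (i < 0 || i >= allowedTags.size()) {
                return true; // no information for this position, so do not restrict it
            }
            Set<String> allowed = allowedTags.get(i);
            return allowed == null || allowed.isEmpty() || allowed.contains(candidateTag);
        }
    }

    public static void main(String[] args) {
        String[] tokens = {"tok0", "tok1"}; // placeholder tokens
        // The sets would come from the dictionary plus the context rules, built per sentence.
        List<Set<String>> allowed = Arrays.asList(
                new HashSet<>(Arrays.asList("NOUN", "VERB")), // tok0: ambiguous
                null);                                        // tok1: unrestricted
        TagSequenceValidator validator = new AllowedTagsValidator(allowed);

        System.out.println(validator.validSequence(0, tokens, new String[0], "NOUN"));
        System.out.println(validator.validSequence(0, tokens, new String[0], "ADJ"));
        System.out.println(validator.validSequence(1, tokens, new String[] {"NOUN"}, "ADJ"));
    }
}
```

Because the allowed sets are passed per sentence rather than read from a fixed dictionary, a new validator can be built for each input, which is what the original question asks for.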