On 3/13/11 12:54 AM, Radu Simionescu wrote:
Hello

I am making a POS tagger for Romanian for my dissertation. I want to be
able to restrict the outcomes even more than just using a dictionary. I want to
use some rules for disambiguation, based on the context. This would allow me to
use a smaller corpus, and also to fix consistent output mistakes.

So I want to be able to give the POS tagger the possible set of outcomes for
each word from the input, separately. Since the training of a model doesn't
really use the POS dictionary, I figured I could make this parser by making
small modifications to the API, because the dictionary can change from one
sentence/word to the other. Please let me know if I am wrong.


There is no out-of-the-box support for this, but I believe it should be easy to implement. All you need to do is write a custom sequence validator which does what you described
above.

Just have a look at the POSTaggerME class; you need to modify the constructor to give it a custom feature generator. We should open a Jira issue and extend our API to pass in
a sequence validator object.
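
A minimal sketch of what such a validator could look like, in case it helps. The interface below is a local stand-in that only mirrors the shape of OpenNLP's sequence validator callback (token index, input tokens, tags chosen so far, candidate tag); it is declared here so the example compiles on its own. The class name PerTokenTagValidator and the per-token list of allowed tag sets are hypothetical, and how the validator gets wired into POSTaggerME is not shown, since that depends on the constructor change discussed above.

```java
import java.util.List;
import java.util.Set;

// Local stand-in for the validator callback (an assumption, not the library type):
// the beam search would ask, for each candidate tag, whether it may extend the sequence.
interface TagSequenceValidator {
    boolean validSequence(int i, String[] inputSequence,
                          String[] outcomesSequence, String outcome);
}

// Hypothetical validator that restricts each token position to its own set of
// allowed tags, e.g. computed per sentence by context rules before tagging.
final class PerTokenTagValidator implements TagSequenceValidator {

    // allowedTags.get(i) holds the tags permitted for token i;
    // a null or empty entry means "no restriction" for that token.
    private final List<Set<String>> allowedTags;

    PerTokenTagValidator(List<Set<String>> allowedTags) {
        this.allowedTags = allowedTags;
    }

    @Override
    public boolean validSequence(int i, String[] inputSequence,
                                 String[] outcomesSequence, String outcome) {
        // Positions without an entry are left unrestricted.
        if (i < 0 || i >= allowedTags.size()) {
            return true;
        }
        Set<String> allowed = allowedTags.get(i);
        if (allowed == null || allowed.isEmpty()) {
            return true;
        }
        // Reject any candidate tag outside the set chosen for this token.
        return allowed.contains(outcome);
    }
}
```

Because the allowed sets are passed in per instance, a new validator (or a new list) can be built for every sentence, which is what makes the restriction change from one sentence or word to the next. This sketch ignores outcomesSequence; rules that depend on the tags already assigned to earlier tokens could inspect it as well.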

Jörn
