On 9/28/11 11:34 AM, Riccardo Tasso wrote:
This isn't a bug, but why can I load a POSDictionary from an xml format which is undocumented?
We previously had a plain-text format, which was replaced by this XML format because of encoding issues. I think we will do some refactoring and redesign of the POS tagger, and then improve the POS dictionary and the other dictionaries we currently have.
There are a couple of things that could be done better: for example, when the dictionary allows only one tag for a token, we do not need to call the classifier to make a decision; the dictionary should also support token sequences; and so on.
You are welcome to submit a patch to document our POS dictionary XML format.
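To illustrate the single-tag shortcut mentioned above, here is a minimal sketch. It is not the OpenNLP implementation: the class name DictionaryShortcut, the plain Map used as a dictionary, and the Function standing in for the classifier are all assumptions made for the example.

```java
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch; none of these names are OpenNLP API.
final class DictionaryShortcut {

    /**
     * Returns a tag for the token. The classifier (a stand-in for the
     * statistical model) is consulted only when the dictionary does not
     * pin the token to exactly one tag.
     */
    static String tag(String token,
                      Map<String, String[]> dictionary,
                      Function<String, String> classifier) {
        String[] allowed = dictionary.get(token);
        if (allowed != null && allowed.length == 1) {
            // Unambiguous entry: no classification needed.
            return allowed[0];
        }
        // Unknown or ambiguous token: fall back to the classifier.
        // A real tagger would also restrict the classifier to the
        // allowed tags; that step is omitted here.
        return classifier.apply(token);
    }
}
```

With a dictionary that maps "the" to a single tag, the classifier would never be called for that token, which is where the saving would come from.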
I would prefer String[] get(String word) and void put(String word, String[] tags) methods.
For safety and thread-safety reasons, all the resources we use during tagging should be immutable. That does not mean we should not have an easy way to create these resources.
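One common way to combine an immutable resource with easy creation is a builder. The sketch below is only an assumption about how that could look, not the existing POSDictionary code: ImmutableTagDictionary and its Builder are hypothetical names, and the lookup method is called getTags simply to match the naming used in this thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not the OpenNLP POSDictionary.
final class ImmutableTagDictionary {

    private final Map<String, String[]> tags;

    private ImmutableTagDictionary(Map<String, String[]> tags) {
        this.tags = tags;
    }

    /** Returns a copy of the tags for the word, or null if it is unknown. */
    String[] getTags(String word) {
        String[] found = tags.get(word);
        // Clone so callers cannot modify the stored array.
        return found == null ? null : found.clone();
    }

    /** Mutable helper used only while the dictionary is being assembled. */
    static final class Builder {

        private final Map<String, String[]> entries = new HashMap<>();

        Builder put(String word, String... tags) {
            // Clone so later changes to the caller's array are not seen.
            entries.put(word, tags.clone());
            return this;
        }

        ImmutableTagDictionary build() {
            // Copy the map so further put calls do not affect the result.
            return new ImmutableTagDictionary(new HashMap<>(entries));
        }
    }
}
```

The put method lives only on the builder, so a finished dictionary exposes no way to change it and could be shared between tagging threads without locking.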
We have the get method, but it is called getTags.

Jörn