On 9/28/11 12:44 PM, Riccardo Tasso wrote:

We once discussed implementing Bloom-filter-based dictionaries; maybe that would also be an option for you.
Interesting, can you give me more insights about it?

Have a look here:
https://issues.apache.org/jira/browse/OPENNLP-88

Well, the current implementation cannot really be subclassed. If you need to replace it, the way to go is to implement
the TagDictionary interface yourself.
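
A minimal sketch of what such a custom implementation could look like. To keep it compilable on its own, the TagDictionary interface is redeclared locally with a single getTags(String) method, which is how I remember the OpenNLP interface; in real code you would implement opennlp.tools.postag.TagDictionary instead. MapTagDictionary and its put method are hypothetical names, and returning null for unknown words is an assumption about what the tagger expects.

```java
import java.util.HashMap;
import java.util.Map;

// Local stand-in for opennlp.tools.postag.TagDictionary (assumed shape),
// declared here only so the sketch compiles without OpenNLP.
interface TagDictionary {
    String[] getTags(String word);
}

// Hypothetical custom dictionary backed by a plain HashMap.
class MapTagDictionary implements TagDictionary {
    private final Map<String, String[]> tags = new HashMap<>();

    // Registers the tags allowed for a word; the array is copied defensively.
    void put(String word, String... allowedTags) {
        tags.put(word, allowedTags.clone());
    }

    // Returns the allowed tags, or null when the word is unknown
    // (assumption: null means "no restriction" for the caller).
    @Override
    public String[] getTags(String word) {
        String[] found = tags.get(word);
        return found == null ? null : found.clone();
    }
}
```

The HashMap could be swapped for any other backing store without touching the callers, since they only see getTags.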

Of course, but I still have a problem. The POSTaggerME.train method takes as a parameter a POSDictionary object, not a TagDictionary. What should I do?

Our current code doesn't support a custom TagDictionary. You would need to implement this train method yourself, otherwise it cannot create your custom POSModel implementation. I am sorry that things need to be hacked here ...

One of the issues we have with "custom" resources is that we don't yet have a generic mechanism to create them. Let's say you store a custom dictionary, in your own data format, in the model; then the model loading code needs to
know how to parse this dictionary with the custom dictionary class.

One way to solve this could be to place the class name of the dictionary class in the model itself; then the model loading code could load the class to parse the model. A drawback of this approach is that it might be tricky to get to work in an OSGi container, but I guess there is a solution for this also. It just needs to be investigated.
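
A rough sketch of the class-name idea, under stated assumptions: DictionaryParser, WordListParser and CustomResourceLoader are hypothetical names and are not part of OpenNLP, and the one-word-per-line format is made up for the example. The class name would be read from the model; here it is simply passed in.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical contract that every custom dictionary parser would implement.
interface DictionaryParser {
    Object parse(InputStream in) throws IOException;
}

// Hypothetical custom format: one dictionary entry per line.
class WordListParser implements DictionaryParser {
    @Override
    public Object parse(InputStream in) throws IOException {
        List<String> words = new ArrayList<>();
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        String line;
        while ((line = reader.readLine()) != null) {
            if (!line.isEmpty()) {
                words.add(line);
            }
        }
        return words;
    }
}

class CustomResourceLoader {
    // className is assumed to have been stored in the model next to the data.
    static Object load(String className, InputStream data) {
        try {
            // Resolve the parser class by name and check that it has the expected type.
            Class<? extends DictionaryParser> type =
                    Class.forName(className).asSubclass(DictionaryParser.class);
            // Requires a no-argument constructor on the parser class.
            DictionaryParser parser = type.getDeclaredConstructor().newInstance();
            return parser.parse(data);
        } catch (ReflectiveOperationException | IOException e) {
            throw new IllegalStateException("Cannot load resource with " + className, e);
        }
    }
}
```

Class.forName resolves the name through the class loader of the calling class, which is probably where the OSGi trouble would come from, since the custom class usually lives in a different bundle.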

A List<String> or a String[] will not work as keys for a Map. We have a StringList
object for this case. It contains a sequence of String objects.

Can you explain the structure a little? I cannot follow here.
Is the key a sequence of tokens?

Yes, I mean a sequence of tokens w.r.t. what you said in a previous message. There could be the need to assign a tag to a certain sequence of tokens.
A sequence of tokens could also have multiple tags per token. So we would need something like this, I believe:
Map<StringList, String[][]> tags
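
To make the structure concrete, here is a small sketch. TokenSequence is a hypothetical stand-in for StringList so that the example compiles without OpenNLP; the point is only that the key needs value-based equals and hashCode, which a raw String[] does not have. MultiTokenTagDictionary is likewise a made-up name.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for StringList: an immutable token sequence
// that compares by content instead of by object identity.
final class TokenSequence {
    private final String[] tokens;

    TokenSequence(String... tokens) {
        this.tokens = tokens.clone();
    }

    int size() {
        return tokens.length;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof TokenSequence
                && Arrays.equals(tokens, ((TokenSequence) o).tokens);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(tokens);
    }
}

// Maps a token sequence to the allowed tags for each of its tokens:
// tags[i] holds the candidate tags for token i.
class MultiTokenTagDictionary {
    private final Map<TokenSequence, String[][]> tags = new HashMap<>();

    void put(TokenSequence key, String[][] tagsPerToken) {
        if (tagsPerToken.length != key.size()) {
            throw new IllegalArgumentException("Need one tag array per token");
        }
        tags.put(key, tagsPerToken);
    }

    // Returns null when the sequence is not in the dictionary.
    String[][] getTags(String... tokens) {
        return tags.get(new TokenSequence(tokens));
    }
}
```

For example, put(new TokenSequence("New", "York"), new String[][] {{"NNP"}, {"NNP", "NN"}}) would allow one tag for the first token and two for the second.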

Jörn
