On 9/28/11 12:44 PM, Riccardo Tasso wrote:
We once discussed implementing Bloom-filter-based dictionaries; maybe
that would also be an option for you.
Interesting, can you give me more insights about it?
Have a look here:
https://issues.apache.org/jira/browse/OPENNLP-88
Well, the current implementation cannot really be sub-classed. If you
need to replace it the way to go should be to implement
the TagDictionary interface yourself.
Of course, but I still have a problem. The POSTaggerME.train method
takes as a parameter a POSDictionary object, not a TagDictionary. What
should I do?
Our current code doesn't support a custom TagDictionary. You would need
to implement this train method yourself, otherwise it cannot
create your custom POSModel implementation. I am sorry that things need
to be hacked here ...
One of the issues we have with "custom" resources is that we don't yet
have a generic mechanism to create them.
Let's say you store a custom dictionary, in your own data format, in the
model; then the model loading code needs to
know how to parse this dictionary with the custom dictionary class.
One way to solve this could be to place the class name of the dictionary
class in the model itself; then the model loading code
could load that class to parse the model. A drawback of this approach is
that it might be tricky to get to work in an OSGi container,
but I guess there is a solution for this also. It just needs to be
investigated.
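
To make the class-name idea concrete, here is a minimal sketch in plain Java. It is not OpenNLP code: LoadableResource, SimpleTagDictionary, ResourceLoader and the "word tag tag ..." line format are all hypothetical names invented for illustration. It only assumes that the custom class has a public no-argument constructor and that the class name was stored next to the serialized dictionary.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical contract: a resource that knows how to parse its own format.
interface LoadableResource {
    void load(InputStream in) throws IOException;
}

// Hypothetical custom dictionary using a trivial "word tag tag ..." line format.
class SimpleTagDictionary implements LoadableResource {
    private final Map<String, String[]> tags = new HashMap<>();

    // A public no-argument constructor is required for reflective creation.
    public SimpleTagDictionary() {}

    @Override
    public void load(InputStream in) throws IOException {
        String text = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        for (String line : text.split("\n")) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length > 1) {
                // First field is the word, the remaining fields are its tags.
                tags.put(parts[0], Arrays.copyOfRange(parts, 1, parts.length));
            }
        }
    }

    public String[] getTags(String word) {
        return tags.get(word);
    }
}

// Hypothetical loader: instantiates the class named in the model and lets it
// parse its own data.
class ResourceLoader {
    static LoadableResource load(String className, InputStream data) {
        try {
            // Class.forName resolves against the caller's class loader, which
            // is the part that may not see the custom class under OSGi.
            Class<?> cls = Class.forName(className);
            LoadableResource resource = cls.asSubclass(LoadableResource.class)
                    .getDeclaredConstructor()
                    .newInstance();
            resource.load(data);
            return resource;
        } catch (ReflectiveOperationException | IOException e) {
            throw new IllegalStateException("Cannot load resource " + className, e);
        }
    }
}
```

The loader never needs to know the dictionary format; it only needs the class name and the raw bytes, which is what would make custom resources generic.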
A List<String> or a String[] will not work as keys for a Map. We have
a StringList
object for this case. It contains a sequence of String objects.
Can you explain the structure a little? I cannot follow here.
Is the key a sequence of tokens?
Yes, I mean a sequence of tokens w.r.t. what you said in a previous
message. There could be the need to assign a tag to a certain sequence
of tokens.
A sequence of tokens could also have multiple tags per token. So we
would need something like this I believe:
Map<StringList, String[][]> tags
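
A minimal sketch of that structure follows. To keep it self-contained it does not use the real StringList class; TokenSequence is a hypothetical stand-in that only provides what a map key needs (immutability plus value-based equals and hashCode), and MultiTokenTagDictionary is likewise an invented name. A plain String[] cannot serve as the key because arrays inherit identity-based equals and hashCode from Object.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for StringList: an immutable token sequence
// that compares by value, so it can be used as a Map key.
final class TokenSequence {
    private final List<String> tokens;

    TokenSequence(String... tokens) {
        this.tokens = List.of(tokens); // immutable copy of the tokens
    }

    int size() {
        return tokens.size();
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof TokenSequence
                && tokens.equals(((TokenSequence) other).tokens);
    }

    @Override
    public int hashCode() {
        return tokens.hashCode();
    }
}

// Hypothetical dictionary: maps a token sequence to the allowed tags of
// each token in it, so tagsPerToken[i] holds the tags for token i.
class MultiTokenTagDictionary {
    private final Map<TokenSequence, String[][]> tags = new HashMap<>();

    void put(TokenSequence key, String[][] tagsPerToken) {
        if (tagsPerToken.length != key.size()) {
            throw new IllegalArgumentException("need one tag array per token");
        }
        tags.put(key, tagsPerToken);
    }

    // Returns null when the sequence is not in the dictionary.
    String[][] getTags(String... tokens) {
        return tags.get(new TokenSequence(tokens));
    }
}
```

For example, the sequence "New York" could be stored with {{"NNP"}, {"NNP"}}, and a lookup with a freshly built key for the same two tokens would find it.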
Jörn