Re: Problems training my own sentence splitter, with dictionary

Jörn Kottmann Wed, 28 Sep 2011 03:56:15 -0700

On 9/28/11 12:44 PM, Riccardo Tasso wrote:

We once discussed implementing Bloom filter based dictionaries; maybe that would also be an option for you.
Interesting, can you give me more insights about it?


Have a look here:
https://issues.apache.org/jira/browse/OPENNLP-88
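
To give a rough idea of what a Bloom filter based dictionary could look like, here is a minimal sketch. It is not taken from the issue above or from the OpenNLP code base: the class name BloomFilterDictionary, its methods, and the choice of hash functions are all assumptions made for illustration. Such a structure can only answer "definitely not in the dictionary" or "probably in the dictionary", so it could suit a plain word list but not a dictionary that has to return tags.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Hypothetical sketch of a Bloom filter backed word list; not an OpenNLP class. */
final class BloomFilterDictionary {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    BloomFilterDictionary(int size, int hashCount) {
        if (size <= 0 || hashCount <= 0) {
            throw new IllegalArgumentException("size and hashCount must be positive");
        }
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    /** Sets one bit per hash function for the given token. */
    void add(String token) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(index(token, i));
        }
    }

    /** Returns false if the token was never added; true means "probably added". */
    boolean mightContain(String token) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(token, i))) {
                return false;
            }
        }
        return true;
    }

    // Double hashing: derive the i-th bit position from two base hashes.
    private int index(String token, int i) {
        int h1 = token.hashCode();
        int h2 = fnv1a(token);
        return Math.floorMod(h1 + i * h2, size);
    }

    // 32-bit FNV-1a over the UTF-8 bytes, used as the second base hash.
    private static int fnv1a(String s) {
        int hash = 0x811c9dc5;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            hash ^= (b & 0xff);
            hash *= 0x01000193;
        }
        return hash;
    }
}
```

The memory use is fixed by the size argument regardless of how long the entries are, at the price of occasional false positives.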

Well, the current implementation cannot really be subclassed. If you need to replace it, the way to go should be to implement
the TagDictionary interface yourself.
Of course, but I still have a problem. The POSTaggerME.train method takes as a parameter a POSDictionary object, not a TagDictionary. What should I do?

Our current code doesn't support a custom TagDictionary. You would need to implement this train method yourself, otherwise it cannot create your custom POSModel implementation. I am sorry that things need to be hacked here ...

One of the issues we have with "custom" resources is that we don't yet have a generic mechanism to create them. Let's say you store a custom dictionary, in your own data format, in the model; then the model loading code needs to know how to parse this dictionary with the custom dictionary class.

One way to solve this could be to place the class name of the dictionary class in the model itself; the model loading code could then load that class to parse the dictionary. A drawback of this approach is that it might be tricky to get to work in an OSGi container, but I guess there is a solution for this also. It just needs to be investigated.
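
The class name idea could be sketched roughly as follows. Nothing here is existing OpenNLP API: CustomDictionary, LineDictionary and DictionaryLoader are hypothetical names, and the one-entry-per-line format is only an assumption to keep the example short. The loader is handed the class name that would have been stored in the model, together with the stream holding the dictionary data.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical contract every custom dictionary would have to implement. */
interface CustomDictionary {
    void read(InputStream in) throws IOException;
    boolean contains(String token);
}

/** Hypothetical custom format: one entry per line, UTF-8 encoded. */
final class LineDictionary implements CustomDictionary {
    private final Set<String> entries = new HashSet<>();

    // A no-argument constructor is required so the loader can instantiate the class.
    public LineDictionary() {
    }

    @Override
    public void read(InputStream in) throws IOException {
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        String line;
        while ((line = reader.readLine()) != null) {
            if (!line.isEmpty()) {
                entries.add(line);
            }
        }
    }

    @Override
    public boolean contains(String token) {
        return entries.contains(token);
    }
}

/** Hypothetical loader: resolves the class by the name stored in the model. */
final class DictionaryLoader {
    static CustomDictionary load(String className, InputStream data)
            throws ReflectiveOperationException, IOException {
        // Class.forName(String) resolves the name with the caller's class loader.
        Class<? extends CustomDictionary> type =
                Class.forName(className).asSubclass(CustomDictionary.class);
        CustomDictionary dictionary = type.getDeclaredConstructor().newInstance();
        dictionary.read(data);
        return dictionary;
    }
}
```

The Class.forName call is where OSGi is likely to get in the way: it uses the class loader of the calling class, and in an OSGi container a dictionary class living in another bundle may not be visible to that loader.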

A List<String> or a String[] will not work as keys for a Map. We have a StringList
object for this case. It contains a sequence of String objects.

Can you explain the structure a little? I cannot follow here.
Is the key a sequence of tokens?
Yes, I mean a sequence of tokens w.r.t. what you said in a previous message. There could be the need to assign a tag to a certain sequence of tokens.

A sequence of tokens could also have multiple tags per token. So we would need something like this, I believe:

Map<StringList, String[][]> tags
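
As a self-contained sketch of that structure: TokenSequence below is a hypothetical stand-in for StringList so the example compiles without OpenNLP on the classpath, and SequenceTagDictionary with its methods is likewise an assumption, not existing API. A plain String[] is not usable as the key because arrays compare by identity, so the key type has to define equals and hashCode over its contents.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical immutable key type, standing in for a StringList-like class. */
final class TokenSequence {
    private final String[] tokens;

    TokenSequence(String... tokens) {
        // Defensive copy so later changes to the caller's array cannot alter the key.
        this.tokens = tokens.clone();
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof TokenSequence
                && Arrays.equals(tokens, ((TokenSequence) other).tokens);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(tokens);
    }
}

/** Hypothetical dictionary mapping a token sequence to the allowed tags of each token. */
final class SequenceTagDictionary {
    private final Map<TokenSequence, String[][]> tags = new HashMap<>();

    /** tagsPerToken[i] holds the allowed tags for tokens[i]. */
    void put(String[] tokens, String[][] tagsPerToken) {
        if (tokens.length != tagsPerToken.length) {
            throw new IllegalArgumentException("one tag array per token is required");
        }
        tags.put(new TokenSequence(tokens), tagsPerToken);
    }

    /** Returns the tags per token, or null if the sequence is unknown. */
    String[][] getTags(String... tokens) {
        return tags.get(new TokenSequence(tokens));
    }
}
```

With this, put(new String[] {"New", "York"}, new String[][] {{"NNP"}, {"NNP", "NN"}}) stores one tag for the first token and two for the second, and getTags("New", "York") finds the entry even though a new key object is built for the lookup.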

Jörn
