Hi John,

I'd be happy to explain a bit more!

The main goal of my project is to examine whether OpenNLP and our own analysis chain (based on Lucene) can benefit from each other. For example, I'm training POS models with added features from our analyzer, such as decompounded tokens. Right now I'm still experimenting with different configurations of features and training parameters. I haven't worked with MaxEnt classifiers before, so it's quite exploratory for me at this stage.
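
To make the idea of "added features from our analyzer" concrete, here is a minimal sketch of what such features might look like. It is plain Java with no OpenNLP dependency: CompoundAwareFeatures, Decompounder, and the feature names are all hypothetical, and the decompounder is only a stand-in for the Lucene-based analyzer. In a real setup, logic like this would presumably live inside a custom POSContextGenerator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Stand-alone sketch; NOT an OpenNLP class. It only mirrors the general shape of a
// context generator: token position and sequence in, feature strings out.
final class CompoundAwareFeatures {

    // Hypothetical stand-in for the Lucene-based decompounder.
    interface Decompounder {
        List<String> split(String token);
    }

    static String[] contextFor(int index, String[] tokens, String[] priorTags,
                               Decompounder decompounder) {
        List<String> features = new ArrayList<>();
        String token = tokens[index];

        // Basic lexical features: current, previous and next token.
        features.add("w=" + lower(token));
        features.add("p=" + (index > 0 ? lower(tokens[index - 1]) : "*BOS*"));
        features.add("n=" + (index < tokens.length - 1 ? lower(tokens[index + 1]) : "*EOS*"));

        // Tag assigned to the previous token, if any.
        features.add("t=" + (index > 0 ? priorTags[index - 1] : "*BOS*"));

        // Extra features from the decompounder: the last part of the compound
        // (usually the head in Dutch) and the number of parts.
        List<String> parts = decompounder.split(token);
        if (parts.size() > 1) {
            features.add("head=" + lower(parts.get(parts.size() - 1)));
            features.add("parts=" + parts.size());
        }
        return features.toArray(new String[0]);
    }

    private static String lower(String s) {
        return s.toLowerCase(Locale.ROOT);
    }
}
```

With a decompounder that splits "fietspad" into "fiets" and "pad", the features for that token would include "head=pad" and "parts=2" in addition to the lexical ones. Which of these features actually help is exactly the kind of thing that has to be found out by experiment.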

I'm training on the Dutch Alpino CoNLL data right now; I think it's the same dataset the 'regular' OpenNLP models were trained on. I also plan to use an English corpus, but I haven't really looked into that yet.

The goal is (obviously) to create the best-performing models. Further on in our analysis (keyword extraction, classification, etc.) we might then benefit from having the POS tags.

Cheers,

Vincent

On 10-01-17 16:46, John Stewart wrote:
Vincent,

Would you mind explaining in a bit more detail why you want to do this?
I'm curious about ways in which people are training their own classifiers.
What data set are you working with?

Thanks!

jds

On Tue, Jan 10, 2017 at 7:55 AM, Vincent <vincent.s...@openindex.io> wrote:

Hey William,

Excellent, that solves it! Nice example. Thanks a lot!

Cheers,

Vincent


On 09-01-17 17:51, William Colen wrote:

That is the idea of the POS Tagger Factory. You can change the context
generator using the factory.

Take a look at the method testPOSTaggerWithCustomFactory in this JUnit test:

https://github.com/apache/opennlp/blob/master/opennlp-tools/src/test/java/opennlp/tools/postag/POSTaggerFactoryTest.java

Give it a try and come back if you need help.
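
The factory idea can be sketched without OpenNLP on the classpath. The types below (ContextGenerator, TaggerFactory, SuffixTaggerFactory, Tagger) are simplified, hypothetical stand-ins and not the OpenNLP classes; they only illustrate why a final class does not need to be subclassed when it takes its context generator from a factory.

```java
// Simplified stand-ins for illustration only; these are NOT the OpenNLP classes.
interface ContextGenerator {
    String[] getContext(int index, String[] tokens, String[] priorTags);
}

// Default factory: its generator only looks at the current token.
class TaggerFactory {
    ContextGenerator getContextGenerator() {
        return (index, tokens, priorTags) -> new String[] {"w=" + tokens[index]};
    }
}

// Custom factory: overrides the factory method to return a richer generator.
class SuffixTaggerFactory extends TaggerFactory {
    @Override
    ContextGenerator getContextGenerator() {
        return (index, tokens, priorTags) -> {
            String token = tokens[index];
            String suffix = token.length() > 3 ? token.substring(token.length() - 3) : token;
            return new String[] {"w=" + token, "suf=" + suffix};
        };
    }
}

// Final, like the model class discussed in this thread: it is configured
// with a factory instead of being extended.
final class Tagger {
    private final ContextGenerator generator;

    Tagger(TaggerFactory factory) {
        this.generator = factory.getContextGenerator();
    }

    String[] featuresAt(int index, String[] tokens) {
        // No tags have been assigned yet in this toy example.
        return generator.getContext(index, tokens, new String[tokens.length]);
    }
}
```

Here new Tagger(new SuffixTaggerFactory()) picks up the custom features, while new Tagger(new TaggerFactory()) keeps the default ones. With OpenNLP itself, the custom factory would be handed to the training call, as the JUnit test referenced above does.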

William

2017-01-09 9:28 GMT-02:00 Vincent <vincent.s...@openindex.io>:

Hi William,
Thanks for the reply. I don't think I phrased my question very clearly.

I'm using OpenNLP as a Maven dependency for my own project, which is more
like a shell around OpenNLP, only calling the functions I need to
train/apply the POSTagger. I don't have the OpenNLP code in my project; I
just wanted to inherit from the code to customize/override the feature
selection. But this is problematic because, for example, POSModel is a
final class and can't be extended.

I hope that this makes sense.

Cheers,

Vincent


On 05-01-17 17:46, William Colen wrote:

Hi Vincent,
OpenNLP will take care of serializing the information necessary to call
your custom factory instead of the default one.
To verify this, add a small log message to your custom
POSContextGenerator. You should see the message both during training and
at runtime.

To train, use the training methods that accept your custom factory.
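
One way to picture how a model can call a custom factory again at runtime: the factory's class name is recorded when the model is written, and the class is instantiated by reflection when the model is loaded. The sketch below illustrates only that general mechanism. BaseFactory, LoggingFactory, ManifestDemo, and the "factory" property key are hypothetical stand-ins, and nothing here is OpenNLP's actual code.

```java
import java.util.Properties;

// Hypothetical stand-ins; NOT the OpenNLP classes.
class BaseFactory {
    String describe() {
        return "default features";
    }
}

class LoggingFactory extends BaseFactory {
    @Override
    String describe() {
        // The small log message suggested above: it shows which factory is in use.
        System.out.println("LoggingFactory in use");
        return "custom features";
    }
}

final class ManifestDemo {

    // "Training" side: remember which factory class produced the model.
    static Properties save(BaseFactory factory) {
        Properties manifest = new Properties();
        manifest.setProperty("factory", factory.getClass().getName());
        return manifest;
    }

    // "Runtime" side: re-create the factory from its recorded class name.
    static BaseFactory load(Properties manifest) {
        String className = manifest.getProperty("factory");
        try {
            return (BaseFactory) Class.forName(className)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate factory " + className, e);
        }
    }
}
```

Under this kind of scheme the custom factory class has to be on the classpath when the model is loaded, and it needs a no-argument constructor so that it can be instantiated by name.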

-- William

2017-01-05 13:53 GMT-02:00 Vincent <vincent.s...@openindex.io>:

Hi all,

I would like to be able to have a bit more influence on the features used
for POS-tagging. I suppose that feature selection happens in the
POSContextGenerator. However, I can't seem to be able to change the type
of POSContextGenerator that is being called from POSTaggerFactory: the
getPOSContextGenerator() function always returns the
DefaultPOSContextGenerator.

Looking for a way to make a custom POSContextGenerator work, I made a
custom POSTaggerFactory too, that inherits from the regular
POSTaggerFactory, but returns my own POSContextGenerator. However, for
example, a class like POSModel (in turn used by POSTaggerME) is final, and
can't be inherited from to use my custom POSTaggerFactory.

It seems I just need to copy the entire code into my own project, or is
there a better way to make this work?

Thanks in advance!

Cheers,

Vincent




