Yes, let's see what we can do.

The name finder already supports custom feature generation, and the same
feature generation code could be reused by the POS Tagger.
This is actually already half done.

One of the current limitations is that we cannot store "custom" resources
in a model. If we specify some kind of Factory class, it would be nice if it
could help us locate the Artifact Serializer for a custom resource.

We could define one Factory class per component which is able to influence
how this component is created from the model.
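
A minimal sketch of how such a per-component factory might look, assuming the
factory class name is stored in the model as a plain property and loaded again
by reflection. All type names, method names and the "factory" key below are
hypothetical and are not part of the OpenNLP API; the point is only to show
the factory both surviving a round trip through the model configuration and
supplying a serializer for a custom resource.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical stand-in for an artifact serializer; not an OpenNLP type.
interface ResourceSerializer {
    byte[] write(Object resource);
    Object read(byte[] data);
}

// Hypothetical per-component factory, identified by its class name.
abstract class ToolFactory {
    // Assumed property key under which the class name is saved in the model.
    static final String FACTORY_KEY = "factory";

    // Serializers for custom resources, keyed by file extension.
    // The default factory contributes none.
    Map<String, ResourceSerializer> createSerializers() {
        return new HashMap<>();
    }

    // Record this factory's class name in the model configuration.
    final void writeTo(Properties manifest) {
        manifest.setProperty(FACTORY_KEY, getClass().getName());
    }

    // Recreate the factory named in the configuration. The class must be on
    // the classpath; if no name was stored, the fallback is returned.
    static ToolFactory loadFrom(Properties manifest, ToolFactory fallback)
            throws ReflectiveOperationException {
        String name = manifest.getProperty(FACTORY_KEY);
        if (name == null) {
            return fallback;
        }
        return Class.forName(name)
                .asSubclass(ToolFactory.class)
                .getDeclaredConstructor()
                .newInstance();
    }
}

// Used when the model does not name a factory.
class DefaultToolFactory extends ToolFactory {
}

// Example custom factory that knows how to (de)serialize a ".dict" resource,
// here simply treated as UTF-8 text.
class DictionaryToolFactory extends ToolFactory {
    @Override
    Map<String, ResourceSerializer> createSerializers() {
        Map<String, ResourceSerializer> serializers = super.createSerializers();
        serializers.put("dict", new ResourceSerializer() {
            @Override
            public byte[] write(Object resource) {
                return resource.toString().getBytes(StandardCharsets.UTF_8);
            }

            @Override
            public Object read(byte[] data) {
                return new String(data, StandardCharsets.UTF_8);
            }
        });
        return serializers;
    }
}
```

With something like this, training would call writeTo(...) on the chosen
factory, and loading a model would call ToolFactory.loadFrom(...) before
reading any custom resources, so that the serializers are known in time.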

What do you think?

Jörn

On Tue, Feb 7, 2012 at 2:17 PM, [email protected] <
[email protected]> wrote:

> Hi,
>
> I would like to work on that now, passing a Factory class name to the CLI
> tools and saving it to the model as a configuration.
> Do you still think it is a good idea? Or should we find a better way to
> load custom feature generators and custom sequence validators? I would like
> to do it for SentenceDetector and POS Tagger for now.
>
> Thanks,
> William
>
> On Tue, Jun 21, 2011 at 11:58 AM, Jörn Kottmann <[email protected]>
> wrote:
>
> > On 6/14/11 4:23 AM, [email protected] wrote:
> >
> >> Hi,
> >>
> >> Currently we have only implemented custom feature generators that can be
> >> passed from the command line for the NameFinder, but it would be very
> >> nice to have this for all tools.
> >> The Thai sentence detector customization is nice and simple, but to do
> >> something for other languages the user would need to branch the code. We
> >> should allow users to pass a factory class name from the command line.
> >> Maybe we could do it for every tool that doesn't use a sequence feature
> >> generator. It would also be nice to save the factory class name to the
> >> model to make sure we are using the same feature generator during runtime
> >> and evaluation.
> >>
> >> What do you think? Maybe you have thought of a better solution for that.
> >>
> >
> > The first approach OpenNLP came up with to customize the feature generation
> > of a component was to simply pass in a context generator. Well, that does
> > not really work with the new model packages and the command line.
> > We never really came up with a solution to this problem or discussed it.
> >
> > William suggests that we should use a class name to load a factory class.
> > And I think we then should also remove the support to pass in a context
> > generator.
> >
> > I believe it is a good way of solving the issue, since the model can then
> > be used by any code which integrates OpenNLP and has an additional jar on
> > the classpath.
> > That will for example work well with our UIMA integration.
> >
> > These models might not be well suited for distribution to a wider group of
> > people, since they always need the factory class, which we cannot put
> > inside the model because of security issues.
> >
> > For components where we need to adapt the feature generation to a language,
> > I still suggest that we continue to define default feature generation which
> > is dependent on the language, as we already do for Thai in the sentence
> > detector.
> >
> > Well, I am not yet sure how it should be done for the parser, doccat and
> > coref.
> >
> > Jörn
> >
>
