Hello Markus,

the TokenNameFinderTrainerTool is part of the cmdline package and not public
API. You should not use it. A good alternative for reading the XML descriptor
into a byte array is, for example, Files.readAllBytes.
Otherwise that's how you should do it. And we should look into adding more
constructors to the TokenNameFinderFactory to make this a bit nicer for
you.
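
A minimal sketch of the Files.readAllBytes suggestion, using only the JDK. The
class name FeatureGenLoader and the method readFeatureGeneratorBytes are
hypothetical names chosen for illustration; they are not part of OpenNLP.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of OpenNLP): loads the feature generator
// XML descriptor as raw bytes without touching the cmdline package.
public class FeatureGenLoader {

    // Reads the whole descriptor file into memory.
    static byte[] readFeatureGeneratorBytes(Path descriptor) throws IOException {
        return Files.readAllBytes(descriptor);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: FeatureGenLoader <featuregen.xml>");
            return;
        }
        byte[] bytes = readFeatureGeneratorBytes(Paths.get(args[0]));
        // The byte[] is what would be passed as the first argument of
        // TokenNameFinderFactory(byte[], Map<String,Object>, SequenceCodec<String>).
        System.out.println("Read " + bytes.length + " bytes");
    }
}
```

The returned array takes the place of the openFeatureGeneratorBytes(...) call
in the constructor shown in the quoted mail below.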

B and C are not possible with the API we have currently, but you can take
the model apart yourself and look at the contents.
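
As a starting point for taking the model apart, here is a rough sketch that
lists what is inside a model file. It assumes the model file is an ordinary
zip archive; the class name ModelInspector and the method listEntries are
hypothetical, and only JDK classes are used. It shows the package contents
only, not the feature vectors or weights themselves.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper (not part of OpenNLP): lists the entries of a model
// file, assuming the file is a plain zip archive.
public class ModelInspector {

    // Returns the names of all entries in the archive, in archive order.
    static List<String> listEntries(Path modelFile) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(modelFile.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: ModelInspector <model file>");
            return;
        }
        for (String name : listEntries(Paths.get(args[0]))) {
            System.out.println(name);
        }
    }
}
```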

HTH,
Jörn


On Thu, Jan 19, 2017 at 12:01 PM, Markus M. Berg <mmb...@web.de> wrote:

> Hi,
>
> I just want to use the existing name finder with custom features. With the
> cmd line I can get the custom set of features running. Thanks for that.
> However, I want to be able to retrain the model dynamically, i.e. via
> source code.
>
> I am now using the XML file for defining the set of custom features
> instead of instantiating it via the AdaptiveFeatureGenerator. I then use
> the method openFeatureGeneratorBytes() from the TokenNameFinderTrainerTool
> to convert it to a byte array which I can then pass to the
> TokenNameFinderFactory like this:
>
>     TokenNameFinderFactory factory = new TokenNameFinderFactory(
>         openFeatureGeneratorBytes(featureGenFile), null, codec);
>
> a) Is this approach alright or would you recommend something else?
>
> b) Another question: Is it possible to somehow see the computed feature
> vector for every token (during training and prediction)?
>
> c) And out of curiosity: Is it possible to see how much a feature
> contributes to the final decision? I want to identify features that are
> useless and those which may lead to wrong predictions.
>
> Thank you very much for your help again!
>
> Best regards,
> Markus
>
>
> > Hello,
> >
> > it really depends on what you are trying to achieve.
> >
> > Maybe you know exactly what you want; in that case I would recommend
> > sub-classing the TokenNameFinderFactory, where you could override the
> > method that creates the feature generators. The default constructor is fine. The name
> > finder supports different encodings, currently Bio and Bilou. You would
> > need to pass a reference to one of those classes, or just use the default
> > (which is Bio).
> >
> > If you just want to have the name finder with custom feature generation, I
> > would suggest defining an XML descriptor for it and just using our cmd line
> > interface to build the model. The cmd line interface has the advantage that
> > you can use all the tools without coding yourself; evaluation and cross
> > validation especially should be interesting for you.
> >
> > TokenNameFinderFactory(byte[] featureGeneratorBytes,
> > Map<String,Object> resources,
> > SequenceCodec<String> seqCodec)
> >
> > The byte[] is supposed to contain the feature generator xml bytes.
> >
> > HTH,
> > Jörn
>
