Hi,

I just want to use the existing name finder with custom features. With the cmd 
line I can get the custom set of features running. Thanks for that. However, I 
want to be able to retrain the model dynamically, i.e. via source code.

I am now using the XML file for defining the set of custom features instead of 
instantiating it via the AdaptiveFeatureGenerator. I then use the method 
openFeatureGeneratorBytes() from the TokenNameFinderTrainerTool to convert it 
to a byte array which I can then pass to the TokenNameFinderFactory like this:

    TokenNameFinderFactory factory = new TokenNameFinderFactory(
        openFeatureGeneratorBytes(featureGenFile), null, codec);
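
For reference, a minimal sketch of how the descriptor bytes could be read with plain JDK calls, so that the training code does not depend on the cmd line tool class. It assumes the descriptor is an ordinary file on disk and that the bytes are passed on unchanged; the class name FeatureGenBytes and the method readDescriptorBytes are made up for illustration and are not part of OpenNLP.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class FeatureGenBytes {

    // Hypothetical helper: loads the whole XML descriptor into memory.
    // The bytes are returned as-is, without parsing or validating the XML.
    static byte[] readDescriptorBytes(Path descriptor) throws IOException {
        if (!Files.isRegularFile(descriptor)) {
            throw new IOException("Feature generator descriptor not found: " + descriptor);
        }
        return Files.readAllBytes(descriptor);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: FeatureGenBytes <descriptor.xml>");
            return;
        }
        byte[] bytes = readDescriptorBytes(Paths.get(args[0]));
        System.out.println("Read " + bytes.length + " bytes from " + args[0]);
        // With OpenNLP on the classpath, the array would then go to the factory
        // in the same way as in the line above:
        //   new TokenNameFinderFactory(bytes, null, codec);
    }
}
```

Failing early on a missing file is a design choice of this sketch; it keeps a wrong path from surfacing later as a less obvious error during training.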

a) Is this approach alright or would you recommend something else?

b) Another question: Is it possible to somehow see the computed feature vector 
for every token (during training and prediction)?

c) And out of curiosity: Is it possible to see how much a feature contributes 
to the final decision? I want to identify features that are useless and those 
which may lead to wrong predictions.

Thank you very much for your help again!

Best regards,
Markus

 
> Hello,
>
> it really depends on what you are trying to achieve.
>
> Maybe you know exactly what you want; in that case I would recommend
> sub-classing the TokenNameFinderFactory and overriding the method that
> creates the feature generators. The default constructor is fine. The name
> finder supports different encodings, currently Bio and Bilou. You would
> need to pass a reference to one of those classes, or just use the default
> (which is Bio).
>
> If you just want to have the name finder with custom feature generation, I
> would suggest defining an XML descriptor for it and just using our cmd line
> interface to build the model. The cmd line interface has the advantage that
> you can use all the tools without coding yourself; evaluation and cross
> validation in particular should be interesting for you.
>
> TokenNameFinderFactory(byte[] featureGeneratorBytes,
> Map<String,Object> resources,
> SequenceCodec<String> seqCodec)
>
> The byte[] is supposed to contain the feature generator XML bytes.
>
> HTH,
> Jörn
