Hi, I just want to use the existing name finder with custom features. With the cmd line I can get the custom set of features running. Thanks for that. However, I want to be able to retrain the model dynamically, i.e. via source code.
I am now using the XML file to define the set of custom features instead of instantiating it via the AdaptiveFeatureGenerator. I then use the method openFeatureGeneratorBytes() from the TokenNameFinderTrainerTool to convert it to a byte array, which I can then pass to the TokenNameFinderFactory like this:

    TokenNameFinderFactory factory = new TokenNameFinderFactory(openFeatureGeneratorBytes(featureGenFile), null, codec);

a) Is this approach alright, or would you recommend something else?

b) Another question: Is it possible to somehow see the computed feature vector for every token (during training and prediction)?

c) And out of curiosity: Is it possible to see how much a feature contributes to the final decision? I want to identify features that are useless and those that may lead to wrong predictions.

Thank you very much for your help again!

Best regards,
Markus

> Hello,
>
> it really depends on what you are trying to achieve.
>
> Maybe you know exactly what you want. In that case I would recommend
> sub-classing the TokenNameFinderFactory; there you could override the
> method that creates the feature generators. The default constructor is
> fine. The name finder supports different encodings, currently Bio and
> Bilou. You would need to pass a reference to one of those classes, or
> just use the default (which is Bio).
>
> If you just want to have the name finder with custom feature generation,
> I would suggest defining an xml descriptor for it and just using our cmd
> line interface to build the model. The cmd line interface has the
> advantage that you can use all the tools without coding yourself;
> evaluation and cross validation especially should be interesting for you.
>
> TokenNameFinderFactory(byte[] featureGeneratorBytes,
>                        Map<String,Object> resources,
>                        SequenceCodec<String> seqCodec)
>
> The byte[] is supposed to contain the feature generator xml bytes.
>
> HTH,
> Jörn
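
Since the quoted reply says the byte[] only has to contain the feature generator XML, the descriptor can in principle be loaded with plain JDK calls. The following is a minimal sketch under that assumption. FeatureGenDescriptor and its two methods are hypothetical helpers, not part of OpenNLP, and the well-formedness check only confirms that the file parses as XML, not that OpenNLP will accept its elements.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

// Hypothetical helper (not an OpenNLP class): loads a feature generator
// descriptor as raw bytes and checks that it is well-formed XML.
final class FeatureGenDescriptor {

    private FeatureGenDescriptor() {}

    // Reads the whole descriptor file into memory, unchanged.
    static byte[] readBytes(Path descriptorFile) {
        try {
            return Files.readAllBytes(descriptorFile);
        } catch (IOException e) {
            throw new UncheckedIOException("Cannot read " + descriptorFile, e);
        }
    }

    // Returns true if the bytes parse as XML. This does NOT validate the
    // element names against what the name finder understands.
    static boolean isWellFormedXml(byte[] xmlBytes) {
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes));
            return true;
        } catch (SAXException e) {
            return false; // malformed XML
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException("XML parser unavailable", e);
        }
    }
}
```

The bytes returned by readBytes() would then take the place of openFeatureGeneratorBytes(featureGenFile) as the first argument of the TokenNameFinderFactory constructor shown in the quoted reply, with null for the resources map and the chosen codec as the third argument. Checking isWellFormedXml() first is only a convenience to catch a broken descriptor before training starts.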