Cool, I just modified my code to strip out all the tags I'm not training on before each training run... but it would have been nice to do it the other way, with the trainer ignoring the other types.
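A minimal sketch of that kind of preprocessing follows. It assumes the training data uses OpenNLP's name finder format, where every "<START:type>" and "<END>" tag is its own whitespace-separated token. It keeps the annotations of one type and turns every other annotation back into plain, untagged tokens. The names StripOtherTypes and keepOnly are hypothetical and not part of OpenNLP.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.matching.Regex

// Hypothetical helper, not an OpenNLP API: filters one line of name finder
// training data so that only annotations of a single entity type remain.
object StripOtherTypes {
  // Matches a whole token such as "<START:person>" or an untyped "<START>";
  // the optional group captures the type (null when the tag is untyped).
  private val StartTag: Regex = """<START(?::([^>]+))?>""".r
  private val EndTag = "<END>"

  /** Keeps annotations whose type equals keepType. Tags of every other type,
    * including untyped <START> tags, are dropped, but the tokens they
    * enclosed stay in the sentence as ordinary text. */
  def keepOnly(line: String, keepType: String): String = {
    val tokens = line.split("\\s+").filter(_.nonEmpty)
    val out = ArrayBuffer.empty[String]
    var keepingCurrent = false // true while inside an annotation we keep
    for (tok <- tokens) {
      tok match {
        case StartTag(tpe) =>
          keepingCurrent = tpe == keepType // null-safe comparison in Scala
          if (keepingCurrent) out += tok
        case EndTag =>
          if (keepingCurrent) out += tok
          keepingCurrent = false
        case _ =>
          out += tok // ordinary token: always kept
      }
    }
    out.mkString(" ")
  }
}
```

Running each line of the training file through keepOnly(line, entityName) before building the sample stream yields single-type data per entity, so the NameFinderME.train call itself does not need to change.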
Cheers,
Walrus theCat

On Tue, Nov 26, 2013 at 2:43 AM, Jörn Kottmann <[email protected]> wrote:

> Hello,
>
> the command line trainer util has an option to only use a specified set
> of types.
>
> I am not sure if we ever made this available as part of the API, but it
> should be really easy to do.
>
> Jörn
>
> On 11/21/2013 08:43 PM, Walrus theCat wrote:
>
>> Hi,
>>
>> I'm using the training API, and I want to create a bunch of different
>> models. My training data has various entities in it. Unsurprisingly (at
>> least to the people on this list), when I train a model on my training
>> data, passing it a name for the entity I'm trying to train, it creates a
>> model that can detect all the entities in the input data. This is the line
>> of code I'm using to do the training, pardon my Scala:
>>
>>   NameFinderME.train("en", entityName, sampleStream,
>>     TrainingParameters.defaultParams(),
>>     null: Array[Byte], Collections.emptyMap[String, Object]());
>>
>> The docs say this is how it will behave:
>>
>> "A training file can contain multiple types. If the training file contains
>> multiple types the created model will also be able to detect these
>> multiple types. For now it's recommended to only train single type models,
>> since multi type support is still experimental."
>>
>> What I was hoping would happen is that the trainer would just ignore the
>> other entities not matching entityName, and just train the model for
>> entityName. This seems like useful functionality, as the user could just
>> do multiple passes over the training data, training for different entities.
>>
>> I guess my question is, can OpenNLP already do what I'm trying to do?
>> Would it be easier to script new data for each model I want to train (ugh)
>> or to modify OpenNLP to be able to do this?
>>
>> Cheers
