Hello,

thanks for sharing this. It is very nice to be able to train on more data sets.
Do you have an overview of the data sets you support in DKPro?

Jörn

On Tue, Aug 9, 2016 at 3:58 PM, Richard Eckart de Castilho <richard.eck...@gmail.com> wrote:

> Hi all,
>
> every once in a while somebody here asks about OpenNLP models.
> The typical answer then is that there are the models on Sourceforge
> but that people should rather train their own. Then often somebody
> mentions that support for this-or-that corpus format in OpenNLP
> would be cool in order to train on this-or-that dataset. And that
> is where it normally ends.
>
> So I thought, why not add the ability to train OpenNLP models to DKPro
> Core?
>
> DKPro Core is an open-source collection of components for Apache UIMA
> integrating many NLP tools including OpenNLP into a uniform toolkit.
>
> DKPro Core already offers readers for many corpus formats.
>
> We have also started adding a dataset API to conveniently access
> different standard corpora/datasets that are publicly/freely
> available on the net.
>
> And finally, we added support for the OpenNLP training tools for
> tokenizer, sentence splitter, POS tagger, chunker, and name finder.
>
> This makes it really easy to train new models for OpenNLP for many
> datasets in just a few lines of code, e.g. [1]
>
> DatasetLoader loader = new DatasetLoader(new File("cache"));
> Dataset ds = loader.loadEnglishGUMCorpus();
>
> CollectionReaderDescription trainReader = createReaderDescription(
>     Conll2006Reader.class,
>     Conll2006Reader.PARAM_PATTERNS, ds.getTrainingFiles(),
>     Conll2006Reader.PARAM_LANGUAGE, ds.getLanguage());
>
> AnalysisEngineDescription trainer = createEngineDescription(
>     OpenNlpPosTaggerTrainer.class,
>     OpenNlpPosTaggerTrainer.PARAM_TARGET_LOCATION, new File(targetFolder, "model.bin"),
>     OpenNlpPosTaggerTrainer.PARAM_LANGUAGE, ds.getLanguage());
>
> SimplePipeline.runPipeline(trainReader, trainer);
>
> Large parts of DKPro Core are ASL-licensed, but it is not an Apache
> project.
>
> I hope this will be useful to people.
>
> Happy for any feedback! :)
>
> Cheers,
>
> -- Richard
>
> [1] https://github.com/dkpro/dkpro-core/blob/master/dkpro-core-opennlp-asl/src/test/java/de/tudarmstadt/ukp/dkpro/core/opennlp/OpenNlpPosTaggerTrainerTest.java