On 17.08.2016, at 11:24, Joern Kottmann <kottm...@gmail.com> wrote:
>
> Hello,
>
> thanks for sharing this, it is very nice to have the ability to train on
> more data sets.
>
> Do you have an overview of the data sets you support in DKPro?
More relevant than the datasets may be the supported formats. A list of the formats supported by the 1.8.0 release can be found here:

https://dkpro.github.io/dkpro-core/releases/1.8.0/docs/format-reference.html

For some of the formats, there are also pointers to datasets. A corresponding format reference for the 1.9.0-SNAPSHOT version is linked from [1].

So far, there is no "nice" overview of the datasets supported by the loader, but one will eventually be added. In the meantime, you can check out the methods of the DatasetLoader class [2]. For your convenience, here is a snapshot of the methods available so far:

  loadAncientGreekAndLatingDependencyTreebank()
  loadCatalanConll2009()
  loadEnglishBrownCorpus()
  loadEnglishConll2000()
  loadEnglishGUMCorpus()
  loadFrenchDeepSequoiaCorpus()
  loadGermanConll2009()
  loadGermanHamburgDependencyTreebank()
  loadGermEval2014NER()
  loadJapaneseConll2009()
  loadNEMGP()
  loadSpanishConll2009()
  loadUniversalDependencyTreebankV1_3()

The API of the dataset loader and the information it exposes about the datasets are still evolving. Feedback welcome.

Best,

-- Richard

[1] https://dkpro.github.io/dkpro-core/documentation/
[2] https://github.com/dkpro/dkpro-core/blob/master/dkpro-core-datasets-asl/src/main/java/de/tudarmstadt/ukp/dkpro/core/datasets/DatasetLoader.java