On 17.08.2016, at 11:24, Joern Kottmann <kottm...@gmail.com> wrote:
> 
> Hello,
> 
> thanks for sharing this, it is very nice to have the ability to train on
> more data sets.
> 
> Do you have an overview of the data sets you support in DKPro?

Perhaps more relevant than the datasets are the formats we support. A list
of the formats supported by the 1.8.0 release can be found here:

  https://dkpro.github.io/dkpro-core/releases/1.8.0/docs/format-reference.html

For some of the formats, there are also pointers to datasets.

A similar format reference for the 1.9.0-SNAPSHOT version is linked
from the documentation page [1].

There is no "nice" overview of the datasets supported by the loader yet,
but one will eventually be added. In the meantime, you can look at the
methods of the DatasetLoader class [2]. For your convenience, here is a
snapshot of the methods available so far:

loadAncientGreekAndLatingDependencyTreebank()
loadCatalanConll2009()
loadEnglishBrownCorpus()
loadEnglishConll2000()
loadEnglishGUMCorpus()
loadFrenchDeepSequoiaCorpus()
loadGermanConll2009()
loadGermanHamburgDependencyTreebank()
loadGermEval2014NER()
loadJapaneseConll2009()
loadNEMGP()
loadSpanishConll2009()
loadUniversalDependencyTreebankV1_3()
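Since the list above is only a snapshot, a current list can be obtained by enumerating the public load* methods via Java reflection. The following is a minimal sketch, not part of DKPro. To keep it compilable on its own, it inspects a hypothetical stand-in class (StandInLoader). With the dkpro-core-datasets-asl module on the classpath, you would pass de.tudarmstadt.ukp.dkpro.core.datasets.DatasetLoader.class instead.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ListLoaders {
    // Hypothetical stand-in for DKPro's DatasetLoader, used only so this
    // sketch compiles without DKPro on the classpath.
    static class StandInLoader {
        public void loadEnglishBrownCorpus() {}
        public void loadGermEval2014NER() {}
        private void helper() {}
    }

    // Returns the sorted names of all public methods declared by cls
    // whose name starts with "load".
    static List<String> loaderMethods(Class<?> cls) {
        List<String> names = new ArrayList<>();
        for (Method m : cls.getDeclaredMethods()) {
            if (Modifier.isPublic(m.getModifiers()) && m.getName().startsWith("load")) {
                names.add(m.getName() + "()");
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        // Prints one loader method name per line.
        for (String name : loaderMethods(StandInLoader.class)) {
            System.out.println(name);
        }
    }
}
```

For the stand-in class this prints loadEnglishBrownCorpus() and loadGermEval2014NER(). The private helper() is skipped because only public methods are kept.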

The API of the dataset loader and the information it exposes about the
datasets are still evolving. Feedback is welcome.

Best,

-- Richard

[1] https://dkpro.github.io/dkpro-core/documentation/
[2] https://github.com/dkpro/dkpro-core/blob/master/dkpro-core-datasets-asl/src/main/java/de/tudarmstadt/ukp/dkpro/core/datasets/DatasetLoader.java
