We hereby announce the release of DBpedia 2016-04. The new release is based
on updated Wikipedia dumps dating from March/April 2016 and features a
significantly expanded base of information as well as richer and
(hopefully) cleaner data based on the DBpedia ontology.

You can download the new DBpedia datasets in a variety of RDF-document
formats from: http://wiki.dbpedia.org/downloads-2016-04 or directly here:
http://downloads.dbpedia.org/2016-04/

Support DBpedia

During the latest DBpedia meeting in Leipzig, we discussed ways to
support DBpedia <http://blog.dbpedia.org/?p=210> and the benefits this
support would bring <http://wiki.dbpedia.org/why-is-dbpedia-so-important>.
Over the next two months, we aim to raise money to support the hosting of
the main services and the next DBpedia release (especially to shorten
release intervals). On top of that, we need to buy a new server to host
DBpedia Spotlight, which has so far been generously hosted by third
parties. If you use DBpedia and want us to keep moving forward, we kindly
invite you to donate here <http://wiki.dbpedia.org/donate> or become a member
of the DBpedia association <http://wiki.dbpedia.org/membership>.

Statistics

The English version of the DBpedia knowledge base currently describes 6.0M
entities, of which 4.6M have abstracts, 1.53M have geo coordinates and 1.6M
have depictions. In total, 5.2M resources are classified in a consistent
ontology, consisting of 1.5M persons, 810K places (including 505K populated
places), 490K works (including 135K music albums, 106K films and 20K video
games), 275K organizations (including 67K companies and 53K educational
institutions), 301K species and 5K diseases. The total number of resources
in English DBpedia is 16.9M, which, besides the 6.0M entities, includes
1.7M SKOS concepts (categories), 7.3M redirect pages, 260K disambiguation
pages and 1.7M intermediate nodes.

Altogether, the DBpedia 2016-04 release consists of 9.5 billion (2015-10:
8.8 billion) pieces of information (RDF triples), of which 1.3 billion
(2015-10: 1.1 billion) were extracted from the English edition of
Wikipedia, 5.0 billion (2015-10: 4.4 billion) from other language editions
and 3.2 billion (2015-10: 3.2 billion) from DBpedia Commons and Wikidata.
In general, we observed a growth of about 2% in mapping-based statements.
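
The headline figures can be cross-checked with a little arithmetic. The
Python sketch below simply restates the rounded numbers quoted above (in
billions of triples); since every figure is rounded to one decimal, sums and
growth rates derived from them are only approximate.

```python
# Rounded triple counts (in billions) as quoted in this announcement.
release_2016_04 = {"English": 1.3, "other languages": 5.0, "Commons and Wikidata": 3.2}
release_2015_10 = {"English": 1.1, "other languages": 4.4, "Commons and Wikidata": 3.2}


def total(counts: dict) -> float:
    """Sum the per-source counts, rounded to one decimal like the inputs."""
    return round(sum(counts.values()), 1)


def growth_percent(old: float, new: float) -> float:
    """Relative growth from old to new in percent, rounded to one decimal."""
    return round((new - old) / old * 100, 1)


print(total(release_2016_04))    # 9.5
print(total(release_2015_10))    # 8.7 (announced: 8.8, presumably due to rounding of the parts)
print(growth_percent(8.8, 9.5))  # 8.0 (overall growth in triples)
print(growth_percent(1.1, 1.3))  # 18.2 (growth of the English extraction)
```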

Thorough statistics can be found on the DBpedia website
<http://wiki.dbpedia.org/dbpedia-2016-04-statisticsdatasets/dataset-2015-10/dataset-2015-10-statistics>
and general information on the DBpedia datasets here
<http://wiki.dbpedia.org/services-resources/datasets/dbpedia-datasets>.

Community

The DBpedia community added new classes and properties to the DBpedia
ontology via the mappings wiki. The DBpedia 2016-04 ontology encompasses:

   - 754 classes (DBpedia 2015-10: 739)
   - 1,103 object properties (DBpedia 2015-10: 1,099)
   - 1,608 datatype properties (DBpedia 2015-10: 1,596)
   - 132 specialized datatype properties (DBpedia 2015-10: 132)
   - 410 owl:equivalentClass and 221 owl:equivalentProperty mappings to
     external vocabularies (DBpedia 2015-04: 407 and 221)


The editor community of the mappings wiki also defined many new mappings
from Wikipedia templates to DBpedia classes. For the DBpedia 2016-04
extraction, we used a total of 5,800 template mappings (DBpedia 2015-10:
5,553 mappings). For the second time, the top language, measured by the
number of mappings, is Dutch (646 mappings), followed by English (604
mappings).

(Breaking) Changes

   - In addition to the datasets normalized to English DBpedia (en-uris), we
     now also provide datasets normalized to the DBpedia Wikidata (DBw)
     datasets (wkd-uris). These sorted datasets will be the foundation for the
     upcoming fusion process with Wikidata. From the following releases on,
     the DBw-based URIs will be the only ones provided.
   - We now filter out triples from the Raw Infobox Extractor that are
     already mapped. For example, a resource no longer contains both
     “<x> dbo:birthPlace <z>” and “<x> dbp:birthPlace|dbp:placeOfBirth|... <z>”.
     These triples have been moved to the “infobox-properties-mapped”
     datasets and are not loaded into the main endpoint. See issue 22
     <https://github.com/dbpedia/extraction-framework/issues/22> for more
     details.
   - Major improvements to our citation extraction. See
     <http://www.mail-archive.com/dbpedia-discussion@lists.sourceforge.net/msg07762.html>
     for more details.
   - We incorporated Heiko Paulheim's statistical distribution approach
     <http://www.heikopaulheim.com/docs/iswc2013.pdf> for automatically
     creating type statements, which are provided as an additional dataset
     (instance_types_sdtyped_dbo).
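
The idea behind the new infobox filtering can be sketched as follows. This is
a minimal illustration under our own assumptions, not the extraction
framework's code: we assume each raw infobox fact remembers which infobox
template and key it came from, and that a hypothetical set mapped_keys lists
the (template, key) pairs already covered by a mapping in the mappings wiki.
Facts with such a key are set aside (in the release, they end up in the
“infobox-properties-mapped” datasets); the rest stay in the raw infobox data.

```python
from typing import NamedTuple


class RawFact(NamedTuple):
    """A raw infobox fact plus the infobox it came from (hypothetical shape)."""
    subject: str
    template: str  # infobox template name, e.g. "Infobox person"
    key: str       # infobox key as written in the wiki text
    value: str


def split_raw_infobox(facts, mapped_keys):
    """Split raw facts into (kept, already_mapped).

    mapped_keys is a set of (template, key) pairs that a mapping in the
    mappings wiki already turns into a dbo: property (assumption).
    """
    kept, already_mapped = [], []
    for fact in facts:
        if (fact.template, fact.key) in mapped_keys:
            already_mapped.append(fact)  # would go to "infobox-properties-mapped"
        else:
            kept.append(fact)            # stays in the raw infobox dataset
    return kept, already_mapped


# Hypothetical mapping: both keys are mapped to dbo:birthPlace.
mapped_keys = {("Infobox person", "birth_place"), ("Infobox person", "place_of_birth")}
facts = [
    RawFact("dbr:X", "Infobox person", "birth_place", "dbr:Z"),
    RawFact("dbr:X", "Infobox person", "place_of_birth", "dbr:Z"),
    RawFact("dbr:X", "Infobox person", "nickname", "Shorty"),
]
kept, moved = split_raw_infobox(facts, mapped_keys)
print([f.key for f in kept])   # ['nickname']
print([f.key for f in moved])  # ['birth_place', 'place_of_birth']
```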

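As we read the linked paper, the statistical distribution approach (SDType)
lets every property used by a resource vote for types: each vote is the
distribution P(t | p) of types among resources using property p, weighted by
how strongly that distribution deviates from the overall type distribution
P(t). The Python sketch below is a simplified illustration of that idea under
our own assumptions: it only looks at outgoing properties (the paper also
uses incoming ones), counts each distinct property once, and runs on a
hypothetical toy dataset. In practice, only types whose score exceeds a
chosen threshold would be emitted as new type statements.

```python
from collections import Counter, defaultdict


def type_distributions(triples, types):
    """Estimate the prior P(t) and, per property p, P(t | p) over subjects.

    triples: list of (subject, property, object)
    types:   dict mapping a resource to its set of known types (may be empty)
    """
    typed = [r for r, ts in types.items() if ts]
    if not typed:
        return {}, {}
    counts = Counter(t for r in typed for t in types[r])
    prior = {t: c / len(typed) for t, c in counts.items()}

    per_prop = defaultdict(Counter)  # property -> type counts of typed subjects
    uses = Counter()                 # property -> number of typed subject uses
    for s, p, _ in triples:
        if types.get(s):
            uses[p] += 1
            per_prop[p].update(types[s])
    cond = {p: {t: c / uses[p] for t, c in cnt.items()} for p, cnt in per_prop.items()}
    return prior, cond


def property_weight(p, prior, cond):
    """How far P(t | p) deviates from P(t): sum of squared differences."""
    return sum((prior[t] - cond[p].get(t, 0.0)) ** 2 for t in prior)


def sdtype_scores(resource, triples, prior, cond):
    """Weighted average of P(t | p) over the distinct properties of a resource."""
    props = {p for s, p, _ in triples if s == resource and p in cond}
    weights = {p: property_weight(p, prior, cond) for p in props}
    norm = sum(weights.values())
    if norm == 0:
        return {}
    scores = defaultdict(float)
    for p in props:
        for t, prob in cond[p].items():
            scores[t] += weights[p] * prob / norm
    return dict(scores)


# Toy data: "Mystery" has no type but uses a property typical for places.
triples = [
    ("Berlin", "populationTotal", "3500000"),
    ("Paris", "populationTotal", "2200000"),
    ("Einstein", "birthPlace", "Ulm"),
    ("Curie", "birthPlace", "Warsaw"),
    ("Mystery", "populationTotal", "1000000"),
]
types = {"Berlin": {"Place"}, "Paris": {"Place"},
         "Einstein": {"Person"}, "Curie": {"Person"}, "Mystery": set()}
prior, cond = type_distributions(triples, types)
print(sdtype_scores("Mystery", triples, prior, cond))  # {'Place': 1.0}
```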

In case you missed it, here is what we changed in the previous release (2015-10):

   - English DBpedia switched to IRIs. This can be a breaking change for
     applications that store DBpedia resource URIs or links, which may need
     to be updated. We provide the “uri-same-as-iri” dataset for English to
     ease the transition.
   - The instance-types dataset is now split into two files: instance-types
     (containing only direct types) and instance-types-transitive (containing
     the transitive types of a resource based on the DBpedia ontology).
   - The mappingbased-properties file is now split into three (3) files:
      - “geo-coordinates-mappingbased”, which contains the coordinates
        originating from the mappings wiki; the “geo-coordinates” dataset
        continues to provide the coordinates originating from the GeoExtractor
      - “mappingbased-literals”, which contains mapping-based facts with
        literal values
      - “mappingbased-objects”, which contains mapping-based facts with object
        values
      - the “mappingbased-objects-disjoint-[domain|range]” datasets contain
        facts that are filtered out of the “mappingbased-objects” datasets as
        errors but are still provided
   - We added a new extractor for citation data that provides two files:
      - citation links: linking resources to citations
      - citation data: trying to get additional data from citations. This is
        quite an interesting dataset, but we need help to clean it up
   - All datasets are available in .ttl and .tql serialization (nt and nq
     datasets were omitted for reasons of redundancy and server capacity).
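
For the IRI switch, an IRI and its URI form differ only in how non-ASCII
characters are written. The Python sketch below applies the generic mapping
from RFC 3987 (section 3.1: percent-encode the UTF-8 bytes of every non-ASCII
character) to a resource name we picked as an example. DBpedia's own rules
for encoding resource names may treat further characters specially, so the
“uri-same-as-iri” dataset, not this function, is the reliable source for the
actual correspondence.

```python
from urllib.parse import quote, unquote


def iri_to_uri(iri: str) -> str:
    """Percent-encode every non-ASCII character as UTF-8 (RFC 3987, section 3.1)."""
    return "".join(ch if ord(ch) < 128 else quote(ch, safe="") for ch in iri)


def uri_to_iri(uri: str) -> str:
    """Naive inverse: decodes all percent-escapes, including reserved ASCII ones."""
    return unquote(uri)


iri = "http://dbpedia.org/resource/Zürich"
print(iri_to_uri(iri))                     # http://dbpedia.org/resource/Z%C3%BCrich
print(uri_to_iri(iri_to_uri(iri)) == iri)  # True
```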

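The difference between instance-types and instance-types-transitive can be
illustrated with a short Python sketch: the full type set of a resource
consists of its direct types plus all of their superclasses along
rdfs:subClassOf. The class hierarchy below is a simplified, hand-written
excerpt for illustration only and may not match the current DBpedia ontology
exactly.

```python
def transitive_types(direct_types, superclass_of):
    """Return the direct types plus all ancestors reachable via subclass links.

    direct_types:  iterable of class names
    superclass_of: dict mapping a class to its direct superclasses
    """
    result = set()
    stack = list(direct_types)
    while stack:
        cls = stack.pop()
        if cls in result:
            continue  # already visited; also guards against cycles
        result.add(cls)
        stack.extend(superclass_of.get(cls, ()))
    return result


# Simplified excerpt of a class hierarchy (illustrative only).
superclass_of = {
    "dbo:City": ["dbo:Settlement"],
    "dbo:Settlement": ["dbo:PopulatedPlace"],
    "dbo:PopulatedPlace": ["dbo:Place"],
    "dbo:Place": ["owl:Thing"],
}
print(sorted(transitive_types(["dbo:City"], superclass_of)))
# ['dbo:City', 'dbo:Place', 'dbo:PopulatedPlace', 'dbo:Settlement', 'owl:Thing']
```

The function returns the union of direct and inherited types; a depth-first
walk with a visited set is enough because the hierarchy is small and may
contain shared ancestors.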

Upcoming Changes

   - Dataset normalization: We are going to normalize datasets based on
     Wikidata URIs rather than on the English language edition, as a
     prerequisite to finally starting the fusion process with Wikidata.
   - RML Integration: Wouter Maroy has already provided the necessary
     groundwork for switching the mappings wiki to an RML-based approach
     <https://drive.google.com/file/d/0B7je1jgVmCgISXBPOHc3NDktblU/view?usp=sharing>
     on GitHub. We are not there yet, but this is at the top of our list of
     changes.
   - Starting with the next release, we are adding datasets with NIF
     annotations
     <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core/nif-core.html>
     of the abstracts (as we already did for the 2015-04 release
     <http://downloads.dbpedia.org/2015-04/ext/nlp/abstracts/>). We will
     eventually extend the NIF annotation dataset to cover the whole Wikipedia
     article of a resource.
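
To give an idea of what NIF annotations of abstracts look like, the Python
sketch below produces Turtle for a single entity mention: a nif:Context
resource holding the abstract and a second resource with RFC 5147 character
offsets that points back to it and links the mention to a DBpedia resource.
The base URI, the example abstract and the exact choice of properties are our
own illustrative assumptions; the actual DBpedia NIF datasets may structure
and name their resources differently.

```python
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
ITSRDF = "http://www.w3.org/2005/11/its/rdf#"
XSD = "http://www.w3.org/2001/XMLSchema#"


def turtle_literal(text: str) -> str:
    """Quote a string as a Turtle literal (escapes backslash, quote, newline)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def nif_annotation(base: str, abstract: str, surface: str, entity: str) -> str:
    """Return Turtle annotating the first occurrence of `surface` in `abstract`.

    Offsets are Python string positions (Unicode code points), end-exclusive.
    """
    begin = abstract.index(surface)  # raises ValueError if not found
    end = begin + len(surface)
    ctx = f"<{base}#char=0,{len(abstract)}>"
    phrase = f"<{base}#char={begin},{end}>"
    return "\n".join([
        f"@prefix nif: <{NIF}> .",
        f"@prefix itsrdf: <{ITSRDF}> .",
        f"@prefix xsd: <{XSD}> .",
        "",
        f"{ctx} a nif:Context, nif:RFC5147String ;",
        f"    nif:isString {turtle_literal(abstract)} ;",
        '    nif:beginIndex "0"^^xsd:nonNegativeInteger ;',
        f'    nif:endIndex "{len(abstract)}"^^xsd:nonNegativeInteger .',
        "",
        f"{phrase} a nif:RFC5147String ;",
        f"    nif:referenceContext {ctx} ;",
        f"    nif:anchorOf {turtle_literal(surface)} ;",
        f'    nif:beginIndex "{begin}"^^xsd:nonNegativeInteger ;',
        f'    nif:endIndex "{end}"^^xsd:nonNegativeInteger ;',
        f"    itsrdf:taIdentRef <{entity}> .",
    ])


print(nif_annotation(
    "http://example.org/abstracts/Leipzig",  # hypothetical base URI
    "Leipzig is a city in Saxony, Germany.",
    "Saxony",
    "http://dbpedia.org/resource/Saxony",
))
```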

New Datasets

   - SDTypes: We extended the coverage of the automatically created type
     statements (instance_types_sdtyped_dbo) to English, German and Dutch (see
     above).
   - Extensions: In the extension folder (2016-04/ext
     <http://downloads.dbpedia.org/2016-04/ext/>) we provide two new
     datasets, both of which should be considered experimental:
      - DBpedia World Facts: This dataset is authored by the DBpedia
        association itself. It lists all countries, all currencies in use and
        (most) languages spoken in the world, as well as how these concepts
        relate to each other (spoken in, primary language etc.) and useful
        properties like ISO codes (ontology diagram
        <https://raw.githubusercontent.com/dbpedia/WorldFacts/master/DBpediaWorldFactsOntology.png>).
        This dataset extends the very useful LEXVO <http://www.lexvo.org>
        dataset with facts from DBpedia and the CIA Factbook
        <https://www.cia.gov/library/publications/the-world-factbook/>.
        Please report any errors or suggestions regarding this dataset to
        Markus <markus.freudenb...@gmail.com>.
      - Lector Facts: This experimental dataset was provided by Matteo
        Cannaviccio and demonstrates his approach
        <http://dl.acm.org/citation.cfm?id=2932203> to generating facts by
        using common sequences of words (i.e. phrases) that are frequently
        used to describe instances of binary relations in a text. We are
        looking into using this approach as a regular extraction step, so your
        feedback would be very helpful.
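
For readers unfamiliar with phrase-based fact extraction, here is a heavily
simplified Python sketch of the general idea as summarized above: phrases
that frequently connect entity pairs already known to stand in some relation
are learned, then applied to other entity pairs. It assumes sentences have
already been reduced to (subject, phrase, object) mentions and uses a plain
frequency threshold; the actual Lector method described in the linked paper
is more elaborate, and all names and data here are illustrative.

```python
from collections import Counter, defaultdict


def learn_phrases(mentions, known_facts, min_count=2):
    """Map each phrase to the relation it most often expresses for known facts.

    mentions:    list of (subject, phrase, object) found in text (assumed given)
    known_facts: set of (subject, relation, object) already in the KB
    Only phrases seen at least min_count times for that relation are kept.
    """
    relations_of_pair = defaultdict(set)
    for s, r, o in known_facts:
        relations_of_pair[(s, o)].add(r)
    counts = defaultdict(Counter)  # phrase -> relation counts
    for s, phrase, o in mentions:
        for r in relations_of_pair.get((s, o), ()):
            counts[phrase][r] += 1
    model = {}
    for phrase, rels in counts.items():
        rel, n = rels.most_common(1)[0]
        if n >= min_count:
            model[phrase] = rel
    return model


def extract_facts(mentions, model, known_facts):
    """Apply learned phrases to mentions; return facts not yet in the KB."""
    new_facts = set()
    for s, phrase, o in mentions:
        rel = model.get(phrase)
        if rel and (s, rel, o) not in known_facts:
            new_facts.add((s, rel, o))
    return new_facts


known = {("Einstein", "birthPlace", "Ulm"), ("Curie", "birthPlace", "Warsaw")}
mentions = [
    ("Einstein", "was born in", "Ulm"),
    ("Curie", "was born in", "Warsaw"),
    ("Planck", "was born in", "Kiel"),
    ("Planck", "worked in", "Berlin"),
]
model = learn_phrases(mentions, known)
print(model)                                   # {'was born in': 'birthPlace'}
print(extract_facts(mentions, model, known))   # {('Planck', 'birthPlace', 'Kiel')}
```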

Credits

Lots of thanks to

   - Markus Freudenberg (University of Leipzig / DBpedia Association) for
     taking over the whole release process and creating the revamped download
     & statistics pages.
   - Dimitris Kontokostas (University of Leipzig / DBpedia Association) for
     conveying his considerable knowledge of the extraction and release
     process.
   - All editors who contributed to the DBpedia ontology mappings via the
     Mappings Wiki.
   - The whole DBpedia Internationalization Committee for pushing DBpedia
     internationalization forward.
   - Heiko Paulheim (University of Mannheim) for providing the code for his
     algorithm to generate additional type statements for formerly untyped
     resources and to identify and remove wrong statements; it is now part of
     the DIEF.
   - Václav Zeman, Thomas Klieger and the whole LHD team (University of
     Prague) for their contribution of additional DBpedia types.
   - Marco Fossati (FBK) for contributing the DBTax types.
   - Alan Meehan (TCD) for performing a big external link cleanup.
   - Aldo Gangemi (LIPN University, France & ISTC-CNR, Italy) for providing
     the links from DOLCE to the DBpedia ontology.
   - Kingsley Idehen, Patrick van Kleef, and Mitko Iliev (all OpenLink
     Software) for loading the new dataset into the Virtuoso instance that
     provides 5-Star Linked Open Data publication and SPARQL Query Services.
   - OpenLink Software (http://www.openlinksw.com/) collectively for
     providing the SPARQL Query Services and Linked Open Data publishing
     infrastructure for DBpedia, in addition to their continuous
     infrastructure support.
   - Ruben Verborgh from Ghent University – iMinds for publishing the dataset
     as Triple Pattern Fragments <http://fragments.dbpedia.org/>, and iMinds
     for sponsoring DBpedia’s Triple Pattern Fragments server.
   - Ali Ismayilov (University of Bonn) for extending the DBpedia Wikidata
     dataset.
   - Vladimir Alexiev (Ontotext) for leading a successful mapping and
     ontology cleanup effort.
   - All the GSoC students and mentors who directly or indirectly
     influenced the DBpedia release.
   - Special thanks to members of the DBpedia Association
     <http://dbpedia.org/dbpedia-association>, the AKSW
     <http://aksw.org/About.html> and the department for Business Information
     Systems <http://bis.informatik.uni-leipzig.de/en/Welcome> of the
     University of Leipzig.

The work on the DBpedia 2016-04 release was financially supported by the
European Commission through the project ALIGNED – quality-centric, software
and data engineering (http://aligned-project.eu/).

More information about DBpedia can be found at http://dbpedia.org as well as
in the new overview article about the project, available at
http://wiki.dbpedia.org/Publications.

Have fun with the new DBpedia 2016-04 release!

Cheers,
Markus Freudenberg, Dimitris Kontokostas, Sebastian Hellmann
_______________________________________________
DBpedia-developers mailing list
DBpedia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-developers