[ANN] Maven for Data Beta

Sebastian Hellmann Wed, 11 Sep 2019 12:17:58 -0700

Dear all,

We developed a Maven for Data. It is still in beta, but we already use it productively to publish DBpedia's data (https://en.wikipedia.org/wiki/DBpedia, https://wiki.dbpedia.org/).

While I attached a lot of specific information about the data and management part below, I would like to highlight the Maven parts:


# Download of Data

* http://databus.dbpedia.org is inspired by Maven Central and Archiva. Software is much smaller in size than data and, of course, we cannot host all of the data, so we just keep the metadata with links to decentralized download URLs.

* The data can be viewed on the website and also downloaded via SPARQL: http://dev.dbpedia.org/Download_Data

* The version URLs serve as download plugin configuration parameters: http://dev.dbpedia.org/Databus_Derive_Maven_Integration


 
<version>https://databus.dbpedia.org/dbpedia/enrichment/mappingbased-literals/2019.03.01</version>

is equivalent to <dependency> for software, but it downloads the data into "target/databus/download".

The goal "databus-derive:clone" can be called before "exec", so the software can use the downloaded data.
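
As a rough illustration of how such a version URL breaks down, here is a minimal Python sketch. It assumes the path segments follow the $user/$group/$artifact/$version order used for the package directory in the upload section; the names DatabusVersion and parse_version_url are made up for this example and are not part of the plugin.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class DatabusVersion(NamedTuple):
    """Hypothetical container for the parts of a Databus version URL."""
    user: str
    group: str
    artifact: str
    version: str


def parse_version_url(url: str) -> DatabusVersion:
    """Split a version URL into its four path segments.

    Assumption: the URL path is /$user/$group/$artifact/$version.
    """
    segments = [s for s in urlparse(url).path.split("/") if s]
    if len(segments) != 4:
        raise ValueError(
            f"expected 4 path segments, got {len(segments)}: {url}"
        )
    return DatabusVersion(*segments)


if __name__ == "__main__":
    v = parse_version_url(
        "https://databus.dbpedia.org/dbpedia/enrichment/"
        "mappingbased-literals/2019.03.01"
    )
    print(v.user, v.group, v.artifact, v.version)
```

Rejecting URLs with a different number of segments keeps a group or artifact URL from being mistaken for a version URL.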



# Upload

* We have an upload plugin (http://dev.dbpedia.org/Databus_Upload_User_Manual) with:


** mvn validate -> check account and consistency

** mvn prepare-package (goal databus:metadata) -> collects metadata in target/databus/$artifact/$version/dataid.ttl

** mvn package -> copies data into a package directory on the server, often /var/www/html/databusrepo/$user/$group/$artifact/$version


** mvn deploy -> posts the dataid.ttl to databus.dbpedia.org

** We configure it with pom.xml and Markdown documentation: https://github.com/dbpedia/databus-maven-plugin/tree/master/dbpedia/mappings

* The derive plugin will be merged with features of the Databus Client: http://dev.dbpedia.org/Databus_Client
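
To make the two locations named in the upload phases concrete, here is a small Python sketch that builds them from the coordinates. It only mirrors the path patterns quoted above; the function names are hypothetical, and the plugin itself may well make both base directories configurable.

```python
from pathlib import PurePosixPath

# Base directories as quoted in the phase descriptions above.
# Assumption: these are defaults; the plugin may allow overriding them.
LOCAL_TARGET = PurePosixPath("target/databus")
PACKAGE_ROOT = PurePosixPath("/var/www/html/databusrepo")


def dataid_path(artifact: str, version: str) -> PurePosixPath:
    """Where 'mvn prepare-package' is described as collecting metadata."""
    return LOCAL_TARGET / artifact / version / "dataid.ttl"


def package_dir(user: str, group: str, artifact: str, version: str) -> PurePosixPath:
    """Where 'mvn package' is described as copying the data files."""
    return PACKAGE_ROOT / user / group / artifact / version


if __name__ == "__main__":
    print(dataid_path("mappingbased-literals", "2019.03.01"))
    print(package_dir("dbpedia", "enrichment", "mappingbased-literals", "2019.03.01"))
```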

Overall, it does not have all features yet and is not in a state where we could remove the "-SNAPSHOT", but we are running several thousand files through it each month.

Databus comes with Mods, which serve as Continuous Integration for data tests (parsing and SHACL), similar to Jenkins and Travis.

We would like to thank Maven for all its cool features. It is really good and we could work very effectively with it. Thanks to the flexibility, we could also bend it to fit data better.


Do you have any suggestions on potential cooperations?


All the best,

Sebastian



-------- Forwarded Message --------

Subject: [ANN] DBpedia’s Databus and strategic initiative to facilitate 1 Billion derived Knowledge Graphs by and for Consumers until 2025

Resent-Date:    Wed, 11 Sep 2019 09:25:41 +0000
Resent-From:    [email protected]
Date:   Wed, 11 Sep 2019 11:23:44 +0200
From:   Sebastian Hellmann <[email protected]>
To:     [email protected] <[email protected]>




[Please forward to interested colleagues]

We are proud to announce that the DBpedia Databus website at https://databus.dbpedia.org and the SPARQL API at https://databus.dbpedia.org/(repo/sparql|yasgui) (docu: http://dev.dbpedia.org/Download_Data) are in public beta now. The system is usable (eat-your-own-dog-food tested) following a “working software over comprehensive documentation” approach. Due to its many components (website, SPARQL endpoints, Keycloak, mods, upload client, download client, and data debugging), we estimate approximately six months in beta to fix bugs, implement all features and improve the details. If you have any feedback or questions, please use the DBpedia Forum <https://forum.dbpedia.org/>, the “report issues” button, or [email protected].

The full document is available at: https://databus.dbpedia.org/dbpedia/publication/strategy/2019.09.09/strategy_databus_initiative.pdf

We are looking forward to the feedback and discussion at the 14th DBpedia Community Meeting at SEMANTiCS 2019 in Karlsruhe <https://wiki.dbpedia.org/events/14th-dbpedia-community-meeting-karlsruhe> on September 12th or online.



########
# Excerpt
########


     DBpedia Databus

The DBpedia Databus is a platform to capture invested effort by data consumers who needed better data quality (fitness for use) in order to use the data and give improvements back to the data source and other consumers. DBpedia Databus enables anybody to build an automated DBpedia-style extraction, mapping and testing for any data they need. Databus incorporates features from DNS, Git, RSS, online forums and Maven to harness the full workpower of data consumers.



     Vision

Professional consumers of data worldwide have already built stable cleaning and refinement chains for all available datasets, but their efforts are invisible and not reusable. Deep, cleaned data silos exist beyond the reach of publishers and other consumers, trapped locally in pipelines.

*Data is not oil that flows out of inflexible pipelines*. Databus breaks existing pipelines into individual components that together form a decentralized, but centrally coordinated data network in which data can flow back to previous components, the original sources, or end up being consumed by external components.

The Databus provides a platform for re-publishing these files with very little effort (leaving file traffic as the only cost factor) while offering the full benefits of built-in system features such as automated publication, structured querying, automatic ingestion, as well as pluggable automated analysis, data testing via continuous integration, and automated application deployment *(software with data)*. The impact is highly synergistic: just a few thousand professional consumers and research projects can expose millions of cleaned datasets, which are on par with what has long existed in deep silos and pipelines.



   1 Billion interconnected, quality-controlled Knowledge Graphs until 2025

As we are inverting the paradigm from a publisher-centric view to a data consumer network, we will open the download valve to enable discovery and access to massive amounts of cleaner data than published by the original source. The main DBpedia Knowledge Graph - cleaned data from Wikipedia in all languages and Wikidata - alone has 600k file downloads per year, complemented by downloads at over 20 chapters, e.g. http://es.dbpedia.org, as well as over 8 million daily hits on the main Virtuoso endpoint. Community extensions from the alpha phase such as DBkWik <https://databus.dbpedia.org/sven-h/dbkwik/dbkwik/2019.09.02> and Linked Hypernyms <https://databus.dbpedia.org/propan/lhd/linked-hypernyms> are being loaded onto the bus and consolidated, and we expect this number to reach over 100 by the end of the year. Companies and organisations who have previously uploaded their backlinks here <https://github.com/dbpedia/links> will be able to migrate to the databus. Other datasets are cleaned and posted. In two of our research projects, LOD-GEOSS <https://www.enargus.de/pub/bscw.cgi/?op=enargus.eps2&s=14&q=BASF%20SE&v=10&m=2&id=1216225&p=1> and PLASS <http://plass.io/>, we will re-publish open datasets, clean them and create collections, which will result in DBpedia-style knowledge graphs for energy systems and supply-chain management.

The *full document* is available at: https://databus.dbpedia.org/dbpedia/publication/strategy/2019.09.09/strategy_databus_initiative.pdf

