Re: [Wikidata] Wikidata in the LOD cloud

2020-09-17 Thread Sebastian Hellmann

Hi all,

a question here:

P8605 is shown as a string, e.g. as "doi" in 
https://www.wikidata.org/wiki/Q5188229, which is the last path segment 
of the identifier. Shouldn't this be "http://lod-cloud.net/dataset/doi"?


At least for the LOD Cloud project.

-- Sebastian



On 16.09.20 15:53, Lydia Pintscher wrote:

On Sat, Aug 15, 2020 at 9:06 AM Egon Willighagen
 wrote:

Proposed: 
https://www.wikidata.org/wiki/Wikidata:Property_proposal/Generic#Linked_Open_Data_Cloud_identifier

Egon

And we now have the Property \o/   https://www.wikidata.org/wiki/Property:P8605


Cheers
Lydia



___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] [ANN] DBpedia Autumn Hackathon, starting Sept 21st

2020-09-11 Thread Sebastian Hellmann

Apologies for cross-posting


Dear DBpedians, Linked Data savvies and Ontologists,


We would like to invite you to join the DBpedia Autumn Hackathon 2020 as 
a new format to contribute to DBpedia, gain fame, win small prizes and 
experience the latest technology provided by DBpedia Association 
members. The hackathon is part of the Knowledge Graphs in Action 
conference on October 6, 2020. Please check here: 
https://wiki.dbpedia.org/meetings/KnowledgeGraphsInAction



# Timeline

 *

   Registration of participants - main communication channel will be
   the #hackathon channel in DBpedia Slack (sign up
   https://dbpedia-slack.herokuapp.com/, then add yourself to the
   channel). If you wish to receive a reminder email on Sep 21st, you
   can leave your email address in this form: https://tinyurl.com/y24ps5jt

 *

   Until September 14th - preparation phase, participating
   organisations prepare details, track formation, additional tracks
   can be proposed, please contact dbpedia-eve...@infai.org
   

 *

   September 21st - Announcement of details for each track, including
   prizes, participating data, demos, tools and tasks. Check updates on
   hackathon website
   https://wiki.dbpedia.org/events/dbpedia-autumn-hackathon-2020

 *

   September 21st to October 1st - hacking period, coordinated via
   DBpedia slack

 *

   October 1st, 23:59 Hawaii Time - Submission of hacking results (a
   3-min video and a 2-3 paragraph summary with links, if not stated
   otherwise in the track)

 *

   October 5th, 16:00 CEST - Final Event, each track chair presents a
   short recap of the track, announces prizes or summarizes the result
   of hacking.

 *

   October 6th, 9:50 - 15:30 CEST - Knowledge Graphs in Action Event

 *

   Results and videos are documented on the DBpedia Website and the
   DBpedia Youtube channel.


# Member Tracks

The member tracks are hosted by DBpedia Association members, who are 
technology leaders in the area of Knowledge Engineering. Additional 
tracks can be proposed until Sep 14th, please contact 
dbpedia-eve...@infai.org .



 *

   timbr SQL Knowledge Graph: Learn how to model, map and query
   ontologies in timbr and then model an ontology of GDELT, map it to
   the GDELT database, and answer a number of questions that currently
   are quite impossible to get from the BigQuery GDELT database. Cash
   prizes planned. https://www.timbr.ai/

 *

   GNOSS Knowledge Graph Builder: Give meaning to your organisation’s
   documents and data with a Knowledge Graph.
   https://www.gnoss.com/en/products/semantic-framework

 *

   ImageSnippets: Labeling images with semantic descriptions. Use
   DBpedia spotlight and an entity matching lookup to select DBpedia
   terms to describe images. Then explore the resulting dataset through
   searches over inference graphs and explore the ImageSnippets dataset
   through our SPARQL endpoint. Prizes planned.
   http://www.imagesnippets.com

 *

   Diffbot: Build Your Own Knowledge Graph! Use the Natural Language
   API to extract triples from natural language text and expand these
   triples with data from the Diffbot Knowledge Graph (10+ billion
   entities, 1+ trillion facts). Check out the demo
   http://demo.nl.diffbot.com/. All participants will receive access to
   the Diffbot KG and tools for (non-commercial) research for one year
   ($10,000 value).


# Dutch National Knowledge Graph Track

Following the DBpedia FlexiFusion approach, we are currently 
flexi-fusing a huge, DBpedia-style knowledge graph that will connect 
many Linked Data sources and data silos relevant to the Netherlands. 
We hope that this will eventually crystallize a well-connected 
sub-community linked open data (LOD) cloud, in the same manner as 
DBpedia crystallized the original LOD cloud, with some improvements 
(you could call it LOD Mark II). Data and hackathon details will be 
announced on September 21st.



# Improve DBpedia Track

A community track where everybody can participate and contribute to 
improving existing DBpedia components, in particular the extraction 
framework, the mappings, the ontology, data quality test cases, new 
extractors, links and other extensions. The best individual contributions 
will be acknowledged on the DBpedia website by anointing the WebID/FOAF 
profile.


(chaired by Milan Dojchinovski and Marvin Hofer from the DBpedia 
Association & InfAI and the DBpedia Hacking Committee)



# DBpedia Open Innovation Track

(not part of the hackathon, pre-announcement)

For the DBpedia Spring Event 2021, we are planning an Open Innovation 
Track, where DBpedians can showcase their applications. This endeavour 
will not be part of the hackathon as we are looking for significant 
showcases with development effort of months & years built on the core 
infrastructure of DBpedia such as the SPARQL endpoint, the data, lookup, 
spotlight, DBpedia Live, etc. Details will be 

[Wikidata] Fwd: [CfP] 2nd DBpedia Stack Online-Tutorial, Sept 2, 2020

2020-08-28 Thread Sebastian Hellmann

Dear all,

please find below the announcement of the 2nd DBpedia Stack tutorial. 
The first one is here: 
https://wiki.dbpedia.org/tutorials/1st-dbpedia-stack-tutorial and 
includes a video recording.


-- Sebastian



 Forwarded Message 
Subject:[CfP] 2nd DBpedia Stack Online-Tutorial, Sept 2, 2020
Resent-Date:Sat, 15 Aug 2020 10:57:52 +
Resent-From:public-ld...@w3.org
Date:   Sat, 15 Aug 2020 12:55:13 +0200
From:   Sebastian Hellmann 
Reply-To:   ho...@infai.org
To: public-ld...@w3.org



Apologies for cross-posting


Over the last year, the DBpedia core team has consolidated a great amount 
of technology around DBpedia. This tutorial is targeted at developers 
(in particular of DBpedia Chapters) who wish to learn how to replicate 
local infrastructure, such as loading and hosting their own SPARQL endpoint. 
A core focus will also be the new DBpedia Stack, which contains several 
dockerized applications that automatically load data from the 
DBpedia Databus. The second tutorial will be held on September 2nd, 2020 
at 17:00 CEST and will cover the following topics:


- Using Databus collections (Download)

- Creating customized Databus collections

- Uploading data to the Databus

- Using collections in Databus-ready Docker applications

- Creating dockerized applications for the DBpedia Stack

#Quick Facts

- Web URL: https://wiki.dbpedia.org/tutorials/2nd-dbpedia-stack-tutorial

- When: September 2nd, 2020 at 17:00-18:00 CEST

- Where: The tutorial will be held online. Registration is required.


- Databus: https://databus.dbpedia.org/

#Registration

Attending the DBpedia Stack tutorial is free. Registration is required 
though. After the registration for the event, you will receive an email 
with more instructions. Please register here to be part of the meeting: 
https://wiki.dbpedia.org/tutorials/2nd-dbpedia-stack-tutorial


#Organisation

- Milan Dojchinovski, AKSW/KILT, DBpedia Association

- Jan Forberg, AKSW/KILT, DBpedia Association

- Julia Holze, InfAI, DBpedia Association

- Sebastian Hellmann, AKSW/KILT, DBpedia Association

We are looking forward to meeting you online!


With kind regards,

The DBpedia Team


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] [Harvest Templates] Harvesting all of Wikipedia Infoboxes for Wikidata

2020-02-07 Thread Sebastian Hellmann

Hi all,

we have made quite some technical progress in GFS [1] regarding the 
harvesting of Wikipedia infoboxes, including fact references and also 
external sources, and filed an RFC here:


https://www.wikidata.org/wiki/Wikidata:Requests_for_comment/Harvesting_of_Wikipedia_Infoboxes_for_Wikidata._Proposal_for_extension_of_Harvest_Templates


[1] https://meta.wikimedia.org/wiki/Grants:Project/DBpedia/GlobalFactSyncRE

--
All the best,
Sebastian Hellmann

Director of Knowledge Integration and Linked Data Technologies (KILT) 
Competence Center

at the Institute for Applied Informatics (InfAI) at Leipzig University
Executive Director of the DBpedia Association
Projects: http://dbpedia.org, http://nlp2rdf.org, 
http://linguistics.okfn.org, https://www.w3.org/community/ld4lt 
<http://www.w3.org/community/ld4lt>

Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Concise/Notable Wikidata Dump

2019-12-21 Thread Sebastian Hellmann
 does not seem trivial in terms of resources used)? Would it be a 
good idea?


In summary, I like the idea of using WDumper to sporadically generate 
-- and publish on Zenodo -- a "notable version" of Wikidata filtered 
by sitelinks (perhaps also allowing other high-degree or high-PageRank 
nodes to pass the filter). At least I know I would use such a dump.


Best,
Aidan

On 2019-12-19 6:46, Lydia Pintscher wrote:

On Tue, Dec 17, 2019 at 7:16 PM Aidan Hogan  wrote:


Hey all,

As someone who likes to use Wikidata in their research, and likes to
give students projects relating to Wikidata, I am finding it more and
more difficult to (recommend to) work with recent versions of Wikidata
due to the increasing dump sizes, where even the truthy version now
costs considerable time and machine resources to process and handle. In
some cases we just grin and bear the costs, while in other cases we
apply an ad hoc sampling to be able to play around with the data and try
things quickly.

More generally, I think the growing data volumes might inadvertently
scare people off taking the dumps and using them in their research.

One idea we had recently to reduce the data size for a student project
while keeping the most notable parts of Wikidata was to only keep claims
that involve an item linked to Wikipedia; in other words, if the
statement involves a Q item (in the "subject" or "object") not linked to
Wikipedia, the statement is removed.
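
A minimal WDQS sketch of that criterion, assuming the father property
(P22) as an example predicate (wdt: and schema: are predefined prefixes on
the query service): it keeps a statement only when the object item carries
at least one sitelink; the subject side would be checked analogously.

SELECT ?s ?o WHERE {
  ?s wdt:P22 ?o .
  # keep the statement only if some wiki article is about the object item
  FILTER EXISTS { ?article schema:about ?o . }
}
LIMIT 10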

I wonder would it be possible for Wikidata to provide such a dump to
download (e.g., in RDF) for people who prefer to work with a more
concise sub-graph that still maintains the most "notable" parts? While
of course one could compute this from the full dump locally, making such
a version available as a dump directly would save clients some
resources, potentially encourage more research using/on Wikidata, and
having such a version "rubber-stamped" by Wikidata would also help to
justify the use of such a dataset for research purposes.

... just an idea I thought I would float out there. Perhaps there is
another (better) way to define a concise dump.

Best,
Aidan


Hi Aidan,

That the dumps are becoming too big is an issue I've heard a number of
times now. It's something we need to tackle. My biggest issue is
deciding how to slice and dice it though in a way that works for many
use cases. We have https://phabricator.wikimedia.org/T46581 to
brainstorm about that and figure it out. Input from several people
very welcome. I also added a link to Benno's tool there.
As for the specific suggestion: I fear relying on the existence of
sitelinks will kick out a lot of important things you would care about
like professions so I'm not sure that's a good thing to offer
officially for a larger audience.


Cheers
Lydia



___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


--
All the best,
Sebastian Hellmann

Director of Knowledge Integration and Linked Data Technologies (KILT) 
Competence Center

at the Institute for Applied Informatics (InfAI) at Leipzig University
Executive Director of the DBpedia Association
Projects: http://dbpedia.org, http://nlp2rdf.org, 
http://linguistics.okfn.org, https://www.w3.org/community/ld4lt 
<http://www.w3.org/community/ld4lt>

Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Comparison of Wikidata, DBpedia, and Freebase (draft and invitation)

2019-11-15 Thread Sebastian Hellmann

Hi Denny, all,

here is the second prototype of the new overarching DBpedia approach:

https://databus.dbpedia.org/vehnem/flexifusion/prefusion/2019.11.01

Datasets are grouped by property; the DBpedia ontology is used where it 
exists. The data contains all Wikipedia languages mapped via DBpedia, 
Wikidata where mapped, and some properties from DNB, MusicBrainz and Geonames.


We normalized the subjects based on the sameAs links with some quality 
control. Datatypes will be normalised by rules plus machine learning in 
the future.


As soon as we make some adjustments, we can load it into the GFS GUI.

We are also working on an export using Wikidata Q's and P's so it is 
easier to ingest into Wikidata. More datasets from LOD will follow.


All the best,

Sebastian


On 04.10.19 01:23, Sebastian Hellmann wrote:


Hi Denny,

here are some initial points:

1. there is also the generic dataset from last month: 
https://databus.dbpedia.org/dbpedia/generic/infobox-properties/2019.08.30 
(we still need to copy the docu onto the bus). This has the 
highest coverage, but the lowest consistency. English has around 50k 
parent properties, maybe more if you count child (inverse) and other 
variants. We would need to check the mappings at 
http://mappings.dbpedia.org, which we are doing at the moment anyhow. 
It could take only an hour to map some healthy chunks into the 
mappings dataset.


curl https://downloads.dbpedia.org/repo/lts/generic/infobox-properties/2019.08.30/infobox-properties_lang=en.ttl.bz2 | bzcat | grep "/parent"


http://temporary.dbpedia.org/temporary/parentrel.nt.bz2

Normally this dataset is messy, but still quite useful, because you 
can write the queries with alternatives (see 
dbo:position|dbp:position) in a way that makes them usable, like this 
query, which has worked for 13 years:


soccer players who are born in a country with more than 10 million 
inhabitants, who played as goalkeeper for a club that has a stadium 
with more than 30,000 seats, and whose club country is different from 
<http://dbpedia.org/snorql/?query=SELECT+distinct+%3Fsoccerplayer+%3FcountryOfBirth+%3Fteam+%3FcountryOfTeam+%3Fstadiumcapacity%0D%0A{+%0D%0A%3Fsoccerplayer+a+dbo%3ASoccerPlayer+%3B%0D%0A+++dbo%3Aposition|dbp%3Aposition+%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FGoalkeeper_%28association_football%29%3E+%3B%0D%0A+++dbo%3AbirthPlace%2Fdbo%3Acountry*+%3FcountryOfBirth+%3B%0D%0A+++%23dbo%3Anumber+13+%3B%0D%0A+++dbo%3Ateam+%3Fteam+.%0D%0A+++%3Fteam+dbo%3Acapacity+%3Fstadiumcapacity+%3B+dbo%3Aground+%3FcountryOfTeam+.+%0D%0A+++%3FcountryOfBirth+a+dbo%3ACountry+%3B+dbo%3ApopulationTotal+%3Fpopulation+.%0D%0A+++%3FcountryOfTeam+a+dbo%3ACountry+.%0D%0AFILTER+%28%3FcountryOfTeam+!%3D+%3FcountryOfBirth%29%0D%0AFILTER+%28%3Fstadiumcapacity+%3E+3%29%0D%0AFILTER+%28%3Fpopulation+%3E+1000%29%0D%0A}+order+by+%3Fsoccerplayer>
Maybe, we could also evaluate some queries which can be answered by 
one or the other? Can you do the query above in Wikidata?


2. We also have an API to get all references from infoboxes now, as a 
partial result of the GFS project. See point 5 here: 
https://meta.wikimedia.org/wiki/Grants:Project/DBpedia/GlobalFactSyncRE


3. This particular dataset (generic/infobox-properties) above is also 
a good measure of the non-adoption of Wikidata in Wikipedia. In total, it 
has over 500 million statements for all languages. Having a statement 
here means that the data uses an infobox template parameter and 
no Wikidata is used. The dataset is still extracted in the same way, 
by the same algorithm, so we can check whether it got bigger or smaller. 
But the fact that this still works and has a decent size 
indicates that Wikidata adoption by Wikipedians is low.


4. I need to look at the parent example in detail. However, I have to 
say that the property lends itself well to the Wikidata approach, 
since it is easily understood, has a sort of truthiness, and is easy 
to research and add.


I am not sure if it is representative, as e.g. "employer" is more 
difficult to model (time-scoped). For example, my data here is outdated: 
https://www.wikidata.org/wiki/Q39429171


Also I don't see yet how this will become a more systematic approach 
that shows where to optimize, but I still need to read it fully.


We can start with this one however.

-- Sebastian

On 01.10.19 01:13, Denny Vrandečić wrote:

Hi all,

as promised, now that I am back from my trip, here's my draft of the 
comparison of Wikidata, DBpedia, and Freebase.


It is a draft, it is obviously potentially biased given my 
background, etc., but I hope that we can work on it together to get 
it into a good shape.


Markus, amusingly I took pretty much the same example that you went 
for, the parent predicate. So yes, I was also surprised by the 
results, and would love to have Sebastian or Kingsley look into it 
and see if I conducted it fairly.


SJ, Andra, thanks for offering to take a look. I am sure you all can 
contribute 

Re: [Wikidata] Comparison of Wikidata, DBpedia, and Freebase (draft and invitation)

2019-10-03 Thread Sebastian Hellmann
 for the comparison. But I might have gotten the 
whole procedure wrong. I am happy to be corrected.


On Sat, Sep 28, 2019 at 12:28 AM <hellm...@informatik.uni-leipzig.de> wrote:
> Meanwhile, Google crawls all the references and extracts facts from there. We don't
> have that available, but there is Linked Open Data.

Potentially, not a bad idea, but we don't do that.

Everyone, this is the first time I share a Colab notebook, and I have 
no idea if I did it right. So any feedback of the form "oh you didn't 
switch on that bit over here" or "yes, this works, thank you" is very 
welcome, because I have no clue what I am doing :) Also, I never did 
this kind of analysis so transparently, which is kinda both totally 
cool and rather scary, because now you can all see how dumb I am :)


So everyone is invited to send Pull Requests (I guess that's how this 
works?), and I would love for us to create a result together that we 
agree on. I see the result of this exercise to be potentially twofold:


1) a publication we can point people to who ask about the differences 
between Wikidata, DBpedia, and Freebase


2) to reignite or start projects and processes to reduce these differences

So, here is the link to my Colab notebook:

https://github.com/vrandezo/colabs/blob/master/Comparing_coverage_and_accuracy_of_DBpedia%2C_Freebase%2C_and_Wikidata_for_the_parent_predicate.ipynb

Ideally, the third goal could be to get to a deeper understanding of 
how these three projects relate to each other - in my point of view, 
Freebase is dead and outdated, Wikidata is the core knowledge base 
that anyone can edit, and DBpedia is the core project to weave 
value-adding workflows on top of Wikidata or other datasets from the 
linked open data cloud together. But that's just a proposal.


Cheers,
Denny



On Sat, Sep 28, 2019 at 12:28 AM <hellm...@informatik.uni-leipzig.de> wrote:


Hi Gerard,

I was not trying to judge here. I was just saying that it wasn't
much data in the end.
For me Freebase was basically cherry-picked.

Meanwhile, the data we extract is more pertinent to the goal of
having Wikidata cover the info boxes. We still have ~ 500 million
statements left. But none of it is used yet. Hopefully we can
change that.

Meanwhile, Google crawls all the references and extracts facts
from there. We don't have that available, but there is Linked Open
Data.

--
Sebastian

On September 27, 2019 5:26:43 PM GMT+02:00, Gerard Meijssen
<gerard.meijs...@gmail.com> wrote:

Hoi,
I totally reject the assertion was so bad. I have always had
the opinion that the main issue was an atrocious user
interface. Add to this the people that have Wikipedia notions
about quality. They have and had a detrimental effect on both
the quantity and quality of Wikidata.

When you add the functionality that is being built by the
datawranglers at DBpedia, it becomes easy/easier to compare
the data from Wikipedias with Wikidata (and why not Freebase),
add what has consensus and curate the differences. This will
enable a true datasense of quality and allows us to provide a
much improved service.
Thanks,
      GerardM

On Fri, 27 Sep 2019 at 15:54, Marco Fossati
<foss...@spaziodati.eu> wrote:

Hey Sebastian,

On 9/20/19 10:22 AM, Sebastian Hellmann wrote:
> Not much of Freebase did end up in Wikidata.

Dropping here some pointers to shed light on the migration
of Freebase
to Wikidata, since I was partially involved in the process:
1. WikiProject [1];
2. the paper behind [2];
3. datasets to be migrated [3].

I can confirm that the migration has stalled: as of today,
*528 thousand* Freebase statements were curated by the
community, out of *10 million*. By 'curated', I mean approved or rejected.
These numbers come from two queries against the primary
sources tool
database.

The stall is due to several causes: in my opinion, the
most important
one was the bad quality of sources [4,5] coming from the
Knowledge Vault
project [6].

Cheers,

Marco

[1]
https://www.wikidata.org/wiki/Wikidata:WikiProject_Freebase
[2]

http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/44818.pdf
[3]

https://www.wikidata.org/wiki/Wikidata:Primary_sources_tool/Version_1#Data
[4]

https://www.wikidata.org/wiki/Wikidata_talk:Primary_sources_tool/Archive/2017#Quality_of_sources

Re: [Wikidata] Google's stake in Wikidata and Wikipedia

2019-09-27 Thread Sebastian Hellmann

Hi Marco,

I think, I looked at it some years ago and it still sounds like less 
than 5% made it, which is what I remember.


-- Sebastian

On 27.09.19 15:53, Marco Fossati wrote:

Hey Sebastian,

On 9/20/19 10:22 AM, Sebastian Hellmann wrote:

Not much of Freebase did end up in Wikidata.


Dropping here some pointers to shed light on the migration of Freebase 
to Wikidata, since I was partially involved in the process:

1. WikiProject [1];
2. the paper behind [2];
3. datasets to be migrated [3].

I can confirm that the migration has stalled: as of today, *528 
thousand* Freebase statements were curated by the community, out of 
*10 million*. By 'curated', I mean approved or rejected.
These numbers come from two queries against the primary sources tool 
database.


The stall is due to several causes: in my opinion, the most important 
one was the bad quality of sources [4,5] coming from the Knowledge 
Vault project [6].


Cheers,

Marco

[1] https://www.wikidata.org/wiki/Wikidata:WikiProject_Freebase
[2] 
http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/44818.pdf
[3] 
https://www.wikidata.org/wiki/Wikidata:Primary_sources_tool/Version_1#Data
[4] 
https://www.wikidata.org/wiki/Wikidata_talk:Primary_sources_tool/Archive/2017#Quality_of_sources
[5] 
https://www.wikidata.org/wiki/Wikidata:Requests_for_comment/Semi-automatic_Addition_of_References_to_Wikidata_Statements#A_whitelist_for_sources

[6] https://www.cs.ubc.ca/~murphyk/Papers/kv-kdd14.pdf


--
All the best,
Sebastian Hellmann

Director of Knowledge Integration and Linked Data Technologies (KILT) 
Competence Center

at the Institute for Applied Informatics (InfAI) at Leipzig University
Executive Director of the DBpedia Association
Projects: http://dbpedia.org, http://nlp2rdf.org, 
http://linguistics.okfn.org, https://www.w3.org/community/ld4lt 
<http://www.w3.org/community/ld4lt>

Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Google's stake in Wikidata and Wikipedia

2019-09-20 Thread Sebastian Hellmann
Na, I am quite open, albeit impulsive. The information given was quite 
good and some of my concerns regarding the involvement of Google were 
also lifted or relativized. Mainly due to the fact that there seems to 
be a sense of awareness.


I am just studying economic principles, which are very powerful. I also 
have the feeling that free and open stuff just got a lot more commercial 
and I am still struggling with myself whether this is good or not, and also 
whether DBpedia should become frenemies with BigTech. Or funny things 
like many funding agencies trying to push for national sustainability 
options, but most of the time they suggest using the GitHub platform. 
Wikibase could be an option here.


I have to apologize for the Knowledge Graph Talk thing. I was a bit 
grumpy, because I thought I wasted a lot of time on the Talk page that 
could have been invested in making the article better (WP:BE_BOLD 
style), but now I think, it might have been my own mistake. So apologies 
for lashing out there.


(see comments below)

On 20.09.19 17:53, Denny Vrandečić wrote:

Sebastian,

"I don't want to facilitate conspiracy theories, but ..."
"[I am] interested in what is the truth behind the truth"

I am sorry, I truly am, but this *is* the language I know from 
conspiracy theorists. And given that, I cannot imagine that there is 
anything I can say that could convince you otherwise. Therefore there 
is no real point for me in engaging with this conversation on these 
terms, I cannot see how it would turn constructive.


The answers to many of your questions are public and on the record. 
Others tried to point you to them (thanks), but you dismiss them as 
not fitting your narrative.


So here's a suggestion, which I think might be much more constructive 
and forward-looking:


I have been working on a comparison of DBpedia, Wikidata, and Freebase 
(and since you've read my thesis, you know that's a thing I know a bit 
about). Simple evaluation, coverage, correctness, nothing dramatically 
fancy. But I am torn about publishing it, because, d'oh, people may 
(with good reasons) dismiss it as being biased. And truth be told - 
the simple fact that I don't know DBpedia as well as I know Wikidata 
and Freebase might indeed have led to errors, mistakes, and stuff I 
missed in the evaluation. But you know what would help?


You.

My suggestion is that I publish my current draft, and then you and me 
work together on it, publically, in the open, until we reach a state 
we both consider correct enough for publication.


What do you think?


Sure, we are doing statistics at the moment as well. It is a bit hard to 
define what DBpedia is nowadays as we are rebranding the remixed 
datasets, now that we can pick up links and other data from the Databus. 
It might not even be a real dataset anymore, but glue between datasets 
focusing on the speed of integration and ease of quality improvement. 
We are also still working on the concrete Sync Targets for GlobalFactSync 
(https://meta.wikimedia.org/wiki/Grants:Project/DBpedia/GlobalFactSyncRE).


One question I have is whether Wikidata is effective/efficient or where 
it is effective and where it could use improvement as a chance for 
collaboration.


So yes any time.

-- Sebastian



Cheers,
Denny

P.S.: I am travelling the next week, so I may ask for patience


On Fri, Sep 20, 2019 at 8:11 AM Thad Guidry <thadgui...@gmail.com> wrote:


Thank you for sharing your opinions, Sebastian.

Cheers,
Thad
https://www.linkedin.com/in/thadguidry/


On Fri, Sep 20, 2019 at 9:43 AM Sebastian Hellmann
<hellm...@informatik.uni-leipzig.de> wrote:

Hi Thad,

On 20.09.19 15:28, Thad Guidry wrote:

With my tech evangelist hat on...

Google's philanthropy is nearly boundless when it comes to
the promotion of knowledge.  Why? Because indeed it's in
their best interest otherwise no one can prosper without
knowledge.  They aggregate knowledge for the benefit of
mankind, and then make a profit through advertising ... all
while making that knowledge extremely easy to be found for
the world.


I am neither pro-Google or anti-Google per se. Maybe skeptical
and interested in what is the truth behind the truth. Google
is not synonym to philanthropy. Wikimedia is or at least I
think they are doing many things right. Google is a platform,
so primarily they "aggregate knowledge for their benefit"
while creating enough incentives in form of accessibility for
users to add the user's knowledge to theirs. It is not about
what Google offers, but what it takes in return. 20% of
employees time is also an investment in the skill of the
employee, a Google asset called Human Capital and also leads
to me and Denny from Google discussing whether
https

Re: [Wikidata] Google's stake in Wikidata and Wikipedia

2019-09-20 Thread Sebastian Hellmann

Hi Thad,

On 20.09.19 15:28, Thad Guidry wrote:

With my tech evangelist hat on...

Google's philanthropy is nearly boundless when it comes to the 
promotion of knowledge.  Why? Because indeed it's in their best 
interest otherwise no one can prosper without knowledge.  They 
aggregate knowledge for the benefit of mankind, and then make a profit 
through advertising ... all while making that knowledge extremely easy 
to be found for the world.


I am neither pro-Google nor anti-Google per se. Maybe skeptical and 
interested in what is the truth behind the truth. Google is not a synonym 
for philanthropy. Wikimedia is, or at least I think they are doing many 
things right. Google is a platform, so primarily they "aggregate 
knowledge for their benefit" while creating enough incentives, in the form of 
accessibility, for users to add the user's knowledge to theirs. It is not 
about what Google offers, but what it takes in return. 20% of employees' 
time is also an investment in the skill of the employee, a Google asset 
called Human Capital, and it also leads to me and Denny from Google 
discussing whether https://en.wikipedia.org/wiki/Talk:Knowledge_Graph is 
content marketing or knowledge (@Denny: no offense, legit arguments, but 
no agenda to resolve the stalled discussion there). Except I don't have 
20% time to straighten the view into what I believe would be neutral, so 
pushing it becomes a resource issue.


I found the other replies much more realistic, and the perspective is still 
unclear. Maybe Mozilla wasn't so much a frenemy with Google and got 
removed from the browser market for it. I am also thinking about Linked 
Open Data. Decentralisation is quite weak, individually. I guess 
spreading all the Wikibases around to super-nodes is helpful unless it 
prevents the formation of a stronger lobby of philanthropists or 
competition to BigTech. Wikidata created some pressure on DBpedia as 
well (also opportunities), but we are fine since we can simply innovate. 
Others might not withstand. Microsoft seems to favor OpenStreetMap, so I 
am just asking to what degree Open Source and Open Data are being 
instrumentalised by BigTech.


Hence my question, whether it is compromise or be removed. (Note that 
states are also platforms, which measure value in GDP and make laws and 
roads and take VAT on transactions. Sometimes, they even don't remove 
opposition.)


--
All the best,
Sebastian Hellmann

Director of Knowledge Integration and Linked Data Technologies (KILT) 
Competence Center

at the Institute for Applied Informatics (InfAI) at Leipzig University
Executive Director of the DBpedia Association
Projects: http://dbpedia.org, http://nlp2rdf.org, 
http://linguistics.okfn.org, https://www.w3.org/community/ld4lt 
<http://www.w3.org/community/ld4lt>

Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] Google's stake in Wikidata and Wikipedia

2019-09-20 Thread Sebastian Hellmann

Dear all,

personally I am quite happy that Denny can contribute more to Wikidata 
and Wikipedia. No personal criticism there, I read his thesis and I am 
impressed by his work and contributions.


I don't want to facilitate any conspiracy theories here, but I am 
wondering about where Wikidata is going, especially with respect to Google.


Note that Chrome/Chromium being Open Source with a twist has already 
pushed Firefox from the market, but now there is this controversy about 
what is being tracked server side by Google Analytics and Client side by 
cookies and also the current discussion about Ad Blocker removal from 
Chrome: 
https://www.wired.com/story/google-chrome-ad-blockers-extensions-api/


Maybe somebody could enlighten me about the overall strategy and 
connections here.


1. there was a Knowledge Engine Project which failed, but in principle 
had the right idea: 
https://en.wikipedia.org/wiki/Knowledge_Engine_(Wikimedia_Foundation)


This aimed to "democratize the discovery of media, news and 
information", in particular countering the traffic sink caused by Google 
providing Wikipedia's information directly in Google Search. Now that there is 
Wikidata, this is much better for Google, because they can take the CC-0 
data as they wish.


2. there are some very widely used terms like "Knowledge Graph", which 
seems to be blocked by Google: https://www.wikidata.org/wiki/Q648625 and 
https://en.wikipedia.org/wiki/Knowledge_Graph, without a neutral point of 
view like the one the German WP adopted: 
https://de.wikipedia.org/wiki/Google#Knowledge_Graph


3. I was under the impression that Google bought Freebase and then 
started Wikidata as a non-threatening model to the data they have in 
their Knowledge Graph


Could someone give me some pointers about the financial connections of 
Google and Wikimedia (this should be transparent, right?) and also about 
who brought the Wikidata movement to life in 2012?


Google was also mentioned in 
https://blog.wikimedia.org/2017/10/30/wikidata-fifth-birthday/ but while 
it reads "Freebase <https://en.wikipedia.org/wiki/Freebase>, was 
discontinued because of the superiority of Wikidata’s approach and 
active community." I know the story as: Google didn't want its 
competitors to have the data and the service. Not much of Freebase did 
end up in Wikidata.


As I said, I don't want to push any opinions in any directions. I am 
more asking for more information about the connection of Google to 
Wikidata (financially), then Google to WMF and also I am asking about 
any strategic advantages for Google in relation to their competition.


Please don't answer with "How great Wikidata is", I already know that 
and this is also not in the scope of my "How intertwined is Google with 
Wikidata / WMF?" question. Can't mention this enough: also not against 
Denny.


It is a request for better information as I can't seem to find clear 
answers here.


--
All the best,
Sebastian Hellmann

Director of Knowledge Integration and Linked Data Technologies (KILT) 
Competence Center

at the Institute for Applied Informatics (InfAI) at Leipzig University
Executive Director of the DBpedia Association
Projects: http://dbpedia.org, http://nlp2rdf.org, 
http://linguistics.okfn.org, https://www.w3.org/community/ld4lt 
<http://www.w3.org/community/ld4lt>

Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] [ANN] DBpedia’s Databus and strategic initiative to facilitate 1 Billion derived Knowledge Graphs by and for Consumers until 2025

2019-09-11 Thread Sebastian Hellmann


[Please forward to interested colleagues]

We are proud to announce that the DBpedia Databus website 
at https://databus.dbpedia.org and the SPARQL API 
at https://databus.dbpedia.org/(repo/sparql|yasgui) (docu) are in public beta now. 
The system is usable (eat-your-own-dog-food tested) following a “working 
software over comprehensive documentation” approach. Due to its many 
components (website, sparql endpoints, keycloak, mods, upload client, 
download client, and data debugging), we estimate approximately six 
months in beta to fix bugs, implement all features and improve the 
details. If you have any feedback or questions, please use 
the DBpedia Forum, the “report issues” button, or 
dbpedia@infai.org.



The full document is available at: 
https://databus.dbpedia.org/dbpedia/publication/strategy/2019.09.09/strategy_databus_initiative.pdf



We are looking forward to the feedback and discussion at the 14th 
DBpedia Community Meeting at SEMANTiCS 2019 in Karlsruhe 
on September 12th or online.




# Excerpt



 DBpedia Databus

The DBpedia Databus is a platform to capture invested effort by data 
consumers who needed better data quality (fitness for use) in order to 
use the data and give improvements back to the data source and other 
consumers. DBpedia Databus enables anybody to build an automated 
DBpedia-style extraction, mapping and testing for any data they need. 
Databus incorporates features from DNS, Git, RSS, online forums and 
Maven to harness the full workpower of data consumers.



 Vision

Professional consumers of data worldwide have already built stable 
cleaning and refinement chains for all available datasets, but their 
efforts are invisible and not reusable. Deep, cleaned data silos exist 
beyond the reach of publishers and other consumers trapped locally in 
pipelines.


*Data is not oil that flows out of inflexible pipelines*. Databus breaks 
existing pipelines into individual components that together form a 
decentralized, but centrally coordinated data network in which data can 
flow back to previous components, the original sources, or end up being 
consumed by external components,


The Databus provides a platform for re-publishing these files with very 
little effort (leaving file traffic as the only cost factor) while offering 
the full benefits of built-in system features such as automated 
publication, structured querying, automatic ingestion, as well as 
pluggable automated analysis, data testing via continuous integration, 
and automated application deployment *(software with data)*. The impact 
is highly synergistic: just a few thousand professional consumers and 
research projects can expose millions of cleaned datasets, which are on 
par with what has long existed in deep silos and pipelines.



   1 Billion interconnected, quality-controlled Knowledge Graphs until 2025

As we are inverting the paradigm from a publisher-centric view to a data 
consumer network, we will open the download valve to enable discovery 
and access to massive amounts of cleaner data than published by the 
original source. The main DBpedia Knowledge Graph - cleaned data from 
Wikipedia in all languages and Wikidata - alone has 600k file downloads 
per year, complemented by downloads at over 20 chapters, 
e.g. http://es.dbpedia.org, as well as over 8 million daily hits on the 
main Virtuoso endpoint. Community extensions from the alpha phase such 
as DBkWik and LinkedHypernyms are being 
loaded onto the bus and consolidated, and we expect this number to reach 
over 100 by the end of the year. Companies and organisations who 
have previously uploaded their backlinks here will be able to 
migrate to the databus. Other datasets are cleaned and posted. In two of 
our research projects, LOD-GEOSS 
and PLASS, we will re-publish open 
datasets, clean them and create collections, which will result in 
DBpedia-style knowledge graphs for energy systems and supply-chain 
management.


The *full document* is available at: 
https://databus.dbpedia.org/dbpedia/publication/strategy/2019.09.09/strategy_databus_initiative.pdf





___
Wikidata mailing list

[Wikidata] [ANN] New Monthly DBpedia Releases

2019-09-08 Thread Sebastian Hellmann
[Please forward to all people who have been waiting for new DBpedia 
releases.]


[Responses and questions can also go into https://forum.dbpedia.org]

Dear all,

we built a complex, automated, test-driven system around the DBpedia 
releases to allow the community to debug and extend the data and tools 
better. The system is partially implemented and documented, meaning:


* from now on there will be monthly DBpedia releases

* they are *neither as complete* as the last big release *nor perfect*, 
but decent and they will improve each month with your contribution


* the system feels more effective, as in: we found it much easier to 
locate and fix issues due to automatic testing of URI patterns, N-Triples 
syntax and soon SHACL on minidumps (triggered on software git commits) 
and on the final large dumps.


* two former community extensions were submitted already: DBkWik 
<https://databus.dbpedia.org/sven-h/dbkwik/dbkwik/2019.09.02> and 
LinkedHypernyms 
<https://databus.dbpedia.org/propan/lhd/linked-hypernyms/2016.04.01>



Only technical documentation is available at the moment. No summary 
statistics yet, i.e. we don't know yet how well we are doing overall.


* How to download: http://dev.dbpedia.org/Download_DBpedia

* How to improve: http://dev.dbpedia.org/Improve_DBpedia

* They can be browsed via http://databus.dbpedia.org/dbpedia, but the 
docu there is incomplete and the query and collection builder is untested


Also, next week, on the 12th of September, DBpedia will meet at SEMANTiCS 2019 
<https://wiki.dbpedia.org/events/14th-dbpedia-community-meeting-karlsruhe>



--
All the best,
Sebastian Hellmann

Director of Knowledge Integration and Linked Data Technologies (KILT) 
Competence Center

at the Institute for Applied Informatics (InfAI) at Leipzig University
Executive Director of the DBpedia Association
Projects: http://dbpedia.org, http://nlp2rdf.org, 
http://linguistics.okfn.org, https://www.w3.org/community/ld4lt 
<http://www.w3.org/community/ld4lt>

Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikidata geo-coordinates

2019-09-02 Thread Sebastian Hellmann

Hi Olaf

https://databus.dbpedia.org/dbpedia/wikidata/geo-coordinates/2019.08.01 
has monthly (around the 7th) extractions of Wikidata's geo-coordinates.


The website still has a bug and the download links are currently not 
displayed at the bottom any more. But you can query for the latest version.


https://databus.dbpedia.org/yasgui/

PREFIX dataid: <http://dataid.dbpedia.org/ns/core#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat:  <http://www.w3.org/ns/dcat#>

SELECT ?downloadURL ?sha256sum WHERE {
  ?dataset dataid:artifact <https://databus.dbpedia.org/dbpedia/wikidata/geo-coordinates> .
  ?dataset dcat:distribution/dcat:downloadURL ?downloadURL .
  ?dataset dcat:distribution/dataid:sha256sum ?sha256sum .
  ?dataset dct:hasVersion ?version .
} ORDER BY DESC (?version) LIMIT 1

-- Sebastian


On 02.09.19 17:56, Eugene Alvin Villar wrote:
On Mon, Sep 2, 2019, 11:39 PM Olaf Simons, 
<mailto:olaf.sim...@pierre-marteau.com>> wrote:


Is there an elegant way to get data out of wikidata in a format
that you can then fill back into another Wikibase without the pain
of such conversions (like splitting coordinates, changing columns,
changing the prefixes...)


Depends on how elegant you want it to be, but it won't be trivial. If 
you want to get data from WDQS, you can use any of the available 
SPARQL text/regex manipulation functions to convert the WKT format 
into a different format.
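
For instance, a minimal WDQS sketch (an assumed illustration) that splits 
the WKT literal of coordinate location (P625) into longitude and latitude 
strings; WKT literals have the form "Point(long lat)", and wdt: is a 
predefined prefix on the query service:

SELECT ?item ?lon ?lat WHERE {
  ?item wdt:P625 ?coord .
  # longitude comes first inside "Point(... ...)", latitude second
  BIND(STRBEFORE(STRAFTER(STR(?coord), "Point("), " ") AS ?lon)
  BIND(STRBEFORE(STRAFTER(STR(?coord), " "), ")") AS ?lat)
}
LIMIT 10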


Question 2 is related: When you extract dates with the
QueryService that will change just years like 1971  from
1971-00-00 into 1971-01-01 dates. I felt unable to tell whether
such a date was just a year or actually a January 1 entry. Is
there a way to get the exact date as it has been put in to
Wikibase back from the QueryService?


Again, this is not trivial. You need to also query the datePrecision 
field of the date value, and that means querying for the actual date 
statement and not just the simple value that the usual SPARQL queries 
provide. Then, based on the datePrecision value (I think 9 is for year 
precision vs. 11 for day precision), you can truncate the date to 
just the year.
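
A minimal sketch of such a statement-level query on WDQS, using date of 
birth (P569) as an assumed example property (the p:, psv: and wikibase: 
prefixes are predefined there; precision 9 means year, 11 means day):

SELECT ?item ?date ?precision WHERE {
  # go through the full statement to reach the time value node
  ?item p:P569/psv:P569 ?valueNode .
  ?valueNode wikibase:timeValue ?date ;
             wikibase:timePrecision ?precision .
}
LIMIT 10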



___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata

--
All the best,
Sebastian Hellmann

Director of Knowledge Integration and Linked Data Technologies (KILT) 
Competence Center

at the Institute for Applied Informatics (InfAI) at Leipzig University
Executive Director of the DBpedia Association
Projects: http://dbpedia.org, http://nlp2rdf.org, 
http://linguistics.okfn.org, https://www.w3.org/community/ld4lt 
<http://www.w3.org/community/ld4lt>

Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Proposal for the introduction of a practicable Data Quality Indicator in Wikidata

2019-08-28 Thread Sebastian Hellmann

Hi Imre,

we can encode these rules using the JSON MongoDB database we created in 
the GlobalFactSync project 
(https://meta.wikimedia.org/wiki/Grants:Project/DBpedia/GlobalFactSyncRE) 
as a basis for the GFS Data Browser. The database has open read access.


Is there a list of geodata issues somewhere? Can you give some examples? 
GFS focuses on both: overall quality measures and very domain-specific 
adaptations. We will also try to flag these issues for Wikipedians.


So I see that there is some notion of what is good and what is not, by 
source. Do you have a reference dataset as well, or would that be 
NaturalEarth itself? What would help you to measure completeness when 
adding concordances to NaturalEarth?


-- Sebastian

On 24.08.19 21:26, Imre Samu wrote:
For geodata ( human settlements/rivers/mountains/... )  ( with GPS 
coordinates ) my simple rules:
- if it has a "local wikipedia page" or any big 
lang["EN/FR/PT/ES/RU/.."] wikipedia page .. then it is OK.
- if it is only in "cebuano" AND outside of the "cebuano BBOX" -> then 
this is lower quality
- only:{shwiki+srwiki} AND outside of the "sh"&"sr" BBOX -> this is lower 
quality

- only {huwiki} AND outside of CentralEuropeBBOX -> this is lower quality
- geodata without GPS coordinate -> ...
- 
so my rules are based on wikipedia pages and language areas ... and I 
prefer wikidata - with local wikipedia pages.


This is based on my experience - adding Wikidata ID concordances to 
NaturalEarth ( https://www.naturalearthdata.com/blog/ )
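
As a rough WDQS sketch of the second rule above (an assumed example, not 
from Imre's mail), this lists human settlements (Q486972) with coordinates 
whose only sitelink is the Cebuano Wikipedia; the bounding-box check would 
still have to be applied to ?coord afterwards. The wdt:, wd:, wikibase: and 
schema: prefixes are predefined on the query service:

SELECT ?item ?coord WHERE {
  ?item wdt:P31/wdt:P279* wd:Q486972 ;   # human settlement
        wdt:P625 ?coord ;                # coordinate location
        wikibase:sitelinks 1 .           # exactly one sitelink overall ...
  ?article schema:about ?item ;
           schema:isPartOf <https://ceb.wikipedia.org/> .   # ... and it is cebwiki
}
LIMIT 100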

--
All the best,
Sebastian Hellmann

Director of Knowledge Integration and Linked Data Technologies (KILT) 
Competence Center

at the Institute for Applied Informatics (InfAI) at Leipzig University
Executive Director of the DBpedia Association
Projects: http://dbpedia.org, http://nlp2rdf.org, 
http://linguistics.okfn.org, https://www.w3.org/community/ld4lt 
<http://www.w3.org/community/ld4lt>

Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Proposal for the introduction of a practicable Data Quality Indicator in Wikidata

2019-08-28 Thread Sebastian Hellmann
;)
www.archivfuehrer-kolonialzeit.de/thesaurus
<http://www.archivfuehrer-kolonialzeit.de/thesaurus>



___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


--
All the best,
Sebastian Hellmann

Director of Knowledge Integration and Linked Data Technologies (KILT) 
Competence Center

at the Institute for Applied Informatics (InfAI) at Leipzig University
Executive Director of the DBpedia Association
Projects: http://dbpedia.org, http://nlp2rdf.org, 
http://linguistics.okfn.org, https://www.w3.org/community/ld4lt 
<http://www.w3.org/community/ld4lt>

Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] [GlobalFactSync] User Script, Data Browser, Reference web service - WMF Grant project

2019-08-15 Thread Sebastian Hellmann

Dear all,

we would like to share consolidated updates for the GlobalFactSync (GFS) 
project with you (copied from 
https://meta.wikimedia.org/wiki/Grants:Project/DBpedia/GlobalFactSyncRE/News)


We polished everything for our presentation at Wikimania tomorrow: 
https://wikimania.wikimedia.org/wiki/2019:Technology_outreach_%26_innovation/GlobalFactSync


All feedback welcome!

-- Sebastian (with the team: Tina, Włodzimierz,  Krzysztof, Johannes and 
Marvin)



 User Script, Data Browser, Reference web service (15 August 2019)

After the Kick-Off note at the end of July, 
which described our first edit and the concept better, we shaped the 
technical microservices and data into more concise tools that are easier 
to use and demo during our Wikimania presentation:



1. User Script available at User:JohannesFre/global.js shows
   links from each article and Wikidata to the Data Browser and
   Reference Web Service

   [Screenshot: User Script linking to the GFS Data Browser]
2. GFS Data Browser (Github) now accepts any URI in subject from
   Wikipedia, DBpedia or Wikidata, see the Boys Don't Cry example from
   the Kick-Off Note, Berlin's geo-coordinates (lat, long), and Albert
   Einstein's religion.
   *Not live yet, edits/fixes are not reflected*
3. Reference Web Service (Albert Einstein:
   http://dbpedia.informatik.uni-leipzig.de:8111/infobox/references?article=https://en.wikipedia.org/wiki/Albert_Einstein=json)
   extracts (1) all references from a Wikipedia page, (2) matches them to
   the infobox parameter and (3) also extracts the fact from it. The
   service will remain stable, so you can use it.

Furthermore, we are designing a friendly fork of HarvestTemplates 
to effectively import all that data into Wikidata.



 Kick-off note (25 July 2019)


*GlobalFactSync - Synchronizing Wikidata and Wikipedia's infoboxes*


How is data edited in Wikipedia/Wikidata? Where does it come from? And 
how can we synchronize it globally?


The GlobalFactSync (GFS) Project — funded by the Wikimedia Foundation — 
started in June 2019 and has two goals:


 * Answer the above-mentioned three questions.
 * Build an information system to synchronize facts between all
   Wikipedia language-editions and Wikidata.

Now we are seven weeks into the project (10+ more months to go) and we 
are releasing our first prototypes to gather feedback.



/How – Synchronization vs Consensus/

We follow an absolute *Human(s)-in-the-loop* approach when we talk about 
synchronization. The final decision whether to synchronize a value or 
not should rest with a human editor who understands consensus and the 
implications. There will be no automatic imports. Our focus is to 
drastically reduce the time to research all references for individual 
facts.


A trivial example is the release date of the single “Boys Don’t Cry” 
(March 16th, 1989) in the English, Japanese, and French 
Wikipedia, in Wikidata, and 
finally in the external open database MusicBrainz. A 
human editor might need 15-30 minutes finding and opening all the different 
sources, while our current prototype can spot differences and display 
them in 5 seconds.


We already had our first successful edit where a Wikipedia editor fixed 
the discrepancy with our prototype: “I’ve updated Wikidata so that all 
five sources are in agreement.” We are now working on the following tasks:


 * Scaling the system to all infoboxes, Wikidata and selected external
   databases (see below on the difficulties 

Re: [Wikidata] Ontology in XML

2019-08-11 Thread Sebastian Hellmann

Hi Ali, all,

we have this dataset: 
https://databus.dbpedia.org/dbpedia/wikidata/instance-types/2018.10.20 
and an ontology with some Wikidata links: 
https://databus.dbpedia.org/dbpedia/ontology/dbo-snapshots/2019.02.21T08.00.00Z


The OWL version is XML.

It is true that there is no intention to make a Wikidata ontology. 
Nevertheless, we were wondering whether Wikidata couldn't just load 
DBpedia's model. We can bot-import it easily. I am sure this would help 
with querying Wikidata.


Cleaning up the P31's and P279's is quite tedious, if done individually.

-- Sebastian



On 10.08.19 19:18, Marijane White wrote:


Perhaps someone can correct me if I am wrong, but I am under the 
impression that such a thing doesn’t exist and that Wikidata’s models 
are intentionally not documented as an ontology.  I gathered this 
understanding from Bob DuCharme’s blog post about extracting RDF 
models from Wikidata with SPARQL queries: 
http://www.bobdc.com/blog/extracting-rdf-data-models-fro/ 
<http://www.bobdc.com/blog/extracting-rdf-data-models-fro/>


*Marijane White, M.S.L.I.S.*

Data Librarian, Assistant Professor

Oregon Health & Science University Library

*Phone*: 503.494.3484

*Email*: whi...@ohsu.edu <mailto:whi...@ohsu.edu>

*ORCiD*: https://orcid.org/-0001-5059-4132

*From: *Wikidata  on behalf of 
Manzoor Ali 
*Reply-To: *Discussion list for the Wikidata project 


*Date: *Saturday, August 10, 2019 at 2:38 AM
*To: *"wikidata@lists.wikimedia.org" 
*Subject: *[Wikidata] Ontology in XML


Hello Wikidata,

Sorry in advance if I am using the wrong mail. I need the Wikidata ontology in 
XML form. Can you please tell me from which link I can download 
it? Thanks in advance.



___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata

--
All the best,
Sebastian Hellmann

Director of Knowledge Integration and Linked Data Technologies (KILT) 
Competence Center

at the Institute for Applied Informatics (InfAI) at Leipzig University
Executive Director of the DBpedia Association
Projects: http://dbpedia.org, http://nlp2rdf.org, 
http://linguistics.okfn.org, https://www.w3.org/community/ld4lt 
<http://www.w3.org/community/ld4lt>

Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Scaling Wikidata Query Service

2019-06-17 Thread Sebastian Hellmann

Hi Amirouche,

On 16.06.19 23:01, Amirouche Boubekki wrote:
On Wed, Jun 12, 2019 at 19:27, Amirouche Boubekki 
<amirouche.boube...@gmail.com> wrote:


Hello Sebastian,

First thanks a lot for the reply. I started to believe that what I
was saying was complete nonsense.

On Wed, Jun 12, 2019 at 16:51, Sebastian Hellmann
<hellm...@informatik.uni-leipzig.de> wrote:

Hi Amirouche,

Any open data projects that are running open databases with
FoundationDB and WiredTiger? Where can I query them?

Thanks for asking. I will set up a wiredtiger instance of
wikidata. I need a few days, maybe a week (or two :)).

I could setup FoundationDB on a single machine instead but it will
require more time (maybe one more week).

Also, it will not support geo-queries. I will try to make
labelling work but with a custom syntax (inspired form SPARQL).


I figured that anything that is not SPARQL will not be convincing. 
Getting my engine 100% compatible is much work.


The example deployment I have given in the previous message should be 
enough to convince you that FoundationDB can store WDQS.


Don't get me wrong, I don't want you to set it up. I am asking about a 
reference project that has:


1. open data and an open database

2. decent amount of data

3. several years of running it.

Like OpenStreetMap and PostgreSQL, MediaWiki/Wikipedia -> MySQL, DBpedia 
-> Virtuoso.


This would be a very good point for it. Otherwise I would consider it a 
sales trap, i.e. some open source which does not really work until you 
switch to the commercial product; same for Neptune.


Now I think only Apple knows how to use it. Any other reference projects?


--
All the best,
Sebastian Hellmann

Director of Knowledge Integration and Linked Data Technologies (KILT) 
Competence Center

at the Institute for Applied Informatics (InfAI) at Leipzig University
Executive Director of the DBpedia Association
Projects: http://dbpedia.org, http://nlp2rdf.org, 
http://linguistics.okfn.org, https://www.w3.org/community/ld4lt 
<http://www.w3.org/community/ld4lt>

Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Scaling Wikidata Query Service

2019-06-12 Thread Sebastian Hellmann

Hi Amirouche,

On 12.06.19 14:07, Amirouche Boubekki wrote:
> So there needs to be some smarter solution, one that we'd be unlikely to 
develop in-house


Big cat, small fish. As wikidata continue to grow, it will have 
specific needs.

Needs that are unlikely to be solved by off-the-shelf solutions.



Are you suggesting we develop the database in-house? Even MediaWiki uses 
MySQL.





> but one that has already been verified by industry experience and 
other deployments.


FoundationDB and WiredTiger are respectively used at Apple (among 
other companies)
and MongoDB since 3.2 all over-the-world. WiredTiger is also used at 
Amazon.



Let's not talk about MongoDB; it is irrelevant here and opinions are very mixed. 
Some say it is THE solution for scalability, others have called it their biggest 
disappointment.


Do FoundationDB and WiredTiger have any track record of hosting open 
data projects or being chosen by open data projects? PostgreSQL and 
MySQL are widely used, e.g. by OpenStreetMap. Virtuoso is used by DBpedia, 
the LOD Cloud cache and UniProt.


I don't know FoundationDB or WiredTiger, but in the past there have often been 
open-source projects published by large corporations that worked well in-house, 
while the open-source variant did not. Apache UIMA was one such example. Maybe 
Blazegraph works much better if you move to Neptune; that could be a 
sales hook.


Any open data projects that are running open databases with FoundationDB 
and WiredTiger? Where can I query them?





> "Evaluation of Metadata Representations in RDF stores"

I don't understand how this is related to the scaling issues.


Not 100% pertinent, but do you have a better paper?


> [About proprietary version Virtuoso], I dare say [it must have] enormous advantage for us to 
consider running it in production.


That will be vendor lock-in for wikidata and wikimedia along all the 
poor souls that try to interop with it.


Actually UniProt and Kingsley suggested hosting the OS version. It sounded 
like this will hold for 5 more years, which is probably the average 
lifecycle. There is also SPARQL, which normally prevents vendor 
lock-in. Maybe you mean that nobody can rent 15 servers and install the 
same setup as WMF runs for Wikidata. That would be true. Switching always 
seems possible, though.



--
All the best,
Sebastian Hellmann

Director of Knowledge Integration and Linked Data Technologies (KILT) 
Competence Center

at the Institute for Applied Informatics (InfAI) at Leipzig University
Executive Director of the DBpedia Association
Projects: http://dbpedia.org, http://nlp2rdf.org, 
http://linguistics.okfn.org, https://www.w3.org/community/ld4lt 
<http://www.w3.org/community/ld4lt>

Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Scaling Wikidata Query Service

2019-06-10 Thread Sebastian Hellmann
Yes, I can ask. I am talking a lot with them as we are redeploying 
DBpedia live and also pushing the new DBpedia to them soon.


I think they also had a specific issue with how Wikidata does linked 
data, but I didn't get it, as it was mentioned only briefly.


All the best,

Sebastian


On 10.06.19 22:46, Stas Malyshev wrote:

Hi!


thanks for the elaboration. I can understand the background much better.
I have to admit, that I am also not a real expert, but very close to the
real experts like Vidal and Rahm who are co-authors of the SWJ paper or
the OpenLink devs.

If you know anybody at OpenLink who would be interested in trying to
evaluate such a thing (i.e. how Wikidata could be hosted on Virtuoso) and
provide support for this project, it would be interesting to discuss it.
While open-source thing is still a barrier and in general the
requirements are different, at least discussing it and maybe getting
some numbers might be useful.

Thanks,

--
All the best,
Sebastian Hellmann

Director of Knowledge Integration and Linked Data Technologies (KILT) 
Competence Center

at the Institute for Applied Informatics (InfAI) at Leipzig University
Executive Director of the DBpedia Association
Projects: http://dbpedia.org, http://nlp2rdf.org, 
http://linguistics.okfn.org, https://www.w3.org/community/ld4lt 
<http://www.w3.org/community/ld4lt>

Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Scaling Wikidata Query Service

2019-06-10 Thread Sebastian Hellmann
selves, so they know how to optimise. They
normally do large banks as customers with millions of write transactions
per hour. In LOD2 they also implemented column store features with
MonetDB and repartitioning in clusters.

I do not know the details of your usage scenario, so before we get into
comparisons, I'd like to understand:

1. Do your servers provide live synchronized updates with Wikidata or
DBPedia? How many updates per second that server can process?
2. How many queries per second this server is serving? What kind of
queries are those?

We did preliminary very limited evaluation of Virtuoso for hosting
Wikidata, and it looks like it can load and host the necessary data
(though it does not support some customizations we have now and we could
not evaluate whether such customizations are possible) but it would
require significant time investment to port all the functionality to it.
Unfortunately, the lack of resources did not allow us to do fuller
evaluation.

Also, as I understand, "professional" capabilities of Virtuoso are
closed-source and require paid license, which probably would be a
problem to run it on WMF infrastructure unless we reach some kind of
special arrangement. Since this arrangement will probably not include
open-sourcing the enterprise part of Virtuoso, it should deliver a very
significant, I dare say enormous advantage for us to consider running it
in production. It may be possible that just OS version is also clearly
superior to the point that it is worth migrating, but this needs to be
established by evaluation.


- I recently heard a presentation from Arango-DB and they had a good
cluster concept as well, although I don't know anybody who tried it. The
slides seemed to make sense.

We considered ArangoDB in the past, and it turned out we couldn't use it
efficiently on the scales we need (could be our fault of course). They
also use their own proprietary language for querying, which might be
worth it if they deliver us a clear win on all other aspects, but that
does not seem to be the case.
Also, ArangoDB seems to be a document database inside. This is not what
our current data model is. While it is possible to model Wikidata in
this way, again, changing the data model from RDF/SPARQL to a different
one is an enormous shift, which can only be justified by an equally
enormous improvement in some other areas, which currently is not clear.
This project seems to be still very young. While I would be very
interested if somebody took on themselves to model Wikidata in terms of
ArangoDB documents, load the whole data and see what the resulting
performance would be, I am not sure it would be wise for us to invest
our team's - very limited currently - resources into that.

Thanks,

--
All the best,
Sebastian Hellmann

Director of Knowledge Integration and Linked Data Technologies (KILT) 
Competence Center

at the Institute for Applied Informatics (InfAI) at Leipzig University
Executive Director of the DBpedia Association
Projects: http://dbpedia.org, http://nlp2rdf.org, 
http://linguistics.okfn.org, https://www.w3.org/community/ld4lt 
<http://www.w3.org/community/ld4lt>

Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Scaling Wikidata Query Service

2019-06-10 Thread Sebastian Hellmann

Hi Guillaume,

On 10.06.19 16:54, Guillaume Lederrey wrote:

Hello!

On Mon, Jun 10, 2019 at 4:28 PM Sebastian Hellmann
 wrote:

Hi Guillaume,

On 06.06.19 21:32, Guillaume Lederrey wrote:

Hello all!

There have been a number of concerns raised about the performance and
scaling of Wikidata Query Service. We share those concerns and we are
doing our best to address them. Here is some info about what is going
on:

In an ideal world, WDQS should:

* scale in terms of data size
* scale in terms of number of edits
* have low update latency
* expose a SPARQL endpoint for queries
* allow anyone to run any queries on the public WDQS endpoint
* provide great query performance
* provide a high level of availability

Scaling graph databases is a "known hard problem", and we are reaching
a scale where there are no obvious easy solutions to address all the
above constraints. At this point, just "throwing hardware at the
problem" is not an option anymore. We need to go deeper into the
details and potentially make major changes to the current architecture.
Some scaling considerations are discussed in [1]. This is going to take
time.

I am not sure how to evaluate this correctly. Scaling databases in general is a "known hard 
problem" and graph databases a sub-field of it, which are optimized for graph-like queries as 
opposed to column stores or relational databases. If you say that "throwing hardware at the 
problem" does not help, you are admitting that Blazegraph does not scale for what is needed by 
Wikidata.

Yes, I am admitting that Blazegraph (at least in the way we are using
it at the moment) does not scale to our future needs. Blazegraph does
have support for sharding (what they call "Scale Out"). And yes, we
need to have a closer look at how that works. I'm not the expert here,
so I won't even try to assert if that's a viable solution or not.


Yes, sharding is what you need, I think, instead of replication. This is 
the technique where data is repartitioned into more manageable chunks 
across servers.


Here is a good explanation of it:

http://vos.openlinksw.com/owiki/wiki/VOS/VOSArticleWebScaleRDF

http://docs.openlinksw.com/virtuoso/ch-clusterprogramming/


Sharding, scale-out or repartitioning is a classic enterprise feature 
for open-source databases. I am rather surprised that Blazegraph is fully 
GPL without an enterprise edition. But then it really sounded like 
their goal as a company was to be bought by a bigger fish, in this case 
Amazon Web Services. What is their deal now? Are they offering support?


So if you go open source, I think you will have a hard time finding good 
free databases with sharding/repartitioning. FoundationDB, as proposed in the 
grant [1], is from Apple.


[1] https://meta.wikimedia.org/wiki/Grants:Project/WDQS_On_FoundationDB


I mean: try the sharding feature. At some point, though, it might be worth 
considering going enterprise. Corporate open source often has a twist.


Just a note here: Virtuoso is also a full RDBMS, so you could probably 
keep the Wikibase DB in the same cluster and fix the asynchronicity. That is 
also true for any mapper like Sparqlify: 
http://aksw.org/Projects/Sparqlify.html. However, these only shift the 
problem: then you need a sharded/repartitioned relational database.



All the best,

Sebastian





 From [1]:

At the moment, each WDQS cluster is a group of independent servers, sharing 
nothing, with each server independently updated and each server holding a full 
data set.

Then it is not a "cluster" in the sense of databases. It is more a redundancy 
architecture like RAID 1. Is this really how BlazeGraph does it? Don't they have a proper 
cluster solution, where they repartition data across servers? Or is this independent 
servers a wikimedia staff homebuild?

It all depends on your definition of a cluster. We have groups of
machine collectively serving some coherent traffic, but each machine
is completely independent from others. So yes, the comparison to RAID1
is adequate.


Some info here:

- We evaluated some stores according to their performance: 
http://www.semantic-web-journal.net/content/evaluation-metadata-representations-rdf-stores-0
  "Evaluation of Metadata Representations in RDF stores"

Thanks for the link! That looks quite interesting!


- Virtuoso has proven quite useful. I don't want to advertise here, but the 
thing they have going for DBpedia uses ridiculous hardware, i.e. 64GB RAM and 
it is also the OS version, not the professional with clustering and repartition 
capability. So we are playing the game since ten years now: Everybody tries 
other databases, but then most people come back to virtuoso. I have to admit 
that OpenLink is maintaining the hosting for DBpedia themselves, so they know 
how to optimise. They normally do large banks as customers with millions of 
write transactions per hour. In LOD2 they also implemented column store 
features with MonetDB and re

Re: [Wikidata] Scaling Wikidata Query Service

2019-06-10 Thread Sebastian Hellmann

Hi Guillaume,

On 06.06.19 21:32, Guillaume Lederrey wrote:

Hello all!

There have been a number of concerns raised about the performance and
scaling of Wikidata Query Service. We share those concerns and we are
doing our best to address them. Here is some info about what is going
on:

In an ideal world, WDQS should:

* scale in terms of data size
* scale in terms of number of edits
* have low update latency
* expose a SPARQL endpoint for queries
* allow anyone to run any queries on the public WDQS endpoint
* provide great query performance
* provide a high level of availability

Scaling graph databases is a "known hard problem", and we are reaching
a scale where there are no obvious easy solutions to address all the
above constraints. At this point, just "throwing hardware at the
problem" is not an option anymore. We need to go deeper into the
details and potentially make major changes to the current architecture.
Some scaling considerations are discussed in [1]. This is going to take
time.


I am not sure how to evaluate this correctly. Scaling databases in 
general is a "known hard problem", and graph databases are a sub-field of it, 
optimized for graph-like queries as opposed to column stores 
or relational databases. If you say that "throwing hardware at the 
problem" does not help, you are admitting that Blazegraph does not scale 
to what Wikidata needs.


From [1]:

At the moment, each WDQS cluster is a group of independent servers, 
sharing nothing, with each server independently updated and each 
server holding a full data set.


Then it is not a "cluster" in the sense of databases. It is more a 
redundancy architecture like RAID 1. Is this really how Blazegraph does 
it? Don't they have a proper cluster solution, where they repartition 
data across servers? Or are these independent servers a Wikimedia staff 
home-build?


Some info here:

- We evaluated some stores according to their performance: 
http://www.semantic-web-journal.net/content/evaluation-metadata-representations-rdf-stores-0 
"Evaluation of Metadata Representations in RDF stores"


- Virtuoso has proven quite useful. I don't want to advertise here, but 
the instance they run for DBpedia uses ridiculously modest hardware, i.e. 
64 GB RAM, and it is also the OS version, not the professional one with 
clustering and repartitioning capability. We have been playing this game 
for ten years now: everybody tries other databases, but then most people 
come back to Virtuoso. I have to admit that OpenLink is maintaining the 
hosting for DBpedia themselves, so they know how to optimise. They 
normally have large banks as customers, with millions of write transactions 
per hour. In LOD2 they also implemented column-store features with 
MonetDB and repartitioning in clusters.


- I recently heard a presentation from ArangoDB and they had a good 
cluster concept as well, although I don't know anybody who has tried it. The 
slides seemed to make sense.


All the best,

Sebastian





Reasonably, addressing all of the above constraints is unlikely to
ever happen. Some of the constraints are non negotiable: if we can't
keep up with Wikidata in term of data size or number of edits, it does
not make sense to address query performance. On some constraints, we
will probably need to compromise.

For example, the update process is asynchronous. It is by nature
expected to lag. In the best case, this lag is measured in minutes,
but can climb to hours occasionally. This is a case of prioritizing
stability and correctness (ingesting all edits) over update latency.
And while we can work to reduce the maximum latency, this will still
be an asynchronous process and needs to be considered as such.

We currently have one Blazegraph expert working with us to address a
number of performance and stability issues. We
are planning to hire an additional engineer to help us support the
service in the long term. You can follow our current work in phabricator [2].

If anyone has experience with scaling large graph databases, please
reach out to us, we're always happy to share ideas!

Thanks all for your patience!

Guillaume

[1] https://wikitech.wikimedia.org/wiki/Wikidata_query_service/ScalingStrategy
[2] https://phabricator.wikimedia.org/project/view/1239/


--
All the best,
Sebastian Hellmann

Director of Knowledge Integration and Linked Data Technologies (KILT) 
Competence Center

at the Institute for Applied Informatics (InfAI) at Leipzig University
Executive Director of the DBpedia Association
Projects: http://dbpedia.org, http://nlp2rdf.org, 
http://linguistics.okfn.org, https://www.w3.org/community/ld4lt 
<http://www.w3.org/community/ld4lt>

Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Query performance

2019-06-03 Thread Sebastian Hellmann

Hi Gerard,

Query performance has never been this bad. Currently the lag is over 6 
HOURS.. and rising


My previous question stands.. What is the plan because we do not cope.


how about everyone hosting their own? This would provide some relief.

Below, I attached a query against https://databus.dbpedia.org/repo/sparql that 
retrieves the latest download URLs for the Wikidata-DBpedia extraction: 
https://databus.dbpedia.org/dbpedia/wikidata


Here is the yasgui link: https://tinyurl.com/yy768vh3

We have a Virtuoso Docker image that takes the query, downloads the 
files and fills a local SPARQL endpoint:



 1. Download the Dockerfile:
    https://github.com/dbpedia/dev.dbpedia.org/raw/master/pics/Dockerfile.dockerfile

 2. Build: docker build -t databus-dump-triplestore .

 3. Load any Databus ?file query:
    docker run -p 8890:8890 databus-dump-triplestore $(cat file-with-query.sparql)

Doing it this way would ease some load; the Docker image updates each week 
and can be run as a cron job.


Note that this is for the Wikidata-DBpedia extraction: 
http://svn.aksw.org/papers/2015/ISWC_Wikidata2DBpedia/public.pdf


Databus is an open platform, so as soon as Wikidata/WMF or somebody else 
publishes the original Wikidata dumps there, you can use the Docker image to 
decentralise hosting.



All the best,

Sebastian

QUERY:

PREFIX dataid: 
PREFIX dataid-cv: 
PREFIX dct: 
PREFIX dcat:  

# Get all files
SELECT DISTINCT ?file WHERE {
 ?dataset dataid:artifact ?artifact .
    FILTER (?artifact in (
,
,
,
,
,
,
,
,
,
,
,
,
,
,

    ) ).
    ?dataset dcat:distribution ?distribution .
    ?dataset dct:hasVersion ?latestVersion .
    {
        SELECT (max(?version) as ?latestVersion) WHERE {
            ?dataset dataid:artifact ?artifact .
    FILTER (?artifact in (
,
,
,
,
,
,
,
,
,
,
,
,
,
,

   ) ).
            ?dataset dct:hasVersion ?version .
        }
    }
    ?distribution dcat:downloadURL ?file .

}
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Are we ready for our future

2019-05-06 Thread Sebastian Hellmann

Hi all,

I would like to throw in a slightly different angle here. The 
GlobalFactSync Project 
https://meta.wikimedia.org/wiki/Grants:Project/DBpedia/GlobalFactSyncRE 
will start in June.


As a preparation we wrote this paper describing the engine behind it: 
https://svn.aksw.org/papers/2019/ISWC_FlexiFusion/public.pdf


There have already been very constructive comments at 
https://meta.wikimedia.org/wiki/Grants_talk:Project/DBpedia/GlobalFactSyncRE#Interfacing_with_Wikidata's_data_quality_issues_in_certain_areas 
which led us to focus on syncing music (bands, singles, albums) as 1 of 
the 10 sync targets. Other proposals for domains are very welcome.


The rationale behind GlobalFactSync is this:

Managing data quality follows the Pareto principle, i.e. the first 80% is easy 
to achieve and each percent after that gets much more expensive, 
following the law of diminishing returns. As a consequence for Wikidata: 
WD is probably at 80% now, so maintaining it gets harder because you 
need to micro-optimize to find the remaining errors and fill in missing 
information. This is compounded by growing Wikidata further in terms 
of entities.


GlobalFactSync does not overcome this Pareto effect, but it cheats it: we 
hope that it will pool the manpower of Wikipedia editors and 
Wikidata editors and also mobilize DBpedia users to edit either in WP or 
WD.


In general, Wikimedia runs the 6th largest website in the world. They 
are in the same league as Google or Facebook, and I have absolutely no 
doubt that they have ample expertise in tackling the scalability of hosting, 
e.g. by doubling the number of servers or web caching. The problem I see 
is that you cannot easily double the editor manpower or bot edits. 
Hence the GlobalFactSync grant.


We will send out an announcement in a week or two. Feel free to suggest 
sync targets. We are still looking into the complexity of managing 
references, as this is bread and butter for the project.


All the best,

Sebastian



On 05.05.19 18:07, Yaroslav Blanter wrote:
Indeed, these collaborations in high-energy physics are not static 
quantities; they change essentially every day (people get hired 
and have their contracts expire), and most likely every two papers have 
a slightly different author list.


Cheers
Yaroslav

On Sun, May 5, 2019 at 5:58 PM Darren Cook <mailto:dar...@dcook.org>> wrote:


> We may also want to consider if Wikidata is actually the best
store for
> all kinds of data. Let's consider example:
>
> https://www.wikidata.org/w/index.php?title=Q57009452
>
> This is an entity that is almost 2M in size, almost 3000
statements ...

A paper with 2884 authors! arxiv.org <http://arxiv.org> deals with
it by calling them the
"Atlas Collaboration": https://arxiv.org/abs/1403.0489
The actual paper does the same (with the full list of names and
affiliations in the Appendix).

The nice thing about graph databases is we should be able to set
author
to point to an "Atlas Collaboration" node, and then have that node
point
to the 2884 individual author nodes (and each of those nodes point to
their affiliation).

What are the reasons to not re-organize it that way?

My first thought was that who is in the collaboration changes over
time?
But does it change day to day, or only change each academic year?

Either way, maybe we need to point the author field to something like
"Atlas Collaboration 2014a", and clone-and-modify that node each
time we
come to a paper that describes a different membership?

Or is it better to do each persons membership of such a group with a
start and end date?

(BTW, arxiv.org <http://arxiv.org> tells me there are 1059 results
for ATLAS Collaboration;
don't know if one "result" corresponds to one "paper", though.)

> While I am not against storing this as such, I do wonder if it's
> sustainable to keep such kind of data together with other
Wikidata data
> in a single database.

It feels like it belongs in "core" Wikidata. Being able to ask "which
papers has this researcher written?" seems like a good example of a
Wikidata query. Similarly,  "which papers have The ATLAS
Collaboration"
worked on?"
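
For illustration, a sketch of such a query against 
https://query.wikidata.org/sparql; the collaboration item is found via its 
English label here (an assumption, since no Q-id is given), and whether it 
returns anything depends on how the papers are actually modelled:

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Scholarly articles (wd:Q13442814) whose author (wdt:P50) is an item
# labelled "ATLAS Collaboration" in English.
SELECT ?paper WHERE {
  ?collab rdfs:label "ATLAS Collaboration"@en .
  ?paper wdt:P50 ?collab ;
         wdt:P31 wd:Q13442814 .
}
LIMIT 100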

But, also, are queries like "Which authors of Physics papers went to a
high school that had more than 1000 students?" part of the goal of
Wikidata? If so, Wikidata needs optimizing in such a way that
makes such
queries both possible and tractable.

Darren

___
Wikidata mailing list
Wikidata@lists.wikimedia.org <mailto:Wikidata@lists.wikimedia.org>
https://lists.wikimedia.org/mailman/listinfo/wikidata



Re: [Wikidata] [Wikipedia-l] Fwd: [Wikimedia-l] Wikipedia in an abstract language

2019-01-15 Thread Sebastian Hellmann
h quality prose.
>
> On Mon, Jan 14, 2019 at 8:09 AM Felipe Schenone
mailto:scheno...@gmail.com>> wrote:
> >
> > This is quite an awesome idea. But thinking about it, wouldn't
it be possible to use structured data in wikidata to generate
articles? Can't we skip the need of learning an abstract language
by using wikidata?
> >
> > Also, is there discussion about this idea anywhere in the
Wikimedia wikis? I haven't found any...
> >
> > On Sat, Sep 29, 2018 at 3:44 PM Pine W mailto:wiki.p...@gmail.com>> wrote:
> >>
> >> Forwarding because this (ambitious!) proposal may be of
interest to people
> >> on other lists. I'm not endorsing the proposal at this time,
but I'm
> >> curious about it.
> >>
> >> Pine
> >> ( https://meta.wikimedia.org/wiki/User:Pine )
> >>
> >>
> >> -- Forwarded message -
> >> From: Denny Vrandečić mailto:vrande...@gmail.com>>
> >> Date: Sat, Sep 29, 2018 at 6:32 PM
> >> Subject: [Wikimedia-l] Wikipedia in an abstract language
> >> To: Wikimedia Mailing List mailto:wikimedi...@lists.wikimedia.org>>
> >>
> >>
> >> Semantic Web languages allow to express ontologies and
knowledge bases in a
> >> way meant to be particularly amenable to the Web. Ontologies
formalize the
> >> shared understanding of a domain. But the most expressive and
widespread
> >> languages that we know of are human natural languages, and
the largest
> >> knowledge base we have is the wealth of text written in human
languages.
> >>
> >> We look for a path to bridge the gap between knowledge
representation
> >> languages such as OWL and human natural languages such as
English. We
> >> propose a project to simultaneously expose that gap, allow to
collaborate
> >> on closing it, make progress widely visible, and is highly
attractive and
> >> valuable in its own right: a Wikipedia written in an abstract
language to
> >> be rendered into any natural language on request. This would
make current
> >> Wikipedia editors about 100x more productive, and increase
the content of
> >> Wikipedia by 10x. For billions of users this will unlock
knowledge they
> >> currently do not have access to.
> >>
> >> My first talk on this topic will be on October 10, 2018,
16:45-17:00, at
> >> the Asilomar in Monterey, CA during the Blue Sky track of
ISWC. My second,
> >> longer talk on the topic will be at the DL workshop in Tempe,
AZ, October
> >> 27-29. Comments are very welcome as I prepare the slides and
the talk.
> >>
> >> Link to the paper:
http://simia.net/download/abstractwikipedia.pdf
> >>
> >> Cheers,
> >> Denny
> >> ___
> >> Wikimedia-l mailing list, guidelines at:
> >> https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and
> >> https://meta.wikimedia.org/wiki/Wikimedia-l
> >> New messages to: wikimedi...@lists.wikimedia.org
<mailto:wikimedi...@lists.wikimedia.org>
> >> Unsubscribe:
https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
> >> <mailto:wikimedia-l-requ...@lists.wikimedia.org
<mailto:wikimedia-l-requ...@lists.wikimedia.org>?subject=unsubscribe>
> >> ___
> >> Wikipedia-l mailing list
> >> wikipedi...@lists.wikimedia.org
<mailto:wikipedi...@lists.wikimedia.org>
> >> https://lists.wikimedia.org/mailman/listinfo/wikipedia-l
> >
> > ___
> > Wikidata mailing list
> > Wikidata@lists.wikimedia.org <mailto:Wikidata@lists.wikimedia.org>
> > https://lists.wikimedia.org/mailman/listinfo/wikidata

___
Wikidata mailing list
Wikidata@lists.wikimedia.org <mailto:Wikidata@lists.wikimedia.org>
https://lists.wikimedia.org/mailman/listinfo/wikidata


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata

--
All the best,
Sebastian Hellmann

Director of Knowledge Integration and Linked Data Technologies (KILT) 
Competence Center

at the Institute for Applied Informatics (InfAI) at Leipzig University
Executive Director of the DBpedia Association
Projects: http://dbpedia.org, http://nlp2rdf.org, 
http://linguistics.okfn.org, https://www.w3.org/community/ld4lt 
<http://www.w3.org/community/ld4lt>

Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] Wikidata Adoption in Wikipedia (584 million facts, HarvestTemplates process granularity)

2018-12-06 Thread Sebastian Hellmann

Hi all,

as we wrote in a previous email, we are currently applying for a 
MediaWiki grant to use the DBpedia extraction software to synchronize 
infoboxes between Wikipedias as well as between Wikipedia and Wikidata.


During the discussion on the talk page 
(https://meta.wikimedia.org/wiki/Grants_talk:Project/DBpedia/GlobalFactSyncRE) 
the concern was raised that we at DBpedia have too much of a bird's-eye 
view on things, and it is true: we are used to bulk extracting 
and working with a lot of data instead of individual records.


The main problem here is that for us the prototype first of all shows 
that we have the data, which we could exploit in several ways; for 
Wikipedians and Wikidata users, however, the process of using it is the main 
focus. We assumed that for Wikipedians an article-centric view would be 
best, i.e. you can directly compare one article's infobox with all 
other articles and Wikidata. However, for Wikidata the 
article/entity-centric view does not seem practical, and we would like 
feedback on this. The options for GlobalFactSync are:


1. entity-centric view as it is now: the same infobox across all Wikipedias
   and Wikidata for one article/entity
2. template-centric (this one will not work, as there are no equivalent
   infoboxes across Wikipedias, or only very few)
3. template-parameter-centric: this is the current focus of Harvest
   Templates, i.e. one parameter in one template in one language
   https://tools.wmflabs.org/pltools/harvesttemplates/
 * Note that one improvement DBpedia could make here is the
   mappings we have from template parameter to DBpedia to Wikidata
 * Another is that we can save the logs and persist the mappings
   entered by users to do a continuous sync; at the moment it is a
   one-time import
4. multilingual-template-parameter-centric or Wikidata-property-centric,
   i.e. one parameter / one Wikidata property across multiple templates
   across multiple languages. This is supercharging HarvestTemplates,
   but since it is a power tool for syncing, it gets more complex and
   keeping an overview is difficult.

All feedback is welcome; we also created a topic here: 
https://meta.wikimedia.org/wiki/Grants_talk:Project/DBpedia/GlobalFactSyncRE#Focus_of_Tool_for_Wikidata


# Motivation, Wikidata adoption report

One goal of Wikidata is to support the infoboxes. We are doing monthly 
releases now at DBpedia and are able to provide statistics about 
Wikidata adoption or missing adoption in Wikipedia:


https://docs.google.com/spreadsheets/d/1_aNjgExJW_b0MvDSQs5iSXHYlwnZ8nU2zrQMxZ5edrQ/edit#gid=0

In total, 584 million facts are still maintained in Wikipedia infoboxes without 
using Wikidata. Where a fact is already in Wikidata, this means that the 
same fact is maintained in two or more places, multiplying 
maintenance work (unless the fact is static).
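
As a rough sketch of how such numbers can be explored further (assuming one 
of the infobox-properties files below is loaded into a local SPARQL 
endpoint, e.g. the Virtuoso Docker image from an earlier mail), the 
extracted infobox facts can be counted per property:

# Rank the raw infobox properties by how many facts use them;
# COUNT(*) without the grouping gives the per-language total.
SELECT ?property (COUNT(*) AS ?facts) WHERE {
  ?subject ?property ?object .
}
GROUP BY ?property
ORDER BY DESC(?facts)
LIMIT 20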


Code used to extract: 
https://github.com/dbpedia/extraction-framework/blob/master/core/src/main/scala/org/dbpedia/extraction/mappings/InfoboxExtractor.scala


Data: 
http://downloads.dbpedia.org/repo/dev/generic-spark/infobox-properties/2018.11.01/


Stat generation:

echo -n "" > res.csv
for i in `ls *.bz2` ; do
  # write the language/file key (file name without prefix and suffix) plus a tab
  echo -n $i | sed 's/infobox-properties-2018.11.01_//;s/.ttl.bz2/\t/' >> res.csv
  # append the number of triples in that file
  lbzip2 -dc $i | wc -l >> res.csv
done



--
All the best,
Sebastian Hellmann

Director of Knowledge Integration and Linked Data Technologies (KILT) 
Competence Center

at the Institute for Applied Informatics (InfAI) at Leipzig University
Executive Director of the DBpedia Association
Projects: http://dbpedia.org, http://nlp2rdf.org, 
http://linguistics.okfn.org, https://www.w3.org/community/ld4lt 
<http://www.w3.org/community/ld4lt>

Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] Wikidata Logo on DBpedia HTML Website

2018-10-29 Thread Sebastian Hellmann

Hi all,

we are planning to include the Wikidata Logo on the new DBpedia hosting, 
i.e.


 * Old page: http://dbpedia.org/page/Siemens
 * New page prototype for January 2019:
   http://kurzum.net/dbpedia_banner/Siemens.html

The link is taken from the owl:sameAs relation from DBpedia to Wikidata, and we 
will add the text "Edit $label on Wikidata".
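
For illustration, a minimal sketch of how the link and the label could be 
looked up on the public endpoint https://dbpedia.org/sparql, using the 
Siemens page from above (the actual lookup we use may differ):

PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Fetch the linked Wikidata item and the English label for one DBpedia resource.
SELECT ?wikidata ?label WHERE {
  <http://dbpedia.org/resource/Siemens> owl:sameAs ?wikidata ;
                                        rdfs:label ?label .
  FILTER(STRSTARTS(STR(?wikidata), "http://www.wikidata.org/entity/"))
  FILTER(LANG(?label) = "en")
}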



If nobody objects, we will just go ahead and use the logo to link to 
Wikidata. In my opinion only good things can come from that, and it will 
help bring new editors to Wikidata.


In case you have any concerns, or we need to go through a logo clearance 
process, please tell me.

Also cc'ing Lydia: if there are organisational issues please PM me.


--
All the best,
Sebastian Hellmann

Director of Knowledge Integration and Linked Data Technologies (KILT) 
Competence Center

at the Institute for Applied Informatics (InfAI) at Leipzig University
Executive Director of the DBpedia Association
Projects: http://dbpedia.org, http://nlp2rdf.org, 
http://linguistics.okfn.org, https://www.w3.org/community/ld4lt 
<http://www.w3.org/community/ld4lt>

Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] DBpedia Databus (alpha version)

2018-05-18 Thread Sebastian Hellmann
But the goal is to load and mix public and/or closed data. For 
organisations this is the normal process: you discuss the requirements, 
then you do a demo with their data, so they can see the benefits directly.


So we made the demo for the DNB (https://data.dnb.de/opendata/) and the 
KB (no download, they sent a link via email).



I suggested your FOAF file because you might not have a dataset of your own 
(maybe from some research project?).


I uploaded my FOAF file, but I accidentally added wrong links; otherwise 
it would have fused my DBpedia entry with my Wikidata and DNB entries.


However, I am unable to load toy examples; it only works on real data ;) 
What would be the point? Even I don't know whether this is you: 
https://www.wikidata.org/wiki/Q21264248


All the best,

Sebastian




On 18.05.2018 10:39, Laura Morales wrote:

You need my data to show me a demo? I don't understand... it doesn't make 
sense... Don't you think that people would rather not bother with your demo at 
all, instead of giving their data to you? You should have a public demo with a 
demo foaf as well, but anyway if you need my foaf file then you can use this:


 @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
 @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
 @prefix foaf: <http://xmlns.com/foaf/0.1/> .
 
 <http://example.org/LM>

 a foaf:Person ;
 foaf:name "Laura" ;
 foaf:mbox <mailto:la...@example.org> ;
 foaf:homepage <http://example.org/LM> ;
 foaf:nick "Laura" .

  

  


Sent: Friday, May 18, 2018 at 12:04 AM
From: "Sebastian Hellmann" <hellm...@informatik.uni-leipzig.de>
To: "Discussion list for the Wikidata project" <wikidata@lists.wikimedia.org>, "Laura 
Morales" <laure...@mail.com>
Subject: Re: [Wikidata] DBpedia Databus (alpha version)

Hi Laura,
to see a small demo, we would need your data, either your foaf profile or other 
data, ideally publicly downloadable. Automatic upload is currently being 
implemented, but I can load it manually or you can wait.
At the moment you can see:
http://88.99.242.78:9009/?s=http%3A%2F%2Fid.dbpedia.org%2Fglobal%2F4o4XK=http%3A%2F%2Fdbpedia.org%2Fontology%2FdeathDate=dnb.de
a data entry where en wikipedia and wikidata have more granular data than the 
dutch and german national library
http://88.99.242.78:9009/?s=http%3A%2F%2Fid.dbpedia.org%2Fglobal%2Fe6R5=http%3A%2F%2Fdbpedia.org%2Fontology%2FdeathDate=dnb.de
(DNB value could actually be imported, although I am not sure if there is a 
difference, between a source and a reference, i.e. DNB has this statement, but 
they don't have a reference themselves)
a data entry where the german national library has the best value.
We also made an infobox mockup for the Eiffel Tower for our grant proposal with 
a sync button next to the Infobox property:
https://meta.wikimedia.org/wiki/Grants_talk:Project/DBpedia/GlobalFactSync#Prototype_with_more_focus
All the best,
Sebastian
  
On 15.05.2018 06:35, Laura Morales wrote:

I was more expecting technical questions here, but it seems there is interest
in how the economics work. However, this part is not easy to write for me.
I'd personally like to test a demo of the Databus. I'd also like to see a 
complete list of all the graphs that are available.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata

  
--

All the best,
Sebastian Hellmann

Director of Knowledge Integration and Linked Data Technologies (KILT) 
Competence Center
at the Institute for Applied Informatics (InfAI) at Leipzig University
Executive Director of the DBpedia Association
Projects: http://dbpedia.org, http://nlp2rdf.org, 
http://linguistics.okfn.org, https://www.w3.org/community/ld4lt
Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org



--
All the best,
Sebastian Hellmann

Director of Knowledge Integration and Linked Data Technologies (KILT) 
Competence Center

at the Institute for Applied Informatics (InfAI) at Leipzig University
Executive Director of the DBpedia Association
Projects: http://dbpedia.org, http://nlp2rdf.org, 
http://linguistics.okfn.org, https://www.w3.org/community/ld4lt 
<http://www.w3.org/community/ld4lt>

Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] [Wikimedia-l] Solve legal uncertainty of Wikidata

2018-05-18 Thread Sebastian Hellmann
 that you might be
> interested
> > > to look at and participate in.
> > >
> > > As Denny suggested in the ticket to give it more visibility
through the
> > > discussion on the Wikidata chat
> > > <
> > > https://www.wikidata.org/wiki/Wikidata:Project_chat#
> > Importing_datasets_under_incompatible_licenses>,
> > >
> > > I thought it was interesting to highlight it a bit more.
> > >
> > > Cheers
> > >
> > > ___
> > > Wikimedia-l mailing list, guidelines at:
> > > https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and
> > > https://meta.wikimedia.org/wiki/Wikimedia-l
> > > New messages to: wikimedi...@lists.wikimedia.org
<mailto:wikimedi...@lists.wikimedia.org>
> > > Unsubscribe:
https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
> > > <mailto:wikimedia-l-requ...@lists.wikimedia.org
<mailto:wikimedia-l-requ...@lists.wikimedia.org>?subject=unsubscribe>
> > ___
> > Wikimedia-l mailing list, guidelines at:
https://meta.wikimedia.org/
> > wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/
> > wiki/Wikimedia-l
> > New messages to: wikimedi...@lists.wikimedia.org
<mailto:wikimedi...@lists.wikimedia.org>
> > Unsubscribe:
https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
> > <mailto:wikimedia-l-requ...@lists.wikimedia.org
<mailto:wikimedia-l-requ...@lists.wikimedia.org>?subject=unsubscribe>
> ___
> Wikimedia-l mailing list, guidelines at:
> https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and
> https://meta.wikimedia.org/wiki/Wikimedia-l
> New messages to: wikimedi...@lists.wikimedia.org
<mailto:wikimedi...@lists.wikimedia.org>
> Unsubscribe:
https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
> <mailto:wikimedia-l-requ...@lists.wikimedia.org
<mailto:wikimedia-l-requ...@lists.wikimedia.org>?subject=unsubscribe>
___
Wikimedia-l mailing list, guidelines at:
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and
https://meta.wikimedia.org/wiki/Wikimedia-l
New messages to: wikimedi...@lists.wikimedia.org
<mailto:wikimedi...@lists.wikimedia.org>
Unsubscribe:
https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
<mailto:wikimedia-l-requ...@lists.wikimedia.org
<mailto:wikimedia-l-requ...@lists.wikimedia.org>?subject=unsubscribe>



___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


--
All the best,
Sebastian Hellmann

Director of Knowledge Integration and Linked Data Technologies (KILT) 
Competence Center

at the Institute for Applied Informatics (InfAI) at Leipzig University
Executive Director of the DBpedia Association
Projects: http://dbpedia.org, http://nlp2rdf.org, 
http://linguistics.okfn.org, https://www.w3.org/community/ld4lt 
<http://www.w3.org/community/ld4lt>

Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] DBpedia Databus (alpha version)

2018-05-17 Thread Sebastian Hellmann

Hi Laura,

to see a small demo, we would need your data, either your foaf profile 
or other data, ideally publicly downloadable. Automatic upload is 
currently being implemented, but I can load it manually or you can wait.


At the moment you can see:

http://88.99.242.78:9009/?s=http%3A%2F%2Fid.dbpedia.org%2Fglobal%2F4o4XK=http%3A%2F%2Fdbpedia.org%2Fontology%2FdeathDate=dnb.de

a data entry where English Wikipedia and Wikidata have more granular data 
than the Dutch and German national libraries


http://88.99.242.78:9009/?s=http%3A%2F%2Fid.dbpedia.org%2Fglobal%2Fe6R5=http%3A%2F%2Fdbpedia.org%2Fontology%2FdeathDate=dnb.de

(The DNB value could actually be imported, although I am not sure if there 
is a difference between a source and a reference, i.e. DNB has this 
statement, but they don't have a reference for it themselves.)


a data entry where the German national library has the best value.
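
For illustration, a minimal federated SPARQL sketch of this kind of 
comparison, assuming the public endpoints https://dbpedia.org/sparql and 
https://query.wikidata.org/sparql and that the DBpedia endpoint allows 
SERVICE calls; Alan Turing is only a stand-in example:

PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

# Compare the death date stored in DBpedia with the one stored in Wikidata
# for a single resource; run against https://dbpedia.org/sparql.
SELECT ?dbpediaDeathDate ?wikidataDeathDate WHERE {
  <http://dbpedia.org/resource/Alan_Turing> dbo:deathDate ?dbpediaDeathDate ;
                                            owl:sameAs ?wd .
  FILTER(STRSTARTS(STR(?wd), "http://www.wikidata.org/entity/"))
  SERVICE <https://query.wikidata.org/sparql> {
    ?wd wdt:P570 ?wikidataDeathDate .   # P570 = date of death
  }
}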

We also made an infobox mockup for the Eiffel Tower for our grant 
proposal with a sync button next to the Infobox property:


https://meta.wikimedia.org/wiki/Grants_talk:Project/DBpedia/GlobalFactSync#Prototype_with_more_focus

All the best,

Sebastian


On 15.05.2018 06:35, Laura Morales wrote:

I was more expecting technical questions here, but it seems there is interest
in how the economics work. However, this part is not easy to write for me.


I'd personally like to test a demo of the Databus. I'd also like to see a 
complete list of all the graphs that are available.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata



--
All the best,
Sebastian Hellmann

Director of Knowledge Integration and Linked Data Technologies (KILT) 
Competence Center

at the Institute for Applied Informatics (InfAI) at Leipzig University
Executive Director of the DBpedia Association
Projects: http://dbpedia.org, http://nlp2rdf.org, 
http://linguistics.okfn.org, https://www.w3.org/community/ld4lt 
<http://www.w3.org/community/ld4lt>

Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] How to find the Dbpedia data for a Wikidata item?

2018-05-17 Thread Sebastian Hellmann

Hi PWN,


On 23.04.2018 06:41, PWN wrote:

If one knows the Q code (or URI) for an entity on Wikidata, how can one find 
the Dbpedia Id and the information linked to it?
Thank you.


sorry to come to this late. You can find the new dataset here; it clusters 
all DBpedia URIs and the Wikidata IDs around the new DBpedia global 
identifiers:


http://downloads.dbpedia.org/databus/global/persistence-core/cluster-iri-provenance-ntriples/

There is also a version with some external datasets:
http://downloads.dbpedia.org/databus/global/persistence-all/cluster-iri-provenance-ntriples/

Both are not complete yet, but will soon be.
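
For a single item, a quick sketch against the public endpoint 
https://dbpedia.org/sparql also works, using the owl:sameAs links (Q42, 
Douglas Adams, is just an example):

PREFIX owl: <http://www.w3.org/2002/07/owl#>

# Find the DBpedia resource(s) linked to a known Wikidata item.
SELECT ?dbpediaResource WHERE {
  ?dbpediaResource owl:sameAs <http://www.wikidata.org/entity/Q42> .
}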

All the best,
Sebastian




Sent from my iPad
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata



--
All the best,
Sebastian Hellmann

Director of Knowledge Integration and Linked Data Technologies (KILT) 
Competence Center

at the Institute for Applied Informatics (InfAI) at Leipzig University
Executive Director of the DBpedia Association
Projects: http://dbpedia.org, http://nlp2rdf.org, 
http://linguistics.okfn.org, https://www.w3.org/community/ld4lt 
<http://www.w3.org/community/ld4lt>

Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] DBpedia Databus (alpha version)

2018-05-17 Thread Sebastian Hellmann

Hi GerardM and Nemo,

what Nemo said is kind of fair, because the comparison to pixie dust 
holds.


We are trying to decentralize a lot, which makes everything seem very 
vague. Probably the same with Tim Berners-Lee's new project: 
https://solid.mit.edu/


At first glance they offer the same features as Facebook and Twitter, 
which makes it hard to believe that they will be successful; the trick 
here is to provide the right incentives and usefulness, which creates 
the network effect.


The main problem I see is that data quality follows a 
Pareto distribution: the more data you have and the better the quality, 
the harder it gets to improve further. Test-driven validation only makes 
it more efficient, but does not beat the Pareto distribution. Networked 
data can help here to enable reuse and kind of cheat Pareto, but not 
beat it. If you crack the incentives/network issue, it is pixie dust that 
makes the thing fly.



Working with data is hard and repetitive. We envision a hub, where
everybody can upload data and then useful operations like
versioning, cleaning, transformation, mapping, linking, merging,
hosting is done


Sounds like Wikidata!
@Nemo: In my experience you can't really upload data to Wikidata. It 
comes with a lot of barriers. In the beginning, I understood the 
argument that you couldn't load DBpedia or Freebase since there were no 
references.
Now I have seen stats that half the statements are not referenced anyhow and 
another third reference Wikipedia. 
https://docs.google.com/presentation/d/1XX-yzT98fglAfFkHoixOI1XC1uwrS6f0u1xjdZT9TYI/edit#slide=id.g21d2403a1a_0_50 
So in hindsight, Wikidata could easily have started out with DBpedia, 
would have had a much better start and would be much more developed.


DBpedia's properties are directly related to the infobox properties, as 
all data is extracted from there, and Wikidata aims to cover these as well, 
so it is a perfect match. So the Wikidata community spent a lot of time 
adding data that could simply have been uploaded right from the start, and 
could have focused on the references instead.


There is also Cunningham's Law (named after the inventor of wikis): "the best way to 
get the right answer on the internet is not to ask a question; it's to 
post the wrong answer." So the extraction errors would have been an 
incentive to fix them...


Now Wikidata is dealing with this: 
https://en.wikipedia.org/wiki/Wikipedia:Wikidata/2018_Infobox_RfC, and we 
are concerned that it will not fully reach its goal.
We are still very interested in collaborating on this and contributing where 
we can.


All the best,
Sebastian


On 15.05.2018 07:59, Gerard Meijssen wrote:

Hoi,
We do not provide useful operations like versioning, cleaning or 
transformation at Wikidata. We do not compare and we do not curate at 
Wikidata.


So when somewhere else they make it their priority and do a better job 
at it, rejoice, don't mock. The GREAT thing about DBpedia is that they 
are willing to collaborate.

Thanks,
         GerardM

On 15 May 2018 at 07:51, Federico Leva (Nemo) <nemow...@gmail.com 
<mailto:nemow...@gmail.com>> wrote:


Sebastian Hellmann, 08/05/2018 14:29:

Working with data is hard and repetitive. We envision a hub,
where everybody can upload data and then useful operations
like versioning, cleaning, transformation, mapping, linking,
merging, hosting is done


Sounds like Wikidata!

automagically


Except this. There is always some market for pixie dust.

Federico


___
Wikidata mailing list
Wikidata@lists.wikimedia.org <mailto:Wikidata@lists.wikimedia.org>
https://lists.wikimedia.org/mailman/listinfo/wikidata
<https://lists.wikimedia.org/mailman/listinfo/wikidata>




___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


--
All the best,
Sebastian Hellmann

Director of Knowledge Integration and Linked Data Technologies (KILT) 
Competence Center

at the Institute for Applied Informatics (InfAI) at Leipzig University
Executive Director of the DBpedia Association
Projects: http://dbpedia.org, http://nlp2rdf.org, 
http://linguistics.okfn.org, https://www.w3.org/community/ld4lt 
<http://www.w3.org/community/ld4lt>

Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] DBpedia Databus (alpha version)

2018-05-14 Thread Sebastian Hellmann

Hi Nicolas, Thad,


On 14.05.2018 22:42, Nicolas Torzec wrote:

Sebastian,

Is there a document/paper that summarizes the rationales, vision, and 
technical details behind DBpedia's Databus.  Also, since a few other 
companies already tried to recombine and publish structured data in a 
more principled way before, what is different here?




these are legitimate questions. If you mention other companies, I would want 
to know which ones exactly. I chatted with Jamie Taylor about Freebase 
and he agreed (or maybe just wanted to be nice...) that we are solving 
many aspects that were flawed there. Do you know of any other 
successful models? I would need to know specifics in order to compare.


We basically took the cream of the crop of what was developed over the 
last years and discussed models that would incentivize open data as well 
as bring in the money to maintain open data. The system here is 
fundamentally different from any approach that copies and aggregates 
data. I know that nowadays you hear the sentence "we built a 
pipeline" in every other presentation. But data is not oil, and one-way 
pipelines do not make sources better.


Since we are "alpha" there is no documentation yet, but we are 
developing the whole system with around 10 organisations at different 
ends of the Databus. The data is already available for download.

So basically...

where you get "compute" heavy (querying SPARQL)... you are going to 
charge fees for providing that compute heavy query service.
 where you are not "compute" heavy (providing download bandwidth to 
get files) ... you are not going to charge fees.


The latest read is the handout for Europeana Tech, which should clarify 
this point: 
https://docs.google.com/document/d/1OkHBpvQ0h5Qnifn5XKYVV2fiMjsdaooNg1iqS4rCRHA/edit 

I was more expecting technical questions here, but it seems there is 
interest in how the economics work. However, this part is not easy to 
write for me.


All the best,
Sebastian





Cheers.
-N.



On Tue, May 8, 2018 at 1:57 PM, Thad Guidry <thadgui...@gmail.com 
<mailto:thadgui...@gmail.com>> wrote:


I am asking Sebastian about the rationale for paid service.


On Tue, May 8, 2018 at 2:47 PM Laura Morales <laure...@mail.com
<mailto:laure...@mail.com>> wrote:

Is this a question for Sebastian, or are you talking on behalf
of the project?




Sent: Tuesday, May 08, 2018 at 5:10 PM
From: "Thad Guidry" <thadgui...@gmail.com
<mailto:thadgui...@gmail.com>>
To: "Discussion list for the Wikidata project"
<wikidata@lists.wikimedia.org
<mailto:wikidata@lists.wikimedia.org>>
Cc: "Laura Morales" <laure...@mail.com <mailto:laure...@mail.com>>
Subject: Re: [Wikidata] DBpedia Databus (alpha version)

So basically...

where you get "compute" heavy (querying SPARQL)... you are
going to charge fees for providing that compute heavy query
service.
 where you are not "compute" heavy (providing download
bandwidth to get files) ... you are not going to charge fees.
 -Thad


___
Wikidata mailing list
Wikidata@lists.wikimedia.org <mailto:Wikidata@lists.wikimedia.org>
https://lists.wikimedia.org/mailman/listinfo/wikidata
<https://lists.wikimedia.org/mailman/listinfo/wikidata>




_______
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


--
All the best,
Sebastian Hellmann

Director of Knowledge Integration and Linked Data Technologies (KILT) 
Competence Center

at the Institute for Applied Informatics (InfAI) at Leipzig University
Executive Director of the DBpedia Association
Projects: http://dbpedia.org, http://nlp2rdf.org, 
http://linguistics.okfn.org, https://www.w3.org/community/ld4lt 
<http://www.w3.org/community/ld4lt>

Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] DBpedia Databus (alpha version)

2018-05-08 Thread Sebastian Hellmann

Hi Laura,


On 08.05.2018 15:30, Laura Morales wrote:

So, in short, DBPedia is turning into a business with a "community edition + 
enterprise edition" kind of model?



No, definitely not. We were asked by many companies to make an 
enterprise edition, but we concluded that this would diminish the 
quality of available open data.


So the core tools are more like a GitHub for data, where you can fork, mix and 
republish. The business model is a clearing house 
(https://en.wikipedia.org/wiki/Clearing_house_(finance)), where you can do 
the transactions yourself or pay if you would like the convenience of 
somebody else doing the work. This is an adaptation of business models 
from open-source software.


There is also a vision of making Linked Data economically sustainable. Many 
bubbles in the LOD cloud have deteriorated a lot, since they have run out of 
funding for maintenance. In the future, we hope to provide a revenue 
stream for them via the clearing-house mechanism, i.e. the files are free, 
while querying via SPARQL/Linked Data is a paid service.


Also, since the data is open, there should be no conflict in synchronizing 
with Wikidata and making Wikidata richer.



All the best,
Sebastian


  
  


Sent: Tuesday, May 08, 2018 at 2:29 PM
From: "Sebastian Hellmann" <hellm...@informatik.uni-leipzig.de>
To: "Discussion list for the Wikidata project" <wikidata@lists.wikimedia.org>, "Laura 
Morales" <laure...@mail.com>
Subject: Re: [Wikidata] DBpedia Databus (alpha version)

Hi Laura,
  
I don't understand, is this just another project built on DBPedia, or a project to replace DBPedia entirely?
  
a valid question. DBpedia is quite decentralised and hard to understand in its entirety. So actually some parts are improved and others will be replaced eventually (also an improvement, hopefully).

The main improvement here is that we no longer have large monolithic releases that take
forever. Especially the language chapters and also the professional community can
work better with the "platform" in terms of turnaround, effective contribution
and incentives for contribution. Another thing that will hopefully improve is that
we can maintain contributions and add-ons more sustainably; these were formerly lost
between releases. So the structure and processes will be clearer.
The DBpedia in the "main endpoint" will still be there, but in a way that 
nl.dbpedia.org/sparql or wikidata.dbpedia.org/sparql is there. The new hosted service 
will be more a knowledge graph of knowledge graph, where you can get either all 
information in a fused way or you can quickly jump to the sources, compare and do 
improvements there. Projects and organisations can also upload their data to query it 
there themselves or share it with others and persist it. Companies can sell or advertise 
their data. The core consists of the Wikipedia/Wikidata data and we hope to be able to 
improve it and also send contributors and contributions back to the Wikiverse.
  
Are you a DBPedia maintainer?

Yes, I took it as my task to talk to everybody in the community over the last 
year and draft/aggregate the new strategy and innovate.

All the best,
Sebastian

  
On 08.05.2018 13:42, Laura Morales wrote:

I don't understand, is this just another project built on DBPedia, or a project 
to replace DBPedia entirely? Are you a DBPedia maintainer?



  
  


Sent: Tuesday, May 08, 2018 at 1:29 PM
From: "Sebastian Hellmann" 
<hellm...@informatik.uni-leipzig.de>[mailto:hellm...@informatik.uni-leipzig.de]
To: "Discussion list for the Wikidata project." 
<wikidata@lists.wikimedia.org>[mailto:wikidata@lists.wikimedia.org]
Subject: [Wikidata] DBpedia Databus (alpha version)


DBpedia Databus (alpha version)

  
The DBpedia Databus is a platform that allows multiple stakeholders to exchange, curate and access data. Any data entering the bus will be versioned, cleaned, mapped, linked, and its licenses and provenance tracked. Hosting in multiple formats will be provided so the data can be accessed either as a dump download or as an API. Data governance stays with the data contributors.
  


Vision

Working with data is hard and repetitive. We envision a hub where everybody
can upload data and where useful operations like versioning, cleaning,
transformation, mapping, linking, merging and hosting are done automagically on a
central communication system (the bus) and then dispersed again in a decentralised
network to the consumers and applications.
On the databus, data flows from data producers through the platform to the
consumers (left to right); any errors or feedback flow in the opposite
direction and reach the data source, providing a continuous integration
service and improving the data at the source.
  


Open Data vs. Closed (paid) Data

We have studied the data network for 10 years now and we conclude that 
organisations with open data are struggling to work together properly, although 
they

Re: [Wikidata] DBpedia Databus (alpha version)

2018-05-08 Thread Sebastian Hellmann

Hi Laura,


I don't understand, is this just another project built on DBPedia, or a project 
to replace DBPedia entirely?


a valid question. DBpedia is quite decentralised and hard to understand 
in its entirety. So actually some parts are improved and others will be 
replaced eventually (also an improvement, hopefully).


The main improvement here is that we no longer have large monolithic
releases that take forever. Especially the language chapters and
also the professional community can work better with the "platform" in
terms of turnaround, effective contribution and incentives for
contribution. Another thing that will hopefully improve is that we can
maintain contributions and add-ons more sustainably; these were formerly
lost between releases. So the structure and processes will be clearer.


The DBpedia in the "main endpoint" will still be there, but in a way 
that nl.dbpedia.org/sparql or wikidata.dbpedia.org/sparql is there. The 
new hosted service will be more a knowledge graph of knowledge graph, 
where you can get either all information in a fused way or you can 
quickly jump to the sources, compare and do improvements there. Projects 
and organisations can also upload their data to query it there 
themselves or share it with others and persist it. Companies can sell or 
advertise their data. The core consists of the Wikipedia/Wikidata data 
and we hope to be able to improve it and also send contributors and 
contributions back to the Wikiverse.



Are you a DBPedia maintainer?
Yes, I took it as my task to talk to everybody in the community over the 
last year and draft/aggregate the new strategy and innovate.


All the best,
Sebastian


On 08.05.2018 13:42, Laura Morales wrote:

I don't understand, is this just another project built on DBPedia, or a project 
to replace DBPedia entirely? Are you a DBPedia maintainer?



  
  


Sent: Tuesday, May 08, 2018 at 1:29 PM
From: "Sebastian Hellmann" <hellm...@informatik.uni-leipzig.de>
To: "Discussion list for the Wikidata project." <wikidata@lists.wikimedia.org>
Subject: [Wikidata] DBpedia Databus (alpha version)


DBpedia Databus (alpha version)

  
The DBpedia Databus is a platform that allows multiple stakeholders to exchange, curate and access data. Any data entering the bus will be versioned, cleaned, mapped, linked, and its licenses and provenance tracked. Hosting in multiple formats will be provided so the data can be accessed either as a dump download or as an API. Data governance stays with the data contributors.
  


Vision

Working with data is hard and repetitive. We envision a hub where everybody
can upload data and where useful operations like versioning, cleaning,
transformation, mapping, linking, merging and hosting are done automagically on a
central communication system (the bus) and then dispersed again in a decentralised
network to the consumers and applications.
On the databus, data flows from data producers through the platform to the
consumers (left to right); any errors or feedback flow in the opposite
direction and reach the data source, providing a continuous integration
service and improving the data at the source.
  


Open Data vs. Closed (paid) Data

We have studied the data network for 10 years now and we conclude that
organisations with open data are struggling to work together properly, although
they could and should; they are hindered by technical and organisational
barriers and duplicate work on the same data. On the other hand, companies
selling data cannot do so in a scalable way. The loser is the consumer, left with
the choice of inferior open data or buying from a jungle-like market.

Publishing data on the databus

If you are grinding your teeth about how to publish data on the web, you can
just use the databus to do so. Data loaded on the bus will be highly visible,
available and queryable. You should think of it as a service:

- Visibility guarantees that your citations and reputation go up.
- Besides a web download, we can also provide a Linked Data interface, SPARQL
endpoint, Lookup (autocomplete) or many other means of availability (like AWS
or Docker images).
- Any distribution we are doing will funnel feedback and collaboration
opportunities your way to improve your dataset and your internal data quality.
- You will receive an enriched dataset, which is connected and complemented with
any other available data (see the same folder names in data and fusion folders).
  
  


Data Sellers

If you are selling data, the databus provides numerous opportunities for you.
You can link your offering to the open entities on the databus. This allows
consumers to discover your services better, by showing them with each request.
  


Data Consumers

Open data on the databus will be a commodity. We are greatly lowering the cost
of understanding the data, retrieving and reformatting it. We are constantly
extending ways of using the data and are willing

Re: [Wikidata] Wikiata and the LOD cloud

2018-05-08 Thread Sebastian Hellmann

Hi Lucas, Denny,

all you need to do is update your entry on old.datahub.io:

https://old.datahub.io/dataset/wikidata

It was edited by Lucie-Aimée Kaffee two years ago. You need to contact 
her, as she created the Wikimedia org in Datahub. I might be able to 
have someone switch ownership of the org to a new account.


But a lot of essential metadata is missing:

Compare with the DBpedia entry: https://old.datahub.io/dataset/dbpedia

Especially the links and the triple counts at the bottom. So you need to
keep this entry updated in order to appear in the LOD cloud.
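For illustration, a minimal sketch of pulling such figures from the public
query service (assumptions: the WDQS endpoint at query.wikidata.org is used,
and the VIAF ID property P214 stands in for whichever link count the entry
actually needs; any other property or a total triple count works analogously):

    import requests

    # Count Wikidata items that carry a VIAF ID (P214), as one example of the
    # "links to other datasets" numbers a datahub/LOD cloud entry asks for.
    QUERY = """
    SELECT (COUNT(?item) AS ?links) WHERE {
      ?item wdt:P214 ?viaf .
    }
    """

    r = requests.get(
        "https://query.wikidata.org/sparql",
        params={"query": QUERY},
        headers={"Accept": "application/sparql-results+json",
                 "User-Agent": "lod-cloud-metadata-sketch/0.1 (example)"},
    )
    r.raise_for_status()
    print(r.json()["results"]["bindings"][0]["links"]["value"])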


Please tell me if you can't edit it. I know a former admin from the time
datahub.io was first created 10 years ago in the LOD2 and LATC EU projects;
he might be able to do something in case nobody answers due
to datahub.io switching to a new style.


All the best,

Sebastian


On 07.05.2018 22:35, Lucas Werkmeister wrote:
Folks, I’m already in contact with John, there’s no need to contact 
him again :)


Cheers, Lucas

On Mon, 7 May 2018 at 19:32, Denny Vrandečić <vrande...@gmail.com> wrote:


Well, then, we have tried several times to get into that diagram,
and it never worked out.

So, given the page you linked, it says:


  Contributing to the Diagram

First, make sure that you publish data according to the Linked
Data principles <http://www.w3.org/DesignIssues/LinkedData.html>.
We interpret this as:

  * There must be resolvable http:// (or https://) URIs.
  * They must resolve, with or without content negotiation, to
RDF data in one of the popular RDF formats (RDFa, RDF/XML,
Turtle, N-Triples).
  * The dataset must contain at least 1000 triples. (Hence, your
FOAF file most likely does not qualify.)
  * The dataset must be connected via RDF links to a dataset
that is already in the diagram. This means either your
dataset must use URIs from the other dataset, or vice versa.
We arbitrarily require at least 50 links.
  * Access of the entire dataset must be possible via RDF
crawling, via an RDF dump, or via a SPARQL endpoint.

The process for adding datasets is still under development; please
contact John P. McCrae <j...@mcc.ae> to add a new dataset.


Wikidata fulfills all the conditions easily. So, here we go, I am
adding John to this thread - although I know he already knows
about this request - and I am asking officially to enter Wikidata
into the LOD diagram.
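For what it is worth, the first two conditions are easy to check by hand; a
minimal sketch (assuming the Accept-header content negotiation on
www.wikidata.org and the 303 redirect mentioned elsewhere in this thread):

    import requests

    # Check that a Wikidata entity IRI resolves, via content negotiation,
    # to RDF (conditions 1 and 2 of the list above).
    iri = "https://www.wikidata.org/entity/Q42"   # Douglas Adams
    r = requests.get(iri, headers={"Accept": "text/turtle"}, allow_redirects=True)

    print(r.status_code)                  # expect 200 after following the redirect(s)
    print(r.headers.get("Content-Type"))  # expect a Turtle/RDF media type
    if r.history:                         # the intermediate 303/redirect responses
        print(r.history[0].status_code, r.history[0].headers.get("Location"))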

Let's keep it all open, and see where it goes from here.

Cheers,
Denny


On Mon, May 7, 2018 at 4:15 AM Sebastian Hellmann <hellm...@informatik.uni-leipzig.de> wrote:

Hi Denny, Maarten,

you should read your own emails. In fact it is quite easy to
join the LOD cloud diagram.

The most important step is to follow the instructions on the
page: http://lod-cloud.net under how to contribute and then
add the metadata.

Some years ago I made a WordPress site with Linked Data enabled:
http://www.klappstuhlclub.de/wp/ Even this is included, as I
simply added the metadata entry.

Do you really think John McCrae added a line in the code that
says "if (dataset==wikidata) skip; " ?

You just need to add it like everybody else in LOD; DBpedia
also created its entry and updates it now and then. The same
goes for http://lov.okfn.org: somebody from Wikidata needs
to upload the Wikidata properties as OWL. If nobody does it,
it will not be in there.

All the best,

Sebastian


On 04.05.2018 18:33, Maarten Dammers wrote:

It almost feels like someone doesn’t want Wikidata in there?
Maybe that website is maintained by DBpedia fans? Just
thinking out loud here, because DBpedia is very popular in the
academic world and Wikidata is a huge threat to that popularity.

Maarten

On 4 May 2018 at 17:20, Denny Vrandečić <vrande...@gmail.com> wrote the following:


I'm pretty sure that Wikidata is doing better than 90% of
the current bubbles in the diagram.

If they wanted to have Wikidata in the diagram it would have
been there before it was too small to read it. :)

On Tue, May 1, 2018 at 7:47 AM Peter F. Patel-Schneider <pfpschnei...@gmail.com> wrote:

Thanks for the corrections.

So https://www.wikidata.org/entity/Q42 is *the* Wikidata
IRI for Douglas
Adams.  Retrieving from this IRI results in a 303 See
Other to
https://www.wikidata.org/wiki/Special:EntityData/Q42,
 

Re: [Wikidata] Wikiata and the LOD cloud

2018-05-07 Thread Sebastian Hellmann
Wikidata should hold the data of all Wikipedias; that is its main
purpose. However, it doesn't yet, and there are many problems, e.g.
missing references, population counts moved to Commons and an open
discussion about even throwing Wikidata out of the infoboxes:
https://en.wikipedia.org/wiki/Wikipedia:Wikidata/2018_Infobox_RfC


DBpedia is more about technology than data, so we are trying to help out
and push Wikidata, so that it has all the values of all Wikipedias plus their
references:
https://meta.wikimedia.org/wiki/Grants:Project/DBpedia/GlobalFactSync


All the best,

Sebastian


On 07.05.2018 14:26, Andy Mabbett wrote:

On 7 May 2018 at 00:15, Sylvain Boissel <sylvainbois...@gmail.com> wrote:


On Sat, 5 May 2018 at 16:35, Andy Mabbett <a...@pigsonthewing.org.uk> wrote:

On 5 May 2018 at 14:39, David Abián <davidab...@wikimedia.es> wrote:


Both Wikidata and DBpedia surely can, and should, coexist because we'll
never be able to host in Wikidata the entirety of the Wikipedias.

Can you give an example of something that can be represented in
DBpedia, but not Wikidata?

Sure : DBpedia knows the specific values different versions of Wikipedia
choose to display in the infobox. For example, the size or population of
countries with disputed borders. This data is useful for researchers working
on cultural bias in Wikipedia, but it makes little sense to store it in
Wikidata.

Except that it does; and Wikidata is more than capable of holding values
from conflicting sources. So again, this does not substantiate the
"Both Wikidata and DBpedia surely can, and should, coexist because
we'll never be able to host in Wikidata the entirety of the
Wikipedias" claim.



--
All the best,
Sebastian Hellmann

Director of Knowledge Integration and Linked Data Technologies (KILT) 
Competence Center

at the Institute for Applied Informatics (InfAI) at Leipzig University
Executive Director of the DBpedia Association
Projects: http://dbpedia.org, http://nlp2rdf.org, 
http://linguistics.okfn.org, https://www.w3.org/community/ld4lt 

Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org


[Wikidata] GlobalFactSync new prototype Re: Wikiata and the LOD cloud

2018-05-07 Thread Sebastian Hellmann

Hi all,

the discussion about Wikidata and LOD got into this specific detail and 
I was just hoping that we could pick up on a few topics.


We are still hoping to get some support for our GlobalFactSync proposal: 
https://meta.wikimedia.org/wiki/Grants:Project/DBpedia/GlobalFactSync



We created a new prototype here (just the Eiffel Tower for now):

http://88.99.242.78:9000/?s=http%3A%2F%2Fid.dbpedia.org%2Fglobal%2F12HpzV=http%3A%2F%2Fdbpedia.org%2Fontology%2FfloorCount=general

You can see there that the floor count property is different in the
French Wikipedia (properties can be switched with the dropdown at the top).


The English Wikipedia has the same value as Wikidata plus a reference. 
One of the goals of GlobalFactSync is to extract these references and 
import them into Wikidata.


We will also build a redirection service around it, so you can use 
Wikidata Q's and P's as arguments for ?s= and ?p= and get resolved to 
the right entry for quick comparison between WD and WP.
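Purely as an illustration of the intended call (not an existing API): a sketch
that builds such a comparison URL from a Q and a P id. The prototype host is
the one above; the Q/P handling and the example ids Q243 / P1101 are
assumptions.

    from urllib.parse import urlencode

    # Hypothetical sketch of the planned redirection service: pass Wikidata
    # Q/P ids as ?s= and ?p= and let the service resolve them to the matching
    # global id and property. The resolution does not exist yet.
    BASE = "http://88.99.242.78:9000/"

    def comparison_url(qid: str, pid: str) -> str:
        return BASE + "?" + urlencode({"s": qid, "p": pid})

    print(comparison_url("Q243", "P1101"))  # Eiffel Tower / floors above ground (illustrative)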


All the best,

Sebastian





On 06.05.2018 10:54, Ettore RIZZA wrote:
@Antonin : You're right, I now remember Magnus Knuth's message on this 
list about GlobalFactSync 
<https://meta.wikimedia.org/wiki/Grants:Project/DBpedia/GlobalFactSync>, 
a lite version of CrossWikiFact, if I understood correctly. I also 
remember that his message did not trigger many reactions...


2018-05-06 10:46 GMT+02:00 Antonin Delpeuch (lists) <li...@antonin.delpeuch.eu>:


On 06/05/2018 10:37, Ettore RIZZA wrote:
>     More simply, there's still a long way to go until Wikidata
imports
>     all the data contained in Wikipedia infoboxes (or equivalent
data
>     from other sources), let alone the rest.
>
>
> This surprises me. Are there any statistics somewhere on the rate of
> Wikipedia's infoboxes fully parsed ?


That was more or less the goal of the CrossWikiFact project, which was
unfortunately not very widely supported:
https://meta.wikimedia.org/wiki/Grants:Project/DBpedia/CrossWikiFact

It's still not clear to me why this got so little support - it looked
like a good opportunity to collaborate with DBpedia.

Antonin







--
All the best,
Sebastian Hellmann

Director of Knowledge Integration and Linked Data Technologies (KILT) 
Competence Center

at the Institute for Applied Informatics (InfAI) at Leipzig University
Executive Director of the DBpedia Association
Projects: http://dbpedia.org, http://nlp2rdf.org, 
http://linguistics.okfn.org, https://www.w3.org/community/ld4lt 

Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org


Re: [Wikidata] Wikiata and the LOD cloud

2018-05-07 Thread Sebastian Hellmann
://www.wikidata.org/wiki/Special:EntityData/Q%7BNUMBER%7D.ttl>).
>
>
> Suprisingly, there is no connection between the entity IRIs and
the wikipage
> URLs. If one was given the IRI of an entity from Wikidata, and
had no
> further information about how Wikidata works, they would not be
able to
> retrieve HTML content about the entity.
>
>
> BTW, I'm not sure the implementation of content negotiation in
Wikidata is
> correct because the server does not tell me the format of the
resource to
> which it redirects (as opposed to what DBpedia does, for instance).
>
>
> --AZ




--
All the best,
Sebastian Hellmann

Director of Knowledge Integration and Linked Data Technologies (KILT) 
Competence Center

at the Institute for Applied Informatics (InfAI) at Leipzig University
Executive Director of the DBpedia Association
Projects: http://dbpedia.org, http://nlp2rdf.org, 
http://linguistics.okfn.org, https://www.w3.org/community/ld4lt 

Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org


Re: [Wikidata] About OCLC and DBpedia Links

2018-03-06 Thread Sebastian Hellmann
Hm, now I am also curious and would like to ask the same question as 
Ettore. What is the policy here?


Viaf has schema.org backlinks, see https://viaf.org/viaf/85312226/rdf.xml

(a schema.org link pointing to http://www.wikidata.org/entity/Q80)

Then it's ok to duplicate, because it is not owl:sameAs?
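For anyone curious which predicate VIAF actually uses there, a small sketch
(assuming rdflib is installed and viaf.org still serves RDF/XML at that URL)
that lists every statement pointing at a Wikidata entity:

    from rdflib import Graph

    # Parse the VIAF RDF mentioned above and print the statements whose
    # object is a Wikidata entity IRI, so the exact predicate can be checked.
    g = Graph()
    g.parse("https://viaf.org/viaf/85312226/rdf.xml", format="xml")

    for s, p, o in g:
        if str(o).startswith("http://www.wikidata.org/entity/"):
            print(s, p, o)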

All the best,
Sebastian

On 06.03.2018 20:01, Magnus Sälgö wrote:
If Dbpedia has “same as” then Wikidata doesn’t have to duplicate that 
information you can ask dbpedia what is same as Q7724


Regards
Magnus Sälgö
Stockholm, Sweden

On 6 March 2018 at 19:49, Ettore RIZZA <ettoreri...@gmail.com> wrote:



First of all, thank you all for your answers.

@Magnus and Thad: that is more or less what I suspected. Since the URL to
WorldCat can be rebuilt from the Library of Congress authority ID, I
guess someone thought it would be a duplicate.


But 1) I'm not sure that there is a 1-to-1 mapping between all
WorldCat Identities and the Library of Congress; 2) it would be rather
strange for a Library of Congress ID to also serve as an ID for a
"competitor" (i.e. OCLC, which maintains WorldCat and VIAF); 3) one
would then wonder why Wikipedia provides both links to the Library of
Congress Authority ID and WorldCat Identities.


Given that Wikidata already contains links to
VIAF and that VIAF contains links to WorldCat Identities, this
transitivity reasoning could apply to many other authority IDs, I think.


@Sebastian: It would be great! I'll follow this project closely, just
as I'm already following your papers
(https://content.iospress.com/articles/semantic-web/sw277).
And it is precisely because I know that there is a desire for
"rapprochement" on both sides that I asked why there is absolutely
nothing in Wikidata that links to DBpedia (or Yago), whereas DBpedia
contains a lot of owl:sameAs to Wikidata. All this must have been
discussed somewhere, I suppose. Still, I do not even find a property
proposal for "DBpedia link".


2018-03-06 18:59 GMT+01:00 Sebastian Hellmann <hellm...@informatik.uni-leipzig.de>:


Hi Ettore,

we just released a very early prototype of the new DBpedia:

http://88.99.242.78/hdt/en_wiki_de_sv_nl_fr-replaced.nt.bz2


I attached the first 1000 triples. The data is a merge of
Wikidata + 5  DBpedias from the 5 largest Wikipedia versions.
Overall, there are many issues, but we have a test-driven data
engineering process combined with Scrum and biweekly releases,
next one is on March 15th. The new IDs are also stable by design.

We discussed how to effectively reuse all technologies we have
for Wikidata and also Wikipedia and are applying with this
project at the moment:
https://meta.wikimedia.org/wiki/Grants:Project/DBpedia/GlobalFactSync


(Endorsements on the main page and comments on the talk page are
welcome).

We really hope that the project gets accepted, so we can deploy
the technologies behind DBpedia to the Wikiverse, e.g. we found
over 900k triples/statements with references in the English
Wikipedia's Infoboxes alone.

We still have to do documentation and hosting of the new
releases, but then it would indeed be a good time to add the
links to DBpedia, if nobody objects. Also some people mentioned
that we could load the DBpedia Ontology into Wikidata to provide
an alternate class hierarchy. In DBpedia we loaded 5 or 6
classification schemes (Yago, Umbel, etc.), which are useful for
different kind of queries.


All the best,
Sebastian




On 06.03.2018 18:14, Ettore RIZZA wrote:

Dear all,

I asked myself a series of questions about the links between
Wikidata and other knowledge/data bases, namely those of OCLC
and DBpedia. For example:

- Why Wikidata has no property "Worldcat Identities

<http://worldcat.org/identities/>

Re: [Wikidata] About OCLC and DBpedia Links

2018-03-06 Thread Sebastian Hellmann

Hi Ettore,

On 06.03.2018 19:48, Ettore RIZZA wrote:
@Sebastian: It would be great! I'll follow this project closely, just
as I'm already following your papers
<https://content.iospress.com/articles/semantic-web/sw277>. And it is
precisely because I know that there is a desire for "rapprochement" on
both sides that I asked why there is absolutely nothing in Wikidata
that links to DBpedia (or Yago), whereas DBpedia contains a lot of
owl:sameAs to Wikidata. All this must have been discussed somewhere, I
suppose. Still, I do not even find a property proposal for "DBpedia
link".


Thanks, Ali (who volunteered to do biweekly Wikidata-DBpedia 
extractions) and Dimitris should get most of the credit for the paper.


We as DBpedia would really like to contribute more to Wikidata, but all
developers are busy coding DBpedia improvements, the community
is more concerned with its own interests, and we didn't find a
volunteer who committed to focusing on closing the gaps between the
projects (it is quite a lot of work, but worth it). Hence the
application for a developer in GlobalFactSync[1], as we really need
someone to focus on porting technologies to Wikimedia and transitioning the
core data from Wikipedia to Wikidata via DBpedia as a transparent layer.



[1] https://meta.wikimedia.org/wiki/Grants:Project/DBpedia/GlobalFactSync


--
All the best,
Sebastian Hellmann

Director of Knowledge Integration and Linked Data Technologies (KILT) 
Competence Center

at the Institute for Applied Informatics (InfAI) at Leipzig University
Executive Director of the DBpedia Association
Projects: http://dbpedia.org, http://nlp2rdf.org, 
http://linguistics.okfn.org, https://www.w3.org/community/ld4lt 

Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org


Re: [Wikidata] About OCLC and DBpedia Links

2018-03-06 Thread Sebastian Hellmann

Hi Ettore,

we just released a very early prototype of the new DBpedia:

http://88.99.242.78/hdt/en_wiki_de_sv_nl_fr-replaced.nt.bz2

I attached the first 1000 triples. The data is a merge of Wikidata + 5
DBpedias from the 5 largest Wikipedia versions. Overall, there are many
issues, but we have a test-driven data engineering process combined with
Scrum and biweekly releases; the next one is on March 15th. The new IDs are
also stable by design.
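(A minimal sketch for peeking at the dump, assuming a local copy of the
bz2-compressed N-Triples file linked above:)

    import bz2

    # Print the first few N-Triples lines of the prototype dump; the file is
    # assumed to have been downloaded next to this script.
    with bz2.open("en_wiki_de_sv_nl_fr-replaced.nt.bz2", "rt", encoding="utf-8") as f:
        for _, line in zip(range(5), f):
            print(line.rstrip())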


We discussed how to effectively reuse all technologies we have for 
Wikidata and also Wikipedia and are applying with this project at the 
moment: 
https://meta.wikimedia.org/wiki/Grants:Project/DBpedia/GlobalFactSync


(Endorsements on the main page and comments on the talk page are welcome).

We really hope that the project gets accepted, so we can deploy the 
technologies behind DBpedia to the Wikiverse, e.g. we found over 900k 
triples/statements with references in the English Wikipedia's Infoboxes 
alone.


We still have to do documentation and hosting of the new releases, but 
then it would indeed be a good time to add the links to DBpedia, if 
nobody objects. Also some people mentioned that we could load the 
DBpedia Ontology into Wikidata to provide an alternate class hierarchy. 
In DBpedia we loaded 5 or 6 classification schemes (Yago, Umbel, etc.), 
which are useful for different kind of queries.



All the best,
Sebastian



On 06.03.2018 18:14, Ettore RIZZA wrote:

Dear all,

I asked myself a series of questions about the links between Wikidata 
and other knowledge/data bases, namely those of OCLC and DBpedia. For 
example:


- Why does Wikidata have no property "Worldcat Identities
<http://worldcat.org/identities/>" while the English edition of
Wikipedia systematically mentions this identity (when it exists) in
its "Authority control" section?


- Why does VIAF link to all editions of Wikipedia, but not (simply) to
Wikidata?


- Why is there no link to DBpedia when the opposite is true?

These questions may seem very different from each other, but they
ultimately concern a common subject and are all very basic. I suspect
they must have been discussed somewhere, maybe at the dawn of Wikidata.
However, I find nothing in the archives of this mailing list, nor in the
discussions on Wikidata.


Could someone point me to some documentation on these issues ?

Cheers,

Ettore Rizza




--
All the best,
Sebastian Hellmann

Director of Knowledge Integration and Linked Data Technologies (KILT) 
Competence Center

at the Institute for Applied Informatics (InfAI) at Leipzig University
Executive Director of the DBpedia Association
Projects: http://dbpedia.org, http://nlp2rdf.org, 
http://linguistics.okfn.org, https://www.w3.org/community/ld4lt 

Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

<http://id.dbpedia.org/global/64>
a <http://dbpedia.org/ontology/Media>, <http://www.w3.org/2002/07/owl#Thing>, <http://www.wikidata.org/entity/Q340169> ;
<http://www.w3.org/2000/01/rdf-schema#label> "Categoria:Panagyurishte"@it, "Categorie:Panagyurishte"@nl, "Category:Panagyurishte"@en, "Category:パナギュリシテ"@ja, "Categoría:Panaguiúrishte"@es, "Catégorie:Panagyurichté"@fr, "Kategori:Panagjurisjte"@sv, "Kategorie:Panagjurischte"@de, "Kategorie:Panagyurishte"@cs, "Kategória:Panagyurishte"@sk, "Panagjurischte"@de, "Panagyurishte"@en, "Panagyurishte"@pl, "Категория:Панагюриште"@ru, "Категория:Панагюрище"@bg, "Категорија:Панаѓуриште"@mk ;
<http://www.w3.org/2000/01/rdf-schema#seeAlso> <https://commons.wikimedia.org/wiki/Category:Panagyurishte> .

<http://id.dbpedia.org/global/65>
<http://dbpedia.org/ontology/award> <http://id.dbpedia.org/global/66> ;
<http://dbpedia.org/ontology/birthDate> "1937-4-11"^^<http://www.w3.org/2001/XMLSchema#date> ;
<http://dbpedia.org/ontology/birthPlace> <http://id.dbpedia.org/global/67> ;
<http://dbpedia.org/ontology/deathDate> "2016-3-17"^^<http://www.w3.org/2001/XMLSchema#date> ;
<http://dbpedia.org/ontology/deathPlace> <http://id.dbpedia.org/global/68> ;
a <http://dbpedia.org/ontology/Agent>, <http://dbpedia.org/ontology/Person>, <http://schema.org/Person>, <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Agent>, <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#NaturalPerson>, <http://www.w3.org/20

Re: [Wikidata] Kickstartet: Adding 2.2 million German organisations to Wikidata

2017-10-16 Thread Sebastian Hellmann
Ok, I put some effort into 
https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/Handelsregister 
to move the discussion there.


All the best,

Sebastian


On 16.10.2017 18:06, Yaroslav Blanter wrote:

Dear All,

it is great that we are having this discussion, but may I please
suggest having it on the RfP page on Wikidata? People have already asked
similar questions there, and, in my experience, on-wiki discussion
will likely lead to a refined request which will accommodate all suggestions.


Cheers
Yaroslav

On Mon, Oct 16, 2017 at 5:53 PM, Sebastian Hellmann <hellm...@informatik.uni-leipzig.de> wrote:


ah, ok, sorry, I was assuming that Blazegraph would transitively
resolve this automatically.

Ok, so let's divide the problem:

# Task 1:

Connect all existing organisations with the data from the
handelsregister. (No new identifiers added, we can start right now)

Add a constraint that all German organisations should be connected
to a court, i.e. the registering organisation as well as the id
assigned by the court.

@all: any properties I can reuse for this?

I will focus on this as it seems quite easy. We can first filter
orgs by other criteria, i.e. country as a blocking key and then
string match the rest.

# Task 2:

Add all missing identifiers for the remaining orgs in the
Handelsregister. Task 2 can be rediscussed and decided once Task 1 is
sufficiently finished.


# regarding maintenance:
I find Wikidata as such very hard to maintain as all data is
copied from somewhere else eventually, but Wikipedia has the same
problem. In the case of the German Business register, maintenance
is especially easy as the orgs are stable and uniquely
identifiable. Even the fact that a company gets shut down should
still be in Wikidata, so you have historical information. I mean,
you also keep the Roman Empire, the Hanse and even finished
projects in Wikidata. So even if an org ceases to exist, the entry
in Wikidata should stay.

# regarding Opencorporates
I have a critical opinion of Opencorporates. It appears to be
open, but you actually cannot get the data. If somebody has a
data dump, please forward it to me. Thanks.
On top of that, I consider Opencorporates a danger to open data. It
appears to push open availability of data, but it is then limited
to open licenses. Its usefulness is limited, as there are no free dumps
and no possibility to duplicate it effectively. Wikipedia and
Wikidata provide dumps and an API for exactly this reason.
Every time somebody wants to create an open organisation dataset
with no barriers, the existence of Opencorporates blocks this.

Cheers,
Sebastian


On 16.10.2017 15:34, Antonin Delpeuch (lists) wrote:

And… my own count was wrong too, because I forgot to add DISTINCT in my
query (if there are multiple paths from the class to "organization
(Q43229)", items will appear multiple times).

So, I get 1 168 084 now.
http://tinyurl.com/yaeqlsnl

It's easy to get these things wrong!

Antonin

On 16/10/2017 14:16, Antonin Delpeuch (lists) wrote:

Thanks Ettore for spotting that!

Wikidata types (P31) only make sense when you consider the "subclass of"
(P279) property that we use to build the ontology (except in a few cases
where the community has decided not to use any subclass for a particular
type).

So, to retrieve all items of a certain type in SPARQL, you need to use
something like this:

?item wdt:P31/wdt:P279* ?type

You can also have other variants to accept non-truthy statements.

Just with this truthy version, I currently get 1 208 227 items. But note
that there are still a lot of items where P31 is not provided, or
subclasses which have not been connected to "organization (Q43229)"…

So in general, it's very hard to have any "guarantees that there are no
duplicates", just because you don't have any guarantees that the
information currently in Wikidata is complete or correct.

I would recommend trying to import something a bit smaller to get
acquainted with how Wikidata works and what the matching process looks
like in practice. And beyond a one-off import, as Ettore said it is
important to think how the data will be maintained in the future…

Antonin

On 16/10/2017 13:46, Ettore RIZZA wrote:

- Wikidata has 40k organisations, per this https://query.wikidata.org query:

  SELECT ?item ?itemLabel WHERE {
    ?item wdt:P31 wd:Q43229.
    SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  }

Re: [Wikidata] Kickstartet: Adding 2.2 million German organisations to Wikidata

2017-10-16 Thread Sebastian Hellmann

Hi Yaroslav,

in addition to this list, I added it here:

https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/Handelsregister

and here:

https://www.wikidata.org/wiki/Wikidata:Project_chat#Handelsregister

but I received more and longer answers on this list.

All the best,

Sebastian


On 16.10.2017 18:06, Yaroslav Blanter wrote:

Dear All,

it is great that we are having this discussion, but may I please
suggest having it on the RfP page on Wikidata? People have already asked
similar questions there, and, in my experience, on-wiki discussion
will likely lead to a refined request which will accommodate all suggestions.


Cheers
Yaroslav

On Mon, Oct 16, 2017 at 5:53 PM, Sebastian Hellmann <hellm...@informatik.uni-leipzig.de> wrote:


ah, ok, sorry, I was assuming that Blazegraph would transitively
resolve this automatically.

Ok, so let's divide the problem:

# Task 1:

Connect all existing organisations with the data from the
handelsregister. (No new identifiers added, we can start right now)

Add a constraint that all German organisations should be connected
to a court, i.e. the registering organisation as well as the id
assigned by the court.

@all: any properties I can reuse for this?

I will focus on this as it seems quite easy. We can first filter
orgs by other criteria, i.e. country as a blocking key and then
string match the rest.

# Task 2:

Add all missing identifiers for the remaining orgs in the
Handelsregister. Task 2 can be rediscussed and decided once Task 1 is
sufficiently finished.


# regarding maintenance:
I find Wikidata as such very hard to maintain as all data is
copied from somewhere else eventually, but Wikipedia has the same
problem. In the case of the German Business register, maintenance
is especially easy as the orgs are stable and uniquely
identifiable. Even the fact that a company gets shut down should
still be in Wikidata, so you have historical information. I mean,
you also keep the Roman Empire, the Hanse and even finished
projects in Wikidata. So even if an org ceases to exist, the entry
in Wikidata should stay.

# regarding Opencorporates
I have a critical opinion of Opencorporates. It appears to be
open, but you actually cannot get the data. If somebody has a
data dump, please forward it to me. Thanks.
On top of that, I consider Opencorporates a danger to open data. It
appears to push open availability of data, but it is then limited
to open licenses. Its usefulness is limited, as there are no free dumps
and no possibility to duplicate it effectively. Wikipedia and
Wikidata provide dumps and an API for exactly this reason.
Every time somebody wants to create an open organisation dataset
with no barriers, the existence of Opencorporates blocks this.

Cheers,
Sebastian


On 16.10.2017 15:34, Antonin Delpeuch (lists) wrote:

And… my own count was wrong too, because I forgot to add DISTINCT in my
query (if there are multiple paths from the class to "organization
(Q43229)", items will appear multiple times).

So, I get 1 168 084 now.
http://tinyurl.com/yaeqlsnl

It's easy to get these things wrong!

Antonin

On 16/10/2017 14:16, Antonin Delpeuch (lists) wrote:

Thanks Ettore for spotting that!

Wikidata types (P31) only make sense when you consider the "subclass of"
(P279) property that we use to build the ontology (except in a few cases
where the community has decided not to use any subclass for a particular
type).

So, to retrieve all items of a certain type in SPARQL, you need to use
something like this:

?item wdt:P31/wdt:P279* ?type

You can also have other variants to accept non-truthy statements.

Just with this truthy version, I currently get 1 208 227 items. But note
that there are still a lot of items where P31 is not provided, or
subclasses which have not been connected to "organization (Q43229)"…

So in general, it's very hard to have any "guarantees that there are no
duplicates", just because you don't have any guarantees that the
information currently in Wikidata is complete or correct.

I would recommend trying to import something a bit smaller to get
acquainted with how Wikidata works and what the matching process looks
like in practice. And beyond a one-off import, as Ettore said it is
important to think how the data will be maintained in the future…

Antonin

On 16/10/2017 13:46, Ettore RIZZA wrote:

- Wikidata has 40k organisations, per this https://query.wikidata.org query:

  SELECT ?item ?itemLabel WHERE {
    ?item wdt:P31 wd:Q43229.
    SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  }

Re: [Wikidata] Kickstartet: Adding 2.2 million German organisations to Wikidata

2017-10-16 Thread Sebastian Hellmann
ah, ok, sorry, I was assuming that Blazegraph would transitively resolve 
this automatically.


Ok, so let's divide the problem:

# Task 1:

Connect all existing organisations with the data from the 
handelsregister. (No new identifiers added, we can start right now)


Add a constraint that all German organisations should be connected to a 
court, i.e. the registering organisation as well as the id assigned by 
the court.


@all: any properties I can reuse for this?

I will focus on this as it seems quite easy. We can first filter orgs by 
other criteria, i.e. country as a blocking key and then string match the 
rest.
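A minimal sketch of that matching step (assumptions: the public WDQS endpoint,
country = Germany (Q183) via P17 as the blocking key, and plain label
similarity as the string match; a real run would need paging and better
normalisation):

    import difflib
    import requests

    # Blocking query: German organisations (P31/P279* organization, P17 Germany).
    # This is heavy on the public endpoint and may need LIMIT/OFFSET paging.
    QUERY = """
    SELECT DISTINCT ?item ?itemLabel WHERE {
      ?item wdt:P31/wdt:P279* wd:Q43229 ;
            wdt:P17 wd:Q183 .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "de,en". }
    }
    LIMIT 50000
    """

    def fetch_candidates():
        r = requests.get("https://query.wikidata.org/sparql",
                         params={"query": QUERY},
                         headers={"Accept": "application/sparql-results+json",
                                  "User-Agent": "handelsregister-matching-sketch/0.1"})
        r.raise_for_status()
        return {b["itemLabel"]["value"]: b["item"]["value"]
                for b in r.json()["results"]["bindings"]}

    def match(register_name, candidates, cutoff=0.9):
        # crude string match of a Handelsregister name against the blocked set
        hits = difflib.get_close_matches(register_name, list(candidates), n=3, cutoff=cutoff)
        return [(name, candidates[name]) for name in hits]

    # e.g. match("A Dienstleistungsgesellschaft mbH", fetch_candidates())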


# Task 2:

Add all missing identifiers for the remaining orgs in the Handelsregister.
Task 2 can be rediscussed and decided once Task 1 is sufficiently finished.



# regarding maintenance:
I find Wikidata as such very hard to maintain as all data is copied from 
somewhere else eventually, but Wikipedia has the same problem. In the 
case of the German Business register, maintenance is especially easy as 
the orgs are stable and uniquely identifiable. Even the fact that a 
company gets shut down should still be in Wikidata, so you have 
historical information. I mean, you also keep the Roman Empire, the 
Hanse and even finished projects in Wikidata. So even if an org ceases 
to exist, the entry in Wikidata should stay.


# regarding Opencorporates
I have a critical opinion of Opencorporates. It appears to be open,
but you actually cannot get the data. If somebody has a data dump,
please forward it to me. Thanks.
On top of that, I consider Opencorporates a danger to open data. It appears
to push open availability of data, but it is then limited to open
licenses. Its usefulness is limited, as there are no free dumps and no
possibility to duplicate it effectively. Wikipedia and Wikidata provide
dumps and an API for exactly this reason. Every time somebody wants to
create an open organisation dataset with no barriers, the existence of
Opencorporates blocks this.


Cheers,
Sebastian


On 16.10.2017 15:34, Antonin Delpeuch (lists) wrote:

And… my own count was wrong too, because I forgot to add DISTINCT in my
query (if there are multiple paths from the class to "organization
(Q43229)", items will appear multiple times).

So, I get 1 168 084 now.
http://tinyurl.com/yaeqlsnl

It's easy to get these things wrong!

Antonin

On 16/10/2017 14:16, Antonin Delpeuch (lists) wrote:

Thanks Ettore for spotting that!

Wikidata types (P31) only make sense when you consider the "subclass of"
(P279) property that we use to build the ontology (except in a few cases
where the community has decided not to use any subclass for a particular
type).

So, to retrieve all items of a certain type in SPARQL, you need to use
something like this:

?item wdt:P31/wdt:P279* ?type

You can also have other variants to accept non-truthy statements.

Just with this truthy version, I currently get 1 208 227 items. But note
that there are still a lot of items where P31 is not provided, or
subclasses which have not been connected to "organization (Q43229)"…

So in general, it's very hard to have any "guarantees that there are no
duplicates", just because you don't have any guarantees that the
information currently in Wikidata is complete or correct.

I would recommend trying to import something a bit smaller to get
acquainted with how Wikidata works and what the matching process looks
like in practice. And beyond a one-off import, as Ettore said it is
important to think how the data will be maintained in the future…

Antonin

On 16/10/2017 13:46, Ettore RIZZA wrote:

- Wikidata has 40k organisations, per this https://query.wikidata.org query:

  SELECT ?item ?itemLabel WHERE {
    ?item wdt:P31 wd:Q43229.
    SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  }


Hi,

I think Wikidata contains many more organizations than that. If we
choose the "instance of Business enterprise", we get 135570 results. And
I imagine there are many other categories that bring together commercial
companies.


https://query.wikidata.org/#SELECT%20%3Fitem%20%3FitemLabel%20WHERE%20%7B%0A%20%20%3Fitem%20wdt%3AP31%20wd%3AQ4830453.%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22%5BAUTO_LANGUAGE%5D%2Cen%22.%20%7D%0A%7D

On the substance, the project to add all companies of a country would
make Wikidata a kind of totally free clone of Open Corporates
<https://opencorporates.com/>. I would of course be delighted to see
that, but is it not a challenge to maintain such a database? Companies
are like humans: they appear and disappear every day.

  


2017-10-16 13:41 GMT+02:00 Sebastian Hellmann <hellm...@informatik.uni-leipzig.de>:

 Hi all,

 the technical challenges are not

Re: [Wikidata] Kickstartet: Adding 2.2 million German organisations to Wikidata

2017-10-16 Thread Sebastian Hellmann

Ah yes, forgot to mention:

there is no URI or unique identifier given by the Handelsregister
system. However, the courts take care that the registrations are unique,
so it is implicit. The Handelsregister could easily create stable URIs out
of court + register type + number, like /Leipzig_HRB_32853.
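(A tiny sketch of the kind of key that could be minted from those three parts;
the function and its normalisation are purely illustrative.)

    # Illustrative only: mint a stable key from court + register type + number.
    def register_key(court: str, reg_type: str, number: str) -> str:
        return "_".join(part.strip().replace(" ", "_") for part in (court, reg_type, number))

    print(register_key("Leipzig", "HRB", "32853"))   # -> Leipzig_HRB_32853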


For Wikidata this is not a problem to handle. So no technical issues 
from this side either.


All the best,

Sebastian


On 16.10.2017 13:41, Sebastian Hellmann wrote:


Hi all,

the technical challenges are not so difficult.

- 2.2 million is the exact number of German organisations, i.e.
associations and companies. They are also unique.


- Wikidata has 40k organisations, per this https://query.wikidata.org query:

  SELECT ?item ?itemLabel WHERE {
    ?item wdt:P31 wd:Q43229.
    SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  }


so there would be a maximum of 40k duplicates. These are easy to find
and deduplicate.


- The crawl can be done easily, a colleague has done so before.


The issues here are:

- Do you want to upload the data into Wikidata? It would be a really big
extension. Can I go ahead?


- If the data were available externally as structured data under an open
license, I would probably not suggest loading it into Wikidata, as the
data could be retrieved from the official source directly; here, however,
this data will not be published in a decent format.


I thought that the way data is copied from copyrighted sources, i.e.
only facts, is ok for Wikidata. This is done in a lot of places, I guess.
The same goes for Wikipedia, i.e. news articles and copyrighted books are
referenced. So Wikimedia or the Wikimedia community are experts on this.


All the best,

Sebastian


On 16.10.2017 10:18, Neubert, Joachim wrote:


Hi Sebastian,

This is huge! It will cover almost all currently existing German 
companies. Many of these will have similar names, so preparing for 
disambiguation is a concern.


A good way for such an approach would be proposing a property for an 
external identifier, loading the data into Mix-n-match, creating 
links for companies already in Wikidata, and adding the rest (or 
perhaps only parts of them - I’m not sure if having all of them in 
Wikidata makes sense, but that’s another discussion), preferably with 
location and/or sector of trade in the description field.


I’ve tried to figure out what could be used as key for a external 
identifier property. However, it looks like the registry does not 
offer any (persistent) URL to its entries. So for looking up a 
company, apparently there are two options:


-conducting an extended search for the exact string “A 
Dienstleistungsgesellschaft mbH“


-copying the register number "32853" plus selecting the court
(Leipzig) from the corresponding dropdown list and searching for that


Both ways are not very intuitive, even if we can provide a link to 
the search form. This would make a weak connection to the source of 
information. Much more important, it makes disambiguation in 
Mix-n-match difficult. This applies for the preparation of your 
initial load (you would not want to create duplicates). But much more 
so for everybody else who wants to match his or her data later on. 
Being forced to search for entries manually in a cumbersome way for 
disambiguation of a new, possibly large and rich dataset is, in my 
eyes, not something we want to impose on future contributors. And 
often, the free information they find in the registry (formal name, 
register number, legal form, address) will not easily match with the 
information they have (common name, location, perhaps founding date, 
and most important sector of trade), so disambiguation may still be 
difficult.


Have you checked which parts of the accessible information as below 
can be crawled and added legally to external databases such as Wikidata?


Cheers, Joachim

--

Joachim Neubert

ZBW – German National Library of Economics

Leibniz Information Centre for Economics

Neuer Jungfernstieg 21
20354 Hamburg

Phone +49-42834-462

From: Wikidata [mailto:wikidata-boun...@lists.wikimedia.org] On behalf of Sebastian Hellmann

Sent: Sunday, 15 October 2017 09:45
To: wikidata@lists.wikimedia.org
Subject: [Wikidata] Kickstartet: Adding 2.2 million German
organisations to Wikidata


Hi all,

the German business registry contains roughly 2.2 million
organisations. Some information is paid, but other information is public, i.e.
the info you find when searching there and clicking on UT (see the example below):


https://www.handelsregister.de/rp_web/mask.do?Typ=e

I would like to add this to Wikidata, either by crawling or by
raising money to use crowdsourcing platforms like CrowdFlower or Amazon
Mechanical Turk.


It should meet notability criteria 2: 
https://www.wikidata.org/wiki/Wikidata:Notability


2. It refers to an instance of a *clearly identifiable conceptual
or material entity*. The entity must be notab

Re: [Wikidata] Kickstartet: Adding 2.2 million German organisations to Wikidata

2017-10-16 Thread Sebastian Hellmann

Hi all,

the technical challenges are not so difficult.

- 2.2 million is the exact number of German organisations, i.e.
associations and companies. They are also unique.


- Wikidata has 40k organisations, per this https://query.wikidata.org query:

  SELECT ?item ?itemLabel WHERE {
    ?item wdt:P31 wd:Q43229.
    SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  }


so there would be a maximum of 40k duplicates. These are easy to find and
deduplicate.


- The crawl can be done easily, a colleague has done so before.


The issues here are:

- Do you want to upload the data into Wikidata? It would be a really big
extension. Can I go ahead?


- If the data were available externally as structured data under an open
license, I would probably not suggest loading it into Wikidata, as the
data could be retrieved from the official source directly; here, however,
this data will not be published in a decent format.


I thought that the way data is copied from copyrighted sources, i.e. only
facts, is ok for Wikidata. This is done in a lot of places, I guess. The same
goes for Wikipedia, i.e. news articles and copyrighted books are referenced.
So Wikimedia or the Wikimedia community are experts on this.


All the best,

Sebastian


On 16.10.2017 10:18, Neubert, Joachim wrote:


Hi Sebastian,

This is huge! It will cover almost all currently existing German 
companies. Many of these will have similar names, so preparing for 
disambiguation is a concern.


A good way for such an approach would be proposing a property for an 
external identifier, loading the data into Mix-n-match, creating links 
for companies already in Wikidata, and adding the rest (or perhaps 
only parts of them - I’m not sure if having all of them in Wikidata 
makes sense, but that’s another discussion), preferably with location 
and/or sector of trade in the description field.


I’ve tried to figure out what could be used as key for a external 
identifier property. However, it looks like the registry does not 
offer any (persistent) URL to its entries. So for looking up a 
company, apparently there are two options:


-conducting an extended search for the exact string “A 
Dienstleistungsgesellschaft mbH“


-copying the register number "32853" plus selecting the court
(Leipzig) from the corresponding dropdown list and searching for that


Both ways are not very intuitive, even if we can provide a link to the 
search form. This would make a weak connection to the source of 
information. Much more important, it makes disambiguation in 
Mix-n-match difficult. This applies for the preparation of your 
initial load (you would not want to create duplicates). But much more 
so for everybody else who wants to match his or her data later on. 
Being forced to search for entries manually in a cumbersome way for 
disambiguation of a new, possibly large and rich dataset is, in my 
eyes, not something we want to impose on future contributors. And 
often, the free information they find in the registry (formal name, 
register number, legal form, address) will not easily match with the 
information they have (common name, location, perhaps founding date, 
and most important sector of trade), so disambiguation may still be 
difficult.


Have you checked which parts of the accessible information as below 
can be crawled and added legally to external databases such as Wikidata?


Cheers, Joachim

--

Joachim Neubert

ZBW – German National Library of Economics

Leibniz Information Centre for Economics

Neuer Jungfernstieg 21
20354 Hamburg

Phone +49-42834-462

From: Wikidata [mailto:wikidata-boun...@lists.wikimedia.org] On behalf of Sebastian Hellmann

Sent: Sunday, 15 October 2017 09:45
To: wikidata@lists.wikimedia.org
Subject: [Wikidata] Kickstartet: Adding 2.2 million German
organisations to Wikidata


Hi all,

the German business registry contains roughly 2.2 million
organisations. Some information is paid, but other information is public, i.e.
the info you find when searching there and clicking on UT (see the example below):


https://www.handelsregister.de/rp_web/mask.do?Typ=e

I would like to add this to Wikidata, either by crawling or by raising
money to use crowdsourcing platforms like CrowdFlower or Amazon Mechanical Turk.


It should meet notability criteria 2: 
https://www.wikidata.org/wiki/Wikidata:Notability


2. It refers to an instance of a *clearly identifiable conceptual
or material entity*. The entity must be notable, in the sense that
it *can be described using serious and publicly available
references*. If there is no item about you yet, you are probably
not notable.


The reference is the official German business registry, which is 
serious and public. Orgs are also per definition clearly identifiable 
legal entities.


How can I get clearance to proceed on this?

All the best,
Sebastian


  Entity data

Saxony District court *Leipzig HRB 32853 * – A 
Dienstleistungs

Re: [Wikidata] Kickstartet: Adding 2.2 million German organisations to Wikidata

2017-10-16 Thread Sebastian Hellmann

Thanks, done.

https://www.wikidata.org/wiki/Wikidata:Project_chat#Handelsregister


On 15.10.2017 22:10, Yaroslav Blanter wrote:

Hi Sebastian,

I would say the best way is to file a request for the permissions for 
the bot


https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot

and possibly leave a message on the Project Chat

https://www.wikidata.org/wiki/Wikidata:Project_chat

Cheers
Yaroslav

On Sun, Oct 15, 2017 at 9:44 AM, Sebastian Hellmann <hellm...@informatik.uni-leipzig.de> wrote:


Hi all,

the German business registry contains roughly 2.2 million
organisations. Some information is paid, but other information is public, i.e.
the info you find when searching there and clicking on UT (see the example
below):

https://www.handelsregister.de/rp_web/mask.do?Typ=e


I would like to add this to Wikidata, either by crawling or by
raising money to use crowdsourcing platforms like CrowdFlower or
Amazon Mechanical Turk.


It should meet notability criteria 2:
https://www.wikidata.org/wiki/Wikidata:Notability


2. It refers to an instance of a *clearly identifiable conceptual
or material entity*. The entity must be notable, in the sense
that it *can be described using serious and publicly available
references*. If there is no item about you yet, you are probably
not notable.



The reference is the official German business registry, which is
serious and public. Orgs are also by definition clearly
identifiable legal entities.

How can I get clearance to proceed on this?

All the best,
Sebastian



  Entity data


Saxony District court *Leipzig HRB 32853 *– A
Dienstleistungsgesellschaft mbH
Legal status:   Gesellschaft mit beschränkter Haftung
Capital:        25.000,00 EUR
Date of entry:  29/08/2016
(When entering date of entry, wrong data input can occur due to
system failures!)
Date of removal:        -
Balance sheet available:        -
Address (subject to correction):        A Dienstleistungsgesellschaft mbH
Prager Straße 38-40
04317 Leipzig


-- 
All the best,

Sebastian Hellmann

Director of Knowledge Integration and Linked Data Technologies
(KILT) Competence Center
at the Institute for Applied Informatics (InfAI) at Leipzig University
Executive Director of the DBpedia Association
Projects: http://dbpedia.org, http://nlp2rdf.org,
http://linguistics.okfn.org, https://www.w3.org/community/ld4lt
<http://www.w3.org/community/ld4lt>
Homepage: http://aksw.org/SebastianHellmann
<http://aksw.org/SebastianHellmann>
Research Group: http://aksw.org

___
Wikidata mailing list
Wikidata@lists.wikimedia.org <mailto:Wikidata@lists.wikimedia.org>
https://lists.wikimedia.org/mailman/listinfo/wikidata
<https://lists.wikimedia.org/mailman/listinfo/wikidata>




___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


--
All the best,
Sebastian Hellmann

Director of Knowledge Integration and Linked Data Technologies (KILT) 
Competence Center

at the Institute for Applied Informatics (InfAI) at Leipzig University
Executive Director of the DBpedia Association
Projects: http://dbpedia.org, http://nlp2rdf.org, 
http://linguistics.okfn.org, https://www.w3.org/community/ld4lt 
<http://www.w3.org/community/ld4lt>

Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] Kickstartet: Adding 2.2 million German organisations to Wikidata

2017-10-15 Thread Sebastian Hellmann

Hi all,

the German business registry contains roughly 2.2 million organisations. 
Some information is paid, but other parts are public, i.e. the info you 
find by searching at the URL below and clicking on UT (see example below):


https://www.handelsregister.de/rp_web/mask.do?Typ=e


I would like to add this to Wikidata, either by crawling or by raising 
money to use crowdsourcing platforms like CrowdFlower or Amazon Mechanical Turk.



It should meet notability criteria 2: 
https://www.wikidata.org/wiki/Wikidata:Notability


2. It refers to an instance of a *clearly identifiable conceptual or 
material entity*. The entity must be notable, in the sense that it 
*can be described using serious and publicly available references*. If 
there is no item about you yet, you are probably not notable.




The reference is the official German business registry, which is serious 
and public. Orgs are also by definition clearly identifiable legal 
entities.


How can I get clearance to proceed on this?

All the best,
Sebastian



 Entity data


Saxony District court *Leipzig HRB 32853 *– A 
Dienstleistungsgesellschaft mbH

Legal status:   Gesellschaft mit beschränkter Haftung
Capital:        25.000,00 EUR
Date of entry:  29/08/2016
(When entering date of entry, wrong data input can occur due to system 
failures!)

Date of removal:        -
Balance sheet available:        -
Address (subject to correction):        A Dienstleistungsgesellschaft mbH
Prager Straße 38-40
04317 Leipzig


--
All the best,
Sebastian Hellmann

Director of Knowledge Integration and Linked Data Technologies (KILT) 
Competence Center

at the Institute for Applied Informatics (InfAI) at Leipzig University
Executive Director of the DBpedia Association
Projects: http://dbpedia.org, http://nlp2rdf.org, 
http://linguistics.okfn.org, https://www.w3.org/community/ld4lt 
<http://www.w3.org/community/ld4lt>

Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] CrossWikiFact Wikimedia Grant proposal

2017-09-22 Thread Sebastian Hellmann

Dear all,

we have created a draft proposal for a website that will hopefully help 
merge/unify/improve Infoboxes, Wikidata and DBpedia.


The proposal is still in draft form (deadline September 26th); we 
would be very happy to hear your comments (and there is still time to 
correct things we forgot, didn't know about or didn't phrase right).


All feedback is welcome: 
https://meta.wikimedia.org/wiki/Grants:Project/DBpedia/CrossWikiFact



We also created a prototype here:

http://downloads.dbpedia.org/temporary/crosswikifact/results/q64.html

There is no index yet, so you have to manually change the Q-number in the 
URI (e.g. q1.html); we only did 10k pages.






--
All the best,
Sebastian Hellmann

Director of Knowledge Integration and Linked Data Technologies (KILT) 
Competence Center

at the Institute for Applied Informatics (InfAI) at Leipzig University
Executive Director of the DBpedia Association
Projects: http://dbpedia.org, http://nlp2rdf.org, 
http://linguistics.okfn.org, https://www.w3.org/community/ld4lt 
<http://www.w3.org/community/ld4lt>

Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] DBpedia Newsletter

2017-03-03 Thread Sebastian Hellmann

Dear Wikidata Community,

please find below the first edition of the DBpedia Newsletter, which 
also tries to raise awareness for Wikidata. We will send these at a 
three-month interval to all subscribers (~1900 at the moment) and would 
welcome anyone who is interested to sign up 
(http://eepurl.com/crrrkz). We decided to use the newsletter as the 
major outlet for updates on DBpedia, so no further Newsletter editions 
will be sent to this list. Smaller, concise announcements will be 
posted on our blog.


Newsletter sign-up form: http://eepurl.com/crrrkz

Regarding Wikidata, we would also welcome contributions to the DBpedia 
Newsletter that help to further push Wikidata and increase contributions 
to your project. Please send us an email to dbpe...@infai.org 
(Announcement/News can include links and should be less than 100 words)


Thank you very much,

Sebastian


   *(Belated) Happy New Year to all DBpedians!*

We are proud to look back at a successful year 2016 and 
would like to welcome 2017 with our new newsletter (~4 times per year), 
which will allow you to stay up to date and provide feedback, e.g. via 
the included survey and the planned events in 2017.


*New members *- We kickstarted 
the DBpedia Association with a first wave of new 
members: OpenLink Software, Semantic Web Company, Ontotext, FIZ 
Karlsruhe, Mannheim University, Poznań University of Economics and 
Business, the Business Information Systems Institute Ltd and the Open 
Knowledge Foundation Greece joined the DBpedia Association in 2016. We 
created two web pages that summarize reasons 
to support DBpedia via donations 
and to become a member.


*DBpedia in Dutch* (http://nl.dbpedia.org) is 
the first chapter which formalized an official DBpedia Chapter 
consortium. The cooperation was initiated by Koninklijke Bibliotheek 
(National Library of the Netherlands) and Huygens ING (research 
institute of History and Culture). Other partners like imec/Ghent 
University, the Netherlands Institute of Sound and Vision and the 
Network Institute (Vrije Universiteit) joined as well. This cooperation 
will strengthen the DBpedia Dutch chapter and community of contributors 
as well as improve the cooperation with the Dutch research infrastructure 
and the Dutch Digital Heritage.


We are in intensive contact with the Japanese, the German and the 
Italian chapters to form official chapters and gain institutional 
support for DBpedia. Would you like to know how to start a DBpedia 
chapter? Please check 
here: http://wiki.dbpedia.org/about/language-chapters


*DBpedia Releases *- Since 2015 we have provided bi-annual DBpedia releases. 
The latest one, 2016-04, featured a new 
Wikipedia references & citation dataset for the English Wikipedia. In 
addition, we separated the data which we received from heuristic-based 
infobox extractors from the already mapped and quality-controlled data 
to remove low-quality data and avoid duplicates & low-quality 
information. We improved on our Wikidata extraction 
and mapped all our data to Wikidata-based DBpedia IRIs. Wikidata will 
become more and more the core of future extractions.


*Consortium partners for EU and national projects - *Since its 
conception DBpedia has never been funded directly, but was supported 
only by individual tasks in different projects and by the spirit of 
volunteers. With the establishment of the DBpedia Association, we are 
now in a position to receive direct funding and are eligible as a full 
consortium partner for EU and International projects. Furthermore, we 
are able to mediate trusted partners - who have worked with us in the 
past and contributed greatly to DBpedia - for national proposals in over 
30 countries worldwide (contact via reply to this email.)


*Events - *In addition to our participation in the 12th edition of the 
SEMANTiCS, in the 19th International Conference on Business Information 
Systems and in the Google Summer of Code project, we arranged three 
DBpedia community meetings, in The Hague, Leipzig and Sunnyvale, 
California. In order to get updates on our events and projects, stay 
tuned and check the DBpedia blog. Please check out the dates of the two new 
DBpedia meetings below.


*Links - *We have revived the portal, that 

Re: [Wikidata-l] [Dbpedia-discussion] [Dbpedia-developers] DBpedia-based RDF dumps for Wikidata

2015-03-12 Thread Sebastian Hellmann
Your description sounds quite close to what we had in mind. The high-level 
group is taking shape quite well; the domain groups are planned as pilots for 
selected domains (e.g. Law or Mobility). 

I have lost the overview of the data classification a bit. We might auto-link or 
crowdsource. I would need to ask others, however. 

We are aiming to create a structure that allows stability and innovation in an 
economic way - I see this as the real challenge... 

Jolly good show, 
Sebastian 




On 11 March 2015 20:53:55 CET, John Flynn jflyn...@verizon.net wrote:
This is a very ambitious, but commendable, goal. To map all data on the
web to the DBpedia ontology is a huge undertaking that will take many
years of effort. However, if it can be accomplished the potential
payoff is also huge and could result in the realization of a true
Semantic Web. Just as with any very large and complex software
development effort, there needs to be a structured approach to
achieving the desired results. That structured approach probably
involves a clear requirements analysis and resulting requirements
documentation. It also requires a design document and an implementation
document, as well as risk assessment and risk mitigation. While there
is no bigger believer in the "build a little, test a little" rapid
prototyping approach to development, I don't think that is appropriate
for a project of this size and complexity. Also, the size and
complexity also suggest the final product will likely be beyond the
scope of any individual to fully comprehend the overall ontological
structure. Therefore, a reasonable approach might be to break the
effort into smaller, comprehensible segments. Since this is a large
ontology development effort, segmenting the ontology into domains of
interest and creating working groups to focus on each domain might be a
workable approach. There would also need to be a working group that
focus on the top levels of the ontology and monitors the domain working
groups to ensure overall compatibility and reduce the likelihood of
duplicate or overlapping concepts in the upper levels of the ontology
and treats universal concepts such as  space and time consistently.
There also needs to be a clear, and hopefully simple, approach to
mapping data on the web to the DBpedia ontology that will accommodate
both large data developers and web site developers.  It would be
wonderful to see the worldwide web community get behind such an
initiative and make rapid progress in realizing this commendable goal.
However, just as special interests defeated the goal of having a
universal software development approach (Ada), I fear the same sorts of
special interests will likely result in a continuation of the current
myriad development efforts. I understand the "one size doesn't fit all"
arguments, but I also think "one size could fit a whole lot" could be
the case here. 
 
Respectfully,
 
John Flynn
http://semanticsimulations.com
 
 
From: Sebastian Hellmann [mailto:hellm...@informatik.uni-leipzig.de] 
Sent: Wednesday, March 11, 2015 3:12 AM
To: Tom Morris; Dimitris Kontokostas
Cc: Wikidata Discussion List; dbpedia-ontology;
dbpedia-discuss...@lists.sourceforge.net; DBpedia-Developers
Subject: Re: [Dbpedia-discussion] [Dbpedia-developers] DBpedia-based
RDF dumps for Wikidata
 
Dear Tom,

let me try to answer this question in a more general way. In the
future, we honestly consider mapping all data on the web to the DBpedia
ontology (extending it where it makes sense). We hope that this will
enable you to query many data sets on the Web using the same queries. 


As a convenience measure, we will get a huge download server that
provides all data from a single point in consistent formats with
consistent metadata, classified by the DBpedia Ontology. Wikidata is
just one example; there are also Commons, Wiktionary (hopefully via
DBnary), data from companies, DBpedia members and EU projects. 

all the best,
Sebastian

On 11.03.2015 06:11, Tom Morris wrote:
Dimitris, Soren, and DBpedia team, 
 
That sounds like an interesting project, but I got lost between the
statement of intent, below, and the practical consequences:
 
On Tue, Mar 10, 2015 at 5:05 PM, Dimitris Kontokostas
kontokos...@informatik.uni-leipzig.de wrote:
we made some different design choices and map wikidata data directly
into the DBpedia ontology.
 
What, from your point of view, is the practical consequence of these
different design choices?  How do the end results manifest themselves
to the consumers?
 
Tom
 





[Wikidata-l] 2nd LIDER Hackathon Preparation call: today at 14:00

2014-08-19 Thread Sebastian Hellmann

Apologies for cross-posting.

This is a kind reminder about the weekly Google Hangout to prepare for 
the LIDER Hackathon in Leipzig (Sept 1st).
The preparation Hangouts will happen each Tuesday at 2pm Leipzig time 
until the event.

Links to join can be found here:
http://mlode2014.nlp2rdf.org/hackathon/

You are still able to submit topics for hacking. Please add them to this 
document:

https://docs.google.com/document/d/13riJU5LY50Q6AeHzlkIqln9enlq9a1EDsOyp6C4XqNk/edit#
or send an email to Bettina Klimek kli...@informatik.uni-leipzig.de

Currently we have the confirmed topics below. Furthermore we have 
experts available that will help you to get in touch with Linked Data 
and RDF and help you to bring your own tools to the Semantic Web world.



   T7: [Confirmed] Roundtrip conversion from TBX2RDF and back


The idea of this is to work on a roundtrip conversion from the TBX 
standard for representing terminology to RDF and back.  The idea would 
be to build on the existing code at bitbucket: 
https://bitbucket.org/vroddon/tbx2rdf



Potential industry partner: TILDE (Tatiana)

Source code: https://bitbucket.org/vroddon/tbx2rdf

TBX Standard: http://www.ttt.org/oscarstandards/tbx/


Contact person: Philipp Cimiano, John McCrae, Victor Rodriguez-Doncel


   T8: [Confirmed] Converting multilingual dictionaries as LD on the Web


The experience of creating the Apertium RDF dictionaries 
(http://linguistic.linkeddata.es/apertium/dictionaries) will be 
presented. Taking as a starting point a bilingual dictionary represented 
in LMF/XML, a mapping into RDF was made by using tools such as Open 
Refine (http://openrefine.org/). From each bilingual dictionary three 
components (graphs) were created in RDF: two lexicons and a translation 
set. The vocabularies used were lemon (http://lemon-model.net/) for 
representing lexical information and the translation module 
(http://linguistic.linkeddata.es/def/translation/) for representing 
translations. Once they were published on the Web, some immediate 
benefits arise, such as: automatic enrichment of the monolingual lexicons 
each time a new dictionary is published (due to the reuse of URIs), simple 
graph-based navigation across the lexical information and, more 
interestingly, simple querying across (initially) independent dictionaries.



The task could be either to reproduce part of the Apertium generation 
process, for those willing to learn about lemon and about techniques for 
representing translations in RDF, or to repeat the process with other 
input data (bilingual or multilingual lexica) provided by participants.



Contact person: Jorge Gracia


   T9: [Confirmed] Based on the NIF-LD output of Babelfy we can try to
   deploy existing RDF visualizations out of the box and query the
   output with SPARQL


Babelfy (http://babelfy.org/) is a unified, multilingual, graph-based 
approach to Entity Linking and Word Sense Disambiguation. Based on a 
loose identification of candidate meanings, coupled with a densest 
subgraph heuristic which selects high-coherence semantic 
interpretations, Babelfy is able to annotate free text with both 
concepts and named entities drawn from the sense inventory of BabelNet 
(http://www.babelnet.org/).



The task consists of converting text annotated by Babelfy into RDF 
format. In order to accomplish this, participants will start from free 
text, will annotate it with Babelfy and will eventually make use of the 
NLP2RDF NIF module http://site.nlp2rdf.org/. Data can also be 
displayed using visualization tools such as RelFinder 
http://www.visualdataweb.org/relfinder/relfinder.php.



Contact person: Tiziano Flati (fl...@di.uniroma1.it 
mailto:fl...@di.uniroma1.it), Roberto Navigli (navi...@di.uniroma1.it 
mailto:navi...@di.uniroma1.it)






--
Sebastian Hellmann
AKSW/NLP2RDF research group
Institute for Applied Informatics (InfAI) and DBpedia Association
Events:
* *Sept. 1-5, 2014* Conference Week in Leipzig, including
** *Sept 2nd*, MLODE 2014 http://mlode2014.nlp2rdf.org/
** *Sept 3rd*, 2nd DBpedia Community Meeting 
http://wiki.dbpedia.org/meetings/Leipzig2014

** *Sept 4th-5th*, SEMANTiCS (formerly i-SEMANTICS) http://semantics.cc/
Venha para a Alemanha como PhD: http://bis.informatik.uni-leipzig.de/csf
Projects: http://dbpedia.org, http://nlp2rdf.org, 
http://linguistics.okfn.org, https://www.w3.org/community/ld4lt 
http://www.w3.org/community/ld4lt

Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org
Thesis:
http://tinyurl.com/sh-thesis-summary
http://tinyurl.com/sh-thesis
___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


[Wikidata-l] Industry talks and Hackathon @ MLODE 2014, Sept. 1-2 in Leipzig, Germany

2014-08-11 Thread Sebastian Hellmann

Call for Participation

Multilingual Linked Open Data (MLODE) 2014, Sept 1-2 in Leipzig, Germany
http://mlode2014.nlp2rdf.org/
Free registration, but space is limited.
Co-located with SEMANTiCS 2014 (http://semantics.cc)


   Industry speakers lined up to discuss use cases and requirements
   for linked data and content analytics

The agenda of the 4th LIDER roadmapping workshop and LD4LT event 
http://mlode2014.nlp2rdf.org/lider-roadmapping-workshop/ has been 
published. A great variety of industry stakeholders will talk about 
linked data and content analytics. Industry areas represented include 
content analytics technology, multilingual conversational applications, 
localisation and more.


The LIDER roadmapping workshop will take place on September 2nd in 
Leipzig, Germany and it will be collocated with the SEMANTiCS conference 
http://www.semantics.cc/. The workshop will be organised as part of 
MLODE 2014 http://mlode2014.nlp2rdf.org/ and will be preceded by a 
hackathon http://mlode2014.nlp2rdf.org/hackathon/ on the 1st of September.


The event is supported by the LIDER http://lider-project.eu/ EU 
project, the MultilingualWeb http://www.multilingualweb.eu/ community, 
the NLP2RDF http://nlp2rdf.org/ project as well as the DBpedia 
http://dbpedia.org/ project.



   3 Public hackathon preparation hangouts

In order to start brainstorming for the Hackathon on September 1st, we 
would like to invite you to the preparation hangouts each Tuesday at 
2pm. Please join here:



12/08 - 2 pm 
https://plus.google.com/events/c71fsqdkknppi5sj8fdr1cruam0?authkey=CKjBxoaQtP68zgE
19/08 - 2 pm 
https://plus.google.com/events/cpdamst0pl0vugsjtivleo6hjdk?authkey=CPHyjIn99Yi7NQ
26/08 - 2 pm 
https://plus.google.com/events/cbu14mvpfke9cpqicqlt834p3tk?authkey=CMnOwtq9k6D71gE


or via the NLP2RDF Google+ Community:
https://plus.google.com/communities/104767637055806635343

--
Sebastian Hellmann
AKSW/NLP2RDF research group
Institute for Applied Informatics (InfAI) and DBpedia Association
Events:
* *Sept. 1-5, 2014* Conference Week in Leipzig, including
** *Sept 2nd*, MLODE 2014 http://mlode2014.nlp2rdf.org/
** *Sept 3rd*, 2nd DBpedia Community Meeting 
http://wiki.dbpedia.org/meetings/Leipzig2014

** *Sept 4th-5th*, SEMANTiCS (formerly i-SEMANTICS) http://semantics.cc/
Venha para a Alemanha como PhD: http://bis.informatik.uni-leipzig.de/csf
Projects: http://dbpedia.org, http://nlp2rdf.org, 
http://linguistics.okfn.org, https://www.w3.org/community/ld4lt 
http://www.w3.org/community/ld4lt

Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org
Thesis:
http://tinyurl.com/sh-thesis-summary
http://tinyurl.com/sh-thesis
___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


[Wikidata-l] Fellow ticket discount for the SEMANTiCS conference Leipzig

2014-08-06 Thread Sebastian Hellmann

Dear colleagues and fellow enthusiast of the semantic web,

we are pleased to announce that the SEMANTiCS conference on Sep 4-5, 
2014 in Leipzig, will become a major industry conference on semantic web 
and linked data applications in Europe.


With partners such as PoolParty, STI2, Eccenca, IntraFind, LOD2 Project, 
MarkLogic, Ontos and Wolters Kluwer as well as more than 50% of our 
currently registered attendees being from the industry sector, SEMANTiCS 
has lifted the primarily academic topic of semantic web to the next 
level of business application.
From our keynote speakers Sofia Angeletou (BBC), Thomas Kelly 
(Cognizant), Phil Archer (W3C) and Orri Erling (OpenLink) to 40+ 
speakers on 5 parallel tracks and special events like the vocabulary 
carnival, H2020 networking event and conference dinner, SEMANTiCS offers 
a wide variety of industry insights and networking chances. You can see 
our programme here: http://www.semantics.cc/programme/


Hence, this year's SEMANTiCS conference is your chance to get in touch 
with potential business clients and industry partners to push your own 
projects and developments in the semantic web sector.


You can still submit to the Vocabulary Carnival: 
http://www.semantics.cc/vocarnival/ as well as the H2020 networking 
session. Furthermore, we will collect and print your H2020 organisation 
profile description in the program guide, so you can be approached at 
the conference for potential projects.


Being a fellow enthusiast in this future-defining field, we'd like to 
offer you a special discount of 20% on your ticket to the conference. 
Simply go to www.semantics.cc/registration/discount and claim your 
discount with the following promo code: “semantic-web-fellow”

This offer is valid until 15th of August.

For further information on the programme and our keynote speaker, please 
visit www.semantics.cc

Feel free to forward this email to any interested fellow.

See you in Leipzig,

Sebastian Hellmann
on behalf of the all conference committee members


--
Sebastian Hellmann
AKSW/NLP2RDF research group
Institute for Applied Informatics (InfAI) and DBpedia Association
Events:
* *Sept. 1-5, 2014* Conference Week in Leipzig, including
** *Sept 2nd*, MLODE 2014 http://mlode2014.nlp2rdf.org/
** *Sept 3rd*, 2nd DBpedia Community Meeting 
http://wiki.dbpedia.org/meetings/Leipzig2014

** *Sept 4th-5th*, SEMANTiCS (formerly i-SEMANTICS) http://semantics.cc/
Venha para a Alemanha como PhD: http://bis.informatik.uni-leipzig.de/csf
Projects: http://dbpedia.org, http://nlp2rdf.org, 
http://linguistics.okfn.org, https://www.w3.org/community/ld4lt 
http://www.w3.org/community/ld4lt

Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org
Thesis:
http://tinyurl.com/sh-thesis-summary
http://tinyurl.com/sh-thesis
___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


[Wikidata-l] Pubsubhubbub

2014-03-28 Thread Sebastian Hellmann
Hi all,
some weeks ago Anja asked me to send an email to this list with requirements 
from DBpedia regarding the Pubsubhubbub feed.

We are really happy that finally somebody started working on this.

The main thing DBpedia needs is the software to create an up-to-date mirror of 
each language version of Wikipedia. All other requirements can be deduced from 
this one. It would be bad for us if this were out of scope or not working 
correctly at the end of the project.

All the best,
Sebastian
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


[Wikidata-l] ANN, the new DBpedia Association

2014-03-20 Thread Sebastian Hellmann

(please forward)

Dear all,

we are very happy to announce that we have succeeded in the formation of 
a new organization to support DBpedia and its community.
The DBpedia Association is now officially in action. In the coming 
months, we hope to raise funding to reach some of the goals outlined in 
our charter: http://dbpedia.org/association


At the European Data Forum in Athens today and tomorrow, you are able to 
meet many of the people from the DBpedia Community who have helped to 
create the DBpedia Association. For example, Martin Kaltenböck, Michael 
Martin and Dimitris Kontokostas will be at the LOD2 Booth at EDF, 
and Asunción Gómez Pérez will be at the LD4LT side event on Friday 
(just to name a few).


From September 1st-5th in Leipzig, we hope to gather everyone to 
celebrate this great achievement. Especially on the 3rd of September, 
when we will have the 2nd DBpedia Community meeting, which is 
co-located with the SEMANTiCS 2014 (formerly i-SEMANTICS) on September 4-5.


The people who have all worked together to create this wealth of value 
under the DBpedia name are so numerous that we are hardly able to know 
their exact number or all their names. For proper acknowledgement, as a 
first action the DBpedia Association will start to give out Linked Data 
URIs during the next months for all its contributors and supporters.


Personally, I am very proud to live in such a great age of collaboration 
where we are able to work together across borders and institutions.


Hope to see you in person in September or earlier as linked data under 
the http://dbpedia.org/community/$contributor namespace.

Sebastian Hellmann


--
Sebastian Hellmann
AKSW/NLP2RDF research group
Institute for Applied Informatics (InfAI) affiliated with DBpedia
Events:
* *21st March, 2014*: LD4LT Kick-Off 
https://www.w3.org/community/ld4lt/wiki/LD4LT_Group_Kick-Off_and_Roadmap_Meeting 
@European Data Forum

* *Sept. 1-5, 2014* Conference Week in Leipzig, including
** *Sept 2nd*, MLODE 2014
** *Sept 3rd*, 2nd DBpedia Community Meeting
** *Sept 4th-5th*, SEMANTiCS (formerly i-SEMANTICS) http://semantics.cc/
Venha para a Alemanha como PhD: http://bis.informatik.uni-leipzig.de/csf
Projects: http://dbpedia.org, http://nlp2rdf.org, 
http://linguistics.okfn.org, https://www.w3.org/community/ld4lt 
http://www.w3.org/community/ld4lt

Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org
Thesis:
http://tinyurl.com/sh-thesis-summary
http://tinyurl.com/sh-thesis
___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


[Wikidata-l] CfP SEMANTiCS (formerly known as I-SEMANTICS) September 4-5, 2014

2014-02-13 Thread Sebastian Hellmann
. SEMANTiCS is the place to 
dive deep into semantic web and linked data technologies.


Community: Make new friends and renew old acquaintances. SEMANTiCS is 
the meeting place to discuss and to initiate new projects, to extend 
ongoing activities, and to showcase the latest developments.


On behalf of all the people organizing SEMANTiCS 2014, we hope to see 
you in September in Leipzig,
Sebastian Hellmann (AKSW, DBpedia), Harald Sack 
(Hasso-Plattner-Institut), Agata Filipowska (Poznan University of 
Economics), Christian Dirschl (Wolters Kluwer) and Andreas Blumenauer 
(Semantic Web Company)



--
Sebastian Hellmann
AKSW/NLP2RDF research group
Institute for Applied Informatics (InfAI) affiliated with DBpedia
Events:
* *21st March, 2014*: LD4LT Kick-Off 
https://www.w3.org/community/ld4lt/wiki/LD4LT_Group_Kick-Off_and_Roadmap_Meeting 
@European Data Forum

* *Sept. 1-5, 2014* Conference Week in Leipzig, including
** *Sept 2nd*, MLODE 2014
** *Sept 3rd*, 2nd DBpedia Community Meeting
** *Sept 4th-5th*, SEMANTiCS (formerly i-SEMANTICS) http://semantics.cc/
Venha para a Alemanha como PhD: http://bis.informatik.uni-leipzig.de/csf
Projects: http://dbpedia.org, http://nlp2rdf.org, 
http://linguistics.okfn.org, https://www.w3.org/community/ld4lt 
http://www.w3.org/community/ld4lt

Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org
Thesis:
http://tinyurl.com/sh-thesis-summary
http://tinyurl.com/sh-thesis
___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


[Wikidata-l] Call for Ideas and Mentors for GSoC 2014 DBpedia + Spotlight joint proposal (please contribute within the next days)

2014-02-11 Thread Sebastian Hellmann

Dear all,
Dimitris Kontokostas has started to draft a document for submission at 
Google Summer of Code:

http://dbpedia.org/gsoc2014

We are still in need of ideas and mentors. If you have any improvements 
to DBpedia or DBpedia Spotlight that you would like to have done, please 
submit them in the ideas section now. Note that accepted GSoC students 
will receive about 5000 USD, which can help you to estimate the effort 
and size of proposed ideas. It is also ok to extend/amend existing ideas 
(as long as you don't hijack them). Please edit here:

https://docs.google.com/document/d/13YcM-LCs_W3-0u-s24atrbbkCHZbnlLIK3eyFLd7DsI/edit?pli=1

Becoming a mentor is also a very good way to get involved with DBpedia. 
As a mentor you will also be able to vote on proposals, after Google 
accepts our project. Note that it is also ok, if you are a researcher 
with a suitable student, to submit an idea and become a mentor. After 
acceptance by Google, the student then has to apply for the idea and get 
accepted.


Please take some time this week to add your ideas and apply as a mentor, 
if applicable. Feel free to improve the introduction as well and comment 
on the rest of the document.


Information on GSoC in general can be found here:
http://www.google-melange.com/gsoc/homepage/google/gsoc2014

Thank you for your help,
Sebastian and Dimitris



--
Sebastian Hellmann
Department of Computer Science, University of Leipzig
Events:
* *21st March, 2014*: LD4LT Kick-Off 
https://www.w3.org/community/ld4lt/wiki/LD4LT_Group_Kick-Off_and_Roadmap_Meeting 
@European Data Forum

* *Sept. 1-5, 2014* Conference Week in Leipzig, including
** *MLODE 2014*
** *Sept 3rd 2014* 2nd DBpedia Community Meeting
** *Sept. 4th-5th 2014* SEMANTiCs (formerly i-SEMANTICS) 
http://semantics.cc/

Venha para a Alemanha como PhD: http://bis.informatik.uni-leipzig.de/csf
Projects: http://dbpedia.org, http://nlp2rdf.org, 
http://linguistics.okfn.org, http://dbpedia.org/Wiktionary

Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org
Stop asking, it's here:
http://tinyurl.com/sh-thesis-summary
http://tinyurl.com/sh-thesis
___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


[Wikidata-l] Reminder DBpedia Community meeting, 30th January at VU Amsterdam

2014-01-22 Thread Sebastian Hellmann

Dear DBpedians,
organisation of the DBpedia Community meeting in Amsterdam is 
progressing quite well for the short time we had to organise it.


Here are some news:
* Schedule: the meeting page and schedule has been (and constantly is being) 
updated: http://wiki.dbpedia.org/meetings/Amsterdam2014
* Twitter hashtag: #DBpediaAmsterdam: 
https://twitter.com/search?q=%23DBpediaAmsterdam&f=realtime
* Lunch and drinks are sponsored by the Semantic Web Company 
(http://semantic-web.at) and the National library of the Netherlands 
(http://www.kb.nl/)
* Almost 50 participants have registered since the last announcement 
email: http://nl.dbpedia.org/DBpediaMeeting/


# Sponsors:
We still have one open slot for a sponsor, who will be linked on the 
page and prominently placed in the summary report as well.


# Presentation submission open:
http://tinyurl.com/DBpedia-amsterdam-2014

# Register
Please register right now here:
https://docs.google.com/spreadsheet/ccc?key=0AgXBFSqdVAOndDEwNDY2MkpYUHM5cl9vS3dhTkJ2YkE#gid=0


Hope to see you in Amsterdam,
Gerard, Gerald (from Dutch DBpedia), Lora and Victor from VU Amsterdam,  
Mariano from the Spanish DBpedia and Dimitris and Sebastian



--
Sebastian Hellmann
Department of Computer Science, University of Leipzig
Events:
* *30th January, 2014*: 1st DBpedia Community meeting 
(http://wiki.dbpedia.org/meetings/Amsterdam2014)

* *Sept. 2014* MLODE
Venha para a Alemanha como PhD: http://bis.informatik.uni-leipzig.de/csf
Projects: http://dbpedia.org, http://nlp2rdf.org, 
http://linguistics.okfn.org, http://dbpedia.org/Wiktionary

Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org
Stop asking, it's here:
http://tinyurl.com/sh-thesis-summary
http://tinyurl.com/sh-thesis
___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] Wikidata RDF export available

2013-08-10 Thread Sebastian Hellmann

Hi Markus!
Thank you very much.

Regarding your last email:
Of course, I am aware of your arguments in your last email, that the 
dump is not official. Nevertheless, I am expecting you and others to 
code (or supervise) similar RDF dumping projects in the future.


Here are two really important things to consider:

1. Always use a mature RDF framework for serializing:
Even DBpedia published RDF with errors in it for years; this was really 
frustrating for maintainers (handling bug reports) and 
clients (trying to quick-fix it).
Other small projects (in fact exactly the same situation as yours, Markus: a 
guy publishing some useful software) went the same way: lots of small 
syntax bugs, many bug reports, a lot of additional work. Some of them 
were abandoned because the developer didn't have time anymore.
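
To make point 1 concrete, here is a minimal sketch (mine, not from the
original thread) of letting a mature library such as rdflib do the
serialization; the example triple is invented for illustration:

    # Minimal sketch: let rdflib handle escaping and URI encoding instead of
    # writing Turtle/N-Triples by hand. The triple below is made up.
    from rdflib import Graph, Literal, URIRef
    from rdflib.namespace import RDFS

    g = Graph()
    g.add((URIRef("http://www.wikidata.org/entity/Q64"),
           RDFS.label,
           Literal("Berlin", lang="de")))

    # "nt" writes one triple per line with all special characters escaped.
    g.serialize(destination="example.nt", format="nt")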


2. Use NTriples or one-triple-per-line Turtle:
(Turtle supports IRIs and unicode, compare)
curl 
http://downloads.dbpedia.org/3.8/ko/mappingbased_properties_ko.ttl.bz2 | 
bzcat | head
curl 
http://downloads.dbpedia.org/3.8/ko/mappingbased_properties_ko.nt.bz2 | 
bzcat | head


one-triple-per-line lets you
a) find errors more easily and
b) do further processing, e.g. calculate the outdegree of subjects:
curl 
http://downloads.dbpedia.org/3.8/ko/mappingbased_properties_ko.ttl.bz2 | 
bzcat | head -100 | cut -f1 -d '>' | grep -v '^#' | sed 's/<//;s/>//' | 
awk '{count[$1]++}END{for(j in count) print j "\t" count[j]}'


Furthermore:
- Parsers can treat one-triple-per-line data more robustly, by just skipping broken lines (see the sketch below)
- compression size is the same
- alphabetical ordering of data works well (e.g. for GitHub diffs)
- you can split the files into several smaller files easily
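
As a rough Python equivalent of the shell pipeline above (my own sketch,
with a placeholder file name), streaming a bzip2-compressed
one-triple-per-line file, skipping broken lines and counting triples per
subject could look like this:

    # Minimal sketch: count triples per subject (outdegree) in a
    # one-triple-per-line N-Triples/Turtle file; malformed lines are skipped.
    # The file name is a placeholder.
    import bz2
    from collections import Counter

    outdegree = Counter()
    with bz2.open("mappingbased_properties_ko.nt.bz2", "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip comments and blank lines
            parts = line.split(None, 2)  # subject, predicate, rest
            if len(parts) < 3:
                continue  # a broken line only costs one triple
            outdegree[parts[0].strip("<>")] += 1

    for subject, count in outdegree.most_common(10):
        print(subject, count, sep="\t")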


Blank nodes have some bad properties:
- some databases react weirdly to them, and they sometimes fill up indexes 
and make the DB slow (this depends on the implementation of course; this is 
just my experience)

- make splitting one-triple-per-line more difficult
- difficult for SPARQL to resolve recursively
- see http://videolectures.net/iswc2011_mallea_nodes/ or 
http://web.ing.puc.cl/~marenas/publications/iswc11.pdf



Turtle prefixes:
Why do you think they are a good thing? They are sometimes disputed 
as a premature feature. They do make data more readable, but nobody is 
going to read 4.4 GB of Turtle.

By the way, you can always convert it to turtle easily:
curl 
http://downloads.dbpedia.org/3.8/ko/mappingbased_properties_ko.ttl.bz2 | 
bzcat | head -100  | rapper -i turtle -o turtle -I - - file


All the best,
Sebastian



Am 10.08.2013 12:44, schrieb Markus Krötzsch:
Good morning. I just found a bug that was caused by a bug in the 
Wikidata dumps (a value that should be a URI was not). This led to a 
few dozen lines with illegal qnames of the form w: . The updated 
script fixes this.


Cheers,

Markus

On 09/08/13 18:15, Markus Krötzsch wrote:

Hi Sebastian,

On 09/08/13 15:44, Sebastian Hellmann wrote:

Hi Markus,
we just had a look at your python code and created a dump. We are still
getting a syntax error for the turtle dump.


You mean "just" as in "at around 15:30 today" ;-)? The code is under
heavy development, so changes are quite frequent. Please expect things
to be broken in some cases (this is just a little community project, not
part of the official Wikidata development).

I have just uploaded a new statements export (20130808) to
http://semanticweb.org/RDF/Wikidata/ which you might want to try.



I saw, that you did not use a mature framework for serializing the
turtle. Let me explain the problem:

Over the last 4 years, I have seen about two dozen people 
(undergraduate

and PhD students, as well as Post-Docs) implement simple serializers
for RDF.

They all failed.

This was normally not due to a lack of skill, but due to a lack of
time. They wanted to do it quickly, but they didn't have the time
to implement it correctly in the long run.
There are some really nasty problems ahead, like encoding or special
characters in URIs. I would strongly advise you to:

1. use a Python RDF framework
2. do some syntax tests on the output, e.g. with rapper
3. use a line by line format, e.g. use turtle without prefixes and just
one triple per line (It's like NTriples, but with Unicode)


Yes, URI encoding could be difficult if we were doing it manually. Note,
however, that we are already using a standard library for URI encoding
in all non-trivial cases, so this does not seem to be a very likely
cause of the problem (though some non-zero probability remains). In
general, it is not unlikely that there are bugs in the RDF somewhere;
please consider this export as an early prototype that is meant for
experimentation purposes. If you want an official RDF dump, you will
have to wait for the Wikidata project team to get around doing it (this
will surely be based on an RDF library). Personally, I already found the
dump useful (I successfully imported some 109 million triples of some
custom script into an RDF store), but I know that it can require

Re: [Wikidata-l] Wikidata RDF export available

2013-08-09 Thread Sebastian Hellmann
 for usage instructions
[2] http://semanticweb.org/RDF/Wikidata/
[3] http://meta.wikimedia.org/wiki/Wikidata/Development/RDF




___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l




--
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Events:
* NLP & DBpedia 2013 (http://nlp-dbpedia2013.blogs.aksw.org, Extended 
Deadline: *July 18th*)

* LSWT 23/24 Sept, 2013 in Leipzig (http://aksw.org/lswt)
Venha para a Alemanha como PhD: http://bis.informatik.uni-leipzig.de/csf
Projects: http://nlp2rdf.org , http://linguistics.okfn.org , 
http://dbpedia.org/Wiktionary , http://dbpedia.org

Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


[Wikidata-l] Deadline Extension - July 18th - Workshop on NLP & DBpedia at ISWC 2013

2013-07-08 Thread Sebastian Hellmann
 Stanojevic', Institut Mihajlo Pupin, Serbia
* Hans Uszkoreit, Deutsches Forschungszentrum für künstliche 
Intelligenz, Germany

* Rupert Westenthaler, Salzburg Research, Austria
* Feiyu Xu, Deutsches Forschungszentrum für künstliche Intelligenz, Germany

Contact
=
Of course we would prefer that you post any questions and comments 
regarding NLP and DBpedia to our public mailing list at: 
nlp-dbpedia-public [at] lists.informatik.uni-leipzig.de


If you want to contact the chairs of the workshop directly, please write to:
nlp-dbpedia2013 [at] easychair.org

Kind regards,
Sebastian Hellmann, Agata Filipowska, Caroline Barrière,
Pablo N. Mendes, Dimitris Kontokostas


--
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Events:
* NLP & DBpedia 2013 (http://nlp-dbpedia2013.blogs.aksw.org, Extended 
Deadline: *July 18th*)

* LSWT 23/24 Sept, 2013 in Leipzig (http://aksw.org/lswt)
Venha para a Alemanha como PhD: http://bis.informatik.uni-leipzig.de/csf
Projects: http://nlp2rdf.org , http://linguistics.okfn.org , 
http://dbpedia.org/Wiktionary , http://dbpedia.org

Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org
___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] WikiData change propagation for third parties

2013-04-26 Thread Sebastian Hellmann

Dear Jeremy,
please read the email from Daniel Kinzler on this list from 26.03.2013 18:26:


* A dispatcher needs about 3 seconds to dispatch 1000 changes to a client wiki.
* Considering we have ~300 client wikis, this means one dispatcher can handle
about 4000 changes per hour.
* We currently have two dispatchers running in parallel (on a single box, hume),
that makes a capacity of 8000 changes/hour.
* We are seeing roughly 17000 changes per hour on wikidata.org - more than twice
our dispatch capacity.
* I want to try running 6 dispatcher processes; that would give us the capacity
to handle 24000 changes per hour (assuming linear scaling).
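
For readers who want to re-trace those figures, a quick back-of-the-envelope
calculation (using only the numbers quoted above) looks like this:

    # Rough re-derivation of the quoted dispatch-capacity figures.
    seconds_per_1000_changes_per_client = 3
    client_wikis = 300
    dispatchers = 2

    seconds_per_1000_changes = seconds_per_1000_changes_per_client * client_wikis  # 900 s
    per_dispatcher = 1000 * 3600 / seconds_per_1000_changes                        # 4000 changes/hour
    capacity = dispatchers * per_dispatcher                                        # 8000 changes/hour
    print(capacity, 17000 / capacity)  # observed load is roughly 2x the capacity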


1. Somebody needs to run the Hub and it needs to scale. It looks like the 
protocol was intended to save some traffic, not to dispatch a massive 
amount of messages per day to a large number of clients. Again, I am 
not familiar with how efficient PubSubHubbub is. What kind of hardware is 
needed to run this effectively? Do you have experience with this?


2. Somebody will still need to run and maintain the Hub and feed all 
clients. I was offering to host one of the hubs for DBpedia users, but I 
am not sure whether we have that capacity.


So should we use the IRC RC feed plus an HTTP request to the changed page as a fallback?

Sebastian

Am 26.04.2013 08:06, schrieb Jeremy Baron:

  Hi,

On Fri, Apr 26, 2013 at 5:29 AM, Sebastian Hellmann
hellm...@informatik.uni-leipzig.de wrote:

Well, PubSubHubbub is a nice idea. However it clearly depends on two factors:
1. whether Wikidata sets up such an infrastructure (I need to check whether we 
have capacities, I am not sure atm)

Capacity for what? The infrastructure should not be a problem.
(famous last words, can look more closely tomorrow. but I'm really not
worried about it) And you don't need any infrastructure at all for
development; just use one of google's public instances.


2. whether performance is good enough to handle high-volume publishers

Again, how do you mean?


Basically, polling recent changes [1] and then doing an HTTP request to the 
individual pages should be fine for a start. So I guess this is what we will 
implement, if there aren't any better suggestions.
The whole issue is problematic and the DBpedia project would be happy if this 
were discussed and decided right now, so we can plan development.

What is the best practice to get updates from Wikipedia at the moment?

I believe just about everyone uses the IRC feed from
irc.wikimedia.org.
https://meta.wikimedia.org/wiki/IRC/Channels#Raw_feeds

I imagine wikidata will or maybe already does propagate changes to a
channel on that server but I can imagine IRC would not be a good
method for many Instant data repo users. Some will not be able to
sustain a single TCP connection for extended periods, some will not be
able to use IRC ports at all, and some may go offline periodically.
e.g. a server on a laptop. AIUI, PubSubHubbub has none of those
problems and is better than the current IRC solution in just about
every way.

We could potentially even replace the current cross-DB job queue
insert crazyness with PubSubHubbub for use on the cluster internally.

-Jeremy

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l




--
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Projects: http://nlp2rdf.org , http://linguistics.okfn.org , 
http://dbpedia.org/Wiktionary , http://dbpedia.org

Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] WikiData change propagation for third parties

2013-04-25 Thread Sebastian Hellmann
Well, PubSubHubbub is a nice idea. However it clearly depends on two 
factors:
1. whether Wikidata sets up such an infrastructure (I need to check 
whether we have capacities, I am not sure atm)

2. whether performance is good enough to handle high-volume publishers

Basically, polling recent changes [1] and then doing an HTTP request to 
the individual pages should be fine for a start. So I guess this is what 
we will implement, if there aren't any better suggestions.
The whole issue is problematic and the DBpedia project would be happy 
if this were discussed and decided right now, so we can plan development.
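
A minimal sketch of that polling fallback (my own illustration, not an
agreed design; the polling interval and the idea of fetching each feed
entry's link directly are assumptions):

    # Minimal sketch: poll the Atom feed of Special:RecentChanges and fetch
    # each changed page over HTTP. feedparser and requests are third-party
    # libraries (pip install feedparser requests).
    import time
    import feedparser
    import requests

    FEED = "https://www.wikidata.org/w/index.php?title=Special:RecentChanges&feed=atom"
    seen = set()

    while True:
        for entry in feedparser.parse(FEED).entries:
            if entry.id in seen:
                continue
            seen.add(entry.id)
            page = requests.get(entry.link, timeout=30)  # fetch the changed page
            print(entry.title, len(page.text))           # hand off to the updater here
        time.sleep(60)  # poll once per minute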


What is the best practice to get updates from Wikipedia at the moment?
We are still using OAI-PMH...

In DBpedia, we use a simple self-created protocol:
http://wiki.dbpedia.org/DBpediaLive#h156-4
/Publication of changesets/: Upon modifications, old triples 
are replaced with updated triples. Those added and/or deleted triples 
are also written as N-Triples files and then compressed. Any client 
application or DBpedia-Live mirror can download those files 
and integrate them and, hence, update a local copy of DBpedia. This enables 
that application to always be in synchronization with our DBpedia-Live.
This could also work for Wikidata facts, right?
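
To make the changeset idea concrete, a mirror could apply one changeset of
removed and added N-Triples to a local copy roughly like this (my own
sketch; the file names are placeholders, not the actual DBpedia-Live layout):

    # Minimal sketch: apply one changeset (removed + added N-Triples files)
    # to a local mirror held in an rdflib Graph. All file names are placeholders.
    from rdflib import Graph

    local_copy = Graph()
    local_copy.parse("local_copy.nt", format="nt")   # existing local mirror

    removed = Graph()
    removed.parse("removed.nt", format="nt")
    for triple in removed:
        local_copy.remove(triple)                    # drop the outdated triples

    local_copy.parse("added.nt", format="nt")        # merge in the new triples
    local_copy.serialize(destination="local_copy.nt", format="nt")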


Other useful links:
- http://www.openarchives.org/rs/0.5/resourcesync
- http://www.sdshare.org/
- http://www.w3.org/community/sdshare/
- http://www.rabbitmq.com/


All the best,
Sebastian

[1] 
https://www.wikidata.org/w/index.php?title=Special:RecentChanges&feed=atom


Am 26.04.2013 03:15, schrieb Hady elsahar:

Hello Dimitris

what do you think of that?
Shall I write this part as an abstract part in the proposal and wait 
for more details,
or could we have a similar plan like the one already implemented in 
DBpedia: http://wiki.dbpedia.org/DBpediaLive#h156-3


thanks
regards


On Fri, Apr 26, 2013 at 12:50 AM, Jeremy Baron jer...@tuxmachine.com 
mailto:jer...@tuxmachine.com wrote:


On Thu, Apr 25, 2013 at 10:42 PM, Hady elsahar
hadyelsa...@gmail.com mailto:hadyelsa...@gmail.com wrote:
 2- is there any design pattern or a brief outline for the
change propagation design, how it would be? So that I
could make a rough plan and estimation about how it could be
consumed from the DBpedia side?

I don't know anything about the plan for this but it seems at first
glance like a good place to use [[w:PubSubHubbub]].

-Jeremy

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org mailto:Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l




--
-
Hady El-Sahar
Research Assistant
Center of Informatics Sciences | Nile University 
http://nileuniversity.edu.eg/


email : hadyelsa...@gmail.com mailto:hadyelsa...@gmail.com
Phone : +2-01220887311 tel:%2B2-01220887311
http://hadyelsahar.me/

http://www.linkedin.com/in/hadyelsahar



___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l



--
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Projects: http://nlp2rdf.org , http://linguistics.okfn.org , 
http://dbpedia.org/Wiktionary , http://dbpedia.org

Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org
___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


[Wikidata-l] Fwd: [Dbpedia-gsoc] Google Summer of Code update (Wikidata + DBpedia, Guidelines, deadline)

2013-04-22 Thread Sebastian Hellmann

Hi all,
I streamlined your ideas about the Wikidata GSoC projects with our 
description:

http://wiki.dbpedia.org/gsoc2013/ideas/WikidataMappings

Not sure whether there can be joint or duplicate applications (e.g. one 
student applying at both organizations with the same proposal) ... I 
guess we can figure this out when it happens.

All the best,
Sebastian


 Original-Nachricht 
Betreff: 	[Dbpedia-gsoc] Google Summer of Code update (Wikidata + 
DBpedia, Guidelines, deadline)

Datum:  Mon, 22 Apr 2013 12:45:26 +0200
Von:Sebastian Hellmann hellm...@informatik.uni-leipzig.de
An: 	Dbpedia-developers dbpedia-develop...@lists.sourceforge.net, 
DBpedia dbpedia-discuss...@lists.sourceforge.net, 
dbpedia-g...@lists.sourceforge.net




Hi all,

1. the deadline for GSoC applications is approaching. Students have to
apply before: 3rd of May
Details here: http://wiki.dbpedia.org/gsoc2013/apply and here
http://wiki.dbpedia.org/gsoc2013/ideas


2. we updated the description for the Wikidata + DBpedia idea:
http://wiki.dbpedia.org/gsoc2013/ideas/WikidataMappings?v=1cy8
Feedback welcome :)

3. Also I added two guidelines for students and mentors:
http://wiki.dbpedia.org/gsoc2013/ideas?v=1cgg#h254-3
They are straightforward, actually.

All the best,
Sebastian



--
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Projects: http://nlp2rdf.org , http://linguistics.okfn.org ,
http://dbpedia.org/Wiktionary , http://dbpedia.org
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org

___
Dbpedia-gsoc mailing list
dbpedia-g...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-gsoc



___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


[Wikidata-l] MLODE - Multilingual Linked Open Data for Enterprises Workshop 23/24/25 Sept.

2012-07-02 Thread Sebastian Hellmann
Just to give you a heads up (the official announcement will follow soon, 
lots of changes still to come):

http://sabre2012.infai.org/mlode

There will be a session on "Best Practices for Multilingual LOD".
We will also talk about a road map for DBpedia, so it might be really 
interesting to stake out common goals there. Large parts of DBpedia dev 
will be there.


Please contact the whole MLODE orga team, if you have ideas or want to 
join it:  mlode2...@lists.informatik.uni-leipzig.de
or write to mlode2012-spon...@informatik.uni-leipzig.de , if you are 
interested in sponsoring the event.


All the best,
Sebastian

--
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Events: http://wole2012.eurecom.fr (*Deadline: July 31st 2012*)
Projects: http://nlp2rdf.org , http://dbpedia.org
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org


___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] Provenance tracking on the Web with NIF-URIs

2012-06-21 Thread Sebastian Hellmann
paragraph level? See e.g. how Berlin is highlighted here:
http://pcai042.informatik.uni-leipzig.de/~swp12-9/vorprojekt/index.php?annotation_request=http%3A%2F%2Fwww.worldatlas.com%2Fcitypops.htm%23hash_4_30_7449e732716c8e68842289bf2e6667d5_Berlin%2C%2520Germany%2520-%25203%2C
in this very early prototype.

Could you give me a link were I can read more about any Wikidata plans
towards this direction?
Sebastian



On 05/16/2012 09:10 AM, Sebastian Hellmann wrote:


Dear all,
(Note: I could not find the document, where your requirements regarding
the tracking of facts on the web are written, so I am giving a general
introduction to NIF. Please send me a link to the document that specifies
your need for tracing facts on the web, thanks)

I would like to point your attention to the URIs used in the NLP
Interchange Format (NIF).
NIF-URIs are quite easy to use, understand and implement. NIF has a
one-triple-per-annotation paradigm.  The latest documentation can be found
here:
http://svn.aksw.org/papers/2012/WWW_NIF/public/string_ontology.pdf

The basic idea is to use URIs with hash fragment ids to annotate or mark
pages on the web:
An example is the first occurrence of Semantic Web on
http://www.w3.org/DesignIssues/LinkedData.html
  as highlighted here:
http://pcai042.informatik.uni-leipzig.de/~swp12-9/vorprojekt/index.php?annotation_request=http%3A%2F%2Fwww.w3.org%2FDesignIssues%2FLinkedData.html%23hash_10_12_60f02d3b96c55e137e13494cf9a02d06_Semantic%2520Web

Here is a NIF example for linking a part of the document to the DBpedia
entry of the Semantic Web:
<http://www.w3.org/DesignIssues/LinkedData.html#offset_717_729>
  a str:StringInContext ;
  sso:oen <http://dbpedia.org/resource/Semantic_Web> .


We are currently preparing a new draft for the spec 2.0. The old one can
be found here:
http://nlp2rdf.org/nif-1-0/

There are several EU projects that intend to use NIF. Furthermore, it is
easier for everybody, if we standardize a Web annotation format together.
Please give feedback of your use cases.
All the best,
Sebastian



--
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Projects: http://nlp2rdf.org , http://dbpedia.org
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org


___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l





___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l



--
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Projects: http://nlp2rdf.org , http://dbpedia.org
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


[Wikidata-l] Provenance tracking on the Web with NIF-URIs

2012-05-16 Thread Sebastian Hellmann

Dear all,
(Note: I could not find the document where your requirements regarding
the tracking of facts on the web are written, so I am giving a general
introduction to NIF. Please send me a link to the document that
specifies your need for tracing facts on the web, thanks.)


I would like to draw your attention to the URIs used in the NLP
Interchange Format (NIF).
NIF-URIs are quite easy to use, understand and implement. NIF has a
one-triple-per-annotation paradigm. The latest documentation can be
found here:

http://svn.aksw.org/papers/2012/WWW_NIF/public/string_ontology.pdf

The basic idea is to use URIs with hash fragment ids to annotate or mark 
pages on the web:
An example is the first occurrence of Semantic Web on 
http://www.w3.org/DesignIssues/LinkedData.html  as highlighted here:
http://pcai042.informatik.uni-leipzig.de/~swp12-9/vorprojekt/index.php?annotation_request=http%3A%2F%2Fwww.w3.org%2FDesignIssues%2FLinkedData.html%23hash_10_12_60f02d3b96c55e137e13494cf9a02d06_Semantic%2520Web 
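
As an illustration only (this is not code from the NIF project), fragment
identifiers of both kinds could be built from a plain-text copy of a page
roughly as follows; which characters feed the MD5 digest in the hash scheme is
an assumption here, and the spec linked further down is authoritative:

# Minimal sketch, not a reference implementation: build offset- and
# hash-based NIF fragment URIs for a substring of a plain-text document.
# The input to the MD5 digest is assumed (substring plus `context`
# characters on either side); consult the NIF spec for the normative rules.
import hashlib
from urllib.parse import quote

def offset_fragment_uri(doc_uri: str, begin: int, end: int) -> str:
    # character offsets into the document text
    return f"{doc_uri}#offset_{begin}_{end}"

def hash_fragment_uri(doc_uri: str, text: str, substring: str, context: int = 10) -> str:
    begin = text.index(substring)                          # first occurrence
    end = begin + len(substring)
    window = text[max(0, begin - context):end + context]   # assumed digest input
    digest = hashlib.md5(window.encode("utf-8")).hexdigest()
    return f"{doc_uri}#hash_{context}_{len(substring)}_{digest}_{quote(substring)}"

doc = "http://www.w3.org/DesignIssues/LinkedData.html"
print(offset_fragment_uri(doc, 717, 729))
# -> http://www.w3.org/DesignIssues/LinkedData.html#offset_717_729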



Here is a NIF example for linking a part of the document to the DBpedia 
entry of the Semantic Web:

<http://www.w3.org/DesignIssues/LinkedData.html#offset_717_729>
  a str:StringInContext ;
  sso:oen <http://dbpedia.org/resource/Semantic_Web> .
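
For completeness, a small rdflib sketch that produces the triple above; the
str:/sso: namespace IRIs are assumptions here and should be checked against
the spec linked below:

# Sketch: emit the one-triple-per-annotation example with rdflib.
# The namespace IRIs below are assumptions; verify them against the NIF spec.
from rdflib import Graph, Namespace, URIRef
from rdflib.namespace import RDF

STR = Namespace("http://nlp2rdf.lod2.eu/schema/string/")  # assumed str: namespace
SSO = Namespace("http://nlp2rdf.lod2.eu/schema/sso/")     # assumed sso: namespace

g = Graph()
g.bind("str", STR)
g.bind("sso", SSO)

annotation = URIRef("http://www.w3.org/DesignIssues/LinkedData.html#offset_717_729")
g.add((annotation, RDF.type, STR.StringInContext))
g.add((annotation, SSO.oen, URIRef("http://dbpedia.org/resource/Semantic_Web")))

print(g.serialize(format="turtle"))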


We are currently preparing a new draft for the spec 2.0. The old one can 
be found here:

http://nlp2rdf.org/nif-1-0/

There are several EU projects that intend to use NIF. Furthermore, it is
easier for everybody if we standardize a Web annotation format together.

Please give feedback on your use cases.
All the best,
Sebastian

--
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Projects: http://nlp2rdf.org , http://dbpedia.org
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org


___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


[Wikidata-l] Linked Data Cup - Deadline Extension until April 25th, 2012

2012-04-15 Thread Sebastian Hellmann
(apologies for multiple posts; please forward; please email 
i-challenge2012_a_t_easychair.org for questions)


NEWS:
- Deadline extension until April 25th, 2012
- The total amount of 2,000 EUR (sponsored by Wolters Kluwer) will be 
awarded in prizes and split among the most promising applications.
- Linked Data Cup Board updated: 
http://i-challenge.blogs.aksw.org/chairs-committee


Linked Data Cup 2012
http://i-challenge.blogs.aksw.org/
co-located with the I-Semantics 2012
Graz, Austria, 5 - 7 September 2012
http://www.i-semantics.at

The annually organised Linked Data Cup (formerly the Triplification Challenge)
awards prizes to the most promising innovations involving linked data.
Four technological topics are addressed: triplification,
interlinking, cleansing, and application mash-ups. The Linked Data Cup
invites scientists and practitioners to submit novel and innovative
(5-star) linked data sets and applications built on linked data technology.
Although more and more data is triplified and published as RDF and
linked data, the question arises of how to evaluate the usefulness of such
approaches. The Linked Data Cup therefore requires all submissions to
include a concrete use case and problem statement alongside a solution
(triplified data set, interlinking/cleansing approach, linked data
application) that showcases the usefulness of linked data. Submissions
that can demonstrate measurable benefits of employing linked data over
traditional methods are preferred.
Note that the call is not limited to any domain or target group. We 
accept submissions ranging from value-added business intelligence use 
cases to scientific networks to the longest tail [1] of information 
domains. The only strict requirement is that the employment of linked
data is well motivated and justified (i.e. we rank higher those
approaches whose solutions could not have been realised without
linked data, even if they lack technical or scientific
brilliance). The total amount of 2,000 EUR (sponsored by Wolters Kluwer)
will be awarded in prizes and split among the most promising applications.


Evaluation Criteria
===
Submissions will initially be evaluated against the well-known five-star
ranking scheme [2]. Furthermore, entries will be assessed according to
the extent to which they

1. motivate the relevancy of their use case for their respective domain;
2. justify the adequacy of linked data technologies for their solution;
3. demonstrate that all alternatives to linked data would have resulted
in an inferior solution;
4. provide an evaluation that can measure the benefits of linked data.

Topics
==
Ideas for topics include (but are not limited to):
* Improving traditional approaches with help of linked data
* Linked data use in science and education
* Linked data supported multimedia applications
* Linked data in the open source context
* Web annotation
* Generic applications
* Internationalization of linked data
* Visualization of linked data
* Linked government data
* Business models based on linked data
* Recommender systems supported by linked data
* Integrating microposts with linked data
* Distributed social web based on linked data
* Linked data sensor networks

Submission and Reviewing
========================
Submissions to the Linked Data Cup will be reviewed by members of the 
Linked Data Cup Board and invited experts from the Linked Data community.
Submissions should consist of 4 pages, must be original, and must not
have been submitted for publication elsewhere. Papers should follow the
ACM ICPS formatting guidelines, as accepted submissions will be
published in the I-SEMANTICS 2012 proceedings in the digital library of
the ACM ICPS. Please read the submission page[a] for detailed
information on how to submit.

Important Dates (Linked Data Cup)
=================================
1. Paper Submission Deadline: April 25, 2012
2. Notification of Acceptance: May 21, 2012
3. Camera-Ready Paper: June 11, 2012

Links
=
[1]
http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=uleiauersemmedmainz20110714-110715042111-phpapp01&rel=0&startSlide=5&stripped_title=the-semantic-data-web-sren-auer-university-of-leipzig&userName=lod2project


[2] http://www.w3.org/DesignIssues/LinkedData.html


--
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Projects: http://nlp2rdf.org , http://dbpedia.org
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org


___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l