Re: [Wikidata] Comparisons between DBpedia and Wikidata

2017-04-18 Thread Dimitris Kontokostas
Hi, a few clarifications from my side
A more up-to-date link for the data is here: https://gist.github.com/
jimkont/01f6add8527939c39192bcb3f840eca0
This dump was not generated with federated queries, which were not possible
at the time of creation, but with a simple script.

It is meant only as a proof-of-concept project that showcases differences
in birth dates between Wikidata, Dutch Wikipedia and Greek Wikipedia as
extracted in the DBpedia 2016-04 release (which also means that it is
based on ~1-year-old data).
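
Now that federation is available (as Gerard notes below), a comparison of
this kind could also be sketched as a single federated query. This is only a
rough, untested sketch: it assumes the language DBpedia endpoint exposes
owl:sameAs links to www.wikidata.org entity IRIs and allows SERVICE calls to
the Wikidata endpoint.

```
# Sketch: DBpedia vs. Wikidata birth dates that disagree (run on a DBpedia endpoint).
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT ?person ?dbpediaBirth ?wikidataBirth WHERE {
  ?person dbo:birthDate ?dbpediaBirth ;
          owl:sameAs ?wd .
  FILTER(STRSTARTS(STR(?wd), "http://www.wikidata.org/entity/"))
  SERVICE <https://query.wikidata.org/sparql> {
    ?wd wdt:P569 ?wikidataBirth .   # P569 = date of birth
  }
  # keep only rows where the two values disagree at day precision
  FILTER(STR(?dbpediaBirth) != SUBSTR(STR(?wikidataBirth), 1, 10))
}
LIMIT 100
```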

Best,
Dimitris


On Mon, Apr 17, 2017 at 4:14 PM, Gerard Meijssen 
wrote:

> Hoi,
> With the recent introduction of federation for DBpedia, it is possible to
> run queries that combine the DBpedia for a specific language with Wikidata.
> I have blogged about how we can make use of this [1].
>
> It makes it much easier to compare Wikidata and DBpedia, and if we take
> this seriously and apply some effort, we can make a tool like the one by
> Pasleim [2] for Wikipedias that do not have a category for people who died
> in a given year.
> Thanks,
>   GerardM
>
> [1] http://ultimategerardm.blogspot.nl/2017/04/wikidata-user
> -story-dbpedia-death-and.html
> [2] http://tools.wmflabs.org/pltools/recentdeaths/
>
>
>
>
> On 1 April 2017 at 11:34, Gerard Meijssen 
> wrote:
>
>> Hoi,
>> I was asked by one of the DBpedia people to write a project plan. I gave
>> it a try [1].
>>
>> The idea is to first compare DBpedia with Wikidata where a comparison is
>> possible. Where it is not (because of differences in their classes, for
>> instance), that is not our initial focus.
>>
>> Please comment on the talk page, and if there are things missing in the
>> plan, please help improve it.
>> Thanks,
>>  GerardM
>>
>>
>>
>> [1] https://www.wikidata.org/wiki/User:GerardM/DBpedia_for_Quality
>>
>>
>> On 1 April 2017 at 10:44, Reem Al-Kashif  wrote:
>>
>>> Hi
>>>
>>> I don't have an idea about how to develop this, but it seems like an
>>> interesting project!
>>>
>>> Best,
>>> Reem
>>>
>>> On 30 Mar 2017 10:17, "Gerard Meijssen" 
>>> wrote:
>>>
 Hoi,
 Much of the content of DBpedia and Wikidata has the same origin:
 harvesting data from a Wikipedia. There is a lot of discussion going on
 about quality, and one point that I make is that comparing "sources" and
 concentrating on the differences, particularly where statements differ, is
 where it is easiest to make a quality difference.

 So, given that DBpedia harvests both Wikipedia and Wikidata, can it
 provide us with a view of where a Wikipedia statement and a Wikidata
 statement differ? To make it useful, it is important to subset this data.
 I will not start with 500,000 differences; I will begin with a subset that
 I care about.

 When I care about entries for alumni of a university, I will consider
 curating the information in question, particularly when I know the language
 of the Wikipedia.
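
 A rough sketch of such a subset on the Wikidata side (standard WDQS
 prefixes; P69 = educated at, P570 = date of death; the university item ID
 below is only a placeholder to substitute):

```
# Sketch: alumni of one university who died in 2016, plus the Dutch Wikipedia
# article if one exists.
SELECT ?person ?personLabel ?died ?nlArticle WHERE {
  ?person wdt:P69 wd:Q13371 ;    # placeholder university item; substitute yours
          wdt:P570 ?died .
  FILTER(YEAR(?died) = 2016)
  OPTIONAL {
    ?nlArticle schema:about ?person ;
               schema:isPartOf <https://nl.wikipedia.org/> .
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,nl" . }
}
```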

 Once we can do this, another thing that will promote the use of a tool
 like this is to regularly (say, once a month) store the numbers and
 publish trends.

 How difficult is it to come up with something like this? I know this
 tool would be based on DBpedia, but there are several reasons why this is
 good. First, it gives added relevance to DBpedia (without detracting from
 Wikidata); secondly, as DBpedia updates on RSS changes for several
 Wikipedias, the effect of these changes is quickly noticed when a new set
 of data is requested.

 Please let us know what the issues are and what it takes to move
 forward with this. Does this make sense?
 Thanks,
GerardM

 http://ultimategerardm.blogspot.nl/2017/03/quality-dbpedia-a
 nd-kappa-alpha-psi.html

 ___
 Wikidata mailing list
 Wikidata@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata


>>> ___
>>> Wikidata mailing list
>>> Wikidata@lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>>
>>>
>>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>


-- 
Kontokostas Dimitris
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] NLP text corpus annotated with Wikidata entities?

2017-02-06 Thread Dimitris Kontokostas
I am quoting a response by my colleague Martin Brummer (in cc) who
answered a similar question recently:

```
there are the DBpedia NIF abstract datasets which contain DBpedia
abstracts, article structure annotations and entity links contained in
the abstracts, currently available in 9 languages [1].

Entity links in those datasets are only the links set by Wikipedia
editors. This means each linked entity is only linked once in the
article (the first time it is mentioned). Repeat mentions of the entity
are not linked again.

[...Martin & Milan...] tried to remedy this issue by additionally linking
other surface forms of entities previously mentioned in the abstract in
this older version of the corpus, available in 7 languages [2].

[1] http://wiki.dbpedia.org/nif-abstract-datasets
[2] https://datahub.io/dataset/dbpedia-abstract-corpus
```

DBpedia is also working on providing whole Wikipedia pages in NIF
format with annotated links.
These will be available in the upcoming release.
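
For anyone who wants to pull the entity links out of a loaded NIF abstract
dump, a minimal sketch (it assumes the NIF core and ITS RDF vocabularies as
used by the datasets, i.e. nif:anchorOf for the surface form and
itsrdf:taIdentRef for the linked resource):

```
# Sketch: surface forms and the resources they are linked to.
PREFIX nif:    <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#>
PREFIX itsrdf: <http://www.w3.org/2005/11/its/rdf#>

SELECT ?mention ?surfaceForm ?linkedResource WHERE {
  ?mention nif:anchorOf      ?surfaceForm ;
           itsrdf:taIdentRef ?linkedResource .
}
LIMIT 100
```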

As Markus said, switching Wikipedia/DBpedia IRIs to Wikidata should be
trivial when Wikidata IRIs exist.
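
A sketch of that switch via the sitelink triples on query.wikidata.org
(schema: is predefined there; English DBpedia local names are the Wikipedia
article titles; the Douglas Adams IRI is just an example, and titles with
special characters would additionally need percent-encoding):

```
# Sketch: map an English DBpedia IRI to its Wikidata item through the
# corresponding Wikipedia article.
SELECT ?item WHERE {
  VALUES ?dbpedia { <http://dbpedia.org/resource/Douglas_Adams> }
  BIND(IRI(CONCAT("https://en.wikipedia.org/wiki/",
                  STRAFTER(STR(?dbpedia), "http://dbpedia.org/resource/"))) AS ?article)
  ?article schema:about ?item .
}
```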

Best,
Dimitris

On Mon, Feb 6, 2017 at 4:04 PM, Shilad Sen  wrote:

> Whoops! Apologies for shortening your name to "Sam." Looks like the coffee
> has not yet kicked in this morning...
>
> On Mon, Feb 6, 2017 at 8:02 AM, Shilad Sen  wrote:
>
>> Hi Sam,
>>
>> The NLP task you are referring to is often called "wikification," and if
>> you Google using that term you'll find some hits for datasets. Here's the
>> first one I found: https://cogcomp.cs.illinois.edu/page/resource_view/4
>>
>> I also have a full EN corpus marked up by a simple Wikification
>> algorithm. It's not very good, but you are welcome to it!
>>
>> -Shilad
>>
>> On Mon, Feb 6, 2017 at 3:28 AM, Samuel Printz 
>> wrote:
>>
>>> Hello Markus,
>>>
>>> to take a Wikipedia-annotated corpus and replace the Wikipedia-URIs
>>> with the respective Wikidata-URIs is a great idea. I think I'll try that
>>> out.
>>>
>>> Thank you!
>>>
>>> Samuel
>>>
>>>
>>> Am 05.02.2017 um 21:40 schrieb Markus Kroetzsch:
>>> > On 05.02.2017 15:47, Samuel Printz wrote:
>>> >> Hello everyone,
>>> >>
>>> >> I am looking for a text corpus that is annotated with Wikidata
>>> entites.
>>> >> I need this for the evaluation of an entity linking tool based on
>>> >> Wikidata, which is part of my bachelor thesis.
>>> >>
>>> >> Does such a corpus exist?
>>> >>
>>> >> Ideal would be a corpus annotated in the NIF format [1], as I want to
>>> >> use GERBIL [2] for the evaluation. But it is not necessary.
>>> >
>>> > I don't know of any such corpus, but Wikidata is linked with Wikipedia
>>> > in all languages. You can therefore take any Wikipedia article and
>>> > find, with very little effort, the Wikidata entity for each link in
>>> > the text.
>>> >
>>> > The downside of this is that Wikipedia pages do not link all
>>> > occurrences of all linkable entities. You can get a higher coverage
>>> > when taking only the first paragraph of each page, but many things
>>> > will still not be linked.
>>> >
>>> > However, you could also take any existing Wikipedia-page annotated
>>> > corpus and translate the links to Wikidata in the same way.
>>> >
>>> > Finally, DBpedia also is linked to Wikipedia (in fact, the local names
>>> > of entities are Wikipedia article names). So if you find any
>>> > DBpedia-annotated corpus, you can also translate it to Wikidata easily.
>>> >
>>> > Good luck,
>>> >
>>> > Markus
>>> >
>>> > P.S. If you build such a corpus from another resource, it would be
>>> > nice if you could publish it for others to save some effort :-)
>>> >
>>> >>
>>> >> Thanks for hints!
>>> >> Samuel
>>> >>
>>> >> [1] https://site.nlp2rdf.org/
>>> >> [2] http://aksw.org/Projects/GERBIL.html
>>> >>
>>> >>
>>> >> ___
>>> >> Wikidata mailing list
>>> >> Wikidata@lists.wikimedia.org
>>> >> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>> >>
>>> >
>>> > ___
>>> > Wikidata mailing list
>>> > Wikidata@lists.wikimedia.org
>>> > https://lists.wikimedia.org/mailman/listinfo/wikidata
>>>
>>> ___
>>> Wikidata mailing list
>>> Wikidata@lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>>
>>
>>
>>
>> --
>> Shilad W. Sen
>>
>> Associate Professor
>> Mathematics, Statistics, and Computer Science Dept.
>> Macalester College
>>
>> Senior Research Fellow, Target Corporation
>>
>> s...@macalester.edu
>> http://www.shilad.com
>> https://www.linkedin.com/in/shilad
>> 651-696-6273 <(651)%20696-6273>
>>
>
>
>
> --
> Shilad W. Sen
>
> Associate Professor
> Mathematics, Statistics, and Computer Science Dept.
> Macalester College
>
> Senior Research Fellow, Target Corporation
>
> s...@macalester.edu
> http://www.shilad.com
> https://www.linkedin.com/in/shilad
> 651-696-6273
>
> 

Re: [Wikidata] Wikidata ontology

2017-01-06 Thread Dimitris Kontokostas
Hi,

In case it helps, there is also a few-months-old version from the latest
DBpedia release for properties [1,2] and classes [3,4].
The properties do not contain the "rdf:type rdf:Property / owl:*Property"
definitions, and the current dump of the classes contains only subClassOf
statements to DBpedia classes, based on some mappings described in [5].

[1]
http://downloads.dbpedia.org/2016-04/core-i18n/wikidata/properties_wikidata.ttl.bz2
[2]
http://downloads.dbpedia.org/preview.php?file=2016-04_sl_core-i18n_sl_wikidata_sl_properties_wikidata.ttl.bz2
(preview)
[3]
http://downloads.dbpedia.org/2016-04/core-i18n/wikidata/ontology_subclassof_wikidata.ttl.bz2
[4]
http://downloads.dbpedia.org/preview.php?file=2016-04_sl_core-i18n_sl_wikidata_sl_properties_wikidata.ttl.bz2
(preview)
[5]
http://www.semantic-web-journal.net/content/wikidata-through-eyes-dbpedia-1
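
Since the property dump above omits those type definitions, a sketch of how
the same information can be read off the Wikidata query service instead
(standard WDQS prefixes assumed):

```
# Sketch: list Wikidata properties together with their datatypes.
SELECT ?property ?propertyType WHERE {
  ?property a wikibase:Property ;
            wikibase:propertyType ?propertyType .
}
LIMIT 100
```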

On Thu, Jan 5, 2017 at 11:21 PM, Stas Malyshev 
wrote:

> Hi!
>
> > The best you can get in terms of "downloading the wikidata ontology"
> would be to
> > download all properties and all the items representing classes. We
> currently
> > don't have a separate dump for these. Also, do not expect this to be a
> concise
> > or consistent model that can be used for reasoning. You are bound to find
> > contradictions and loose ends.
>
> Also, Wikidata Toolkit (https://github.com/Wikidata/Wikidata-Toolkit)
> can be used to generate something like taxonomy - see e.g.
> http://tools.wmflabs.org/wikidata-exports/rdf/exports/
> 20160801/dump_download.html
>
> But one has to be careful with it as Wikidata may not (and frequently
> does not) follow assumptions that are true for proper OWL models - there
> are no limits on what can be considered a class, a subclass, an
> instance, etc. Same entity can be treated both as class and individual,
> and there may be some weird structures, including even outright errors
> such as cycles in subclass graph, etc. And, of course, it changes all
> the time :)
>
> --
> Stas Malyshev
> smalys...@wikimedia.org
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>



-- 
Kontokostas Dimitris
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Determining Wikidata Usage in Wikipedia Pages

2016-11-24 Thread Dimitris Kontokostas
Hi,

A related DBpedia GSoC project from this summer is described here:
http://www.mail-archive.com/dbpedia-discussion@lists.sourceforge.net/msg07828.html

Some preliminary results from ~1 year ago that bootstrapped this project are here:
https://lists.wikimedia.org/pipermail/wikidata/2015-December/007757.html


On Thu, Nov 24, 2016 at 3:07 PM, Daniel Kinzler  wrote:

> Am 23.11.2016 um 21:33 schrieb Andrew Hall:
> > Hi,
> >
> > I’m a PhD student/researcher at the University of Minnesota who (along
> with Max
> > Klein and another grad student/researcher) has been interested in
> understanding
> > the extent to which Wikidata is used in (English, for now) Wikipedia.
> >
> > There seems to be no easy way to determine Wikidata usage in Wikipedia
> pages so
> > I’ll describe two approaches we’ve considered as our best attempts at
> solving
> > this problem. I’ll also describe shortcomings of each approach.
>
> There are two pretty easy ways, which you may not have found because they
> were
> added only a couple of months ago:
>
> You can look at the "page information" (action=info, linked from the
> sidebar),
> e.g.
> <https://en.wikipedia.org/w/index.php?title=South_Pole_Telescope&action=info>.
> Near the bottom you can find "Wikidata entities used in this page".
>
> The same information is available via an API module,
> <https://en.wikipedia.org/w/api.php?action=query&prop=wbentityusage&titles=South_Pole_Telescope>.
> See
> <https://en.wikipedia.org/w/api.php?action=help&modules=query%2Bwbentityusage>
> for documentation.
>
>
> These URLs will list all direct and indirect usages, and also indicate
> what part
> or aspect of the entity was used.
>
> HTH
>
> --
> Daniel Kinzler
> Senior Software Developer
>
> Wikimedia Deutschland
> Gesellschaft zur Förderung Freien Wissens e.V.
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>



-- 
Kontokostas Dimitris
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] Normalized DBpedia datasets based on Wikidata IDs / use cases

2016-10-18 Thread Dimitris Kontokostas
Hi Everyone,

Starting from this release, DBpedia provides an alternate view of its data
with RDF dumps based on Wikidata IDs:
http://wiki.dbpedia.org/dbpedia-version-2016-04

e.g.
 - disambiguations_en.ttl.bz2
   <http://downloads.dbpedia.org/2016-04/core-i18n/en/disambiguations_en.ttl.bz2>
   (DBpedia URIs)
 - disambiguations_wkd_uris_en.ttl.bz2
   <http://downloads.dbpedia.org/2016-04/core-i18n/en/disambiguations_wkd_uris_en.ttl.bz2>
   (the same data, but with all DBpedia URIs converted to Wikidata-based IDs)

We need these dumps for our ongoing tasks but we also want to share these
with the Wikidata community as we think they may be useful.

One of the side tasks that we have in our plans, but never found enough
people to work on, is to identify Wikipedia / Wikidata data overlaps as well
as data conflicts, and to identify areas where e.g. Wikidata data are
fresher, stale or missing.

Another task that popped up during a discussion with Lydia and Daniel at
the DBpedia meeting in Leipzig last month was to use these dumps to fix
errors in Wikidata. The example we discussed involves interlinks and
disambiguations: e.g. when an interlink cluster consists of disambiguation
links except for one (which is most probably wrong).
This was a real example that Daniel came up with, and such cases can be
easily identified with these dumps.
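
A sketch of how such clusters could be surfaced once a few of the
*_wkd_uris disambiguation dumps are loaded, assuming each language file goes
into its own named graph and that the dumps use dbo:wikiPageDisambiguates as
in the DBpedia ontology:

```
# Sketch: for each Wikidata-based ID, count in how many language graphs it is
# a disambiguation page; IDs present in almost all graphs but missing from one
# are candidates for a wrong interlink.
PREFIX dbo: <http://dbpedia.org/ontology/>

SELECT ?id (COUNT(DISTINCT ?g) AS ?disambiguationGraphs) WHERE {
  GRAPH ?g { ?id dbo:wikiPageDisambiguates ?target . }
}
GROUP BY ?id
ORDER BY DESC(?disambiguationGraphs)
LIMIT 100
```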

Maybe there are other cases where these dumps can be useful, but you are
a better judge of this.

How to move on: after a quick discussion, it was suggested to create a
Phabricator task for each item, but before I proceed I wanted to get some
initial community feedback.

Best,
Dimitris

-- 
Dimitris Kontokostas
Department of Computer Science, University of Leipzig & DBpedia Association
Projects: http://dbpedia.org, http://rdfunit.aksw.org,
http://aligned-project.eu
Homepage: http://aksw.org/DimitrisKontokostas
Research Group: AKSW/KILT http://aksw.org/Groups/KILT
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] People who died in 2015 who were Dutch

2016-08-31 Thread Dimitris Kontokostas
Based on the other open related thread [1], there are references for the
deathDate of 1950 people [2].
I manually checked 5 random pages and all had a reference "imported from
Wikipedia", so maybe this is a good start.

(cc'ing wiki-cite after Dario's suggestion on the other thread)

Best,
Dimitris

[1] https://lists.wikimedia.org/pipermail/wikidata/2016-August/009447.html
[2] curl
http://downloads.dbpedia.org/temporary/citations/enwiki-20160305-citedFacts.tql.bz2
| bzcat | grep "deathDate"
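
A sketch of the matching heuristic Markus describes in the quoted message
below, assuming the extracted facts and citations are loaded together and
that the citation data carries a publication date under a property such as
dbp:date (that property name is an assumption here):

```
# Sketch: citations published in the same month as, and on or after, an
# extracted death date - candidate references for that death date.
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://en.dbpedia.org/property/>

SELECT ?person ?deathDate ?citation ?published WHERE {
  ?person   dbo:deathDate ?deathDate .
  ?citation dbp:isCitedBy ?person ;
            dbp:date      ?published .   # assumed property for the citation's date
  # crude proximity check: same calendar month, lexical comparison on ISO dates
  FILTER(SUBSTR(STR(?published), 1, 7) = SUBSTR(STR(?deathDate), 1, 7)
         && STR(?published) >= STR(?deathDate))
}
LIMIT 100
```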


On Thu, Jun 4, 2015 at 3:00 PM, Markus Krötzsch <
mar...@semantic-mediawiki.org> wrote:

> On 04.06.2015 12:17, Dimitris Kontokostas wrote:
> ...
>
>>
>> Another question: can DBpedia extract references from Wikipedia
>> articles too? If this would be possible, it might be feasible to
>> guess and suggest a reference (or a list of references). Especially
>> with things like date of death, one would expect that references
>> have a publication date very close to (but strictly after) the
>> event, which could narrow down the choices very much.
>>
>>
>> We don't extract them for now, although I think we could relatively
>> easily. The problem in this case would be that we cannot associate
>> references with facts. The DBpedia Information Extraction Framework is
>> quite module and can be easily extended with new extractors but it is
>> hard to make these extractors "talk to each other".
>> So we could easily get something like the following
>> dbp:A dbo:birthDate "..."
>> dbp:A dbo:deahthDate "..."
>> dbp:A dbo:reference dbp:r1 # and maybe " dbp:r1 something else"
>> depending on the modeling
>> dbp:A dbo:reference dbp:r2
>>
>> but not sure if this solves your problem
>>
>
> Yes, I understand that you can hardly get the association between
> extracted facts and references. My suggestion was to extract both
> independently and then to query for references that have a publication date
> close to a person's death so as to suggest them to users as a possible
> reference for the death-date fact. This would still require a manual check,
> since we cannot know if the guessed reference belongs to the date of death,
> but if it has a high precision it would be a worthwhile way of spending
> volunteer time to obtain confirmed references.
>
> At the same time, it might be one of the fastest ways to get sourced date
> of death into Wikidata, since news articles will usually appear before the
> major authority files are updated (so even if we get donations from them,
> some lag would remain). With such an extraction framework, one could
> establish a pipeline from Wikipedia to Wikidata.
>
> In the long run, references from authority files will become more valuable
> than news articles, because they are more long-lived.
>
> Best wishes,
>
> Markus
>
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>



-- 
Kontokostas Dimitris
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] (semi-)automatic statement references for Wikidata from DBpedia

2016-08-28 Thread Dimitris Kontokostas
Hi,

I have had this idea for some time now but never got around to testing or
writing it down. DBpedia extracts detailed context information in quads
(where possible) on where each triple came from, including the line number
in the wiki text. Although each DBpedia extractor is independent, using
this context there is a small window for combining the output of different
extractors, such as the infobox statements we extract from Wikipedia and
the very recent citation extractors we announced [1].

I attach a very small sample from the article about Germany, where I
filtered the related triples and ordered them by the line number they were
extracted from, e.g.:

dbr:Germany dbo:populationTotal "82175700"^^xsd:nonNegativeInteger
<http://en.wikipedia.org/wiki/Germany?oldid=736355524#*absolute-line=66*=Infobox_country=population_estimate=1&wikiTextSize=10=10=8> .
<https://www.destatis.de/DE/PresseService/Presse/Pressemitteilungen/2016/08/PD16_295_12411pdf.pdf;jsessionid=996EC2DF0A8D510CF89FDCBC74DBAE9F.cae2?__blob=publicationFile> dbp:isCitedBy dbr:Germany
<http://en.wikipedia.org/wiki/Germany?oldid=736355524#*absolute-line=66*> .

Looking at the Wikipedia article we see:
|population_estimate = 82,175,700{{cite web|url=https://www.destatis.de/DE/PresseService/Presse/Pressemitteilungen/2016/08/PD16_295_12411pdf.pdf;jsessionid=996EC2DF0A8D510CF89FDCBC74DBAE9F.cae2?__blob=publicationFile|title=Population at 82.2 million at the end of 2015 – population increase due to high immigration|date=26 August 2016|work=destatis.de}}

Could this approach be a good candidate for reference suggestions in Wikidata?
(This particular one is already a reference, but the anthem and GDP in the
attachment are not, for example.)

There are many things that could be done to improve the matching, but before
getting into details I would like to see whether this idea is worth exploring
further.
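
One possible first matching step, sketched under two assumptions: the quads
are loaded with the context IRI as the named graph, and two contexts that
share the same "absolute-line=N" value refer to the same wiki-text line.

```
# Sketch: pair infobox facts with citations extracted from the same line.
PREFIX dbp: <http://en.dbpedia.org/property/>

SELECT ?subject ?property ?value ?citation WHERE {
  GRAPH ?factCtx { ?subject ?property ?value . }
  FILTER(?property != dbp:isCitedBy)
  GRAPH ?citeCtx { ?citation dbp:isCitedBy ?subject . }
  # extract the line number from each context IRI (text between
  # "absolute-line=" and the next "&", if any)
  BIND(STRBEFORE(CONCAT(STRAFTER(STR(?factCtx), "absolute-line="), "&"), "&") AS ?factLine)
  BIND(STRBEFORE(CONCAT(STRAFTER(STR(?citeCtx), "absolute-line="), "&"), "&") AS ?citeLine)
  FILTER(?factLine != "" && ?factLine = ?citeLine)
}
LIMIT 100
```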

Cheers,
Dimitris

[1] http://www.mail-archive.com/dbpedia-discussion%40lists.sourceforge.net/
msg07739.html

-- 
Dimitris Kontokostas
Department of Computer Science, University of Leipzig & DBpedia Association
Projects: http://dbpedia.org, http://rdfunit.aksw.org,
http://aligned-project.eu
Homepage: http://aksw.org/DimitrisKontokostas
Research Group: AKSW/KILT http://aksw.org/Groups/KILT
<http://en.dbpedia.org/resource/Germany> 
<http://en.dbpedia.org/property/conventionalLongName> "Federal Republic of 
Germany"@en 
<http://en.wikipedia.org/wiki/Germany?oldid=736355524#absolute-line=10=Infobox_country=conventional_long_name>
 .
<http://en.dbpedia.org/resource/Germany> 
<http://en.dbpedia.org/property/commonName> "Germany"@en 
<http://en.wikipedia.org/wiki/Germany?oldid=736355524#absolute-line=19=Infobox_country=common_name>
 .
<http://en.dbpedia.org/resource/Germany> 
<http://en.dbpedia.org/property/nationalAnthem> "Deutschlandlied"@en 
<http://en.wikipedia.org/wiki/Germany?oldid=736355524#absolute-line=20=Infobox_country=national_anthem>
 .
<http://en.dbpedia.org/resource/Germany> 
<http://en.dbpedia.org/property/nationalAnthem> 
<http://en.dbpedia.org/resource/File:German_national_anthem_performed_by_the_US_Navy_Band.ogg>
 
<http://en.wikipedia.org/wiki/Germany?oldid=736355524#absolute-line=20=Infobox_country=national_anthem>
 .
<http://en.dbpedia.org/resource/Germany> <http://dbpedia.org/ontology/anthem> 
<http://en.dbpedia.org/resource/Deutschlandlied> 
<http://en.wikipedia.org/wiki/Germany?oldid=736355524#absolute-line=20=Infobox_country=national_anthem=1=398=78=15>
 .
<http://www.bundespraesident.de/DE/Amt-und-Aufgaben/Wirken-im-Inland/Repraesentation-und-Integration/repraesentation-und-integration-node.html>
 <http://en.dbpedia.org/property/isCitedBy> 
<http://en.dbpedia.org/resource/Germany> 
<http://en.wikipedia.org/wiki/Germany?oldid=736355524#absolute-line=20> .

<http://en.dbpedia.org/resource/Germany> 
<http://dbpedia.org/ontology/populationTotal> 
"82175700"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger> 
<http://en.wikipedia.org/wiki/Germany?oldid=736355524#absolute-line=66=Infobox_country=population_estimate=1=10=10=8>
 .
<http://en.dbpedia.org/resource/Germany> 
<http://en.dbpedia.org/property/populationEstimate> 
"82175700"^^<http://www.w3.org/2001/XMLSchema#integer> 
<http://en.wikipedia.org/wiki/Germany?oldid=736355524#absolute-line=66=Infobox_country=population_estimate>
 .
<https://www.destatis.de/DE/PresseService/Presse/Pressemitteilungen/2016/08/PD16_295_12411pdf.pdf;jsessionid=996EC2DF0A8D510CF89FDCBC74DBAE9F.cae2?__blob=publicationFile>
 <http://en.dbpedia.org/property/isCitedBy> 
<http://en.dbpedia.org/resource/Germany> 
<http://en.wikipedia.org/wiki/Germany?oldid=736355524#absolute-line=66> .

<http://en.dbpedia.org/resource/Germany> 
<http://en.dbp

Re: [Wikidata] ntriples dump?

2016-08-27 Thread Dimitris Kontokostas
Hi Stas,

Out of curiosity, can you give an example of triples that do not originate
from a single Wikidata item / property?

For me, Turtle dumps are processable only by RDF tools, while NT-like dumps
can be processed both by RDF tools and by other kinds of scripts, so I find
the former redundant.

On Fri, Aug 26, 2016 at 11:52 PM, Stas Malyshev 
wrote:

> Hi!
>
> > Of course if providing both is easy, then there's no reason not to
> > provide both.
>
> Technically it's quite easy - you just run the same script with
> different options. So the only question is what is useful.
>
> > It is useful in such applications to know the online RDF documents in
> > which a triple can be found. The document could be the entity, or it
> > could be a physical location like:
> >
> > http://www.wikidata.org/entity/Q13794921.ttl
>
> That's where the tricky part is: many triples won't have specific
> document there since they may appear in many documents. Of course, if
> you merge all these documents in a dump, the triple would appear only
> once (we have special deduplication code to take care of that) but it's
> impossible to track it back to a specific document then. So I understand
> the idea, and see how it may be useful, but I don't see a real way to
> implement it now.
>
> --
> Stas Malyshev
> smalys...@wikimedia.org
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>



-- 
Kontokostas Dimitris
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] A property to exemplify SPARQL queries associated with a property

2016-08-24 Thread Dimitris Kontokostas
Example SPARQL queries alone can be very helpful, but I would suggest that
they be accompanied by a short description explaining what each query
does.
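
For instance, a stored example for "date of birth" (P569) could pair the
query with a one-line description of what it returns, along these lines
(standard WDQS prefixes assumed):

```
# Example query for P569 (date of birth):
# returns ten people born on 1 January 2000, with their English labels.
SELECT ?person ?personLabel WHERE {
  ?person wdt:P569 "2000-01-01T00:00:00Z"^^xsd:dateTime .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 10
```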

On Wed, Aug 24, 2016 at 3:21 PM, Navino Evans 
wrote:

> If you could store queries, you could also store queries for each item
>> that is about a list of things, so that the query returns exactly the
>> things that should be in the list ... could be useful.
>
>
> This also applies to a huge number of Wikipedia categories (the non
> subjective ones). It would be extremely useful to have queries describing
> them attached to the Wikidata items for the categories.
>
> On 24 August 2016 at 02:31, Ananth Subray  wrote:
>
>> मा
>> --
>> From: Stas Malyshev 
>> Sent: ‎24-‎08-‎2016 12:33 AM
>> To: Discussion list for the Wikidata project.
>> 
>> Subject: Re: [Wikidata] A property to exemplify SPARQL queries
>> associated witha property
>>
>> Hi!
>>
>> > Relaying a question from a brief discussion on Twitter [1], I am curious
>> > to hear how people feel about the idea of creating a a "SPARQL query
>> > example" property for properties, modeled after "Wikidata property
>> > example" [2]?
>>
>> Might be nice, but we need a good way to present the query in the UI
>> (see below).
>>
>> > This would allow people to discover queries that exemplify how the
>> > property is used in practice. Does the approach make sense or would it
>> > stretch too much the scope of properties of properties? Are there better
>> > ways to reference SPARQL examples and bring them closer to their source?
>>
>> I think it may be a good idea to start thinking about some way of
>> storing queries on Wikidata maybe? On one hand, they are just strings,
>> on the other hand, they are code - like CSS or Javascript - and storing
>> them just as strings may be inconvenient. Maybe .sparql file extension
>> handler like we have for .js and .json and so on?
>>
>> --
>> Stas Malyshev
>> smalys...@wikimedia.org
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>>
>
>
> --
> ___
>
> The Timeline of Everything
>
> www.histropedia.com
>
> Twitter | Facebook | Google+ | LinkedIn
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>


-- 
Kontokostas Dimitris
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] An attribute for "famous person"

2016-08-03 Thread Dimitris Kontokostas
Apologies for the brief and combined reply but I'm on mobile. Will try and
answer the DBpedia-related comments.

DBpedia provides out-degree metrics (number of outgoing links from an
article) and article size (in wikitext characters) directly through related
extractors that can be used for ranking. In-degree is easy to calculate
from the DBpedia dumps, but IIRC we did not include it in the last releases
(not sure why). Also, in-degree is not provided in DBpedia Live, but the
other metrics are.

These metrics are nice but fail in some cases; IIRC years are heavily linked,
yet an article about a year, e.g. 2000, is not that important.

I agree that a PageRank metric would be the most appropriate in this case.
We have PageRank metrics, IIRC for en, de and nl, and we are preparing a
Wikidata-based PageRank that will be presented at the DBpedia meeting in
Leipzig next month.
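
A rough in-degree sketch over the DBpedia page-links dataset (it shares the
weakness mentioned above, and the aggregation may well time out on a public
endpoint):

```
# Sketch: rank persons by the number of incoming wiki links as a crude
# prominence proxy.
PREFIX dbo: <http://dbpedia.org/ontology/>

SELECT ?person (COUNT(?source) AS ?inDegree) WHERE {
  ?person a dbo:Person .
  ?source dbo:wikiPageWikiLink ?person .
}
GROUP BY ?person
ORDER BY DESC(?inDegree)
LIMIT 20
```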

Best,
Dimitris

Typed by thumb. Please forgive brevity, errors.

On Aug 3, 2016 11:41, "Yuri Astrakhan"  wrote:

> Jane, now we are really going into the field of elastic search's relevancy
> calculation. When searching, things like popularity (pageviews), incoming
> links, number of different language wiki articles, article size, article
> quality (good/selected), and many other aspects could be used to better the
> results. I wish these were available together with the WDQS results,
> possibly as a number similar to Google's "page rank".
>
> On Wed, Aug 3, 2016 at 11:21 AM, Jane Darnell  wrote:
>
>> Too bad, because it would be great for all sorts of project workflows!
>>
>> On Wed, Aug 3, 2016 at 10:04 AM, Stas Malyshev 
>> wrote:
>>
>>> Hi!
>>>
>>> On 8/2/16 11:36 PM, Jane Darnell wrote:
>>> > Would page props also give me the creation date of the Wikipedia page
>>> in
>>> > that specific sitelink? Because this is something I needed when
>>>
>>> Don't think so and I don't think such data should be in Wikidata or WDQS
>>> database - it's Wikipedia administrative data and should be there.
>>>
>>> External service can combine data from these sources but I don't think
>>> it falls under WDQS tasks.
>>> --
>>> Stas Malyshev
>>> smalys...@wikimedia.org
>>>
>>> ___
>>> Wikidata mailing list
>>> Wikidata@lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>>
>>
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] 7th DBpedia Community Meeting in Leipzig - 15/09/2016

2016-06-24 Thread Dimitris Kontokostas
Following our successful meetings in Europe & US our next DBpedia meeting
will be held at Leipzig on September 15th, co-located with SEMANTiCS:
http://2016.semantics.cc/

* Highlights *

- Keynote by Lydia Pintscher, Wikidata

- A session for the “DBpedia references and citations challenge”:
http://wiki.dbpedia.org/ideas/idea/261/dbpedia-citations-reference-challenge/

- A session on DBpedia ontology by members of the DBpedia ontology
committee: http://mappings.dbpedia.org/index.php/DBpedia_Ontology_Committee

- Tell us what cool things you do with DBpedia: https://goo.gl/AieceU

- As always, there will be tutorials to learn about DBpedia

* Quick facts *

- Web URL: http://wiki.dbpedia.org/meetings/Leipzig2016

- Hashtag: #DBpediaLeipzig

- When: September 15th, 2016

- Where: University of Leipzig, Augustusplatz 10, 04109 Leipzig

- Call for Contribution: submission form <https://goo.gl/AieceU>

- Registration: Free to participate but only through registration
<https://event.gg/3396-7th-dbpedia-community-meeting-in-leipzig-2016>
(Option for DBpedia support tickets)
https://event.gg/3396-7th-dbpedia-community-meeting-in-leipzig-2016

* Sponsors and Acknowledgments *

- University of Leipzig (http://www.uni-leipzig.de/)

- National Library of the Netherlands (http://www.kb.nl/)

- ALIGNED Project (http://aligned-project.eu/)

- Institute for Applied Informatics (InfAI, http://infai.org/en/AboutInfAI)

- OpenLink Software (http://www.openlinksw.com/)

- SEMANTICS Conference Sep 12-15, 2016 in Leipzig (http://2016.semantics.cc/
)

If you would like to become a sponsor for the 7th DBpedia Meeting, please
contact the DBpedia Association (dbpe...@infai.org)

* Organisation *

- Magnus Knuth, HPI, DBpedia German/Commons

- Monika Solanki, University of Oxford, DBpedia Ontology

- Julia Holze, DBpedia Association

- Dimitris Kontokostas, AKSW/KILT, DBpedia Association

- Sebastian Hellmann, AKSW/KILT, DBpedia ASsociation


-- 
Kontokostas Dimitris
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] DBpedia citations & references challenge

2016-06-07 Thread Dimitris Kontokostas
In the latest release (2015-10) DBpedia started exploring the citation and
reference data from Wikipedia and we were pleasantly surprised by the rich
data
<http://downloads.dbpedia.org/preview.php?file=2015-10_sl_core-i18n_sl_en_sl_citation_data_en.ttl.bz2>
we managed to extract.

   - citation_data_en.ttl.bz2
     <http://downloads.dbpedia.org/2015-10/core-i18n/en/citation_data_en.ttl.bz2>
     (sample:
     <http://downloads.dbpedia.org/preview.php?file=2015-10_sl_core-i18n_sl_en_sl_citation_data_en.ttl.bz2>)
   - citation_links_en.ttl.bz2
     <http://downloads.dbpedia.org/2015-10/core-i18n/en/citation_links_en.ttl.bz2>
     (sample:
     <http://downloads.dbpedia.org/preview.php?file=2015-10_sl_core-i18n_sl_en_sl_citation_links_en.ttl.bz2>)


This data holds huge potential, especially for the Wikidata challenge
of providing
a reference source for every statement. It describes not only a lot of
bibliographical data, but also a lot of web pages and many other sources
around the web.
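
As a taste of what the links file alone supports once loaded, a sketch that
assumes the citation links are modelled with a property such as dbp:isCitedBy
(an assumption; the property name may differ in the actual dump):

```
# Sketch: the most frequently cited sources across English Wikipedia articles.
PREFIX dbp: <http://en.dbpedia.org/property/>

SELECT ?citation (COUNT(DISTINCT ?article) AS ?articles) WHERE {
  ?citation dbp:isCitedBy ?article .
}
GROUP BY ?citation
ORDER BY DESC(?articles)
LIMIT 25
```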

The data we extract at the moment is quite raw and can be improved in many
different ways. Some of the potential improvements are:

   - Extend the citation extractor to handle other Wikipedia language editions
     <https://github.com/dbpedia/extraction-framework/issues/451>; currently
     only English Wikipedia is supported.
   - Map the data to a relevant Bibliographic ontology
     <https://github.com/dbpedia/mappings-tracker/issues/79> (there are many
     candidates and, although BIBO got most votes, we are open to other
     ontologies).
   - Map the data to existing Bibliographic LOD (e.g. TEL has 100M records,
     Worldcat 300M) or online books (e.g. Google Books). See the citationIri
     issue <https://github.com/dbpedia/extraction-framework/issues/452>.
   - Ways to merge / fuse identical citations from multiple articles.
   - Use the citation data in the Wikidata primary sources tool
     <https://www.wikidata.org/wiki/Wikidata:Primary_sources_tool>.
   - Surprise us with your ideas!


We welcome contributions that improve the existing citation dataset in any
way, and we are open to collaboration and to helping out. Results will be
presented at the next DBpedia meeting, 15 September 2016 in Leipzig,
co-located with SEMANTiCS 2016. Each participant should submit a short
description of his/her contribution by Monday 12 September 2016 and present
his/her work at the meeting. Comments and questions can be posted on the
DBpedia discussion & developer lists or on our new DBpedia ideas page
<http://wiki.dbpedia.org/ideas/idea/261/dbpedia-citations-reference-challenge/>.

Submissions will be judged by the Organizing Committee and the best two
will receive a prize.

Organizing Committee

   - Vladimir Alexiev, Ontotext and DBpedia BG
   - Anastasia Dimou, Ghent University, iMinds
   - Dimitris Kontokostas, KILT/AKSW, DBpedia Association



-- 
Dimitris Kontokostas
Department of Computer Science, University of Leipzig & DBpedia Association
Projects: http://dbpedia.org, http://rdfunit.aksw.org,
http://aligned-project.eu
Homepage: http://aksw.org/DimitrisKontokostas
Research Group: AKSW/KILT http://aksw.org/Groups/KILT
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Is there a dump of wiki media items?

2016-05-30 Thread Dimitris Kontokostas
Hello Melvin,

Maybe DBpedia commons can be of help

http://svn.aksw.org/papers/2015/ISWCData_DBpediaCommons/public.pdf
http://commons.dbpedia.org

Cheers,
Dimitris

On Mon, May 30, 2016 at 1:40 AM, Federico Leva (Nemo) 
wrote:

> Melvin Carvalho, 29/05/2016 19:31:
>
>> Is there a way I can get a dump of media items in wikidata or wikimedia
>> commons
>>
>
>
> https://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps#Media_tarballs
>
> Nemo
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>



-- 
Kontokostas Dimitris
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] People who died in 2015 who were Dutch

2016-01-27 Thread Dimitris Kontokostas
Coming back to an old thread: we now extract references from Wikipedia, and
they are available in the 2015-10 beta release:

citation_data_en.ttl.bz2
<http://downloads.dbpedia.org/2015-10/core-i18n/en/citation_data_en.ttl.bz2>
citation_links_en.ttl.bz2
<http://downloads.dbpedia.org/2015-10/core-i18n/en/citation_links_en.ttl.bz2>

Any feedback is more than welcome.


Best,

Dimitris


On Thu, Jun 4, 2015 at 3:00 PM, Markus Krötzsch <
mar...@semantic-mediawiki.org> wrote:

> On 04.06.2015 12:17, Dimitris Kontokostas wrote:
> ...
>
>>
>> Another question: can DBpedia extract references from Wikipedia
>> articles too? If this would be possible, it might be feasible to
>> guess and suggest a reference (or a list of references). Especially
>> with things like date of death, one would expect that references
>> have a publication date very close to (but strictly after) the
>> event, which could narrow down the choices very much.
>>
>>
>> We don't extract them for now, although I think we could relatively
>> easily. The problem in this case would be that we cannot associate
>> references with facts. The DBpedia Information Extraction Framework is
>> quite modular and can be easily extended with new extractors, but it is
>> hard to make these extractors "talk to each other".
>> So we could easily get something like the following
>> dbp:A dbo:birthDate "..."
>> dbp:A dbo:deathDate "..."
>> dbp:A dbo:reference dbp:r1 # and maybe " dbp:r1 something else"
>> depending on the modeling
>> dbp:A dbo:reference dbp:r2
>>
>> but not sure if this solves your problem
>>
>
> Yes, I understand that you can hardly get the association between
> extracted facts and references. My suggestion was to extract both
> independently and then to query for references that have a publication date
> close to a person's death so as to suggest them to users as a possible
> reference for the death-date fact. This would still require a manual check,
> since we cannot know if the guessed reference belongs to the date of death,
> but if it has a high precision it would be a worthwhile way of spending
> volunteer time to obtain confirmed references.
>
> At the same time, it might be one of the fastest ways to get sourced date
> of death into Wikidata, since news articles will usually appear before the
> major authority files are updated (so even if we get donations from them,
> some lag would remain). With such an extraction framework, one could
> establish a pipeline from Wikipedia to Wikidata.
>
> In the long run, references from authority files will become more valuable
> than news articles, because they are more long-lived.
>
> Best wishes,
>
> Markus
>
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>



-- 
Kontokostas Dimitris
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] [ANN] 6th DBpedia Community meeting / Feb 12th in The Hague

2016-01-15 Thread Dimitris Kontokostas
** 6th DBpedia Community Meeting in The Hague 2016 **

Following our successful meetings in Europe & US our next DBpedia meeting
will be held at The Hague on February 12th, hosted by the National Library
of the Netherlands.


*Highlights*
 - Discussion about the Dutch DBpedia becoming the first chapter with
institutional support of the new DBpedia
 - This meeting we would like to issue a Call for Showcases. If you wish to
present your DBpedia showcase, please send an email to  dbpe...@infai.org
 or fill out http://goo.gl/forms/lVkXnPGiEk
 - Participants can give final feedback for the new Charter of the DBpedia
Association (http://tinyurl.com/dbpedia-assocation-charter)
 - A session on DBpedia ontology by members of the DBpedia ontology
committee (http://mappings.dbpedia.org/index.php/DBpedia_Ontology_Committee)
 - An introduction to the new Clariah project (http://www.clariah.nl/)
which will use Linked Data and DBpedia at the core of its infrastructure
 - The LIDER Project (http://lider-project.eu/ ) and its communities has
successfully bootstrapped a Linguist Linked Data Cloud (
http://linguistic-lod.org/llod-cloud ) which will now be integrated into
DBpedia+
 - As always, there will be tutorials to learn about DBpedia

*Quick facts*
 - Web URL: http://wiki.dbpedia.org/meetings/TheHague2016
 - Hashtag: #DBpediaDenHaag
 - When: February 12th, 2016
 - Where: Prins Willem-Alexanderhof 5, 2595 BE The Hague,  Netherlands
(directions)
 - Host: National Library of the Netherlands (http://www.kb.nl)
 - Call for Contribution: http://goo.gl/forms/lVkXnPGiEk
 - Registration: Free to participate but only through registration (Option
for DBpedia support tickets):
https://event.gg/2245-6th-dbpedia-meeting-in-the-hague-2016

*Sponsors and Acknowledgments*
National Library of the Netherlands (http://www.kb.nl)
ALIGNED Project (http://aligned-project.eu/)
Institute for Applied Informatics (InfAI, http://infai.org/en/AboutInfAI )
OpenLink Software (http://www.openlinksw.com/ )
SEMANTICS Conference Sep 12-15, 2016 in Leipzig (http://2016.semantics.cc )

If you would like to become a sponsor for the 6th DBpedia Meeting, please
contact the DBpedia Association dbpe...@infai.org

*Organisation*
 - Enno Meijers, National Library of the Netherlands, Dutch DBpedia
 - Gerard Kuys, Ordina, Dutch DBpedia
 - Gerald Wildenbeest, Saxion, Dutch DBpedia
 - Richard Nagelmaeker, Dutch DBpedia

-- 
Kontokostas Dimitris
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] Wikidata usage within Wikipedia

2015-12-01 Thread Dimitris Kontokostas
Hi,

I was wondering if there are any statistics about the usage of Wikidata
within Wikipedia (in templates or directly in articles).
I tried to calculate it through DBpedia, but I am not sure the numbers I
got are correct.

E.g. tracking these templates
<https://docs.google.com/document/d/1tKpTriL9wZ8BTcXTEEcS5kS_GUys8_7ioKD4Cnl8WhI/edit>
and testing the English Wikipedia (October dump), I got
173 usages in templates (out of which 45 in sandbox templates)
and ~7.8K in articles, with ~30 using arbitrary access (from=).

Trying the same in itwiki for testing, I got
189 usages in templates (75 sandbox)
and ~300 in articles,
but maybe the Italian Wikipedia uses other wrapper templates that I did not
look into.

Any pointers / hints are appreciated

Cheers,
Dimitris

-- 
Kontokostas Dimitris
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] Wrong template usage in Wikipedia (feedback)

2015-11-17 Thread Dimitris Kontokostas
Hi,

probably not the right mailing list for this mail but not sure where to ask
:)

Based on DBpedia dumps, I created a script that identifies usage of
undefined templates in Wikipedia, mostly due to spelling mistakes.

https://docs.google.com/spreadsheets/d/1_9szZwij4fJujiFUFcsndiDkT_XpTKlgKRHi5MHzRlA/edit#gid=38776559

I was wondering if it makes sense to extend this script to
 - provide suggestions based on string similarity metrics
 - extend this to infobox properties and report properties that are not
   defined in the template definitions (and also provide suggestions from
   existing properties)

I did create a one-time dump of all of the above for the Greek Wikipedia
4-5 years ago, but I am not sure whether Wikipedia maintains this
automatically now.

Note that this is based on the October dump and might be a little out of date.

Cheers,
Dimitris

-- 
Kontokostas Dimitris
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wrong template usage in Wikipedia (feedback)

2015-11-17 Thread Dimitris Kontokostas
Hi Scott, great to meet you as well!

Interesting, but this does not look related to my question. Better post it
in a separate thread.
Actually my question too is not so related to this list but I was hoping
someone could give me some feedback / point me to the right people.

(I know my script needs some tweaking for magic word templates, etc. but
wanted to see if spending more time on this is worth the trouble)

Cheers,
Dimitris

On Tue, Nov 17, 2015 at 8:50 PM, Scott MacLeod <
worlduniversityandsch...@gmail.com> wrote:

> Hi Dimitris and Wikidatans,
>
> Great to meet you recently at Stanford (DBpedia), Dimitris.
>
> Could you possibly please take the (initial) lead and help me to develop
> this World University and School (WUaS) plan (and templates)  -
> https://docs.google.com/document/d/1n02XHzbTE8rY14p-ei6WSNtCQZ8AYscgwI6ncMXolpo/edit?usp=sharing
> (which Lydia suggested I write) - in Wikidata (which CC WUaS donated to CC
> Wikidata for its third birthday) - and inter-lingually? Thanks.
>
> Best regards,
> Scott
>
>
>
>
>
> On Tue, Nov 17, 2015 at 9:43 AM, Dimitris Kontokostas <jimk...@gmail.com>
> wrote:
>
>> Hi,
>>
>> probably not the right mailing list for this mail but not sure where to
>> ask :)
>>
>> Based on DBpedia dumps I created a script that identifies usage of
>> undefined templates in Wikipedia, most times due to spelling mistakes.
>>
>>
>> https://docs.google.com/spreadsheets/d/1_9szZwij4fJujiFUFcsndiDkT_XpTKlgKRHi5MHzRlA/edit#gid=38776559
>>
>> I was wondering if it makes sense to extend this script and
>>  - provide suggestions based in string similarity metrics
>>  - extend this in infobox properties and report properties that are not
>> defined in the template definitions (And also provide suggestions from
>> existing properties)
>>
>> I did create a one-time dump for all of the above for the Greek Wikipedia
>> 4-5 years ago but not sure if Wikipedia maintains this automatically now
>>
>> note that this is based on the Oct dump and might be a little out of date
>>
>> Cheers,
>> Dimitris
>>
>> --
>> Kontokostas Dimitris
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>>
>
>
> --
>
> - Scott MacLeod - Founder & President
> - Please donate to tax-exempt 501 (c) (3)
> - World University and School
> - via PayPal, or credit card, here -
> - http://worlduniversityandschool.org
> - or send checks to
> - 415 480 4577
> - PO Box 442, (86 Ridgecrest Road), Canyon, CA 94516
> - World University and School - like Wikipedia with best STEM-centric
> OpenCourseWare - incorporated as a nonprofit university and school in
> California, and is a U.S. 501 (c) (3) tax-exempt educational organization.
>
> World University and School is sending you this because of your interest
> in free, online, higher education. If you don't want to receive these,
> please reply with 'unsubscribe' in the body of the email, leaving the
> subject line intact. Thank you.
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>


-- 
Kontokostas Dimitris
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Blazegraph

2015-10-28 Thread Dimitris Kontokostas
On Tue, Oct 27, 2015 at 10:24 PM, Markus Krötzsch <
mar...@semantic-mediawiki.org> wrote:

> On 27.10.2015 15:34, Paul Houle wrote:
>
>> One thing I really liked about Kasabi was that it had a simple interface
>> for people to enter queries and share them with people.  The
>> "Information Workbench" from fluidOps does something similar although I
>> never seen it open to the public.  A database of queries also is a great
>> tool for testing both the code and the documentation,  both of the
>> reference and cookbook kind.
>>
>
> Have you had a look at http://wikidata.metaphacts.com/? It has some
> interesting data presentation/visualisation features that are tied in with
> a SPARQL endpoint over Wikidata (not sure if it is the same one now).
>
>
>> I see no reason why one instance of Blazegraph is having all the fun.
>> With a good RDF dump,  people should be loading Wikidata into all sorts
>> of triple stores and since Wikidata is not that terribly big at this
>> time,  "alternative" endpoints ought to be cheap and easy to run
>>
>
> Definitely. However, there is some infrastructural gap between loading a
> dump once in a while and providing a *live* query service. Unfortunately,
> there are no standard technologies that would routinely enable live updates
> of RDF stores, and Wikidata is rather low-tech when it comes to making its
> edits available to external tools. One could set up the code that is used
> to update query.wikidata.org (I am sure it's available somewhere), but
> it's still some extra work.
>

DBpedia Live has been doing that for some years now. The only thing that is
non-standard in DBpedia Live is the changeset format, but this is now
covered by LD Patch:
http://www.w3.org/TR/ldpatch/

At the moment DBpedia Live only produces the changesets that other servers
can consume.
The actual SPARQL endpoint is located on an OpenLink server, and we already
use the same model to feed and update an LDF endpoint (still in beta).
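
For illustration, a consumer mirroring the data can apply each changeset (a
pair of removed/added triple sets, which is what DBpedia Live publishes) with
plain SPARQL 1.1 Update; the triples below are made-up placeholders, not real
changeset content:

```
# Sketch: apply one changeset to a local mirror. A real changeset supplies the
# actual removed.nt / added.nt contents instead of these placeholder triples.
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>

DELETE DATA {
  dbr:Example_Article dbo:populationTotal "1000"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger> .
};
INSERT DATA {
  dbr:Example_Article dbo:populationTotal "1200"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger> .
}
```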


>
> Regards,
>
> Markus
>
>
>
>>
>>
>> On Mon, Oct 26, 2015 at 11:31 AM, Kingsley Idehen
>> > wrote:
>>
>> On 10/25/15 10:51 AM, James Heald wrote:
>>
>>> Hi Gerard.  Blazegraph is the name of the open-source SPARQL
>>> engine being used to provide the Wikidata SPARQL service.
>>>
>>> So Blazegraph **is** available to all of us, at
>>> https://query.wikidata.org/ , via
>>>
>>> both the query editor, and the SPARQL API endpoint.
>>>
>>> It's convenient to talk describe some issues with the SPARQL
>>> service being "Blazegraph issues", if the issues appear to lie
>>> with the query engine.
>>>
>>> Other query engines that other people be running might be running
>>> might have other specific issues, eg "Virtuoso issues".  But it is
>>> Blazegraph that the Discovery team and Wikidata have decided to go
>>> with.
>>>
>>
>> The beauty of SPARQL is that you can use URLs to show query results
>> (and even query definitions). Ultimately, engine aside, there is
>> massive utility in openly sharing queries and then determining what
>> might the real problem.
>>
>> Let's use open standards to work in as open a fashion as is possible.
>>
>> --
>> Regards,
>>
>> Kingsley Idehen
>> Founder & CEO
>> OpenLink Software
>> Company Web:http://www.openlinksw.com
>> Personal Weblog 1:http://kidehen.blogspot.com
>> Personal Weblog 2:http://www.openlinksw.com/blog/~kidehen
>> Twitter Profile:https://twitter.com/kidehen
>> Google+ Profile:https://plus.google.com/+KingsleyIdehen/about
>> LinkedIn Profile:http://www.linkedin.com/in/kidehen
>> Personal WebID:
>> http://kingsley.idehen.net/dataspace/person/kidehen#this
>>
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org 
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>>
>>
>>
>> --
>> Paul Houle
>>
>> *Applying Schemas for Natural Language Processing, Distributed Systems,
>> Classification and Text Mining and Data Lakes*
>>
>> (607) 539 6254paul.houle on Skype ontolo...@gmail.com
>> 
>>
>> :BaseKB -- Query Freebase Data With SPARQL
>> http://basekb.com/gold/
>>
>> Legal Entity Identifier Lookup
>> https://legalentityidentifier.info/lei/lookup/
>> 
>>
>> Join our Data Lakes group on LinkedIn
>> https://www.linkedin.com/grp/home?gid=8267275
>>
>>
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>



-- 
Kontokostas Dimitris
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata

Re: [Wikidata] DBpedia-based RDF dumps for Wikidata

2015-05-28 Thread Dimitris Kontokostas
Thank you Hugh,

This is definitely an area where we need further feedback from the
community.

Most of these links are DBpedia language links. The majority of the DBpedia
links are not dereferenceable and are based on the different DBpedia
language editions provided as RDF dumps only.

In this release we decided to provide them all for completeness but we are
very open to other suggestions.

Best,
Dimitris
On May 26, 2015 18:20, Hugh Glaser h...@glasers.org wrote:

 Thanks Dimitris - well done to the whole team.

 In case it helps anyone, I have brought up a sameAs store for the sameAs
 relations in this dataset alone:
 http://sameas.org/store/wikidata_dbpedia/

 In passing, it is interesting to note that the example URI,
 http://wikidata.dbpedia.org/resource/Q586 , has 110 sameAs URIs in this
 dataset alone.
 What price now the old view that everybody would use the same URIs for
 Things?!

 Best
 Hugh

  On 15 May 2015, at 11:28, Dimitris Kontokostas 
 kontokos...@informatik.uni-leipzig.de wrote:
 
  Dear all,
 
  Following up on the early prototype we announced earlier [1] we are
 happy to announce a consolidated Wikidata RDF dump based on DBpedia.
  (Disclaimer: this work is not related or affiliated with the official
 Wikidata RDF dumps)
 
  We provide:
   * sample data for preview http://wikidata.dbpedia.org/downloads/sample/
   * a complete dump with over 1 Billion triples:
 http://wikidata.dbpedia.org/downloads/20150330/
   * a  SPARQL endpoint: http://wikidata.dbpedia.org/sparql
   * a Linked Data interface: http://wikidata.dbpedia.org/resource/Q586
 
  Using the wikidata dump from March we were able to retrieve more than 1B
 triples, 8.5M typed things according to the DBpedia ontology along with 48M
 transitive types, 6.4M coordinates and 1.5M depictions. A complete report
 for this effort can be found here:
  http://svn.aksw.org/papers/2015/ISWC_Wikidata2DBpedia/public.pdf
 
  The extraction code is now fully integrated in the DBpedia Information
 Extraction Framework.
 
  We are eagerly waiting for your feedback and your help in improving the
 DBpedia to Wikidata mapping coverage
  http://mappings.dbpedia.org/server/ontology/wikidata/missing/
 
  Best,
 
  Ali Ismayilov, Dimitris Kontokostas, Sören Auer, Jens Lehmann, Sebastian
 Hellmann
 
  [1]
 http://www.mail-archive.com/dbpedia-discussion%40lists.sourceforge.net/msg06936.html
 
  --
  Dimitris Kontokostas
  Department of Computer Science, University of Leipzig  DBpedia
 Association
  Projects: http://dbpedia.org, http://aligned-project.eu
  Homepage: http://aksw.org/DimitrisKontokostas
  Research Group: http://aksw.org
 

 --
 Hugh Glaser
20 Portchester Rise
Eastleigh
SO50 4QS
 Mobile: +44 75 9533 4155, Home: +44 23 8061 5652




___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata-l] DBpedia-based RDF dumps for Wikidata

2015-05-15 Thread Dimitris Kontokostas
Dear all,

Following up on the early prototype we announced earlier [1] we are happy
to announce a consolidated Wikidata RDF dump based on DBpedia.
(Disclaimer: this work is not related or affiliated with the official
Wikidata RDF dumps)

We provide:
 * sample data for preview http://wikidata.dbpedia.org/downloads/sample/
 * a complete dump with over 1 Billion triples:
http://wikidata.dbpedia.org/downloads/20150330/
 * a  SPARQL endpoint: http://wikidata.dbpedia.org/sparql
 * a Linked Data interface: http://wikidata.dbpedia.org/resource/Q586

Using the wikidata dump from March we were able to retrieve more than 1B
triples, 8.5M typed things according to the DBpedia ontology along with 48M
transitive types, 6.4M coordinates and 1.5M depictions. A complete report
for this effort can be found here:
http://svn.aksw.org/papers/2015/ISWC_Wikidata2DBpedia/public.pdf
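
A first sketch query against the endpoint, counting instances per DBpedia
ontology class (large aggregates like this may hit the endpoint's result or
time limits):

```
# Sketch: how many typed things the dataset contains per DBpedia ontology class.
SELECT ?class (COUNT(?s) AS ?instances) WHERE {
  ?s a ?class .
  FILTER(STRSTARTS(STR(?class), "http://dbpedia.org/ontology/"))
}
GROUP BY ?class
ORDER BY DESC(?instances)
LIMIT 20
```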

The extraction code is now fully integrated in the DBpedia Information
Extraction Framework.

We are eagerly waiting for your feedback and your help in improving the
DBpedia to Wikidata mapping coverage
http://mappings.dbpedia.org/server/ontology/wikidata/missing/

Best,

Ali Ismayilov, Dimitris Kontokostas, Sören Auer, Jens Lehmann, Sebastian
Hellmann

[1]
http://www.mail-archive.com/dbpedia-discussion%40lists.sourceforge.net/msg06936.html

-- 
Dimitris Kontokostas
Department of Computer Science, University of Leipzig & DBpedia Association
Projects: http://dbpedia.org, http://aligned-project.eu
Homepage: http://aksw.org/DimitrisKontokostas
Research Group: http://aksw.org
___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] [Multimedia] Structured data on Commons update

2015-03-23 Thread Dimitris Kontokostas
In case this is interesting, DBpedia has also provided structured data in RDF
from Wikimedia Commons since last September:
http://commons.dbpedia.org/

The data is based on Commons dumps and thus is not fully up to date.

Best,
Dimitris

On Mon, Mar 23, 2015 at 12:42 PM, Emmanuel Engelhart 
emmanuel.engelh...@wikimedia.ch wrote:

 Hi

 Thank you Keegan for this report, even if this is a pretty sad one.

  Even if not directly involved in the structured data & Wikidata effort,
  we have many projects in Switzerland counting on a working structured
  data system on Commons.

 On 19.02.2015 21:11, Keegan Peterzell wrote:

 The meeting in Berlin[2] in October provided the engineering teams with
 a lot to start on. Unfortunately the Structured Data on Commons project
 was put on hold not too long after this meeting. Development of the
 actual Structured data system for Commons will not begin until more
 resources can be allocated to it.


 What kind of additional resources do you need? How much?

 Regards
 Emmanuel

 --
 Volunteer
 Technology, GLAM, Trainings
 Zurich
 +41 797 670 398

 ___
 Wikidata-l mailing list
 Wikidata-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata-l




-- 
Kontokostas Dimitris
___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-tech] Wikidata Query Backend Update (take two!)

2015-03-06 Thread Dimitris Kontokostas
On Mar 5, 2015 8:50 PM, Nikolas Everett never...@wikimedia.org wrote:

  TL/DR: We've selected BlazeGraph to back the next Wikidata Query Service.

 After Titan evaporated about a month ago we went back to the drawing
board on back ends for a new Wikidata Query Service.  We took four weeks
(including a planned trip to Berlin) to settle on a backend.  As you can see
from the spreadsheet we've really blown out the number of options.  As you
can also see we didn't finish filling them all out.  But we've still pretty
much settled on BlazeGraph anyway.  Let me first explain what BlazeGraph is
and then defend our decision to stop spreadsheet work.

 BlazeGraph is a GPLed RDF triple store that natively supports SPARQL 1.1,
RDFS, some OWL, and some extensions.  Those are all semantic web terms and
they translate into: it's a graph database with an expressive, mostly
standardized query language and support for inferring stuff as data is
added to and removed from the graph.  It also has some features that you'd
recognize from nice relational databases: join order rewriting, a smart query
planner, hash and nested loop joins, query rewrite rules, group by, order
by, and aggregate functions.

  These are all cool features - really the kind of things that we thought
we needed - but they come at an interesting price.  Semantic Web is a very
old thing that's had a really odd degree of success.  If you have an hour
and a half, Jim Hendler can explain it to you.  The upshot is that _tons_ of
people have _tons_ of opinions.  The W3C standardizes RDF, SPARQL, RDFS,
OWL, and about a billion other things.  There are (mostly non-W3C)
standards for talking about people, social connections, and music.  And
they all have rules.  And Wikidata doesn't.  Not like these rules.  One
thing I've learned from this project is that this lack of prescribed rules
is one of Wikidata's founding principles.  It's worth it to allow openness.
So you _can_ set gender to Bacon or put geocoordinates on Amber.
Anyway!  I argue that, at least for now, we should ignore many of these
standards.  We need to think of Wikidata Query Service as a tool to answer
questions instead of as some grand statement about the semantic web.
Mapping existing ontologies onto Wikidata is a task for another day.

  I feel like these semantic web technologies and BlazeGraph in particular
are good fits for this project mostly because the quality of our "but what
about X?" questions is very, very high.  "How much inference should we do
instead of query rewriting?" instead of "Can we do inference?  Can we do
query rewriting?"  And "Which standard vocabularies should we think about
mapping to Wikidata?"  Holy cow!  In any other system there aren't
standard vocabularies to even talk about mapping, much less a mechanism
for mapping them.  Much less two!  It's almost an overwhelming wealth and as
I allude to above it can be easy to bikeshed.

  We've been reasonably careful to reach out to people we know are familiar
with this space.  We're well aware of projects like the Wikidata Toolkit
and its RDF exports.  We've been using those for testing.  We've talked to
so many people about so many things.  It's really consumed a lot more time
than I'd expected and made the search for the next backend very long.  But
I feel comfortable that we're in a good place.  We don't know all the
answers but we're sure there _are_ answers.

 The BlazeGraph upstream has been super active with us.  They've spent
hours with us over hangouts, had me out to their office (a house an hour
and a half from mine) to talk about data modeling, and spent a ton of time
commenting on Phabricator tickets.  They've offered to donate a formal
support agreement as well.  And to get together with us about writing any
features we might need to add to BlazeGraph.  And they've added me as a
committer (I told them I had some typos to fix but I have yet to actually
commit them).  And their code is well documented.

 So by now you've realized I'm a fan.  I believe that we should stop on
the spreadsheet and just start work against BlazeGraph because I think we
have phenomenal momentum with upstream.  And it's a pretty clear winner on
the spreadsheet at this point.  But there are two other triple stores which
we haven't fully filled out that might be viable: OpenLink Virtuoso Open
Source and Apache Jena.  Virtuoso is open core so I'm really loath to go
too deep into it at this point.  Their HA features are not open source which
implies that we'd have trouble with them as an upstream.  Apache Jena just
isn't known to scale to data as large as BlazeGraph and Virtuoso.  So I
argue that these are systems that, in the unlikely event that BlazeGraph
goes the way of Titan, we should start our third round of investigation
against.  As it stands now I think we have a winner.

 We created a phabricator task with lots of children to run down our
remaining questions.  The biggest remaining questions revolve around three
areas:
 1.  Operational issues like how should the 

Re: [Wikidata-l] mapping template parameters using Wikidata?

2015-03-04 Thread Dimitris Kontokostas
On Wed, Mar 4, 2015 at 7:13 PM, Daniel Kinzler daniel.kinz...@wikimedia.de
wrote:

 On 04.03.2015 at 18:00, Ricordisamoa wrote:

  That's what Translatemplate https://tools.wmflabs.org/translatemplate/
 is for!
  (thanks to you and Daniel for the idea :-)
  It uses mwparserfromhell to parse DBpedia mappings and 'translate'
 templates
  in-place.
  I've added you to the service group so you can fiddle with it.
  The code got a bit hackish to work around a mwparserfromhell/Labs bug.
 If you're
  happy with the result we can publish it in Gerrit.

 Aren't the dbpedia template parameter mappings available in machine
 readable
 form somewhere? I mean, dbpedia is *using* them for extracting information
 somehow, right?

 Dimitris, can you enlighten us?


The framework has its own MW parser, and the design choice when the wiki
was implemented (a very long time ago, before I got involved) was to use
wiki markup as the native syntax.
To be honest it makes sense. This is what the users use to define mappings, we
can easily parse it, and there is no need for another intermediate format.
However, we are working on an RDF representation of the mappings that people
will be able to get from our mapping server API. But this will be like an
export functionality; the native syntax will still be the wiki markup.



 --
 Daniel Kinzler
 Senior Software Developer

 Wikimedia Deutschland
 Gesellschaft zur Förderung Freien Wissens e.V.

 ___
 Wikidata-l mailing list
 Wikidata-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata-l




-- 
Kontokostas Dimitris
___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] mapping template parameters using Wikidata?

2015-03-04 Thread Dimitris Kontokostas
Hi all,

What you can get from DBpedia is
1) template structure (all properties defined in a template)
I am not sure why this was not included in the 2014 release but you can see
an example in 3.9 [1]
Our parser cannot handle very complex templates but it is a good start.
I'll make sure these are included in the next release but it is also easy
to create a service that extracts them on request

2) mappings wiki
We are in the process of exporting our mappings in RDF using the [R2]RML
vocabulary. We have code that does that for simple mappings but it's not
ready to get merged yet.
Hopefully we'll have this soon and will be quite easy to query and join.
Even without that, we could get partial functionality by translating &
matching properties from #1.

3) mappings wiki (ontology)
links from ontology classes/properties to wikidata, at the moment they are
stored in our wiki but could be stored in Wikidata instead as Daniel
suggested.

Using all these and interlanguage links I think we can create a (decent)
service that can work. I can suggest a DBpedia gsoc project for this if
some people are willing to mentor a student [2].

What we would need from the Wikidata/DBpedia community is
1) more ontology links from DBpedia to Wikidata
2) contributions in the infobox mappings to cover more infoboxes for better
coverage

Best,
Dimitris


[1] http://downloads.dbpedia.org/3.9/en/template_parameters_en.ttl.bz2
[2] dbpedia.org/gsoc2015/ideas
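For (1), a quick way to see how the template parameters are modelled is to load
the dump from [1] and list the predicates it uses; a small sketch (the local
filename is a placeholder, and the Turtle/N-Triples format is an assumption):

    import bz2
    import rdflib

    # Local copy of the dump linked in [1]; the path is a placeholder.
    PATH = "template_parameters_en.ttl.bz2"

    g = rdflib.Graph()
    with bz2.open(PATH, "rt", encoding="utf-8") as f:
        # N-Triples is a subset of Turtle, so the turtle parser covers both.
        g.parse(data=f.read(), format="turtle")

    # List the distinct predicates to see how template parameters are modelled.
    for predicate in sorted(set(g.predicates())):
        print(predicate)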

On Tue, Mar 3, 2015 at 10:40 PM, Amir E. Aharoni 
amir.ahar...@mail.huji.ac.il wrote:

 Thanks, that's a step forward. Now the question is how to bring this all
 together.

 The context that interests me the most is translating an article in
 ContentTranslation. Let's go with an architect.[1] I am translating an
 article about an architect from English to Dutch, and it has {{Infobox
 architect}} at the top. How would ContentTranslation, a MediaWiki extension
 installed on the Wikimedia cluster, know that the "name" parameter is
 "naam" in Dutch?

 Currently, in theory, it would:
 1. Find that there's a corresponding infobox in Dutch using the
 interlanguage link:
 https://nl.wikipedia.org/wiki/Sjabloon:Infobox_architect
 2. Go to dbpedia and find the English template:
 http://mappings.dbpedia.org/index.php/Mapping_en:Infobox_architect
  3. Find that "name" is foaf:name
  4. Go to dbpedia and find the Dutch template:
  http://mappings.dbpedia.org/index.php/Mapping_nl:Infobox_architect
  5. Find that foaf:name is "naam"

 ... And then repeat steps 1 to 5 for each parameter.
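The core of that lookup is just two per-language dictionaries; a runnable
sketch (the dictionaries below are made up to mirror the example, and fetching
steps 1, 2 and 4 from the mappings wiki is left out):

    # Steps 3 and 5 as plain data plumbing; fetching the two mapping pages
    # (steps 1, 2 and 4) is left out, and the dictionaries below are made up.
    en_mapping = {"name": "foaf:name", "birth_date": "dbo:birthDate"}
    nl_mapping = {"naam": "foaf:name", "geboortedatum": "dbo:birthDate"}

    # Invert the Dutch mapping so an ontology property leads back to the local
    # parameter name.
    nl_by_property = {prop: param for param, prop in nl_mapping.items()}

    def translate_param(param_en):
        prop = en_mapping.get(param_en)      # step 3: "name" -> foaf:name
        return nl_by_property.get(prop)      # step 5: foaf:name -> "naam"

    print(translate_param("name"))           # prints: naam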

 Is this something that is possible to query now? (I'm not even talking
 about performance.)

 Even if it is possible to query it, is it good to be dependent on an
 external website for this? Maybe it makes sense to import the data from
 dbpedia to Wikidata? It's absolutely not a rhetorical question - maybe it
 is OK to use dbpedia.

 [1] {{Infobox cricketer}} exists in the Dutch Wikipedia, but doesn't appear
 in the Dutch mappings in dbpedia.


 --
 Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
 http://aharoni.wordpress.com
 ‪“We're living in pieces,
 I want to live in peace.” – T. Moore‬

 2015-03-03 20:39 GMT+02:00 Daniel Kinzler daniel.kinz...@wikimedia.de:

 On 03.03.2015 at 18:48, Amir E. Aharoni wrote:
  Trying again... It's a really important topic for me.
 
  How do I go about proposing storing information about template parameter
  mappings to the community? I kinda understand how Wikidata works, and it
  sounds like something that could be implemented using the current
  properties, but thoughts about moving this forward would be very welcome.

 Hi Amir!

 We had a call today with the dbPedia folks, about exactly this topic!

  The dbPedia mapping wiki[1] has this information, at least to some
  extent. Let's say you are looking at {{Cricketer Infobox}} on en. You can
  look up the DBpedia mappings for the template parameters on their mapping
  page[2]. There you can see that the "country" parameter maps to the
  country property in the dbpedia ontology[3], which in turn uses
  owl:equivalentProperty to cross-link P17[4].

 I assume this info is also available in machine readable form somewhere,
 but I
 don't know where offhand.

 Today we discussed that this mapping should also be available in the
 opposite
 direction: on Wikidata, you can use P1628 (equivalent property) to
 cross-reference the dbPedia ontology. I just added this info to the
 country
 property.
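That cross-link can be read back programmatically with the standard Wikibase
API; a minimal sketch (whether P1628 on P17 already points at the DBpedia
ontology is exactly what such a check should verify rather than assume):

    import requests

    API = "https://www.wikidata.org/w/api.php"
    # Ask for the "equivalent property" (P1628) statements on "country" (P17).
    params = {
        "action": "wbgetclaims",
        "entity": "P17",
        "property": "P1628",
        "format": "json",
    }
    claims = requests.get(API, params=params).json().get("claims", {})
    for claim in claims.get("P1628", []):
        # For URL-valued statements the datavalue is a plain string.
        print(claim["mainsnak"]["datavalue"]["value"])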

 let me know if this helps :)
 -- daniel

 [1] http://mappings.dbpedia.org/index.php/
 [2] http://mappings.dbpedia.org/index.php/Mapping_en:Cricketer_Infobox
 [3] http://mappings.dbpedia.org/index.php/OntologyProperty:Country
 [4] https://www.wikidata.org/wiki/Property:P17

 --
 Daniel Kinzler
 Senior Software Developer

 Wikimedia Deutschland
 Gesellschaft zur Förderung Freien Wissens e.V.

 ___
 Wikidata-l mailing list
 Wikidata-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] mapping template parameters using Wikidata?

2015-03-04 Thread Dimitris Kontokostas
On Wed, Mar 4, 2015 at 11:07 AM, Stas Malyshev smalys...@wikimedia.org
wrote:

 Hi!

  architect}} at the top. How would ContentTranslation, a MediaWiki
  extension installed on the Wikimedia cluster, know that the name
  parameter is naam in Dutch?

  "Name" would be a bit tricky since I'm not sure if we have a property
  called "name", but for something like "date of birth" wouldn't it be useful
  to link it in the template to
  https://www.wikidata.org/wiki/Property:P569 somehow? Is there such
 possibility?


In DBpedia we have our own properties and the mappings should use these
instead.
Some exceptions exist for very popular vocabularies such as foaf:name but I
am not sure if we should allow direct mappings to a wikidata property if an
equivalent DBpedia property exists.
In this case it's
http://mappings.dbpedia.org/index.php/OntologyProperty:BirthDate
We already have some mappings in place but more are needed for complete
coverage
http://mappings.dbpedia.org/index.php?title=Special%3ASearchsearch=wikidatago=Go


 With identifying properties, however - such as name - I'm not sure if
 this could be used.


I agree that general properties such as "name" are difficult to interpret.



  Even if it is possible to query it, is it good to be dependent on an
  external website for this? Maybe it makes sense to import the data from
  dbpedia to Wikidata? It's absolutely not a rhetorical question - maybe
  it is OK to use dbpedia.

 Well, in dbpedia it says name is foaf:name, but this could only be
 appropriate for humans (and maybe only in specific contexts), for other
 entities name may have completely different semantics. In Wikidata,
 however, properties are generic, so I wonder if it would be possible to
 keep context. dbPedia obviously does have context but not sure where it
 would be in Wikidata.


We could keep the context in DBpedia and, with proper inter-linking, do many
interesting things.

As we discussed yesterday, we could use DBpedia Live and check for
updated/stalled/missing values.
For example, if the previous values were the same in DBpedia/Wikipedia &
Wikidata and e.g. Wikipedia changes a value, we could trigger an update
alert, or if a new value such as deathDate is introduced that does not
exist in Wikidata.
DBpedia would use dbo:deathDate, but using the link to P570 we could allow
an agent to do the check.
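A rough sketch of such an agent for a single item (dbo:deathDate and P570 as
above; the endpoints, the example resource/item pair and everything else are
assumptions):

    import requests

    def dbpedia_death_date(resource):
        # dbo:deathDate for a DBpedia resource, via the public SPARQL endpoint.
        query = (
            "SELECT ?d WHERE { <http://dbpedia.org/resource/%s> "
            "<http://dbpedia.org/ontology/deathDate> ?d }" % resource
        )
        r = requests.get("http://dbpedia.org/sparql", params={
            "query": query, "format": "application/sparql-results+json"})
        rows = r.json()["results"]["bindings"]
        return rows[0]["d"]["value"] if rows else None

    def wikidata_death_date(qid):
        # P570 for the corresponding Wikidata item, via wbgetclaims.
        r = requests.get("https://www.wikidata.org/w/api.php", params={
            "action": "wbgetclaims", "entity": qid, "property": "P570",
            "format": "json"})
        claims = r.json().get("claims", {}).get("P570", [])
        return claims[0]["mainsnak"]["datavalue"]["value"]["time"] if claims else None

    # Example pair (hypothetical choice); a real agent would flag any mismatch.
    print(dbpedia_death_date("Douglas_Adams"), wikidata_death_date("Q42"))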



 --
 Stas Malyshev
 smalys...@wikimedia.org

 ___
 Wikidata-l mailing list
 Wikidata-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata-l




-- 
Kontokostas Dimitris
___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] mapping template parameters using Wikidata?

2015-03-04 Thread Dimitris Kontokostas
On Wed, Mar 4, 2015 at 1:31 PM, Andy Mabbett a...@pigsonthewing.org.uk
wrote:

 On 3 March 2015 at 18:39, Daniel Kinzler daniel.kinz...@wikimedia.de
 wrote:

   The dbPedia mapping wiki[1] has this information, at least to some
  extent. Let's say you are looking at {{Cricketer Infobox}} on en. You can
  look up the DBpedia mappings for the template parameters on their mapping
  page[2]. There you can see that the "country" parameter maps to the
  country property in the dbpedia ontology[3], which in turn uses
  owl:equivalentProperty to cross-link P17[4].

 Sounds good. We also have a problem on en.Wikipedia (and presumably
 elsewhere) of inconsistency in template parameter naming (latitude,
 lat and latd all mean the same thing, for example). Some templates
 even have to support multiple versions, for backwards compatibility.

 If as part of this exercise we could resolve that, it would be a
  bonus. Though I expect some resistance from those allergic to
  change...


If it helps, DBpedia can provide counts of property names per template.
We already use that to generate some mapping stats, e.g.
http://mappings.dbpedia.org/server/statistics/en/


 --
 Andy Mabbett
 @pigsonthewing
 http://pigsonthewing.org.uk

 ___
 Wikidata-l mailing list
 Wikidata-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata-l




-- 
Kontokostas Dimitris
___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] [Commons-l] [ANN] Experimental Wikimedia Commons RDF extraction with DBpedia

2014-07-30 Thread Dimitris Kontokostas
On Wed, Jul 30, 2014 at 9:11 AM, Luis Villa lvi...@wikimedia.org wrote:


 On Tue, Jul 29, 2014 at 7:32 PM, Gaurav Vaidya gau...@ggvaidya.com
 wrote:

  - This includes 363 license templates that indicate licensing for
 Commons files under public domain, Creative Commons and other open access
 licenses. These were created by bots and still require verification before
 use. They are listed at
 http://mappings.dbpedia.org/index.php/Category:Commons_media_license


 Interesting!


Good to hear that :)


 Is there documentation somewhere on how you ended up with those particular
 363 licenses? Failing that, a pointer at the relevant code would be welcome
 :)


This involved some manual work to gather the related templates and a bot to
import them in the DBpedia mappings wiki. See the following links for
details

https://commons.wikimedia.org/wiki/User:Gaurav/DBpedia/dcterms:license
https://github.com/gaurav/extraction-framework/issues/16
https://github.com/gaurav/extraction-framework/issues/18
https://github.com/gaurav/extraction-framework/issues/20
https://github.com/gaurav/extraction-framework/pull/30

The way we designed it with Gaurav, there is no need to code anything to
change an existing licence mapping or add a new one;
you just need to request editor rights for the DBpedia mappings wiki
(http://mappings.dbpedia.org).
Hard-coding this into code could probably give us more fine-grained control,
but it would be much harder to adjust.

Best,
Dimitris


 Thanks-
 Luis


 --
 Luis Villa
 Deputy General Counsel
 Wikimedia Foundation
 415.839.6885 ext. 6810

 *This message may be confidential or legally privileged. If you have
 received it by accident, please delete it and let us know about the
 mistake. As an attorney for the Wikimedia Foundation, for legal/ethical
 reasons I cannot give legal advice to, or serve as a lawyer for, community
 members, volunteers, or staff members in their personal capacity. For more
 on what this means, please see our legal disclaimer
 https://meta.wikimedia.org/wiki/Wikimedia_Legal_Disclaimer.*

 ___
 Wikidata-l mailing list
 Wikidata-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata-l




-- 
Kontokostas Dimitris
___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


[Wikidata-l] 2nd International DBpedia Community Meeting

2014-07-15 Thread Dimitris Kontokostas
We are still looking for sponsors for travel grants (as
well as coffee and food for the sessions). If you are interested in
sponsoring this meeting, please fill out this form to request more
information: http://goo.gl/LSwt4P

Provided we can acquire a sponsor, participants can apply for a travel
grant here: http://goo.gl/LSwt4P or email
Adrian.
These grants will be awarded depending on standing in the
community and community activity, e.g. Google Summer of Code
participation, Git commits to the DBpedia framework, activity on the
mailing lists, etc.

We hope to see you all in Leipzig: 
* Adrian Paschke (DBpedia German Chapter & FU Berlin)
* Harald Sack (DBpedia German Chapter & HPI Potsdam)
* Sebastian Hellmann (DBpedia Association & AKSW Leipzig)
* Heiko Ehrig (Neofonie)
* Dimitris Kontokostas (DBpedia Association & AKSW Leipzig)
* Magnus Knuth (DBpedia German Chapter & HPI Potsdam)
* Alexandru Tudor (DBpedia German Chapter & FU Berlin)



--
Dimitris Kontokostas
Department of Computer Science, University of Leipzig
Research Group: http://aksw.org
Homepage: http://aksw.org/DimitrisKontokostas


___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-tech] reviews needed for pubsubhubbub extension

2014-07-12 Thread Dimitris Kontokostas
On Thu, Jul 10, 2014 at 3:50 PM, Daniel Kinzler daniel.kinz...@wikimedia.de
 wrote:

 On 09.07.2014 19:39, Dimitris Kontokostas wrote:
  On Wed, Jul 9, 2014 at 6:13 PM, Daniel Kinzler
  daniel.kinz...@wikimedia.de wrote:
 
   On 09.07.2014 08:14, Dimitris Kontokostas wrote:
    Maybe I am biased with DBpedia, but by doing some experiments on English
    Wikipedia we found that the ideal OAI-PMH update interval was every ~5
    minutes.
    OAI aggregates multiple revisions of a page into a single edit,
    so when we ask "get me the items that changed in the last 5 minutes"
    we skip the processing of many minor edits.
    It looks like we lose this option with PubSubHubbub, right?
 
  I'm not quite positive on this point, but I think with PuSH, this is
 done by the
  hub. If the hub gets 20 notifications for the same resource in one
 minute, it
  will only grab and distribute the latest version, not all 20.
 
  But perhaps someone from the PuSH development team could confirm
 this.
 
 
   It'd be great if the dev team could confirm this.
   Besides push notifications, is polling an option in PuSH? I skimmed
   through the spec but couldn't find this.

 Yes. You can just poll the interface that the hub uses to fetch new data.


Thanks for the info Daniel

I'm waiting for the devs to confirm the revision merging, and I have one last
question / use case.

Since you'll sync to an external server (in Google, right?), did you set any
requirements on the durability of the changesets?
I mean, are the changes stored *forever* or did you set any TTL?
E.g. my application breaks for a week and I want to resume, or I download a
one-month-old dump and want to get in sync, etc.

In OAI-PMH I could, for instance, set the date to 15/01/2001 and get all
pages by modification date.
In PuSH this would require some sort of importing and is probably out of
the question, right? :)
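The OAI-PMH side of that is just the standard protocol parameters; a sketch
(the endpoint URL and metadata prefix are placeholders, while the verb and
from parameters are part of the OAI-PMH spec):

    import requests
    import xml.etree.ElementTree as ET

    OAI_ENDPOINT = "https://example.org/oai"   # placeholder for the wiki's OAI repository
    NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

    # Everything modified since a given date, exactly like the 15/01/2001 example.
    params = {
        "verb": "ListIdentifiers",
        "metadataPrefix": "oai_dc",   # placeholder; use whatever the repository offers
        "from": "2001-01-15",
    }
    tree = ET.fromstring(requests.get(OAI_ENDPOINT, params=params).content)
    for header in tree.iter("{http://www.openarchives.org/OAI/2.0/}header"):
        print(header.findtext("oai:identifier", namespaces=NS),
              header.findtext("oai:datestamp", namespaces=NS))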

Cheers,
Dimitris




 -- daniel

 --
 Daniel Kinzler
 Senior Software Developer

 Wikimedia Deutschland
 Gesellschaft zur Förderung Freien Wissens e.V.




-- 
Kontokostas Dimitris
___
Wikidata-tech mailing list
Wikidata-tech@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech


Re: [Wikidata-tech] reviews needed for pubsubhubbub extension

2014-07-09 Thread Dimitris Kontokostas
Hi,

Is it easy to summarize the added value (or supported use cases) of switching
to PubSubHubbub?
The edit stream in Wikidata is so huge that I can hardly think of anyone
wanting to be in *real-time* sync with Wikidata.
With 20 p/s, their infrastructure should be pretty scalable so as not to break.

Maybe I am biased with DBpedia, but by doing some experiments on English
Wikipedia we found that the ideal OAI-PMH update interval was every ~5
minutes.
OAI aggregates multiple revisions of a page into a single edit,
so when we ask "get me the items that changed in the last 5 minutes" we skip
the processing of many minor edits.

It looks like we lose this option with PubSubHubbub, right?
As we already asked before, does PubSubHubbub support mirroring a Wikidata
clone? The OAI-PMH extension has this option.

Best,
Dimitris




On Tue, Jul 8, 2014 at 11:31 AM, Daniel Kinzler daniel.kinz...@wikimedia.de
 wrote:

 Replying to myself because I forgot to mention an important detail:

  On 08.07.2014 10:22, Daniel Kinzler wrote:
   On 08.07.2014 01:46, Rob Lanphier wrote:
  On Fri, Jul 4, 2014 at 7:16 AM, Lydia Pintscher 
 lydia.pintsc...@wikimedia.de
  ...
  Hi Lydia,
 
  Thanks for providing the basic overview of this.  Could you (or someone
 on the
  team) provide an explanation about how you would like this to be
 configured on
  the Wikimedia cluster?
 
  We'd like to enable it just on Wikidata at first, but I see no reason
 not to
  enable it for all projects if that goes well.
 
  The PubSubHubbub (PuSH) extension would be configured to push
 notifications to
  the Google hub (two per edit). The hub then notifies any subscribers via
  their callback URLs.

  We need a proxy to be set up to allow the app servers to talk to the
  Google hub.
  If this is deployed at full scale, we expect in excess of 20 POST requests
 per
 second (two per edit), plus up to the same number (but probably fewer) of
 GET
 requests coming back from the hub, asking for the full page content of
 every
 page changed, as XML export, from a special page interface similar to
 Special:Export. This would probably bypass the web cache.

 PubSubHubbub is nice and simple, but it's really designed for news feeds,
 not
 for versioned content of massive collaborative sites. It works, but it's
 not as
 efficient as we could wish.

 --
 Daniel Kinzler
 Senior Software Developer

 Wikimedia Deutschland
 Gesellschaft zur Förderung Freien Wissens e.V.

 ___
 Wikidata-tech mailing list
 Wikidata-tech@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata-tech




-- 
Kontokostas Dimitris
___
Wikidata-tech mailing list
Wikidata-tech@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech


Re: [Wikidata-tech] reviews needed for pubsubhubbub extension

2014-07-09 Thread Dimitris Kontokostas
On Wed, Jul 9, 2014 at 6:13 PM, Daniel Kinzler daniel.kinz...@wikimedia.de
wrote:

 On 09.07.2014 08:14, Dimitris Kontokostas wrote:
  Hi,
 
  Is it easy to summarize the added value (or supported use cases) of switching
  to PubSubHubbub?

 * It's easier to handle than OAI, because it uses the standard dump format.
 * It's also push-based, avoiding constant polling on small wikis.
 * The OAI extension has been deprecated for a long time now.

  The edit stream in Wikidata is so huge that I can hardly think of anyone
  wanting to be in *real-time* sync with Wikidata.
  With 20 p/s, their infrastructure should be pretty scalable so as not to break.

 The push aspect is probably most useful for small wikis. It's true, for
 large
 wikis, you could just poll, since you would hardly ever poll in vain.

  It would be very nice if the sync could be filtered by namespace,
  category, etc.
  But PubSubHubbub (I'll use PuSH from now on) doesn't really support
 this, sadly.

   Maybe I am biased with DBpedia, but by doing some experiments on English
   Wikipedia we found that the ideal OAI-PMH update interval was every ~5
   minutes.
   OAI aggregates multiple revisions of a page into a single edit,
   so when we ask "get me the items that changed in the last 5 minutes" we
   skip the processing of many minor edits.
   It looks like we lose this option with PubSubHubbub, right?

 I'm not quite positive on this point, but I think with PuSH, this is done
 by the
 hub. If the hub gets 20 notifications for the same resource in one minute,
 it
 will only grab and distribute the latest version, not all 20.

 But perhaps someone from the PuSH development team could confirm this.


It'd be great if the dev team could confirm this.
Besides push notifications, is polling an option in PuSH? I skimmed through
the spec but couldn't find this.



   As we already asked before, does PubSubHubbub support mirroring a
   Wikidata
   clone? The OAI-PMH extension has this option.

  Yes, there is a client extension for PuSH, allowing for seamless
  replication of
 one wiki into another, including creation and deletion (I don't know about
 moves/renames).

 --
 Daniel Kinzler
 Senior Software Developer

 Wikimedia Deutschland
 Gesellschaft zur Förderung Freien Wissens e.V.




-- 
Kontokostas Dimitris
___
Wikidata-tech mailing list
Wikidata-tech@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech


Re: [Wikidata-l] Making Wikidata entries at the time of 'Article for Creation' publication

2014-03-12 Thread Dimitris Kontokostas
On Wed, Mar 12, 2014 at 5:00 PM, David Cuenca dacu...@gmail.com wrote:

 On Wed, Mar 12, 2014 at 2:52 PM, Andy Mabbett 
 a...@pigsonthewing.org.ukwrote:

  I'm not suggesting that we make people enter even more information in
 Wikipedia; I'm suggesting that wikidata would benefit from capturing
 the data that is /already/ being entered into Wikipedia, not least via
 AfC, by the people I describe above; and that I and others who review
 and publish those articles would benefit from tool to save us the
 manual task of having to retype (into Wikidata) what we're already
 asked to type once (into the AfC tool) as part of that process.


 We cannot get there yet, since we depend on many features still in
 development:
 1.- Simple data editing from VisualEditor
 2.- Easy way to map wikipedia template fields to wikidata properties


Actually, there is a lot of work done in this regard.
We map Wikipedia templates to the DBpedia ontology [1] and have already started
marking equivalent properties to Wikidata (see [2], [3]).

Best,
Dimitris

[1] http://mappings.dbpedia.org/index.php/Main_Page
[2] http://mappings.dbpedia.org/index.php/OntologyProperty:Family
[3]
https://github.com/dbpedia/extraction-framework/wiki/GSOC2013_Progress_Hady-Elsahar


 3.- Migration of main infobox templates to make use of Wikidata

 There are many needed features still not done, plus some more, which take
 a long time to discuss, implement, and test. Of course when all that is
 available then you should be able to have an infobox selection wizard
 (possibly based on this structure [1]), and then by editing the fields on
 VisualEditor the data would be automatically filled on Wikidata.

 As said, it sounds easy, but this has many prerequisites that are still
 not met.

 My only advice: patience :-)

 And if you want meanwhile you can help with the infobox mappings:
 https://www.wikidata.org/wiki/Wikidata:Infobox_mappings

 Cheers,
 Micru

 [1] https://en.wikipedia.org/wiki/Wikipedia:List_of_infoboxes

 ___
 Wikidata-l mailing list
 Wikidata-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata-l




-- 
Kontokostas Dimitris
___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] ontology Wikidata API, managing ontology structure and evolutions

2014-01-09 Thread Dimitris Kontokostas
What about monthly/dump-based aggregated property usage statistics?
People would be able to check property trends or maybe subscribe to
specific properties via RSS.
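Such a dump-based counter is not much code; a sketch against the Wikidata JSON
dump layout (the filename is a placeholder, and the one-entity-per-line layout
is an assumption worth checking against the current dumps):

    import bz2
    import json
    from collections import Counter

    DUMP = "wikidata-latest-all.json.bz2"   # placeholder path to a downloaded dump

    usage = Counter()
    with bz2.open(DUMP, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip().rstrip(",")
            if line in ("[", "]") or not line:
                continue
            entity = json.loads(line)
            # Count one use per property per entity; monthly runs give the trend.
            usage.update(entity.get("claims", {}).keys())

    for prop, count in usage.most_common(20):
        print(prop, count)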



On Thu, Jan 9, 2014 at 3:55 PM, Daniel Kinzler
daniel.kinz...@wikimedia.dewrote:

  On 08.01.2014 16:20, Thomas Douillard wrote:
   Hi, a problem seems (not very surprisingly) to be emerging on Wikidata: the
   managing of the evolution of how we do things on Wikidata.

   Properties are deleted, which makes some consumers of the data sometimes
   a little frustrated that they are not informed of it and could not take
   part in the discussion.

 They are informed if they follow the relevant channels. There's no way to
 inform
 them if they don't. These channels can very likely be improved, yes.

 That being said: a property that is still widely used should very rarely be
 deleted, if at all. Usually, properties would be phased out by replacing
 them
 with another property, and only then they get deleted.

 Of course, 3rd parties that rely on specific properties would still face
 the
 problem that the property they use is simply no longer used (that's the
 actual
 problem - whether it is deleted doesn't really matter, I think).

  So, the question is really: how should 3rd party users be notified of
  changes in
  policy and best practice regarding the usage and meaning of properties?

 That's an interesting question, one that doesn't have a technical solution
 I can
 see.

 -- daniel


 --
 Daniel Kinzler
 Senior Software Developer

 Wikimedia Deutschland
 Gesellschaft zur Förderung Freien Wissens e.V.

 ___
 Wikidata-l mailing list
 Wikidata-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata-l




-- 
Dimitris Kontokostas
Department of Computer Science, University of Leipzig
Research Group: http://aksw.org
Homepage: http://aksw.org/DimitrisKontokostas
___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] Cooperation between Wikidata and DBpedia and Wikipedia

2013-08-26 Thread Dimitris Kontokostas
Speaking from DBpedia (not on its behalf), we have always been trying to find
ways to contribute data back to Wikipedia, and if licencing is the only
issue here I am sure we can make any necessary arrangements.

IMHO the main scepticism so far was trust from the Wikipedia community to
load data in bulk.
However, there are datasets of very high quality in DBpedia that could be
used for that purpose.

Recently we have been experimenting with an alternative where people can
manually import single facts into Wikidata from the DBpedia resource interface.
Here's a recent talk about this [1] on the tech list. Any comments on that
are also welcome.

Best,
Dimitris

[1]
http://lists.wikimedia.org/pipermail/wikidata-tech/2013-August/000189.html



On Fri, Aug 23, 2013 at 6:18 PM, David Cuenca dacu...@gmail.com wrote:

 Hi, I will answer your questions with more questions...

 On Fri, Aug 23, 2013 at 10:58 AM, Gerard Meijssen 
 gerard.meijs...@gmail.com wrote:

 Hoi,

 The questions are:

- would we advance a lot when we adopt the DBpedia schema as it is?

 Which schema? All of them? Some? Article classification? Infobox
 extraction? Wikidata is going to be linked to the infoboxes in Wikipedia,
 so the priority is to support those needs, not to replicate any schema.





- Would we be open to include substantially more data?

 Which data? All of it? What is the reliability?




- When we adopt the schema, can we tinker with it to suit our needs?

  Again, could you please give some example of what to import and how
  it should be adapted?


 If the answers to these questions are yes, what is the point in
 procrastinating???


  Do we already have all the datatypes that would be needed? Most of the
  properties that are missing are missing because of the lack of value types or others.





  One other big thing about DBpedia is that it is connected to many external
  resources. This will make it possible to verify our data against these
  other sources. This is IMHO the more important thing to do with the time of
  our volunteers. Doing the things that have already been done is a waste of
  time.


 The thing is that if those resources already are in dbpedia, we can just
 use dbpedia as a bridge, that is how linked data is supposed to be... no
 need to replicate everything, but of course, if it is worth replicating, we
  can go through it case by case.

 Micru

 ___
 Wikidata-l mailing list
 Wikidata-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata-l




-- 
Dimitris Kontokostas
Department of Computer Science, University of Leipzig
Research Group: http://aksw.org
Homepage: http://aksw.org/DimitrisKontokostas
___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] Question about wikipedia categories.

2013-05-07 Thread Dimitris Kontokostas
On Tue, May 7, 2013 at 7:40 PM, Jane Darnell jane...@gmail.com wrote:

 What is interesting about categories, is that no matter how shaky the
 system is, these are pretty much the only meta data that there is for
 articles, because as I said before, just about every article has one.
 The weakness of DBpedia is that it is only programmed to pick up
 articles with infoboxes, and there just aren't that many of those.


That is not true, actually. DBpedia picks up (almost) everything except for
talk & user pages.



 2013/5/7, Michael Hale hale.michael...@live.com:
  Pardon the spam, but it is only 2000 categories. Four steps would be
 25000.
 
  From: hale.michael...@live.com
  To: wikidata-l@lists.wikimedia.org
  Date: Tue, 7 May 2013 12:10:51 -0400
  Subject: Re: [Wikidata-l] Question about wikipedia categories.
 
 
 
 
  I spoke too soon. That is the only loop at two steps. But if you go out
  three steps (25000 categories) you find another 23 loops. Organizational
  studies - organizations, housing - household behavior and family
  economics - home - housing, religious pluralism - religious
 persecution,
  secularism - religious pluralism, learning - inductive reasoning -
  scientific theories - sociological theories - social systems -
 society -
  education - learning, etc.
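For the curious, here is a sketch of how such loops can be found with the
standard MediaWiki API (a depth-limited walk over parent categories; the
starting category and the depth are arbitrary choices):

    import requests

    API = "https://en.wikipedia.org/w/api.php"

    def parent_categories(title):
        # Direct parent categories of a page or category, via the standard API.
        r = requests.get(API, params={
            "action": "query", "prop": "categories", "titles": title,
            "cllimit": "max", "format": "json"})
        pages = r.json()["query"]["pages"]
        cats = next(iter(pages.values())).get("categories", [])
        return [c["title"] for c in cats]

    def find_loops(start, depth=3):
        # Walk up to `depth` steps and report any path that returns to a
        # category already on it -- the "own grandfather" case discussed above.
        loops, stack = [], [(start, [start])]
        while stack:
            node, path = stack.pop()
            if len(path) > depth:
                continue
            for parent in parent_categories(node):
                if parent in path:
                    loops.append(path + [parent])
                else:
                    stack.append((parent, path + [parent]))
        return loops

    print(find_loops("Category:Knowledge", depth=2))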
 
   From: hale.michael...@live.com
   To: wikidata-l@lists.wikimedia.org
   Date: Tue, 7 May 2013 11:31:24 -0400
   Subject: Re: [Wikidata-l] Question about wikipedia categories.

   I don't know if these are useful, but if we go two steps from the
   fundamental categories on the English Wikipedia we find several loops.
   Knowledge contains information and information contains knowledge, for
   example. Not allowing loops might force you to have to give different
   ranks to two categories that are equally important.

   Date: Tue, 7 May 2013 16:41:45 +0200
   From: hellm...@informatik.uni-leipzig.de
   To: wikidata-l@lists.wikimedia.org
   Subject: Re: [Wikidata-l] Question about wikipedia categories.

   On 07.05.2013 14:01, emw wrote:

    Yes, there is and should be more than one ontology, and that is
    already the case with categories, which are so flexible they can
    loop around and become their own grandfather.

    Can someone give an example of where it would be useful to have
    a cycle in an ontology?

   Navigation! How else are you going to find back where you came from ;)
   Wikipedia categories were invented originally for navigation,
   right? Cycles are not so bad, then...
   Now we live in a new era.

   -- Sebastian

    To my knowledge cycles are considered a problem in
    categorization, and would be a problem in a large-scale
    ontology-based classification system as well. My impression has
    been that Wikidata's ontology would be a directed acyclic graph
    (DAG) with a single root at entity (thing).

    On Tue, May 7, 2013 at 3:03 AM, Mathieu Stumpf
    psychosl...@culture-libre.org wrote:

     On 2013-05-06 18:13, Jane Darnell wrote:

      Yes, there is and should be more than one ontology,
      and that is already the case with categories, which are so
      flexible they can loop around and become their own grandfather.

     To my mind, categories indeed fit better how we think. I'm
     not sure grandfather is a canonical term in such a graph;
     I think it's simply a cycle[1].

     [1] https://en.wikipedia.org/wiki/Cycle_%28graph_theory%29

      Dbpedia complaints should be discussed on that list, I
      am not a dbpedia user, though I think it's a useful project to
      have around.

     Sorry, I didn't want to make off-topic messages, nor sound
     complaining. I just wanted to give my feedback, hopefully a
     constructive one, on a message posted on this list. I
     transferred my message to the dbpedia mailing list.

      Sent from my iPad

      On May 6, 2013, at 12:00 PM, Jona Christopher
      Sahnwaldt j...@sahnwaldt.de wrote:

       Hi Mathieu,

       I think the DBpedia mailing list is a better place
       for discussing the DBpedia ontology:
       https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion

       Drop us a message if you have questions or concerns.
       I'm sure someone will answer your questions. I am not an ontology
       expert, so I'll just

Re: [Wikidata-l] WikiData change propagation for third parties

2013-04-26 Thread Dimitris Kontokostas
Hi Daniel,

On Fri, Apr 26, 2013 at 6:15 PM, Daniel Kinzler daniel.kinz...@wikimedia.de
 wrote:

 On 26.04.2013 16:56, Denny Vrandečić wrote:
  The third party propagation is not very high on our priority list. Not
 because
  it is not important, but because there are things that are even more
 important -
  like getting it to work for Wikipedia :) And this seems to be
 stabilizing.
 
  What we have, for now:
 
  * We have the broadcast of all edits through IRC.

 This interface is quite unreliable, the output can't be parsed in an
 unambiguous
 way, and may get truncated. I did implement notifications via XMPP several
 years
 ago, but it never went beyond a proof of concept. Have a look at the XMLRC
 extension if you are interested.

  * One could poll recent changes, but with 200-450 edits per minute, this
 might
  get problematic.

 Well, polling isn't really the problem, fetching all the content is. And
 you'd
 need to do that no matter how you get the information of what has changed.

  * We do have the OAIRepository extension installed on Wikidata. Did
 anyone try that?

 In principle that is a decent update interface, but I'd recommend not to
 use OAI
  before we have implemented feature 47714 (Support RDF and API
 serializations
  of entity data via OAI-PMH). Right now, what you'd get from there would
 be our
 *internal* JSON representation, which is different from what the API
 returns,
 and may change at any time without notice.


What we do right now in DBpedia Live is that we have a local clone of
Wikipedia that gets in sync using the OAIRepository extension. This is
done so we can abuse our local copy as we please.

The local copy also publishes updates with OAI-PMH that we use to get the
list of modified page IDs. Once we get the page IDs, we use the normal
MediaWiki API to fetch the actual page content.
So, feature 47714 should not be a problem in our case since we don't need
the data serialized directly from OAI-PMH.
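The second half of that loop (page IDs in, page content out) looks roughly
like the following sketch (the target wiki is a placeholder; the content fetch
is the standard MediaWiki API, and the IDs would come from the OAI-PMH feed):

    import requests

    WIKI_API = "https://en.wikipedia.org/w/api.php"   # placeholder target wiki

    def fetch_pages(page_ids):
        # Fetch current wikitext for a batch of modified page IDs.
        r = requests.get(WIKI_API, params={
            "action": "query",
            "prop": "revisions",
            "rvprop": "content",
            "pageids": "|".join(str(p) for p in page_ids),
            "format": "json",
        })
        pages = r.json()["query"]["pages"]
        return {pid: page["revisions"][0]["*"] for pid, page in pages.items()
                if "revisions" in page}

    # page_ids would normally come from the OAI-PMH update feed described above.
    print(list(fetch_pages([736]).keys()))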

Cheers,
Dimitris



 -- daniel

 --
 Daniel Kinzler, Softwarearchitekt
 Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.


 ___
 Wikidata-l mailing list
 Wikidata-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata-l




-- 
Kontokostas Dimitris
___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


[Wikidata-l] DBpedia+Spotlight accepted @ Google Summer of Code 2013

2013-04-10 Thread Dimitris Kontokostas
[Apologies for cross-posting]

Dear fellow DBpedians,

I am very excited to announce that DBpedia and DBpedia Spotlight have again
been selected for the Google Summer of Code 2013!!!

If you know energetic students (BSc,MSc,PhD) interested in working with
DBpedia, text processing, and semantics, please encourage them to apply!

More details can also be found on the blog post here:
http://blog.dbpedia.org/2013/04/10/dbpediaspotlight-accepted-google-summer-of-code-2013/

On behalf of the DBpedia GSoC team,
Dimitris Kontokostas

-- 
Dimitris Kontokostas
Department of Computer Science, University of Leipzig
Research Group: http://aksw.org
Homepage: http://aksw.org/DimitrisKontokostas
___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l