Hoi,
As indicated by the DBpedia people, there are two ways in which data gets
into their latest Fusion offering. There is consensus, where all the
available sources agree, and there is the case where one source is deemed
authoritative. Remember, DBpedia uses sources outside of the Wikimedia
movement, like national libraries!
What I miss in your paper is purpose: what is the way forward, and how does
it compare with and improve on current practice? Current practice is that
people import data from anywhere; typically it is single-sourced, if sourced
at all, and it includes the human error that is inherent in a manual
process. The DBpedia folks have a WMF-sponsored project whereby they
facilitate the inclusion of data in Wikidata. Particularly where there is
consensus (no opposing sources), this is an improvement on current practice
and it nicely complements the existing Wikidata content. The content where
there is NO consensus is useful because it highlights where errors occur.
It will really help in finding false friends.
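To make the distinction concrete, here is a minimal sketch of how I picture
the split between consensus data and dissenting data. It is my own
illustration, not DBpedia's actual Fusion code: the function name, the source
labels and the sample claims are all made up, and it assumes each source
gives at most one value per subject and predicate.

```python
from collections import defaultdict


def fuse(claims, authoritative=None):
    """Sort claims into accepted values, conflicts and single-sourced values.

    claims: iterable of (subject, predicate, value, source) tuples.
    authoritative: optional source name (hypothetical) whose value always wins.
    """
    # Group the values by (subject, predicate), remembering who said what.
    by_key = defaultdict(dict)
    for subject, predicate, value, source in claims:
        by_key[(subject, predicate)][source] = value

    accepted, conflicts, single_sourced = {}, {}, {}
    for key, by_source in by_key.items():
        values = set(by_source.values())
        if authoritative in by_source:
            # One source is deemed authoritative: take its value outright.
            accepted[key] = by_source[authoritative]
        elif len(by_source) < 2:
            # Only one source: no consensus possible yet, set it aside.
            single_sourced[key] = dict(by_source)
        elif len(values) == 1:
            # Consensus: several sources, no opposing values.
            accepted[key] = values.pop()
        else:
            # Dissent: keep every value so a human can curate the difference.
            conflicts[key] = dict(by_source)
    return accepted, conflicts, single_sourced


# Made-up sample claims, for illustration only.
claims = [
    ("item:A", "birth_year", "1950", "wikidata"),
    ("item:A", "birth_year", "1950", "dbpedia"),
    ("item:B", "birth_year", "1960", "wikidata"),
    ("item:B", "birth_year", "1961", "freebase"),
    ("item:C", "birth_year", "1970", "freebase"),
]
accepted, conflicts, single_sourced = fuse(claims)
print(accepted)        # values that can be imported without discussion
print(conflicts)       # values that need curation
print(single_sourced)  # values to revisit at a later date
```

In this sketch item:A would be imported, item:B would be flagged for
curation, and item:C would wait until a second source turns up.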
The Freebase data has been abandoned. It did not get the respect it
deserved, particularly as at the time its quality was better than Wikidata's.
The fact that it is dated IS a saving grace, because Wikidata/Wikipedia is
particularly strong on content related to the period of Wikipedia
activity. My preferred way of treating the Freebase data is fusing it in
the Fusion project. All the data that is new or expands on what is known in
Fusion is of relevance. Given that no maintenance is done on the Freebase
data, the dissenting data can at best be used for curating what is in the
WMF projects.
In your paper you support the notion of harvesting based on single sources.
Maybe at a later date. First we need to integrate the uncontroversial data,
the data where there is a consensus in multiple projects. The biggest
benefit will be that a lot of make-work is prevented: work done only because
the data just did not get into Wikidata.
Thanks,
GerardM
On Tue, 1 Oct 2019 at 01:14, Denny Vrandečić <[email protected]> wrote:
> Hi all,
>
> as promised, now that I am back from my trip, here's my draft of the
> comparison of Wikidata, DBpedia, and Freebase.
>
> It is a draft, it is obviously potentially biased given my background,
> etc., but I hope that we can work on it together to get it into a good
> shape.
>
> Markus, amusingly I took pretty much the same example that you went for,
> the parent predicate. So yes, I was also surprised by the results, and
> would love to have Sebastian or Kingsley look into it and see if I
> conducted it fairly.
>
> SJ, Andra, thanks for offering to take a look. I am sure you all can
> contribute your own unique background and make suggestions on how to
> improve things and whether the results ring true.
>
> Marco, I totally agree with what you said - the project has stalled, and
> there is plenty of opportunity to harvest more data from Freebase and bring
> it to Wikidata, and this should be reignited. Sebastian, I also agree with
> you, and the numbers do so too, the same is true with the extraction
> results from DBpedia.
>
> Sebastian, Kingsley, I tried to describe how I understand DBpedia, and all
> steps should be reproducible. As it seems that the two of you also have to
> discuss one or the other thing about DBpedia's identity, I am relieved that
> my confusion is not entirely unjustified. So I tried to use both the last
> stable DBpedia release as well as a new-style DBpedia fusion dataset for
> the comparison. But I might have gotten the whole procedure wrong. I am
> happy to be corrected.
>
> On Sat, Sep 28, 2019 at 12:28 AM <[email protected]>
> wrote:
>
>> Meanwhile, Google crawls all the references and extracts facts from
>> there. We don't have that available, but there is Linked Open Data.
>
> Potentially, not a bad idea, but we don't do that.
>
> Everyone, this is the first time I share a Colab notebook, and I have no
> idea if I did it right. So any feedback of the form "oh you didn't switch
> on that bit over here" or "yes, this works, thank you" is very welcome,
> because I have no clue what I am doing :) Also, I never did this kind of
> analysis so transparently, which is kinda both totally cool and rather
> scary, because now you can all see how dumb I am :)
>
> So everyone is invited to send Pull Requests (I guess that's how this
> works?), and I would love for us to create a result together that we agree
> on. I see the result of this exercise to be potentially twofold:
>
> 1) a publication we can point people to who ask about the differences
> between Wikidata, DBpedia, and Freebase
>
> 2) to reignite or start projects and processes to reduce these differences
>
> So, here is the link to my Colab notebook:
>
>
> https://github.com/vrandezo/colabs/blob/master/Comparing_coverage_and_accuracy_of_DBpedia%2C_Freebase%2C_and_Wikidata_for_the_parent_predicate.ipynb
>
> Ideally, the third goal could be to get to a deeper understanding of how
> these three projects relate to each other - in my point of view, Freebase
> is dead and outdated, Wikidata is the core knowledge base that anyone can
> edit, and DBpedia is the core project to weave value-adding workflows on
> top of Wikidata or other datasets from the linked open data cloud together.
> But that's just a proposal.
>
> Cheers,
> Denny
>
>
>
> On Sat, Sep 28, 2019 at 12:28 AM <[email protected]>
> wrote:
>
>> Hi Gerard,
>>
>> I was not trying to judge here. I was just saying that it wasn't much
>> data in the end.
>> For me Freebase was basically cherry-picked.
>>
>> Meanwhile, the data we extract is more pertinent to the goal of having
>> Wikidata cover the info boxes. We still have ~ 500 million statements left.
>> But none of it is used yet. Hopefully we can change that.
>>
>> Meanwhile, Google crawls all the references and extracts facts from
>> there. We don't have that available, but there is Linked Open Data.
>>
>> --
>> Sebastian
>>
>> On September 27, 2019 5:26:43 PM GMT+02:00, Gerard Meijssen <
>> [email protected]> wrote:
>>>
>>> Hoi,
>>> I totally reject the assertion that it was so bad. I have always had the
>>> opinion that the main issue was an atrocious user interface. Add to this
>>> the people that have Wikipedia notions about quality. They have had, and
>>> still have, a detrimental effect on both the quantity and quality of
>>> Wikidata.
>>>
>>> When you add the functionality that is being built by the data wranglers
>>> at DBpedia, it becomes easy/easier to compare the data from Wikipedias with
>>> Wikidata (and why not Freebase), add what has consensus and curate the
>>> differences. This will enable a true data sense of quality and allow us to
>>> provide a much improved service.
>>> Thanks,
>>> GerardM
>>>
>>> On Fri, 27 Sep 2019 at 15:54, Marco Fossati <[email protected]>
>>> wrote:
>>>
>>>> Hey Sebastian,
>>>>
>>>> On 9/20/19 10:22 AM, Sebastian Hellmann wrote:
>>>> > Not much of Freebase did end up in Wikidata.
>>>>
>>>> Dropping here some pointers to shed light on the migration of Freebase
>>>> to Wikidata, since I was partially involved in the process:
>>>> 1. WikiProject [1];
>>>> 2. the paper behind [2];
>>>> 3. datasets to be migrated [3].
>>>>
>>>> I can confirm that the migration has stalled: as of today, *528
>>>> thousand* Freebase statements have been curated by the community, out
>>>> of *10 million*. By 'curated', I mean approved or rejected.
>>>> These numbers come from two queries against the primary sources tool
>>>> database.
>>>>
>>>> The stall is due to several causes: in my opinion, the most important
>>>> one was the bad quality of sources [4,5] coming from the Knowledge
>>>> Vault project [6].
>>>>
>>>> Cheers,
>>>>
>>>> Marco
>>>>
>>>> [1] https://www.wikidata.org/wiki/Wikidata:WikiProject_Freebase
>>>> [2] http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/44818.pdf
>>>> [3] https://www.wikidata.org/wiki/Wikidata:Primary_sources_tool/Version_1#Data
>>>> [4] https://www.wikidata.org/wiki/Wikidata_talk:Primary_sources_tool/Archive/2017#Quality_of_sources
>>>> [5] https://www.wikidata.org/wiki/Wikidata:Requests_for_comment/Semi-automatic_Addition_of_References_to_Wikidata_Statements#A_whitelist_for_sources
>>>> [6] https://www.cs.ubc.ca/~murphyk/Papers/kv-kdd14.pdf
>>>>
>>>> _______________________________________________
>>>> Wikidata mailing list
>>>> [email protected]
>>>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>>>
>>>