Thanks for sharing the paper; this is an interesting topic. I just wanted to
point to some of my own prior work on entity summarization, which is related to
what you have done:
All the best
> On 08.03.2018 at 19:00, Aidan Hogan <aid...@gmail.com> wrote:
> Hey Raphaël,
> Thanks for the comments and the reference! And sorry we missed discussion of
> your paper (which indeed looks at largely the same problem in a slightly
> different context). If there's a next time, we will be sure to include it in
> the related work.
> I am impressed btw to see a third-party evaluation of a Google tool. Also it
> seems Google has room for improvement. :)
> On 07-03-2018 13:43, Raphaël Troncy wrote:
>> Hey Aidan,
>> Great work, I loved it! You may want to (cite and) look at what we did 4
>> years ago where we tried to reverse engineer a bit what Google is doing when
>> choosing properties (and values) to show in its rich panels alongside
>> popular entities.
>> The paper is entitled "What Are the Important Properties
>> of an Entity? Comparing Users and Knowledge Graph Point of View",
>> ... and the code is on GitHub to replicate: https://github.com/ahmadassaf/KBE
>> On 07/03/2018 at 05:53, Aidan Hogan wrote:
>>> Hi all,
>>> Tomás and I would like to share a paper that might be of interest to the
>>> community. It presents some preliminary results of work looking at fully
>>> automated methods to generate Wikipedia info-boxes from Wikidata. The main
>>> focus is on deciding what information from Wikidata to include, and in what
>>> order. The results are based on asking users (students) to rate some
>>> prototypes of generated info-boxes.
>>> Tomás Sáez, Aidan Hogan "Automatically Generating Wikipedia Infoboxes from
>>> Wikidata". In the Proceedings of the Wiki Workshop at WWW 2018, Lyon,
>>> France, April 24, 2018.
>>> - Link: http://aidanhogan.com/docs/infobox-wikidata.pdf
>>> We understand that populating info-boxes is an important goal of Wikidata
>>> and hence we thought we'd share some lessons learned.
>>> Obviously a lot of work is being put into populating info-boxes from
>>> Wikidata, but the main methods at the moment seem to be template-based and
>>> require a lot of manual labour; plus the definition of these templates
>>> seems to be a difficult problem for classes such as person (where different
>>> information will have different priorities for people of different
>>> professions, notoriety, etc.).
>>> We were just interested to see how far we could get with a fully automated
>>> approach using some generic ranking methods. Also we thought that something
>>> like this could perhaps be used to generate a "default" info-box for
>>> articles with no info-box and no associated template mapping. The paper
>>> presents preliminary results along those lines.
>>> One interesting result is that a major factor in the evaluation of the
>>> generated info-boxes was the importance of the value. For example, Barack
>>> Obama has lots of awards, but perhaps only something like the Nobel Peace
>>> Prize might be of relevance to show in the info-box (<- being intended as
>>> an illustrative example rather than a concrete assertion of course!).
>>> Another example is that sibling might not be an important attribute in a
>>> lot of cases, but when that sibling is Barack Obama, then that deserves to
>>> be in the info-box (<- how such cases could be expressed in a purely
>>> template-based approach, we are not sure, but it would seem difficult).
>>> We assess the importance of values with PageRank. Assessing the importance
>>> not only of attributes, but of values, turned out to be a major influence
>>> on how highly our evaluators assessed the quality of the generated
>>> info-boxes.
>>> This initial/isolated observation might be interesting since, to the best
>>> of our understanding, the current wisdom on populating info-boxes from
>>> Wikidata focuses on what attributes to present and in which order, but does
>>> not consider the importance of values (aside from the Wikidata rank
>>> feature, which we believe is more intended to assess relevance/timeliness,
>>> than importance).
>>> Hence one of the most interesting (and surprising, for us at least) results
>>> of the work is to suggest that it appears to be important to rank *values*
>>> by importance (not just attributes) when considering what information the
>>> user might be interested in.
>>> (There are limitations to PageRank measures, however, in that they cannot
>>> assess, for example, the importance of a particular date, or, more
>>> generally, datatype values.)
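
The value-ranking idea described above can be illustrated with a minimal Python sketch. This is not the implementation from the paper: the toy link graph, the node labels, the helper names `pagerank` and `rank_values`, and the damping factor of 0.85 are all assumptions chosen for illustration. It shows how candidate values of one attribute might be ordered by a PageRank score, and why datatype values such as dates receive no score, since they are not nodes in the link graph.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over a list of (source, target) links."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for source, target in edges:
        out_links[source].append(target)

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread over all nodes.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1.0 - damping + damping * dangling) / len(nodes)
        new_rank = {n: base for n in nodes}
        for source, targets in out_links.items():
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)
        rank = new_rank
    return rank


def rank_values(values, scores):
    """Order candidate values by score; values with no node (e.g. dates) score 0."""
    return sorted(values, key=lambda v: scores.get(v, 0.0), reverse=True)


# Hypothetical toy link graph; the labels are illustrative, not real Wikidata data.
toy_edges = [
    ("person_a", "major_prize"),
    ("person_b", "major_prize"),
    ("person_c", "major_prize"),
    ("person_a", "minor_award"),
    ("major_prize", "prize_committee"),
]
scores = pagerank(toy_edges)

# Candidate values for one attribute of person_a, including a datatype value.
awards_of_person_a = ["minor_award", "major_prize", "1990-01-01"]
print(rank_values(awards_of_person_a, scores))
```

In this toy graph the award with more incoming links is ranked first, and the date falls to the end because it has no score at all, which mirrors the limitation noted for datatype values.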
>>> In any case, we are looking forward to presenting these results at the Wiki
>>> Workshop at WWW 2018, and any feedback or thoughts are welcome!
>>> Wikidata mailing list
Hasso-Plattner-Institut für Digital Engineering gGmbH
Potsdam District Court, HRB 12184
Managing Director: Prof. Dr. Christoph Meinel
tel: +49 331 5509 547