Hi Aidan and Tomás,
Thanks a lot for sharing your research. It'll be valuable input as we
look into making it easier for smaller Wikipedias to generate
infoboxes based on Wikidata.
On Wed, Mar 7, 2018 at 5:53 AM, Aidan Hogan <aid...@gmail.com> wrote:
> Hi all,
> Tomás and I would like to share a paper that might be of interest to the
> community. It presents some preliminary results of work looking at fully
> automated methods to generate Wikipedia info-boxes from Wikidata. The main
> focus is on deciding what information from Wikidata to include, and in what
> order. The results are based on asking users (students) to rate some
> prototypes of generated info-boxes.
> Tomás Sáez, Aidan Hogan. "Automatically Generating Wikipedia Infoboxes from
> Wikidata". In the Proceedings of the Wiki Workshop at WWW 2018, Lyon,
> France, April 24, 2018.
> - Link: http://aidanhogan.com/docs/infobox-wikidata.pdf
> We understand that populating info-boxes is an important goal of Wikidata
> and hence we thought we'd share some lessons learned.
> Obviously a lot of work is being put into populating info-boxes from
> Wikidata, but the main methods at the moment seem to be template-based and
> require a lot of manual labour; plus the definition of these templates seems
> to be a difficult problem for classes such as person (where different
> information will have different priorities for people of different
> professions, notoriety, etc.).
> We were just interested to see how far we could get with a fully automated
> approach using some generic ranking methods. Also we thought that something
> like this could perhaps be used to generate a "default" info-box for
> articles with no info-box and no associated template mapping. The paper
> presents preliminary results along those lines.
> One interesting result is that a major factor in the evaluation of the
> generated info-boxes was the importance of the value. For example, Barack
> Obama has lots of awards, but perhaps only something like the Nobel Peace
> Prize might be of relevance to show in the info-box (<- being intended as an
> illustrative example rather than a concrete assertion of course!). Another
> example is that sibling might not be an important attribute in a lot of
> cases, but when that sibling is Barack Obama, then that deserves to be in
> the info-box (<- how such cases could be expressed in a purely
> template-based approach, we are not sure, but it would seem difficult).
> We assess the importance of values with PageRank. Assessing the importance
> not only of attributes, but of values, turned out to be a major influence on
> how highly our evaluators assessed the quality of the generated info-boxes.
> This initial/isolated observation might be interesting since, to the best of
> our understanding, the current wisdom on populating info-boxes from Wikidata
> focuses on what attributes to present and in which order, but does not
> consider the importance of values (aside from the Wikidata rank feature,
> which we believe is more intended to assess relevance/timeliness than
> importance).
> Hence one of the most interesting (and surprising, for us at least) results
> of the work is to suggest that it appears to be important to rank *values*
> by importance (not just attributes) when considering what information the
> user might be interested in.
> (There are limitations to PageRank measures, however, in that they cannot
> assess, for example, the importance of a particular date, or, more
> generally, datatype values.)
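
To make the value-ranking idea above concrete, here is a minimal sketch. It is
not the implementation from the paper: the link graph, the item names
(person_1, major_award, ...) and the helper names pagerank and rank_values are
all made up for illustration. It runs a plain power-iteration PageRank over a
toy graph and then orders the candidate values of one attribute by score.
Values that are not nodes in the graph (dates and other datatype values) get a
score of 0.0, which mirrors the limitation mentioned above.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over (source, target) links."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    score = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Score held by nodes without outgoing links is spread evenly.
        dangling = sum(score[n] for n in nodes if not out_links[n])
        base = (1.0 - damping + damping * dangling) / len(nodes)
        new_score = {node: base for node in nodes}
        for source, targets in out_links.items():
            if not targets:
                continue
            share = damping * score[source] / len(targets)
            for target in targets:
                new_score[target] += share
        score = new_score
    return score


def rank_values(values, score, top_k=1):
    """Order candidate values by score, highest first, and keep top_k."""
    # Values missing from the graph (e.g. dates) fall back to 0.0.
    ordered = sorted(values, key=lambda v: score.get(v, 0.0), reverse=True)
    return ordered[:top_k]


# Hypothetical toy graph: three people link to major_award, one to minor_award.
toy_edges = [
    ("person_1", "major_award"),
    ("person_2", "major_award"),
    ("person_3", "major_award"),
    ("person_1", "minor_award"),
    ("major_award", "field"),
    ("minor_award", "field"),
]
scores = pagerank(toy_edges)

# Candidate values of the "award" attribute for person_1.
print(rank_values(["minor_award", "major_award"], scores, top_k=1))
```

In this toy graph the more heavily linked award is the one kept for the
info-box. A real setup would compute the scores over the full Wikidata or
Wikipedia link graph rather than a hand-written edge list.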
> In any case, we are looking forward to presenting these results at the Wiki
> Workshop at WWW 2018, and any feedback or thoughts are welcome!
> Wikidata mailing list
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 Nz. Recognized as a non-profit by the
Finanzamt für Körperschaften I Berlin, tax number 27/029/42207.