> On Nov 21, 2015, at 9:44 AM, Gerard Meijssen <[email protected]> wrote:
>
> Hoi,
> Yes you can add an item for the missing brother. When you do, you should link
> it to his brother and thereby they are explicitly not the same. They can both
> have the same alias. It helps when you add pertinent data like a date of
> birth/death. I take it they are not twins.
> Thanks,
> GerardM

Hi Gerard,

I am actually interested in the general problem, not this specific pair. In other words: should Mix’n’match automatically perform the two actions I proposed in my original message (quoted below)? More generally, how can we clearly signal *in Wikidata* that the output of costly human labor should not be undone by machines or lazy humans in the future?

> On 21 November 2015 at 18:34, Dario Taraborelli <[email protected]> wrote:
> I finally found the time to play extensively with Mix’n’match and it’s by far
> one of the most promising models I’ve come across for Wikidata growth. A
> short conversation with Magnus on Twitter got me thinking on how to best
> preserve the output of costly human curation.[1]
>
> I spent most of my time manually auditing automatically matched entries from
> the Dizionario Biografico degli Italiani [2]. These entries are long,
> unstructured biographical entries and it takes quite a lot of effort to
> understand if the two individuals referenced by Wikidata and DBI actually are
> the same person. This is a great example of a task that’s still pretty hard
> for a machine to perform, no matter how sophisticated the algorithm.
>
> My favorite example? Mix’n’match suggested a match between Giulio Baldigara
> (Q1010811 <https://www.wikidata.org/wiki/Q1010811>) and Giulio Baldigara (DBI
> <http://www.treccani.it/enciclopedia/giulio-baldigara_(Dizionario_Biografico)/>)
> which looked totally legitimate: these two individuals are both Italian
> architects from the 16th century with the same name, they were both born
> around the same years in the same city, they were both active in Hungary at
> the same time: strong indication that they are the same person, right? It
> turns out they are brothers and the full name of the person referenced in
> Wikidata is Giulio Cesare Baldigara (the least known in a family of
> architects). I unmatched the suggestion and flagged the DBI entry as not
> existing in Wikidata.
>
> My question at the moment is: the output of a labor-intensive review of a
> potential match is currently stored as a volatile flag in a tool hosted on
> labs, but is invisible in Wikidata. Should something happen to Mix’n’match
> (god forbid) the result of my work would get lost. Which got me thinking:
>
> - shouldn’t a manually unmatched item be created directly on Wikidata (after
>   all DBI is all about notable individuals who would easily pass Wikidata’s
>   notability threshold for biographies)?
> - shouldn’t the relation between Giulio (Cesare) Baldigara (Q1010811
>   <https://www.wikidata.org/wiki/Q1010811>) and the newly created item for
>   Giulio Baldigara be explicitly represented via a "not the same as" property,
>   to prevent future humans or machines from accidentally remerging the two
>   items based on some kind of heuristics?
>
> Thoughts welcome,
>
> Dario
>
> [1] https://twitter.com/ReaderMeter/status/667214565621432320
> [2] https://tools.wmflabs.org/mix-n-match/?mode=catalog&catalog=55&offset=0&show_noq=0&show_autoq=1&show_userq=0&show_na=0

Dario Taraborelli
Head of Research, Wikimedia Foundation
wikimediafoundation.org <http://wikimediafoundation.org/> • nitens.org <http://nitens.org/> • @readermeter <http://twitter.com/readermeter>
_______________________________________________
Wikidata mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikidata
