> On Nov 21, 2015, at 9:44 AM, Gerard Meijssen <[email protected]> 
> wrote:
> 
> Hoi,
> Yes, you can add an item for the missing brother. When you do, you should link 
> it to his brother so that the two are explicitly marked as not the same. They 
> can both have the same alias. It helps when you add pertinent data like a date 
> of birth/death. I take it they are not twins.
> Thanks,
>      GerardM

Hi Gerard, I am actually interested in the general problem, not this specific 
pair. In other words: should Mix’n’match automatically perform the two actions 
I listed in my original message? More broadly, how can we clearly signal *in 
Wikidata* that the output of costly human labor should not be undone by 
machines or lazy humans in the future? 
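
As a rough illustration of what such a signal could look like to a tool, here 
is a minimal sketch in Python. It assumes the distinction is stored as 
reciprocal "different from" statements (I believe the property is P1889, but 
treat the ID as an assumption), and that a matching tool checks them before 
proposing a match or merge. The function names, the in-memory dictionary and 
the placeholder ID for the new DBI item are all hypothetical; nothing here is 
taken from Mix’n’match or from a real Wikidata client library.

```python
# Hypothetical sketch: these helpers are illustrative only and are not part
# of Mix'n'match or of any Wikidata API.
DIFFERENT_FROM = "P1889"  # assumed property ID for "different from"


def mark_different(claims, item_a, item_b):
    """Record reciprocal 'different from' statements between two items."""
    claims.setdefault(item_a, {}).setdefault(DIFFERENT_FROM, set()).add(item_b)
    claims.setdefault(item_b, {}).setdefault(DIFFERENT_FROM, set()).add(item_a)


def may_match(claims, item_a, item_b):
    """Return False if either item is explicitly marked as different from the other."""
    a_excludes = claims.get(item_a, {}).get(DIFFERENT_FROM, set())
    b_excludes = claims.get(item_b, {}).get(DIFFERENT_FROM, set())
    return item_b not in a_excludes and item_a not in b_excludes


# Q1010811 is Giulio Cesare Baldigara; the second ID is a placeholder for the
# item that would be created for his brother from the DBI entry.
claims = {}
mark_different(claims, "Q1010811", "Q_NEW_DBI_ITEM")
```

With statements like these in place, a heuristic that sees two Italian 
architects with the same name, city and period would still be told not to 
match them, and the result of the manual review would live in Wikidata rather 
than only in the tool.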

> On 21 November 2015 at 18:34, Dario Taraborelli <[email protected]> wrote:
> I finally found the time to play extensively with Mix’n’match and it’s by far 
> one of the most promising models I’ve come across for Wikidata growth. A 
> short conversation with Magnus on Twitter got me thinking on how to best 
> preserve the output of costly human curation.[1]
> 
> I spent most of my time manually auditing automatically matched entries from 
> the Dizionario Biografico degli Italiani [2]. These entries are long, 
> unstructured biographical entries and it takes quite a lot of effort to 
> understand if the two individuals referenced by Wikidata and DBI actually are 
> the same person. This is a great example of a task that’s still pretty hard 
> for a machine to perform, no matter how sophisticated the algorithm.
> 
> My favorite example? Mix’n’match suggested a match between Giulio Baldigara 
> (Q1010811 <https://www.wikidata.org/wiki/Q1010811>) and Giulio Baldigara (DBI 
> <http://www.treccani.it/enciclopedia/giulio-baldigara_(Dizionario_Biografico)/>),
> which looked totally legitimate: these two individuals are both Italian 
> architects from the 16th century with the same name, they were both born 
> around the same years in the same city, and they were both active in Hungary 
> at the same time. A strong indication that they are the same person, right? 
> It turns out they are brothers, and the full name of the person referenced in 
> Wikidata is Giulio Cesare Baldigara (the least known member of a family of 
> architects). I unmatched the suggestion and flagged the DBI entry as not 
> existing in Wikidata.
> 
> My question at the moment is: the output of a labor-intensive review of a 
> potential match is currently stored as a volatile flag in a tool hosted on 
> labs, but is invisible in Wikidata. Should something happen to Mix’n’match 
> (god forbid) the result of my work would get lost. Which got me thinking:
> 
> - shouldn’t a manually unmatched item be created directly on Wikidata? (After 
> all, DBI is all about notable individuals who would easily pass Wikidata’s 
> notability threshold for biographies.)
> - shouldn’t the relation between Giulio (Cesare) Baldigara (Q1010811 
> <https://www.wikidata.org/wiki/Q1010811>) and the newly created item for 
> Giulio Baldigara be explicitly represented via a "not the same as" property, 
> to prevent future humans or machines from accidentally remerging the two 
> items based on some kind of heuristic?
> 
> Thoughts welcome,
> 
> Dario
> 
> [1] https://twitter.com/ReaderMeter/status/667214565621432320
> [2] https://tools.wmflabs.org/mix-n-match/?mode=catalog&catalog=55&offset=0&show_noq=0&show_autoq=1&show_userq=0&show_na=0
> 
> 
> 
> _______________________________________________
> Wikidata mailing list
> [email protected] <mailto:[email protected]>
> https://lists.wikimedia.org/mailman/listinfo/wikidata 
> <https://lists.wikimedia.org/mailman/listinfo/wikidata>



Dario Taraborelli  Head of Research, Wikimedia Foundation
wikimediafoundation.org <http://wikimediafoundation.org/> • nitens.org 
<http://nitens.org/> • @readermeter <http://twitter.com/readermeter>