Envlh added a comment.
In T204440#5110658 <https://phabricator.wikimedia.org/T204440#5110658>, @GoranSMilovanovic wrote:

> @Envlh I will also compare your and mine processing procedures. You observe this identifier (P380 <https://phabricator.wikimedia.org/P380>) on 48,202 items, my code finds 48,232 use cases, while I am using an older version of the dump. Of course, that is empirically possible, but I would normally expect an increase in the usage of an identifier with time.

@GoranSMilovanovic My tool checks overlaps only on properties used as statements, not when they are used as qualifiers or references. Maybe that can explain some discrepancy?

In T204440#5111158 <https://phabricator.wikimedia.org/T204440#5111158>, @Jheald wrote:

> When you've got the data sorted, a table showing the closest identifiers by Jaccard similarity, rather than total overlap, might be quite interesting.

@Jheald It's available here: https://tools.dicare.org/properties/?type[]=ExternalId#jaccard_index

You can click on the name of a property to have its closest properties by Jaccard index. You can also reset the form at the top of the page to display all properties, not only external identifiers.

TASK DETAIL
  https://phabricator.wikimedia.org/T204440

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: GoranSMilovanovic, Envlh
Cc: Jheald, agray, Envlh, Lea_Lacroix_WMDE, VIGNERON, Pintoch, Daniel_Mietchen, connorshea, Moebeus, Multichill, Hjfocs, RazShuty, GoranSMilovanovic, Aklapper, Lydia_Pintscher, alaa_wmde, Nandana, Lahi, Gq86, QZanden, LawExplorer, _jensen, rosalieper, Wikidata-bugs, aude, Mbch331
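
As an illustration of the distinction discussed above between total overlap and Jaccard similarity, here is a minimal Python sketch. It is not the code behind the tool linked above. The property names (P_A, P_B, P_C), the item sets, and the helper functions are all hypothetical. It assumes each property is represented by the set of items that use it as a statement.

```python
def overlap(a, b):
    """Total overlap: number of items that use both properties."""
    return len(a & b)


def jaccard_index(a, b):
    """Jaccard index: size of the intersection divided by size of the union."""
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


def closest(target, items_by_property, score):
    """Return the other property that scores highest against `target`."""
    others = [p for p in items_by_property if p != target]
    return max(
        others,
        key=lambda p: score(items_by_property[target], items_by_property[p]),
    )


# Hypothetical data: items on which each property appears as a statement.
items_by_property = {
    "P_A": {"Q1", "Q2", "Q3", "Q4"},
    "P_B": {"Q3", "Q4"},
    "P_C": {"Q1", "Q2", "Q3", "Q4", "Q5", "Q6", "Q7", "Q8", "Q9", "Q10"},
}

print(closest("P_A", items_by_property, overlap))
print(closest("P_A", items_by_property, jaccard_index))
```

With this made-up data, P_C shares the most items with P_A (4 against 2), so it ranks first by total overlap. P_B ranks first by Jaccard index (2/4 = 0.5 against 4/10 = 0.4), because the index penalises properties that are simply used on many items.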
