[Please pardon me if you have already read this on the Wikidata chat]

Hello folks,

------------------------------------------------------------
TL;DR: what do you think of the 3 validation criteria below?
------------------------------------------------------------

I'm excited to let you know that the soweego 2 project has just started [1]!

To cut a long story short, soweego links Wikidata to large third-party catalogs.

The next step will be all about synchronization of Wikidata to a given target catalog through a set of validation criteria. Let me paste below some key parts of the project proposal.

1) existence: whether a target identifier found in a given Wikidata item is still available in the target catalog;
2) links: to what extent all URLs available in a Wikidata item overlap with those in the corresponding target catalog entry;
3) metadata: to what extent relevant statements available in a Wikidata item overlap with those in the corresponding target catalog entry.

These criteria would respectively trigger a set of actions. As a toy example:

1) Elvis Presley (Q303) has a MusicBrainz identifier 01809552, which does not exist in MusicBrainz anymore.
   Action = mark the identifier statement with a deprecated rank;
2) Elvis Presley (Q303) has 7 URLs, MusicBrainz 01809552 has 8 URLs, and 3 overlap.
   Action = add 5 URLs from MusicBrainz to Elvis Presley (Q303) and submit 4 URLs from Wikidata to the MusicBrainz community;
3) Wikidata states that Elvis Presley (Q303) was born on January 8, 1935 in Tupelo, while MusicBrainz states that 01809552 was born in 1934 in Memphis.
   Action = add 2 referenced statements with MusicBrainz values to Elvis Presley (Q303) and report the 2 Wikidata values to the MusicBrainz community.
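
To make the arithmetic in example 2 concrete, here is a minimal Python sketch of the links criterion as plain set operations. The function name, the dictionary keys and the placeholder URLs are all made up for illustration; this is not soweego's actual code, and it assumes URLs can be compared as exact strings.

```python
def compare_links(wikidata_urls, catalog_urls):
    """Split two URL collections into shared, catalog-only and Wikidata-only sets."""
    wikidata_urls, catalog_urls = set(wikidata_urls), set(catalog_urls)
    return {
        "shared": wikidata_urls & catalog_urls,
        # Present in the catalog only: candidates to add to the Wikidata item
        "to_add_to_wikidata": catalog_urls - wikidata_urls,
        # Present in Wikidata only: candidates to submit to the catalog community
        "to_submit_to_catalog": wikidata_urls - catalog_urls,
    }


# Placeholder URLs (hypothetical) reproducing the counts of the toy example:
# 7 on the Wikidata side, 8 on the MusicBrainz side, 3 in common.
shared = {f"https://example.org/shared/{i}" for i in range(3)}
wikidata = shared | {f"https://example.org/wd/{i}" for i in range(4)}
musicbrainz = shared | {f"https://example.org/mb/{i}" for i in range(5)}

result = compare_links(wikidata, musicbrainz)
print(len(result["to_add_to_wikidata"]))    # 5
print(len(result["to_submit_to_catalog"]))  # 4
```

Real URLs would probably need some normalization (scheme, trailing slash and so on) before such a comparison; that step is left out here.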

In case of either full or no overlap in criteria 2 and 3, the Wikidata identifier statement should be marked with a preferred or a deprecated rank respectively.
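
The rank rule above could be sketched in Python as follows. This is only one possible reading: the function name is hypothetical, "full overlap" is assumed to mean that both sides hold exactly the same values, and leaving the rank alone when one side is empty is my own assumption rather than part of the proposal.

```python
def rank_from_overlap(wikidata_values, catalog_values):
    """Suggest a rank for the identifier statement, or None to leave it unchanged."""
    wikidata_values, catalog_values = set(wikidata_values), set(catalog_values)
    if not wikidata_values or not catalog_values:
        # Assumption: nothing to compare on one side, so do not touch the rank
        return None
    if wikidata_values == catalog_values:
        # Full overlap (assumed to mean identical sets)
        return "preferred"
    if not wikidata_values & catalog_values:
        # No overlap at all
        return "deprecated"
    # Partial overlap: keep the current rank
    return None


print(rank_from_overlap({"a", "b"}, {"a", "b"}))  # preferred
print(rank_from_overlap({"a", "b"}, {"c"}))       # deprecated
print(rank_from_overlap({"a", "b"}, {"b", "c"}))  # None
```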

Please note that the soweego bot already has an approved task for criterion 2 [2], together with a set of test edits [3]. In addition, we performed (then reverted) a set of test edits for criterion 1 [4].

I'd be glad to hear any thoughts about the validation criteria, keeping in mind that the more generic they are, the better.

Stay tuned for more rock'n'roll!
With love,

Marco

[1] https://meta.wikimedia.org/wiki/Grants:Project/Hjfocs/soweego_2
[2] https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/Soweego_bot_2
[3] https://www.wikidata.org/w/index.php?title=Special:Contributions&target=Soweego_bot&contribs=user&start=2018-11-05&end=2018-11-05&limit=250
[4] https://www.wikidata.org/w/index.php?title=Special:Contributions&target=Soweego_bot&contribs=user&start=2018-11-07&end=2018-11-13&limit=100
_______________________________________________
Wikidata mailing list -- [email protected]
To unsubscribe send an email to [email protected]
