On Tue, Jun 2, 2015 at 12:12 PM Markus Krötzsch <mar...@semantic-mediawiki.org> wrote:
> Another interesting type of Scottish historic orphans are those that are
> duplicates of items that do have site links. Even very prominent ones
> are duplicated, such as
>
> https://www.wikidata.org/wiki/Q17569486 (dup)
> https://www.wikidata.org/wiki/Q933000 (real item)
>
> Interestingly, they use different Scotland IDs, and it does indeed seem
> that Historic Scotland also contains duplicates:
>
> http://data.historic-scotland.gov.uk/pls/htmldb/f?p=2200:15:0::::BUILDING,HL:47778
> http://data.historic-scotland.gov.uk/pls/htmldb/f?p=2200:15:0::::BUILDING,HL:49165
>
> Overall, this seems to be an example of an ID that really should not be
> considered "identity providing" since there seems to be a many-to-many
> relationship between Wikidata and Historic Scotland. Orphans should
> receive additional ids from a better source if at all possible. With the
> great number of seemingly legit non-functional uses of the Scotland IDs,
> they cannot be used in practice to detect duplicates.
>

The IDs are not unique on the Historic Scotland site, but the items on Wikidata can still carry the correct IDs, even if those IDs are non-unique.

What will be required for this (and other external IDs) in the long run is an automated or semi-automated check against the foreign data corpus, with heuristics highlighting potential issues. This includes new items in the external source (or ones we missed during the initial import). Given that I received the original data as a CSV from WMUK, who got it from Historic Scotland under Freedom of Information (IIRC), this might prove tricky.

>
> Regards,
>
> Markus
>
>
> On 02.06.2015 13:01, Markus Krötzsch wrote:
> > On 02.06.2015 11:30, Magnus Manske wrote:
> >> Update 2:
> >> For example,
> >> https://www.wikidata.org/wiki/Q17847522
> >> and
> >> https://www.wikidata.org/wiki/Q17847537
> >> have the same Scotland ID, but refer to different entities (church and
> >> churchyard, respectively). They were two entities in the original
> >> dataset, sharing the same ID.
> >
> > Yes, I noticed such cases too. From the information in Wikidata, it is
> > not clear to me why this is sometimes done and sometimes not done.
> >
> > For example, these adjacent houses have the same Scotland ID but
> > different items that each have their own coordinates (where did the
> > coordinates come from?):
> >
> > https://www.wikidata.org/wiki/Q17576211
> > https://www.wikidata.org/wiki/Q17576182
> > https://www.wikidata.org/wiki/Q17576185
> >
> > In many other cases, adjacent houses with the same ID are combined into
> > one item:
> >
> > https://www.wikidata.org/wiki/Q17806587
> >
> > (note, however, that the house addresses given in the ID and in the item
> > label do not match, though they overlap on most of the houses.)
> >
> > Finally, there are also cases where there are different IDs and we have
> > several items, but they have the same labels that merge the contents of
> > the two IDs:
> >
> > https://www.wikidata.org/wiki/Q17810121
> > https://www.wikidata.org/wiki/Q17810137
> >
> >
> > It seems that the data was not taken from the Historic Sites database
> > but from some different source that has its own coordinate data and a
> > different (but seemingly arbitrary) approach to grouping sites. However,
> > the coordinates give Historic Scotland as their reference -- I wonder if
> > Historic Scotland might be changing frequently or exist in several
> > versions.
> >
> > Regards,
> >
> > Markus
> >
> >
> >>
> >> On Tue, Jun 2, 2015 at 10:26 AM Magnus Manske
> >> <magnusman...@googlemail.com> wrote:
> >>
> >> Update: There appear to be quite a few items with duplicate Scotland
> >> IDs (not all of them may be erroneous!):
> >> http://wdq.wmflabs.org/stats?action=doublestring&prop=709
> >>
> >> On Tue, Jun 2, 2015 at 10:23 AM Magnus Manske
> >> <magnusman...@googlemail.com> wrote:
> >>
> >> I created (some/most of) these items as part of the Wiki Loves
> >> Monuments UK 2014 drive, to run the campaign from Wikidata
> >> rather than from a bespoke database. This allows the community
> >> (TM) to maintain the data, rather than one poor sod (e.g.,
> >> myself) having to frantically update all of it every year ;-)
> >>
> >> "Consumer" tool is here:
> >> https://tools.wmflabs.org/wlmuk/index_wd.html
> >>
> >> These are based on "official" data from National Heritage,
> >> provided to me via Wikimedia UK. Grade A (or Grade I/II* in
> >> England) structures should be noteworthy by default.
> >>
> >> It appears (as per your examples) that some of these were
> >> created as duplicates/with wrong IDs. As I said, this is based
> >> on "official" data, so it's the best I could do at the time.
> >> With mass creation, there are bound to be a few strays. If you
> >> can find some large-scale, systemic issue I'll try to fix it,
> >> but the one-offs will always fall back to manual fixing. At
> >> least, with Wikidata, we can fix them together.
> >>
> >> On Tue, Jun 2, 2015 at 10:01 AM Daniel Kinzler
> >> <daniel.kinz...@wikimedia.de> wrote:
> >>
> >> On 01.06.2015 at 22:26, Markus Krötzsch wrote:
> >> > Finally, the technical question is: Why is this even possible?
> >> > I thought that, in each language, label+description are a key
> >> > (globally unique), yet here we have many pairs of items with
> >> > exactly the same label and description. Or is the problem that
> >> > no description was entered and so the system does not apply the
> >> > key?
> >>
> >> The uniqueness constraint does indeed not apply if there is
> >> no description.
> >>
> >> --
> >> Daniel Kinzler
> >> Senior Software Developer
> >>
> >> Wikimedia Deutschland
> >> Gesellschaft zur Förderung Freien Wissens e.V.
> >>
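
The automated check against the foreign data corpus mentioned in the reply above could be sketched roughly as follows. This is only an illustration under assumptions, not an existing tool: the function name `check_external_ids`, the input shapes, and the sample data are all made up. In practice the item-to-ID mapping would have to come from a Wikidata query on the Scotland ID property (prop 709 in the WDQ link above) and the source IDs from the Historic Scotland CSV.

```python
from collections import defaultdict


def check_external_ids(wikidata_ids, source_ids):
    """Compare external IDs used on Wikidata with the IDs in the source corpus.

    wikidata_ids: dict mapping an item ID (e.g. "Q1") to a list of external IDs.
    source_ids:   iterable of external IDs present in the external source.
    Both inputs are hypothetical shapes chosen for this sketch.
    """
    source = set(source_ids)

    # Invert the mapping: external ID -> items that carry it.
    items_by_id = defaultdict(list)
    for item, ids in wikidata_ids.items():
        for ext_id in ids:
            items_by_id[ext_id].append(item)

    return {
        # IDs carried by more than one item: candidates for review,
        # not necessarily errors.
        "shared": {e: sorted(items) for e, items in items_by_id.items()
                   if len(items) > 1},
        # IDs used on Wikidata that the external source does not contain.
        "unknown": sorted(set(items_by_id) - source),
        # IDs in the external source with no item yet (new or missed on import).
        "missing": sorted(source - set(items_by_id)),
    }


# Made-up sample data, for illustration only.
wikidata_ids = {"Q1": ["100"], "Q2": ["100"], "Q3": ["200"], "Q4": ["999"]}
source_ids = ["100", "200", "300"]
report = check_external_ids(wikidata_ids, source_ids)
print(report)
```

The "shared" bucket is deliberately only a list of candidates: as the church/churchyard example in the quoted thread shows, two items can legitimately carry the same ID, so a human (or a further heuristic) still has to decide which cases are real duplicates.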
_______________________________________________ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata