On Tue, Jun 2, 2015 at 12:12 PM Markus Krötzsch
<mar...@semantic-mediawiki.org> wrote:

> Another interesting type of Scottish historic orphans are those that are
> duplicates of items that do have site links. Even very prominent ones
> are duplicated, such as
>
> https://www.wikidata.org/wiki/Q17569486 (dup)
> https://www.wikidata.org/wiki/Q933000 (real item)
>
> Interestingly, they use different Scotland IDs, and it does indeed seem
> that Historic Scotland also contains duplicates:
>
>
> http://data.historic-scotland.gov.uk/pls/htmldb/f?p=2200:15:0::::BUILDING,HL:47778
>
> http://data.historic-scotland.gov.uk/pls/htmldb/f?p=2200:15:0::::BUILDING,HL:49165
>
> Overall, this seems to be an example of an ID that really should not be
> considered "identity providing" since there seems to be a many-to-many
> relationship between Wikidata and Historic Scotland. Orphans should
> receive additional IDs from a better source if at all possible. With the
> great number of seemingly legit non-functional uses of the Scotland IDs,
> they cannot be used in practice to detect duplicates.
>

The IDs are not unique on the Historic Scotland site, but items can still
carry the correct IDs on Wikidata, even if those IDs are non-unique. What
will be required for this (and other external IDs) in the long run is an
automated or semi-automated check against the foreign data corpus, with
heuristics highlighting potential issues. This includes new items in the
external source (or ones we missed during the initial import).

Given that I received the original data as a CSV from WMUK, who got it from
Historic Scotland under Freedom of Information (IIRC), this might prove
tricky.
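
To make the kind of check I mean a bit more concrete, here is a minimal
Python sketch. It is not an existing tool: the function names, the CSV
column name "id" and the sample values are all made up for illustration,
and it assumes we can get the external corpus as a CSV and the Wikidata
side as (item, ID) pairs.

```python
import csv
import io
from collections import defaultdict


def load_external_ids(csv_text, column="id"):
    """Read external IDs from CSV text (the column name is an assumption)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[column].strip() for row in reader if row[column].strip()]


def reconcile(wikidata_pairs, external_ids):
    """Compare (item, external_id) pairs with the IDs in the external corpus.

    Returns three heuristic reports; none of them is proof of an error.
    """
    items_by_id = defaultdict(set)
    for item, ext_id in wikidata_pairs:
        items_by_id[ext_id].add(item)

    on_wikidata = set(items_by_id)
    external = set(external_ids)
    return {
        # Same ID on several items: duplicates, or legitimate sharing
        # (e.g. church and churchyard), so this needs human review.
        "shared_ids": {
            ext_id: sorted(items)
            for ext_id, items in items_by_id.items()
            if len(items) > 1
        },
        # In the external source but on no item: new or missed entries.
        "missing_in_wikidata": sorted(external - on_wikidata),
        # On Wikidata but not in the external source: typos or withdrawn IDs.
        "unknown_ids": sorted(on_wikidata - external),
    }


if __name__ == "__main__":
    # Hypothetical sample data, not taken from either database.
    pairs = [("item-A", "100"), ("item-B", "100"),
             ("item-C", "200"), ("item-D", "999")]
    corpus = load_external_ids("id,name\n100,Church\n200,House\n300,Bridge\n")
    for report, content in reconcile(pairs, corpus).items():
        print(report, content)
```

The three reports are deliberately only "potential issues": a shared ID
still has to be looked at by a human before anything is merged.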


>
> Regards,
>
> Markus
>
>
> On 02.06.2015 13:01, Markus Krötzsch wrote:
> > On 02.06.2015 11:30, Magnus Manske wrote:
> >> Update 2:
> >> For example,
> >> https://www.wikidata.org/wiki/Q17847522
> >> and
> >> https://www.wikidata.org/wiki/Q17847537
> >> have the same Scotland ID, but refer to different entities (church and
> >> churchyard, respectively). They were listed as two entities in the
> >> original dataset, sharing the same ID.
> >
> > Yes, I noticed such cases too. From the information in Wikidata, it is
> > not clear to me why this is sometimes done and sometimes not done.
> >
> > For example, these adjacent houses have the same Scotland ID but
> > different items that each have their own coordinates (where did the
> > coordinates come from?):
> >
> > https://www.wikidata.org/wiki/Q17576211
> > https://www.wikidata.org/wiki/Q17576182
> > https://www.wikidata.org/wiki/Q17576185
> >
> > In many other cases, adjacent houses with the same ID are combined into
> > one item:
> >
> > https://www.wikidata.org/wiki/Q17806587
> >
> > (note, however, that the house addresses given in the ID and in the item
> > label do not match, though they overlap on most of the houses.)
> >
> > Finally, there are also cases where there are different IDs and we have
> > several items, but they have the same labels that merge the contents of
> > the two IDs:
> >
> > https://www.wikidata.org/wiki/Q17810121
> > https://www.wikidata.org/wiki/Q17810137
> >
> >
> > It seems that the data was not taken from the Historic Sites database
> > but from some different source that has its own coordinate data and a
> > different (but seemingly arbitrary) approach to grouping sites. However,
> > the coordinates give Historic Scotland as their reference -- I wonder if
> > Historic Scotland might be changing frequently or exist in several
> > versions.
> >
> > Regards,
> >
> > Markus
> >
> >
> >>
> >> On Tue, Jun 2, 2015 at 10:26 AM Magnus Manske
> >> <magnusman...@googlemail.com> wrote:
> >>
> >>     Update: There appear to be quite a few items with duplicate Scotland
> >>     IDs (not all of them may be erroneous!):
> >>     http://wdq.wmflabs.org/stats?action=doublestring&prop=709
> >>
> >>     On Tue, Jun 2, 2015 at 10:23 AM Magnus Manske
> >>     <magnusman...@googlemail.com> wrote:
> >>
> >>         I created (some/most of) these items as part of the Wiki Loves
> >>         Monuments UK 2014 drive, to run the campaign from Wikidata
> >>         rather than from a bespoke database. This allows the community
> >>         (TM) to maintain the data, rather than one poor sod (e.g.,
> >>         myself) having to frantically update all of it every year ;-)
> >>
> >>         "Consumer" tool is here:
> >>         https://tools.wmflabs.org/wlmuk/index_wd.html
> >>
> >>         These are based on "official" data from National Heritage,
> >>         provided to me via Wikimedia UK. Grade A (or Grade I/II* in
> >>         England) structures should be noteworthy by default.
> >>
> >>         It appears (as per your examples) that some of these were
> >>         created as duplicates/with wrong IDs. As I said, this is based
> >>         on "official" data, so it's the best I could do at the time.
> >>         With mass creation, there are bound to be a few strays. If you
> >>         can find some large-scale, systemic issue I'll try to fix it,
> >>         but the one-offs will always fall back to manual fixing. At
> >>         least, with Wikidata, we can fix them together.
> >>
> >>         On Tue, Jun 2, 2015 at 10:01 AM Daniel Kinzler
> >>         <daniel.kinz...@wikimedia.de> wrote:
> >>
> >>             On 01.06.2015 at 22:26, Markus Krötzsch wrote:
> >>             > Finally, the technical question is: Why is this even
> >>             > possible? I thought that, in each language,
> >>             > label+description are a key (globally unique), yet here
> >>             > we have many pairs of items with exactly the same label
> >>             > and description. Or is the problem that no description
> >>             > was entered and so the system does not apply the key?
> >>
> >>             The uniqueness constraint does indeed not apply if there is
> >>             no description.
> >>
> >>             --
> >>             Daniel Kinzler
> >>             Senior Software Developer
> >>
> >>             Wikimedia Deutschland
> >>             Gesellschaft zur Förderung Freien Wissens e.V.
> >>
> >>             _______________________________________________
> >>             Wikidata mailing list
> >>             Wikidata@lists.wikimedia.org
> >>             https://lists.wikimedia.org/mailman/listinfo/wikidata
> >>
> >>
> >>
> >>
> >
>
>
>
