Re: [Wikidata] No links, wrong data: Scotland's orphans need help

2015-06-08 Thread Markus Krötzsch

On 08.06.2015 11:16, Bene* wrote:

Hi

On 07.06.2015 at 17:18, Markus Krötzsch wrote:

Magnus also pointed out that many external IDs are "self-verifying" in
that they are their own reference. The situation is somewhat similar
for homepages. Should we adopt the practice of giving a single
retrieved value (without any further information) as the reference for
such cases?


I'd use the reference URL the value was imported from together with the
retrieved value.


Yes, that's a good point. Even if an ID property has a "formatter URL" 
that defines the URL, it might be good to record the URL that was used 
to verify the data, since the formatter URL might change. Bots in 
particular should add this, since it is no extra work for them. However, 
a single "retrieved" value is still better than nothing. For homepages 
and other URL properties, I would probably not store the URL again in 
the reference.


Note that most often the reference is not where the value is "imported 
from" but simply an external reference. In many cases, we have imported 
data from Wikipedia but it is then verified from another dataset.
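A minimal Python sketch of the reference described above, assuming a simplified dictionary rather than the full Wikibase JSON: the resolved URL goes under P854 (reference URL) and the check date under P813 (retrieved). The "$1" placeholder follows the usual formatter URL (P1630) convention; the formatter URL and ID in the example are made up.

```python
from datetime import date


def resolve_formatter_url(formatter_url: str, external_id: str) -> str:
    """Substitute an external ID into a formatter-URL template ("$1" marks the ID)."""
    if "$1" not in formatter_url:
        raise ValueError("formatter URL has no $1 placeholder")
    return formatter_url.replace("$1", external_id)


def make_reference(formatter_url: str, external_id: str, checked_on: date) -> dict:
    # Simplified reference, not the full Wikibase JSON. P854 (reference URL)
    # freezes the URL actually used, so the reference stays meaningful even if
    # the formatter URL changes later; P813 (retrieved) records the check date.
    return {
        "P854": resolve_formatter_url(formatter_url, external_id),
        "P813": checked_on.isoformat(),
    }


# Hypothetical formatter URL and ID, for illustration only.
ref = make_reference("https://example.org/record/$1", "12345", date(2015, 6, 8))
print(ref)  # {'P854': 'https://example.org/record/12345', 'P813': '2015-06-08'}
```

Storing the resolved URL costs a bot nothing extra, which is the point made in the message.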


Cheers,

Markus


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] No links, wrong data: Scotland's orphans need help

2015-06-08 Thread Lydia Pintscher
On Sun, Jun 7, 2015 at 11:30 PM, Markus Krötzsch <
mar...@semantic-mediawiki.org> wrote:

> I don't know if there is a good place on Wikidata to document such things.
> I always struggle to find documentation about how to do references (best
> practices, e.g., how to cite an online news portal correctly).


That'd be https://www.wikidata.org/wiki/Help:Sources


Cheers
Lydia

-- 
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata

Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.

Registered in the register of associations at the Amtsgericht Berlin-Charlottenburg
under number 23855 Nz. Recognized as a charitable organization by the Finanzamt für
Körperschaften I Berlin, tax number 27/681/51985.


Re: [Wikidata] No links, wrong data: Scotland's orphans need help

2015-06-08 Thread Bene*

Hi

On 07.06.2015 at 17:18, Markus Krötzsch wrote:
Magnus also pointed out that many external IDs are "self-verifying" in 
that they are their own reference. The situation is somewhat similar 
for homepages. Should we adopt the practice of giving a single 
retrieved value (without any further information) as the reference for 
such cases?


I'd use the reference URL the value was imported from together with the 
retrieved value.


Best regards
Bene



Re: [Wikidata] No links, wrong data: Scotland's orphans need help

2015-06-07 Thread Markus Krötzsch

On 07.06.2015 20:07, Luca Martinelli wrote:

On 07 Jun 2015 17:19, "Markus Krötzsch" <mar...@semantic-mediawiki.org> wrote:
 >
 > Coming back to Magnus's suggestion ... I think the existing property
 > "retrieved" (P813) could be used for this "last verified on" property,
 > that is, for setting the time at which some external reference was last
 > compared to a claim in Wikidata.
 >
 > Magnus also pointed out that many external IDs are "self-verifying"
 > in that they are their own reference. The situation is somewhat similar
 > for homepages. Should we adopt the practice of giving a single retrieved
 > value (without any further information) as the reference for such cases?
 >
 > Adding P813 dates more widely would also open up new ways of
 > maintaining data, since one would have a way to filter statements by how
 > long ago they had last been checked.

Sounds ok, but how will we do it?


As editors, we can just do it from now on. I was always unsure what to 
use as a reference for IDs and homepages. Now I'll use this.


Bot operators can do the same. Magnus is already using P813 in the 
sourcerer game as well.


I don't know if there is a good place on Wikidata to document such 
things. I always struggle to find documentation about how to do 
references (best practices, e.g., how to cite an online news portal 
correctly).



And should we wait for the identifier
datatype to be ready?


Time information makes sense for any online reference, so we do not 
really need to know if the statement we are editing is for an ID 
property. Whether a statement is "self-verifying" so that a single P813 
would already work as reference depends on the context. The properties 
that this is mainly true for are those of type URL and those where you 
can get a URL or URI to verify things (i.e., those with properties P1630 
or P1921).
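A sketch of the "self-verifying" test described above, assuming hypothetical property metadata (the table below is illustrative, not read from Wikidata): a property qualifies if it has URL datatype or carries a formatter property, P1630 or P1921.

```python
from typing import Dict

# Illustrative metadata only: hypothetical property key -> datatype and the
# formatter properties (P1630, P1921) it carries.
PROPERTY_INFO: Dict[str, dict] = {
    "official_website": {"datatype": "url", "formatters": set()},
    "heritage_id": {"datatype": "string", "formatters": {"P1630"}},
    "date_of_birth": {"datatype": "time", "formatters": set()},
}


def is_self_verifying(prop: str, info: Dict[str, dict] = PROPERTY_INFO) -> bool:
    """True if a lone P813 date could serve as the reference: URL-typed
    properties, or properties whose values can be turned into a URL/URI."""
    meta = info.get(prop)
    if meta is None:
        return False
    return meta["datatype"] == "url" or bool(meta["formatters"] & {"P1630", "P1921"})


for prop in PROPERTY_INFO:
    print(prop, is_self_verifying(prop))
```

As the message notes, this only decides whether a bare date *could* be enough; whether it is depends on the context.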


Markus




Re: [Wikidata] No links, wrong data: Scotland's orphans need help

2015-06-07 Thread Markus Krötzsch

On 07.06.2015 18:29, Magnus Manske wrote:

One question remaining is: Should there be a difference between
"human-verified" and "bot-verified"? A bot can check if e.g. the label
(or the words in the label) occur on the page at the URL to check, but
it can't know for sure. Human review is more reliable, but vastly slower
and not likely to happen for many/most such statements. Two different
properties could act as different confidence levels. But maybe I'm just
over-engineering this ;-)


It depends. For structured data sources, a bot should be able to do a 
thorough verification (possibly better than a human), e.g., by comparing 
name, birthdate and deathdate of a person at once. I would focus on 
these cases first since we have enough of them ;-)
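For structured sources, the "compare everything at once" check could look like this sketch. The record type and the normalization rule are assumptions; a real bot would also have to handle date precision and name variants.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class PersonRecord:
    name: str
    birth: Optional[date]
    death: Optional[date]  # None for living people


def _normalize(name: str) -> str:
    # Case-insensitive comparison with runs of whitespace collapsed.
    return " ".join(name.casefold().split())


def verify_person(wikidata: PersonRecord, external: PersonRecord) -> bool:
    """Accept only if name, birth date and death date all agree at once.

    A missing birth date is treated as too little evidence; a missing death
    date must be missing on both sides.
    """
    if wikidata.birth is None:
        return False
    return (
        _normalize(wikidata.name) == _normalize(external.name)
        and wikidata.birth == external.birth
        and wikidata.death == external.death
    )


# Hypothetical records, for illustration only.
wd = PersonRecord("Jane  Example", date(1900, 1, 2), date(1970, 3, 4))
ext = PersonRecord("jane example", date(1900, 1, 2), date(1970, 3, 4))
print(verify_person(wd, ext))  # True
```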


For cases where a bot can only make a guess, it might be better to add a 
human to the loop, as in your (truly amazing!) sourcerer game. The game 
also shows that how well this approach works may depend on the items, 
since text matches are sometimes completely meaningless (e.g., "Human 
parent taxon homo" cannot be verified by looking for "Homo", since every 
page that might contain this fact also mentions "Homo sapiens" many 
times). For such difficult cases, I am not sure if a bot-supplied note 
saying "looked correct, but I am not sure" would really be very
helpful. It depends ;-)
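The "Homo" problem can be shown with a naive word-boundary label check, a hypothetical stand-in for what a bot might do: it reports a match on a page that never states the parent taxon, so a hit carries little evidence.

```python
import re


def label_occurs(label: str, page_text: str) -> bool:
    """Naive bot check: does the label appear as a whole word on the page?"""
    pattern = r"\b" + re.escape(label) + r"\b"
    return re.search(pattern, page_text, flags=re.IGNORECASE) is not None


# The first page mentions "Homo" only inside "Homo sapiens"; it never says
# that Homo is the parent taxon, yet the check still succeeds.
print(label_occurs("Homo", "Humans (Homo sapiens) are primates."))     # True
print(label_occurs("Homo", "A homonym is a word with two meanings."))  # False
```

The word boundary avoids trivial false hits like "homonym", but it cannot tell whether the page actually supports the statement.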


Cheers,

Markus





Re: [Wikidata] No links, wrong data: Scotland's orphans need help

2015-06-07 Thread Luca Martinelli
On 07 Jun 2015 17:19, "Markus Krötzsch" wrote:
>
> Coming back to Magnus's suggestion ... I think the existing property
> "retrieved" (P813) could be used for this "last verified on" property, that
> is, for setting the time at which some external reference was last compared
> to a claim in Wikidata.
>
> Magnus also pointed out that many external IDs are "self-verifying" in
> that they are their own reference. The situation is somewhat similar for
> homepages. Should we adopt the practice of giving a single retrieved value
> (without any further information) as the reference for such cases?
>
> Adding P813 dates more widely would also open up new ways of maintaining
> data, since one would have a way to filter statements by how long ago they
> had last been checked.

Sounds ok, but how will we do it? And should we wait for the identifier
datatype to be ready?

L.


Re: [Wikidata] No links, wrong data: Scotland's orphans need help

2015-06-07 Thread Magnus Manske
One question remaining is: Should there be a difference between
"human-verified" and "bot-verified"? A bot can check if e.g. the label (or
the words in the label) occur on the page at the URL to check, but it can't
know for sure. Human review is more reliable, but vastly slower and not
likely to happen for many/most such statements. Two different properties
could act as different confidence levels. But maybe I'm just
over-engineering this ;-)
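If the two confidence levels were adopted, they might be modelled roughly like this; the method names and the review rule are hypothetical, since the thread does not define any such properties.

```python
from dataclasses import dataclass
from datetime import date
from enum import IntEnum
from typing import Sequence


class Method(IntEnum):
    BOT = 1    # automated, e.g. label text found on the page
    HUMAN = 2  # a person compared the value with the source


@dataclass(frozen=True)
class Verification:
    checked_on: date
    method: Method


def needs_human_review(verifications: Sequence[Verification]) -> bool:
    # True when no human has confirmed the value yet; an empty history
    # (never verified at all) also counts as needing review.
    return all(v.method < Method.HUMAN for v in verifications)


history = [Verification(date(2015, 6, 7), Method.BOT)]
print(needs_human_review(history))  # True
```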



Re: [Wikidata] No links, wrong data: Scotland's orphans need help

2015-06-07 Thread Markus Krötzsch
Coming back to Magnus's suggestion ... I think the existing property 
"retrieved" (P813) could be used for this "last verified on" property, 
that is, for setting the time at which some external reference was last 
compared to a claim in Wikidata.


Magnus also pointed out that many external IDs are "self-verifying" in 
that they are their own reference. The situation is somewhat similar for 
homepages. Should we adopt the practice of giving a single retrieved 
value (without any further information) as the reference for such cases?


Adding P813 dates more widely would also open up new ways of maintaining 
data, since one would have a way to filter statements by how long ago 
they had last been checked.
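Filtering by the age of the P813 date could be as simple as this sketch, assuming each statement's latest retrieved date has already been extracted (the statement IDs and the 180-day threshold are illustrative). Statements without any date are treated as stale.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional


def stale_statements(last_checked: Dict[str, Optional[date]],
                     today: date, max_age_days: int) -> List[str]:
    """Return IDs of statements whose latest P813 date is older than the cutoff.

    Statements that were never checked (None) are always returned.
    """
    cutoff = today - timedelta(days=max_age_days)
    return sorted(sid for sid, checked in last_checked.items()
                  if checked is None or checked < cutoff)


# Hypothetical statement IDs and dates.
checks = {"s1": date(2015, 6, 1), "s2": date(2014, 1, 1), "s3": None}
print(stale_statements(checks, today=date(2015, 6, 7), max_age_days=180))  # ['s2', 's3']
```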


Best wishes,

Markus

On 03.06.2015 15:56, Markus Krötzsch wrote:

On 03.06.2015 13:57, Magnus Manske wrote:

Maybe there is a case to separate import and verification here?

There are many statements in Wikidata nowadays, but they get really
"trustworthy" through references (other than "imported from Wikipedia").
But for external IDs, references are superfluous; they are their own
reference, by definition. So how about marking IDs with a "verified" (or
"last verified on") qualifier? Much of such work could be done by bots;
we could then filter the problematic ones out for manual verification.

As we have no control over external lists, this would have to be
re-checked every so often; but, again, bots to the rescue.



Yes, I fully support this proposal.

What do you think about making "last verified on" not a qualifier but
(part of) the reference information? The reference could state where the
bot has looked up the ID and give a time. This would be somewhat similar
to what is now used in Freebase Ids, e.g., in
https://www.wikidata.org/wiki/Q42.

In general, it might be useful to have such a "last verified on"
property that can be added to arbitrary references. There are many other
uses for this. One common case would be that a user has changed the
value without even being aware of the reference -- then one would be
able to detect this automatically by comparing the last modification
time with the "last verified on" date.

Putting the "last verified on" into the references also makes it
possible to have different dates for different references there.
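The modification-versus-verification comparison could be sketched as follows, assuming a per-statement last-modified timestamp is available (Wikidata records revisions per item, so this input is an assumption). Each reference contributes its own date, and the most recent one counts.

```python
from datetime import datetime
from typing import Optional, Sequence


def latest_verification(reference_dates: Sequence[Optional[datetime]]) -> Optional[datetime]:
    # Each reference may carry its own "last verified on" date, or none.
    dates = [d for d in reference_dates if d is not None]
    return max(dates) if dates else None


def edited_since_verification(last_modified: datetime,
                              reference_dates: Sequence[Optional[datetime]]) -> bool:
    latest = latest_verification(reference_dates)
    # Flag values that were never verified, or changed after the latest check.
    return latest is None or last_modified > latest


refs = [datetime(2015, 6, 1), datetime(2015, 6, 5), None]
print(edited_since_verification(datetime(2015, 6, 6), refs))  # True
```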

Regards,

Markus










Re: [Wikidata] No links, wrong data: Scotland's orphans need help

2015-06-02 Thread Scott MacLeod
Thanks, Magnus,

What I was hoping to do is add resources myself to this and
other Q items, but I still can't, even after the changes you made
(thank you!) ... I'd add Fife, for example, to "located in the
administrative territorial entity", but would also like to add
further resources to Wikidata and Q items too ... which is what I think
makes Wikidata so potentially great, and which I think is what led to the
growth of Wikipedia too. Thank you again, M, M & Wikidatans!

Best,
Scott



On Tue, Jun 2, 2015 at 3:59 PM, Magnus Manske 
wrote:

> I have added the (un-broken) URL as "official website".
>
> Not sure which property to use for "Fife", though.

Re: [Wikidata] No links, wrong data: Scotland's orphans need help

2015-06-02 Thread Magnus Manske
On Tue, Jun 2, 2015 at 12:12 PM Markus Krötzsch <
mar...@semantic-mediawiki.org> wrote:

> Another interesting type of Scottish historic orphans are those that are
> duplicates of items that do have site links. Even very prominent ones
> are duplicated, such as
>
> https://www.wikidata.org/wiki/Q17569486 (dup)
> https://www.wikidata.org/wiki/Q933000 (real item)
>
> Interestingly, they use different Scotland IDs, and it does indeed seem
> that Historic Scotland also contains duplicates:
>
>
> http://data.historic-scotland.gov.uk/pls/htmldb/f?p=2200:15:0BUILDING,HL:47778
>
> http://data.historic-scotland.gov.uk/pls/htmldb/f?p=2200:15:0BUILDING,HL:49165
>
> Overall, this seems to be an example of an ID that really should not be
> considered "identity providing" since there seems to be a many-to-many
> relationship between Wikidata and Historic Scotland. Orphans should
> receive additional IDs from a better source if at all possible. With the
> great number of seemingly legitimate non-functional uses of the Scotland
> IDs, they cannot be used in practice to detect duplicates.
>

They are not unique on the Historic Scotland site, but they can still have
the correct IDs on Wikidata, even if they are non-unique. What will be
required for this (and other external IDs) in the long run is an automated
or semi-automated check against the foreign data corpus, with heuristics
highlighting potential issues. This includes new items in the external
source (or ones we missed during the initial import).

Given that I received the original data as a CSV from WMUK, who got it from
Historic Scotland under Freedom of Information (IIRC), this might prove
tricky.
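A rough sketch of such an automated check, assuming the Wikidata side is available as a mapping from items to their external IDs (for the Scotland IDs, P709 values) and the external corpus as a plain list of IDs. It reports IDs shared by several items, items carrying several IDs, and IDs found on only one side; all identifiers in the example are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Set


def audit_ids(item_ids: Mapping[str, Iterable[str]],
              external_ids: Iterable[str]) -> Dict[str, object]:
    """Compare Wikidata's item -> external-ID mapping with an external ID list."""
    id_sets: Dict[str, Set[str]] = {item: set(ids) for item, ids in item_ids.items()}
    items_per_id: Dict[str, Set[str]] = defaultdict(set)
    for item, ids in id_sets.items():
        for ext in ids:
            items_per_id[ext].add(item)
    external = set(external_ids)
    return {
        # One external ID used by several items (e.g. church and churchyard).
        "shared_ids": {ext: sorted(items) for ext, items in sorted(items_per_id.items())
                       if len(items) > 1},
        # One item carrying several external IDs.
        "multi_id_items": sorted(item for item, ids in id_sets.items() if len(ids) > 1),
        # Used on Wikidata but absent from the external source.
        "not_in_external": sorted(set(items_per_id) - external),
        # New in the external source, or missed during the initial import.
        "not_in_wikidata": sorted(external - set(items_per_id)),
    }


# Hypothetical items and IDs, for illustration only.
report = audit_ids({"item1": ["A1"], "item2": ["A1"], "item3": ["A2", "A3"]},
                   ["A1", "A2", "A4"])
print(report)
```

Shared IDs are only flagged, not treated as errors, since some of them are legitimate.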


>
> Regards,
>
> Markus
>
>
> On 02.06.2015 13:01, Markus Krötzsch wrote:
> > On 02.06.2015 11:30, Magnus Manske wrote:
> >> Update 2:
> >> For example,
> >> https://www.wikidata.org/wiki/Q17847522
> >> and
> >> https://www.wikidata.org/wiki/Q17847537
> >> have the same Scotland ID, but refer to different entities (church and
> >> churchyard, respectively). They were listed as two entities in the
> >> original dataset, sharing the same ID.
> >
> > Yes, I noticed such cases too. From the information in Wikidata, it is not
> > clear to me why this is sometimes done and sometimes not done.
> >
> > For example, these adjacent houses have the same Scotland ID but
> > different items that each have their own coordinates (where did the
> > coordinates come from?):
> >
> > https://www.wikidata.org/wiki/Q17576211
> > https://www.wikidata.org/wiki/Q17576182
> > https://www.wikidata.org/wiki/Q17576185
> >
> > In many other cases, adjacent houses with the same ID are combined into
> > one item:
> >
> > https://www.wikidata.org/wiki/Q17806587
> >
> > (note, however, that the house addresses given in the ID and in the item
> > label do not match, though they overlap on most of the houses.)
> >
> > Finally, there are also cases where there are different IDs and we have
> > several items, but they have the same labels that merge the contents of
> > the two IDs:
> >
> > https://www.wikidata.org/wiki/Q17810121
> > https://www.wikidata.org/wiki/Q17810137
> >
> >
> > It seems that the data was not taken from the Historic Sites database
> > but from some different source that has its own coordinate data and a
> > different (but seemingly arbitrary) approach to grouping sites. However,
> > the coordinates give Historic Scotland as their reference -- I wonder if
> > Historic Scotland might be changing frequently or exist in several
> > versions.
> >
> > Regards,
> >
> > Markus
> >
> >
> >>
> >> On Tue, Jun 2, 2015 at 10:26 AM Magnus Manske
> >> <magnusman...@googlemail.com> wrote:
> >>
> >> Update: There appear to be quite a few items with duplicate Scotland
> >> IDs (not all of them may be erroneous!):
> >> http://wdq.wmflabs.org/stats?action=doublestring&prop=709
> >>
> >> On Tue, Jun 2, 2015 at 10:23 AM Magnus Manske
> >> <magnusman...@googlemail.com> wrote:
> >>
> >> I created (some/most of) these items as part of the Wiki Loves
> >> Monuments UK 2014 drive, to run the campaign from Wikidata
> >> rather than from a bespoke database. This allows the community
> >> (TM) to maintain the data, rather than one poor sod (e.g.,
> >> myself) having to frantically update all of it every year ;-)
> >>
> >> "Consumer" tool is here:
> >> https://tools.wmflabs.org/wlmuk/index_wd.html
> >>
> >> These are based on "official" data from National Heritage,
> >> provided to me via Wikimedia UK. Grade A (or Grade I/II* in
> >> England) structures should be noteworthy by default.
> >>
> >> It appears (as per your examples) that some of these were
> >> created as duplicates/with wrong IDs. As I said, this is based
> >> on "official" data, so it's the best I could do at the time.
> >> With mass creation, there are bound to be a few strays. If you
> >> can f

Re: [Wikidata] No links, wrong data: Scotland's orphans need help

2015-06-02 Thread Magnus Manske
I have added the (un-broken) URL as "official website".

Not sure which property to use for "Fife", though.


Re: [Wikidata] No links, wrong data: Scotland's orphans need help

2015-06-02 Thread Scott MacLeod
Hi Markus, Magnus and Wikidatans,

I can't yet add data to, for example, this -
https://www.wikidata.org/wiki/Q933000 (real item) - by clicking "save,"
since the "save" button isn't an active link, but the "cancel" button is. I
tried to add this URL - http://www.forthroadbridge.org/home (which I'm not
actually able to see in my browser presently - all I see is a blank white
page, unusually) - as well as to add the word "Fife" to various fields to
this "Forth Road Bridge" Q item. Will this be possible in the near future?

Scott



On Tue, Jun 2, 2015 at 4:12 AM, Markus Krötzsch <
mar...@semantic-mediawiki.org> wrote:

> Another interesting type of Scottish historic orphans are those that are
> duplicates of items that do have site links. Even very prominent ones are
> duplicated, such as
>
> https://www.wikidata.org/wiki/Q17569486 (dup)
> https://www.wikidata.org/wiki/Q933000 (real item)
>
> Interestingly, they use different Scotland IDs, and it does indeed seem
> that Historic Scotland also contains duplicates:
>
>
> http://data.historic-scotland.gov.uk/pls/htmldb/f?p=2200:15:0BUILDING,HL:47778
>
> http://data.historic-scotland.gov.uk/pls/htmldb/f?p=2200:15:0BUILDING,HL:49165
>
> Overall, this seems to be an example of an ID that really should not be
> considered "identity providing" since there seems to be a many-to-many
> relationship between Wikidata and Historic Scotland. Orphans should
> receive additional IDs from a better source if at all possible. With the
> great number of seemingly legit non-functional uses of the Scotland IDs,
> they cannot be used in practice to detect duplicates.
>
> Regards,
>
> Markus
>
>
>
> On 02.06.2015 13:01, Markus Krötzsch wrote:
>
>> On 02.06.2015 11:30, Magnus Manske wrote:
>>
>>> Update 2:
>>> For example,
>>> https://www.wikidata.org/wiki/Q17847522
>>> and
>>> https://www.wikidata.org/wiki/Q17847537
>>> have the same Scotland ID, but refer to different entities (church and
>>> churchyard, respectively). They were listed as two entities in the original
>>> dataset, sharing the same ID.
>>>
>>
>> Yes, I noticed such cases too. From the information in Wikidata, it is not
>> clear to me why this is sometimes done and sometimes not done.
>>
>> For example, these adjacent houses have the same Scotland ID but
>> different items that each have their own coordinates (where did the
>> coordinates come from?):
>>
>> https://www.wikidata.org/wiki/Q17576211
>> https://www.wikidata.org/wiki/Q17576182
>> https://www.wikidata.org/wiki/Q17576185
>>
>> In many other cases, adjacent houses with the same ID are combined into
>> one item:
>>
>> https://www.wikidata.org/wiki/Q17806587
>>
>> (note, however, that the house addresses given in the ID and in the item
>> label do not match, though they overlap on most of the houses.)
>>
>> Finally, there are also cases where there are different IDs and we have
>> several items, but they have the same labels that merge the contents of
>> the two IDs:
>>
>> https://www.wikidata.org/wiki/Q17810121
>> https://www.wikidata.org/wiki/Q17810137
>>
>>
>> It seems that the data was not taken from the Historic Sites database
>> but from some different source that has its own coordinate data and a
>> different (but seemingly arbitrary) approach to grouping sites. However,
>> the coordinates give Historic Scotland as their reference -- I wonder if
>> Historic Scotland might be changing frequently or exist in several
>> versions.
>>
>> Regards,
>>
>> Markus
>>
>>
>>
>>> On Tue, Jun 2, 2015 at 10:26 AM Magnus Manske
>>> <magnusman...@googlemail.com> wrote:
>>>
>>> Update: There appear to be quite a few items with duplicate Scotland
>>> IDs (not all of them may be erroneous!):
>>> http://wdq.wmflabs.org/stats?action=doublestring&prop=709
>>>
>>> On Tue, Jun 2, 2015 at 10:23 AM Magnus Manske
>>> <magnusman...@googlemail.com> wrote:
>>>
>>> I created (some/most of) these items as part of the Wiki Loves
>>> Monuments UK 2014 drive, to run the campaign from Wikidata
>>> rather than from a bespoke database. This allows the community
>>> (TM) to maintain the data, rather than one poor sod (e.g.,
>>> myself) having to frantically update all of it every year ;-)
>>>
>>> "Consumer" tool is here:
>>> https://tools.wmflabs.org/wlmuk/index_wd.html
>>>
>>> These are based on "official" data from National Heritage,
>>> provided to me via Wikimedia UK. Grade A (or Grade I/II* in
>>> England) structures should be noteworthy by default.
>>>
>>> It appears (as per your examples) that some of these were
>>> created as duplicates/with wrong IDs. As I said, this is based
>>> on "official" data, so it's the best I could do at the time.
>>> With mass creation, there are bound to be a few strays. If you
>>> can find some large-scale, systemic issue I'll try to fix it,
>>> but the one-offs will always fall back 

Re: [Wikidata] No links, wrong data: Scotland's orphans need help

2015-06-02 Thread Markus Krötzsch
Another interesting type of Scottish historic orphans are those that are 
duplicates of items that do have site links. Even very prominent ones 
are duplicated, such as


https://www.wikidata.org/wiki/Q17569486 (dup)
https://www.wikidata.org/wiki/Q933000 (real item)

Interestingly, they use different Scotland IDs, and it does indeed seem 
that Historic Scotland also contains duplicates:


http://data.historic-scotland.gov.uk/pls/htmldb/f?p=2200:15:0BUILDING,HL:47778
http://data.historic-scotland.gov.uk/pls/htmldb/f?p=2200:15:0BUILDING,HL:49165

Overall, this seems to be an example of an ID that really should not be 
considered "identity providing" since there seems to be a many-to-many 
relationship between Wikidata and Historic Scotland. Orphans should 
receive additional IDs from a better source if at all possible. With the 
great number of seemingly legit non-functional uses of the Scotland IDs, 
they cannot be used in practice to detect duplicates.


Regards,

Markus


On 02.06.2015 13:01, Markus Krötzsch wrote:

On 02.06.2015 11:30, Magnus Manske wrote:

Update 2:
For example,
https://www.wikidata.org/wiki/Q17847522
and
https://www.wikidata.org/wiki/Q17847537
have the same Scotland ID, but refer to different entities (church and
churchyard, respectively). They were listed as two entities in the original
dataset, sharing the same ID.


Yes, I noticed such cases too. From the information in Wikidata, it is not
clear to me why this is sometimes done and sometimes not done.

For example, these adjacent houses have the same Scotland ID but
different items that each have their own coordinates (where did the
coordinates come from?):

https://www.wikidata.org/wiki/Q17576211
https://www.wikidata.org/wiki/Q17576182
https://www.wikidata.org/wiki/Q17576185

In many other cases, adjacent houses with the same ID are combined into
one item:

https://www.wikidata.org/wiki/Q17806587

(note, however, that the house addresses given in the ID and in the item
label do not match, though they overlap on most of the houses.)

Finally, there are also cases where there are different IDs and we have
several items, but they have the same labels that merge the contents of
the two IDs:

https://www.wikidata.org/wiki/Q17810121
https://www.wikidata.org/wiki/Q17810137


It seems that the data was not taken from the Historic Sites database
but from some different source that has its own coordinate data and a
different (but seemingly arbitrary) approach to grouping sites. However,
the coordinates give Historic Scotland as their reference -- I wonder if
Historic Scotland might be changing frequently or exist in several
versions.

Regards,

Markus




On Tue, Jun 2, 2015 at 10:26 AM Magnus Manske
<magnusman...@googlemail.com> wrote:

Update: There appear to be quite a few items with duplicate Scotland
IDs (not all of them may be erroneous!):
http://wdq.wmflabs.org/stats?action=doublestring&prop=709

On Tue, Jun 2, 2015 at 10:23 AM Magnus Manske
<magnusman...@googlemail.com> wrote:

I created (some/most of) these items as part of the Wiki Loves
Monuments UK 2014 drive, to run the campaign from Wikidata
rather than from a bespoke database. This allows the community
(TM) to maintain the data, rather than one poor sod (e.g.,
myself) having to frantically update all of it every year ;-)

"Consumer" tool is here:
https://tools.wmflabs.org/wlmuk/index_wd.html

These are based on "official" data from National Heritage,
provided to me via Wikimedia UK. Grade A (or Grade I/II* in
England) structures should be noteworthy by default.

It appears (as per your examples) that some of these were
created as duplicates/with wrong IDs. As I said, this is based
on "official" data, so it's the best I could do at the time.
With mass creation, there are bound to be a few strays. If you
can find some large-scale, systemic issue I'll try to fix it,
but the one-offs will always fall back to manual fixing. At
least, with Wikidata, we can fix them together.

On Tue, Jun 2, 2015 at 10:01 AM Daniel Kinzler
<daniel.kinz...@wikimedia.de> wrote:

On 01.06.2015 at 22:26, Markus Krötzsch wrote:
 > Finally, the technical question is: Why is this even possible? I thought
 > that, in each language, label+description are a key (globally unique), yet
 > here we have many pairs of items with exactly the same label and
 > description. Or is the problem that no description was entered and so the
 > system does not apply the key?

The uniqueness constraint does indeed not apply if there is
no description.

--
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft z

Re: [Wikidata] No links, wrong data: Scotland's orphans need help

2015-06-02 Thread Markus Krötzsch

On 02.06.2015 11:30, Magnus Manske wrote:

Update 2:
For example,
https://www.wikidata.org/wiki/Q17847522
and
https://www.wikidata.org/wiki/Q17847537
have the same Scotland ID, but refer to different entities (church and
churchyard, respectively). They were listed as two entities in the original
dataset, sharing the same ID.


Yes, I noticed such cases too. From the information in Wikidata, it is not 
clear to me why this is sometimes done and sometimes not done.


For example, these adjacent houses have the same Scotland ID but 
different items that each have their own coordinates (where did the 
coordinates come from?):


https://www.wikidata.org/wiki/Q17576211
https://www.wikidata.org/wiki/Q17576182
https://www.wikidata.org/wiki/Q17576185

In many other cases, adjacent houses with the same ID are combined into 
one item:


https://www.wikidata.org/wiki/Q17806587

(note, however, that the house addresses given in the ID and in the item 
label do not match, though they overlap on most of the houses.)


Finally, there are also cases where there are different IDs and we have 
several items, but they have the same labels that merge the contents of 
the two IDs:


https://www.wikidata.org/wiki/Q17810121
https://www.wikidata.org/wiki/Q17810137


It seems that the data was not taken from the Historic Sites database 
but from some different source that has its own coordinate data and a 
different (but seemingly arbitrary) approach to grouping sites. However, 
the coordinates give Historic Scotland as their reference -- I wonder if 
Historic Scotland might be changing frequently or exist in several versions.
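The last pattern above (different IDs, several items, identical labels) can be searched for mechanically. A minimal Python sketch, assuming item data is available as a mapping from item to (label, Scotland ID); the item names, labels, ID strings and the function name are hypothetical. Hits are only candidates for manual review, since distinct structures can legitimately share a label.

```python
from collections import defaultdict


def same_label_different_ids(items):
    """Find labels shared by several items that carry different Scotland IDs.

    `items` maps an item name to a (label, scotland_id) tuple; this layout
    is an assumption. Returns {label: sorted item names}.
    """
    by_label = defaultdict(list)
    for name, (label, scotland_id) in items.items():
        by_label[label].append((name, scotland_id))

    groups = {}
    for label, members in by_label.items():
        distinct_ids = {scotland_id for _, scotland_id in members}
        # Several items with one label, but more than one ID behind them.
        if len(members) > 1 and len(distinct_ids) > 1:
            groups[label] = sorted(name for name, _ in members)
    return groups


# Hypothetical items: two share a merged label but have different IDs.
items = {
    "item_a": ("1-5 Example Street", "LB-10"),
    "item_b": ("1-5 Example Street", "LB-11"),
    "item_c": ("Example Kirk", "LB-12"),
}
print(same_label_different_ids(items))
# {'1-5 Example Street': ['item_a', 'item_b']}
```

Labels are compared exactly here; normalising case or whitespace would catch more candidates at the cost of more false positives.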


Regards,

Markus




On Tue, Jun 2, 2015 at 10:26 AM Magnus Manske
<magnusman...@googlemail.com> wrote:

Update: There appear to be quite a few items with duplicate Scotland
IDs (not all of them may be erroneous!):
http://wdq.wmflabs.org/stats?action=doublestring&prop=709
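The WDQ report linked above lists Scotland ID (P709) values that occur on more than one item. A minimal Python sketch of the same grouping, assuming the statements are available as (item, ID) pairs; the item names, ID strings and function name are hypothetical placeholders. As Magnus notes, a hit is only a candidate, since not every shared ID is an error.

```python
from collections import defaultdict


def duplicate_values(claims):
    """Return {id_value: sorted items} for values used on more than one item.

    `claims` is an iterable of (item, id_value) pairs, e.g. exported
    Scotland ID statements; the input format is an assumption.
    """
    items_by_value = defaultdict(set)
    for item, value in claims:
        items_by_value[value].add(item)
    return {value: sorted(items)
            for value, items in items_by_value.items()
            if len(items) > 1}


# Hypothetical data: a church and its churchyard sharing one ID.
claims = [
    ("church_item", "LB-0001"),
    ("churchyard_item", "LB-0001"),
    ("bridge_item", "LB-0002"),
]
print(duplicate_values(claims))
# {'LB-0001': ['church_item', 'churchyard_item']}
```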

On Tue, Jun 2, 2015 at 10:23 AM Magnus Manske
<magnusman...@googlemail.com> wrote:

I created (some/most of) these items as part of the Wiki Loves
Monuments UK 2014 drive, to run the campaign from Wikidata
rather than from a bespoke database. This allows the community
(TM) to maintain the data, rather than one poor sod (e.g.,
myself) having to frantically update all of it every year ;-)

"Consumer" tool is here:
https://tools.wmflabs.org/wlmuk/index_wd.html

These are based on "official" data from National Heritage,
provided to me via Wikimedia UK. Grade A (or Grade I/II* in
England) structures should be noteworthy by default.

It appears (as per your examples) that some of these were
created as duplicates/with wrong IDs. As I said, this is based
on "official" data, so it's the best I could do at the time.
With mass creation, there are bound to be a few strays. If you
can find some large-scale, systemic issue I'll try to fix it,
but the one-offs will always fall back to manual fixing. At
least, with Wikidata, we can fix them together.

On Tue, Jun 2, 2015 at 10:01 AM Daniel Kinzler
<daniel.kinz...@wikimedia.de> wrote:

On 01.06.2015 at 22:26, Markus Krötzsch wrote:
 > Finally, the technical question is: Why is this even possible? I thought
 > that, in each language, label+description are a key (globally unique), yet
 > here we have many pairs of items with exactly the same label and
 > description. Or is the problem that no description was entered and so the
 > system does not apply the key?

The uniqueness constraint does indeed not apply if there is
no description.

--
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.
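Daniel's answer can be modelled in a few lines: the key is (language, label, description), and items without a description are skipped, so any number of items may share a label as long as their descriptions are empty. This Python sketch is a hypothetical model of the rule as described in this thread, not Wikibase's actual implementation; the function name, tuple layout and example items are assumptions.

```python
def label_description_conflicts(items):
    """Return pairs of items that would clash under the uniqueness rule.

    `items` is a list of (item, language, label, description) tuples.
    Following the explanation above, items with an empty description are
    skipped, so they can never clash.
    """
    first_seen = {}
    conflicts = []
    for item, language, label, description in items:
        if not description:
            continue  # rule does not apply without a description
        key = (language, label, description)
        if key in first_seen:
            conflicts.append((first_seen[key], item))
        else:
            first_seen[key] = item
    return conflicts


# Hypothetical items: the first two pass although label and (empty)
# description are identical; the last two clash.
items = [
    ("item_a", "en", "Example House", ""),
    ("item_b", "en", "Example House", ""),
    ("item_c", "en", "Example Kirk", "church in Scotland"),
    ("item_d", "en", "Example Kirk", "church in Scotland"),
]
print(label_description_conflicts(items))  # [('item_c', 'item_d')]
```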





___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata