Re: [Wikidata] Duplicate identifiers (redirects & non-redirects)

2015-10-05 Thread Stas Malyshev
Hi!

> It seems like the constraint checker could check for either only one
> "Preferred" or all but one "Deprecated" which would allow editors to
> evolve in whichever way they wanted.

It should probably consider "best rank" ones - i.e. if Preferred exists
then Preferred ones, otherwise Normal ones but never Deprecated ones.
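
A minimal sketch of that "best rank" selection, for illustration only. The `Statement` tuple and the function name are assumptions made for this example, not Wikibase code:

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """Hypothetical stand-in for a Wikidata statement: a value plus its rank."""
    value: str
    rank: str  # "preferred", "normal" or "deprecated"


def best_rank_statements(statements):
    """Return Preferred statements if any exist, otherwise Normal ones.

    Deprecated statements are never returned.
    """
    preferred = [s for s in statements if s.rank == "preferred"]
    if preferred:
        return preferred
    return [s for s in statements if s.rank == "normal"]


# GeoNames ids of Q46843 from this thread; the rank of the second entry
# is assumed for illustration.
geonames = [
    Statement("7602447", "preferred"),
    Statement("2954602", "normal"),
]
print([s.value for s in best_rank_statements(geonames)])
```

A constraint checker built on this would count only the returned statements, so a single Preferred value passes regardless of how many Normal or Deprecated values sit next to it.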
-- 
Stas Malyshev
smalys...@wikimedia.org

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Duplicate identifiers (redirects & non-redirects)

2015-10-02 Thread Bene*



On 01.10.2015 at 18:40, Tom Morris wrote:
On Thu, Oct 1, 2015 at 4:19 AM, Markus Krötzsch wrote:


On 01.10.2015 00:58, Ricordisamoa wrote:

I think Tom is referring to external identifiers such as MusicBrainz
artist ID etc. and whether Wikidata items should show all of them or
'preferred' ones only as we did for VIAF redirects
<https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/SamoaBot_38>.


Now if the external site reconciles the ids, we have these options:
(1) Keep everything as is (one main id marked as "preferred")
(2) Make the redirect ids deprecated on Wikidata (show people that
we are aware of the ids but they should not be used)
(3) Delete the redirect ids

I think (2) would be cleanest, since it avoids that unaware users
re-add the old ids. (3) would also be ok once the old id is no
longer in circulation.


I agree #2 is best, although #1 could work too.  The problem with #3 
is that an identifier, once minted, is never "no longer in 
circulation."  This is precisely why Wikidata items are never 
deleted.  There's always the possibility that someone will hold a 
reference to it somewhere.  Thad's use case isn't uncommon.


I think #2 is the only solution we should do. This is exactly what the 
deprecated rank is for: marking some information as valid for some point 
in time but telling the users that this should not be used any more.




Is there any benefit in removing old ids completely? I guess
constraint reports will work better (but maybe constraint reports
should not count deprecated statements in single value constraints
...).



The constraint reports definitely need to be fixed.  I recently saw a 
reference to a VIAF bot run that deleted a whole bunch of VIAF 
identifiers to "fix" things being flagged by some constraint.


Not sure if marking the statements as deprecated would already fix them. 
If not, the code creating these lists needs to be adjusted to ignore 
deprecated statements (maybe optionally?).
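
A rough sketch of what such an adjusted check might look like, with the "optional" part expressed as a flag. The tuple layout and the function name are assumptions for illustration; this is not the actual constraint-report code:

```python
from collections import defaultdict


def single_value_violations(statements, ignore_deprecated=True):
    """Return {item: [values]} for items with more than one counted value.

    `statements` is an iterable of (item, value, rank) tuples, a made-up
    shape used only for this sketch.
    """
    counted = defaultdict(list)
    for item, value, rank in statements:
        if ignore_deprecated and rank == "deprecated":
            continue
        counted[item].append(value)
    return {item: values for item, values in counted.items() if len(values) > 1}


# Placeholder item and identifier values, not real data.
rows = [
    ("Q1", "old-id", "deprecated"),
    ("Q1", "new-id", "normal"),
    ("Q2", "id-a", "normal"),
    ("Q2", "id-b", "normal"),
]
print(single_value_violations(rows))
print(single_value_violations(rows, ignore_deprecated=False))
```

With the flag on, an item whose extra ids are all deprecated no longer shows up in the report, so there is no incentive to delete them.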



Other than this, I don't see a big reason to spend time on
removing some ids. It's not wrong to claim that these are ids,
just slightly redundant, and the old ids might still be useful for
integrating with web sources that were not updated when the
redirect happened.


Rather than not wasting time removing, I'd like to see affirmative 
statements that keeping them is a good thing. If people find them 
annoying or cluttering, it's because of poor UI design, not because 
they lack usefulness.


Indeed, and as far as I know the new UI will hide deprecated statements 
by default and only show them on demand via a toggle.


Best regards
Bene


Re: [Wikidata] Duplicate identifiers (redirects & non-redirects)

2015-10-01 Thread Markus Krötzsch

On 01.10.2015 00:58, Ricordisamoa wrote:

I think Tom is referring to external identifiers such as MusicBrainz
artist ID etc. and whether
Wikidata items should show all of them or 'preferred' ones only as we
did for VIAF redirects
<https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/SamoaBot_38>.


There are also other cases where external sites have duplicates that are 
not reconciled (yet). For example, Q46843 has multiple GeoNames Ids:


http://sws.geonames.org/7602447
http://sws.geonames.org/2954602

The second was suggested by Freebase, the first is what Wikipedia had. I 
think the first is better (polygon rather than bounding box), so I made 
this preferred. This is a situation where we should keep multiple 
identifiers, since the external database really has two ids that are not 
integrated yet.


Now if the external site reconciles the ids, we have these options:
(1) Keep everything as is (one main id marked as "preferred")
(2) Make the redirect ids deprecated on Wikidata (show people that we 
are aware of the ids but they should not be used)

(3) Delete the redirect ids

I think (2) would be cleanest, since it avoids that unaware users re-add 
the old ids. (3) would also be ok once the old id is no longer in 
circulation.


Is there any benefit in removing old ids completely? I guess constraint 
reports will work better (but maybe constraint reports should not count 
deprecated statements in single value constraints ...). Other than this,
I don't see a big reason to spend time on removing some ids. It's not 
wrong to claim that these are ids, just slightly redundant, and the old 
ids might still be useful for integrating with web sources that were not 
updated when the redirect happened.


Markus




On 01/10/2015 00:48, Addshore wrote:



On 30 September 2015 at 20:58, Tom Morris wrote:

I think I've seen something somewhere saying that the prevailing
sentiment is that obsolete identifiers which are just redirects to
a new identifier should be removed.


I hope not. See my post at
http://addshore.com/2015/04/redirects-on-wikidata/ Redirects should
remain!

Also see http://addshore.com/2015/09/un-deleting-50-wikidata-items/


There's also the case of sites like MusicBrainz which keep the
non-canonical IDs without redirecting to the canonical ID, but
will tell you which ID is preferred, e.g. Fritz Kreisler


https://musicbrainz.org/artist/590fcad4-2ba4-43bc-a22f-a4bb9b496fe8
https://musicbrainz.org/artist/627ac6c2-ee5c-4120-8af3-ab00345447f5
https://musicbrainz.org/artist/bf6d6ce1-ce88-40e6-9424-11d11d2e54ea

where all the tabs for the second two pages actually point to the
first, canonical entry.

Is there an established policy for either the redirect or
non-redirect case?


See
https://www.wikidata.org/wiki/Wikidata:Deletion_policy#Deletion_of_items_.28Phase_I.29
which says "Items should not be deleted when - The item redirects to
another item"

Also see https://www.wikidata.org/wiki/Help:Merge#Create_redirect
which says redirects should be created when items are merged


I'd argue that even the obsolete identifiers are useful for
inbound resolution and reconciliation. Aggressively pruning them
just makes more work for people, because they must resolve the
identifier that they have in hand to its canonical form (probably
by hitting the issuing authority) before using it for Wikidata
lookups.

What do others think?

Tom


--
Addshore



Re: [Wikidata] Duplicate identifiers (redirects & non-redirects)

2015-10-01 Thread Thad Guidry
No benefit to removing the old ids...in fact...It would make things more
difficult for me and others in a few older databases.  I would like to keep
the old IDs in Wikidata around for posterity and provenance ...some of us
still have really old databases with cruft and old IDs from years and years
ago, some from the start of the Internet :)  If you remove the old IDs it
will make it that much harder for me to reconcile some of them.

+1  Being able to query the Wikidata API with an older ID and it showing me
that it is an old ID and letting me know there is now a preferred ID, would
be fantastic.
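
To make the idea concrete, here is a small sketch of such a lookup over an in-memory index. It does not call the real Wikidata API; the data layout, the function name and the returned keys are all assumptions for illustration:

```python
def resolve_external_id(statements, external_id):
    """Look up an external id and report whether it is still current.

    `statements` maps a Wikidata item id to a list of (value, rank) pairs
    for one identifier property; this layout is an assumption of the sketch.
    Returns None when the id is unknown.
    """
    for item, values in statements.items():
        ranks = dict(values)
        if external_id not in ranks:
            continue
        current = [v for v, r in values if r != "deprecated"]
        return {
            "item": item,
            "is_old": ranks[external_id] == "deprecated",
            "current_ids": current,
        }
    return None


# Placeholder data: one item with a redirected id kept as deprecated.
index = {"Q1": [("old-id", "deprecated"), ("new-id", "normal")]}
print(resolve_external_id(index, "old-id"))
```

This only works if the old id is still stored on the item, which is the point: once it is deleted, the lookup has nothing to match against.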

Thad
+ThadGuidry 

On Thu, Oct 1, 2015 at 3:19 AM, Markus Krötzsch <
mar...@semantic-mediawiki.org> wrote:

> On 01.10.2015 00:58, Ricordisamoa wrote:
>
>> I think Tom is referring to external identifiers such as MusicBrainz
>> artist ID etc. and whether
>> Wikidata items should show all of them or 'preferred' ones only as we
>> did for VIAF redirects
>> <https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/SamoaBot_38>.
>>
>
> There are also other cases where external sites have duplicates that are
> not reconciled (yet). For example, Q46843 has multiple GeoNames Ids:
>
> http://sws.geonames.org/7602447
> http://sws.geonames.org/2954602
>
> The second was suggested by Freebase, the first is what Wikipedia had. I
> think the first is better (polygon rather than bounding box), so I made
> this preferred. This is a situation where we should keep multiple
> identifiers, since the external database really has two ids that are not
> integrated yet.
>
> Now if the external site reconciles the ids, we have these options:
> (1) Keep everything as is (one main id marked as "preferred")
> (2) Make the redirect ids deprecated on Wikidata (show people that we are
> aware of the ids but they should not be used)
> (3) Delete the redirect ids
>
> I think (2) would be cleanest, since it avoids that unaware users re-add
> the old ids. (3) would also be ok once the old id is no longer in
> circulation.
>
> Is there any benefit in removing old ids completely? I guess constraint
> reports will work better (but maybe constraint reports should not count
> deprecated statements in single value constraints ...). Other than this, I
> don't see a big reason to spend time on removing some ids. It's not wrong
> to claim that these are ids, just slightly redundant, and the old ids might
> still be useful for integrating with web sources that were not updated when
> the redirect happened.
>
> Markus
>


Re: [Wikidata] Duplicate identifiers (redirects & non-redirects)

2015-10-01 Thread Tom Morris
On Thu, Oct 1, 2015 at 4:19 AM, Markus Krötzsch <
mar...@semantic-mediawiki.org> wrote:

> On 01.10.2015 00:58, Ricordisamoa wrote:
>
>> I think Tom is referring to external identifiers such as MusicBrainz
>> artist ID etc. and whether
>> Wikidata items should show all of them or 'preferred' ones only as we
>> did for VIAF redirects
>> <https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/SamoaBot_38>.
>>
>
> Now if the external site reconciles the ids, we have these options:
> (1) Keep everything as is (one main id marked as "preferred")
> (2) Make the redirect ids deprecated on Wikidata (show people that we are
> aware of the ids but they should not be used)
> (3) Delete the redirect ids
>
> I think (2) would be cleanest, since it avoids that unaware users re-add
> the old ids. (3) would also be ok once the old id is no longer in
> circulation.
>

I agree #2 is best, although #1 could work too.  The problem with #3 is
that an identifier, once minted, is never "no longer in circulation."  This
is precisely why Wikidata items are never deleted.  There's always the
possibility that someone will hold a reference to it somewhere.  Thad's use
case isn't uncommon.

> Is there any benefit in removing old ids completely? I guess constraint
> reports will work better (but maybe constraint reports should not count
> deprecated statements in single value constraints ...).


The constraint reports definitely need to be fixed.  I recently saw a
reference to a VIAF bot run that deleted a whole bunch of VIAF identifiers
to "fix" things being flagged by some constraint.


> Other than this, I don't see a big reason to spend time on removing some
> ids. It's not wrong to claim that these are ids, just slightly redundant,
> and the old ids might still be useful for integrating with web sources that
> were not updated when the redirect happened.
>

Rather than not wasting time removing, I'd like to see affirmative
statements that keeping them is a good thing.  If people find them annoying
or cluttering, it's because of poor UI design, not because they lack
usefulness.

Tom


Re: [Wikidata] Duplicate identifiers (redirects & non-redirects)

2015-10-01 Thread Jane Darnell
I agree with Tom on this. I would prefer to keep all of the redirects and
just deprecate them (especially for names of people, because hidden in the
redirect is an alternate spelling that should be added as an alias to the
label field).

On Thu, Oct 1, 2015 at 6:55 PM, James Heald wrote:

> It might be worth creating a qualifier "reason for deprecation" to
> indicate in more detail why a particular value is deprecated (eg
> "superseded", "redirected on target website", etc).
>
>   -- James.
>
>
> On 01/10/2015 17:40, Tom Morris wrote:
>
>> On Thu, Oct 1, 2015 at 4:19 AM, Markus Krötzsch <
>> mar...@semantic-mediawiki.org> wrote:
>>
>>> On 01.10.2015 00:58, Ricordisamoa wrote:
>>>
>>>> I think Tom is referring to external identifiers such as MusicBrainz
>>>> artist ID etc. and whether
>>>> Wikidata items should show all of them or 'preferred' ones only as we
>>>> did for VIAF redirects
>>>> <https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/SamoaBot_38>.
>>>
>>> Now if the external site reconciles the ids, we have these options:
>>> (1) Keep everything as is (one main id marked as "preferred")
>>> (2) Make the redirect ids deprecated on Wikidata (show people that we are
>>> aware of the ids but they should not be used)
>>> (3) Delete the redirect ids
>>>
>>> I think (2) would be cleanest, since it avoids that unaware users re-add
>>> the old ids. (3) would also be ok once the old id is no longer in
>>> circulation.
>>>
>>>
>> I agree #2 is best, although #1 could work too.  The problem with #3 is
>> that an identifier, once minted, is never "no longer in circulation."
>> This is precisely why Wikidata items are never deleted.  There's always
>> the possibility that someone will hold a reference to it somewhere.
>> Thad's use case isn't uncommon.
>>
>>> Is there any benefit in removing old ids completely? I guess constraint
>>> reports will work better (but maybe constraint reports should not count
>>> deprecated statements in single value constraints ...).
>>>
>>
>>
>> The constraint reports definitely need to be fixed.  I recently saw a
>> reference to a VIAF bot run that deleted a whole bunch of VIAF identifiers
>> to "fix" things being flagged by some constraint.
>>
>>
>>> Other than this, I don't see a big reason to spend time on removing some
>>> ids. It's not wrong to claim that these are ids, just slightly redundant,
>>> and the old ids might still be useful for integrating with web sources
>>> that were not updated when the redirect happened.
>>>
>> Rather than not wasting time removing, I'd like to see affirmative
>> statements that keeping them is a good thing.  If people find them
>> annoying or cluttering, it's because of poor UI design, not because they
>> lack usefulness.
>>
>> Tom
>>
>>
>>


Re: [Wikidata] Duplicate identifiers (redirects & non-redirects)

2015-10-01 Thread James Heald
It might be worth creating a qualifier "reason for deprecation" to 
indicate in more detail why a particular value is deprecated (eg 
"superseded", "redirected on target website", etc).
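
As a sketch of how that might look in data terms: the `Statement` class below is a simplified, hypothetical stand-in for a Wikidata statement, and the qualifier is stored under a plain string key because no such property exists yet:

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """Hypothetical, simplified statement: value, rank and qualifiers."""
    value: str
    rank: str = "normal"
    qualifiers: dict = field(default_factory=dict)


def deprecate(statement, reason):
    """Mark a statement as deprecated and record why in a qualifier."""
    statement.rank = "deprecated"
    statement.qualifiers["reason for deprecation"] = reason
    return statement


# Placeholder value; the reason string is one of the examples given above.
old = deprecate(Statement("old-id"), "redirected on target website")
print(old)
```

Keeping the reason next to the rank would let tools tell a redirected id apart from one that was deprecated because it was simply wrong.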


  -- James.


On 01/10/2015 17:40, Tom Morris wrote:

On Thu, Oct 1, 2015 at 4:19 AM, Markus Krötzsch <
mar...@semantic-mediawiki.org> wrote:


On 01.10.2015 00:58, Ricordisamoa wrote:


I think Tom is referring to external identifiers such as MusicBrainz
artist ID etc. and whether
Wikidata items should show all of them or 'preferred' ones only as we
did for VIAF redirects
<https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/SamoaBot_38>.




Now if the external site reconciles the ids, we have these options:
(1) Keep everything as is (one main id marked as "preferred")
(2) Make the redirect ids deprecated on Wikidata (show people that we are
aware of the ids but they should not be used)
(3) Delete the redirect ids

I think (2) would be cleanest, since it avoids that unaware users re-add
the old ids. (3) would also be ok once the old id is no longer in
circulation.



I agree #2 is best, although #1 could work too.  The problem with #3 is
that an identifier, once minted, is never "no longer in circulation."  This
is precisely why Wikidata items are never deleted.  There's always the
possibility that someone will hold a reference to it somewhere.  Thad's use
case isn't uncommon.

Is there any benefit in removing old ids completely? I guess constraint
reports will work better (but maybe constraint reports should not count
deprecated statements in single value constraints ...).



The constraint reports definitely need to be fixed.  I recently saw a
reference to a VIAF bot run that deleted a whole bunch of VIAF identifiers
to "fix" things being flagged by some constraint.



Other than this, I don't see a big reason to spend time on removing some
ids. It's not wrong to claim that these are ids, just slightly redundant,
and the old ids might still be useful for integrating with web sources that
were not updated when the redirect happened.



Rather than not wasting time removing, I'd like to see affirmative
statements that keeping them is a good thing.  If people find them annoying
or cluttering, it's because of poor UI design, not because they lack
usefulness.

Tom





Re: [Wikidata] Duplicate identifiers (redirects & non-redirects)

2015-10-01 Thread Tom Morris
A small mechanical note for those not familiar with Wikidata's internals
(since it took me a while to figure this out):

"Preferred" and "Deprecated" are "Ranks" (the third is "Normal") and the
rank can be set by clicking the "Edit" button and then clicking on the
leftmost of the two tiny sets of three stacked buttons near the left of the
input field.

It seems like the constraint checker could check for either only one
"Preferred" or all but one "Deprecated" which would allow editors to evolve
in whichever way they wanted.
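
Roughly, as a sketch (the function name and the plain list of rank strings are assumptions for illustration, not how the constraint checker is actually written):

```python
from collections import Counter


def passes_single_value_check(ranks):
    """Accept a set of values if exactly one is Preferred, or if all but
    one are Deprecated.

    `ranks` is a list of rank strings, one per value of the property.
    """
    counts = Counter(ranks)
    one_preferred = counts["preferred"] == 1
    all_but_one_deprecated = counts["deprecated"] == len(ranks) - 1
    return one_preferred or all_but_one_deprecated


print(passes_single_value_check(["preferred", "normal", "normal"]))
print(passes_single_value_check(["normal", "deprecated", "deprecated"]))
print(passes_single_value_check(["normal", "normal"]))
```

Either convention passes, so editors who mark the canonical id as Preferred and editors who mark the redirects as Deprecated both end up with a clean report.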

Tom

On Thu, Oct 1, 2015 at 12:40 PM, Tom Morris wrote:

> On Thu, Oct 1, 2015 at 4:19 AM, Markus Krötzsch <
> mar...@semantic-mediawiki.org> wrote:
>
>> On 01.10.2015 00:58, Ricordisamoa wrote:
>>
>>> I think Tom is referring to external identifiers such as MusicBrainz
>>> artist ID etc. and whether
>>> Wikidata items should show all of them or 'preferred' ones only as we
>>> did for VIAF redirects
>>> <https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/SamoaBot_38>.
>>>
>>
>> Now if the external site reconciles the ids, we have these options:
>> (1) Keep everything as is (one main id marked as "preferred")
>> (2) Make the redirect ids deprecated on Wikidata (show people that we are
>> aware of the ids but they should not be used)
>> (3) Delete the redirect ids
>>
>> I think (2) would be cleanest, since it avoids that unaware users re-add
>> the old ids. (3) would also be ok once the old id is no longer in
>> circulation.
>>
>
> I agree #2 is best, although #1 could work too.  The problem with #3 is
> that an identifier, once minted, is never "no longer in circulation."  This
> is precisely why Wikidata items are never deleted.  There's always the
> possibility that someone will hold a reference to it somewhere.  Thad's use
> case isn't uncommon.
>
>> Is there any benefit in removing old ids completely? I guess constraint
>> reports will work better (but maybe constraint reports should not count
>> deprecated statements in single value constraints ...).
>
>
> The constraint reports definitely need to be fixed.  I recently saw a
> reference to a VIAF bot run that deleted a whole bunch of VIAF identifiers
> to "fix" things being flagged by some constraint.
>
>
>> Other than this, I don't see a big reason to spend time on removing some
>> ids. It's not wrong to claim that these are ids, just slightly redundant,
>> and the old ids might still be useful for integrating with web sources that
>> were not updated when the redirect happened.
>>
>
> Rather than not wasting time removing, I'd like to see affirmative
> statements that keeping them is a good thing.  If people find them annoying
> or cluttering, it's because of poor UI design, not because they lack
> usefulness.
>
> Tom
>
>


Re: [Wikidata] Duplicate identifiers (redirects & non-redirects)

2015-09-30 Thread Ricordisamoa
I think Tom is referring to external identifiers such as MusicBrainz 
artist ID etc. and whether 
Wikidata items should show all of them or 'preferred' ones only as we 
did for VIAF redirects 
<https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/SamoaBot_38>.


On 01/10/2015 00:48, Addshore wrote:



On 30 September 2015 at 20:58, Tom Morris wrote:


I think I've seen something somewhere saying that the prevailing
sentiment is that obsolete identifiers which are just redirects to
a new identifier should be removed.


I hope not. See my post at 
http://addshore.com/2015/04/redirects-on-wikidata/ Redirects should 
remain!


Also see http://addshore.com/2015/09/un-deleting-50-wikidata-items/


There's also the case of sites like MusicBrainz which keep the
non-canonical IDs without redirecting to the canonical ID, but
will tell you which ID is preferred, e.g. Fritz Kreisler


https://musicbrainz.org/artist/590fcad4-2ba4-43bc-a22f-a4bb9b496fe8
https://musicbrainz.org/artist/627ac6c2-ee5c-4120-8af3-ab00345447f5
https://musicbrainz.org/artist/bf6d6ce1-ce88-40e6-9424-11d11d2e54ea

where all the tabs for the second two pages actually point to the
first, canonical entry.

Is there an established policy for either the redirect or
non-redirect case?


See 
https://www.wikidata.org/wiki/Wikidata:Deletion_policy#Deletion_of_items_.28Phase_I.29 
which says "Items should not be deleted when - The item redirects to 
another item"


Also see https://www.wikidata.org/wiki/Help:Merge#Create_redirect 
which says redirects should be created when items are merged



I'd argue that even the obsolete identifiers are useful for
inbound resolution and reconciliation. Aggressively pruning them
just makes more work for people, because they must resolve the
identifier that they have in hand to its canonical form (probably
by hitting the issuing authority) before using it for Wikidata
lookups.

What do others think?

Tom


--
Addshore

