Re: [Wikimedia-l] On traceability and reliability of data we publish [was Re: [Wikidata] Solve legal uncertainty of Wikidata]

2018-07-10 Thread Andreas Kolbe
Hi Roger / Alphos,



On Sat, Jul 7, 2018 at 10:01 PM, Alphos OGame wrote:

> Traceability of information does not pertain to who imported said
> information into Wikidata. Could be an unregistered user, could be a bot,
> could be a Wikimedian in residence or even the Pope himself, it doesn't
> make a difference in the world.
> What matters in traceability of a piece of data is *where* it comes from,
> and that piece of metadata is provided through referencing.
> That does not belong in licensing.
> CC-BY would mean reusers would have to mention individual users (including
> vandals, sigh) who took part in compiling datasets on Wikidata, which you
> seem to be oblivious to.




When Bing and Google display Wikipedia snippets, they add a Wikipedia
hyperlink, not a list of all contributors.

That satisfies the CC-BY licence, which states explicitly:

   1. You may satisfy the conditions in Section 3(a)(1) in any
   reasonable manner based on the medium, means, and context in which You
   Share the Licensed Material. For example, it may be reasonable to satisfy
   the conditions by providing a URI or hyperlink to a resource that includes
   the required information.


Few people would say that listing every contributor and vandal who has ever
had a hand in a dataset would be "reasonable". This applies even more so in
the case of voice-based digital assistants (imagine ...). So I don't see
this as a real concern.




> CC-BY does not help track where a work comes from,
> only who took part in making it.




As you say, what matters in traceability is where a piece of data comes
from. Referencing is indeed important there (Wikidata has historically had
a poor track record in that respect).

But you seem to overlook the much simpler fact that end users have to be
able to tell that data they encounter come from Wikidata and not some other
unnamed database. How can they look up a Wikidata reference if the re-user
doesn't even tell them that Wikidata is where the data is from?

Telling end users that data come from Wikidata is important for several
reasons.

It brings new contributors to Wikidata. It provides visibility for
Wikimedia and its volunteer community. And when data is wrong, it enables
people to find out why the data is wrong, and where they have to go to fix
the error. Many eyes make all bugs shallow, but this is only true if people
know where to go to fix a bug.

Telling end users that data come from Wikidata is surely the most
fundamental level of traceability. And a prime reason why Google and Bing
tell users that their snippets come from Wikipedia is that Wikipedia's
licence demands attribution. That's a good thing.

Best,
A.





Re: [Wikimedia-l] On traceability and reliability of data we publish [was Re: [Wikidata] Solve legal uncertainty of Wikidata]

2018-07-08 Thread Gerard Meijssen
Hoi,
This same mail was sent at the same time to the Wikidata mailing list. The
answer there is argued in a different way, with an utterly different
outcome. This is an example of forum shopping, and the result is that there
is no single outcome; it is a great example of why forum shopping does not
help. It divides more than it brings together.
Thanks,
 GerardM

On 9 July 2018 at 04:17, Samuel Klein wrote:

> Hello Mathieu!  I agree that tracing the full history of a data cite is
> important, independent of license.  I'm thinking about scalable solutions
> for this.
> It's definitely not the only factor in reliability; but it does matter who
> entered the data (for instance) as one way to estimate the importance of
> double-checking a cited source to confirm that the data is found there.
>
> On Sat, Jul 7, 2018 at 11:59 AM mathieu lovato stumpf guntz <
> psychosl...@culture-libre.org> wrote:
>
>
> > I agree it is a misconception that a copyright license makes any direct
> > change to data reliability. But an attribution requirement does somewhat
> > indirectly have an impact on it, as it... enforces traceability.
> > That is why I strongly disagree with the following assertion: "a license
> > that requires BY sucks so hard for data [because] attribution
> > requirements grow very quickly". To my mind it is equivalent to saying that
> > we will throw away traceability because it is subjectively judged too
> > large a burden, without providing even the beginning of evidence that it
> > indeed can't be managed, at least with Wikimedia's current resources.
> >
> > Now, I don't say traceability is the sole factor one should take into
> > account in data reliability, but certainly it is one of them.
> ___
> Wikimedia-l mailing list, guidelines at:
> https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and
> https://meta.wikimedia.org/wiki/Wikimedia-l
> New messages to: Wikimedia-l@lists.wikimedia.org
> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l


Re: [Wikimedia-l] On traceability and reliability of data we publish [was Re: [Wikidata] Solve legal uncertainty of Wikidata]

2018-07-08 Thread Samuel Klein
Hello Mathieu!  I agree that tracing the full history of a data cite is
important, independent of license.  I'm thinking about scalable solutions
for this.
It's definitely not the only factor in reliability; but it does matter who
entered the data (for instance) as one way to estimate the importance of
double-checking a cited source to confirm that the data is found there.

On Sat, Jul 7, 2018 at 11:59 AM mathieu lovato stumpf guntz <
psychosl...@culture-libre.org> wrote:


> I agree it is a misconception that a copyright license makes any direct
> change to data reliability. But an attribution requirement does somewhat
> indirectly have an impact on it, as it... enforces traceability.
> That is why I strongly disagree with the following assertion: "a license
> that requires BY sucks so hard for data [because] attribution
> requirements grow very quickly". To my mind it is equivalent to saying that
> we will throw away traceability because it is subjectively judged too
> large a burden, without providing even the beginning of evidence that it
> indeed can't be managed, at least with Wikimedia's current resources.
>
> Now, I don't say traceability is the sole factor one should take into
> account in data reliability, but certainly it is one of them.


Re: [Wikimedia-l] On traceability and reliability of data we publish [was Re: [Wikidata] Solve legal uncertainty of Wikidata]

2018-07-08 Thread Alphos OGame
Hello,

You seem to be mistaken.

Traceability of information does not pertain to who imported said
information into Wikidata. Could be an unregistered user, could be a bot,
could be a Wikimedian in residence or even the Pope himself, it doesn't
make a difference in the world.
What matters in traceability of a piece of data is *where* it comes from,
and that piece of metadata is provided through referencing.
That does not belong in licensing.
CC-BY would mean reusers would have to mention individual users (including
vandals, sigh) who took part in compiling datasets on Wikidata, which you
seem to be oblivious to. CC-BY does not help track where a work comes from,
only who took part in making it. Dataset reusers don't need that kind of
information when all they want is a list of countries whose heads of
state have spouses whose given name starts with a D [1].

When it comes to reliability, there again, who imported it is not
particularly important, keeping in mind that Wikidata is not the most
user-friendly wiki in the Wikimedia ecosystem, as it is not the simplest to
grasp for human minds, and as such not the most frequently vandalized
(three huzzahs for small favors!).
What matters for the reliability of a piece of data is the *source* of that
piece of information, which, once again, is indicated by a
reference, and the credit you give to said source.
That again does not belong in licensing.
CC-BY would mean reusers would know that a crapton of people they have no
idea even exist took part in compiling a dataset they require (instead of
just the dataset and references pertaining to it), which again you seem to
be oblivious to. It doesn't protect reusers against vandalism; it does,
however, make their dataset a whole lot larger by adding the names of a
whole lot of people they don't know or care about.

It really doesn't matter how you put it, the arguments you've put forward
so far simply don't make any kind of sense against CC0 or for CC-BY.

If however you do insist on knowing which user added a particular piece of
data (or reference/metadata, for that matter) to an item, Wikidata keeps an
edit history just in case. It is not necessary for the licence currently in
effect on Wikidata (which, need I remind you, is still CC0), but it is
there nonetheless should you need it.

Now, do we need to keep this needlessly long and tedious thread alive under
another name, or could we please drop it and carry on with our lives?

Roger / Alphos

[1] Maybe for a prophecy or something? Well, if anyone needs it, it's
hopelessly simple, so here goes: https://tinyurl.com/yb6dh3r6



2018-07-07 17:59 GMT+02:00 mathieu lovato stumpf guntz <
psychosl...@culture-libre.org>:

> Hi Andra,
>
> I agree it is a misconception that a copyright license makes any direct
> change to data reliability. But an attribution requirement does somewhat
> indirectly have an impact on it, as it legally enforces traceability. That
> is why I strongly disagree with the following assertion: "a license that
> requires BY sucks so hard for data [because] attribution requirements grow
> very quickly". To my mind it is equivalent to saying that we will throw away
> traceability because it is subjectively judged too large a burden, without
> providing even the beginning of evidence that it indeed can't be managed,
> at least with Wikimedia's current resources.
>
> Now, I don't say traceability is the sole factor one should take into
> account in data reliability, but certainly it is one of them. Maybe we
> should first come up with clear criteria to put into an equation that makes
> it possible to calculate the reliability of information. Since it's among
> the core goals of the Wikimedia strategy, it would certainly be worth the
> effort to establish clear metrics about the reliability of the information
> the movement is spreading.
>
> Cheers
>
>
> On 04/07/2018 at 13:00, Andra Waagmeester wrote:
>
>> I agree with Maarten, and to add to that: it is a huge misconception that
>> CC0 makes data unreliable. It is only a legal statement about copyright,
>> nothing more, nothing less. Statements without proper references and
>> qualifiers make data unreliable, but Wikidata has a decent mechanism to
>> capture that needed provenance.
>>
>> On Wed, Jul 4, 2018 at 12:50 PM, Maarten Dammers wrote:
>>
>> Hi Mathieu,
>>
>> On 04-07-18 11:07, mathieu stumpf guntz wrote:
>>
>> Hi,
>>
>> On 19/05/2018 at 03:35, Denny Vrandečić wrote:
>>
>>
>> Regarding attribution, commonly it is assumed that you
>> have to respect it transitively. That is one of the
>> reasons a license that requires BY sucks so hard for data:
>> unlike with text, the attribution requirements grow very
>> quickly. It is the same as with modified images and
>> collages: it is not sufficient to attribute the last
>> author, but all contributors have to be attributed.
>>
>> If we want our