Should interactive web, Internet of Things, or offline services
relying on Foundation encyclopedia CC-BY-SA content be required to
attribute authorship by specifying the revision date from which the
transcluded content is derived?

On Thu, Oct 12, 2017 at 7:01 AM, Erik Moeller <> wrote:
> On Tue, Oct 10, 2017 at 7:31 AM, Andreas Kolbe <> wrote:
>> Wikidata has its own problems in that regard that have triggered ongoing
>> discussions and concerns on the English Wikipedia.[1]
> Tensions between different communities with overlapping but
> non-identical objectives are unavoidable. Repository projects like
> Wikidata and Wikimedia Commons provide huge payoff: they dramatically
> reduce duplication of effort, enable small language communities to
> benefit from the work done internationally, and can tackle a more
> expansive scope than the immediate needs of existing projects. A few
> examples include:
> - Wiki Loves Monuments, recognized as the world's largest photo competition
> - Partnerships with countless galleries, libraries, archives, and museums
> - Wikidata initiatives like mySociety's "Everypolitician" project or Gene Wiki
> This is not without its costs, however. Differing policies, levels of
> maturity, and social expectations will always fuel some level of
> conflict, and the repository approach creates huge usability
> challenges. The latter is also true for internal wiki features like
> templates, which shift information out of the article space,
> disempowering users who no longer understand how the whole is
> constructed from its parts.
> I would call these usability and "legibility" issues the single
> biggest challenge in the development of Wikidata, Structured Data for
> Commons, and other repository functionality. Much related work has
> already been done or is ticketed in Phabricator, such as the effective
> propagation of changes into watchlists, article histories, and
> notifications. Much more will need to follow.
> With regard to the issue of citations, it's worth noting that it's
> already possible to _conditionally_ load data from Wikidata, excluding
> information that is unsourced or only sourced circularly (i.e. to
> Wikipedia itself). [1] Template invocations can also override values
> provided by Wikidata, for example, if there is a source, but it is not
> considered reliable by the standards of a specific project.
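The conditional loading described above can be sketched roughly as
follows. The real on-wiki implementation is a Lua module (see footnote
[1]); the data shapes and item identifiers here are simplified
illustrative stand-ins, not the actual Wikibase JSON:

```python
# Toy version of the "onlysourced" idea: keep only statements that carry
# at least one reference, ignoring references that merely point back to
# Wikipedia itself ("circular" sourcing). Identifiers are illustrative.

WIKIPEDIA_ITEMS = {"Q52"}  # stand-in for "stated in: Wikipedia"

def only_sourced(statements):
    """Return statements backed by at least one non-Wikipedia reference."""
    kept = []
    for st in statements:
        refs = [r for r in st.get("references", [])
                if r.get("stated_in") not in WIKIPEDIA_ITEMS]
        if refs:
            kept.append(st)
    return kept

claims = [
    {"value": "82 million", "references": [{"stated_in": "Q52"}]},   # circular
    {"value": "83 million", "references": [{"stated_in": "Q999"}]},  # external
    {"value": "81 million", "references": []},                       # unsourced
]
print(only_sourced(claims))  # only the externally sourced statement survives
```

A template invocation overriding a Wikidata value would then simply
prefer a locally supplied parameter over whatever this filter returns.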
>> If a digital voice assistant propagates a Wikimedia mistake without telling
>> users where it got its information from, then there is not even a feedback
>> form. Editability is of no help at all if people can't find the source.
> I'm in favor of always indicating at least provenance (something like
> "Here's a quote from Wikipedia:"), even for short excerpts, and I
> certainly think WMF and chapters can advocate for this practice.
> However, where short excerpts are concerned, it's not at all clear
> that there is a _legal_ issue here, and that full compliance with all
> requirements of the license is a reasonable "ask".
> Bing's search result page manages a decent compromise, I think: it
> shows excerpts from Wikipedia clearly labeled as such, and it links to
> the CC-BY-SA license if you expand the excerpt, e.g.:
> I know that over the years, many efforts have been undertaken to
> document best practices for re-use, ranging from local
> community-created pages to chapter guides and tools like the
> "Lizenzhinweisgenerator". I don't know what the best-available of
> these is nowadays, but if none exists, it might be a good idea to
> develop a new, comprehensive guide that takes into account voice
> applications, tabular data, and so on.
> Such a guide would ideally not just be written from a license
> compliance perspective, but also include recommendations, e.g., on how
> to best indicate provenance, distinguishing "here's what you must do"
> from "here's what we recommend".
>>> Wikidata will often provide a shallow first level of information about
>>> a subject, while other linked sources provide deeper information. The
>>> more structured the information, the easier it becomes to validate in
>>> an automatic fashion that, for example, the subset of country
>>> population time series data represented in Wikidata is an accurate
>>> representation of the source material. Even when a large source
>>> dataset is mirrored by Wikimedia (for low-latency visualization, say),
>>> you can hash it, digitally sign it, and restrict modifiability of
>>> copies.
>> Interesting, though I'm not aware of that being done at present.
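The hash-and-verify part of the quoted idea is straightforward to
sketch. A digital signature would additionally bind the digest to the
publisher's key; this only shows the integrity check on a mirrored
copy, with made-up data:

```python
# Verify a low-latency local mirror of a dataset against the digest
# published alongside the original before trusting it.
import hashlib

def sha256_digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

source = b"country,year,population\nDE,2017,82521653\n"
published_digest = sha256_digest(source)  # published with the source data

mirror = source  # untampered local copy
assert sha256_digest(mirror) == published_digest

tampered = mirror.replace(b"82521653", b"99999999")
print(sha256_digest(tampered) == published_digest)  # False: change detected
```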
> At present, Wikidata allows users to model constraints on internal
> data validity. These constraints are used for regularly generated
> database reports as well as on-demand lookup via
> . This kicks
> in, for example, if you put in an insane number in a population field,
> or mark a country as female.
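As a rough illustration of what such constraint checks do: the real
constraints are modeled as statements on the properties themselves, so
these hard-coded checks are toy stand-ins for the two examples above:

```python
# Toy property constraints: a range constraint on population, and a
# "not applicable to this kind of item" constraint on sex/gender for
# countries. Returns the list of violated properties.

CONSTRAINTS = {
    "population": lambda v: isinstance(v, int) and 0 <= v <= 8_000_000_000,
    "sex or gender": lambda v: False,  # never valid on a country item
}

def violations(item):
    out = []
    for prop, value in item.items():
        check = CONSTRAINTS.get(prop)
        if check and not check(value):
            out.append(prop)
    return out

germany = {"population": 82_000_000}
odd = {"population": -5, "sex or gender": "female"}
print(violations(germany))  # []
print(violations(odd))      # ['population', 'sex or gender']
```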
> There is a project underway to also validate against external sources; see:
> Wikidata still tends to deal with relatively small amounts of data; a
> highly annotated item like Germany (Q183), for example, comes in at
> under 1MB in uncompressed JSON form. Time series data like GDP is
> often included only for a single point in time, or for a subset of the
> available data. The relatively new "Data:" namespace on Commons exists
> to store raw datasets; this is only used to a very limited extent so
> far, but there are some examples of how such data can be visualized,
> e.g.:
> Giving volunteers more powerful tools to select and visualize data
> while automating much of the effort of maintaining data integrity
> seems like an achievable and strategic goal, and as these examples
> show, some building blocks for this are already in place.
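For anyone wanting to check figures like the "under 1MB" claim above:
item JSON is served at a stable URL pattern (shown below for Q183), and
the measurement is just the length of the serialized bytes. The network
fetch is omitted here, so the function is demonstrated on a stub rather
than the real entity:

```python
# Measure the uncompressed JSON size of a Wikidata entity document.
import json

ENTITY_URL = "https://www.wikidata.org/wiki/Special:EntityData/Q183.json"

def uncompressed_size(entity: dict) -> int:
    """Size in bytes of the entity serialized as JSON."""
    return len(json.dumps(entity).encode("utf-8"))

stub = {"entities": {"Q183": {"labels": {"en": {"value": "Germany"}}}}}
print(uncompressed_size(stub))  # byte count of the serialized stub
```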
>>> But the proprietary knowledge graphs are valuable to users in ways
>>> that the previous generation of search engines was not. Interacting
>>> with a device like you would with a human being ("Alexa/Google/Siri,
>>> is yarrow edible?") makes knowledge more accessible and usable,
>>> including to people who have difficulty reading long texts, or who are
>>> not literate at all. In this sense I don't think WMF should ever find
>>> itself in the position to argue _against_ inclusion of information
>>> from Wikimedia projects in these applications.
>> There is a distinct likelihood that they will make reading Wikipedia
>> articles progressively obsolete, just as the availability of Google
>> has dissuaded many people from sitting down and reading a book.
> There is an important distinction between "lookup" and "learning"; the
> former is a transactional activity ("Is this country part of the Euro
> zone?") and the latter an immersive one ("How did the EU come
> about?"). Where we now get instant answers from home assistants or
> search engines, we may have previously skimmed, or performed our own
> highly optimized search in the local knowledge repository called a
> "bookshelf".
> In other words, even if some instant answers lead to a drop in
> Wikipedia views, it would be unreasonable to assume that those views
> were "reads" rather than "skims". When you're on a purely
> transactional journey, you appreciate almost anything that shortens
> it.
> I don't think Wikimedia should fight the gravity of a user's
> intentions out of its own pedagogical motives. Rather, it should make
> both lookup and learning as appealing as possible. Doing well in the
> "lookup" category is important to avoid handing too much control off
> to gatekeepers, and being good in the "learning" category holds the
> greatest promise for lasting positive impact.
> As for the larger social issue, at least in the US, the youngest (most
> googley) generation is the one that reads the most books, and
> income/education are very strong predictors of whether people do or
> not:
>>> The applications themselves are not the problem; the centralized
>>> gatekeeper control is. Knowledge as an open service (and network) is
>>> actually the solution to that root problem. It's how we weaken and
>>> perhaps even break the control of the gatekeepers. Your critique seems
>>> to boil down to "Let's ask Google for more crumbs". In spite of all
>>> your anti-corporate social justice rhetoric, that seems to be the path
>>> to developing a one-sided dependency relationship.
>> I considered that, but in the end felt that given the extent to which
>> Google profited from volunteers' work, it wasn't an unfair ask.
> While I think your proposal to ask Google to share access to resources
> it already has digitized or licensed is worth considering, I would
> suggest being very careful about the long term implications of any
> such agreements. Having a single corporation control volunteers'
> access to proprietary resources means that such access can also be
> used as leverage down the road, or abruptly be taken away for other
> reasons.
> I think it would be more interesting to spin off the existing
> "Wikipedia Library" into its own international organization (or home
> it with an existing one), tasked with giving free knowledge
> contributors (potentially including those of other free knowledge
> projects like OSM) access to proprietary resources, and pursuing public and
> private funding of its own. The development of many relationships may
> take longer, but it is more sustainable in the long run. Moreover, it
> has the potential to lead to powerful collaborations with existing
> public/nonprofit digitization and preservation efforts.
>> Publicise the fact that Google and others profit from volunteer work, and
>> give very little back. The world could do with more articles like this:
> I have plenty of criticisms of Facebook, but the fact that users don't
> get paid for posting selfies isn't one of them. My thoughts on how the
> free culture movement (not limited to Wikipedia) should interface with
> the for-profit sector are as follows, FWIW:
> 1) Demand appropriate levels of taxation on private profits, [2]
> sufficient investments in public education and cultural institutions,
> and "open licensing" requirements on government contracts with private
> corporations.
> 2) Require compliance with free licenses, first gently, then more
> firmly. This is a game of diminishing returns, and it's most useful to
> go after the most blatant and problematic cases. As noted above, "fair
> use" limits should be understood and taken into consideration.
> 3) Encourage corporations to be "good citizens" of the free culture
> world, whether it's through indicating provenance beyond what's
> legally required, or by contributing directly (open source
> development, knowledge/data donations, in-kind goods/services,
> financial contributions). The payoff for them is goodwill and a
> thriving (i.e. also profitable) open Internet that more people in more
> places use for more things.
> 4) Build community-driven, open, nonprofit alternatives to
> out-of-control corporate quasi-monopolies. As far as proprietary
> knowledge graphs are concerned, I will reiterate: open data is the
> solution, not the problem.
> Cheers,
> Erik
> [1] See the getValue function in
> , specifically its
> "onlysourced" parameter. The module also adds a convenient "Edit this
> on Wikidata" link to each claim included from there.
> [2] As far as Wikimedia organizations are concerned, specific tax
> policy will likely always be out of scope of political advocacy, but
> the other points need not be.
> _______________________________________________
> Wikimedia-l mailing list, guidelines at: 
> and 
> New messages to:
> Unsubscribe:, 
> <>
