If we are running dedicated services for free on behalf of a major search
engine as part of our symbiotic relationship with them, then kudos to Lila
for putting it on the trustees' agenda and getting some discussion going.

I can understand how we get into a situation where a known major search
engine is allowed a level of web crawling that would be treated as a denial
of service attack if it came from elsewhere.

Echoing Andreas and Denny I can see the case for asking for some
contribution to cost recovery when we do something extra for a major reuser
of our data. But I would prefer this to be couched as part of a wider
strategic dialogue with those entities.

My particular concern is with attack pages. If we are providing the
service that crawls all edits, including new pages, then I think we can do
what has in the past been dismissed as impossible or outside our control:
shift the new page process to one where unpatrolled pages are not crawled
by search engine bots until after someone has patrolled them, and treat
"flagged for deletion" as a third status in addition to patrolled and
unpatrolled. If we do this, then when someone creates an article about
their high school prom queen and her unorthodox method for getting good
grades from male teachers, we should be able to delete it without it being
mirrored for hours by search engines.
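To make the idea concrete, here is a minimal sketch of what "unpatrolled
pages are not crawled" could mean in practice: serving a robots directive
that varies with patrol status. The status names and helper function are
hypothetical illustrations, not MediaWiki's actual configuration or API.

```python
# Sketch only: hypothetical patrol statuses, not MediaWiki's real ones.
PATROLLED = "patrolled"
UNPATROLLED = "unpatrolled"
FLAGGED_FOR_DELETION = "flagged-for-deletion"  # the proposed third status


def robots_header(status: str) -> str:
    """Return an X-Robots-Tag value a server could attach to a page.

    Only pages a human has patrolled are marked indexable; unpatrolled
    or flagged pages tell well-behaved crawlers to stay away, so an
    attack page deleted within hours never reaches a search index.
    """
    if status == PATROLLED:
        return "index, follow"
    return "noindex, nofollow"


print(robots_header(UNPATROLLED))  # a new, unreviewed page stays out of indexes
```

The point of the sketch is that the gatekeeping is cheap: one conditional
header, evaluated per request, with indexing as the exception that patrol
grants rather than the default.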

Others might want the dialogue to be more about how much content can be
shown in an uneditable, unattributed way by being treated as simply
extracted facts and thereby public domain.

I'm keen that the WMF board has oversight of these arrangements. I
appreciate that some data about crawl frequencies and algorithms will be
confidential to the commercial entities involved, so I could understand if
some discussions or briefing papers to the board were confidential.

What I don't want is for cost recovery to be the first item on the agenda
when we talk about these relationships. Less mirroring of vandalism and
attack pages, better compliance with CC-BY-SA and other licenses, and more
opportunities for readers to edit are more important to me, and, considering
our current financial health, should be to us all.

This does of course bring us back to the discussion about conflicts of
interest and the need for staff and trustees to recuse, not just when their
employer's crawler is being discussed, but also when making decisions about
entities in which they own any shares. I think we should also add when the
trustees are discussing their employer's direct competitors. It might also
help if more of the trustees had the detachment and neutrality of, say, a
Canadian medic, as opposed to a Silicon Valley insider whose future
employers could easily be other tech giants.


> Date: Sat, 16 Jan 2016 18:11:51 -0800
> From: Denny Vrandecic <dvrande...@wikimedia.org>
> To: Wikimedia Mailing List <wikimedia-l@lists.wikimedia.org>
> Subject: Re: [Wikimedia-l] Monetizing Wikimedia APIs
> I find it rather surprising, but I very much find myself in agreement with
> most of what Andreas Kolbe said on this thread.
>
> To give a bit more thought: I am not terribly worried about current
> crawlers. But currently, and more so in the future, I expect us to provide
> more complex and thus expensive APIs: a SPARQL endpoint, parsing APIs, etc.
> These will simply be expensive to operate. Not for infrequent users - say,
> to the benefit of us 70,000 editors - but for use cases that involve tens
> of millions of requests per day. These have the potential of burning a lot
> of funds to basically support the operations of commercial companies whose
> mission might or might not be aligned with ours.
>
> Is monetizing such use cases really entirely unthinkable? Even under
> restrictions like the ones suggested by Andreas, or other such restrictions
> we should discuss?
> On Jan 16, 2016 3:49 PM, "Risker" <risker...@gmail.com> wrote:
> > Hmm.  The majority of those crawlers are from search engines - the very
> > search engines that keep us in the top 10 of their results (and often in
> > the top 3), thus leading to the usage and donations that we need to
> > survive. If they have to pay, then they might prefer to change their
> > algorithm, or reduce the frequency of scraping (thus also failing to
> > catch updates to articles, including removal of vandalism in the lead
> > paragraphs, which is historically one of the key reasons for frequently
> > crawling the same articles).  Those crawlers are what attracts people to
> > our sites, to read, to make donations, to possibly edit.  Of course there
> > are lesser crawlers, but they're not really big players.
> >
> > I'm at a loss to understand why the Wikimedia Foundation should take on
> > the costs and indemnities associated with hiring staff to create a
> > for-pay API that would have to meet the expectations of a customer (or
> > more than one customer) that hasn't even agreed to pay for access.  If
> > they want a specialized API (and we've been given no evidence that they
> > do), let THEM hire the staff, pay them, write the code in an
> > appropriately open-source way, and donate it to the WMF with the
> > understanding that it could be modified as required, and that it will be
> > accessible to everyone.
> >
> > It is good that the WMF has studied the usage patterns.  Could a link be
> > given to the report, please?  It's public, correct?  This is exactly the
> > point of transparency.  If only the WMF has the information, then it
> > gives an excuse for the community's comments to be ignored "because they
> > don't know the facts".  So let's lay out all the facts on the table,
> > please.
> >
> > Risker/Anne
> >