A few more thoughts:

* You probably don't need the full URLs of the content being accessed, so
those could be anonymized and replaced with random identifiers to some
degree, right?

* Someone might be able to monitor the user's end of the transactions, such
as by having university network logs that show destination domains and
timestamps, in such a way that they could pair the university logs with
Wikimedia access traces of one second granularity and thus defeat some
measures of privacy for the university's Wikimedia users, correct?

* I am not sure that the staff time required to analyze this request and
produce the data is a good use of resources on Wikimedia's end. Toby would
be a good person to ask about this.
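
A rough sketch of the first point, in Python. It assumes a trace is a list
of (timestamp, URL) pairs; that layout and the name pseudonymize_trace are
hypothetical illustrations, not the Wikibench format. Each distinct URL is
swapped for a random identifier, and the same URL always gets the same
identifier so that per-page access patterns are preserved:

```python
import secrets


def pseudonymize_trace(records):
    """Replace the URL in each (timestamp, url) record with a random ID.

    Hypothetical helper: the record layout is an assumption for illustration.
    """
    ids = {}  # url -> random identifier; should be discarded after export
    out = []
    for timestamp, url in records:
        if url not in ids:
            # 8 random bytes, rendered as 16 hex characters
            ids[url] = secrets.token_hex(8)
        out.append((timestamp, ids[url]))
    return out


if __name__ == "__main__":
    trace = [
        (1411113600, "http://example.org/wiki/Page_A"),
        (1411113600, "http://example.org/wiki/Page_B"),
        (1411113601, "http://example.org/wiki/Page_A"),
    ]
    for timestamp, page_id in pseudonymize_trace(trace):
        print(timestamp, page_id)
```

This only helps "to some degree": very popular pages could probably still
be guessed from their request counts alone, and the timestamps are left
untouched.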

Pine
 On Sep 20, 2014 12:45 AM, "Pine W" <wiki.p...@gmail.com> wrote:

> Thanks for the explanation. On moderate to high traffic pages, let's say
> with a minimum of 10 hits per minute across the entire time span studied,
> perhaps the requested data could be provided while still providing strong
> privacy protection. Toby might need to discuss this with WMF Legal.
>
> Pine
> On Sep 19, 2014 4:57 AM, "Valerio Schiavoni" <valerio.schiav...@gmail.com>
> wrote:
>
>> Hello everyone,
>> it seems the discussion is sparking an interesting debate, thanks to
>> everyone.
>>
>> To put things back in context, we use Wikipedia as one of the few
>> websites where users can access different 'versions' of the same page.
>> Users mostly read the most recent version of a given page, but from time
>> to time, read accesses to the 'history' of a page happen.
>> New versions of a page are created as well. Finally, users might
>> potentially need to explore several old versions of a given web page, for
>> example by accessing the details of its history[1].
>> Access traces need to be accurate to model the workload on the servers
>> that store the contents being served by the web servers.
>> A resolution coarser than 1 second would not reflect the access patterns
>> on Wikipedia or similarly versioned web sites.
>> We use these access patterns to test different version-aware storage
>> techniques.
>> For those interested, I could send the pre-print version of an article
>> that I will present next month at the IEEE SRDS'14 conference.
>>
>> As for potential privacy concerns about disclosing such traces,
>> I would like to stress that we are not looking into 'who' requested a
>> given URL or from 'where'. That information is completely absent from the
>> Wikibench traces, and can/should remain so in new traces.
>>
>> Let's say Wikipedia somehow reveals the top-10 most-visited pages in the
>> last minute: would that represent a privacy breach for some users? I highly
>> doubt it, and I invite the audience to convince me of the contrary.
>>
>> Best regards,
>> Valerio
>>
>> 1- For example:
>> http://it.wikipedia.org/w/index.php?title=George_W._Bush&action=history
>>
>> On Fri, Sep 19, 2014 at 8:36 AM, Pine W <wiki.p...@gmail.com> wrote:
>>
>>> Let's loop back to the request at hand. Valerio, can you describe your
>>> use case for access traces at intervals shorter than one hour? The very
>>> likely outcome of this discussion is that the access traces at shorter
>>> intervals will not be made available, but I'm curious about what you would
>>> do with the data if you had it.
>>>
>>> Pine
>>> On Sep 18, 2014 4:55 PM, "Richard Jensen" <rjen...@uic.edu> wrote:
>>>
>>>> The basic issue in sampling is to decide what the target population T
>>>> actually is. Then you weight the sample so that each person in the target
>>>> population has an equal chance w, and people not in it have weight zero.
>>>>
>>>> So what is the target population we want to study?
>>>> --the world's population?
>>>> --the world's educated population?
>>>> --everyone with internet access?
>>>> --everyone who ever uses Wikipedia?
>>>> --everyone who uses it a lot?
>>>> --everyone who has knowledge to contribute in a positive fashion?
>>>> --everyone who has the internet, skills and potential to contribute?
>>>> --everyone who has the potential to contribute but does not do so?
>>>>
>>>> Richard Jensen
>>>> rjen...@uic.edu
>>>>
>>>>
>>>> _______________________________________________
>>>> Wiki-research-l mailing list
>>>> Wiki-research-l@lists.wikimedia.org
>>>> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l