Re: [Wikimedia-l] DEITYBOUNCE and reader logs (was Re: Introducing Victoria Coleman, WMF Chief Technology Officer)

Dario Taraborelli Fri, 11 Nov 2016 11:34:06 -0800

If you want to hear about the results of this research collaboration, or
have additional questions about the data collection approach or the
analysis, I invite you to come and join us at our upcoming showcase on
*Wednesday 11/16*.


https://lists.wikimedia.org/pipermail/analytics/2016-November/005504.html

On Tue, Nov 8, 2016 at 10:42 AM, Dario Taraborelli <
[email protected]> wrote:

>
> On Tue, Nov 8, 2016 at 9:10 AM, James Salsman <[email protected]> wrote:
>
>> I assumed that when an affiliated researcher apart from Foundation
>> staff says, "we have the complete server logs for Wikipedia,"
>> amounting to 17 terabytes per month, that means they possess the
>> information. I am glad to be wrong about that, but I object to the
>> implication that such an assumption based on the plain language of
>> the statement could possibly be made in bad faith.
>>
>
> I am glad we cleared that confusion.
>
>
>> > the terms of our formal collaborations
>> > https://www.mediawiki.org/wiki/Wikimedia_Research/Formal_collaborations
>> > prohibit the sharing of any raw data containing PII (such as
>> > webrequest logs) outside of WMF operated servers,
>>
>> There is nothing on that page which suggests that prohibition.
>>
>
> You're correct that that document doesn't describe in detail the data
> access process. When we start a formal collaboration under an NDA, we have
> an onboarding process that gives researchers restricted access to our
> cluster, covers server access responsibilities and best practices around
> the handling of private data. I'll check with our Legal and Security team
> whether we can better document this process.
>
>
>> > as well as the retention of any such data past our data retention
>> > period https://meta.wikimedia.org/wiki/Data_retention_guidelines
>>
>> That page says, "Information (including personal information)
>> collected through participation in a survey or other research
>> conducted by the Wikimedia Foundation will be retained indefinitely
>> for educational, development, or other related purposes, unless
>> otherwise indicated in the privacy policy or statement of such
>> survey or research."
>>
>
> This is for surveys requesting explicit (*opt in*) consent to collect and
> retain specific types of data (such as demographic information) from
> participants, not for data collected by default via our webrequest logs.
> Webrequest logs and instrumentation data are purged/sanitized by default
> within the 90-day retention window; most often the data sits on our
> servers for a much shorter time.
>
>
>> https://meta.wikimedia.org/w/index.php?title=Talk:2016_Strategy/Draft_WMF_Strategy&diff=15467086&oldid=15466763
>> says that the Foundation's standard research NDAs include an
>> "obligation to return or destroy any copies of confidential
>> information the individual may have upon request by WMF"
>>
>> Does that not imply that such copies are allowed in general?
>>
>
> IANAL so I can't comment on that, but I believe this is a clause that's
> part of our NDA to prevent confidential information (not specifically PII)
> from being retained by third parties past the terms of the NDA.
>
>
>> I hope we can move forward to a solution to the general problem.
>>
>> Is there any legitimate research or any other need to save IP
>> addresses associated with HTTP GET web logs to disk prior to
>> creating a secure hash of them?
>>
>
> These are considerations that the analytics / ops team are best suited to
> answer, I encourage you to relay them to analytics-l if you want to have a
> more technical discussion.
>
> HTH,
> Dario
>
>


-- 

*Dario Taraborelli  *Head of Research, Wikimedia Foundation
wikimediafoundation.org • nitens.org • @readermeter
<http://twitter.com/readermeter>
