On Tue, Nov 8, 2016 at 9:10 AM, James Salsman <jsals...@gmail.com> wrote:

> I assumed that when an affiliated researcher apart from Foundation
> staff says, "we have the complete server logs for Wikipedia,"
> amounting to 17 terabytes per month, that means they possess the
> information. I am glad to be wrong about that, but I object to the
> implication that such an assumption based on the plain language of
> the statement could possibly be made in bad faith.

I am glad we cleared up that confusion.

> > the terms of our formal collaborations
> > https://www.mediawiki.org/wiki/Wikimedia_Research/Formal_collaborations
> > prohibit the sharing of any raw data containing PII (such as
> > webrequest logs) outside of WMF operated servers,
> There is nothing on that page which suggests that prohibition.

You're correct that that document doesn't describe in detail the data
access process. When we start a formal collaboration under an NDA, we have
an onboarding process that gives researchers restricted access to our
cluster, covers server access responsibilities and best practices around
the handling of private data. I'll check with our Legal and Security teams
whether we can better document this process.

> > as well as the retention of any such data past our data retention
> > period https://meta.wikimedia.org/wiki/Data_retention_guidelines
> That page says, "Information (including personal information)
> collected through participation in a survey or other research
> conducted by the Wikimedia Foundation will be retained indefinitely
> for educational, development, or other related purposes, unless
> otherwise indicated in the privacy policy or statement of such
> survey or research."

This is for surveys requesting explicit (*opt in*) consent to collect and
retain specific types of data (such as demographic information) from
participants, not for data collected by default via our webrequest logs.
Webrequest logs and instrumentation data are purged/sanitized by default
within the 90-day retention window; most often the data sits on our
servers for a much shorter time before it is removed.

> https://meta.wikimedia.org/w/index.php?title=Talk:2016_Strategy/Draft_WMF_Strategy&diff=15467086&oldid=15466763
> says that the Foundation's standard research NDAs include an
> "obligation to return or destroy any copies of confidential
> information the individual may have upon request by WMF"
> Does that not imply that such copies are allowed in general?

IANAL, so I can't comment on that, but I believe this clause is part of
our NDA to prevent confidential information (not specifically PII) from
being retained by third parties past the terms of the NDA.

> I hope we can move forward to a solution to the general problem.
> Is there any legitimate research or any other need to save IP
> addresses associated with HTTP GET web logs to disk prior to
> creating a secure hash of them?

These are considerations that the analytics / ops team is best suited to
answer; I encourage you to relay them to analytics-l if you want to have a
more technical discussion.
