+1 to Leila.  Really good suggestions re. making the dataset cite-able and
providing an in-depth discussion of how it was produced.  That's a lot of
work, but it could produce a bunch of additional value.

Thanks for working on this, A-team.  I wish I could transport it back to
the past so I could use it to finish my dissertation faster!

On Tue, Feb 11, 2020 at 3:30 PM Giovanni Luca Ciampaglia <[email protected]>
wrote:

> Hi Joseph,
>
> Thanks a lot for creating and sharing such a valuable resource. I went
> through the schema and from what I understand there is no information about
> page-to-page links, correct? Are there any resources that would provide
> such historical data?
>
> Best,
>
> *Giovanni Luca Ciampaglia* ∙ glciampaglia.com
> Assistant Professor
> Computer Science and Engineering
> <https://www.usf.edu/engineering/cse/> ∙ University
> of South Florida <https://www.usf.edu/>
>
> *Due to Florida’s broad open records law, email to or from university
> employees is public record, available to the public and the media upon
> request.*
>
>
> On Mon, Feb 10, 2020 at 11:28 AM Joseph Allemandou <
> [email protected]> wrote:
>
> > Hi Analytics People,
> >
> > The Wikimedia Analytics Team is pleased to announce the release of the
> > most complete dataset we have to date to analyze content and contributors
> > metadata: Mediawiki History [1] [2].
> >
> > Data is in TSV format, usually released monthly around the 3rd of the
> > month, and every new release contains the full history of metadata.
> >
> > The dataset contains an enhanced [3] and historified [4] version of user,
> > page and revision metadata and serves as a base to the Wikistats API on
> > edits, users and pages [5] [6].
> >
> > We hope you will have as much fun playing with the data as we have
> > building it, and we're eager to hear from you [7], whether for issues,
> > ideas or usage of the data.
> >
> > Analytically yours,
> >
> > --
> > Joseph Allemandou (joal) (he / him)
> > Sr Data Engineer
> > Wikimedia Foundation
> >
> > [1] https://dumps.wikimedia.org/other/mediawiki_history/readme.html
> > [2] https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Mediawiki_history_dumps
> > [3] Many pre-computed fields are present in the dataset, from edit-counts
> > by user and page to reverts and reverted information, as well as time
> > between events.
> > [4] Historical usernames and page titles (as well as user groups and
> > blocks), as accurate as possible, are available in addition to current
> > values, and are provided in a denormalized way to every event of the
> > dataset.
> > [5] https://wikitech.wikimedia.org/wiki/Analytics/AQS/Wikistats_2
> > [6] https://wikimedia.org/api/rest_v1/
> > [7] https://phabricator.wikimedia.org/maniphest/task/edit/?title=Mediawiki%20History%20Dumps&projectPHIDs=Analytics-Wikistats,Analytics
> > _______________________________________________
> > Wiki-research-l mailing list
> > [email protected]
> > https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
> >