+1 to Leila. Really good suggestions re. making the dataset cite-able and providing an in-depth discussion of how it was produced. That's a lot of work, but it could produce a bunch of additional value.
Thanks for working on this, A-team. I wish I could transport it back to the past so I could use it to finish my dissertation faster!

On Tue, Feb 11, 2020 at 3:30 PM Giovanni Luca Ciampaglia <[email protected]> wrote:

> Hi Joseph,
>
> Thanks a lot for creating and sharing such a valuable resource. I went
> through the schema and from what I understand there is no information
> about page-to-page links, correct? Are there any resources that would
> provide such historical data?
>
> Best,
>
> *Giovanni Luca Ciampaglia* ∙ glciampaglia.com
> Assistant Professor
> Computer Science and Engineering <https://www.usf.edu/engineering/cse/>
> ∙ University of South Florida <https://www.usf.edu/>
>
> *Due to Florida’s broad open records law, email to or from university
> employees is public record, available to the public and the media upon
> request.*
>
> On Mon, Feb 10, 2020 at 11:28 AM Joseph Allemandou <[email protected]> wrote:
>
> > Hi Analytics People,
> >
> > The Wikimedia Analytics Team is pleased to announce the release of the
> > most complete dataset we have to date to analyze content and
> > contributors metadata: Mediawiki History [1] [2].
> >
> > Data is in TSV format, released monthly around the 3rd of the month
> > usually, and every new release contains the full history of metadata.
> >
> > The dataset contains an enhanced [3] and historified [4] version of
> > user, page and revision metadata and serves as a base to Wikistats API
> > on edits, users and pages [5] [6].
> >
> > We hope you will have as much fun playing with the data as we have
> > building it, and we're eager to hear from you [7], whether for issues,
> > ideas or usage of the data.
> >
> > Analytically yours,
> >
> > --
> > Joseph Allemandou (joal) (he / him)
> > Sr Data Engineer
> > Wikimedia Foundation
> >
> > [1] https://dumps.wikimedia.org/other/mediawiki_history/readme.html
> > [2] https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Mediawiki_history_dumps
> > [3] Many pre-computed fields are present in the dataset, from
> > edit-counts by user and page to reverts and reverted information, as
> > well as time between events.
> > [4] As accurate as possible historical usernames and page-titles (as
> > well as user-groups and blocks) is available in addition to current
> > values, and are provided in a denormalized way to every event of the
> > dataset.
> > [5] https://wikitech.wikimedia.org/wiki/Analytics/AQS/Wikistats_2
> > [6] https://wikimedia.org/api/rest_v1/
> > [7] https://phabricator.wikimedia.org/maniphest/task/edit/?title=Mediawiki%20History%20Dumps&projectPHIDs=Analytics-Wikistats,Analytics
> > _______________________________________________
> > Wiki-research-l mailing list
> > [email protected]
> > https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
