Well, I have now updated the script to include the XML dump in raw format. I will have to add more information to the archive.org item, at least a basic README. The other thing is that pywikipediabot does not seem to support the full history, so I will have to move over to the WikiTeam version and rework it. I just spent 2 hours on this, so I am pretty happy with the first version.
mike

On Tue, May 29, 2012 at 1:52 AM, Hydriz Wikipedia <[email protected]> wrote:

> This is quite nice, though the item's metadata is too little :)
>
> On Tue, May 29, 2012 at 3:40 AM, Mike Dupont <[email protected]> wrote:
>
>> The first version of the script is ready. It gets the versions, puts them
>> in a zip and puts that on archive.org:
>> https://github.com/h4ck3rm1k3/pywikipediabot/blob/master/export_deleted.py
>>
>> Here is an example output:
>> http://archive.org/details/wikipedia-delete-2012-05
>>
>> http://ia601203.us.archive.org/24/items/wikipedia-delete-2012-05/archive2012-05-28T21:34:02.302183.zip
>>
>> I will cron this, and it should give a start on saving deleted data.
>> Articles will be exported once a day, even if they were exported
>> yesterday, as long as they are in one of the categories.
>>
>> mike
>>
>> On Mon, May 21, 2012 at 7:21 PM, Mike Dupont
>> <[email protected]> wrote:
>> > Thanks! And run that 1 time per day; they don't get deleted that quickly.
>> > mike
>> >
>> > On Mon, May 21, 2012 at 9:11 PM, emijrp <[email protected]> wrote:
>> >> Create a script that makes a request to Special:Export using this
>> >> category as feed:
>> >> https://en.wikipedia.org/wiki/Category:Candidates_for_speedy_deletion
>> >>
>> >> More info:
>> >> https://www.mediawiki.org/wiki/Manual:Parameters_to_Special:Export
>> >>
>> >>
>> >> 2012/5/21 Mike Dupont <[email protected]>
>> >>>
>> >>> Well, I would be happy with items like this:
>> >>> http://en.wikipedia.org/wiki/Template:Db-a7
>> >>> Would it be possible to extract them easily?
>> >>> mike
>> >>>
>> >>> On Thu, May 17, 2012 at 2:23 PM, Ariel T. Glenn <[email protected]>
>> >>> wrote:
>> >>> > There are a few other reasons articles get deleted: copyright issues,
>> >>> > personal identifying data, etc. This makes maintaining the sort of
>> >>> > mirror you propose problematic, although a similar mirror is here:
>> >>> > http://deletionpedia.dbatley.com/w/index.php?title=Main_Page
>> >>> >
>> >>> > The dumps contain only data publicly available at the time of the run,
>> >>> > without deleted data.
>> >>> >
>> >>> > The articles aren't permanently deleted, of course. The revision texts
>> >>> > live on in the database, so a query on toolserver, for example, could be
>> >>> > used to get at them, but that would need to be for research purposes.
>> >>> >
>> >>> > Ariel
>> >>> >
>> >>> > On Thu, 17-05-2012 at 13:30 +0200, Mike Dupont wrote:
>> >>> >> Hi,
>> >>> >> I am thinking about how to collect articles deleted based on the "not
>> >>> >> notable" criteria.
>> >>> >> Is there any way we can extract them from the MySQL binlogs? How are
>> >>> >> these mirrors working? I would be interested in setting up a mirror of
>> >>> >> deleted data, at least that which is not spam/vandalism based on tags.
>> >>> >> mike
>> >>> >>
>> >>> >> On Thu, May 17, 2012 at 1:09 PM, Ariel T. Glenn <[email protected]>
>> >>> >> wrote:
>> >>> >> > We now have three mirror sites, yay! The full list is linked to from
>> >>> >> > http://dumps.wikimedia.org/ and is also available at
>> >>> >> >
>> >>> >> > http://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps#Current_Mirrors
>> >>> >> >
>> >>> >> > Summarizing, we have:
>> >>> >> >
>> >>> >> > C3L (Brazil) with the last 5 known good dumps,
>> >>> >> > Masaryk University (Czech Republic) with the last 5 known good dumps,
>> >>> >> > Your.org (USA) with the complete archive of dumps, and
>> >>> >> >
>> >>> >> > for the latest version of uploaded media, Your.org with
>> >>> >> > http/ftp/rsync access.
>> >>> >> >
>> >>> >> > Thanks to Carlos, Kevin and Yenya respectively at the above sites for
>> >>> >> > volunteering space, time and effort to make this happen.
>> >>> >> >
>> >>> >> > As people noticed earlier, a series of media tarballs per-project
>> >>> >> > (excluding commons) is being generated. As soon as the first run of
>> >>> >> > these is complete we'll announce its location and start generating
>> >>> >> > them on a semi-regular basis.
>> >>> >> >
>> >>> >> > As we've been getting the bugs out of the mirroring setup, it is
>> >>> >> > getting easier to add new locations. Know anyone interested? Please
>> >>> >> > let us know; we would love to have them.
>> >>> >> >
>> >>> >> > Ariel
>> >>> >> >
>> >>> >> > _______________________________________________
>> >>> >> > Wikitech-l mailing list
>> >>> >> > [email protected]
>> >>> >> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>> >>> >>
>> >>> >
>> >>> > _______________________________________________
>> >>> > Wikitech-l mailing list
>> >>> > [email protected]
>> >>> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>> >>>
>> >>> --
>> >>> James Michael DuPont
>> >>> Member of Free Libre Open Source Software Kosova http://flossk.org
>> >>> Contributor FOSM, the CC-BY-SA map of the world http://fosm.org
>> >>> Mozilla Rep https://reps.mozilla.org/u/h4ck3rm1k3
>> >>>
>> >>> _______________________________________________
>> >>> Wikitech-l mailing list
>> >>> [email protected]
>> >>> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>> >>
>> >> --
>> >> Emilio J. Rodríguez-Posada. E-mail: emijrp AT gmail DOT com
>> >> Pre-doctoral student at the University of Cádiz (Spain)
>> >> Projects: AVBOT | StatMediaWiki | WikiEvidens | WikiPapers | WikiTeam
>> >> Personal website: https://sites.google.com/site/emijrp/
>> >>
>> >> _______________________________________________
>> >> Xmldatadumps-l mailing list
>> >> [email protected]
>> >> https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
>> >>
>> >
>> > --
>> > James Michael DuPont
>> > Member of Free Libre Open Source Software Kosova http://flossk.org
>> > Contributor FOSM, the CC-BY-SA map of the world http://fosm.org
>> > Mozilla Rep https://reps.mozilla.org/u/h4ck3rm1k3
>>
>> --
>> James Michael DuPont
>> Member of Free Libre Open Source Software Kosova http://flossk.org
>> Contributor FOSM, the CC-BY-SA map of the world http://fosm.org
>> Mozilla Rep https://reps.mozilla.org/u/h4ck3rm1k3
>>
>> _______________________________________________
>> Wikitech-l mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
> --
> Regards,
> Hydriz
>
> We've created the greatest collection of shared knowledge in history. Help
> protect Wikipedia. Donate now: http://donate.wikimedia.org
> _______________________________________________
> Wikitech-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l

--
James Michael DuPont
Member of Free Libre Open Source Software Kosova http://flossk.org
Contributor FOSM, the CC-BY-SA map of the world http://fosm.org
Mozilla Rep https://reps.mozilla.org/u/h4ck3rm1k3

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
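
For anyone following along, the approach discussed in this thread (feed page titles to Special:Export, then zip the resulting XML) can be sketched roughly as below. This is only a minimal sketch and not the actual export_deleted.py: the function names build_export_request and zip_dump are hypothetical, the list of titles is assumed to come from the category by some other means, and the archive file name uses hyphens in the time part as an assumption. The network request itself and the upload to archive.org are left out.

```python
import io
import zipfile
from datetime import datetime, timezone
from urllib.parse import urlencode

# Assumption: the English Wikipedia index.php endpoint is the export target.
EXPORT_ENDPOINT = "https://en.wikipedia.org/w/index.php"


def build_export_request(titles, full_history=False):
    """Return (url, POST body) for a Special:Export request.

    `pages`, `curonly` and `history` are documented Special:Export
    parameters; `titles` is a list of page titles gathered elsewhere.
    """
    params = {
        "title": "Special:Export",
        "action": "submit",
        "pages": "\n".join(titles),  # one title per line
    }
    if full_history:
        params["history"] = "1"   # ask for all revisions
    else:
        params["curonly"] = "1"   # ask for the current revision only
    return EXPORT_ENDPOINT, urlencode(params).encode("utf-8")


def zip_dump(xml_bytes, when=None):
    """Pack the raw XML dump into an in-memory zip; return (name, bytes)."""
    when = when or datetime.now(timezone.utc)
    stem = "archive" + when.strftime("%Y-%m-%dT%H-%M-%S")
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(stem + ".xml", xml_bytes)
    return stem + ".zip", buf.getvalue()


# Sending the request would be something like
# urllib.request.urlopen(url, data=body).read(), which is not run here.
url, body = build_export_request(["Example article", "Another page"])
zip_name, zip_bytes = zip_dump(b"<mediawiki/>")
```

Run once a day from cron, a script along these lines would produce one zip per run, which could then be uploaded to the archive.org item.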
