The first version of the script is ready. It fetches the page revisions, packs
them into a zip file, and uploads that to archive.org:
https://github.com/h4ck3rm1k3/pywikipediabot/blob/master/export_deleted.py

Here is an example output:
http://archive.org/details/wikipedia-delete-2012-05
http://ia601203.us.archive.org/24/items/wikipedia-delete-2012-05/archive2012-05-28T21:34:02.302183.zip

I will run this from cron, which should be a start at saving deleted data.
Articles will be exported once a day, even if they were exported
the day before, as long as they are still in one of the categories.
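
For anyone who wants to try the same idea without pywikipediabot, here is a
minimal sketch of the approach described above. It is not the linked
export_deleted.py. It assumes the standard API query list=categorymembers to
list the category, then a POST to Special:Export with the "pages" and
"history" fields. The user agent string, the "export.xml" member name and all
function names are made up for illustration. Category paging (cmcontinue) and
the upload to archive.org are left out, and a wiki may cap or disable
full-history export.

```python
import datetime
import io
import json
import urllib.parse
import urllib.request
import zipfile

API_URL = "https://en.wikipedia.org/w/api.php"
EXPORT_URL = "https://en.wikipedia.org/wiki/Special:Export"
CATEGORY = "Category:Candidates_for_speedy_deletion"
# Hypothetical user agent; replace with something that identifies you.
USER_AGENT = "deleted-export-sketch/0.1 (illustration only)"


def build_category_query(category, limit=500):
    """Parameters for the API query that lists the members of a category."""
    return {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmlimit": str(limit),
        "format": "json",
    }


def build_export_params(titles, full_history=True):
    """Form fields for Special:Export: one page title per line in 'pages'."""
    params = {"pages": "\n".join(titles)}
    # 'history' asks for every revision, 'curonly' for the latest one only.
    params["history" if full_history else "curonly"] = "1"
    return params


def archive_name(now):
    """Zip file name in the same style as the example output above."""
    return "archive" + now.isoformat() + ".zip"


def zip_export(xml_bytes, now):
    """Pack the exported XML into an in-memory zip; return (name, bytes)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("export.xml", xml_bytes)
    return archive_name(now), buf.getvalue()


def fetch(url, params):
    """POST the parameters to the given URL and return the raw response body."""
    data = urllib.parse.urlencode(params).encode("utf-8")
    req = urllib.request.Request(url, data=data,
                                 headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def export_category(category=CATEGORY):
    """List the category, export its pages, and return (zip name, zip bytes)."""
    members = json.loads(fetch(API_URL, build_category_query(category)))
    titles = [m["title"] for m in members["query"]["categorymembers"]]
    xml_bytes = fetch(EXPORT_URL, build_export_params(titles))
    return zip_export(xml_bytes, datetime.datetime.now())
```

Calling export_category() once a day from cron and writing the returned bytes
to the returned file name would give one dated zip per run; nothing touches
the network until that function is called.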

mike

On Mon, May 21, 2012 at 7:21 PM, Mike  Dupont
<[email protected]> wrote:
> Thanks! And run that once per day; they don't get deleted that quickly.
> mike
>
> On Mon, May 21, 2012 at 9:11 PM, emijrp <[email protected]> wrote:
>> Create a script that makes a request to Special:Export using this category
>> as feed
>> https://en.wikipedia.org/wiki/Category:Candidates_for_speedy_deletion
>>
>> More info https://www.mediawiki.org/wiki/Manual:Parameters_to_Special:Export
>>
>>
>> 2012/5/21 Mike Dupont <[email protected]>
>>>
>>> Well, I would be happy with items like this:
>>> http://en.wikipedia.org/wiki/Template:Db-a7
>>> Would it be possible to extract them easily?
>>> mike
>>>
>>> On Thu, May 17, 2012 at 2:23 PM, Ariel T. Glenn <[email protected]>
>>> wrote:
>>> > There are a few other reasons articles get deleted: copyright issues,
>>> > personal identifying data, etc.  This makes maintaining the sort of
>>> > mirror you propose problematic, although a similar mirror is here:
>>> > http://deletionpedia.dbatley.com/w/index.php?title=Main_Page
>>> >
>>> > The dumps contain only data publicly available at the time of the run,
>>> > without deleted data.
>>> >
>>> > The articles aren't permanently deleted, of course.  The revision texts
>>> > live on in the database, so a query on toolserver, for example, could be
>>> > used to get at them, but that would need to be for research purposes.
>>> >
>>> > Ariel
>>> >
>>> > On Thu, 17-05-2012, at 13:30 +0200, Mike Dupont wrote:
>>> >> Hi,
>>> >> I am thinking about how to collect articles deleted based on the "not
>>> >> notable" criteria.
>>> >> Is there any way we can extract them from the MySQL binlogs? How are
>>> >> these mirrors working? I would be interested in setting up a mirror of
>>> >> deleted data, at least that which is not spam/vandalism based on tags.
>>> >> mike
>>> >>
>>> >> On Thu, May 17, 2012 at 1:09 PM, Ariel T. Glenn <[email protected]>
>>> >> wrote:
>>> >> > We now have three mirror sites, yay!  The full list is linked to from
>>> >> > http://dumps.wikimedia.org/ and is also available at
>>> >> >
>>> >> > http://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps#Current_Mirrors
>>> >> >
>>> >> > Summarizing, we have:
>>> >> >
>>> >> > C3L (Brazil) with the last 5 known good dumps,
>>> >> > Masaryk University (Czech Republic) with the last 5 known good dumps,
>>> >> > Your.org (USA) with the complete archive of dumps, and
>>> >> >
>>> >> > for the latest version of uploaded media, Your.org with
>>> >> > http/ftp/rsync
>>> >> > access.
>>> >> >
>>> >> > Thanks to Carlos, Kevin and Yenya respectively at the above sites for
>>> >> > volunteering space, time and effort to make this happen.
>>> >> >
>>> >> > As people noticed earlier, a series of media tarballs per-project
>>> >> > (excluding commons) is being generated.  As soon as the first run of
>>> >> > these is complete we'll announce its location and start generating
>>> >> > them
>>> >> > on a semi-regular basis.
>>> >> >
>>> >> > As we've been getting the bugs out of the mirroring setup, it is
>>> >> > getting
>>> >> > easier to add new locations.  Know anyone interested?  Please let us
>>> >> > know; we would love to have them.
>>> >> >
>>> >> > Ariel
>>> >> >
>>> >> >
>>> >> > _______________________________________________
>>> >> > Wikitech-l mailing list
>>> >> > [email protected]
>>> >> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>>> >>
>>> >>
>>> >>
>>> >
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> James Michael DuPont
>>> Member of Free Libre Open Source Software Kosova http://flossk.org
>>> Contributor FOSM, the CC-BY-SA map of the world http://fosm.org
>>> Mozilla Rep https://reps.mozilla.org/u/h4ck3rm1k3
>>>
>>
>>
>>
>>
>> --
>> Emilio J. Rodríguez-Posada. E-mail: emijrp AT gmail DOT com
>> Pre-doctoral student at the University of Cádiz (Spain)
>> Projects: AVBOT | StatMediaWiki | WikiEvidens | WikiPapers | WikiTeam
>> Personal website: https://sites.google.com/site/emijrp/
>>
>>
>> _______________________________________________
>> Xmldatadumps-l mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
>>
>
>
>




