2010/12/10 James Linden <[email protected]>

> This may or may not be appropriate to this list -- this is where I
> found most of the discussions on the matter, so posting here.
>
> From reading the past couple of weeks of messages, I surmise that
> there isn't a way to get a current data dump (for enwiki), while the
> server is fubar.
>
> I have the 20100312 dump, which seems to be more recent than others
> available from archive.org, Amazon EC2, and others. However, even this
> dump is significantly behind the current article revisions from
> en.wikipedia.org.
>
> I pulled 333 semi-random articles from the live API -- of those, 329
> have significant content changes since the 20100312 dump.
>
> Thus, my question:
>
> What is the current preference/recommendation regarding pulling
> significant quantities of articles (roughly 250k) from the live API,
> until the dumps are available again?
>
> Sidenote 1: I'm in the process of uploading the 20100312 dump to a
> public web location, in case it is helpful to others.
>
>
Thanks


> Sidenote 2: Is there any discussion regarding ensuring current dumps
> are mirrored in the future, say with archive.org?
>
>
http://en.wikipedia.org/wiki/User:Emijrp/Wikipedia_Archive
http://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps


> --------------------------------------
> James Linden
> [email protected]
> --------------------------------------
>
> _______________________________________________
> Wikitech-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>