2010/12/10 James Linden <[email protected]>
> This may or may not be appropriate to this list -- this is where I
> found most of the discussions on the matter, so posting here.
>
> From reading the past couple of weeks of messages, I surmise that
> there isn't a way to get a current data dump (for enwiki) while the
> server is fubar.
>
> I have the 20100312 dump, which seems to be more recent than others
> available from archive.org, Amazon EC2, and elsewhere. However, even
> this dump is significantly behind the current article revisions on
> en.wikipedia.org.
>
> I pulled 333 semi-random articles from the live API -- of those, 329
> have significant content changes since the 20100312 dump.
>
> Thus, my question:
>
> What is the current preference/recommendation regarding pulling
> significant quantities of articles (250k-ish) from the live API until
> the dumps are available again?
>
> Sidenote 1: I'm in the process of uploading the 20100312 dump to a
> public web location, in case it is helpful to others.
>
> Thanks
> Sidenote 2: Is there any discussion regarding ensuring current dumps
> are mirrored in the future, say with archive.org?
>
> http://en.wikipedia.org/wiki/User:Emijrp/Wikipedia_Archive
> http://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps
>
> --------------------------------------
> James Linden
> [email protected]
> --------------------------------------
>
> _______________________________________________
> Wikitech-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
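For the "pull 250k-ish articles from the live API" question above, a minimal sketch of the usual approach: request current revisions in batches of up to 50 titles per query (the documented api.php limit for non-bot accounts) and pass `maxlag` so the server can refuse requests while replication lag is high. The batch size, `maxlag=5`, and the title list here are illustrative assumptions, not a recommendation from the list.

```python
# Sketch: batched "current revision" queries against the live MediaWiki API.
# Assumptions: 50-title batches (api.php limit for normal accounts) and
# maxlag=5 so requests back off when the database replicas are lagged.
from urllib.parse import urlencode

API = "http://en.wikipedia.org/w/api.php"
BATCH = 50  # titles per query for non-bot accounts

def batched(titles, size=BATCH):
    """Yield successive fixed-size batches from a list of page titles."""
    for i in range(0, len(titles), size):
        yield titles[i:i + size]

def build_query(batch, maxlag=5):
    """Build one api.php URL fetching the current text of each title."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content|timestamp",
        "titles": "|".join(batch),   # multiple titles, pipe-separated
        "maxlag": maxlag,            # politeness: server 503s when lagged
        "format": "json",
    }
    return API + "?" + urlencode(params)

# Example: two batches of two titles each (hypothetical page names).
urls = [build_query(b) for b in batched(["Foo", "Bar", "Baz", "Qux"], size=2)]
```

Each URL can then be fetched with any HTTP client, sleeping and retrying when the API returns a maxlag error rather than hammering the already-struggling servers.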
