Re: [Wikimedia-l] Fire Drill Re: Wikimedia sites not easy to archive (Was Re: Knol is closing tomorrow )

2012-05-18 Thread Mike Dupont
There is no 10 GB limit, but that is the recommended bucket size if you want to split up the file, according to my recent discussion with the archive.org team, who have been helping me optimize the storage. My idea is to make smaller blocks that can be fetched quickly and that people fo
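As a rough illustration of the "smaller blocks" idea, here is a minimal Python sketch that splits one large file into numbered parts of at most a given size. The function name `split_file`, the `.partNNNN` naming scheme, and the 1 MiB copy buffer are assumptions made for this example; they are not anything archive.org or the poster prescribes.

```python
import os

def split_file(path, part_size, out_dir):
    """Split the file at `path` into numbered parts of at most `part_size` bytes.

    Returns the list of part paths in order. The ".partNNNN" suffix is a
    hypothetical naming scheme chosen for this sketch.
    """
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    buf_size = min(part_size, 1024 * 1024)  # copy in pieces of up to 1 MiB
    parts = []
    index = 0
    with open(path, "rb") as src:
        while True:
            remaining = part_size
            chunk = src.read(min(buf_size, remaining))
            if not chunk:
                break  # source exhausted, do not create an empty part
            part_path = os.path.join(
                out_dir, f"{os.path.basename(path)}.part{index:04d}"
            )
            with open(part_path, "wb") as dst:
                while chunk:
                    dst.write(chunk)
                    remaining -= len(chunk)
                    if remaining == 0:
                        break  # this part is full
                    chunk = src.read(min(buf_size, remaining))
            parts.append(part_path)
            index += 1
    return parts
```

Calling `split_file("dump.xml.bz2", 10 * 1024**3, "parts")` would produce parts of up to 10 GiB each (the file name here is only a placeholder). Concatenating the parts in order restores the original file.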

Re: [Wikimedia-l] Fire Drill Re: Wikimedia sites not easy to archive (Was Re: Knol is closing tomorrow )

2012-05-18 Thread emijrp
There is no such 10 GB limit: http://archive.org/details/ARCHIVETEAM-YV-6360017-6399947 (a 238 GB example). ArchiveTeam/WikiTeam is uploading some dumps to the Internet Archive; if you want to join the effort, use the mailing list https://groups.google.com/group/wikiteam-discuss to avoid wasting resources.

Re: [Wikimedia-l] Fire Drill Re: Wikimedia sites not easy to archive (Was Re: Knol is closing tomorrow )

2012-05-18 Thread Mike Dupont
Hello people, I have completed my first set in uploading the osm/fosm dataset (350 GB unpacked) to archive.org: http://osmopenlayers.blogspot.de/2012/05/upload-finished.html We can do something similar with Wikipedia. The bucket size of archive.org is 10 GB, so we need to split up the data in a way that

Re: [Wikimedia-l] Fire Drill Re: Wikimedia sites not easy to archive (Was Re: Knol is closing tomorrow )

2012-05-17 Thread Kim Bruning
On Thu, May 17, 2012 at 07:43:09AM -0400, Anthony wrote: > > In fact, I think someone at WMF should contact Amazon and see if > they'll let us conduct the experiment for free, in exchange for us > creating the dump for them to host as a public data set > (http://aws.amazon.com/publicdatasets/).

Re: [Wikimedia-l] Fire Drill Re: Wikimedia sites not easy to archive (Was Re: Knol is closing tomorrow )

2012-05-17 Thread Neil Harris
On 17/05/12 12:49, Anthony wrote: Please have someone at WMF coordinate this so that there aren't multiple requests made. In my opinion, it should preferably be made by a WMF employee. Fill out the form at https://aws-portal.amazon.com/gp/aws/html-forms-controller/aws-dataset-inquiry Tell them

Re: [Wikimedia-l] Fire Drill Re: Wikimedia sites not easy to archive (Was Re: Knol is closing tomorrow )

2012-05-17 Thread Anthony
On Thu, May 17, 2012 at 8:11 AM, Thomas Dalton wrote: > On 17 May 2012 12:43, Anthony wrote: >> In fact, I think someone at WMF should contact Amazon and see if >> they'll let us conduct the experiment for free, in exchange for us >> creating the dump for them to host as a public data set >> (htt

Re: [Wikimedia-l] Fire Drill Re: Wikimedia sites not easy to archive (Was Re: Knol is closing tomorrow )

2012-05-17 Thread Thomas Dalton
On 17 May 2012 12:43, Anthony wrote: > In fact, I think someone at WMF should contact Amazon and see if > they'll let us conduct the experiment for free, in exchange for us > creating the dump for them to host as a public data set > (http://aws.amazon.com/publicdatasets/). What dump are you going

Re: [Wikimedia-l] Fire Drill Re: Wikimedia sites not easy to archive (Was Re: Knol is closing tomorrow )

2012-05-17 Thread Anthony
Please have someone at WMF coordinate this so that there aren't multiple requests made. In my opinion, it should preferably be made by a WMF employee. Fill out the form at https://aws-portal.amazon.com/gp/aws/html-forms-controller/aws-dataset-inquiry Tell them you want to create a public data se

Re: [Wikimedia-l] Fire Drill Re: Wikimedia sites not easy to archive (Was Re: Knol is closing tomorrow )

2012-05-17 Thread Anthony
On Thu, May 17, 2012 at 7:27 AM, J Alexandr Ledbury-Romanov wrote: > I'd like to point out that the increasingly technical nature of this > conversation probably belongs either on wikitech-l, or off-list, and that > the strident nature of the comments is fast approaching inappropriate. Really? I

Re: [Wikimedia-l] Fire Drill Re: Wikimedia sites not easy to archive (Was Re: Knol is closing tomorrow )

2012-05-17 Thread J Alexandr Ledbury-Romanov
I'd like to point out that the increasingly technical nature of this conversation probably belongs either on wikitech-l, or off-list, and that the strident nature of the comments is fast approaching inappropriate. Alex Wikimedia-l list administrator 2012/5/17 Anthony > On Thu, May 17, 2012 at

Re: [Wikimedia-l] Fire Drill Re: Wikimedia sites not easy to archive (Was Re: Knol is closing tomorrow )

2012-05-17 Thread Anthony
On Thu, May 17, 2012 at 2:06 AM, John wrote: > On Thu, May 17, 2012 at 1:52 AM, Anthony wrote: >> On Thu, May 17, 2012 at 1:22 AM, John wrote: >> > Anthony the process is linear, you have a php inserting X number of rows >> > per >> > Y time frame. >> >> Amazing.  I need to switch all my databas

Re: [Wikimedia-l] Fire Drill Re: Wikimedia sites not easy to archive (Was Re: Knol is closing tomorrow )

2012-05-16 Thread Mike Dupont
On Thu, May 17, 2012 at 6:06 AM, John wrote: > If you're willing to foot the bill for the new hardware > I'll gladly prove my point Given the millions of dollars that Wikipedia has, it should not be a problem to provide such resources for a good cause like that. -- James Michael DuPont Member of F

Re: [Wikimedia-l] Fire Drill Re: Wikimedia sites not easy to archive (Was Re: Knol is closing tomorrow )

2012-05-16 Thread John
On Thu, May 17, 2012 at 1:52 AM, Anthony wrote: > On Thu, May 17, 2012 at 1:22 AM, John wrote: > > Anthony the process is linear, you have a php inserting X number of rows > per > > Y time frame. > > Amazing. I need to switch all my databases to MySQL. It can insert X > rows per Y time frame,

Re: [Wikimedia-l] Fire Drill Re: Wikimedia sites not easy to archive (Was Re: Knol is closing tomorrow )

2012-05-16 Thread Anthony
On Thu, May 17, 2012 at 1:22 AM, John wrote: > Anthony the process is linear, you have a php inserting X number of rows per > Y time frame. Amazing. I need to switch all my databases to MySQL. It can insert X rows per Y time frame, regardless of whether the database is 20 gigabytes or 20 teraby

Re: [Wikimedia-l] Fire Drill Re: Wikimedia sites not easy to archive (Was Re: Knol is closing tomorrow )

2012-05-16 Thread Mike Dupont
Well, to be honest, I am still upset about how much data is deleted from Wikipedia because it is not "notable". There are so many articles that I might be interested in that are lost in the same garbage as spam and other things. We should make non-notable articles and non-harmful ones available in t

Re: [Wikimedia-l] Fire Drill Re: Wikimedia sites not easy to archive (Was Re: Knol is closing tomorrow )

2012-05-16 Thread John
Anthony, the process is linear: you have a PHP script inserting X rows per Y time frame. Yes, rebuilding the externallinks, links, and langlinks tables will take some additional time and won't scale. However, I have been working with the toolserver since 2007 and I've lost count of the number of time
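The linear model claimed above amounts to one division. The sketch below only illustrates that arithmetic. The function name and the numbers in the usage note are hypothetical, not measurements from any wiki. It also assumes a constant insert rate, which is exactly the assumption Anthony disputes elsewhere in the thread and which the poster concedes does not hold for rebuilding the link tables.

```python
def linear_import_hours(total_rows, rows_per_second):
    """Estimate import time in hours under a constant-rate (linear) model.

    Both arguments are hypothetical inputs. The model ignores any slowdown
    as the database grows, such as index maintenance or link-table rebuilds.
    """
    if rows_per_second <= 0:
        raise ValueError("rows_per_second must be positive")
    return total_rows / rows_per_second / 3600
```

For example, with made-up figures, `linear_import_hours(3_600_000, 1000)` gives 1.0 hour. If throughput falls as the tables grow, the real time would be longer than this estimate.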

Re: [Wikimedia-l] Fire Drill Re: Wikimedia sites not easy to archive (Was Re: Knol is closing tomorrow )

2012-05-16 Thread Anthony
On Thu, May 17, 2012 at 12:45 AM, John wrote: > Simple.wikipedia is nothing like en.wikipedia I care to dispute that > statement, All WMF wikis are setup basically the same (an odd extension here > or there is different, and different namespace names at times) but for the > purpose of recovery sim

Re: [Wikimedia-l] Fire Drill Re: Wikimedia sites not easy to archive (Was Re: Knol is closing tomorrow )

2012-05-16 Thread John
*Simple.wikipedia is nothing like en.wikipedia* I care to dispute that statement. All WMF wikis are set up basically the same (an odd extension here or there is different, and namespace names differ at times), but for the purpose of recovery simplewiki_p is a very standard example. This issue isn't

Re: [Wikimedia-l] Fire Drill Re: Wikimedia sites not easy to archive (Was Re: Knol is closing tomorrow )

2012-05-16 Thread Anthony
On Thu, May 17, 2012 at 12:30 AM, John wrote: > Ill run a quick benchmark and import the full history of simple.wikipedia to > my laptop wiki on a stick, and give an exact duration Simple.wikipedia is nothing like en.wikipedia. For one thing, there's no need to turn on $wgCompressRevisions with

Re: [Wikimedia-l] Fire Drill Re: Wikimedia sites not easy to archive (Was Re: Knol is closing tomorrow )

2012-05-16 Thread John
I'll run a quick benchmark, importing the full history of simple.wikipedia to my laptop wiki on a stick, and give an exact duration. On Thu, May 17, 2012 at 12:26 AM, John wrote: > Toolserver is a clone of the WMF servers minus files. They run a database > replication of all wikis. These times ar

Re: [Wikimedia-l] Fire Drill Re: Wikimedia sites not easy to archive (Was Re: Knol is closing tomorrow )

2012-05-16 Thread John
Toolserver is a clone of the WMF servers minus files. They run a database replication of all wikis. These times depend on available hardware and may vary, but should provide a decent estimate. On Thu, May 17, 2012 at 12:23 AM, Anthony wrote: > On Thu, May 17, 2012 at 12:18 AM, John wrote

Re: [Wikimedia-l] Fire Drill Re: Wikimedia sites not easy to archive (Was Re: Knol is closing tomorrow )

2012-05-16 Thread Anthony
On Thu, May 17, 2012 at 12:18 AM, John wrote: > take a look at http://www.mediawiki.org/wiki/Manual:Importing_XML_dumps for > exactly how to import an existing dump, I know the process of re-importing > a cluster for the toolserver is normally just a few days when they have the > needed dumps. To

Re: [Wikimedia-l] Fire Drill Re: Wikimedia sites not easy to archive (Was Re: Knol is closing tomorrow )

2012-05-16 Thread Anthony
On Thu, May 17, 2012 at 12:13 AM, John wrote: > that two week estimate was given worst case scenario. Given the best case > we are talking as little as a few hours for the smaller wikis to 5 days or > so for a project the size of enwiki. (see > http://lists.wikimedia.org/pipermail/xmldatadumps-l/2

Re: [Wikimedia-l] Fire Drill Re: Wikimedia sites not easy to archive (Was Re: Knol is closing tomorrow )

2012-05-16 Thread John
Take a look at http://www.mediawiki.org/wiki/Manual:Importing_XML_dumps for exactly how to import an existing dump. I know the process of re-importing a cluster for the toolserver is normally just a few days when they have the needed dumps. On Thu, May 17, 2012 at 12:13 AM, John wrote: > that

Re: [Wikimedia-l] Fire Drill Re: Wikimedia sites not easy to archive (Was Re: Knol is closing tomorrow )

2012-05-16 Thread John
That two-week estimate was a worst-case scenario. In the best case we are talking as little as a few hours for the smaller wikis to 5 days or so for a project the size of enwiki. (See http://lists.wikimedia.org/pipermail/xmldatadumps-l/2012-May/000491.html for progress on image dumps.) On We

Re: [Wikimedia-l] Fire Drill Re: Wikimedia sites not easy to archive (Was Re: Knol is closing tomorrow )

2012-05-16 Thread Kim Bruning
On Thu, May 17, 2012 at 12:03:02AM -0400, John wrote: > Except for files, getting a content clone up is relatively easy, and can be > done in a fairly quick order (aka less than two weeks for everything). I > know there is talk about getting a rsync setup for images. Ouch, 2 weeks. We need the ima

Re: [Wikimedia-l] Fire Drill Re: Wikimedia sites not easy to archive (Was Re: Knol is closing tomorrow )

2012-05-16 Thread John
Except for files, getting a content clone up is relatively easy, and can be done in fairly short order (i.e., less than two weeks for everything). I know there is talk about getting an rsync setup for images.

[Wikimedia-l] Fire Drill Re: Wikimedia sites not easy to archive (Was Re: Knol is closing tomorrow )

2012-05-16 Thread Kim Bruning
On Wed, May 16, 2012 at 11:11:04PM -0400, John wrote: > I know from experience that a wiki can be re-built from any one of the > dumps that are provided, (pages-meta-current) for example contains > everything needed to reboot a site except its user database > (names/passwords etc.). See > http://www