Re: [Wikimedia-l] Fire Drill Re: Wikimedia sites not easy to archive (Was Re: Knol is closing tomorrow )

2012-05-18 Thread Mike Dupont
Hello people, I have finished uploading my first set of the osm/fosm dataset (350 GB unpacked) to archive.org: http://osmopenlayers.blogspot.de/2012/05/upload-finished.html We can do something similar with Wikipedia. The bucket size of archive.org is 10 GB, so we need to split up the data in a way

Re: [Wikimedia-l] Fire Drill Re: Wikimedia sites not easy to archive (Was Re: Knol is closing tomorrow )

2012-05-18 Thread emijrp
There is no such 10 GB limit; see http://archive.org/details/ARCHIVETEAM-YV-6360017-6399947 (a 238 GB example). ArchiveTeam/WikiTeam is uploading some dumps to the Internet Archive. If you want to join the effort, use the mailing list https://groups.google.com/group/wikiteam-discuss to avoid wasting

Re: [Wikimedia-l] Fire Drill Re: Wikimedia sites not easy to archive (Was Re: Knol is closing tomorrow )

2012-05-17 Thread Mike Dupont
On Thu, May 17, 2012 at 6:06 AM, John phoenixoverr...@gmail.com wrote: If you're willing to foot the bill for the new hardware, I'll gladly prove my point. Given the millions of dollars that Wikipedia has, it should not be a problem to provide such resources for a good cause like that. -- James

Re: [Wikimedia-l] Fire Drill Re: Wikimedia sites not easy to archive (Was Re: Knol is closing tomorrow )

2012-05-17 Thread J Alexandr Ledbury-Romanov
I'd like to point out that the increasingly technical nature of this conversation probably belongs either on wikitech-l, or off-list, and that the strident nature of the comments is fast approaching inappropriate. Alex Wikimedia-l list administrator 2012/5/17 Anthony wikim...@inbox.org On

Re: [Wikimedia-l] Fire Drill Re: Wikimedia sites not easy to archive (Was Re: Knol is closing tomorrow )

2012-05-17 Thread Neil Harris
On 17/05/12 12:49, Anthony wrote: Please have someone at WMF coordinate this so that there aren't multiple requests made. In my opinion, it should preferably be made by a WMF employee. Fill out the form at https://aws-portal.amazon.com/gp/aws/html-forms-controller/aws-dataset-inquiry Tell

Re: [Wikimedia-l] Fire Drill Re: Wikimedia sites not easy to archive (Was Re: Knol is closing tomorrow )

2012-05-17 Thread Kim Bruning
On Thu, May 17, 2012 at 07:43:09AM -0400, Anthony wrote: In fact, I think someone at WMF should contact Amazon and see if they'll let us conduct the experiment for free, in exchange for us creating the dump for them to host as a public data set (http://aws.amazon.com/publicdatasets/). That

Re: [Wikimedia-l] Fire Drill Re: Wikimedia sites not easy to archive (Was Re: Knol is closing tomorrow )

2012-05-16 Thread John
I'll run a quick benchmark and import the full history of simple.wikipedia to my laptop wiki on a stick, and give an exact duration. On Thu, May 17, 2012 at 12:26 AM, John phoenixoverr...@gmail.com wrote: Toolserver is a clone of the WMF servers minus files. They run a database replication of

Re: [Wikimedia-l] Fire Drill Re: Wikimedia sites not easy to archive (Was Re: Knol is closing tomorrow )

2012-05-16 Thread Anthony
On Thu, May 17, 2012 at 12:30 AM, John phoenixoverr...@gmail.com wrote: I'll run a quick benchmark and import the full history of simple.wikipedia to my laptop wiki on a stick, and give an exact duration. Simple.wikipedia is nothing like en.wikipedia. For one thing, there's no need to turn on

Re: [Wikimedia-l] Fire Drill Re: Wikimedia sites not easy to archive (Was Re: Knol is closing tomorrow )

2012-05-16 Thread Mike Dupont
Well, to be honest, I am still upset about how much data is deleted from Wikipedia because it is not notable. There are so many articles that I might be interested in that are lost in the same garbage as spam and other things. We should make non-notable and non-harmful articles available in

Re: [Wikimedia-l] Fire Drill Re: Wikimedia sites not easy to archive (Was Re: Knol is closing tomorrow )

2012-05-16 Thread Anthony
On Thu, May 17, 2012 at 1:22 AM, John phoenixoverr...@gmail.com wrote: Anthony, the process is linear: you have PHP inserting X number of rows per Y time frame. Amazing. I need to switch all my databases to MySQL. It can insert X rows per Y time frame, regardless of whether the database is