[Wikitech-l] Dumps 2.0 Wiki Dev Summit but also...

2015-12-01 Thread Ariel T. Glenn
Also if you are a dumps user or have thoughts about how you would redo them from scratch, get your ideas in now. We're not waiting for the Dev Summit to get the work started. See https://phabricator.wikimedia.org/T114019 for details, especially the document linked at the end of the task

Re: [Wikitech-l] Proposal: slight change to the XML dump format

2014-10-27 Thread Ariel T. Glenn
Thank you Google for hiding the start of this thread in my spam folder. I'm going to have to change my import tools for the new format, but that's the way it goes; it's a reasonable change. Have you checked with folks on the xml data dumps list to see who might be affected? Ariel On

Re: [Wikitech-l] dumps.wikimedia.org, downloads.wikimedia.org downtime Thursday June 26 13.30 UTC

2014-06-26 Thread Ariel T. Glenn
On Mon, 23-06-2014 at 20:56 +0300, Ariel T. Glenn wrote: dumps.wikimedia.org, downloads.wikimedia.org will be down on Thursday June 26 from 13.30 UTC until 14.30 UTC. While we expect the actual downtime to be much less, we're blocking one hour just in case. And Murphy has

Re: [Wikitech-l] dumps.wikimedia.org, downloads.wikimedia.org downtime Thursday June 26 13.30 UTC

2014-06-26 Thread Ariel T. Glenn
On Thu, 26-06-2014 at 17:37 +0300, Ariel T. Glenn wrote: On Mon, 23-06-2014 at 20:56 +0300, Ariel T. Glenn wrote: dumps.wikimedia.org, downloads.wikimedia.org will be down on Thursday June 26 from 13.30 UTC until 14.30 UTC. While we expect the actual

[Wikitech-l] dumps.wikimedia.org, downloads.wikimedia.org downtime Thursday June 26 13.30 UTC

2014-06-23 Thread Ariel T. Glenn
dumps.wikimedia.org, downloads.wikimedia.org will be down on Thursday June 26 from 13.30 UTC until 14.30 UTC. While we expect the actual downtime to be much less, we're blocking one hour just in case. We will be moving it to a new rack in preparation for improved bandwidth, and yes this mean

[Wikitech-l] download.wikimedia.org, dumps.wikimedia.org moves

2014-03-26 Thread Ariel T. Glenn
These names will be moved so that requests to them go to our server in the eqiad data center. This should not cause any service interruptions but you may notice more current files available for download as the switch goes into effect. Time of switch: 10 to 12 am Thursday March 27, UTC.

[Wikitech-l] wmf getting ready for puppet3, advice please

2014-01-28 Thread Ariel T. Glenn
Hi puppet wranglers, We're trying to refactor the WMF puppet manifests to get rid of reliance on dynamic scope, since puppet 3 doesn't permit it. Until now we've done what is surely pretty standard puppet 2.x practice: assign values to a variable in the node definition and pick it up in the class

Re: [Wikitech-l] wmf getting ready for puppet3, advice please

2014-01-28 Thread Ariel T. Glenn
On Tue, 28-01-2014 at 10:21 -0800, Ryan Lane wrote: In puppet3 variables assigned in the node are still global. It's the only place other than facts (or hiera) that you can assign them and have their scope propagate. So, this'll continue working. I think the future path is

Re: [Wikitech-l] FWD: [Bug 58236] New: No longer allow gadgets to be turned on by default for all users on Wikimedia sites

2013-12-12 Thread Ariel T. Glenn
On Wed, 11-12-2013 at 23:01 -0500, MZMcBride wrote: ... The idea being proposed in bug 58236, as it was framed, was a non-starter. It simply riled people up and caused them to become defensive. (Its sibling bugs didn't help.) However, if we re-frame the issue, I think many

Re: [Wikitech-l] Bulk download

2013-09-23 Thread Ariel T. Glenn
We have a somewhat out of date off site mirror of images (I'm working on the out of date part). This includes commons. It's accessible by rsync, http, ftp: http://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps#Media Thanks again to your.org for hosting that. Are these images

Re: [Wikitech-l] Are revision tags in the dump files?

2013-07-09 Thread Ariel T. Glenn
On Tue, 09-07-2013 at 07:07 -0400, Tyler Romeo wrote: Follow-up question. Will our new dumps project be dumping the change_tag table? ;) Our old dumps project could (the new one isn't intended to handle the table dumps but only the page metadata and content data). I don't

Re: [Wikitech-l] Are revision tags in the dump files?

2013-07-08 Thread Ariel T. Glenn
On Mon, 08-07-2013 at 20:17 -0700, Robert Rohde wrote: Various parts of Mediawiki will apply tags to specific edits in recent changes and histories. For example, the recently introduced Visual Editor is adding Tag: VisualEditor to all of its edits. Are such tags

Re: [Wikitech-l] [Xmldatadumps-l] Suggested file format of new incremental dumps

2013-07-07 Thread Ariel T. Glenn
On Sun, 07-07-2013 at 21:09 -0700, Randall Farmer wrote: Sorry, reading back over this thread late. What I hope for is a format that allows dumps to be produced much more rapidly, where the time to produce the incrementals grows only as the number of edits per time

Re: [Wikitech-l] [Xmldatadumps-l] Suggested file format of new incremental dumps

2013-07-02 Thread Ariel T. Glenn
On Tue, 02-07-2013 at 11:47 +0100, Neil Harris wrote: The simplest possible dump format is the best, and there's already a thriving ecosystem around the current XML dumps, which would be broken by moving to a binary format. Binary file formats and APIs defined by code

[Wikitech-l] cleaning up wikitech.wikimedia.org (again)

2013-06-25 Thread Ariel T. Glenn
For folks that edit on Wikitech, note that obsolete docs can now be moved to their own namespace, Obsolete, where we'll be able to dig them up if we ever want them but they won't clutter up the search results etc. Please feel free to start populating the new namespace with all that cruft you were

Re: [Wikitech-l] Incremental XML dumps GSoC proposal

2013-05-02 Thread Ariel T. Glenn
On Thu, 02-05-2013 at 15:40 +0200, Petr Onderka wrote: I realized I didn't post my proposal to the list yet (I have added it to the official GSoC site few days ago), so here it is: http://www.mediawiki.org/wiki/User:Svick/Incremental_dumps In short, the project aims to

Re: [Wikitech-l] enwiki dump -- retrying dumping database tables on partial failure?

2013-03-14 Thread Ariel T. Glenn
On Thu, 14-03-2013 at 23:24 +, Neil Harris wrote: Dear Wikimedia ops team, The most recent enwiki dump now seems to have finished _almost_ successfully, apart from the dumping of the database metadata tables such as the pages table and the various links tables,

Re: [Wikitech-l] Identifying pages that are slow to render

2013-03-11 Thread Ariel T. Glenn
On Thu, 07-03-2013 at 21:12 -0400, bawolff wrote: On 2013-03-07 4:06 PM, Matthew Flaschen mflasc...@wikimedia.org wrote: On 03/07/2013 12:00 PM, Antoine Musso wrote: On 06/03/13 23:58, Federico Leva (Nemo) wrote: There's slow-parse.log, but it's private unless a

Re: [Wikitech-l] Free Disk Space needed for import

2013-03-11 Thread Ariel T. Glenn
On Mon, 11-03-2013 at 05:35 -0500, wiki wrote: Thank you for the response. I think those sizes refer to the exported xml, e.g. 41.5GB is the English xml.bz2 expanded. I was curious as to how much extra disk space is needed (and consumed) after importing this

Re: [Wikitech-l] Integrating MediaWiki into MS SBS

2013-02-05 Thread Ariel T. Glenn
On Tue, 05-02-2013 at 07:21 -0500, Chad wrote: On Tue, Feb 5, 2013 at 7:06 AM, Marco Fleckinger marco.fleckin...@wikipedia.at wrote: The farmer doesn't want to eat anything he doesn't know. I don't know this sentence's popularity in Hungary (AFAIK?), but in German it's

Re: [Wikitech-l] RFC: Parsoid roadmap

2013-01-29 Thread Ariel T. Glenn
On Wed, 23-01-2013 at 15:10 -0800, Gabriel Wicke wrote: Fellow MediaWiki hackers! After the pretty successful December release and some more clean-up work following up on that we are now considering the next steps for Parsoid. To this end, we have put together a rough

Re: [Wikitech-l] Can we help Tor users make legitimate edits?

2012-12-28 Thread Ariel T. Glenn
On Fri, 28-12-2012 at 10:38 -0500, Brad Jorsch wrote: On Thu, Dec 27, 2012 at 7:26 PM, Sumana Harihareswara suma...@wikimedia.org wrote: 3) Look at Nymble - http://freehaven.net/anonbib/#oakland11-formalizing and http://cgi.soic.indiana.edu/~kapadia/nymble/overview.php .

Re: [Wikitech-l] Mobile apps: time to go native?

2012-12-12 Thread Ariel T. Glenn
On Tue, 11-12-2012 at 19:04 -0500, MZMcBride wrote: Brion Vibber wrote: Over on the mobile team we've been chatting for a while about the various trade-offs in native vs HTML-based (PhoneGap/Cordova) development. [...] iOS and Android remain our top-tier mobile

Re: [Wikitech-l] [Site issue] s3 wikis read-only until replication catches up

2012-12-11 Thread Ariel T. Glenn
On Tue, 11-12-2012 at 01:10 -0800, Erik Moeller wrote: Wikimedia wikis hosted on the s3 cluster (pretty much all but the very large wikis, click on the s3 box in https://noc.wikimedia.org/dbtree/ to get a full list) are currently in read-only mode due to severe replication

Re: [Wikitech-l] Media infrastructure maintenance (uploads disabled) Monday Oct 8, 11 am UTC

2012-10-06 Thread Ariel T. Glenn
+0300, Ariel T. Glenn wrote: We're going to swap out ms7, the current media server fallback, for a netapp. We'll start this on Friday Oct 5 at 11am UTC, to conclude at 2pm UTC or earlier. This may entail turning off uploads to all projects during the switchover. It is possible

Re: [Wikitech-l] Media infrastructure maintenance (uploads disabled) Friday Oct 5, 11 am UTC

2012-10-05 Thread Ariel T. Glenn
We rolled back this change after discovering an ownership issue with the rsynced media files that caused deletions of media to fail. We'll try again early next week. Ariel On Thu, 04-10-2012 at 15:19 +0300, Ariel T. Glenn wrote: We're going to swap out ms7, the current media

[Wikitech-l] Media infrastructure maintenance (uploads disabled) Friday Oct 5, 11 am UTC

2012-10-04 Thread Ariel T. Glenn
We're going to swap out ms7, the current media server fallback, for a netapp. We'll start this on Friday Oct 5 at 11am UTC, to conclude at 2pm UTC or earlier. This will entail turning off uploads to all projects during the switchover. It is possible that ExtensionDistributor and captchas will

[Wikitech-l] scaled media (thumbs) as *temporary* files, not stored forever

2012-08-31 Thread Ariel T. Glenn
So it's time to have this discussion again. At least, I think we're having it again, though I could not find previous threads on this list about the subject. In short, scaled media is currently generated on the fly for any size and for any user. The resulting files are kept around forever or

Re: [Wikitech-l] 3 million null edits

2012-06-19 Thread Ariel T. Glenn
How did this not propagate in the usual way through the job queue? (And why wouldn't either a null or an insignificant edit to the template add the requisite job queue entries now?) A. On Tue, 19-06-2012 at 10:00 +0200, Maarten Dammers wrote: Hi guys, There must be an

Re: [Wikitech-l] [Xmldatadumps-l] XML dumps/Media mirrors update

2012-06-05 Thread Ariel T. Glenn
Dupont jamesmikedup...@googlemail.com Well I would be happy for items like this : http://en.wikipedia.org/wiki/Template:Db-a7 would it be possible to extract them easily? mike On Thu, May 17, 2012 at 2:23 PM, Ariel T. Glenn ar...@wikimedia.org wrote: There's a few

[Wikitech-l] XML dumps/Media mirrors update

2012-05-17 Thread Ariel T. Glenn
We now have three mirror sites, yay! The full list is linked to from http://dumps.wikimedia.org/ and is also available at http://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps#Current_Mirrors Summarizing, we have: C3L (Brazil) with the last 5 good known dumps, Masaryk

Re: [Wikitech-l] XML dumps/Media mirrors update

2012-05-17 Thread Ariel T. Glenn
of deleted data, at least that which is not spam/vandalism based on tags. mike On Thu, May 17, 2012 at 1:09 PM, Ariel T. Glenn ar...@wikimedia.org wrote: We now have three mirror sites, yay! The full list is linked to from http://dumps.wikimedia.org/ and is also available at http

Re: [Wikitech-l] Inactive sysops + improving security

2012-04-13 Thread Ariel T. Glenn
On Fri, 13-04-2012 at 12:49 +1000, Andrew Garrett wrote: On Wed, Apr 4, 2012 at 6:25 PM, Petr Bena benap...@gmail.com wrote: An account with sysop rights cannot do that much damage anyway. Deleting a page does no more damage than deleting a paragraph in an existent

Re: [Wikitech-l] Development process doesn't work (yes this is another complaint from another community member)

2012-04-05 Thread Ariel T. Glenn
*cough* LQT 3 is private because it doesn't exist... see the date of that email (hint, 1st day of April). If there were to be a project like that I expect it would be very very public indeed. ;-) Ariel On Thu, 05-04-2012 at 09:10 +0200, Petr Bena wrote: When we talk about

Re: [Wikitech-l] Development process doesn't work (yes this is another complaint from another community member)

2012-04-05 Thread Ariel T. Glenn
know if this is a part of some joke https://www.mediawiki.org/wiki/LiquidThreads_3.0/status but it seems that someone wrote some code 2012/4/5 Petr Bena benap...@gmail.com: This isn't true? https://www.mediawiki.org/wiki/LiquidThreads_3.0 On Thu, Apr 5, 2012 at 9:40 AM, Ariel T

Re: [Wikitech-l] Wikipedia Url Shortner Service

2012-03-26 Thread Ariel T. Glenn
On Mon, 26-03-2012 at 10:39 -0400, Mark A. Hershberger wrote: Benjamin Lees emufarm...@gmail.com writes: I see two different use cases here: one, you have URLs that need to be short so they can fit in Twitter messages and the like. Here, it doesn't matter whether the

Re: [Wikitech-l] Test suite for dumping MediaWikis using xmldumps-backup

2012-03-17 Thread Ariel T. Glenn
On Sat, 17-03-2012 at 16:45 +0100, Christian Aistleitner wrote: Hi Saper, On Sat, Mar 17, 2012 at 01:59:33PM +, Marcin Cieslak wrote: [ Announcing xmldumps-test ] The code is up for review at https://gerrit.wikimedia.org/r/p/operations/dumps/test.git [

Re: [Wikitech-l] Opengrok for Mediawiki code

2012-03-01 Thread Ariel T. Glenn
On Thu, 01-03-2012 at 15:00 +0200, Amir E. Aharoni wrote: 2012/3/1 Srikanth Lakshmanan srik@gmail.com: Hi all, Would having opengrok[1] setup for Mediawiki code be useful tool? I haven't used ViewVC much, so not sure if opengrok doesn't do something that ViewVC

Re: [Wikitech-l] Decentralized data center

2012-01-25 Thread Ariel T. Glenn
Actually we still want mirrors of the revision texts and the mysql tables [1], and we do not yet have a mirror of the image data. If anyone has contacts at a university with good bandwidth and 6T + of space lying around for a good cause... Ariel [1]

Re: [Wikitech-l] Picture rotation: what on earth?

2011-12-08 Thread Ariel T. Glenn
I can't answer to the automated or not part, partially because I'm missing the rest of the thread. I can tell you that because our one server with the thumbs on it was getting dangerously low on space (and still is), I've been purging a number of thumbs each day that are not linked to on our

Re: [Wikitech-l] Forbidden access

2011-11-26 Thread Ariel T. Glenn
Hello, I just checked the pagelinks and categorylinks files here: http://dumps.wikimedia.org/enwiki/2015/ and they are accessible. Can you give a couple of specific links that did not work? Ariel On Sat, 26-11-2011 at 20:54 +0100, Khalida BEN SIDI AHMED wrote: For my

Re: [Wikitech-l] Bugzilla vandalism

2011-11-25 Thread Ariel T. Glenn
On Fri, 25-11-2011 at 19:08 +1000, K. Peachey wrote: On Fri, Nov 25, 2011 at 7:06 PM, Bryan Tong Minh bryan.tongm...@gmail.com wrote: There is also another, impersonating brion, which does not appear to have been cleaned up. Most of that has been (Its what Ariel was

Re: [Wikitech-l] Bugzilla vandalism

2011-11-25 Thread Ariel T. Glenn
Does anyone here use the mass bug change feature? If not, we might consider just turning that off outright for all users. Ariel ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Bugzilla vandalism

2011-11-25 Thread Ariel T. Glenn
On Fri, 25-11-2011 at 18:05 +0100, Siebrand Mazeland wrote: Does anyone here use the mass bug change feature? If not, we might consider just turning that off outright for all users. I use it from time to time, as do others (mostly bugzilla admins). Can it be disabled on

Re: [Wikitech-l] Bugzilla vandalism

2011-11-25 Thread Ariel T. Glenn
On Sat, 26-11-2011 at 01:24 +0800, Liangent wrote: On Sat, Nov 26, 2011 at 1:11 AM, Ariel T. Glenn ar...@wikimedia.org wrote: I'm sure it can. If we had to we could give it to all existing users, it's just a bit more tedious. Only to users that existed before you sent

Re: [Wikitech-l] Bugzilla vandalism

2011-11-24 Thread Ariel T. Glenn
On Thu, 24-11-2011 at 21:30 +0100, Daniel Zahn wrote: On Thu, Nov 24, 2011 at 8:15 PM, Rob Lanphier ro...@wikimedia.org wrote: So, here's the solution for now, and probably for a while: 1. New account creation has been re-enabled 2. All existing accounts have been

Re: [Wikitech-l] No new german dump

2011-11-21 Thread Ariel T. Glenn
The three processes we had going for largish wikis had been restarted from a particular step, since I had to interrupt them earlier for kernel upgrade and reboot. These stop at the end of the run. Three regular jobs are now running; these cycle through the list of the ten largish wikis in the

Re: [Wikitech-l] page view stats redux

2011-11-16 Thread Ariel T. Glenn
Thanks! But it seems that the update of pagecounts files is stopped for the past few hours. Is this a temporary problem? Thanks, Ikuya Yes, very temporary. A mistaken side-effect of taking Domas' server out of the loop; fixed. Ariel

Re: [Wikitech-l] page view stats redux

2011-11-16 Thread Ariel T. Glenn
: very cool! is there a readme or project page somewhere that explains what all these files are? On Wed, Nov 16, 2011 at 1:27 PM, Ariel T. Glenn ar...@wikimedia.org wrote: Thanks! But it seems that the update of pagecounts files is stopped for the past few hours. Is this a temporary

Re: [Wikitech-l] page view stats redux

2011-11-12 Thread Ariel T. Glenn
On Wed, 09-11-2011 at 10:07 -0500, Sean Timm wrote: On 11/9/2011 8:21 AM, Ikuya Yamada wrote: I had thought to do a daily update. If it turns out that hourly updates are indeed useful, I'll set that up. I don't know of anyone else that has a current mirror. I had

Re: [Wikitech-l] page view stats redux

2011-11-07 Thread Ariel T. Glenn
On Mon, 07-11-2011 at 18:41 +, Sean Timm wrote: Ariel T. Glenn ariel at wikimedia.org writes: I think we finally have a complete copy from December 2007 through August 2011 of the pageview stats scrounged from various sources, now available on our dumps server

[Wikitech-l] dump titles from other namespaces than 0?

2011-11-02 Thread Ariel T. Glenn
A while back (over 2 years ago, urk!) we had a request for dumps of titles of things other than articles [1]. I haven't seen that request repeated, but I'm wondering how useful that would be to folks and which namespaces we should dump, if we were going to add a few. Article talk pages? Other?

Re: [Wikitech-l] [Xmldatadumps-l] first mirror of most recent XML dumps, at C3SL in Brazil

2011-10-23 Thread Ariel T. Glenn
: Some of the most recent dumps links are broken[1]. [1] http://wikipedia.c3sl.ufpr.br/jawikisource/20111018 2011/10/13 Ariel T. Glenn ar...@wikimedia.org As the subject says, the first mirror of our XML dumps is up, hosted at C3SL in Brazil. We're really excited about

Re: [Wikitech-l] [Xmldatadumps-l] first mirror of most recent XML dumps, at C3SL in Brazil

2011-10-23 Thread Ariel T. Glenn
recent dumps links are broken[1]. [1] http://wikipedia.c3sl.ufpr.br/jawikisource/20111018 2011/10/13 Ariel T. Glenn ar...@wikimedia.org As the subject says, the first mirror of our XML dumps is up, hosted at C3SL in Brazil. We're really excited about it. Details

[Wikitech-l] first mirror of most recent XML dumps, at C3SL in Brazil

2011-10-13 Thread Ariel T. Glenn
As the subject says, the first mirror of our XML dumps is up, hosted at C3SL in Brazil. We're really excited about it. Details are listed on the main index page on our download server ( http://dumps.wikimedia.org/ ) and are reproduced below for everyone's convenience: Site: Centro de Computação

Re: [Wikitech-l] FW: RFC: Refactor on File/FileRepo/MediaHandler? (Separation of concerns, portability)

2011-10-03 Thread Ariel T. Glenn
On Mon, 03-10-2011 at 22:21 -0400, Russell Nelson wrote: On Mon, Oct 3, 2011 at 10:15 PM, Brion Vibber br...@wikimedia.org wrote: I would *very* strongly recommend doing the internal refactoring before we get anywhere near reviewing and deploying that bad boy; otherwise

Re: [Wikitech-l] Counting revisions in 2011090

2011-09-26 Thread Ariel T. Glenn
On Mon, 26-09-2011 at 02:20 +0200, melvin_mm wrote: John phoenixoverride at gmail.com writes: mysql> select count(*) from archive; +--+ | count(*) | +--+ | 33263574 | +--+ 1 row in set (8 min 47.50 sec) On Sun, Sep 25, 2011

Re: [Wikitech-l] Counting revisions in 2011090

2011-09-26 Thread Ariel T. Glenn
On Mon, 26-09-2011 at 08:47 +0200, melvin_mm wrote: Ariel T. Glenn ariel at wikimedia.org writes: On Mon, 26-09-2011 at 02:20 +0200, melvin_mm wrote: Ok, thanks! So in pages-meta-history, those ~33.000.000 archived / deleted revisions are not

Re: [Wikitech-l] Counting revisions in 2011090

2011-09-26 Thread Ariel T. Glenn
On Mon, 26-09-2011 at 17:59 +0300, Ariel T. Glenn wrote: On Mon, 26-09-2011 at 08:47 +0200, melvin_mm wrote: Ariel T. Glenn ariel at wikimedia.org writes: On Mon, 26-09-2011 at 02:20 +0200, melvin_mm wrote: Ok, thanks! So

Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table (discussing r94289)

2011-09-18 Thread Ariel T. Glenn
On Sat, 17-09-2011 at 22:55 -0700, Robert Rohde wrote: On Sat, Sep 17, 2011 at 4:56 PM, Anthony wikim...@inbox.org wrote: snip For offline analyses, there's no need to change the online database tables. Need? That's debatable, but one of the major motivators is the

Re: [Wikitech-l] page view stats redux

2011-09-18 Thread Ariel T. Glenn
Yes, and I've already been getting the information on that together so it can be documented. :-) Ariel On Sun, 18-09-2011 at 11:55 +0100, Harry Burt wrote: Ariel T. Glenn wrote: I think we finally have a complete copy from December 2007 through August 2011 of the pageview

[Wikitech-l] page view stats redux

2011-09-15 Thread Ariel T. Glenn
I think we finally have a complete copy from December 2007 through August 2011 of the pageview stats scrounged from various sources, now available on our dumps server. See http://dumps.wikimedia.org/other/pagecounts-raw/ Ariel

Re: [Wikitech-l] Sep11 Wiki

2011-09-06 Thread Ariel T. Glenn
On Tue, 06-09-2011 at 17:07 -0700, Brion Vibber wrote: snip Indeed -- as long as the data's accessible I'm content enough - http://lists.wikimedia.org/pipermail/foundation-l/2006-September/023835.html:) Since then though we've removed it from the data dumps, so it's no

Re: [Wikitech-l] Proposed chat system

2011-09-04 Thread Ariel T. Glenn
If it's actually etherpad-based, that keeps track of who makes which change within a given session, so one could attribute specific pieces of text to a given editor. Ariel On Sun, 04-09-2011 at 21:40 +, Russell N. Nelson - rnnelson wrote: Treat the concurrent session as

Re: [Wikitech-l] Code review process (was: Status of more regular code deployments)

2011-06-02 Thread Ariel T. Glenn
On Wed, 01-06-2011 at 15:58 -0600, bawolff wrote: On Wed, Jun 1, 2011 at 3:02 PM, Brion Vibber br...@pobox.com wrote: On Wed, Jun 1, 2011 at 1:53 PM, bawolff bawolff...@gmail.com wrote: As a volunteer person, I'm fine if code I commit is reverted based on it sucking,

Re: [Wikitech-l] Status of more regular code deployments

2011-06-02 Thread Ariel T. Glenn
On Thu, 02-06-2011 at 08:31 -0400, Alex Mr.Z-man wrote: On Thu, Jun 2, 2011 at 12:10 AM, Brandon Harris bhar...@wikimedia.org wrote: Your solution, as you've described it in the past, comprises people do code review or orf wit' dere heads. I know of no

Re: [Wikitech-l] [Xmldatadumps-l] april en history dumps incomplete

2011-04-19 Thread Ariel T. Glenn
And they're done. In the future I expect to spam these lists much less often, as we'll be able to add a status notice on the download page. Thanks for your patience. Ariel On Mon, 18-04-2011 at 13:17 +0300, Ariel T. Glenn wrote: The en wikipedia history bz2 files are ready

Re: [Wikitech-l] [Xmldatadumps-l] april en history dumps incomplete

2011-04-18 Thread Ariel T. Glenn
The en wikipedia history bz2 files are ready; the last of the 7z files is being rerun. Ariel

Re: [Wikitech-l] Status of 1.17 1.18

2011-04-15 Thread Ariel T. Glenn
On Fri, 15-04-2011 at 18:41 -0700, Brion Vibber wrote: One issue we see at present is that since we version and deploy core and extensions together, it's tough to get a semi-experimental extension into limited deployment with regular updates. Let's make sure that's clean

[Wikitech-l] april en history dumps incomplete

2011-04-13 Thread Ariel T. Glenn
The April run of the English history dumps is incomplete. There is at least one file that will need to be regenerated. When it's ready I'll send an email update. I expect a delay of 4-5 days for that. Ariel

Re: [Wikitech-l] [Xmldatadumps-l] March 17 en wikipedia history bz2 files ready

2011-03-30 Thread Ariel T. Glenn
file size growth seems to be pretty linear: (chart x-axis starts from 20060816 dump and ends at 20110115 dump) http://nekrom.com/wikipedia/enwiki%20history%20dump%20file%20size%20over%20time.png cheers, Jamie - Original Message - From: Ariel T. Glenn ar...@wikimedia.org Date

[Wikitech-l] March 17 en wikipedia history bz2 files ready

2011-03-29 Thread Ariel T. Glenn
Well, that used up all my good luck for the year, but the bz2s are ready for download. The md5sums are still calculating, give them a couple hours to show up. If all continues to go well we'll have the 7z files in 4-5 days. As before I do not plan to provide a single 350gb file of the bz2, nor

Re: [Wikitech-l] Moving the Dump Process to another language

2011-03-25 Thread Ariel T. Glenn
On Thu, 24-03-2011 at 20:29 -0400, James Linden wrote: So, thoughts on this? Is 'Move Dumping Process to another language' a good idea at all? I'd worry a lot less about what languages are used than whether the process itself is scalable. I'm not a mediawiki /

Re: [Wikitech-l] Moving the Dump Process to another language

2011-03-25 Thread Ariel T. Glenn
On Fri, 25-03-2011 at 21:49 +0100, Platonides wrote: Andrew Dunbar wrote: Just a thought, wouldn't it be easier to generate dumps in parallel if we did away with the assumption that the dump would be in database order. The metadata in the dump provides the ordering info

Re: [Wikitech-l] testing of localization

2011-03-23 Thread Ariel T. Glenn
On Wed, 23-03-2011 at 02:03 +0100, Platonides wrote: Marcin Cieslak wrote: So having a possibility to have a pre-flight test of the translation (or even watch the demo of the original in action) is something Selenium could definitely help. In many cases, translators do

[Wikitech-l] Jan history dumps (bz2 and 7z) now available

2011-03-18 Thread Ariel T. Glenn
Well that, like many things about dumps, took longer than I would have liked but the January enwikipedia run is finally complete. Unless someone really really wants them (and then we might talk off list about it) I am not going to provide a single file for download of the history dumps; instead

Re: [Wikitech-l] [Xmldatadumps-l] post-1.17 deployment restart of dumps

2011-02-28 Thread Ariel T. Glenn
And one more time... I noticed that we were seeing a 3 to 4-fold slowdown on sv wiki history dumps in comparison with the previous run. After investigation it appears that this is due to use of XMLReader(). I've rolled that back and we are once again up. I've also restarted dawiki from the

Re: [Wikitech-l] [Xmldatadumps-l] post-1.17 deployment restart of dumps

2011-02-26 Thread Ariel T. Glenn
Irritatingly enough we haven't quite switched all the paths of everything to use php-1.17. For example, the dumps. So the previous tests aren't very useful. I'm shooting the svwiki dump in process and doing another round of tests with the correct path; after that I'll restart svwiki from its

Re: [Wikitech-l] [Xmldatadumps-l] post-1.17 deployment restart of dumps

2011-02-26 Thread Ariel T. Glenn
We are back in business and running off the new codebase. Please check the output carefully. Note that we are on schema version 0.5 now, which includes byte length of revisions. Also please note that leading spaces before / in xml markup are now removed. Other than that things should look

[Wikitech-l] post-1.17 deployment restart of dumps

2011-02-25 Thread Ariel T. Glenn
I have done a small amount of testing, the tests look good. Accordingly I have started up one process to do dumps; please get your eyeballs on them and let me know thumbs up or down. I'd like to start up the rest of the processes by tomorrow at this time so if you can squeeze in some time to

Re: [Wikitech-l] upcoming 1.17 deployment and the xml dumps

2011-02-07 Thread Ariel T. Glenn
back up again. Ariel On Tue, 08-02-2011 at 11:51 +0530, Janesh Kodikara wrote: - Original Message - From: Ariel T. Glenn ar...@wikimedia.org Newsgroups: gmane.science.linguistics.wikipedia.technical To: xmldatadump...@lists.wikimedia.org; wikitech-l

[Wikitech-l] upcoming 1.17 deployment and the xml dumps

2011-02-05 Thread Ariel T. Glenn
A little bit before the scheduled deployment of the 1.17 branch on our production servers, I will be halting production of XML dumps. Deployment is set for Tuesday Feb 8 at 07:00 UTC, so a few hours before that I'll start shutting down processes. This is a precautionary measure; after the

Re: [Wikitech-l] [Xmldatadumps-l] dataset1, xml dumps

2011-01-11 Thread Ariel T. Glenn
On Tue, 11-01-2011 at 10:16 +, Neil Harris wrote: On 10/01/11 22:13, Ariel T. Glenn wrote: So soon took longer than I would have liked. However, we are up and running with the new code. I have started a few processes going and over the next few days I will ramp

[Wikitech-l] bogus recombine step in xml dump indexes

2011-01-11 Thread Ariel T. Glenn
You may be noticing a recombine step for several files on the recent dumps which simply seems to list the same file again. That's a bug not a feature; fortunately it doesn't impact the files themselves. I have fixed the configuration file so that it should no longer claim to run these, as they

Re: [Wikitech-l] [Xmldatadumps-l] dataset1, xml dumps

2011-01-10 Thread Ariel T. Glenn
get trapped behind them. Guess I'd better go update the various pages on wikitech now. Ariel On Fri, 24-12-2010 at 20:42 +0200, Ariel T. Glenn wrote: The new host Dataset2 is now up and running and serving XML dumps. Those of you paying attention to DNS entries should see

Re: [Wikitech-l] Does anybody have the 20080726 dump version?

2011-01-01 Thread Ariel T. Glenn
On 01-01-2011, Sat, at 16:42 +, David Gerard wrote: On 31 December 2010 17:09, Ariel T. Glenn ar...@wikimedia.org wrote: I'd like all the dumps from all the projects to be on line. Being realistic I think we would wind up keeping offline copies of all

Re: [Wikitech-l] Does anybody have the 20080726 dump version?

2011-01-01 Thread Ariel T. Glenn
I'd like to remind everyone once again of the mirror page: http://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps If you have any ideas, please add them there, and pursue them or ask for help in doing so. If you are able to host, don't be shy, step right up ;-) Ariel On

Re: [Wikitech-l] Backup / Mirror of wikipedia dumps

2011-01-01 Thread Ariel T. Glenn
At the moment the easiest way for you to mirror our content would be via wget. You would want to generate a list of the most recent completed dumps, or we might make such a list available on a biweekly basis. I need to think about the best mechanism for that. There is also an RSS feed which
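
The "generate a list, then fetch it with wget" idea in the message above can be sketched roughly as follows. This is only an illustration, not the mechanism the dumps maintainers settled on: the helper names `dump_url` and `write_url_list` are hypothetical, and the URL layout is an assumption pieced together from a directory URL and a file name that appear elsewhere in this listing (http://download.wikimedia.org/enwiki/20100904/ and enwiki-20100312-pages-articles.xml.bz2).

```python
# Sketch: build a plain-text list of dump URLs that wget can read.
# ASSUMPTION: files live at <base>/<wiki>/<date>/<wiki>-<date>-<suffix>.
# The host and layout are inferred from this listing, not from an official spec.

def dump_url(wiki, date, suffix="pages-articles.xml.bz2",
             base="http://download.wikimedia.org"):
    """Return the assumed URL of one dump file for a wiki and run date."""
    return f"{base}/{wiki}/{date}/{wiki}-{date}-{suffix}"


def write_url_list(runs, path):
    """Write one URL per line for each (wiki, date) pair and return the URLs."""
    urls = [dump_url(wiki, date) for wiki, date in runs]
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(urls) + "\n")
    return urls
```

The resulting file could then be handed to wget with its `-i` (input file) option, for example `wget -c -i urls.txt`. Which runs count as "most recent completed" still has to come from somewhere else; the sketch takes that list as input.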

Re: [Wikitech-l] Does anybody have the 20080726 dump version?

2010-12-31 Thread Ariel T. Glenn
Anthony: We would like to get copies of any of these dumps as well. This includes any of the other files: stubs, tables, the lot. If you have them for other languages or other time periods, that would be great to know too. I think we could ship you a disk, or two if needed. Contact me off list

Re: [Wikitech-l] Does anybody have the 20080726 dump version?

2010-12-31 Thread Ariel T. Glenn
next week when I am back, currently traveling in Germany. Best, Huib 2010/12/31, Ariel T. Glenn ar...@wikimedia.org: Anthony: We would like to get copies of any of these dumps as well. This includes any of the other files: stubs, tables, the lot. If you have them for other

Re: [Wikitech-l] Does anybody have the 20080726 dump version?

2010-12-31 Thread Ariel T. Glenn
I'd like all the dumps from all the projects to be on line. Being realistic I think we would wind up keeping offline copies of all of it, and copies from every 6 months online, with the last several months of consecutive runs (around 20 or 30 of them) also online. Since these are en wiki we

Re: [Wikitech-l] Christmas server failure report

2010-12-26 Thread Ariel T. Glenn
Ryan Lane wrote a script to purge some of the FlaggedRevs memcached entries; that ran last night as well. The DOM-related errors all seem to have come from srv227; apache on that host was restarted about half an hour ago and the results look good. Ariel On 26-12-2010, Sun, at 01:49

Re: [Wikitech-l] [Xmldatadumps-l] dataset1, xml dumps

2010-12-24 Thread Ariel T. Glenn
The new host Dataset2 is now up and running and serving XML dumps. Those of you paying attention to DNS entries should see the change within the hour. We are not generating new dumps yet but expect to do so soon. Ariel ___ Wikitech-l mailing list

Re: [Wikitech-l] dataset1, xml dumps

2010-12-20 Thread Ariel T. Glenn
Google donated storage space for backups for XML dumps. Accordingly, a copy of the latest complete dump for each project is being copied over (public files only). We expect to run similar copies once every two weeks, keeping the four latest copies as well as one permanent copy at every six month

Re: [Wikitech-l] [Xmldatadumps-l] dataset1, xml dumps

2010-12-20 Thread Ariel T. Glenn
if it gave everyone one more copy. Ariel On 20-12-2010, Mon, at 17:41 +0100, Platonides wrote: Ariel T. Glenn wrote: Google donated storage space for backups for XML dumps. Accordingly, a copy of the latest complete dump for each project is being copied over (public files only

Re: [Wikitech-l] Parallelizing export dump (bug 24630)

2010-12-19 Thread Ariel T. Glenn
On 20-12-2010, Mon, at 00:21 +0100, Platonides wrote: Diederik van Liere wrote: Which dump file is offered in smaller sub files? http://download.wikimedia.org/enwiki/20100904/ Also see http://wikitech.wikimedia.org/view/Dumps/Parallelization Expect to see more of this

Re: [Wikitech-l] How to find the version of a dump

2010-12-16 Thread Ariel T. Glenn
The dumps in the archive are there because they are incomplete, by the way. Ariel On 16-12-2010, Thu, at 16:50 +0100, emijrp wrote: Hi Monica; Your dump is this one, with date 2010-03-12:[1][2] a3a5ee062abc16a79d111273d4a1a99a enwiki-20100312-pages-articles.xml.bz2
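
The quoted reply identifies a dump by an MD5 line (a hex digest followed by a file name). A downloaded file can be compared against such a line locally. Below is a minimal Python sketch using only the standard library; the function names `md5_of_file` and `matches_published_md5` are hypothetical, and the sketch does not claim to be how anyone in the thread did the check.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks so that
    multi-gigabyte dump files do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path, expected_hex):
    """Return True if the file's MD5 equals the published hex digest."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

For instance, `matches_published_md5("enwiki-20100312-pages-articles.xml.bz2", "a3a5ee062abc16a79d111273d4a1a99a")` would compare a local copy against the digest quoted above. A mismatch usually means a truncated or corrupted download, or a different dump run.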

Re: [Wikitech-l] How to find the version of a dump

2010-12-16 Thread Ariel T. Glenn
has arrived and we are waiting for the arrays to be put together and shipped! Ariel On 16-12-2010, Thu, at 17:06 +0100, emijrp wrote: All? The 2006 one too? 2010/12/16 Ariel T. Glenn ar...@wikimedia.org The dumps in the archive are there because they are incomplete

Re: [Wikitech-l] dataset1, xml dumps

2010-12-16 Thread Ariel T. Glenn
, emijrp emi...@gmail.com wrote: Have you checked the md5sum? 2010/12/16 Gabriel Weinberg y...@alum.mit.edu Ariel T. Glenn ariel at wikimedia.org writes: We now have a copy of the dumps on a backup host. Although we are still resolving hardware issues on the XML dumps

Re: [Wikitech-l] Parallelizing export dump (bug 24630)

2010-12-16 Thread Ariel T. Glenn
On 17-12-2010, Fri, at 00:52 +0100, Platonides wrote: Roan Kattouw wrote: I'm not sure how hard this would be to achieve (you'd have to correlate blob parts with revisions manually using the text table; there might be gaps for deleted revs because ES is append-only) or how
