Re: [Wikitech-l] User-Agent:

2010-02-16 Thread Happy-melon
William Pietri will...@scissor.com wrote in message news:4b7a141e.9000...@scissor.com... On 02/15/2010 07:25 PM, Domas Mituzas wrote: Was there some urgent production impact that required doing this with no notice? Ok. I'm going to take that as no. As best I understand the discussion in

Re: [Wikitech-l] [mwdumper] new maintainer?

2010-02-16 Thread emmanuel
On Tue 16/02/10 14:13, Jamie Morken jmor...@shaw.ca wrote: What is the benefit of the database dumps being archived/distributed in xml format instead of sql format?  Converting the xml to sql takes a long time for big wikis and people seem to have problems with this step, so why isn't the

Re: [Wikitech-l] User-Agent:

2010-02-16 Thread Domas Mituzas
As best I understand the discussion in #wikimedia-tech last night, ~20% of search server load was being taken by aforementioned spamvertisers. That sounds like an urgent production impact to me. 50% of load, which at that time was using ~20% of search server CPU load. It also cut our API

Re: [Wikitech-l] [mwdumper] new maintainer?

2010-02-16 Thread Domas Mituzas
Hi! Converting the xml to sql takes a long time for big wiki's and people seem to have problems with this step, so why isn't the sql format available for download instead of the xml format? Our dumps are not 'sql dumps'. We assemble them from all the different parts (memcached, multiple
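
For readers who only need page titles and text, the XML dump can be read as a stream without converting it to SQL first. The following is a minimal sketch, assuming the usual `<page>` / `<title>` / `<revision>` / `<text>` layout of a MediaWiki export; the namespace URI in the sample and the helper names `local_name` and `iter_pages` are illustrative and not taken from the thread or from mwdumper.

```python
import io
import xml.etree.ElementTree as ET

# Tiny stand-in for a dump file. The xmlns value is an assumption; real dumps
# carry a versioned export namespace, which is why tags are matched by local name.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.4/">
  <page><title>Example</title><revision><text>Hello</text></revision></page>
  <page><title>Other</title><revision><text>World</text></revision></page>
</mediawiki>"""


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, text) per <page>; with several revisions, the last text wins."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        yield title, text
        elem.clear()  # drop the page's children so memory use stays bounded


pages = list(iter_pages(io.BytesIO(SAMPLE)))
```

For a real dump, `source` would be an open (possibly decompressed) file object rather than the in-memory sample.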

Re: [Wikitech-l] User-Agent:

2010-02-16 Thread Anthony
On Mon, Feb 15, 2010 at 8:54 PM, Domas Mituzas midom.li...@gmail.com wrote: Hi! from now on specific per-bot/per-software/per-client User-Agent header is mandatory for contacting Wikimedia sites. Domas Hi, Whose decision was this? Were Erik, Sue, or Danese involved?
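
In practice, the requirement quoted above means that a script has to set its own `User-Agent` instead of sending none. The following is a minimal sketch using Python's standard `urllib.request`; the bot name, version, URL, and contact address are placeholders, and `build_request` is a hypothetical helper, not something prescribed in the thread.

```python
import urllib.request

# Placeholder identifier: tool name, version, and contact details are made up.
USER_AGENT = "ExampleWikiBot/0.1 (https://example.org/bot; bot-owner@example.org)"


def build_request(url, user_agent=USER_AGENT):
    """Return a Request carrying an explicit, client-specific User-Agent."""
    if not user_agent.strip():
        raise ValueError("User-Agent must not be empty")
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


req = build_request("https://en.wikipedia.org/w/api.php?action=query&meta=siteinfo&format=json")
```

Passing `req` to `urllib.request.urlopen` would send the request with that header; the sketch stops short of the network call so it can be run offline.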

Re: [Wikitech-l] User-Agent:

2010-02-16 Thread Domas Mituzas
Hi! Whose decision was this? Mine. Were Erik, Sue, or Danese involved? No. Domas ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] User-Agent:

2010-02-16 Thread Anthony
On Tue, Feb 16, 2010 at 10:31 AM, Domas Mituzas midom.li...@gmail.com wrote: Hi! Whose decision was this? Mine. Were Erik, Sue, or Danese involved? No. Cool. Who's your boss, and who's your boss's boss? Sorry, I couldn't find you in the org chart or I'd just have looked that up

Re: [Wikitech-l] User-Agent:

2010-02-16 Thread Domas Mituzas
William, I am saying that going forward you have eliminated WMF's ability to use a tertiary tool that you agree was helpful. I can't say, that we entirely eliminated it - we transform it a bit, I guess. Having spent a lot of time dealing with abuse early in the Web's history, I wouldn't

Re: [Wikitech-l] User-Agent:

2010-02-16 Thread Domas Mituzas
Cool. Who's your boss, and who's your boss's boss? Sorry, I couldn't find you in the org chart or I'd just have looked that up myself. Nobody? Been like that for ages, haven't it? Domas

Re: [Wikitech-l] User-Agent:

2010-02-16 Thread Anthony
On Tue, Feb 16, 2010 at 10:39 AM, Domas Mituzas midom.li...@gmail.com wrote: Cool. Who's your boss, and who's your boss's boss? Sorry, I couldn't find you in the org chart or I'd just have looked that up myself. Nobody? Really? Were you doing this work as a contractor, or as a

Re: [Wikitech-l] User-Agent:

2010-02-16 Thread Marco Schuster
On Tue, Feb 16, 2010 at 4:44 PM, Anthony wikim...@inbox.org wrote: On Tue, Feb 16, 2010 at 10:39 AM, Domas Mituzas midom.li...@gmail.com wrote: Been like that for ages, haven't it? No idea.  For ages you've been able to just go onto the Wikimedia servers and change whatever you feel like, and

Re: [Wikitech-l] User-Agent:

2010-02-16 Thread Domas Mituzas
Hi! Really? Were you doing this work as a contractor, or as a volunteer? Volunteer. Someone's gotta be in charge of the contractors and/or the volunteers, no? Dunno, Cary maybe? :) On the other hand, even if they are in charge, it doesn't mean that they are my bosses :-) No idea. For

Re: [Wikitech-l] User-Agent:

2010-02-16 Thread Anthony
On Tue, Feb 16, 2010 at 11:04 AM, Domas Mituzas midom.li...@gmail.com wrote: No idea. For ages you've been able to just go onto the Wikimedia servers and change whatever you feel like, and answer to nobody? You must be misunderstanding my question or something. Kind of. Isn't that a good

Re: [Wikitech-l] User-Agent:

2010-02-16 Thread David Gerard
On 16 February 2010 16:13, Anthony wikim...@inbox.org wrote: On Tue, Feb 16, 2010 at 11:04 AM, Domas Mituzas midom.li...@gmail.com wrote: No idea.  For ages you've been able to just go onto the Wikimedia servers and change whatever you feel like, and answer to nobody?  You must be

Re: [Wikitech-l] User-Agent:

2010-02-16 Thread Ariel T. Glenn
In fact some WMF paid employees (including me) were in the channel at that time and agreed with the decision. It seemed then and still seems to me a reasonable course of action given the circumstances. I understand it's aggravating to people who didn't get notice; let's look forward. Please just

Re: [Wikitech-l] User-Agent:

2010-02-16 Thread Cary Bass
Wikimedia Secure Info wrote: Hi! Really? Were you doing this work as a contractor, or as a volunteer? Volunteer. Someone's gotta be in charge of the contractors and/or the volunteers, no? Dunno, Cary maybe? :) On the other hand, even if they are in charge, it doesn't

Re: [Wikitech-l] [mwdumper] new maintainer?

2010-02-16 Thread Brion Vibber
On 2/16/10 7:03 AM, Jamie Morken wrote: Ok, the simple question: how many people prefer XML or sql dumps? I think we have a FAQ on this... http://meta.wikimedia.org/wiki/Download#What_happened_to_the_SQL_dumps.3F You *do* realize that such SQL dumps would have to be invented from whole cloth

Re: [Wikitech-l] User-Agent:

2010-02-16 Thread Steve Summit
Yes, that's precisely the violation of Postel's Law I was thinking of. Steve, someone is sending us this User-Agent, is that you?:)) No. :-| Let me tell you a story. Once upon a time, there was a browser named SeaMonkey... I have no idea what point you were trying to make there (I had

Re: [Wikitech-l] [mwdumper] new maintainer?

2010-02-16 Thread Brion Vibber
On 2/11/10 7:41 AM, emman...@engelhart.org wrote: Almost one month ago I reported a bug in mwdumper which seems to me to be critical. I simply can't use mwdumper with the itwiki XML dumps: https://bugzilla.wikimedia.org/show_bug.cgi?id=22137 I have extracted the problematic part of the

Re: [Wikitech-l] User-Agent:

2010-02-16 Thread Brion Vibber
On 2/16/10 9:33 AM, Cary Bass wrote: Domas: Have I given you a Barnstar lately? Thank you for your hard labors. I know it doesn't feed the kittehs, but: [ASCII-art barnstar]

Re: [Wikitech-l] User-Agent:

2010-02-16 Thread Tei
On 16 February 2010 02:54, Domas Mituzas midom.li...@gmail.com wrote: Hi! from now on specific per-bot/per-software/per-client User-Agent header is mandatory for contacting Wikimedia sites. Domas Looks OK to me. But this is the type of decision that often breaks existing stuff somewhere

Re: [Wikitech-l] More dump problems?

2010-02-16 Thread Ariel T. Glenn
This appears to have been caused by a brief period during which we were mistakenly rejecting requests without user agents from the CLI. This was fixed relatively quickly; the more recent dumps seem to be fine. Please let us know if there are any other problems. Ariel Glenn ar...@wikimedia.org

Re: [Wikitech-l] User-Agent:

2010-02-16 Thread Steve Summit
Domas wrote: We don't use UA as first step of analysis, it was helpful tertiary tool... But it's now being claimed (one might assume, in defense of the new policy) that disallowing missing User-Agent strings is cutting 20-50% of the (presumably undesirable) load. Which sounds pretty primary.

Re: [Wikitech-l] User-Agent:

2010-02-16 Thread Conrad Irwin
On 02/16/2010 06:57 PM, Steve Summit wrote: Presumably some percentage of that 20-50% will come back as the spammers realize they have to supply the string. Presumably we then start playing whack-a-mole. If you assume every problem is caused by actively malicious intelligent agents, then

Re: [Wikitech-l] User-Agent:

2010-02-16 Thread Steve Summit
Conrad wrote: Given the lack of any evidence, I assert that most of the percentage of people who a) notice a problem, b) care, c) know how to fix it; probably deserve to be using the resources anyway. Besides anyone who doesn't deserve but still fixes the problem will likely be able to, and

Re: [Wikitech-l] User-Agent:

2010-02-16 Thread Domas Mituzas
Hello, But it's now being claimed (one might assume, in defense of the new policy) that disallowing missing User-Agent strings is cutting 20-50% of the (presumably undesirable) load. Which sounds pretty primary. So which is it? Check the CPU drop on Monday:

Re: [Wikitech-l] Wikitech-l Digest, Vol 79, Issue 27

2010-02-16 Thread Aerik Sylvan
On Tue, 16 Feb 2010 21:31:35, Domas Mituzas midom.li...@gmail.com wrote: Random strings are easy to identify, fixed strings are easy to verify. Maybe this is naive of me, but that sounds like an interesting problem. It seems to me that randomized strings that are made of real words are kind

Re: [Wikitech-l] User-Agent:

2010-02-16 Thread Steve Summit
Ariel Glenn wrote: I understand it's aggravating to people who didn't get notice; let's look forward. Please just add the UA header and your tools / bots / etc. will be back to working. Thanks. Well, sorry, no, it's not quite like that. A few of us -- though I fear an inconsequential

[Wikitech-l] Using Template:Clayden

2010-02-16 Thread Mohamed Magdy
Hi I wanted to use http://en.wikipedia.org/wiki/Template:Clayden on a wiki so I went to Special:Export and added Template:Clayden and checked Include templates and got the file which I then imported into a new MW. I expected to get the same output as on enwiki but I got an error instead.

Re: [Wikitech-l] User-Agent:

2010-02-16 Thread Robert Rohde
If you are going to do such blocking, can we PLEASE finally find a way to set up a more informative error message for blocked user agents? I have long ago lost track of how many people come to WP:VPT and other places complaining that they are trying to write a bot / script / etc., and it isn't working

Re: [Wikitech-l] User-Agent:

2010-02-16 Thread Steve Summit
Robert Rohde wrote: If you are going to do such blocking, can we PLEASE finally find a way to set up a more informative error message for blocked user agents... When the new code blocks requests with missing User Agent strings (which is, oddly, not all of the time), it is with a 403 Forbidden
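
Until the server side returns a more informative message, a client can at least annotate the bare 403 itself. The following is a rough sketch, assuming Python's standard `urllib`; the function name `explain_http_error` and the wording of the hints are invented for illustration and do not reflect any actual Wikimedia error text.

```python
import urllib.error


def explain_http_error(err, sent_user_agent):
    """Turn an HTTPError into a hint about a possible User-Agent block."""
    if err.code == 403 and not sent_user_agent.strip():
        return ("403 Forbidden: the request carried no User-Agent; "
                "set a descriptive one and retry")
    if err.code == 403:
        return ("403 Forbidden: the User-Agent %r may be blocked; "
                "check that it identifies your tool" % sent_user_agent)
    return "HTTP %d: %s" % (err.code, err.reason)


# Locally constructed error for demonstration; no network access is involved.
demo_error = urllib.error.HTTPError(
    "https://en.wikipedia.org/w/api.php", 403, "Forbidden", None, None)
demo_hint = explain_http_error(demo_error, "")
```

A script would call this from an `except urllib.error.HTTPError` clause around `urllib.request.urlopen`, passing whatever `User-Agent` value it actually sent.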

Re: [Wikitech-l] User-Agent:

2010-02-16 Thread K. Peachey
What made you guys think it was a good idea to block these or the apple rss reader without prior notice on the mailing lists (perhaps 24hrs worth)? I'm aware of server bots that have broken because of this... -Peachey

Re: [Wikitech-l] User-Agent:

2010-02-16 Thread Anthony
On Tue, Feb 16, 2010 at 11:32 AM, Ariel T. Glenn ar...@wikimedia.org wrote: In fact some WMF paid employees (including me) were in the channel at that time and agreed with the decision. It seemed then and still seems to me a reasonable course of action given the circumstances. I understand

Re: [Wikitech-l] Using Template:Clayden

2010-02-16 Thread Aryeh Gregor
On Tue, Feb 16, 2010 at 3:57 PM, Mohamed Magdy mohamed@gmail.com wrote: Hi I wanted to use http://en.wikipedia.org/wiki/Template:Clayden on a wiki so I went to Special:Export and added Template:Clayden and checked Include templates and got the file which I then imported into a new MW.  I

Re: [Wikitech-l] User-Agent:

2010-02-16 Thread Anthony
On Tue, Feb 16, 2010 at 2:31 PM, Domas Mituzas midom.li...@gmail.com wrote: Presumably some percentage of that 20-50% will come back as the spammers realize they have to supply the string. Presumably we then start playing whack-a-mole. Yes, we will ban all IPs participating in this.

Re: [Wikitech-l] User-Agent:

2010-02-16 Thread Domas Mituzas
Anthony, Yes, we will ban all IPs participating in this. Guess it's just a matter of time until *reading* Wikipedia is unavailable to large portions of the world. Your insight is entirely bogus here. And Mozilla/4.0 (compatible; MSIE 7.0; Windows NT

Re: [Wikitech-l] User-Agent:

2010-02-16 Thread Domas Mituzas
Hi! 1) Only if you've already identified the spammer through some other process (otherwise, you don't even know if they're using automated software). You probably don't get the scale of Wikipedia or the scale of the behavior we had to deal with, if you think that it isn't possible to notice behavior

Re: [Wikitech-l] User-Agent:

2010-02-16 Thread Marco Schuster
Hi, On Tue, Feb 16, 2010 at 8:31 PM, Domas Mituzas midom.li...@gmail.com wrote: You can sure assume, that we need to come up with something to defend a new policy. Yeah, ban no/broken-UA clients for these things that do cause CPU load, but leave article reading unharmed. Normal readers with

Re: [Wikitech-l] User-Agent:

2010-02-16 Thread Anthony
Anyway, you probably are missing one important point. We're trying to make Wikipedia's service better. I'm sure you are. But that doesn't mean I agree with your methods. Probably everything looks easier from your armchair. I'd love to have that view! :) Then stop volunteering.

Re: [Wikitech-l] User-Agent:

2010-02-16 Thread John Doe
Anthony, I'm only going to say this once: either shut the fuck up or put your money where your mouth is and develop and propose a valid replacement, and stop whining when others take actions that are deemed necessary for the betterment of Wikimedia-related projects. User agents are not a big deal,

Re: [Wikitech-l] User-Agent:

2010-02-16 Thread John Vandenberg
On Wed, Feb 17, 2010 at 1:00 PM, Anthony wikim...@inbox.org wrote: On Wed, Feb 17, 2010 at 11:57 AM, Domas Mituzas midom.li...@gmail.com wrote: Probably everything looks easier from your armchair. I'd love to have that view! :) Then stop volunteering. Did you miss the point? The graphs

Re: [Wikitech-l] User-Agent:

2010-02-16 Thread Anthony
On Tue, Feb 16, 2010 at 9:47 PM, John Vandenberg jay...@gmail.com wrote: On Wed, Feb 17, 2010 at 1:00 PM, Anthony wikim...@inbox.org wrote: On Wed, Feb 17, 2010 at 11:57 AM, Domas Mituzas midom.li...@gmail.com wrote: Probably everything looks easier from your armchair. I'd love to have

Re: [Wikitech-l] [mwdumper] new maintainer?

2010-02-16 Thread Jamie Morken
On Tue, 16 Feb 2010 09:34:41 -0800, Brion Vibber br...@pobox.com wrote: On 2/16/10 7:03 AM, Jamie Morken

Re: [Wikitech-l] importing enwiki into local database

2010-02-16 Thread Eric Sun
Even after setting $wgUseTidy = true, many of my pages show an error Expression error: Missing operand for at the bottom in the References section. It looks like an error message produced by the ParserFunctions package. I've never seen this on en.wikipedia.org so I wonder if some customization

Re: [Wikitech-l] More dump problems?

2010-02-16 Thread Tomasz Finc
Ariel, Can we isolate and yank the builds that happened to go during this window? --tomasz Ariel T. Glenn wrote: This appears to have been caused by a brief period during which we were mistakenly rejecting requests without user agents from the CLI. This was fixed relatively quickly; the

Re: [Wikitech-l] User-Agent:

2010-02-16 Thread Jamie Morken
On Wed, 17 Feb 2010 13:47:47 +1100, John Vandenberg jay...@gmail.com wrote:

Re: [Wikitech-l] enwiki complete page edit history

2010-02-16 Thread Tomasz Finc
It sadly failed as noted in http://lists.wikimedia.org/pipermail/xmldatadumps-admin-l/2010-January/78.html I've updated the index to clear that up. --tomasz Jamie Morken wrote: Hi, I was looking at the enwiki dump progress and noticed the file size for the enwiki

Re: [Wikitech-l] User-Agent:

2010-02-16 Thread John Vandenberg
On Wed, Feb 17, 2010 at 2:51 PM, Jamie Morken jmor...@shaw.ca wrote: Don't forget some normal traffic was blocked from this unannounced change, ie. Google's translate service?  How much of the traffic reduction was from services like this?  Some of the cited reduced traffic proving the

Re: [Wikitech-l] User-Agent:

2010-02-16 Thread Robert Rohde
In the interest of proactive discussion (rather than griping), why don't we discuss better ways to manage bad bots, etc. I don't know what internal tools currently exist but it seems to me like there ought to be better opportunities for traffic monitoring than UA blocks. For example, we have the

Re: [Wikitech-l] User-Agent:

2010-02-16 Thread John Doe
That would work if single IPs were the main cause. What happens is that the IPs are widely distributed, which leads to either blocking whole ISP ranges or being unable to easily identify DDoS-like behavior. John On Tue, Feb 16, 2010 at 11:22 PM, Robert Rohde raro...@gmail.com wrote:

Re: [Wikitech-l] User-Agent:

2010-02-16 Thread Tim Starling
Anthony wrote: Probably everything looks easier from your armchair. I'd love to have that view! :) Then stop volunteering. John Vandenberg wrote: I am even less in favour of Domas retiring to an armchair, and think that anyone suggesting that is deluding themselves about Wikimedia's need

Re: [Wikitech-l] User-Agent:

2010-02-16 Thread Anthony
On Tue, Feb 16, 2010 at 11:18 PM, John Vandenberg jay...@gmail.com wrote: With this solution, it is now possible to determine how much of the traffic was from valid services. i.e. google translate and other useful services will identify themselves And what separates google translate from

Re: [Wikitech-l] User-Agent:

2010-02-16 Thread Anthony
On Tue, Feb 16, 2010 at 11:32 PM, Tim Starling tstarl...@wikimedia.org wrote: I think it's common knowledge among people who have been reading these lists for a long time, that Anthony has a serious deficit in his sarcasm detection department, and often gives inappropriate responses to

Re: [Wikitech-l] Using Template:Clayden

2010-02-16 Thread Mohamed Magdy
On Wed, Feb 17, 2010 at 2:00 AM, Aryeh Gregor simetrical+wikil...@gmail.com wrote: Without looking into it at all, I'd guess that the Wikipedia template relies on ParserFunctions features from later versions. Wikimedia sites are using version 1.3.0 of