[Wikitech-l] Proposed Program Architecture Summit 2014
Heya,

The Program Committee for the Architecture Summit has published a proposed program: https://www.mediawiki.org/wiki/Architecture_Summit_2014

Highlights of the Program:

1) We have tried to build flexibility into the program by allowing 4 unconference break-out slots, 1 open plenary session and a daily 'agenda bashing' session to make adjustments to the program if the need arises.

2) There are 3 plenary sessions: HTML Templating (https://www.mediawiki.org/wiki/Architecture_Summit_2014/HTML_templating), Service Oriented Architecture (https://www.mediawiki.org/wiki/Architecture_Summit_2014/Service-oriented_architecture) and one open slot. Possible candidates for the open session include Performance and UI styling, but this will be decided during the Summit. The short list will include the highest vote-getters in the straw poll, so if there is a cluster you strongly feel should be part of the program, now is your time to make that case.

3) There are 6 breakout sessions:
* 2 planned: UI styling (https://www.mediawiki.org/wiki/Architecture_Summit_2014/UI_styling) and Storage Services (https://www.mediawiki.org/wiki/Architecture_Summit_2014/Storage_services)
* 4 unconference slots

There will be a round of Lightning Talks at the beginning of the plenary session following the breakout sessions, to summarize what happened by answering the following questions:
a) What did you try to achieve?
b) What did you decide?
c) What are the next steps?

4) Architecture Panel and Value discussion. This is a plenary session for the architects to share what they value in good architecture, to talk about how they see the architecture of MediaWiki evolving, and to discuss what role people other than our historical core group of three have to play in the process. During this session, we hope to answer at least some of the questions raised in the recent discussion of the RFC process[1].

5) RFC roulette: a one-hour closing session for RFC's that have not been in the spotlight during the Summit and whose next step can be decided there. This is intended to be fast-paced and slightly chaotic. If you would like to hear what's next for your RFC, please participate in the roulette by adding your name to https://www.mediawiki.org/wiki/Architecture_Summit_2014/RFC_roulette

What does the Program Committee expect from summit participants?

a) If you are a participant, please familiarize yourself with the latest version of the RFC's that you care about.
b) If you are an author of an RFC that is scheduled in a plenary session, please start preparing, in collaboration with the other authors from the same session, a short slide deck that summarizes all the different RFC's. One slide that could be really useful is a matrix that highlights key differences between alternative / competing proposals. Diederik will contact the folks who are invited for the plenary session to help coordinate and organize.
c) If you are an author of an RFC that is scheduled in a breakout session, please create at most 3 slides that summarize your RFC, and think about what you want to get out of your session. The slides are optional, but the requisite level of preparation is not.
d) If you want to run an unconference slot, then start thinking about a theme and possible co-organizers. It's okay to use an existing RFC cluster.
Two quick final notes:

The Program has been created using input from the straw poll (https://www.mediawiki.org/wiki/Architecture_Summit_2014/Straw_poll), input from the Program Committee and input from the Engineering Community Team.

To see which RFC's compose a cluster, please have a look at https://www.mediawiki.org/wiki/Architecture_Summit_2014/RFC_clusters

Looking forward to your feedback!

Best regards,
The Program Committee

[1] Discussion of the RFC process: https://www.mediawiki.org/wiki/Talk:Requests_for_comment/Process#
Re: [Wikitech-l] Proposed Program Architecture Summit 2014
On Thu, Jan 16, 2014 at 10:11 PM, legoktm legoktm.wikipe...@gmail.com wrote:

> Hi,
> Given that the Configuration cluster had the second-highest number of votes in the poll, why was it left off the agenda entirely?

We have been going back and forth between a plenary session and a breakout session for Configuration; it's definitely on the table and I am 99.9% sure that we will have a session dedicated to it. It's just not officially slotted anywhere right now, but I will check with Robla and see how we will schedule this important cluster. Its absence from the program right now should not be seen as a sign that it's not important; on the contrary, we are just trying to find the most appropriate slot for it.

D

> On Thu, Jan 16, 2014 at 10:47 AM, Diederik van Liere dvanli...@wikimedia.org wrote:
>
> > [...]
> > 2) There are 3 plenary sessions: HTML Templating, Service Oriented Architecture and one open slot. Possible candidates for the open session include Performance and UI styling, but this will be decided during the Summit. The short list will include the highest vote-getters in the straw poll, so if there is a cluster you strongly feel should be part of the program, now is your time to make that case.
>
> Make that case where? Given that slide decks are supposed to be prepared, shouldn't this be decided beforehand rather than waiting?
>
> > [...]
[Wikitech-l] Final chance to vote in Architecture Summit straw poll
Heya,

Today, January 8th, until 11:59 PM PST, you can vote in the straw poll for the Architecture Summit. Please cast your votes here: https://www.mediawiki.org/wiki/Architecture_Summit_2014/Straw_poll

Tomorrow I will start creating the program for the Summit and cannot promise that votes cast after that will still be included. Thanks to all the folks who have voted so far.

D
[Wikitech-l] Straw poll Architecture Summit closing January 8th
Heya,

If you haven't exercised your right to cast a vote in the Straw Poll for the Architecture Summit, then this would be a really good time to do so. I would like to close the poll by January 8th so we can start putting together the final program. You can find the straw poll here: https://www.mediawiki.org/wiki/Architecture_Summit_2014/Straw_poll

Thanks for your help, and please help me drive up the turnout - the more votes the better!

D
Re: [Wikitech-l] RFC on PHP profiling
@Chad: should this be included in the straw poll for the Architecture Summit, or is that too soon?

D

On Tue, Dec 31, 2013 at 6:55 PM, Chad innocentkil...@gmail.com wrote:

> I'm starting a new RFC to discuss ways we can improve our PHP profiling.
> https://www.mediawiki.org/wiki/Requests_for_comment/Better_PHP_profiling
>
> Please feel free to help expand and/or comment on the talk page if you've got ideas :)
>
> -Chad
[Wikitech-l] Straw poll to determine program for Architecture Summit
Hi everyone,

Best wishes for 2014! I hope that participating in the straw poll for the Architecture Summit is on your list of New Year's resolutions: https://www.mediawiki.org/wiki/Architecture_Summit_2014/Straw_poll

Quick refresher: we have created clusters of related RFC's (https://www.mediawiki.org/wiki/Architecture_Summit_2014/RFC_clusters) for the upcoming Architecture Summit. We are pretty happy with the clustering, so now we want to hear from you which clusters you think are the most important and should definitely be included in the program. The poll will close Thursday, January 8th.

If you have any questions then please let me know!

Best,
Diederik
[Wikitech-l] Fwd: Proposal for biweekly Labs showcase
Bumping my proposal. I am particularly looking forward to responses from community members. Have an awesome New Year's Eve!

Best,
Diederik

> Heya,
>
> I just posted an initial proposal to start running a biweekly showcase to feature all the cool things that are happening on Labs. Please have a look at https://wikitech.wikimedia.org/wiki/Showcase, express your interest and chime in on the Talk page to help get this off the ground.
>
> Best,
> Diederik
[Wikitech-l] Clustering of RFC's for the Architecture Summit
Heya,

We are making good progress with creating clusters to group RFC's for the upcoming Architecture Summit. Some clusters are still too big; in particular, the following clusters can/should be split into smaller clusters of 3-4 RFC's each:

* General Mediawiki Functionality
* Backend code modularity frameworks
* Installation
* SOA
* UI/UX: styling

Please have a look at https://www.mediawiki.org/wiki/Architecture_Summit_2014/RFC_clusters and help us finalize the clustering! Thanks!

Best,
D
[Wikitech-l] Suggested process for determining topics at the Architecture Summit
Hola,

We wanted to update you on our proposal for how to create a program for the Architecture Summit coming January. We started grouping the RFC's from https://www.mediawiki.org/wiki/RFC into clusters at https://www.mediawiki.org/wiki/Architecture_Summit_2014/RFC_Clusters. This clustering is neither perfect nor complete, and we would like to ask your help with fine-tuning it.

The idea is that a cluster contains RFC's that belong together -- if we discuss RFC A and therefore RFC B needs to be discussed as well, then those two RFC's should be in the same cluster. Clusters should also be small, probably not more than 3 or 4 RFC's per cluster. Sometimes RFC's in the same cluster offer alternative suggested implementations; sometimes RFC's are closely related because they pursue a similar goal. Currently, we have one big cluster called 'General Mediawiki Functionality', and this list definitely needs to be broken up into smaller clusters. The 'Misc' cluster can probably also be broken up into smaller clusters.

Once we have nailed down the clusters of RFC's, we will run a straw poll to gauge interest in the different clusters. The straw poll will inform our decision on which clusters should be discussed at the Architecture Summit. We want to launch this straw poll on January 2nd, 2014 at the latest.

Summary:
1) Help us finalize the clustering of RFC's on https://www.mediawiki.org/wiki/Architecture_Summit_2014/RFC_Clusters
2) Participate in the straw poll once it goes live (probably January 2nd); a separate email will follow.

If you have any questions, thoughts, suggestions, remarks, etc., please let us know!

Best,
Diederik
Re: [Wikitech-l] Architecture Summit -- Gathering all relevant RfC's
On Wed, Nov 27, 2013 at 2:55 PM, Jon Robson jdlrob...@gmail.com wrote:

> One that I would like to discuss but still need to write up is JavaScript template support in ResourceLoader. Mobile has been using Hogan.js for some time and we would like to upstream this as a standard. I'll try and get this written up in the next 2 weeks, but it would be good to capture this even in stub-like form (not sure if stubs are allowed on the RFC page).

Hey Jon,

If there's anything I can do to help you with this RfC then please let me know.

Best,
Diederik

On Tue, Nov 26, 2013 at 6:27 PM, Diederik van Liere dvanli...@wikimedia.org wrote:

> Heya,
>
> The Architecture Summit will be upon us in less than two months. To make sure that this Summit is going to be productive, it is important that we discuss the right RfC's. Before deciding which RfC's should be discussed at the Summit, I want to make sure that https://www.mediawiki.org/wiki/Requests_for_comment contains all RfC's and that all important topics have an RfC.
>
> If you have a MediaWiki-related RfC in a personal notepad, on your User Page, or in your mind, then this would be a great moment to write or move it under https://www.mediawiki.org/wiki/Requests_for_comment and add an entry to the table. If you don't have 'move' rights then please let me know and I can move it for you.
>
> If you know of a topic that *should* have an RfC but does not yet have one, then please reply to this list mentioning the topic. I will check with Tim/Brion to see how these topics can get an RfC.
>
> Once we have collected all relevant RfC's under https://www.mediawiki.org/wiki/Requests_for_comment, I will make a page where everybody can express their interest in which RfC's should be discussed at the Summit.
>
> Questions? Let me know!
>
> Best,
> Diederik

> --
> Jon Robson
> http://jonrobson.me.uk
> @rakugojon
Re: [Wikitech-l] [Analytics] [WikimediaMobile] Mobile stats
Heya,

I would suggest running it for at least a 7-day period so you capture the weekly time trends; increasing the sample size would also be advisable. We can help set up a udp-filter for this purpose, as long as the data can be extracted from the user-agent string.

D

On Wed, Sep 4, 2013 at 1:50 PM, Arthur Richards aricha...@wikimedia.org wrote:

> Thanks Max for digging into this :) I'm no analytics guy, but I am a little concerned about the sample size and duration of the internal logging that we've done - sampling 1/1 for only a few days, for something whose usage we generally know to already be low, seems to me like it might make it difficult to get accurate numbers. Can someone from the analytics team chime in and let us know if the approach is sound and if we should trust the data Max has come up with? This has big implications, as it will play a role in determining whether or not we continue supporting WAP devices and providing WAP access to the sites. Thanks everyone!
>
> On Tue, Sep 3, 2013 at 10:40 AM, Erik Zachte ezac...@wikimedia.org wrote:
>
> > Sadly, you need to take squid-log-based reports with a grain of salt. Several incomplete maintenance jobs have taken their toll. Each report starts with a long list of unsolved bugs, among those https://bugzilla.wikimedia.org/show_bug.cgi?id=46273
> > So yeah, better trust your own data.
> > Erik
> >
> > -----Original Message-----
> > From: Max Semenik
> > Sent: Tuesday, September 03, 2013 5:33 PM
> > To: analyt...@lists.wikimedia.org; Wikimedia developers; mobile-l
> > Subject: [Analytics] Mobile stats
> >
> > Hi, I have a few questions regarding mobile stats. I need to determine the real percentage of WAP browsers. At first glance, [1] looks interesting: the ratio of text/html to text/vnd.wap.wml is 92M / 3987M = 2.3% on m.wikipedia.org. However, this contradicts the stats at [2], which have different numbers and a different ratio.
> >
> > I did my own research: because WAPness is detected during browser detection in Varnish mostly by looking at the Accept header, and because our current analytics infrastructure doesn't log that header, I quickly whipped up some code that recorded the user-agent and Accept headers of every 10,000th request for mobile page views hitting the apaches. According to several days' worth of data, out of 14917 logged requests, 1445 contained vnd.wap.wml in the Accept: header in some form. That's more than what is logged for frontend responses; however, this is expected, as WAP should have a worse cache hit rate and thus hit the apaches more often.
> >
> > Next, our WAP detection code is very simple: the user-agent is checked against a few major browser IDs (all of them are HTML-capable, and this check is not actually needed anymore and will go away soon), and if still not known, we consider every device that sends an Accept: header with vnd.wap.wml (but not application/vnd.wap.xhtml+xml) to be WAP-only. If we apply these rules, we get only 68 entries that qualify as WAP, which is 0.05% of all mobile requests.
> >
> > The question is, what's wrong: my research or stats.wikimedia.org? And if it's indeed just 0.05%, we should probably^W definitely kill WAP support on our mobile site, as it's virtually unmaintained.
> >
> > [1] http://stats.wikimedia.org/wikimedia/squids/SquidReportRequests.htm
> > [2] http://stats.wikimedia.org/wikimedia/squids/SquidReportClients.htm
> >
> > --
> > Best regards,
> > Max Semenik ([[User:MaxSem]])
>
> --
> Arthur Richards
> Software Engineer, Mobile
> [[User:Awjrichards]]
> IRC: awjr
> +1-415-839-6885 x6687
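[Editor's note: for readers following along, here is a minimal Python sketch of the detection rule Max describes above. It is illustrative only, not the actual Varnish/analytics code; the log-record shape and the stand-in list of "major browser IDs" are assumptions.]

    import re

    # Assumption: a few stand-ins for the "major browser IDs" Max mentions;
    # the real list lives in the MobileFrontend/Varnish detection code.
    HTML_CAPABLE_UA = re.compile(r"iPhone|Android|Opera Mini|BlackBerry", re.I)

    def is_wap_only(user_agent, accept):
        # A request counts as WAP-only if the UA is not a known HTML-capable
        # browser and the Accept header offers vnd.wap.wml but not
        # application/vnd.wap.xhtml+xml.
        if HTML_CAPABLE_UA.search(user_agent):
            return False
        return ("vnd.wap.wml" in accept
                and "application/vnd.wap.xhtml+xml" not in accept)

    # Applied to a 1-in-10,000 sample of (user-agent, accept) pairs:
    sample = [("SomePhone/1.0", "text/vnd.wap.wml"),
              ("Mozilla/5.0 (iPhone)", "text/html,application/xhtml+xml")]
    wap = sum(is_wap_only(ua, acc) for ua, acc in sample)
    print("WAP-only share: %.2f%%" % (100.0 * wap / len(sample)))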
Re: [Wikitech-l] access log (pagecounts) dump stopped
Hi Cheol!

Thanks for alerting us to this issue. We are looking into it right now.

Best,
Diederik

On Mon, Aug 5, 2013 at 4:24 PM, Ryu Cheol rch...@gmail.com wrote:

> Hello guys,
> http://dumps.wikimedia.org/other/pagecounts-raw/2013/2013-08/ has not been updated for a few hours. I don't know who keeps this running. Would you please let him know?
> Cheers!
> Cheol
Re: [Wikitech-l] access log (pagecounts) dump stopped
Hi Cheol,

The cronjob was broken due to some maintenance on the dumps server. The cronjob is being fixed right now, and no data has been lost. In a couple of hours all files should be present again. If you still see an issue in 48 hours then please ping me.

Best,
Diederik

On Mon, Aug 5, 2013 at 5:09 PM, Diederik van Liere dvanli...@wikimedia.org wrote:

> [...]
Re: [Wikitech-l] RFC: Tab as field delimiter in logging format of cache servers
This bug has been fixed; see https://bugzilla.wikimedia.org/show_bug.cgi?id=45178. I will post a message on the Village Pump as well.

Best,
Diederik

On Sun, Feb 3, 2013 at 3:44 PM, Brad Jorsch bjor...@wikimedia.org wrote:

> On Fri, Jan 25, 2013 at 12:51 PM, Diederik van Liere dvanli...@wikimedia.org wrote:
>
> > No, the output format of http://dumps.wikimedia.org/other/pagecounts-raw/ will stay the same.
>
> It seems that page names are coming through with spaces now, where they didn't before. See https://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28technical%29#Format_Change_of_Page_View_Stats
Re: [Wikitech-l] [Wmfall] Yuri Astrakhan & Adam Baso join Mobile department partner team
Awesome news! Go team Mobile!

D

On Mon, Mar 18, 2013 at 1:59 PM, Rachel Farrand rfarr...@wikimedia.org wrote:

> Welcome Adam and Yuri! Looking forward to working with both of you. :)
> Rachel
>
> On Mon, Mar 18, 2013 at 10:48 AM, Erik Moeller e...@wikimedia.org wrote:
>
> > On Mon, Mar 18, 2013 at 10:29 AM, Tomasz Finc tf...@wikimedia.org wrote:
> >
> > > I'm pleased to announce that the mobile department has two new staff members. Yuri Astrakhan & Adam Baso join as sr. software developers on the mobile partner team.
> >
> > Welcome on board, guys. Really looking forward to the next steps with WP Zero. :-)
> > Erik
> >
> > --
> > Erik Möller
> > VP of Engineering and Product Development, Wikimedia Foundation
> > Support Free Knowledge: https://wikimediafoundation.org/wiki/Donate
Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews
Thanks Asher for tying this up! I was about to write a similar email :) One final question, just to make sure we are all on the same page: is the X-CS field becoming a generic key/value pair for tracking purposes?

D

On Fri, Feb 15, 2013 at 11:16 AM, Asher Feldman afeld...@wikimedia.org wrote:

> Just to tie this thread up - the issue of how to count ajax-driven pageviews loaded from the api, and of how to differentiate those requests from secondary api page requests, has been resolved without the need for code or logging changes. Tagging of the mobile beta site will be accomplished via a new generic mediawiki http response header dedicated to logging, containing key/value pairs.
>
> -Asher
>
> On Tue, Feb 12, 2013 at 9:56 AM, Asher Feldman afeld...@wikimedia.org wrote:
>
> > > It does still seem to me that the data to determine secondary api requests should already be present in the existing log line. If the value of the page param in an action=mobileview api request matches the page in the referrer (perhaps with normalization), it's a secondary request as per case 1 below. Otherwise, it's a pageview as per case 2. Difficult or expensive to reconcile? Not when you're doing distributed log analysis via hadoop.
> >
> > So I did look into this prior to writing the RFC, and the issue is that a lot of API referrers don't contain the query string. I don't know what triggers this, so if we can fix this then we can definitely derive the secondary pageview request from the referrer field.
> > D
>
> If you can point me to some examples, I'll see if I can find any insights into the behavior.

> On Mon, Feb 11, 2013 at 7:11 PM, Arthur Richards aricha...@wikimedia.org wrote:
>
> > Thanks, Jon. To try and clarify a bit more about the API requests... they are not made on a per-section basis. As I mentioned earlier, there are two cases in which article content gets loaded by the API:
> >
> > 1) Going directly to a page (eg clicking a link from a Google search) will result in the backend serving a page with ONLY summary section content and section headers. The rest of the page is lazily loaded via an API request once the JS for the page gets loaded. The idea is to increase responsiveness by reducing the delay for an article to load (further details in the article Jon previously linked to). The API request looks like: http://en.m.wikipedia.org/w/api.php?format=json&action=mobileview&page=Liverpool+F.C.+in+European+football&variant=en&redirects=yes&prop=sections%7Ctext&noheadings=yes&sectionprop=level%7Cline%7Canchor&sections=all
> >
> > 2) Loading an article entirely via Javascript - like when a link is clicked in an article to another article, or an article is loaded via search. This will make ONE call to the API to load article content. The API request looks like: http://en.m.wikipedia.org/w/api.php?format=json&action=mobileview&page=Liverpool+F.C.+in+European+football&variant=en&redirects=yes&prop=sections%7Ctext&noheadings=yes&sectionprop=level%7Cline%7Canchor&sections=all
> >
> > These API requests are identical, but only #2 should be counted as a 'pageview' - #1 is a secondary API request and should not be counted as a 'pageview'. You could make the argument that we just count all of these API requests as pageviews, but there are cases when we can't load article content from the API (like devices that do not support JS), so we need to be able to count the traditional page request as a pageview - thus we need a way to differentiate the types of API requests being made when they otherwise share the same URL.
> >
> > On Mon, Feb 11, 2013 at 6:42 PM, Jon Robson jdlrob...@gmail.com wrote:
> >
> > > I'm a bit worried that now we are asking why pages are lazy loaded rather than focusing on the fact that they currently __are doing this__ and how we can log it (if we want to discuss this further, let's start another thread, as I'm getting extremely confused doing so on this one).
> > >
> > > Lazy loading sections: the motivation behind moving MobileFrontend in the direction of lazy loading section content and subsequent pages can be found here [1]; I just gave it a refresh, as it was a little out of date. In summary, the reasons are to 1) make the app feel more responsive by simply loading content rather than reloading the entire interface and 2) reduce the payload sent to a device.
> > >
> > > Session tracking: going back to the discussion of tracking mobile page views, it sounds like a header stating whether a page is being viewed in alpha, beta or stable works fine for standard page views.
Re: [Wikitech-l] Page view stats we can believe in
Hi all,

Lars, Rupert, thanks for flagging this, and you are quite right: the numbers are too high because webstatscollector, the software that does the counts, counts every request as a hit, including bots, error pages, etc.

I am planning to run a sprint at the Amsterdam Hackathon to build an easily queryable datastore with clean pageview counts. Please let me know if you are interested in this so I can pitch it.

Best,
Diederik

On Wed, Feb 13, 2013 at 3:36 PM, Lars Aronsson l...@aronsson.se wrote:

> On 02/14/2013 12:03 AM, rupert THURNER wrote:
>
> > this means 569 pages accessed in this hour, at least once.
>
> Thanks for taking the time to do this check! This number already is unreasonable for an obscure project with 8000 articles.
>
> > da.d Speciel:Eksporter/engelsk 2 7818
>
> Should Special:Export ever count as page views? Anyway, there are no humans using Special:Export on da.wiktionary in the middle of the night.
>
> > this means that e.g. springer was supposedly accessed 3 times in that hour. the article does not exist, but there is a red link out of http://da.wiktionary.org/wiki/Wiktionary:Top_1_(Dansk) .
>
> So are there some stupid bots that follow red links? There could be a large number of such accesses on Wiktionary (in any language) because there are so many red links. But bots should never be counted among the page views.
>
> --
> Lars Aronsson (l...@aronsson.se)
> Aronsson Datateknik - http://aronsson.se
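[Editor's note: to make concrete what "clean pageview counts" would have to filter out, here is a hedged Python sketch based on the problems named in this thread. The field names and the bot heuristic are illustrative assumptions, not webstatscollector's actual logic.]

    def is_countable_pageview(title, user_agent, status):
        # Skip error responses; webstatscollector currently counts these too.
        if status != 200:
            return False
        # Toy bot heuristic; real bot detection is considerably harder.
        ua = user_agent.lower()
        if "bot" in ua or "spider" in ua or "crawler" in ua:
            return False
        # Skip Special: pages such as the Special:Export hits Lars mentions
        # ("Speciel:Eksporter" is the localized namespace on da.wiktionary).
        if title.startswith(("Special:", "Speciel:")):
            return False
        return True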
Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews
> It does still seem to me that the data to determine secondary api requests should already be present in the existing log line. If the value of the page param in an action=mobileview api request matches the page in the referrer (perhaps with normalization), it's a secondary request as per case 1 below. Otherwise, it's a pageview as per case 2. Difficult or expensive to reconcile? Not when you're doing distributed log analysis via hadoop.

So I did look into this prior to writing the RFC, and the issue is that a lot of API referrers don't contain the query string. I don't know what triggers this, so if we can fix this then we can definitely derive the secondary pageview request from the referrer field.

D

On Mon, Feb 11, 2013 at 7:11 PM, Arthur Richards aricha...@wikimedia.org wrote:

> [...]
>
> On Mon, Feb 11, 2013 at 6:42 PM, Jon Robson jdlrob...@gmail.com wrote:
>
> > [...]
> >
> > As for the situations where an entire page is loaded via the api, it makes no difference to us whether we 1) send the same header (set via javascript) or 2) add a query string parameter. The only advantage I can see of using a header is that an initial page load of the article San Francisco currently uses the same api url as a page load of the article San Francisco via javascript (e.g. I click a link to 'San Francisco' in the California article). In this new method they would use different urls (as the data sent is different). I'm not sure how that would affect caching. Let us know which method is preferred. From my perspective the implementation of either is easy.
> >
> > [1] http://www.mediawiki.org/wiki/MobileFrontend/Dynamic_Sections
> >
> > On Mon, Feb 11, 2013 at 12:50 PM, Asher Feldman afeld...@wikimedia.org wrote:
> >
> > > Max - good answers re: caching concerns. That leaves studying whether the bytes transferred on an average mobile article view increase or decrease with lazy section loading. If it increases, I'd say this isn't a positive direction to go in, and stop there. If it decreases, then we should look at the effect on total latency, number of
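[Editor's note: returning to the referrer-matching heuristic quoted at the top of this message, here is a small Python sketch of it, using the mobileview URLs quoted earlier in this thread. The normalization step is an assumption (Asher says "perhaps with normalization"), and the heuristic would misfire exactly in the case Diederik notes, where the referrer's query string is missing.]

    from urllib.parse import urlparse, parse_qs, unquote

    def normalize(title):
        # Assumed normalization: treat '+', '_' and percent-encoding as spaces.
        return unquote(title).replace("+", " ").replace("_", " ")

    def is_secondary_request(api_url, referer):
        # Case 1 (secondary): the mobileview 'page' param names the article
        # the user is already on. Case 2 (pageview): it names a different one.
        query = parse_qs(urlparse(api_url).query)
        if query.get("action") != ["mobileview"]:
            return False
        page = normalize(query.get("page", [""])[0])
        referred_title = normalize(urlparse(referer).path.rsplit("/", 1)[-1])
        return page == referred_title

    api = ("http://en.m.wikipedia.org/w/api.php?format=json&action=mobileview"
           "&page=Liverpool+F.C.+in+European+football&sections=all")
    print(is_secondary_request(api, "http://en.m.wikipedia.org/wiki/Liverpool_F.C._in_European_football"))  # True: secondary
    print(is_secondary_request(api, "http://en.m.wikipedia.org/wiki/California"))  # False: a pageview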
Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews
> Analytics folks, is this workable from your perspective?

Yes, this works fine for us, and it's also no problem to set multiple key/value pairs in the http header that we are now using for the X-CS header.

Diederik
Re: [Wikitech-l] Fwd: RFC: Introducing two new HTTP headers to track mobile pageviews
Thanks Ori, I was not aware of this.

D

Sent from my iPhone

On 2013-02-02, at 16:55, Ori Livneh o...@wikimedia.org wrote:

> On Saturday, February 2, 2013 at 1:36 PM, Platonides wrote:
>
> > I don't like its cryptic nature. Someone looking at the headers sent to his browser would be very confused about what's the point of «X-MF-Mode: b». Instead, something like this would be much more descriptive:
> > X-Mobile-Mode: stable
> > X-Mobile-Request: secondary
> > But that also means sending more bytes through the wire :S
>
> Well, you can (and should) drop the 'X-' :-) See http://tools.ietf.org/html/rfc6648: Deprecating the X- Prefix and Similar Constructs in Application Protocols
>
> --
> Ori Livneh
[Wikitech-l] Fwd: RFC: Introducing two new HTTP headers to track mobile pageviews
(Apologies for cross-posting)

Heya,

The mobile team needs accurate pageviews for the alpha and beta mobile site. Currently, this information is only stored in a cookie, but we don't want to go the route of starting to log this cookie because of cache server performance, network performance and privacy policy issues.

The mobile team also needs to be able to differentiate between initial and secondary API requests: pages in the beta version of MobileFrontend are dynamically loaded via the API, meaning that MobileFrontend might make multiple API requests to load sections of an article when they are toggled open by the user. At the moment, we have no way of differentiating between API requests to determine which ones should count as a 'pageview'.

We propose setting two additional custom HTTP headers: one to identify the alpha/beta/stable version of MobileFrontend, the other to differentiate between initial and secondary API requests. This would make logging the necessary information trivial, and we believe it would be fairly lightweight to implement. We propose the following two headers with their possible values:

X-MF-Mode: a/b/s (alpha/beta/stable)
X-MF-Req: 1/2 (primary/secondary)

X-MF-Mode would be determined by Varnish based on the existence of the alpha/beta identifying cookies, while X-MF-Req would be set by MobileFrontend in the backend response. These headers would only be set on the Varnish servers; on the Squids/Nginx we will just set a dash ('-') in the log fields.

Questions:
1) Are there objections to the introduction of these two http headers?
2) We would like to aim for a late February deployment; is that an okay period? (We will announce the real deployment date as well.)
3) Are we missing anything important?

Thanks for your feedback!

Best,
Arthur & Diederik
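[Editor's note: as a rough illustration of how a log consumer could use the proposed headers once they appear as log fields, here is a Python sketch. The field positions are invented for the example (the proposal does not fix them), the '-' placeholder follows the Squid/Nginx behavior described above, and the tab-delimited format is the one proposed elsewhere in this archive.]

    from collections import Counter

    MODES = {"a": "alpha", "b": "beta", "s": "stable"}

    def count_mobile_pageviews(log_lines, mode_idx=14, req_idx=15):
        # Tally pageviews per MobileFrontend mode, skipping secondary API
        # requests (X-MF-Req: 2) and lines logged with the '-' placeholder.
        views = Counter()
        for line in log_lines:
            fields = line.rstrip("\n").split("\t")
            mode, req = fields[mode_idx], fields[req_idx]
            if mode == "-" or req == "2":
                continue
            views[MODES.get(mode, "unknown")] += 1
        return views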
Re: [Wikitech-l] [Analytics] RFC: Tab as field delimiter in logging format of cache servers
Yes, let's not change the filenames.

D

Sent from my iPhone

On 2013-01-31, at 18:45, Matthew Walker mwal...@wikimedia.org wrote:

> > We will most likely change the file names back to their original names in a month or so
>
> Please don't. It'll serve as a visible marker for the future for when we go back and look at the files and do a WTF.
>
> ~Matt Walker
[Wikitech-l] Nexus Maven repo
Heya,

For all you Java junkies out there - oh wait, there are very few within WMF :) Anyway, if you do Java, you can now use the Nexus Maven repo that is installed on Labs at http://nexus.wmflabs.org/nexus/index.html#welcome

We are happy to give you an account; please poke us on IRC in #wikimedia-analytics or email David Schoonover or me.

Best,
Diederik
[Wikitech-l] RFC: Tab as field delimiter in logging format of cache servers
(Apologies for cross-posting)

Heya,

The Analytics Team is planning to deploy tab as the field delimiter, replacing the current space delimiter, on the varnish/squid/nginx servers. We would like to do this on February 1st. The reason for this change is that we need a consistent number of fields in each webrequest log line. Right now, some fields contain spaces, which requires a lot of post-processing cleanup and slows down the generation of reports.

What is affected and maintained by Analytics:
* udp-filter: already has support for the tab character
* webstatscollector: we compiled a new version of filter to add support for the tab character
* wikistats: we will fix the scripts on an ongoing basis
* udp2log: we have a patch ready for inserting sequence numbers separated by tab

In particular, I would like to have feedback on three questions:
1) Are there important reasons not to use tab as the field delimiter?
2) Are there important pieces of logging that expect a space instead of a tab, that need to be fixed, and that I did not mention in this email?
3) Is February 1st a good date to deploy this change? (Assuming that all preparations are finished.)

Best,
Diederik
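[Editor's note: to see why the space delimiter breaks field counts, consider this toy Python example (the two-field record is invented, not the real field layout): a user-agent field that itself contains spaces splits into a variable number of columns, while a tab delimiter yields exactly one column per field.]

    # Two-field record: URL, then a user-agent containing spaces.
    space_line = "http://en.wikipedia.org/wiki/Foo Mozilla/5.0 (X11; Linux x86_64)"
    tab_line = "http://en.wikipedia.org/wiki/Foo\tMozilla/5.0 (X11; Linux x86_64)"

    print(len(space_line.split(" ")))   # 5 fields -- the UA bled into extra columns
    print(len(tab_line.split("\t")))    # 2 fields -- one per column, every time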
Re: [Wikitech-l] RFC: Tab as field delimiter in logging format of cache servers
No, the output format of http://dumps.wikimedia.org/other/pagecounts-raw/ will stay the same.

Best,
Diederik

On Fri, Jan 25, 2013 at 12:48 PM, bawolff bawolff...@gmail.com wrote:

> Just to clarify, will this affect the stats at http://dumps.wikimedia.org/other/pagecounts-raw/ ? Changing the format of that will probably break third-party scripts.
>
> --
> -bawolff
>
> On Fri, Jan 25, 2013 at 1:41 PM, Diederik van Liere dvanli...@wikimedia.org wrote:
>
> > [...]
Re: [Wikitech-l] Research on newcomer experience - do we want to take part?
Hey Quim,

I also sent you this survey a week ago with the question whether we should participate :)

D

On Fri, Nov 16, 2012 at 5:13 PM, Quim Gil q...@wikimedia.org wrote:

> Hi, sorry for cross-replying.
>
> On Wed, Nov 14, 2012 at 3:11 PM, Lydia Pintscher lydia.pintsc...@wikimedia.de wrote:
>
> > On Wed, Nov 14, 2012 at 11:00 PM, Marcin Cieslak sa...@saper.info wrote:
> >
> > > Hello, Kevin Carillo[1] from University of Wellington is going to research newcomer experience and contributor behavior in FOSS communities[2]. So far Debian, GNOME, Gentoo, KDE, Mozilla, Ubuntu, NetBSD and OpenSUSE will be taken into account, FreeBSD recently joined[3], and there is still some possibility for other large FOSS projects to join. I think it could fit nicely into our recent efforts directed at the newcomer experience after the Git migration. And MediaWiki is a bit different from the above projects. Are we interested in including MediaWiki in that research? As Kevin explains in his post, he tried to avoid spamming mailing lists to look for interested projects, so I am doing this for him :-)
> > > //Saper
> >
> > I've worked with Kevin quite a bit in preparation for his survey and its later promotion from the KDE side. This is not the kind of research project that is of no value to the projects taking part. I expect the results to be very useful for KDE (and likely also for the other projects taking part).
>
> It turns out that Sumana and I have been in touch with Kevin in the past days, after Asheesh Laroia proposed directly to include Wikimedia in this research. Said and done: Wikimedia is also included in the survey, and you are encouraged to invest some minutes in it: https://limesurvey.sim.vuw.ac.nz/index.php?sid=65151
>
> I will send a proper announcement next Monday, but in the meantime here is an illustrative link: http://kevincarillo.org/2012/11/15/survey-update-after-1-week/
>
> --
> Quim
Re: [Wikitech-l] editing channels - How was this edit made?
On 2012-11-14, at 18:33, Platonides platoni...@gmail.com wrote:

> On 13/11/12 23:42, MZMcBride wrote:
>
> > Please stop top-posting. If you don't understand what that means, please read https://wiki.toolserver.org/view/Mailing_list_etiquette.
> >
> > As I posted at https://www.mediawiki.org/wiki/Talk:Revtagging, it's not clear to me why the built-in revision tagging system in MediaWiki is insufficient for your needs. It _feels_ like wheel-reinvention, but perhaps there's some key component I'm missing.
>
> It should indeed be enough to use change_tag. Also note that some parameters listed in the page are redundant for some campaigns (such as adding the bot name).

I think that the Analytics team would prefer to either:
1) detect the source of an edit in the URL, or
2) have a hook activated after a successful edit and have the data sent to the pixel service.

Having this data in a MySQL table poses a lot of challenges with respect to importing that data into the analytics cluster.

Best,
Diederik
Re: [Wikitech-l] editing channels - How was this edit made?
Dario has been proposing RevTagging to address exactly this need; see http://www.mediawiki.org/wiki/Revtagging

I really think we should put this on the 2013 roadmap for MediaWiki; we definitely need this more granular level of instrumentation for determining the source of an edit.

Best,
Diederik

On Tue, Nov 13, 2012 at 6:19 AM, Amir E. Aharoni amir.ahar...@mail.huji.ac.il wrote:

> Hi,
>
> In the Bangalore DevCamp I spoke a bit with Brion about a way to measure the various ways of editing MediaWiki pages. The original idea was to measure how much mobile editing, when it becomes widely available, is actually used. A simplistic solution would be to add a boolean rev_mobile field to the revision table, but this can apply to a lot of other things, for example:
>
> * Visual Editor vs. the current wiki-syntax editor
> * A usual browser vs. AutoWikiBrowser vs. direct API calls
> * bots vs. non-bots
> * for file uploads, Special:Upload vs. Special:UploadWizard
>
> Things get even more complicated because several such flags may apply at once: for example, I can imagine a human editor using a mobile editing interface with a bot flag, because he makes a lot of tiny edits and the community doesn't want them to appear in RecentChanges. And of course, there may be privacy and performance implications, too. Nevertheless, some kind of metrics on the various contribution channels would be useful.
>
> Any more ideas?
>
> --
> Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
> http://aharoni.wordpress.com
> “We're living in pieces, I want to live in peace.” – T. Moore
Re: [Wikitech-l] About RESOLVED LATER
Hi,

I made the exact same argument a while back (Dropping the LATER resolution in Bugzilla: http://wikimedia.7.n6.nabble.com/Dropping-the-LATER-resolution-in-Bugzilla-td743804.html)

+1

D

On Mon, Nov 5, 2012 at 5:25 PM, Quim Gil quim...@gmail.com wrote:

> I was a bit of a lazy child, especially when it came to cleaning up my room or doing my homework. I tried to convince my mom and teachers of the paradigm of RESOLVED LATER, but they never bought it. In the end I had to clean up my room and do my homework. Even as a child I suspected that they were actually right. If something has been postponed for later, it can't be called resolved.
>
> Now it's me who hears from time to time excuses from my kids that sound more or less like RESOLVED LATER. Yeah, sure, I tell them, pointing to the source of pending work. :)
>
> And now to the topic: what about removing the LATER resolution from our Bugzilla? It feels like sweeping reports under the carpet. If a team is convinced that something won't be addressed any time soon, then they can WONTFIX. If anybody feels differently, then they can take the report back and fix it.
>
> Before we could simply reopen the 311 reports filed under RESOLVED LATER: http://bit.ly/YxW60z
>
> Huggle: 1
> MediaWiki: 74
> MediaWiki extensions: 104
> Monuments database: 1
> mwEmbed: 3
> Parsoid: 1
> Tools: 2
> Wiki Loves Monuments Mobile: 4
> Wikimedia: 114
> Wikimedia Labs: 1
> Wikimedia Mobile: 3
> Wikipedia App: 3
> Total: 311
>
> Looking at the total amount of open bugs, the impact is not that big. The house will be as tidy/untidy as before, but at least things will be clearer now.
>
> What do you think?
>
> --
> Quim
Re: [Wikitech-l] MediaWiki community metrics
> I'm not even sure where to find the code for http://gerrit-stats.wmflabs.org/. In gerrit I could only find the /analytics/scorecard project.

The repo is available at: https://gerrit.wikimedia.org/r/gitweb?p=analytics%2Fgerrit-stats.git;a=shortlog;h=HEAD

As mentioned before, Limn is responsible for visualizing the data; gerrit-stats only pulls data from Gerrit and constructs the measures. Happy to discuss how to come up with developer-centric measures.

Best,
Diederik
Re: [Wikitech-l] MediaWiki community metrics
> Question: what is the best approach to retrieve the number of existing Gerrit accounts?

This number is already stored within gerrit-stats; it is just not being written to a dataset.

D
Re: [Wikitech-l] IPv6 usage on Wikimedia?
On World IPv6 Day (June 6th, 2012) we had about 5000 IPv6 hits; however, for the first 17 days of September we had a total of 1,000,032,000 hits coming from IPv6 addresses. This is based on the sampled squid log data.

Best,
Diederik

On Mon, Sep 17, 2012 at 8:03 AM, David Gerard dger...@gmail.com wrote:

> On 17 September 2012 12:36, Thomas Dalton thomas.dal...@gmail.com wrote:
>
> > On 17 September 2012 11:25, David Gerard dger...@gmail.com wrote:
> >
> > > Do we have any stats on IPv6 accesses and edits on Wikimedia sites? I see this page on stats, which suggests it's literally so small we can't even count it: http://stats.wikimedia.org/wikimedia/squids/SquidReportCountryData.htm Is that actually the case? 'Cos we do know IPv6 edits occur, therefore IPv6 page views occur.
> >
> > That's a split by country, why would it mention IPv6? Judging by the number of anonymous edits coming from IPv6 addresses, there might be fairly high usage.
>
> Indeed. So where are the actual stats?
>
> - d.
Re: [Wikitech-l] IPv6 usage on Wikimedia?
Here is the actual raw data since Jan. 1st, 2012 (multiply each observation by 1000 to get the estimated number of hits for that day). The assumption is that each hit has the same probability of showing up in the squid log file. As you can see, after World IPv6 Day we started supporting many more IPv6 services, hence the increase in traffic.

/a/squid/archive/sampled/sampled-1000.log-20120101.gz,2
/a/squid/archive/sampled/sampled-1000.log-20120102.gz,2
/a/squid/archive/sampled/sampled-1000.log-20120103.gz,4
/a/squid/archive/sampled/sampled-1000.log-20120104.gz,3
/a/squid/archive/sampled/sampled-1000.log-20120105.gz,2
/a/squid/archive/sampled/sampled-1000.log-20120106.gz,7
/a/squid/archive/sampled/sampled-1000.log-20120107.gz,107
/a/squid/archive/sampled/sampled-1000.log-20120108.gz,139
/a/squid/archive/sampled/sampled-1000.log-20120109.gz,322
/a/squid/archive/sampled/sampled-1000.log-20120110.gz,367
/a/squid/archive/sampled/sampled-1000.log-20120111.gz,378
/a/squid/archive/sampled/sampled-1000.log-20120112.gz,341
/a/squid/archive/sampled/sampled-1000.log-20120113.gz,263
/a/squid/archive/sampled/sampled-1000.log-20120114.gz,187
/a/squid/archive/sampled/sampled-1000.log-20120115.gz,191
/a/squid/archive/sampled/sampled-1000.log-20120116.gz,360
/a/squid/archive/sampled/sampled-1000.log-20120117.gz,368
/a/squid/archive/sampled/sampled-1000.log-20120118.gz,510
/a/squid/archive/sampled/sampled-1000.log-20120119.gz,398
/a/squid/archive/sampled/sampled-1000.log-20120120.gz,274
/a/squid/archive/sampled/sampled-1000.log-20120121.gz,176
/a/squid/archive/sampled/sampled-1000.log-20120122.gz,177
/a/squid/archive/sampled/sampled-1000.log-20120123.gz,349
/a/squid/archive/sampled/sampled-1000.log-20120124.gz,339
/a/squid/archive/sampled/sampled-1000.log-20120125.gz,364
/a/squid/archive/sampled/sampled-1000.log-20120126.gz,366
/a/squid/archive/sampled/sampled-1000.log-20120127.gz,277
/a/squid/archive/sampled/sampled-1000.log-20120128.gz,175
/a/squid/archive/sampled/sampled-1000.log-20120129.gz,244
/a/squid/archive/sampled/sampled-1000.log-20120130.gz,370
/a/squid/archive/sampled/sampled-1000.log-20120131.gz,373
/a/squid/archive/sampled/sampled-1000.log-20120201.gz,366
/a/squid/archive/sampled/sampled-1000.log-20120202.gz,327
/a/squid/archive/sampled/sampled-1000.log-20120203.gz,259
/a/squid/archive/sampled/sampled-1000.log-20120204.gz,159
/a/squid/archive/sampled/sampled-1000.log-20120205.gz,192
/a/squid/archive/sampled/sampled-1000.log-20120206.gz,360
/a/squid/archive/sampled/sampled-1000.log-20120207.gz,351
/a/squid/archive/sampled/sampled-1000.log-20120208.gz,350
/a/squid/archive/sampled/sampled-1000.log-20120209.gz,306
/a/squid/archive/sampled/sampled-1000.log-20120210.gz,275
/a/squid/archive/sampled/sampled-1000.log-20120211.gz,176
/a/squid/archive/sampled/sampled-1000.log-20120212.gz,210
/a/squid/archive/sampled/sampled-1000.log-20120213.gz,336
/a/squid/archive/sampled/sampled-1000.log-20120214.gz,372
/a/squid/archive/sampled/sampled-1000.log-20120215.gz,339
/a/squid/archive/sampled/sampled-1000.log-20120216.gz,333
/a/squid/archive/sampled/sampled-1000.log-20120217.gz,272
/a/squid/archive/sampled/sampled-1000.log-20120218.gz,147
/a/squid/archive/sampled/sampled-1000.log-20120219.gz,202
/a/squid/archive/sampled/sampled-1000.log-20120220.gz,316
/a/squid/archive/sampled/sampled-1000.log-20120221.gz,321
/a/squid/archive/sampled/sampled-1000.log-20120222.gz,331
/a/squid/archive/sampled/sampled-1000.log-20120223.gz,334
/a/squid/archive/sampled/sampled-1000.log-20120224.gz,319
/a/squid/archive/sampled/sampled-1000.log-20120225.gz,178
/a/squid/archive/sampled/sampled-1000.log-20120226.gz,155
/a/squid/archive/sampled/sampled-1000.log-20120227.gz,229
/a/squid/archive/sampled/sampled-1000.log-20120228.gz,347
/a/squid/archive/sampled/sampled-1000.log-20120229.gz,344
/a/squid/archive/sampled/sampled-1000.log-20120301.gz,362
/a/squid/archive/sampled/sampled-1000.log-20120302.gz,339
/a/squid/archive/sampled/sampled-1000.log-20120303.gz,337
/a/squid/archive/sampled/sampled-1000.log-20120304.gz,201
/a/squid/archive/sampled/sampled-1000.log-20120305.gz,242
/a/squid/archive/sampled/sampled-1000.log-20120306.gz,421
/a/squid/archive/sampled/sampled-1000.log-20120307.gz,485
/a/squid/archive/sampled/sampled-1000.log-20120308.gz,460
/a/squid/archive/sampled/sampled-1000.log-20120309.gz,413
/a/squid/archive/sampled/sampled-1000.log-20120310.gz,322
/a/squid/archive/sampled/sampled-1000.log-20120311.gz,205
/a/squid/archive/sampled/sampled-1000.log-20120312.gz,202
/a/squid/archive/sampled/sampled-1000.log-20120313.gz,417
/a/squid/archive/sampled/sampled-1000.log-20120314.gz,478
/a/squid/archive/sampled/sampled-1000.log-20120315.gz,378
/a/squid/archive/sampled/sampled-1000.log-20120316.gz,426
/a/squid/archive/sampled/sampled-1000.log-20120317.gz,332
/a/squid/archive/sampled/sampled-1000.log-20120318.gz,231
/a/squid/archive/sampled/sampled-1000.log-20120319.gz,275
/a/squid/archive/sampled/sampled-1000.log-20120320.gz,440
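[Editor's note: a short Python sketch of the estimation rule stated above (each observation in a 1:1000 sampled log represents roughly 1000 hits). The parsing simply follows the path,count lines listed here.]

    def estimated_daily_hits(records, sampling_factor=1000):
        # Each record looks like '/a/squid/archive/.../sampled-1000.log-20120118.gz,510'.
        estimates = {}
        for record in records:
            path, count = record.rsplit(",", 1)
            day = path.split(".log-")[1].split(".gz")[0]  # e.g. '20120118'
            estimates[day] = int(count) * sampling_factor
        return estimates

    print(estimated_daily_hits(
        ["/a/squid/archive/sampled/sampled-1000.log-20120118.gz,510"]))
    # {'20120118': 510000} -- roughly half a million IPv6 hits that day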
Re: [Wikitech-l] Announcing initial version of gerrit-stats
Is the slowness issue known? -Niklas Yes, this is known; it is related to the fact that gerrit-stats is currently hosted on a Labs instance. We are working on migrating it to another server. Best, Diederik ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] Announcing initial version of gerrit-stats
Hi everybody, The Analytics Team is happy to announce the first version of gerrit-stats. Gerrit-stats keeps track of the code review backlog for individual Git repositories. The gerrit-stats dashboard is available at http://gerrit-stats.wmflabs.org Currently, it has a few example charts but we can add your repo to the dashboard as well, just let us know! To create a new chart yourself visit http://gerrit-stats.wmflabs.org/graphs/new This will launch the interface to create your own graph. Click on 'Data', then click on 'Add Metric', and a pull-down menu with all the repositories will appear. Select the repository of your interest and select the metric that you want to visualize. Once you have selected all the metrics of your interest, go back to 'Info' and enter a slug name. Then press 'Enter' and click the 'Save' button. Currently, the following metrics are tracked (on a daily basis): 1) Number of new changesets 2) Number of changesets without any code review per day (this excludes automated review from lint and lint-like reviewers). 3) Number of changesets waiting for merge per day (only applies to changesets that received only positive reviews) 4) Number of changesets self-reviewed. And for metrics 2 and 3, there is a version for volunteers and one for WMF staff. Gerrit-stats is visualized using Limn, the data GUI developed by the Analytics Team and led by David Schoonover. Limn is available at https://github.com/wikimedia/limn This is the initial release and I am sure there will be bugs and issues. If you have any questions or problems using gerrit-stats then either: 1) Head over to #wikimedia-analytics on IRC and ask us 2) Send an email to the analytics mailing list 3) Contact us directly. Not Yet Frequently Asked Questions: 1) How do I create a visualization of the code review metrics for a repo? Visit gerrit-stats.wmflabs.org/graphs/new This will launch the interface to create your own graph. Click on 'Data', then click on 'Add Metric', and a pull-down menu with all the repositories will appear. Select the repository of your interest and select the metric that you want to visualize. Once you have selected all the metrics of your interest, press the 'Save' button. You are all set and you can use this permalink for future reference. 2) How do I edit an existing chart? Simply append /edit to the URL of your chart and you can edit it. 3) My repository is not showing up in the pull-down menu, what happened? By default, all repositories are automagically kept track of as soon as they contain a single commit. There are two exceptions: 1) If your repository name contains the string 'test' or 'private', it will be ignored. 2) The orgchart repository is not tracked by gerrit-stats; this is a known issue but Chad and I haven't been able to figure out what causes it. If your repository is missing then please contact me. 4) Will you add metrics for individual committers? Right now, the unit of analysis is a repository, but it is definitely possible to keep track of code review metrics for individuals. However, I would like to hear some use cases first before embarking on this. 5) The chart looks too spiky, how can I get smoother lines? 1) Go to http://gerrit-stats.wmflabs.org/graphs/name_of_chart/edit 2) Click on 'Options' 3) Click on 'Advanced' (right side of screen) 4) Click on 'rollPeriod' (bottom of screen, yellow box) 5) This lets you create a moving average: replace the 1 with 7 and each datapoint becomes the average of the past 7 days. 
This option applies to both metrics but it really smooths out the outliers. 6) I want a new metric. How do I go about it? There are two options: a) Clone the gerrit-stats repo and hack away; it's Python, btw. We are happy to help out! b) Send us a suggestion for a new metric; the more precise, the more useful! On behalf of the Analytics Team, Diederik ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
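For anyone who wants to reproduce the rollPeriod smoothing outside of Limn, here is a minimal sketch of the same past-7-days moving average; the daily counts are made up:

  from collections import deque

  def rolling_average(values, period=7):
      # Mirror Limn's rollPeriod: average each datapoint over the past
      # `period` observations (fewer at the start of the series).
      window = deque(maxlen=period)
      smoothed = []
      for v in values:
          window.append(v)
          smoothed.append(sum(window) / float(len(window)))
      return smoothed

  daily_changesets = [12, 3, 40, 8, 5, 31, 7, 9, 28, 4]  # made-up data
  print(rolling_average(daily_changesets))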
Re: [Wikitech-l] Code review statistics and trends
There seems to be a 10-day lag (no data after August 21st). Is this a bug or a feature? Data hasn't been pushed to gerrit for 10 days; something is wrong with the script. We will fix it today. D ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Code review statistics and trends
Hi Harry, The changeset numbers are accurate and the spikes are caused by translatewiki. See my response to Siebrand on how to remove the outliers and create a smoother chart. Best, Diederik Sent from my iPhone On 2012-08-25, at 17:01, Harry Burt jarry1...@gmail.com wrote: I realise that many contributors are WMF staff, and many WMF staff work a relatively predictable 5-day week, but the new changesets graph still seems a little spiky to my eyes. Given the +- 10 changesets range, how much confidence should I be placing in these numbers? Thanks, Harry -- Harry Burt (User:Jarry1250) ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Code review statistics and trends
On 2012-08-23, at 2:42 AM, Siebrand Mazeland (WMF) wrote: The graph for new changesets fluctuates a lot. I would guess this is due to change sets submitted by user l10n-bot. Maybe it's a good idea to filter those out, to get a line that's a little easier to interpret. Hey Siebrand, I prefer to keep the data collection as simple as possible. One way of fixing this issue is as follows: 1) Go to http://gerrit-stats.wmflabs.org/graphs/mediawiki/edit 2) Click on 'Options' 3) Click on 'Advanced' (right side of screen) 4) Click on 'rollPeriod' (bottom of screen, yellow box) 5) This lets you create a moving average: replace the 1 with 7 and each datapoint becomes the average of the past 7 days. This option applies to both metrics but it really smooths out the outliers. If you want to save this, please use another slug name (click on 'Info' and replace 'slug') and then click 'Save'; otherwise you will have changed Robla's original chart. Best, D ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] The Death of OAuth 2
Anyone want me to go back through the specs and make a list of some of the things that are wrong with both? Yes! I think that would be hugely helpful! Diederik ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] The Death of OAuth 2
Hi all, The lead author of OAuth 2.0, Eran Hammer, has withdrawn his name from the OAuth 2 spec: http://hueniverse.com/2012/07/oauth-2-0-and-the-road-to-hell/ That's very sad news, IMHO, and it probably means we really should reconsider which protocol we want to support (OAuth 1.0 / OAuth 2.0 / SAML or something else) if we want to allow interoperability with our sites. Best, Diederik ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Automatic mobile redirection enabled for *.wikimedia.org sites hosted on the cluster (except for commons)
Hey Arthur, It seems that the redirection to the mobile donation site (donate.m.wikimedia.org) does not work. D Sent from my iPhone On 2012-07-19, at 19:35, Arthur Richards aricha...@wikimedia.org wrote: PS big thanks to Asher Feldman for getting the change compiled and deployed. On Thu, Jul 19, 2012 at 4:34 PM, Arthur Richards aricha...@wikimedia.org wrote: Around 21:30UTC automatic redirection to the mobile version of *.wikimedia.org sites hosted on the cluster (except for commons) was enabled with the deployment of https://gerrit.wikimedia.org/r/#/c/16000/. This is part of the ongoing effort by the mobile team to provide automatic redirection for mobile devices to the mobile version of all of our sites. For more information about the project and the timeline for enabling automatic redirection to the remaining projects, see http://www.mediawiki.org/wiki/Mobile_default_for_sibling_projects. Please let us know if you see any issues. As always, feel free to join us on IRC in #wikimedia-mobile. -- Arthur Richards Software Engineer, Mobile [[User:Awjrichards]] IRC: awjr +1-415-839-6885 x6687 ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] getting Jenkins to run basic automated tests on commits to extensions
Yes, I don't disagree that jshint should be run by Jenkins. AIUI Timo's work to make jshint work on the command line is prep work for exactly that. Ah, I misunderstood you. I thought you meant so people can run it before uploading, which no one will ever do ;-) Maybe we should create a git pre-commit script that does the jshint / php -l check that people can install on their local dev computers. That way people will never forget it ;) D ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
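Such a hook could look roughly like the sketch below; this is an illustration only, assuming jshint and php are on the PATH (save it as .git/hooks/pre-commit and make it executable):

  #!/usr/bin/env python
  # Hypothetical pre-commit hook: lint staged JS and PHP files and
  # abort the commit if any of them fail. File names containing
  # spaces are not handled in this sketch.
  import subprocess
  import sys

  def staged_files():
      out = subprocess.check_output(
          ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"])
      return out.decode("utf-8").split()

  failed = False
  for path in staged_files():
      if path.endswith(".js"):
          cmd = ["jshint", path]      # assumes jshint is installed
      elif path.endswith(".php"):
          cmd = ["php", "-l", path]   # PHP's built-in syntax check
      else:
          continue
      if subprocess.call(cmd) != 0:
          failed = True

  if failed:
      print("Lint errors found; commit aborted.")
      sys.exit(1)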
Re: [Wikitech-l] Barkeep code review tool
Roan Kattouw wrote: Yes, ops essentially uses a post-commit workflow right now, and that makes sense for them. ops also uses pre-commit review for non-ops people :-] Yeah, that's right. What I meant to say (and thought I had said in some form later in that message) was that the puppet repo has post-commit review for most changes by ops staff, and pre-commit review for everything else (non-ops staff, volunteers, and certain changes by ops staff in some cases). I became curious about these statements regarding self-review (committer==reviewer) and so I ran a couple of queries against the gerrit database to see how often this occurs: 1) For the puppet repo, 84.1% of the commits are self-reviewed. 2) For the mediawiki core repo, 27.9% of the commits are self-reviewed. 3) For the mediawiki extensions repos, 67.8% of the commits are self-reviewed. I think we need to take a step back from a tool-focused discussion and first hash out what our commit workflows are / should be. In particular: 1) Should there be one commit workflow that applies to all teams? Looking at current practice, the answer seems to be no, but I am curious to hear what other people think. If the answer is that it's okay for different teams to have different commit workflows, then we should also look for tools that support this. 2) If self-review is so prevalent, does that mean that the pre-commit review workflow has failed? Diederik ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
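The metric itself is straightforward to reproduce; a minimal sketch follows (the change records below are made up, and the real numbers above came from queries against the Gerrit database):

  # Hypothetical merged changes as (committer, reviewer) pairs.
  changes = [
      ("alice", "alice"),  # self-reviewed
      ("alice", "bob"),
      ("carol", "carol"),  # self-reviewed
      ("dave", "alice"),
  ]

  self_reviewed = sum(1 for committer, reviewer in changes
                      if committer == reviewer)
  print("%.1f%% of the commits are self-reviewed"
        % (100.0 * self_reviewed / len(changes)))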
Re: [Wikitech-l] HTTPS Wikipedia search for Firefox?
Hey Chris, Could you give us a ballpark estimate of how many search queries you expect per day? Best, Diederik Sent from my iPhone On 2012-06-19, at 13:51, Chris Peterson cpeter...@mozilla.com wrote: Thanks, Ryan. When you guys would like Mozilla to make this switch to HTTPS, you can just reopen Firefox bug 758857. chris On 6/19/12 10:35 AM, Ryan Lane wrote: On Tue, Jun 19, 2012 at 3:39 AM, Chris Peterson cpeter...@mozilla.com wrote: hi, I'm a developer at Mozilla and I have a patch [1] that would switch Firefox's Wikipedia search box from HTTP to HTTPS. Who would be an appropriate technical contact at Wikimedia that I can coordinate with? Is this a change Wikimedia would welcome? Or would the increased SSL server load be an undue burden for Wikimedia? Just to be clear, this change would only affect Firefox users who search Wikipedia using Firefox's search box. A few months ago, Mozilla switched Firefox 14 (currently in Beta) to use Google's HTTPS search [2]. If I check in my Wikipedia patch soon, the change would ride Firefox's Nightly, Aurora, and Beta release channels [3] and be released to the general public in Firefox 16 (October 2012). Please don't do so. HTTPS is a new service, and we haven't properly load tested it yet. The first target for production load testing is for logged-in users. I'm not opposed to the change completely, but I'd prefer to let you guys know when we're ready. Thanks, - Ryan ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Giving people additional rights on gerrit
Hi Antoine, I really think we need to rethink how we are handing out non-admin Gerrit rights to our engineers, both staff and volunteers. Create-repo and create-branch rights should be handed out by default. There is absolutely zero reason for being stingy in handing out these rights. The loss of productivity is really not acceptable, and it would be a real shame if we decided to drop Gerrit as our code review tool not because of Gerrit's inadequacies but because the way we utilize Gerrit is broken. Best, Diederik Please bug Gerrit admins through the not so broken workflow :-) ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Some old proposed changes in Gerrit waiting merge, after a code review.
The Analytics Team has written a script to generate such reports and we will publish the results shortly, once we have enough data points. Best, Diederik Sent from my iPhone On 2012-06-14, at 4:31, Sébastien Santoro dereck...@espace-win.org wrote: Hi, I saw this morning those reviewed but not merged code changes in gerrit: Parser issue for HTML definition list Bug 11748: Handle optionally-closed HTML tags without tidy 2012-04-17 Owner: GWicke Review: +1 by saper https://bugzilla.wikimedia.org/11748 https://gerrit.wikimedia.org/r/#/c/5174/ (bug 32381) Allow descending order for list=backlinks, list=embeddedin and list=imageusage 2012-04-30 Owner: Umherirrender Review: +1 by Aaron Schulz https://bugzilla.wikimedia.org/32381 https://gerrit.wikimedia.org/r/#/c/6108/ Upgrade cortado-ovt to newer version (seems to work fine locally) 2012-05-05 Owner: Reedy Review: +1 by awjrichards https://gerrit.wikimedia.org/r/#/c/6640/ Would it be interesting to generate an automated report detecting code submissions that are at least 45 days old, have at least one +1 review, but are still not merged? -- Best Regards, Sébastien Santoro aka Dereckson http://www.dereckson.be/ ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Give create gerrit repo right to all WMF engineers
Hi Ori, I absolutely 100% agree and we really need to sort this out this week. The lost productivity is unacceptable. So far I have heard different arguments why we cannot hand out 'create-repo rights' to engineers: The first reason was that only admins could do it, but that is no longer true with the special create-repo rights group. The second reason was that Gerrit's permission system is either too complex or engineers don't know how it works. I have full confidence in our engineers that they can master Gerrit's permission system in less than a day. Now a new argument is unleashed, and that is that we cannot delete repos. The fact that we cannot delete repos is a non-argument. None of us are going to create a bazillion repos. The way we are using Git right now makes it a more centralized system than Subversion ever was. This means that we are not using it right. So I really hope that we can close this discussion by handing out the 'create-repo' right to all paid WMF engineers, or at least to any paid WMF engineer who requests it. Diederik On Tue, Jun 5, 2012 at 8:13 AM, Ori Livneh ori.liv...@gmail.com wrote: On Mon, Jun 4, 2012 at 11:00 PM, Jeremy Baron jer...@tuxmachine.com wrote: I mostly agree with what you've said. Just wanted to point out gerrit projects (aka repos) can never be destroyed. so if you e.g. typo or rename a project or kill it 5 days after you started it's still there forever. Only very recently have we even been able to hide projects from project listings in the UI. Isn't the same basically true of Wiki articles? I understand the desire to keep things tidy, okay. But what would be the big deal about having ten or even a hundred thousand abandoned repositories, so long as they are hidden, and do not clutter the UI? The repositories that would be candidates for deletion are the ones that got no further than an initial stab, and those measure in kilobytes. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Give create gerrit repo right to all WMF engineers
On Tue, Jun 5, 2012 at 8:44 AM, Jeremy Baron jer...@tuxmachine.com wrote: On Tue, Jun 5, 2012 at 2:25 AM, Diederik van Liere dvanli...@gmail.com wrote: Now a new argument is unleashed and that is that we cannot delete repos. The fact that we cannot delete repos is a non-argument. None of us are going to create a bazillion repos. I was just pointing it out; I've no idea how gerrit behaves with lots of small+hidden repos. or with most of the repos in an instance hidden. Maybe it's not a problem. I would suggest that we cross that bridge when we get there. AFAIK, Ori and the E3 team would only need a handful of repos in the coming months and the same applies to the Analytics team. It sounds like Ori (and I think this is true for other people too) would create lots of repos that don't live too long. Maybe that's a bazillion, maybe not. -Jeremy ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Give create gerrit repo right to all WMF engineers
So the estimated maximum number of projects is 10,000, while the default maximum is 1,000. For contributors, the default maximum is 1,000 and the estimated maximum number is 50,000. Can we please tag this concern as addressed and start handing out the rights? Diederik On Tue, Jun 5, 2012 at 11:32 AM, Ori Livneh ori.liv...@gmail.com wrote: On Mon, Jun 4, 2012 at 11:44 PM, Jeremy Baron jer...@tuxmachine.com wrote: I was just pointing it out; I've no idea how gerrit behaves with lots of small+hidden repos. or with most of the repos in an instance hidden. Maybe it's not a problem. Some numbers here: http://gerrit-documentation.googlecode.com/svn/Documentation/2.4/dev-design.html#_spam_and_abuse_considerations ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Give create gerrit repo right to all WMF engineers
Did anyone say, Ask about it? I'm sure if you followed up with one of the project creators (eg: Chad) he would have been more than happy to push things along. I am sorry but I disagree. The question is not whether Chad or one of the Gerrit admins will help us, because they are super responsive and are always helping us out when there are issues. The question is: what do we (WMF engineers) think is a sensible Git / Gerrit workflow? Creating repos is part of this workflow. I believe in decentralized teams and our software should support this. A workflow where engineers have to bug a Gerrit admin to do something is a broken workflow: * You will always bug an admin at the wrong time * It always takes more time to bug somebody than to DIY; we are really losing productive hours on issues like this. * We are professional engineers, and every engineer should know how to create a repo in Gerrit. * Bugging an engineer (in general) is not a scalable workflow and we should really move away from these kinds of accepted practices. We need to stop focusing on what Gerrit can / cannot do and we need to start drafting out team-specific workflows on how we want to use Git / Gerrit. Diederik ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Give create gerrit repo right to all WMF engineers
I've whipped up a quick tutorial for people who want to create new repositories[0]. If people can read and make sure they understand this page (with its various caveats), then yes, we can start handing this out. -Chad [0] https://www.mediawiki.org/wiki/Git/Creating_new_repositories Dear Chad, This is really helpful! Thanks so much for putting this together! Diederik ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] Give create gerrit repo right to all WMF engineers
Hi all, Ryan Lane just showed me that in Gerrit there is a separate right for creating repositories. I suggest we give this right to all WMF engineers. A repo is free and fun and will prevent unnecessary delays. Best, Diederik ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Give create gerrit repo right to all WMF engineers
Could you please add David Schoonover and Andrew Otto to the Project Creators group? Best, Diederik On 2012-06-01, at 5:41 PM, Chad wrote: I don't want to give this right to all engineers because setting up new repositories is more than just choosing the name. There's also the issue of understanding how Gerrit permissions work so you can set them up properly. I did make a new Project Creators group that I'm more than willing to add people to, once they've learned Gerrit permissions. In addition, unless you make a group you're in the owner of the repo (which can't be done via the GUI, only the CLI--this is a bug), you won't be able to set permissions at all (this is by design). So yeah, it's not as easy as it sounds on the tin, so I don't want to hand this out en masse. In an ideal world, I want us to have a special page where people can request repos and we can automate the icky backend stuff. -Chad On Jun 1, 2012 10:33 AM, Diederik van Liere dvanli...@gmail.com wrote: Hi all, Ryan Lane just showed me that in Gerrit there is a separate right for creating repositories. I suggest we give this right to all WMF engineers. A repo is free and fun and will prevent unnecessary delays. Best, Diederik ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] The bugtracker problem, again
I don't think we should aim to cater to non-developers at all. The chances that a non-developer finds a real bug are very, very small (in my previous life as an academic I have done a lot of research on Bugzilla and developer productivity, and it's based on that experience that I am making this statement). I think that if a newbie / non-developer finds Bugzilla then he/she should be redirected to either IRC / Teahouse / Talk pages / FAQ or any other support channel that we have. They can always be sent back to file a bug report. If we are going to spend effort on improving Bugzilla then it should be focused (IMHO) on matching a bug with the right developer (right meaning a person who can actually fix the problem). It is in this area that Bugzilla (or any other bug tracker AFAIK) provides very limited support. -- Diederik On 2012-05-14, at 1:10 AM, Ryan Lane wrote: I don't think you'll ever find a finished bug-/issue-tracking solution that caters just as well for newbies and developers. The main reason is (of course?) that most issue tracking software is written for developers, by developers with little or no experience or thought as to what makes a good end-user experience. Also, most issue tracking tools are *made deliberately* to work best for developers - with human (end-user) interaction kept to a minimum. That's also why most issue tracking solutions end up looking like glorified (not the good kind) spreadsheets (Mantis, Flyspray, others?), something the IRS would want you to fill out (BZ, OTRS, RT, others?), or some kind of bastard child in-between (The Bug Genie, Redmine, Jira, Fogbugz, others?). I'd like to go one step further. There is not a single good bug/issue tracking system in existence. Yes, I'm completely serious too. I've come to believe that it's impossible to make one that anyone will be happy with. That includes most developers of tracking systems too (I've written one, and I hated it, though I liked it better than what I was using before). We can complain about this till the end of time. This discussion is even worse than bikeshedding discussions. At least with bikeshedding discussions you end up with a color for the bikeshed. When discussing bug/issue trackers you just end up with the same tracker, or another crappy tracker. - Ryan ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Wikisource link stats
Hey Lars, You might be interested in the WMF Analytics mailing list at https://lists.wikimedia.org/mailman/listinfo/analytics. There we discuss all our analytics projects, usually a little bit less focused on MediaWiki issues, but definitely focused on WMF data. Hope to see you there! Best, Diederik On 2012-05-03, at 4:46 PM, Lars Aronsson wrote: From [[Special:Linksearch]] I can find all the external links, based on the external links table in the database, which can be accessed by tools on the German toolserver. But is there any way to find similar information about links to Wikisource? I.e. what are the total number of links? Which pages link to a particular Wikisource page? -- Lars Aronsson (l...@aronsson.se) Aronsson Datateknik - http://aronsson.se ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] Using tab as delimiter instead of space in the server log files
Hi all, In the last 24 hours I have found two new cases of spaces in log lines where the space is not used as a delimiter. Case 1: There are mobile page requests that contain a space in the URL, for example: ssl1002 2198871 2012-04-06T23:50:24.566 0.002 0.0.0.0 FAKE_CACHE_STATUS/301 1051 GET https://en.m.wikipedia.org/wiki/Extensor_carpi radialis longus NONE/mobilewikipedia - https://www.biodigitalhuman.com/ - Mozilla/5.0%20(Windows%20NT%206.1;%20WOW64)%20AppleWebKit/535.19%20(KHTML,%20like%20Gecko)%20Chrome/18.0.1025.151%20Safari/535.19 Case 2: The mimetype on varnish often contains additional charset=utf8 information, which results in a mimetype like application/json; charset=utf8 or text/xml; charset=utf8 Instead of continuing to patch our servers to fix these space issues, I strongly suggest that we move away from the space as delimiter and start using the tab (\t) character. Spaces not being used as delimiters have been cropping up in our server logs for many years and they make the analytics part that much more complex, as we need to check more and more edge cases and/or create patches. I would rather solve the problem at the root, and that is by moving to a new delimiter. The delimiter is added by nginx/varnish/squid when writing the log file. Please let me know if this is a sane or insane idea. Please also let me know if you are a consumer of these server log files and you would need to make a change on your end to accommodate this change. Andrew has been working hard on building a test environment in Labs where we have nginx / varnish / squid servers running with production configuration and where we can test these changes extensively. Best, Diederik ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
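To illustrate why the tab is safer, here is a minimal sketch; the log lines are abbreviated versions of the Case 1 example above:

  # Space-delimited: the URL field itself contains spaces, so a naive
  # split yields 7 fields where the schema expects 5.
  space_line = ("ssl1002 2198871 GET "
                "https://en.m.wikipedia.org/wiki/Extensor_carpi radialis longus "
                "NONE/mobilewikipedia")
  print(len(space_line.split(" ")))  # -> 7, the URL is broken into 3 pieces

  # Tab-delimited: embedded spaces no longer matter.
  tab_line = ("ssl1002\t2198871\tGET\t"
              "https://en.m.wikipedia.org/wiki/Extensor_carpi radialis longus\t"
              "NONE/mobilewikipedia")
  print(len(tab_line.split("\t")))   # -> 5, one entry per column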
Re: [Wikitech-l] OAuth
The current version of http://www.mediawiki.org/wiki/OAuth was written by me and Dario. It's definitely a starting point and not a finished proposal. I am not sure to what extent the OAuth 2 protocol has evolved since this was written but that definitely needs to be checked. Diederik On Fri, Apr 27, 2012 at 1:52 PM, Chris Steipp cste...@wikimedia.org wrote: Petr, OAuth is something we're committing to on the roadmap for Summer/Fall of this year. So barring anything crazy occurring, oauth should be happening over the next few months. I'm planning to help drive the process from WMF's side, but it's something I'm hoping some people in the community will also take on and help with. I've heard the mobile, api, and labs all want oauth to help with their projects. But can we start collecting specific user stories from anyone who wants to use oauth? It looks like most of the wikitech conversations have made it to http://www.mediawiki.org/wiki/OAuth, but would someone be willing to make sure it's up to date? I'll try to also get to it over the next few days. Thanks! Chris On Fri, Apr 27, 2012 at 4:40 AM, Petr Bena benap...@gmail.com wrote: Some updates on this? Is WMF or someone going to work on this or is it waiting for someone to start? On Fri, Mar 16, 2012 at 3:19 PM, Petr Bena benap...@gmail.com wrote: Sorry, few typos: So, right now a question is if it's supposed to be implemented as extension or in core, or both (in case extension can't be created now, update core so that it's possible). ^ that's what I was about to say On Fri, Mar 16, 2012 at 3:17 PM, Petr Bena benap...@gmail.com wrote: So, right now a question is if it's supposed to be implemented as extension or in core, or both (in case extension can't be created now, updated core do that it's possible). I would rather make is as extension since there is a little benefit for most of mediawiki users in having this feature. I think it's better to keep only necessary stuff inside core and keep extra stuff as extensions. Is there any objection against implementing it as extension? Thanks On Wed, Mar 14, 2012 at 12:49 AM, John Erling Blad jeb...@gmail.com wrote: Just as an idea, would it be possible for Wikimedia Foundation to establish some kind of joint project with the SimpleSAMLphp-folks? Those are basically Uninett, which is FEIDE, which is those that handle identity federation for lots of the Norwegian schools, colleges and universities. The SimpleSAML solution is in use in several other projects/countries, not sure what's the current status. The platform for FEIDE is also in use in several other countries so if the log on problems in Norway are solved other countries will be able to use the same solution. Note also that OAuth 2.0 seems to be supported. https://rnd.feide.no/2012/03/08/releasing-a-oauth-2-0-javascript-library/ In April this year there is a conference GoOpen 2012 (http://www.goopen.no/) in Oslo and some folks from Wikimedia Foundation are there, perhaps some folks from Uninett too? Could it be possible for interested people to sit down and discuss whether a joint project is possible? Uninett is hiring for SimpleSAML development and that could be interesting too! John On Wed, Mar 14, 2012 at 12:13 AM, Thomas Gries m...@tgries.de wrote: There's really two separate things that these systems can do. 
The classic OAuth scenario is like this: site A: Wikipedia, user A; site B: Huggle. Site B initiates a special login on site A using a shared secret; on success, site A passes back authentication tokens to site B which verify that user A allowed site B access. Site B then uses those tokens when it accesses site A, in place of a username/password directly. OpenID, SAML, etc. seem to be more appropriate for this scenario: site A: Wikipedia; site B: University, user B. These systems allow user B to verify their identity to site A; one possibility is to use this to associate a user A' with the remote user B, letting you use the remote ID verification in place of a local password authentication. (This is what our current OpenID extension does, basically.) These are, IMO, totally separate use cases and I'm not sure they should be treated the same. The Extension:OpenID can be used for both cases (given that you set $wgOpenIDClientOnly = false;) https://www.mediawiki.org/wiki/Extension:OpenID . The extension makes a MediaWiki installation OpenID 2.0-aware and lets users log in using their OpenID identity - a special URL - instead of (or as an alternative to) standard username/password log in. In that way, the MediaWiki acts as Relying Party (RP) = OpenID consumer.[1] As an option, it also allows the MediaWiki to act as OpenID provider, so
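As a rough illustration of the first scenario, this is what token-based access looks like with requests-oauthlib, one Python implementation of OAuth 1.0; all keys and the URL below are made up:

  # Sketch of site B (e.g. Huggle) accessing site A (e.g. Wikipedia) on
  # user A's behalf with OAuth tokens instead of a password.
  from requests_oauthlib import OAuth1Session

  session = OAuth1Session(
      client_key="huggle-consumer-key",             # shared secret between the sites
      client_secret="huggle-consumer-secret",
      resource_owner_key="user-a-access-token",     # tokens obtained when user A
      resource_owner_secret="user-a-token-secret",  # authorized site B on site A
  )

  # Every request is signed with the tokens; site A never sees a password.
  resp = session.get("https://wiki.example.org/w/api.php"
                     "?action=query&meta=userinfo&format=json")
  print(resp.json())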
Re: [Wikitech-l] Page views
My suggestion for how to filter these bots efficiently in a C program (no costly nuanced regexps) before sending data to webstatscollector: a) Find the 14th field in the space-delimited log line = user agent (but beware of false delimiters in logs from varnish, if still applicable) b) Search this field case-insensitively for bot/crawler/spider/http (by convention only bots have a URL in the agent string) That will filter out most bot pollution. We still want those records in the sampled log though. Any thoughts? I did some research on fast string matching and it seems that the recently developed algorithm by Leonid Volnitsky is very fast (http://volnitsky.com/project/str_search/index.html). I will do some benchmarks vs the ordinary C strstr function, but the author claims it's 20x faster. So instead of hard-coding where the bot information should be, just search the entire log line for the bot information and, if it is present, discard the log line; otherwise process it as-is. Best, Diederik ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
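A minimal sketch of the field-based variant, in Python rather than the C the thread has in mind; the field position and markers come from the suggestion above:

  BOT_MARKERS = ("bot", "crawler", "spider", "http")

  def is_bot(logline):
      # Take the 14th space-delimited field (the user agent) and scan it
      # case-insensitively; beware false delimiters in varnish logs.
      fields = logline.split(" ")
      if len(fields) < 14:
          return False
      agent = fields[13].lower()
      return any(marker in agent for marker in BOT_MARKERS)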
[Wikitech-l] 2nd Analytics Day videos are available
Hi all, March 2nd, we had our 2nd WMF Analytics Day. We taped all the sessions and they are now available on Commons: http://commons.wikimedia.org/wiki/File:WMF_Analytics_Day_-_Cassandra.ogv http://commons.wikimedia.org/wiki/File:WMF_Analytics_Day_-_HBase.ogv http://commons.wikimedia.org/wiki/File:WMF_Analytics_Day_-_Hive.ogv http://commons.wikimedia.org/wiki/File:WMF_Analytics_Day_-_Peregrine.ogv http://commons.wikimedia.org/wiki/File:WMF_Analytics_Day_-_Storm.ogv http://commons.wikimedia.org/wiki/File:WMF_Analytics_Day_-_Hadoop.ogv Big thanks to Chip for converting these gigantic files! If you are curious to see what the Analytics Team is up to, then head over to our roadmap: http://www.mediawiki.org/wiki/Analytics/2012-2013_Roadmap Best, Diederik ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Page views
Hi Srikanth, Yes, we are looking into the growth percentages as they seem unrealistically high. Best, Diederik On Mon, Apr 9, 2012 at 3:30 AM, Srikanth Lakshmanan srik@gmail.com wrote: On Mon, Apr 9, 2012 at 00:46, Erik Zachte ezac...@wikimedia.org wrote: returns 20 lines from this 1:1000 sampled squid log file after removing javascript/json/robots.txt there are 13 left, which fits perfectly with 10,000 to 13,000 per day however 9 of these are bots!! Is this the same case for mobile stats as well? I don't think there could be sudden 100% growth for 2 months now across wikis[1] without some reason like this. [1] http://stats.wikimedia.org/EN_India/TablesPageViewsMonthlyMobile.htm -- Regards Srikanth.L ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Languages supported by Jenkins (was Changes status in Gerrit)
Thanks, I meant to say: which languages will initially be supported? :) D On 2012-04-06, at 7:04 AM, Antoine Musso wrote: On 05/04/12 20:20, Diederik van Liere wrote: Which languages will Jenkins support? Jenkins is just a bot, we can make it do whatever we want. The plan is to have a universal linting job able to analyse any language or format in use, be it PHP, Python, JS, CSS ... https://www.mediawiki.org/wiki/Continuous_integration/Workflow_specification I am not sure when I am going to work on it, but for sure after we have brought Testswarm back to life. -- Antoine hashar Musso ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Languages supported by Jenkins (was Changes status in Gerrit)
Hi Chad, On 2012-04-05, at 2:17 PM, Chad wrote: Once we've got jenkins working reliably, I plan to remove the verified permission so only the bots can set it. -Chad Which languages will Jenkins support? Best, Diederik ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] WURFL licensing concerns and Git migration
I am in touch with the core developers of the Apache Devicemap project and we are exploring the possibility of collaborating. If something comes out of this exploration then I will announce it here. Best, Diederik On Wed, Mar 21, 2012 at 2:33 AM, Patrick Reilly prei...@wikimedia.org wrote: I can remove it. — Patrick On Mar 20, 2012, at 10:53 PM, Erik Moeller e...@wikimedia.org wrote: On Tue, Mar 20, 2012 at 10:37 PM, Q overlo...@gmail.com wrote: ScientiaMobile basically took an open data repository and closed it, the complete opposite of what the WMF is trying to do. I'd strongly suggest looking for real Open solutions like OpenDDR/DeviceMap And apparently they've been trying to take down legitimate copies, too: http://openddr.org/takedown.html Yikes, that's evil. To the extent we're relying on it today, we should move off it ASAP. -- Erik Möller VP of Engineering and Product Development, Wikimedia Foundation Support Free Knowledge: https://wikimediafoundation.org/wiki/Donate ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] WURFL licensing concerns and Git migration
Yes, ScientiaMobile has made some very important changes to the license and it does mean (AFAIK) that you cannot store the wurfl.xml in a repository. This paragraph is particularly important: You are not authorized to create a derivative work of or otherwise modify this WURFL file, and you are further not authorized to use, copy, display, or distribute, in each case, any derivative work of this WURFL file, whether created by you or someone else. I think it's best to wait with putting the file in Git, and I'll forward this question to the legal team. Best, Diederik On Wed, Mar 21, 2012 at 12:03 AM, Kevin Israel pleasest...@live.com wrote: Our MobileFrontend extension, which is currently deployed on Wikimedia sites, uses WURFL to detect the mobile devices it targets. However, I recently became aware the version of the WURFL data files we use has a rather restrictive license. http://tech.groups.yahoo.com/group/wmlprogramming/message/34311 The license seems to suggest we are not even supposed to redistribute verbatim copies or install the data files on multiple servers rather than only making [...] one copy [...], if not merely fail to grant such permission. Currently, the files are in our Subversion repository and are going to end up in Git soon. I am not a lawyer, and I realize this is probably a matter for the Wikimedia Foundation to handle, albeit one of urgent importance to us. If I am not mistaken, proper removal of infringing material from Git repositories is somewhat painful in that it causes all child SHA-1 hashes to change, so I feel resolution of the above licensing concern blocks Git migration of at least the MobileFrontend extension. -- Wikipedia user PleaseStand http://en.wikipedia.org/wiki/User:PleaseStand ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Git, Gerrit and the coming migration
On 2012-03-07, at 6:01 AM, Chad wrote: My main worry is that we are not spending enough time on getting all engineers (both internal and in the community) up to speed with the coming migration to Git and Gerrit and that we are going to blame the tools (Gerrit and/or Git) instead of the complex interaction between three changes. We are making three fundamental changes in one-shot: 1) Migrating from a centralized source control system to a decentralized system (SVN -> Git) 2) Introducing a new dedicated code-review tool (Gerrit) 3) Introducing a gated-trunk model These are big changes. They're drastic changes. They require a rethinking of a great many things that we do from both technical and non-technical perspectives. Unfortunately, I don't see how we could've done #1 without #2. CodeReview is not designed (and was never designed) to work with a DVCS. The workflow's just not there, and it would've basically required rewriting huge parts of it. Rather than reinvent the wheel (again), we went with Gerrit. Arguably, we could've gone a straight push and skipped item #3. But given the continual code review backlog, and the desire to keep trunk stable (and hopefully deploy much more often), the decision to gate trunk was made pretty early on in the discussions. I understand that we want to do all 3 of those changes; my point was merely to make very explicit what we are changing, and that the biggest change, IMHO, is the introduction of 3). It seems that most of the discussion is focusing on the tools (that's also how this thread started) while I think the discussion should focus on mastering the new workflow and what we can do to make sure that we have the right tutorials and training available to make this migration as gentle as possible. I am confident that we will master the new tools, but a new workflow requires new habits and that might take more time to develop. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] Git, Gerrit and the coming migration
Hi all, Some disclaimers before I start my thread: 1) I am a big believer in Git and dvcs and I think this is the right decision 2) I am a big believer in Gerrit and code-review and I think this is the right decision 3) I might be wholly unaware / inaccurate of certain things, apologies in advance. 4) A BIIGG thank you to all the folks involved in preparing this migration (evaluation, migration and training): in particular Chad, Sumanah and Roan (but I am sure more people are involved and I am just blissfully unaware). My main worry is that we are not spending enough time on getting all engineers (both internal and in the community) up to speed with the coming migration to Git and Gerrit and that we are going to blame the tools (Gerrit and/or Git) instead of the complex interaction between three changes. We are making three fundamental changes in one shot: 1) Migrating from a centralized source control system to a decentralized system (SVN -> Git) 2) Introducing a new dedicated code-review tool (Gerrit) 3) Introducing a gated-trunk model. My concern is not about the UI of Gerrit; I know it's popular within WMF to say that its UI sucks, but I don't think that's the case, and even if it were an issue it's only a minor one. People have already suggested that we might consider other code-review systems; I did a quick Google search and we are the only community considering migrating from Gerrit to Phabricator. I think this is beside the point: the real challenge is moving to a gated-trunk model, regardless of the chosen code-review tool. I cannot imagine that other code-review tools that are also based on a gated-trunk model and work with Git are much easier than Gerrit. The complexity comes from the gated-trunk model, not from the tool. The gated-trunk model means that, when you clone or pull from master, it might be the case that files relevant to you have been changed but that those new changes are waiting to be merged (the pull request backlog, AKA the code-review backlog). In the always-commit world with no gatekeeping between developers and master, this never happens; your local copy can always be fully synchronized with trunk (master). Even if a commit is reverted, your local working copy will still have it, and any changes that you might have based on this reverted commit you can still commit. Obviously people get annoyed when you keep checking in reverted code, but it won't break anything. In an ideal world, our code-review backlog would be zero commits at any time of the day; if that's the case then 'master' is always up-to-date and you have the same situation as with the 'always-commit' model. However, we know that the code-review backlog is a fact, and it's the intersection of Git, Gerrit and the backlog that is going to be painful. Suppose I clone master, but there are 10 commits waiting to be reviewed with files that are relevant to me. I am happily coding in my own local branch and after a while I am ready to commit. Meanwhile, those 10 commits have been reviewed and merged, and now when I want to merge my branch back to master I get merge conflicts. Either I discover these merge conflicts when my branch is merged back to master, or when I pull mid-way to update my local branch. To be a productive engineer after the migration it will *not* be sufficient to have only mastered the git clone, git pull, git push, git add and git commit commands. These are the basic git commands. 
Two overall recommendations: 1) The Git / Gerrit combination means that you will have to understand git rebase, git commit --amend, git bisect and git cherry-pick. This is advanced Git usage and it will make the learning curve steeper. I think we need to spend more time on training. I have been looking for good tutorials about Git + Gerrit in practice and I haven't been able to find any, but maybe other people have better Google Fu skills (I think we are looking for advanced tutorials, not just cloning and pulling, but also merging, bisect and cherry-pick). 2) We need to come up with a smarter way of determining how to approach the code-review backlog. Three overall strategies come to mind: a) random: just pick a commit; b) time-based: pick either the oldest or the youngest commit; c) by 'impact' of the commit. Strategies a) and b) do not require anything but are less suited for a gated-trunk model. Option c) could be something where we construct a graph of the codebase, determine the most central files (hubs), and sort commits by centrality in this graph. The graph only needs to be reconstructed after major refactoring, or every month or so. Obviously, this requires a bit of coding and I don't have formal proof that this will actually reduce the pain, but I am hopeful. If constructing a graph is too cumbersome then we can sort by the number of affected files in a commit as a proxy (see the sketch below). If we cannot come up with a c) strategy then the only real option is to make sure that the queue is as short as possible. Best, Diederik
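A minimal sketch of the affected-files proxy; the pending changesets below are made up:

  # Hypothetical pending changesets: (change id, number of affected files).
  pending = [
      ("I1a2b3c", 2),
      ("I4d5e6f", 41),  # touches many files: review first to limit conflicts
      ("I7a8b9c", 7),
  ]

  # Review wide-reaching commits first so they merge before they go stale.
  for change_id, touched in sorted(pending, key=lambda c: c[1], reverse=True):
      print(change_id, touched)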
Re: [Wikitech-l] Proposed removal of some API output formats
Hi, Andre Engels did some analysis of the type of API formats used. The data is from a single random Sunday in late 2011:
1997267 application/json
314285 text/xml
171259 -
68358 application/vnd.php.serialized
55549 text/html
34680 text/javascript
8907 application/x-www-form-urlencoded
8882 application/xml
807 application/rsd+xml
467 text/text
105 application/x-www-form-urlencoded;
18 application/yaml
1 multipart/form-data;
yaml is used for the query and parse API actions. On this particular day, the following services used yaml: http://www.huddba.cz, corporama.com, reftag.appspot.com. Thank you Andre! Best, Diederik On Wed, Feb 8, 2012 at 7:45 PM, Roan Kattouw roan.katt...@gmail.com wrote: On Wed, Feb 8, 2012 at 11:42 PM, Tim Starling tstarl...@wikimedia.org wrote: What are the other problems? I'm not sure what Max is referring to, other than the fact that I hate XML (or at least using XML for this API) and generally don't like the fact that we have to support so many formats. As I said on Bugzilla earlier today, if I ever were to rewrite the API from scratch it'd be JSON-only. However, we can't actually get rid of XML realistically. * YAML - we don't serve real YAML anyway, currently it's just a subset of JSON. YAML is just a few harmless lines of code, why would you want to remove it? Yeah that can probably stay, it's not worth breaking anything over. Roan ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Welcome, Andrew Otto - Software Developer for Analytics
Welcome Andrew! Super excited to have you joining us! Diederik Sent from my iPhone On 2012-01-06, at 13:13, Sumana Harihareswara suma...@wikimedia.org wrote: On 01/06/2012 01:08 PM, Rob Lanphier wrote: We're really excited to have Andrew on board to help bring some systems rigor to our data gathering process. Our current data mining regime involves a few pieces of lightweight data gathering infrastructure (e.g. udp2log), a combination of one-off special purpose log crunching scripts, along with other scripts that started their lives as one-off special purpose scripts, but have gradually become core infrastructure. Most of these scripts have single maintainers, and there is a lot of duplication of effort. In addition, the systems have a nasty tendency to break at the least opportune times. Andrew's background bringing sanity to insane environments will be enormously helpful here. (See episode S10E07, The Shadow Scripts, https://blog.wikimedia.org/2011/10/31/data-analytics-at-wikimedia-foundation/ )* Andrew is based out of Virginia, but is still traveling the world. Right now, you'll find him in New York City. Please join me in welcoming Andrew to the team! I congratulated him IN PERSON five minutes ago, because we're coworking today. There's another New Yorker now, yay! -- Sumana Harihareswara Volunteer Development Coordinator Wikimedia Foundation * I am being silly and acting as though this blog entry were an episode of a science fiction TV show. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Git migration progress for MW core
Hi Chad, Reposurgeon (http://catb.org/~esr/reposurgeon/ ) might be a useful tool to help fix the svn history. Best, Diederik On Tue, Dec 13, 2011 at 11:47 AM, Chad innocentkil...@gmail.com wrote: On Tue, Dec 13, 2011 at 11:44 AM, Chad innocentkil...@gmail.com wrote: Couple of caveats (things I'm gonna try and fix): * Permissions aren't sorted yet, so it's only supporting anonymous clones, no pushing yet. * The revision graph is crazy. svn:mergeinfo is unreliable and we're pretty much unable to build a cohesive history without a *lot* of manual labor. Right now I'm thinking of just dropping the mergeinfo so the branches look like linear graphs cherry picking from master. Not perfect, but less annoying than now. Also there's two stupid commits at the head of master due to my mistake when initially pushing the repo. That won't happen again on subsequent tests or the real conversion. -Chad ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Mediawiki 2.0
I think that the current version numbering system is confusing: incremental version increases from 1.15 to 1.16 to 1.17 to 1.18, etc., suggest to most people minor changes with no compatibility implications. This is not the case with MW. The Chrome version numbering is the other extreme, releasing a major version increment every 6 weeks. In the end I think that a version system should give an idea of how much has changed under the hood. Just my 2 cents. Diederik On Thu, Dec 8, 2011 at 4:19 AM, Tim Starling tstarl...@wikimedia.org wrote: On 08/12/11 05:45, Dan Nessett wrote: On Wed, 07 Dec 2011 12:54:22 +1100, Tim Starling wrote: On 07/12/11 12:34, Dan Nessett wrote: On Wed, 07 Dec 2011 12:15:41 +1100, Tim Starling wrote: How many servers do you have? 3. It would help to get it down to 2. I assume my comments apply to many other small wikis that use MW as well. Most operate on a shoe string budget. You should try running MediaWiki on HipHop. See http://www.mediawiki.org/wiki/HipHop It's not possible to pay developers to rewrite MediaWiki for less than what it would cost to buy a server. But maybe getting a particular MW installation to run on HipHop with a reduced feature set would be in the same order of magnitude of cost. -- Tim Starling Are there any production wikis running MW over HipHop? No. There are very few test installations, let alone production installations. But isn't it exciting to break new ground? -- Tim Starling ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Call Graphs in MediaWiki Documentation
-1. Personally, I like them because they give me a quick overview of the inter-dependencies and how methods relate to each other, and so I guess that for other 'newbies' this helps in getting through the learning curve faster. Diederik On Thu, Dec 8, 2011 at 12:51 PM, Yuvi Panda yuvipa...@gmail.com wrote: Why do we have callgraph images in the documentation? I can't understand how they are useful, and they eat bandwidth (+ screenspace) unnecessarily. Is there a reason for their existence? Can we get rid of them? Check this for an example: http://svn.wikimedia.org/doc/classLinker.html -- Yuvi Panda T http://yuvi.in/blog ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Dropping the 'LATER' resolution in Bugzilla
But then the bug should be NEW; nobody is checking for a bug that is marked LATER. I mentioned WORKSFORME because I suspect that some of the LATER bugs have been resolved by now. Diederik On Tue, Nov 29, 2011 at 1:59 PM, Chad innocentkil...@gmail.com wrote: On Tue, Nov 29, 2011 at 1:45 PM, Diederik van Liere dvanli...@gmail.com wrote: Hi folks, Currently, we have a 'LATER' resolution in Bugzilla, it contains 339 bug reports over all the products, see: https://bugzilla.wikimedia.org/buglist.cgi?query_format=advancedlist_id=57731resolution=LATERproduct=CiviCRMproduct=Cortadoproduct=dbzip2product=Kate%27s%20Toolsproduct=Logwoodproduct=MediaWikiproduct=MediaWiki%20extensionsproduct=mwdumperproduct=mwEmbedproduct=Wikimediaproduct=Wikimedia%20Mobileproduct=Wikimedia%20Toolsproduct=Wiktionary%20toolsproduct=XML%20Snapshots The question is, when is LATER? Technically, these bugs are not open and so nobody will ever see them again and that's how they will be forgotten. To me, it seems that bugs that are labeled LATER should either be labeled: 1) WONTFIX, which I guess is the majority of these bugs 2) WORKSFORME, I am sure some things have been fixed 3) NEW, it is a real bug / feature request. LATER means we can't or won't do it (right now) but that is likely to change in the future. WONTFIX implies no, and this is not likely to change. WORKSFORME is unrelated. -Chad -- Check out my about.me profile: http://about.me/diederik ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Dropping the 'LATER' resolution in Bugzilla
I agree; currently LATER acts as a black hole and there is no structured process to re-evaluate these kinds of bugs. I have done a lot of reading of these bugs and many were filed 3 to 5 years ago. I think it's better to say WONTFIX than to suggest that this is something that is going to be fixed. It is about expectation management :) On Tue, Nov 29, 2011 at 2:53 PM, Merlijn van Deen valhall...@arctus.nl wrote: On 29 November 2011 19:45, Diederik van Liere dvanli...@gmail.com wrote: The question is, when is LATER? Technically, these bugs are not open and so nobody will ever see them again and that's how they will be forgotten. I would interpret 'LATER' as 'this bug should be re-evaluated after a certain period of time'. Following this train of thought, a LATER bug should have a re-evaluation date planned, after which it is changed back to NEW. This probably is not possible, but I think it makes sense to change LATER bugs to NEW after, say, a year or so. Merlijn -- Check out my about.me profile! http://about.me/diederik ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Dropping the 'LATER' resolution in Bugzilla
So today I have read about 100 LATER-marked bug reports, and I do think we need the LATER resolution, but I would suggest limiting its use to only those bugs where an external constituent, either the Wikipedia community or a third-party software developer, needs to take an action and *then* we need to actually follow up on that. So this would, IMHO, exclude the following types of bug reports: 1) We do not currently have enough resources (not a good reason to label it LATER) 2) A bug that is dependent on another bug (not a good reason to label it LATER) 3) Bug reports that only depend on upstream but do not require any action after the upstream fix (these should not be labeled LATER either). I am not sure how to handle bug reports that require a major architectural overhaul; I'm not a big fan of LATER but not quite sure if there is a better alternative. Best, Diederik On 2011-11-29, at 8:35 PM, Jay Ashworth wrote: - Original Message - From: Mark A. Hershberger mhershber...@wikimedia.org Jay Ashworth j...@baylink.com writes: Do we have a Target release in our BZ? We've begun using Milestones in Bugzilla for this. One of the milestones is Mysterious Future. I think you should feel free to use that instead of LATER. I love this, and am promptly stealing it for my own. -- j -- Jay R. Ashworth Baylink j...@baylink.com Designer The Things I Think RFC 2100 Ashworth Associates http://baylink.pitas.com 2000 Land Rover DII St Petersburg FL USA http://photo.imageinc.us +1 727 647 1274 ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Northern Soto Wikipedia
It works on Safari, but it definitely gives a backtrace error on Firefox 7. Diederik Sent from my iPhone On 2011-11-05, at 9:56, Amir E. Aharoni amir.ahar...@mail.huji.ac.il wrote: 2011/11/5 Andre Engels andreeng...@gmail.com: There seems to be a Northern Soto Wikipedia at http://nso.wikipedia.org, at least that's what http://incubator.wikimedia.org/wiki/Wp/nso claims. However, when I go to that site I see the following text: Unstub loop detected on call of $wgLang->getCode from MessageCache::get Backtrace: ... It works for me. Can you try again? -- Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי http://aharoni.wordpress.com “We're living in pieces, I want to live in peace.” – T. Moore ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Northern Soto Wikipedia
I am running Firefox 7.0.1 on Mac OS X Snow Leopard (10.6.8) and it gives a backtrace error. Diederik On Sat, Nov 5, 2011 at 10:15 AM, Ole Palnatoke Andersen palnat...@gmail.com wrote: On Sat, Nov 5, 2011 at 2:56 PM, Amir E. Aharoni amir.ahar...@mail.huji.ac.il wrote: 2011/11/5 Andre Engels andreeng...@gmail.com: There seems to be a Northern Soto Wikipedia at http://nso.wikipedia.org, at least that's what http://incubator.wikimedia.org/wiki/Wp/nso claims. However, when I go to that site I see the following text: Unstub loop detected on call of $wgLang->getCode from MessageCache::get Backtrace: ... It works for me. Can you try again? Windows Vista: Chrome 15.0.874.106: Same experience as Andre. Firefox 6.0.1, Opera 11.50, Safari 5.0.5, IE8: Same as Amir. - Ole -- http://palnatoke.org * @palnatoke * +4522934588 -- Check out my about.me profile! http://about.me/diederik ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] lost in sites for coding challenge mobile
I think that is true, unless you want to upload a fair-use image: that is not allowed on Commons, but it is on some Wikipedias, like the English one. Diederik Sent from my iPhone On 2011-10-24, at 8:23, Greg DeKoenigsberg greg.dekoenigsb...@gmail.com wrote: This is a good question. Simone sent it to me privately, and it occurred to me that the answer was sufficiently non-obvious that I asked him to report to wikitech-l. In looking at this page: http://en.wikipedia.org/wiki/Wikipedia:Files_for_upload ...it seems as though uploading to Commons is the preferred option. Is that right? --g On Mon, Oct 24, 2011 at 7:14 AM, Simone simonelocc...@gmail.com wrote: I am lost in sites: when I upload a picture, where must it go? To *.wikipedia.org or to commons.wikimedia.org? I didn't understand... Another thing is that to get the token the user must be logged in, and I have seen that every project has a different login; for example, my login on it.wikipedia.org is different from the one on en.wikipedia.org... So the user must first choose the domain where they want to log in and then get the token to upload the contents... Thanks for your attention, Simo ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] page view stats redux
This is really cool! Thanks Ariel and team for making this available. Best, Diederik On Thu, Sep 15, 2011 at 5:16 PM, MZMcBride z...@mzmcbride.com wrote: Ariel T. Glenn wrote: I think we finally have a complete copy from December 2007 through August 2011 of the pageview stats scrounged from various sources, now available on our dumps server. See http://dumps.wikimedia.org/other/pagecounts-raw/ This is a great step in the right direction! Thanks! MZMcBride -- Check out my about.me profile! http://about.me/diederik ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table (discussing r94289)
Thanks for moving the page. Diederik On 2011-09-04, at 3:29 PM, Krinkle wrote: 2011/9/4 MZMcBride z...@mzmcbride.com Diederik van Liere wrote: I've suggested to generate bulk checksums as well but both Brion and Ariel see the primary purpose of this field to check the validity of the dump generating process and so they want to generate the checksums straight from the external storage. [...] PS: not sure if this proposal should be on strategy or mediawiki... I think standard practice nowadays is a subpage of http://www.mediawiki.org/wiki/Requests_for_comment. MZMcBride Indeed. Moved: http://mediawiki.org/wiki/Requests_for_comment/Database_field_for_checksum_of_page_text – Krinkle ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table (discussing r94289)
Hi, I've suggested generating bulk checksums as well, but both Brion and Ariel see the primary purpose of this field as checking the validity of the dump-generating process, and so they want to generate the checksums straight from the external storage. In a general sense, there are two use cases for this new field: 1) Checking the validity of the XML dump files 2) Identifying reverts. I have started to work on a proposal for deployment which, while incomplete, might be a good starting point for further planning the deployment. I have been trying to come up with some back-of-the-envelope calculations about how much time and space it would take, but I don't have all the required information yet to come up with reasonable estimates. You can find the proposal here: http://strategy.wikimedia.org/wiki/Proposal:Implement_and_deploy_checksum_revision_table I want to thank Brion and Asher for giving feedback on prior drafts. Please feel free to improve this proposal. Best, Diederik PS: not sure if this proposal should be on strategy or mediawiki... On 2011-09-03, at 7:16 AM, Daniel Friesen wrote: On 11-09-02 09:33 PM, Rob Lanphier wrote: On Fri, Sep 2, 2011 at 5:47 PM, Daniel Friesen li...@nadir-seen-fire.com wrote: On 11-09-02 05:20 PM, Asher Feldman wrote: When using for analysis, will we wish the new columns had partial indexes (first 6 characters?) Bug 2939 is one relevant bug to this; it could probably use an index. [1] https://bugzilla.wikimedia.org/show_bug.cgi?id=2939 My understanding is that having a normal index on a table the size of our revision table will be far too expensive for db writes. ... Rob We've got 5 normal indexes on revision: - A unique int+int - A binary(14) - An int+binary(14) - Another int+binary(14) - And a varchar(255)+binary(14) For that bug, a (rev_page,rev_sha1) or (rev_page,rev_timestamp,rev_sha1) index may do. -- ~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name] ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
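To make use case 1 concrete, here is a minimal Python sketch of how a dump could be validated against such a column. The names and the choice of SHA1-as-hex are illustrative assumptions; the thread has not settled on MD5 vs SHA1, let alone an encoding:

    import hashlib

    def revision_checksum(text):
        # Hash of the UTF-8 encoded revision text. SHA1/hex is an
        # assumption here, not the agreed-upon format.
        return hashlib.sha1(text.encode('utf-8')).hexdigest()

    def verify_dump(dump_revisions, stored_checksums):
        # dump_revisions: iterable of (rev_id, text) parsed from an XML dump.
        # stored_checksums: dict rev_id -> checksum generated straight from
        # external storage (the new column). Returns rev_ids that mismatch.
        return [rev_id for rev_id, text in dump_revisions
                if stored_checksums.get(rev_id) != revision_checksum(text)]

Any rev_id coming back from verify_dump would point at a revision whose dump text does not round-trip against external storage.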
[Wikitech-l] Adding MD5 / SHA1 column to revision table (discussing r94289)
Hi! I am starting this thread because Brion's revision r94541 [1] reverted r94289 [0], stating core schema change with no discussion. Bugs 21860 [2] and 25312 [3] advocate for the inclusion of a hash column (either MD5 or SHA1) in the revision table. The primary use case of this column will be to assist in detecting reverts; I don't think that data integrity is the primary reason for adding this column. The huge advantage of having such a column is that it will no longer be necessary to analyze full dumps to detect reverts; instead you can look for reverts in the stub dump file by looking for the same hash within a single page. The fact that there is a theoretical chance of a collision is not very important IMHO; it would just mean that in very rare cases in our research we would flag an edit as reverted while it's not. The two bug reports contain quite long discussions, and this feature has also been discussed internally quite extensively, but oddly enough that discussion hasn't happened yet on the mailing list. So let's have a discussion! [0] http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94289 [1] http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94541 [2] https://bugzilla.wikimedia.org/show_bug.cgi?id=21860 [3] https://bugzilla.wikimedia.org/show_bug.cgi?id=25312 Best, Diederik ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
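For illustration, the revert detection described above could look like the following rough Python sketch, assuming the stub-dump revisions have already been grouped per page and sorted chronologically:

    def find_identity_reverts(page_revisions):
        # page_revisions: list of (rev_id, hash) for ONE page, in
        # chronological order. An edit whose hash matches an earlier
        # revision of the same page restores that exact text.
        first_seen = {}  # hash -> rev_id that first produced this text
        reverts = []
        for rev_id, digest in page_revisions:
            if digest in first_seen:
                reverts.append((rev_id, first_seen[digest]))
            else:
                first_seen[digest] = rev_id
        return reverts

As noted above, a hash collision would (very rarely) flag an edit as a revert when it is not; for research purposes that is acceptable noise.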
[Wikitech-l] Changing XML Wikipedia Schema to Enable Smaller Incremental Dumps that are Hadoop ready
Hi! Over the last year, I have been using the Wikipedia XML dumps extensively. I used them to conduct the Editor Trends Study [0], and the Summer Research Fellows [1] and I have used them in the last three months during the Summer of Research. Based on those experiences, I am proposing some changes to the current XML schema. The current XML schema presents a number of challenges, both for the people who are creating dump files and for the people who are consuming them. Challenges include: 1) The embedded structure of the schema (a single page tag with multiple revision tags) makes it very hard to develop an incremental dump utility 2) A lot of post-processing is required. 3) By storing the entire text for each revision, the dump files are getting so large that they become unmanageable for most people. 1. Denormalization of the schema Instead of having a page tag with multiple revision tags, I propose to have just revision tags. Each revision tag would include a page_id, page_title, page_namespace and page_redirect tag. This denormalization would make it much easier to build an incremental dump utility: you only need to keep track of the final revision of each article at the moment of dump creation, and then you can create a new incremental dump continuing from the last dump. It would also be easier to restore a dump process that crashed. Finally, tools like Hadoop would have a much easier time handling this XML schema than the current one. 2. Post-processing of data Currently, a significant amount of time is required for post-processing the data. Some examples include: * The title includes the namespace, so excluding pages from a particular namespace requires generating a separate namespace variable. In particular, focusing on the main namespace is tricky, because that can only be done by checking whether a page does not belong to any other namespace (see bug https://bugzilla.wikimedia.org/show_bug.cgi?id=27775). * The redirect tag currently is either True or False; more useful would be the article_id of the page to which a page is redirected. * Revisions within a page are sorted by revision_id, but they should be sorted by timestamp. The current ordering makes it even harder to generate diffs between two revisions (see bug https://bugzilla.wikimedia.org/show_bug.cgi?id=27112) * Some useful variables in the MySQL database are not yet exposed in the XML files. Examples include: - Length of revision (part of MediaWiki 1.17) - Namespace of article 3. Smaller dump sizes The dump files continue to grow as the text of each revision is stored in the XML file. Currently, the uncompressed XML dump files of the English Wikipedia are about 5.5 TB in size, and this will only continue to grow. An alternative would be to replace the text tag with text_added and text_removed tags. A page can still be reconstructed by patching multiple text_added and text_removed tags, and we can provide a simple script / tool that reconstructs the full text of an article up to a particular date / revision id (a rough sketch follows below). This has two advantages: 1) The dump files will be significantly smaller 2) It will be easier and faster to analyze the types of edits: who is adding a template, who is wikifying an edit, who is fixing spelling and grammar mistakes. 4. Downsides This suggestion is obviously not backwards compatible and it might break some tools out there. I think that the upsides (incremental backups, Hadoop-readiness and smaller sizes) outweigh the downside of being backwards incompatible.
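To illustrate the reconstruction idea from point 3, here is a rough Python sketch. The patch encoding is hypothetical (the proposal above does not fix one); it assumes each revision carries line-based operations derived from its text_added / text_removed tags:

    def reconstruct(deltas, upto_rev_id):
        # deltas: revisions of one page in chronological order, each a
        # (rev_id, ops) pair where ops is a list of (offset, n_removed,
        # added_lines) operations against the previous revision's lines.
        # Ops are applied back to front so earlier offsets stay valid.
        lines = []
        for rev_id, ops in deltas:
            for offset, n_removed, added_lines in sorted(
                    ops, key=lambda op: op[0], reverse=True):
                lines[offset:offset + n_removed] = added_lines
            if rev_id == upto_rev_id:
                break
        return '\n'.join(lines)

With per-page deltas like these, an incremental dump only needs to ship the revisions made since the previous dump, and full text is recovered on demand.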
The current way of dump generation cannot continue forever. [0] http://strategy.wikimedia.org/wiki/Editor_Trends_Study, http://strategy.wikimedia.org/wiki/March_2011_Update [1] http://blog.wikimedia.org/2011/06/01/summerofresearchannouncement/ I would love to hear your thoughts and comments! Best, Diederik ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] How can I get data to map our linguistic interconnectedness?
Dear Alec, Maybe the Community Department can help you out with your question. We are doing a number of research sprints this summer to map out different aspects of the Wikipedia communities; this sounds like a great question, and we have some researchers available to help write the queries. So please contact me and I'll hook you up with the right people. Best, Diederik On Thu, Jun 16, 2011 at 4:40 AM, Platonides platoni...@gmail.com wrote: Alec Conroy wrote: I think I can build you something if you give me appropriate values for the above definition. Cheers Excellent-- so striking while the iron is hot-- I see that [[Special:Statistics]] defines active as edited within the last 30 days. I'm open to however many users we can realistically get info on-- the more the merrier, at least until I run out of ram. :) My initial query may go something like Select users where lasttouched was within the last month and total edit counts are greater than 500. And then, adding in the requirement of a second project will narrow that pool. And then adding the constraint of a second project with a second language will narrow the pool even more. We're looking for the orphan community who have a lot of editors but little connection to English and Meta. I have added a small script at http://www.toolserver.org/~platonides/activeusers/activeusers.php to show active users per project and language. Requisites for appearing there are more than 500 edits (total) and at least one action (usually an edit) in the last month (since May 16, data is cached). Bots appear in the list. I'm still populating the data, but it should be completed by the time you read this. -- Check out my about.me profile! http://about.me/diederik ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Open call for Parser bugs
I love this idea! Diederik On Wed, Apr 6, 2011 at 5:21 PM, Mark A. Hershberger mhershber...@wikimedia.org wrote: Starting with this coming Monday's bug triage, I want to try and make sure the community's voice is heard. In order to do that, I've created the “triage” keyword in Bugzilla. Every week, I'll announce a theme and use this keyword to keep track of the bugs that will be handled in the meeting. As we discuss the bug, it will be modified, probably assigned to a developer, and the “triage” keyword removed. Some people may see this as bug-spam, but I'd like to keep the email notifications on so that people who have expressed an interest will know that we're giving the bugs some love. This week, I'm going to focus on Parser-related bugs. There are currently 10 bugs with the “triage” keyword applied. A bug triage meeting needs about 30 bugs, so I have room for about 20 more right now. I'll be adding to the list before Monday, but this is your chance to get WMF's developers talking about YOUR favorite parser bug by adding the “triage” keyword. I will reserve the right to remove the “triage” keyword — especially if the list becomes unwieldy, or if the bug has nothing to do with parsing — but I wanted to start to open up the triage process a bit more and begin to provide a way for the community to participate in these meetings. Mark. -- Check out my about.me profile! http://about.me/diederik ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Converting to Git?
The Python community recently switched to a DVCS, and they have documented their choice: the comparison covers Git, Mercurial and Bzr and shows the pluses and minuses of each. In the end, they went for Mercurial. Choosing a distributed VCS for the Python project: http://www.python.org/dev/peps/pep-0374/ Best, Diederik On Tue, Mar 22, 2011 at 3:47 PM, Krinkle krinklem...@gmail.com wrote: On March 22 2011, at 20:29 Mark Wonsil wrote: I haven't used git yet but after reading the excellent article that Rob Lanphier posted (http://hginit.com/00.html), I think I will. That article also explains why there wouldn't have to be as many updates to SVN as is done today. I don't think there's any doubt that git would work for Wikimedia but there would definitely be some workflow changes. That's probably the larger issue. Mark W. Another good read is http://whygitisbetterthanx.com/ -- Krinkle -- Check out my about.me profile! http://about.me/diederik ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Topic and category analyser
Please elaborate. Diederik Sent from my iPhone On 2011-03-03, at 16:12, Dávid Tóth 90010...@gmail.com wrote: Would it be useful to make a program that would create topic relations for each Wikipedia article based on the links and the distribution of semantic structures? ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] How users without programming skills can help
I am not following this line of reasoning: how can adding guidance / instructions on how to write a good bug report turn people away? In a previous life, I have studied the factors that shorten the time required to fix a bug. Bug reports that contain steps to reproduce are a significant predictor of a shorter time to fix. You can find the paper here: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1507233 A systematic lack of replies is also an issue, but this solution was not aimed at fixing that problem. On Mon, Feb 14, 2011 at 4:39 AM, Bryan Tong Minh bryan.tongm...@gmail.com wrote: On Mon, Feb 14, 2011 at 2:46 AM, Diederik van Liere dvanli...@gmail.com wrote: So maybe we can paste these 5 steps (or something similar) in the initial form used to file a bug report. This would increase the quality of bug reports and make it easier for bug triaging. Increase the quality perhaps, but also increase the barrier of reporting bugs, and that is something that is not very good imho. I don't think we have a systematic problem with bad bug reports. The systematic problem is the lack of replies from developers. Bryan -- Check out my about.me profile! http://about.me/diederik ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Migrating to GIT (extensions)
If I am not mistaken, Mercurial has better support for highly modularized open source software projects: you can use a Mercurial subrepository (which is very similar to an svn external and a git submodule). According to their manual: Subrepositories is a feature that allows you to treat a collection of repositories as a group. This will allow you to clone, commit to, push, and pull projects and their associated libraries as a group. See: http://mercurial.selenic.com/wiki/Subrepository http://mercurial.selenic.com/wiki/NestedRepositories Just my 2 cents. On Mon, Feb 14, 2011 at 2:18 AM, Siebrand Mazeland s.mazel...@xs4all.nl wrote: On 14-02-11 05:01, Daniel Friesen li...@nadir-seen-fire.com wrote: Ohh... if the translatewiki guys are looking for a dummy for streamlining support for extensions based in git in preparation for a git migration if we do so, I'd be happy to offer monaco-port up as an existing extension (well, skin) using git that could be used as a test for streamlining git support. ;) having monaco-port get proper i18n while it's still not up to a level I believe I want to commit it into svn yet wouldn't be a bad thing. With regards to i18n support it is not clear to me how translatewiki staff would deal with 100+1 commits to different repos every day if core and extensions would each be in individual repos. Can you please explain how Raymond would be working with Windows and Git in the proposed structure updating L10n for 100 extensions and MediaWiki core? How would translatewiki.net easily manage MediaWiki updates (diff review/commits)? I'm not particularly looking forward to having to jump through a huge series of hoops just to keep checkouts for single extensions small. If that is the real issue, extension distribution should get another look as this might indicate that ExtensionDistributor does not work as expected. I have currently checked out all of trunk, and for translatewiki.net we have a selective checkout of i18n files for extensions and we have a checkout for core and the installed extensions. The fragmentation and disorganisation/disharmony that will exist after creating 450 GIT repos instead of one Subversion repo as we currently have is also something I am not looking forward to. Source code management is now centralised, and correct me if I'm wrong, but we encourage developers to request commit access to improve visibility of their work and grow the community. Going distributed in the proposed way would hamper that, if I'm correct. I think the relative lower popularity of extensions that are maintained outside of svn.wikimedia.org is proof of this. I am not in favour of using GIT in the proposed way. I think core and extensions should remain in the same repo. Checkouts are for developers, and developers should get just all of it. Siebrand -- Check out my about.me profile! http://about.me/diederik ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] How users without programming skills can help
Maybe I am not expressing myself clearly: I am not talking about adding checkboxes, radio buttons or pulldown menus. I am saying that we could add the following text to the textarea field which contains the actual bug report: Please describe the steps to take to reproduce the problem: What is the expected result: What is the actual result: If you know which version you are using, or you have other information that you think might be helpful, please add it as well. You can also describe the problem in your own words without sticking to the above-mentioned questions. So, again, I am not saying we should add fields; we could add this text as the default text in the textarea so people have a bit more guidance when writing a bug report. No hard checks, nothing is mandatory. On Mon, Feb 14, 2011 at 10:22 AM, Amir E. Aharoni amir.ahar...@mail.huji.ac.il wrote: 2011/2/14 Diederik van Liere dvanli...@gmail.com: I am not following this line of reasoning: how can adding guidance / instructions on how to write a good bug report turn people away? It's very simple, really: a form with a lot of fields may turn people away. I know that it turns me away. How many people are like me in this regard? That is something that should be studied. I still do report bugs in Firefox, despite the many fields in the form, but I can easily imagine people who won't. In a previous life, I have studied the factors that shorten the time required to fix a bug. Bug reports that contain steps to reproduce are a significant predictor of a shorter time to fix. You can find the paper here: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1507233 That makes perfect sense, but that's the developer side of the question. I'm talking about the user side. -- Check out my about.me profile! http://about.me/diederik ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Making code review happen in 1.18
+1 to migrate to a DVCS On Sun, Feb 13, 2011 at 8:38 PM, Mark A. Hershberger mhershber...@wikimedia.org wrote: mhershber...@wikimedia.org (Mark A. Hershberger) writes: The solution I'm proposing is that we branch 1.18 immediately after the release of the 1.17 tarball. I want to give credit where it is due. Although I haven't seen him propose it here, this is, in fact, Robla's idea. He and I were discussing what needed to happen for 1.18 and it was his idea to branch 1.18 immediately after the release of the 1.17 tarball. Mark. -- Check out my about.me profile! http://about.me/diederik ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Roadmaps and getting and keeping devs
Maybe we can make the bugathon part of the Berlin hackathon? On Sun, Feb 13, 2011 at 4:03 PM, Ashar Voultoiz hashar+...@free.fr wrote: On 13/02/11 11:54, Roan Kattouw wrote: Bugzilla patches are another matter, yes, but I think making sure patches get reviewed can be a Bugmeister task. We get relatively few patches through Bugzilla these days anyway. Maybe once 1.17 is released, we should focus on the bugzilla patch queue and get it solved. Would probably keep us busy until June. Do we have any hack-a-ton planned? I can probably take a whole week of day-offs to participate and solve them. -- Ashar Voultoiz -- Check out my about.me profile! http://about.me/diederik ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] How users without programming skills can help
I think we can draw some inspiration from Mozilla's use of Bugzilla, and in particular the format they encourage users to follow when submitting a bug report: 1) Steps to reproduce 2) Expected result 3) Actual result 4) Reproducible (by bug reporter): always / sometimes 5) Version information, extensions installed, database used (this information is dependent on the skill level of the bug reporter, and maybe we can make this information easily retrievable if it's currently not easy to determine). So maybe we can paste these 5 steps (or something similar) into the initial form used to file a bug report. This would increase the quality of bug reports and make it easier for bug triaging. On Sun, Feb 13, 2011 at 8:28 PM, MZMcBride z...@mzmcbride.com wrote: Mark A. Hershberger wrote: Perhaps we could recruit some people from the he.wikipedia.org community to take problems reported (via the localized interface?) and reproduce them or act as a translator between developers and bug reporters? There is already some infrastructure for this kind of idea: https://lists.wikimedia.org/mailman/listinfo/wikitech-ambassadors I didn't know about this mailing list until a few days ago, but it's a start in building the bridge between MediaWiki development and (power-)users. MZMcBride -- Check out my about.me profile! http://about.me/diederik ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] How users without programming skills can help
That's exactly my point :) Most Firefox bug reporters are ordinary users, so if they are able to report a bug then MediaWiki users can do it as well, because they are basically the same group of Internet users. And again, my suggestion is not a hard requirement; it's about giving ordinary people a number of things they might want to think about when submitting a report. This certainly will not scare people away; in the worst case they will ignore the questions. On Sun, Feb 13, 2011 at 11:16 PM, Daniel Friesen li...@nadir-seen-fire.com wrote: Actually our users could be anyone who reads Wikipedia and notices there's something wrong with what MediaWiki is doing or thinks there is something about the ui we need to fix. They don't even have to be as advanced as a Firefox user... they could be a random human who doesn't even know they can install a browser other than Internet Explorer on their computer. If someone is already saying it's harder to report a bug to Mozilla about something they usually install themselves, I don't think we want reporting to be as hard when we have users who don't even know it's something they can install. ~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name] On 11-02-13 07:53 PM, Diederik van Liere wrote: Dear James, Amir and fellow wikimedia devs, I understand your concern and I am not suggesting that we should force a user to enter all Bugzilla fields, but add those 5 questions as a guideline in the free-text form. Reporters can use it when they feel uncertain what information we are looking for, but they are not forced to stick to any format in particular. Additionally, I think that MediaWiki users are as technologically advanced as Firefox users, so I don't think this will scare somebody away. If we really want to make it easier for people to file a bug then we should add a simple wizard to guide them through the process. In particular, choosing the right product and component can be quite confusing / intimidating for somebody new to MediaWiki. On Sun, Feb 13, 2011 at 9:43 PM, James Alexander jalexan...@wikimedia.org wrote: On 2/13/2011 8:46 PM, Diederik van Liere wrote: I think we can draw some inspiration from Mozilla's use of Bugzilla, and in particular the format they encourage users to follow when submitting a bug report: 1) Steps to reproduce 2) Expected result 3) Actual result 4) Reproducible (by bug reporter): always / sometimes 5) Version information, extensions installed, database used (this information is dependent on the skill level of the bug reporter, and maybe we can make this information easily retrievable if it's currently not easy to determine). So maybe we can paste these 5 steps (or something similar) into the initial form used to file a bug report. This would increase the quality of bug reports and make it easier for bug triaging. I can totally understand the idea behind this but I think Amir brings up the concern about this best: On 2/13/2011 5:56 PM, Amir E. Aharoni wrote: bugzilla.wikimedia.org is the tracker where I report more bugs than elsewhere. The second is bugzilla.mozilla.org. It's not because Firefox has fewer bugs (quite the contrary!) but because Mozilla's tracker requires me to fill in more fields, such as steps for reproduction. This may encourage detailed reporting that helps developers solve the bugs, but it may also discourage people from reporting them in the first place.
Gathering all that information on a bug report form could quite clearly make it easier to reproduce bugs and may make resolving them easier, but I worry that the harder and/or more complicated we make the reporting, the more likely we are to scare someone away from taking the time to file the bug (which we want). I'm not totally sure where the best balance there is. -- James Alexander Associate Community Officer Wikimedia Foundation jalexan...@wikimedia.org +1-415-839-6885 x6716 -- Check out my about.me profile! http://about.me/diederik ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Roadmaps and getting and keeping devs
For the last few months I have been going through Bugzilla, and what strikes me is that we are not using it as efficiently as other communities do. In particular, there is little follow-up to reported problems (as Leo mentioned as well). In the short term, I think we can have a bugathon to clean up the bug list a little bit and re-energize some community members: have a bugathon where we label a lot of bugs as appropriate bugathon bugs that need either: a) a test of a patch / an update of a patch to the recent svn version b) confirmation / replication of new / unconfirmed bugs. We can provide a simple ready-to-go wiki installation for people to use for bug triaging, and that way we can re-energize developers and clean up some of the backlog of bugs. Is this something that we should be doing? On Sat, Feb 12, 2011 at 3:41 PM, Leo diebu...@gmail.com wrote: On Saturday, 12 February 2011 at 17:55, David Gerard wrote: How to grow your contributor community (and how to decimate it): http://www.codesimplicity.com/post/open-source-community-simplified/ and imo, wikimedia fails at a lot of these points: *Quote: Respond to contributions immediately. This is what I think bugs me the most. There are heaps of bugs which have had patches attached for months or years. For newcomers, who maybe spent a lot of time on these, it's just rude to neither commit them nor explain why they can't be committed immediately. *Create and document communication channels. This has been talked about before, and maybe it did indeed get a little better. Leo ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Roadmaps and getting and keeping devs
I think one way that non-technical people can help is by trying to replicate bugs: follow the steps as described in the bug report and see whether you get the same malfunction or not. That would be a great help, as it weeds out invalid bug reports. Sent from my iPhone On 2011-02-12, at 17:26, phoebe ayers phoebe.w...@gmail.com wrote: On Sat, Feb 12, 2011 at 1:11 PM, Ryan Lane rlan...@gmail.com wrote: Have a bugathon where we label a lot of bugs as appropriate bugathon bugs that need either: a) a test of a patch / an update of a patch to the recent svn version b) confirmation / replication of new / unconfirmed bugs. We can provide a simple ready-to-go wiki installation for people to use for bug triaging, and that way we can re-energize developers and clean up some of the backlog of bugs. Is this something that we should be doing? This is something we do at hack-a-tons. I don't remember the number of bugs smashed at the last one, but it was a decent number. I believe the next hack-a-ton is in Berlin, soon. I'm not sure if they have this planned. It's apparently GLAM focused (which excludes devs like me), so I'd imagine not, unless the bugs targeted are GLAM related. - Ryan Lane I'm curious: is there a way that non-technical people can help with sprints like this? Documentation-building, maybe? Something else? I'm interested in development sprints, bugathons etc. that involve both technical & non-technical people; I've been involved in a few and it's pretty fun. But I don't know how many useful ways non-programmers & non-developers can help. -- phoebe -- * I use this address for lists; send personal messages to phoebe.ayers at gmail.com * ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] Matching main namespace articles with associated talk page
Dear devs, I am wondering whether the MediaWiki db contains a foreign key relationship between a main namespace article and the associated talk page (if present). Having this information would greatly simplify analytics projects that monitor editor behaviour and study revert behaviour (among other topics). Currently, I am manually matching these two sets of pages by matching titles. I have two questions: 1) If this foreign key does not exist, would it be worthwhile to create it? 2) If this foreign key does exist, what would it take to expose it in the XML dumps? Best regards, Diederik ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Matching main namespace articles with associated talk page
Yes, manually matching is fairly simple, but in the worst case you need to iterate over n-1 talk pages (where n is the total number of talk pages of a Wikipedia) to find the talk page that belongs to a given article when using the dump files. Hence, if the dump file contained, for each article, a tag with the talk page id, then it would significantly reduce the processing time. Diederik On Sat, Jan 8, 2011 at 11:39 AM, Bryan Tong Minh bryan.tongm...@gmail.com wrote: On Sat, Jan 8, 2011 at 5:32 PM, John phoenixoverr...@gmail.com wrote: it's just a matter of matching page titles: if there is a page in namespace 0 and a page in namespace 1 (article and article talk) with the same title, they go together. it's fairly simple To expand John's comment, the talk page is always the page with the same title, but with a namespace number 1 higher. Bryan -- Check out my about.me profile! http://about.me/diederik ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
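For what it's worth, that matching can already be done in a single pass over the stub dump with a dictionary, using the namespace+1 rule Bryan describes; a small Python sketch (it assumes you have already extracted (page_id, ns, title) triples, and that dump titles carry the namespace prefix):

    def pair_articles_with_talk(stub_pages):
        # stub_pages: iterable of (page_id, ns, title) triples from a stub
        # dump. Returns a dict: article page_id -> talk page_id (or None).
        articles, talk = {}, {}
        for page_id, ns, title in stub_pages:
            if ns == 0:
                articles[title] = page_id
            elif ns == 1:
                # dump titles include the namespace prefix, e.g. "Talk:Foo"
                talk[title.split(':', 1)[1]] = page_id
        return {page_id: talk.get(title)
                for title, page_id in articles.items()}

This replaces the O(n^2) title scan with one pass and a constant-time lookup per page, which already brings the processing time down considerably.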
Re: [Wikitech-l] Update on 1.17
The same error is given for: * Russian * Japanese * Italian * Arabic (ar is the language code) Best, Diederik 2011/1/7 Bryan Tong Minh bryan.tongm...@gmail.com: On Fri, Jan 7, 2011 at 4:37 PM, Roan Kattouw roan.katt...@gmail.com wrote: 2011/1/7 Bryan Tong Minh bryan.tongm...@gmail.com: Also FR seems to be unconditionally enabled, also on wikis that do not have the tables present. Which wikis would those be? Rob says he ran update.php so all the tables should be there. http://prototype.wikimedia.org/deployment-nl/Hoofdpagina Database error A syntax error has occurred in the database query. This may indicate a bug in the software. The last query sent to the database was: (SQL query hidden) from within the function “FlaggedRevision::newFromStable”. The database returned the error “1146: Table 'nlwiki.flaggedpages' doesn't exist (localhost)”. -- Check out my about.me profile! http://about.me/diederik ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Parallelizing export dump (bug 24630)
To continue the discussion on how to improve the performance: would it be possible to distribute the dumps as a 7z / gz / other-format archive containing multiple smaller XML files? It's quite tricky to split a very large XML file into smaller valid XML files, and if the dumping process is already parallelized then we do not have to cat the different XML files into one large XML file; instead we can distribute multiple smaller, parallel-generated files. Best, Diederik On 2010-12-16, at 7:02 PM, Ariel T. Glenn wrote: On Fri, 17-12-2010 at 00:52 +0100, Platonides wrote: Roan Kattouw wrote: I'm not sure how hard this would be to achieve (you'd have to correlate blob parts with revisions manually using the text table; there might be gaps for deleted revs because ES is append-only) or how much it would help (my impression is ES is one of the slower parts of our system and reducing the number of ES hits by a factor 50 should help, but I may be wrong), maybe someone with more relevant knowledge and experience can comment on that (Tim?). Roan Kattouw (Catrope) ExternalStoreDB::fetchBlob() is already keeping the last one to optimize repeated accesses to the same blob (we would probably want a bigger cache for the dumper, though). On the other hand, I don't think the dumpers should be doing the store of textid contents in memcached (Revision::loadText) since they are filling them with entries useless for the users queries (having a different locality set), useless for themselves (since they are traversing the full list once) and -even assuming that the memcached can happily handle it and no other data is affected by it- the network delay makes it a non-free operation. Ariel, do you have in wikitech the step-by-step list of actions to set up a WMF dump server? I always forget about which scripts are being used and what does each of them do. Can xmldumps-phase3 be removed? I'd prefer that it uses the release/trunk/wmf-deployment; an old copy is a source for problems. If additional changes are needed (it seems unpatched), the appropriate hooks should be added in core. Most backups run off of trunk. The stuff I have in my branch is the parallel stuff for testing. http://wikitech.wikimedia.org/view/Dumps details the various scripts. No, xmldumps-phase3 can't be removed yet. I have yet to make the changes I need to that code (and I won't make them in core immediately; they need to be tested thoroughly first before being checked in). Once I think they are ok, then I will fold them into trunk. It will be a while yet. Ariel ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
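Splitting on page boundaries is doable with a streaming pass, though it shows why shipping an archive of many files is attractive in the first place. A simplified Python sketch; it assumes, as in the current dumps, that <page> and </page> sit on lines of their own, and it ignores compression:

    def split_dump(path, pages_per_chunk=100000, prefix='chunk'):
        # Stream a monolithic XML dump into smaller, individually valid
        # XML files of at most pages_per_chunk <page> elements each.
        header, out, chunk_no, pages = [], None, 0, 0

        def start_chunk():
            nonlocal out, chunk_no, pages
            if out is not None:
                out.write('</mediawiki>\n')
                out.close()
            out = open('%s-%04d.xml' % (prefix, chunk_no), 'w', encoding='utf-8')
            out.write(''.join(header))  # <mediawiki ...> plus <siteinfo>
            chunk_no += 1
            pages = 0

        with open(path, encoding='utf-8') as src:
            for line in src:
                stripped = line.strip()
                if out is None and stripped != '<page>':
                    header.append(line)  # still inside the dump header
                    continue
                if stripped == '<page>' and (out is None or pages >= pages_per_chunk):
                    start_chunk()
                if stripped == '</mediawiki>':
                    break
                out.write(line)
                if stripped == '</page>':
                    pages += 1
        if out is not None:
            out.write('</mediawiki>\n')
            out.close()

Each chunk repeats the original header, so every file parses on its own; going the other way (treating a directory of such chunks as one logical dump) is just a matter of reading them in sequence.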
Re: [Wikitech-l] Parallelizing export dump (bug 24630)
Which dump file is offered in smaller sub-files? On Sun, Dec 19, 2010 at 6:02 PM, Platonides platoni...@gmail.com wrote: Diederik van Liere wrote: To continue the discussion on how to improve the performance: would it be possible to distribute the dumps as a 7z / gz / other-format archive containing multiple smaller XML files? It's quite tricky to split a very large XML file into smaller valid XML files, and if the dumping process is already parallelized then we do not have to cat the different XML files into one large XML file; instead we can distribute multiple smaller, parallel-generated files. Best, Diederik That has already been done for enwiki. -- Check out my about.me profile! http://about.me/diederik ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l