Re: [Wikitech-l] mod_pagespeed and MediaWiki
On Mon, Jul 15, 2013 at 7:57 PM, Ilya Grigorik igrigo...@google.com wrote:

+asher (woops, forgot to cc :))

On Mon, Jul 15, 2013 at 7:54 PM, Ilya Grigorik igrigo...@google.com wrote:

Anyway, I've already started working on something I noticed in mod_pagespeed - a much better JS minification, expect updates soon :)

Not to discourage you from doing so.. but JS minification is not the problem. In fact, if you look at the side-by-side content breakdown of the original (http://www.webpagetest.org/breakdown.php?test=130715_82_3c03a9eb9339dcf8d3e82ed43ad2998d&run=3&cached=0) and MPS-optimized (http://www.webpagetest.org/breakdown.php?test=130715_VZ_7748042f6f940ec663a43130cd597eee&run=4&cached=0) sites, you'll notice that MPS is loading 3 KB more of JS (because we add some of our own logic). We're not talking about applying missing gzip or minification.. To make the site mobile friendly, we're talking about structural changes to the page: eliminating blocking JavaScript, inlining critical CSS to unblock first render, deferring other assets until after the above-the-fold content is loaded, and so on. Those are the parts that MPS automates - the filmstrip (http://www.webpagetest.org/video/compare.php?tests=130715_82_3c03a9eb9339dcf8d3e82ed43ad2998d-l%3Aoriginal-r%3A3%2C130715_VZ_7748042f6f940ec663a43130cd597eee-l%3Amps-r%3A4%2C%2C%2C%2C&thumbSize=100&ival=100&end=visual) should speak for itself. (Note that the filmstrip shows first render at 2s instead of 1.6s, due to how frames are captured on mobile devices in WPT.)

These are the points I was trying to highlight from your presentation :) While there's room for further optimization after that, inlining above-the-fold CSS and deferring everything else, including additional content, seem like immediate gains we could start working on. The mobile dev team has already put work into being able to serve the above-the-fold content plus section headers in an initial request - we just need to make some changes to how we assemble pages to support inlining the CSS/JS required for that view, and separating out the rest. I think this can and should be delivered by MediaWiki / ResourceLoader / MobileFrontend by design, instead of via MPS, however. As Max noted, this would require an additional Varnish cache split, varying between devices that support this and those that don't. But the performance gain for supported devices should fully justify it, and we just invested in additional frontend cache capacity for mobile.

-Asher

___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] mod_pagespeed and MediaWiki
[cc'ing Joshua Marantz, who leads the mod_pagespeed effort at Google, and Ilya Grigorik, their developer advocate for page performance]

The principles behind mod_pagespeed, especially as they relate to mobile page load performance as outlined in http://bit.ly/mobilecrp, could themselves be implemented within MediaWiki. mod_pagespeed itself can't just be dropped in to do the job, and especially doesn't play nicely with the full-page edge caching WMF depends on; but it could be used as a development guide. For mobile performance especially, the critical points are:

* Everything needed to fully render the above-the-fold content should fit within 10 packets, given our current 10-packet TCP initial congestion window.
* Those <= 10 packets must be in the service of a single request.
* All CSS required by that above-the-fold view must be inline. It doesn't have to be all of the CSS required for the page overall.
* Same with JavaScript - anything not essential to above the fold should be deferred.

I can't think of any good reasons why this couldn't be implemented by MobileFrontend. Accomplishing all of what mod_pagespeed addresses for general MediaWiki use would likely involve a rewrite of ResourceLoader.

-Asher

On Fri, Jul 12, 2013 at 3:03 PM, Max Semenik maxsem.w...@gmail.com wrote: On 12.07.2013, 3:16 Max wrote: FYI, Google already sent us a sample config for this module optimized for our mobile site, I'm going to try it tomorrow. And here are the results of my research: https://www.mediawiki.org/wiki/User:MaxSem/mod_pagespeed Briefly, this is interesting stuff, but not usable on WMF, or on any other large MW installations either. -- Best regards, Max Semenik ([[User:MaxSem]])

___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
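As a rough sketch of the "inline only the above-the-fold CSS" point, here is what that could look like from MediaWiki's side. The file name, fallback module name, and 14 KB budget check (10 packets at roughly 1430 bytes of payload each) are assumptions for illustration, not actual MobileFrontend code:

```php
<?php
// Illustration only (hypothetical file and module names, not MobileFrontend code):
// inline just the above-the-fold CSS so the first response stays inside the
// ~14 KB (10 packets x ~1430 bytes of payload) initial congestion window.
// $out is an OutputPage instance, e.g. from a BeforePageDisplay hook.
$criticalCss = file_get_contents( "$IP/skins/common/mobile-critical.css" ); // hypothetical path

if ( strlen( $criticalCss ) < 14 * 1024 ) {
	// addInlineStyle() emits a <style> block in <head>, so first render
	// does not block on a separate stylesheet request.
	$out->addInlineStyle( $criticalCss );
} else {
	// Too big to inline within the packet budget; fall back to a linked,
	// cacheable stylesheet instead.
	$out->addModuleStyles( 'mobile.styles' ); // hypothetical module name
}
```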
Re: [Wikitech-l] Please Welcome Sean Pringle, our latest TechOps member
Welcome, Sean! It's great to have you on board.

On Monday, June 24, 2013, Ct Woo wrote:

Hi All, The Technical Operations team is pleased to announce that Sean Pringle joined us today (24th June, 2013). Among his duties, Sean will be attending to all aspects of the database layer, including management, monitoring, design, capacity, performance, and troubleshooting. Sean comes with vast experience in database technology and a development background with a specific focus on MySQL and MariaDB. He has held senior roles in database support, database administration, and technical writing with various companies including MySQL AB, Sun Microsystems, and SkySQL Ab. He also has fingers in a few non-profit and open-source projects scattered around the net.

Sean hails from Queensland, Australia, and while he travels around frequently, he inevitably always flees back to his down-under home. He confessed to being forever distracted by all things geek and technology related, though he can also be spotted behind a telescope on starry nights and with his nose in a book when the clouds roll in. To quote him, "I am excited to be joining the WMF, an opportunity which I see as an 11 on the awesomeness scale of 1 to 10!"

Please join us in welcoming Sean!

CT Woo Ken Snider

___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] Relaxing our TorBlock
https://twitter.com/ioerror/status/342922052841377793 Why not - would the patrol cost really be too high? ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Relaxing our TorBlock
Ah, thanks Sumana! On Friday, June 7, 2013, Sumana Harihareswara wrote: On 06/07/2013 08:43 AM, Asher Feldman wrote: https://twitter.com/ioerror/status/342922052841377793 Why not - would the patrol cost really be too high? Discussion from December: http://www.gossamer-threads.com/lists/wiki/wikitech/323006 Can we help Tor users make legitimate edits? -- Sumana Harihareswara Engineering Community Manager Wikimedia Foundation ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] [WikimediaMobile] Caching Problem with Mobile Main Page?
Faidon - thanks for the more accurate trackdown, and the fix!

On Sunday, May 5, 2013, Faidon Liambotis wrote:

On Fri, May 03, 2013 at 03:19:13PM -0700, Asher Feldman wrote: 1) Our multicast purge stream is very busy and isn't split up by cache type, so it includes lots of purge requests for images on upload.wikimedia.org. Processing the purges is somewhat cpu intensive, and I saw doing so once per varnish server as preferable to twice.

I believe the plan is to split up the multicast groups *and* to filter based on predefined regexps on the HTCP-PURGE layer, via the varnishhtcpd rewrite. But I may be mistaken; Mark and Brandon will know more.

There are multiple ways to approach making the purges sent to the frontends actually work, such as rewriting the purges in varnish, rewriting them before they're sent to varnish depending on where they're being sent, or perhaps changing how cached objects are stored in the frontend. I personally think it's all an unnecessary waste of resources and prefer my original approach.

Although the current VCL calls vcl_recv_purge after the rewrite step (and hence actually rewriting purges too), unless I'm mistaken this is actually unnecessary. The incoming purges match the way the objects are stored in the cache: both are without the .m. (et al) prefix, as normal desktop purges are matched with objects that had their URLs rewritten in vcl_recv. Handling purges after the rewrite step might be unnecessary, but that doesn't mean it's a bad idea; it doesn't hurt much and it's better as it allows us to also purge via the original .m. URL, which is what a person might do instinctively.

While mobile purges were indeed broken in the recent past in a similar way to the one you guessed, with I77b88f[1] (Restrict PURGE lookups to mobile domains), they were fixed shortly after with I76e5c4[2], a full day before the frontend cache TTL was removed.

1: https://gerrit.wikimedia.org/r/#q,I77b88f3b4bb5ec84f70b2241cdd5dc496025e6fd,n,z
2: https://gerrit.wikimedia.org/r/#q,I76e5c4218c1dec06673aa5121010875031c1a1e2,n,z

What actually broke them again this time is I3d0280[3], which stripped absolute URIs before vcl_recv_purge, despite the latter having code that matches only against absolute URIs. This is my commit, so I'm responsible for this breakage, although in my defence I now have an even score for discovering the flaw last time around :) I've pushed and merged I08f761[4], which moves rewrite_proxy_urls after vcl_recv_purge and should hopefully unbreak purging while also not reintroducing BZ #47807.

3: https://gerrit.wikimedia.org/r/#q,I3d02804170f7e502300329740cba9f45437a24fa,n,z
4: https://gerrit.wikimedia.org/r/#q,I08f7615230037a6ffe7d1130a2a6de7ba370faf2,n,z

As a side note, notice how rewrite_proxy_urls and vcl_recv_purge are both flawed in the same way: the former exists solely to work around a Varnish bug with absolute URIs, while the latter *depends* on that bug manifesting in order to actually work. req.url should always be a (relative) URL, and hence the if (req.url ~ '^http:') comparison in vcl_recv_purge should normally always evaluate to false, making the whole function a no-op. However, due to the bug in question, Varnish doesn't special-handle absolute URIs, in violation of RFC 2616. This, in combination with the fact that varnishhtcpd always sends absolute URIs (due to an RFC-compliant behavior of LWP's proxy() method), is why we have this seemingly wrong VCL code that nevertheless works as intended.
This Varnish bug was reported by Tim upstream[5] and the fix is currently sitting in Varnish's git master[6]. It's simple enough, and it might be worth backporting, although it might be more trouble than it's worth, considering how it will break purges with our current VCL :)

5: https://www.varnish-cache.org/trac/ticket/1255
6: https://www.varnish-cache.org/trac/changeset/2bbb032bf67871d7d5a43a38104d58f747f2e860

Cheers, Faidon

___ Mobile-l mailing list mobil...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mobile-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] [WikimediaMobile] Caching Problem with Mobile Main Page?
The problem is due to recent changes that were made to how mobile caching works. I just flushed cache on all of the frontend varnish instances which indeed appears to have fixed the problem but it isn't actually fixed. Note, the frontend instances just have 1GB of cache, so only very popular objects (like the enwiki front page) avoid getting LRU'd. The backend varnish instances utilize the ssd's and perform the heavy caching work. When I originally built this, I had the frontends force a short (300s) ttl on all cacheable objects, while the backends honored the times specified by mediawiki. I chose to only send purges to the backend instances (via wikia's old varnishhtcpd) and let the frontend instances catch up with their short ttls. My reasoning was: 1) Our multicast purge stream is very busy and isn't split up by cache type, so it includes lots of purge requests for images on upload.wikimedia.org. Processing the purges is somewhat cpu intensive, and I saw doing so once per varnish server as preferable to twice. 2) Purges are for url's such as en.wikipedia.org/wiki/Main_Page. The frontend varnish instance strips the m subdomain before sending the request onwards, but still caches content based on the request url. Purges are never sent for en.m.wikipedia.org/wiki/Main_Page - every purge would need to be rewritten to apply to the frontend varnishes. Doing this blindly would be more expensive than it should be, since a significant percentage of purge statements aren't applicable. I don't think my original approach had any fans. Purges are now sent to both varnish instances per host, and more recently, the 300s ttl override was removed from the frontends. But all of the purges are no-ops. There are multiple ways to approach making the purges sent to the frontends actually work such as rewriting the purges in varnish, rewriting them before they're sent to varnish depending on where they're being sent, or perhaps changing how cached objects are stored in the frontend. I personally think it's all an unnecessary waste of resources and prefer my original approach. -Asher On Fri, May 3, 2013 at 2:23 PM, Arthur Richards aricha...@wikimedia.orgwrote: +wikitech-l I've confirmed the issue on my end; ?action=purge seems to have no effect and the 'last modified' notification on the mobile main page looks correct (though the content itself is out of date and not in sync with the 'last modified' notification). What's doubly weird to me is the 'Last modified' HTTP response headers says: Last-Modified: Tue, 30 Apr 2013 00:17:32 GMT Which appears to be newer than when the content I'm seeing on the main page was updated... Anyone from ops have an idea what might be going on? On Thu, May 2, 2013 at 10:01 PM, Yuvi Panda yuvipa...@gmail.com wrote: Encountered https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#Issue_with_Main_Page_on_mobile.2C_viz._it_hasn.27t_changed_since_Tuesday Some people seem to be having problems with the mobile main page being cached too much. Can someone look into it? -- Yuvi Panda T http://yuvi.in/blog ___ Mobile-l mailing list mobil...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mobile-l -- Arthur Richards Software Engineer, Mobile [[User:Awjrichards]] IRC: awjr +1-415-839-6885 x6687 ___ Mobile-l mailing list mobil...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mobile-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Mobile caching improvements are coming
This sounds like a great plan. Thank you!

On Fri, Mar 29, 2013 at 2:45 AM, Max Semenik maxsem.w...@gmail.com wrote:

Hi, we at the mobile team are currently working on improving our cache hit rate, and are publishing the half-implemented plan here for review:

== Current status ==
* The X-Device header is generated by the frontend Varnish from the user-agent.
* There are currently 21 possible X-Device values, which we decreased to 20 this week.
* X-Device is used for HTML variance (roughly, Vary: X-Device).
* Depending on X-Device, we alter the skin HTML and serve the device full or limited resources.
* Because some phones need CSS tweaks and don't support media queries, we have to serve them device-specific CSS.
* Device-specific CSS is served via separate ResourceLoader modules, e.g. mobile.device.android.

== What's bad about it? ==
Cache fragmentation is very high, resulting in a ~55% hit rate.

== Proposed strategy ==
* We don't vary pages on X-Device anymore.
* Because we still need to give really ancient WAP phones WML output, we create a new header, X-WAP, with just two values, yes or no[1]
* And we vary our output on X-WAP instead of X-Device[2]
* Because we still need to serve device-specific CSS but can't use the device name in page HTML, we create a single ResourceLoader module, mobile.device.detect, which outputs styles depending on X-Device.[2] This does not affect bits cache fragmentation because it simply changes the way the same data is varied, without adding new fragmentation factors. Bits hit rate is currently very high, by the way.
* And because we need X-Device, we will need to direct mobile load.php requests to the mobile site itself instead of bits. Not a problem, because mobile domains are served by Varnish just like bits.
* Since we will now be serving ResourceLoader to all devices, we will blacklist all the incompatible devices in the startup module to prevent them from choking on loads of JS they can't handle (and even if they degrade gracefully, there's still no need to force them to download tens of kilobytes needlessly)[3]

== Commits ==
[1] https://gerrit.wikimedia.org/r/#/c/32866/ - adds X-WAP to Varnish
[2] https://gerrit.wikimedia.org/r/55226 - main MobileFrontend change
[3] https://gerrit.wikimedia.org/r/#/c/55446/ - ResourceLoader change, just a sketch of a real solution as of the moment I'm writing this

Your comments are highly appreciated! :)

-- Best regards, Max Semenik ([[User:MaxSem]])

___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
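For illustration, a minimal sketch of what a single X-Device-varying styles module like the proposed mobile.device.detect could look like. The class name, file layout, and registration are assumptions, not the actual MobileFrontend change in [2]:

```php
<?php
// Sketch only: a ResourceLoader module that picks device-specific CSS
// based on the X-Device header set by the frontend Varnish.
class MobileDeviceDetectModule extends ResourceLoaderModule {

	public function getStyles( ResourceLoaderContext $context ) {
		// X-Device is injected by Varnish from the User-Agent.
		$device = $context->getRequest()->getHeader( 'X-Device' );
		// Sanitize so the header can't escape the devices/ directory.
		$device = preg_replace( '/[^a-z0-9_-]/i', '', (string)$device );
		$file = __DIR__ . "/devices/$device.css"; // hypothetical layout

		$css = ( $device !== '' && is_readable( $file ) )
			? file_get_contents( $file )
			: '';

		return array( 'all' => $css );
	}
}

// Hypothetical registration:
$wgResourceModules['mobile.device.detect'] = array( 'class' => 'MobileDeviceDetectModule' );
```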
Re: [Wikitech-l] [Wmfall] Announcing Latest Member to Operations Engineering Team - Brandon Black
Welcome Brandon!! On Fri, Mar 29, 2013 at 3:00 PM, Ct Woo ct...@wikimedia.org wrote: All, We are excited to announce that Brandon Black will join us this Monday (2013-04-01) as a full-time member of the Operations Engineering team. Brandon comes with deep and wide technical experience. Previously, he held senior systems engineering positions in companies like SqueezeNetwork.com, Veritas DGC, MCI WorldCom and Networks Online. He is an active proponent and contributor of open-source software, and has contributed a new GPL-licensed DNS software (gdnsdhttps://github.com/blblack/gdnsd) to accomplish global-level geographic balancing and automatic failover without paying for expensive commercial solutions. Brandon resides in Magnolia, TX, but has roamed the planet all his life (like spending his High School years in Singapore). He is excited to join the Wikimedia Ops team and hopes to learn many new things from the experience. His interests include auto racing, being a professional amateur, and learning new skills by starting projects which he has no idea how to finish. Brandon will be in San Francisco office this coming Monday and please drop by to welcome him! Thanks, CT Woo ___ Wmfall mailing list wmf...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wmfall ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Mobile caching improvements are coming
Why don't we continue to use the bits cache for all things ResourceLoader? Can you provide a different path for these requests, so that instead of:

http://bits.wikimedia.org/en.wikipedia.org/load.php?..

we use something like:

http://bits.wikimedia.org/m/en.wikipedia.org/load.php?..

Then we can do if (req.url ~ ^/m/) { tag_carrier + strip the /m/ }, so the overhead only affects mobile requests. Faidon has raised that it's still advantageous to shard page resources across more than one domain for browser pipelining.

On Fri, Mar 29, 2013 at 1:55 PM, Arthur Richards aricha...@wikimedia.org wrote:

This approach will require either: 1) Adding device detection to bits for device variance 2) Using mobile varnish to handle load.php requests for resources requested from .m domains

From conversations with Max and some folks from ops, it sounds like #2 is the preferred approach, but I am a little nervous about it since the mobile varnish caches will have to handle a significant increase in requests. It looks like a typical article load results in 6 load.php requests. Also, we'll need to duplicate some configuration from the bits VCL. Ops, is this OK given the current architecture?

On Fri, Mar 29, 2013 at 11:18 AM, Max Semenik maxsem.w...@gmail.com wrote: On 29.03.2013, 21:47 Yuri wrote: Max, do we still plan to detect javascript support for mobile devices, or do you want to fold that into isWAP? Non-js-supporting devices need very different handling, as all HTML has to be pre-built for them on the server.

ResourceLoader has a small stub module called startup. It checks browser compatibility and then loads jQuery and various MediaWiki modules (including the ResourceLoader core). We just need to improve the checks, as my original message states: * Since we will now be serving ResourceLoader to all devices, we will blacklist all the incompatible devices in the startup module to prevent them from choking on loads of JS they can't handle (and even if they degrade gracefully, there's still no need to force them to download tens of kilobytes needlessly)[3]

-- Best regards, Max Semenik ([[User:MaxSem]])

-- Arthur Richards Software Engineer, Mobile [[User:Awjrichards]] IRC: awjr +1-415-839-6885 x6687

___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] [RFC] performance standards for new mediawiki features
On Thu, Mar 21, 2013 at 10:55 PM, Yuri Astrakhan yastrak...@wikimedia.org wrote: The API is fairly complex to measure and set performance targets for. If a bot requests 5000 pages in one call, together with all links and categories, it might take a very long time (seconds if not tens of seconds). Comparing that to another api request that gets an HTML section of a page, which takes a fraction of a second (especially when coming from cache), is not very useful.

This is true, and I think we'd want to look at a metric like 99th percentile latency. There's room for corner cases taking much longer, but they really have to be corner cases. Standards also have to be flexible, with different acceptable ranges for different uses. Yet if 30% of requests for an api method to fetch pages took tens of seconds, we'd likely have to disable it entirely until its use or the number of pages per request could be limited.

On Fri, Mar 22, 2013 at 1:32 AM, Peter Gehres li...@pgehres.com wrote: From where would you propose measuring these data points? Obviously network latency will have a great impact on some of the metrics, and a consistent location would help to define the pass/fail of each test. I do think another useful benchmark from Ops would be a set of latency-to-datacenter values, but I know that is a much harder task. Thanks for putting this together.

On Thu, Mar 21, 2013 at 6:40 PM, Asher Feldman afeld...@wikimedia.org wrote: I'd like to push for a codified set of minimum performance standards that new mediawiki features must meet before they can be deployed to larger wikimedia sites such as English Wikipedia, or be considered complete. These would look like (numbers pulled out of a hat, not actual suggestions): - p999 (long tail) full page request latency of 2000ms - p99 page request latency of 800ms - p90 page request latency of 150ms - p99 banner request latency of 80ms - p90 banner request latency of 40ms - p99 db query latency of 250ms - p90 db query latency of 50ms - 1000 write requests/sec (if applicable; write operations must be free from concurrency issues) - guidelines about degrading gracefully - specific limits on total resource consumption across the stack per request - etc.. Right now, varying amounts of effort are made to highlight potential performance bottlenecks in code review, and engineers are encouraged to profile and optimize their own code. But beyond "is the site still up for everyone / are users complaining on the village pump / am I ranting in irc", we've offered no guidelines as to what sort of request latency is reasonable or acceptable. If a new feature (like aftv5, or flow) turns out not to meet perf standards after deployment, that would be a high priority bug and the feature may be disabled depending on the impact, or if not addressed in a reasonable time frame. Obviously standards like this can't be applied to certain existing parts of mediawiki, but systems other than the parser or preprocessor that don't meet new standards should at least be prioritized for improvement. Thoughts? Asher

___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] [RFC] performance standards for new mediawiki features
These are all good points, and we certainly do need better tooling for individual developers. There is a lot a developer can do on just a laptop in terms of profiling code that, if done consistently, could go a long way, even without it looking anything like production. Things like understanding whether algorithms or queries are O(n) or O(2^n), etc., and thinking about the potential size of the relevant production data set might be more useful at that stage than raw numbers. When it comes to gathering numbers in such an environment, it would be helpful if either the mediawiki profiler could gain an easy visualization interface appropriate for such environments, or if we standardized around something like xdebug.

The beta cluster has some potential as a performance test bed, if only it could gain a guarantee that the compute nodes it runs on aren't oversubscribed, or that the beta virts were otherwise consistently resourced. By running a set of performance benchmarks against beta and production, we may be able to gain insight into how new features are likely to perform.

Beyond due diligence while architecting and implementing a feature, I'm actually a proponent of testing in production, albeit in limited ways. Not as with test.wikipedia.org, which ran on the production cluster, but by deploying a feature to 5% of enwiki users, or 10% of pages, or 20% of editors. Once something is deployed like that, we do indeed have tooling available to gather hard performance metrics of the sort I proposed, though they can always be improved upon.

It became apparent that ArticleFeedbackV5 had severe scaling issues after being enabled on 10% of the articles on enwiki. For that example, I think it could have been caught in an architecture review or in local testing by the developers that issuing 17 database write statements per submission of an anonymous text box that would go at the bottom of every wikipedia article was a bad idea. But it's really great that it was incrementally deployed and we could halt its progress before the resulting issues got too serious. That rollout methodology should be considered a great success. If it can become the norm, perhaps it won't be difficult to get to the point where we can have actionable performance standards for new features, via a process that actually encourages getting features into production instead of being a complicated roadblock.

On Fri, Mar 22, 2013 at 1:20 PM, Arthur Richards aricha...@wikimedia.org wrote: Right now, I think many of us profile locally or in VMs, which can be useful for relative metrics or quickly identifying bottlenecks, but doesn't really get us the kind of information you're talking about from any sort of real-world setting, or in any way that would be consistent from engineer to engineer, or even necessarily from day to day. From network topology to article counts/sizes/etc and everything in between, there's a lot we can't really replicate or accurately profile against. Are there plans to put together and support infrastructure for this? It seems to me that this proposal is contingent upon a consistent environment accessible by engineers for performance testing.

On Thu, Mar 21, 2013 at 10:55 PM, Yuri Astrakhan yastrak...@wikimedia.org wrote: The API is fairly complex to measure and set performance targets for. If a bot requests 5000 pages in one call, together with all links and categories, it might take a very long time (seconds if not tens of seconds). 
Comparing that to another api request that gets an HTML section of a page, which takes a fraction of a second (especially when comming from cache) is not very useful. On Fri, Mar 22, 2013 at 1:32 AM, Peter Gehres li...@pgehres.com wrote: From where would you propose measuring these data points? Obviously network latency will have a great impact on some of the metrics and a consistent location would help to define the pass/fail of each test. I do think another benchmark Ops features would be a set of latency-to-datacenter values, but I know that is a much harder taks. Thanks for putting this together. On Thu, Mar 21, 2013 at 6:40 PM, Asher Feldman afeld...@wikimedia.org wrote: I'd like to push for a codified set of minimum performance standards that new mediawiki features must meet before they can be deployed to larger wikimedia sites such as English Wikipedia, or be considered complete. These would look like (numbers pulled out of a hat, not actual suggestions): - p999 (long tail) full page request latency of 2000ms - p99 page request latency of 800ms - p90 page request latency of 150ms - p99 banner request latency of 80ms - p90 banner request latency of 40ms - p99 db query latency of 250ms - p90 db query latency of 50ms - 1000 write requests/sec (if applicable; writes operations must be free from concurrency issues) - guidelines about degrading gracefully
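The incremental rollout described above (enable a feature for 5% of users, or 10% of articles) can be as simple as stable ID-based bucketing. A hedged sketch; the helper name and thresholds are made up for illustration and are not WMF's actual rollout mechanism:

```php
<?php
// Illustration only: stable percentage rollout by bucketing on user or page ID.
function wfIsInRolloutBucket( $id, $percent ) {
	// The same ID always lands in the same bucket, so a user or article keeps
	// the feature once it has it, and the sample stays consistent for metrics.
	return ( $id % 100 ) < $percent;
}

// e.g. enable for ~5% of logged-in users, or ~10% of articles:
$enabledForUser = $user->isLoggedIn() && wfIsInRolloutBucket( $user->getId(), 5 );
$enabledForPage = wfIsInRolloutBucket( $title->getArticleID(), 10 );
```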
[Wikitech-l] [RFC] performance standards for new mediawiki features
I'd like to push for a codified set of minimum performance standards that new mediawiki features must meet before they can be deployed to larger wikimedia sites such as English Wikipedia, or be considered complete. These would look like (numbers pulled out of a hat, not actual suggestions):

- p999 (long tail) full page request latency of 2000ms
- p99 page request latency of 800ms
- p90 page request latency of 150ms
- p99 banner request latency of 80ms
- p90 banner request latency of 40ms
- p99 db query latency of 250ms
- p90 db query latency of 50ms
- 1000 write requests/sec (if applicable; write operations must be free from concurrency issues)
- guidelines about degrading gracefully
- specific limits on total resource consumption across the stack per request
- etc..

Right now, varying amounts of effort are made to highlight potential performance bottlenecks in code review, and engineers are encouraged to profile and optimize their own code. But beyond "is the site still up for everyone / are users complaining on the village pump / am I ranting in irc", we've offered no guidelines as to what sort of request latency is reasonable or acceptable.

If a new feature (like aftv5, or flow) turns out not to meet perf standards after deployment, that would be a high-priority bug, and the feature may be disabled depending on the impact, or if not addressed in a reasonable time frame. Obviously standards like this can't be applied to certain existing parts of mediawiki, but systems other than the parser or preprocessor that don't meet new standards should at least be prioritized for improvement.

Thoughts?

Asher

___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
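For reference, a small sketch of how percentile figures like the p90/p99/p999 targets above can be computed from raw latency samples (nearest-rank method; not actual WMF reporting code, and the sample values are invented):

```php
<?php
// Sketch: nearest-rank percentiles over a set of request latency samples (in ms).
function wfPercentile( array $samples, $p ) {
	sort( $samples );
	$rank = (int)ceil( ( $p / 100 ) * count( $samples ) ) - 1;
	return $samples[ max( 0, $rank ) ];
}

$latenciesMs = array( 95, 120, 180, 220, 410, 640, 780, 950, 1800, 2600 );
printf( "p90=%dms p99=%dms p99.9=%dms\n",
	wfPercentile( $latenciesMs, 90 ),
	wfPercentile( $latenciesMs, 99 ),
	wfPercentile( $latenciesMs, 99.9 )
);
```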
Re: [Wikitech-l] Query performance - run code faster, merge code faster :-)
On Thu, Mar 7, 2013 at 3:57 PM, Tim Starling tstarl...@wikimedia.orgwrote: On 07/03/13 12:12, Asher Feldman wrote: Ori - I think this has been discussed but automated xhprof configuration as part of the vagrant dev env setup would be amazing :) I don't think xhprof is the best technology for PHP profiling. I reported a bug a month ago which causes the times it reports to be incorrect by a random factor, often 4 or so. No response so far. And its web interface is packed full of XSS vulnerabilities. XDebug + KCacheGrind is quite nice. That's disappointing, I wonder if xhprof has become abandonware since facebook moved away from zend. Have you looked at Webgrind ( http://code.google.com/p/webgrind/)? If not, I'd love to see it at least get a security review. KCacheGrind is indeed super powerful and nice, and well suited to a dev vm. I'm still interested in this sort of profiling for a very small percentage of production requests though, such as 0.1% of requests hitting a single server. Copying around cachegrind files and using KCacheGrind wouldn't be very practical. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
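For anyone trying xhprof locally, the basic capture pattern is just a pair of calls around the code under test. The profiled function and output path below are placeholders; in practice the run would be saved via xhprof_lib and browsed in the xhprof or webgrind UI:

```php
<?php
// Placeholder for the code being profiled.
function run_the_code_being_profiled() {
	for ( $i = 0; $i < 100000; $i++ ) {
		md5( $i );
	}
}

// Minimal xhprof capture: collect wall time plus CPU and memory counters.
xhprof_enable( XHPROF_FLAGS_CPU | XHPROF_FLAGS_MEMORY );

run_the_code_being_profiled();

$data = xhprof_disable();
// Dump the raw per-function counters; normally you'd store the run and use a UI.
file_put_contents( '/tmp/xhprof-run.json', json_encode( $data ) );
```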
Re: [Wikitech-l] Query performance - run code faster, merge code faster :-)
Database query performance isn't the leading performance bottleneck on the WMF cluster. If reading or writing to a database, certainly do take the time to specifically profile your database queries, and make sure to efficiently use caching (and avoid stampede scenarios on cache expiration) whenever possible. Hopefully in the future, caching won't be as up to individual developers to get right on an ad hoc basis. In the last year, we made changes that reduced the query load to mysql masters by nearly 70%. Those queries were well written - there was nothing to tune at the sql layer. The point being, query tuning can't substitute for or even correlate to making efficient design decisions. If you have profiled sql queries, or if your code doesn't have any to profile, don't stop there. Profiling the code itself is at least as important. The mediawiki profiler ( https://www.mediawiki.org/wiki/Profiler#Profiling) offers an easy place to start and it's good to include profiling hooks as they automatically result in p90/p99, etc. latency graphs in graphite in production. But for individual development environments, setting up xhprof might be more useful. There are plenty of tutorials out there, such as - http://blog.cnizz.com/2012/05/05/enhanced-php-performance-profiling-with-xhprof/ Ori - I think this has been discussed but automated xhprof configuration as part of the vagrant dev env setup would be amazing :) On Wed, Mar 6, 2013 at 4:36 PM, Sumana Harihareswara suma...@wikimedia.orgwrote: If you want your code merged, you need to keep your database queries efficient. How can you tell if a query is inefficient? How do you write efficient queries, and avoid inefficient ones? We have some resources around: Roan Kattouw's https://www.mediawiki.org/wiki/Manual:Database_layout/MySQL_Optimization/Tutorial -- slides at https://commons.wikimedia.org/wiki/File:MediaWikiPerformanceProfiling.pdf Roan's slides actually at https://www.mediawiki.org/wiki/File:SQL_indexing_Tutorial.pdf But! If you're a developer and would appreciate guidance around how to best create and efficiently use indexes, I highly recommend this slide deck: http://www.percona.com/files/presentations/WEBINAR-tools-and-techniques-for-index-design.pdf Asher Feldman's https://www.mediawiki.org/wiki/File:MediaWiki_Performance_Profiling.ogv -- slides at https://www.mediawiki.org/wiki/File:SQL_indexing_Tutorial.pdf slides actually at https://commons.wikimedia.org/wiki/File:MediaWikiPerformanceProfiling.pdf More hints: http://lists.wikimedia.org/pipermail/toolserver-l/2012-June/005075.html Due to the use of views on toolserver, it isn't really possible to use that environment to profile or tune queries as they would actually run in production. When you need to ask for a performance review, you can check out https://www.mediawiki.org/wiki/Developers/Maintainers#Other_Areas_of_Focus which suggests Tim Starling, Asher Feldman, and Ori Livneh. I also BOLDly suggest Nischay Nahata, who worked on Semantic MediaWiki's performance for his GSoC project in 2012. -- Sumana Harihareswara Engineering Community Manager Wikimedia Foundation ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
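The profiling hooks mentioned above are wfProfileIn/wfProfileOut pairs wrapped around the section being measured; a minimal example, with the surrounding function made up for illustration:

```php
<?php
// Example of MediaWiki's built-in profiler hooks (the API of this era).
// Sections instrumented this way are what feed the p90/p99 latency graphs
// in graphite in production.
function doExpensiveThing( Title $title ) { // hypothetical function
	wfProfileIn( __METHOD__ );

	// ... the queries / cache lookups / computation being measured ...

	wfProfileOut( __METHOD__ );
}
```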
Re: [Wikitech-l] switching to something better than irc.wikimedia.org
I don't think a custom daemon would actually be needed. http://redis.io/topics/pubsub While I was at flickr, we implemented a pubsub based system to push notifications of all photo uploads and metadata changes to google using redis as the backend. The rate of uploads and edits at flickr in 2010 was orders of magnitude greater than the rate of edits across all wmf projects. Publishing to a redis pubsub channel does grow in cost as the number of subscribers increases but I don't see a problem at our scale. If so, there are ways around it. We are planning on migrating the wiki job queues from mysql to redis in the next few weeks, so it's already a growing piece of our infrastructure. I think the bulk of the work here would actually just be in building a frontend webservice that supports websockets / long polling, provides a clean api, and preferably uses oauth or some form of registration to ward off abuse and allow us to limit the growth of subscribers as we scale. On Friday, March 1, 2013, Petr Bena wrote: I still don't see it as too much complex. Matter of month(s) for volunteers with limited time. However I quite don't see what is so complicated on last 2 points. Given the frequency of updates it's most simple to have the client (user / bot / service that need to read the feed) open the persistent connection to server (dispatcher) which fork itself just as sshd does and the new process handle all requests from this client. The client somehow specify what kind of feed they want to have (that's the registration part) and forked dispatcher keeps it updated with information from cache. Nothing hard. And what's the problem with multithreading huh? :) BTW I don't really think there is a need for multithreading at all, but even if there was, it shouldn't be so hard. On Fri, Mar 1, 2013 at 3:47 PM, Tyler Romeo tylerro...@gmail.comjavascript:; wrote: On Fri, Mar 1, 2013 at 9:16 AM, Petr Bena benap...@gmail.comjavascript:; wrote: I have not yet found a good and stable library for JSON parsing in c#, should you know some let me know :) Take a look at http://www.json.org/. They have a list of implementations for different languages. However, I disagree with I feel like such a project would take an insane amount of resources to develop. If we wouldn't make it insanely complicated, it won't take insane amount of time ;). The cache daemon could be memcached which is already written and stable. Listener is a simple daemon that just listen in UDP, parse the data from mediawiki and store them in memcached in some universal format, and dispatcher is just process that takes the data from cache, convert them to specified format and send them to client. Here's a quick list of things that are basic requirements we'd have to implement: - Multi-threading, which is in and of itself a pain in the a**. - Some sort of queue for messages, rather than hoping the daemon can send out every message in realtime. - Ability for clients to register with the daemon (and a place to store a client list) - Multiple methods of notification (IRC would be one, XMPP might be a candidate, and a simple HTTP endpoint would be a must). Just those basics isn't an easy task, especially considering unless WMF allocates resources to it the project would be run solely by those who have enough free time. Also, I wouldn't use memcached as a caching daemon, primarily because I'm not sure such an application even needs a caching daemon. All it does is relay messages. 
*--* *Tyler Romeo* Stevens Institute of Technology, Class of 2015 Major in Computer Science www.whizkidztech.com | tylerro...@gmail.com javascript:; ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org javascript:; https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org javascript:; https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
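To make the Redis pub/sub suggestion concrete, a hedged sketch using the phpredis client; the host, channel name, and payload shape are invented for illustration and are not an existing WMF feed:

```php
<?php
// Publisher side, e.g. called from a recent-changes hook:
$pub = new Redis();
$pub->connect( 'rc-feed.example.org', 6379 ); // hypothetical host
$pub->publish( 'rc.enwiki', json_encode( array(
	'title'   => 'Main Page',
	'user'    => 'Example',
	'comment' => 'example edit',
) ) );

// Subscriber side (what a bot or the frontend web service would run).
// subscribe() blocks, and the callback fires once per published message:
$sub = new Redis();
$sub->connect( 'rc-feed.example.org', 6379 );
$sub->subscribe( array( 'rc.enwiki' ), function ( $redis, $channel, $message ) {
	$change = json_decode( $message, true );
	// hand off to IRC / websocket / long-poll clients here
} );
```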
Re: [Wikitech-l] switching to something better than irc.wikimedia.org
On Friday, March 1, 2013, Petr Bena wrote: web frontend you say? if you compare the raw data of irc protocol (1 rc feed message) and raw data of a http request and response for one page consisting only of that 1 rc feed message, you will see a huge difference in size and performance. I was sugesting it for websockets or a long poll, the above comparison isn't relevant. Connection is established, with its protocol overhead. It stays open and messages are continually pushed from the server. Not a web request for a page containing one rc message. Also all kinds of authentication required doesn't seem like an improvement to me. It will only complicate what is simple now. Have there been many attempts to abuse irc.wikimedia.org so far? there is no authentication at all. Maybe none is needed but I don't think the irc feed interests anyone outside of a very small community. Doing something a little more modern might attract different uses. It might not, but I have no idea. On Fri, Mar 1, 2013 at 5:46 PM, Asher Feldman afeld...@wikimedia.orgjavascript:; wrote: I don't think a custom daemon would actually be needed. http://redis.io/topics/pubsub While I was at flickr, we implemented a pubsub based system to push notifications of all photo uploads and metadata changes to google using redis as the backend. The rate of uploads and edits at flickr in 2010 was orders of magnitude greater than the rate of edits across all wmf projects. Publishing to a redis pubsub channel does grow in cost as the number of subscribers increases but I don't see a problem at our scale. If so, there are ways around it. We are planning on migrating the wiki job queues from mysql to redis in the next few weeks, so it's already a growing piece of our infrastructure. I think the bulk of the work here would actually just be in building a frontend webservice that supports websockets / long polling, provides a clean api, and preferably uses oauth or some form of registration to ward off abuse and allow us to limit the growth of subscribers as we scale. On Friday, March 1, 2013, Petr Bena wrote: I still don't see it as too much complex. Matter of month(s) for volunteers with limited time. However I quite don't see what is so complicated on last 2 points. Given the frequency of updates it's most simple to have the client (user / bot / service that need to read the feed) open the persistent connection to server (dispatcher) which fork itself just as sshd does and the new process handle all requests from this client. The client somehow specify what kind of feed they want to have (that's the registration part) and forked dispatcher keeps it updated with information from cache. Nothing hard. And what's the problem with multithreading huh? :) BTW I don't really think there is a need for multithreading at all, but even if there was, it shouldn't be so hard. On Fri, Mar 1, 2013 at 3:47 PM, Tyler Romeo tylerro...@gmail.comjavascript:; javascript:; wrote: On Fri, Mar 1, 2013 at 9:16 AM, Petr Bena benap...@gmail.comjavascript:; javascript:; wrote: I have not yet found a good and stable library for JSON parsing in c#, should you know some let me know :) Take a look at http://www.json.org/. They have a list of implementations for different languages. However, I disagree with I feel like such a project would take an insane amount of resources to develop. If we wouldn't make it insanely complicated, it won't take insane amount of time ;). The cache daemon could be memcached which is already written and stable. 
Listener is a simple daemon that just listen in UDP, parse the data from mediawiki and store them in memcached in some universal format, and dispatcher is just process that takes the data from cache, convert them to specified format and send them to client. Here's a quick list of things that are basic requirements we'd have to implement: - Multi-threading, which is in and of itself a pain in the a**. - Some sort of queue for messages, rather than hoping the daemon can send out every message in realtime. - Ability for clients to register with the daemon (and a place to store a client list) - Multiple methods of notification (IRC would be one, XMPP might be a candidate, and a simple HTTP endpoint would be a must). Just those basics isn't an easy task, especially considering unless WMF allocates resources to it the project would be run solely by those who have enough free time. Also, I wouldn't use memcached as a caching daemon, primarily because I'm not sure such an application even needs a caching daemon. All it does is relay messages. *--* *Tyler Romeo* Stevens Institute of Technology, Class of 2015 Major in Computer Science www.whizkidztech.com | tylerro...@gmail.com javascript:;javascript
Re: [Wikitech-l] switching to something better than irc.wikimedia.org
On Friday, March 1, 2013, Tyler Romeo wrote: On Fri, Mar 1, 2013 at 11:46 AM, Asher Feldman afeld...@wikimedia.orgjavascript:; wrote: don't think a custom daemon would actually be needed. http://redis.io/topics/pubsub While I was at flickr, we implemented a pubsub based system to push notifications of all photo uploads and metadata changes to google using redis as the backend. The rate of uploads and edits at flickr in 2010 was orders of magnitude greater than the rate of edits across all wmf projects. Publishing to a redis pubsub channel does grow in cost as the number of subscribers increases but I don't see a problem at our scale. If so, there are ways around it. We are planning on migrating the wiki job queues from mysql to redis in the next few weeks, so it's already a growing piece of our infrastructure. I think the bulk of the work here would actually just be in building a frontend webservice that supports websockets / long polling, provides a clean api, and preferably uses oauth or some form of registration to ward off abuse and allow us to limit the growth of subscribers as we scale. Interesting. Didn't know Redis had something like this. I'm not too knowledgeable about Redis, but would clients be able to subscribe directly to Redis queues? Or would that be a security issue (like allowing people to access Memcached would be) and we would have to implement our own notification service anyway? I think a very light weight proxy that only passes subscribe commands to redis would work. A read only redis slave could be provided but I don't think it includes a way to limit what commands clients can run, including administrative ones. I think we'd want a thin proxy layer in front anyways, to track and if necessary, selectively limit access. It could be very simple though. 0mq? RabbitMQ? Seem to fit the use case pretty well / closely. Hmm, I've always only thought of RabbitMQ as a messaging service between linked applications, but I guess it could be used as a type of push notification service as well. *--* *Tyler Romeo* Stevens Institute of Technology, Class of 2015 Major in Computer Science www.whizkidztech.com | tylerro...@gmail.com javascript:; ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org javascript:; https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews
Just to tie this thread up - the issue of how to count ajax driven pageviews loaded from the api and of how to differentiate those requests from secondary api page requests has been resolved without the need for code or logging changes. Tagging of the mobile beta site will be accomplished via a new generic mediawiki http response header dedicated to logging containing key value pairs. -Asher On Tue, Feb 12, 2013 at 9:56 AM, Asher Feldman afeld...@wikimedia.orgwrote: On Tuesday, February 12, 2013, Diederik van Liere wrote: It does still seem to me that the data to determine secondary api requests should already be present in the existing log line. If the value of the page param in an action=mobileview api request matches the page in the referrer (perhaps with normalization), it's a secondary request as per case 1 below. Otherwise, it's a pageview as per case 2. Difficult or expensive to reconcile? Not when you're doing distributed log analysis via hadoop. So I did look into this prior to writing the RFC and the issue is that a lot of API referrers don't contain the querystring. I don't know what triggers this so if we can fix this then we can definitely derive the secondary pageview request from the referrer field. D If you can point me to some examples, I'll see if I can find any insights into the behavior. On Mon, Feb 11, 2013 at 7:11 PM, Arthur Richards aricha...@wikimedia.org wrote: Thanks, Jon. To try and clarify a bit more about the API requests... they are not made on a per-section basis. As I mentioned earlier, there are two cases in which article content gets loaded by the API: 1) Going directly to a page (eg clicking a link from a Google search) will result in the backend serving a page with ONLY summary section content and section headers. The rest of the page is lazily loaded via API request once the JS for the page gets loaded. The idea is to increase responsiveness by reducing the delay for an article to load (further details in the article Jon previously linked to). The API request looks like: http://en.m.wikipedia.org/w/api.php?format=jsonaction=mobileviewpage=Liverpool+F.C.+in+European+footballvariant=enredirects=yesprop=sections%7Ctextnoheadings=yessectionprop=level%7Cline%7Canchorsections=all 2) Loading an article entirely via Javascript - like when a link is clicked in an article to another article, or an article is loaded via search. This will make ONE call to the API to load article content. API request looks like: http://en.m.wikipedia.org/w/api.php?format=jsonaction=mobileviewpage=Liverpool+F.C.+in+European+footballvariant=enredirects=yesprop=sections%7Ctextnoheadings=yessectionprop=level%7Cline%7Canchorsections=all These API requests are identical, but only #2 should be counted as a 'pageview' - #1 is a secondary API request and should not be counted as a 'pageview'. You could make the argument that we just count all of these API requests as pageviews, but there are cases when we can't load article content from the API (like devices that do not support JS), so we need to be able to count the traditional page request as a pageview - thus we need a way to differentiate the types of API requests being made when they otherwise share the same URL. 
On Mon, Feb 11, 2013 at 6:42 PM, Jon Robson jdlrob...@gmail.com wrote: I'm a bit worried that now we are asking why pages are lazy loaded rather than focusing on the fact that they currently __are doing this___ and how we can log these (if we want to discuss this further let's start another thread as I'm getting extremely confused doing so on this one). Lazy loading sections For motivation behind moving MobileFrontend into the direction of lazy loading section content and subsequent pages can be found here [1], I just gave it a refresh as it was a little out of date. In summary the reason is to 1) make the app feel more responsive by simply loading content rather than reloading the entire interface 2) reducing the payload sent to a device. Session Tracking Going back to the discussion of tracking mobile page views, it sounds like a header stating whether a page is being viewed in alpha, beta or stable works fine for standard page views. As for the situations where an entire page is loaded via the api it makes no dif ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
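A sketch of what a generic logging response header with key-value pairs could look like from MediaWiki's side; the header name, keys, and separator are placeholders, not the final design:

```php
<?php
// Illustration only: tag the response with key=value pairs that the cache
// layer / log pipeline can use to classify the request (e.g. mobile beta
// membership, or whether an API hit should count as a pageview or as a
// secondary content load), without fragmenting the cache.
$pairs = array(
	'mf-m'     => 'b',   // placeholder key: mobile beta membership
	'pageview' => '1',   // 1 = count as a pageview, 0 = secondary API request
);
$kv = array();
foreach ( $pairs as $k => $v ) {
	$kv[] = "$k=$v";
}
$wgRequest->response()->header( 'X-Analytics: ' . implode( ';', $kv ) );
```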
Re: [Wikitech-l] [Labs-l] Maria DB
For most projects, I recommend using the official packages available via the MariaDB projects own apt repo. The official packages are based on the Debian mysql packaging where installing the server package also installs a default database created around generic config defaults, a debian mysql maintenance user with a randomly generated password, and scripts (including init) that assume privileged access via that user. That is, installing the packages provides you with a fresh running working database with generic defaults suitable for a small server, and certain admin tasks automated. I think that's what the average labs and general users wants and expects. The packages I've built for production use at wmf strips out all of the debianisms, the debian project script rewrites, the pre/post install actions. They also leave debug symbols in the binaries and have compiler flag tweaks, but do not at this stage contain any source patches. Installing the server package doesn't create a default db, or provide an environment where you can even start the server on a fresh sever install without further work. Probably not a good choice for most labs users. On Wednesday, February 13, 2013, Petr Bena wrote: thanks for updates. Can you tell me what is a difference between maria db you are using and the version that is recommended for use on ubuntu? On Wed, Feb 13, 2013 at 6:58 PM, Asher Feldman afeld...@wikimedia.orgjavascript:_e({}, 'cvml', 'afeld...@wikimedia.org'); wrote: The production migration to MariaDB was paused for a time by the EQIAD datacenter migration and issues involving other projects that took up my time, but the trial production roll-out will resume this month. All signs still point to our using it in production. I did a lot of query testing on an enwiki MariaDB 5.5 slave over the course of more than a month before the first production deployment. Major version migrations with mysql and derivatives are not to be taken lightly in production environments. At a minimum, one must be concerned about query optimizer changes making one particular query type significantly slower. In the case of the switch to 5.5, there are several default behavior changes over 5.1 that can break applications or change results. Hence, some serious work over a plodding time frame before that first production slave switch. Despite those efforts, a couple weeks after the switch, I saw a query generated by what seems to be a very rare edge case from that AFTv4 extension that violated stricter enforcement of unsigned integer types in 5.5, breaking replication and requiring one off rewriting and execution of the query locally to ensure data consistency before skipping over it. I opened a bug, Mathias fixed the extension, and I haven't seen any other compatibility issues from AFTv4 or anything else deployed on enwiki. That said, other projects utilize different extensions, so all of my testing that has gone into enwiki cannot be assumed to fully cover everything else. Because of that, and because I want to continue proceeding with caution for all of our projects, this will continue to be a slow and methodical process at this stage. Bugs in extensions that aren't used by English Wikipedia may be found and require fixing along the way. As the MariaDB roll-out proceeds, I will provide updates on wikitech-l. Best, Asher On Wed, Feb 13, 2013 at 5:19 AM, Petr Bena benap...@gmail.comjavascript:_e({}, 'cvml', 'benap...@gmail.com'); wrote: Okay - so what is outcome? Should we migrate beta cluster? 
Are we going to use it in production? On Wed, Feb 13, 2013 at 2:08 PM, Chad innocentkil...@gmail.comjavascript:_e({}, 'cvml', 'innocentkil...@gmail.com'); wrote: On Wed, Feb 13, 2013 at 8:05 AM, bawolff bawolff...@gmail.comjavascript:_e({}, 'cvml', 'bawolff%2...@gmail.com'); wrote: Umm there was a thread several months ago about how it is used on several of the slave dbs, if I recall. Indeed, you're looking for mariadb 5.5 in production for english wikipedia http://www.gossamer-threads.com/lists/wiki/wikitech/319925 -Chad ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org javascript:_e({}, 'cvml', 'Wikitech-l@lists.wikimedia.org'); https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Labs-l mailing list lab...@lists.wikimedia.org javascript:_e({}, 'cvml', 'lab...@lists.wikimedia.org'); https://lists.wikimedia.org/mailman/listinfo/labs-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org javascript:_e({}, 'cvml', 'Wikitech-l@lists.wikimedia.org'); https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman
Re: [Wikitech-l] [Labs-l] Maria DB
Er, no it shouldn't. Initial execution might take microseconds longer due to larger binary sizes and the ELF loader having to skip over the symbols, but that's about it. On Thursday, February 14, 2013, Petr Bena wrote: Keeping debug symbols in binaries will result in poor performance, or it should
Re: [Wikitech-l] [Labs-l] Maria DB
I would much rather abandon using debs than use what the debian project has done to mysql packaging in any production environment. If the discussion has come down to this, I did WMF a disservice by drifting away from Domas' optimized make ; make install ; rsync unstripped binaries to prod workflow. In general, I find that environments that don't individually package, according to distro standards, every part of their core application stack that gets built in-house are more productive, and more responsive to the needs of developers and ultimately the application. When an ops team claims that building a recent version of libmemcached for a stable OS is almost impossibly hard and will take weeks because it requires backporting a Debian maintainer's packaging of it for an experimental distro, with that distro's unrelated library version dependencies and reliance on a newer, incompatible dpkg toolchain, there's probably something wrong with that workflow. I like to rely on Linux distros for the lowest common denominator layer of the stack and related security updates. The concerns that go into building and maintaining such a beast are rather different from those that go into operating a continually developed and deployed distributed application used by half a billion people. I don't see a win in trying to force the two together. On Thursday, February 14, 2013, Faidon Liambotis wrote: For MySQL/MariaDB, it seems that the Debian packages don't ship a -dbg package by default. That's a shame, we can ask for that. As for the rest of Asher's changes, I'd love to find a way to make stock packages work in our production setup, but I'm not sure if the maintainer would welcome the extra complexity of conditionally switching behaviors. We can try if you're willing to, Asher :) Regards, Faidon ___ Labs-l mailing list lab...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/labs-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] [Labs-l] Maria DB
The production migration to MariaDB was paused for a time by the EQIAD datacenter migration and issues involving other projects that took up my time, but the trial production roll-out will resume this month. All signs still point to our using it in production. I did a lot of query testing on an enwiki MariaDB 5.5 slave over the course of more than a month before the first production deployment. Major version migrations with mysql and derivatives are not to be taken lightly in production environments. At a minimum, one must be concerned about query optimizer changes making one particular query type significantly slower. In the case of the switch to 5.5, there are several default behavior changes over 5.1 that can break applications or change results. Hence, some serious work over a plodding time frame before that first production slave switch. Despite those efforts, a couple weeks after the switch, I saw a query generated by what seems to be a very rare edge case from that AFTv4 extension that violated stricter enforcement of unsigned integer types in 5.5, breaking replication and requiring one off rewriting and execution of the query locally to ensure data consistency before skipping over it. I opened a bug, Mathias fixed the extension, and I haven't seen any other compatibility issues from AFTv4 or anything else deployed on enwiki. That said, other projects utilize different extensions, so all of my testing that has gone into enwiki cannot be assumed to fully cover everything else. Because of that, and because I want to continue proceeding with caution for all of our projects, this will continue to be a slow and methodical process at this stage. Bugs in extensions that aren't used by English Wikipedia may be found and require fixing along the way. As the MariaDB roll-out proceeds, I will provide updates on wikitech-l. Best, Asher On Wed, Feb 13, 2013 at 5:19 AM, Petr Bena benap...@gmail.com wrote: Okay - so what is outcome? Should we migrate beta cluster? Are we going to use it in production? On Wed, Feb 13, 2013 at 2:08 PM, Chad innocentkil...@gmail.com wrote: On Wed, Feb 13, 2013 at 8:05 AM, bawolff bawolff...@gmail.com wrote: Umm there was a thread several months ago about how it is used on several of the slave dbs, if I recall. Indeed, you're looking for mariadb 5.5 in production for english wikipedia http://www.gossamer-threads.com/lists/wiki/wikitech/319925 -Chad ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Labs-l mailing list lab...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/labs-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Can we kill DBO_TRX? It seems evil!
It looks like Daniel's change to log implicit commits went live on the wmf cluster with the release of 1.21wmf9. Unfortunately, it doesn't appear to be as useful as hoped for tracking down nested callers of Database::begin; the majority of log entries just look like: Wed Feb 13 22:07:21 UTC 2013 mw1146 dewiki DatabaseBase::begin: Transaction already in progress (from DatabaseBase::begin), performing implicit commit! It seems like we'd need a backtrace at this point. So I think we should revisit this issue and either: - expand the logging to make it more useful - disable it to prevent filling the dberror log with non-actionable messages and nothing else - revisit the ideas of either dropping the implicit commit by use of a transaction counter, or of emulating real nested transactions via savepoints. The negative impact on concurrency due to longer-lived transactions and longer-held locks may negate the viability of the third option, even though it feels the most correct. -Asher On Wed, Sep 26, 2012 at 4:30 AM, Daniel Kinzler dan...@brightbyte.de wrote: I have submitted two changes for review that hopefully remedy the current problems: * I1e746322 implements better documentation, more consistent behavior, and easier tracking of implicit commits in Database::begin() * I6ecb8faa restores the flushing commits that I removed a while ago under the assumption that a commit without a begin would be a no-op. I hope this addresses any pressing issues. I still think that we need a way to protect critical sections. But an RFC seems to be in order for that. -- daniel ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
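To make the third option concrete, here is a minimal sketch of counter-plus-savepoint transaction nesting. It is illustrative only: this is not MediaWiki's Database class, and `conn` stands in for any DB-API style connection.

# Sketch: emulate nested transactions with savepoints instead of implicit commits.
class NestedTransaction:
    def __init__(self, conn):
        self.conn = conn
        self.depth = 0

    def begin(self):
        if self.depth == 0:
            self.conn.cursor().execute("BEGIN")
        else:
            # Nested begin: no implicit commit, open a savepoint instead.
            self.conn.cursor().execute("SAVEPOINT sp%d" % self.depth)
        self.depth += 1

    def commit(self):
        self.depth -= 1
        if self.depth == 0:
            self.conn.cursor().execute("COMMIT")
        else:
            self.conn.cursor().execute("RELEASE SAVEPOINT sp%d" % self.depth)

    def rollback(self):
        self.depth -= 1
        if self.depth == 0:
            self.conn.cursor().execute("ROLLBACK")
        else:
            # Only the inner unit is undone; outer locks stay held, which is
            # exactly the concurrency concern raised above.
            self.conn.cursor().execute("ROLLBACK TO SAVEPOINT sp%d" % self.depth)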
Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews
On Tuesday, February 12, 2013, Diederik van Liere wrote: It does still seem to me that the data to determine secondary api requests should already be present in the existing log line. If the value of the page param in an action=mobileview api request matches the page in the referrer (perhaps with normalization), it's a secondary request as per case 1 below. Otherwise, it's a pageview as per case 2. Difficult or expensive to reconcile? Not when you're doing distributed log analysis via hadoop. So I did look into this prior to writing the RFC and the issue is that a lot of API referrers don't contain the querystring. I don't know what triggers this so if we can fix this then we can definitely derive the secondary pageview request from the referrer field. D If you can point me to some examples, I'll see if I can find any insights into the behavior. On Mon, Feb 11, 2013 at 7:11 PM, Arthur Richards aricha...@wikimedia.org wrote: Thanks, Jon. To try and clarify a bit more about the API requests... they are not made on a per-section basis. As I mentioned earlier, there are two cases in which article content gets loaded by the API: 1) Going directly to a page (eg clicking a link from a Google search) will result in the backend serving a page with ONLY summary section content and section headers. The rest of the page is lazily loaded via API request once the JS for the page gets loaded. The idea is to increase responsiveness by reducing the delay for an article to load (further details in the article Jon previously linked to). The API request looks like: http://en.m.wikipedia.org/w/api.php?format=jsonaction=mobileviewpage=Liverpool+F.C.+in+European+footballvariant=enredirects=yesprop=sections%7Ctextnoheadings=yessectionprop=level%7Cline%7Canchorsections=all 2) Loading an article entirely via Javascript - like when a link is clicked in an article to another article, or an article is loaded via search. This will make ONE call to the API to load article content. API request looks like: http://en.m.wikipedia.org/w/api.php?format=jsonaction=mobileviewpage=Liverpool+F.C.+in+European+footballvariant=enredirects=yesprop=sections%7Ctextnoheadings=yessectionprop=level%7Cline%7Canchorsections=all These API requests are identical, but only #2 should be counted as a 'pageview' - #1 is a secondary API request and should not be counted as a 'pageview'. You could make the argument that we just count all of these API requests as pageviews, but there are cases when we can't load article content from the API (like devices that do not support JS), so we need to be able to count the traditional page request as a pageview - thus we need a way to differentiate the types of API requests being made when they otherwise share the same URL. On Mon, Feb 11, 2013 at 6:42 PM, Jon Robson jdlrob...@gmail.com wrote: I'm a bit worried that now we are asking why pages are lazy loaded rather than focusing on the fact that they currently __are doing this___ and how we can log these (if we want to discuss this further let's start another thread as I'm getting extremely confused doing so on this one). Lazy loading sections For motivation behind moving MobileFrontend into the direction of lazy loading section content and subsequent pages can be found here [1], I just gave it a refresh as it was a little out of date. In summary the reason is to 1) make the app feel more responsive by simply loading content rather than reloading the entire interface 2) reducing the payload sent to a device. 
Session Tracking Going back to the discussion of tracking mobile page views, it sounds like a header stating whether a page is being viewed in alpha, beta or stable works fine for standard page views. As for the situations where an entire page is loaded via the api it makes no dif ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
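As a rough illustration of the referrer heuristic discussed above, something along these lines could classify mobileview hits during log analysis. The field handling and title normalization are simplified assumptions, not the actual udplog schema or Hadoop job, and as noted it only works when the referrer is actually populated.

from urllib.parse import urlsplit, parse_qs, unquote

def normalize(title):
    return unquote(title).replace('_', ' ').replace('+', ' ').strip()

def is_secondary(api_url, referer):
    """True if an action=mobileview request targets the same article as its
    own referrer, i.e. a lazy-load follow-up rather than a pageview."""
    qs = parse_qs(urlsplit(api_url).query)
    if qs.get('action', [''])[0] != 'mobileview':
        return False
    page = normalize(qs.get('page', [''])[0])
    # e.g. referer http://en.m.wikipedia.org/wiki/Liverpool_F.C._in_European_football
    ref_title = normalize(urlsplit(referer).path.rsplit('/', 1)[-1])
    return page != '' and page == ref_title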
Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews
Max - good answers re: caching concerns. That leaves studying if the bytes transferred on average mobile article view increases or decreases with lazy section loading. If it increases, I'd say this isn't a positive direction to go in and stop there. If it decreases, then we should look at the effect on total latency, number of requests required per pageview, and the impact on backend apache utilization which I'd expect to be 0. Does the mobile team have specific goals that this project aims to accomplish? If so, we can use those as the measure against which to compare an impact analysis. On Mon, Feb 11, 2013 at 12:21 PM, Max Semenik maxsem.w...@gmail.com wrote: On 11.02.2013, 22:11 Asher wrote: And then I'd wonder about the server side implementation. How will frontend cache invalidation work? Are we going to need to purge every individual article section relative to /w/api.php on edit? Since the API doesn't require pretty URLs, we could simply append the current revision ID to the mobileview URLs. Article HTML in memcached (parser cache), mobile processed HTML in memcached.. Now individual sections in memcached? If so, should we calculate memcached space needs for article text as 3x the current parser cache utilization? More memcached usage is great, not asking to dissuade its use but because its better to capacity plan than to react. action=mobileview caches pages only in full and serves only sections requested, so no changes in request patterns will result in increased memcached usage. -- Best regards, Max Semenik ([[User:MaxSem]]) ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
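The bytes-transferred comparison described above could be approximated from sampled logs along these lines. The sample format and the idea of a per-view grouping key tying the initial HTML request to its API follow-ups are assumptions for the sketch, not existing tooling.

from collections import defaultdict

def bytes_per_view(samples):
    """samples: iterable of (view_id, bytes_sent), where view_id groups the
    initial page request with any follow-up API requests for the same view."""
    totals = defaultdict(int)
    for view_id, size in samples:
        totals[view_id] += size
    return sum(totals.values()) / max(len(totals), 1)

# e.g. compare bytes_per_view(control_samples) against bytes_per_view(lazy_samples)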
Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews
Thanks for the clarification Arthur, that clears up some misconceptions I had. I saw a demo around the allstaff where individual sections were lazy loaded, so I think I had that in my head. It does still seem to me that the data to determine secondary api requests should already be present in the existing log line. If the value of the page param in an action=mobileview api request matches the page in the referrer (perhaps with normalization), it's a secondary request as per case 1 below. Otherwise, it's a pageview as per case 2. Difficult or expensive to reconcile? Not when you're doing distributed log analysis via hadoop. On Mon, Feb 11, 2013 at 7:11 PM, Arthur Richards aricha...@wikimedia.orgwrote: Thanks, Jon. To try and clarify a bit more about the API requests... they are not made on a per-section basis. As I mentioned earlier, there are two cases in which article content gets loaded by the API: 1) Going directly to a page (eg clicking a link from a Google search) will result in the backend serving a page with ONLY summary section content and section headers. The rest of the page is lazily loaded via API request once the JS for the page gets loaded. The idea is to increase responsiveness by reducing the delay for an article to load (further details in the article Jon previously linked to). The API request looks like: http://en.m.wikipedia.org/w/api.php?format=jsonaction=mobileviewpage=Liverpool+F.C.+in+European+footballvariant=enredirects=yesprop=sections%7Ctextnoheadings=yessectionprop=level%7Cline%7Canchorsections=all 2) Loading an article entirely via Javascript - like when a link is clicked in an article to another article, or an article is loaded via search. This will make ONE call to the API to load article content. API request looks like: http://en.m.wikipedia.org/w/api.php?format=jsonaction=mobileviewpage=Liverpool+F.C.+in+European+footballvariant=enredirects=yesprop=sections%7Ctextnoheadings=yessectionprop=level%7Cline%7Canchorsections=all These API requests are identical, but only #2 should be counted as a 'pageview' - #1 is a secondary API request and should not be counted as a 'pageview'. You could make the argument that we just count all of these API requests as pageviews, but there are cases when we can't load article content from the API (like devices that do not support JS), so we need to be able to count the traditional page request as a pageview - thus we need a way to differentiate the types of API requests being made when they otherwise share the same URL. On Mon, Feb 11, 2013 at 6:42 PM, Jon Robson jdlrob...@gmail.com wrote: I'm a bit worried that now we are asking why pages are lazy loaded rather than focusing on the fact that they currently __are doing this___ and how we can log these (if we want to discuss this further let's start another thread as I'm getting extremely confused doing so on this one). Lazy loading sections For motivation behind moving MobileFrontend into the direction of lazy loading section content and subsequent pages can be found here [1], I just gave it a refresh as it was a little out of date. In summary the reason is to 1) make the app feel more responsive by simply loading content rather than reloading the entire interface 2) reducing the payload sent to a device. Session Tracking Going back to the discussion of tracking mobile page views, it sounds like a header stating whether a page is being viewed in alpha, beta or stable works fine for standard page views. 
As for the situations where an entire page is loaded via the api it makes no difference to us to whether we 1) send the same header (set via javascript) or 2) add a query string parameter. The only advantage I can see of using a header is that an initial page load of the article San Francisco currently uses the same api url as a page load of the article San Francisco via javascript (e.g. I click a link to 'San Francisco' on the California article). In this new method they would use different urls (as the data sent is different). I'm not sure how that would effect caching. Let us know which method is preferred. From my perspective implementation of either is easy. [1] http://www.mediawiki.org/wiki/MobileFrontend/Dynamic_Sections On Mon, Feb 11, 2013 at 12:50 PM, Asher Feldman afeld...@wikimedia.org wrote: Max - good answers re: caching concerns. That leaves studying if the bytes transferred on average mobile article view increases or decreases with lazy section loading. If it increases, I'd say this isn't a positive direction to go in and stop there. If it decreases, then we should look at the effect on total latency, number of requests required per pageview, and the impact on backend apache utilization which I'd expect to be 0. Does the mobile team have specific goals that this project aims
Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews
On Thu, Feb 7, 2013 at 4:32 AM, Mark Bergsma m...@wikimedia.org wrote: - Since we're repurposing X-CS, should we perhaps rename it to something more apt to address concerns about cryptic non-standard headers flying about? I'd like to propose to define *one* request header to be used for all analytics purposes. It can be key/value pairs, and be set client side where applicable. There's been some confusion in this thread between headers used by mediawiki in determining content generation or for cache variance, and those intended only for logging. The zero carrier header is used by the zero extension to return specific content banners and set different default behaviors (i.e. hide all images) as negotiated with individual mobile carriers. A reader familiar with this might note that there are separate X-CS and X-Carrier headers, but X-Carrier is supposed to go away now. Agreed that there should be a single header for content that's strictly for analytics purposes. All changes to the udplog format in the last year or so could likely be reverted except for the delimiter change, with a multipurpose analytics key/value field added for all else. I think the question of using a URL param vs a request header should mainly take into account whether the response varies on the value of the parameter. If the responses are otherwise identical, and the value is only used for analytics purposes, I would prefer to put that into the above header instead, as it will impair cacheability / cache size otherwise (even if those requests are currently not cacheable for other reasons). If the responses are actually different based on this parameter, I would prefer to have it in the URL where possible. For this particular case, the API requests are for getting specific sections of an article, as opposed to the whole thing or the first section served as part of an initial pageview. I might not have grokked the original RFC email well, but I don't understand why this was being discussed as a logging challenge or necessitating a request header. A mobile api request to just get section 3 of the article on otters should already utilize a query param denoting that section 3 is being fetched, and is already clearly not a primary request. Whether or not it makes sense for mobile to move in the direction of splitting up article views into many api requests is something I'd love to see backed up by data. I'm skeptical for multiple reasons. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews
On Wednesday, February 6, 2013, David Schoonover wrote: Just want to summarize and make sure I've got the right conclusions, as this thread has wandered a bit. *1. X-MF-Mode: Alpha/Beta Site Usage* We'll roll this into the X-CS header, which will now be KV-pairs (using normal URL encoding), and set by Varnish. Nope. There will be a header denoting non-standard MobileFrontend views if the mobile team wants to leave the caching situation as is. It will be a response header set by mediawiki, not varnish. The header will have a unique name; it will not share the name of the zero carrier header. The udplog field that currently only ever contains carrier information on zero requests will become a key/value field. Udplog fields are not named, they are positional. This will avoid an explosion of cryptic headers for analytic purposes. Questions: - It seems there's some confusion around bypassing Varnish. If I understand correctly, it's not that Varnish is ever bypassed, just that the upstream response is not cached if cookies are present. Is that right? Bypasses varnish caching != bypassing varnish. I don't see any use of the latter in this thread, but if there has been confusion, know that all m.wikipedia.org requests are served via varnish. - Since we're repurposing X-CS, should we perhaps rename it to something more apt to address concerns about cryptic non-standard headers flying about? Nope.. We're repurposing the fixed position udplog field, not the zero carrier code header. *2. X-MF-Req: Primary vs Secondary API Requests* This header will be replaced with a query parameter set by the client-side JS code making the request. Analytics will parse it out at processing time and Do The Right Thing. Kindly correct me if I've gotten anything wrong. -- David Schoonover d...@wikimedia.org On Tue, Feb 5, 2013 at 2:36 PM, Diederik van Liere dvanli...@wikimedia.org wrote: Analytics folks, is this workable from your perspective? Yes, this works fine for us and it's also no problem to set multiple key/value pairs in the http header that we are now using for the X-CS header. Diederik ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews
On Wednesday, February 6, 2013, David Schoonover wrote: That all sounds fine to me so long as we're all agreed. Lol. RFC closed. -- David Schoonover d...@wikimedia.org javascript:; On Wed, Feb 6, 2013 at 12:59 PM, Asher Feldman afeld...@wikimedia.orgjavascript:; wrote: On Wednesday, February 6, 2013, David Schoonover wrote: Just want to summarize and make sure I've got the right conclusions, as this thread has wandered a bit. *1. X-MF-Mode: Alpha/Beta Site Usage* * * We'll roll this into the X-CS header, which will now be KV-pairs (using normal URL encoding), and set by Varnish. Nope. There will be a header denoting non-standard MobileFrontend views if the mobile team wants to leave the caching situation as is. It will be a response header set by mediawiki, not varnish. The header will have a unique name, it will not share the name of the zero carrier header. The udplog field that currently only ever contains carrier information on zero requests will become a key value field. Udplog fields are not named, they are positional. This will avoid an explosion of cryptic headers for analytic purposes. Questions: - It seems there's some confusion around bypassing Varnish. If I understand correctly, it's not that Varnish is ever bypassed, just that the upstream response is not cached if cookies are present. Is that right? Bypasses varnish caching != bypassing varnish. I don't see any use of the later in this thread, but if there has been confusion, know that all m.wikipedia.org requests are served via varnish. - Since we're repurposing X-CS, should we perhaps rename it to something more apt to address concerns about cryptic non-standard headers flying about? Nope.. We're repurposing the fixed position udplog field, not the zero carrier code header. *2. X-MF-Req: Primary vs Secondary API Requests* This header will be replaced with a query parameter set by the client-side JS code making the request. Analytics will parse it out at processing time and Do The Right Thing. Kindly correct me if I've gotten anything wrong. -- David Schoonover d...@wikimedia.org javascript:; On Tue, Feb 5, 2013 at 2:36 PM, Diederik van Liere dvanli...@wikimedia.org javascript:;wrote: Analytics folks, is this workable from your perspective? Yes, this works fine for us and it's also no problem to set multiple key/value pairs in the http header that we are now using for the X-CS header. Diederik ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org javascript:; https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org javascript:; https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org javascript:; https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org javascript:; https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews
On Mon, Feb 4, 2013 at 4:24 PM, Arthur Richards aricha...@wikimedia.org wrote: Asher, I understand your hesitation about using HTTP header fields, but there are a couple problems I'm seeing with using query string parameters. Perhaps you or others have some ideas how to get around these: * We should keep user-facing URLs canonical as much as possible (primarily for link sharing) ** If we keep user-facing URLs canonical, we could potentially add query string params via javascript, but that would only work on devices that support javascript/have javascript enabled (this might not be a huge deal as we are planning changes such that users that do not support jQuery will get a simplified version of the stable site) I was thinking of this as a solution for the X-MF-Req header, based on your explanation of it earlier in the thread: Almost correct - I realize I didn't actually explain it correctly. This would be a request HTTP header set by the client in API requests made by Javascript provided by MobileFrontend. I only meant to apply the query string idea to API requests, which can also be marked to indicate non-standard versions of the site. I completely missed the case of non-api requests about which beta/alpha usage data needs to be collected. What about doing so via the eventlog service? Only for users actually opted into one of these programs, no need to log anything special for the majority of users getting the standard site. * How could this work for the first pageview request (e.g. a user clicking a link from Google or even just browsing to http://en.wikipedia.org)? I think this is covered by the above, in that the data intended to go into x-mf-req doesn't apply to this sort of page view, and first views from users opted into a trial can eventlog the trial usage. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews
On Mon, Feb 4, 2013 at 4:59 PM, Arthur Richards aricha...@wikimedia.orgwrote: In the case of the cookie, the header would actually get set by the backend response (from Apache) and I believe Dave cooked up or was planning on cooking some magic to somehow make that information discernable when results are cached. Opting into the mobile beta as it is currently implemented bypasses varnish caching for all future mobile pageviews for the life of the cookie. So this probably isn't quite right (at least the when results are cached part.) ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews
On Mon, Feb 4, 2013 at 5:21 PM, Asher Feldman afeld...@wikimedia.orgwrote: On Mon, Feb 4, 2013 at 4:59 PM, Arthur Richards aricha...@wikimedia.orgwrote: In the case of the cookie, the header would actually get set by the backend response (from Apache) and I believe Dave cooked up or was planning on cooking some magic to somehow make that information discernable when results are cached. Opting into the mobile beta as it is currently implemented bypasses varnish caching for all future mobile pageviews for the life of the cookie. So this probably isn't quite right (at least the when results are cached part.) Thinking about this further.. So long as all beta optins bypass all caching and always have to hit an apache, it would be fine for mf to set a response header reflecting the version of the site the optin cookie triggers (but only if there's an optin, avoid setting on standard.) I'd just prefer this to be logged without adding a field to the entire udplog stream that will generally just be wasted space. Mobile already has one dedicated udplog field currently intended for zero carriers, wasted log space for nearly every request. Make it a key/value field that can contain multiple keys, i.e. zc:orn;v:b1 (zero carrier = orange whatever, version = beta1) If by some chance mobile beta gets implemented in a way that doesn't kill frontend caching for its users (maybe solely via different js behavior based on the presence of the optin cookie?) the above won't be applicable anymore, so using the event log facility / pixel service to note beta usage becomes more appropriate. If beta usage is going to be driven upwards, I hope this approach is seriously considered. Mobile currently only has around a 58% edge cache hitrate as it is and it sounds like upcoming features will place significant new demands on the apaches and for memcached space. If a non cache busting beta site is doable, go for the logging method now that will later be compatible with it to avoid having to change processing methods. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
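A sketch of what reading and writing such a multi-key log field might look like. The zc:orn;v:b1 example above is the only thing taken from the thread; the separators and helper names are otherwise assumptions.

def parse_kv_field(field):
    """Parse a field like 'zc:orn;v:b1' into {'zc': 'orn', 'v': 'b1'}."""
    if field in ('', '-'):
        return {}
    return dict(pair.split(':', 1) for pair in field.split(';') if ':' in pair)

def build_kv_field(pairs):
    """Serialize key/value pairs back into the single fixed-position field."""
    return ';'.join('%s:%s' % (k, v) for k, v in pairs.items()) or '-'

# parse_kv_field("zc:orn;v:b1") -> {'zc': 'orn', 'v': 'b1'}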
Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews
If you want to differentiate categories of API requests in logs, add descriptive noop query params to the requests. I.e mfmode=2. Doing this in request headers and altering edge config is unnecessary and a bad design pattern. On the analytics side, if parsing query params seems challenging vs. having a fixed field to parse, deal. On Sunday, February 3, 2013, David Schoonover wrote: Huh! News to me as well. I definitely agree with that decision. Thanks, Ori! I've already written the Varnish code for setting X-MF-Mode so it can be captured by varnishncsa. Is there agreement to switch to Mobile-Mode, or at least, MF-Mode? Looking especially to hear from Arthur and Matt. -- David Schoonover d...@wikimedia.org javascript:; On Sat, Feb 2, 2013 at 2:16 PM, Diederik van Liere dvanli...@wikimedia.org javascript:;wrote: Thanks Ori, I was not aware of this D Sent from my iPhone On 2013-02-02, at 16:55, Ori Livneh o...@wikimedia.org javascript:; wrote: On Saturday, February 2, 2013 at 1:36 PM, Platonides wrote: I don't like it's cryptic nature. Someone looking at the headers sent to his browser would be very confused about what's the point of «X-MF-Mode: b». Instead something like this would be much more descriptive: X-Mobile-Mode: stable X-Mobile-Request: secondary But that also means sending more bytes through the wire :S Well, you can (and should) drop the 'X-' :-) See http://tools.ietf.org/html/rfc6648: Deprecating the X- Prefix and Similar Constructs in Application Protocols -- Ori Livneh ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org javascript:; https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org javascript:; https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org javascript:; https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews
Regarding varnish cacheability of mobile API requests with a logging query param - it would probably be worth making frontend varnishes strip out all occurrences of that query param and its value from their backend requests so they're all the same to the caching instances. A generic param name that can take any value would allow for adding as many extra log values as needed, limited only by the uri log field length. l=mft2l=mfstable etc. So still an edge cache change but the result is more flexible while avoiding changing the fixed field length log format across unrelated systems like text squids or image caches. On Sunday, February 3, 2013, Asher Feldman wrote: If you want to differentiate categories of API requests in logs, add descriptive noop query params to the requests. I.e mfmode=2. Doing this in request headers and altering edge config is unnecessary and a bad design pattern. On the analytics side, if parsing query params seems challenging vs. having a fixed field to parse, deal. On Sunday, February 3, 2013, David Schoonover wrote: Huh! News to me as well. I definitely agree with that decision. Thanks, Ori! I've already written the Varnish code for setting X-MF-Mode so it can be captured by varnishncsa. Is there agreement to switch to Mobile-Mode, or at least, MF-Mode? Looking especially to hear from Arthur and Matt. -- David Schoonover d...@wikimedia.org On Sat, Feb 2, 2013 at 2:16 PM, Diederik van Liere dvanli...@wikimedia.orgwrote: Thanks Ori, I was not aware of this D Sent from my iPhone On 2013-02-02, at 16:55, Ori Livneh o...@wikimedia.org wrote: On Saturday, February 2, 2013 at 1:36 PM, Platonides wrote: I don't like it's cryptic nature. Someone looking at the headers sent to his browser would be very confused about what's the point of «X-MF-Mode: b». Instead something like this would be much more descriptive: X-Mobile-Mode: stable X-Mobile-Request: secondary But that also means sending more bytes through the wire :S Well, you can (and should) drop the 'X-' :-) See http://tools.ietf.org/html/rfc6648: Deprecating the X- Prefix and Similar Constructs in Application Protocols -- Ori Livneh ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
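The stripping itself would live in VCL on the frontend varnishes; the sketch below only shows the normalization logic, with the parameter name l taken from the example above and everything else an assumption.

from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def strip_logging_params(url, param='l'):
    """Drop every occurrence of the logging-only query param so that
    l=mft2, l=mfstable, etc. all collapse to one cacheable backend URL."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k != param]
    return urlunsplit(parts._replace(query=urlencode(kept)))

# strip_logging_params("http://en.m.wikipedia.org/w/api.php?action=mobileview&l=mfstable")
#   -> "http://en.m.wikipedia.org/w/api.php?action=mobileview"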
Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews
That's not at all true in the real world. Look at the actual requests for google analytics on a high percentage of sites, etc. Setting new request headers for mobile that map to new inflexible fields in the log stream that must be set on all non mobile requests (\t-\t-) equals gigabytes of unnecessarily log data every day (that we want to save 100% of) for no good reason. Wanting to keep query params pure isn't a good reason. On Sunday, February 3, 2013, Tyler Romeo wrote: Considering that the query component of a URI is meant to identify the resource whereas HTTP headers are meant to tell the server additional information about the request, I think a header approach is much more appropriate than a no-op query parameter. If the X- is removed, I'd have no problem with the addition of these headers, but what is the advantage of having two over one. Wouldn't a header like: MobileFrontend: 1/2 a/b/s work just as fine? *--* *Tyler Romeo* Stevens Institute of Technology, Class of 2015 Major in Computer Science www.whizkidztech.com | tylerro...@gmail.com javascript:; On Sun, Feb 3, 2013 at 4:35 AM, Asher Feldman afeld...@wikimedia.orgjavascript:; wrote: Regarding varnish cacheability of mobile API requests with a logging query param - it would probably be worth making frontend varnishes strip out all occurrences of that query param and its value from their backend requests so they're all the same to the caching instances. A generic param name that can take any value would allow for adding as many extra log values as needed, limited only by the uri log field length. l=mft2l=mfstable etc. So still an edge cache change but the result is more flexible while avoiding changing the fixed field length log format across unrelated systems like text squids or image caches. On Sunday, February 3, 2013, Asher Feldman wrote: If you want to differentiate categories of API requests in logs, add descriptive noop query params to the requests. I.e mfmode=2. Doing this in request headers and altering edge config is unnecessary and a bad design pattern. On the analytics side, if parsing query params seems challenging vs. having a fixed field to parse, deal. On Sunday, February 3, 2013, David Schoonover wrote: Huh! News to me as well. I definitely agree with that decision. Thanks, Ori! I've already written the Varnish code for setting X-MF-Mode so it can be captured by varnishncsa. Is there agreement to switch to Mobile-Mode, or at least, MF-Mode? Looking especially to hear from Arthur and Matt. -- David Schoonover d...@wikimedia.org On Sat, Feb 2, 2013 at 2:16 PM, Diederik van Liere dvanli...@wikimedia.orgwrote: Thanks Ori, I was not aware of this D Sent from my iPhone On 2013-02-02, at 16:55, Ori Livneh o...@wikimedia.org wrote: On Saturday, February 2, 2013 at 1:36 PM, Platonides wrote: I don't like it's cryptic nature. Someone looking at the headers sent to his browser would be very confused about what's the point of «X-MF-Mode: b». 
Instead something like this would be much more descriptive: X-Mobile-Mode: stable X-Mobile-Request: secondary But that also means sending more bytes through the wire :S Well, you can (and should) drop the 'X-' :-) See http://tools.ietf.org/html/rfc6648: Deprecating the X- Prefix and Similar Constructs in Application Protocols -- Ori Livneh ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
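To put a rough number on the "gigabytes per day" point above: two always-empty fixed fields appended to every log line add filler that scales with total request volume. The request count below is an assumed round figure for illustration, not a measured one.

requests_per_day = 5 * 10**9        # assumption: a few billion logged requests/day
extra_bytes = len(b"\t-\t-")        # two empty fixed fields per line
print(requests_per_day * extra_bytes / 1e9, "GB/day of filler")   # ~20 GB/day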
Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews
On Sunday, February 3, 2013, Tyler Romeo wrote: Remind me again why a production setup is logging every header of every request? That's ludicrous. Please reread our udplog format documentation and this entire thread carefully, especially the first message before commenting any further. Also, if you are logging every header, then the amount of data added by a single extra header would be insignificant compared to the rest of the request. *--* *Tyler Romeo* Stevens Institute of Technology, Class of 2015 Major in Computer Science www.whizkidztech.com | tylerro...@gmail.com javascript:; On Sun, Feb 3, 2013 at 5:12 AM, Asher Feldman afeld...@wikimedia.orgjavascript:; wrote: That's not at all true in the real world. Look at the actual requests for google analytics on a high percentage of sites, etc. Setting new request headers for mobile that map to new inflexible fields in the log stream that must be set on all non mobile requests (\t-\t-) equals gigabytes of unnecessarily log data every day (that we want to save 100% of) for no good reason. Wanting to keep query params pure isn't a good reason. On Sunday, February 3, 2013, Tyler Romeo wrote: Considering that the query component of a URI is meant to identify the resource whereas HTTP headers are meant to tell the server additional information about the request, I think a header approach is much more appropriate than a no-op query parameter. If the X- is removed, I'd have no problem with the addition of these headers, but what is the advantage of having two over one. Wouldn't a header like: MobileFrontend: 1/2 a/b/s work just as fine? *--* *Tyler Romeo* Stevens Institute of Technology, Class of 2015 Major in Computer Science www.whizkidztech.com | tylerro...@gmail.com javascript:;javascript:; On Sun, Feb 3, 2013 at 4:35 AM, Asher Feldman afeld...@wikimedia.orgjavascript:; javascript:; wrote: Regarding varnish cacheability of mobile API requests with a logging query param - it would probably be worth making frontend varnishes strip out all occurrences of that query param and its value from their backend requests so they're all the same to the caching instances. A generic param name that can take any value would allow for adding as many extra log values as needed, limited only by the uri log field length. l=mft2l=mfstable etc. So still an edge cache change but the result is more flexible while avoiding changing the fixed field length log format across unrelated systems like text squids or image caches. On Sunday, February 3, 2013, Asher Feldman wrote: If you want to differentiate categories of API requests in logs, add descriptive noop query params to the requests. I.e mfmode=2. Doing this in request headers and altering edge config is unnecessary and a bad design pattern. On the analytics side, if parsing query params seems challenging vs. having a fixed field to parse, deal. On Sunday, February 3, 2013, David Schoonover wrote: Huh! News to me as well. I definitely agree with that decision. Thanks, Ori! I've already written the Varnish code for setting X-MF-Mode so it can be captured by varnishncsa. Is there agreement to switch to Mobile-Mode, or at least, MF-Mode? Looking especially to hear from Arthur and Matt. -- David Schoonover d...@wikimedia.org On Sat, Feb 2, 2013 at 2:16 PM, Diederik van Liere dvanli...@wikimedia.orgwrote: Thanks Ori, I was not aware of this D Sent from my iPhone On 2013-02-02, at 16:55, Ori Livneh o...@wikimedia.org wrote: On Saturday, February 2, 2013 at 1:36 PM, Platonides wrote: I don't like it's cryptic nature. 
Someone looking at the headers sent to his browser would be very confused about what's the point of «X-MF-Mode: b». Instead something like this would be much more descriptive: X-Mobile-Mode: stable X-Mobile-Request: secondary But that also means sending more bytes through the wire :S Well, you can (and should) drop the 'X-' :-) See http://tools.ietf.org/html/rfc6648: Deprecating the X- Prefix and Similar Constructs in Application Protocols -- Ori Livneh ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] mariadb 5.5 in production for english wikipedia
On Wed, Dec 12, 2012 at 6:45 AM, Antoine Musso hashar+...@free.fr wrote: Le 12/12/12 01:10, Asher Feldman a écrit : This afternoon, I migrated one of the main production English Wikipedia slaves, db59, to MariaDB 5.5.28. Congratulations :-) Out of curiosity, have you looked at Drizzle too? I've spoken with Drizzle developers at OSCON in the past. I haven't seen anyone advocate it as a production quality database though, and it doesn't currently seem to have a lot of development momentum behind it, with Brian Aker no longer putting in a lot of time. Lots of interesting ideas and features, especially around replication, but they make it incompatible with MySQL in enough ways where a gradual migration wouldn't be practical even if it was otherwise desirable. -A ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] mariadb 5.5 in production for english wikipedia
On Wednesday, December 12, 2012, David Gerard wrote: On 12 December 2012 15:32, Thomas Fellows thomas.fell...@gmail.comjavascript:; wrote: This is awesome! Is there any write-up of the migration process floating around? +1 In fact, this would be a nice thing to put on the WMF blog. It'll certainly get a lot of linkage and reporting around the geekosphere. A detailed blog post is definitely my intent, I'm just waiting until at least one major project is 100% on mariadb and I have more data and hence confidence in drawn conclusions. I don't think that's far off at all, potentially later this month. If that occurs and goes well, the eqiad data center migration in late January may also be a migration to all mariadb. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] mariadb 5.5 in production for english wikipedia
Hi, This afternoon, I migrated one of the main production English Wikipedia slaves, db59, to MariaDB 5.5.28. We've previously been testing 5.5.27 on the primary research slave, and I've been testing the current build for the last few days on a slave in eqiad. All has looked good, and I spent the last few days adapting our monitoring and metrics collection tools to the new version, and building binary packages that meet our needs. A main gotcha in major version upgrades is performance regressions due to changes in query plans. I've seen no sign of this, and my initial assessment is that performance for our workload is on par with or slightly improved over the 5.1 Facebook patchset. Taking the times of 100% of all queries over regular sample windows, the average query time across all enwiki slave queries is about 8% faster with MariaDB vs. our production build of 5.1-fb. Some query types are 10-15% faster, some are 3% slower, and nothing looks aberrant beyond those bounds. Overall throughput as measured by qps has generally been improved by 2-10%. I wouldn't draw any conclusions from this data yet, more is needed to filter out noise, but it's positive. MariaDB has some nice performance improvements that our workload doesn't really hit (better query optimization and index usage during joins, much better subquery support) but there are also some things, such as full utilization of the primary key embedded on the right of every secondary index, that we can take advantage of (and improve our schema around) once prod is fully upgraded, hopefully over the next 1-2 months. The main goal of migrating to MariaDB is not performance-driven. More so, I think it's in WMF's and the open source community's interest to coalesce around the MariaDB Foundation as the best route to ensuring a truly open and well-supported future for mysql-derived database technology. Performance gains along the way are icing on the cake. -Asher ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
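A sketch of the kind of per-query-type comparison described above: average time for each normalized query pattern in two sample windows, and the percent change between them. The sampling and normalization details here are assumptions, not the actual tooling behind these numbers.

from collections import defaultdict

def avg_by_type(samples):
    """samples: iterable of (query_type, seconds) for one sample window."""
    sums, counts = defaultdict(float), defaultdict(int)
    for qtype, t in samples:
        sums[qtype] += t
        counts[qtype] += 1
    return {q: sums[q] / counts[q] for q in sums}

def pct_change(before, after):
    """Percent change in average time per query type between two windows."""
    return {q: 100.0 * (after[q] - before[q]) / before[q]
            for q in before if q in after and before[q] > 0}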
Re: [Wikitech-l] [Ops] mariadb 5.5 in production for english wikipedia
On Tue, Dec 11, 2012 at 5:49 PM, Terry Chay tc...@wikimedia.org wrote: Nice! The main goal of migrating to MariaDB is not performance-driven. More so, I think it's in WMF's and the open source community's interest to coalesce around the MariaDB Foundation as the best route to ensuring a truly open and well-supported future for mysql-derived database technology. Performance gains along the way are icing on the cake. If it works out, then at some point we should probably tell the MariaDB peeps that they can mention that the WMF uses it. :-) We've been talking to Monty Widenius, who visited the WMF office prior to the Foundation announcement, and are fostering mutual support between the Wikimedia and MariaDB Foundations. Win-win for the open source community at large! -A ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] 2013 Antonio Pizzigati Prize for Software in the Public Interest
If anyone is interested, or knows of a worthy entrant, the application deadline for the 2013 Antonio Pizzigati Prize, honoring open source software development in the public interest, has been extended to Friday, 14 December 2012. http://www.tides.org/impact/awards-prizes/pizzigati-prize/ The Antonio Pizzigati Prize for Software in the Public Interest annually awards a $10,000 cash grant to one individual who has created or led an effort to create an open source software product of significant value to the nonprofit sector and movements for social change. The Pizzigati Prize honors the brief life of Tony Pizzigati ( http://www.tides.org/impact/awards-prizes/pizzigati-prize/tony/ - he does not have a Wikipedia entry), an early advocate of open source computing. -Asher ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] Let's talk about Solr
Hi all, I'm excited to see that Max has made a lot of great progress in adding Solr support to the GeoData extension so that we don't have to use mysql for spatial search - https://gerrit.wikimedia.org/r/#/c/27610/ GeoData makes use of the Solarium php client, which is currently included as a part of the extension. GeoData will be our second use of Solr, after the TranslationMemory extension which is already deployed - https://www.mediawiki.org/wiki/Help:Extension:Translate/Translation_memories and the Wikidata team is working on using Solr in their extensions as well. TranslationMemory also uses Solarium, a copy of which is also bundled with and loaded from the extension. For a loading and config example - https://gerrit.wikimedia.org/r/gitweb?p=operations/mediawiki-config.git;a=blob;f=wmf-config/CommonSettings.php;h=1e7a0e24dcbea106042826474607ec065d328472;hb=HEAD#l2407 I think Solr is the right direction for us to go in. Current efforts can pave the way for a complete refresh of WMF's article full-text search as well as how our developers approach information retrieval. We just need to make sure that these efforts are unified, with commonality around the client API, configuration, indexing (preferably with updates asynchronously pushed to Solr in near real-time), and schema definition. This is important from an operational aspect as well, where it would be ideal to have a single distributed and redundant cluster. It would be great to see the i18n, mobile tech, wikidata, and any other interested parties collaborate and agree on a path forward, with a quick sprint around common code that all can use. -Asher ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
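For readers unfamiliar with Solr, the kind of call a shared client layer would wrap looks roughly like the plain-HTTP query below. The host, core name, and parameters are made up for the example; the extensions themselves go through the Solarium PHP client rather than raw HTTP.

import json
import urllib.request
from urllib.parse import urlencode

def solr_select(base_url, **params):
    """Run a query against a Solr core's standard select handler, JSON response."""
    params.setdefault('wt', 'json')
    with urllib.request.urlopen(base_url + '/select?' + urlencode(params)) as resp:
        return json.load(resp)

# e.g. solr_select('http://solr.example.org:8983/solr/geodata', q='*:*', rows=10)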
Re: [Wikitech-l] GC cache entry
I think the latter case is a likely candidate, as we see a couple hundred apache worker segfaults daily, in both php5 and libxml2 space. The first case is likely a bug in php core and it's worth checking whether we'd see the same behavior running a current php release. Core analysis may help us determine if a reproduceable state always leads to the crash. Similarly with libxml2. I suppose we'd have to patch apc with additional logging to really know for sure that this is the cause. From your understanding of the apc source, when would such items ever actually be freed? Only on apache restart? On Tuesday, October 2, 2012, Patrick Reilly wrote: Time to time, we receive a strange warning message in fenari:/home/wikipedia/log/syslog/apache.log Oct 3 01:01:03 10.0.11.59 apache2[20535]: PHP Warning: * require() [a href='function.require'function.require/a]: GC cache entry '/usr/local/apache/common-local/wmf-config/ExtensionMessages-1.20wmf12.php' (dev=2049 ino=10248005) was on gc-list for 601 seconds in /usr/local/apache/common-local/php-1.20wmf12/includes/AutoLoader.php* on line 1150 Definitely this issue comes from *APC*, source code from package apc-3.1.6-r1. When item is inserted into user cache or file cache, this function is called. static void process_pending_removals(apc_cache_t* cache TSRMLS_DC) { slot_t** slot; time_t now; /* This function scans the list of removed cache entries and deletes any * entry whose reference count is zero (indicating that it is no longer * being executed) or that has been on the pending list for more than * cache-gc_ttl seconds (we issue a warning in the latter case). */ if (!cache-header-deleted_list) return; slot = cache-header-deleted_list; now = time(0); while (*slot != NULL) { int gc_sec = cache-gc_ttl ? (now - (*slot)-deletion_time) : 0; if ((*slot)-value-ref_count = 0 || gc_sec cache-gc_ttl) { slot_t* dead = *slot; if (dead-value-ref_count 0) { switch(dead-value-type) { case APC_CACHE_ENTRY_FILE: apc_warning(GC cache entry '%s' (dev=%d ino=%d) was on gc-list for %d seconds TSRMLS_CC, dead-value-data.file.filename, dead-key.data.file.device, dead-key.data.file.inode, gc_sec); break; case APC_CACHE_ENTRY_USER: apc_warning(GC cache entry '%s'was on gc-list for %d seconds TSRMLS_CC, dead-value-data.user.info, gc_sec); break; } } *slot = dead-next; free_slot(dead TSRMLS_CC); } else { slot = (*slot)-next; } } } From APC configuration ( http://us.php.net/manual/en/apc.configuration.php#ini.apc.gc-ttl ) *apc.gc_ttl integer* The number of seconds that a cache entry may remain on the garbage-collection list. This value provides a fail-safe in the event that a server process dies while executing a cached source file; if that source file is modified, the memory allocated for the old version will not be reclaimed until this TTL reached. Set to zero to disable this feature. We get messages GC cache entry '%s' (dev=%d ino=%d) was on gc-list for %d seconds or GC cache entry '%s'was on gc-list for %d seconds in this condition: (gc_sec cache-gc_ttl) (dead-value-ref_count 0) First condition means, item was deleted later then apc.gc_ttl seconds ago and its still in garbage collector list. Seconds condition means, item is still referenced. e.g., when a process unexpectedly died, reference is not decreased. First apc.ttl seconds is active in APC cache, then is deleted (there isn't next hit on this item). Now item is on garbage collector list (GC) and apc.gc_ttl timeout is running. 
When apc.gc_ttl is less than (now - item_deletion_time), the warning is written and the item is finally completely flushed. So what should we do? ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] Can we kill DBO_TRX? It seems evil!
On Wednesday, September 26, 2012, Daniel Kinzler wrote: I see your point. But if we have the choice between lock contention and silent data loss, which is better? This isn't really a choice - by default, when a statement in mysql hits a lock timeout, it is rolled back but the transaction it's in is not. That can also lead to data loss via partial writes in real world cases if not properly accounted for by the application. Avoiding holding locks longer than needed really should be paramount. Developers need to adapt to cases where transaction semantics alone can't guarantee consistency across multiple write statements. We're planning on sharding some tables this year and there will be cases where writes will have to go to multiple database servers, likely without the benefit of two phase commit. That doesn't mean that we should give up on consistency or that we shouldn't try to do better, but not in exchange for more lock contention. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Can we kill DBO_TRX? It seems evil!
On Wed, Sep 26, 2012 at 4:07 AM, Daniel Kinzler dan...@brightbyte.de wrote: On 26.09.2012 12:06, Asher Feldman wrote: On Wednesday, September 26, 2012, Daniel Kinzler wrote: I see your point. But if we have the choice between lock contention and silent data loss, which is better? This isn't really a choice - by default, when a statement in mysql hits a lock timeout, it is rolled back but the transaction it's in is not. Uh. That sounds evil and breaks the A in ACID, no? Why isn't the entire transaction rolled back in such a case? There's a distinction (possibly misguided) between cases where a statement can be retried with an expectation of success, and cases that aren't, which trigger an implicit rollback. Deadlocks are considered the latter by mysql; they result in a transaction rollback. Oracle behaves the same way as mysql with regards to lock timeouts - it's up to developers to either retry the timed-out statement, or rollback. The results can definitely be evil if not handled correctly, but it's debatable if it's a violation of atomicity. If a lock timeout throws an exception that closes the connection to mysql, at least that will result in a rollback. If the connection is pooled and reused, it can likely result in a commit. Mysql does offer a rollback_on_timeout option that changes the default behavior. We can enable it at wmf, but since that may not be an option for many installs, it's better to work around it. That can also lead to data loss via partial writes in real world cases if not properly accounted for by the application. How could we detect such a case? I can't think of a way that's actually good. Better to account for the behavior. That doesn't mean that we should give up on consistency or that we shouldn't try to do better, but not in exchange for more lock contention. Well, improving consistency and avoiding data loss is going to be hard without the use of locks... how do you propose to do that? We could try to identify cases where consistency is extremely important, vs. where it isn't. In the cases where a very important lock holding transaction will be entered, can we defer calling hooks or doing anything unrelated until that transaction is closed at its intended endpoint? If so, perhaps everything else can be subject to current behavior, where unrelated code can call commit. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
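To make the failure mode discussed above concrete: with MySQL's defaults, error 1205 (lock wait timeout) rolls back only the timed-out statement, and it is the application's job to either retry that statement or roll the whole transaction back. A minimal sketch of that handling, using plain PDO rather than MediaWiki's Database class:

<?php
// Sketch: retry a statement that hit a lock wait timeout, or abort the whole
// transaction. With MySQL defaults (no rollback-on-timeout), error 1205 rolls
// back only the failed statement, not the enclosing transaction.
function runWithRetry( PDO $dbw, $sql, array $params, $maxRetries = 2 ) {
	for ( $attempt = 0; ; $attempt++ ) {
		try {
			$stmt = $dbw->prepare( $sql );
			$stmt->execute( $params );
			return $stmt;
		} catch ( PDOException $e ) {
			$mysqlErrno = isset( $e->errorInfo[1] ) ? $e->errorInfo[1] : 0;
			if ( $mysqlErrno === 1205 && $attempt < $maxRetries ) {
				continue; // statement was rolled back; safe to retry it alone
			}
			// Anything else, or too many retries: abandon the whole
			// transaction so we don't commit a partial write.
			$dbw->rollBack();
			throw $e;
		}
	}
}

// Usage sketch (connection details are placeholders):
// $dbw = new PDO( 'mysql:host=localhost;dbname=wiki', 'user', 'pass',
//     array( PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION ) );
// $dbw->beginTransaction();
// runWithRetry( $dbw, 'UPDATE page SET page_touched = ? WHERE page_id = ?',
//     array( gmdate( 'YmdHis' ), 42 ) );
// $dbw->commit();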
[Wikitech-l] Indexing sha1 hashes in mysql
As we've increased our use of sha1 hashes to identify unique content over the past year, I occasionally see changesets or discussions about indexing sha1's in mysql. When indexing a text field, it's generally beneficial to define the smallest index that still uniquely matches a high percentage of rows. Search and insert performance both benefit from the space savings. As a cryptographic hash function, sha1 has a very high degree of uniformity. We can estimate the percent of partial index look-ups that will match a unique result just by comparing the size of the table to the space covered by the index. sha1 hashes are 160 bits, which mediawiki stores in mysql with base36 encoding. base36(2^160) == twj4yidkw7a8pn4g709kzmfoaol3x8g. Looking at enwiki.revision.rev_sha1, the smallest current value is 02xi72hkkhn1nvfdeffgp7e1w3s and the largest, twj4yi9tgesxysgyi41bz16jdkwroha.

The number of combinations covered by indexing the top bits represented by the left-most 4 thru 10 characters:

sha1_index(4) = 1395184 (twj4)
sha1_index(5) = 50226658 (twj4y)
sha1_index(6) = 1808159706 (twj4yi)
sha1_index(7) = 65093749429 (twj4yid)
sha1_index(8) = 2343374979464 (twj4yidk)
sha1_index(9) = 84361499260736 (twj4yidkw)
sha1_index(10) = 3037013973386503 (twj4yidkw7)

Percentage of unique matches in a table of 2B sha1's:

sha1_index(7) = 96.92%
sha1_index(8) = 99.91%
sha1_index(9) = 99.997%
sha1_index(10) = 99.9999%

Percentage of unique matches in a table of 10B sha1's:

sha1_index(8) = 99.573%
sha1_index(9) = 99.988%
sha1_index(10) = 99.9996%

Given current table sizes and growth rates, an 8 character index on a sha1 column should be sufficient for years for many cases (i.e. media files outside of commons, revisions on projects outside of the top 10), while a 10 character index still provides 99.99% coverage of 100 billion sha1's. Caveat: The likely but rare worst case for a partial index is that we may have tables with hundreds of rows containing the same sha1, perhaps revisions of a page that had a crazy revert war. A lookup for that specific sha1 will have to do secondary lookups for each match, as would lookups of any other sha1 that happens to collide within the index space. If the index is large enough to make the latter case quite unlikely, prudent use of caching can address the first.

tl;dr Where an index is desired on a mysql column of base36 encoded sha1 hashes, I recommend ADD INDEX (sha1column(10)). Shorter indexes will be sufficient in many cases, but this still provides a 2/3 space savings while covering a huge (2^51.43) space. -Asher ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
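The estimate above is easy to reproduce. A short sketch of the same arithmetic: a k-character prefix over 31-character base36 sha1 values covers roughly 2^160 / 36^(31-k) distinct prefixes, and the chance of a unique match is approximated by comparing that to the table size.

<?php
// Estimate how often a k-character prefix index over base36-encoded sha1
// values resolves to a unique row, for a table holding $rows hashes.
function sha1PrefixCoverage( $prefixLen, $rows ) {
	// Distinct prefixes reachable by a 160-bit value in base36 (31 chars max).
	$space = pow( 2, 160 ) / pow( 36, 31 - $prefixLen );
	return 100 * ( 1 - $rows / $space );
}

foreach ( array( 7, 8, 9, 10 ) as $k ) {
	printf( "sha1_index(%d): %.4f%% unique at 2B rows, %.4f%% at 10B rows\n",
		$k, sha1PrefixCoverage( $k, 2e9 ), sha1PrefixCoverage( $k, 10e9 ) );
}

Running this reproduces the figures in the message, e.g. about 96.9% for a 7-character prefix over 2 billion rows.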
Re: [Wikitech-l] Indexing sha1 hashes in mysql
Base36 certainly isn't the most efficient way to store a sha1, but it's what is in use all over mediawiki. I think there was some discussion on this list of the tradeoffs of different methods when revision.rev_sha1 was added, and base36 was picked as a compromise. I don't know why base36 was picked over base62 once it was decided to stick with an ascii alpha-numeric encoding but regardless, there was opposition to binary. Taken on its own, an integer index would be more efficient but I don't think it makes sense if we continue using base36. On Tue, Sep 25, 2012 at 11:20 AM, Artur Fijałkowski wiki.w...@gmail.comwrote: tl;dr Where an index is desired on a mysql column of base36 encoded sha1 hashes, I recommend ADD INDEX (sha1column(10)). Shorter indexes will be sufficient in many cases, but this is still provides a 2/3 space savings while covering a huge (2^51.43) space. Isn't it better to store BIGINT containing part of (binary) sha1 and use index on numeric column? AJF/WarX ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] mediawiki profiling presentation slide deck
Here's the slide deck for the mediawiki profiling presentation I gave at WMF Tech Days 2012 yesterday: https://commons.wikimedia.org/wiki/File:MediaWikiPerformanceProfiling.pdf -Asher ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] scaled media (thumbs) as *temporary* files, not stored forever
On Tue, Sep 4, 2012 at 3:11 PM, Platonides platoni...@gmail.com wrote: On 03/09/12 02:59, Tim Starling wrote: I'll go for option 4. You can't delete the images from the backend while they are still in Squid, because then they would not be purged when the image is updated or action=purge is requested. In fact, that is one of only two reasons for the existence of the backend thumbnail store on Wikimedia. The thumbnail backend could be replaced by a text file that stores a list of thumbnail filenames which were sent to Squid within a window equivalent to the expiry time sent in the Cache-Control header. The other reason for the existence of the backend thumbnail store is to transport images from the thumbnail scalers to the 404 handler. For that purpose, the image only needs to exist in the backend for a few seconds. It could be replaced by a better 404 handler, that sends thumbnails directly by HTTP. Maybe the Swift one does that already. -- Tim Starling The second one seems easy to fix. The first one should IMHO be fixed in squid/varnish by allowing wildcard purges (ie. PURGE /wikipedia/commons/thumb/5/5c/Tim_starling.jpg/* HTTP/1.0) fast.ly implements group purge for varnish like this via a proxy daemon that watches backend responses for a tag response header (i.e. all resolutions of Tim_starling.jpg would be tagged that) and builds an in-memory hash of tags-objects which can be purged on. I've been told they'd probably open source the code for us if we want it, and it is interesting (especially to deal with the fact that we don't purge articles at all of their possible url's) albeit with its own challenges. If we implemented a backend system to track thumbnails that exist for a given orig, we may be able to remove our dependency on swift container listings to purge images, paving the way for a second class of thumbnails that are only cached. A wiki with such setup could then disable the on-disk storage. I think this is entirely doable, but scaling the imagescalers to support cache failures at wmf scale would be a waste, except perhaps for non-standard sizes that aren't widely used. I like Brion's thoughts on revamping image handling, and would like to see semi-permanent (in swift) storage of a standardized set of thumbnail resolutions but we could still support additional resolutions. Browser scaling is also at least worth experimenting with. Instances where browser scaling would be bad are likely instances where the image is already subpar if viewed on a high-dpi / retina display. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
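A toy model of the bookkeeping the tag-based purge daemon described above implies (all names here are made up, not fast.ly's or Varnish's API): watch backend responses for a tag header, remember which URLs carried each tag, and expand a purge-by-tag into per-URL PURGEs.

<?php
// Illustrative only: every thumbnail rendered from Tim_starling.jpg would be
// tagged "Tim_starling.jpg", and purging that tag fans out to every variant.
class TagPurgeIndex {
	private $tagToUrls = array();

	// Called for each backend response carrying a tag header
	// (e.g. a hypothetical X-Purge-Tag).
	public function recordResponse( $url, $tag ) {
		$this->tagToUrls[$tag][$url] = true;
	}

	// Expand a tag into the list of URLs that need a PURGE request.
	public function urlsForTag( $tag ) {
		return isset( $this->tagToUrls[$tag] )
			? array_keys( $this->tagToUrls[$tag] )
			: array();
	}
}

// $idx = new TagPurgeIndex();
// $idx->recordResponse( '/wikipedia/commons/thumb/5/5c/Tim_starling.jpg/120px-Tim_starling.jpg', 'Tim_starling.jpg' );
// $idx->recordResponse( '/wikipedia/commons/thumb/5/5c/Tim_starling.jpg/800px-Tim_starling.jpg', 'Tim_starling.jpg' );
// foreach ( $idx->urlsForTag( 'Tim_starling.jpg' ) as $url ) { /* send PURGE $url */ }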
Re: [Wikitech-l] Wikimedians are rightfully wary
While I don't agree with the negative sentiment around experimentation, I think there's value both in MZMcBride's op-ed, and in the comment thread that follows. He correctly calls out some of our long term organizational failings around product planning, resource allocation, execution, and follow-thru. It's almost as painful to read about LiquidThreads as it is to use talk pages today, eight years after the LT project was first proposed. Are we learning from our failures? The criticism around AFTv5 in terms of product design (never mind the code) is largely echoed in the comments, yet we seem rather sure that we're giving editors a tool of importance. My daily sampling of what's flowing into the enwiki db from the feature appears to be 99% garbage, with the onus being on volunteers to sort the wheat from the chaff. If we had a dead simple, highly functional, and well designed discussion system (see LiquidThreads), wouldn't that be the ideal route for high value feedback from knowledgeable non-editors instead of an anonymous one-way text box at the bottom of the articles that's guaranteed to be a garbage collector? The one thing the op-ed seems to miss is that one of the main goals of the foundation is to attract new editors and improve the editing experience. I think development in that direction (visual editor with a new parser especially) is hugely promising but we also need to remain cognizant of the needs of our community, take care in allocating resources, and integrate feedback lest our efforts mistakenly contribute to our retention problem. On Tue, Aug 21, 2012 at 10:10 AM, Tyler Romeo tylerro...@gmail.com wrote: Hey, Not sure if anybody has seen this article yet: https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2012-08-20/Op-ed Thought it was interesting and possibly worth discussion. --Tyler Romeo ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Gerrit downtime this Friday
Mission accomplished! On Tue, Jul 17, 2012 at 5:26 PM, Asher Feldman afeld...@wikimedia.orgwrote: Hi All, Ryan Lane and I are migrating gerrit's db to a server in eqiad (where the gerrit app server is located) on Friday, and have a downtime window of 18:00-19:00 UTC (11am-12pm PDT). Actual downtime should be shorter. Gerrit makes many mysql queries for some page requests; this will improve the latency of such pages. Additionally, the new db will have both slow and sample based query profiling in ishmael which should assist with further optimizations. -A ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] Gerrit downtime this Friday
Hi All, Ryan Lane and I are migrating gerrit's db to a server in eqiad (where the gerrit app server is located) on Friday, and have a downtime window of 18:00-19:00 UTC (11am-12pm PDT). Actual downtime should be shorter. Gerrit makes many mysql queries for some page requests; this will improve the latency of such pages. Additionally, the new db will have both slow and sample based query profiling in ishmael which should assist with further optimizations. -A ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Guidelines for db schema changes
Hi all, I'd like to remind everyone involved in development that requires db schema migrations - please keep in mind the three related guidelines in our official deployment policies - http://www.mediawiki.org/wiki/Development_policy#Database_patches - especially the third, which is to make schema changes optional. Once a migration has been reviewed, please update http://wikitech.wikimedia.org/view/Schema_changes with all pertinent details, then get in touch for deployment scheduling. There are good and legitimate reasons to not follow the make schema changes optional policy but if that's the case, please provide 3-7 days of lead time, depending on the size of tables and number of effected wikis. Best, Asher On Mon, May 14, 2012 at 7:25 PM, Rob Lanphier ro...@wikimedia.org wrote: On Tue, Apr 24, 2012 at 5:52 PM, Rob Lanphier ro...@wikimedia.org wrote: Assuming this seems sensible to everyone, I can update this page with this: http://www.mediawiki.org/wiki/Development_policy And this is done now. In case you aren't using a threaded mail client, here's the original discussion: http://thread.gmane.org/gmane.science.linguistics.wikipedia.technical/60967 Rob ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Error after deletion
commons was read-only for about 60 seconds while I was switching the master this afternoon due to issues necessitating a kernel upgrade. On Mon, May 14, 2012 at 6:00 PM, Bináris wikipo...@gmail.com wrote: I deleted a test article from huwiki, and this was the result instead of the usual success message. However, the deletion has been completed. A database error has occurred. Did you forget to run maintenance/update.php after upgrading? See: https://www.mediawiki.org/wiki/Manual:Upgrading#Run_the_update_script Query: DELETE FROM `globalimagelinks` WHERE gil_wiki = 'huwiki' AND gil_page = '929875' Function: GlobalUsage::deleteLinksFromPage Error: 1290 The MySQL server is running with the --read-only option so it cannot execute this statement (10.0.6.61) -- Bináris ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Guidelines for db schema changes
I am generally in favor of all of this and in the meeting that proceeded Rob's email, proposed that we develop a new schema migration tool for mediawiki along similar lines. Such a beast would have to work in all deployment cases without modifications (stock single wiki installs and at wmf with many wikis across multiple masters with tiered replication), be idempotent when run across many databases, track version and state per migration, and include up/down steps in every migration. There are opensource php migration tools modeled along those used by the popular ruby and python frameworks. I deployed https://github.com/davejkiger/mysql-php-migrations at kiva.org a couple years ago where it worked well and is still in use. Nothing will meet our needs off the shelf though. A good project could at best be forked into mediawiki with modifications if the license allows it, or more likely serve as a model for our own development. On Tue, Apr 24, 2012 at 11:27 PM, Faidon Liambotis fai...@wikimedia.orgwrote: In other systems I've worked before, such problems have been solved by each schema-breaking version providing schema *and data* migrations for both forward *and backward* steps. This means that the upgrade transition mechanism knew how to add or remove columns or tables *and* how to fill them with data (say by concatenating two columns of the old schema). The same program would also take care to do the exact opposite steps in a the migration's backward method, in case a rollback was needed. Down migrations aid development; I find them most useful as documentation of prior state, making a migration readable as a diff. They generally aren't useful in production environments at scale though, which developers removed from the workings of production need to be aware of. Even with transparent execution of migrations, the time it takes to apply changes will nearly always be far outside of the acceptable bounds of an emergency response necessitating a code rollback. So except in obvious cases such as adding new tables, care is needed to keep forward migration backwards compatible with code as much as possible. The migrations themselves can be kept in the source tree, perhaps even versioned and with the schema version kept in the database, so that both us and external users can at any time forward their database to any later version, automagically. Yep. That we have to pull in migrations from both core and many extensions (many projects, one migration system) while also running different sets of extensions across different wikis intermingling on the same database servers adds some complexity but we should get there. -Asher ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
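As a rough illustration of the shape being described above (all names and the tracking table layout are invented, not a proposal for an actual schema): each migration is a class with up/down steps, and the runner records which versions have been applied per database, so re-running it across hundreds of wikis is idempotent.

<?php
// Sketch of a migration framework along the lines discussed above.
abstract class Migration {
	abstract public function version();       // e.g. "20120425_add_rev_sha1"
	abstract public function up( PDO $db );   // apply the change
	abstract public function down( PDO $db ); // document / undo the change

	// Applied-version tracking lives in the target database itself, so the
	// same runner works for a single wiki or for many wikis per master.
	public function isApplied( PDO $db ) {
		$stmt = $db->prepare( 'SELECT 1 FROM schema_migrations WHERE sm_version = ?' );
		$stmt->execute( array( $this->version() ) );
		return (bool)$stmt->fetchColumn();
	}

	public function markApplied( PDO $db ) {
		$stmt = $db->prepare( 'INSERT INTO schema_migrations (sm_version, sm_applied) VALUES (?, NOW())' );
		$stmt->execute( array( $this->version() ) );
	}
}

function runMigrations( array $databases, array $migrations ) {
	foreach ( $databases as $db ) {       // every wiki on every master
		foreach ( $migrations as $m ) {   // core + extension migrations
			if ( !$m->isApplied( $db ) ) {
				$m->up( $db );
				$m->markApplied( $db );
			}
		}
	}
}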
Re: [Wikitech-l] Guidelines for db schema changes
Thanks, hashar! On Wed, Apr 25, 2012 at 12:12 AM, Antoine Musso hashar+...@free.fr wrote: Le 25/04/12 02:52, Rob Lanphier a écrit : 3. For anything that involves a schema change to the production dbs, make sure Asher Feldman (afeld...@wikimedia.org) is on the reviewer list. He's already keeping an eye on this stuff the best he can, but it's going to be easy for him to miss changes in extensions should they happen. I am pretty sure Jenkins could detect a change is being made on a .sql file and then add a specific reviewer using Gerrit CLI tool. Logged as: https://bugzilla.wikimedia.org/36228 -- Antoine hashar Musso ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Redirect rules ShortURL deployment - need to make a decision
On Fri, Apr 20, 2012 at 8:01 AM, Mark A. Hershberger m...@wikimedia.orgwrote: Sumana Harihareswara suma...@wikimedia.org writes: Please leave your comments at bug 1450 so we can decide how to write the rewrite rule. Since Gerrit makes review possible and the relevant Apache config (redirects.conf) is on noc and *should* be in git, I've gone ahead and (after discussing how to proceed with Ops) submitted a configuration to Gerrit: https://gerrit.wikimedia.org/r/5433 I had to give this a -2 since the rewrite rule was broken and we don't deploy application configs tied to mediawiki via puppet or currently plan to do so. For that reason, I don't want this stuff dumped ad-hoc in the puppet repo (the reason for the -2.) The change itself is straight forward, I just have one follow-up question about scope which I'll ask over at the ticket. -A ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] MobileFrontend persistent cookie overhaul and caching weirdness
MW needs full etag support, with hooks for extensions. Without it, we can't widely support caching in the case you've outlined. Different browsers handle the Vary header differently. Some treat Vary as "don't cache". Chrome (possibly other webkit browsers) treats it as a marker to revalidate whatever variant is cached. It sends an If-Modified-Since and if there's an etag, If-None-Match header. If MediaWiki provided etags, calculated them differently based on login status, mobilefrontend, etc., and used them for If-None-Match requests, we could handle browser caching sanely. The LoggedOut cookie behavior that Daniel described could provide a less than ideal workaround if set with an updated timestamp on each view switch but I'd rather not see this exploited further. It breaks squid caching in our setup which lessens the user experience. On Thu, Apr 12, 2012 at 12:18 PM, Arthur Richards aricha...@wikimedia.org wrote: Per bug 35842, I've overhauled the persistent cookie handling in the MobileFrontend extension. I think my changes will work fine on the WMF architecture where most of our sites have a separate domain for their mobile version. However, for sites that use a shared domain for both desktop and mobile views, there is major browser caching-related weirdness that I have not been able to figure out. Details can be found in the bug: https://bugzilla.wikimedia.org/show_bug.cgi?id=35842 A little more context about the issue: we need to be able to allow people to switch between desktop/mobile views. We're currently doing this by setting a cookie when the user elects to switch their view, in order to keep that view persistent across requests. On the WMF architecture, we do some funky stuff at the proxy layer for routing requests, depending on detected device type and whether or not certain cookies are set for the user. Generally speaking the sites hosted on our cluster have a separate domain set up for their mobile versions, even though they're powered by the same backend. This makes view switching a bit easier, although I think the long-term hope is to get rid of mobile-specific domains. For sites that do not have a separate domain set up, we rely solely on cookies to handle user-selected view toggling. This seemed to generally work OK with the way we were previously handling these 'persistent cookies', but the previous way of cookie handling has been causing caching problems on our cluster. The changes I've introduced to hopefully resolve those issues result in browser-caching issues on single-domain sites using MobileFrontend, where after toggling the view and browsing to a page that was earlier viewed in the previous context, you might see a cached copy of the page from the previous context. No good. I'm stumped and am at a point where it's hard to see the forest through the trees. I could use some help to deal with this - if anyone has any insight or suggestions, I'm all ears! Thanks, Arthur -- Arthur Richards Software Engineer, Mobile [[User:Awjrichards]] IRC: awjr +1-415-839-6885 x6687 ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
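A bare-bones sketch of the ETag behaviour being asked for above, written in plain PHP rather than MediaWiki's actual OutputPage code; the function names and the exact variant inputs are illustrative assumptions. The interesting part is that the tag changes with both content age and the variant being served, so If-None-Match revalidation works per variant.

<?php
// Compute an ETag that varies with content age and the served variant.
function makeEtag( $pageTouched, $isMobileView, $isLoggedIn ) {
	return '"' . md5( implode( '|', array(
		$pageTouched,                         // e.g. page_touched timestamp
		$isMobileView ? 'mobile' : 'desktop',
		$isLoggedIn ? 'loggedin' : 'anon',
	) ) ) . '"';
}

// Emit the ETag and answer 304 when the client already has this variant.
function maybeSend304( $etag ) {
	header( 'ETag: ' . $etag );
	$clientEtag = isset( $_SERVER['HTTP_IF_NONE_MATCH'] )
		? trim( $_SERVER['HTTP_IF_NONE_MATCH'] ) : null;
	if ( $clientEtag === $etag ) {
		header( 'HTTP/1.1 304 Not Modified' );
		return true; // caller should skip rendering the body
	}
	return false;
}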
Re: [Wikitech-l] selenium browser testing proposal and prototype
On Thu, Apr 5, 2012 at 5:25 PM, Ryan Lane rlan...@gmail.com wrote: How many languages can we reasonably support? We're currently using PHP, Python, Java, OCaml and Javascript (and probably more). Should we also throw Ruby in here as well? What level of support are the Selenium tests really going to get if they require developers to use Ruby? It might be good to see examples of what MW developers would actually have to do to implement new Selenium tests once the framework is complete. There's a login example in the github prototype that's straight forward but I assume it will get simpler as more is written which can be reused. I doubt it will require much in terms of actual ruby finesse. We've already gone down the Ruby road once. I think a lot of the people involved with that would say it was a bad call, especially ops. Ruby at scale can certainly be a lulz engine, especially for those on the sidelines. This project doesn't seem to place any software demands on the production cluster, or even necessarily require anything from ops though. I assume the road you refer to was the mobile gateway; I consider that to have been a train wreck primarily from a project standpoint as opposed to a technical one. When I stumbled upon it, there wasn't an employee with the combination of access and knowledge required to commit code changes to its read-only-to-us repo, and to deploy those changes. We were essentially passing bits of duct tape back and forth by transatlantic carrier pigeon. For a slew of reasons, it makes much more sense to do what we're doing now with MobileFrontend, but we've yet to reach the point where it does anything the ruby gateway couldn't have done with a bit of iteration. In its last incarnation, it was typically faster than the current MobileFrontend for a request not served by the frontend caching layer. The point being, I don't think language was the main issue there. Chris makes a compelling argument that his preferred route is closer to being off the shelf and widely supported by industry and community. I have no comment on what QA engineers prefer to hack on, but I think the ease of hiring new ones who are good at what they do and excited about the tools they get to use should be part of the decision. -A ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] enwiki revision schema migration in progress
On Tuesday, March 20, 2012, Roan Kattouw roan.katt...@gmail.com wrote: So yeah /normally/ you hit DB servers at random and different servers might respond differently (or be lagged to different degrees), but in this particular case it was always the same DB server returning the same lag value. Nothing strange going on here, this is how the maxlag parameter works. How do you feel about a switch to change that behavior (maxlag - 1)? It would be nice to be continue guiding developers towards throttling API requests around maxlag without complicating schema migrations by requiring config deployments before and after every db for this reason only. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] enwiki revision schema migration in progress
On Tuesday, March 20, 2012, Roan Kattouw roan.katt...@gmail.com wrote: On Tue, Mar 20, 2012 at 11:35 AM, Asher Feldman afeld...@wikimedia.org wrote: How do you feel about a switch to change that behavior (maxlag - 1)? It would be nice to be continue guiding developers towards throttling API requests around maxlag without complicating schema migrations by requiring config deployments before and after every db for this reason only. That sounds reasonable to me, what alternative behavior do you propose? A flag that, when enabled, causes maxlag to use the 2nd highest lag instead of the highest lag? That was my original thought. Jeremy's idea is good too, though I wonder if we could do something similar without depending on deployments. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] enwiki revision schema migration in progress
Just a heads up that the last of the 1.19 migrations, to add the sha1 column to enwiki.revision is going to be running throughout this week. Don't be alarmed by replication lag messages for s1 dbs in irc. I'm going to juggle which db watchlist queries go to during the migration, so nothing should be noticeable on the site. -Asher ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] enwiki revision schema migration in progress
I've temporarily commented out db36 from db.php on the cluster. This is a flaw in how the client-side use of maxlag interacts with our schema migration process - we run migrations on slaves one by one in an automated fashion, only moving to the next after replication lag catches up. Mediawiki takes care of not sending queries to the lagged slave that is under migration. Meanwhile, maxlag always reports the value of the most lagged slave. Not a new issue, but this particular alter table on enwiki is likely the most time intensive ever run at wmf. It's slightly ridiculous. For this one alter, I can stop the migration script and run each statement by hand, pulling and re-adding db's one by one along the way, but this isn't a sustainable process. Perhaps we can add a migration flag to mediawiki, which, if enabled, changes the behavior of maxlag and wfWaitForSlaves() to ignore one highly lagged slave so long as others are available without lag. -A On Mon, Mar 19, 2012 at 9:28 PM, MZMcBride z...@mzmcbride.com wrote: MZMcBride wrote: I'm not sure of the exact configuration, but it seems like nearly every API request is being handled by the lagged server (db36)? Or perhaps my scripts just have terrible luck. I added some prints to the code. Different servers are responding, but they're all unable to get past the lag, apparently:

{u'servedby': u'srv234', u'error': {u'info': u'Waiting for 10.0.6.46: 21948 seconds lagged', u'code': u'maxlag'}}
{u'servedby': u'srv242', u'error': {u'info': u'Waiting for 10.0.6.46: 21982 seconds lagged', u'code': u'maxlag'}}
{u'servedby': u'mw20', u'error': {u'info': u'Waiting for 10.0.6.46: 21984 seconds lagged', u'code': u'maxlag'}}
{u'servedby': u'mw45', u'error': {u'info': u'Waiting for 10.0.6.46: 21986 seconds lagged', u'code': u'maxlag'}}
{u'servedby': u'mw14', u'error': {u'info': u'Waiting for 10.0.6.46: 21988 seconds lagged', u'code': u'maxlag'}}
{u'servedby': u'mw42', u'error': {u'info': u'Waiting for 10.0.6.46: 21989 seconds lagged', u'code': u'maxlag'}}
{u'servedby': u'mw3', u'error': {u'info': u'Waiting for 10.0.6.46: 21991 seconds lagged', u'code': u'maxlag'}}
{u'servedby': u'srv230', u'error': {u'info': u'Waiting for 10.0.6.46: 22005 seconds lagged', u'code': u'maxlag'}}
{u'servedby': u'srv259', u'error': {u'info': u'Waiting for 10.0.6.46: 22006 seconds lagged', u'code': u'maxlag'}}
{u'servedby': u'srv274', u'error': {u'info': u'Waiting for 10.0.6.46: 22008 seconds lagged', u'code': u'maxlag'}}
{u'servedby': u'srv280', u'error': {u'info': u'Waiting for 10.0.6.46: 22009 seconds lagged', u'code': u'maxlag'}}
{u'servedby': u'srv236', u'error': {u'info': u'Waiting for 10.0.6.46: 22010 seconds lagged', u'code': u'maxlag'}}
{u'servedby': u'srv230', u'error': {u'info': u'Waiting for 10.0.6.46: 22011 seconds lagged', u'code': u'maxlag'}}

And it goes on and on.
The relevant branch of code is:

---
def __parseJSON(self, data):
    maxlag = True
    while maxlag:
        try:
            maxlag = False
            parsed = json.loads(data.read())
            content = None
            if isinstance(parsed, dict):
                content = APIResult(parsed)
                content.response = self.response.items()
            elif isinstance(parsed, list):
                content = APIListResult(parsed)
                content.response = self.response.items()
            else:
                content = parsed
            if 'error' in content:
                error = content['error']['code']
                if error == "maxlag":
                    lagtime = int(re.search("(\d+) seconds", content['error']['info']).group(1))
                    if lagtime > self.wiki.maxwaittime:
                        lagtime = self.wiki.maxwaittime
                    print("Server lag, sleeping for "+str(lagtime)+" seconds")
                    maxlag = True
                    time.sleep(int(lagtime)+0.5)
                    return False
---

MZMcBride ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
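The migration flag proposed two messages up is simple to state in code. A hedged sketch, not MediaWiki's actual LoadBalancer implementation: when a migration is in progress, the single most-lagged slave (the one being altered) is excluded before the value reported to maxlag clients is computed.

<?php
// Report the lag value that maxlag should expose. During a migration, drop
// the single most-lagged slave, as long as at least one other replica exists.
function effectiveMaxLag( array $slaveLagSeconds, $migrationInProgress ) {
	if ( $migrationInProgress && count( $slaveLagSeconds ) > 1 ) {
		rsort( $slaveLagSeconds );       // highest lag first
		array_shift( $slaveLagSeconds ); // ignore the slave under migration
	}
	return max( $slaveLagSeconds );
}

// effectiveMaxLag( array( 0, 1, 21984 ), true )  => 1
// effectiveMaxLag( array( 0, 1, 21984 ), false ) => 21984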
Re: [Wikitech-l] First steps at making MobileFrontend usable beyond the WMF infrastructure
On Wed, Mar 14, 2012 at 5:08 PM, Arthur Richards aricha...@wikimedia.org wrote: To follow up on this, I actually made some additional changes to how useformat works to simplify manually switching between mobile and desktop views which had been suggested by Brion Vibber. Take a look at: https://www.mediawiki.org/wiki/Special:Code/MediaWiki/113865 This removes the "Permanently disable mobile view" text (broken for anyone other than the WMF anyway) and makes it so accessing the site with useformat=mobile in the URL (eg by clicking 'Mobile view' at the bottom of any page on a site with MobileFrontend enabled) will set a cookie which will ensure that you see the mobile view until either the cookie expires or you explicitly switch back to desktop view. It looks like "permanently disable mobile view" is broken completely as of last week's mobilefrontend deployment. So it's impossible to see how it's supposed to behave currently, but a key part of it for wikipedia is that it takes you off the m site and disables squid's mobile redirection via the stopMobileRedirect=true cookie. It actually disables use of the .m. site as the text implies, not just disabling the mobilefrontend dom rewrite that you get when viewing the desktop version of a single article, which keeps you on the mobile site. Replacing this with a desktop view that leaves users permanently accessing the desktop site via m. isn't suitable for our environment. It may make sense for smaller sites without a dedicated mobile namespace but even in that case, some care is needed to ensure that any frontend caching doesn't get inadvertently polluted or unduly fragmented. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] First steps at making MobileFrontend usable beyond the WMF infrastructure
On Thu, Mar 15, 2012 at 12:10 PM, Brion Vibber br...@pobox.com wrote: Let's please kill the m. domain. IMO desktop and mobile users should use the same URLs; there should be sane device detection; and an easy override in both directions available at all times. This is blocked on migrating text from squid to varnish which is likely at least a few months off. Until then, MobileFrontend needs to continue supporting the current production reality. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] State of the 1.19 deployment
The larger time range includes the removal of the mysql based parsercache which is the cause of the primary decline and not related to 1.19. The time range of the originally mentioned graph just shows a bit of context before the enwiki deployment up until the time I posted it to irc last night. A key change as hashar suggests is a good theory but I'm not sure if the hit rate is actually recovering when looking at -24 or -8 hours, more time will tell. This may not be valid since the mysql pcache wasn't re-enabled (from an empty state) long before 1.19 but the rate of selects against it seems to be up day over day. Maybe a new key is consistently fetched before it would ever be set.

2012-02-29T11:24:00+00:00,10288.770238
2012-02-29T12:06:00+00:00,10992.754365
2012-02-29T12:48:00+00:00,11140.606746
2012-03-01T11:54:00+00:00,13912.613492
2012-03-01T12:36:00+00:00,13790.359524
2012-03-01T13:18:00+00:00,14176.010317

(from http://ganglia.wikimedia.org/latest/graph.php?c=MySQL%20pmtpah=db40.pmtpa.wmnetv=15608m=mysql_com_selectr=customz=defaultjr=js=st=1330622240cs=2%2F28%2F2012%2010%3A23ce=3%2F1%2F2012%200%3A56vl=stmtsti=mysql_com_selectcsv=1 )

On Thursday, March 1, 2012, Antoine Musso wrote: On 01/03/12 09:50, Jeroen De Dauw wrote: Hey, There's been a slight regression in our parser cache hit rate: http://bit.ly/w6Gy9t This one is probably more informative for people not aware of the usual hit rate http://goo.gl/YY80C Looks to me that the miss rate went up over 500% - is that really just a slight regression? :) We really want to use absolute time range: http://bit.ly/A7kcys Anyway, they are absent misses. Probably a key changed somewhere in our parser that magically invalidated roughly 15% of the parser cache. It seems to slowly recover afterward. By zooming and making the Y scale start at 50%, the event seems to have occurred on February 17th just before 6am UTC. I have uploaded a screenshot on mediawiki.org : https://www.mediawiki.org/wiki/File:Parser_cache_hit_20120217.png From the wikitech admin logs we have:

07:42 binasher: upgraded mysql on db40 to 5.1.53-facebook-r3753, enabled innodb_use_purge_thread
05:39 tstarling synchronized wmf-config/InitialiseSettings.php
05:38 tstarling synchronized wmf-config/CommonSettings.php
05:35 Tim: on db40: reduced to 10M, should be causing massive delays, but the site's not down and the purge rate is lower if anything. Going to disable the mysql parser cache entirely.
05:25 Tim: on db40: purge lag is still increasing at 108 per second, so reducing innodb_max_purge_lag to 50M
05:21 Tim: on db40: giving the innodb manual the benefit of the doubt and following its advice, setting innodb_max_purge_lag to 100M, which should give a delay of 4.5ms
05:13 Tim: killing purgeParserCache.php since it is probably doing more harm than good
02:43 maplebed: deployed updated thumb_handler.php to ms5 to include Content-Length in generated images
02:34 logmsgbot: LocalisationUpdate completed (1.19) at Fri Feb 17 02:34:32 UTC 2012

https://wikitech.wikimedia.org/view/Server_admin_log

Going to disable the mysql parser cache entirely.
Seems to have been reenabled on Feb 29th at 00:20: 00:20 Tim: reimported schema files on db40 and re-enabled mysql parser cache -- Antoine hashar Musso ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Fwd: Revision tagging: use cases needed
+1 to adding to a modified version of change_tag, or something like it. While unfamiliar with the current tagging interface(s), the content of ct_tag seems arbitrary ('possible movie studio tagger' appears 4 times in enwiki.change_tag.ct_tag out of 2mil rows) and it probably makes sense to keep machine tagging automatically added at the time of an edit distinct from the apparent post-edit human/bot annotation use of ct_tag. Re: information on which automatic tags to hide, I don't think that should be stored with every row. Keeping that in configuration (where configuration options may consist of patterns to match) seems more appropriate. The primary use cases for this feature appear to be around offline analysis and I'd like to see the design take into account the possibility of this table existing in a separate database from the revision table at some point in the future. -A On Wed, Feb 15, 2012 at 10:27 AM, Platonides platoni...@gmail.com wrote: change_tag table? Seems straightforward. The only thing is that we may not want to show some of those automatic tags by default, so we would have to introduce a new concept of a 'hidden' tag. There are several ways to accomplish that, a list in the configuration, adding a new column, storing it in ct_params, or just using a convention in the tag name for hidden ones. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Welcome, Andrew Otto - Software Developer for Analytics
This is great!! Welcome Andrew! On Fri, Jan 6, 2012 at 10:08 AM, Rob Lanphier ro...@wikimedia.org wrote: Hi everyone, I'm pleased to announce Andrew Otto will be coming to Wikimedia Foundation as a software developer in Platform Engineering, focused on analytics. We've been hiring for this spot for quite some time, and I'm happy we held out for Andrew. Andrew comes to us from CouchSurfing, where he worked for the past four years as one of the very early technical staff there, working in various places throughout the world (Thailand, Alaska, and New York are the ones I recall). His team scaled their systems from a few web servers and one monolithic database, to a cluster of over 30 machines handling almost 100 million page views per month. He was responsible for introducing Puppet for system configuration at his last job, and much of his work at CouchSurfing has been in reviewing code and maintaining a consistent architecture for CouchSurfing. We're really excited to have Andrew on board to help bring some systems rigor to our data gathering process. Our current data mining regime involves a few pieces of lightweight data gathering infrastructure (e.g. udp2log), a combination of one-off special purpose log crunching scripts, along with other scripts that started their lives as one-off special purpose scripts, but have gradually become core infrastructure. Most of these scripts have single maintainers, and there is a lot of duplication of effort. In addition, the systems have a nasty tendency to break at the least opportune times. Andrew's background bringing sanity to insane environments will be enormously helpful here. Andrew has an email address and is technically starting the onboarding process, but is still wrapping up at CouchSurfing. He'll be with us part-time starting January 17, and ramping up to full-time starting in April. Andrew is based out of Virginia, but is still traveling the world. Right now, you'll find him in New York City. Please join me in welcoming Andrew to the team! Rob ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] bugzilla + etherpad + misc service downtime - weds 21 dec, 18:00pst
Tonight's maintenance on db9 is completed, during which it was read-only for 7 minutes. I'm going to perform a second round of maintenance tomorrow at the same time (Thursday 18:00PST) which will provide the long-term fix to db9's woes. Availability of the same set of services will be interrupted for a similar length of time. -Asher On Mon, Dec 19, 2011 at 5:17 PM, Asher Feldman afeld...@wikimedia.orgwrote: Hi, We need around 20 minutes of downtime to all services that write to db9 for replication maintenance. Outside of services that support the ops team, this primarily means bugzilla, etherpad, and civicrm. It will remain available for read queries, however, so read usage of services such as the tech blog should continue along fine. I'm planning to start this on Weds at 18:00 PST and will send follow-up mail at the start and completion of maintenance. -Asher ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] bugzilla + etherpad + misc service downtime - weds 21 dec, 18:00pst
Hi, We need around 20 minutes of downtime to all services that write to db9 for replication maintenance. Outside of services that support the ops team, this primarily means bugzilla, etherpad, and civicrm. It will remain available for read queries, however, so read usage of services such as the tech blog should continue along fine. I'm planning to start this on Weds at 18:00 PST and will send follow-up mail at the start and completion of maintenance. -Asher ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] error lasted more than 10 minutes....
On Mon, Nov 28, 2011 at 12:06 PM, Roan Kattouw roan.katt...@gmail.com wrote: On Mon, Nov 28, 2011 at 8:59 PM, Neil Harris n...@tonal.clara.co.uk wrote: I hadn't thought properly about cache stampedes: since the parser cache is only part of page rendering, this might also explain some of the other occasional slowdowns I've seen on Wikipedia. It would be really cool if there could be some sort of general mechanism to enable this to be prevented for all page URLs protected by memcaching, throughout the system. I'm not very familiar with PoolCounter but I suspect it's a fairly generic system for handling this sort of thing. However, stampedes have never been a practical problem for anything except massive traffic combined with slow recaching, and that's a fairly rare case. So I don't think we want to add that sort of concurrency protection everywhere. For memcache objects that can be grouped together into an "ok to use if a bit stale" bucket (such as all kinds of stats), there is also the possibility of lazy async regeneration. Data is stored in memcache with a fuzzy expire time, i.e. { data:foo, stale:$now+15min } and a cache ttl of forever. When getting the key, if the time stamp inside marks the data as stale, you can 1) attempt to obtain an exclusive (acq4me) lock from poolcounter. If immediately successful, launch an async job to regenerate the cache (while holding the lock) but continue the request with stale data. In all other cases, just use the stale data. Mainly useful if the regeneration work is hideously expensive, such that you wouldn't want clients blocking on even a single cache regen (as is the behavior with poolcounter as deployed for the parser cache.) ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
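A sketch of that lazy async regeneration pattern, using a generic Memcached client and an imaginary job queue; PoolCounter's real API differs, and the "lock" here is just a memcached add().

<?php
// Values are stored forever with a "stale after" timestamp embedded in them.
// Readers always get an answer immediately; the first reader to notice
// staleness wins a non-blocking lock and queues a regeneration job, while
// everyone else keeps serving the stale copy.
function getWithAsyncRegen( Memcached $mc, $key, $staleTtl, $regenJobQueue ) {
	$wrapped = $mc->get( $key );
	if ( $wrapped === false ) {
		return null; // cold cache: caller must generate synchronously
	}
	if ( time() > $wrapped['stale'] ) {
		// Non-blocking "lock": add() only succeeds for one client at a time.
		if ( $mc->add( "$key:regen-lock", 1, 60 ) ) {
			// $regenJobQueue->push() is a stand-in for whatever async job
			// mechanism is available.
			$regenJobQueue->push( array( 'key' => $key, 'ttl' => $staleTtl ) );
		}
	}
	return $wrapped['data'];
}

function setWithAsyncRegen( Memcached $mc, $key, $data, $staleTtl ) {
	// Expiry of 0 = no TTL eviction; only the embedded timestamp goes stale.
	$mc->set( $key, array( 'data' => $data, 'stale' => time() + $staleTtl ), 0 );
}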
Re: [Wikitech-l] error lasted more than 10 minutes....
It appears that we were actually taken down by the reddit community, after a link to the fundraising stats page was posted under Brandon's IAMA there. sq71.wikimedia.org 943326197 2011-11-27T22:51:09.075 62032 109.125.42.71 TCP_MISS/200 1035 GET http://wikimediafoundation.org/wiki/Special:FundraiserStatistics ANY_PARENT/ 208.80.152.47 text/html * http://www.reddit.com/r/IAmA/comments/mr4pf/i_am_wikipedia_programmer_brandon_harris_ama/ * - Mozilla/5.0%20(Windows%20NT%206.1;%20WOW64)%20AppleWebKit/535.2%20(KHTML,%20like%20Gecko)%20Chrome/15.0.874.121%20Safari/535.2 That page wasn't suitable for high volume public consumption (very expensive db query + not properly cached), so the site problem persisted even after the db initially suspected as bad was rotated out. On Sun, Nov 27, 2011 at 2:39 PM, Erik Moeller e...@wikimedia.org wrote: We had a site outage of about 30 mins, caused by a major issue, potentially hardware-related, with a database server, which blocked all MediaWiki application servers (and thereby rendered most of our sites unusable). Should be fixed now; we'll prepare a more comprehensive incident analysis soon. Thanks to the ops team for their speedy response. All best, Erik -- Erik Möller VP of Engineering and Product Development, Wikimedia Foundation Support Free Knowledge: http://wikimediafoundation.org/wiki/Donate ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] [Wmfall] WMF Staff Announcement - Welcome Leslie Carr!
Yay! Welcome! On Mon, Oct 10, 2011 at 9:52 AM, CT Woo ct...@wikimedia.org wrote: All, Technical Operations department is pleased to announce another new fabulous staff member to its team. Please join us to welcome Leslie Carr , our Network Operations Engineer, starting today, 10/10/11. She is based in San Francisco office. Leslie comes with deep and rich experience in Network Operations, ranging from building and scaling a rapidly expanding high capacity global network with several large data-centres to designing and migrating systems networks from legacy setup to new state of the art infrastructure. Prior to joining us, Leslie was with Twitter, where she was responsible for implementation of a major data-centre and network migration. Before that, she worked at Craiglist as the main network architect who redesigned and scaled their network infrastructure. Leslie has also worked for Google, where she created, designed and deployed redundant and scalable network for their various data-centres. Leslie has two pet cats and is an avid bike enthusiast , who bikes annually from SF to LA, for AIDS LifeCycle. Please join me in welcoming Leslie Carr to WMF and do drop by to say hi to her. You will not miss her (hint - look out for a reddish pink blue hair!). Thanks, CT Woo ___ Wmfall mailing list wmf...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wmfall ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Google's cached pages are much faster than wiki*edia's
On Thursday, October 6, 2011, IAlex ialex.w...@gmail.com wrote: Le 7 oct. 2011 à 06:21, Chad a écrit : Well we do serve the logged out cookie. What real purpose that serves, I don't know :) It's to bypass the browser cache, and to not let the user see a page with it's user name at the top when he just logged out. Couldn't deleting cookies have the same effect? If we do want to set or keep cookies on logout, do they need to be included in X-Vary-Options and bypass squid caching? We could also consider loading login/userbar stuff via javascript and allow logged in users to take advantage of squid caching provided care was taken for active editors. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Google's cached pages are much faster than wiki*edia's
From the wmf office in San Francisco, webcache.googleusercontent.com resolves to something geographically close and network RT time is around 25ms versus 86ms for en.wikipedia.org, 61ms in Google's favor. From chrome in incognito mode to avoid sending wikipedia cookies, it takes me 391ms to fetch just the html for http://webcache.googleusercontent.com/search?q=cache:FXQPcAQ_2WIJ:en.wikipedia.org/wiki/Devo+wikipedia+devo&cd=1&hl=en&ct=clnk&gl=us vs 503ms for http://en.wikipedia.org/wiki/Devo. That difference of 112ms is less than the latency difference from two round trips, but the request depends on more, meaning that our squids are serving the content faster than google is. Pulling http://en.wikipedia.org/wiki/Devo from a host in our tampa datacenter takes an average of 3ms. If we had a west coast caching presence, I think we'd beat google's cache from our office, but I doubt we'll ever be able to compete with google on global points of caching presence, or network connectivity. Note that if you're using wikipedia from a browser that has been logged in within the last month, it is likely still sending cookies that bypass our squid caches even when logged out. On Fri, Sep 30, 2011 at 3:48 PM, jida...@jidanni.org wrote: Fellows, This is Google's cache of http://en.wikipedia.org/wiki/Devo. It is a snapshot of the page as it appeared on 28 Sep 2011 09:22:50 GMT. The current page could have changed in the meantime. Learn more ... Like why is it so much faster than the real thing? Even when not logged in. Nope, you may be one of the top ranked websites, but not by speed. So if you can't beat 'em join 'em. Somehow use Google's caches instead of your own. Something, anything, for a little more speed. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table
Since the primary use case here seems to be offline analysis and it may not be of much interest to mediawiki users outside of wmf, can we store the checksums in new tables (i.e. revision_sha1) instead of running large alters, and implement the code to generate checksums on new edits via an extension? Checksums for most old revs can be generated offline and populated before the extension goes live. Since nothing will be using the new table yet, there'd be no issues with things like gap lock contention on the revision table from mass populating it. On Mon, Sep 19, 2011 at 12:10 PM, Brion Vibber br...@pobox.com wrote: [snip] So just FYI -- the only *actual* controversy that needs to be discussed in this thread is how do we make this update applicable in a way that doesn't disrupt live sites with many millions of pages? We're pretty fixed on SHA-1 as a checksum sig (already using it elsewhere) and have no particular desire or need to change or think about alternatives; bikeshedding details of the formatting and storage are not at issue. -- brion ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
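Roughly what the extension side of that could look like, as a hedged sketch rather than reviewed code: the revision_sha1 side table is the one suggested above (its column names are invented here), and the hook-based population assumes the RevisionInsertComplete hook fires after each new revision is written.

<?php
// Sketch of an extension populating a side table of revision checksums,
// instead of altering the revision table itself.
$wgHooks['RevisionInsertComplete'][] = 'efRecordRevisionSha1';

function efRecordRevisionSha1( $revision, $data, $flags ) {
	$dbw = wfGetDB( DB_MASTER );
	$dbw->insert(
		'revision_sha1',
		array(
			'rs_rev_id' => $revision->getId(),
			// Same base36 encoding convention used for sha1s elsewhere.
			'rs_sha1'   => wfBaseConvert( sha1( $revision->getRawText() ), 16, 36, 31 ),
		),
		__METHOD__,
		array( 'IGNORE' ) // stays idempotent if an offline backfill already covered this rev
	);
	return true;
}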
Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table (discussing r94289)
Would it be possible to generate offline hashes for the bulk of our revision corpus via dumps and load that into prod to minimize the time and impact of the backfill? When using for analysis, will we wish the new columns had partial indexes (first 6 characters?) Is code written to populate rev_sha1 on each new edit? On Thu, Aug 18, 2011 at 7:40 AM, Diederik van Liere dvanli...@gmail.comwrote: Hi! I am starting this thread because Brion's revision r94289 reverted r94289 [0] stating core schema change with no discussion [1]. Bugs 21860 [2] and 25312 [3] advocate for the inclusion of a hash column (either md5 or sha1) in the revision table. The primary use case of this column will be to assist detecting reverts. I don't think that data integrity is the primary reason for adding this column. The huge advantage of having such a column is that it will not be longer necessary to analyze full dumps to detect reverts, instead you can look for reverts in the stub dump file by looking for the same hash within a single page. The fact that there is a theoretical chance of a collision is not very important IMHO, it would just mean that in very rare cases in our research we would flag an edit being reverted while it's not. The two bug reports contain quite long discussions and this feature has also been discussed internally quite extensively but oddly enough it hasn't happened yet on the mailinglist. So let's have a discussion! [0] http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94289 [1] http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94541 [2] https://bugzilla.wikimedia.org/show_bug.cgi?id=21860 [3] https://bugzilla.wikimedia.org/show_bug.cgi?id=25312 Best, Diederik ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] https via GPU?
From the original post: "Recent Intel CPUs have a feature called AES-NI (http://en.wikipedia.org/wiki/AES_instruction_set) that accelerates AES processing. A CPU with AES-NI can perform 5 to 10 times faster than a CPU without it. We observe that a single core can perform 5 Gbps and 15 Gbps for encryption and decryption respectively."

There's no longer a need for specialized hardware solutions in this space, GPU based or otherwise.

On Fri, Jul 29, 2011 at 12:10 PM, Brion Vibber br...@pobox.com wrote: On Fri, Jul 29, 2011 at 11:53 AM, Jon Davis w...@konsoletek.com wrote: On Fri, Jul 29, 2011 at 11:29, Platonides platoni...@gmail.com wrote: Our servers don't have a GPU, so that would need a hardware upgrade. Yes, but if large-scale SSL deployment increased CPU usage to the point of necessitating new hardware... the cost could be reduced by purchasing GPUs for servers rather than bunches of entirely new boxes. Conceptually I think it is a cool idea. Most likely we'll end up with a dedicated SSL termination subcluster, so those machines could be provisioned with whatever hardware they specifically needed. Certainly something to keep in mind! -- brion

___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
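If you want a ballpark figure for your own hardware, the snippet below times bulk AES-GCM encryption on one core using the third-party Python 'cryptography' package. Whether AES-NI is actually used depends on the CPU and on the OpenSSL build behind the package, so treat the result as a rough sanity check rather than a reproduction of the figures quoted above.

```python
import os
import time

# Requires the third-party 'cryptography' package (pip install cryptography).
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=128)
aes = AESGCM(key)
nonce = os.urandom(12)
chunk = os.urandom(1024 * 1024)   # 1 MiB of random plaintext
iterations = 200

start = time.perf_counter()
for _ in range(iterations):
    # Reusing the nonce is acceptable only because this is a throwaway
    # throughput benchmark; never do this with real data.
    aes.encrypt(nonce, chunk, None)
elapsed = time.perf_counter() - start

mb = iterations * len(chunk) / 1e6
print(f"AES-128-GCM: {mb / elapsed:.0f} MB/s on one core "
      f"({8 * mb / elapsed / 1000:.1f} Gbps)")
```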
Re: [Wikitech-l] https via GPU?
On Thu, Aug 4, 2011 at 10:31 AM, Aryeh Gregor a...@aryeh.name wrote: I was under the impression that the biggest cost in TLS isn't the symmetric encryption for an ongoing connection, it's the asymmetric encryption for the connection setup. If so, AES acceleration isn't going to help with the most important performance issue. Am I wrong?

The handshake operations still aren't all that expensive these days, and with a prudent amount of sticky load balancing to SSL-terminating boxes, a good hit rate can be achieved from OpenSSL's session cache, which for resumed connections eliminates the asymmetric operations and half of the handshake round trips.

From http://www.imperialviolet.org/2010/06/25/overclocking-ssl.html: In January this year (2010), Gmail switched to using HTTPS for everything by default. Previously it had been introduced as an option, but now all of our users use HTTPS to secure their email between their browsers and Google, all the time. In order to do this we had to deploy *no additional machines* and *no special hardware*. On our production frontend machines, SSL/TLS accounts for less than 1% of the CPU load, less than 10KB of memory per connection and less than 2% of network overhead. Many people believe that SSL takes a lot of CPU time and we hope the above numbers (public for the first time) will help to dispel that.

We can't get these sorts of numbers if we run the version of OpenSSL bundled with Lucid, but everything we need is available either in patch form or has become part of the mainline OpenSSL source.

___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
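As a concrete illustration of what session reuse buys, here is a small sketch using Python's standard ssl module: the first connection does a full handshake including the public-key operations, while the second passes the cached session back in and should report session_reused. The hostname is just an example, and exact behavior differs between TLS 1.2 and TLS 1.3 (where tickets arrive after the handshake).

```python
import socket
import ssl
import time

HOST = "en.wikipedia.org"   # any TLS host works; this one is only an example
ctx = ssl.create_default_context()

def handshake(session=None):
    """Connect, complete the TLS handshake, return (session, reused, ms)."""
    start = time.perf_counter()
    with socket.create_connection((HOST, 443), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=HOST, session=session) as tls:
            ms = (time.perf_counter() - start) * 1000
            return tls.session, tls.session_reused, ms

# Full handshake, including the asymmetric crypto.
sess, reused, ms = handshake()
print(f"full handshake:    reused={reused} {ms:.0f} ms")

# Abbreviated handshake: the cached session skips the expensive public-key
# operations and one round trip (for TLS <= 1.2; TLS 1.3 resumption differs).
_, reused, ms = handshake(session=sess)
print(f"resumed handshake: reused={reused} {ms:.0f} ms")
```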
Re: [Wikitech-l] New Employee Announcement - Jeff Green
Woo! Looking forward to working with you, Jeff!

On Thu, Jun 30, 2011 at 2:31 PM, CT Woo ct...@wikimedia.org wrote: All, Please join me in welcoming Jeff Green to the Wikimedia Foundation. Jeff is taking up the Special Ops position in the Tech Ops department, where one of his responsibilities is to keep our Fundraising infrastructure secure, in compliance with regulation, scalable, and highly available. Jeff comes with a strong systems operations background, especially in scaling and building highly secured infrastructure. He hails from Craigslist, where he started as their first system administrator and served as their lead system administrator as well as their Operations manager for most of his tenure there. When not working, Jeff likes cycling, playing music, and building stuff. He is a proud father of two young kids and a lucky husband. He and his family will be moving back to Massachusetts this August. Please drop by the 3rd floor next week to welcome him. For those who have already met him earlier, do come by as well to see the new 'ponytail-less' Jeff ;-) Thanks, CT

___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l