Re: [Wikitech-l] mod_pagespeed and MediaWiki
On Mon, Jul 15, 2013 at 7:57 PM, Ilya Grigorik igrigo...@google.com wrote:

+asher (woops, forgot to cc :))

On Mon, Jul 15, 2013 at 7:54 PM, Ilya Grigorik igrigo...@google.com wrote:

Anyway, I've already started working on something I noticed in mod_pagespeed - a much better JS minification, expect updates soon :)

Not to discourage you from doing so.. but JS minification is not the problem. In fact, if you look at the side-by-side content breakdown of the original (http://www.webpagetest.org/breakdown.php?test=130715_82_3c03a9eb9339dcf8d3e82ed43ad2998d&run=3&cached=0) and MPS-optimized (http://www.webpagetest.org/breakdown.php?test=130715_VZ_7748042f6f940ec663a43130cd597eee&run=4&cached=0) sites, you'll notice that MPS is loading 3 KB more of JS (because we add some of our own logic). We're not talking about applying missing gzip or minification.. To make the site mobile friendly, we're talking about structural changes to the page: eliminating blocking JavaScript, inlining critical CSS to unblock first render, deferring other assets until after the above-the-fold content is loaded, and so on. Those are the parts that MPS automates - the filmstrip (http://www.webpagetest.org/video/compare.php?tests=130715_82_3c03a9eb9339dcf8d3e82ed43ad2998d-l%3Aoriginal-r%3A3%2C130715_VZ_7748042f6f940ec663a43130cd597eee-l%3Amps-r%3A4%2C%2C%2C%2C&thumbSize=100&ival=100&end=visual) should speak for itself. (Note that the filmstrip shows first render at 2s instead of 1.6s, due to how frames are captured on mobile devices in WPT.)

These are the points I was trying to highlight from your presentation :) While there's room for further optimization after that, inlining above-the-fold CSS and deferring everything else, including additional content, seem like immediate gains we could start working on. The mobile dev team has already put work into being able to serve the above-the-fold content plus section headers in an initial request - we just need to make some changes to how we assemble pages to support inlining the CSS/JS required for that view, and separating out the rest. I think this can and should be delivered by MediaWiki / ResourceLoader / MobileFrontend by design, instead of via MPS, however. As Max noted, this would require an additional Varnish cache split, varying between devices that support this and those that don't. But the performance gain for supported devices should fully justify it, and we just invested in additional frontend cache capacity for mobile.

-Asher

___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] mod_pagespeed and MediaWiki
[cc'ing Joshua Marantz, who leads the mod_pagespeed effort at Google, and Ilya Grigorik, their developer advocate for page performance]

The principles behind mod_pagespeed, especially as they relate to mobile page load performance as outlined in http://bit.ly/mobilecrp, could themselves be implemented within MediaWiki. mod_pagespeed itself can't just be dropped in to do the job, and especially doesn't play nicely with the full-page edge caching WMF depends on; but it could be used as a development guide. For mobile performance especially, the critical points are:

* Everything needed to fully render the above-the-fold content should fit within 10 packets, given our current 10-packet TCP initial congestion window.
* Those <= 10 packets must be in the service of a single request.
* All CSS required by that above-the-fold view must be inline. It doesn't have to be all of the CSS required for the page overall.
* Same with JavaScript - anything not essential to above the fold should be deferred.

I can't think of any good reasons why this couldn't be implemented by MobileFrontend. Accomplishing all of what mod_pagespeed addresses for general MediaWiki use would likely involve a rewrite of ResourceLoader.

-Asher

On Fri, Jul 12, 2013 at 3:03 PM, Max Semenik maxsem.w...@gmail.com wrote: On 12.07.2013, 3:16 Max wrote: FYI, Google already sent us a sample config for this module optimized for our mobile site, I'm going to try it tomorrow. And here are the results of my research: https://www.mediawiki.org/wiki/User:MaxSem/mod_pagespeed Briefly, this is interesting stuff, but not usable on WMF, or on any other large MW installations either. -- Best regards, Max Semenik ([[User:MaxSem]])

___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
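As a rough sketch of the "inline only the above-the-fold CSS" point, here is what that could look like from MediaWiki's side. The file name, fallback module name, and 14 KB budget check (10 packets at roughly 1430 bytes of payload each) are assumptions for illustration, not actual MobileFrontend code:

```php
<?php
// Illustration only (hypothetical file and module names, not MobileFrontend code):
// inline just the above-the-fold CSS so the first response stays inside the
// ~14 KB (10 packets x ~1430 bytes of payload) initial congestion window.
// $out is an OutputPage instance, e.g. from a BeforePageDisplay hook.
$criticalCss = file_get_contents( "$IP/skins/common/mobile-critical.css" ); // hypothetical path

if ( strlen( $criticalCss ) < 14 * 1024 ) {
	// addInlineStyle() emits a <style> block in <head>, so first render
	// does not block on a separate stylesheet request.
	$out->addInlineStyle( $criticalCss );
} else {
	// Too big to inline within the packet budget; fall back to a linked,
	// cacheable stylesheet instead.
	$out->addModuleStyles( 'mobile.styles' ); // hypothetical module name
}
```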
Re: [Wikitech-l] Please Welcome Sean Pringle, our latest TechOps member
Welcome, Sean! It's great to have you on board.

On Monday, June 24, 2013, Ct Woo wrote:

Hi All, The Technical Operations team is pleased to announce that Sean Pringle joined us today (24th June, 2013). Among his duties, Sean will be attending to all aspects of the database layer, including management, monitoring, design, capacity, performance, and troubleshooting. Sean comes with vast experience in database technology and a development background with a specific focus on MySQL and MariaDB. He has held senior roles in database support, database administration, and technical writing with various companies including MySQL AB, Sun Microsystems, and SkySQL Ab. He also has fingers in a few non-profit and open-source projects scattered around the net.

Sean hails from Queensland, Australia, and while he travels around frequently, he inevitably always flees back to his down-under home. He confessed to being forever distracted by all things geek and technology related, though he can also be spotted behind a telescope on starry nights and with his nose in a book when the clouds roll in. To quote him, "I am excited to be joining the WMF, an opportunity which I see as an 11 on the awesomeness scale of 1 to 10!"

Please join us in welcoming Sean!

CT Woo Ken Snider

___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] Relaxing our TorBlock
https://twitter.com/ioerror/status/342922052841377793 Why not - would the patrol cost really be too high? ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Relaxing our TorBlock
Ah, thanks Sumana! On Friday, June 7, 2013, Sumana Harihareswara wrote: On 06/07/2013 08:43 AM, Asher Feldman wrote: https://twitter.com/ioerror/status/342922052841377793 Why not - would the patrol cost really be too high? Discussion from December: http://www.gossamer-threads.com/lists/wiki/wikitech/323006 Can we help Tor users make legitimate edits? -- Sumana Harihareswara Engineering Community Manager Wikimedia Foundation ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] [WikimediaMobile] Caching Problem with Mobile Main Page?
Faidon - thanks for the more accurate trackdown, and the fix!

On Sunday, May 5, 2013, Faidon Liambotis wrote:

On Fri, May 03, 2013 at 03:19:13PM -0700, Asher Feldman wrote: 1) Our multicast purge stream is very busy and isn't split up by cache type, so it includes lots of purge requests for images on upload.wikimedia.org. Processing the purges is somewhat cpu intensive, and I saw doing so once per varnish server as preferable to twice.

I believe the plan is to split up the multicast groups *and* to filter based on predefined regexps on the HTCP-PURGE layer, via the varnishhtcpd rewrite. But I may be mistaken; Mark and Brandon will know more.

There are multiple ways to approach making the purges sent to the frontends actually work, such as rewriting the purges in varnish, rewriting them before they're sent to varnish depending on where they're being sent, or perhaps changing how cached objects are stored in the frontend. I personally think it's all an unnecessary waste of resources and prefer my original approach.

Although the current VCL calls vcl_recv_purge after the rewrite step (and hence actually rewriting purges too), unless I'm mistaken this is actually unnecessary. The incoming purges match the way the objects are stored in the cache: both are without the .m. (et al) prefix, as normal desktop purges are matched with objects that had their URLs rewritten in vcl_recv. Handling purges after the rewrite step might be unnecessary, but that doesn't mean it's a bad idea; it doesn't hurt much and it's better as it allows us to also purge via the original .m. URL, which is what a person might do instinctively.

While mobile purges were indeed broken in the recent past in a similar way to the one you guessed, with I77b88f[1] (Restrict PURGE lookups to mobile domains), they were fixed shortly after with I76e5c4[2], a full day before the frontend cache TTL was removed.

1: https://gerrit.wikimedia.org/r/#q,I77b88f3b4bb5ec84f70b2241cdd5dc496025e6fd,n,z
2: https://gerrit.wikimedia.org/r/#q,I76e5c4218c1dec06673aa5121010875031c1a1e2,n,z

What actually broke them again this time is I3d0280[3], which stripped absolute URIs before vcl_recv_purge, despite the latter having code that matches only against absolute URIs. This is my commit, so I'm responsible for this breakage, although in my defence I now have an even score for discovering the flaw last time around :) I've pushed and merged I08f761[4], which moves rewrite_proxy_urls after vcl_recv_purge and should hopefully unbreak purging while also not reintroducing BZ #47807.

3: https://gerrit.wikimedia.org/r/#q,I3d02804170f7e502300329740cba9f45437a24fa,n,z
4: https://gerrit.wikimedia.org/r/#q,I08f7615230037a6ffe7d1130a2a6de7ba370faf2,n,z

As a side note, notice how rewrite_proxy_urls and vcl_recv_purge are both flawed in the same way: the former exists solely to work around a Varnish bug with absolute URIs, while the latter *depends* on that bug manifesting in order to actually work. req.url should always be a (relative) URL, and hence the if (req.url ~ '^http:') comparison in vcl_recv_purge should normally always evaluate to false, making the whole function a no-op. However, due to the bug in question, Varnish doesn't special-handle absolute URIs, in violation of RFC 2616. This, in combination with the fact that varnishhtcpd always sends absolute URIs (due to an RFC-compliant behavior of LWP's proxy() method), is why we have this seemingly wrong VCL code that nevertheless works as intended.
This Varnish bug was reported by Tim upstream[5] and the fix is currently sitting in Varnish's git master[6]. It's simple enough, and it might be worth backporting, although it might be more trouble than it's worth, considering how it will break purges with our current VCL :)

5: https://www.varnish-cache.org/trac/ticket/1255
6: https://www.varnish-cache.org/trac/changeset/2bbb032bf67871d7d5a43a38104d58f747f2e860

Cheers, Faidon

___ Mobile-l mailing list mobil...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mobile-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] [WikimediaMobile] Caching Problem with Mobile Main Page?
The problem is due to recent changes that were made to how mobile caching works. I just flushed cache on all of the frontend varnish instances which indeed appears to have fixed the problem but it isn't actually fixed. Note, the frontend instances just have 1GB of cache, so only very popular objects (like the enwiki front page) avoid getting LRU'd. The backend varnish instances utilize the ssd's and perform the heavy caching work. When I originally built this, I had the frontends force a short (300s) ttl on all cacheable objects, while the backends honored the times specified by mediawiki. I chose to only send purges to the backend instances (via wikia's old varnishhtcpd) and let the frontend instances catch up with their short ttls. My reasoning was: 1) Our multicast purge stream is very busy and isn't split up by cache type, so it includes lots of purge requests for images on upload.wikimedia.org. Processing the purges is somewhat cpu intensive, and I saw doing so once per varnish server as preferable to twice. 2) Purges are for url's such as en.wikipedia.org/wiki/Main_Page. The frontend varnish instance strips the m subdomain before sending the request onwards, but still caches content based on the request url. Purges are never sent for en.m.wikipedia.org/wiki/Main_Page - every purge would need to be rewritten to apply to the frontend varnishes. Doing this blindly would be more expensive than it should be, since a significant percentage of purge statements aren't applicable. I don't think my original approach had any fans. Purges are now sent to both varnish instances per host, and more recently, the 300s ttl override was removed from the frontends. But all of the purges are no-ops. There are multiple ways to approach making the purges sent to the frontends actually work such as rewriting the purges in varnish, rewriting them before they're sent to varnish depending on where they're being sent, or perhaps changing how cached objects are stored in the frontend. I personally think it's all an unnecessary waste of resources and prefer my original approach. -Asher On Fri, May 3, 2013 at 2:23 PM, Arthur Richards aricha...@wikimedia.orgwrote: +wikitech-l I've confirmed the issue on my end; ?action=purge seems to have no effect and the 'last modified' notification on the mobile main page looks correct (though the content itself is out of date and not in sync with the 'last modified' notification). What's doubly weird to me is the 'Last modified' HTTP response headers says: Last-Modified: Tue, 30 Apr 2013 00:17:32 GMT Which appears to be newer than when the content I'm seeing on the main page was updated... Anyone from ops have an idea what might be going on? On Thu, May 2, 2013 at 10:01 PM, Yuvi Panda yuvipa...@gmail.com wrote: Encountered https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#Issue_with_Main_Page_on_mobile.2C_viz._it_hasn.27t_changed_since_Tuesday Some people seem to be having problems with the mobile main page being cached too much. Can someone look into it? -- Yuvi Panda T http://yuvi.in/blog ___ Mobile-l mailing list mobil...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mobile-l -- Arthur Richards Software Engineer, Mobile [[User:Awjrichards]] IRC: awjr +1-415-839-6885 x6687 ___ Mobile-l mailing list mobil...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mobile-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Mobile caching improvements are coming
This sounds like a great plan. Thank you!

On Fri, Mar 29, 2013 at 2:45 AM, Max Semenik maxsem.w...@gmail.com wrote:

Hi, we at the mobile team are currently working on improving our cache hit rate, and are publishing the half-implemented plan here for review:

== Current status ==
* The X-Device header is generated by the frontend Varnish from the user-agent.
* There are currently 21 possible X-Device values, which we decreased to 20 this week.
* X-Device is used for HTML variance (roughly, Vary: X-Device).
* Depending on X-Device, we alter the skin HTML and serve the device full or limited resources.
* Because some phones need CSS tweaks and don't support media queries, we have to serve them device-specific CSS.
* Device-specific CSS is served via separate ResourceLoader modules, e.g. mobile.device.android.

== What's bad about it? ==
Cache fragmentation is very high, resulting in a ~55% hit rate.

== Proposed strategy ==
* We don't vary pages on X-Device anymore.
* Because we still need to give really ancient WAP phones WML output, we create a new header, X-WAP, with just two values, yes or no[1]
* And we vary our output on X-WAP instead of X-Device[2]
* Because we still need to serve device-specific CSS but can't use the device name in page HTML, we create a single ResourceLoader module, mobile.device.detect, which outputs styles depending on X-Device.[2] This does not affect bits cache fragmentation because it simply changes the way the same data is varied, without adding new fragmentation factors. Bits hit rate is currently very high, by the way.
* And because we need X-Device, we will need to direct mobile load.php requests to the mobile site itself instead of bits. Not a problem, because mobile domains are served by Varnish just like bits.
* Since we will now be serving ResourceLoader to all devices, we will blacklist all the incompatible devices in the startup module to prevent them from choking on loads of JS they can't handle (and even if they degrade gracefully, there's still no need to force them to download tens of kilobytes needlessly)[3]

== Commits ==
[1] https://gerrit.wikimedia.org/r/#/c/32866/ - adds X-WAP to Varnish
[2] https://gerrit.wikimedia.org/r/55226 - main MobileFrontend change
[3] https://gerrit.wikimedia.org/r/#/c/55446/ - ResourceLoader change, just a sketch of a real solution as of the moment I'm writing this

Your comments are highly appreciated! :)

-- Best regards, Max Semenik ([[User:MaxSem]])

___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
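For illustration, a minimal sketch of what a single X-Device-varying styles module like the proposed mobile.device.detect could look like. The class name, file layout, and registration are assumptions, not the actual MobileFrontend change in [2]:

```php
<?php
// Sketch only: a ResourceLoader module that picks device-specific CSS
// based on the X-Device header set by the frontend Varnish.
class MobileDeviceDetectModule extends ResourceLoaderModule {

	public function getStyles( ResourceLoaderContext $context ) {
		// X-Device is injected by Varnish from the User-Agent.
		$device = $context->getRequest()->getHeader( 'X-Device' );
		// Sanitize so the header can't escape the devices/ directory.
		$device = preg_replace( '/[^a-z0-9_-]/i', '', (string)$device );
		$file = __DIR__ . "/devices/$device.css"; // hypothetical layout

		$css = ( $device !== '' && is_readable( $file ) )
			? file_get_contents( $file )
			: '';

		return array( 'all' => $css );
	}
}

// Hypothetical registration:
$wgResourceModules['mobile.device.detect'] = array( 'class' => 'MobileDeviceDetectModule' );
```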
Re: [Wikitech-l] [Wmfall] Announcing Latest Member to Operations Engineering Team - Brandon Black
Welcome Brandon!! On Fri, Mar 29, 2013 at 3:00 PM, Ct Woo ct...@wikimedia.org wrote: All, We are excited to announce that Brandon Black will join us this Monday (2013-04-01) as a full-time member of the Operations Engineering team. Brandon comes with deep and wide technical experience. Previously, he held senior systems engineering positions in companies like SqueezeNetwork.com, Veritas DGC, MCI WorldCom and Networks Online. He is an active proponent and contributor of open-source software, and has contributed a new GPL-licensed DNS software (gdnsdhttps://github.com/blblack/gdnsd) to accomplish global-level geographic balancing and automatic failover without paying for expensive commercial solutions. Brandon resides in Magnolia, TX, but has roamed the planet all his life (like spending his High School years in Singapore). He is excited to join the Wikimedia Ops team and hopes to learn many new things from the experience. His interests include auto racing, being a professional amateur, and learning new skills by starting projects which he has no idea how to finish. Brandon will be in San Francisco office this coming Monday and please drop by to welcome him! Thanks, CT Woo ___ Wmfall mailing list wmf...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wmfall ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Mobile caching improvements are coming
Why don't we continue to use the bits cache for all things ResourceLoader? Can you provide a different path for these requests, so that instead of:

http://bits.wikimedia.org/en.wikipedia.org/load.php?..

we use something like:

http://bits.wikimedia.org/m/en.wikipedia.org/load.php?..

Then we can do if (req.url ~ ^/m/) { tag_carrier + strip the /m/ }, so the overhead only affects mobile requests. Faidon has raised that it's still advantageous to shard page resources across more than one domain for browser pipelining.

On Fri, Mar 29, 2013 at 1:55 PM, Arthur Richards aricha...@wikimedia.org wrote:

This approach will require either: 1) Adding device detection to bits for device variance 2) Using mobile varnish to handle load.php requests for resources requested from .m domains

From conversations with Max and some folks from ops, it sounds like #2 is the preferred approach, but I am a little nervous about it since the mobile varnish caches will have to handle a significant increase in requests. It looks like a typical article load results in 6 load.php requests. Also, we'll need to duplicate some configuration from the bits VCL. Ops, is this OK given the current architecture?

On Fri, Mar 29, 2013 at 11:18 AM, Max Semenik maxsem.w...@gmail.com wrote: On 29.03.2013, 21:47 Yuri wrote: Max, do we still plan to detect javascript support for mobile devices, or do you want to fold that into isWAP? Non-js-supporting devices need very different handling, as all HTML has to be pre-built for them on the server.

ResourceLoader has a small stub module called startup. It checks browser compatibility and then loads jQuery and various MediaWiki modules (including the ResourceLoader core). We just need to improve the checks, as my original message states: * Since we will now be serving ResourceLoader to all devices, we will blacklist all the incompatible devices in the startup module to prevent them from choking on loads of JS they can't handle (and even if they degrade gracefully, there's still no need to force them to download tens of kilobytes needlessly)[3]

-- Best regards, Max Semenik ([[User:MaxSem]])

-- Arthur Richards Software Engineer, Mobile [[User:Awjrichards]] IRC: awjr +1-415-839-6885 x6687

___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] [RFC] performance standards for new mediawiki features
On Thu, Mar 21, 2013 at 10:55 PM, Yuri Astrakhan yastrak...@wikimedia.org wrote: The API is fairly complex to measure and set performance targets for. If a bot requests 5000 pages in one call, together with all links and categories, it might take a very long time (seconds if not tens of seconds). Comparing that to another api request that gets an HTML section of a page, which takes a fraction of a second (especially when coming from cache), is not very useful.

This is true, and I think we'd want to look at a metric like 99th percentile latency. There's room for corner cases taking much longer, but they really have to be corner cases. Standards also have to be flexible, with different acceptable ranges for different uses. Yet if 30% of requests for an api method to fetch pages took tens of seconds, we'd likely have to disable it entirely until its use or the number of pages per request could be limited.

On Fri, Mar 22, 2013 at 1:32 AM, Peter Gehres li...@pgehres.com wrote: From where would you propose measuring these data points? Obviously network latency will have a great impact on some of the metrics, and a consistent location would help to define the pass/fail of each test. I do think another useful benchmark from Ops would be a set of latency-to-datacenter values, but I know that is a much harder task. Thanks for putting this together.

On Thu, Mar 21, 2013 at 6:40 PM, Asher Feldman afeld...@wikimedia.org wrote: I'd like to push for a codified set of minimum performance standards that new mediawiki features must meet before they can be deployed to larger wikimedia sites such as English Wikipedia, or be considered complete. These would look like (numbers pulled out of a hat, not actual suggestions): - p999 (long tail) full page request latency of 2000ms - p99 page request latency of 800ms - p90 page request latency of 150ms - p99 banner request latency of 80ms - p90 banner request latency of 40ms - p99 db query latency of 250ms - p90 db query latency of 50ms - 1000 write requests/sec (if applicable; write operations must be free from concurrency issues) - guidelines about degrading gracefully - specific limits on total resource consumption across the stack per request - etc.. Right now, varying amounts of effort are made to highlight potential performance bottlenecks in code review, and engineers are encouraged to profile and optimize their own code. But beyond "is the site still up for everyone / are users complaining on the village pump / am I ranting in irc", we've offered no guidelines as to what sort of request latency is reasonable or acceptable. If a new feature (like aftv5, or flow) turns out not to meet perf standards after deployment, that would be a high priority bug and the feature may be disabled depending on the impact, or if not addressed in a reasonable time frame. Obviously standards like this can't be applied to certain existing parts of mediawiki, but systems other than the parser or preprocessor that don't meet new standards should at least be prioritized for improvement. Thoughts? Asher

___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] [RFC] performance standards for new mediawiki features
These are all good points, and we certainly do need better tooling for individual developers. There is a lot a developer can do on just a laptop in terms of profiling code that, if done consistently, could go a long way, even without it looking anything like production. Things like understanding whether algorithms or queries are O(n) or O(2^n), etc., and thinking about the potential size of the relevant production data set might be more useful at that stage than raw numbers. When it comes to gathering numbers in such an environment, it would be helpful if either the mediawiki profiler could gain an easy visualization interface appropriate for such environments, or if we standardized around something like xdebug.

The beta cluster has some potential as a performance test bed, if only it could gain a guarantee that the compute nodes it runs on aren't oversubscribed, or that the beta virts were otherwise consistently resourced. By running a set of performance benchmarks against beta and production, we may be able to gain insight into how new features are likely to perform.

Beyond due diligence while architecting and implementing a feature, I'm actually a proponent of testing in production, albeit in limited ways. Not as with test.wikipedia.org, which ran on the production cluster, but by deploying a feature to 5% of enwiki users, or 10% of pages, or 20% of editors. Once something is deployed like that, we do indeed have tooling available to gather hard performance metrics of the sort I proposed, though they can always be improved upon.

It became apparent that ArticleFeedbackV5 had severe scaling issues after being enabled on 10% of the articles on enwiki. For that example, I think it could have been caught in an architecture review or in local testing by the developers that issuing 17 database write statements per submission of an anonymous text box that would go at the bottom of every wikipedia article was a bad idea. But it's really great that it was incrementally deployed and we could halt its progress before the resulting issues got too serious. That rollout methodology should be considered a great success. If it can become the norm, perhaps it won't be difficult to get to the point where we can have actionable performance standards for new features, via a process that actually encourages getting features into production instead of being a complicated roadblock.

On Fri, Mar 22, 2013 at 1:20 PM, Arthur Richards aricha...@wikimedia.org wrote: Right now, I think many of us profile locally or in VMs, which can be useful for relative metrics or quickly identifying bottlenecks, but doesn't really get us the kind of information you're talking about from any sort of real-world setting, or in any way that would be consistent from engineer to engineer, or even necessarily from day to day. From network topology to article counts/sizes/etc and everything in between, there's a lot we can't really replicate or accurately profile against. Are there plans to put together and support infrastructure for this? It seems to me that this proposal is contingent upon a consistent environment accessible by engineers for performance testing.

On Thu, Mar 21, 2013 at 10:55 PM, Yuri Astrakhan yastrak...@wikimedia.org wrote: The API is fairly complex to measure and set performance targets for. If a bot requests 5000 pages in one call, together with all links and categories, it might take a very long time (seconds if not tens of seconds). 
Comparing that to another api request that gets an HTML section of a page, which takes a fraction of a second (especially when comming from cache) is not very useful. On Fri, Mar 22, 2013 at 1:32 AM, Peter Gehres li...@pgehres.com wrote: From where would you propose measuring these data points? Obviously network latency will have a great impact on some of the metrics and a consistent location would help to define the pass/fail of each test. I do think another benchmark Ops features would be a set of latency-to-datacenter values, but I know that is a much harder taks. Thanks for putting this together. On Thu, Mar 21, 2013 at 6:40 PM, Asher Feldman afeld...@wikimedia.org wrote: I'd like to push for a codified set of minimum performance standards that new mediawiki features must meet before they can be deployed to larger wikimedia sites such as English Wikipedia, or be considered complete. These would look like (numbers pulled out of a hat, not actual suggestions): - p999 (long tail) full page request latency of 2000ms - p99 page request latency of 800ms - p90 page request latency of 150ms - p99 banner request latency of 80ms - p90 banner request latency of 40ms - p99 db query latency of 250ms - p90 db query latency of 50ms - 1000 write requests/sec (if applicable; writes operations must be free from concurrency issues) - guidelines about degrading gracefully
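The incremental rollout described above (enable a feature for 5% of users, or 10% of articles) can be as simple as stable ID-based bucketing. A hedged sketch; the helper name and thresholds are made up for illustration and are not WMF's actual rollout mechanism:

```php
<?php
// Illustration only: stable percentage rollout by bucketing on user or page ID.
function wfIsInRolloutBucket( $id, $percent ) {
	// The same ID always lands in the same bucket, so a user or article keeps
	// the feature once it has it, and the sample stays consistent for metrics.
	return ( $id % 100 ) < $percent;
}

// e.g. enable for ~5% of logged-in users, or ~10% of articles:
$enabledForUser = $user->isLoggedIn() && wfIsInRolloutBucket( $user->getId(), 5 );
$enabledForPage = wfIsInRolloutBucket( $title->getArticleID(), 10 );
```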
[Wikitech-l] [RFC] performance standards for new mediawiki features
I'd like to push for a codified set of minimum performance standards that new mediawiki features must meet before they can be deployed to larger wikimedia sites such as English Wikipedia, or be considered complete. These would look like (numbers pulled out of a hat, not actual suggestions):

- p999 (long tail) full page request latency of 2000ms
- p99 page request latency of 800ms
- p90 page request latency of 150ms
- p99 banner request latency of 80ms
- p90 banner request latency of 40ms
- p99 db query latency of 250ms
- p90 db query latency of 50ms
- 1000 write requests/sec (if applicable; write operations must be free from concurrency issues)
- guidelines about degrading gracefully
- specific limits on total resource consumption across the stack per request
- etc..

Right now, varying amounts of effort are made to highlight potential performance bottlenecks in code review, and engineers are encouraged to profile and optimize their own code. But beyond "is the site still up for everyone / are users complaining on the village pump / am I ranting in irc", we've offered no guidelines as to what sort of request latency is reasonable or acceptable.

If a new feature (like aftv5, or flow) turns out not to meet perf standards after deployment, that would be a high-priority bug, and the feature may be disabled depending on the impact, or if not addressed in a reasonable time frame. Obviously standards like this can't be applied to certain existing parts of mediawiki, but systems other than the parser or preprocessor that don't meet new standards should at least be prioritized for improvement.

Thoughts?

Asher

___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
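For reference, a small sketch of how percentile figures like the p90/p99/p999 targets above can be computed from raw latency samples (nearest-rank method; not actual WMF reporting code, and the sample values are invented):

```php
<?php
// Sketch: nearest-rank percentiles over a set of request latency samples (in ms).
function wfPercentile( array $samples, $p ) {
	sort( $samples );
	$rank = (int)ceil( ( $p / 100 ) * count( $samples ) ) - 1;
	return $samples[ max( 0, $rank ) ];
}

$latenciesMs = array( 95, 120, 180, 220, 410, 640, 780, 950, 1800, 2600 );
printf( "p90=%dms p99=%dms p99.9=%dms\n",
	wfPercentile( $latenciesMs, 90 ),
	wfPercentile( $latenciesMs, 99 ),
	wfPercentile( $latenciesMs, 99.9 )
);
```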
Re: [Wikitech-l] Query performance - run code faster, merge code faster :-)
On Thu, Mar 7, 2013 at 3:57 PM, Tim Starling tstarl...@wikimedia.orgwrote: On 07/03/13 12:12, Asher Feldman wrote: Ori - I think this has been discussed but automated xhprof configuration as part of the vagrant dev env setup would be amazing :) I don't think xhprof is the best technology for PHP profiling. I reported a bug a month ago which causes the times it reports to be incorrect by a random factor, often 4 or so. No response so far. And its web interface is packed full of XSS vulnerabilities. XDebug + KCacheGrind is quite nice. That's disappointing, I wonder if xhprof has become abandonware since facebook moved away from zend. Have you looked at Webgrind ( http://code.google.com/p/webgrind/)? If not, I'd love to see it at least get a security review. KCacheGrind is indeed super powerful and nice, and well suited to a dev vm. I'm still interested in this sort of profiling for a very small percentage of production requests though, such as 0.1% of requests hitting a single server. Copying around cachegrind files and using KCacheGrind wouldn't be very practical. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
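For anyone trying xhprof locally, the basic capture pattern is just a pair of calls around the code under test. The profiled function and output path below are placeholders; in practice the run would be saved via xhprof_lib and browsed in the xhprof or webgrind UI:

```php
<?php
// Placeholder for the code being profiled.
function run_the_code_being_profiled() {
	for ( $i = 0; $i < 100000; $i++ ) {
		md5( $i );
	}
}

// Minimal xhprof capture: collect wall time plus CPU and memory counters.
xhprof_enable( XHPROF_FLAGS_CPU | XHPROF_FLAGS_MEMORY );

run_the_code_being_profiled();

$data = xhprof_disable();
// Dump the raw per-function counters; normally you'd store the run and use a UI.
file_put_contents( '/tmp/xhprof-run.json', json_encode( $data ) );
```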
Re: [Wikitech-l] Query performance - run code faster, merge code faster :-)
Database query performance isn't the leading performance bottleneck on the WMF cluster. If reading or writing to a database, certainly do take the time to specifically profile your database queries, and make sure to efficiently use caching (and avoid stampede scenarios on cache expiration) whenever possible. Hopefully in the future, caching won't be as up to individual developers to get right on an ad hoc basis. In the last year, we made changes that reduced the query load to mysql masters by nearly 70%. Those queries were well written - there was nothing to tune at the sql layer. The point being, query tuning can't substitute for or even correlate to making efficient design decisions. If you have profiled sql queries, or if your code doesn't have any to profile, don't stop there. Profiling the code itself is at least as important. The mediawiki profiler ( https://www.mediawiki.org/wiki/Profiler#Profiling) offers an easy place to start and it's good to include profiling hooks as they automatically result in p90/p99, etc. latency graphs in graphite in production. But for individual development environments, setting up xhprof might be more useful. There are plenty of tutorials out there, such as - http://blog.cnizz.com/2012/05/05/enhanced-php-performance-profiling-with-xhprof/ Ori - I think this has been discussed but automated xhprof configuration as part of the vagrant dev env setup would be amazing :) On Wed, Mar 6, 2013 at 4:36 PM, Sumana Harihareswara suma...@wikimedia.orgwrote: If you want your code merged, you need to keep your database queries efficient. How can you tell if a query is inefficient? How do you write efficient queries, and avoid inefficient ones? We have some resources around: Roan Kattouw's https://www.mediawiki.org/wiki/Manual:Database_layout/MySQL_Optimization/Tutorial -- slides at https://commons.wikimedia.org/wiki/File:MediaWikiPerformanceProfiling.pdf Roan's slides actually at https://www.mediawiki.org/wiki/File:SQL_indexing_Tutorial.pdf But! If you're a developer and would appreciate guidance around how to best create and efficiently use indexes, I highly recommend this slide deck: http://www.percona.com/files/presentations/WEBINAR-tools-and-techniques-for-index-design.pdf Asher Feldman's https://www.mediawiki.org/wiki/File:MediaWiki_Performance_Profiling.ogv -- slides at https://www.mediawiki.org/wiki/File:SQL_indexing_Tutorial.pdf slides actually at https://commons.wikimedia.org/wiki/File:MediaWikiPerformanceProfiling.pdf More hints: http://lists.wikimedia.org/pipermail/toolserver-l/2012-June/005075.html Due to the use of views on toolserver, it isn't really possible to use that environment to profile or tune queries as they would actually run in production. When you need to ask for a performance review, you can check out https://www.mediawiki.org/wiki/Developers/Maintainers#Other_Areas_of_Focus which suggests Tim Starling, Asher Feldman, and Ori Livneh. I also BOLDly suggest Nischay Nahata, who worked on Semantic MediaWiki's performance for his GSoC project in 2012. -- Sumana Harihareswara Engineering Community Manager Wikimedia Foundation ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
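The profiling hooks mentioned above are wfProfileIn/wfProfileOut pairs wrapped around the section being measured; a minimal example, with the surrounding function made up for illustration:

```php
<?php
// Example of MediaWiki's built-in profiler hooks (the API of this era).
// Sections instrumented this way are what feed the p90/p99 latency graphs
// in graphite in production.
function doExpensiveThing( Title $title ) { // hypothetical function
	wfProfileIn( __METHOD__ );

	// ... the queries / cache lookups / computation being measured ...

	wfProfileOut( __METHOD__ );
}
```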
Re: [Wikitech-l] switching to something better than irc.wikimedia.org
I don't think a custom daemon would actually be needed. http://redis.io/topics/pubsub While I was at flickr, we implemented a pubsub based system to push notifications of all photo uploads and metadata changes to google using redis as the backend. The rate of uploads and edits at flickr in 2010 was orders of magnitude greater than the rate of edits across all wmf projects. Publishing to a redis pubsub channel does grow in cost as the number of subscribers increases but I don't see a problem at our scale. If so, there are ways around it. We are planning on migrating the wiki job queues from mysql to redis in the next few weeks, so it's already a growing piece of our infrastructure. I think the bulk of the work here would actually just be in building a frontend webservice that supports websockets / long polling, provides a clean api, and preferably uses oauth or some form of registration to ward off abuse and allow us to limit the growth of subscribers as we scale. On Friday, March 1, 2013, Petr Bena wrote: I still don't see it as too much complex. Matter of month(s) for volunteers with limited time. However I quite don't see what is so complicated on last 2 points. Given the frequency of updates it's most simple to have the client (user / bot / service that need to read the feed) open the persistent connection to server (dispatcher) which fork itself just as sshd does and the new process handle all requests from this client. The client somehow specify what kind of feed they want to have (that's the registration part) and forked dispatcher keeps it updated with information from cache. Nothing hard. And what's the problem with multithreading huh? :) BTW I don't really think there is a need for multithreading at all, but even if there was, it shouldn't be so hard. On Fri, Mar 1, 2013 at 3:47 PM, Tyler Romeo tylerro...@gmail.comjavascript:; wrote: On Fri, Mar 1, 2013 at 9:16 AM, Petr Bena benap...@gmail.comjavascript:; wrote: I have not yet found a good and stable library for JSON parsing in c#, should you know some let me know :) Take a look at http://www.json.org/. They have a list of implementations for different languages. However, I disagree with I feel like such a project would take an insane amount of resources to develop. If we wouldn't make it insanely complicated, it won't take insane amount of time ;). The cache daemon could be memcached which is already written and stable. Listener is a simple daemon that just listen in UDP, parse the data from mediawiki and store them in memcached in some universal format, and dispatcher is just process that takes the data from cache, convert them to specified format and send them to client. Here's a quick list of things that are basic requirements we'd have to implement: - Multi-threading, which is in and of itself a pain in the a**. - Some sort of queue for messages, rather than hoping the daemon can send out every message in realtime. - Ability for clients to register with the daemon (and a place to store a client list) - Multiple methods of notification (IRC would be one, XMPP might be a candidate, and a simple HTTP endpoint would be a must). Just those basics isn't an easy task, especially considering unless WMF allocates resources to it the project would be run solely by those who have enough free time. Also, I wouldn't use memcached as a caching daemon, primarily because I'm not sure such an application even needs a caching daemon. All it does is relay messages. 
*--* *Tyler Romeo* Stevens Institute of Technology, Class of 2015 Major in Computer Science www.whizkidztech.com | tylerro...@gmail.com javascript:; ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org javascript:; https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org javascript:; https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
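To make the Redis pub/sub suggestion concrete, a hedged sketch using the phpredis client; the host, channel name, and payload shape are invented for illustration and are not an existing WMF feed:

```php
<?php
// Publisher side, e.g. called from a recent-changes hook:
$pub = new Redis();
$pub->connect( 'rc-feed.example.org', 6379 ); // hypothetical host
$pub->publish( 'rc.enwiki', json_encode( array(
	'title'   => 'Main Page',
	'user'    => 'Example',
	'comment' => 'example edit',
) ) );

// Subscriber side (what a bot or the frontend web service would run).
// subscribe() blocks, and the callback fires once per published message:
$sub = new Redis();
$sub->connect( 'rc-feed.example.org', 6379 );
$sub->subscribe( array( 'rc.enwiki' ), function ( $redis, $channel, $message ) {
	$change = json_decode( $message, true );
	// hand off to IRC / websocket / long-poll clients here
} );
```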
Re: [Wikitech-l] switching to something better than irc.wikimedia.org
On Friday, March 1, 2013, Petr Bena wrote: web frontend you say? if you compare the raw data of irc protocol (1 rc feed message) and raw data of a http request and response for one page consisting only of that 1 rc feed message, you will see a huge difference in size and performance. I was sugesting it for websockets or a long poll, the above comparison isn't relevant. Connection is established, with its protocol overhead. It stays open and messages are continually pushed from the server. Not a web request for a page containing one rc message. Also all kinds of authentication required doesn't seem like an improvement to me. It will only complicate what is simple now. Have there been many attempts to abuse irc.wikimedia.org so far? there is no authentication at all. Maybe none is needed but I don't think the irc feed interests anyone outside of a very small community. Doing something a little more modern might attract different uses. It might not, but I have no idea. On Fri, Mar 1, 2013 at 5:46 PM, Asher Feldman afeld...@wikimedia.orgjavascript:; wrote: I don't think a custom daemon would actually be needed. http://redis.io/topics/pubsub While I was at flickr, we implemented a pubsub based system to push notifications of all photo uploads and metadata changes to google using redis as the backend. The rate of uploads and edits at flickr in 2010 was orders of magnitude greater than the rate of edits across all wmf projects. Publishing to a redis pubsub channel does grow in cost as the number of subscribers increases but I don't see a problem at our scale. If so, there are ways around it. We are planning on migrating the wiki job queues from mysql to redis in the next few weeks, so it's already a growing piece of our infrastructure. I think the bulk of the work here would actually just be in building a frontend webservice that supports websockets / long polling, provides a clean api, and preferably uses oauth or some form of registration to ward off abuse and allow us to limit the growth of subscribers as we scale. On Friday, March 1, 2013, Petr Bena wrote: I still don't see it as too much complex. Matter of month(s) for volunteers with limited time. However I quite don't see what is so complicated on last 2 points. Given the frequency of updates it's most simple to have the client (user / bot / service that need to read the feed) open the persistent connection to server (dispatcher) which fork itself just as sshd does and the new process handle all requests from this client. The client somehow specify what kind of feed they want to have (that's the registration part) and forked dispatcher keeps it updated with information from cache. Nothing hard. And what's the problem with multithreading huh? :) BTW I don't really think there is a need for multithreading at all, but even if there was, it shouldn't be so hard. On Fri, Mar 1, 2013 at 3:47 PM, Tyler Romeo tylerro...@gmail.comjavascript:; javascript:; wrote: On Fri, Mar 1, 2013 at 9:16 AM, Petr Bena benap...@gmail.comjavascript:; javascript:; wrote: I have not yet found a good and stable library for JSON parsing in c#, should you know some let me know :) Take a look at http://www.json.org/. They have a list of implementations for different languages. However, I disagree with I feel like such a project would take an insane amount of resources to develop. If we wouldn't make it insanely complicated, it won't take insane amount of time ;). The cache daemon could be memcached which is already written and stable. 
Listener is a simple daemon that just listen in UDP, parse the data from mediawiki and store them in memcached in some universal format, and dispatcher is just process that takes the data from cache, convert them to specified format and send them to client. Here's a quick list of things that are basic requirements we'd have to implement: - Multi-threading, which is in and of itself a pain in the a**. - Some sort of queue for messages, rather than hoping the daemon can send out every message in realtime. - Ability for clients to register with the daemon (and a place to store a client list) - Multiple methods of notification (IRC would be one, XMPP might be a candidate, and a simple HTTP endpoint would be a must). Just those basics isn't an easy task, especially considering unless WMF allocates resources to it the project would be run solely by those who have enough free time. Also, I wouldn't use memcached as a caching daemon, primarily because I'm not sure such an application even needs a caching daemon. All it does is relay messages. *--* *Tyler Romeo* Stevens Institute of Technology, Class of 2015 Major in Computer Science www.whizkidztech.com | tylerro...@gmail.com javascript:;javascript
Re: [Wikitech-l] switching to something better than irc.wikimedia.org
On Friday, March 1, 2013, Tyler Romeo wrote: On Fri, Mar 1, 2013 at 11:46 AM, Asher Feldman afeld...@wikimedia.orgjavascript:; wrote: don't think a custom daemon would actually be needed. http://redis.io/topics/pubsub While I was at flickr, we implemented a pubsub based system to push notifications of all photo uploads and metadata changes to google using redis as the backend. The rate of uploads and edits at flickr in 2010 was orders of magnitude greater than the rate of edits across all wmf projects. Publishing to a redis pubsub channel does grow in cost as the number of subscribers increases but I don't see a problem at our scale. If so, there are ways around it. We are planning on migrating the wiki job queues from mysql to redis in the next few weeks, so it's already a growing piece of our infrastructure. I think the bulk of the work here would actually just be in building a frontend webservice that supports websockets / long polling, provides a clean api, and preferably uses oauth or some form of registration to ward off abuse and allow us to limit the growth of subscribers as we scale. Interesting. Didn't know Redis had something like this. I'm not too knowledgeable about Redis, but would clients be able to subscribe directly to Redis queues? Or would that be a security issue (like allowing people to access Memcached would be) and we would have to implement our own notification service anyway? I think a very light weight proxy that only passes subscribe commands to redis would work. A read only redis slave could be provided but I don't think it includes a way to limit what commands clients can run, including administrative ones. I think we'd want a thin proxy layer in front anyways, to track and if necessary, selectively limit access. It could be very simple though. 0mq? RabbitMQ? Seem to fit the use case pretty well / closely. Hmm, I've always only thought of RabbitMQ as a messaging service between linked applications, but I guess it could be used as a type of push notification service as well. *--* *Tyler Romeo* Stevens Institute of Technology, Class of 2015 Major in Computer Science www.whizkidztech.com | tylerro...@gmail.com javascript:; ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org javascript:; https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews
Just to tie this thread up - the issue of how to count ajax driven pageviews loaded from the api and of how to differentiate those requests from secondary api page requests has been resolved without the need for code or logging changes. Tagging of the mobile beta site will be accomplished via a new generic mediawiki http response header dedicated to logging containing key value pairs. -Asher On Tue, Feb 12, 2013 at 9:56 AM, Asher Feldman afeld...@wikimedia.orgwrote: On Tuesday, February 12, 2013, Diederik van Liere wrote: It does still seem to me that the data to determine secondary api requests should already be present in the existing log line. If the value of the page param in an action=mobileview api request matches the page in the referrer (perhaps with normalization), it's a secondary request as per case 1 below. Otherwise, it's a pageview as per case 2. Difficult or expensive to reconcile? Not when you're doing distributed log analysis via hadoop. So I did look into this prior to writing the RFC and the issue is that a lot of API referrers don't contain the querystring. I don't know what triggers this so if we can fix this then we can definitely derive the secondary pageview request from the referrer field. D If you can point me to some examples, I'll see if I can find any insights into the behavior. On Mon, Feb 11, 2013 at 7:11 PM, Arthur Richards aricha...@wikimedia.org wrote: Thanks, Jon. To try and clarify a bit more about the API requests... they are not made on a per-section basis. As I mentioned earlier, there are two cases in which article content gets loaded by the API: 1) Going directly to a page (eg clicking a link from a Google search) will result in the backend serving a page with ONLY summary section content and section headers. The rest of the page is lazily loaded via API request once the JS for the page gets loaded. The idea is to increase responsiveness by reducing the delay for an article to load (further details in the article Jon previously linked to). The API request looks like: http://en.m.wikipedia.org/w/api.php?format=jsonaction=mobileviewpage=Liverpool+F.C.+in+European+footballvariant=enredirects=yesprop=sections%7Ctextnoheadings=yessectionprop=level%7Cline%7Canchorsections=all 2) Loading an article entirely via Javascript - like when a link is clicked in an article to another article, or an article is loaded via search. This will make ONE call to the API to load article content. API request looks like: http://en.m.wikipedia.org/w/api.php?format=jsonaction=mobileviewpage=Liverpool+F.C.+in+European+footballvariant=enredirects=yesprop=sections%7Ctextnoheadings=yessectionprop=level%7Cline%7Canchorsections=all These API requests are identical, but only #2 should be counted as a 'pageview' - #1 is a secondary API request and should not be counted as a 'pageview'. You could make the argument that we just count all of these API requests as pageviews, but there are cases when we can't load article content from the API (like devices that do not support JS), so we need to be able to count the traditional page request as a pageview - thus we need a way to differentiate the types of API requests being made when they otherwise share the same URL. 
On Mon, Feb 11, 2013 at 6:42 PM, Jon Robson jdlrob...@gmail.com wrote: I'm a bit worried that now we are asking why pages are lazy loaded rather than focusing on the fact that they currently __are doing this___ and how we can log these (if we want to discuss this further let's start another thread as I'm getting extremely confused doing so on this one). Lazy loading sections For motivation behind moving MobileFrontend into the direction of lazy loading section content and subsequent pages can be found here [1], I just gave it a refresh as it was a little out of date. In summary the reason is to 1) make the app feel more responsive by simply loading content rather than reloading the entire interface 2) reducing the payload sent to a device. Session Tracking Going back to the discussion of tracking mobile page views, it sounds like a header stating whether a page is being viewed in alpha, beta or stable works fine for standard page views. As for the situations where an entire page is loaded via the api it makes no dif ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
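A sketch of what a generic logging response header with key-value pairs could look like from MediaWiki's side; the header name, keys, and separator are placeholders, not the final design:

```php
<?php
// Illustration only: tag the response with key=value pairs that the cache
// layer / log pipeline can use to classify the request (e.g. mobile beta
// membership, or whether an API hit should count as a pageview or as a
// secondary content load), without fragmenting the cache.
$pairs = array(
	'mf-m'     => 'b',   // placeholder key: mobile beta membership
	'pageview' => '1',   // 1 = count as a pageview, 0 = secondary API request
);
$kv = array();
foreach ( $pairs as $k => $v ) {
	$kv[] = "$k=$v";
}
$wgRequest->response()->header( 'X-Analytics: ' . implode( ';', $kv ) );
```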
Re: [Wikitech-l] [Labs-l] Maria DB
For most projects, I recommend using the official packages available via the MariaDB projects own apt repo. The official packages are based on the Debian mysql packaging where installing the server package also installs a default database created around generic config defaults, a debian mysql maintenance user with a randomly generated password, and scripts (including init) that assume privileged access via that user. That is, installing the packages provides you with a fresh running working database with generic defaults suitable for a small server, and certain admin tasks automated. I think that's what the average labs and general users wants and expects. The packages I've built for production use at wmf strips out all of the debianisms, the debian project script rewrites, the pre/post install actions. They also leave debug symbols in the binaries and have compiler flag tweaks, but do not at this stage contain any source patches. Installing the server package doesn't create a default db, or provide an environment where you can even start the server on a fresh sever install without further work. Probably not a good choice for most labs users. On Wednesday, February 13, 2013, Petr Bena wrote: thanks for updates. Can you tell me what is a difference between maria db you are using and the version that is recommended for use on ubuntu? On Wed, Feb 13, 2013 at 6:58 PM, Asher Feldman afeld...@wikimedia.orgjavascript:_e({}, 'cvml', 'afeld...@wikimedia.org'); wrote: The production migration to MariaDB was paused for a time by the EQIAD datacenter migration and issues involving other projects that took up my time, but the trial production roll-out will resume this month. All signs still point to our using it in production. I did a lot of query testing on an enwiki MariaDB 5.5 slave over the course of more than a month before the first production deployment. Major version migrations with mysql and derivatives are not to be taken lightly in production environments. At a minimum, one must be concerned about query optimizer changes making one particular query type significantly slower. In the case of the switch to 5.5, there are several default behavior changes over 5.1 that can break applications or change results. Hence, some serious work over a plodding time frame before that first production slave switch. Despite those efforts, a couple weeks after the switch, I saw a query generated by what seems to be a very rare edge case from that AFTv4 extension that violated stricter enforcement of unsigned integer types in 5.5, breaking replication and requiring one off rewriting and execution of the query locally to ensure data consistency before skipping over it. I opened a bug, Mathias fixed the extension, and I haven't seen any other compatibility issues from AFTv4 or anything else deployed on enwiki. That said, other projects utilize different extensions, so all of my testing that has gone into enwiki cannot be assumed to fully cover everything else. Because of that, and because I want to continue proceeding with caution for all of our projects, this will continue to be a slow and methodical process at this stage. Bugs in extensions that aren't used by English Wikipedia may be found and require fixing along the way. As the MariaDB roll-out proceeds, I will provide updates on wikitech-l. Best, Asher On Wed, Feb 13, 2013 at 5:19 AM, Petr Bena benap...@gmail.comjavascript:_e({}, 'cvml', 'benap...@gmail.com'); wrote: Okay - so what is outcome? Should we migrate beta cluster? 
Are we going to use it in production? On Wed, Feb 13, 2013 at 2:08 PM, Chad innocentkil...@gmail.comjavascript:_e({}, 'cvml', 'innocentkil...@gmail.com'); wrote: On Wed, Feb 13, 2013 at 8:05 AM, bawolff bawolff...@gmail.comjavascript:_e({}, 'cvml', 'bawolff%2...@gmail.com'); wrote: Umm there was a thread several months ago about how it is used on several of the slave dbs, if I recall. Indeed, you're looking for mariadb 5.5 in production for english wikipedia http://www.gossamer-threads.com/lists/wiki/wikitech/319925 -Chad ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org javascript:_e({}, 'cvml', 'Wikitech-l@lists.wikimedia.org'); https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Labs-l mailing list lab...@lists.wikimedia.org javascript:_e({}, 'cvml', 'lab...@lists.wikimedia.org'); https://lists.wikimedia.org/mailman/listinfo/labs-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org javascript:_e({}, 'cvml', 'Wikitech-l@lists.wikimedia.org'); https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman
Re: [Wikitech-l] [Labs-l] Maria DB
Er, no it shouldn't. Initial execution might take microseconds longer due to larger binary sizes and the ELF loader having to skip over the symbols, but that's about it. On Thursday, February 14, 2013, Petr Bena wrote: Keeping debug symbols in binaries will result in poor performance, or it should
Re: [Wikitech-l] [Labs-l] Maria DB
I would much rather abandon using debs than use what the debian project has done to mysql packaging in any production environment. If the discussion has come down to this, I did WMF a disservice by drifting away from Domas' optimized make ; make install ; rsync unstripped binaries to prod workflow. In general, I find that environments that don't individually package, according to distro standards, every part of their core application stack that gets built in-house are more productive, and more responsive to the needs of developers and ultimately the application. When an ops team claims that building a recent version of libmemcached for a stable OS is almost impossibly hard and will take weeks because it requires backporting a Debian maintainer's packaging of it for an experimental distro, with that distro's unrelated library version dependencies and reliance on a newer, incompatible dpkg toolchain, there's probably something wrong with that workflow. I like to rely on Linux distros for the lowest common denominator layer of the stack and related security updates. The concerns that go into building and maintaining such a beast are rather different from those that go into operating a continually developed and deployed distributed application used by half a billion people. I don't see a win in trying to force the two together. On Thursday, February 14, 2013, Faidon Liambotis wrote: For MySQL/MariaDB, it seems that the Debian packages don't ship a -dbg package by default. That's a shame, we can ask for that. As for the rest of Asher's changes, I'd love to find a way to make stock packages work in our production setup, but I'm not sure if the maintainer would welcome the extra complexity of conditionally switching behaviors. We can try if you're willing to, Asher :) Regards, Faidon ___ Labs-l mailing list lab...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/labs-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] [Labs-l] Maria DB
The production migration to MariaDB was paused for a time by the EQIAD datacenter migration and issues involving other projects that took up my time, but the trial production roll-out will resume this month. All signs still point to our using it in production. I did a lot of query testing on an enwiki MariaDB 5.5 slave over the course of more than a month before the first production deployment. Major version migrations with mysql and derivatives are not to be taken lightly in production environments. At a minimum, one must be concerned about query optimizer changes making one particular query type significantly slower. In the case of the switch to 5.5, there are several default behavior changes over 5.1 that can break applications or change results. Hence, some serious work over a plodding time frame before that first production slave switch. Despite those efforts, a couple weeks after the switch, I saw a query generated by what seems to be a very rare edge case from that AFTv4 extension that violated stricter enforcement of unsigned integer types in 5.5, breaking replication and requiring one off rewriting and execution of the query locally to ensure data consistency before skipping over it. I opened a bug, Mathias fixed the extension, and I haven't seen any other compatibility issues from AFTv4 or anything else deployed on enwiki. That said, other projects utilize different extensions, so all of my testing that has gone into enwiki cannot be assumed to fully cover everything else. Because of that, and because I want to continue proceeding with caution for all of our projects, this will continue to be a slow and methodical process at this stage. Bugs in extensions that aren't used by English Wikipedia may be found and require fixing along the way. As the MariaDB roll-out proceeds, I will provide updates on wikitech-l. Best, Asher On Wed, Feb 13, 2013 at 5:19 AM, Petr Bena benap...@gmail.com wrote: Okay - so what is outcome? Should we migrate beta cluster? Are we going to use it in production? On Wed, Feb 13, 2013 at 2:08 PM, Chad innocentkil...@gmail.com wrote: On Wed, Feb 13, 2013 at 8:05 AM, bawolff bawolff...@gmail.com wrote: Umm there was a thread several months ago about how it is used on several of the slave dbs, if I recall. Indeed, you're looking for mariadb 5.5 in production for english wikipedia http://www.gossamer-threads.com/lists/wiki/wikitech/319925 -Chad ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Labs-l mailing list lab...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/labs-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Can we kill DBO_TRX? It seems evil!
It looks like Daniel's change to log implicit commits went live on the wmf cluster with the release of 1.21wmf9. Unfortunately, it doesn't appear to be as useful as hoped for tracking down nested callers of Database::begin; the majority of log entries just look like: Wed Feb 13 22:07:21 UTC 2013 mw1146 dewiki DatabaseBase::begin: Transaction already in progress (from DatabaseBase::begin), performing implicit commit! It seems like we'd need a backtrace at this point. So I think we should revisit this issue and either: - expand the logging to make it more useful - disable it to prevent filling the dberror log with non-actionable messages and nothing else - revisit the ideas of either dropping the implicit commit by use of a transaction counter, or of emulating real nested transactions via savepoints. The negative impact on concurrency due to longer-lived transactions and longer-held locks may negate the viability of the third option, even though it feels the most correct. -Asher On Wed, Sep 26, 2012 at 4:30 AM, Daniel Kinzler dan...@brightbyte.de wrote: I have submitted two changes for review that hopefully remedy the current problems: * I1e746322 implements better documentation, more consistent behavior, and easier tracking of implicit commits in Database::begin() * I6ecb8faa restores the flushing commits that I removed a while ago under the assumption that a commit without a begin would be a no-op. I hope this addresses any pressing issues. I still think that we need a way to protect critical sections. But an RFC seems to be in order for that. -- daniel ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
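To make the third option concrete, here is a minimal sketch of counter-plus-savepoint transaction nesting. It is illustrative only: this is not MediaWiki's Database class, and `conn` stands in for any DB-API style connection.

# Sketch: emulate nested transactions with savepoints instead of implicit commits.
class NestedTransaction:
    def __init__(self, conn):
        self.conn = conn
        self.depth = 0

    def begin(self):
        if self.depth == 0:
            self.conn.cursor().execute("BEGIN")
        else:
            # Nested begin: no implicit commit, open a savepoint instead.
            self.conn.cursor().execute("SAVEPOINT sp%d" % self.depth)
        self.depth += 1

    def commit(self):
        self.depth -= 1
        if self.depth == 0:
            self.conn.cursor().execute("COMMIT")
        else:
            self.conn.cursor().execute("RELEASE SAVEPOINT sp%d" % self.depth)

    def rollback(self):
        self.depth -= 1
        if self.depth == 0:
            self.conn.cursor().execute("ROLLBACK")
        else:
            # Only the inner unit is undone; outer locks stay held, which is
            # exactly the concurrency concern raised above.
            self.conn.cursor().execute("ROLLBACK TO SAVEPOINT sp%d" % self.depth)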
Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews
On Tuesday, February 12, 2013, Diederik van Liere wrote: It does still seem to me that the data to determine secondary api requests should already be present in the existing log line. If the value of the page param in an action=mobileview api request matches the page in the referrer (perhaps with normalization), it's a secondary request as per case 1 below. Otherwise, it's a pageview as per case 2. Difficult or expensive to reconcile? Not when you're doing distributed log analysis via hadoop. So I did look into this prior to writing the RFC and the issue is that a lot of API referrers don't contain the querystring. I don't know what triggers this so if we can fix this then we can definitely derive the secondary pageview request from the referrer field. D If you can point me to some examples, I'll see if I can find any insights into the behavior. On Mon, Feb 11, 2013 at 7:11 PM, Arthur Richards aricha...@wikimedia.org wrote: Thanks, Jon. To try and clarify a bit more about the API requests... they are not made on a per-section basis. As I mentioned earlier, there are two cases in which article content gets loaded by the API: 1) Going directly to a page (eg clicking a link from a Google search) will result in the backend serving a page with ONLY summary section content and section headers. The rest of the page is lazily loaded via API request once the JS for the page gets loaded. The idea is to increase responsiveness by reducing the delay for an article to load (further details in the article Jon previously linked to). The API request looks like: http://en.m.wikipedia.org/w/api.php?format=jsonaction=mobileviewpage=Liverpool+F.C.+in+European+footballvariant=enredirects=yesprop=sections%7Ctextnoheadings=yessectionprop=level%7Cline%7Canchorsections=all 2) Loading an article entirely via Javascript - like when a link is clicked in an article to another article, or an article is loaded via search. This will make ONE call to the API to load article content. API request looks like: http://en.m.wikipedia.org/w/api.php?format=jsonaction=mobileviewpage=Liverpool+F.C.+in+European+footballvariant=enredirects=yesprop=sections%7Ctextnoheadings=yessectionprop=level%7Cline%7Canchorsections=all These API requests are identical, but only #2 should be counted as a 'pageview' - #1 is a secondary API request and should not be counted as a 'pageview'. You could make the argument that we just count all of these API requests as pageviews, but there are cases when we can't load article content from the API (like devices that do not support JS), so we need to be able to count the traditional page request as a pageview - thus we need a way to differentiate the types of API requests being made when they otherwise share the same URL. On Mon, Feb 11, 2013 at 6:42 PM, Jon Robson jdlrob...@gmail.com wrote: I'm a bit worried that now we are asking why pages are lazy loaded rather than focusing on the fact that they currently __are doing this___ and how we can log these (if we want to discuss this further let's start another thread as I'm getting extremely confused doing so on this one). Lazy loading sections For motivation behind moving MobileFrontend into the direction of lazy loading section content and subsequent pages can be found here [1], I just gave it a refresh as it was a little out of date. In summary the reason is to 1) make the app feel more responsive by simply loading content rather than reloading the entire interface 2) reducing the payload sent to a device. 
Session Tracking Going back to the discussion of tracking mobile page views, it sounds like a header stating whether a page is being viewed in alpha, beta or stable works fine for standard page views. As for the situations where an entire page is loaded via the api it makes no dif ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
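As a rough illustration of the referrer heuristic discussed above, something along these lines could classify mobileview hits during log analysis. The field handling and title normalization are simplified assumptions, not the actual udplog schema or Hadoop job, and as noted it only works when the referrer is actually populated.

from urllib.parse import urlsplit, parse_qs, unquote

def normalize(title):
    return unquote(title).replace('_', ' ').replace('+', ' ').strip()

def is_secondary(api_url, referer):
    """True if an action=mobileview request targets the same article as its
    own referrer, i.e. a lazy-load follow-up rather than a pageview."""
    qs = parse_qs(urlsplit(api_url).query)
    if qs.get('action', [''])[0] != 'mobileview':
        return False
    page = normalize(qs.get('page', [''])[0])
    # e.g. referer http://en.m.wikipedia.org/wiki/Liverpool_F.C._in_European_football
    ref_title = normalize(urlsplit(referer).path.rsplit('/', 1)[-1])
    return page != '' and page == ref_title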
Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews
Max - good answers re: caching concerns. That leaves studying if the bytes transferred on average mobile article view increases or decreases with lazy section loading. If it increases, I'd say this isn't a positive direction to go in and stop there. If it decreases, then we should look at the effect on total latency, number of requests required per pageview, and the impact on backend apache utilization which I'd expect to be 0. Does the mobile team have specific goals that this project aims to accomplish? If so, we can use those as the measure against which to compare an impact analysis. On Mon, Feb 11, 2013 at 12:21 PM, Max Semenik maxsem.w...@gmail.com wrote: On 11.02.2013, 22:11 Asher wrote: And then I'd wonder about the server side implementation. How will frontend cache invalidation work? Are we going to need to purge every individual article section relative to /w/api.php on edit? Since the API doesn't require pretty URLs, we could simply append the current revision ID to the mobileview URLs. Article HTML in memcached (parser cache), mobile processed HTML in memcached.. Now individual sections in memcached? If so, should we calculate memcached space needs for article text as 3x the current parser cache utilization? More memcached usage is great, not asking to dissuade its use but because its better to capacity plan than to react. action=mobileview caches pages only in full and serves only sections requested, so no changes in request patterns will result in increased memcached usage. -- Best regards, Max Semenik ([[User:MaxSem]]) ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
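The bytes-transferred comparison described above could be approximated from sampled logs along these lines. The sample format and the idea of a per-view grouping key tying the initial HTML request to its API follow-ups are assumptions for the sketch, not existing tooling.

from collections import defaultdict

def bytes_per_view(samples):
    """samples: iterable of (view_id, bytes_sent), where view_id groups the
    initial page request with any follow-up API requests for the same view."""
    totals = defaultdict(int)
    for view_id, size in samples:
        totals[view_id] += size
    return sum(totals.values()) / max(len(totals), 1)

# e.g. compare bytes_per_view(control_samples) against bytes_per_view(lazy_samples)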
Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews
Thanks for the clarification Arthur, that clears up some misconceptions I had. I saw a demo around the allstaff where individual sections were lazy loaded, so I think I had that in my head. It does still seem to me that the data to determine secondary api requests should already be present in the existing log line. If the value of the page param in an action=mobileview api request matches the page in the referrer (perhaps with normalization), it's a secondary request as per case 1 below. Otherwise, it's a pageview as per case 2. Difficult or expensive to reconcile? Not when you're doing distributed log analysis via hadoop. On Mon, Feb 11, 2013 at 7:11 PM, Arthur Richards aricha...@wikimedia.orgwrote: Thanks, Jon. To try and clarify a bit more about the API requests... they are not made on a per-section basis. As I mentioned earlier, there are two cases in which article content gets loaded by the API: 1) Going directly to a page (eg clicking a link from a Google search) will result in the backend serving a page with ONLY summary section content and section headers. The rest of the page is lazily loaded via API request once the JS for the page gets loaded. The idea is to increase responsiveness by reducing the delay for an article to load (further details in the article Jon previously linked to). The API request looks like: http://en.m.wikipedia.org/w/api.php?format=jsonaction=mobileviewpage=Liverpool+F.C.+in+European+footballvariant=enredirects=yesprop=sections%7Ctextnoheadings=yessectionprop=level%7Cline%7Canchorsections=all 2) Loading an article entirely via Javascript - like when a link is clicked in an article to another article, or an article is loaded via search. This will make ONE call to the API to load article content. API request looks like: http://en.m.wikipedia.org/w/api.php?format=jsonaction=mobileviewpage=Liverpool+F.C.+in+European+footballvariant=enredirects=yesprop=sections%7Ctextnoheadings=yessectionprop=level%7Cline%7Canchorsections=all These API requests are identical, but only #2 should be counted as a 'pageview' - #1 is a secondary API request and should not be counted as a 'pageview'. You could make the argument that we just count all of these API requests as pageviews, but there are cases when we can't load article content from the API (like devices that do not support JS), so we need to be able to count the traditional page request as a pageview - thus we need a way to differentiate the types of API requests being made when they otherwise share the same URL. On Mon, Feb 11, 2013 at 6:42 PM, Jon Robson jdlrob...@gmail.com wrote: I'm a bit worried that now we are asking why pages are lazy loaded rather than focusing on the fact that they currently __are doing this___ and how we can log these (if we want to discuss this further let's start another thread as I'm getting extremely confused doing so on this one). Lazy loading sections For motivation behind moving MobileFrontend into the direction of lazy loading section content and subsequent pages can be found here [1], I just gave it a refresh as it was a little out of date. In summary the reason is to 1) make the app feel more responsive by simply loading content rather than reloading the entire interface 2) reducing the payload sent to a device. Session Tracking Going back to the discussion of tracking mobile page views, it sounds like a header stating whether a page is being viewed in alpha, beta or stable works fine for standard page views. 
As for the situations where an entire page is loaded via the api it makes no difference to us to whether we 1) send the same header (set via javascript) or 2) add a query string parameter. The only advantage I can see of using a header is that an initial page load of the article San Francisco currently uses the same api url as a page load of the article San Francisco via javascript (e.g. I click a link to 'San Francisco' on the California article). In this new method they would use different urls (as the data sent is different). I'm not sure how that would effect caching. Let us know which method is preferred. From my perspective implementation of either is easy. [1] http://www.mediawiki.org/wiki/MobileFrontend/Dynamic_Sections On Mon, Feb 11, 2013 at 12:50 PM, Asher Feldman afeld...@wikimedia.org wrote: Max - good answers re: caching concerns. That leaves studying if the bytes transferred on average mobile article view increases or decreases with lazy section loading. If it increases, I'd say this isn't a positive direction to go in and stop there. If it decreases, then we should look at the effect on total latency, number of requests required per pageview, and the impact on backend apache utilization which I'd expect to be 0. Does the mobile team have specific goals that this project aims
Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews
On Thu, Feb 7, 2013 at 4:32 AM, Mark Bergsma m...@wikimedia.org wrote: - Since we're repurposing X-CS, should we perhaps rename it to something more apt to address concerns about cryptic non-standard headers flying about? I'd like to propose to define *one* request header to be used for all analytics purposes. It can be key/value pairs, and be set client side where applicable. There's been some confusion in this thread between headers used by mediawiki in determining content generation or for cache variance, and those intended only for logging. The zero carrier header is used by the zero extension to return specific content banners and set different default behaviors (i.e. hide all images) as negotiated with individual mobile carriers. A reader familiar with this might note that there are separate X-CS and X-Carrier headers, but X-Carrier is supposed to go away now. Agreed that there should be a single header for content that's strictly for analytics purposes. All changes to the udplog format in the last year or so could likely be reverted except for the delimiter change, with a multipurpose analytics key/value field added for all else. I think the question of using a URL param vs a request header should mainly take into account whether the response varies on the value of the parameter. If the responses are otherwise identical, and the value is only used for analytics purposes, I would prefer to put that into the above header instead, as it will impair cacheability / cache size otherwise (even if those requests are currently not cacheable for other reasons). If the responses are actually different based on this parameter, I would prefer to have it in the URL where possible. For this particular case, the API requests are for getting specific sections of an article, as opposed to the whole thing or the first section served as part of an initial pageview. I might not have grokked the original RFC email well, but I don't understand why this was being discussed as a logging challenge or necessitating a request header. A mobile api request to just get section 3 of the article on otters should already utilize a query param denoting that section 3 is being fetched, and is already clearly not a primary request. Whether or not it makes sense for mobile to move in the direction of splitting up article views into many api requests is something I'd love to see backed up by data. I'm skeptical for multiple reasons. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews
On Wednesday, February 6, 2013, David Schoonover wrote: Just want to summarize and make sure I've got the right conclusions, as this thread has wandered a bit. *1. X-MF-Mode: Alpha/Beta Site Usage* We'll roll this into the X-CS header, which will now be KV-pairs (using normal URL encoding), and set by Varnish. Nope. There will be a header denoting non-standard MobileFrontend views if the mobile team wants to leave the caching situation as is. It will be a response header set by mediawiki, not varnish. The header will have a unique name; it will not share the name of the zero carrier header. The udplog field that currently only ever contains carrier information on zero requests will become a key/value field. Udplog fields are not named, they are positional. This will avoid an explosion of cryptic headers for analytic purposes. Questions: - It seems there's some confusion around bypassing Varnish. If I understand correctly, it's not that Varnish is ever bypassed, just that the upstream response is not cached if cookies are present. Is that right? Bypasses varnish caching != bypassing varnish. I don't see any use of the latter in this thread, but if there has been confusion, know that all m.wikipedia.org requests are served via varnish. - Since we're repurposing X-CS, should we perhaps rename it to something more apt to address concerns about cryptic non-standard headers flying about? Nope.. We're repurposing the fixed position udplog field, not the zero carrier code header. *2. X-MF-Req: Primary vs Secondary API Requests* This header will be replaced with a query parameter set by the client-side JS code making the request. Analytics will parse it out at processing time and Do The Right Thing. Kindly correct me if I've gotten anything wrong. -- David Schoonover d...@wikimedia.org On Tue, Feb 5, 2013 at 2:36 PM, Diederik van Liere dvanli...@wikimedia.org wrote: Analytics folks, is this workable from your perspective? Yes, this works fine for us and it's also no problem to set multiple key/value pairs in the http header that we are now using for the X-CS header. Diederik ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews
On Wednesday, February 6, 2013, David Schoonover wrote: That all sounds fine to me so long as we're all agreed. Lol. RFC closed. -- David Schoonover d...@wikimedia.org javascript:; On Wed, Feb 6, 2013 at 12:59 PM, Asher Feldman afeld...@wikimedia.orgjavascript:; wrote: On Wednesday, February 6, 2013, David Schoonover wrote: Just want to summarize and make sure I've got the right conclusions, as this thread has wandered a bit. *1. X-MF-Mode: Alpha/Beta Site Usage* * * We'll roll this into the X-CS header, which will now be KV-pairs (using normal URL encoding), and set by Varnish. Nope. There will be a header denoting non-standard MobileFrontend views if the mobile team wants to leave the caching situation as is. It will be a response header set by mediawiki, not varnish. The header will have a unique name, it will not share the name of the zero carrier header. The udplog field that currently only ever contains carrier information on zero requests will become a key value field. Udplog fields are not named, they are positional. This will avoid an explosion of cryptic headers for analytic purposes. Questions: - It seems there's some confusion around bypassing Varnish. If I understand correctly, it's not that Varnish is ever bypassed, just that the upstream response is not cached if cookies are present. Is that right? Bypasses varnish caching != bypassing varnish. I don't see any use of the later in this thread, but if there has been confusion, know that all m.wikipedia.org requests are served via varnish. - Since we're repurposing X-CS, should we perhaps rename it to something more apt to address concerns about cryptic non-standard headers flying about? Nope.. We're repurposing the fixed position udplog field, not the zero carrier code header. *2. X-MF-Req: Primary vs Secondary API Requests* This header will be replaced with a query parameter set by the client-side JS code making the request. Analytics will parse it out at processing time and Do The Right Thing. Kindly correct me if I've gotten anything wrong. -- David Schoonover d...@wikimedia.org javascript:; On Tue, Feb 5, 2013 at 2:36 PM, Diederik van Liere dvanli...@wikimedia.org javascript:;wrote: Analytics folks, is this workable from your perspective? Yes, this works fine for us and it's also no problem to set multiple key/value pairs in the http header that we are now using for the X-CS header. Diederik ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org javascript:; https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org javascript:; https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org javascript:; https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org javascript:; https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews
On Mon, Feb 4, 2013 at 4:24 PM, Arthur Richards aricha...@wikimedia.org wrote: Asher, I understand your hesitation about using HTTP header fields, but there are a couple problems I'm seeing with using query string parameters. Perhaps you or others have some ideas how to get around these: * We should keep user-facing URLs canonical as much as possible (primarily for link sharing) ** If we keep user-facing URLs canonical, we could potentially add query string params via javascript, but that would only work on devices that support javascript/have javascript enabled (this might not be a huge deal as we are planning changes such that users that do not support jQuery will get a simplified version of the stable site) I was thinking of this as a solution for the X-MF-Req header, based on your explanation of it earlier in the thread: Almost correct - I realize I didn't actually explain it correctly. This would be a request HTTP header set by the client in API requests made by Javascript provided by MobileFrontend. I only meant to apply the query string idea to API requests, which can also be marked to indicate non-standard versions of the site. I completely missed the case of non-api requests about which beta/alpha usage data needs to be collected. What about doing so via the eventlog service? Only for users actually opted into one of these programs, no need to log anything special for the majority of users getting the standard site. * How could this work for the first pageview request (e.g. a user clicking a link from Google or even just browsing to http://en.wikipedia.org)? I think this is covered by the above, in that the data intended to go into x-mf-req doesn't apply to this sort of page view, and first views from users opted into a trial can eventlog the trial usage. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews
On Mon, Feb 4, 2013 at 4:59 PM, Arthur Richards aricha...@wikimedia.orgwrote: In the case of the cookie, the header would actually get set by the backend response (from Apache) and I believe Dave cooked up or was planning on cooking some magic to somehow make that information discernable when results are cached. Opting into the mobile beta as it is currently implemented bypasses varnish caching for all future mobile pageviews for the life of the cookie. So this probably isn't quite right (at least the when results are cached part.) ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews
On Mon, Feb 4, 2013 at 5:21 PM, Asher Feldman afeld...@wikimedia.orgwrote: On Mon, Feb 4, 2013 at 4:59 PM, Arthur Richards aricha...@wikimedia.orgwrote: In the case of the cookie, the header would actually get set by the backend response (from Apache) and I believe Dave cooked up or was planning on cooking some magic to somehow make that information discernable when results are cached. Opting into the mobile beta as it is currently implemented bypasses varnish caching for all future mobile pageviews for the life of the cookie. So this probably isn't quite right (at least the when results are cached part.) Thinking about this further.. So long as all beta optins bypass all caching and always have to hit an apache, it would be fine for mf to set a response header reflecting the version of the site the optin cookie triggers (but only if there's an optin, avoid setting on standard.) I'd just prefer this to be logged without adding a field to the entire udplog stream that will generally just be wasted space. Mobile already has one dedicated udplog field currently intended for zero carriers, wasted log space for nearly every request. Make it a key/value field that can contain multiple keys, i.e. zc:orn;v:b1 (zero carrier = orange whatever, version = beta1) If by some chance mobile beta gets implemented in a way that doesn't kill frontend caching for its users (maybe solely via different js behavior based on the presence of the optin cookie?) the above won't be applicable anymore, so using the event log facility / pixel service to note beta usage becomes more appropriate. If beta usage is going to be driven upwards, I hope this approach is seriously considered. Mobile currently only has around a 58% edge cache hitrate as it is and it sounds like upcoming features will place significant new demands on the apaches and for memcached space. If a non cache busting beta site is doable, go for the logging method now that will later be compatible with it to avoid having to change processing methods. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
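A sketch of what reading and writing such a multi-key log field might look like. The zc:orn;v:b1 example above is the only thing taken from the thread; the separators and helper names are otherwise assumptions.

def parse_kv_field(field):
    """Parse a field like 'zc:orn;v:b1' into {'zc': 'orn', 'v': 'b1'}."""
    if field in ('', '-'):
        return {}
    return dict(pair.split(':', 1) for pair in field.split(';') if ':' in pair)

def build_kv_field(pairs):
    """Serialize key/value pairs back into the single fixed-position field."""
    return ';'.join('%s:%s' % (k, v) for k, v in pairs.items()) or '-'

# parse_kv_field("zc:orn;v:b1") -> {'zc': 'orn', 'v': 'b1'}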
Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews
If you want to differentiate categories of API requests in logs, add descriptive noop query params to the requests. I.e mfmode=2. Doing this in request headers and altering edge config is unnecessary and a bad design pattern. On the analytics side, if parsing query params seems challenging vs. having a fixed field to parse, deal. On Sunday, February 3, 2013, David Schoonover wrote: Huh! News to me as well. I definitely agree with that decision. Thanks, Ori! I've already written the Varnish code for setting X-MF-Mode so it can be captured by varnishncsa. Is there agreement to switch to Mobile-Mode, or at least, MF-Mode? Looking especially to hear from Arthur and Matt. -- David Schoonover d...@wikimedia.org javascript:; On Sat, Feb 2, 2013 at 2:16 PM, Diederik van Liere dvanli...@wikimedia.org javascript:;wrote: Thanks Ori, I was not aware of this D Sent from my iPhone On 2013-02-02, at 16:55, Ori Livneh o...@wikimedia.org javascript:; wrote: On Saturday, February 2, 2013 at 1:36 PM, Platonides wrote: I don't like it's cryptic nature. Someone looking at the headers sent to his browser would be very confused about what's the point of «X-MF-Mode: b». Instead something like this would be much more descriptive: X-Mobile-Mode: stable X-Mobile-Request: secondary But that also means sending more bytes through the wire :S Well, you can (and should) drop the 'X-' :-) See http://tools.ietf.org/html/rfc6648: Deprecating the X- Prefix and Similar Constructs in Application Protocols -- Ori Livneh ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org javascript:; https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org javascript:; https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org javascript:; https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews
Regarding varnish cacheability of mobile API requests with a logging query param - it would probably be worth making frontend varnishes strip out all occurrences of that query param and its value from their backend requests so they're all the same to the caching instances. A generic param name that can take any value would allow for adding as many extra log values as needed, limited only by the uri log field length. l=mft2l=mfstable etc. So still an edge cache change but the result is more flexible while avoiding changing the fixed field length log format across unrelated systems like text squids or image caches. On Sunday, February 3, 2013, Asher Feldman wrote: If you want to differentiate categories of API requests in logs, add descriptive noop query params to the requests. I.e mfmode=2. Doing this in request headers and altering edge config is unnecessary and a bad design pattern. On the analytics side, if parsing query params seems challenging vs. having a fixed field to parse, deal. On Sunday, February 3, 2013, David Schoonover wrote: Huh! News to me as well. I definitely agree with that decision. Thanks, Ori! I've already written the Varnish code for setting X-MF-Mode so it can be captured by varnishncsa. Is there agreement to switch to Mobile-Mode, or at least, MF-Mode? Looking especially to hear from Arthur and Matt. -- David Schoonover d...@wikimedia.org On Sat, Feb 2, 2013 at 2:16 PM, Diederik van Liere dvanli...@wikimedia.orgwrote: Thanks Ori, I was not aware of this D Sent from my iPhone On 2013-02-02, at 16:55, Ori Livneh o...@wikimedia.org wrote: On Saturday, February 2, 2013 at 1:36 PM, Platonides wrote: I don't like it's cryptic nature. Someone looking at the headers sent to his browser would be very confused about what's the point of «X-MF-Mode: b». Instead something like this would be much more descriptive: X-Mobile-Mode: stable X-Mobile-Request: secondary But that also means sending more bytes through the wire :S Well, you can (and should) drop the 'X-' :-) See http://tools.ietf.org/html/rfc6648: Deprecating the X- Prefix and Similar Constructs in Application Protocols -- Ori Livneh ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
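The stripping itself would live in VCL on the frontend varnishes; the sketch below only shows the normalization logic, with the parameter name l taken from the example above and everything else an assumption.

from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def strip_logging_params(url, param='l'):
    """Drop every occurrence of the logging-only query param so that
    l=mft2, l=mfstable, etc. all collapse to one cacheable backend URL."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k != param]
    return urlunsplit(parts._replace(query=urlencode(kept)))

# strip_logging_params("http://en.m.wikipedia.org/w/api.php?action=mobileview&l=mfstable")
#   -> "http://en.m.wikipedia.org/w/api.php?action=mobileview"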
Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews
That's not at all true in the real world. Look at the actual requests for google analytics on a high percentage of sites, etc. Setting new request headers for mobile that map to new inflexible fields in the log stream that must be set on all non mobile requests (\t-\t-) equals gigabytes of unnecessarily log data every day (that we want to save 100% of) for no good reason. Wanting to keep query params pure isn't a good reason. On Sunday, February 3, 2013, Tyler Romeo wrote: Considering that the query component of a URI is meant to identify the resource whereas HTTP headers are meant to tell the server additional information about the request, I think a header approach is much more appropriate than a no-op query parameter. If the X- is removed, I'd have no problem with the addition of these headers, but what is the advantage of having two over one. Wouldn't a header like: MobileFrontend: 1/2 a/b/s work just as fine? *--* *Tyler Romeo* Stevens Institute of Technology, Class of 2015 Major in Computer Science www.whizkidztech.com | tylerro...@gmail.com javascript:; On Sun, Feb 3, 2013 at 4:35 AM, Asher Feldman afeld...@wikimedia.orgjavascript:; wrote: Regarding varnish cacheability of mobile API requests with a logging query param - it would probably be worth making frontend varnishes strip out all occurrences of that query param and its value from their backend requests so they're all the same to the caching instances. A generic param name that can take any value would allow for adding as many extra log values as needed, limited only by the uri log field length. l=mft2l=mfstable etc. So still an edge cache change but the result is more flexible while avoiding changing the fixed field length log format across unrelated systems like text squids or image caches. On Sunday, February 3, 2013, Asher Feldman wrote: If you want to differentiate categories of API requests in logs, add descriptive noop query params to the requests. I.e mfmode=2. Doing this in request headers and altering edge config is unnecessary and a bad design pattern. On the analytics side, if parsing query params seems challenging vs. having a fixed field to parse, deal. On Sunday, February 3, 2013, David Schoonover wrote: Huh! News to me as well. I definitely agree with that decision. Thanks, Ori! I've already written the Varnish code for setting X-MF-Mode so it can be captured by varnishncsa. Is there agreement to switch to Mobile-Mode, or at least, MF-Mode? Looking especially to hear from Arthur and Matt. -- David Schoonover d...@wikimedia.org On Sat, Feb 2, 2013 at 2:16 PM, Diederik van Liere dvanli...@wikimedia.orgwrote: Thanks Ori, I was not aware of this D Sent from my iPhone On 2013-02-02, at 16:55, Ori Livneh o...@wikimedia.org wrote: On Saturday, February 2, 2013 at 1:36 PM, Platonides wrote: I don't like it's cryptic nature. Someone looking at the headers sent to his browser would be very confused about what's the point of «X-MF-Mode: b». 
Instead something like this would be much more descriptive: X-Mobile-Mode: stable X-Mobile-Request: secondary But that also means sending more bytes through the wire :S Well, you can (and should) drop the 'X-' :-) See http://tools.ietf.org/html/rfc6648: Deprecating the X- Prefix and Similar Constructs in Application Protocols -- Ori Livneh ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
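To put a rough number on the "gigabytes per day" point above: two always-empty fixed fields appended to every log line add filler that scales with total request volume. The request count below is an assumed round figure for illustration, not a measured one.

requests_per_day = 5 * 10**9        # assumption: a few billion logged requests/day
extra_bytes = len(b"\t-\t-")        # two empty fixed fields per line
print(requests_per_day * extra_bytes / 1e9, "GB/day of filler")   # ~20 GB/day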
Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews
On Sunday, February 3, 2013, Tyler Romeo wrote: Remind me again why a production setup is logging every header of every request? That's ludicrous. Please reread our udplog format documentation and this entire thread carefully, especially the first message before commenting any further. Also, if you are logging every header, then the amount of data added by a single extra header would be insignificant compared to the rest of the request. *--* *Tyler Romeo* Stevens Institute of Technology, Class of 2015 Major in Computer Science www.whizkidztech.com | tylerro...@gmail.com javascript:; On Sun, Feb 3, 2013 at 5:12 AM, Asher Feldman afeld...@wikimedia.orgjavascript:; wrote: That's not at all true in the real world. Look at the actual requests for google analytics on a high percentage of sites, etc. Setting new request headers for mobile that map to new inflexible fields in the log stream that must be set on all non mobile requests (\t-\t-) equals gigabytes of unnecessarily log data every day (that we want to save 100% of) for no good reason. Wanting to keep query params pure isn't a good reason. On Sunday, February 3, 2013, Tyler Romeo wrote: Considering that the query component of a URI is meant to identify the resource whereas HTTP headers are meant to tell the server additional information about the request, I think a header approach is much more appropriate than a no-op query parameter. If the X- is removed, I'd have no problem with the addition of these headers, but what is the advantage of having two over one. Wouldn't a header like: MobileFrontend: 1/2 a/b/s work just as fine? *--* *Tyler Romeo* Stevens Institute of Technology, Class of 2015 Major in Computer Science www.whizkidztech.com | tylerro...@gmail.com javascript:;javascript:; On Sun, Feb 3, 2013 at 4:35 AM, Asher Feldman afeld...@wikimedia.orgjavascript:; javascript:; wrote: Regarding varnish cacheability of mobile API requests with a logging query param - it would probably be worth making frontend varnishes strip out all occurrences of that query param and its value from their backend requests so they're all the same to the caching instances. A generic param name that can take any value would allow for adding as many extra log values as needed, limited only by the uri log field length. l=mft2l=mfstable etc. So still an edge cache change but the result is more flexible while avoiding changing the fixed field length log format across unrelated systems like text squids or image caches. On Sunday, February 3, 2013, Asher Feldman wrote: If you want to differentiate categories of API requests in logs, add descriptive noop query params to the requests. I.e mfmode=2. Doing this in request headers and altering edge config is unnecessary and a bad design pattern. On the analytics side, if parsing query params seems challenging vs. having a fixed field to parse, deal. On Sunday, February 3, 2013, David Schoonover wrote: Huh! News to me as well. I definitely agree with that decision. Thanks, Ori! I've already written the Varnish code for setting X-MF-Mode so it can be captured by varnishncsa. Is there agreement to switch to Mobile-Mode, or at least, MF-Mode? Looking especially to hear from Arthur and Matt. -- David Schoonover d...@wikimedia.org On Sat, Feb 2, 2013 at 2:16 PM, Diederik van Liere dvanli...@wikimedia.orgwrote: Thanks Ori, I was not aware of this D Sent from my iPhone On 2013-02-02, at 16:55, Ori Livneh o...@wikimedia.org wrote: On Saturday, February 2, 2013 at 1:36 PM, Platonides wrote: I don't like it's cryptic nature. 
Someone looking at the headers sent to his browser would be very confused about what's the point of «X-MF-Mode: b». Instead something like this would be much more descriptive: X-Mobile-Mode: stable X-Mobile-Request: secondary But that also means sending more bytes through the wire :S Well, you can (and should) drop the 'X-' :-) See http://tools.ietf.org/html/rfc6648: Deprecating the X- Prefix and Similar Constructs in Application Protocols -- Ori Livneh ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] mariadb 5.5 in production for english wikipedia
On Wed, Dec 12, 2012 at 6:45 AM, Antoine Musso hashar+...@free.fr wrote: Le 12/12/12 01:10, Asher Feldman a écrit : This afternoon, I migrated one of the main production English Wikipedia slaves, db59, to MariaDB 5.5.28. Congratulations :-) Out of curiosity, have you looked at Drizzle too? I've spoken with Drizzle developers at OSCON in the past. I haven't seen anyone advocate it as a production quality database though, and it doesn't currently seem to have a lot of development momentum behind it, with Brian Aker no longer putting in a lot of time. Lots of interesting ideas and features, especially around replication, but they make it incompatible with MySQL in enough ways where a gradual migration wouldn't be practical even if it was otherwise desirable. -A ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] mariadb 5.5 in production for english wikipedia
On Wednesday, December 12, 2012, David Gerard wrote: On 12 December 2012 15:32, Thomas Fellows thomas.fell...@gmail.comjavascript:; wrote: This is awesome! Is there any write-up of the migration process floating around? +1 In fact, this would be a nice thing to put on the WMF blog. It'll certainly get a lot of linkage and reporting around the geekosphere. A detailed blog post is definitely my intent, I'm just waiting until at least one major project is 100% on mariadb and I have more data and hence confidence in drawn conclusions. I don't think that's far off at all, potentially later this month. If that occurs and goes well, the eqiad data center migration in late January may also be a migration to all mariadb. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] mariadb 5.5 in production for english wikipedia
Hi, This afternoon, I migrated one of the main production English Wikipedia slaves, db59, to MariaDB 5.5.28. We've previously been testing 5.5.27 on the primary research slave, and I've been testing the current build for the last few days on a slave in eqiad. All has looked good, and I spent the last few days adapting our monitoring and metrics collection tools to the new version, and building binary packages that meet our needs. A main gotcha in major version upgrades is performance regressions due to changes in query plans. I've seen no sign of this, and my initial assessment is that performance for our workload is on par with or slightly improved over the 5.1 Facebook patchset. Taking the times of 100% of all queries over regular sample windows, the average query time across all enwiki slave queries is about 8% faster with MariaDB vs. our production build of 5.1-fb. Some query types are 10-15% faster, some are 3% slower, and nothing looks aberrant beyond those bounds. Overall throughput as measured by qps has generally been improved by 2-10%. I wouldn't draw any conclusions from this data yet, more is needed to filter out noise, but it's positive. MariaDB has some nice performance improvements that our workload doesn't really hit (better query optimization and index usage during joins, much better subquery support) but there are also some things, such as full utilization of the primary key embedded on the right of every secondary index, that we can take advantage of (and improve our schema around) once prod is fully upgraded, hopefully over the next 1-2 months. The main goal of migrating to MariaDB is not performance-driven. More so, I think it's in WMF's and the open source community's interest to coalesce around the MariaDB Foundation as the best route to ensuring a truly open and well-supported future for mysql-derived database technology. Performance gains along the way are icing on the cake. -Asher ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
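A sketch of the kind of per-query-type comparison described above: average time for each normalized query pattern in two sample windows, and the percent change between them. The sampling and normalization details here are assumptions, not the actual tooling behind these numbers.

from collections import defaultdict

def avg_by_type(samples):
    """samples: iterable of (query_type, seconds) for one sample window."""
    sums, counts = defaultdict(float), defaultdict(int)
    for qtype, t in samples:
        sums[qtype] += t
        counts[qtype] += 1
    return {q: sums[q] / counts[q] for q in sums}

def pct_change(before, after):
    """Percent change in average time per query type between two windows."""
    return {q: 100.0 * (after[q] - before[q]) / before[q]
            for q in before if q in after and before[q] > 0}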
Re: [Wikitech-l] [Ops] mariadb 5.5 in production for english wikipedia
On Tue, Dec 11, 2012 at 5:49 PM, Terry Chay tc...@wikimedia.org wrote: Nice! The main goal of migrating to MariaDB is not performance-driven. More so, I think it's in WMF's and the open source community's interest to coalesce around the MariaDB Foundation as the best route to ensuring a truly open and well-supported future for mysql-derived database technology. Performance gains along the way are icing on the cake. If it works out, then at some point we should probably tell the MariaDB peeps that they can mention that the WMF uses it. :-) We've been talking to Monty Widenius, who visited the WMF office prior to the Foundation announcement, and are fostering mutual support between the Wikimedia and MariaDB Foundations. Win-win for the open source community at large! -A ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] 2013 Antonio Pizzigati Prize for Software in the Public Interest
If anyone is interested, or knows of a worthy entrant, the application deadline for the 2013 Antonio Pizzigati Prize, honoring open source software development in the public interest, has been extended to Friday, 14 December 2012. http://www.tides.org/impact/awards-prizes/pizzigati-prize/ The Antonio Pizzigati Prize for Software in the Public Interest annually awards a $10,000 cash grant to one individual who has created or led an effort to create an open source software product of significant value to the nonprofit sector and movements for social change. The Pizzigati Prize honors the brief life of Tony Pizzigati ( http://www.tides.org/impact/awards-prizes/pizzigati-prize/tony/ - he does not have a Wikipedia entry), an early advocate of open source computing. -Asher ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] Let's talk about Solr
Hi all, I'm excited to see that Max has made a lot of great progress in adding Solr support to the GeoData extension so that we don't have to use mysql for spatial search - https://gerrit.wikimedia.org/r/#/c/27610/ GeoData makes use of the Solarium php client, which is currently included as a part of the extension. GeoData will be our second use of Solr, after the TranslationMemory extension which is already deployed - https://www.mediawiki.org/wiki/Help:Extension:Translate/Translation_memories and the Wikidata team is working on using Solr in their extensions as well. TranslationMemory also uses Solarium, a copy of which is also bundled with and loaded from the extension. For a loading and config example - https://gerrit.wikimedia.org/r/gitweb?p=operations/mediawiki-config.git;a=blob;f=wmf-config/CommonSettings.php;h=1e7a0e24dcbea106042826474607ec065d328472;hb=HEAD#l2407 I think Solr is the right direction for us to go in. Current efforts can pave the way for a complete refresh of WMF's article full-text search as well as how our developers approach information retrieval. We just need to make sure that these efforts are unified, with commonality around the client API, configuration, indexing (preferably with updates asynchronously pushed to Solr in near real-time), and schema definition. This is important from an operational aspect as well, where it would be ideal to have a single distributed and redundant cluster. It would be great to see the i18n, mobile tech, wikidata, and any other interested parties collaborate and agree on a path forward, with a quick sprint around common code that all can use. -Asher ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
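For readers unfamiliar with Solr, the kind of call a shared client layer would wrap looks roughly like the plain-HTTP query below. The host, core name, and parameters are made up for the example; the extensions themselves go through the Solarium PHP client rather than raw HTTP.

import json
import urllib.request
from urllib.parse import urlencode

def solr_select(base_url, **params):
    """Run a query against a Solr core's standard select handler, JSON response."""
    params.setdefault('wt', 'json')
    with urllib.request.urlopen(base_url + '/select?' + urlencode(params)) as resp:
        return json.load(resp)

# e.g. solr_select('http://solr.example.org:8983/solr/geodata', q='*:*', rows=10)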
Re: [Wikitech-l] GC cache entry
I think the latter case is a likely candidate, as we see a couple hundred apache worker segfaults daily, in both php5 and libxml2 space. The first case is likely a bug in php core and it's worth checking whether we'd see the same behavior running a current php release. Core analysis may help us determine if a reproduceable state always leads to the crash. Similarly with libxml2. I suppose we'd have to patch apc with additional logging to really know for sure that this is the cause. From your understanding of the apc source, when would such items ever actually be freed? Only on apache restart? On Tuesday, October 2, 2012, Patrick Reilly wrote: Time to time, we receive a strange warning message in fenari:/home/wikipedia/log/syslog/apache.log Oct 3 01:01:03 10.0.11.59 apache2[20535]: PHP Warning: * require() [a href='function.require'function.require/a]: GC cache entry '/usr/local/apache/common-local/wmf-config/ExtensionMessages-1.20wmf12.php' (dev=2049 ino=10248005) was on gc-list for 601 seconds in /usr/local/apache/common-local/php-1.20wmf12/includes/AutoLoader.php* on line 1150 Definitely this issue comes from *APC*, source code from package apc-3.1.6-r1. When item is inserted into user cache or file cache, this function is called. static void process_pending_removals(apc_cache_t* cache TSRMLS_DC) { slot_t** slot; time_t now; /* This function scans the list of removed cache entries and deletes any * entry whose reference count is zero (indicating that it is no longer * being executed) or that has been on the pending list for more than * cache-gc_ttl seconds (we issue a warning in the latter case). */ if (!cache-header-deleted_list) return; slot = cache-header-deleted_list; now = time(0); while (*slot != NULL) { int gc_sec = cache-gc_ttl ? (now - (*slot)-deletion_time) : 0; if ((*slot)-value-ref_count = 0 || gc_sec cache-gc_ttl) { slot_t* dead = *slot; if (dead-value-ref_count 0) { switch(dead-value-type) { case APC_CACHE_ENTRY_FILE: apc_warning(GC cache entry '%s' (dev=%d ino=%d) was on gc-list for %d seconds TSRMLS_CC, dead-value-data.file.filename, dead-key.data.file.device, dead-key.data.file.inode, gc_sec); break; case APC_CACHE_ENTRY_USER: apc_warning(GC cache entry '%s'was on gc-list for %d seconds TSRMLS_CC, dead-value-data.user.info, gc_sec); break; } } *slot = dead-next; free_slot(dead TSRMLS_CC); } else { slot = (*slot)-next; } } } From APC configuration ( http://us.php.net/manual/en/apc.configuration.php#ini.apc.gc-ttl ) *apc.gc_ttl integer* The number of seconds that a cache entry may remain on the garbage-collection list. This value provides a fail-safe in the event that a server process dies while executing a cached source file; if that source file is modified, the memory allocated for the old version will not be reclaimed until this TTL reached. Set to zero to disable this feature. We get messages GC cache entry '%s' (dev=%d ino=%d) was on gc-list for %d seconds or GC cache entry '%s'was on gc-list for %d seconds in this condition: (gc_sec cache-gc_ttl) (dead-value-ref_count 0) First condition means, item was deleted later then apc.gc_ttl seconds ago and its still in garbage collector list. Seconds condition means, item is still referenced. e.g., when a process unexpectedly died, reference is not decreased. First apc.ttl seconds is active in APC cache, then is deleted (there isn't next hit on this item). Now item is on garbage collector list (GC) and apc.gc_ttl timeout is running. 
When apc.gc_ttl is less than (now - item_deletion_time), the warning is written and the item is finally completely flushed. So what should we do? ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] Can we kill DBO_TRX? It seems evil!
On Wednesday, September 26, 2012, Daniel Kinzler wrote: I see your point. But if we have the choice between lock contention and silent data loss, which is better? This isn't really a choice - by default, when a statement in mysql hits a lock timeout, it is rolled back but the transaction it's in is not. That can also lead to data loss via partial writes in real world cases if not properly accounted for by the application. Avoiding holding locks longer than needed really should be paramount. Developers need to adapt to cases where transaction semantics alone can't guarantee consistency across multiple write statements. We're planning on sharding some tables this year and there will be cases where writes will have to go to multiple database servers, likely without the benefit of two phase commit. That doesn't mean that we should give up on consistency or that we shouldn't try to do better, but not in exchange for more lock contention. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Can we kill DBO_TRX? It seems evil!
On Wed, Sep 26, 2012 at 4:07 AM, Daniel Kinzler dan...@brightbyte.de wrote: On 26.09.2012 12:06, Asher Feldman wrote: On Wednesday, September 26, 2012, Daniel Kinzler wrote: I see your point. But if we have the choice between lock contention and silent data loss, which is better? This isn't really a choice - by default, when a statement in mysql hits a lock timeout, it is rolled back but the transaction it's in is not. Uh. That sounds evil and breaks the A in ACID, no? Why isn't the entire transaction rolled back in such a case? There's a distinction (possibly misguided) between cases where a statement can be retried with an expectation of success, and cases that aren't, which trigger an implicit rollback. Deadlocks are considered the latter by mysql; they result in a transaction rollback. Oracle behaves the same way as mysql with regards to lock timeouts - it's up to developers to either retry the timed-out statement, or rollback. The results can definitely be evil if not handled correctly, but it's debatable if it's a violation of atomicity. If a lock timeout throws an exception that closes the connection to mysql, at least that will result in a rollback. If the connection is pooled and reused, it can likely result in a commit. Mysql does offer a rollback_on_timeout option that changes the default behavior. We can enable it at wmf, but since that may not be an option for many installs, it's better to work around it. That can also lead to data loss via partial writes in real world cases if not properly accounted for by the application. How could we detect such a case? I can't think of a way that's actually good. Better to account for the behavior. That doesn't mean that we should give up on consistency or that we shouldn't try to do better, but not in exchange for more lock contention. Well, improving consistency and avoiding data loss is going to be hard without the use of locks... how do you propose to do that? We could try to identify cases where consistency is extremely important, vs. where it isn't. In the cases where a very important lock holding transaction will be entered, can we defer calling hooks or doing anything unrelated until that transaction is closed at its intended endpoint? If so, perhaps everything else can be subject to current behavior, where unrelated code can call commit. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
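To make the failure mode discussed above concrete: with MySQL's defaults, error 1205 (lock wait timeout) rolls back only the timed-out statement, and it is the application's job to either retry that statement or roll the whole transaction back. A minimal sketch of that handling, using plain PDO rather than MediaWiki's Database class:

<?php
// Sketch: retry a statement that hit a lock wait timeout, or abort the whole
// transaction. With MySQL defaults (no rollback-on-timeout), error 1205 rolls
// back only the failed statement, not the enclosing transaction.
function runWithRetry( PDO $dbw, $sql, array $params, $maxRetries = 2 ) {
	for ( $attempt = 0; ; $attempt++ ) {
		try {
			$stmt = $dbw->prepare( $sql );
			$stmt->execute( $params );
			return $stmt;
		} catch ( PDOException $e ) {
			$mysqlErrno = isset( $e->errorInfo[1] ) ? $e->errorInfo[1] : 0;
			if ( $mysqlErrno === 1205 && $attempt < $maxRetries ) {
				continue; // statement was rolled back; safe to retry it alone
			}
			// Anything else, or too many retries: abandon the whole
			// transaction so we don't commit a partial write.
			$dbw->rollBack();
			throw $e;
		}
	}
}

// Usage sketch (connection details are placeholders):
// $dbw = new PDO( 'mysql:host=localhost;dbname=wiki', 'user', 'pass',
//     array( PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION ) );
// $dbw->beginTransaction();
// runWithRetry( $dbw, 'UPDATE page SET page_touched = ? WHERE page_id = ?',
//     array( gmdate( 'YmdHis' ), 42 ) );
// $dbw->commit();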
[Wikitech-l] Indexing sha1 hashes in mysql
As we've increased our use of sha1 hashes to identify unique content over the past year, I occasionally see changesets or discussions about indexing sha1's in mysql. When indexing a text field, it's generally beneficial to define the smallest index that still uniquely matches a high percentage of rows. Search and insert performance both benefit from the space savings. As a cryptographic hash function, sha1 has a very high degree of uniformity. We can estimate the percent of partial index look-ups that will match a unique result just by comparing the size of the table to the space covered by the index. sha1 hashes are 160 bits, which mediawiki stores in mysql with base36 encoding. base36(2^160) == twj4yidkw7a8pn4g709kzmfoaol3x8g. Looking at enwiki.revision.rev_sha1, the smallest current value is 02xi72hkkhn1nvfdeffgp7e1w3s and the largest, twj4yi9tgesxysgyi41bz16jdkwroha.

The number of combinations covered by indexing the top bits represented by the left-most 4 thru 10 characters:

sha1_index(4) = 1395184 (twj4)
sha1_index(5) = 50226658 (twj4y)
sha1_index(6) = 1808159706 (twj4yi)
sha1_index(7) = 65093749429 (twj4yid)
sha1_index(8) = 2343374979464 (twj4yidk)
sha1_index(9) = 84361499260736 (twj4yidkw)
sha1_index(10) = 3037013973386503 (twj4yidkw7)

Percentage of unique matches in a table of 2B sha1's:

sha1_index(7) = 96.92%
sha1_index(8) = 99.91%
sha1_index(9) = 99.997%
sha1_index(10) = 99.9999%

Percentage of unique matches in a table of 10B sha1's:

sha1_index(8) = 99.573%
sha1_index(9) = 99.988%
sha1_index(10) = 99.9996%

Given current table sizes and growth rates, an 8 character index on a sha1 column should be sufficient for years for many cases (i.e. media files outside of commons, revisions on projects outside of the top 10), while a 10 character index still provides 99.99% coverage of 100 billion sha1's. Caveat: The likely but rare worst case for a partial index is that we may have tables with hundreds of rows containing the same sha1, perhaps revisions of a page that had a crazy revert war. A lookup for that specific sha1 will have to do secondary lookups for each match, as would lookups of any other sha1 that happens to collide within the index space. If the index is large enough to make the latter case quite unlikely, prudent use of caching can address the first.

tl;dr Where an index is desired on a mysql column of base36 encoded sha1 hashes, I recommend ADD INDEX (sha1column(10)). Shorter indexes will be sufficient in many cases, but this still provides a 2/3 space savings while covering a huge (2^51.43) space. -Asher ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
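The estimate above is easy to reproduce. A short sketch of the same arithmetic: a k-character prefix over 31-character base36 sha1 values covers roughly 2^160 / 36^(31-k) distinct prefixes, and the chance of a unique match is approximated by comparing that to the table size.

<?php
// Estimate how often a k-character prefix index over base36-encoded sha1
// values resolves to a unique row, for a table holding $rows hashes.
function sha1PrefixCoverage( $prefixLen, $rows ) {
	// Distinct prefixes reachable by a 160-bit value in base36 (31 chars max).
	$space = pow( 2, 160 ) / pow( 36, 31 - $prefixLen );
	return 100 * ( 1 - $rows / $space );
}

foreach ( array( 7, 8, 9, 10 ) as $k ) {
	printf( "sha1_index(%d): %.4f%% unique at 2B rows, %.4f%% at 10B rows\n",
		$k, sha1PrefixCoverage( $k, 2e9 ), sha1PrefixCoverage( $k, 10e9 ) );
}

Running this reproduces the figures in the message, e.g. about 96.9% for a 7-character prefix over 2 billion rows.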
Re: [Wikitech-l] Indexing sha1 hashes in mysql
Base36 certainly isn't the most efficient way to store a sha1, but it's what is in use all over mediawiki. I think there was some discussion on this list of the tradeoffs of different methods when revision.rev_sha1 was added, and base36 was picked as a compromise. I don't know why base36 was picked over base62 once it was decided to stick with an ascii alpha-numeric encoding but regardless, there was opposition to binary. Taken on its own, an integer index would be more efficient but I don't think it makes sense if we continue using base36. On Tue, Sep 25, 2012 at 11:20 AM, Artur Fijałkowski wiki.w...@gmail.comwrote: tl;dr Where an index is desired on a mysql column of base36 encoded sha1 hashes, I recommend ADD INDEX (sha1column(10)). Shorter indexes will be sufficient in many cases, but this is still provides a 2/3 space savings while covering a huge (2^51.43) space. Isn't it better to store BIGINT containing part of (binary) sha1 and use index on numeric column? AJF/WarX ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] mediawiki profiling presentation slide deck
Here's the slide deck for the mediawiki profiling presentation I gave at WMF Tech Days 2012 yesterday: https://commons.wikimedia.org/wiki/File:MediaWikiPerformanceProfiling.pdf -Asher ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] scaled media (thumbs) as *temporary* files, not stored forever
On Tue, Sep 4, 2012 at 3:11 PM, Platonides platoni...@gmail.com wrote: On 03/09/12 02:59, Tim Starling wrote: I'll go for option 4. You can't delete the images from the backend while they are still in Squid, because then they would not be purged when the image is updated or action=purge is requested. In fact, that is one of only two reasons for the existence of the backend thumbnail store on Wikimedia. The thumbnail backend could be replaced by a text file that stores a list of thumbnail filenames which were sent to Squid within a window equivalent to the expiry time sent in the Cache-Control header. The other reason for the existence of the backend thumbnail store is to transport images from the thumbnail scalers to the 404 handler. For that purpose, the image only needs to exist in the backend for a few seconds. It could be replaced by a better 404 handler, that sends thumbnails directly by HTTP. Maybe the Swift one does that already. -- Tim Starling The second one seems easy to fix. The first one should IMHO be fixed in squid/varnish by allowing wildcard purges (ie. PURGE /wikipedia/commons/thumb/5/5c/Tim_starling.jpg/* HTTP/1.0) fast.ly implements group purge for varnish like this via a proxy daemon that watches backend responses for a tag response header (i.e. all resolutions of Tim_starling.jpg would be tagged that) and builds an in-memory hash of tags-objects which can be purged on. I've been told they'd probably open source the code for us if we want it, and it is interesting (especially to deal with the fact that we don't purge articles at all of their possible url's) albeit with its own challenges. If we implemented a backend system to track thumbnails that exist for a given orig, we may be able to remove our dependency on swift container listings to purge images, paving the way for a second class of thumbnails that are only cached. A wiki with such setup could then disable the on-disk storage. I think this is entirely doable, but scaling the imagescalers to support cache failures at wmf scale would be a waste, except perhaps for non-standard sizes that aren't widely used. I like Brion's thoughts on revamping image handling, and would like to see semi-permanent (in swift) storage of a standardized set of thumbnail resolutions but we could still support additional resolutions. Browser scaling is also at least worth experimenting with. Instances where browser scaling would be bad are likely instances where the image is already subpar if viewed on a high-dpi / retina display. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
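A toy model of the bookkeeping the tag-based purge daemon described above implies (all names here are made up, not fast.ly's or Varnish's API): watch backend responses for a tag header, remember which URLs carried each tag, and expand a purge-by-tag into per-URL PURGEs.

<?php
// Illustrative only: every thumbnail rendered from Tim_starling.jpg would be
// tagged "Tim_starling.jpg", and purging that tag fans out to every variant.
class TagPurgeIndex {
	private $tagToUrls = array();

	// Called for each backend response carrying a tag header
	// (e.g. a hypothetical X-Purge-Tag).
	public function recordResponse( $url, $tag ) {
		$this->tagToUrls[$tag][$url] = true;
	}

	// Expand a tag into the list of URLs that need a PURGE request.
	public function urlsForTag( $tag ) {
		return isset( $this->tagToUrls[$tag] )
			? array_keys( $this->tagToUrls[$tag] )
			: array();
	}
}

// $idx = new TagPurgeIndex();
// $idx->recordResponse( '/wikipedia/commons/thumb/5/5c/Tim_starling.jpg/120px-Tim_starling.jpg', 'Tim_starling.jpg' );
// $idx->recordResponse( '/wikipedia/commons/thumb/5/5c/Tim_starling.jpg/800px-Tim_starling.jpg', 'Tim_starling.jpg' );
// foreach ( $idx->urlsForTag( 'Tim_starling.jpg' ) as $url ) { /* send PURGE $url */ }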
Re: [Wikitech-l] Wikimedians are rightfully wary
While I don't agree with the negative sentiment around experimentation, I think there's value both in MZMcBride's op-ed, and in the comment thread that follows. He correctly calls out some of our long term organizational failings around product planning, resource allocation, execution, and follow-thru. It's almost as painful to read about LiquidThreads as it is to use talk pages today, eight years after the LT project was first proposed. Are we learning from our failures? The criticism around AFTv5 in terms of product design (never mind the code) is largely echoed in the comments, yet we seem rather sure that we're giving editors a tool of importance. My daily sampling of what's flowing into the enwiki db from the feature appears to be 99% garbage, with the onus being on volunteers to sort the wheat from the chaff. If we had a dead simple, highly functional, and well designed discussion system (see LiquidThreads), wouldn't that be the ideal route for high value feedback from knowledgeable non-editors instead of an anonymous one-way text box at the bottom of the articles that's guaranteed to be a garbage collector? The one thing the op-ed seems to miss is that one of the main goals of the foundation is to attract new editors and improve the editing experience. I think development in that direction (visual editor with a new parser especially) is hugely promising but we also need to remain cognizant of the needs of our community, take care in allocating resources, and integrate feedback lest our efforts mistakenly contribute to our retention problem. On Tue, Aug 21, 2012 at 10:10 AM, Tyler Romeo tylerro...@gmail.com wrote: Hey, Not sure if anybody has seen this article yet: https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2012-08-20/Op-ed Thought it was interesting and possibly worth discussion. --Tyler Romeo ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Gerrit downtime this Friday
Mission accomplished! On Tue, Jul 17, 2012 at 5:26 PM, Asher Feldman afeld...@wikimedia.orgwrote: Hi All, Ryan Lane and I are migrating gerrit's db to a server in eqiad (where the gerrit app server is located) on Friday, and have a downtime window of 18:00-19:00 UTC (11am-12pm PDT). Actual downtime should be shorter. Gerrit makes many mysql queries for some page requests; this will improve the latency of such pages. Additionally, the new db will have both slow and sample based query profiling in ishmael which should assist with further optimizations. -A ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] Gerrit downtime this Friday
Hi All, Ryan Lane and I are migrating gerrit's db to a server in eqiad (where the gerrit app server is located) on Friday, and have a downtime window of 18:00-19:00 UTC (11am-12pm PDT). Actual downtime should be shorter. Gerrit makes many mysql queries for some page requests; this will improve the latency of such pages. Additionally, the new db will have both slow and sample based query profiling in ishmael which should assist with further optimizations. -A ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Guidelines for db schema changes
Hi all, I'd like to remind everyone involved in development that requires db schema migrations - please keep in mind the three related guidelines in our official deployment policies - http://www.mediawiki.org/wiki/Development_policy#Database_patches - especially the third, which is to make schema changes optional. Once a migration has been reviewed, please update http://wikitech.wikimedia.org/view/Schema_changes with all pertinent details, then get in touch for deployment scheduling. There are good and legitimate reasons to not follow the make schema changes optional policy but if that's the case, please provide 3-7 days of lead time, depending on the size of tables and number of effected wikis. Best, Asher On Mon, May 14, 2012 at 7:25 PM, Rob Lanphier ro...@wikimedia.org wrote: On Tue, Apr 24, 2012 at 5:52 PM, Rob Lanphier ro...@wikimedia.org wrote: Assuming this seems sensible to everyone, I can update this page with this: http://www.mediawiki.org/wiki/Development_policy And this is done now. In case you aren't using a threaded mail client, here's the original discussion: http://thread.gmane.org/gmane.science.linguistics.wikipedia.technical/60967 Rob ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Error after deletion
commons was read-only for about 60 seconds while I was switching the master this afternoon due to issues necessitating a kernel upgrade. On Mon, May 14, 2012 at 6:00 PM, Bináris wikipo...@gmail.com wrote: I deleted a test article from huwiki, and this was the result instead of the usual success message. However, the deletion has been completed. A database error has occurred. Did you forget to run maintenance/update.php after upgrading? See: https://www.mediawiki.org/wiki/Manual:Upgrading#Run_the_update_script Query: DELETE FROM `globalimagelinks` WHERE gil_wiki = 'huwiki' AND gil_page = '929875' Function: GlobalUsage::deleteLinksFromPage Error: 1290 The MySQL server is running with the --read-only option so it cannot execute this statement (10.0.6.61) -- Bináris ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Guidelines for db schema changes
I am generally in favor of all of this and in the meeting that proceeded Rob's email, proposed that we develop a new schema migration tool for mediawiki along similar lines. Such a beast would have to work in all deployment cases without modifications (stock single wiki installs and at wmf with many wikis across multiple masters with tiered replication), be idempotent when run across many databases, track version and state per migration, and include up/down steps in every migration. There are opensource php migration tools modeled along those used by the popular ruby and python frameworks. I deployed https://github.com/davejkiger/mysql-php-migrations at kiva.org a couple years ago where it worked well and is still in use. Nothing will meet our needs off the shelf though. A good project could at best be forked into mediawiki with modifications if the license allows it, or more likely serve as a model for our own development. On Tue, Apr 24, 2012 at 11:27 PM, Faidon Liambotis fai...@wikimedia.orgwrote: In other systems I've worked before, such problems have been solved by each schema-breaking version providing schema *and data* migrations for both forward *and backward* steps. This means that the upgrade transition mechanism knew how to add or remove columns or tables *and* how to fill them with data (say by concatenating two columns of the old schema). The same program would also take care to do the exact opposite steps in a the migration's backward method, in case a rollback was needed. Down migrations aid development; I find them most useful as documentation of prior state, making a migration readable as a diff. They generally aren't useful in production environments at scale though, which developers removed from the workings of production need to be aware of. Even with transparent execution of migrations, the time it takes to apply changes will nearly always be far outside of the acceptable bounds of an emergency response necessitating a code rollback. So except in obvious cases such as adding new tables, care is needed to keep forward migration backwards compatible with code as much as possible. The migrations themselves can be kept in the source tree, perhaps even versioned and with the schema version kept in the database, so that both us and external users can at any time forward their database to any later version, automagically. Yep. That we have to pull in migrations from both core and many extensions (many projects, one migration system) while also running different sets of extensions across different wikis intermingling on the same database servers adds some complexity but we should get there. -Asher ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
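As a rough illustration of the shape being described above (all names and the tracking table layout are invented, not a proposal for an actual schema): each migration is a class with up/down steps, and the runner records which versions have been applied per database, so re-running it across hundreds of wikis is idempotent.

<?php
// Sketch of a migration framework along the lines discussed above.
abstract class Migration {
	abstract public function version();       // e.g. "20120425_add_rev_sha1"
	abstract public function up( PDO $db );   // apply the change
	abstract public function down( PDO $db ); // document / undo the change

	// Applied-version tracking lives in the target database itself, so the
	// same runner works for a single wiki or for many wikis per master.
	public function isApplied( PDO $db ) {
		$stmt = $db->prepare( 'SELECT 1 FROM schema_migrations WHERE sm_version = ?' );
		$stmt->execute( array( $this->version() ) );
		return (bool)$stmt->fetchColumn();
	}

	public function markApplied( PDO $db ) {
		$stmt = $db->prepare( 'INSERT INTO schema_migrations (sm_version, sm_applied) VALUES (?, NOW())' );
		$stmt->execute( array( $this->version() ) );
	}
}

function runMigrations( array $databases, array $migrations ) {
	foreach ( $databases as $db ) {       // every wiki on every master
		foreach ( $migrations as $m ) {   // core + extension migrations
			if ( !$m->isApplied( $db ) ) {
				$m->up( $db );
				$m->markApplied( $db );
			}
		}
	}
}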
Re: [Wikitech-l] Guidelines for db schema changes
Thanks, hashar! On Wed, Apr 25, 2012 at 12:12 AM, Antoine Musso hashar+...@free.fr wrote: Le 25/04/12 02:52, Rob Lanphier a écrit : 3. For anything that involves a schema change to the production dbs, make sure Asher Feldman (afeld...@wikimedia.org) is on the reviewer list. He's already keeping an eye on this stuff the best he can, but it's going to be easy for him to miss changes in extensions should they happen. I am pretty sure Jenkins could detect a change is being made on a .sql file and then add a specific reviewer using Gerrit CLI tool. Logged as: https://bugzilla.wikimedia.org/36228 -- Antoine hashar Musso ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Redirect rules ShortURL deployment - need to make a decision
On Fri, Apr 20, 2012 at 8:01 AM, Mark A. Hershberger m...@wikimedia.orgwrote: Sumana Harihareswara suma...@wikimedia.org writes: Please leave your comments at bug 1450 so we can decide how to write the rewrite rule. Since Gerrit makes review possible and the relevant Apache config (redirects.conf) is on noc and *should* be in git, I've gone ahead and (after discussing how to proceed with Ops) submitted a configuration to Gerrit: https://gerrit.wikimedia.org/r/5433 I had to give this a -2 since the rewrite rule was broken and we don't deploy application configs tied to mediawiki via puppet or currently plan to do so. For that reason, I don't want this stuff dumped ad-hoc in the puppet repo (the reason for the -2.) The change itself is straight forward, I just have one follow-up question about scope which I'll ask over at the ticket. -A ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] MobileFrontend persistent cookie overhaul and caching weirdness
MW needs full etag support, with hooks for extensions. Without it, we can't widely support caching in the case you've outlined. Different browsers handle the Vary header differently. Some treat Vary as "don't cache". Chrome (possibly other webkit browsers) treats it as a marker to revalidate whatever variant is cached. It sends an If-Modified-Since and if there's an etag, If-None-Match header. If MediaWiki provided etags, calculated them differently based on login status, mobilefrontend, etc., and used them for If-None-Match requests, we could handle browser caching sanely. The LoggedOut cookie behavior that Daniel described could provide a less than ideal workaround if set with an updated timestamp on each view switch but I'd rather not see this exploited further. It breaks squid caching in our setup which lessens the user experience. On Thu, Apr 12, 2012 at 12:18 PM, Arthur Richards aricha...@wikimedia.org wrote: Per bug 35842, I've overhauled the persistent cookie handling in the MobileFrontend extension. I think my changes will work fine on the WMF architecture where most of our sites have a separate domain for their mobile version. However, for sites that use a shared domain for both desktop and mobile views, there is major browser caching-related weirdness that I have not been able to figure out. Details can be found in the bug: https://bugzilla.wikimedia.org/show_bug.cgi?id=35842 A little more context about the issue: we need to be able to allow people to switch between desktop/mobile views. We're currently doing this by setting a cookie when the user elects to switch their view, in order to keep that view persistent across requests. On the WMF architecture, we do some funky stuff at the proxy layer for routing requests, depending on detected device type and whether or not certain cookies are set for the user. Generally speaking the sites hosted on our cluster have a separate domain set up for their mobile versions, even though they're powered by the same backend. This makes view switching a bit easier, although I think the long-term hope is to get rid of mobile-specific domains. For sites that do not have a separate domain set up, we rely solely on cookies to handle user-selected view toggling. This seemed to generally work OK with the way we were previously handling these 'persistent cookies', but the previous way of cookie handling has been causing caching problems on our cluster. The changes I've introduced to hopefully resolve those issues result in browser-caching issues on single-domain sites using MobileFrontend, where after toggling the view and browsing to a page that was earlier viewed in the previous context, you might see a cached copy of the page from the previous context. No good. I'm stumped and am at a point where it's hard to see the forest through the trees. I could use some help to deal with this - if anyone has any insight or suggestions, I'm all ears! Thanks, Arthur -- Arthur Richards Software Engineer, Mobile [[User:Awjrichards]] IRC: awjr +1-415-839-6885 x6687 ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
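A bare-bones sketch of the ETag behaviour being asked for above, written in plain PHP rather than MediaWiki's actual OutputPage code; the function names and the exact variant inputs are illustrative assumptions. The interesting part is that the tag changes with both content age and the variant being served, so If-None-Match revalidation works per variant.

<?php
// Compute an ETag that varies with content age and the served variant.
function makeEtag( $pageTouched, $isMobileView, $isLoggedIn ) {
	return '"' . md5( implode( '|', array(
		$pageTouched,                         // e.g. page_touched timestamp
		$isMobileView ? 'mobile' : 'desktop',
		$isLoggedIn ? 'loggedin' : 'anon',
	) ) ) . '"';
}

// Emit the ETag and answer 304 when the client already has this variant.
function maybeSend304( $etag ) {
	header( 'ETag: ' . $etag );
	$clientEtag = isset( $_SERVER['HTTP_IF_NONE_MATCH'] )
		? trim( $_SERVER['HTTP_IF_NONE_MATCH'] ) : null;
	if ( $clientEtag === $etag ) {
		header( 'HTTP/1.1 304 Not Modified' );
		return true; // caller should skip rendering the body
	}
	return false;
}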
Re: [Wikitech-l] selenium browser testing proposal and prototype
On Thu, Apr 5, 2012 at 5:25 PM, Ryan Lane rlan...@gmail.com wrote: How many languages can we reasonably support? We're currently using PHP, Python, Java, OCaml and Javascript (and probably more). Should we also throw Ruby in here as well? What level of support are the Selenium tests really going to get if they require developers to use Ruby? It might be good to see examples of what MW developers would actually have to do to implement new Selenium tests once the framework is complete. There's a login example in the github prototype that's straight forward but I assume it will get simpler as more is written which can be reused. I doubt it will require much in terms of actual ruby finesse. We've already gone down the Ruby road once. I think a lot of the people involved with that would say it was a bad call, especially ops. Ruby at scale can certainly be a lulz engine, especially for those on the sidelines. This project doesn't seem to place any software demands on the production cluster, or even necessarily require anything from ops though. I assume the road you refer to was the mobile gateway; I consider that to have been a train wreck primarily from a project standpoint as opposed to a technical one. When I stumbled upon it, there wasn't an employee with the combination of access and knowledge required to commit code changes to its read-only-to-us repo, and to deploy those changes. We were essentially passing bits of duct tape back and forth by transatlantic carrier pigeon. For a slew of reasons, it makes much more sense to do what we're doing now with MobileFrontend, but we've yet to reach the point where it does anything the ruby gateway couldn't have done with a bit of iteration. In its last incarnation, it was typically faster than the current MobileFrontend for a request not served by the frontend caching layer. The point being, I don't think language was the main issue there. Chris makes a compelling argument that his preferred route is closer to being off the shelf and widely supported by industry and community. I have no comment on what QA engineers prefer to hack on, but I think the ease of hiring new ones who are good at what they do and excited about the tools they get to use should be part of the decision. -A ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] enwiki revision schema migration in progress
On Tuesday, March 20, 2012, Roan Kattouw roan.katt...@gmail.com wrote: So yeah /normally/ you hit DB servers at random and different servers might respond differently (or be lagged to different degrees), but in this particular case it was always the same DB server returning the same lag value. Nothing strange going on here, this is how the maxlag parameter works. How do you feel about a switch to change that behavior (maxlag - 1)? It would be nice to be continue guiding developers towards throttling API requests around maxlag without complicating schema migrations by requiring config deployments before and after every db for this reason only. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] enwiki revision schema migration in progress
On Tuesday, March 20, 2012, Roan Kattouw roan.katt...@gmail.com wrote: On Tue, Mar 20, 2012 at 11:35 AM, Asher Feldman afeld...@wikimedia.org wrote: How do you feel about a switch to change that behavior (maxlag - 1)? It would be nice to be continue guiding developers towards throttling API requests around maxlag without complicating schema migrations by requiring config deployments before and after every db for this reason only. That sounds reasonable to me, what alternative behavior do you propose? A flag that, when enabled, causes maxlag to use the 2nd highest lag instead of the highest lag? That was my original thought. Jeremy's idea is good too, though I wonder if we could do something similar without depending on deployments. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] enwiki revision schema migration in progress
Just a heads up that the last of the 1.19 migrations, to add the sha1 column to enwiki.revision is going to be running throughout this week. Don't be alarmed by replication lag messages for s1 dbs in irc. I'm going to juggle which db watchlist queries go to during the migration, so nothing should be noticeable on the site. -Asher ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] enwiki revision schema migration in progress
I've temporarily commented out db36 from db.php on the cluster. This is a flaw in how the client-side use of maxlag interacts with our schema migration process - we run migrations on slaves one by one in an automated fashion, only moving to the next after replication lag catches up. Mediawiki takes care of not sending queries to the lagged slave that is under migration. Meanwhile, maxlag always reports the value of the most lagged slave. Not a new issue, but this particular alter table on enwiki is likely the most time intensive ever run at wmf. It's slightly ridiculous. For this one alter, I can stop the migration script and run each statement by hand, pulling and re-adding db's one by one along the way, but this isn't a sustainable process. Perhaps we can add a migration flag to mediawiki, which, if enabled, changes the behavior of maxlag and wfWaitForSlaves() to ignore one highly lagged slave so long as others are available without lag. -A On Mon, Mar 19, 2012 at 9:28 PM, MZMcBride z...@mzmcbride.com wrote: MZMcBride wrote: I'm not sure of the exact configuration, but it seems like nearly every API request is being handled by the lagged server (db36)? Or perhaps my scripts just have terrible luck. I added some prints to the code. Different servers are responding, but they're all unable to get past the lag, apparently:

{u'servedby': u'srv234', u'error': {u'info': u'Waiting for 10.0.6.46: 21948 seconds lagged', u'code': u'maxlag'}}
{u'servedby': u'srv242', u'error': {u'info': u'Waiting for 10.0.6.46: 21982 seconds lagged', u'code': u'maxlag'}}
{u'servedby': u'mw20', u'error': {u'info': u'Waiting for 10.0.6.46: 21984 seconds lagged', u'code': u'maxlag'}}
{u'servedby': u'mw45', u'error': {u'info': u'Waiting for 10.0.6.46: 21986 seconds lagged', u'code': u'maxlag'}}
{u'servedby': u'mw14', u'error': {u'info': u'Waiting for 10.0.6.46: 21988 seconds lagged', u'code': u'maxlag'}}
{u'servedby': u'mw42', u'error': {u'info': u'Waiting for 10.0.6.46: 21989 seconds lagged', u'code': u'maxlag'}}
{u'servedby': u'mw3', u'error': {u'info': u'Waiting for 10.0.6.46: 21991 seconds lagged', u'code': u'maxlag'}}
{u'servedby': u'srv230', u'error': {u'info': u'Waiting for 10.0.6.46: 22005 seconds lagged', u'code': u'maxlag'}}
{u'servedby': u'srv259', u'error': {u'info': u'Waiting for 10.0.6.46: 22006 seconds lagged', u'code': u'maxlag'}}
{u'servedby': u'srv274', u'error': {u'info': u'Waiting for 10.0.6.46: 22008 seconds lagged', u'code': u'maxlag'}}
{u'servedby': u'srv280', u'error': {u'info': u'Waiting for 10.0.6.46: 22009 seconds lagged', u'code': u'maxlag'}}
{u'servedby': u'srv236', u'error': {u'info': u'Waiting for 10.0.6.46: 22010 seconds lagged', u'code': u'maxlag'}}
{u'servedby': u'srv230', u'error': {u'info': u'Waiting for 10.0.6.46: 22011 seconds lagged', u'code': u'maxlag'}}

And it goes on and on.
The relevant branch of code is:

---
def __parseJSON(self, data):
    maxlag = True
    while maxlag:
        try:
            maxlag = False
            parsed = json.loads(data.read())
            content = None
            if isinstance(parsed, dict):
                content = APIResult(parsed)
                content.response = self.response.items()
            elif isinstance(parsed, list):
                content = APIListResult(parsed)
                content.response = self.response.items()
            else:
                content = parsed
            if 'error' in content:
                error = content['error']['code']
                if error == "maxlag":
                    lagtime = int(re.search("(\d+) seconds", content['error']['info']).group(1))
                    if lagtime > self.wiki.maxwaittime:
                        lagtime = self.wiki.maxwaittime
                    print("Server lag, sleeping for "+str(lagtime)+" seconds")
                    maxlag = True
                    time.sleep(int(lagtime)+0.5)
                    return False
---

MZMcBride ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
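The migration flag proposed two messages up is simple to state in code. A hedged sketch, not MediaWiki's actual LoadBalancer implementation: when a migration is in progress, the single most-lagged slave (the one being altered) is excluded before the value reported to maxlag clients is computed.

<?php
// Report the lag value that maxlag should expose. During a migration, drop
// the single most-lagged slave, as long as at least one other replica exists.
function effectiveMaxLag( array $slaveLagSeconds, $migrationInProgress ) {
	if ( $migrationInProgress && count( $slaveLagSeconds ) > 1 ) {
		rsort( $slaveLagSeconds );       // highest lag first
		array_shift( $slaveLagSeconds ); // ignore the slave under migration
	}
	return max( $slaveLagSeconds );
}

// effectiveMaxLag( array( 0, 1, 21984 ), true )  => 1
// effectiveMaxLag( array( 0, 1, 21984 ), false ) => 21984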
Re: [Wikitech-l] First steps at making MobileFrontend usable beyond the WMF infrastructure
On Wed, Mar 14, 2012 at 5:08 PM, Arthur Richards aricha...@wikimedia.org wrote: To follow up on this, I actually made some additional changes to how useformat works to simplify manually switching between mobile and desktop views which had been suggested by Brion Vibber. Take a look at: https://www.mediawiki.org/wiki/Special:Code/MediaWiki/113865 This removes the "Permanently disable mobile view" text (broken for anyone other than the WMF anyway) and makes it so accessing the site with useformat=mobile in the URL (eg by clicking 'Mobile view' at the bottom of any page on a site with MobileFrontend enabled) will set a cookie which will ensure that you see the mobile view until either the cookie expires or you explicitly switch back to desktop view. It looks like "permanently disable mobile view" is broken completely as of last week's mobilefrontend deployment. So it's impossible to see how it's supposed to behave currently, but a key part of it for wikipedia is that it takes you off the m site and disables squid's mobile redirection via the stopMobileRedirect=true cookie. It actually disables use of the .m. site as the text implies, not just disabling the mobilefrontend dom rewrite that you get when viewing the desktop version of a single article, which keeps you on the mobile site. Replacing this with a desktop view that leaves users permanently accessing the desktop site via m. isn't suitable for our environment. It may make sense for smaller sites without a dedicated mobile namespace but even in that case, some care is needed to ensure that any frontend caching doesn't get inadvertently polluted or unduly fragmented. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] First steps at making MobileFrontend usable beyond the WMF infrastructure
On Thu, Mar 15, 2012 at 12:10 PM, Brion Vibber br...@pobox.com wrote: Let's please kill the m. domain. IMO desktop and mobile users should use the same URLs; there should be sane device detection; and an easy override in both directions available at all times. This is blocked on migrating text from squid to varnish which is likely at least a few months off. Until then, MobileFrontend needs to continue supporting the current production reality. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] State of the 1.19 deployment
The larger time range includes the removal of the mysql based parsercache which is the cause of the primary decline and not related to 1.19. The time range of the originally mentioned graph just shows a bit of context before the enwiki deployment up until the time I posted it to irc last night. A key change as hashar suggests is a good theory but I'm not sure if the hit rate is actually recovering when looking at -24 or -8 hours, more time will tell. This may not be valid since the mysql pcache wasn't re-enabled (from an empty state) long before 1.19 but the rate of selects against it seems to be up day over day. Maybe a new key is consistently fetched before it would ever be set.

2012-02-29T11:24:00+00:00,10288.770238
2012-02-29T12:06:00+00:00,10992.754365
2012-02-29T12:48:00+00:00,11140.606746
2012-03-01T11:54:00+00:00,13912.613492
2012-03-01T12:36:00+00:00,13790.359524
2012-03-01T13:18:00+00:00,14176.010317

(from http://ganglia.wikimedia.org/latest/graph.php?c=MySQL%20pmtpah=db40.pmtpa.wmnetv=15608m=mysql_com_selectr=customz=defaultjr=js=st=1330622240cs=2%2F28%2F2012%2010%3A23ce=3%2F1%2F2012%200%3A56vl=stmtsti=mysql_com_selectcsv=1 )

On Thursday, March 1, 2012, Antoine Musso wrote: On 01/03/12 09:50, Jeroen De Dauw wrote: Hey, There's been a slight regression in our parser cache hit rate: http://bit.ly/w6Gy9t This one is probably more informative for people not aware of the usual hit rate http://goo.gl/YY80C Looks to me that the miss rate went up over 500% - is that really just a slight regression? :) We really want to use absolute time range: http://bit.ly/A7kcys Anyway, they are absent misses. Probably a key changed somewhere in our parser that magically invalidated roughly 15% of the parser cache. It seems to slowly recover afterward. By zooming and making the Y scale start at 50%, the event seems to have occurred on February 17th just before 6am UTC. I have uploaded a screenshot on mediawiki.org : https://www.mediawiki.org/wiki/File:Parser_cache_hit_20120217.png From the wikitech admin logs we have:

07:42 binasher: upgraded mysql on db40 to 5.1.53-facebook-r3753, enabled innodb_use_purge_thread
05:39 tstarling synchronized wmf-config/InitialiseSettings.php
05:38 tstarling synchronized wmf-config/CommonSettings.php
05:35 Tim: on db40: reduced to 10M, should be causing massive delays, but the site's not down and the purge rate is lower if anything. Going to disable the mysql parser cache entirely.
05:25 Tim: on db40: purge lag is still increasing at 108 per second, so reducing innodb_max_purge_lag to 50M
05:21 Tim: on db40: giving the innodb manual the benefit of the doubt and following its advice, setting innodb_max_purge_lag to 100M, which should give a delay of 4.5ms
05:13 Tim: killing purgeParserCache.php since it is probably doing more harm than good
02:43 maplebed: deployed updated thumb_handler.php to ms5 to include Content-Length in generated images
02:34 logmsgbot: LocalisationUpdate completed (1.19) at Fri Feb 17 02:34:32 UTC 2012

https://wikitech.wikimedia.org/view/Server_admin_log

Going to disable the mysql parser cache entirely.
Seems to have been reenabled on Feb 29th at 00:20: 00:20 Tim: reimported schema files on db40 and re-enabled mysql parser cache -- Antoine hashar Musso ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Fwd: Revision tagging: use cases needed
+1 to adding to a modified version of change_tag, or something like it. While unfamiliar with the current tagging interface(s), the content of ct_tag seems arbitrary ('possible movie studio tagger' appears 4 times in enwiki.change_tag.ct_tag out of 2mil rows) and it probably makes sense to keep machine tagging automatically added at the time of an edit distinct from the apparent post-edit human/bot annotation use of ct_tag. Re: information on which automatic tags to hide, I don't think that should be stored with every row. Keeping that in configuration (where configuration options may consist of patterns to match) seems more appropriate. The primary use cases for this feature appear to be around offline analysis and I'd like to see the design take into account the possibility of this table existing in a separate database from the revision table at some point in the future. -A On Wed, Feb 15, 2012 at 10:27 AM, Platonides platoni...@gmail.com wrote: change_tag table? Seems straightforward. The only thing is that we may not want to show some of those automatic tags by default, so we would have to introduce a new concept of a 'hidden' tag. There are several ways to accomplish that, a list in the configuration, adding a new column, storing it in ct_params, or just using a convention in the tag name for hidden ones. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Welcome, Andrew Otto - Software Developer for Analytics
This is great!! Welcome Andrew! On Fri, Jan 6, 2012 at 10:08 AM, Rob Lanphier ro...@wikimedia.org wrote: Hi everyone, I'm pleased to announce Andrew Otto will be coming to Wikimedia Foundation as a software developer in Platform Engineering, focused on analytics. We've been hiring for this spot for quite some time, and I'm happy we held out for Andrew. Andrew comes to us from CouchSurfing, where he worked for the past four years as one of the very early technical staff there, working in various places throughout the world (Thailand, Alaska, and New York are the ones I recall). His team scaled their systems from a few web servers and one monolithic database, to a cluster of over 30 machines handling almost 100 million page views per month. He was responsible for introducing Puppet for system configuration at his last job, and much of his work at CouchSurfing has been in reviewing code and maintaining a consistent architecture for CouchSurfing. We're really excited to have Andrew on board to help bring some systems rigor to our data gathering process. Our current data mining regime involves a few pieces of lightweight data gathering infrastructure (e.g. udp2log), a combination of one-off special purpose log crunching scripts, along with other scripts that started their lives as one-off special purpose scripts, but have gradually become core infrastructure. Most of these scripts have single maintainers, and there is a lot of duplication of effort. In addition, the systems have a nasty tendency to break at the least opportune times. Andrew's background bringing sanity to insane environments will be enormously helpful here. Andrew has an email address and is technically starting the onboarding process, but is still wrapping up at CouchSurfing. He'll be with us part-time starting January 17, and ramping up to full-time starting in April. Andrew is based out of Virginia, but is still traveling the world. Right now, you'll find him in New York City. Please join me in welcoming Andrew to the team! Rob ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] bugzilla + etherpad + misc service downtime - weds 21 dec, 18:00pst
Tonight's maintenance on db9 is completed, during which it was read-only for 7 minutes. I'm going to perform a second round of maintenance tomorrow at the same time (Thursday 18:00PST) which will provide the long-term fix to db9's woes. Availability of the same set of services will be interrupted for a similar length of time. -Asher On Mon, Dec 19, 2011 at 5:17 PM, Asher Feldman afeld...@wikimedia.orgwrote: Hi, We need around 20 minutes of downtime to all services that write to db9 for replication maintenance. Outside of services that support the ops team, this primarily means bugzilla, etherpad, and civicrm. It will remain available for read queries, however, so read usage of services such as the tech blog should continue along fine. I'm planning to start this on Weds at 18:00 PST and will send follow-up mail at the start and completion of maintenance. -Asher ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] bugzilla + etherpad + misc service downtime - weds 21 dec, 18:00pst
Hi, We need around 20 minutes of downtime to all services that write to db9 for replication maintenance. Outside of services that support the ops team, this primarily means bugzilla, etherpad, and civicrm. It will remain available for read queries, however, so read usage of services such as the tech blog should continue along fine. I'm planning to start this on Weds at 18:00 PST and will send follow-up mail at the start and completion of maintenance. -Asher ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] error lasted more than 10 minutes....
On Mon, Nov 28, 2011 at 12:06 PM, Roan Kattouw roan.katt...@gmail.com wrote: On Mon, Nov 28, 2011 at 8:59 PM, Neil Harris n...@tonal.clara.co.uk wrote: I hadn't thought properly about cache stampedes: since the parser cache is only part of page rendering, this might also explain some of the other occasional slowdowns I've seen on Wikipedia. It would be really cool if there could be some sort of general mechanism to enable this to be prevented for all page URLs protected by memcaching, throughout the system. I'm not very familiar with PoolCounter but I suspect it's a fairly generic system for handling this sort of thing. However, stampedes have never been a practical problem for anything except massive traffic combined with slow recaching, and that's a fairly rare case. So I don't think we want to add that sort of concurrency protection everywhere. For memcache objects that can be grouped together into an "ok to use if a bit stale" bucket (such as all kinds of stats), there is also the possibility of lazy async regeneration. Data is stored in memcache with a fuzzy expire time, i.e. { data:foo, stale:$now+15min } and a cache ttl of forever. When getting the key, if the time stamp inside marks the data as stale, you can 1) attempt to obtain an exclusive (acq4me) lock from poolcounter. If immediately successful, launch an async job to regenerate the cache (while holding the lock) but continue the request with stale data. In all other cases, just use the stale data. Mainly useful if the regeneration work is hideously expensive, such that you wouldn't want clients blocking on even a single cache regen (as is the behavior with poolcounter as deployed for the parser cache.) ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
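A sketch of that lazy async regeneration pattern, using a generic Memcached client and an imaginary job queue; PoolCounter's real API differs, and the "lock" here is just a memcached add().

<?php
// Values are stored forever with a "stale after" timestamp embedded in them.
// Readers always get an answer immediately; the first reader to notice
// staleness wins a non-blocking lock and queues a regeneration job, while
// everyone else keeps serving the stale copy.
function getWithAsyncRegen( Memcached $mc, $key, $staleTtl, $regenJobQueue ) {
	$wrapped = $mc->get( $key );
	if ( $wrapped === false ) {
		return null; // cold cache: caller must generate synchronously
	}
	if ( time() > $wrapped['stale'] ) {
		// Non-blocking "lock": add() only succeeds for one client at a time.
		if ( $mc->add( "$key:regen-lock", 1, 60 ) ) {
			// $regenJobQueue->push() is a stand-in for whatever async job
			// mechanism is available.
			$regenJobQueue->push( array( 'key' => $key, 'ttl' => $staleTtl ) );
		}
	}
	return $wrapped['data'];
}

function setWithAsyncRegen( Memcached $mc, $key, $data, $staleTtl ) {
	// Expiry of 0 = no TTL eviction; only the embedded timestamp goes stale.
	$mc->set( $key, array( 'data' => $data, 'stale' => time() + $staleTtl ), 0 );
}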
Re: [Wikitech-l] error lasted more than 10 minutes....
It appears that we were actually taken down by the reddit community, after a link to the fundraising stats page was posted under Brandon's IAMA there. sq71.wikimedia.org 943326197 2011-11-27T22:51:09.075 62032 109.125.42.71 TCP_MISS/200 1035 GET http://wikimediafoundation.org/wiki/Special:FundraiserStatistics ANY_PARENT/ 208.80.152.47 text/html * http://www.reddit.com/r/IAmA/comments/mr4pf/i_am_wikipedia_programmer_brandon_harris_ama/ * - Mozilla/5.0%20(Windows%20NT%206.1;%20WOW64)%20AppleWebKit/535.2%20(KHTML,%20like%20Gecko)%20Chrome/15.0.874.121%20Safari/535.2 That page wasn't suitable for high volume public consumption (very expensive db query + not properly cached), so the site problem persisted even after the db initially suspected as bad was rotated out. On Sun, Nov 27, 2011 at 2:39 PM, Erik Moeller e...@wikimedia.org wrote: We had a site outage of about 30 mins, caused by a major issue, potentially hardware-related, with a database server, which blocked all MediaWiki application servers (and thereby rendered most of our sites unusable). Should be fixed now; we'll prepare a more comprehensive incident analysis soon. Thanks to the ops team for their speedy response. All best, Erik -- Erik Möller VP of Engineering and Product Development, Wikimedia Foundation Support Free Knowledge: http://wikimediafoundation.org/wiki/Donate ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] [Wmfall] WMF Staff Announcement - Welcome Leslie Carr!
Yay! Welcome! On Mon, Oct 10, 2011 at 9:52 AM, CT Woo ct...@wikimedia.org wrote: All, Technical Operations department is pleased to announce another new fabulous staff member to its team. Please join us to welcome Leslie Carr , our Network Operations Engineer, starting today, 10/10/11. She is based in San Francisco office. Leslie comes with deep and rich experience in Network Operations, ranging from building and scaling a rapidly expanding high capacity global network with several large data-centres to designing and migrating systems networks from legacy setup to new state of the art infrastructure. Prior to joining us, Leslie was with Twitter, where she was responsible for implementation of a major data-centre and network migration. Before that, she worked at Craiglist as the main network architect who redesigned and scaled their network infrastructure. Leslie has also worked for Google, where she created, designed and deployed redundant and scalable network for their various data-centres. Leslie has two pet cats and is an avid bike enthusiast , who bikes annually from SF to LA, for AIDS LifeCycle. Please join me in welcoming Leslie Carr to WMF and do drop by to say hi to her. You will not miss her (hint - look out for a reddish pink blue hair!). Thanks, CT Woo ___ Wmfall mailing list wmf...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wmfall ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Google's cached pages are much faster than wiki*edia's
On Thursday, October 6, 2011, IAlex ialex.w...@gmail.com wrote: Le 7 oct. 2011 à 06:21, Chad a écrit : Well we do serve the logged out cookie. What real purpose that serves, I don't know :) It's to bypass the browser cache, and to not let the user see a page with it's user name at the top when he just logged out. Couldn't deleting cookies have the same effect? If we do want to set or keep cookies on logout, do they need to be included in X-Vary-Options and bypass squid caching? We could also consider loading login/userbar stuff via javascript and allow logged in users to take advantage of squid caching provided care was taken for active editors. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Google's cached pages are much faster than wiki*edia's
From the wmf office in San Francisco, webcache.googleusercontent.com resolves to something geographically close and network RT time is around 25ms versus 86ms for en.wikipedia.org, 61ms in Google's favor. From chrome in incognito mode to avoid sending wikipedia cookies, it takes me 391ms to fetch just the html for http://webcache.googleusercontent.com/search?q=cache:FXQPcAQ_2WIJ:en.wikipedia.org/wiki/Devo+wikipedia+devo&cd=1&hl=en&ct=clnk&gl=us vs 503ms for http://en.wikipedia.org/wiki/Devo. That difference of 112ms is less than the latency difference from two round trips, but the request depends on more, meaning that our squids are serving the content faster than google is. Pulling http://en.wikipedia.org/wiki/Devo from a host in our tampa datacenter takes an average of 3ms. If we had a west coast caching presence, I think we'd beat google's cache from our office, but I doubt we'll ever be able to compete with google on global points of caching presence, or network connectivity. Note that if you're using wikipedia from a browser that has been logged in within the last month, it is likely still sending cookies that bypass our squid caches even when logged out. On Fri, Sep 30, 2011 at 3:48 PM, jida...@jidanni.org wrote: Fellows, This is Google's cache of http://en.wikipedia.org/wiki/Devo. It is a snapshot of the page as it appeared on 28 Sep 2011 09:22:50 GMT. The current page could have changed in the meantime. Learn more ... Like why is it so much faster than the real thing? Even when not logged in. Nope, you may be one of the top ranked websites, but not by speed. So if you can't beat 'em join 'em. Somehow use Google's caches instead of your own. Something, anything, for a little more speed. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table
Since the primary use case here seems to be offline analysis and it may not be of much interest to mediawiki users outside of wmf, can we store the checksums in new tables (i.e. revision_sha1) instead of running large alters, and implement the code to generate checksums on new edits via an extension? Checksums for most old revs can be generated offline and populated before the extension goes live. Since nothing will be using the new table yet, there'd be no issues with things like gap lock contention on the revision table from mass populating it. On Mon, Sep 19, 2011 at 12:10 PM, Brion Vibber br...@pobox.com wrote: [snip] So just FYI -- the only *actual* controversy that needs to be discussed in this thread is how do we make this update applicable in a way that doesn't disrupt live sites with many millions of pages? We're pretty fixed on SHA-1 as a checksum sig (already using it elsewhere) and have no particular desire or need to change or think about alternatives; bikeshedding details of the formatting and storage are not at issue. -- brion ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
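Roughly what the extension side of that could look like, as a hedged sketch rather than reviewed code: the revision_sha1 side table is the one suggested above (its column names are invented here), and the hook-based population assumes the RevisionInsertComplete hook fires after each new revision is written.

<?php
// Sketch of an extension populating a side table of revision checksums,
// instead of altering the revision table itself.
$wgHooks['RevisionInsertComplete'][] = 'efRecordRevisionSha1';

function efRecordRevisionSha1( $revision, $data, $flags ) {
	$dbw = wfGetDB( DB_MASTER );
	$dbw->insert(
		'revision_sha1',
		array(
			'rs_rev_id' => $revision->getId(),
			// Same base36 encoding convention used for sha1s elsewhere.
			'rs_sha1'   => wfBaseConvert( sha1( $revision->getRawText() ), 16, 36, 31 ),
		),
		__METHOD__,
		array( 'IGNORE' ) // stays idempotent if an offline backfill already covered this rev
	);
	return true;
}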
Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table (discussing r94289)
Would it be possible to generate offline hashes for the bulk of our revision corpus via dumps and load that into prod to minimize the time and impact of the backfill? When using for analysis, will we wish the new columns had partial indexes (first 6 characters?) Is code written to populate rev_sha1 on each new edit? On Thu, Aug 18, 2011 at 7:40 AM, Diederik van Liere dvanli...@gmail.comwrote: Hi! I am starting this thread because Brion's revision r94289 reverted r94289 [0] stating core schema change with no discussion [1]. Bugs 21860 [2] and 25312 [3] advocate for the inclusion of a hash column (either md5 or sha1) in the revision table. The primary use case of this column will be to assist detecting reverts. I don't think that data integrity is the primary reason for adding this column. The huge advantage of having such a column is that it will not be longer necessary to analyze full dumps to detect reverts, instead you can look for reverts in the stub dump file by looking for the same hash within a single page. The fact that there is a theoretical chance of a collision is not very important IMHO, it would just mean that in very rare cases in our research we would flag an edit being reverted while it's not. The two bug reports contain quite long discussions and this feature has also been discussed internally quite extensively but oddly enough it hasn't happened yet on the mailinglist. So let's have a discussion! [0] http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94289 [1] http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94541 [2] https://bugzilla.wikimedia.org/show_bug.cgi?id=21860 [3] https://bugzilla.wikimedia.org/show_bug.cgi?id=25312 Best, Diederik ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] https via GPU?
From the original post: "Recent Intel CPUs have a feature called AES-NI (http://en.wikipedia.org/wiki/AES_instruction_set) that accelerates AES processing. A CPU with AES-NI can perform 5 to 10 times faster than a CPU without it. We observe that a single core can perform 5 Gbps and 15 Gbps for encryption and decryption respectively."

There's no longer a need for specialized hardware solutions in this space, GPU based or otherwise.

On Fri, Jul 29, 2011 at 12:10 PM, Brion Vibber br...@pobox.com wrote: On Fri, Jul 29, 2011 at 11:53 AM, Jon Davis w...@konsoletek.com wrote: On Fri, Jul 29, 2011 at 11:29, Platonides platoni...@gmail.com wrote: Our servers don't have a GPU, so that would need a hardware upgrade. Yes, but if large-scale SSL deployment increased CPU usage to the point of necessitating new hardware... the cost could be reduced by purchasing GPUs for servers rather than bunches of entirely new boxes. Conceptually I think it is a cool idea. Most likely we'll end up with a dedicated SSL termination subcluster, so those machines could be provisioned with whatever hardware they specifically needed. Certainly something to keep in mind! -- brion

___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
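If you want a ballpark figure for your own hardware, the snippet below times bulk AES-GCM encryption on one core using the third-party Python 'cryptography' package. Whether AES-NI is actually used depends on the CPU and on the OpenSSL build behind the package, so treat the result as a rough sanity check rather than a reproduction of the figures quoted above.

```python
import os
import time

# Requires the third-party 'cryptography' package (pip install cryptography).
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=128)
aes = AESGCM(key)
nonce = os.urandom(12)
chunk = os.urandom(1024 * 1024)   # 1 MiB of random plaintext
iterations = 200

start = time.perf_counter()
for _ in range(iterations):
    # Reusing the nonce is acceptable only because this is a throwaway
    # throughput benchmark; never do this with real data.
    aes.encrypt(nonce, chunk, None)
elapsed = time.perf_counter() - start

mb = iterations * len(chunk) / 1e6
print(f"AES-128-GCM: {mb / elapsed:.0f} MB/s on one core "
      f"({8 * mb / elapsed / 1000:.1f} Gbps)")
```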
Re: [Wikitech-l] https via GPU?
On Thu, Aug 4, 2011 at 10:31 AM, Aryeh Gregor a...@aryeh.name wrote: I was under the impression that the biggest cost in TLS isn't the symmetric encryption for an ongoing connection, it's the asymmetric encryption for the connection setup. If so, AES acceleration isn't going to help with the most important performance issue. Am I wrong?

The handshake operations still aren't all that expensive these days, and with a prudent amount of sticky load balancing to SSL-terminating boxes, a good hit rate can be achieved from OpenSSL's session cache, which for resumed connections eliminates the asymmetric operations and half of the handshake round trips.

From http://www.imperialviolet.org/2010/06/25/overclocking-ssl.html: In January this year (2010), Gmail switched to using HTTPS for everything by default. Previously it had been introduced as an option, but now all of our users use HTTPS to secure their email between their browsers and Google, all the time. In order to do this we had to deploy *no additional machines* and *no special hardware*. On our production frontend machines, SSL/TLS accounts for less than 1% of the CPU load, less than 10KB of memory per connection and less than 2% of network overhead. Many people believe that SSL takes a lot of CPU time and we hope the above numbers (public for the first time) will help to dispel that.

We can't get these sorts of numbers if we run the version of OpenSSL bundled with Lucid, but everything we need is available either in patch form or has become part of the mainline OpenSSL source.

___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
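As a concrete illustration of what session reuse buys, here is a small sketch using Python's standard ssl module: the first connection does a full handshake including the public-key operations, while the second passes the cached session back in and should report session_reused. The hostname is just an example, and exact behavior differs between TLS 1.2 and TLS 1.3 (where tickets arrive after the handshake).

```python
import socket
import ssl
import time

HOST = "en.wikipedia.org"   # any TLS host works; this one is only an example
ctx = ssl.create_default_context()

def handshake(session=None):
    """Connect, complete the TLS handshake, return (session, reused, ms)."""
    start = time.perf_counter()
    with socket.create_connection((HOST, 443), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=HOST, session=session) as tls:
            ms = (time.perf_counter() - start) * 1000
            return tls.session, tls.session_reused, ms

# Full handshake, including the asymmetric crypto.
sess, reused, ms = handshake()
print(f"full handshake:    reused={reused} {ms:.0f} ms")

# Abbreviated handshake: the cached session skips the expensive public-key
# operations and one round trip (for TLS <= 1.2; TLS 1.3 resumption differs).
_, reused, ms = handshake(session=sess)
print(f"resumed handshake: reused={reused} {ms:.0f} ms")
```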
Re: [Wikitech-l] New Employee Announcement - Jeff Green
Woo! Looking forward to working with you, Jeff!

On Thu, Jun 30, 2011 at 2:31 PM, CT Woo ct...@wikimedia.org wrote: All, Please join me in welcoming Jeff Green to the Wikimedia Foundation. Jeff is taking up the Special Ops position in the Tech Ops department, where one of his responsibilities is to keep our Fundraising infrastructure secure, in compliance with regulation, scalable, and highly available. Jeff comes with a strong systems operations background, especially in scaling and building highly secured infrastructure. He hails from Craigslist, where he started as their first system administrator and served as their lead system administrator as well as their Operations manager for most of his tenure there. When not working, Jeff likes cycling, playing music, and building stuff. He is a proud father of two young kids and a lucky husband. He and his family will be moving back to Massachusetts this August. Please drop by the 3rd floor next week to welcome him. For those who have already met him earlier, do come by as well to see the new 'ponytail-less' Jeff ;-) Thanks, CT

___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l