Re: [Wikitech-l] RfC update: LESS stylesheet support in core
On Thu, Sep 19, 2013 at 4:04 PM, Dan Andreescu dandree...@wikimedia.org wrote:

>> - Has http://learnboost.github.io/stylus/ been considered? I've heard that it's a good compromise between sass and less (but I haven't played with it myself to see if it really lets you do more compass-like things).
>
> *Popularity* - does matter; one of the long comment threads on the RFC is from a potential contributor who is concerned that LESS makes it harder to contribute. I mostly agree with Jon's and Steven's arguments that LESS is pretty easy to learn. However, I have also heard about a year's worth of complaints about Limn being written in Coco instead of pure Javascript. I personally think CSS -> LESS is just as mentally taxing as Javascript -> Coco, but I'm objectively in the minority based on the feedback I've received. I'd be cautious here. You can upcompile CSS into LESS, sure, but if a contributor has to understand a complex LESS codebase full of mixins and abstractions while debugging the generated CSS in the browser, they're right to point out that this requires effort. And this effort is only increased for more elegant languages like Stylus.

I'm for any compiled-to-CSS language because I feel they fill a big gaping hole in CSS's ability to share code. That is really compelling to me. I haven't been convinced the compiled-to-JS languages offer quite as compelling a value proposition, so the analogy to Limn and Coco is less relevant to me. I admit I could be wrong about the value proposition, but that is how I feel.

I really don't want to start a language war, though. I'm a Sass fan but I'll take whatever I can get. I will point out that CSS is valid LESS, which could assuage some fears.

Nik Everett

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
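[Editor's note] Nik's point that CSS is valid LESS means an existing stylesheet can simply be renamed to .less and then refactored incrementally. A minimal sketch (the selector names, variable names, and colors are invented for illustration):

```less
/* Step 1: any existing CSS compiles unchanged under LESS. */
.mw-notice {
  border: 1px solid #a2a9b1;
  background-color: #eaecf0;
}

/* Step 2: shared values can then be pulled out into variables
   one rule at a time, with no flag-day rewrite. */
@border-color: #a2a9b1;
@background-color: #eaecf0;

.mw-warning {
  border: 1px solid @border-color;
  background-color: @background-color;
}
```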
[Wikitech-l] RfC update: LESS stylesheet support in core
I just want to check in with folks to see if there are any more comments or issues with this RfC:

https://www.mediawiki.org/wiki/Requests_for_comment/LESS

Basically, this adds a stylesheet preprocessor for ResourceLoader styles specified as '.less' files; currently no on-wiki or gadget handling is included, so there are no security issues with LESS @import rules.

LESS ( http://lesscss.org/ ) is pretty handy and is used by a number of our extensions to make styles more maintainable (set constants, do math, make combined rules for things like -webkit-blah). Direct LESS support in core will do away with the precompilation step during development.

There's a patch implementing it in core:
https://gerrit.wikimedia.org/r/#/c/78669/

and a sample patch updating MobileFrontend to use it:
https://gerrit.wikimedia.org/r/#/c/84139/

Open questions:

* Are there any remaining problems with the caching and dependency checks?
* What's the best way to handle image embedding? (/* @embed */ rules get messed up, but we can use an alternate function...) -- see notes on gerrit
* ...any other concerns with performance, security, or basic functionality?

-- brion
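[Editor's note] The maintainability wins brion lists (constants, math, combined vendor-prefix rules) look roughly like this in LESS — a sketch only, with made-up names and values:

```less
/* Constants and arithmetic. */
@column-width: 160px;
@gutter: 10px;

.sidebar {
  width: @column-width + (2 * @gutter);
}

/* One mixin emits the combined vendor-prefixed rules
   (the "-webkit-blah" case brion mentions). */
.border-radius(@radius) {
  -webkit-border-radius: @radius;
  -moz-border-radius: @radius;
  border-radius: @radius;
}

.thumbnail {
  .border-radius(4px);
}
```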
Re: [Wikitech-l] CentralNotice -- Caching and Proxies
> Node is a good choice for this kind of task. If the total size of all unique banners is relatively small you might even be able to cache the banners in-memory instead of doing backend cache requests.

Though it wasn't explicitly proposed, I was thinking that the best plan would be to have both Node and Varnish on the proxy box. I'd rather not write a caching layer in Node when Varnish does a fine job at it; but I also think it's somewhat silly to have symmetric traffic on the proxy when I can avoid it by having Varnish on the box. And the amount of data is small enough that we can easily fit it into 16GB of RAM. (Probably less than 8; but I don't know how it's all going to work itself out in production.)

~Matt Walker
Wikimedia Foundation Fundraising Technology Team

On Thu, Sep 19, 2013 at 8:48 AM, Gabriel Wicke gwi...@wikimedia.org wrote:

> On 09/18/2013 06:06 PM, Matthew Walker wrote:
>> Hey all, I've been scheming for a while on how to reduce the number of calls up to the server for CentralNotice. At the same time I want to greatly reduce the number of objects I have in cache. To do this I propose to change the architecture to having an intermediate proxy server with a static JS section in the MediaWiki page head. The proxy would map down all the variables to only what is required at the time.
>
> +1 for limiting the application logic in regular text Varnishes, both from a performance and risk management perspective. Having your own banner proxies should make it easier to tweak their behavior to your needs without the risk of taking down the entire site.
>
> Node is a good choice for this kind of task. If the total size of all unique banners is relatively small you might even be able to cache the banners in-memory instead of doing backend cache requests.
>
> Gabriel
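[Editor's note] Gabriel's in-memory suggestion might look something like the sketch below: a tiny TTL cache keyed by banner name that only falls through to the backend on a miss or an expired entry. The class and function names are invented for illustration, not taken from any real CentralNotice code.

```javascript
// Minimal in-memory banner cache with a time-to-live, as a sketch of
// caching banners in the Node process instead of issuing a backend
// cache request on every hit. All names here are hypothetical.
class BannerCache {
    constructor(fetchBanner, ttlMs) {
        this.fetchBanner = fetchBanner; // (name) => banner HTML string
        this.ttlMs = ttlMs;
        this.entries = new Map();       // name => { body, expires }
    }

    // Return the cached body if still fresh; otherwise fetch from the
    // backend and re-cache it.
    get(name, now = Date.now()) {
        const hit = this.entries.get(name);
        if (hit && hit.expires > now) {
            return hit.body;
        }
        const body = this.fetchBanner(name);
        this.entries.set(name, { body: body, expires: now + this.ttlMs });
        return body;
    }
}
```

With Varnish in front of the Node process, as Matt describes, a layer like this would only absorb whatever misses Varnish lets through.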
Re: [Wikitech-l] RfC update: LESS stylesheet support in core
> - Has http://learnboost.github.io/stylus/ been considered? I've heard that it's a good compromise between sass and less (but I haven't played with it myself to see if it really lets you do more compass-like things).

I was just writing a message about Stylus [0], so I'm glad you brought it up. Limn [1] uses Stylus and we've been pretty happy with it. I read the RFC carefully, and it seems the two big reasons to pick LESS over Stylus/SASS are popularity and support in PHP. The reasons to pick Stylus/SASS over LESS are a more elegant syntax and a slight edge in features.

*PHP support* - Stylus does have PHP support [2] but it's not even close to as mature as the LESS support.

*Popularity* - does matter; one of the long comment threads on the RFC is from a potential contributor who is concerned that LESS makes it harder to contribute. I mostly agree with Jon's and Steven's arguments that LESS is pretty easy to learn. However, I have also heard about a year's worth of complaints about Limn being written in Coco instead of pure Javascript. I personally think CSS -> LESS is just as mentally taxing as Javascript -> Coco, but I'm objectively in the minority based on the feedback I've received. I'd be cautious here. You can upcompile CSS into LESS, sure, but if a contributor has to understand a complex LESS codebase full of mixins and abstractions while debugging the generated CSS in the browser, they're right to point out that this requires effort. And this effort is only increased for more elegant languages like Stylus.

*Syntax* - Stylus and SASS definitely have cleaner, simpler syntax. Stylus aims to be the cleanest of the three, but it definitely smells like that SNL skit about the number of razor blades: "They have 4 blades?! Fine, we'll make one with *5* BLADES!!!" What I'm referring to here is that Stylus has optional colons and tries to be as much like Python as possible.

*Features* - The interesting thing about the feature comparisons out there is that all of them seem to be outdated. For example, this write-up [3] highlights that @media queries can be nested in SASS (same is true for Stylus). But the LESS people implemented that as well (Feb 2013). That said, it does seem that Stylus and SASS are leading the pack in terms of new features. Introspection [4] is a very cool one in Stylus that I'm not sure you can do in LESS.

I think the decision's pretty much been made to go with LESS, and I agree with it. I think it strikes the better balance between making it easy for people to contribute and DRY-ing up our codebase. But in the future, if we loved the migration to LESS and we just wish it had more features and more DRY-ness, we should revisit Stylus.

[0] - http://learnboost.github.io/stylus/
[1] - https://github.com/wikimedia/limn/tree/develop/css
[2] - https://github.com/AustP/Stylus.php
[3] - http://css-tricks.com/sass-vs-less/
[4] - http://learnboost.github.io/stylus/docs/introspection.html
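[Editor's note] For a concrete feel of the syntax gap Dan describes, here is the same trivial rule in each preprocessor (variable name and color are illustrative; Stylus makes braces, colons, and semicolons optional):

```
/* LESS */
@link-color: #0645ad;

.button {
  color: @link-color;
}

// Stylus
link-color = #0645ad

.button
  color link-color
```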
Re: [Wikitech-l] RfC update: LESS stylesheet support in core
Some more questions for discussion:

- I'm concerned that some of the useful things people do with sass (ie, robust cross-browser support with compass) are impossible with less.

- Has http://learnboost.github.io/stylus/ been considered? I've heard that it's a good compromise between sass and less (but I haven't played with it myself to see if it really lets you do more compass-like things).

- The interaction between ResourceLoader and @import seems a bit under-defined. Although less has not really documented it yet, less added a slew of new @import options in 1.4.1/1.5.0 (see https://github.com/less/less.js/blob/master/CHANGELOG.md ; https://github.com/less/less.js/issues/1185 ; https://github.com/less/less.js/issues/1209 ; https://github.com/less/less.js/issues/1210 ). It would be nice to have a concrete written guideline for how MW authors are expected to use @import and/or better integrate @import with ResourceLoader.

- @import processes referenced URLs asynchronously, IIRC, which might cause issues w/ integration. I haven't done a code review to see how the existing patches handle this (or not).

--scott
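[Editor's note] The newer @import options Scott refers to look roughly like this (file names are placeholders, and option behavior should be double-checked against the less.js version actually deployed):

```less
/* Treat a plain .css file as LESS so its rules can be overridden. */
@import (less) "legacy.css";

/* Pull in mixins and variables without emitting the imported
   file's own rules into the output. */
@import (reference) "mixins.less";

/* Wrap the imported rules in a media query. */
@import "mobile.less" screen and (max-width: 400px);
```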
Re: [Wikitech-l] [WikimediaMobile] [Analytics] Mobile stats
+Analytics

On Thu, Sep 19, 2013 at 1:57 PM, Adam Baso ab...@wikimedia.org wrote:

> A run on yesterday's valid Wikipedia Zero hits showed that user agents NOT supporting HTML (i.e., only supporting WAP) account for only 0.098 - 0.108 *percent*.
>
> Assuming a bunch of complaints don't come in (e.g., "I'm getting tag soup!", as Max might say), I think we could make a reasonable case to stop supporting WAP through the formal channels (blog, mailing list(s), etc.).
>
> -Adam
>
> On Tue, Sep 17, 2013 at 1:11 PM, Arthur Richards aricha...@wikimedia.org wrote:
>
>> That's awesome - thanks Max and Adam; it's great to see the last vestiges of X-Device finally disappear!
>>
>> On Tue, Sep 17, 2013 at 1:07 PM, Max Semenik maxsem.w...@gmail.com wrote:
>>
>>> After looking at the Varnish VCL with Adam, we discovered a bug in a regex resulting in many phones being detected as WAP when they shouldn't be. Since the older change[1] simplifying detection had also fixed this bug, Brandon Black deployed it, and since today the usage share of WAP should seriously drop. We will be monitoring the situation and revisit the issue of WAP popularity once we have enough data.
>>>
>>> [1] https://gerrit.wikimedia.org/r/83919
>>>
>>> On Tue, Sep 10, 2013 at 4:39 PM, Adam Baso ab...@wikimedia.org wrote:
>>>
>>>> Thanks. 7-9% of responses on Wikipedia Zero being WAP is pretty substantial.
>>>>
>>>> On Tue, Sep 10, 2013 at 2:01 PM, Andrew Otto o...@wikimedia.org wrote:
>>>>
>>>>> > These zero.tsv.log* files to which I refer seem to be, basically, Varnish log lines that correspond to Wikipedia Zero-targeted traffic.
>>>>>
>>>>> Yup! Correct. zero.tsv.log* files are captured unsampled and based on the presence of a zero= tag in the X-Analytics header:
>>>>> http://git.wikimedia.org/blob/operations%2Fpuppet.git/37ffb0ccc1cd7d3f5612df8779e9a3bdb69066b2/templates%2Fudp2log%2Ffilters.oxygen.erb#L10
>>>>>
>>>>> > Do I understand that field correctly as Content-Type?
>>>>>
>>>>> Yup again! The varnishncsa format string that is currently being beamed at udp2log is here:
>>>>> http://git.wikimedia.org/blob/operations%2Fpuppet.git/37ffb0ccc1cd7d3f5612df8779e9a3bdb69066b2/modules%2Fvarnish%2Ffiles%2Fvarnishncsa.default
>>>
>>> --
>>> Best regards,
>>> Max Semenik ([[User:MaxSem]])
>>>
>>> ___
>>> Mobile-l mailing list
>>> mobil...@lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/mobile-l
>>
>> --
>> Arthur Richards
>> Software Engineer, Mobile
>> [[User:Awjrichards]]
>> IRC: awjr
>> +1-415-839-6885 x6687
Re: [Wikitech-l] RfC update: LESS stylesheet support in core
@ori: You might want to look into the different @import options before being so dogmatic. In particular, the media-query restrictions are probably very useful to MW. The (less) option also allows overriding CSS files, which can help prevent the "everything must be less!" problem. And the (reference) option would let you use ResourceLoader to bundle files as usual while *also* allowing less overrides. This could be important when we're trying to override styles defined in a different ResourceLoader bundle.

@dan: the particular "less isn't very powerful" issues I'm concerned about are the ones solved by compass. As is well-known, there is no equivalent to compass for less, and there is not likely ever to be, since less cannot express the transformations required. Compass uses ruby code to do this w/ sass. For example,

https://github.com/chriseppstein/compass/blob/stable/lib/compass/sass_extensions/functions/gradient_support.rb

is the code compass uses to generate clean gradient specifications that work with all major browsers (including synthesizing SVG background images where required). (Spec in http://compass-style.org/reference/compass/css3/images/ .)

Now, maybe we don't actually need all that power. But the automatic cross-browser compatibility it allows sure is nice...

--scott
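[Editor's note] For readers unfamiliar with compass, the gradient support Scott links to is used roughly like this on the sass side — a sketch from memory, so check the linked compass reference for the exact module and mixin names:

```scss
@import "compass/css3/images";

.header {
  // compass expands this one declaration into the -webkit-, -moz-,
  // and -o- prefixed gradient forms (and can synthesize an SVG
  // fallback), which is the ruby-powered transformation that plain
  // less mixins cannot express.
  @include background-image(linear-gradient(#ffffff, #dddddd));
}
```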
Re: [Wikitech-l] RfC update: LESS stylesheet support in core
On Thu, Sep 19, 2013 at 12:24 PM, C. Scott Ananian canan...@wikimedia.org wrote:

> - The interaction between ResourceLoader and @import seems a bit under-defined. [...] It would be nice to have a concrete written guideline for how MW authors are expected to use @import and/or better integrate @import with ResourceLoader.
>
> - @import processes referenced URLs asynchronously, IIRC, which might cause issues w/ integration. I haven't done a code review to see how the existing patches handle this (or not).

@import directives in LESS files pointing at other LESS files are processed synchronously by phpless and are not present in the generated CSS output, and that's the only use of '@import' we encourage / allow. @import is reserved for loading mix-ins and variables so that they may be used by the current LESS stylesheet. It is not intended to be used for concatenating / bundling stylesheets that are related to one another only conceptually; that's what the ResourceLoader module definition is for.

/* - Valid use of @import: - */

# myExtension.less

@import "extensionColors.less";

body {
    background-color: @bgColor;
}

# extensionColors.less

@bgColor: #ccc;

/* - Invalid use of @import: - */

# myExtension.less

@import "headerStyles.less";

body {
    background-color: #ccc;
}

# headerStyles.less

h1 {
    font-family: serif;
}

The relatedness of myExtension.less / headerStyles.less in the second example should be expressed by referencing these files in the 'styles' array of the same ResourceLoader module.

I can commit to documenting this on mw.org if / when the proposal is accepted and the patch is merged.
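[Editor's note] The 'styles' array Ori mentions is part of the ResourceLoader module definition. A hypothetical registration bundling the two conceptually related files from his example might look like this (the module name and paths are invented):

```php
// In the extension's setup file: bundle related stylesheets via the
// module definition rather than via LESS @import.
$wgResourceModules['ext.myExtension.styles'] = array(
    'styles' => array(
        'myExtension.less',
        'headerStyles.less',
    ),
    'localBasePath' => __DIR__,
    'remoteExtPath' => 'MyExtension',
);
```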
Re: [Wikitech-l] RecentChanges types (RC_* constants)
Does any of the 3 options avoid the same problem as https://bugzilla.wikimedia.org/show_bug.cgi?id=44874 from hitting us? Users can ignore Wikidata changes in exchange for efficiency (enhanced RC), but I understand you don't want them to ignore Flow.

Nemo
Re: [Wikitech-l] RecentChanges types (RC_* constants)
Unfortunately no, none of this has anything to do specifically with fixing the spaghetti that is the enhanced changes format. I have not looked deeply into the problem, but the comments from the Wikidata developers who have looked into it suggest it is a non-trivial change.

The change proposed above is very trivial from an implementation perspective, but it affects one of the most used tables in MediaWiki, and the developers I've spoken with have different opinions on which way is the best way to go. I wanted to give those I have not talked to directly an opportunity to be heard before we change anything.

Erik Bernhardson

On Thu, Sep 19, 2013 at 4:11 PM, Federico Leva (Nemo) nemow...@gmail.com wrote:

> Does any of the 3 options avoid the same problem as https://bugzilla.wikimedia.org/show_bug.cgi?id=44874 from hitting us? Users can ignore Wikidata changes in exchange for efficiency (enhanced RC), but I understand you don't want them to ignore Flow.
>
> Nemo
Re: [Wikitech-l] RfC update: LESS stylesheet support in core
On 09/19/2013 04:53 PM, Ori Livneh wrote:

> I can commit to documenting this on mw.org if / when the proposal is accepted and the patch is merged.

This is a good example. I recommend adding it to https://www.mediawiki.org/wiki/Requests_for_comments/LESS/Conventions.

Matt Flaschen
Re: [Wikitech-l] RecentChanges types (RC_* constants)
On Thu, Sep 19, 2013 at 11:45 AM, Erik Bernhardson ebernhard...@wikimedia.org wrote:

> 3. Replace RC_EXTERNAL with RC_WIKIDATA and RC_FLOW constants in their respective extensions. This is also straightforward, but adds development overhead to ensure future creators of RC_* constants do not conflict with each other. It would be handled similarly to NS_* constants with an on-wiki list. I have heard some mention that naming conflicts have occurred in the past with this solution. This would force queries looking for only core sources of change to provide an inclusive list of RC_* values to find, rather than using rc_type != RC_EXTERNAL.

Please don't repeat the mistake of having extension authors actually caring about what their namespace number is. Everyone just goes "Oh, nobody's probably using 200, so I'll just do that."

-Chad
Re: [Wikitech-l] RecentChanges types (RC_* constants)
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://danielfriesen.name/]

On 2013-09-19 4:44 PM, Chad wrote:

> On Thu, Sep 19, 2013 at 11:45 AM, Erik Bernhardson ebernhard...@wikimedia.org wrote:
>> 3. Replace RC_EXTERNAL with RC_WIKIDATA and RC_FLOW constants in their respective extensions. This is also straightforward, but adds development overhead to ensure future creators of RC_* constants do not conflict with each other. It would be handled similarly to NS_* constants with an on-wiki list. I have heard some mention that naming conflicts have occurred in the past with this solution. This would force queries looking for only core sources of change to provide an inclusive list of RC_* values to find, rather than using rc_type != RC_EXTERNAL.
>
> Please don't repeat the mistake of having extension authors actually caring what their namespace number is. Everyone just goes "Oh, nobody's probably using 200 so I'll just do that."
>
> -Chad

+1

@Erik: The on-wiki list you talk about is here:
https://www.mediawiki.org/wiki/Extension_default_namespaces

> I have heard some mention that naming conflicts have occurred in the past with this solution.

Yes, there are plenty:

- 120-121 is used by both RefHelper and Rich Media
- 200-203 is used by SocialProfile and Data Import
- 300-301 is used by PollNY and Access Control List (Wikia also uses 300-399 when writing its own extensions, and doesn't bother cooperating by at least adding the defaults they use to that list to avoid conflicts)
- 500-501 is used by BlogPage and Linked Data
- 700-701 is used by LinkFilter and Collaboration

BlueSpice and BlogPage have a different type of conflict too: they BOTH use the constant NS_BLOG and define different namespace defaults for it.

This on-wiki page is ONLY a registry of defaults. The standard practice for these is that the starting number should be configurable, so that namespace ids other than the default can be used to avoid conflicts. I'm not so sure you'll be able to do that very well for RC external ids.

Anyways, this whole extension namespace id setup is considered a bug. You don't want to get into this situation again. We have an open bug on dropping this default namespace nonsense and using dynamic registration of namespace IDs:
https://bugzilla.wikimedia.org/show_bug.cgi?id=31063
Re: [Wikitech-l] [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..
On 09/17/2013 05:59 AM, Daniel Friesen wrote:

> Side topic: https://en.wiktionary.org/w/r/t is messed up:
> "To check for r/t on Wikipedia, see: //en.wikipedia.org/wiki/r/t"
> https://en.wikipedia.org/wiki/r/t

Good catch, filed: https://bugzilla.wikimedia.org/show_bug.cgi?id=54357

Matt Flaschen
Re: [Wikitech-l] RecentChanges types (RC_* constants)
I will take a look over the bug; quite a long conversation. It will most likely take me the night to digest the suggestions included. I suppose my first worry is that I was targeting simple changes which can be agreed on and implemented in a few lines, whereas the linked bug report seems to suggest a system that I know will require many iterations and weeks of on/off work before being +2'd into core.

Erik Bernhardson

On Thu, Sep 19, 2013 at 5:07 PM, Daniel Friesen dan...@nadir-seen-fire.com wrote:

> +1
>
> @Erik: The on-wiki list you talk about is here:
> https://www.mediawiki.org/wiki/Extension_default_namespaces
>
> Yes, there are plenty of naming conflicts: 120-121 is used by both RefHelper and Rich Media; 200-203 is used by SocialProfile and Data Import; 300-301 is used by PollNY and Access Control List (Wikia also uses 300-399 when writing its own extensions, and doesn't bother cooperating by at least adding the defaults they use to that list to avoid conflicts); 500-501 is used by BlogPage and Linked Data; 700-701 is used by LinkFilter and Collaboration.
>
> BlueSpice and BlogPage have a different type of conflict too: they BOTH use the constant NS_BLOG and define different namespace defaults for it.
>
> This on-wiki page is ONLY a registry of defaults. The standard practice for these is that the starting number should be configurable, so that namespace ids other than the default can be used to avoid conflicts. I'm not so sure you'll be able to do that very well for RC external ids.
>
> Anyways, this whole extension namespace id setup is considered a bug. You don't want to get into this situation again. We have an open bug on dropping this default namespace nonsense and using dynamic registration of namespace IDs:
> https://bugzilla.wikimedia.org/show_bug.cgi?id=31063
Re: [Wikitech-l] RecentChanges types (RC_* constants)
You can trivially avoid the need for anything as complex as dynamic namespace registration by simply using one of your other options, like using the string 'wikidata' or 'flow' rather than a constant and an integer id.

If you want integer ids that badly, you could always create a new rc_external_types table (or whatever you want to call it) mapping an auto_increment id to keys like 'wikidata' and 'flow', and use the primary key there as the rc_external_type.

Long story short: hardcoding integer numbers into extensions hoping you're not going to conflict with other extensions is never a good idea. You're just subjecting yourself to future pain you could have avoided at the start with a simple solution.

~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://danielfriesen.name/]

On 2013-09-19 5:41 PM, Erik Bernhardson wrote:

> I will take a look over the bug; quite a long conversation. It will most likely take me the night to digest the suggestions included. I suppose my first worry is that I was targeting simple changes which can be agreed on and implemented in a few lines, whereas the linked bug report seems to suggest a system that I know will require many iterations and weeks of on/off work before being +2'd into core.
>
> Erik Bernhardson
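[Editor's note] Daniel's mapping-table idea might look something like the sketch below; the table and column names are hypothetical, not an actual MediaWiki schema:

```sql
-- Registry of external change sources: ids are assigned by the
-- database at registration time rather than hardcoded in extensions.
CREATE TABLE rc_external_types (
    ret_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    ret_key VARBINARY(64) NOT NULL,
    UNIQUE KEY ret_key (ret_key)
);

-- Each extension registers its key once and reads back whatever id
-- it was given, so no two extensions can collide.
INSERT INTO rc_external_types (ret_key) VALUES ('wikidata');
INSERT INTO rc_external_types (ret_key) VALUES ('flow');
```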
Re: [Wikitech-l] [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..
On 20/09/13 03:04, Jon Robson wrote:

> Thanks Tim for running those data. That seems to suggest the URL structure works for the most case.

I think the request rate for actual articles in the root is very, very low. And if you look at the paste I gave earlier:

http://paste.tstarling.com/p/uhtFqg.html

there's reason to think that the amount of traffic that comes from naive readers typing URLs and expecting an article is much smaller than even 149k per week. A naive user would be more likely to type a URL starting with a lower-case letter, and if you take those entries, and filter out the obvious client bugs and typos, that leaves only 39 log entries. If we filter out some more log entries that are unlikely search terms for Wikipedia articles ("enregistrement-audio-musique", "is", "unlimited_data_plan", etc.), that leaves maybe 30.

http://paste.tstarling.com/p/KWuHif.html

Of these, only 12 actually correspond to Wikipedia articles or redirects:

abolition
addicting_games
apple_inc
carnaval
dreamshade
facade
girls
insidious
karthik
online_coupons
snam
walkabout

So the number of naive readers actually helped by our 404 Refresh to /wiki/ is probably closer to 12k per week than 149k per week.

Personally, I think the refresh is annoying, since it makes it much more difficult to correct typos in manually-typed URLs. If you actually meant to type some non-article URL like a CSS resource, and make a typo which causes it to hit the refresh, the URL you typed is erased from your browser's address bar and history, making correction of the typo much more difficult. Maybe we should just include a link to the search page, rather than redirect or refresh.

-- Tim Starling
Re: [Wikitech-l] [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..
Tim Starling wrote:

> Personally, I think the refresh is annoying, since it makes it much more difficult to correct typos in manually-typed URLs. If you actually meant to type some non-article URL like a CSS resource, and make a typo which causes it to hit the refresh, the URL you typed is erased from your browser's address bar and history, making correction of the typo much more difficult. Maybe we should just include a link to the search page, rather than redirect or refresh.

Mark Ryan redesigned the 404 page in 2009 and specifically removed the meta refresh tag (cf. https://bugs.wikimedia.org/17316#c0). The redesigned page eventually got deployed, but the client-side refresh very sneakily moved from the HTML output to a "Refresh" header (cf. https://bugs.wikimedia.org/35052#c0).

Neither bug is resolved, if anyone is interested in helping out. :-)

MZMcBride
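[Editor's note] The two refresh mechanisms MZMcBride contrasts look like this (the delay and target URL are illustrative, not the values actually served):

```
<!-- Client-side refresh embedded in the 404 HTML output: -->
<meta http-equiv="Refresh" content="5; url=https://en.wikipedia.org/wiki/Foo">

The same behavior moved into an HTTP response header:

HTTP/1.1 404 Not Found
Refresh: 5; url=https://en.wikipedia.org/wiki/Foo
```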
Re: [Wikitech-l] [RFC]: Clean URLs- dropping /w/index.php?title=..
On 09/19/2013 10:04 AM, Jon Robson wrote:

> Thanks Tim for running those data. That seems to suggest the URL structure works for the most case.

It certainly confirms that search engines link to working links, and that users typing URLs manually are rare and (eventually) learn to prefix /wiki/. I am not that convinced that the current number of 404s says that much about the user-friendliness or aesthetics of different URL schemes, but that is beside the point (and subjective).

I see /w/index.php?title=.. as the more important clean-up, which is why the RFC is only about that aspect.

Gabriel
Re: [Wikitech-l] [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..
On 19 Sep 2013 18:23, Tim Starling tstarl...@wikimedia.org wrote: On 20/09/13 03:04, Jon Robson wrote: Thanks Tim for running those data. That seems to suggest the URL structure works for the most case. I think the request rate for actual articles in the root is very, very low. I agree. Sorry, I guess my message wasn't so clear: I meant the existing URL structure :) snip ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] People with knowledge of English swear words needed :o
Are you good at swearing? WE NEED YOU. Huggle 3 comes with vandalism prediction: it precaches diffs, including their contents, even before they are enqueued. Each edit has a so-called score, a numerical value; the higher it is, the more likely the edit is vandalism. If you want to help us improve this feature, we need a score-words list defined for every wiki where Huggle is about to be used, for example the English wiki. Each list has the following syntax (see https://en.wikipedia.org/w/index.php?title=Wikipedia:Huggle/Config&diff=573615259&oldid=573615075): score-words(score): a list of words separated by commas; the list can contain newlines, but the commas must be present. For example: score-words(200): these, are, some, words, which, presence, of, increases, the, score, each, word, by, 200, So if you know English better than I do, which you likely do, go ahead and improve the configuration file there. No worries, Huggle's config parser is very tolerant of syntax errors. If you have any other suggestions for improving Huggle's prediction, go ahead and tell us! ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
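As an illustration of the syntax Petr describes, here is a minimal sketch (in Python, purely for illustration; Huggle itself is C++, and the exact parsing rules here are assumptions) of turning a score-words list into a per-edit score:

```python
import re

def parse_score_words(text):
    """Parse 'score-words(N): w1, w2, ...' entries into a {word: score} map.
    Entries may span multiple lines; words are comma-separated."""
    scores = {}
    for m in re.finditer(r"score-words\((\d+)\):\s*([^()]+?)(?=score-words\(|$)",
                         text, re.S):
        score = int(m.group(1))
        for word in m.group(2).split(","):
            word = word.strip().lower()
            if word:  # trailing commas produce empty entries; skip them
                scores[word] = scores.get(word, 0) + score
    return scores

def score_edit(diff_text, scores):
    """Sum the scores of all listed words appearing in the diff text."""
    return sum(scores.get(w, 0) for w in re.findall(r"[a-z']+", diff_text.lower()))
```

Usage: feed the on-wiki config page text to `parse_score_words`, then call `score_edit` on the added text of each diff; higher totals suggest likelier vandalism.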
Re: [Wikitech-l] People with knowledge of English swear words needed :o
Perhaps we could use some Math here? Can we grab a list of the last, say, 100,000 edits reverted for vandalism, look at the diff, and compute a frequency score based on that? --scott ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
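Scott's idea could be prototyped along these lines (a sketch under assumptions: the diff texts are already fetched as before/after pairs, which in reality would come from the API or a dump; function names are made up):

```python
from collections import Counter
import re

def added_words(old_text, new_text):
    """Words present in the new revision but not the old one."""
    old = set(re.findall(r"[a-z']+", old_text.lower()))
    return [w for w in re.findall(r"[a-z']+", new_text.lower()) if w not in old]

def frequency_scores(reverted_pairs, normal_pairs, scale=1000):
    """Score each word by how much more often it was added in reverted
    (presumed vandalism) edits than in ordinary edits."""
    bad = Counter(w for old, new in reverted_pairs for w in added_words(old, new))
    good = Counter(w for old, new in normal_pairs for w in added_words(old, new))
    scores = {}
    for word, n in bad.items():
        # Simple ratio with add-one smoothing on the benign count.
        scores[word] = int(scale * n / (n + good[word] + 1))
    return scores
```

Run over ~100,000 reverted edits and a comparable benign sample, the output would be a ready-made score-words list.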
Re: [Wikitech-l] People with knowledge of English swear words needed :o
Le 19/09/13 11:35, Petr Bena a écrit : snip The good thing about reinventing the wheel is that you can reuse existing material :-] Cluebot-NG has such a list: http://review.cluebot.cluenet.org and it's quite an active one: http://en.wikipedia.org/wiki/Special:Contributions/ClueBot_NG It uses a variety of algorithms to determine the score of an edit: http://en.wikipedia.org/wiki/User:ClueBot_NG#Vandalism_Detection_Algorithm Maybe get in touch with them and reuse their engine? -- Antoine hashar Musso ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] People with knowledge of English swear words needed :o
On Thu, Sep 19, 2013 at 2:35 AM, Petr Bena benap...@gmail.com wrote: Are you good in swearing? WE NEED YOU I know 7 words you can add ;-) [[w:Seven dirty words]] -Chad ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] CentralNotice -- Caching and Proxies
On 09/18/2013 06:06 PM, Matthew Walker wrote: Hey all, I've been scheming for a while on how to reduce the number of calls up to the server for CentralNotice. At the same time I want to greatly reduce the number of objects I have in cache. To do this I propose changing the architecture to an intermediate proxy server plus a static JS section in the MediaWiki page head. The proxy would map down all the variables to only what is required at the time. +1 for limiting the application logic in regular text Varnishes, both from a performance and a risk management perspective. Having your own banner proxies should make it easier to tweak their behavior to your needs without the risk of taking down the entire site. Node is a good choice for this kind of task. If the total size of all unique banners is relatively small you might even be able to cache the banners in-memory instead of doing backend cache requests. Gabriel ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
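Gabriel's in-memory suggestion amounts to a small TTL cache in the proxy. A minimal sketch (Python here for brevity, though the proxy itself would likely be Node; class and parameter names are illustrative, not CentralNotice's actual API):

```python
import time

class BannerCache:
    """Tiny in-memory cache with per-entry expiry, workable when the
    total set of unique banners is small."""
    def __init__(self, ttl=300, fetch=None):
        self.ttl = ttl        # seconds before a banner is re-fetched
        self.fetch = fetch    # callback that hits the backend cache/API
        self._store = {}      # key -> (expires_at, banner_html)

    def get(self, key):
        now = time.time()
        entry = self._store.get(key)
        if entry and entry[0] > now:
            return entry[1]   # still fresh: serve from memory
        banner = self.fetch(key)
        self._store[key] = (now + self.ttl, banner)
        return banner
```

The design trade-off: every banner request is answered from process memory, at the cost of serving up to `ttl` seconds of staleness after a banner changes.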
Re: [Wikitech-l] People with knowledge of English swear words needed :o
Hi, cool, I was actually expecting someone to come up with suggestions like this. Indeed I didn't know that, and now I do. In fact, closer cooperation with ClueBot is on the TO-DO list :-) Any good algorithm for scoring vandalism is appreciated; in fact, this might be the first thing we should create hooks for, so that people can implement their own algorithms as either C++ or Python plugins which compute the score just as they like... (unfortunately I haven't managed to get the Python engine working in the Windows build yet) On Thu, Sep 19, 2013 at 4:47 PM, Antoine Musso hashar+...@free.fr wrote: snip ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] People with knowledge of English swear words needed :o
On Thu, Sep 19, 2013 at 7:19 AM, C. Scott Ananian canan...@wikimedia.org wrote: Perhaps we could use some Math here? Can we grab a list of the last, say, 100,000 edits reverted for vandalism, look at the diff, and compute a frequency score based on that? --scott This is pretty much what my GSoC student implemented in the Bayesian filter extension. If that gets some use, then those lists could easily be fed back. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Meet Bingle and Bugello
Ergh no not yet, although during the hackathon at Wikimania, Diederik and I made some big changes in Bingle to use Bugzilla's jsonrpc api, which opens a lot of doors for new cool things. I'm planning to write a blog post about it in the coming weeks - I'll work with Diederik to come up with a potential roadmap beforehand. On Thu, Sep 19, 2013 at 7:56 AM, Sumana Harihareswara suma...@wikimedia.org wrote: On 06/20/2013 09:37 AM, Brion Vibber wrote: On Jun 20, 2013 9:26 AM, Arthur Richards aricha...@wikimedia.org wrote: On Wed, Jun 19, 2013 at 4:39 PM, Andre Klapper aklap...@wikimedia.org wrote: Is there any kind of Roadmap file that lists stuff that you think would be great to get fixed next or other random ideas, for potential drive-by contributors on GitHub? Not yet, but good idea! I'll get something up. Another thing to put on that list is Yuvi's bot for GitHub pull requests to gerrit. We're starting to use this on the android Commons app, and it's pretty sweet! -- brion -- Arthur Richards Software Engineer, Mobile [[User:Awjrichards]] IRC: awjr +1-415-839-6885 x6687 ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l Hey Arthur, did you end up putting together a roadmap? :-) -- Sumana Harihareswara Engineering Community Manager Wikimedia Foundation -- Arthur Richards Software Engineer, Mobile [[User:Awjrichards]] IRC: awjr +1-415-839-6885 x6687 ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] People with knowledge of English swear words needed :o
On 19/09/13 10:35, Petr Bena wrote: Are you good in swearing? WE NEED YOU Huggle 3 comes with vandalism-prediction as it is precaching the diffs even before they are enqueued including their contents. Each edit has so called score which is a numerical value that if higher, the edit is more likely a vandalism. If you want to help us improve this feature, it is necessary to define a score words list for every wiki where huggle is about to be used, for example on English wiki. Each list has following syntax: (see https://en.wikipedia.org/w/index.php?title=Wikipedia:Huggle/Config&diff=573615259&oldid=573615075) score-words(score): list of words separated by comma, can contain newlines but comma must be present example score-words(200): these, are, some, words, which, presence, of, increases, the, score, each, word, by, 200, [[en:User:DeltaQuad/UAA/Blacklist]] contains a fairly comprehensive overview of English-language profanity and general trash-talk formatted as regexps, mixed in with other non-sweary blocking patterns that are specific to that blacklist's needs. Neil ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
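A regexp blacklist of the kind Neil describes is typically applied as one compiled alternation. A sketch (the two sample patterns below are invented placeholders, not entries from DeltaQuad's actual list):

```python
import re

def compile_blacklist(patterns):
    """Combine blacklist entries into one case-insensitive regex.
    Non-capturing groups keep findall() returning whole matches."""
    return re.compile("|".join("(?:%s)" % p for p in patterns), re.I)

# Hypothetical sample patterns, for demonstration only.
blacklist = compile_blacklist([r"stupid+", r"poo+p"])

def hits(text):
    """Return all blacklist matches found in the text."""
    return blacklist.findall(text)
```

Compiling once and scanning each edit keeps the per-edit cost low even for a long blacklist.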
Re: [Wikitech-l] [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..
Thanks Tim for running those data. That seems to suggest the URL structure works for the most case. On Wed, Sep 18, 2013 at 12:07 AM, Tim Starling tstarl...@wikimedia.org wrote: On 17/09/13 13:59, Jon Robson wrote: I would suggest taking a look at the number of 404s caused by people trying to access pages without the wiki prefix This would be interesting data to go alongside this interesting proposal... There are lots of different sorts of 404s, so it's necessary to do some filtering. For example: * double-slashes, due to bug 52253 * sitemap.xml * Apple touch icons * bullet.gif in various directories * vulnerability scanning, e.g. xmlrpc.php * BlueCoat verify/notify, as described in http://www.webmasterworld.com/search_engine_spiders/3859463.htm * Serial numbers like http://en.wikipedia.org/B008NAYASM . I filtered out everything with a dot or slash in the prospective article title, as well as the BlueCoat URLs and the UAs responsible for serial number URLs. To simplify analysis, I took log lines from the English Wikipedia only. Most of the remaining log entries were search engine crawlers, so I took those out too. The result was 149 log entries at a 1/1000 sample rate, for the week of September 8-14, implying a request rate of about 639,000 per month. This is about 0.006% of the English Wikipedia's page view rate. The 149 URLs are at http://paste.tstarling.com/p/uhtFqg.html -- Tim Starling ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l -- Jon Robson http://jonrobson.me.uk @rakugojon ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
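Tim's filtering steps can be approximated in code (a sketch under assumptions: the sampled log lines are pre-parsed into (user_agent, request_path) pairs, and the junk-generating user agents are known; both are simplifications of what the real analysis did):

```python
def plausible_article_404s(lines, bad_uas=("SerialScanner",)):
    """Keep sampled 404 log entries whose root-path target could
    plausibly be an article title."""
    kept = []
    for ua, path in lines:
        title = path.lstrip("/")
        if "." in title or "/" in title:
            continue  # files (sitemap.xml, icons), double-slash bugs, etc.
        if ua in bad_uas:
            continue  # e.g. clients requesting bare serial numbers
        kept.append(title)
    return kept
```

The surviving titles would then be checked by hand (or against the API) for whether they correspond to real articles or redirects, as in Tim's 12-of-149 result.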
[Wikitech-l] RecentChanges types (RC_* constants)
Within the Flow extension we have a need to insert our own special changes into the recentchanges table so that watchlists continue to inform users of changes in the ways they are used to. Within MediaWiki, the Wikidata extension has similar requirements and has implemented a solution that works for its use case. Flow is looking to extend this to handle multiple types of external change sources. The solution Wikidata uses to render the lines works well and will be used by Flow, but we have some concerns about how different types of external changes will be filtered by the queries that generate the Special:RecentChanges and Special:Watchlist pages. How does the current solution work? There is a field in the recentchanges table, rc_type. All Wikidata entries use the value RC_EXTERNAL (= 5) for this field. Queries are generated with either (rc_type = 5) or (rc_type != 5) when filtering is required. Requirements: - Currently Wikidata entries in recentchanges are filtered from Special:RecentChanges and Special:Watchlist. This is toggleable. By default we will not want to filter Flow entries, but will want to offer a toggle much like Wikidata does. - More types of external change sources should be able to add themselves in the future without core changes. - We should play nice with the db slaves serving up watchlists. There are a couple of options, each with their own tradeoffs. 1. Use rc_type = RC_EXTERNAL and add a new field to the recentchanges table, rc_external_type. This would be a varchar(16) field. Wikidata and Flow would put their respective names in the field to distinguish between each other. This is conceptually simple, but makes the queries look even odder: (rc_type != 5) becomes (rc_type != 5 OR rc_external_type != 'wikidata'). 2. Similar to 1, but instead of creating a new field, reuse the rc_log_type field, which is only used when rc_type = RC_LOG. This seems a bit hacky, but would only need a field rename to not feel so hacky.
I'm not proposing to rename the field, though, as there are a variety of extensions depending on the current field name and we are not going to coordinate getting them all updated at the exact same time. The fact that this field is used by various extensions may be a hint that we shouldn't reuse it. 3. Replace RC_EXTERNAL with RC_WIKIDATA and RC_FLOW constants in their respective extensions. This is also straightforward, but adds development overhead to ensure future creators of RC_* constants do not conflict with each other. It would be handled similarly to NS_* constants, with an on-wiki list. I have heard some mention that naming conflicts have occurred in the past with this solution. This would force queries looking for only core sources of change to provide an inclusive list of RC_* values to find, rather than using rc_type != RC_EXTERNAL.

Things to consider: On smaller wikis, Wikidata changes can account for 50% of the changes. Talk namespace edits, which we expect to eventually replace with Flow edits, account for ~20% of enwiki recentchanges rows.

The standard query issued by Special:RecentChanges is:

SELECT /* lots of fields */ FROM `recentchanges` FORCE INDEX (rc_timestamp)
  LEFT JOIN `watchlist` ON (wl_user = '2' AND (wl_title=rc_title) AND (wl_namespace=rc_namespace))
  LEFT JOIN `tag_summary` ON ((ts_rc_id=rc_id))
  WHERE (rc_timestamp >= '2013091200') AND rc_bot = '0' AND (rc_type != 5)
  ORDER BY rc_timestamp DESC LIMIT 50

The standard query issued by Special:Watchlist is:

SELECT /* lots of fields */ FROM `recentchanges`
  INNER JOIN `watchlist` ON (wl_user = '2' AND (wl_namespace=rc_namespace) AND (wl_title=rc_title))
  LEFT JOIN `page` ON ((rc_cur_id=page_id))
  LEFT JOIN `tag_summary` ON ((ts_rc_id=rc_id))
  WHERE (rc_timestamp > '20130916175626') AND (rc_this_oldid=page_latest OR rc_type=3) AND (rc_type != 5)
  ORDER BY rc_timestamp DESC

Without further input I will be implementing option 3 from above. I welcome any input on better solutions, or potential pitfalls with this solution. Erik Bernhardson ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
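To make option 3 concrete, the "inclusive list" filtering could look roughly like this (a Python-flavoured sketch of the condition-building; MediaWiki itself assembles these conditions in PHP, and all constant values except RC_EXTERNAL = 5 are illustrative guesses, as are the hypothetical RC_WIKIDATA/RC_FLOW values):

```python
# Illustrative constants; the email only states RC_EXTERNAL = 5.
RC_EDIT, RC_NEW, RC_LOG, RC_EXTERNAL = 0, 1, 3, 5
# Hypothetical per-extension constants replacing RC_EXTERNAL under option 3,
# coordinated via an on-wiki list like NS_* constants:
RC_WIKIDATA, RC_FLOW = 6, 7

def core_only_cond(core_types=(RC_EDIT, RC_NEW, RC_LOG)):
    """Option 3: name the core types you want, instead of relying on
    'rc_type != RC_EXTERNAL', which breaks once externals multiply."""
    return "rc_type IN (%s)" % ", ".join(str(t) for t in sorted(core_types))
```

The cost Erik notes is visible here: every query that previously excluded one value must now enumerate the types it wants, and stay in sync as new RC_* constants appear.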