Re: [Wikitech-l] Search and accents
On Fri, Jun 12, 2015 at 5:33 PM, Lars Aronsson l...@aronsson.se wrote:

This is a suggestion to change search, so it ignores postfix accents. Russian dictionaries (including Wiktionary) use accents to indicate stress on syllables, but these accents are never seen in plain text. In Russian Wiktionary, the verb бороться has the inflected form боритесь (imperative, plural), which does not have an entry of its own, but appears in a fact box (table) of inflected forms. However, since this is a dictionary, the word in the box is written with an accent: бори́тесь
https://ru.wiktionary.org/wiki/бороться
(I do realize that it would be possible to add redirect entries for all such inflected forms, but this has not been done in ru.wiktionary.)

Searching for бори́тесь (which nobody would do) finds the relevant page,
https://ru.wiktionary.org/w/index.php?search=бори́тесь
but searching for боритесь (the normal thing) does not find the relevant page,
https://ru.wiktionary.org/w/index.php?search=боритесь

Note that Unicode doesn't contain accented versions of Cyrillic letters. Instead, the accent is made by suffixing a separate accent sign.

$ echo и | od -c
0000000 320 270 \n
$ echo и́ | od -c
0000000 320 270 314 201 \n

That makes sense to me. I've filed it as https://phabricator.wikimedia.org/T102298 and we'll get it prioritized. Let me know if you don't like how I just copied your (very good) email into the issue and I'll try to re-summarize.

Nik
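[For readers who want to see the fix Lars describes in action: one way to normalize such text is to decompose it (NFD) and drop the combining marks. This is only an illustrative sketch using PHP's intl extension, not how CirrusSearch ultimately implemented it; in production this would be an analyzer-level change in Elasticsearch.]

<?php
// Decompose to NFD so the combining acute (U+0301) becomes a separate
// code point, then strip all combining marks (\p{Mn}).
// Requires the intl extension for the Normalizer class.
$accented = 'бори́тесь';
$decomposed = Normalizer::normalize( $accented, Normalizer::FORM_D );
$stripped = preg_replace( '/\p{Mn}/u', '', $decomposed );
echo $stripped, "\n"; // боритесь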
Re: [Wikitech-l] Feedback requested on our search APIs
On Tue, Jun 9, 2015 at 2:19 AM, Gergo Tisza gti...@wikimedia.org wrote:

On Mon, Jun 8, 2015 at 4:16 PM, Brian Wolff bawo...@gmail.com wrote:

Additionally, from the help page, it's not entirely clear about some of the limitations. e.g. you can't do incategory:Foo OR intitle:bar. Regexes on intitle don't seem to work over the whole title, only word level tokens (I think, maybe? I'm a bit unclear on how the regex operator works).

Being able to see a parse tree of the search expression would be nice, like with the parse/expandtemplates APIs. That would make it easier to find out whether the search fails because the query is parsed differently from what you imagined, or because there really is nothing to return.

You can _kind of_ get that now by adding the cirrusDumpQuery url parameter. But it only dumps the query as sent by Cirrus to Elasticsearch, and that contains a query_string query that Elasticsearch (Lucene really) parses on its own.

One interesting option would be to make a way for Cirrus to return Elasticsearch's explain results. It's not perfect because it only explains why things are found and scored the way they are; it doesn't explain why things aren't found. Exporting the actual parsed query is more ambitious.

Nik
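[For those following along, the parameter Nik mentions is tacked onto a normal search URL. An illustrative example (any Cirrus-backed wiki should do; the query is made up):

https://www.mediawiki.org/w/index.php?search=incategory:Foo+intitle:bar&cirrusDumpQuery=1

This returns the JSON Cirrus hands to Elasticsearch rather than the search results page.]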
Re: [Wikitech-l] Feedback requested on our search APIs
On Mon, Jun 8, 2015 at 7:16 PM, Brian Wolff bawo...@gmail.com wrote:

You can't do incategory:Foo OR intitle:bar. Regexes on intitle don't seem to work over the whole title, only word level tokens (I think, maybe? I'm a bit unclear on how the regex operator works).

intitle is word level, though you can do phrase searching. It's pretty much the same as a regular search but limited to the title field.

incategory:Foo OR intitle:Bar is a limitation I'm working on now. No idea when it'll be available. The limitation comes from us trying to be cute with the command parsing in Cirrus and not writing a whole grammar for the query language.

Regexes only work for wikitext. This is a somewhat arbitrary decision on my part - we need to make special ngram fields to accelerate the regex searching and we only do that for wikitext. We _can_ do it for other fields at the cost of update time and disk space.

Nik
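[To make the distinction concrete, here are some hypothetical queries of the kind being discussed; only the last combination is the unsupported one:

intitle:bar                     - matches the word "bar" anywhere in the title
intitle:"foo bar"               - phrase search within the title
incategory:Foo intitle:bar      - combining filters with an implicit AND works
incategory:Foo OR intitle:bar   - the OR combination that doesn't work yet]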
Re: [Wikitech-l] First impression with Differential
Some comments inline!

On Thu, May 21, 2015 at 4:43 AM, Quim Gil q...@wikimedia.org wrote:

Hi, thank you for this short and fresh review. Your help is welcome at https://phabricator.wikimedia.org/T597, where we are trying to identify blockers for using Arcanist, so we can discuss them and address them effectively. Meanwhile, some comments here.

On Thu, May 21, 2015 at 9:01 AM, Ricordisamoa ricordisa...@openmailbox.org wrote:

review rant

Arcanist has to be manually cloned from Git and added to $PATH.

Really? Having seen how users struggle installing git-review and dependencies in their environments, I'm not sure this is a bad idea. Plus, I guess it makes updating to master pretty easy as well?

This isn't _that_ big a deal to me. git-review wasn't any easier to install. I'd prefer to have to install nothing, but if I have to install something your description doesn't sound _that_ bad.

Test Plan is required.

Sounds like a good practice to me. Worst case scenario, type "I didn't test this patch at all."

We can turn this off, I think: http://stackoverflow.com/questions/20598026/how-do-i-disable-test-plan-enforcement-in-phabricator

I suspect we _should_ turn it off too, because we should be minimizing the number of changes we have to make when we switch tools. I'm not against requiring one for most commits at some point, but that should be a separate thing. I should mention that I've never seen other open source projects require it, for what that is worth.

.arcconfig should be automatically detected on git clone.

I can't review my own revisions.

Neither should you, that is the point of code review. Then again, if there is no workaround for this, it might be a blocker for urgent Ops deployments (where we see many self-merged patches) and one-person projects. If this is the case, please create a blocker for T597 so we can discuss it in detail.

https://phabricator.wikimedia.org/T99905

I laid out my argument there but it goes the same as the test plan argument: we do it now and we shouldn't change just to support the tool. We should change because we believe it's a good idea.

Nik
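[For reference, if memory serves the knob behind that Stack Overflow answer is a single Phabricator config flag, set from the Phabricator root roughly like this; treat the exact key name as an assumption and check the linked answer:

$ ./bin/config set differential.require-test-plan-field false]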
Re: [Wikitech-l] Per-user search query limiting being deployed to wmf wikis
On Mon, May 18, 2015 at 8:50 PM, MZMcBride z...@mzmcbride.com wrote:

Jonathan Morgan wrote:

On Mon, May 18, 2015 at 5:08 PM, Bahodir Mansurov bmansu...@wikimedia.org wrote:

I doubt all 200 students will be making concurrent searches.

I can easily imagine a scenario where 200 students in a large lecture classroom might be instructed to open their laptops, go to Wikipedia, and search for a particular topic at the same time. Similar to how teachers [used to] say "now everyone in the class turn to Chapter 8". If that is indeed what we're talking about here, it will be disruptive.

I imagine the more common cases involve either distributing a URL or instructing students to search for a particular topic, which typically routes through Google or Yahoo! or some external search engine.

Both of these cases wouldn't be disrupted, as I understand it. We'll still keep an eye on it. More worrying is the assertion that some countries come through a surprisingly small number of IPs for some reason.

I've got a pretty itchy rollback finger and deploy rights. That said, I'm not sure what this thread is about. What problem are we trying to solve? Are we having issues with concurrent searches? Does anyone have links to Phabricator Maniphest tasks or Gerrit commits?

This is the last of some security recommendations that came out of a brownout a few months ago, caused by someone finding an inefficient query and _hammering_ the reload button a couple hundred times. I'd link to the bug but it contains reproduction steps, so it's under some level of lock and key. The fix is us-specific, so it's possible the issue is repeatable against other Lucene/Elasticsearch/SOLR users. As I said, we've since prevented it from being exploitable on our side.

If we have to increase the limits or add whitelists we will. It'll be nice to have some protection but I'm sensitive to it causing trouble. I expect Erik will be monitoring the logs tonight PDT time and I'll have a look early tomorrow morning EDT.

The relevant commit in gerrit is https://gerrit.wikimedia.org/r/#/c/210622/ .

Nik
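[Background for anyone unfamiliar with the mechanism: this kind of limiting goes through MediaWiki's PoolCounter. A sketch of what such configuration looks like is below; the key name and numbers here are placeholders of mine, not the values in the gerrit change above.

// In LocalSettings.php (or wmf-config). Illustrative values only.
$wgPoolCounterConf['CirrusSearch-Search'] = array(
	'class' => 'PoolCounter_Client',
	'timeout' => 15,    // seconds a request will wait for a slot
	'workers' => 432,   // concurrent searches allowed
	'maxqueue' => 600,  // beyond this, reject instead of queueing
);]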
Re: [Wikitech-l] Per-user search query limiting being deployed to wmf wikis
On Mon, May 18, 2015 at 9:30 PM, John phoenixoverr...@gmail.com wrote:

If the stressor point is a few hundred hits, let's pick a value low enough not to risk reaching the max, but high enough to not risk excessive collateral damage. Something along the lines of 40-50 would avoid most accidental triggers and be low enough to limit server stress. It's far better to incrementally step the limit down to reach optimal values than to cut back radically and piss everyone off until you can raise the threshold.

I bumped the limit from 5 to 15.
Re: [Wikitech-l] Connecting to github community
On Tue, May 5, 2015 at 4:28 PM, Bryan Davis bd...@wikimedia.org wrote:

Facebook uses a bot to transfer pull requests from GitHub [5] to their Phabricator instance [6] for HHVM.

I gotta say I wasn't thrilled with it. It just felt all disjointed and broken. As much as I like the idea of lowering the barrier to entry, it felt like a bait and switch. I saw github issues and sent a pull request and was bounced to some other system where I needed yet another account. At least with our setup it's clear up front what you are getting into.

A two-way sync bot like that, speaking as the proper user, would be pretty awesome.

Nik
Re: [Wikitech-l] MediaWiki-Vagrant now has support for install in Linux Containers (LXC)
I've just tried it and it seems to be working well!

I heard that some OS X users were seeing huge, huge performance problems with vagrant, something about having to run it inside a VM. I imagine running LXC inside a VM is much less painful than running VirtualBox.

On Tue, Mar 3, 2015 at 3:24 PM, Dan Duvall dduv...@wikimedia.org wrote:

Thanks a ton, Bryan! I know many users have been concerned with the hefty memory requirements (not to mention VT-x requirements) of MW-Vagrant+VirtualBox, especially on lower end hardware. This should be a huge help. Labs users can definitely look to benefit from this feature as well (once the Vagrant 1.7.x kinks are worked out [0]).

[0]: https://gerrit.wikimedia.org/r/#/c/193665/

On Tue, Mar 3, 2015 at 11:26 AM, Bryan Davis bd...@wikimedia.org wrote:

We have working support for installing MediaWiki-Vagrant in an LXC container now! See the instructions in support/README-lxc.md [0] for a description of how to use it from an Ubuntu 14.04 host computer. Patches are welcome giving alternate instructions for other distributions. Note that Vagrant 1.7+ is required for the latest version of the vagrant-lxc plugin that this uses, so you will probably not be able to install Vagrant from a package repo unless you are running Debian unstable.

This should make using MediaWiki-Vagrant a lighter weight experience for users who are running Linux on their laptops. I took a shot at this right after Wikimania last year by figuring out how to use MediaWiki-Vagrant to provision a Docker container. That experiment made a system that was too unstable for me to promote anyone using it as more than a proof of concept. Since then I've been meaning to try out LXC by using the vagrant-lxc plugin [1] and last weekend I finally found the time. Thanks to Marko Obrovac and Dan Duvall for helping test this.

[0]: https://phabricator.wikimedia.org/diffusion/MWVA/browse/master/support/README-lxc.md
[1]: https://github.com/fgrehm/vagrant-lxc

-- 
Bryan Davis              Wikimedia Foundation    bd...@wikimedia.org
[[m:User:BDavis_(WMF)]]  Sr Software Engineer    Boise, ID USA
irc: bd808               v:415.839.6885 x6855

-- 
Dan Duvall
Automation Engineer
Wikimedia Foundation
http://wikimediafoundation.org
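[If you want to give it a spin, the basic flow looks roughly like the following, assuming an Ubuntu host with the lxc packages already installed; the README linked above is the authoritative source:

$ vagrant plugin install vagrant-lxc
$ vagrant up --provider=lxc]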
[Wikitech-l] More news on Wikidata Query Indexing Strategy
tl/dr: The technology we started building against (Titan) is probably dead. We're reopening the investigation for a backing technology.

Yesterday DataStax (http://www.datastax.com/) announced that they'd acquired ThinkAurelius (http://thinkaurelius.com/), the company for whom almost all the Titan developers work: http://www.datastax.com/2015/02/datastax-acquires-aurelius-the-experts-behind-titandb

The ZDNet article (http://www.zdnet.com/article/datastax-snaps-up-aurelius-and-its-titan-team-to-build-new-graph-database/) made it pretty clear that they are killing the project: "We're not going to do an integration. The play here is we'll take everything that's been done on Titan as inspiration, and maybe some of the Titan project will make it into DSE Graph," DataStax engineering VP Martin Van Ryswyk said.

While it's certainly possible that someone from the community will come out of the woodwork and continue Titan, it's now lost almost all of its top developers. It looks like there are some secret succession discussions going on, but I'm not holding out hope that anything will come of it.

This pretty much blows this project's schedule of having a hardware request by the end of the month and a publicly released beta at the end of March.

Anyway, we're reopening the investigation to pick a new backend. We're including more options than we had before, as it's become clear that open source graph databases are a bit of a wild west space. But there are people waiting on this. The developer summit made that clear. So we're not going to do the month-long dive into each choice like we did last time. I'm not 100% sure exactly what we'll do, but I can assure you we'll be careful. I know you might want to talk about other options - you may as well stuff them on https://www.mediawiki.org/wiki/Wikibase/Indexing#Other_possible_candidates and we'll get to them. As always, you can check out our workboard (https://phabricator.wikimedia.org/project/board/37/query/DwEBx9K4vaHo/) to see what we're actually working on.

Titan is still in the running, assuming it gets active maintainers. OrientDB, which we evaluated last round, is still in there too. So too are GraphX and Neo4j. And ArangoDB. And Magnus' WDQ - we'd get much more involved in maintenance, I think. And writing a TinkerPop implementation on top of Elasticsearch. That last one is not a serious contender; it'd get geo support for free, but it's really just a low bar to compare all the other options to.

Thanks,

Nik

https://phabricator.wikimedia.org/T88550
Re: [Wikitech-l] More news on Wikidata Query Indexing Strategy
Top posting to add context: this is for the initiative to get a version of Magnus' wonderful http://wdq.wmflabs.org/ running in production at WMF.

On Wed, Feb 4, 2015 at 4:50 PM, Nikolas Everett never...@wikimedia.org wrote:

tl/dr: The technology we started building against (Titan) is probably dead. We're reopening the investigation for a backing technology.

[snip - quoted in full above]

And, to add more context, we chose not to just immediately deploy Magnus' WDQ because we didn't want to maintain a graph database ourselves. You should now be able to appreciate the irony of the situation more thoroughly. It's healthy to find humor where you can.

Nik
Re: [Wikitech-l] SOA in .NET, or Microsoft is going open source MIT style
On Wed, Feb 4, 2015 at 5:09 AM, Yuri Astrakhan yastrak...@wikimedia.org wrote:

flame war ahead

For those not addicted to slashdot, see here: http://news.slashdot.org/story/15/02/04/0332238/microsoft-open-sources-coreclr-the-net-execution-engine . Licensed under MIT (https://github.com/dotnet/coreclr/blob/master/LICENSE.TXT), plus an additional patents promise (https://github.com/dotnet/coreclr/blob/master/PATENTS.TXT).

I'm not sure how relevant it is, but are promises legally binding?
Re: [Wikitech-l] Sane versioning for core (was: Re: Fwd: No more Architecture Committee?)
+1 for something like this. It's not a huge problem not to do semver, but it'd be simpler to explain if we did.

On Sun, Jan 25, 2015 at 10:27 AM, Legoktm legoktm.wikipe...@gmail.com wrote:

On 01/15/2015 08:26 PM, Chad wrote:

I've been saying for over a year now we should just drop the 1. from the 1.x.y release versions. So the next release would be 25.0, 26.0, etc etc.

+1, let's do this. It would allow us to follow semver and still retain our current version number history instead of waiting for a magical 2.0.

-- Legoktm
Re: [Wikitech-l] Attracting new talent to our projects
On Sat, Jan 3, 2015 at 11:30 PM, MZMcBride z...@mzmcbride.com wrote:

Jon Robson wrote:

Thoughts?

Adding easter eggs sounds like a fairly strange recruitment tactic, but I don't see any harm in trying it out and seeing what happens. It's not totally clear to me what problem we're trying to solve here (if any). It's also not completely clear to me whether you want to recruit for the Wikimedia Foundation specifically or for the Wikimedia movement. Depending on the specifics, certain solutions might be more or less appropriate.

I think the best thing for recruiting for MediaWiki (the open source project) is the extraction portion of the Librarization project (https://www.mediawiki.org/wiki/Library_infrastructure_for_MediaWiki). Breaking MediaWiki into parts will get it used in more places, and the more people that rely on it, the more people will contribute to it. Making reusable PHP libraries as opposed to services is doubly good at getting contributions, because the people integrating with them will also be PHP developers, so they'll be able to contribute reasonably quickly.

My expertise doesn't really extend beyond the open source project, so I won't guess at ways to recruit for the movement or the foundation.

Nik
Re: [Wikitech-l] Phabricator migration part II: Replacing gitblit with Diffusion
On Nov 29, 2014 1:58 PM, Legoktm legoktm.wikipe...@gmail.com wrote:

On the talk page I suggested dropping the G prefix for top-level repos, and just giving them an unprefixed callsign. I think that would fix the ugliness of some of the frequently used names.

On Sat, 29 Nov 2014 17:51:11 +0100, Chad innocentkil...@gmail.com wrote:

The only exception I'd make is MediaWiki. Under this scheme the callsign is MWMW. MediaWiki should be just plain MW.

If we rename mediawiki/core to just mediawiki it becomes a top-level repo and can just be MW.

On 29 November 2014 at 09:13, Bartosz Dziewoński matma@gmail.com wrote:

Feels slippery. Next thing you know, someone will want VE and SMW. :)

If the VisualEditor/VisualEditor repo becomes VisualEditor I don't see anything against naming it VE.

On 11/29/14 9:26 AM, James Forrester wrote:

When we have VE-WordPress and VE-Drupal and VE-Joomla and whatever, we'll put them as… GVEW, GVED, GVEJ *etc.* and just hope we never have a first character clash on integrations?

I think we would set up VE as a prefix rather than dumping them under general, so: VEWP, VEDP, VEJM (or whatever).

-- Legoktm

+1
Re: [Wikitech-l] Phabricator migration part II: Replacing gitblit with Diffusion
I think that is a bit sad. Not "tearing of clothes or gnashing of teeth" sad. Maybe "stare wistfully into the sunset and think of what could have been" sad. I'd prefer not to have them, but I ultimately don't care that much. It does provide a fun bikeshedding opportunity, I guess.

Nik

On Nov 26, 2014 12:52 AM, Chad innocentkil...@gmail.com wrote:

No we can't not.

-Chad

On Tue, Nov 25, 2014, 9:11 PM MZMcBride z...@mzmcbride.com wrote:

James Forrester wrote:

We need to agree how we are going to name our repos, and, much more importantly because it can't change, what their callsign is. These will be at the heart of e-mails, IRC notifications and git logs for a long time, so it's important to get this right rather than regret it after the fact. A handful of repos are so important and high-profile that we can use an acronym without too much worry, like MW for MediaWiki or VE for VisualEditor. For the rest, we need to make sure we've got a good enough name that won't cause inconveniences or confusion, and doesn't repeat the mistakes we've identified over time. We've learnt since the SVN to git migration a few years ago that calling your repository /core is a bad plan, for instance.

Could we not? JIRA does this prefixing with tickets and I don't really understand its purpose. We already have Git hashes and positive integers. Is another scheme really needed? And what was wrong with the repository names again?

I was pleased that Maniphest simply uses T as a prefix. I'm kind of bummed out that Diffusion is introducing shouting obscure immutable abbreviations.

MZMcBride
Re: [Wikitech-l] Phabricator repository callsigns
On Thu, Nov 13, 2014 at 4:14 PM, Brian Wolff bawo...@gmail.com wrote:

On 11/13/14, Chad innocentkil...@gmail.com wrote:

Please help me draft some guidelines for Phabricator repo callsigns.

https://www.mediawiki.org/wiki/Phabricator/Callsign_naming_conventions

The subpage on naming our existing repos should be especially fun:

https://www.mediawiki.org/wiki/Phabricator/Callsign_naming_conventions/Existing_repositories

Bikeshedding on the second hardest problem in CS? Who on this list can pass up a chance to join in there? ;-)

-Chad

Do we get full unicode, including astral characters? If so, I vote MediaWiki be (U+1F33B).

If we're going unicode, why not U+2620? We could make Cirrus U+2601 or maybe U+5377 U+96F2.

Nik
Re: [Wikitech-l] Proposed timeline for remaining Cirrus/Elastic rollout
On Thu, Oct 30, 2014 at 10:18 PM, MZMcBride z...@mzmcbride.com wrote:

James Forrester wrote:

On 30 October 2014 09:53, Chad innocentkil...@gmail.com wrote:

New hardware is in place and we've got plenty of breathing room to wrap up the migration to the new search engine.

Excellent news!

Indeed! Thanks to all who made this possible. An independent search engine is an incredibly important piece of infrastructure that's now getting a more appropriate level of attention and love. This is great and I'm excited to see what we'll be able to (continue to) build on top of it.

I lost track of the discussion about the ability to run regular expressions across wikitext. I see a large amount of opportunity in being able to search through wikitext in real-time. There are plenty of findable issues and errors in our articles and search is a key component in improving the situation.

I have OK news and bad news on that front, unfortunately.

We used to have brute force regex searches and they were pretty much garbage. It was just too easy to write a search that would take minutes to complete. That would cause the queue of other regex searches to get backed up. And it'd time out on the varnish side, so the user, even if they had the patience to wait 5 minutes for a result, wouldn't get one.

The OK news is that we cut over to using trigram accelerated regex searches about a week ago. It's better. Almost usable, but not quite. The worst case run time is now 30 seconds. If you search for something that is rare it'll probably come back in a few seconds. There are issues with consistency when your request runs really long as well (https://bugzilla.wikimedia.org/show_bug.cgi?id=72128). Far from perfect, but serviceable.

The bad news: we've had two Cirrus outages in the past week that I believe are caused by the accelerator, so it's disabled and we're back to brute force for now. The first outage was on Monday and we didn't have a clue what caused it. We added logging to learn for the next time and it didn't happen again until this morning. The extra logging failed (:shakes fist:) but we were able to implicate this code in the process. The silver lining is that I'll be working on it again and might be able to get some speed improvements while we're in there.

I'm not sure what the outage says about the schedule. I'll have to do some thinking about that. In the meantime it does say that we should keep the old search there as a backup. We've been able to fall back to it during the outages to minimize the suffering.

Nik
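[For readers who haven't used the feature: these are the searches written with the insource regex syntax, for example (the pattern here is mine):

insource:/\[\[[Cc]ategory:/

A rare literal substring inside the pattern is what lets the trigram index narrow the candidate pages before the brute force regex has to run.]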
Re: [Wikitech-l] DB performance review: Wikibase Usage Tracking
I can't access those links!

On Tue, Sep 16, 2014 at 11:14 AM, Daniel Kinzler daniel.kinz...@wikimedia.de wrote:

Hi all!

The Wikibase team would like to allow data from any item to be used on any client page. To do this, we need to track which item is being used where, so we can purge the appropriate pages when the item changes. We would like people with database experience to look at our proposal and let us know about any concerns, especially wrt performance.

Here you'll find a proposal for two database tables for tracking the usage of entities across wikis:

https://gerrit.wikimedia.org/r/#/c/158078/9/usagetracking/includes/Usage/Sql/entity_usage.sql,unified
https://gerrit.wikimedia.org/r/#/c/158078/9/subscription/includes/Subscription/Sql/entities_per_client.sql,unified

The entity_usage table would be on every client, recording which entity is used on which page (kind of like the iwlinks table). The entity_per_client table would be on the repo, and track which wiki (client) is interested in changes to which entity.

Please have a look and let me know if you have any questions or suggestions, especially with regards to the following use cases.

The following would happen when editing/re-parsing a page on a client wiki (e.g. wikipedia):

* get all entities used on a given page from entity_usage
* delete rows based on a page id and a list of entity ids from entity_usage
* insert rows for a page / entity pair into entity_usage
* query rows for a set of entities from entity_usage (with no page id specified)
* add rows for a set of (newly used) entities to the entity_per_client table
* remove rows for a set of (no longer used) entities from the entity_per_client table

The following would happen when dispatching a change from wikibase:

* looking up interested wikis for a list of entities from the entity_per_client table
* (notification via the job queue)
* looking up pages to be purged/updated based on a list of entity ids (and possibly an aspect id) in the entity_usage table

-- daniel

-- 
Daniel Kinzler
Senior Software Developer
Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.
Re: [Wikitech-l] Parser cache update/migration strategies
Also, option 5 could be to continue without the data until the parser cache is invalidated on its own. Maybe option 6 could be to continue without the data and invalidate the cache and completely rerender only some of the time. Like 5% of the time for the first couple hours, then 25% of the time for a day, then 100% of the time after that. It'd guarantee that the cache is good after a certain amount of time without causing a big spike right after deploys.

All those options are less good than just updating the cache, I think.

Nik

On Sep 9, 2014 6:42 AM, aude aude.w...@gmail.com wrote:

On Tue, Sep 9, 2014 at 12:03 PM, Daniel Kinzler dan...@brightbyte.de wrote:

Hi all!

tl;dr: How to best handle the situation of an old parser cache entry not containing all the info expected by a newly deployed version of code?

We are currently working to improve our usage of the parser cache for Wikibase/Wikidata. E.g., we are attaching additional information related to languagelinks to the ParserOutput, so we can use it in the skin when generating the sidebar. However, when we change what gets stored in the parser cache, we still need to deal with old cache entries that do not yet have the desired information attached.

Here's a few options we have if the expected info isn't in the cached ParserOutput:

1) ...then generate it on the fly. On every page view, until the parser cache is purged. This seems bad, especially if generating the required info means hitting the database.

2) ...then invalidate the parser cache for this page, and then a) just live with this request missing a bit of output, or b) generate on the fly, or c) trigger a self-redirect.

3) ...then generate it, attach it to the ParserOutput, and push the updated ParserOutput object back into the cache. This seems nice, but I'm not sure how to do that.

https://gerrit.wikimedia.org/r/#/c/158879/ is my attempt to update the ParserOutput cache entry, though it seems too simplistic a solution. Any feedback on this would be great, or suggestions on how to do this better, or maybe it's a crazy idea. :P

Cheers, Katie

4) ...then force a full re-rendering and re-caching of the page, then continue. I'm not sure how to do this cleanly.

So, the simplest solution seems to be 2, but it means that we potentially invalidate the parser cache of *every* page on the wiki (though we will not hit the long tail of rarely viewed pages immediately). It effectively means that any such change requires all pages to be re-rendered eventually. Is that acceptable?

Solution 3 seems nice and surgical, just injecting the new info into the cached object. Is there a nice and clean way to *update* a parser cache entry like that, without re-generating it in full? Do you see any issues with this approach? Is it worth the trouble?

Any input would be great!

Thanks, daniel

-- 
Daniel Kinzler
Senior Software Developer
Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

-- 
@wikimediadc / @wikidata
Re: [Wikitech-l] Parser cache update/migration strategies
On Tue, Sep 9, 2014 at 8:00 AM, Daniel Kinzler dan...@brightbyte.de wrote:

On 09.09.2014 13:45, Nikolas Everett wrote:

All those options are less good than just updating the cache, I think.

Indeed. And that *sounds* simple enough. The issue is that we have to be sure to update the correct cache key, the exact one the ParserOutput object in question was loaded from. Otherwise, we'll be updating the wrong key, and will read the incomplete object again, and try to update again, and again, on every page view.

Sadly, the mechanism for determining the parser cache key is quite complicated and rather opaque. The approach Katie tries in I1a11b200f0c looks fine at a glance, but even if I can verify that it works as expected on my machine, I have no idea how it will behave on the more strange wikis on the live cluster. Any ideas who could help with that?

No, not really. My only experience with the parser cache was accidentally polluting it with broken pages one time.

I suppose one option is to be defensive around reusing the key. I mean, if you could check the key used to fetch from the parser cache and you had a cache hit, then you know if you do a put you'll be setting _something_.

Another thing - I believe uncached calls to the parser are wrapped in pool counter acquisitions to make sure no two processes spend duplicate effort. You may want to acquire that to make sure anything you do that is heavy doesn't get done twice. Once you start talking about that, it might just be simpler to invalidate the whole entry.

Another option: kick off some kind of cache invalidation job that _slowly_ invalidates the appropriate parts of the cache. Something like how the varnish cache is invalidated on template change. That gives you marginally more control than randomized invalidation.

Nik
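[To make option 3 concrete, the shape of the update would be something like the sketch below. This is illustrative only: the extension data key is made up, $wikiPage / $parserOptions / $langLinkData are assumed to be in scope, and, as Daniel says, the hard part is guaranteeing $parserOptions matches the key the entry was cached under.

// Re-fetch the cached entry, attach the missing data, and write it back
// under (hopefully) the same key.
$parserCache = ParserCache::singleton();
$parserOutput = $parserCache->get( $wikiPage, $parserOptions );
if ( $parserOutput && $parserOutput->getExtensionData( 'wikibase-langlinks' ) === null ) {
	$parserOutput->setExtensionData( 'wikibase-langlinks', $langLinkData );
	$parserCache->save( $parserOutput, $wikiPage, $parserOptions );
}]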
[Wikitech-l] Recent vagrant issues
If you've just started having vagrant issues, particularly if `vagrant provision` has started complaining about git and Vector, then make sure to pull the newest version of MediaWiki.

Nik
Re: [Wikitech-l] Thoughts about roles
Cirrus's dependencies are there to get the integration tests passing, and they verify some stuff that came up on Wiktionary, which doesn't force capitals. I'm all for splitting the roles into basic ones and bloated ones.

On Aug 9, 2014 3:47 PM, Chad innocentkil...@gmail.com wrote:

On Sat, Aug 9, 2014 at 3:40 PM, Max Semenik maxsem.w...@gmail.com wrote:

Currently a lot of our extension Vagrant roles are working like Swiss knives: they do everything imaginable. For example, MobileFrontend always installs 3 optional dependencies, while CirrusSearch includes its configuration for unit tests that among other things enforces $wgCapitalLinks = false, which is untypical for most MW installs.

I hate that stupid config file for Cirrus. HATE HATE HATE. I think many of these actually make development harder.

Solution? Can we split some larger roles into basic and advanced parts, so that people who need an extension to play around with or to satisfy a dependency will not be forced to emulate a significant part of WMF infrastructure?

Not a bad idea. Cirrus doesn't depend on half the things it says it does unless you're wanting to run browser tests.

-Chad
Re: [Wikitech-l] Special:Search returning errors on English Wikipedia
I'll have a look at it.

On Wed, Jul 2, 2014 at 12:29 PM, Florian Schmidt florian.schmidt.wel...@t-online.de wrote:

Hello!

It's working for me. I just opened your link in Google Chrome and searched for "Android" (search suggestions work, too) and clicked OK. After this I see the result page. Can you try deleting your cache and cookies? What browser do you use?

Kind regards,
Florian

-----Original Message-----
From: wikitech-l-boun...@lists.wikimedia.org [mailto:wikitech-l-boun...@lists.wikimedia.org] On behalf of Pine W
Sent: Wednesday, July 2, 2014 18:26
To: wikitech-l@lists.wikimedia.org
Subject: [Wikitech-l] Special:Search returning errors on English Wikipedia

I am unable to search using any of the options on https://en.wikipedia.org/w/index.php?title=Special:Search

This came to my attention when a user reported that they were unable to search English Wikipedia's help files. It turns out that none of the advanced, everything, multimedia, or content pages search functions are working. All searches return the error "An error has occurred while searching: The search backend returned an error:"

Pine
Re: [Wikitech-l] Special:Search returning errors on English Wikipedia
Hmmm - it's working for me. The first couple of times I tried it was slow, but it worked. I tried both search engine options (BetaFeature and default) and a bunch of different search options.

We don't have good logs for the default search, and that error message looks like it came from there. If it did, and it's gone now, we'll have to chalk it up to a temporary blip in the old search system that we don't really understand very well. That's a painful thing to say, but the last time I poked that system trying to fix it I took out enwiki's search for half an hour while it warmed its caches. I try not to taunt it unless it is seriously broken. Failing on all searches certainly counts, so if it does it again please reply.

The BetaFeature search log is only complaining about errors that I know about and am fixing, literally right now.

Nik

On Wed, Jul 2, 2014 at 12:34 PM, Nikolas Everett never...@wikimedia.org wrote:

I'll have a look at it.

[snip - earlier messages quoted in full above]
Re: [Wikitech-l] Finding images
On Jun 18, 2014 2:28 PM, Brian Wolff bawo...@gmail.com wrote:

On 6/18/14, Kristian Kankainen krist...@eki.ee wrote:

Hello!

I think, if one is clever enough, some categorization could be automated already. Searching for pictures based on metadata is called Concept Based Image Retrieval; searching based on the machine vision recognized content of the image is called Content Based Image Retrieval. What I understood of Lars' request is an automated way of finding the superfluous concepts or metadata for pictures based on their content.

Of course recognizing an image's content is very hard (and subjective), but I think it would be possible for many of these superfluous categories, such as "winter landscape", "summer beach" and perhaps also "red flowers" and "bicycle".

There exist today many open source Content Based Image Retrieval systems, which as I understand it basically work like this: you give them a picture, and they find you the matching pictures, accompanied by a score. Now suppose we show them a picture with known content (pictures from Commons with good metadata); then we could, to a degree of trust, find pictures with overlapping categories.

I am not sure whether this kind of automated reverse metadata labelling should be done for only one category at a time, or if some kind of category bundles would work better. Probably adjectives and items should be compounded (e.g. "red flowers").

Relevant articles and links from Wikipedia:

# https://en.wikipedia.org/wiki/Image_retrieval
# https://en.wikipedia.org/wiki/Content-based_image_retrieval
# https://en.wikipedia.org/wiki/List_of_CBIR_engines#CBIR_research_projects.2Fdemos.2Fopen_source_projects

Best wishes
Kristian Kankainen

On 18.06.2014 09:14, Pine W wrote:

Machine vision is definitely getting better with time. We have computer-driven airplanes, computer-driven cars, and computer-driven spacecraft. The computers need us less and less as hardware and software improve. I think it may be less than a decade before machine vision is good enough to categorize most objects in photographs.

Pine

Interesting. Some demo links that I found:

* http://demo-itec.uni-klu.ac.at/liredemo/

Lire has been on my list of things to look at for a while now. It's nice because it could integrate reasonably easily into Cirrus, since it is built on Lucene. I can't promise anything quick, but I'll look into the others as well.

Nik
Re: [Wikitech-l] Tell my favorite conference about your Wikimedia tech
Man! I'd love to go ride in a balloon and give a talk, but Auckland is so close to halfway around the world.

On Mon, Jun 16, 2014 at 11:36 AM, Sumana Harihareswara suma...@wikimedia.org wrote:

Thanks, Luis! And for Tyler or anyone else on this list who has the same questions:

Sometimes I come up with a talk idea by asking myself, "What do I know now that I wish I'd known a year ago?" This is a way to think about what you've learned that a lot of other people don't know as well as you. That's basically how I thought of "A Few Python Tips". To practice it in front of a small crowd first, I'm doing a tech talk this Thursday: https://www.mediawiki.org/wiki/Meetings/2014-06-19 before I talk next week at Open Source Bridge.

To give a Wikimedia tech talk about your topic: https://www.mediawiki.org/wiki/Project:Calendar/How_to_schedule_an_event

And, just like with submitting patches, don't reject *yourself* before the conference organizers have a chance to. ;-)

-Sumana

On 06/13/2014 10:52 AM, Luis Villa wrote:

On Fri, Jun 13, 2014 at 7:07 AM, Tyler Romeo tylerro...@gmail.com wrote:

I've always wanted to submit a cool MediaWiki talk to these conferences, but I have no idea what I'd talk about (or whether I'm even experienced enough to talk about anything at a conference).

The answer to that second part is yes :) LCA is not TED :)

Background on their speaker selection process and what makes for a good submission (useful for any conference, not just LCA): http://opensource.com/life/14/1/get-your-conference-talk-submission-accepted

Are there any guidelines on what would make a good talk?

http://speaking.io/plan/an-idea/ ? [All of speaking.io is useful.]

HTH-
Luis
Re: [Wikitech-l] Getting phpunit working with Vagrant
I _thought_ someone was working on getting it to just work. For now, though, if you start with a clean machine you can run the commands here: https://www.mediawiki.org/wiki/Manual:PHP_unit_testing/Installing_PHPUnit#Using_PEAR to get it installed. Make sure to use the PEAR commands, because they'll get you PHPUnit 3.7.x; PHPUnit 4.0 doesn't work with MediaWiki. Anyway, after following the PEAR commands inside your Vagrant VM, phpunit should work.

Nik

On Fri, Jun 13, 2014 at 1:44 PM, Jon Robson jdlrob...@gmail.com wrote:

Has anyone had success with this...? This is what happens when I try to run:

master x ~/git/vagrant/mediawiki/tests/phpunit $ php phpunit.php

Warning: require_once(/vagrant/LocalSettings.php): failed to open stream: No such file or directory in /Users/jrobson/git/vagrant/mediawiki/LocalSettings.php on line 130

Fatal error: require_once(): Failed opening required '/vagrant/LocalSettings.php' (include_path='.:') in /Users/jrobson/git/vagrant/mediawiki/LocalSettings.php on line 130
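[For reference, the PEAR route on that wiki page was along these lines at the time; run it inside the VM via `vagrant ssh`, and treat the wiki page as authoritative:

$ sudo pear config-set auto_discover 1
$ sudo pear install --alldeps pear.phpunit.de/PHPUnit]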
Re: [Wikitech-l] How to show the page AND section for CirrusSearch search results
On Fri, May 9, 2014 at 8:32 AM, J jollylittlebottom jollylittlebot...@hotmail.com wrote:

If I search on http://www.mediawiki.org for "Search Weighting" I get as a result the line: Search (section Search Weighting Ideas), with links to the page and to the section. This section contains the word GeoLoc. But if I search for GeoLoc I get just the page link. I want to show this section link as a search result too. Is there an easy way, or is it a planned feature? What do I have to change in the CirrusSearch extension?

Right, the "(section *Search Weighting* Ideas)" bit is populated by matching query terms to the section titles, rather than doing something in combination with the text snippet below it. This can lead to it sometimes showing a snippet from one section and a section heading from another.

Let me have a think about how to make that better.

Nik
Re: [Wikitech-l] GeoData now uses Elasticsearch
On Thu, Apr 10, 2014 at 3:43 AM, Faidon Liambotis fai...@wikimedia.org wrote:

On Thu, Apr 10, 2014 at 05:04:38AM +0400, Max Semenik wrote:

And finally, appreciation: this was made possible only thanks to awesome help from our search team, Nik Everett and Chad Horohoe. You kick ass guys!

Extending appreciation: thanks Max, good work! This is great :)

Yeah, you did most of the work Max!

As to which wiki to go with next: look at notcirrus.dblist for all the wikis that don't have any access to Cirrus. All the others are indexing pages and would just require enabling the Cirrus integration with GeoData and running a reindex to work.

Oh, and don't pick enwiki. We're running it at -1 redundancy right now due to space concerns. So only two total copies instead of 3. We're working on fixing this, but it'll take some time.

Nik
[Wikitech-l] Next steps down the TitleValue road
Now that TitleValue has been merged - what's next? I'll admit I'm an odd choice to be sending out this email [1], but someone's got to do it.

So, I'm thinking, maybe:

1. Start on the TODO in Linker.php [2], turning it into a deprecated compatibility interface calling HtmlPageLinkRenderer.

2. Start writing code in the same fashion for an upcoming project. I believe the upcoming revision storage work might lend itself well to this.

Also, I think we should think about how we want interdependent components to come together. Right now everything must know how to make all of its dependencies. For example, LinksSearchPage must know how to build MediaWikiTitleCodec. That isn't a hardship now, but it might become one when we have 30 things like LinksSearchPage and we want to add another dependency to MediaWikiTitleCodec. I don't claim to know a whole lot about the state of the art for this problem in PHP, but I'm used to solving it with an inversion of control container: each component declares its dependencies in some way and the container makes sure that each component gets the dependencies it needs (a sketch of the idea is below). Do we need something like this now, or will we need something like this in the future?

Nik/manybubbles

[1]: I was the most against TitleValue at the Architecture Summit but have since softened my opinion. Also, the vast majority of my work is in the CirrusSearch extension and not core.

[2]: https://gerrit.wikimedia.org/r/#/c/106517/22/includes/Linker.php,cm
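[Since not everyone has seen one, here is a toy sketch of the inversion of control container idea in PHP. It is purely illustrative, not a proposal for a concrete MediaWiki API; the service names and the fake "codec" string stand in for things like MediaWikiTitleCodec.

<?php
class ServiceContainer {
	private $factories = array();
	private $instances = array();

	public function define( $name, $factory ) {
		$this->factories[$name] = $factory;
	}

	public function get( $name ) {
		// Build each service lazily, exactly once; a factory can pull
		// its own dependencies out of the container.
		if ( !isset( $this->instances[$name] ) ) {
			$this->instances[$name] = call_user_func( $this->factories[$name], $this );
		}
		return $this->instances[$name];
	}
}

$services = new ServiceContainer();
$services->define( 'LanguageCodes', function () {
	return array( 'en', 'de', 'ru' );
} );
$services->define( 'TitleCodec', function ( $c ) {
	// The one place that knows how the codec is wired together. Adding a
	// dependency later means touching only this factory, not 30 callers.
	return 'codec(' . implode( ',', $c->get( 'LanguageCodes' ) ) . ')';
} );

echo $services->get( 'TitleCodec' ), "\n"; // codec(en,de,ru)]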
Re: [Wikitech-l] [Wikitech-ambassadors] Roadmap and deployment highlights - week of March 31st
On Fri, Mar 28, 2014 at 4:57 PM, Greg Grossmeier g...@wikimedia.org wrote:

== Wednesday ==
* Cirrus Search will be graduated from Beta Feature to enabled for all users on all non-wikipedia wikis (e.g. Commons, etc.)
** https://www.mediawiki.org/wiki/Search

I'd prefer to do Commons on its own some other time because it is much higher traffic. Also, we're not even a BetaFeature on a few non-wikipedias, and it wouldn't be fair (or even work) to just switch them on too. So: all non-wikipedias that aren't Commons, Meta, or Incubator.

As always, I'm open to hearing about any show stopper issues you find while trying Cirrus as a BetaFeature - we won't deploy it to a wiki that it'll make worse.

Sorry for the late notice,

Nik Everett/manybubbles
Re: [Wikitech-l] Bach redirecting to Bạch (notice the dot under the a)
Are either of you opted into the New Search BetaFeature?

Nik

On Mon, Mar 17, 2014 at 9:01 AM, John phoenixoverr...@gmail.com wrote:

It works for me.

On Monday, March 17, 2014, David Cuenca dacu...@gmail.com wrote:

Hi,

When I type "bach" in the top right en.wp search box, I only have the option to select "Bach" from the list. This option however takes me to Bạch (with a dot under the a): https://en.wikipedia.org/wiki/B%E1%BA%A1ch

However, when I type the URL I'm taken to the right article: https://en.wikipedia.org/wiki/Bach

Is this a problem with the search box? I wanted to report the bug, but I didn't know which component to report it against.

Cheers, Micru
Re: [Wikitech-l] Bach redirecting to Bạch (notice the dot under the a)
Filed: https://bugzilla.wikimedia.org/show_bug.cgi?id=62727

I figured out the problem and kicked off a process to fix it. You should be able to opt back in in a few hours and the problem will have gone away.

Nik

On Mon, Mar 17, 2014 at 9:19 AM, David Cuenca dacu...@gmail.com wrote:

I was! When opting out, it works fine. Thanks for the hint!

Micru

[snip - earlier messages quoted in full above]

-- 
Etiamsi omnes, ego non
Re: [Wikitech-l] Bach redirecting to Bạch (notice the dot under the a)
Looks like I lied: I'll have to make a software change to fix this after all. It'll be more than a few hours, but I'll reply on the bug when it is really fixed.

On Mon, Mar 17, 2014 at 9:44 AM, Nikolas Everett never...@wikimedia.org wrote:

Filed: https://bugzilla.wikimedia.org/show_bug.cgi?id=62727

I figured out the problem and kicked off a process to fix it. You should be able to opt back in in a few hours and the problem will have gone away.

Nik

[snip - earlier messages quoted in full above]
[Wikitech-l] CirrusSearch outage Feb 28 ~19:30 UTC
CirrusSearch flaked out Feb 28 around 19:30 UTC and I brought it back from the dead around 21:25 UTC. During the time it was flaking out, searches that used it (mediawiki.org, wikidata.org, ca.wikipedia.org, and everything in Italian) took a long, long time or failed immediately with a message about this being a temporary problem we're working on fixing.
Events:
* We added four new Elasticsearch servers in Rack D (yay) around 18:45 UTC
* The Elasticsearch cluster started serving simple requests very slowly around 19:30 UTC
* I was alerted to a search issue on IRC at 20:45 UTC
* I fixed the offending Elasticsearch servers around 21:25 UTC
* Query times recovered shortly after that
Explanation: We very carefully installed the same version of Elasticsearch and Java as we use on the other machines, then used puppet to configure the Elasticsearch machines to join the cluster. It looks like they only picked up half the configuration provided by puppet (/etc/elasticsearch/elasticsearch.yml but not /etc/default/elasticsearch). Unfortunately for us that is the bad half to miss, because /etc/default/elasticsearch contains the JVM heap settings. The servers came online with the default amount of heap, which worked fine until Elasticsearch migrated a sufficiently large index to them. At that point the heap filled up and Java did what it does in that case: spun forever trying to free garbage. That pretty much pegged one CPU and rendered the entire application unresponsive. Unfortunately (again), pegging one CPU isn't that weird for Elasticsearch. It'll do that when it is merging. The application normally stays responsive because the rest of the JVM keeps moving along. That doesn't happen when the heap is full. Knocking out one of those machines caused tons of searches to block, presumably waiting for those machines to respond. I'll have to dig around to see if I can find the timeout, but we're obviously using the default, which in our case is way, way, way too long. We then filled the pool queue and started rejecting requests to search altogether. When I found the problem all I had to do was kill -9 the Elasticsearch servers and restart them. -9 is required because JVMs don't catch the regular signal if they are too busy garbage collecting.
What we're doing to prevent it from happening again:
* We're going to monitor the slow query log and have icinga start complaining if it grows very quickly. We normally get a couple of slow queries per day, so this shouldn't be too noisy. We're also going to have to monitor error counts, especially once we get more timeouts. (https://bugzilla.wikimedia.org/show_bug.cgi?id=62077)
* We're going to sprinkle more timeouts all over the place. Certainly in Cirrus while waiting on Elasticsearch, and we'll figure out how to tell Elasticsearch what the shard timeouts should be as well. (https://bugzilla.wikimedia.org/show_bug.cgi?id=62079)
* We're going to figure out why we only got half the settings. This is complicated because we can't let puppet restart Elasticsearch, because Elasticsearch restarts must be done one node at a time.
Nik
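For readers running their own clusters: the half that went missing is just an environment file the init script sources, so a sketch of what it needs to contain looks roughly like the following. The values here are illustrative, not the production settings; ES_HEAP_SIZE is the variable name the Elasticsearch deb packages of this era used.

    # /etc/default/elasticsearch -- sketch only; without ES_HEAP_SIZE the
    # JVM falls back to the tiny default heap that caused this outage.
    ES_HEAP_SIZE=16g
    MAX_OPEN_FILES=65535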
Re: [Wikitech-l] Thoughts on hiding text from the internal search
I can make a better case for hiding things from internal search than I did on the bug. I'll send it here and copy it to the mailing list: The biggest case I can think of for excluding text from search is the license information on commons. Please take that as an example. Maybe it is the only example, but I think it is pretty important.
1. The license information doesn't add a whole lot to the result. Try searching commons with Cirrus for distribute, transmit, or following and you'll very quickly start to see the text of the CC license. And the searches find 14 million results. Heaven forbid you want to find distributed transmits or something. You'll almost exclusively get the license highlighted and you'll still find 14 million results. This isn't _horrible_ because the top results all have distribute or transmit in the title, but it isn't great.
2. Knock-on effect from #1: because relevance is calculated based on the inverse of the number of documents that contain the word, every term in the CC license is worth less than words not in the license. I can't point to any example of why that is bad but I feel it in my bones. Feel free to ignore this. I'm probably paranoid.
3. Entirely self-serving: given #1, the contents of the license take up an awful lot of space for very little benefit. If I had more space I could make Cirrus a beta on more wikis. It is kind of a lame reason and I'm attacking the space issue from other angles, so maybe it'll be moot long before we get this deployed and convince the community that it is worth doing.
4. Really, really self-serving: if .nosearch is the right solution and is useful then it is super duper easy to implement. Like one line of code, a few tests, and bam. It's already done, just waiting to be rebased and merged. It was so easy it would have taken longer to estimate the effort than to propose an implementation.
I really wouldn't be surprised if someone came up with a great reason why #1 is silly and we just shouldn't do it. The big problem with the nosearch class implementation is that it'd be pretty simple to abuse and hard to catch the abuse because the text is still on the page. One of the nice things about the solution is you could use a web browser's debugger to highlight all the text excluded from search by writing a simple CSS rule. I think that is all I have on the subject, Nik/manybubbles
On Wed, Feb 19, 2014 at 1:29 AM, Chad innocentkil...@gmail.com wrote: On Tue, Feb 18, 2014 at 9:50 PM, MZMcBride z...@mzmcbride.com wrote: Chad wrote: I'm curious how people would go about hiding text from the internal MediaWiki search engine (not external robots). Right now I'm thinking of doing a rather naïve .nosearch class that would be stripped before indexing. I can see potential for abuse though. Does anyone have any bright ideas? It's difficult to offer advice without knowing why you're trying to do what it is you're trying to do. You've described a potential solution, but I'm not sure what problem you're trying to solve. Are there some example use-cases or perhaps there's a relevant bug in Bugzilla? Ah, here's the bug: https://bugzilla.wikimedia.org/show_bug.cgi?id=60484 -Chad
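Since the implementation keeps coming up: a minimal sketch of the stripped-before-indexing idea, assuming the indexer has the rendered HTML in hand. This is illustrative only, not the actual CirrusSearch patch.

    <?php
    // Remove every element carrying class="nosearch" before extracting
    // the text that gets sent to the search index.
    function stripNoSearch( $html ) {
        $dom = new DOMDocument();
        // @ suppresses warnings about the imperfect HTML wikis tend to emit.
        @$dom->loadHTML( '<?xml encoding="utf-8"?>' . $html );
        $xpath = new DOMXPath( $dom );
        $query = '//*[contains(concat(" ", normalize-space(@class), " "), " nosearch ")]';
        // Copy the node list first: removing nodes while iterating a live
        // DOMNodeList can skip entries.
        $nodes = iterator_to_array( $xpath->query( $query ) );
        foreach ( $nodes as $node ) {
            $node->parentNode->removeChild( $node );
        }
        return $dom->textContent;
    }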
Re: [Wikitech-l] Thoughts on hiding text from the internal search
On Wed, Feb 19, 2014 at 12:17 PM, Helder . helder.w...@gmail.com wrote: On Wed, Feb 19, 2014 at 12:14 PM, Nikolas Everett never...@wikimedia.org wrote: The big problem with the nosearch class implementation is that it'd be pretty simple to abuse and hard to catch the abuse because the text is still on the page. One of the nice things about the solution is you could use a web browser's debugger to highlight all the text excluded from search by writing a simple CSS rule. What if the abuse is inside of a hidden element? http://jsfiddle.net/WQ6K2/ Helder Yeah, nowhere near perfect.
Re: [Wikitech-l] TitleValue
On Fri, Jan 24, 2014 at 8:55 PM, Daniel Kinzler dan...@brightbyte.de wrote: Am 24.01.2014 14:44, schrieb Brad Jorsch (Anomie): It looks to me like the existing patch *already is* getting too far into the Javaification, with its proliferation of classes with single methods that need to be created or passed around. There is definitely room for discussion there. Should we have separate interfaces for parsing and formatting, or should both be covered by the same interface? Should we have a Linker interface for generating all kinds of links, or separate interfaces (and/or implementations) for different kinds of links? I don't have strong feelings about those, I'm happy to discuss the different options. I'm not sure about the right place for that discussion though - the patch? The RFC? This list?
I vote mailing list. Maybe it'll be livelier. Personally, as I said in previous mails, I like the idea of pulling things out of the Title class. I'm going to pose questions and answer them in the order that they come to me.
* Should linking, parsing, and formatting live outside the Title class? Yes, for a bunch of reasons. At a minimum the Title class is just too large to hold in your head properly. Linking, parsing, and formatting aren't really the worst offenders but they are reasonably easy to start with. I would, though, like to keep some canonical formatting in the new TitleValue. Just a useful __toString that doesn't do anything other than print the contents in a form that is easy to read.
* Should linking, parsing, and formatting all live together in one class outside the Title class? I've seen parsing and formatting live together before just fine, as they really are the inverse of one another. If they are both massively complex then they probably ought not to live together. Linking feels like a thing that should consume the thing that does formatting. I think putting them together will start to mix metaphors too much.
* Should we have a formatter (or linker or parser) for wikitext and another for HTML and others as we find new output formats? I'm inclined against this, both because it requires tons of tiny classes that can make tracing through the code more difficult and because it implies that each implementation is substitutable for the other at any point, when that isn't the case. Replacing the HTML formatter used in the linker with the wikitext formatter would produce unusable output.
I really think that the patch should start modifying the Title object to use the functionality that is being removed from it. I'm not sure we're ready to start deprecating methods in this patch though. In parallel with getting the consensus to merge a start on TitleValue we need to be talking about what kind of inversion of control we're willing to have. You can't step too far down the services path without some kind of strategy to prevent one service from having to know what its dependencies' dependencies are. Nik
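To make the __toString point concrete, here is a toy sketch of the kind of value object being discussed. The names and shape are illustrative only, not the patch under review:

    <?php
    // Illustrative value object: immutable data plus one plain, canonical
    // rendering for logs and debugging; real formatting lives elsewhere.
    class TitleValue {
        private $namespace;
        private $text;

        public function __construct( $namespace, $text ) {
            $this->namespace = $namespace;
            $this->text = $text;
        }

        public function getNamespace() {
            return $this->namespace;
        }

        public function getText() {
            return $this->text;
        }

        public function __toString() {
            // Easy to read while debugging; not meant for user output.
            return $this->namespace . ':' . $this->text;
        }
    }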
Re: [Wikitech-l] Lsearch and MWSearch: how to turn on morphology for Russian
I hate to say this after all you went through setting up Lucene Search, but it is end-of-life and not receiving any real support. We're in the process of replacing it with the combination of CirrusSearch (https://www.mediawiki.org/wiki/Extension:CirrusSearch) / Elasticsearch (http://www.elasticsearch.org/), which works pretty much the same way the MWSearch/Lucene Search combination does. CirrusSearch has to be smarter than MWSearch because Elasticsearch doesn't have any MediaWiki knowledge, but because it links into MediaWiki it can do things like expand templates. I like it but I'm biased. That aside, it looks like Lucene Search is supposed to read InitializeSettings, which is a kind of WMF-specific thing. You might be able to trick it by putting a file called InitializeSettings.php in the conf directory with the contents 'wgLanguageCode' => array( 'your $wgDBname' => 'ru' ), CirrusSearch, if you care to try it, reads the language code from wgLanguageCode. Nik On Thu, Jan 30, 2014 at 3:39 PM, Yury Katkov katkov.ju...@gmail.com wrote: Hi guys! I've installed the MWSearch and Lucene Search extensions but I can see that the search engine doesn't understand the morphology of Russian (doesn't recognize word forms). How can I turn the morphological analyzer on? How is it done in Russian Wikipedia? Cheers, - Yury Katkov, WikiVote
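Spelled out as a complete file, the trick above might look like the sketch below. This is unverified guesswork about how Lucene Search wants the array wrapped, and 'mywiki' is a stand-in for your actual $wgDBname:

    <?php
    // conf/InitializeSettings.php -- hypothetical sketch following the
    // hint above; adjust the wrapper to whatever Lucene Search's config
    // parser actually expects.
    $wgConf->settings = array(
        'wgLanguageCode' => array(
            'mywiki' => 'ru',
        ),
    );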
Re: [Wikitech-l] Smarter namespace defaults for searches
I like the idea. I wonder a few things: 1. Is this something that only makes sense to do for the help namespace? 2. Would it be good enough to catch help me kinds of queries and provide a did you mean-like suggestion for a new search that'd actually search help? Nik On Mon, Jan 6, 2014 at 4:42 PM, Tobias church.of.emacs...@googlemail.com wrote: I've been a Wikipedia trainer at schools for quite some time now. Probably the single most common mistake people in my workshops make when accessing a wiki's meta pages (i.e. Wikipedia:Help) is omitting the colon indicating the namespace. The default search namespace is just NS-0, i.e. the main namespace. This means if you enter Wikisource Help on en.wikisource.org, you get nothing useful: http://en.wikisource.org/w/index.php?search=Wikisource+Help&button=&title=Special%3ASearch English Wikipedia has implemented a workaround by creating redirects from the main namespace to the project namespace: an ugly fix, since it mixes up the distinction between namespaces. Instead, we should make MediaWiki a bit smarter with regard to namespace selection: when you search for Help Editing, the Help namespace should be included. This could be done in its simplest form by checking whether a namespace string is a prefix of the search string (perhaps excluding exotic namespaces such as MediaWiki), or even by checking whether the namespace name is contained anywhere in the search string. What do you think? Best regards, Tobias
Re: [Wikitech-l] RFC cluster summary: HTML templating
On Fri, Dec 27, 2013 at 1:30 PM, Chad innocentkil...@gmail.com wrote: On Fri, Dec 27, 2013 at 12:34 PM, Jon Robson jdlrob...@gmail.com wrote: I want a templating system that can be used both in PHP and JavaScript and fits in our way of doing i18n. And a bunny. I'm not sure if this was meant to be sarcastic, but I do want this too and think it is a reasonably achievable goal - bunny optional! Bunnies should be listed in the requirements ;-) I believe unicorns were in the requirements for search. In all seriousness, PHP, JavaScript, and fitting i18n sound like minimum requirements. I'd also throw in HTML escaping by default. Nik
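To pin down what escaping by default means here, a deliberately tiny sketch (a toy, not a proposal for the real templating library): substituted values are HTML-escaped on the way in, so markup can't sneak through a placeholder.

    <?php
    // Toy renderer: {{name}} placeholders are HTML-escaped on substitution.
    function renderTemplate( $template, array $vars ) {
        return preg_replace_callback(
            '/\{\{(\w+)\}\}/',
            function ( $m ) use ( $vars ) {
                $value = isset( $vars[$m[1]] ) ? $vars[$m[1]] : '';
                return htmlspecialchars( $value, ENT_QUOTES );
            },
            $template
        );
    }

    // renderTemplate( '<p>{{greeting}}</p>', array( 'greeting' => '<b>hi</b>' ) )
    // yields '<p>&lt;b&gt;hi&lt;/b&gt;</p>' -- escaped unless explicitly opted out.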
Re: [Wikitech-l] Deployment highlights - week of December 16th
I wonder if this is 38273 revived. Like 58042 was. Cirrus hasn't changed this code so I'm reasonably confident it isn't us this time. Though it is still possible given that we're on mediawikiwiki and itwiki. On Wed, Dec 18, 2013 at 4:23 AM, Federico Leva (Nemo) nemow...@gmail.com wrote: Did the PHP upgrade affect tidy in some way? Some pages are severely broken e.g. by unbalanced div or table tags (both Vector and monobook). Only two reports on #wikimedia-tech in two days, so maybe no real change, but I used not to hear any. :) https://www.mediawiki.org/w/index.php?title=Extension%3ABugzilla_Reports&diff=844734&oldid=773425 https://it.wikipedia.org/w/index.php?title=Utente%3AVale14orla&diff=63098711&oldid=54419590 Nemo
Re: [Wikitech-l] Hook for Adding to Empty Search Results
SpecialSearchResultsPrepend lets you add HTML directly to the search page but doesn't let you add your own results. The HTML actually gets injected above the search form, so it'd take some CSS trickery to move it. Example: This wiki is using a new search engine. (Learn more: https://www.mediawiki.org/wiki/Special:MyLanguage/Help:CirrusSearch) on https://www.mediawiki.org/wiki/Special:Search Beyond that I think you have three options: 1. Extend SearchMySQL. 2. Add a hook yourself and know that you are running a patched version of core. I'm happy to help get the patch upstream if you don't want to live with that burden forever. 3. Add the pages with importTextFile. Nik On Thu, Dec 12, 2013 at 9:15 PM, Paul Dugas p...@dugasenterprises.com wrote: I have an extension using the ArticleFromTitle hook to generate pages for components of a large system we operate. There are approximately 6000 components at the moment, with static inventory and config data in a database and live status data in a number of other systems. We are using MediaWiki as a historical maintenance knowledge-base for the staff. With this extension, we can integrate all the data for each device in one place. We can hit MyNS:DeviceName and get a page that describes a device, and that page can link to other pages in the main namespace that techs create with vendor details, model info, manuals, etc. We can even keep a talk page for each device. Very handy. Trouble now is I want to be able to find devices using the search feature. SpecialSearchResults looked promising but that only gets called when there is at least one match in normal pages. So I looked at SpecialSearchNoResults, but that doesn't allow me to add to the empty results. Does anyone have a suggestion on how I could go about this? I really want to avoid periodically generating the text of pages externally and loading them into the wiki using the importTextFile maintenance script. The only other thought I had was to extend the SearchMySQL class and change $wgSearchType, but I'm hoping to avoid that. Any ideas? --Paul
Re: [Wikitech-l] Hook for Adding to Empty Search Results
Glad to hear it! I hadn't seen SpecialSearchResultsAppend before. Useful. On Fri, Dec 13, 2013 at 9:02 AM, Paul Dugas p...@dugasenterprises.com wrote: Thanks Nikolas. I found SpecialSearchResultsPrepend and SpecialSearchResultsAppend looking through the code, though I didn't see them in the documentation. I implemented the latter to add a section below the standard search results that lists results from my system. Seems to be working for now and requires no patching of the core code. P On Fri, Dec 13, 2013 at 8:55 AM, Nikolas Everett never...@wikimedia.org wrote: SpecialSearchResultsPrepend lets you add HTML directly to the search page but doesn't let you add your own results. -- *Paul Dugas* • *Dugas Enterprises, LLC* • *Computer Engineer* p...@dugasenterprises.com • +1.404.932.1355 522 Black Canyon Park, Canton GA 30114 USA
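For anyone landing here with the same problem, a minimal sketch of wiring up the append hook, with the hook signature as MediaWiki defined it in this era; myDeviceSearchHtml is a made-up stand-in for querying your own inventory system:

    <?php
    // LocalSettings.php -- append custom results below the built-in ones.
    $wgHooks['SpecialSearchResultsAppend'][] = function ( $specialSearch, $output, $term ) {
        // myDeviceSearchHtml() is hypothetical: it should return safe HTML
        // listing the devices that match $term.
        $output->addHTML( '<h2>Devices</h2>' . myDeviceSearchHtml( $term ) );
        return true;
    };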
Re: [Wikitech-l] OAuth currently broken on wikis with CirrusSearch
Note that the wikis that say they were deployed on December 11th but do not have a strikethrough have Cirrus running, but their indexes are still being built. I believe OAuth will be broken on those wikis as well. This requires two fixes to actually fix, both of which are in review pending approval, another test on beta, and eventual deployment. We should have them out sometime in the next few hours. Nik On Thu, Dec 12, 2013 at 11:48 AM, Dan Garry dga...@wikimedia.org wrote: For reference, the list of wikis to which Cirrus is deployed, and therefore where OAuth is broken, is available here: https://www.mediawiki.org/wiki/Search#Wikis Dan On 12 December 2013 16:46, Dan Garry dga...@wikimedia.org wrote: Dear all, OAuth is currently broken on any wiki that has CirrusSearch deployed to it in either primary or secondary mode. We're working on getting this issue fixed as soon as possible. I'll post an update here when we have a timescale for the fix. Thanks, Dan -- Dan Garry Associate Product Manager for Platform Wikimedia Foundation
Re: [Wikitech-l] OAuth currently broken on wikis with CirrusSearch
On Thu, Dec 12, 2013 at 11:53 AM, Nikolas Everett never...@wikimedia.org wrote: Note that the wikis that say they were deployed on December 11th but do not have a strikethrough have Cirrus running, but their indexes are still being built. I believe OAuth will be broken on those wikis as well. This requires two fixes to actually fix, both of which are in review pending approval, another test on beta, and eventual deployment. We should have them out sometime in the next few hours. I've just verified the fix in production. Please let me know if any of you are still seeing the error. Thanks, Nik
Re: [Wikitech-l] workflow to add multiple patches to gerrit.wikimedia.org:29418/operations/puppet.git
Normally:
* clone a repo
* set up git hooks
# patch 1:
* git checkout -b some_branch_name
* apply my changes
* git commit -a
* git review
# patch 2:
* git checkout production (or master on non-puppet repositories)
* git pull
* git checkout -b some_other_branch_name
* apply my changes
* git commit -a
* git review
Nik
On Wed, Nov 20, 2013 at 8:13 AM, Petr Bena benap...@gmail.com wrote: Currently I do:
* clone a repo
* set up git hooks
# patch 1:
* apply my changes
* commit
* execute git-review
# patch 2:
* apply my changes
* commit
FAIL - the new commit is depending on the previous commit - I can't push. What am I supposed to do in order to push multiple separate patches? GIT-IDIOT way please, no long explanations, just commands and examples. Thanks
Re: [Wikitech-l] workflow to add multiple patches to gerrit.wikimedia.org:29418/operations/puppet.git
On Wed, Nov 20, 2013 at 9:01 AM, Petr Bena benap...@gmail.com wrote: when I did a new branch before git-review it now shows this as the topic in gerrit: https://gerrit.wikimedia.org/r/#/c/96484/ will it merge this to the production branch? Yes.
Re: [Wikitech-l] Tip for Sublime Text editors: DocBlockr plugin and conf for JSDuck
Package Control is your friend. How else do you install a linter or syntax highlighting for a new language without touching a mouse? On Tue, Nov 19, 2013 at 2:42 PM, Tomasz Finc tf...@wikimedia.org wrote: On Tue, Nov 19, 2013 at 3:35 AM, Krinkle krinklem...@gmail.com wrote: DocBlockr Nice. I hadn't known about Package Control either. thanks --tomasz
Re: [Wikitech-l] Exceptions, return false/null, and other error handling possibilities.
On Tue, Oct 8, 2013 at 12:15 AM, Tim Starling tstarl...@wikimedia.org wrote: On 08/10/13 14:40, Erik Bernhardson wrote: A reviewer should be able to know if the error conditions are properly handled by looking at the new code, not by looking up all the function calls to see what they can possibly return. This is why the recommended pattern for Status objects is to return a Status object unconditionally. Can we add an example of that usage to the Status object, with a note not to follow the "only return a Status in case of error" pattern that you might see elsewhere in the code? It might even be worth a bit of refactoring to get rid of the old pattern, or people will keep finding it and copying it.
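For reference, the unconditional flavour Tim recommends looks roughly like this. The Status calls are MediaWiki's real API of the era; fetchWidget and lookupWidget are made-up names for the sake of the sketch:

    <?php
    // Sketch: always return a Status, for success and failure alike.
    function fetchWidget( $id ) {
        $widget = lookupWidget( $id ); // hypothetical data access
        if ( $widget === null ) {
            return Status::newFatal( 'widget-not-found', $id );
        }
        return Status::newGood( $widget );
    }

    $status = fetchWidget( 5 );
    if ( $status->isOK() ) {
        $widget = $status->getValue();
    } else {
        wfDebugLog( 'widgets', $status->getWikiText() );
    }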
Re: [Wikitech-l] Exceptions, return false/null, and other error handling possibilities.
On Mon, Oct 7, 2013 at 3:12 PM, Jeroen De Dauw jeroended...@gmail.com wrote: Hey, We use lots of libraries that happen to use composer. We just don't use composer to deploy them. Oh? Lots? Is there a list somewhere? Are most of those libraries forked? Are a good portion of them semi-assimilated into core? I hope the answer to the latter two is no. I believe the procedure is to set up a clone of them on gerrit, include them as a submodule, and then do *something* to make the classes autoload. Updating from upstream should be a matter of pulling the upstream update locally, pushing to gerrit, updating the submodule pointer, and making sure the autoloading still makes sense. In some respects it is a very convenient way to do things. In others, not so much. There isn't a list; they are scattered among the mediawiki extensions in gerrit. I'm not defending it, but I can see why we do it. Nik
Re: [Wikitech-l] Exceptions, return false/null, and other error handling possibilities.
On Mon, Oct 7, 2013 at 3:45 PM, Brion Vibber bvib...@wikimedia.org wrote: I've heard the vague claim that exceptions are confusing for years, but for the life of me I've never seen exception-handling code that looked more complex or confusing than code riddled with checks for magic return values. When I'm writing Haskell nothing is more intuitive than the error monad because that is how the compiler works. When I'm writing Java nothing is more intuitive than exceptions because that is how the standard library works. When I'm writing Scala nothing is more intuitive than exceptions for unrecoverable errors and Option/Either for recoverable ones because that is how the standard library works. When I'm writing C I deal with magic return values, modified arguments, and errno because that is what libc burdens me with. When I'm writing PHP I deal with magic return values, modifiable arguments, and exceptions because that is what is in the standard library. Oh, yeah, and I deal with Status too, because we use it sometimes. I don't see the point in adding another error handling mechanism beyond the ones you are stuck with in the standard library. It is just too much work to wrap the standard library over and over and over again. Unless you are writing JavaScript; then promises are too compelling. Nik
Re: [Wikitech-l] How to get search on mediawiki.org not to use synonyms?
On Tue, Sep 24, 2013 at 12:00 PM, David Gerard dger...@gmail.com wrote: I just went looking for the word referer. The response started with lots of instances of the word reference. Put it in quotes, no difference. Eventually resorted to Google. Is MW.org using the exciting new search engine? Is there any way to search without using synonyms? It is using the exciting new search engine, but that search engine has a bug: https://bugzilla.wikimedia.org/show_bug.cgi?id=54020 For now, go back to the old search system by adding srbackend=LuceneSearch to the search URL like so: https://www.mediawiki.org/w/index.php?title=Special%3ASearch&profile=default&search=%22referer%22&fulltext=Search&srbackend=LuceneSearch Feel free to add yourself to the cc list to watch our progress squashing it. Nik
Re: [Wikitech-l] [Wikitech-ambassadors] Fwd: Deployment highlights for the week of Sept 23rd
On Mon, Sep 23, 2013 at 12:40 PM, Chris McMahon cmcma...@wikimedia.org wrote: So nice to see what Nik has done here. Information on running these tests is in the README: http://git.wikimedia.org/blob/mediawiki%2Fextensions%2FCirrusSearch.git/a7d5386c659e0afff1bae24967b333b06f639512/tests%2Fbrowser%2FREADME I'd like to get those tests running in MediaWiki-Vagrant but I just can't find the time at the moment. In other news, https://www.mediawiki.org/wiki/Search/CirrusSearchFeatures now has a reasonably complete list of CirrusSearch's features.
Re: [Wikitech-l] [Wikitech-ambassadors] Fwd: Deployment highlights for the week of Sept 23rd
On Sat, Sep 21, 2013 at 10:26 AM, Chad innocentkil...@gmail.com wrote: On Fri, Sep 20, 2013 at 11:47 PM, billinghurst billinghu...@gmail.com wrote: Excellent news! Would someone be able to provide or point to some configuration and examples that English Wikisource can utilise to allow some side-by-side searches, and some guidance that can be provided to the community on the new features and their use (if there is any). I think Nik's e-mail from when we deployed to mw.org is still the best info. http://lists.wikimedia.org/pipermail/wikitech-l/2013-August/071548.html CirrusSearch is pretty much a work-alike for the current search, so I haven't done too much documenting. I'll fill out https://www.mediawiki.org/wiki/Search/CirrusSearchFeatures with a boiled-down list of features. The big one I think you care about is that templates are evaluated during indexing. If you can't wait, you can read the regression tests here: http://git.wikimedia.org/tree/mediawiki%2Fextensions%2FCirrusSearch.git/master/tests%2Fbrowser%2Ffeatures . They are written in Cucumber so they should be reasonably readable. Fair warning: I use the terms page and article pretty much interchangeably. Also, one of the tests is failing on my development machine, but I haven't commented it out with an associated bug like I usually do because, well, I like looking at it failing, I guess. The bug is here: https://bugzilla.wikimedia.org/show_bug.cgi?id=53426 Nik
Re: [Wikitech-l] RfC update: LESS stylesheet support in core
On Thu, Sep 19, 2013 at 4:04 PM, Dan Andreescu dandree...@wikimedia.org wrote: - Has http://learnboost.github.io/stylus/ been considered? I've heard that it's a good compromise between sass and less (but I haven't played with it myself to see if it really lets you do more compass-like things). *Popularity* - does matter; one of the long comment threads on the RFC is from a potential contributor who is concerned that LESS makes it harder to contribute. I mostly agree with Jon's and Steven's arguments that LESS is pretty easy to learn. However, I have also heard about a year's worth of complaints about Limn being written in Coco instead of pure JavaScript. I personally think CSS -> LESS is just as mentally taxing as JavaScript -> Coco, but I'm objectively in the minority based on the feedback I've received. I'd be cautious here. You can upcompile CSS into LESS, sure, but if a contributor has to understand a complex LESS codebase full of mixins and abstractions while debugging the generated CSS in the browser, they're right to point out that this requires effort. And this effort is only increased for more elegant languages like Stylus. I'm for any compiled-to-CSS language because I feel they fill a big gaping hole in CSS's ability to share code. That is really compelling to me. I haven't been convinced the compiled-to-JS languages offer quite as compelling a value proposition, so the analogy to Limn and Coco is less relevant to me. I admit I could be wrong about the value proposition thing, but that is how I feel. I really don't want to start a language war though. I'm a Sass fan but I'll take whatever I can get. I will point out that CSS is valid LESS, which could assuage some fears. Nik Everett
Re: [Wikitech-l] CirrusSearch on mediawiki.org
On Sat, Sep 7, 2013 at 5:29 PM, MZMcBride z...@mzmcbride.com wrote: Federico Leva (Nemo) wrote: Nice! As for next steps, what about using Wiktionary as the next pioneering project for the new CirrusSearch (first opt-in and then default)? It exists in most languages (we really need to see how the new search works in different languages), it's one of the most impacted projects by the new features (e.g. expanded templates indexing) [...]. This seems like a good idea to me. Chad / Nik: your thoughts? English Wiktionary is certainly on my list of possible next victims. I don't _think_ I want to flip the switch on all the languages at once though. On the other hand, if anyone in the community really really really wants to try it, we'd love to work with them. I like enthusiasm. And I want to try CirrusSearch against other languages, but it wouldn't do much good without someone active in that community willing to test it. Nik
Re: [Wikitech-l] BetaFeatures framework, and a minor call for technical input
I worked on an accounting system with similar requirements and we had an even more complicated system, but one you might want to consider:
1. When something happens, record the event and how much it changed the value, along with a timestamp. In our case we'd just have enable and disable events.
2. We ran a job that summarized those events into hourly changes.
3. Every day we kept a log of the actual value (at midnight or whatever).
This let us quickly make all kinds of crazy graphs, with super deep granularity over short periods of time and less granularity over long periods. Essentially it was an accountant's version of RRDtool. It didn't have problems with getting out of sync because we never had more than one process update more than one field. It is probably overkill, but it might serve as a dramatic foil to the simpler ideas. Nik
On Tue, Sep 3, 2013 at 5:58 PM, Mark Holmquist mtrac...@member.fsf.org wrote: Timezone-appropriate greeting, wikitech! I've been working on a new extension, BetaFeatures[0]. A lot of you have heard about it through the grapevine, and for the rest of you, consider this an announcement for the developers. :) The basic idea of the extension is to enable features to be enabled experimentally on a wiki, on an opt-in basis, instead of just launching them immediately, sometimes hidden behind a checkbox that has no special meaning in the interface. It also has a lot of cool design work on top of it, courtesy of Jared and May of the WMF design team, so thanks very much to them. There are still a few[1] things[2] we have to build out, but overall the extension is looking pretty nice so far. I am of course always soliciting advice about the extension in general, but in particular we have a feature request for the fields that has been giving me a bit of trouble. We want to put a count of users that have each preference enabled on the page, but we don't want to, say, crash the site with long SQL queries. Our theories thus far have been:
* Count all rows (grouped) in user_properties that correspond to properties registered through the BetaFeatures hook. Potentially a lot of rows, but we have at least decided to use an IN query, as opposed to LIKE, which would have been an outright disaster. Obviously: caching. Caching more would lead to more of the below issues, though.
* Fire off a job, every once in a while, to update the counts in a table that the extension registers. Downsides: less granular, sort of fakey (since one of the subfeatures will be incrementing the count, live, when a user enables a preference). Upside: faster.
* Update counts with simple increment/decrement queries. Upside: blazingly faster. Potential downside: might get out of sync. Maybe fire off jobs even less frequently, to ensure it's not always out of date in weird ways?
So my question is, which of these is best, and are there even better ways out there? I love doing things right the first time, hence my asking. [0] https://www.mediawiki.org/wiki/Extension:BetaFeatures [1] https://mingle.corp.wikimedia.org/projects/multimedia/cards/2 [2] https://mingle.corp.wikimedia.org/projects/multimedia/cards/21 P.S. One of the first features that we'll launch with this framework is the MultimediaViewer extension, which is also under[3] development[4] as we speak. Exciting times for the Multimedia team!
[3] https://mingle.corp.wikimedia.org/projects/multimedia/cards/8 [4] https://mingle.corp.wikimedia.org/projects/multimedia/cards/12 -- Mark Holmquist Software Engineer, Multimedia Wikimedia Foundation mtrac...@member.fsf.org https://wikimediafoundation.org/wiki/User:MHolmquist
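For the first of Mark's options, the grouped count is simple enough to sketch with MediaWiki's database layer. This is illustrative only: $betaFeaturePrefs stands in for whatever preference names BetaFeatures registers, and DB_SLAVE was the constant's name in this era.

    <?php
    // Count enabled users per beta feature in one grouped query.
    $dbr = wfGetDB( DB_SLAVE );
    $res = $dbr->select(
        'user_properties',
        array( 'up_property', 'enabled' => 'COUNT(*)' ),
        array(
            'up_property' => $betaFeaturePrefs, // an array value becomes IN (...)
            'up_value' => 1,
        ),
        __METHOD__,
        array( 'GROUP BY' => 'up_property' )
    );
    $counts = array();
    foreach ( $res as $row ) {
        $counts[$row->up_property] = (int)$row->enabled;
    }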
[Wikitech-l] New search backend live on mediawiki.org
Today we threw the big lever and turned on our new search backend at mediawiki.org. It isn't the default yet, but it is just about ready for you to try.
Here is what we think we've improved:
1. Templates are now expanded during search so: 1a. You can search for text included in templates 1b. You can search for categories included in templates
2. The search engine is updated very quickly after articles change.
3. A few funky things around intitle and incategory: 3a. You can combine them with a regular query (incategory:kings peaceful) 3b. You can use prefix searches with them (incategory:norma*) 3c. You can use them everywhere in the query (roger incategory:normans)
What we think we've made worse and are working on fixing:
1. Because we're expanding templates, some things that probably shouldn't be searched are being searched. We've fixed a few of these issues, but I wouldn't be surprised if more come up. We opened Bug 53426 regarding audio tags.
2. The relative weighting of matches is going to be different. We're still fine-tuning this and we'd appreciate any anecdotes describing search results that seem out of order.
3. We don't currently index headings beyond the article title in any special way. We'll be fixing that soon. (Bug 53481)
4. Searching for file names or clusters of punctuation characters doesn't work as well as it used to. It still works reasonably well if you surround your query in quotes, but it isn't as good as it was. (Bugs 53013 and 52948)
5. Did you mean suggestions currently aren't highlighted at all and sometimes we'll suggest things that aren't actually better. (Bugs 52286 and 52860)
6. incategory:category with spaces isn't working. (Bug 53415)
What we've changed that you probably don't care about:
1. Updating search in bulk is much slower than before. This is the cost of expanding templates.
2. Search is now backed by a horizontally scalable search backend that is being actively developed (Elasticsearch), so we're in a much better place to expand on the new solution as time goes on.
Neat stuff if you run your own MediaWiki: CirrusSearch is much easier to install than our current search infrastructure.
So what will you notice? Nothing! That is because while the new search backend (CirrusSearch) is indexing we've left the current search infrastructure as the default while we work on our list of bugs. You can see the results from CirrusSearch by performing your search as normal and adding srbackend=CirrusSearch to the URL parameters. If you notice any problems with CirrusSearch please file bugs directly for it: https://bugzilla.wikimedia.org/enter_bug.cgi?product=MediaWiki%20extensions&component=CirrusSearch Nik Everett
Re: [Wikitech-l] New search backend live on mediawiki.org
On Wed, Aug 28, 2013 at 3:37 PM, Paul Selitskas p.selits...@gmail.com wrote: Will it be set as the search backend further on Wikimedia projects? Yes. I'm not sure when though. Is there source code available for Elasticsearch on Gerrit? Our plugin that interacts with Elasticsearch is called CirrusSearch and lives in gerrit here: https://gerrit.wikimedia.org/r/#/projects/mediawiki/extensions/CirrusSearch,dashboards/default (or https://gerrit.wikimedia.org/r/#/admin/projects/mediawiki/extensions/CirrusSearch) Elasticsearch lives on GitHub here: https://github.com/elasticsearch/elasticsearch Stemming doesn't work for some languages at all, thus searching exact matches only. Stemming is done based on the language of the wiki. I expect only English stemming to work on mediawiki.org. Right now we use the default language analysers for all the languages that Elasticsearch supports out of the box (http://www.elasticsearch.org/guide/reference/index-modules/analysis/lang-analyzer/), with some customizations for English. Languages that aren't as well supported get a default analyser that doesn't do any stemming and splits on spaces. I expect we'll have to build some more analysers in the future. Nik
Re: [Wikitech-l] Article Concerning Error Handling
On Tue, Aug 27, 2013 at 2:48 PM, Tyler Romeo tylerro...@gmail.com wrote: I know this list isn't really for linking stuff, but I found this article earlier today: http://zenol.fr/site/2013/08/27/an-alternative-error-handling-strategy-for-cpp/ It's about C++, but what it describes is very relevant to our error handling, since we use the exact same pattern (via the Status class) except in PHP. I have to admit that I skimmed the article, but I don't believe we use the pattern that he describes. It looks like he's advocating using an error monad. That'd bring our error handling pattern count up to 4. All we'd need next is promises! Seriously though, either data structure could be useful for us, but we'd want to weigh the extra brain space required to use them. And the impedance between those structures and traditional error handling. And the performance. Nik
Re: [Wikitech-l] CirrusSearch live on test2wiki
On Thu, Aug 15, 2013 at 8:09 PM, Daniel Friesen dan...@nadir-seen-fire.com wrote: Wait, Elasticsearch? I thought the original discussions were about Solr? It certainly started that way, but some rather insistent folks talked me into giving Elasticsearch a chance. I spent a week putting together a prototype and I was so impressed that I convinced us to move over. I'm reasonably sure I sent out an email at the time. I know I updated the RFC. In any case, that is where we are. As for what impressed me about Elasticsearch: I like the documentation. I like the query syntax. I like the fully baked schema API. I (mostly) liked the source code itself. I like the deb package. I like how organized the bug submission and contribution process is. Seriously, if you are running an open source project, build something like http://www.elasticsearch.org/contributing-to-elasticsearch/ . Forcing the user to reproduce bugs with curl is genius for a service like Elasticsearch. So, yeah, we started with Solr but didn't stay there. Nik
Re: [Wikitech-l] Request for Comments: New Search
Scott, I was going to respond to this a while ago but couldn't really do it justice. I'm still pretty sure my explanation won't be great, which is an indication of just how good Google is. For straight search there is nothing we can do that Google can't. It might cost them more time and money to make searching MediaWiki awesome, but they have lots of both, so we're just not going to beat them there. There are a few things that we can do more easily/cheaply than Google:
1. We can update our search index right when changes are made, including when changes are made to transcluded pages.
2. We can search based on redirects to a page.
3. We can filter (and maybe one day facet) based on categories.
4. We could search based on citations.
We will, on the other hand, be better about listening to what the community needs with regards to search. Part of the problem here is that historically we've let search languish, and my first foray into making search nicer isn't going to provide much new stuff for the community. Instead it's a solid platform on which to build things that the community needs, and which should make search less exciting for operations engineers. That really isn't exciting for the community to hear, and for that I am sorry. I can only promise that we'll do more later. There are some deeper integrations into MediaWiki that I don't see Google doing but that we could work on in the future:
1. We could create a section that allowed users to easily find similar pages. I'm a little fuzzy on exactly how we'd calculate similarity.
2. We could automatically dig around in commons for useful media for an article. We could use this to automatically provide extra media which might be relevant, or as a curation aid.
On second thought, the second one sounds much better. Actually, some kind of game around tagging media as relevant to an article might be quite a decent way to encourage engagement. By game I mean something like Galaxy Zoo or LinkedIn's endorsements. You could do this without a nice search, but it'd help produce much more relevant results. And then there is the cynic in me that says that it is worth doing just so we aren't reliant on external (corporate) entities. I'm really not sure how I would feel if the only way to find stuff on WMF's wikis was with Google/Bing/Yahoo. Finally we have the private wikis like you mentioned - they mostly can't use Google. We are trying to make sure CirrusSearch works for them. The idea there is to provide something that is better at finding results than the database-based search, because it uses the same analysis that we've optimized for WMF. Elasticsearch isn't some kind of precision-tuned machine - you can actually get quite decent behaviour out of downloading the deb or rpm and installing it. You only really need one instance. So now that I've created this wall of text I don't feel that I've really answered your question well, but I've answered it. That is the thing about hard questions: they are harder to answer than to ask. I'd really love more brainstorming. Cross-wiki search was another good idea someone added to the page a while ago. Nik
On Fri, Jul 19, 2013 at 2:24 PM, C. Scott Ananian canan...@wikimedia.org wrote: I wonder if there are queries or use cases we can support that *aren't* already better handled by Google. Granted, users of private wikis can't simply use the 'site:' trick to reuse Google search results -- but users of private wikis also probably don't need superduper scalability. Trying to brainstorm here, not start a flame war.
What sorts of useful searches could we excel at? (Maybe these are searches/use cases that will facilitate editor engagement?) --scott -- (http://cscott.net)
Re: [Wikitech-l] Request for Comments: New Search
Everyone, I'm reviving this old thread to update everyone on the status of the RFC: We've continued working on the implementation and everything seems to be proceeding smoothly. We evaluated Elasticsearch, were super impressed, and decided it was very likely worth switching to it from Solr4. The evaluation and the switch did cost some time, but in my opinion doing it was time well spent. Thanks so much for your comments a month ago when I first posted this. If you are interested, please give the page another look. Just to be helpful, here is a link to what I changed: http://www.mediawiki.org/w/index.php?title=Requests_for_comment%2FCirrusSearch&diff=740790&oldid=728213 Nik Everett On Fri, Jun 14, 2013 at 4:21 PM, Nikolas Everett never...@wikimedia.org wrote: So Chad and I feel like we've gotten far enough in our prototype of our new search backend for MediaWiki that we're ready to request comments.
Re: [Wikitech-l] Project idea
As a ChromeOS user I really just think of it as a laptop with a funky set of apps. I'm pretty sure I wouldn't have thought to search for a Wikipedia app for it because I'm so used to getting Wikipedia in the browser. On the other hand, if the app could modify the search key behaviour so I can hit search, type wikipedia, hit tab, type a search term, then hit enter, then I'd like that. On the other other hand, I already have this behaviour in all browser windows, so from (pretty much) anywhere in the OS I can hit ctrl-t, ctrl-l, type wikipedia, hit tab, type a search term, then hit enter. Also, it feels like that search key behaviour is up to Google anyway, and at some point they'll make it work the same as the location bar. Nik On Fri, Jul 12, 2013 at 2:22 PM, Steven Walling steven.wall...@gmail.com wrote: On Fri, Jul 12, 2013 at 9:00 AM, Brion Vibber bvib...@wikimedia.org wrote: I'd recommend against building any specific 'app' for a web-based OS like this, but if we can have a Chrome Web Store entry that conveniently bookmarks us and that makes us easier to use, well that'd be awesome. You mean you recommend against OS-specific apps, like we have specific apps for Windows Phone, iOS, and Android? ;) Snark aside: what you proposed is essentially how most Chrome apps work and is easiest to implement. For HTML5 games and such, I'm sure it's more app-like in that you may not be able to launch the game without installing the app, but most people basically just redirect users to the normal site. Obviously this makes the use of the name app seem bizarre, but the advantage for ChromeOS users is that we make it easier to get back to Wikipedia. (One step instead of three.)
Re: [Wikitech-l] Search documentation
I'm not sure about http://www.mediawiki.org/wiki/Help:Searching but https://en.wikipedia.org/wiki/Help:Searching has lots of things we're going to have to add to our list. My guess is http://www.mediawiki.org/wiki/Help:Searching is simply out of date. Nik On Mon, Jun 17, 2013 at 4:33 PM, Chris McMahon cmcma...@wikimedia.org wrote: On Mon, Jun 17, 2013 at 1:28 PM, S Page sp...@wikimedia.org wrote: * enwiki says Hello dolly in quotes gives different results; mw directly contradicts this. Even on my local wiki, quotes make a difference. * enwiki disagrees with itself about what a dash in front of a word does. I did some research a few weeks ago on the current state of Search and there are a number of discrepancies between the documentation and actual behavior. Some of them have BZ tickets, like https://bugzilla.wikimedia.org/show_bug.cgi?id=44238 -Chris
Re: [Wikitech-l] Search documentation
One of our goals while building this has been to make something reasonably easy to install for folks outside of WMF. I've added some notes about this to the page. I'd certainly love to hear ways that'd make it simpler to use. Nik On Mon, Jun 17, 2013 at 8:23 PM, Brian Wolff bawo...@gmail.com wrote: Just as a note, MediaWiki's default (aka crappy) search is very different from the Lucene stuff used by Wikimedia. Lucene search is rather difficult to set up, so most third-party wikis do not use it. --bawolff On 6/17/13, Nikolas Everett never...@wikimedia.org wrote: I'm not sure about http://www.mediawiki.org/wiki/Help:Searching but https://en.wikipedia.org/wiki/Help:Searching has lots of things we're going to have to add to our list.
[Wikitech-l] Request for Comments: New Search
So Chad and I feel like we've gotten far enough in our prototype of our new search backend for MediaWiki that we're ready to request comments. So here is our formal RFC: https://www.mediawiki.org/wiki/Requests_for_comment/CirrusSearch You'll note that the plugin is called CirrusSearch. SolrSearch seems to have been taken by an unrelated project so we had to pick a different name. Please read and comment in whatever way is normal for these things. Thanks so much for your attention, Nik Everett
Re: [Wikitech-l] Architecture Guidelines: Writing Testable Code
On Tue, Jun 4, 2013 at 12:36 PM, Jeroen De Dauw jeroended...@gmail.com wrote: Hey, My own experience is that test coverage is a poor evaluation metric for anything but test coverage; it doesn't produce better code, and tends to produce code that is considerably harder to understand conceptually because it has been over-factorized into simple bits that hide the actual code and data flow. Forest for the trees. Test coverage is a metric to see how much of your code is executed by your tests. From this alone you cannot say if some code is good or bad. You can have bad code with 100% coverage, and good code without any coverage. You first state it is a poor metric for measuring quality and then proceed to claim that more coverage implies bad code. Aside from contradicting yourself, this is pure nonsense. Perhaps you just expressed yourself badly, as test coverage does not produce code to begin with. The thing is, quite a few of us have seen cases where people bend over backwards for test coverage, sacrificing code quality and writing tests that don't provide any real value. In this respect high test coverage can poison your code. It shouldn't, but it can. The problem is rejecting changes like this while still encouraging people to write the useful kinds of tests - tests for usefully large chunks that serve as formal documentation. Frankly, one of my favorite tools in the world is Python's doctests because the test _is_ the documentation. Nik ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
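A minimal sketch of the doctest point above, in plain Python (the function and its behavior are invented for illustration, nothing MediaWiki-specific): the examples in the docstring are both the documentation and the test, so they cannot silently drift out of date.

def normalize_title(title):
    """Collapse runs of whitespace and capitalize the first letter.

    >>> normalize_title("hello   dolly")
    'Hello dolly'
    >>> normalize_title("  foo ")
    'Foo'
    """
    return " ".join(title.split()).capitalize()

if __name__ == "__main__":
    import doctest
    doctest.testmod()  # runs every example in the docstrings above

Running the file with -v prints each docstring example as it executes; a stale example fails instead of misleading the reader.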
Re: [Wikitech-l] Architecture Guidelines: Writing Testable Code
I have no qualms with any of the guidelines. They are good guidelines, but like all guidelines they are made to be bent when appropriate, so long as you leave a good explanatory comment. My main concern is that the article is about how to write more unit-testable code, which is something I think people take too far. The thing that unit tests are good for is testing that a unit of code does what you expect it to. The problem is that people sometimes test portions of atomic units without testing the whole unit. Java folks are especially dogmatic about testing just one class at a time, which is a great guideline but tends to be the wrong thing to do about 20% of the time. My favorite example of this is testing a Repository or a DAO with a mock database. A repository's job is to issue the correct queries to the database and spit the results back correctly. Without talking to an actual database you aren't testing this. Without some good test data in that database you aren't testing this. I'd go so far as to say you have to talk to _exactly_ the right database (MySQL in our case) but other very smart people disagree with me on that point. While this example is especially silly, I'm sure we've all finished writing a test, looked at the test code, and thought, "This test proves that I'm interacting correctly with collaborator objects but doesn't prove that my functionality is correct." Sometimes this is caused by collaborators being non-obvious. Sometimes this is caused by global state that you have to work around. In any case I'd argue that these tests should really be deleted, because all they really do is make your code coverage statistics better, give you a false sense of security, and slow down your builds. So I just wrote a nice little wall of text about what is wrong with the world, and like any good preacher I'll propose a few solutions: 1. Live with having bigger units. Call them integration tests if it makes you feel better. I don't really care. But you have to stand up the whole database connection, populate it with test data that mimics production in a useful sense, and then run the query. (There's a rough sketch of this after this message.) 2. Build smaller components sensibly and carefully. The goal is to be able to hold all of the component in your head at once and for the component to present such a clean API that when you mock it out the tests are meaningful. 3. Write tests that test the entire application after it is started, with stuff like Selenium. The disadvantage here is that these run way slower than unit tests and require you to learn yet another tool. Too bad. Some stuff, like Tim's HTML forms, is simply untestable without a real browser. 4. Use lots of static analysis tools. They really do help identify dumb mistakes and don't even require you to do anything other than turn them on, run them before you commit, and fail the build when they fail. Worth it. 5. Don't write automated tests at all and do lots of code reviews and manual testing. Sometimes this is really the most sensible thing. I'll leave it to you to figure out when that is, though. There is a great presentation on InfoQ about unit testing that I can't find anymore where the presenter likens testing to guard rails. He claims that just because you have guard rails you shouldn't stop paying attention and expect them to save you. Sorry for the rambling wall of text. Nik On Mon, Jun 3, 2013 at 7:58 AM, Daniel Kinzler dan...@brightbyte.de wrote: Thanks for your thoughtful reply, Tim! 
On 03.06.2013 at 07:35, Tim Starling wrote: On 31/05/13 20:15, Daniel Kinzler wrote: Writing Testable Code by Miško Hevery http://googletesting.blogspot.de/2008/08/by-miko-hevery-so-you-decided-to.html . It's just 10 short and easy points, not some rambling discussion of code philosophy. I'm not convinced that unit testing is worth doing down to the level of detail implied by that blog post. Unit testing is essential for certain kinds of problems -- especially complex problems where the solution and verification can come from two different (complementary) directions. I think testability is important, but I think it's not the only (or even main) reason to support the principles from that post. I think these principles are also important for maintainability and extensibility. Essentially, they enforce modularization of code in a way that makes all parts as independent of each other as possible. This means they can also be understood by themselves, and can easily be replaced. But if you split up your classes to the point of triviality, and then write unit tests for a couple of lines of code at a time with an absolute minimum of integration, then the tests become simply a mirror of the code. The application logic, where flaws occur, is at a higher level of abstraction than the unit tests. That's why we should have unit tests *and* integration tests. I agree though that it's not necessary or helpful to enforce the maximum
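Here is the rough sketch promised in point 1 of the earlier mail. It uses Python with an in-memory sqlite3 database purely to stay self-contained; the argument in that mail is that you would really stand up the production engine (MySQL in our case), and every name below is invented for illustration.

import sqlite3
import unittest

class PageRepository:
    # The repository's whole job is issuing the right query and
    # returning the result correctly, so the test below talks to a
    # real database connection rather than a mock.
    def __init__(self, conn):
        self.conn = conn

    def find_title(self, page_id):
        row = self.conn.execute(
            "SELECT title FROM page WHERE id = ?", (page_id,)).fetchone()
        return row[0] if row else None

class PageRepositoryTest(unittest.TestCase):
    def setUp(self):
        # Stand up a real database and populate it with test data that
        # mimics production, instead of mocking the connection away.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE page (id INTEGER PRIMARY KEY, title TEXT)")
        self.conn.execute("INSERT INTO page VALUES (1, 'Main Page')")

    def test_finds_existing_title(self):
        self.assertEqual("Main Page",
                         PageRepository(self.conn).find_title(1))

    def test_returns_none_for_missing_page(self):
        self.assertIsNone(PageRepository(self.conn).find_title(2))

if __name__ == "__main__":
    unittest.main()

Call it a unit test or an integration test as you like; the point is that the SQL itself gets exercised, which a mocked connection can never do.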
Re: [Wikitech-l] Architecture Guidelines: Writing Testable Code
On Mon, Jun 3, 2013 at 10:20 AM, Jeroen De Dauw jeroended...@gmail.com wrote: 5. Don't write automated tests at all and do lots of code reviews and manual testing. Sometimes this is really the most sensible thing. I'll leave it to you to figure out when that is though. Absolutist statements are typically wrong. There are almost always cases in which some practice is not applicable. However I strongly disagree with your recommendation of not writing tests and automating them. I disagree even more strongly with the notion that manual testing is generally something you want to do. I've seen many experts in the field of software design recommend strongly against manual testing, and am seeing the same theme being pretty prevalent here at the International PHP Conference I'm currently attending. I think not having automated tests is right in some situations but I certainly wouldn't recommend it. Manual testing sucks and having nice tests with Selenium or some such tool is way better in most situations, but there are totally times where a good code review and manual verification are perfect. I'm thinking of temporary solutions or styling issues that are difficult to verify with automated tests. I'm certainly no expert and I'd _love_ to learn more about things that help in the situations where I feel like manual testing is best. I'd love nothing more than to be wrong. So my question is not how do we write code that is maximally testable, it is: does convenient testing provide sufficient benefits to outweigh the detrimental effect of making everything else inconvenient? This contains the suggestion that testable code is inherently badly designed. That is certainly not the case. Good design and testability go hand in hand. One of the selling points of testing is that it strongly encourages you to create well designed software. IMHO you can design code so that it is both easy to understand and easy to test, but there is a real temptation to sacrifice comprehensibility for testability. Mostly I see this in components being split into incomprehensibly small chunks and then tested via an intricate mock waltz. I'm not saying this happens all the time, only that it happens and we need to be vigilant. The guidelines in the article help prevent such craziness. There are other advantages to writing tests as well. Just off the top of my head: * Regression detection * Replaces manual testing with automated testing, saving lots of time, especially in projects with multiple devs. Manual testing tends to be incomplete and skipped as well, so the number of bugs caught is much lower. And it does not scale. At all. * Documentation so formal it can be executed and is never out of date * Perhaps the most important: removes the fear of change. One can refactor code to clean up some mess without having to fear breaking existing behavior. Tests are a great counter to code rot. Without tests, your code quality is likely to decline. This is perfect! If you think of your tests as formal verification documents then you are in good shape, because this implies that the tests are readable. If I had my druthers I'd like all software to be designed in such a way that it can be tested automatically with informative tests that read like documentation. We'd all like that. To me it looks like there are three problems: 1. How do you keep out tests that are incomprehensible as documentation? 2. What do you do with components for which no unit test can be written that could serve as documentation? 3. 
What do you do when the formal documentation will become out of date so fast that it feels like a waste of time to write it? I really only have a good answer for #2, and that is to test components together, like the DB and Repository or the server side application and the browser. #1 troubles me quite a bit because I've found those tests to be genuinely hurtful in that they give you the sense that you are accomplishing something when you aren't. Nik ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
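As a concrete version of the browser half of that answer to #2, here is a minimal whole-application sketch using Selenium's Python bindings. The URL is a placeholder for a local wiki; "search" happens to be the name MediaWiki gives its search box, but treat all the specifics as assumptions rather than a recipe.

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
try:
    # Exercise the whole stack at once: server, HTML, and browser.
    driver.get("http://localhost/wiki/Main_Page")
    box = driver.find_element(By.NAME, "search")
    box.send_keys("Hello dolly")
    box.submit()
    # Assert against what the browser actually rendered, not against
    # mocked collaborators.
    assert "Hello dolly" in driver.page_source
finally:
    driver.quit()

It runs far slower than a unit test and needs a WebDriver install, but when it fails, it fails for a reason a user would actually notice.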