Re: [Wikitech-l] Search and accents

2015-06-12 Thread Nikolas Everett
On Fri, Jun 12, 2015 at 5:33 PM, Lars Aronsson l...@aronsson.se wrote:

 This is a suggestion to change search, so it ignores
 postfix accents.

 Russian dictionaries (including Wiktionary) use accents to
 indicate stress on syllables, but these accents are never
 seen in plain text.

 In Russian Wiktionary, the verb бороться has the
 inflected form боритесь (imperative, plural),
 which does not have an entry of its own, but
 appears in a fact box (table) of inflected forms.
 However, since this is a dictionary, the word in
 the box is written with an accent: бори́тесь
 https://ru.wiktionary.org/wiki/бороться

 (I do realize that it would be possible to add
 redirect entries for all such inflected forms,
 but this has not been done in ru.wiktionary.)

 Searching for бори́тесь (which nobody would do)
 finds the relevant page,
 https://ru.wiktionary.org/w/index.php?search=бори́тесь

 but searching for боритесь (the normal thing)
 does not find the relevant page,
 https://ru.wiktionary.org/w/index.php?search=боритесь

 Note that Unicode doesn't contain accented versions
 of Cyrillic letters. Instead, the accent is made
 by suffixing a separate accent sign.

 $ echo и | od -c
 000 320 270  \n

 $ echo и́ | od -c
 000 320 270 314 201  \n


That makes sense to me. I've filed it as
https://phabricator.wikimedia.org/T102298 and we'll get it prioritized.

Let me know if you don't like how I just copied your (very good) email into
the issue and I'll try to re-summarize.
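
For what it's worth, the fix will probably amount to folding away the
combining acute accent (U+0301) at index and query time. A minimal sketch of
the idea in PHP (just an illustration using the intl extension's Normalizer,
not the actual Cirrus analysis chain):

// Decompose, then strip the combining acute accent so the indexed form
// matches what people actually type.
$withAccent = "бори́тесь";                        // и followed by U+0301
$decomposed = Normalizer::normalize( $withAccent, Normalizer::FORM_D );
$folded = preg_replace( '/\x{0301}/u', '', $decomposed );
// $folded is now "боритесь", which matches the plain-text query.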

Nik
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Feedback requested on our search APIs

2015-06-10 Thread Nikolas Everett
On Tue, Jun 9, 2015 at 2:19 AM, Gergo Tisza gti...@wikimedia.org wrote:

 On Mon, Jun 8, 2015 at 4:16 PM, Brian Wolff bawo...@gmail.com wrote:

  Additionally, from the help page, its not entirely clear about some of
  the limitations. e.g. You can't do incategory:Foo OR intitle:bar.
  regexes on intitle don't seem to work over the whole title, only word
  level tokens (I think, maybe? I'm a bit unclear on how the regex
  operator works).
 

 Being able to see a parse tree of the search expression would be nice, like
 with the parse/expandtemplates APIs. That would make it easier to find out
 whether the search fails because the query is parsed differently from what
 you imagined, or because there really is nothing to return.


You can _kind of_ get that now by adding the cirrusDumpQuery URL parameter.
But it only dumps the query as sent by Cirrus to Elasticsearch and that
contains a query_string query that Elasticsearch (Lucene really) parses on
its own.

One interesting option would be to make a way for Cirrus to return
Elasticsearch's explain results. It's not perfect because it only explains
why things are found and scored the way they are but it doesn't explain why
things aren't found. Exporting the actual parsed query is more ambitious.
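
To make the query_string bit above a little more concrete, the dump contains
something roughly shaped like this (written here as a PHP array for
readability; the field names and boosts are made up and the real query is a
lot bigger):

// Rough illustration only. The 'query' text is handed to Lucene's own
// parser, which is why Cirrus can't show you a parse tree for it.
$query = [
	'query' => [
		'query_string' => [
			'query'  => 'some search terms',
			'fields' => [ 'title^2', 'text' ],  // made-up field list
		],
	],
];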

Nik
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Feedback requested on our search APIs

2015-06-10 Thread Nikolas Everett
On Mon, Jun 8, 2015 at 7:16 PM, Brian Wolff bawo...@gmail.com wrote:

 You can't do incategory:Foo OR intitle:bar.
 regexes on intitle don't seem to work over the whole title, only word
 level tokens (I think, maybe? I'm a bit unclear on how the regex
 operator works).


intitle is word level, though you can do phrase searching. It's pretty much
the same as a regular search but limited to the title field.
incategory:Foo OR intitle:Bar is a limitation I'm working on now. No idea
when it'll be available. The limitation comes from us trying to be cute with
the command parsing in Cirrus rather than writing a whole grammar for the
query language.
Regexes only work for wikitext. This is a somewhat arbitrary decision on my
part - we need to make special ngram fields to accelerate the regex
searching and we only do that for wikitext. We _can_ do it for other fields
at the cost of update time and disk space.
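
To illustrate with a few examples (syntax sketches, not an exhaustive list):

intitle:"hydrogen bond"              - works: a phrase search limited to the title field
incategory:Physics intitle:bond      - works: the two filters are ANDed together
incategory:Physics OR intitle:bond   - doesn't work yet: no OR across the special operators
insource:/catapult.*bucket/          - regex, but only over the wikitext source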

Nik
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] First impression with Differential

2015-05-21 Thread Nikolas Everett
Some comments inline!

On Thu, May 21, 2015 at 4:43 AM, Quim Gil q...@wikimedia.org wrote:

 Hi, thank you for this short and fresh review. Your help is welcome at
 https://phabricator.wikimedia.org/T597, where we are trying to identify
 blockers for using Arcanist, so we can discuss them and address them
 effectively.

 Meanwhile, some comments here.

 On Thu, May 21, 2015 at 9:01 AM, Ricordisamoa 
 ricordisa...@openmailbox.org
 wrote:

  review
  rant
  Arcanist has to be manually cloned from Git and added to $PATH. Really?
 

 Having seen how users struggle installing git-review and dependencies in
 their environments, I'm not sure this is a bad idea. Plus, I guess it makes
 updating to master pretty easy as well?


This isn't _that_ big a deal to me. git_review wasn't any easier to
install. I'd prefer to have to install nothing but if I have to install
something your description doesn't sound _that_ bad.



  Test Plan is required.
 

 Sounds like a good practice to me. Worst case scenario, type I didn't test
 this patch at all.


We can turn this off, I think:
http://stackoverflow.com/questions/20598026/how-do-i-disable-test-plan-enforcement-in-phabricator

I suspect we _should_ turn it off too because we should be minimizing the
number of changes we have to make when we switch tools. I'm not against
requiring one for most commits at some point but that should be a separate
thing. I should mention that I've never seen other open source projects
require it, for what that is worth.




  .arcconfig should be automatically detected on git clone.
  I can't review my own revisions.
 

 Neither should you, that is the point of code review. Then again, if there
 is no workaround for this, it might be a blocker for urgent Ops deployments
 (where we see many self-merged patches) and one-person projects. If this is
 the case, please create a blocker for T597 so we can discuss it in detail.


https://phabricator.wikimedia.org/T99905

I laid out my argument there but it's the same as the test plan
argument: we do it now and we shouldn't change just to support the tool. We
should only change because we believe it's a good idea.

Nik
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Per-user search query limiting being deployed to wmf wikis

2015-05-18 Thread Nikolas Everett
On Mon, May 18, 2015 at 8:50 PM, MZMcBride z...@mzmcbride.com wrote:

 Jonathan Morgan wrote:
 On Mon, May 18, 2015 at 5:08 PM, Bahodir Mansurov
 bmansu...@wikimedia.org wrote:
  I doubt all 200 students will be making concurrent searches.
 
 I can easily imagine a scenario where 200 students in a large lecture
 classroom might be instructed to open their laptops, go to Wikipedia, and
 search for a particular topic at the same time. Similar to how teachers
 [used to] say now everyone in the class turn to Chapter 8
 
 If that is indeed what we're talking about here, it will be disruptive.

 I imagine the more common cases involve either distributing a URL or
 instructing students to search for a particular topic, which typically
 routes through Google or Yahoo! or some external search engine. Both of
 these cases wouldn't be disrupted, as I understand it.


We'll still keep an eye on it. More worrying is the assertion that some
countries come through a surprisingly small number of IPs for some reason.
I've got a pretty itchy rollback finger and deploy rights.


 That said, I'm not sure what this thread is about. What problem are we
 trying to solve? Are we having issues with concurrent searches? Does
 anyone have links to Phabricator Maniphest tasks or Gerrit commits?


This is the last of some security recommendations that came out of a brownout
a few months ago, caused by someone finding an inefficient query and
_hammering_ the reload button a couple hundred times. I'd link to the bug but
it contains reproduction steps so it's under some level of lock and key. The
fix is specific to us, so it's possible the issue is still reproducible
against other Lucene/Elasticsearch/SOLR users. As I said, we've since
prevented it from being exploitable on our side.

If we have to increase the limits or add whitelists we will. It'll be nice
to have some protection but I'm sensitive to it causing trouble.
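
For anyone curious about the mechanics: this kind of per-user limit can be
expressed through MediaWiki's PoolCounter. A rough sketch of the sort of
stanza involved (the key name and the numbers here are illustrative, not the
deployed values - the gerrit change below has the real thing):

// Illustrative sketch only - key name and numbers are made up.
$wgPoolCounterConf['CirrusSearch-PerUser'] = [
	'class'    => 'PoolCounter_Client',
	'workers'  => 5,   // concurrent searches allowed per user/IP
	'maxqueue' => 5,   // how many more may wait before being rejected
	'timeout'  => 15,  // seconds to wait for a slot
];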

I expect Erik will be monitoring the logs tonight PDT time and I'll have a
look early tomorrow morning EDT. The relevant commit in gerrit is
https://gerrit.wikimedia.org/r/#/c/210622/ .

Nik
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Per-user search query limiting being deployed to wmf wikis

2015-05-18 Thread Nikolas Everett
On Mon, May 18, 2015 at 9:30 PM, John phoenixoverr...@gmail.com wrote:

 If the stressor point is a few hundred hits, let's pick a value low enough
 not to risk reaching the max, but high enough not to risk excessive
 collateral damage. Something along the lines of 40-50 would avoid most
 accidental triggers and is low enough to limit server stress.

 It's far better to incrementally step the limit down to reach optimal
 values than to cut back radically and piss everyone off until you can raise
 the threshold.


I bumped the limit from 5 to 15.
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Connecting to github community

2015-05-06 Thread Nikolas Everett
On Tue, May 5, 2015 at 4:28 PM, Bryan Davis bd...@wikimedia.org wrote:

 Facebook uses a bot to transfer pull requests from GitHub [5] to their
 Phabricator instance [6] for HHVM.


I gotta say I wasn't thrilled with it. It just felt all disjointed and
broken. As much as I like the idea of lowering the barrier to entry it felt
like a bait and switch. I saw github issues and sent a pull request and was
bounced to some other system where I needed yet another account. At least
with our setup it's clear up front what you are getting into.

A two-way sync bot like that, one that spoke as the proper user, would be
pretty awesome.

Nik
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] MediaWiki-Vagrant now has support for install in Linux Containers (LXC)

2015-03-03 Thread Nikolas Everett
I've just tried it and it seems to be working well!  I heard that some OSX
users were seeing huge performance problems with Vagrant.  Something
about having to run it inside a VM.  I imagine running LXC inside a VM is
much less painful than running VirtualBox.

On Tue, Mar 3, 2015 at 3:24 PM, Dan Duvall dduv...@wikimedia.org wrote:

 Thanks a ton, Bryan!

 I know many users have been concerned with the hefty memory requirements
 (not to mention VT-x requirements) of MW-Vagrant+VirtualBox, especially on
 lower end hardware. This should be a huge help.

 Labs users can definitely look to benefit from this feature as well (once
 the Vagrant 1.7.x kinks are worked out [0]).

 [0]: https://gerrit.wikimedia.org/r/#/c/193665/

 On Tue, Mar 3, 2015 at 11:26 AM, Bryan Davis bd...@wikimedia.org wrote:

  We have working support for installing MediaWiki-Vagrant in an LXC
  container now!
 
  See the instructions in support/README-lxc.md [0] for a description of
  how to use it from a Ubuntu 14.04 host computer. Patches are welcome
  giving alternate instructions for other distributions. Note that
  Vagrant 1.7+ is required for the latest version of the vagrant-lxc
  plugin that this uses so you will probably not be able to install
  Vagrant from a package repo unless you are running Debian unstable.
 
  I've been wanting to make using MediaWiki-Vagrant a lighter weight
  experience for users who are running Linux on their laptops. I took a shot
  at this right after Wikimania last year by figuring out how to use
  MediaWiki-Vagrant to provision a Docker container. That experiment made a
  system that was too unstable for me to promote anyone using it as more
  than a proof of concept. Since then I've been meaning to try out LXC by
  using the vagrant-lxc plugin [1] and last weekend I finally found the
  time. Thanks to Marko Obrovac and Dan Duvall for helping test this.
 
  [0]:
 
 https://phabricator.wikimedia.org/diffusion/MWVA/browse/master/support/README-lxc.md
  [1]: https://github.com/fgrehm/vagrant-lxc
  --
  Bryan Davis  Wikimedia Foundationbd...@wikimedia.org
  [[m:User:BDavis_(WMF)]]  Sr Software EngineerBoise, ID USA
  irc: bd808v:415.839.6885 x6855
 
  ___
  Wikitech-l mailing list
  Wikitech-l@lists.wikimedia.org
  https://lists.wikimedia.org/mailman/listinfo/wikitech-l




 --
 Dan Duvall
 Automation Engineer
 Wikimedia Foundation http://wikimediafoundation.org
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] More news on Wikidata Query Indexing Strategy

2015-02-04 Thread Nikolas Everett
tl/dr: The technology we started building against (Titan) is probably
dead.  We're reopening the investigation for a backing technology.

Yesterday DataStax http://www.datastax.com/ announced
http://www.datastax.com/2015/02/datastax-acquires-aurelius-the-experts-behind-titandb
that they'd acquired
http://www.datastax.com/2015/02/datastax-acquires-aurelius-the-experts-behind-titandb
ThinkAurelius http://thinkaurelius.com/, the company for whom almost all
the Titan developers work. The ZDNet article
http://www.zdnet.com/article/datastax-snaps-up-aurelius-and-its-titan-team-to-build-new-graph-database/
made it pretty clear that they are killing the project:

 We're not going to do an integration. The play here is we'll take
 everything that's been done on Titan as inspiration, and maybe some of the
 Titan project will make it into DSE Graph, DataStax engineering VP Martin
 Van Ryswyk said.


While it's certainly possible that someone from the community will come out
of the woodwork and continue Titan, it's now lost almost all of its top
developers.  It looks like there are some secret succession discussions
going on but I'm not holding out hope that anything will come of it.  This
pretty much blows this project's schedule of having a hardware request by
the end of the month and a publicly released beta at the end of March.

Anyway, we're reopening the investigation to pick a new backend.  We're
including more options than we had before as it's become clear that open
source graph databases are a bit of a wild west space.  But there are people
waiting on this.  The developer summit made that clear.  So we're not going
to do the month long dive into each choice like we did last time.  I'm not
100% sure exactly what we'll do but I can assure you we'll be careful.

I know you might want to talk about other options - you may as well stuff
them on
https://www.mediawiki.org/wiki/Wikibase/Indexing#Other_possible_candidates
and we'll get to them.  As always, you can check out our workboard
https://phabricator.wikimedia.org/project/board/37/query/DwEBx9K4vaHo/ to
see what we're actually working on.

Titan is still in the running assuming it gets active maintainers.
OrientDB, which we evaluated last round, is still in there too.  So too are
GraphX and Neo4j.  And ArangoDB.  And Magnus' WDQ - we'd get much more
involved in maintenance there, I think.  And writing a TinkerPop
implementation on top of Elasticsearch.  That last one is not a serious
contender - it'd get geo support for free but it's really just a low bar to
compare all the other options to.

Thanks,

Nik https://phabricator.wikimedia.org/T88550
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] More news on Wikidata Query Indexing Strategy

2015-02-04 Thread Nikolas Everett
Top posting to add context: this is for the initiative to get a version of
Magnus' wonderful http://wdq.wmflabs.org/ running in production at WMF.

On Wed, Feb 4, 2015 at 4:50 PM, Nikolas Everett never...@wikimedia.org
wrote:

 tl/dr: The technology we started building against (Titan) is probably
 dead.  We're reopening the investigation for a backing technology.

 Yesterday DataStax http://www.datastax.com/ announced
 http://www.datastax.com/2015/02/datastax-acquires-aurelius-the-experts-behind-titandb
 that they'd acquired
 http://www.datastax.com/2015/02/datastax-acquires-aurelius-the-experts-behind-titandb
 ThinkAurelius http://thinkaurelius.com/, the company for whom almost
 all the Titan developers work. The ZDNet article
 http://www.zdnet.com/article/datastax-snaps-up-aurelius-and-its-titan-team-to-build-new-graph-database/
 made it pretty clear that they are killing the project

 We're not going to do an integration. The play here is we'll take
 everything that's been done on Titan as inspiration, and maybe some of the
 Titan project will make it into DSE Graph, DataStax engineering VP Martin
 Van Ryswyk said.


 While its certainly possible that someone from the community will come out
 of the woodwork and continue Titan its now lost almost all of its top
 developers.  It looks like there is some secret succession discussions
 going on but I'm not holding out hope that anything will come of it.  This
 pretty much blows this project's schedule of having a hardware request by
 the end of the month and a publicly released beta at the end of March.

 Anyway, we're reopening the investigation to pick a new backend.  We're
 including more options than we had before as its become clear that open
 source graph databases is a bit of a wild west space.  But there are people
 waiting on this.  The developer summit made that clear.  So we're not going
 to do the month long dive into each choice like we did last time.  I'm not
 100% sure exactly what we'll do but I can assure you we'll be careful.

 I know you might want to talk about other options - you may as well stuff
 them on
 https://www.mediawiki.org/wiki/Wikibase/Indexing#Other_possible_candidates
 and we'll get to them.  As always, you can check out our workboard
 https://phabricator.wikimedia.org/project/board/37/query/DwEBx9K4vaHo/
 to see what we're actually working on.

 Titan is still in the running assuming it gets active maintainers.
 OrientDB, which we evaluated last round, is still in there too.  So too are
 GraphX and Neo4j.  And ArangoDB.  And Magnus' WDQ.  We'd get much more
 involved in maintenance, I think.  And writing a TinkerPop implementation
 Elasticsearch.  That's not a serious contender.  It'd get geo support for
 free but its really just a low bar to compare all the other options to.

 Thanks,

 Nik https://phabricator.wikimedia.org/T88550


And, to add more context, we chose not to just immediately deploy Magnus'
WDQ because we didn't want to maintain a graph database ourselves.  You
should now be able to appreciate the irony of the situation more
thoroughly.  It's healthy to find humor where you can.

Nik
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] SOA in .NET, or Microsoft is going open source MIT style

2015-02-04 Thread Nikolas Everett
On Wed, Feb 4, 2015 at 5:09 AM, Yuri Astrakhan yastrak...@wikimedia.org
wrote:

 flame war ahead

 For those not addicted to slashdot, see here
 
 http://news.slashdot.org/story/15/02/04/0332238/microsoft-open-sources-coreclr-the-net-execution-engine
 
 .

 Licenced under MIT
 https://github.com/dotnet/coreclr/blob/master/LICENSE.TXT, plus an
 additional patents promise
 https://github.com/dotnet/coreclr/blob/master/PATENTS.TXT.


I'm not sure how relevant it is, but are promises legally binding?
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Sane versioning for core (was: Re: Fwd: No more Architecture Committee?)

2015-01-25 Thread Nikolas Everett
+1 for something like this.  It's not a huge problem not to do semver but
it'd be simpler to explain if we did.

On Sun, Jan 25, 2015 at 10:27 AM, Legoktm legoktm.wikipe...@gmail.com
wrote:

 On 01/15/2015 08:26 PM, Chad wrote:
  I've been saying for over a year now we should just drop the 1. from
  the 1.x.y release versions. So the next release would be 25.0, 26.0,
  etc etc.
 

 +1, let's do this. It would allow us to follow semver and still retain
 our current version number history instead of waiting for a magical 2.0.

 -- Legoktm

 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Attracting new talent to our projects

2015-01-05 Thread Nikolas Everett
On Sat, Jan 3, 2015 at 11:30 PM, MZMcBride z...@mzmcbride.com wrote:

 Jon Robson wrote:
 Thoughts?

 Adding easter eggs sounds like a fairly strange recruitment tactic, but I
 don't see any harm in trying it out and seeing what happens. It's not
 totally clear to me what problem we're trying to solve here (if any). It's
 also not completely clear to me whether you want to recruit for the
 Wikimedia Foundation specifically or for the Wikimedia movement. Depending
 on the specifics, certain solutions might be more or less appropriate.



I think the best thing for recruiting for MediaWiki (the open source
project) is the extraction portion of the Librarization
https://www.mediawiki.org/wiki/Library_infrastructure_for_MediaWiki
project.  Breaking MediaWiki into parts will get it used in more places and
the more people that rely on it the more people will contribute to it.
Making reusable PHP libraries as opposed to services is doubly good at
getting contributions because the people that will be integrating with them
will also be PHP developers, so they'll be able to start contributing
reasonably quickly.

My expertise doesn't really extend beyond the open source project so I
won't guess at ways to recruit for the movement or the foundation.

Nik
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Phabricator migration part II: Replacing gitblit with Diffusion

2014-11-29 Thread Nikolas Everett
On Nov 29, 2014 1:58 PM, Legoktm legoktm.wikipe...@gmail.com wrote:

 On the talk page I suggested dropping the G prefix for top-level
 repos, and just giving them an unprefixed callsign. I think that would
 fix the ugliness of some of the frequently used names.

  On Sat, 29 Nov 2014 17:51:11 +0100, Chad innocentkil...@gmail.com
wrote:
 
   The only exception I'd make is MediaWiki. Under this
  scheme the callsign is MWMW. MediaWiki should be
  just plain MW.

 If we rename mediawiki/core to just mediawiki it becomes a top-level
 repo and can just be MW.

  On 29 November 2014 at 09:13, Bartosz Dziewoński matma@gmail.com
  wrote:
  Feels slippery. Next thing you know, someone will want VE and SMW. :)

 If the VisualEditor/VisualEditor repo becomes VisualEditor I don't
 see anything against naming it VE.

 On 11/29/14 9:26 AM, James Forrester wrote:
  ​When we have VE-WordPress and VE-Drupal and VE-​Joomla and whatever,
we'll
  put them as… GVEW, GVED, GVEJ *etc.* and just hope we never have a first
  character clash on integrations?

 I think we would set up VE as a prefix rather than dumping them under
 general, so: VEWP, VEDP, VEJM (or whatever).

 -- Legoktm

+1
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Phabricator migration part II: Replacing gitblit with Diffusion

2014-11-26 Thread Nikolas Everett
I think that is a bit sad. Not tearing of clothes or gnashing of teeth sad.
Maybe stare wistfully into the sunset and think of what could have been sad.

I'd prefer not to have them but I ultimately don't care that much. It does
provide a fun bikeshedding opportunity I guess.

Nik
On Nov 26, 2014 12:52 AM, Chad innocentkil...@gmail.com wrote:

 No we can't not.

 -Chad

 On Tue, Nov 25, 2014, 9:11 PM MZMcBride z...@mzmcbride.com wrote:

  James Forrester wrote:
  We need to agree how we are going to name our repos, and much more
  importantly because it can't change, what their callsign is. These
 will
  be at the heart of e-mails, IRC notifications and git logs for a long
  time, so it's important to get this right rather than regret it after
 the
  fact.
  
  A handful of repos are so important and high-profile that we can use an
  acronym without too much worry, like MW for MediaWiki or VE for
  VisualEditor. For the rest, we need to make sure we've got a good enough
  name that won't cause inconveniences or confusion, and doesn't repeat
 the
  mistakes we've identified over time. We've learnt since the SVN to git
  migration a few years ago that calling your repository /core is a bad
  plan, for instance.
 
  Could we not?
 
  JIRA does this prefixing with tickets and I don't really understand its
  purpose. We already have Git hashes and positive integers. Is another
  scheme really needed? And what was wrong with the repository names again?
 
  I was pleased that Maniphest simply uses T as a prefix. I'm kind of
 bummed
  out that Diffusion is introducing shouting obscure immutable
 abbreviations.
 
  MZMcBride
 
 
 
  ___
  Wikitech-l mailing list
  Wikitech-l@lists.wikimedia.org
  https://lists.wikimedia.org/mailman/listinfo/wikitech-l
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Phabricator repository callsigns

2014-11-13 Thread Nikolas Everett
On Thu, Nov 13, 2014 at 4:14 PM, Brian Wolff bawo...@gmail.com wrote:

 On 11/13/14, Chad innocentkil...@gmail.com wrote:
  Please help me draft some guidelines for Phabricator repo callsigns.
 
  https://www.mediawiki.org/wiki/Phabricator/Callsign_naming_conventions
 
  The subpage on naming our existing repos should be especially fun:
 
 
 https://www.mediawiki.org/wiki/Phabricator/Callsign_naming_conventions/Existing_repositories
 
  Bikeshedding on the second hardest problem in CS? Who on this list can
  pass up a chance to join in there? ;-)
 
  -Chad
  ___
  Wikitech-l mailing list
  Wikitech-l@lists.wikimedia.org
  https://lists.wikimedia.org/mailman/listinfo/wikitech-l

 Do we get full unicode including astral characters? If so, I vote
 MediaWiki be 🌻 (U+1f33b).


If we're going unicode why not U+2620.  We could make Cirrus U+2601 or maybe
U+5377U+96F2.

Nik
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Proposed timeline for remaining Cirrus/Elastic rollout

2014-10-31 Thread Nikolas Everett
On Thu, Oct 30, 2014 at 10:18 PM, MZMcBride z...@mzmcbride.com wrote:

 James Forrester wrote:
 On 30 October 2014 09:53, Chad innocentkil...@gmail.com wrote:
  New hardware is in place and we've got plenty of breathing room to wrap
 up the migration to the new search engine.
 
 ​Excellent news!​

 Indeed! Thanks to all who made this possible. An independent search engine
 is an incredibly important piece of infrastructure that's now getting a
 more appropriate level of attention and love. This is great and I'm
 excited to see what we'll be able to (continue to) build on top of it.

 I lost track of the discussion about the ability to run regular
 expressions across wikitext. I see a large amount of opportunity in being
 able to search through wikitext in real-time. There are plenty of findable
 issues and errors in our articles and search is a key component in
 improving the situation.


I have OK news and bad news on that front unfortunately.

We used to have brute force regex searches and they were pretty much
garbage.  It was just too easy to write a search that would take minutes to
complete.  That would cause the queue of other regex searches to get backed
up.  And it'd timeout on the varnish side so the user, even if they had the
patience to wait for 5 minutes to get a result, wouldn't get one.

The OK news is that we cut over to using trigram accelerated regex searches
about a week ago.  It's better.  Almost usable but not quite.  The worst
case run time is now 30 seconds.  If you search for something that is rare
it'll probably come back in a few seconds.  There are issues with
consistency when your request runs really long as well (
https://bugzilla.wikimedia.org/show_bug.cgi?id=72128).  Far from perfect
but serviceable.
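
In case the mechanism isn't obvious, the rough idea behind the trigram
acceleration is: pull the literal strings out of the regex, chop them into
trigrams, use the trigram index to narrow down the candidate documents, and
only then run the real regex over those candidates. A sketch (not the actual
Cirrus code):

// Turn a literal string from the regex into the trigrams a document must
// contain before we bother running the regex against it.
function extractTrigrams( $literal ) {
	$trigrams = [];
	for ( $i = 0; $i + 3 <= strlen( $literal ); $i++ ) {
		$trigrams[] = substr( $literal, $i, 3 );
	}
	return $trigrams;
}
// extractTrigrams( 'catapult' ) returns
// [ 'cat', 'ata', 'tap', 'apu', 'pul', 'ult' ]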

The bad news.  We've had two Cirrus outages in the past week that I believe
are caused by the accelerator, so it's disabled and we're back to brute force
for now.  The first outage was on Monday and we didn't have a clue what
caused it.  We added logging to learn for the next time and it didn't
happen again until this morning.  The extra logging failed (:shakes fist:)
but we were able to implicate this code in the process.


The silver lining is that I'll be working on it again and might be able to
get some speed improvements while we're in there.

I'm not sure what the outage says about the schedule.  I'll have to do some
thinking about that.  In the meantime it does say that we should keep the
old search there as a backup.  We've been able to fall back to it during
the outages to minimize the suffering.


Nik
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] DB performance review: Wikibase Usage Tracking

2014-09-17 Thread Nikolas Everett
I can't access those links!

On Tue, Sep 16, 2014 at 11:14 AM, Daniel Kinzler 
daniel.kinz...@wikimedia.de wrote:

 Hi all!

 The Wikibase team would like to allow data from any item to be used on any
 client page. To do this, we need to track which item is being used where,
 so we
 can purge the appropriate pages when the item changes. We would like to
 have
 people with database experience to look at our proposal and let us know
 about
 any concerns, especially wrt performance.

 Here you find a proposal for two database tables for tracking the usage of
 entities across wikis:


 https://gerrit.wikimedia.org/r/#/c/158078/9/usagetracking/includes/Usage/Sql/entity_usage.sql,unified


 https://gerrit.wikimedia.org/r/#/c/158078/9/subscription/includes/Subscription/Sql/entities_per_client.sql,unified


 The entity_usage table would be on every client, recording which entity
 is used
 on which page (kind of like the iwlinks table). The entity_per_client
 table
 would be on the repo, and track which wiki (client) is interested in
 changes
 to which entity.

 Please have a look and let me know if you have any questions or
 suggestions,
 especially with regards to the following use cases:

 The following would happen when editing/re-parsing a page on a client wiki
 (e.g. wikipedia):
 * get all entities used on a given page from entity_usage
 * delete rows based on a page id and a list of entity ids from entity_usage
 * insert rows for a page / entity pair into entity_usage
 * query rows for a set of entities from entity_usage (with no page id
 specified).
 * add rows for a set of (newly used) entities to the entity_per_client table
 * remove rows for a set of (no longer used) entities from the
 entity_per_client table

 The following would happen when dispatching a change from wikibase:
 * looking up interested wikis for a list of entities from the
 entity_per_client
 table.
 * (notification via the job queue)
 * looking up pages to be purged/updated based on a list of entity ids (and
 possibly an aspect id) in the entity_usage table.

 -- daniel

 --
 Daniel Kinzler
 Senior Software Developer

 Wikimedia Deutschland
 Gesellschaft zur Förderung Freien Wissens e.V.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Parser cache update/migration strategies

2014-09-09 Thread Nikolas Everett
Also, option 5 could be to continue without the data until the parser cache
is invalidated on its own.
Maybe option 6 could be to continue without the data and invalidate the
cache and completely rerender only some of the time. Like 5% of the time for
the first couple hours, then 25% of the time for a day, then 100% of the
time after that. It'd guarantee that the cache is good after a certain
amount of time without causing a big spike right after deploys.
All those options are less good than just updating the cache, I think.
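
For clarity, option 6 in code form would be roughly this (the numbers and the
function name are just for illustration):

// The chance of forcing a full re-render ramps up with time since the
// deploy, so the cache refills gradually instead of all at once.
function shouldForceRerender( $secondsSinceDeploy ) {
	if ( $secondsSinceDeploy < 2 * 3600 ) {
		$chance = 0.05;  // first couple of hours: 5%
	} elseif ( $secondsSinceDeploy < 26 * 3600 ) {
		$chance = 0.25;  // for about a day after that: 25%
	} else {
		$chance = 1.0;   // then always
	}
	return ( mt_rand() / mt_getrandmax() ) < $chance;
}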

Nik
On Sep 9, 2014 6:42 AM, aude aude.w...@gmail.com wrote:

 On Tue, Sep 9, 2014 at 12:03 PM, Daniel Kinzler dan...@brightbyte.de
 wrote:

  Hi all!
 
  tl;dr: How to best handle the situation of an old parser cache entry not
  containing all the info expected by a newly deployed version of code?
 
 
  We are currently working to improve our usage of the parser cache for
  Wikibase/Wikidata. E.g., We are attaching additional information related
 to
  languagelinks the to ParserOutput, so we can use it in the skin when
  generating
  the sidebar.
 
  However, when we change what gets stored in the parser cache, we still
  need to
  deal with old cache entries that do not yet have the desired information
  attached. Here's a few options we have if the expected info isn't in the
  cached
  ParserOutput:
 
  1) ...then generate it on the fly. On every page view, until the parser
  cache is
  purged. This seems bad especially if generating the required info means
  hitting
  the database.
 
  2) ...then invalidate the parser cache for this page, and then a) just
  live with
  this request missing a bit of output, or b) generate on the fly c)
 trigger
  a
  self-redirect.
 
  3) ...then generated it, attach it to the ParserOutput, and push the
  updated
  ParserOutput object back into the cache. This seems nice, but I'm not
 sure
  how
  to do that.
 


 https://gerrit.wikimedia.org/r/#/c/158879/ is my attempt to update
 ParserOutput cache entry, though it seems too simplistic a solution.

 Any feedback on this would be great or suggestions on how to do this
 better, or maybe it's crazy idea. :P

 Cheers,
 Katie


 
  4) ...then force a full re-rendering and re-caching of the page, then
  continue.
  I'm not sure how to do this cleanly.
 
 
  So, the simplest solution seems to be 2, but it means that we invalidate
  the
  parser cache of *every* page on the wiki potentially (though we will not
  hit the
  long tail of rarely viewed pages immediately). It effectively means that
  any
  such change requires all pages to be re-rendered eventually. Is that
  acceptable?
 
  Solution 3 seems nice and surgical, just injecting the new info into the
  cached
  object. Is there a nice and clean way to *update* a parser cache entry
 like
  that, without re-generating it in full? Do you see any issues with this
  approach? Is it worth the trouble?
 
 
  Any input would be great!
 
  Thanks,
  daniel
 
  --
  Daniel Kinzler
  Senior Software Developer
 
  Wikimedia Deutschland
  Gesellschaft zur Förderung Freien Wissens e.V.
 
  ___
  Wikitech-l mailing list
  Wikitech-l@lists.wikimedia.org
  https://lists.wikimedia.org/mailman/listinfo/wikitech-l




 --
 @wikimediadc / @wikidata
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Parser cache update/migration strategies

2014-09-09 Thread Nikolas Everett
On Tue, Sep 9, 2014 at 8:00 AM, Daniel Kinzler dan...@brightbyte.de wrote:

 On 09.09.2014 at 13:45, Nikolas Everett wrote:
  All those options are less good then just updating the cache I think.

 Indeed. And that *sounds* simple enough. The issue is that we have to be
 sure to
 update the correct cache key, the exact one the OutputPage object in
 question
 was loaded from. Otherwise, we'll be updating the wrong key, and will read
 the
 incomplete object again, and try to update again, and again, on every page
 view.

 Sadly, the mechanism for determining the parser cache key is quite
 complicated
 and rather opaque. The approach Katie tries in I1a11b200f0c looks fine at a
 glance, but even if i can verify that it works as expected on my machine,
 I have
 no idea how it will behave on the more strange wikis on the live cluster.

 Any ideas who could help with that?


No, not really.  My only experience with the parser cache was accidentally
polluting it with broken pages one time.

I suppose one option is to be defensive around reusing the key.  I mean, if
you could check the key used to fetch from the parser cache and you had a
cache hit then you know if you do a put you'll be setting _something_.

Another thing - I believe uncached calls to the parser are wrapped in pool
counter acquisitions to make sure no two processes spend duplicate effort.
You may want to acquire that to make sure anything you do that is heavy
doesn't get done twice.

Once you start talking about that it might just be simpler to invalidate
the whole entry.

Another option:
Kick off some kind of cache invalidation job that _slowly_ invalidates the
appropriate parts of the cache.  Something like how the varnish cache is
invalidated on template change.  That gives you marginally more control
than randomized invalidation.

Nik
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Recent vagrant issues

2014-08-12 Thread Nikolas Everett
If you've just started having vagrant issues, particularly if `vagrant
provision` has started complaining about git and vector then make sure to
pull the newest version of mediawiki.

Nik
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Thoughts about roles

2014-08-09 Thread Nikolas Everett
Cirrus's dependencies are to get the integration tests passing and they
verify some stuff that came up on Wiktionary, which doesn't force capitals.
I'm all for splitting the roles into basic ones and bloated ones.
On Aug 9, 2014 3:47 PM, Chad innocentkil...@gmail.com wrote:

 On Sat, Aug 9, 2014 at 3:40 PM, Max Semenik maxsem.w...@gmail.com wrote:

  Currently a lot of our extension Vagrant roles are working like Swiss
  knives: they do everything possible to imagine. For example,
 MobileFrontend
  always installs 3 optional dependencies while CirrusSearch includes its
  configuration for unit tests that among other things
  enforces $wgCapitalLinks = false which is untypical for most MW installs.
 
 
 I hate that stupid config file for Cirrus. HATE HATE HATE.


  I think many of these actually make development harder. Solution? Can we
  split some larger roles to basic and advanced parts, so that people
 who
  need an extension to play around or to satisfy a dependency will not be
  forced to emulate a significant part of WMF infrastructure?
 

 Not a bad idea. Cirrus doesn't depend on half the things it says
 it does unless you're wanting to run browser tests.

 -Chad
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Special:Search returning errors on English Wikipedia

2014-07-02 Thread Nikolas Everett
I'll have a look at it.


On Wed, Jul 2, 2014 at 12:29 PM, Florian Schmidt 
florian.schmidt.wel...@t-online.de wrote:

 Hello!

 Is working for me. I have only opened your link in Google chrome and
 searched for Android (search suggestions working, too) and click ok.
 After this I see the result page.

 Can you try to delete cache and cookies? What browser you use?

 Kind regards
 Florian

 Freundliche Grüße
 Florian

 -Ursprüngliche Nachricht-
 Von: wikitech-l-boun...@lists.wikimedia.org [mailto:
 wikitech-l-boun...@lists.wikimedia.org] Im Auftrag von Pine W
 Gesendet: Mittwoch, 2. Juli 2014 18:26
 An: wikitech-l@lists.wikimedia.org
 Betreff: [Wikitech-l] Special:Search returning errors on English Wikipedia

 I am unable to search using any of the options on
 https://en.wikipedia.org/w/index.php?title=Special:Search

 This came to my attention when a user reported that they were unable to
 search English Wikipedia's help files. It turns out that none of the
 advanced, everything, multimedia, or content pages search functions
 are working.

 All searches return the error An error has occurred while searching: The
 search backend returned an error:

 Pine
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l


 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Special:Search returning errors on English Wikipedia

2014-07-02 Thread Nikolas Everett
Hmmm - it's working for me.  The first couple of times I tried it was slow
but it worked.  I tried both search engine options (BetaFeature and
default) and a bunch of different search options.

We don't have good logs for the default search and that error message looks
like it came from there.  If it did and it's gone now we'll have to chalk it
up to a temporary blip in the old search system that we don't really
understand very well.  That's a painful thing to say, but the last time I
poked that system trying to fix it I took out enwiki's search for half an
hour while it warmed its caches.  I try not to taunt it unless it is
seriously broken.  Failing on all searches certainly counts so if it does
it again then please reply.

The BetaFeature search log is only complaining about errors that I know
about and am fixing, literally right now.

Nik



On Wed, Jul 2, 2014 at 12:34 PM, Nikolas Everett never...@wikimedia.org
wrote:

 I'll have a look at it.



 On Wed, Jul 2, 2014 at 12:29 PM, Florian Schmidt 
 florian.schmidt.wel...@t-online.de wrote:

 Hello!

 Is working for me. I have only opened your link in Google chrome and
 searched for Android (search suggestions working, too) and click ok.
 After this I see the result page.

 Can you try to delete cache and cookies? What browser you use?

 Kind regards
 Florian

 Freundliche Grüße
 Florian

 -Ursprüngliche Nachricht-
 Von: wikitech-l-boun...@lists.wikimedia.org [mailto:
 wikitech-l-boun...@lists.wikimedia.org] Im Auftrag von Pine W
 Gesendet: Mittwoch, 2. Juli 2014 18:26
 An: wikitech-l@lists.wikimedia.org
 Betreff: [Wikitech-l] Special:Search returning errors on English Wikipedia

 I am unable to search using any of the options on
 https://en.wikipedia.org/w/index.php?title=Special:Search

 This came to my attention when a user reported that they were unable to
 search English Wikipedia's help files. It turns out that none of the
 advanced, everything, multimedia, or content pages search functions
 are working.

 All searches return the error An error has occurred while searching: The
 search backend returned an error:

 Pine
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l


 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l



___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Finding images

2014-06-18 Thread Nikolas Everett
On Jun 18, 2014 2:28 PM, Brian Wolff bawo...@gmail.com wrote:

 On 6/18/14, Kristian Kankainen krist...@eki.ee wrote:
  Hello!
 
  I think, if one is clever enough, some categorization could be automated
  allready.
 
  Searching for pictures based on meta-data is called Concept Based Image
  Retrieval, searching based on the machine vision recognized content of
  the image is called Content Based Image Retrieval.
 
  What I understood of Lars' request, is an automated way of finding the
  superfluous concepts or meta-data for pictures based on their content.
  Of course recognizing an images content is very hard (and subjective),
  but I think it would be possible for many of these superfluous
  categories, such as winter landscape, summer beach and perhaps also
  red flowers and bicycle.
 
  There exist today many open source Content Based Image Retrieval
  systems, that I understand basically works in the way that you give them
  a picture, and they find you the matching pictures accompanied with a
  score. Now suppose we show a picture with known content (pictures from
  Commons with good meta-data), then we could to a degree of trust find
  pictures with overlapping categories.
  I am not sure whether this kind of automated reverse meta-data labelling
  should be done for only one category per time, or if some kind of
  category bundles work better. Probably adjectives and items should be
  compounded (eg red flowers).
 
  Relevant articles and links from Wikipedia:
  # https://en.wikipedia.org/wiki/Image_retrieval
  # https://en.wikipedia.org/wiki/Content-based_image_retrieval
  #
 
https://en.wikipedia.org/wiki/List_of_CBIR_engines#CBIR_research_projects.2Fdemos.2Fopen_source_projects
 
  Best wishes
  Kristian Kankainen
 
  18.06.2014 09:14, Pine W kirjutas:
  Machine vision is definitely getting better with time. We have
  computer-driven airplanes, computer-driven cars, and computer-driven
  spacecraft. The computers need us less and less as hardware and
software
  improve. I think it may be less than a decade before machine vision is
  good
  enough to categorize most objects in photographs.
 
  Pine
  ___
  Wikitech-l mailing list
  Wikitech-l@lists.wikimedia.org
  https://lists.wikimedia.org/mailman/listinfo/wikitech-l
 
 
 
  ___
  Wikitech-l mailing list
  Wikitech-l@lists.wikimedia.org
  https://lists.wikimedia.org/mailman/listinfo/wikitech-l

 Interesting. Some demo links that I found:

 * http://demo-itec.uni-klu.ac.at/liredemo/

Lire has been on my list of things to look at for a while now. It's nice
because it could integrate reasonably easily into Cirrus since it is
built on Lucene.

I can't promise anything quick but I'll look into the others as well.

Nik
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Tell my favorite conference about your Wikimedia tech

2014-06-16 Thread Nikolas Everett
Man!  I'd love to go ride in a balloon and give a talk but Auckland is so
close to halfway around the world...


On Mon, Jun 16, 2014 at 11:36 AM, Sumana Harihareswara 
suma...@wikimedia.org wrote:

 Thanks, Luis! And for Tyler or anyone else on this list who has the same
 questions:

 Sometimes I come up with a talk idea by asking myself, What do I know
 now that I wish I'd known a year ago? This is a way to think about what
 you've learned that a lot of other people don't know as well as you.
 That's basically how I thought of A Few Python Tips. To practice it in
 front of a small crowd first, I'm doing a tech talk this Thursday:
 https://www.mediawiki.org/wiki/Meetings/2014-06-19 before I talk next
 week at Open Source Bridge.

 To give a Wikimedia tech talk about your topic:
 https://www.mediawiki.org/wiki/Project:Calendar/How_to_schedule_an_event

 And, just like with submitting patches, don't reject *yourself* before
 the conference organizers have a chance to. ;-)

 -Sumana


 On 06/13/2014 10:52 AM, Luis Villa wrote:
  On Fri, Jun 13, 2014 at 7:07 AM, Tyler Romeo tylerro...@gmail.com
 wrote:
 
  I’ve always wanted to submit a cool MediaWiki talk to these conferences,
  but I have no idea what I’d talk about (or whether I’m even experienced
  enough to talk about anything at a conference).
 
 
  The answer to that second part is yes :) LCA is not TED :) Background
 on
  their speaker selection process and what makes for a good submission
  (useful for any conference, not just LCA):
 
 http://opensource.com/life/14/1/get-your-conference-talk-submission-accepted
 
 
  Are there any guidelines on what would make a good talk?
 
 
  http://speaking.io/plan/an-idea/ ?
 
  [All of speaking.io is useful.]
 
  HTH-
  Luis
  ___
  Wikitech-l mailing list
  Wikitech-l@lists.wikimedia.org
  https://lists.wikimedia.org/mailman/listinfo/wikitech-l


 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Getting phpunit working with Vagrant

2014-06-13 Thread Nikolas Everett
I _thought_ someone was working on getting it to just work.  For now,
though, if you start with a clean machine you can run the commands here:
https://www.mediawiki.org/wiki/Manual:PHP_unit_testing/Installing_PHPUnit#Using_PEAR
to get it installed.  Make sure to use the pear commands because they'll get
you phpunit 3.7.X.  phpunit 4.0 doesn't work with MediaWiki.  Anyway,
after following the pear commands inside your vagrant VM phpunit should
work.

Nik


On Fri, Jun 13, 2014 at 1:44 PM, Jon Robson jdlrob...@gmail.com wrote:

 Has anyone had success with this...?

 This is what happens when I try to run:

 master x ~/git/vagrant/mediawiki/tests/phpunit $ php phpunit.php

 Warning: require_once(/vagrant/LocalSettings.php): failed to open
 stream: No such file or directory in
 /Users/jrobson/git/vagrant/mediawiki/LocalSettings.php on line 130

 Fatal error: require_once(): Failed opening required
 '/vagrant/LocalSettings.php' (include_path='.:') in
 /Users/jrobson/git/vagrant/mediawiki/LocalSettings.php on line 130

 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] How to show the page AND section for CirrusSearch search results‏

2014-05-09 Thread Nikolas Everett
On Fri, May 9, 2014 at 8:32 AM, J jollylittlebottom 
jollylittlebot...@hotmail.com wrote:

 If I search on http://www.mediawiki.org for Search Weighting I get as
 result a line:
 Search  (section Search Weighting Ideas) with links to the page and to
 the section.

 This section contains the word GeoLoc.
 But if I search for GeoLoc I get just the page link.

 I want to show this section link as a search result too.

 Is there an easy way or is it a planned feature?

 What do I have to change in the CirrusSearch extension?


Right, the (section *Search Weighting* Ideas) bit is populated by
matching query terms to the section titles rather than doing something in
combination with the text snippet below.  This can lead to it sometimes
showing a snippet from one section and a section heading from another.  Let
me have a think about how to make that better.

Nik
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] GeoData now uses Elasticsearch

2014-04-10 Thread Nikolas Everett
On Thu, Apr 10, 2014 at 3:43 AM, Faidon Liambotis fai...@wikimedia.orgwrote:

 On Thu, Apr 10, 2014 at 05:04:38AM +0400, Max Semenik wrote:
  And finally, appreciation: this was made possible only thanks to awesome
  help from our search team, Nik Everett and Chad Horohoe. You kick ass
  guys!

 Extending appreciation: thanks Max, good work! This is great :)


Yeah, you did most of the work Max!

As to which wiki to go with next: look at notcirrus.dblist, the list of all
the wikis that don't have any access to Cirrus.  All the others are indexing pages
and would just require enabling the Cirrus integration with GeoData and
running a reindex to work.

Oh, and don't pick enwiki.  We're running it at -1 redundancy right now due
to space concerns.  So only two total copies instead of 3.  We're working
on fixing this but it'll take some time.

Nik
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Next steps down the TitleValue road

2014-04-03 Thread Nikolas Everett
Now that TitleValue has been merged - what's next?  I'll admit I'm an odd
choice to be sending out this email [1], but someone's got to do it.  So,
I'm thinking, maybe:

1.  Start on the TODO in Linker.php [2], turning it into a deprecated
compatibility interface calling HtmlPageLinkRenderer.
2.  Start writing code in the same fashion for an upcoming project.  I
believe the upcoming revision storage work might lend itself well to this.

Also, I think we should think about how we want interdependent components
to come together.  Right now everything must know how to make all of its
dependencies.  For example, LinksSearchPage must know how to build
MediaWikiTitleCodec.  That isn't a hardship now, but might become one when
we have 30 things like LinksSearchPage and we want to add another
dependency to MediaWikiTitleCodec.

I don't claim to know a whole lot about the state of the art for this
problem in PHP but I'm used to solving it with an inversion of control
container.  Each component declares its dependencies in some way and the
container makes sure that each component gets the dependencies it needs.
Do we need something like this now or will we need something like this in
the future?
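
To make the idea concrete, a tiny sketch of what I mean (the class and
service names are invented for illustration - I'm not proposing a specific
library here):

// Invented names, illustration only: components fetch their dependencies
// from a container instead of constructing them directly.
$container = new ServiceContainer();
$container->define( 'TitleCodec', function ( $c ) {
	// Only this wiring code knows how to build the codec...
	return new MediaWikiTitleCodec(
		$c->get( 'ContentLanguage' ),
		$c->get( 'GenderCache' )
	);
} );
// ...so LinksSearchPage just asks for 'TitleCodec' and never has to change
// when the codec grows another constructor argument.
$codec = $container->get( 'TitleCodec' );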

Nik/manybubbles

[1]: I was the most against TitleValue at the Architecture Summit but have
since softened my opinion.  Also, the vast majority of my work is in the
CirrusSearch extension and not core.
[2]: https://gerrit.wikimedia.org/r/#/c/106517/22/includes/Linker.php,cm
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [Wikitech-ambassadors] Roadmap and deployment highlights - week of March 31st

2014-03-28 Thread Nikolas Everett
On Fri, Mar 28, 2014 at 4:57 PM, Greg Grossmeier g...@wikimedia.org wrote:

 == Wednesday ==

 * Cirrus Search will be graduated from Beta Feature to enabled for all
   users on all non-wikipedia wikis (eg Commons, etc)
 ** https://www.mediawiki.org/wiki/Search


I'd prefer to do commons on its own some other time because it is much
higher traffic.  Also, we're not even a BetaFeature in a few non-wikipedias
and it wouldn't be fair (or even work) to just switch them on too.  So all
non-wikipedias that aren't Commons, Meta, or Incubator.

As always, I'm open to hearing about any show stopper issues you find while trying Cirrus
as a BetaFeature - we won't deploy it to a wiki that it'll make worse.

Sorry for the late notice,

Nik Everett/manybubbles
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Bach redirecting to Bạch (notice the dot under the a)

2014-03-17 Thread Nikolas Everett
Are either one of you opted into the New Search BetaFeature?

Nik


On Mon, Mar 17, 2014 at 9:01 AM, John phoenixoverr...@gmail.com wrote:

 It works for me.

 On Monday, March 17, 2014, David Cuenca dacu...@gmail.com wrote:

  Hi,
 
  When I type bach on the top right en.wp search box, I only have the
  option to select Bach from the list. This option however takes me to
  Bạch (with a dot under the a).
  https://en.wikipedia.org/wiki/B%E1%BA%A1ch
 
  However when I type the url I'm taken to the right article
  https://en.wikipedia.org/wiki/Bach
 
  Is this a problem with the search box? I wanted to report the bug, but I
  didn't know to which component to report it.
 
  Cheers,
  Micru
  ___
  Wikitech-l mailing list
  Wikitech-l@lists.wikimedia.org javascript:;
  https://lists.wikimedia.org/mailman/listinfo/wikitech-l
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Bach redirecting to Bạch (notice the dot under the a)

2014-03-17 Thread Nikolas Everett
Filed: https://bugzilla.wikimedia.org/show_bug.cgi?id=62727

I figured out the problem and kicked off a process to fix it.  You should
be able to opt back in after a few hours and the problem will have gone away.

Nik


On Mon, Mar 17, 2014 at 9:19 AM, David Cuenca dacu...@gmail.com wrote:

 I was! When opting out, it works fine.

 Thanks for the hint!

 Micru


 On Mon, Mar 17, 2014 at 2:16 PM, Nikolas Everett never...@wikimedia.org
 wrote:

  Are either one of you opted into the New Search BetaFeature?
 
  Nik
 
 
  On Mon, Mar 17, 2014 at 9:01 AM, John phoenixoverr...@gmail.com wrote:
 
   It works for me.
  
   On Monday, March 17, 2014, David Cuenca dacu...@gmail.com wrote:
  
Hi,
   
When I type bach on the top right en.wp search box, I only have the
option to select Bach from the list. This option however takes me
 to
Bạch (with a dot under the a).
https://en.wikipedia.org/wiki/B%E1%BA%A1ch
   
However when I type the url I'm taken to the right article
https://en.wikipedia.org/wiki/Bach
   
Is this a problem with the search box? I wanted to report the bug,
 but
  I
didn't know to which component to report it.
   
Cheers,
Micru
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org javascript:;
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
   ___
   Wikitech-l mailing list
   Wikitech-l@lists.wikimedia.org
   https://lists.wikimedia.org/mailman/listinfo/wikitech-l
  
  ___
  Wikitech-l mailing list
  Wikitech-l@lists.wikimedia.org
  https://lists.wikimedia.org/mailman/listinfo/wikitech-l
 



 --
 Etiamsi omnes, ego non
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Bach redirecting to Bạch (notice the dot under the a)

2014-03-17 Thread Nikolas Everett
Looks like I lied: I'll have to make a software change to fix this after
all.  It'll be more than a few hours, but I'll reply on the bug when it
is really fixed.


On Mon, Mar 17, 2014 at 9:44 AM, Nikolas Everett never...@wikimedia.orgwrote:

 Filed: https://bugzilla.wikimedia.org/show_bug.cgi?id=62727

 I figured out the problem and kicked off a process to fix it.  You should
 be able to opt back in a few hours and the problem will have gone away.

 Nik


 On Mon, Mar 17, 2014 at 9:19 AM, David Cuenca dacu...@gmail.com wrote:

 I was! When opting out, it works fine.

 Thanks for the hint!

 Micru


 On Mon, Mar 17, 2014 at 2:16 PM, Nikolas Everett never...@wikimedia.org
 wrote:

  Are either one of you opted into the New Search BetaFeature?
 
  Nik
 
 
  On Mon, Mar 17, 2014 at 9:01 AM, John phoenixoverr...@gmail.com
 wrote:
 
   It works for me.
  
   On Monday, March 17, 2014, David Cuenca dacu...@gmail.com wrote:
  
Hi,
   
When I type bach on the top right en.wp search box, I only have
 the
option to select Bach from the list. This option however takes me
 to
Bạch (with a dot under the a).
https://en.wikipedia.org/wiki/B%E1%BA%A1ch
   
However when I type the url I'm taken to the right article
https://en.wikipedia.org/wiki/Bach
   
Is this a problem with the search box? I wanted to report the bug,
 but
  I
didn't know to which component to report it.
   
Cheers,
Micru
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org javascript:;
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
   ___
   Wikitech-l mailing list
   Wikitech-l@lists.wikimedia.org
   https://lists.wikimedia.org/mailman/listinfo/wikitech-l
  
  ___
  Wikitech-l mailing list
  Wikitech-l@lists.wikimedia.org
  https://lists.wikimedia.org/mailman/listinfo/wikitech-l
 



 --
 Etiamsi omnes, ego non
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l



___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] CirrusSearch outage Feb 28 ~19:30 UTC

2014-02-28 Thread Nikolas Everett
CirrusSearch flaked out Feb 28 around 19:30 UTC and I brought it back from
the dead around 21:25 UTC.  During the time it was flaking out searches
that used it (mediawiki.org, wikidata.org, ca.wikipedia.org, and everything
in Italian) took a long, long time or failed immediately with a message
about this being a temporary problem we're working on fixing.

Events:
We added four new Elasticsearch servers on Rack D (yay) around 18:45 UTC
The Elasticsearch cluster started serving simple requests very slowly
around 19:30 UTC
I was alerted to a search issue on IRC at 20:45 UTC
I fixed the offending Elasticsearch servers around 21:25 UTC
Query times recovered shortly after that

Explanation:
We very carefully installed the same version of Elasticsearch and Java as
we use on the other machines, then used puppet to configure the
Elasticsearch machines to join the cluster.  It looks like they only picked
up half the configuration provided by puppet
(/etc/elasticsearch/elasticsearch.yml but not
/etc/default/elasticsearch).  Unfortunately for us that is the bad half to
miss because /etc/default/elasticsearch contains the JVM heap settings.

The servers came online with the default amount of heap, which worked fine
until Elasticsearch migrated a sufficiently large index to them.  At that
point the heap filled up and Java did what it does in that case: it spun
forever trying to free garbage.  It pretty much pegged one CPU and rendered
the entire application unresponsive.  Unfortunately (again) pegging one CPU
isn't that weird for Elasticsearch.  It'll do that when it is merging.  The
application normally stays responsive because the rest of the JVM keeps
moving along.  That doesn't happen when the heap is full.
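
(For the curious: the heap size lives in that defaults file as an
environment variable.  From memory it is something like the line below -
the name and value here are illustrative, so check the file shipped with
the package rather than copying this.)

ES_HEAP_SIZE=16g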

Knocking out one of those machines caused tons of searches to block,
presumably waiting for that machine to respond.  I'll have to dig around
to see if I can find the timeout but we're obviously using the default,
which in our case is way way way too long.  We then filled the pool queue
and started rejecting requests to search altogether.

When I found the problem all I had to do was kill -9 the Elasticsearch
servers and restart them.  -9 is required because JVMs don't catch the
regular signal if they are too busy garbage collecting.

What we're doing to prevent it from happening again:
* We're going to monitor the slow query log and have Icinga start
complaining if it grows very quickly.  We normally get a couple of slow
queries per day so this shouldn't be too noisy.  We're also going to have
to monitor error counts, especially once we add more timeouts.  (
https://bugzilla.wikimedia.org/show_bug.cgi?id=62077)
* We're going to sprinkle more timeouts all over the place.  Certainly in
Cirrus while waiting on Elasticsearch, and we'll figure out how to tell
Elasticsearch what the shard timeouts should be as well. (
https://bugzilla.wikimedia.org/show_bug.cgi?id=62079)
* We're going to figure out why we only got half the settings.  This is
complicated because we can't let puppet restart Elasticsearch because
Elasticsearch restarts must be done one node at a time.

Nik
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Thoughts on hiding text from the internal search

2014-02-19 Thread Nikolas Everett
I can make a better case for hiding things from internal search than I did
on the bug.  I'll send it here and copy it to the bug:

The biggest case I can think of for excluding text from search is the
license information on Commons.  Please take that as an example.  Maybe it
is the only example, but I think it is pretty important.
1.  The license information doesn't add a whole lot to the result.  Try
searching Commons with Cirrus for "distribute", "transmit", or "following"
and you'll very quickly start to see the text of the CC license.  And the
searches find 14 million results.  Heaven forbid you want to find
"distributed transmits" or something.  You'll almost exclusively get the
license highlighted and you'll still find 14 million results.  This isn't
_horrible_ because the top results all have "distribute" or "transmit" in
the title but it isn't great.
2.  Knock-on effect from #1: because relevance is calculated based on the
inverse of the number of documents that contain the word, every
term in the CC license is worth less than words not in the license.  I
can't point to any example of why that is bad but I feel it in my bones.
Feel free to ignore this.  I'm probably paranoid.
3.  Entirely self-serving: given #1, the contents of the license take up an
awful lot of space for very little benefit.  If I had more space I could
make Cirrus a beta on more wikis.  It is kind of a lame reason and I'm
attacking the space issue from other angles so maybe it'll be moot long
before we get this deployed and convince the community that it is worth
doing.
4.  Really really self-serving:  if .nosearch is the right solution and is
useful then it is super duper easy to implement.  Like one line of code, a
few tests, and bam.  It's already done, just waiting to be rebased and
merged.  It was so easy it would have taken longer to estimate the effort
than to propose an implementation.

I really wouldn't be surprised if someone could come up with a great
reason why #1 is silly and we just shouldn't do it.

The big problem with the nosearch class implementation is that it'd be
pretty simple to abuse and hard to catch the abuse because the text is
still on the page.  One of the nice things about the solution is you could
use a web browser's debugger to highlight all the text excluded from search
by writing a simple CSS class.

I think that is all I have on the subject,

Nik/manybubbles


On Wed, Feb 19, 2014 at 1:29 AM, Chad innocentkil...@gmail.com wrote:

 On Tue, Feb 18, 2014 at 9:50 PM, MZMcBride z...@mzmcbride.com wrote:

  Chad wrote:
  I'm curious how people would go about hiding text from the internal
  MediaWiki search engine (not external robots). Right now I'm thinking of
  doing a rather naïve .nosearch class that would be stripped before
  indexing. I can see potentials for abuse though.
  
  Does anyone have any bright ideas?
 
  It's difficult to offer advice without knowing why you're trying to do
  what it is you're trying to do. You've described a potential solution,
 but
  I'm not sure what problem you're trying to solve. Are there some example
  use-cases or perhaps there's a relevant bug in Bugzilla?
 
 
 Ah, here's the bug: https://bugzilla.wikimedia.org/show_bug.cgi?id=60484

 -Chad
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Thoughts on hiding text from the internal search

2014-02-19 Thread Nikolas Everett
On Wed, Feb 19, 2014 at 12:17 PM, Helder . helder.w...@gmail.com wrote:

 On Wed, Feb 19, 2014 at 12:14 PM, Nikolas Everett
 never...@wikimedia.org wrote:
  The big problem with the nosearch class implementation is that it'd be
  pretty simple to abuse and hard to catch the abuse because the text is
  still on the page.  One of the nice things about the solution is you
 could
  use a web browser's debugger to highlight all the text excluded from
 search
  by writing a simple CSS class.

 What if the abuse is inside of a hidden element?
 http://jsfiddle.net/WQ6K2/

 Helder


Yeah, nowhere near perfect.
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] TitleValue

2014-02-04 Thread Nikolas Everett
On Fri, Jan 24, 2014 at 8:55 PM, Daniel Kinzler dan...@brightbyte.dewrote:

 Am 24.01.2014 14:44, schrieb Brad Jorsch (Anomie):
  It looks to me like the existing patch *already is* getting too far into
  the Javaification, with it's proliferation of classes with single methods
  that need to be created or passed around.

 There is definitely room for discussion there. Should we have separate
 interfaces for parsing and formatting, or should both be covered by the
 same
 interface? Should we have a Linker interface for generating all kinds of
 links,
 or separate interfaces (and/or implementations) for different kinds of
 links?

 I don't have strong feelings about those, I'm happy to discuss the
 different
 options. I'm not sure about the right place for that discussion though -
 the
 patch? The RFC? This list?


I vote mailing list.  Maybe it'll be livelier.

Personally, as I said in previous mails, I like the idea of pulling things
out of the Title class.

I'm going to pose questions and answer them in the order that they come to
me.

* Should linking, parsing, and formatting live outside the Title class?
Yes for a bunch of reasons.  At a minimum the Title class is just too large
to hold in your head properly.  Linking, parsing, and formatting aren't
really the worst offenders but they are reasonably easy to start with.  I
would, though, like to keep some canonical formatting in the new
TitleValue.  Just a useful __toString that doesn't do anything other than
print the contents in a form easy to read.

* Should linking, parsing, and formatting all live together in one class
outside the Title class?
I've seen parsing and formatting live together before just fine as they
really are the inverse of one another.  If they are both massively complex
then they probably ought not to live together.  Linking feels like a thing
that should consume the thing that does formatting.  I think putting them
together will start to mix metaphors too much.

* Should we have a formatter (or linker or parser) for wikitext and another
for html and others as we find new output formats?
I'm inclined against this both because it requires tons of tiny classes
that can make tracing through the code more difficult and because it
implies that each implementation is substitutable for the other at any
point when that isn't the case.  Replacing the html formatter used in the
linker with the wikitext formatter would produce unusable output.


I really think that the patch should start modifying the Title object to
use the functionality that it is removing from it.  I'm not sure we're
ready to start deprecating methods in this patch though.


In parallel with getting the consensus to merge a start on TitleValue, we
need to be talking about what kind of inversion of control we're willing to
have.  You can't step too far down the services path without some kind of
strategy to prevent one service from having to know what its dependencies'
dependencies are.

Nik
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Lsearch and MWSearch: how to turn on morphology for Russian

2014-01-30 Thread Nikolas Everett
I hate to say this after all you went through setting up Lucene Search but
it is end of life and not receiving any real support.  We're in the process
of replacing it with the combination of
CirrusSearch (https://www.mediawiki.org/wiki/Extension:CirrusSearch) and
Elasticsearch (http://www.elasticsearch.org/), which work pretty much the
same way the MWSearch/Lucene Search combination does.  CirrusSearch has to
be smarter than MWSearch because Elasticsearch doesn't have any MediaWiki
knowledge, but because it links into MediaWiki it can do things like expand
templates.  I like it but I'm biased.

That aside, it looks like Lucene Search is supposed to read
InitializeSettings, which is kind of a WMF-specific thing.  You might be
able to trick it into doing so by putting a file called
InitializeSettings.php in the conf directory with the contents:

'wgLanguageCode' => array(
    'your $wgDBname' => 'ru',
),


CirrusSearch, if you care to try it, reads the language code from
wgLanguageCode.
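
For example, in LocalSettings.php (the language code here is just an
illustration):

$wgLanguageCode = 'ru';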

Nik



On Thu, Jan 30, 2014 at 3:39 PM, Yury Katkov katkov.ju...@gmail.com wrote:

 Hi guys!

 I've installed MWSearch and Lucene Search extensions but I can see that the
 search engine doesn't understand the morphology of Russian (doesn't
 recognize word forms). How can I turn the morphological analyzer on? How
 it's done in Russian Wikipedia?

 Cheers,
 -
 Yury Katkov, WikiVote
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Smarter namespace defaults for searches

2014-01-06 Thread Nikolas Everett
I like the idea.  I wonder a few things:
1.  Is this something that only makes sense to do for the help namespace?
2.  Would it be good enough to catch "help me" kinds of queries and provide
a "did you mean"-like suggestion for a new search that'd actually search
help?

Nik


On Mon, Jan 6, 2014 at 4:42 PM, Tobias church.of.emacs...@googlemail.comwrote:

 I've been a Wikipedia trainer at schools for quite some time now.
 Probably the single most common mistake people in my workshops make when
 accessing a Wiki's meta pages (i.e. Wikipedia:Help) is by omitting the
 colon indicating the namespace.

 The default search namespace is just NS-0, i.e. the main namespace. This
 means if you enter Wikisource Help on en.wikisource.org, you get
 nothing useful:

 http://en.wikisource.org/w/index.php?search=Wikisource+Helpbutton=title=Special%3ASearch

 English Wikipedia has implemented a workaround by creating redirects
 from the main namespace to the project namespace: an ugly fix, since it
 mixes up the distinction between namespaces.

 Instead, we should make MediaWiki a bit smarter with regard to the
 namespace selection: When you search for Help Editing, the Help
 namespace should be included.

 This could be done in the most simplest form by checking whether a
 namespace string is a prefix of the search string (perhaps excluding
 exotic namespaces such as MediaWiki) or even if the namespace name is
 contained in the search string.

 What do you think?

 Best regards,
 Tobias




 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] RFC cluster summary: HTML templating

2013-12-27 Thread Nikolas Everett
On Fri, Dec 27, 2013 at 1:30 PM, Chad innocentkil...@gmail.com wrote:

 On Fri, Dec 27, 2013 at 12:34 PM, Jon Robson jdlrob...@gmail.com wrote:

  
   I want a templating system that can be used both in PHP and JavaScript
  and
   fits in our way of doing i18n. And a bunny.
  
 
  I'm not sure if this was meant as sarcastic but I do want this too and
  think it is a reasonable achievable goal - bunny optional!
 
 
 Bunnies should be listed in the requirements ;-)



I believe unicorns were in the requirements for search.

In all seriousness, PHP, JavaScript, and fitting our i18n sound like minimum
requirements.  I'd also throw in HTML escaping by default.

Nik
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Deploymight highlights - week of December 16th

2013-12-18 Thread Nikolas Everett
I wonder if this is bug 38273 revived, like bug 58042 was.  Cirrus hasn't changed
this code so I'm reasonably confident it isn't us this time.  Though it is
still possible given that we're on mediawikiwiki and itwiki.



On Wed, Dec 18, 2013 at 4:23 AM, Federico Leva (Nemo) nemow...@gmail.comwrote:

 Did the PHP upgrade affect tidy in some way? Some pages are severely
 broken e.g. by unbalanced div or table tags (both Vector and monobook).
 Only two reports on #wikimedia-tech in two days, so maybe no real change,
 but I used not to hear any. :)
 https://www.mediawiki.org/w/index.php?title=Extension%
 3ABugzilla_Reportsdiff=844734oldid=773425
 https://it.wikipedia.org/w/index.php?title=Utente%
 3AVale14orladiff=63098711oldid=54419590

 Nemo


 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Hook for Adding to Empty Search Results

2013-12-13 Thread Nikolas Everett
SpecialSearchResultsPrepend lets you add HTML directly to the search page
but doesn't let you add your own results.  The HTML actually gets injected
above the search form so it'd take some CSS trickery to move it.  Example:
"This wiki is using a new search engine. (Learn more:
https://www.mediawiki.org/wiki/Special:MyLanguage/Help:CirrusSearch)"
on https://www.mediawiki.org/wiki/Special:Search
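
If that's enough, wiring it up looks roughly like this (the MyDeviceSearch
names are invented and the hook signature is from memory, so double check
docs/hooks.txt before relying on it):

<?php
// In the extension's setup file: register the handler.
$wgHooks['SpecialSearchResultsPrepend'][] = 'MyDeviceSearchHooks::onResultsPrepend';

class MyDeviceSearchHooks {
    /**
     * @param SpecialSearch $specialSearch the search special page
     * @param OutputPage $out where to add the HTML
     * @param string $term the raw search term
     * @return bool true so other handlers still run
     */
    public static function onResultsPrepend( $specialSearch, $out, $term ) {
        // Query your own inventory/database for $term here, then render it.
        $out->addHTML( Html::element( 'p', array( 'class' => 'mydevicesearch' ),
            "Device results for \"$term\" would go here." ) );
        return true;
    }
}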

Beyond that I think you have three options:
1.  Extend SearchMySQL.
2.  Add a hook yourself and know that you are running a patched version of
core.  I'm happy to help get the patch upstream if you don't want to live
with that burden forever.
3.  Add the pages with importTextFile.

Nik


On Thu, Dec 12, 2013 at 9:15 PM, Paul Dugas p...@dugasenterprises.comwrote:

 I have an extension using the ArticleFromTitle hook to generate pages
 for components of a large system we operate.  There are approximately
 6000 components at the moment with static inventory and config data in
 a database and live status data in a number of other systems.  We are
 using MediaWiki as a historical maintenance knowledge-base for the
 staff.  With this extension, we can integrate all the data for each
 device in one place.  We can hit MyNS:DeviceName and get a page that
 describes a device and that page can link to other pages in the main
 namespace that techs create with vendor details, model info, manuals,
 etc.  We can even keep a talk page for each device.  Very handy.

 Trouble now is I want to be able to find devices using the search
 feature.  SpecialSearchResults looked promising but that only gets
 called when there is at least one match in normal pages.  So, I
 looked at SpecialSearchNoResults but that doesn't allow me to add to
 the empty results.  Doesn't anyone have a suggestion on how I could go
 about this?  I really want to avoid generating the text of pages
 externally periodically and loading them into the wiki using the
 importTextFile maintenance script.

 The only other thought I had was to extend the SearchMySQL class and
 change $wgSearchType but I'm hoping to avoid that.

 Any ideas?

 --Paul

 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Hook for Adding to Empty Search Results

2013-12-13 Thread Nikolas Everett
Glad to hear it!  I hadn't seen SpecialSearchResultsAppend before.  Useful.


On Fri, Dec 13, 2013 at 9:02 AM, Paul Dugas p...@dugasenterprises.comwrote:

 Thanks Nikolas.  I found SpecialSearchResultsPrepend and
 SpecialSearchResultsAppend
 looking through the code though I didn't see them in the documentation.  I
 implemented the later to add a section below the standard search results
 that lists results from my system.  Seems to be working for now and
 requires no patching of the core code.

 P


 On Fri, Dec 13, 2013 at 8:55 AM, Nikolas Everett never...@wikimedia.org
 wrote:

  SpecialSearchResultsPrepend lets you add html directly to the search page
  but doesn't let you add your own results.  The html actually gets
 injected
  above the search for so it'd take some css trickery to move it.  Example:
  This wiki is using a new search engine. (Learn
  morehttps://www.mediawiki.org/wiki/Special:MyLanguage/Help:CirrusSearch
  )
  on https://www.mediawiki.org/wiki/Special:Search
 
  Beyond that I think you have three options:
  1.  Extend SearchMySQL.
  2.  Add a hook yourself and know that you are running a patched version
 of
  core.  I'm happy to help get the patch upstream if you don't want to live
  with that burden forever.
  3.  Add that pages with importTextFile.
 
  Nik
 
 
  On Thu, Dec 12, 2013 at 9:15 PM, Paul Dugas p...@dugasenterprises.com
  wrote:
 
   I have an extension using the ArticleFromTitle hook to generate pages
   for components of a large system we operate.  There are approximately
   6000 components at the moment with static inventory and config data in
   a database and live status data in a number of other systems.  We are
   using MediaWiki as a historical maintenance knowledge-base for the
   staff.  With this extension, we can integrate all the data for each
   device in one place.  We can hit MyNS:DeviceName and get a page that
   describes a device and that page can link to other pages in the main
   namespace that techs create with vendor details, model info, manuals,
   etc.  We can even keep a talk page for each device.  Very handy.
  
   Trouble now is I want to be able to find devices using the search
   feature.  SpecialSearchResults looked promising but that only gets
   called when there is at least one match in normal pages.  So, I
   looked at SpecialSearchNoResults but that doesn't allow me to add to
   the empty results.  Doesn't anyone have a suggestion on how I could go
   about this?  I really want to avoid generating the text of pages
   externally periodically and loading them into the wiki using the
   importTextFile maintenance script.
  
   The only other thought I had was to extend the SearchMySQL class and
   change $wgSearchType but I'm hoping to avoid that.
  
   Any ideas?
  
   --Paul
  
   ___
   Wikitech-l mailing list
   Wikitech-l@lists.wikimedia.org
   https://lists.wikimedia.org/mailman/listinfo/wikitech-l
  ___
  Wikitech-l mailing list
  Wikitech-l@lists.wikimedia.org
  https://lists.wikimedia.org/mailman/listinfo/wikitech-l




 --

 *Paul Dugas* • *Dugas Enterprises, LLC* • *Computer Engineer*

 p...@dugasenterprises.com p...@dugasenterprises.com • +1.404.932.1355

 522 Black Canyon Park, Canton GA 30114 USA
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] OAuth currently broken on wikis with CirrusSearch

2013-12-12 Thread Nikolas Everett
Note that the wikis that say they were deployed on December 11th but do not
have a strike through them have Cirrus running, but their indexes are still
being built.  I believe OAuth will be broken on those wikis as well.

Fixing this requires two patches, both of which are in review pending
approval, another round of testing on beta, and eventual deployment.  We should
have them out sometime in the next few hours.

Nik


On Thu, Dec 12, 2013 at 11:48 AM, Dan Garry dga...@wikimedia.org wrote:

 For reference, the list of wikis which Cirrus is deployed, and therefore
 where OAuth is broken, is available here:
 https://www.mediawiki.org/wiki/Search#Wikis

 Dan


 On 12 December 2013 16:46, Dan Garry dga...@wikimedia.org wrote:

  Dear all,
 
  OAuth is currently broken on any wiki that has CirrusSearch deployed to
 it
  in either primary or secondary mode.
 
  We're working on getting this issue fixed as soon as possible. I'll post
  an update here when we have a timescale for the fix.
 
  Thanks,
  Dan
 
  --
  Dan Garry
  Associate Product Manager for Platform
  Wikimedia Foundation
 



 --
 Dan Garry
 Associate Product Manager for Platform
 Wikimedia Foundation
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] OAuth currently broken on wikis with CirrusSearch

2013-12-12 Thread Nikolas Everett
On Thu, Dec 12, 2013 at 11:53 AM, Nikolas Everett never...@wikimedia.orgwrote:

 Note that the wikis that say they were deployed on December 11th but do
 not have a strike through them have Cirrus running, but their indexes are
 still being built.  I believe OAuth will be broken on those wikis as well.

 This requires two fixes to actually fix, both of which are in review state
 pending approval, another test on beta, and eventual deployment.  We should
 have them out sometime in the next few hours.


I've just verified the fix in production.  Please let me know if any of you
are still seeing the error.

Thanks,

Nik
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] workflow to add multiple patches to gerrit.wikimedia.org:29418/operations/puppet.git

2013-11-20 Thread Nikolas Everett
Normally:

* clone a repo
* setup git  hooks

# patch 1:

* git checkout -b some_branch_name
* apply my changes
* git commit -a
* git review

# patch 2:

* git checkout production (or master on non-puppet repositories)
* git pull
* git checkout -b some_other_branch_name
* apply my changes
* git commit -a
* git review



Nik




On Wed, Nov 20, 2013 at 8:13 AM, Petr Bena benap...@gmail.com wrote:

 Currently I do:

 * clone a repo
 * setup git  hooks

 # patch 1:

 * apply my changes
 * commit
 * execute git-review

 # patch 2:

 * apply my changes
 * commit

 FAIL - the new commit it depending on previous commit - I can't push

 What am I supposed to do in order to push multiple separate patches?

 GIT-IDIOT way please, no long explanations, just commands and examples.
 Thanks

 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] workflow to add multiple patches to gerrit.wikimedia.org:29418/operations/puppet.git

2013-11-20 Thread Nikolas Everett
On Wed, Nov 20, 2013 at 9:01 AM, Petr Bena benap...@gmail.com wrote:

 when I did a new branch before git-review it now show this as topic
 in gerrit: https://gerrit.wikimedia.org/r/#/c/96484/

 will it merge this to production branch?


Yes.
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Tip for Sublime Text editors: DocBlockr plugin and conf for JSDuck

2013-11-19 Thread Nikolas Everett
Package Control is your friend.  How else do you install a linter or syntax
highlighting for a new language without touching a mouse?


On Tue, Nov 19, 2013 at 2:42 PM, Tomasz Finc tf...@wikimedia.org wrote:

 On Tue, Nov 19, 2013 at 3:35 AM, Krinkle krinklem...@gmail.com wrote:
  DocBlockr

 Nice. I hadn't know about  Package Control either.

 thanks

 --tomasz

 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Exceptions, return false/null, and other error handling possibilities.

2013-10-08 Thread Nikolas Everett
On Tue, Oct 8, 2013 at 12:15 AM, Tim Starling tstarl...@wikimedia.org wrote:
 On 08/10/13 14:40, Erik Bernhardson wrote:
 A reviewer should be able to
 know if the error conditions are properly handled by looking at the new
 code, not by looking up all the function calls to see what they can
 possibly return.

 This is why the recommended pattern for Status objects is to return a
 Status object unconditionally

Can we add an example of that usage to the Status object with a note
not to follow the "return this in case of error" pattern that you
might see elsewhere in the code?  It might even be worth a bit of
refactoring to get rid of the old pattern or people will still keep
finding it and copying it.
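
To make it concrete, here is roughly the shape I mean (the function and
message keys are invented for illustration):

<?php
// Always hand back a Status, never false/null: callers get one shape to check.
function doRename( $oldName, $newName ) {
    if ( $oldName === $newName ) {
        // Failure case: still a Status object.
        return Status::newFatal( 'myext-rename-identical' );
    }
    // ... actually perform the rename here ...
    $status = Status::newGood( $newName );
    if ( strlen( $newName ) > 255 ) {
        // Success, but with a warning the caller may want to surface.
        $status->warning( 'myext-rename-long-name' );
    }
    return $status;
}

$status = doRename( 'Foo', 'Foo' );
if ( !$status->isOK() ) {
    // Every caller handles the same object, whatever went wrong.
    wfDebugLog( 'myext', $status->getWikiText() );
}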

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Exceptions, return false/null, and other error handling possibilities.

2013-10-07 Thread Nikolas Everett
On Mon, Oct 7, 2013 at 3:12 PM, Jeroen De Dauw jeroended...@gmail.com wrote:
 Hey,

 We use lots of libraries that happen to use composer. We just don't
 use composer to deploy them.


 Oh? Lots? Is there a list somewhere? Are most of those libraries forked?
 Are a good portion of them semi-assimilated into core? I hope the answer to
 the later two is no.


I believe the procedure is to set up a clone of them on gerrit,
include them as a submodule, and then do *something* to make the
classes autoload.  Updating from upstream should be a matter of pulling
the upstream update locally, pushing to gerrit, updating the submodule
pointer, and making sure the autoloading still makes sense.  In some
respects it is a very convenient way to do things.  In others, not so
much.

There isn't a list, they are scattered among the mediawiki extensions in gerrit.

I'm not defending it, but I can see why we do it.

Nik

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Exceptions, return false/null, and other error handling possibilities.

2013-10-07 Thread Nikolas Everett
On Mon, Oct 7, 2013 at 3:45 PM, Brion Vibber bvib...@wikimedia.org wrote:
 I've heard the vague claim that exceptions are confusing for years, but for
 the life of me I've never seen exception-handling code that looked more
 complex or confusing than code riddled with checks for magic return values.

When I'm writing Haskell nothing is more intuitive than the error
monad because that is how the compiler works.
When I'm writing Java nothing is more intuitive than exceptions
because that is how the standard library works.
When I'm writing Scala nothing is more intuitive than exceptions for
unrecoverable errors and Option/Either for recoverable ones because
that is how the standard library works.
When I'm writing C I deal with magic return values, modified
arguments, and errno because that is what libc burdens me with.
When I'm writing PHP I deal with magic return values, modifiable
arguments, and exceptions because that is what is in the standard
library.  Oh, yeah, and I deal with Status too, because we use it
sometimes.

I don't see the point in adding another error handling mechanism
beyond the ones you are stuck with in the standard library.  It is
just too much work to wrap the standard library over and over and over
again.

Unless you are writing Javascript, then promises are too compelling.

Nik

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] How to get search on mediawiki.org not to use synonyms?

2013-09-24 Thread Nikolas Everett
On Tue, Sep 24, 2013 at 12:00 PM, David Gerard dger...@gmail.com wrote:
 I just went looking for the word referer. The response started with
 lots of instances of the word reference. Put it in quotes, no
 difference. Eventually resorted to Google.

 Is MW.org using the exciting new search engine? Is there any way to
 search without using synonyms?

It is using the exciting new search engine but that search engine has
a bug: https://bugzilla.wikimedia.org/show_bug.cgi?id=54020

For now, go back to the old search system by adding
srbackend=LuceneSearch to the search url like so:
https://www.mediawiki.org/w/index.php?title=Special%3ASearch&profile=default&search=%22referer%22&fulltext=Search&srbackend=LuceneSearch

Feel free to add yourself to the cc list to watch our progress squashing it.

Nik

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [Wikitech-ambassadors] Fwd: Deployment highlights for the week of Sept 23rd

2013-09-23 Thread Nikolas Everett
On Mon, Sep 23, 2013 at 12:40 PM, Chris McMahon cmcma...@wikimedia.org wrote:
 So nice to see what Nik has done here.  Information on running these tests
 is in the README:
 http://git.wikimedia.org/blob/mediawiki%2Fextensions%2FCirrusSearch.git/a7d5386c659e0afff1bae24967b333b06f639512/tests%2Fbrowser%2FREADME

I'd like to get those tests running in MediaWiki-Vagrant but I just
can't find the time at the moment.  In other news,
https://www.mediawiki.org/wiki/Search/CirrusSearchFeatures now has a
reasonably complete list of CirrusSearch's features.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [Wikitech-ambassadors] Fwd: Deployment highlights for the week of Sept 23rd

2013-09-21 Thread Nikolas Everett
On Sat, Sep 21, 2013 at 10:26 AM, Chad innocentkil...@gmail.com wrote:
 On Fri, Sep 20, 2013 at 11:47 PM, billinghurst billinghu...@gmail.comwrote:

 Excellent news! Would someone be able to provide or point to some
 configuration and examples that English Wikisource can utilise to allow
 some side-by-side searches, and some guidance that can be provided to the
 community on the new features and their use (if there us any).


 I think Nik's e-mail from when we deployed to mw.org is still the
 best info.

 http://lists.wikimedia.org/pipermail/wikitech-l/2013-August/071548.html


CirrusSearch is pretty much a work-alike for the current search so I
haven't done too much documenting.  I'll fill out
https://www.mediawiki.org/wiki/Search/CirrusSearchFeatures with a boiled-down
list of features.  The big one I think you care about is that
templates are evaluated during indexing.

If you can't wait you can read the regression tests here:
http://git.wikimedia.org/tree/mediawiki%2Fextensions%2FCirrusSearch.git/master/tests%2Fbrowser%2Ffeatures
.  They are written in Cucumber so they should be reasonably readable.
 Fair warning: I use the terms "page" and "article" pretty much
interchangeably.  Also, one of the tests is failing on my development
machine but I haven't commented it out with an associated bug like I
usually do because, well, I like looking at it failing I guess.  The
bug is here: https://bugzilla.wikimedia.org/show_bug.cgi?id=53426

Nik

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] RfC update: LESS stylesheet support in core

2013-09-19 Thread Nikolas Everett
On Thu, Sep 19, 2013 at 4:04 PM, Dan Andreescu dandree...@wikimedia.org wrote:
 - Has http://learnboost.github.io/stylus/ been considered? I've heard that
 it's a good compromise between sass and less (but I haven't played with it
 myself to see if it really lets you do more compass-like things).


 *Popularity* - does matter; one of the long comment threads on the RFC is
 from a potential contributor who is concerned that LESS makes it harder to
 contribute.  I mostly agree with Jon's and Steven's arguments that LESS is
 pretty easy to learn.  However, I have also heard about a year's worth of
 complaints about Limn being written in Coco instead of pure Javascript.  I
 personally think CSS - LESS is just as mentally taxing as Javascript -
 Coco, but I'm objectively in the minority based on the feedback I've
 received.  I'd be cautious here.  You can upcompile CSS into LESS, sure,
 but if a contributor has to understand a complex LESS codebase full of
 mixins and abstractions while debugging the generated CSS in the browser,
 they're right to point out that this requires effort.  And this is effort
 is only increased for more elegant languages like Stylus.


I'm for any compiled-to-CSS language because I feel they fill a big
gaping hole in CSS's ability to share code.  That is really compelling
to me.  I haven't been convinced the compiled-to-JS languages offer
quite as compelling a value proposition, so the analogy to Limn and
Coco is less relevant to me.  I admit I could be wrong about the value
proposition thing but that is how I feel.  I really don't want to
start a language war though.

I'm a Sass fan but I'll take whatever I can get.

I will point out that CSS is valid LESS which could assuage some fears.

Nik Everett

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] CirrusSearch on mediawiki.org

2013-09-07 Thread Nikolas Everett
On Sat, Sep 7, 2013 at 5:29 PM, MZMcBride z...@mzmcbride.com wrote:
 Federico Leva (Nemo) wrote:
Nice! As for next steps, what about using Wiktionary as next pioneering
project for the new CirrusSearch (first opt-in and then default)?
It exists in most languages (we really need to see how the new search
works in different languages), it's one of the most impacted projects by
the new features (e.g. expanded templates indexing) [...].

 This seems like a good idea to me. Chad / Nik: your thoughts?


English Wiktionary is certainly on my list of possible next victims.
I don't _think_ I want to flip the switch on all the languages at once
though.  On the other hand, if anyone in the community really really
really wants to try it we'd love to work with them.  I like enthusiasm.
And I want to try CirrusSearch against other languages but it wouldn't
do much good without someone active in that community willing to test
it.

Nik

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] BetaFeatures framework, and a minor call for technical input

2013-09-04 Thread Nikolas Everett
I worked on an accounting system with similar requirements and we had
an even more complicated system but one you might want to consider:
1.  When something happened we recorded the event and how much it changed
the value, along with a timestamp.  In our case we'd just have enable
and disable events.
2.  We ran a job that summarized those events into hourly changes.
3.  Every day we logged the actual value (at midnight or whatever).

This let us quickly make all kinds of crazy graphs with super deep
granularity over short periods of time and less granularity over long
periods of time.  Essentially it was an accountant's version of
RRDtool.  It didn't have problems with getting out of sync because we
never had more than one process update more than one field.

It is probably overkill but might serve as a dramatic foil to the simpler ideas.
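
If it helps, here is the gist of that setup in toy form (in-memory arrays
instead of real tables, and all the names are made up):

<?php
// Raw events are appended as they happen; an hourly job folds them into
// per-hour deltas; a daily job records the absolute value.

function recordEvent( array &$events, $timestamp, $delta ) {
    $events[] = array( 'ts' => $timestamp, 'delta' => $delta );
}

function rollupHourly( array $events ) {
    $hourly = array();
    foreach ( $events as $e ) {
        $hour = $e['ts'] - ( $e['ts'] % 3600 );
        if ( !isset( $hourly[$hour] ) ) {
            $hourly[$hour] = 0;
        }
        $hourly[$hour] += $e['delta'];
    }
    return $hourly; // hour start => net change during that hour
}

function dailySnapshots( array $hourly, $startingValue ) {
    ksort( $hourly );
    $value = $startingValue;
    $snapshots = array();
    foreach ( $hourly as $hour => $delta ) {
        $value += $delta;
        $day = $hour - ( $hour % 86400 );
        $snapshots[$day] = $value; // last value seen during that day
    }
    return $snapshots;
}

$events = array();
recordEvent( $events, strtotime( '2013-09-03 10:15' ), +1 ); // user enables
recordEvent( $events, strtotime( '2013-09-03 10:45' ), +1 ); // another enables
recordEvent( $events, strtotime( '2013-09-04 09:00' ), -1 ); // one disables
print_r( dailySnapshots( rollupHourly( $events ), 0 ) );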

Nik



On Tue, Sep 3, 2013 at 5:58 PM, Mark Holmquist mtrac...@member.fsf.org wrote:
 Timezone-appropriate greeting, wikitech!

 I've been working on a new extension, BetaFeatures[0]. A lot of you have
 heard about it through the grapevine, and for the rest of you, consider
 this an announcement for the developers. :)

 The basic idea of the extension is to enable features to be enabled
 experimentally on a wiki, on an opt-in basis, instead of just launching
 them immediately, sometimes hidden behind a checkbox that has no special
 meaning in the interface. It also has a lot of cool design work on top
 of it, courtesy of Jared and May of the WMF design team, so thanks very
 much to them. There are still a few[1] things[2] we have to build out,
 but overall the extension is looking pretty nice so far.

 I am of course always soliciting advice about the extension in general,
 but in particular, we have a request for a feature for the fields that
 has been giving me a bit of trouble. We want to put a count of users that
 have each preference enabled on the page, but we don't want to, say, crash
 the site with long SQL queries. Our theories thus far have been:

 * Count all rows (grouped) in user_properties that correspond to properties
   registered through the BetaFeatures hook. Potentially a lot of rows,
   but we have at least decided to use an IN query, as opposed to LIKE,
   which would have been an outright disaster. Obviously: Caching. Caching
   more would lead to more of the below issues, though.

 * Fire off a job, every once in a while, to update the counts in a table
   that the extension registers. Downsides: Less granular, sort of fakey
   (since one of the subfeatures will be incrementing the count, live,
   when a user enables a preference). Upside: Faster.

 * Update counts with simple increment/decrement queries. Upside: Blazingly
   faster. Potential downside: Might get out of sync. Maybe fire off jobs
   even less frequently, to ensure it's not always out of date in weird
   ways?

 So my question is, which of these are best, and are there even better
 ways out there? I love doing things right the first time, hence my asking.

 [0] https://www.mediawiki.org/wiki/Extension:BetaFeatures
 [1] https://mingle.corp.wikimedia.org/projects/multimedia/cards/2
 [2] https://mingle.corp.wikimedia.org/projects/multimedia/cards/21

 P.S. One of the first features that we'll launch with this framework is
 the MultimediaViewer extension which is also under[3] development[4]
 as we speak. Exciting times for the Multimedia team!

 [3] https://mingle.corp.wikimedia.org/projects/multimedia/cards/8
 [4] https://mingle.corp.wikimedia.org/projects/multimedia/cards/12

 --
 Mark Holmquist
 Software Engineer, Multimedia
 Wikimedia Foundation
 mtrac...@member.fsf.org
 https://wikimediafoundation.org/wiki/User:MHolmquist

 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] New search backend live on mediawiki.org

2013-08-28 Thread Nikolas Everett
Today we threw the big lever and turned on our new search backend at
mediawiki.org.  It isn't the default yet but it is just about ready for you
to try.  Here is what we think we've improved:
1.  Templates are now expanded during search so:
1a.  You can search for text included in templates
1b.  You can search for categories included in templates
2.  The search engine is updated very quickly after articles change.
3.  A few funky things around intitle and incategory:
3a.  You can combine them with a regular query (incategory:kings peaceful)
3b.  You can use prefix searches with them (incategory:norma*)
3c.  You can use them everywhere in the query (roger incategory:normans)

What we think we've made worse and we're working on fixing:
1.  Because we're expanding templates some things that probably shouldn't
be searched are being searched.  We've fixed a few of these issues but I
wouldn't be surprised if more come up.  We opened Bug 53426 regarding audio
tags.
2.  The relative weighting of matches is going to be different.  We're
still fine tuning this and we'd appreciate any anecdotes describing search
results that seem out of order.
3.  We don't currently index headings beyond the article title in any
special way.  We'll be fixing that soon. (Bug 53481)
4.  Searching for file names or clusters of punctuation characters doesn't
work as well as it used to.  It still works reasonably well if you surround
your query in quotes but it isn't as good as it was.  (Bugs 53013 and 52948)
5.  "Did you mean" suggestions currently aren't highlighted at all and
sometimes we'll suggest things that aren't actually better. (Bugs 52286 and
52860)
6.  incategory:category with spaces isn't working. (Bug 53415)

What we've changed that you probably don't care about:
1.  Updating search in bulk is much slower than before.  This is the
cost of expanding templates.
2.  Search is now backed by a horizontally scalable search backend that is
being actively developed (Elasticsearch) so we're in a much better place to
expand on the new solution as time goes on.

Neat stuff if you run your own MediaWiki:
CirrusSearch is much easier to install than our current search
infrastructure.

So what will you notice?  Nothing!  That is because while the new search
backend (CirrusSearch) is indexing we've left the current search
infrastructure as the default while we work on our list of bugs.  You can
see the results from CirrusSearch by performing your search as normal and
adding srbackend=CirrusSearch to the url parameters.
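For example (the search term here is arbitrary):
https://www.mediawiki.org/w/index.php?title=Special%3ASearch&search=templates&fulltext=Search&srbackend=CirrusSearch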

If you notice any problems with CirrusSearch please file bugs directly for
it:
https://bugzilla.wikimedia.org/enter_bug.cgi?product=MediaWiki%20extensions&component=CirrusSearch

Nik Everett
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] New search backend live on mediawiki.org

2013-08-28 Thread Nikolas Everett
On Wed, Aug 28, 2013 at 3:37 PM, Paul Selitskas p.selits...@gmail.comwrote:

 Will it be set as the search backend further on Wikimedia projects?


Yes.  I'm not sure when though.


 Is there source code available for Elasticsearch on Gerrit?


Our plugin that interacts with Elasticsearch is called CirrusSearch and
lives in gerrit here:
https://gerrit.wikimedia.org/r/#/projects/mediawiki/extensions/CirrusSearch,dashboards/default
https://gerrit.wikimedia.org/r/#/admin/projects/mediawiki/extensions/CirrusSearch
Elasticsearch lives in github here:
https://github.com/elasticsearch/elasticsearch


 Stemming doesn't work for some languages at all, thus
 searching exact matches only.


Stemming is done based on the language of the wiki.  I expect only English
stemming to work on mediawiki.org.  Right now we use the default language
analysers for all the languages that Elasticsearch supports out of the box (
http://www.elasticsearch.org/guide/reference/index-modules/analysis/lang-analyzer/)
with some customizations for English.  Languages that aren't better
supported get a default analyser that doesn't do any stemming and splits
on spaces.  I expect we'll have to build some more analysers in the
future.

Nik
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Article Concerning Error Handling

2013-08-27 Thread Nikolas Everett
On Tue, Aug 27, 2013 at 2:48 PM, Tyler Romeo tylerro...@gmail.com wrote:

 I know this list isn't really for linking stuff, but I found this article
 earlier today:


 http://zenol.fr/site/2013/08/27/an-alternative-error-handling-strategy-for-cpp/

 It's about C++, but what it describes is very relevant to our error
 handling since we use the exact same pattern (via the Status class) except
 in PHP.


I have to admit that I skimmed the article but I don't believe we use the
pattern that he describes.  It looks like he's advocating using an error
monad.  That'd bring our error handling pattern count up to 4.  All we'd
need next is promises!  Seriously though, either data structure could be
useful for us but we'd want to weigh the extra brain space required to use
them.  And the impedance between those structures and traditional error
handling.  And the performance.

Nik
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] CirrusSearch live on test2wiki

2013-08-15 Thread Nikolas Everett
On Thu, Aug 15, 2013 at 8:09 PM, Daniel Friesen
dan...@nadir-seen-fire.comwrote:

 Wait, Elasticsearch? I thought the original discussions were about Solr?


It certainly started that way but some rather insistent folks talked me
into giving Elasticsearch a chance.  I spent a week
putting together a prototype and I was so impressed that I convinced us to
move over.  I'm reasonably sure I sent out an email at the time.  I know I
updated the RFC.  In any case, that is where we are.

As far as what impressed me about Elasticsearch:
I like the documentation.
I like the query syntax.
I like the fully baked schema API.
I (mostly) like the source code itself.
I like the deb package.
I like how organized the bug submission and contribution process is.
Seriously, if you are running an open source project, build something like
http://www.elasticsearch.org/contributing-to-elasticsearch/ .  Forcing the
user to reproduce bugs with curl is genius for a service like Elasticsearch.

So, yeah, we started with solr but didn't stay there.

Nik
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Request for Comments: New Search

2013-07-22 Thread Nikolas Everett
Scott,

I was going to respond to this a while ago but couldn't really do it
justice.  I'm still pretty sure my explanation won't be great, which is an
indication of just how good Google is.

For straight search there is nothing we can do that Google can't.  It might
cost them more time and money to make searching MediaWiki awesome but they
have lots of both so we're just not going to beat them there.  There are a few
things that we can do more easily/cheaply than Google:
1.  We can update our search index right when changes are made including
when changes are made to transcluded pages.
2.  We can search based on redirects to a page.
3.  We can filter (and maybe one day facet) based on categories.
4.  We could search based on citations.

We will, on the other hand, be better about listening to what the community
needs with regards to search.  Part of the problem here is that
historically we've let search languish and my first foray into making
search nicer isn't going to provide much new stuff for the community.
Instead it's a solid platform on which to build things that the community
needs and which should make search less exciting for operations engineers.
That really isn't exciting for the community to hear and for that I am
sorry.  I can only promise that we'll do more later.

There are some deeper integrations into MediaWiki that I don't see
Google doing but we could work on in the future:
1.  We could create a section that allowed users to easily find similar
pages.  I'm a little fuzzy on exactly how we'd calculate similarity.
2.  We could automatically dig around in Commons for useful media for an
article.  We could use this to automatically provide extra media which
might be relevant or as a curation aid.  On second thought the second one
sounds much better.

Actually, some kind of game around tagging media as relevant to an article
might be quite a decent way to encourage engagement.  By game I mean
something like Galaxy Zoo or LinkedIn's endorsements.  You could do this
without a nice search but it'd help produce much more relevant results.

And then there is the cynic in me that says that it is worth doing just so
we aren't reliant on external (corporate) entities.  I'm really not sure
how I would feel if the only way to find stuff on WMF's wikis was with
Google/Bing/Yahoo.

Finally we have the private wikis like you mentioned - they mostly can't
use Google.  We are trying to make sure CirrusSearch works for them.  The
idea there is to provide something that is better at finding results than
the database-based search because it uses the same analysis that we've
optimized for WMF.  Elasticsearch isn't some kind of precision-tuned
machine - you can actually get quite decent behaviour out of downloading
the deb or rpm and installing it.  You only really need one instance.

So now that I've created this wall of text I don't feel that I've really
answered your question well, but I've answered it.  That is the thing about
hard questions: they are harder to answer than to ask.

I'd really love more brainstorming.  Cross wiki search was another good
idea someone added to the page a while ago.

Nik





On Fri, Jul 19, 2013 at 2:24 PM, C. Scott Ananian canan...@wikimedia.orgwrote:

 I wonder if there are queries or use cases we can support that *aren't*
 already better handled by google.  Granted, users of private wikis can't
 simply use the 'site:' trick to reuse Google search results -- but users of
 private wikis also probably don't need superduper scalability.

 Trying to brainstorm here, not start a flame war.  What sorts of useful
 searches could we excel at?  (Maybe these are searches/use cases that will
 facilitate editor engagement?)
  --scott

 --
 (http://cscott.net)
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Request for Comments: New Search

2013-07-19 Thread Nikolas Everett
Everyone,

I'm reviving this old thread to update everyone on the status of the RFC:

We've continued working on implementation and everything seems to be
proceeding smoothly.  We evaluated Elasticsearch and were super impressed
and decided it was very likely to be worth switching from Solr4 to it.  The
evaluation and the switch did cost some time but in my opinion doing it was
time well spent.

Thanks so much for your comments a month ago when I first posted this. If
you are interested please give the page another look.  Just to be helpful,
here is a link to what I changed:
http://www.mediawiki.org/w/index.php?title=Requests_for_comment%2FCirrusSearch&diff=740790&oldid=728213

Nik Everett

On Fri, Jun 14, 2013 at 4:21 PM, Nikolas Everett never...@wikimedia.orgwrote:

 So Chad and I feel like we've gotten far enough in our prototype of our
 new search backend for MediaWiki that we're ready to request comments.  So
 here is our formal RFC:
 https://www.mediawiki.org/wiki/Requests_for_comment/CirrusSearch

 You'll note that the plugin is called CirrusSearch.  SolrSearch seems to
 have been taken by an unrelated project so we had to pick a different name.

 Please read and comment in whatever way is normal for these things.

 Thanks so much for your attention,

 Nik Everett

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Project idea

2013-07-12 Thread Nikolas Everett
As a ChromeOS user I really just think of it as a laptop with a funky set
of apps.  I'm pretty sure I wouldn't have thought to search for a Wikipedia
app for it because I'm so used to getting Wikipedia in the browser.

On the other hand if the app could modify search key behaviour so I can hit
search, type wikipedia, hit tab, type search term, then hit enter, then I'd
like that.  On the other other hand I already have this behaviour in all
browser windows so from (pretty much) anywhere in the OS I can hit ctrl-t,
ctrl-l, type wikipedia, hit tab, type search term, then hit enter.  Also,
it feels like that search key behaviour is up to google anyway and at some
point they'll make it work the same as the location bar.

Nik


On Fri, Jul 12, 2013 at 2:22 PM, Steven Walling steven.wall...@gmail.com wrote:

 On Fri, Jul 12, 2013 at 9:00 AM, Brion Vibber bvib...@wikimedia.org
 wrote:

  I'd recommend against building any specific 'app' for a web-based OS like
  this, but if we can have a Chrome Web Store entry that conveniently
  bookmarks us and that makes us easier to use, well that'd be awesome.
 

 You mean you recommend against OS-specific apps, like we have specific apps
 for Windows Phone, iOS, and Android? ;)

 Snark aside: what you proposed is essentially how most Chrome apps work and
 is easiest to implement. For HTML5 games and such, I'm sure it's more
 app-like in that you may not be able to launch the game without installing
 the app, but most people basically just redirect users to the normal site.
  Obviously this makes the use of the name "app" seem bizarre, but the
 advantage for ChromeOS users is that we make it easier to get back to
 Wikipedia. (One step instead of three.)
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Search documentation

2013-06-17 Thread Nikolas Everett
I'm not sure about http://www.mediawiki.org/wiki/Help:Searching but
https://en.wikipedia.org/wiki/Help:Searching has lots of things we're going
to have to add to our list.  My guess is
http://www.mediawiki.org/wiki/Help:Searching is simply out of date.

Nik


On Mon, Jun 17, 2013 at 4:33 PM, Chris McMahon cmcma...@wikimedia.org wrote:

 On Mon, Jun 17, 2013 at 1:28 PM, S Page sp...@wikimedia.org wrote:

  
   * enwiki says "Hello dolly" in quotes gives different results, mw
 directly
  contradicts this. Even on my local wiki, quotes make a difference.
 
   * enwiki disagrees with itself about what a dash in front of a word does.
 

 I did some research a few weeks ago on the current state of Search and
 there are a number of discrepancies between the documentation and actual
 behavior.  Some of them have BZ tickets, like
 https://bugzilla.wikimedia.org/show_bug.cgi?id=44238
 -Chris
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Search documentation

2013-06-17 Thread Nikolas Everett
One of our goals while building this has been to make something reasonably
easy to install by folks outside of WMF.  I've added some notes about this
to the page.  I'd certainly love to hear ways that'd make it simpler to use.

Nik


On Mon, Jun 17, 2013 at 8:23 PM, Brian Wolff bawo...@gmail.com wrote:

 Just as a note, MediaWiki default (aka crappy) search is very
 different from the lucene stuff used by Wikimedia. Lucene search is
 rather difficult to set up, so most third party wikis do not use it.

 --bawolff


 On 6/17/13, Nikolas Everett never...@wikimedia.org wrote:
  I'm not sure about http://www.mediawiki.org/wiki/Help:Searching but
  https://en.wikipedia.org/wiki/Help:Searching has lots of things we're
 going
  to have to add to our list.  My guess is
  http://www.mediawiki.org/wiki/Help:Searching is simply out of date.
 
  Nik
 
 
  On Mon, Jun 17, 2013 at 4:33 PM, Chris McMahon
  cmcma...@wikimedia.org wrote:
 
  On Mon, Jun 17, 2013 at 1:28 PM, S Page sp...@wikimedia.org wrote:
 
   
    * enwiki says "Hello dolly" in quotes gives different results, mw
  directly
   contradicts this. Even on my local wiki, quotes make a difference.
  
    * enwiki disagrees with itself about what a dash in front of a word does.
  
 
  I did some research a few weeks ago on the current state of Search and
  there are a number of discrepancies between the documentation and actual
  behavior.  Some of them have BZ tickets, like
  https://bugzilla.wikimedia.org/show_bug.cgi?id=44238
  -Chris
  ___
  Wikitech-l mailing list
  Wikitech-l@lists.wikimedia.org
  https://lists.wikimedia.org/mailman/listinfo/wikitech-l
  ___
  Wikitech-l mailing list
  Wikitech-l@lists.wikimedia.org
  https://lists.wikimedia.org/mailman/listinfo/wikitech-l

 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Request for Comments: New Search

2013-06-14 Thread Nikolas Everett
So Chad and I feel like we've gotten far enough in our prototype of our new
search backend for MediaWiki that we're ready to request comments.  So here
is our formal RFC:
https://www.mediawiki.org/wiki/Requests_for_comment/CirrusSearch

You'll note that the plugin is called CirrusSearch.  SolrSearch seems to
have been taken by an unrelated project so we had to pick a different name.

Please read and comment in whatever way is normal for these things.

Thanks so much for your attention,

Nik Everett
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Architecture Guidelines: Writing Testable Code

2013-06-04 Thread Nikolas Everett
On Tue, Jun 4, 2013 at 12:36 PM, Jeroen De Dauw jeroended...@gmail.com wrote:

 Hey,

 My own experience is that test coverage is a poor evaluation metric
  for anything but test coverage; it doesn't produce better code, and
  tends to produce code that is considerably harder to understand
  conceptually because it has been over-factorized into simple bits that
  hide the actual code and data flow.  Forest for the trees.
 

 Test coverage is a metric to see how much of your code is executed by your
 tests. From this alone you cannot say if some code is good or bad. You can
 have bad code with 100% coverage, and good code without any coverage. You
 are first stating it is a poor metric to measure quality and then proceed
 to make the claim that more coverage implies bad code. Aside from
 contradicting yourself, this is pure nonsense. Perhaps you just expressed
 yourself badly, as test coverage does not produce code to begin with.


The thing is quite a few of us have seen cases where people bend over
backwards for test coverage, sacrificing code quality and writing tests
that don't provide any real value.  In this respect high test coverage can
poison your code.  It shouldn't but it can.

The problem is rejecting changes like this while still encouraging people
to write the useful kinds of tests - tests for usefully large chunks that
serve as formal documentation.  Frankly, one of my favorite tools in the
world is Python's doctests because the test _is_ the documentation.
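
For anyone who hasn't run into them, a doctest is just an example session
embedded in the docstring that actually gets executed.  This one is a made
up toy, but it shows the flavour:

    import re

    def tokenize(text):
        """Split text into lowercase word tokens.

        The examples below are run by doctest, so the documentation can
        never silently drift away from the behaviour:

        >>> tokenize("Hello, Dolly!")
        ['hello', 'dolly']
        >>> tokenize("")
        []
        """
        return re.findall(r"[a-z0-9]+", text.lower())

    if __name__ == "__main__":
        import doctest
        doctest.testmod()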

Nik
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Architecture Guidelines: Writing Testable Code

2013-06-03 Thread Nikolas Everett
I have no qualms with any of the guidelines.  They are good guidelines but
like all guidelines they are made to be bent when appropriate so long as
you leave a good explanatory comment.  My main concern is that the article
is about how to write more unit-testable code, which is something I
think people take too far.  The thing that unit tests are good for is
testing that a unit of code does what you expect it to.  The problem is
that people sometimes test portions of atomic units without testing the
whole unit.  Java folks are especially dogmatic about testing just one
class at a time which is a great guideline but tends to be the wrong thing
to do about 20% of the time.

My favorite example of this is testing a Repository or a DAO with a mock
database.  A repository's job is to issue the correct queries to the
database and spit the results back correctly.  Without talking to an actual
database you aren't testing this.  Without some good test data in that
database you aren't testing this.  I'd go so far as to say you have to talk
to _exactly_ the right database (MySQL in our case) but other very smart
people disagree with me on that point.
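
To make that concrete, here is a small Python sketch of the kind of test I
mean.  The repository class and the page table are made up, and I'm using
an in-memory SQLite database purely to keep the example self-contained - in
real life I'd insist on the same engine production uses:

    import sqlite3
    import unittest

    class PageRepository(object):
        """Toy repository: its whole job is issuing the right SQL."""

        def __init__(self, conn):
            self.conn = conn

        def find_titles_by_prefix(self, prefix):
            cur = self.conn.execute(
                "SELECT title FROM page WHERE title LIKE ? ORDER BY title",
                (prefix + "%",))
            return [row[0] for row in cur.fetchall()]

    class PageRepositoryTest(unittest.TestCase):
        def setUp(self):
            # A real (if embedded) database with real test data, not a mock.
            self.conn = sqlite3.connect(":memory:")
            self.conn.execute("CREATE TABLE page (title TEXT)")
            self.conn.executemany(
                "INSERT INTO page VALUES (?)",
                [("Search",), ("Searching",), ("Sandbox",)])

        def test_prefix_query_really_hits_the_database(self):
            repo = PageRepository(self.conn)
            self.assertEqual(["Search", "Searching"],
                             repo.find_titles_by_prefix("Sear"))

    if __name__ == "__main__":
        unittest.main()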

While this example is especially silly I'm sure we've all finished writing
a test, looked at the test code, and thought, "This test proves that I'm
interacting correctly with collaborator objects but doesn't prove that my
functionality is correct."  Sometimes this is caused by collaborators being
non-obvious.  Sometimes this is caused by global state that you have to
work around.  In any case I'd argue that these tests should really be
deleted because all they really do is make your code coverage statistics
better, give you a false sense of security, and slow down your builds.

So I just wrote a nice little wall of text about what is wrong with the
world and like any good preacher I'll propose a few solutions:
1.  Live with having bigger units.  Call them integration tests if
it makes you feel better.  I don't really care.  But you have to stand up
the whole database connection, populate it with test data
that mimics production in a useful sense, and then run the query.
2.  Build smaller components sensibly and carefully.  The goal is to be
able to hold all of the component in your head at once and for the
component to present such a clean API that when you mock it out tests are
meaningful.
3.  Write tests that exercise the entire application after it is started,
with stuff like Selenium (a rough sketch follows this list).  The
disadvantage here is that these run way slower than unit tests and require
you to learn yet another tool.  Too bad.  Some stuff is simply untestable
without a real browser, like Tim's HTML forms.
4.  Use lots of static analysis tools.  They really do help identify dumb
mistakes and don't even require you to do anything other than turn them on,
run them before you commit, and fail the build when they fail.  Worth it.
5.  Don't write automated tests at all and do lots of code reviews and
manual testing.  Sometimes this is really the most sensible thing.  I'll
leave it to you to figure out when that is though.
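
As a taste of what I mean by option 3, here is a rough Selenium sketch in
Python.  It assumes a local MediaWiki at http://localhost/wiki/index.php,
Firefox plus its driver on the path, and the stock searchInput field id -
all of which you would adjust for your own setup:

    import unittest

    from selenium import webdriver
    from selenium.webdriver.common.by import By

    class SearchSmokeTest(unittest.TestCase):
        def setUp(self):
            self.driver = webdriver.Firefox()

        def tearDown(self):
            self.driver.quit()

        def test_search_box_returns_a_results_page(self):
            self.driver.get("http://localhost/wiki/index.php/Main_Page")
            box = self.driver.find_element(By.ID, "searchInput")
            box.send_keys("hello dolly")
            box.submit()
            # On a wiki with no page called "hello dolly" this lands on
            # Special:Search, which proves the whole stack (web server,
            # PHP, database, search backend) is wired together.
            self.assertIn("search", self.driver.current_url.lower())

    if __name__ == "__main__":
        unittest.main()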

There is a great presentation on InfoQ about unit testing that I can't find
anymore where the presenter likens testing to guard rails.  He claims that
just because you have guard rails you shouldn't stop paying attention and
expect them to save you.

Sorry for the rambling wall of text.

Nik


On Mon, Jun 3, 2013 at 7:58 AM, Daniel Kinzler dan...@brightbyte.de wrote:

 Thanks for your thoughtful reply, Tim!

 Am 03.06.2013 07:35, schrieb Tim Starling:
  On 31/05/13 20:15, Daniel Kinzler wrote:
  Writing Testable Code by Miško Hevery
  
 http://googletesting.blogspot.de/2008/08/by-miko-hevery-so-you-decided-to.html
 .
 
  It's just 10 short and easy points, not some rambling discussion of
 code philosophy.
 
  I'm not convinced that unit testing is worth doing down to the level
  of detail implied by that blog post. Unit testing is essential for
  certain kinds of problems -- especially complex problems where the
  solution and verification can come from two different (complementary)
  directions.

 I think testability is important, but I think it's not the only (or even
 main)
 reason to support the principles from that post. I think these principles
 are
 also important for maintainability and extensibility.

 Essentially, they enforce modularization of code in a way that makes all
 parts
 as independent of each other as possible. This means they can also be
 understood
 by themselves, and can easily be replaced.

  But if you split up your classes to the point of triviality, and then
  write unit tests for a couple of lines of code at a time with an
  absolute minimum of integration, then the tests become simply a mirror
  of the code. The application logic, where flaws occur, is at a higher
  level of abstraction than the unit tests.

 That's why we should have unit tests *and* integration tests.

 I agree though that it's not necessary or helpful to enforce the maximum
 

Re: [Wikitech-l] Architecture Guidelines: Writing Testable Code

2013-06-03 Thread Nikolas Everett
On Mon, Jun 3, 2013 at 10:20 AM, Jeroen De Dauw jeroended...@gmail.com wrote:

  5.  Don't write automated tests at all and do lots of code reviews and
  manual testing.  Sometimes this is really the most sensible thing.  I'll
  leave it to you to figure out when that is though.
 

 Absolutist statements are typically wrong. There are almost always cases in
 which some practice is not applicable. However I strongly disagree with
 your recommendation of not writing tests and automating them. I disagree
 even stronger with the notion that manual testing is generally something
 you want to do. I've seen many experts in the field of software design
 recommend strongly against manual testing, and am currently seeing the same
 theme being pretty prevalent here at the International PHP Conference I'm
 currently attending.


I think not having automated tests is right in some situations but I
certainly wouldn't recommend it.  Manual testing sucks and having nice
tests with Selenium or some such tool is way better in most situations but
there are totally times where a good code review and manual verification
are perfect.  I'm thinking of temporary solutions or styling issues that are
difficult to verify with automated tests.  I'm certainly no expert and I'd
_love_ to learn more about things that help in the situations where I feel
like manual testing is best.  I'd love nothing more than to be wrong.



 So my question is not how do we write code that is maximally
  testable, it is: does convenient testing provide sufficient benefits
  to outweigh the detrimental effect of making everything else
 inconvenient?
 

 This contains the suggestion that testable code inherently is badly
 designed. That is certainly not the case. Good design and testability go
 hand in hand. One of the selling points of testing is that it strongly
 encourages you to create well designed software.


IMHO you can design code so that it is both easy to understand and easy to
test, but there is a real temptation to sacrifice comprehensibility for
testability.  Mostly I see this in components being split into
incomprehensibly small chunks and then tested via an intricate mock waltz.
 I'm not saying this happens all the time, only that this happens and we
need to be vigilant.  The guidelines in the article help prevent such
craziness.


 There are other advantages to writing tests as well. Just out of the top of
 my head:

 * Regression detection
 * Replaces manual testing with automated testing, saves lots of time, esp
 in projects with multiple devs. Manual testing tends to be incomplete and
 skipped as well, so the number of bugs caught is much lower. And it does
 not scale. At all.
 * Documentation so formal it can be executed and is never out of date
 * Perhaps the most important: removes the fear of change. One can refactor
 code to clean up some mess without having to fear one broke existing
 behavior. Tests are a great counter to code rot. Without tests, your code
 quality is likely to decline.


This is perfect!  If you think of your tests as formal verification
documents then you are in good shape because this implies that the tests
are readable.

If I had my druthers I'd like all software to be designed in such a way
that it can be tested automatically with informative tests that read like
documentation.  We'd all like that.  To me it looks like there are three
problems:
1.  How do you keep out tests that are incomprehensible as documentation?
2.  What do you do with components for which no unit test can be written
that could serve as documentation?
3.  What do you do when the formal documentation will become out of date so
fast that it feels like a waste of time to write it?

I really only have a good answer for #2 and that is to test components
together, like the DB and Repository or the server-side application and the
browser.

#1 troubles me quite a bit because I've found those tests to be genuinely
hurtful in that they give you the sense that you are accomplishing something
when you aren't.

Nik
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l