This would allow them to be run in parallel, yes. Since the wikis are in separate databases, the extra last-minute checks for existence and security (a safety net in case deletes are missed) get skipped as interwiki links are generated, but overall that's not a big deal and an expected loss as part of interwiki search.
On Wed, Sep 9, 2015 at 10:11 AM, Kevin Smith <[email protected]> wrote:

> Would this help if we wanted to simultaneously search multiple wikis, or
> are those in separate databases so it would have no effect?
>
> Kevin Smith
> Agile Coach, Wikimedia Foundation
>
> On Wed, Sep 9, 2015 at 5:18 AM, David Causse <[email protected]> wrote:
>
>> Thanks Erik!
>>
>> This is very promising and it opens a lot of new possibilities.
>> Guessing the gain is pretty hard, but I think we run many small requests
>> where the network overhead is quite high compared to the actual work
>> done by elastic. This would definitely help.
>>
>> On 08/09/2015 21:01, Erik Bernhardson wrote:
>>
>> The PHP engine used in prod by the WMF, HHVM, has built-in support for
>> shared (non-preemptive) concurrency via the async/await keywords[1][2].
>> Over the weekend I spent some time converting the Elastica client
>> library we use to work asynchronously, which would essentially let us
>> continue performing other calculations in the web request while network
>> requests are processing. I've only ported the client library[3], not
>> the CirrusSearch code. Also, this is not a complete port; there are a
>> couple of code paths that work, but most of the test suite still fails.
>>
>> The most obvious place we could see a benefit from this is when
>> multiple queries are issued to Elasticsearch from a single web request.
>> If the second query doesn't depend on the results of the first, it can
>> be issued in parallel. This is actually a somewhat common use case, for
>> example doing a full-text and a title search in the same request. I'm
>> wary of making much of a guess in terms of the actual latency reduction
>> we could expect, but maybe on the order of 50 to 100 ms in cases where
>> we currently perform requests serially and have enough work to process.
>> Really, it's hard to say at this point.
>> In addition to making some existing code faster, having the ability to
>> do multiple network operations in an async manner opens up other
>> possibilities when we implement things in the future. In closing, this
>> currently isn't going anywhere; it was just something interesting to
>> toy with. I think it could be quite interesting to investigate further.
>>
>> [1] http://docs.hhvm.com/manual/en/hack.async.php
>> [2] https://phabricator.wikimedia.org/T99755
>> [3] https://github.com/ebernhardson/Elastica/tree/async
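The parallel-query pattern Erik describes (a full-text and a title search issued together when neither depends on the other) can be sketched with Python's asyncio for illustration; the actual port uses Hack/HHVM async, and the function names below are hypothetical stand-ins, not from the Elastica branch:

```python
import asyncio

# Hypothetical stand-ins for two independent Elasticsearch queries
# issued from a single web request.
async def fulltext_search(query):
    await asyncio.sleep(0.05)  # stands in for ~50 ms of network round trip
    return {"hits": ["page about " + query]}

async def title_search(query):
    await asyncio.sleep(0.05)
    return {"hits": [query.title()]}

async def handle_request(query):
    # Neither query depends on the other's result, so both are awaited
    # together: total wall time is roughly the max of the two round
    # trips rather than their sum, which is where the latency win comes from.
    full, titles = await asyncio.gather(fulltext_search(query),
                                        title_search(query))
    return full["hits"] + titles["hits"]

print(asyncio.run(handle_request("async hack")))
```

The same shape applies in Hack: each network call returns an awaitable, and the request handler awaits the batch while other work in the request can proceed.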
_______________________________________________
Wikimedia-search mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikimedia-search
