This would allow them to be run in parallel, yes. Because the wikis are in
separate databases, the extra last-minute checks for existence or security
(a safety net in case deletes get missed) are skipped as interwiki links
are generated, but overall that's not a big deal and an expected loss as
part of interwiki search.

On Wed, Sep 9, 2015 at 10:11 AM, Kevin Smith <[email protected]> wrote:

> Would this help if we wanted to simultaneously search multiple wikis, or
> are those in separate databases so it would have no effect?
>
>
> Kevin Smith
> Agile Coach, Wikimedia Foundation
>
>
> On Wed, Sep 9, 2015 at 5:18 AM, David Causse <[email protected]>
> wrote:
>
>> Thanks Erik!
>>
>> This is very promising and it opens a lot of new possibilities.
>> Guessing the gain is pretty hard but I think we run many small requests
>> where network overhead is quite high compared to the actual work done by
>> elastic. This would definitely help.
>>
>> On 08/09/2015 at 21:01, Erik Bernhardson wrote:
>>
>> The PHP engine used in prod by the WMF, HHVM, has built-in support for
>> shared (non-preemptive) concurrency via async/await keywords[1][2]. Over
>> the weekend I spent some time converting the Elastica client library we
>> use to work asynchronously, which would essentially let us continue
>> performing other calculations in the web request while network requests
>> are in flight. I've only ported the client library[3], not the
>> CirrusSearch code. It's also not a complete port: a couple of code paths
>> work, but most of the test suite still fails.
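A minimal sketch of the idea, using Python's asyncio as a stand-in for Hack's async/await (the function names here are hypothetical, not from the Elastica port): an `await` point yields control, so the request can keep computing while a network call is pending.

```python
import asyncio

async def fetch_from_elastic(query: str) -> str:
    # Stand-in for a network round trip to Elasticsearch; the await
    # yields control so other work in the request can proceed.
    await asyncio.sleep(0.05)
    return f"results for {query!r}"

async def handle_request() -> str:
    # Start the network request without blocking on it...
    pending = asyncio.ensure_future(fetch_from_elastic("some query"))
    # ...and keep doing other per-request computation meanwhile.
    other_work = sum(range(1000))
    # Only now block, to collect the response.
    results = await pending
    return f"{results}; other work = {other_work}"

print(asyncio.run(handle_request()))
```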
>>
>> The most obvious place we could see a benefit is when multiple queries
>> are issued to Elasticsearch from a single web request. If the second
>> query doesn't depend on the results of the first, it can be issued in
>> parallel. This is actually a somewhat common use case, for example doing
>> a full-text and a title search in the same request. I'm wary of guessing
>> at the actual latency reduction we could expect, but it is maybe on the
>> order of 50 to 100 ms in cases where we currently perform requests
>> serially and have enough other work to process. Really, it's hard to say
>> at this point.
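The parallel-query case can be sketched the same way (again in Python asyncio as an analogy for the HHVM port; the two search functions are hypothetical): since neither query depends on the other, awaiting them together makes total latency roughly that of the slower query rather than the sum of both.

```python
import asyncio
import time

async def full_text_search(term: str) -> str:
    await asyncio.sleep(0.1)  # simulated network/query latency
    return f"full-text hits for {term!r}"

async def title_search(term: str) -> str:
    await asyncio.sleep(0.1)  # simulated network/query latency
    return f"title hits for {term!r}"

async def search_both(term: str) -> list[str]:
    # Issue both independent queries concurrently instead of serially.
    return await asyncio.gather(full_text_search(term), title_search(term))

start = time.perf_counter()
hits = asyncio.run(search_both("example"))
elapsed = time.perf_counter() - start
print(hits, f"in {elapsed:.2f}s")  # roughly one query's latency, not two
```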
>>
>> In addition to making some existing code faster, the ability to do
>> multiple network operations asynchronously opens up other possibilities
>> when we implement things in the future. In closing, this currently isn't
>> going anywhere; it was just something interesting to toy with. I think it
>> could be quite interesting to investigate further.
>>
>> [1] http://docs.hhvm.com/manual/en/hack.async.php
>> [2] https://phabricator.wikimedia.org/T99755
>> [3] https://github.com/ebernhardson/Elastica/tree/async
>>
>>
>>
>
_______________________________________________
Wikimedia-search mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikimedia-search
