Just to follow up here, I've updated the `async` branch of my Elastica
fork; it now completely passes the test suite, so it might be ready for further
CirrusSearch testing.

On Wed, Sep 9, 2015 at 12:23 PM, Erik Bernhardson <
ebernhard...@wikimedia.org> wrote:

> This would allow them to be run in parallel, yes. Being in separate
> databases means the extra last-minute checks for existence or security (a
> CYA in case deletes get missed) are skipped as interwiki links are generated,
> but overall that's not a big deal and an expected loss as part of interwiki
> search.
>
> On Wed, Sep 9, 2015 at 10:11 AM, Kevin Smith <ksm...@wikimedia.org> wrote:
>
>> Would this help if we wanted to simultaneously search multiple wikis, or
>> are those in separate databases so it would have no effect?
>>
>>
>> Kevin Smith
>> Agile Coach, Wikimedia Foundation
>>
>>
>> On Wed, Sep 9, 2015 at 5:18 AM, David Causse <dcau...@wikimedia.org>
>> wrote:
>>
>>> Thanks Erik!
>>>
>>> This is very promising and it opens a lot of new possibilities.
>>> Estimating the gain is pretty hard, but I think we run many small requests
>>> where network overhead is quite high compared to the actual work done by
>>> Elasticsearch. This would definitely help.
>>>
>>> Le 08/09/2015 21:01, Erik Bernhardson a écrit :
>>>
>>> The PHP engine used in production by the WMF, HHVM, has built-in support for
>>> shared (non-preemptive) concurrency via the async/await keywords[1][2]. Over
>>> the weekend I spent some time converting the Elastica client library we use
>>> to work asynchronously, which would essentially let us continue performing
>>> other calculations in the web request while network requests are in flight.
>>> I've only ported the client library[3], not the CirrusSearch code. This is
>>> also not a complete port: a couple of code paths work, but most of the test
>>> suite still fails.
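>>>
>>> As a rough illustration of the shape such a conversion takes, here is a
>>> minimal Hack sketch of an async request method. The names (AsyncClient,
>>> AsyncTransport, Response) are made up for the example and are not the
>>> actual classes in the branch:
>>>
>>> <?hh
>>> class AsyncClient {
>>>   // Hypothetical non-blocking transport, not the real Elastica transport.
>>>   public function __construct(private AsyncTransport $transport) {}
>>>
>>>   // async/await lets HHVM suspend this method while the HTTP call is in
>>>   // flight and resume it when the response arrives, instead of blocking.
>>>   public async function request(string $path, array $data): Awaitable<Response> {
>>>     $raw = await $this->transport->send($path, $data);
>>>     return new Response($raw);
>>>   }
>>> }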
>>>
>>> The most obvious place we could see a benefit from this is when multiple
>>> queries are issued to Elasticsearch from a single web request. If the
>>> second query doesn't depend on the results of the first, it can be issued
>>> in parallel. This is actually a somewhat common use case, for example doing
>>> a full text and a title search in the same request. I'm wary of guessing
>>> the actual latency reduction we could expect, but perhaps on the order of
>>> 50 to 100 ms in cases where we currently perform requests serially and have
>>> enough work to process. Really, it's hard to say at this point.
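>>>
>>> As an illustration only (searchFullTextAsync and searchTitlesAsync are
>>> made-up helpers, not CirrusSearch functions), issuing the two searches
>>> concurrently might look roughly like this:
>>>
>>> <?hh
>>> async function searchBoth(AsyncClient $client, string $term): Awaitable<array> {
>>>   // HH\Asio\v() awaits a collection of awaitables, so both HTTP requests
>>>   // can be in flight at the same time instead of one after the other.
>>>   $results = await \HH\Asio\v(array(
>>>     searchFullTextAsync($client, $term), // hypothetical full text query
>>>     searchTitlesAsync($client, $term),   // hypothetical title query
>>>   ));
>>>   return array('fulltext' => $results[0], 'titles' => $results[1]);
>>> }
>>>
>>> The win only appears if both awaitables are created before either is
>>> awaited; awaiting the first query before issuing the second serializes
>>> them again.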
>>>
>>> In addition to making some existing code faster, having the ability to
>>> do multiple network operations in an async manner opens up other
>>> possibilities when we implement things in the future. In closing, this
>>> currently isn't going anywhere; it was just something interesting to toy
>>> with. I think it could be quite interesting to investigate further.
>>>
>>> [1] http://docs.hhvm.com/manual/en/hack.async.php
>>> [2] https://phabricator.wikimedia.org/T99755
>>> [3] https://github.com/ebernhardson/Elastica/tree/async
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>
_______________________________________________
Wikimedia-search mailing list
Wikimedia-search@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikimedia-search
